Accenture makes a price-performance comparison between a bare-metal and a cloud-based Hadoop cluster.
Big data is changing organizations’ perspective on information and the value it adds (or does not add) to business. Big data platforms are replacing traditional data infrastructure, offering capacity and performance increases at incremental costs, compared with traditional infrastructure’s exponential costs.
This change in how businesses store and process their data has led them to glean more insight from their existing data by combining multiple datasets and sources to yield a more complete view of their customers and operations. The success of businesses using big data to change how they operate and interact with the world has made many other businesses prioritize big data rollouts as IT initiatives to realize similar results.
Apache Hadoop has been at the center of this big data transformation, providing tools for businesses to store and process data on a scale that was unheard of several years ago.
Accenture conducted a price-performance comparison of a bare-metal Hadoop cluster and cloud-based Hadoop clusters. Using an original total cost of ownership model, we created eight different cloud-based Hadoop clusters using four virtual machine instance types, each with two data-flow models to compare against our bare-metal Hadoop cluster. The Accenture Data Platform Benchmark provided us with three real-world Hadoop applications to compare the execution-time performance of these clusters.
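The experimental design above (four instance types, each paired with two data-flow models) can be sketched roughly as follows. This is only an illustration under stated assumptions: the instance-type and data-flow labels are hypothetical placeholders, and the cost-per-run metric (hourly cost multiplied by execution time, lower is better) is an assumed stand-in, not the total cost of ownership model used in the study.

```python
from itertools import product

# Hypothetical labels: this summary does not name the instance types
# or the two data-flow models.
INSTANCE_TYPES = ["type_a", "type_b", "type_c", "type_d"]
DATA_FLOW_MODELS = ["flow_1", "flow_2"]


def cluster_configs():
    """Enumerate every (instance type, data-flow model) pairing."""
    return list(product(INSTANCE_TYPES, DATA_FLOW_MODELS))


def cost_per_run(hourly_cost, execution_hours):
    """Assumed metric: money spent to finish one benchmark run."""
    if hourly_cost < 0 or execution_hours < 0:
        raise ValueError("cost and time must be non-negative")
    return hourly_cost * execution_hours


def best_config(measurements):
    """Return the configuration with the lowest cost per run.

    `measurements` maps a config tuple to (hourly_cost, execution_hours).
    """
    return min(measurements, key=lambda cfg: cost_per_run(*measurements[cfg]))
```

Under this assumed metric, a slower cluster can still come out ahead if its hourly cost is low enough, which is the trade-off a price-performance comparison is meant to expose.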
We observed the performance impact of data locality and remote storage within the cloud. Counterintuitively, our experiments show that using remote storage to make data highly available outperforms Hadoop distributed file systems on local disks that rely on data locality.
This study focuses on the price-performance ratio. We wanted to confirm the result of a previous Accenture study: cloud-based Hadoop deployments offer a better price-performance ratio than bare metal. We also set out to explore how data-flow models and cloud architecture affect performance on the Accenture Technology Labs’ Data Platform Benchmark suite.
We continued to explore two divergent views on the price-performance ratio for Hadoop deployments. The typical view is that a virtualized Hadoop cluster is slower because Hadoop workloads are I/O-intensive, and I/O operations tend to run slowly in virtualized environments. The contrasting view is that the cloud-based model provides compelling cost savings because individual server nodes tend to be less expensive and Hadoop is horizontally scalable.
Results of this study reinforce our original findings.
First, cloud-based Hadoop deployments—Hadoop on the cloud and Hadoop-as-a-Service—offer better price-performance ratios than bare-metal clusters.
Second, the benefit of performance tuning is large enough that the overhead of the cloud’s virtualization layer is a worthwhile trade-off, because virtualization expands performance-tuning opportunities.
Third, despite the sizable benefit, the performance-tuning process is complex and time-consuming and thus requires automated tuning tools.
Choosing a cloud-based Hadoop deployment depends on the needs of the organization: Hadoop on the cloud offers more control of Hadoop clusters, while Hadoop-as-a-Service offers simplified operation.
Once a deployment model has been selected, organizations should consider four key areas when selecting a cloud provider:
Carefully considering these factors will ensure that businesses are successful and are able to maximize their performance on the cloud.
January 8, 2014