Apache Hadoop™ is one of the key technologies at the center of today’s big data landscape.
It provides an ecosystem with multiple tools to store and process large datasets in a scalable and cost-effective way.
When an enterprise decides to adopt Hadoop, one of the key decisions it must make is which deployment model to use. An important consideration in choosing the appropriate model is the price-performance ratio.
We explored two divergent views on the price-performance ratio of Hadoop deployments, using the Data Platform Benchmark suite from Accenture Technology Labs.
Traditionally, data infrastructure has been something of a gatekeeper for data access; such infrastructures were built as self-contained, monolithic ‘scale-up’ appliances. Furthermore, additional scale required additional resources, which often increased costs exponentially.
Big data platforms, on the other hand, can increase capacity and performance by adding nodes at a linear cost increase. This shift unlocks a world of potential, allowing businesses to mine their data for greater insights by combining different sets of data to gain a broader view of consumer behaviors and interests.
For that reason, big data technology is changing how many business organizations view their data.
When deploying Hadoop, enterprises have four options:
On-premises full custom: businesses purchase commodity hardware, install the software, and operate it themselves, giving them full control of the Hadoop cluster.
Hadoop appliance: this preconfigured Hadoop cluster allows businesses to bypass detailed technical configuration decisions and jump start data analysis.
Hadoop hosting: as with a traditional ISP model, organizations rely on a service provider to deploy and operate Hadoop clusters on their behalf.
Hadoop-as-a-Service: this gives businesses instant access to Hadoop clusters with a pay-per-use consumption model, providing greater business agility.
To determine which of these options offers the most appropriate deployment model, organizations must consider five key areas:
To aid in assessing these deployment options, we compared the price-performance ratio of bare-metal Hadoop clusters and Hadoop-as-a-Service on Amazon Web Services™.
We calculated the total cost of ownership of a 24-node 50 TB-capacity bare-metal Hadoop cluster and derived the capacity of nine different cloud-based Hadoop clusters at the matched TCO. The performance of each option was then compared by running three real-world Hadoop applications.
For the experiment, we first built a total cost of ownership (TCO) model to hold the two environments at a matched cost level. Then, using the Accenture Data Platform Benchmark as real-world workloads, we compared the performance of a bare-metal Hadoop cluster with that of Amazon Elastic MapReduce (Amazon EMR™).
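The cost-matching step described above can be illustrated with a minimal sketch. It assumes the bare-metal TCO has already been reduced to a single monthly figure and that each cloud node is billed at a flat hourly rate. The function name `matched_node_count`, the instance labels, and every number below are hypothetical placeholders, not figures from the report or from Amazon's price list.

```python
import math

# Assumption: average number of hours in a month (8,760 hours / 12).
HOURS_PER_MONTH = 730.0


def matched_node_count(monthly_tco, hourly_price_per_node,
                       hours_per_month=HOURS_PER_MONTH):
    """Return the largest whole number of cloud nodes whose monthly
    cost does not exceed the given monthly TCO budget."""
    if monthly_tco < 0:
        raise ValueError("monthly_tco must be non-negative")
    if hourly_price_per_node <= 0 or hours_per_month <= 0:
        raise ValueError("price and hours must be positive")
    monthly_cost_per_node = hourly_price_per_node * hours_per_month
    # Round down so the cloud cluster never costs more than the budget.
    return math.floor(monthly_tco / monthly_cost_per_node)


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    bare_metal_monthly_tco = 20000.0
    hypothetical_hourly_prices = {"small": 0.5, "medium": 1.0, "large": 2.0}

    for name, price in hypothetical_hourly_prices.items():
        nodes = matched_node_count(bare_metal_monthly_tco, price)
        print(f"{name}: {nodes} nodes at the matched TCO")
```

Rounding down keeps each cloud configuration at or below the bare-metal budget, so any performance advantage it shows is not bought with extra spending. A fuller model would likely also account for storage, data transfer, and staffing costs, which this sketch leaves out.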
Employing these empirical and systemic analyses, our study revealed that Hadoop-as-a-Service offers a better price-performance ratio. This result debunks the idea that the cloud is not suitable for Hadoop MapReduce workloads, with their heavy I/O requirements.
The benefit of performance tuning is so large that the overhead of the cloud’s virtualization layer is a worthy investment, because the cloud expands the opportunities for performance tuning. Despite this sizable benefit, however, the tuning process is complex and time-consuming, and it requires automated tuning tools.
June 24, 2013