Recently developed open-source software, Apache Myriad, provides a unified infrastructure for enterprise datacenters. By allowing Mesos and YARN to co-exist, it helps prevent under-utilization of IT resources.
Consider an enterprise data center today. There’s a dedicated Hadoop/YARN cluster assigned for long-running, analytical processing workloads such as log processing, machine learning, recommendation algorithms, and so on. With the advent of application containers such as Docker, companies now maintain a second cluster running on a different resource manager, Mesos, that’s dedicated to running operational applications such as a frontend application for booking flights, a MongoDB server, or even a simple desktop application like PowerPoint! Both analytical workloads and operational applications can be bursty. For example, during a Black Friday sale, online retailers can experience higher than usual traffic which means utilizing more resources (compute, storage) than usual to support their applications. Meanwhile, during normal operations, resources in a given region can be underutilized. This fluctuation in resource utilization “forces IT departments to size each cluster based on peak utilization, letting resources go under-utilized much of the time.”i Additionally, this siloed architecture forces users to constantly deal with moving data back and forth between clusters. These challenges make resource management hard from a DevOps perspective, not to mention expensive. The next sections provide an overview of the two resource managers: YARN and Mesos.
Overview: Apache Hadoop YARN
YARN was primarily created to scale Hadoop. The previous version, Hadoop MapReduce V1, was limited to a couple thousand machines. YARN introduces many benefits to a big data world. It gives users the ability to run databases on top of Hadoop YARN like HBase, SQL like Hive or Impala, machine learning, and analytics like Mahout or Spark. It co-locates resources on top of Hadoop’s HDFS enabling the ability to bring compute to the data instead of data to compute. Now, let’s take a look at how YARN performs job scheduling. When a job request comes into the YARN resource manager, YARN evaluates all the resources available and places the job, making the decision itself and following a monolithic approach. YARN was primarily built to support long-running state-less batch jobs (i.e. the jobs can be restarted if they fail). It is not optimized for long-running operational applications or interactive queries.ii
Overview: Apache Mesos
Apache Mesos is a generic resource manager that supports HPC and Big Data workloads. Prior to Mesos, administrators and developers alike would build systems where each application or cluster lives in a group of systems or servers. This solution is great but only up to a point. When installation requirements start to become more complex and one application cannot own one entire cluster, engineers needed to find other servers and dedicate resources specifically to them. However, that leads to server sprawl and becomes expensive when procuring resources on an existing datacenter or running in the cloud. Other inconvenient scenarios are when servers become unresponsive and the dedicated resources that serve your particular application need human intervention. This forces the engineering team to respond by either finding other servers that are not being used or decommissioning an existing server from an entirely different cluster that has lower utilization and allocating that resource to the impacted cluster. With the advent of application containers, like Docker, we now have a cleaner and more predictable deployment strategy that is already proven to scale. Scheduling in Mesos is quite different from YARN. After determining the available resources, Mesos makes an offer to the framework which can either accept or reject the offer. This non-monolithic approach of a two-level scheduler allows Mesos to support multiple scheduling algorithms across multiple schedulers that each framework might use.
Frameworks: Myriad, YARN on Apache Mesos
Until recently, the two resource managers (YARN and Mesos) were considered incompatible with each other because of their design priorities and scheduling techniques each optimized for their specific workloads. While Mesos can manage the entire datacenter, YARN can only handle Hadoop jobs. Currently most enterprises create a manual partition to allow the two resource managers to coexist. To address the above issues and provide a unified infrastructure for an enterprise datacenter, an open-source project, Apache Myriad was created. Myriad allows Mesos and YARN to co-exist making Mesos the primary resource manager for the datacenter.
When Myriad launches a new YARN Node Manager:
Myriad Scheduler forwards the required configuration and task launch information to the Mesos Master which forwards that to the Mesos Slave(s).
Mesos Slave launches Myriad Executor which manages the lifecycle of the YARN Node Manager (NM)
At launch, Myriad Executor configures the NM by specifying CPU and memory. In the following diagram, the NM is allotted 2.5 CPU and 2.5 GB RAM.
The YARN NM, on startup, advertises configured resources to the YARN Resource Manager (RM). In the following example, 2.0 CPU and 2.0 GB RAM are advertised (the rest are used for running the myriad executor NM)
The YARN RM can now launch containers (C1, C2), via this NM within the specified CPU and Memory limits. iii These containers can now run the analytical workloads such as Spark jobs or MapReduce jobs.
Running Docker applications on the same cluster
With Myriad, we now have the capability to run analytical workloads such as MapReduce jobs and operational Docker-based applications.
We use Marathon (a container orchestration platform providing service discovery, load balancing and health checks) to launch Docker workloads on our cluster. We specify a JSON with configuration information needed to launch the Docker application (See Figure 2).
For example, a typical JSON configuration file will include the name of the Docker image we want to run (e.g. python:3). This image can reside in the public repo (Docker Hub) or a local registry created in the private datacenter. The configuration also includes the number of CPUs needed (0.5), memory in MiB (64MiB), port mappings (mapping the container port 8080, to any available host port), number of instances (2) and any dependencies.
More sophisticated applications can be run by specifying container groups for running an application stack with dependencies.
The cluster setup in the labs consists of 43 nodes running 3.13.0-66-generic kernel, Ubuntu 14.04 operating system. Additionally, we have MapR 5.0, Mesos: 0.26.0 and Marathon: 0.14.0 running on the cluster. With these prerequisites in place, we installed Myriad 0.1.0. As a quick note, about half the servers run on 1GB network infrastructure while the others run on 10GB.
We used Ansible scripts to switch easily between YARN on MapR and YARN on Myriad and to rapidly deploy Mesos, MapR and Myriad efficiently across the entire cluster. Ansible is a configuration management and provisioning tool, similar to Chef, Puppet or Salt. Designed for multi-tier deployments, Ansible models the IT infrastructure by describing how all of the systems interrelate. Figure 2 shows a high-level representation of the Ansible playbook we used to orchestrate and install the nested playbooks or dependencies. As the picture depicts, the playbook installs the Mesos Playbook (consisting of its dependent prerequisites), MapR Playbook and the Myriad Playbook. Using this approach, we can rapidly deploy this environment across all the nodes in our cluster complete with all the dependent software.
For example: Here are instructions that go into the myriad role playbook:
Overview of TeraGen benchmark
TeraGen generates a random dataset that can be used as input data for a subsequent TeraSort run. The user specifies the number of rows to generate and the output directory and TeraGen runs a MapReduce program to generate the data.
The syntax for running TeraGen is as follows:
For example: To run 1 Million rows with output HDFS directory /tmp/myriad-nm-1000000
Here the time system call is used to calculate the time taken to complete the job. It generates three outputs: real, user and sys. Real is the wall-clock time from start to finish of the job call including time slices used by other processes running on the cluster. User is the CPU time spent in executing the process in user-mode (outside the kernel). Sys is the CPU time spent in the kernel executing system calls within the process. So User+Sys gives the total CPU time consumed by the executing process.
Benchmarking YARN on Myriad v0.1.0 Vs. YARN on MapR 5.0
The consolidated cluster running Myriad on Mesos allows us to run MapReduce jobs alongside Docker-based applications. However, for the purposes of this paper, we only focus on benchmarking Myriad’s ability to run MapReduce jobs compared to MapR. For our Myriad setup, we used 32 Node Managers (NM) and 2 mappers with NM specs: 10 vCPU and 12288 MB of RAM. For MapR, we used 35 Node Managers and 2 mappers with similar NM specs. From our initial benchmarking, Myriad performs just as well as MapR, but does not scale to higher workloads (Figure 3). We have encountered issues when performing Teragen MapReduce jobs to generate 1 billion rows with 2 mappers. MapR’s RM will scale up to 10 billion rows and fails when we attempt to hit 100 billion. This doesn’t indicate that Myriad does not work well. The current and first version of Myriad is still in incubation. It will be better in future releases and theoretically speaking, assuming the underlying hardware is the same, it should match the performance of MapR. Our findings prove that the results are the same according to our hypothesis until we are proven incorrect.
Myriad 0.2 promises even better performance
In the last 10 years, great open-source software has been in the forefront of innovation. There’s a plethora of tools disrupting the way we solve problems—from distributed databases like Cassandra to scalable analytical workloads like Hadoop, Docker for containerization technologies, and so on. Apache Mesos, a tool born of necessity in Twitter, based on Google’s Borg, is part of this list of software that scales and provides a distributed operating system level of abstraction to manage resources. Apache Myriad is a framework that connects with Mesos. Its goal is to provide the level of abstraction to bring analytical workloads to Mesos, which manages the entire datacenter. While we were benchmarking the first release of Myriad (0.1), a second release of Myriad in the Apache Incubator was released (29 June 2016). This release is supposed to focus on increased reliability of Fine Grained Scaling, support for running the Node Manager and Auxiliary Services (JobHistory Server etc.) in Docker Containers, and better distribution management of the Hadoop configuration and binaries. Any future work will benchmark Myriad 0.2 as it compares with MapR as well as co-running operational workloads simultaneously with the MapReduce jobs.