Security weaknesses are sometimes due to a failure to implement lesser-known best practices.

One of the main causes of issues found in our cloud penetration tests is misconfiguration, including insecure default configurations that have never been changed. A standard "out-of-the-box" cloud configuration in AWS might leave your resources unprotected. One example is the Amazon EMR service (previously called Amazon Elastic MapReduce).

It’s worth noting that this is not a vulnerability that is inherent in the product itself but a result of relying on the default configuration. Regardless, AWS has been informed and, in consultation with Accenture, AWS is working to update its documentation for better visibility about secure configuration.

What is EMR?

EMR is an AWS service used for data analytics that performs extract, transform, load (ETL) operations on data from various sources such as S3 buckets and Amazon Relational Database Service (RDS) databases. EMR uses the Apache Hadoop framework to process these requests using a cluster-based architecture. Yet Another Resource Negotiator (YARN) is a core component of Hadoop that provides cluster resource management, allowing multiple data processing engines, running on the cluster's worker nodes, to handle data stored in a single platform.

Why the default configuration leaves EMR vulnerable


The YARN service on EMR’s main server exposes a representational state transfer (REST) application programming interface (API) which allows remote users to submit new applications to the cluster. By default, the security controls that AWS makes available for EMR, such as SSH access, Kerberos authentication and outbound traffic restrictions, are not enabled.
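As a minimal sketch of how these defaults could be audited, the function below inspects a cluster description and lists the optional controls that are absent. It assumes the input dictionary has the shape of the "Cluster" object returned by the EMR DescribeCluster API (fields `SecurityConfiguration`, `KerberosAttributes` and `Ec2InstanceAttributes.Ec2KeyName`); the function name and the finding strings are hypothetical and chosen for illustration only.

```python
from typing import Any, Dict, List


def missing_emr_controls(cluster: Dict[str, Any]) -> List[str]:
    """Return the optional security controls that a cluster description lacks.

    `cluster` is assumed to look like the "Cluster" object returned by the
    EMR DescribeCluster API; absent or empty fields are treated as "not set".
    """
    findings: List[str] = []
    # A named security configuration is where Kerberos and encryption are defined.
    if not cluster.get("SecurityConfiguration"):
        findings.append("no security configuration attached")
    # Kerberos attributes are only present when Kerberos authentication is enabled.
    if not cluster.get("KerberosAttributes"):
        findings.append("Kerberos authentication not enabled")
    # Without an EC2 key pair, administrators cannot SSH to the main node.
    ec2_attrs = cluster.get("Ec2InstanceAttributes", {})
    if not ec2_attrs.get("Ec2KeyName"):
        findings.append("no EC2 key pair for SSH access")
    return findings


# Hypothetical description of a cluster launched with default settings.
default_cluster = {"Name": "demo", "Ec2InstanceAttributes": {}}
for finding in missing_emr_controls(default_cluster):
    print(finding)
```

An empty result does not prove a cluster is secure; it only shows that these particular controls have been configured.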

When an EMR cluster is launched, the ‘block public access’ setting is enabled by default at the AWS region level, unless it has been specifically disabled, and port 22 is the only exception to it. As a result, EMR is mostly accessible only from within the virtual private cloud (VPC) network (that is, by internal organization users).
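The logic of that setting can be sketched as follows. The function assumes a dictionary shaped like the "BlockPublicAccessConfiguration" object returned by the EMR GetBlockPublicAccessConfiguration API (fields `BlockPublicSecurityGroupRules` and `PermittedPublicSecurityGroupRuleRanges`); the function name and the sample configuration are hypothetical.

```python
from typing import Any, Dict


def port_exempt_from_block_public_access(config: Dict[str, Any], port: int) -> bool:
    """Return True if `port` may still be opened to the public.

    `config` is assumed to mirror the "BlockPublicAccessConfiguration" object
    from the EMR GetBlockPublicAccessConfiguration API.
    """
    # If the feature is switched off, nothing is blocked, so every port is exempt.
    if not config.get("BlockPublicSecurityGroupRules", False):
        return True
    # Otherwise only ports inside a permitted range are exempt.
    for port_range in config.get("PermittedPublicSecurityGroupRuleRanges", []):
        low = port_range["MinRange"]
        high = port_range.get("MaxRange", low)
        if low <= port <= high:
            return True
    return False


# Sample configuration modelled on the default described above: blocking is on,
# with port 22 as the only exception.
default_config = {
    "BlockPublicSecurityGroupRules": True,
    "PermittedPublicSecurityGroupRuleRanges": [{"MinRange": 22, "MaxRange": 22}],
}
print(port_exempt_from_block_public_access(default_config, 22))
print(port_exempt_from_block_public_access(default_config, 8088))
```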

Access to these EMR clusters is usually controlled by AWS Security Groups, which are restricted to internal IP addresses, rather than being exposed to the Internet. However, when these clusters are not protected using additional measures, a remote unauthenticated attacker can inject operating system commands to gain access to the underlying Amazon Elastic Compute Cloud (EC2) instance. This is possible because Kerberos authentication is not enabled on Hadoop services in the default AWS configuration.
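One way to verify that a Security Group really is limited to internal addresses is sketched below. It assumes the rules are supplied in the shape of the `IpPermissions` list from the EC2 DescribeSecurityGroups API and treats only the RFC 1918 ranges as "internal"; both the helper names and that definition of internal are assumptions. Only IPv4 ranges are examined.

```python
import ipaddress
from typing import Any, Dict, List

YARN_PORT = 8088  # YARN port referred to in this article

# Assumption: "internal" means the RFC 1918 private IPv4 ranges.
RFC1918 = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_internal(cidr: str) -> bool:
    """True if the IPv4 CIDR lies entirely inside one RFC 1918 range."""
    network = ipaddress.ip_network(cidr, strict=False)
    return any(network.subnet_of(private) for private in RFC1918)


def rule_covers_port(rule: Dict[str, Any], port: int) -> bool:
    """True if an ingress rule applies to the given TCP port."""
    if rule.get("IpProtocol") == "-1":  # "-1" means all protocols and ports
        return True
    if rule.get("IpProtocol") != "tcp":
        return False
    return rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)


def external_sources_for_port(
    ip_permissions: List[Dict[str, Any]], port: int = YARN_PORT
) -> List[str]:
    """List the non-internal IPv4 CIDRs that are allowed to reach `port`."""
    exposed = []
    for rule in ip_permissions:
        if not rule_covers_port(rule, port):
            continue
        for ip_range in rule.get("IpRanges", []):
            if not is_internal(ip_range["CidrIp"]):
                exposed.append(ip_range["CidrIp"])
    return exposed
```

An empty list only rules out direct Internet exposure. Any internal source that the rules admit can still reach the unauthenticated API.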

We therefore recommend following the guidelines outlined in the AWS documentation when creating an EMR resource.

What is the impact?


The effects can be severe if EMR is exposed directly to the Internet. They can include malicious activities such as crypto mining, potential access to other AWS resources that contain sensitive information (e.g., S3 buckets) and complete infrastructure takeover.

However, as EMR is usually behind firewalls and VPCs, it is most often reachable from the internal network. As such, it can be accessed by, and exposed to, a malicious insider or an external attacker who gains access internally.
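Defenders can test this internal reachability themselves. The sketch below sends a single read-only request, without credentials, to the cluster information endpoint of the YARN REST API (`/ws/v1/cluster/info`) and reports whether it is answered. The function names are hypothetical, and the `fetch` parameter is an illustrative hook that allows the check to be exercised without network access.

```python
import json
import urllib.request
from typing import Callable


def fetch_url(url: str, timeout: float = 5.0) -> str:
    """Plain, credential-free HTTP GET; raises on connection or HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def yarn_api_is_open(
    host: str, fetch: Callable[[str], str] = fetch_url, port: int = 8088
) -> bool:
    """Return True if the YARN REST API answers a request with no credentials."""
    url = f"http://{host}:{port}/ws/v1/cluster/info"
    try:
        body = json.loads(fetch(url))
    except (OSError, ValueError):
        # Unreachable host, HTTP error (for example 401) or a non-JSON reply.
        return False
    # An open ResourceManager replies with a JSON object keyed "clusterInfo".
    return isinstance(body, dict) and "clusterInfo" in body
```

Calling `yarn_api_is_open("<main-node-ip>")` from a workstation on the internal network indicates whether that network position is enough to talk to the cluster.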

Consider the following scenario: A regular Active Directory user is connected to the organization’s VPN and should not have access to any resources in AWS. By scanning private subnets, this user can discover live hosts, including an EMR server that exposes the YARN API on port 8088. They can then send remote commands to EMR and gain access to the underlying EC2 instance. At this stage, they could establish persistent access to this EC2 instance and obtain the instance’s Identity and Access Management (IAM) token using Instance Metadata Service Version 1 (IMDSv1), another service that is enabled by default and has server-side request forgery implications. This token can then be used to access sensitive S3 bucket contents, run queries against RDS databases or call other services with which only EMR is authorized to communicate.
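The metadata service step in this scenario depends on the instance still accepting Version 1 requests. A minimal sketch of a check for that condition follows, assuming instance records shaped like those returned by the EC2 DescribeInstances API, where `MetadataOptions.HttpTokens` is either `optional` (Version 1 allowed) or `required`. The function name and the sample records are hypothetical.

```python
from typing import Any, Dict, List


def instances_allowing_imdsv1(instances: List[Dict[str, Any]]) -> List[str]:
    """Return the IDs of instances that still accept token-less metadata requests."""
    flagged = []
    for instance in instances:
        options = instance.get("MetadataOptions", {})
        # Missing fields are treated as the permissive values: endpoint on, tokens optional.
        endpoint_on = options.get("HttpEndpoint", "enabled") == "enabled"
        tokens_required = options.get("HttpTokens", "optional") == "required"
        if endpoint_on and not tokens_required:
            flagged.append(instance["InstanceId"])
    return flagged


# Hypothetical records for two EMR nodes.
nodes = [
    {"InstanceId": "i-main", "MetadataOptions": {"HttpTokens": "optional", "HttpEndpoint": "enabled"}},
    {"InstanceId": "i-worker", "MetadataOptions": {"HttpTokens": "required", "HttpEndpoint": "enabled"}},
]
print(instances_allowing_imdsv1(nodes))
```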

How can attackers exploit EMR?

An organization would typically have restricted inbound access from the Internet to the EMR cluster, with access available only to its internal on-premises IP address range via AWS Security Groups. This is the “stock standard” configuration for EMR in most cloud environments.

If the EMR server and the port are accessible, the following proof-of-concept (PoC), written in Python, exists for the YARN ResourceManager remote code execution exploit, which can be executed by a remote, unauthenticated user:

<<< Start >>>

Proof-of-concept for YARN ResourceManager remote code execution exploit, which can be executed by a remote, unauthenticated user.

<<< End >>>

Once executed, the PoC creates a process on the EMR main cluster that is picked up by one of the worker nodes and is visible in the YARN console at http://<remotehostIP>:8088/cluster:

<<< Start >>>

Process on the EMR main cluster

<<< End >>>

The following bash command can be inserted into the Python PoC code.

<<< Start >>>

<<< End >>>

This will result in the information being sent back to an attacker-controlled public IP address, since the cluster allows all outbound requests:

<<< Start >>>

Result of running the bash command

<<< End >>>
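The open outbound path that this step relies on can be detected from the Security Group itself. The sketch below assumes egress rules in the shape of the `IpPermissionsEgress` list from the EC2 DescribeSecurityGroups API and flags the rule that allows every protocol to `0.0.0.0/0`; the function name is hypothetical.

```python
from typing import Any, Dict, List


def allows_unrestricted_egress(ip_permissions_egress: List[Dict[str, Any]]) -> bool:
    """True if any egress rule allows all protocols to every IPv4 address."""
    for rule in ip_permissions_egress:
        if rule.get("IpProtocol") != "-1":  # "-1" means all protocols and ports
            continue
        for ip_range in rule.get("IpRanges", []):
            if ip_range.get("CidrIp") == "0.0.0.0/0":
                return True
    return False


# Allow-all outbound rule, as found on a Security Group left with default egress.
open_egress = [{"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]
print(allows_unrestricted_egress(open_egress))
```

This sketch deliberately ignores narrower rules, such as TCP 443 to any address, which a stricter review would also examine.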

This attack can be converted into an interactive reverse shell using the following Metasploit module, because EMR uses the Hadoop framework: https://www.rapid7.com/db/modules/exploit/linux/http/hadoop_unauth_exec/

<<< Start >>>

Interactive reverse shell using Metasploit module

<<< End >>>

An interactive reverse shell would then be initiated on the attacker's machine as the "yarn" user, once one of the EMR worker nodes processes the application (and provided the attacker is listening for inbound connections on port 4444 from 0.0.0.0/0):

<<< Start >>>

Interactive reverse shell initiated in attacker's machine

<<< End >>>

This attack can be summarized using the following diagram:

<<< Start >>>

Diagram summarizing attack path

<<< End >>>

Accenture, the Accenture logo, and other trademarks, service marks, and designs are registered or unregistered trademarks of Accenture and its subsidiaries in the United States and in foreign countries. All trademarks are properties of their respective owners. All materials are intended for the original recipient only. The reproduction and distribution of this material is forbidden without express written permission from Accenture. The opinions, statements, and assessments in this report are solely those of the individual author(s) and do not constitute legal advice, nor do they necessarily reflect the views of Accenture, its subsidiaries, or affiliates. Given the inherent nature of threat intelligence, the content contained in this article is based on information gathered and understood at the time of its creation. It is subject to change. Accenture provides the information on an “as-is” basis without representation or warranty and accepts no liability for any action or failure to act taken in response to the information contained or referenced in this report.

Copyright © 2022 Accenture. All rights reserved.

Abhishek Simkhada

Senior Security Engineer – Advanced Attack and Readiness Operations, Accenture Security
