AWS Step Functions, which debuted at AWS re:Invent 2016, is a basic orchestration tool that provides a “state machine” framework for AWS cloud-native application development. The AWS service provides a straightforward way for application developers to create an execution workflow to coordinate the use of multiple AWS Lambda or Amazon Elastic Compute Cloud (EC2) components in distributed applications running on the cloud.
Use cases for the new service often feature complex computational or business processes that can be broken down into a series of stages. These workflows require a pre-determined process to be executed based on input and output at each stage, such as document and data processing or e-commerce using microservices.
AWS Step Functions is essential to the rapidly growing area of developing serverless architectures. As explained in more depth in this Accenture Cloud blog, each individual AWS Lambda function stored in the cloud works as an interaction layer, running and scaling code with a sole purpose, such as launching a window or calling up and displaying an image. By stringing together hundreds or even thousands of AWS Lambda functions, some types of applications can be scaled up and out to do massive amounts of work, without provisioning or managing servers.
AWS Step Functions is designed to help string those functions together to create a system as a sum of its microservice parts, providing orchestration to distributed applications. The service also makes it easy to build dynamic and robust applications that can respond to inputs and errors in different ways, increasing a developer’s ability to contextualize each execution of the application when required.
Basic orchestration service
AWS Step Functions is meant to be a rapid route to orchestration and requires no additional infrastructure to be provisioned for use, enabling architects and developers to build born-in-cloud applications more quickly. Due to this bare-bones nature, the service provides basic orchestration-as-a-Service functionality and is not meant to replace complex application orchestration software. A related tool, Amazon Simple Workflow Service (SWF), provides more robust orchestration capabilities but is more time- and effort-intensive to use effectively.
Outside of cloud-first development, AWS Step Functions’ nearest cousins would be enterprise software products from companies such as MuleSoft, SnapLogic or Informatica, which provide more advanced functionality for multi-faceted, complex development projects.
Working with AWS Step Functions
An AWS Step Functions workflow, called a “state machine,” provides the capability for application architects to develop complex distributed, microservice-based applications visually. State machine diagrams are common in systems architecture design and look like a flow chart, helping developers to see the application states and understand how the input and output traverse from the start to the end of an execution (see Figure 1). When an AWS Step Functions state machine is created, it leverages this traditional format to show how AWS is stitching together the components of a system behind the scenes to help developers ensure their system is configured as designed.
Figure 1: Example of AWS Step Functions script/orchestration flow
AWS Step Functions can also process compute functions in parallel. Based on the outcome of the program when it reaches a certain state, for example, the tool can launch dozens of additional AWS Lambda functions at the same time, sending information simultaneously to multiple other states. An e-commerce example could see a purchase clear and then simultaneously branch down parallel workflows to update inventory systems, inform the company’s finances as required, and begin the fulfillment process. AWS Step Functions can even help orchestrate longer workflows lasting up to a year.
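The e-commerce example above can be sketched as a state machine definition. This is a minimal illustration, not a production workflow: it builds the JSON document as a Python dictionary, and every state name, function name and ARN in it (ClearPurchase, UpdateInventory, NotifyFinance, StartFulfillment, the account number) is a hypothetical placeholder.

```python
import json

# Hypothetical ARN prefix; replace with the ARNs of real Lambda functions.
ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"


def branch(function_name):
    """Build a one-step branch that runs a single Lambda task and then ends."""
    return {
        "StartAt": function_name,
        "States": {
            function_name: {
                "Type": "Task",
                "Resource": ARN_PREFIX + function_name,
                "End": True,
            }
        },
    }


definition = {
    "Comment": "Illustrative order workflow with a Parallel state",
    "StartAt": "ClearPurchase",
    "States": {
        # First the purchase clears...
        "ClearPurchase": {
            "Type": "Task",
            "Resource": ARN_PREFIX + "ClearPurchase",
            "Next": "AfterPurchase",
        },
        # ...then three branches run at the same time.
        "AfterPurchase": {
            "Type": "Parallel",
            "Branches": [
                branch("UpdateInventory"),
                branch("NotifyFinance"),
                branch("StartFulfillment"),
            ],
            "End": True,
        },
    },
}

# The serialized JSON is what would be supplied as the state machine definition.
print(json.dumps(definition, indent=2))
```

Note that the number of branches is fixed in the document itself, which is why the parallelism here is static rather than driven by the data passing through the workflow.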
Handful of lessons
Accenture AWS Business Group has been using AWS Step Functions since its release date (see sidebar for more information). We have curated our early experiences into some tips that can help simplify and speed cloud-native development, though it should be stated that with AWS’s rapid service development many of the challenges noted here may be resolved quickly. The lessons learned include:
Cloud-native error handling: Currently, AWS Lambda’s compute power and memory are limited. Likewise, the processing time allowed to execute a function is limited to five minutes.
Prior to the release of AWS Step Functions, if a failure occurred, developers had to figure out which processing state failed and manually build and maintain custom error-handling automation to ensure the application execution restarted at the right point. AWS Step Functions addresses this issue through a cloud-native capability to automatically retry an AWS Lambda function if it fails, whether from a timeout, a runtime error or any other type of failure.
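As a rough sketch of what this retry capability looks like in a state machine definition, the task state below declares a retry policy and a fallback. The state names (NotifyFailure, NextStage), the function ARN and the specific numbers are assumptions chosen for illustration; the helper function is not part of AWS and only shows the wait times the policy implies.

```python
import json

# A Task state with a retry policy. The ARN and the state names referenced
# by "Next" are hypothetical placeholders.
retrying_task = {
    "Type": "Task",
    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ProcessDocument",
    "Retry": [
        {
            "ErrorEquals": ["States.Timeout", "States.TaskFailed"],
            "IntervalSeconds": 2,   # wait before the first retry
            "MaxAttempts": 3,       # number of retries before giving up
            "BackoffRate": 2.0,     # multiplier applied to each later wait
        }
    ],
    # If the retries are exhausted, route the execution to a failure handler.
    "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
    "Next": "NextStage",
}


def retry_delays(retrier):
    """Return the seconds waited before each retry under exponential backoff."""
    return [
        retrier["IntervalSeconds"] * retrier["BackoffRate"] ** attempt
        for attempt in range(retrier["MaxAttempts"])
    ]


print(json.dumps(retrying_task, indent=2))
print(retry_delays(retrying_task["Retry"][0]))  # [2.0, 4.0, 8.0]
```

Declaring the policy in the state machine keeps retry logic out of the AWS Lambda function code itself.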
Step Functions naming parameters: Each AWS Step Functions state machine execution needs to have a unique execution name; otherwise the execution will fail. This largely comes into play when initiating state machine executions programmatically, so it is important to create application logic that generates a new name for every execution. For example, even though it is easy to think of AWS Lambda function invocations as entirely separate executions of their shared code, invocations that run in the same container share the global variables set when AWS initialized that container. Therefore, if the state machine execution name is defined in a global context, later invocations will reuse the same name and the execution will fail to launch.
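A minimal sketch of the naming lesson follows, assuming the client passed in is a boto3 Step Functions client (for example, one created with boto3.client("stepfunctions")). The helper names and the "governance-scan" prefix are hypothetical. The key point is that the name is generated inside the function on every call rather than once at module level.

```python
import json
import uuid

# Anti-pattern (shown only as a comment): a module-level name is computed once
# per container and then reused by every invocation that container serves.
# EXECUTION_NAME = f"governance-scan-{uuid.uuid4()}"


def unique_execution_name(prefix="governance-scan"):
    """Return a fresh execution name on every call."""
    return f"{prefix}-{uuid.uuid4()}"


def start_workflow(sfn_client, state_machine_arn, payload):
    """Start one execution, generating its name at call time."""
    return sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        name=unique_execution_name(),
        input=json.dumps(payload),
    )
```

Passing the client in as an argument is a design choice for this sketch: it keeps the function easy to exercise with a stand-in object, while the client itself can still be created once and reused across invocations.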
State machine language limitations: AWS created a JSON-based language for AWS Step Functions, the Amazon States Language, that defines how states transition to one another. However, there is currently no way to update the JSON document for a state machine once it is uploaded and built in the cloud. Since the code cannot be modified in place, a developer must recreate the state machine and upload the new version to replace the existing version. AWS CloudFormation can be used to simplify this process. Because AWS services are largely API-driven, the process for maintaining AWS Step Functions can integrate neatly with the continuous integration/continuous delivery pipeline.
No out-of-the-box capability to dynamically invoke AWS Lambda: Currently, AWS Step Functions cannot scale horizontally by launching a variable number of AWS Lambda functions in parallel. For example, if the output of state 1 is 20 messages, it cannot spin up 20 AWS Lambda invocations at state 2 to process those messages individually. State 2 is statically defined as a parallel state in the state machine document and, therefore, will provision the same number of task executors each time.
Standardized cloud environment boosts automation options
At the forefront of AWS cloud-native development, Accenture used AWS Step Functions to build a client’s AWS governance solution using AWS Lambda functions. It is essential for companies to follow a strict governance standard in their cloud environments as it will simplify any future automation protocols.
By breaking the solution down into a microservice architecture and orchestrating using AWS Step Functions, Accenture built a solution to regularly monitor the client’s AWS environment, evaluate infrastructure standards (tagging, volume, security group, etc.) against a pre-defined set of rules, and send emails to pre-defined recipients if any types of non-compliant items were discovered.
This architecture provided the ability to side-step existing AWS Lambda development hurdles like processing power and execution time. AWS Step Functions also provided the robust error-handling to make sure the solution executed correctly and actively reported when it did not.
One way to sidestep the lack of dynamic AWS Lambda invocation noted in the lessons above is to use a dispatching function and AWS Step Functions’ SendTaskHeartbeat capability, which allows long-running or critical components to periodically check in with the AWS Step Functions state machine execution to make sure processing is occurring as anticipated.
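The dispatching idea can be sketched roughly as follows. This is an assumption-laden illustration rather than the solution Accenture built: it assumes the dispatcher runs as a worker that has already obtained a task token from AWS Step Functions, that the client is a boto3 Step Functions client, and that process_message is a hypothetical callable that handles one message (for instance by invoking an AWS Lambda function).

```python
import json


def dispatch(sfn_client, task_token, messages, process_message):
    """Process a variable number of messages for one task, checking in as it goes."""
    results = []
    for message in messages:
        results.append(process_message(message))
        # Tell the state machine execution that this task is still making progress.
        sfn_client.send_task_heartbeat(taskToken=task_token)
    # Report the combined results so the execution can move to its next state.
    sfn_client.send_task_success(taskToken=task_token, output=json.dumps(results))
    return results
```

Because the loop is driven by the length of the message list, the amount of work can vary from one execution to the next even though the state machine document itself stays fixed.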
To learn more about AWS Step Functions and how to apply it in your serverless architecture development using AWS Lambda, please contact AWS COE email@example.com
A “state machine” defines a collection of interconnected tasks that a program will perform and complete based on the outcome of other tasks. AWS Step Functions visually shows the trigger points (whether an AWS Lambda or Amazon EC2 component) that pass information from one state to the next, or back and forth between states.