Recently, one of our major healthcare clients asked us to evaluate business process management software and recommend whether to build or buy a workflow system for managing analytics data pipelines. We were expected to deliver a minimum viable product (MVP) within three months.
BPM as a starting point
Business process management (BPM) is a discipline in operations management that combines modeling, automation, execution, control, measurement and optimization of business activity flows.
To solve our client’s problem, we evaluated ten vendors and narrowed the list to Flowable and Camunda, both of which support the standard BPMN 2.0 and DMN 1.1 specifications.
But while BPM systems satisfy the need to involve human operators in the process flow, they do not connect well with analytics data pipelines built on Apache Hadoop ecosystem tools such as Spark and Hive. There were a few challenges and opportunities:
- While BPMN 2.0 is an industry standard, it describes syntax and data structure, and every vendor adds its own proprietary extensions. Proprietary tags in the descriptive language mean that the XML model files are not portable across vendors’ systems.
- The bridges to the Hadoop ecosystem require custom code with limited portability.
- The DMN 1.1 rules language does not require much custom code, so the variations among vendors are more manageable. With some adjustments, an XML file from one vendor’s modeling system can be loaded into another vendor’s execution engine.
Thinking outside the box
With a tight schedule, we realized we did not have time to port standard BPM software modules and make them cloud-native, or to write a complete workflow system from scratch.
We explained various scenarios and the client agreed that we would proceed with a custom solution with the following requirements:
- The system would have a graphical designer workbench to let non-technical managers model workflows. We wanted something easier than BPMN 2.0.
- Workflow models would be partitioned for each team where members could review the designs and managers could promote and deploy them.
- A workflow would consist of “human” tasks that accept user inputs or decisions and “system” tasks that map to data pipelines.
- While a workflow was executing, the system would let each user see data pipeline status in real time.
- The workflow system would be integrated with the enterprise Single Sign-On (SSO) system.
- Minimal or zero configuration for deployment to multiple environments.
- Truly cloud-native design with high scalability and automatic self-healing.
We realized that a modern workflow combining human operators and data pipelines could be represented by an entity relationship diagram, that is, a graph of tasks and the relationships between them. Because a single workflow never exceeds 1,000 tasks, we kept the design simple with an in-memory graph built on Google’s Guava library. It is a primitive property graph system but functionally similar to Neo4j.
We defined each task in YAML, declaring the required inputs and a named “route” for the corresponding service handler. To enable this loose coupling, we used the Accenture Mercury microservices toolkit so we did not need to spend time wiring the pieces together.
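As a rough illustration, a task definition of this kind might look like the following once the YAML has been parsed into a mapping. The field names (`name`, `type`, `route`, `inputs`), the route `v1.pipeline.spark` and the helper functions are assumptions made for this sketch, not the project's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskDefinition:
    """Hypothetical in-memory form of one task declared in YAML."""
    name: str
    kind: str            # "human" or "system"
    route: str           # named route of the service handler
    inputs: tuple = ()   # names of the inputs the task requires

    def missing_inputs(self, provided: dict) -> list:
        """Return the required input names that are absent from `provided`."""
        return [key for key in self.inputs if key not in provided]


def load_task(raw: dict) -> TaskDefinition:
    """Build a TaskDefinition from the mapping a YAML parser would return."""
    kind = raw["type"]
    if kind not in ("human", "system"):
        raise ValueError(f"unknown task type: {kind}")
    return TaskDefinition(
        name=raw["name"],
        kind=kind,
        route=raw["route"],
        inputs=tuple(raw.get("inputs", [])),
    )


# Equivalent of a YAML document with the keys name, type, route and inputs.
raw_task = {
    "name": "run-claims-report",
    "type": "system",
    "route": "v1.pipeline.spark",
    "inputs": ["dataset", "report_date"],
}
task = load_task(raw_task)
missing = task.missing_inputs({"dataset": "claims_2020"})
```

Because the task only names a route, the handler behind that route can be replaced or moved without touching the workflow model.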
As shown in Figure 1, when a workflow is started, the system traverses the graph from the beginning to find the next task. When a task is executed, it simply sends an event to the “service route” that is associated with a microservices “function” anywhere in the system. The concept can be expressed in the following pseudo code:
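A minimal sketch of the traversal in Python, under stated assumptions: the graph is held as plain dictionaries rather than a Guava graph, events are delivered by a direct function call rather than an asynchronous event system, and every task and route name is hypothetical.

```python
from collections import deque

handlers = {}   # service route -> function, a stand-in for the event system


def send_event(route, body):
    """Deliver an event to whichever function is registered on the route."""
    handlers[route](body)


def run_workflow(routes, edges, start):
    """Traverse the graph from `start`, sending one event per task."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        task = queue.popleft()
        send_event(routes[task], {"task": task})
        order.append(task)
        for nxt in edges.get(task, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


# Hypothetical three-step workflow: approve -> extract -> notify.
routes = {
    "approve": "v1.human.approval",
    "extract": "v1.pipeline.spark",
    "notify": "v1.notify.email",
}
edges = {"approve": ["extract"], "extract": ["notify"]}

calls = []
for route in set(routes.values()):
    # Each handler just records the call; a real one would do the work.
    handlers[route] = lambda body, route=route: calls.append((route, body["task"]))

order = run_workflow(routes, edges, "approve")
```

The traversal knows only route names, so the function behind each route can live anywhere in the system.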
Because the workflow is a graph, parallel processing is easy to represent with fork and join nodes, which allows multiple data pipelines to run asynchronously.
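The fork and join idea can be sketched with Python's `asyncio`. This is an illustration only: `run_pipeline` is a hypothetical stand-in that sleeps instead of submitting a real Spark or Hive job.

```python
import asyncio


async def run_pipeline(name, seconds):
    """Stand-in for a system task that triggers a data pipeline."""
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def fork_join(pipelines):
    """Fork: start every pipeline at once. Join: wait for all of them."""
    tasks = [asyncio.create_task(run_pipeline(n, s)) for n, s in pipelines]
    return await asyncio.gather(*tasks)


results = asyncio.run(fork_join([("spark-job", 0.02), ("hive-query", 0.01)]))
```

`asyncio.gather` returns results in the order the tasks were passed in, so the join step sees a stable result list even when the shorter pipeline finishes first.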
The workflow is managed through a “workflow executor”, which is itself a microservices function. When an executor fails, the housekeeper detects the failure immediately and restores the workflow’s state on the next available executor in a reactive and self-healing manner.
Because the in-memory graph holds the current state, eventual consistency of the state stored in the database has no impact on the integrity of workflow execution.
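One way such recovery could work is sketched below. It assumes that executors report heartbeats and that the housekeeper reassigns workflows when a heartbeat expires. The source does not describe the detection mechanism, so the heartbeat approach, the class and all names here are hypothetical.

```python
class Housekeeper:
    """Tracks executor heartbeats and reassigns workflows (illustrative)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}     # executor id -> time of last heartbeat
        self.assignments = {}   # workflow id -> executor id

    def heartbeat(self, executor, now):
        self.last_seen[executor] = now

    def assign(self, workflow, executor):
        self.assignments[workflow] = executor

    def sweep(self, now):
        """Move workflows off executors whose heartbeat has expired."""
        alive = sorted(
            e for e, seen in self.last_seen.items() if now - seen <= self.timeout
        )
        moved = {}
        if not alive:
            return moved  # nowhere to move the workflows yet
        for workflow, executor in self.assignments.items():
            if executor not in alive:
                moved[workflow] = alive[0]
        self.assignments.update(moved)
        return moved


# Time is passed in explicitly so the example is deterministic.
hk = Housekeeper(timeout=5)
hk.heartbeat("executor-a", now=0)
hk.heartbeat("executor-b", now=0)
hk.assign("wf-1", "executor-a")
hk.assign("wf-2", "executor-b")
hk.heartbeat("executor-b", now=8)   # executor-a has gone silent
moved = hk.sweep(now=10)
```

In a real deployment, the new executor would also need to reload the workflow's last persisted state before resuming.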
With over 40 APIs, we wanted our developers to focus on business logic and nothing else. Using the REST automation system, an endpoint is created like this:
An authenticated HTTP request will be converted to an event for delivery to the target service.
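The conversion from request to event can be sketched as a small routing table. This is not the configuration format of the REST automation system: the table layout, URL paths, route names and the `to_event` function are all assumptions made for illustration.

```python
REST_ENDPOINTS = [
    # (HTTP method, URL path, target service route), all hypothetical
    ("GET", "/api/workflows", "v1.workflow.list"),
    ("POST", "/api/workflows", "v1.workflow.create"),
]


def to_event(method, path, body, user):
    """Convert an authenticated HTTP request into an event for a service route."""
    if user is None:
        raise PermissionError("request is not authenticated")
    for endpoint_method, endpoint_path, route in REST_ENDPOINTS:
        if endpoint_method == method and endpoint_path == path:
            return {"route": route, "headers": {"user": user}, "body": body}
    raise LookupError(f"no endpoint for {method} {path}")


event = to_event("POST", "/api/workflows", {"name": "monthly-report"}, user="alice")
```

With this shape, adding an API means adding one table entry and one service function, with no HTTP handling code in the business logic.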
This technology not only accelerates software development but also enhances performance with non-blocking asynchronous communication.
Our event-driven architecture extends to the browser, using a WebSocket connection as an event channel. The system can push events to the browser during the execution of a workflow. Even in the MVP phase, we demonstrated a single user on multiple devices and multiple users in a team collaborating in real time, resulting in a very responsive user experience.
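The fan-out to several devices can be sketched with one queue per connected session. This is a simplified model that assumes each browser connection is represented by an `asyncio.Queue`. A real deployment would write to WebSocket connections instead, and the class and session names are hypothetical.

```python
import asyncio


class EventChannel:
    """Fans workflow events out to every connected session (illustrative)."""

    def __init__(self):
        self.sessions = {}   # session id -> asyncio.Queue of pending events

    def connect(self, session_id):
        self.sessions[session_id] = asyncio.Queue()
        return self.sessions[session_id]

    async def publish(self, event):
        for queue in self.sessions.values():
            await queue.put(event)


async def demo():
    channel = EventChannel()
    laptop = channel.connect("alice-laptop")
    phone = channel.connect("alice-phone")
    await channel.publish({"task": "spark-job", "status": "running"})
    return await laptop.get(), await phone.get()


received = asyncio.run(demo())
```

Both sessions receive the same status event, which is the behavior behind the single-user, multiple-device demonstration.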
Given the short timeline, we had many long nights and weekends.
Nevertheless, this was an interesting journey in creating a complex workflow system in a very short time. We adopted rapid prototyping to deliver a solution that is truly event-driven, cloud-native and fully extensible.
When we first started the project, we joked that one day this system could replace Jenkins for CI/CD. While replacing Jenkins is a distant target, the system is flexible enough to handle constantly evolving business requirements.