In search of business agility
In a previous post, we started the conversation about modern engineering, its principles and value proposition. In this post, we continue that conversation by focusing more on why organizations transform. This is a key topic to understand when trying to initiate the conversation with our clients. This way, we can understand what their challenges are and find out more about their aspirations.
In a nutshell, organizations transform because they’re looking for business agility, aligned with these four dimensions of operating the business—speed, quality, cost and culture.
Organizations, simply put, are often not capable of operating at the desired speed.
One common symptom is long lead times between the moment a business requirement is born (a new idea, business offering, legal requirement, etc.) and the moment it is available to end customers, starting to generate business value.
This is often confused with deployment cadence. An organization may be able to deploy new versions every week, but still require many months for an idea to be developed, tested, validated and deployed. We assisted a client that, although having weekly deployments in place, still needed at least nine months to deliver even small enhancements.
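To make the distinction concrete, here is a minimal sketch that computes both numbers from the same delivery history. The dates are invented for illustration (they are not the client's data), and the function names are our own.

```python
from datetime import date
from statistics import median

# Hypothetical work items: (date the idea was captured, date it reached customers).
work_items = [
    (date(2023, 1, 9), date(2023, 10, 16)),
    (date(2023, 2, 6), date(2023, 11, 13)),
    (date(2023, 3, 6), date(2023, 12, 11)),
]

# Hypothetical production deployments, one per week.
deployments = [date(2023, 10, 16), date(2023, 10, 23), date(2023, 10, 30)]


def lead_times_in_days(items):
    """Days between an idea being captured and being available to customers."""
    return [(released - captured).days for captured, released in items]


def deployment_interval_in_days(deploy_dates):
    """Median number of days between consecutive deployments."""
    ordered = sorted(deploy_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return median(gaps)


median_lead_time = median(lead_times_in_days(work_items))
deploy_interval = deployment_interval_in_days(deployments)

print(f"Deployment interval: {deploy_interval} days")
print(f"Median lead time:    {median_lead_time} days")
```

In this made-up history the team deploys every 7 days, yet each idea takes 280 days to reach customers. Tracking only the first number would hide the problem.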
Delivering at speed is important, but delivering with quality is paramount. No matter how fast an organization delivers, poor quality produces business distress, increases cost and, more importantly, causes end customers to suffer downtime and personal inconvenience. This ultimately makes customers abandon a product or platform.
Organizations must address quality issues by infusing a quality engineering mindset. This is not limited to better and more thorough testing; it spans the whole delivery lifecycle. Quality improvements range from a better way of capturing and analyzing requirements, to improved developer tools and static code analysis, to a proper test strategy combining unit, integration, functional, performance and security testing. They even include newer techniques for testing in production, such as A/B testing, shadow testing, canary releases or chaos engineering.
Cost is always a powerful driver for transformation. And for many, it’s the most obvious one. It is a clear indicator to executives that something is not "going well" and may open the door for transformational conversations.
Cost, however, is complex to analyze, as it may have multiple, concurrent root causes. Inefficiencies, idle time, cost of rework, cost of poor quality, and oversized infrastructure bills are among the most commonly named causes. But there might be other, less obvious causes, like opportunity cost. For example, an organization might miss business opportunities because of its inability to deliver on time and on purpose.
Culture may seem like a minor aspect to consider but, on the contrary, it is often at the core of the problem.
A bad culture generates a lack of collaboration, siloed groups and rigid business structures with clashing objectives. It is human nature to play the "blame game": ignoring issues and expecting they will "magically" get fixed, dumping problems on others, or setting up barriers between parts of the organization. Without a collaborative nature, people may hide knowledge from the team, with the aspiration of becoming essential, indispensable, more valued (and better paid!).
But it is also human nature to play as a team, closely collaborating to achieve common objectives and always lending a helping hand with pride, ownership and commitment to the work. These desirable behaviors can be learned and fostered to help drive the transformation forward.
Therefore, a proper culture is arguably the most important of the four dimensions: with it in place, everything else tends to fit into place.
Drivers for a successful transformation through modern engineering practices
Now that we know a bit more about why organizations transform, let's introduce the desired end state and the drivers that guide the organization along the transformation journey.
The best way to describe those is through the set of modern engineering practices:
- Change-oriented teams delivering in small increments
To fight against obsolete processes and IT waste, organizations must empower change-oriented teams delivering in small increments. These are cross-functional teams, including all the functions needed to deliver: product owners, business analysts, architects, scrum masters, squad/team leads, data scientists, developers, testers, database administrators, systems administrators, UX and designers, and any other relevant function.
- Lifecycle management and configuration management
To eradicate code change paralysis, teams must have a proper lifecycle management and configuration management strategy in place. With clear ownership of every aspect of technology systems, both software and hardware, everyone on a team knows what needs to be done and how to do it. Barriers fade away and team velocity skyrockets.
- Continuous integration and choreographed pipelines
A lack of automation along the delivery lifecycle means lots of manual effort and boring, error-prone processes. Teams must unleash the benefits of automation with continuous integration and choreographed pipelines. Mature teams commit changes regularly and often, always leaving the software in a stable, buildable and releasable state. When something is wrong, the team works together to find the root cause and fixes it at the highest priority, before moving on to something else.
- Exhaustive code inspection and peer reviews
To prevent inadequate code analysis, CI and pipelines contain exhaustive code inspection and peer reviews. Setting up peer review mechanisms like pull requests confirms that every change, before it is integrated, has an adequate review and conversation. Teams leverage tools for the arduous effort of reviewing coding standards and adhering to best practices so they can focus on discussing the purpose of any piece of code. Tools play a significant role along the pipeline, validating that everything integrates well and conforms to expected quality gates.
- Automated tests, early and often, in production-like environments
Many times, we have met teams suffering from limited test environments. Ingenious workarounds to enable better local testing are common. But no matter how hard teams try, there are many features and integrations that seem to be testable only in production, leading to complicated situations. Against that, teams must conduct automated tests, early and often, in production-like environments. When teams have the skills and the time allocated to write automated tests, they can repeat those tests as many times as needed, even after the slightest of changes. And when those tests are executed in environments which are representative of what is expected once in production (hardware, software, versions, network and security), they can catch nearly any bug before it reaches the users.
- Software-defined infrastructure and zero-downtime deployments
For many organizations and teams, publishing a change still requires stop-the-world deployments. Fortunately, we know how to do it better with software-defined infrastructure and zero-downtime deployments. Instead of asking your users to wait a few hours until the new version is ready, it is preferable to create a brand-new environment which conforms to the desired specifications. The new application is deployed to the new environment. Once operational tests pass, incoming traffic can be progressively routed to the new environment, making the previous one redundant so that it can be decommissioned once the process is completed. Everything happens without affecting end customers, even when something goes wrong and the deployment must be rolled back. Trivial to do, trivial to undo.
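The progressive routing and rollback described above can be sketched in a few lines. This is an illustration under our own assumptions, not a real load balancer: the function names, the rollout steps and the health check are all hypothetical.

```python
import random


def route(new_env_share, rng):
    """Pick the environment for one request, given the share sent to the new one."""
    return "new" if rng.random() < new_env_share else "previous"


def progressive_rollout(steps, is_healthy):
    """Shift traffic to the new environment step by step.

    `steps` is an increasing list of traffic shares (0.0 to 1.0).
    `is_healthy` is a hypothetical operational check run after each step.
    Returns the final share served by the new environment; 0.0 means
    the rollout was rolled back and the previous environment keeps all traffic.
    """
    new_env_share = 0.0
    for step in steps:
        new_env_share = step
        if not is_healthy(new_env_share):
            return 0.0  # Rollback: route everything to the previous environment.
    return new_env_share


# Example: 10% of traffic, then 50%, then 100%, with checks that always pass.
final_share = progressive_rollout([0.1, 0.5, 1.0], is_healthy=lambda share: True)
print(f"Share of traffic on the new environment: {final_share:.0%}")
```

Because the previous environment stays up until the last step succeeds, undoing the change is just a matter of setting the share back to zero.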
- Resilient, self-healing, cloud-native systems
The ability to deliver new ideas rapidly often clashes with brittle, tightly coupled application architectures. This causes components to fail under unforeseen circumstances without any warning. We must prepare software for failure, learn from failure and build resilient, self-healing, cloud-native systems. Resilient applications can self-monitor and self-heal when something goes wrong. They are prepared to overcome failures in other parts of the system and will not cascade those failures any further. With the adoption of cloud-native architecture principles, applications are more independent and may even continue working under a certain amount of failure. They can scale (up or out) independently, resist malicious "siege and conquest" situations or even prevent data leakage when a breach is proactively detected.
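The paragraph above does not prescribe a specific technique. One well-known way to stop failures from cascading is the circuit breaker pattern, sketched below under our own assumptions: the class, its parameters and the fallback value are illustrative, not taken from any particular library.

```python
import time


class CircuitBreaker:
    """Stops calling a failing dependency for a while and serves a fallback."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before retrying
        self.clock = clock                # injectable so tests can control time
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()  # Open: do not touch the dependency at all.
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0  # A success closes the circuit again.
        return result


def unreliable_pricing_service():
    # Hypothetical dependency that is currently down.
    raise ConnectionError("pricing service unavailable")


breaker = CircuitBreaker(max_failures=2, reset_after=10.0)
for _ in range(3):
    print(breaker.call(unreliable_pricing_service, fallback=lambda: "cached price"))
```

The caller keeps working with a degraded answer, and the failing dependency gets time to recover instead of being hammered with requests.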
- Control plane observability and automated operations
The lifecycle story does not end once a new application or a change hits production. It's common to find production environments hard to observe and operate. Observability is essential for any healthy product or platform: pervasive business metrics allow for better business decisions, and monitoring computing resources and system activity may identify issues before they become a concern and impact users. Efficient operations, in turn, are key to keeping the total cost of ownership under control. Therefore, the final principle is the need for control plane observability and automated operations.
When a system can collect measures for important observable features and operational metrics for the relevant components in the architecture, it can react to events and activate counter-measures for any situation. The system then attains the so-desired level of operational excellence. Human effort is not wasted on countless, repetitive, manual tasks. Instead it’s spent on highly creative endeavors, which usually orbit around one single and strong motivation: how to serve end customers better and more efficiently.
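As a rough illustration of reacting to metrics with counter-measures, the sketch below evaluates a set of rules against the latest readings. The metric names, thresholds and actions are all hypothetical; a real platform would take them from its own monitoring and automation tooling.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    metric: str                 # name of the observed metric
    threshold: float            # value above which the rule fires
    action: Callable[[], str]   # counter-measure to run


def evaluate(metrics, rules):
    """Run the action of every rule whose metric exceeds its threshold."""
    return [
        rule.action()
        for rule in rules
        if metrics.get(rule.metric, 0.0) > rule.threshold
    ]


# Hypothetical rules; the actions only return a label here.
RULES = [
    Rule("cpu_percent", 80.0, lambda: "scale_out"),
    Rule("error_rate", 0.05, lambda: "restart_service"),
]

latest = {"cpu_percent": 93.0, "error_rate": 0.01}
print(evaluate(latest, RULES))
```

Keeping the rules as data makes them easy to review and version, while the evaluation loop stays small and generic.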
The principles above may sound familiar to you, as they emanate from well-known paradigms like Agile, DevOps (and DevSecOps), design thinking, systems thinking, The Twelve-Factor App, or Site Reliability Engineering. Although they are not new, their relevance in the context of modern engineering lies in the need to apply them all harmoniously to succeed.
Once these modern engineering practices are introduced, we emphasize the concept of choreography, versus the more common orchestration. While orchestration is great for smaller, simpler systems, more complex systems require a different approach. Choreography is an analogy to explain how CI and pipelines must be done at scale.
In an orchestra, the conductor instructs the musicians. But the number of tracks that a conductor can follow is limited, which sets a hard limit on the size of the orchestra. In a ballet, opera or musical, by contrast, every dancer, singer or actor learns their part through arduous training: who they relate to and interact with, and the movements and paths they perform. The director, along with the rest of the creative staff, works with them, often in smaller groups. Seen at that scale, the play may not make sense. But once the different sections are assembled into the final play, the choreographed scenes evolve into a magical, virtuous flow of music, dance and acting.
So, in more complex or distributed systems, each component and stage is modelled to know which other components and stages are related to it. A given service has the knowledge (and configuration and scripts) about what needs to be done to get built, validated or deployed to the next stage, and which dependencies are needed.
Technically, we rely more on declarative pipelines, written with the help of DSLs and kept in files under version control alongside the component source code, and on webhooks, a generic mechanism for signaling other parties that something else needs to be done. In this new "theater" of CI, the CI tools (servers, agents, builders) are "dumb and empty": they have just the basic logic to execute the given tasks at any given point in time, but no prior knowledge of the systems, components, environments and stages.
That approach is of great importance not only for scaling systems but also for resilience, as the entire set of CI/CD pipelines that feed the production system can be restored from version control as needed.
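A toy model may help to picture the choreography. In the sketch below, each component declares which event it reacts to, which steps it runs and which event it emits, and a generic runner simply follows those declarations. The component names, event names and dictionary layout are our own assumptions; real declarative pipelines use the DSL of the CI tool in question, and real events would arrive as webhooks.

```python
from collections import deque

# Hypothetical declarative definitions. In practice each one would live in a
# file under version control next to the source code of its component.
PIPELINES = {
    "orders-service": {
        "on": "commit:orders-service",
        "steps": ["build", "test"],
        "emits": "built:orders-service",
    },
    "checkout-app": {
        "on": "built:orders-service",
        "steps": ["build", "integration-test"],
        "emits": "built:checkout-app",
    },
}


def run(event, pipelines):
    """Generic runner: it executes declared steps and forwards emitted events.

    It has no built-in knowledge of the components; everything comes from
    the declarations. This sketch assumes the declarations contain no cycles.
    """
    log = []
    queue = deque([event])
    while queue:
        current = queue.popleft()
        for name, spec in pipelines.items():
            if spec["on"] == current:
                log.extend(f"{name}:{step}" for step in spec["steps"])
                queue.append(spec["emits"])  # Stands in for a webhook call.
    return log


for entry in run("commit:orders-service", PIPELINES):
    print(entry)
```

A commit to the first component triggers its own pipeline, whose completion event then triggers the dependent one, without any central script that lists the whole sequence.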
In this post we introduced why organizations transform in the context of modern engineering and described the practices that drive that transformation journey while also depicting the desired end state.
In future posts, we will discuss the different possible transformation strategies as well as the key roles that Agile, DevOps and other paradigms play in those strategies.