I’ve often engaged in conversations about DevOps metrics and KPIs, and almost always the conclusion is to measure certain components of the DevOps process, be it an operational dimension such as deployment frequency or deployment time, or a quality dimension such as code coverage.
Metrics for DevOps should be chosen in such a way that they complement the reason why DevOps was adopted—to improve business outcomes.
It’s all about flow
DevOps is all about improving flow, taking a systems view of the value generating process and improving that system.
Understand your constraints in delivering value
Before metrics for DevOps are established and measured, flow and value stream mapping must first be understood. The key concept in value stream mapping for DevOps is “Concept to Cash”, which maps the lifecycle of a revenue generating business idea through to production deployment, i.e., to the point where it is truly generating revenue. Understanding and optimizing this key lifecycle will be rewarded with increased competitive advantage by virtue of faster time to market.
To optimize the value stream, we must apply systems thinking, which requires us to review the end-to-end system, not just a local component such as production deployment. Flow through the value stream can be pictured as the time it takes for a business value generating feature to flow through the system and be transformed into a consumable product, i.e., through development, QA and then operations. How well the flow functions is determined by the constraints and hindrances in the system. For example, manual functional testing would be an obvious constraint.
A Lean Value Stream maps the end-to-end lifecycle of a value generation process (or transformation). To improve a value stream map, the end-to-end process must be mapped at each stage and measured for waste. Waste is the key anti-pattern to Lean, and can be measured by:
Waste = cycle time – value added time, i.e., the time spent queuing or waiting.
As the diagram indicates, we map the stages that a business idea must flow through before it reaches the consumer, i.e., production deployment, and at each stage it is measured for waste. Mapping a value stream is no trivial task; it requires walking through the value generation process with all stakeholders from business requirement through to development, QA and operations. Essentially we are mapping the lifecycle of user stories from idea to revenue generation.
When we map the flow, we will see many types of constraints in the system, arising from different sources:
People and culture
Institutional norms and assumptions
A key point is that a (local or micro) metric should be used to measure and improve a local constraint whose improvement could improve the overall flow of the system. In the example given above, there is a significant constraint at the QA stage: two days of elapsed time versus one hour of value added time. This could be caused by waiting for a testing system to become available or by manual testing. We have now zeroed in on a known constraint in the system that we can measure with a view to improving it.
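As a rough illustration of the waste calculation applied to the QA example above, the following Python sketch ranks stages by waste. Only the QA figures (two days elapsed, one hour of value added time) come from the text; the `Stage` class, the other stages and their numbers are hypothetical and exist purely to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    cycle_hours: float        # elapsed time spent in the stage
    value_added_hours: float  # time spent on actual work

    @property
    def waste_hours(self) -> float:
        # waste = cycle time - value added time
        return self.cycle_hours - self.value_added_hours


# Only the QA figures come from the example in the text;
# the development and operations numbers are made up for illustration.
value_stream = [
    Stage("development", cycle_hours=24.0, value_added_hours=16.0),
    Stage("qa", cycle_hours=48.0, value_added_hours=1.0),
    Stage("operations", cycle_hours=8.0, value_added_hours=2.0),
]

# Rank stages by waste, largest first, to surface the top constraint.
ranked = sorted(value_stream, key=lambda s: s.waste_hours, reverse=True)
for stage in ranked:
    print(f"{stage.name}: {stage.waste_hours:.1f} h of waste")

top_constraint = ranked[0]
```

With these assumed numbers, QA carries far more waste than any other stage, so that is where a local metric would be aimed first.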
Knowing that there are multiple constraints in the system, we can prioritize them and build a scorecard of metrics. The scorecard is the holistic measure of the system. In the example, the metrics could be:
Average or mean time to functionally test a user story: how long does it take to get feedback for functional verification?
Average or mean time to integrate the software product: improved by integrating more frequently or practicing continuous integration.
Average or mean time of staging software for testing: measured not as the act of deployment, but from the time of actual need to software deployment. The constraint could be caused by overzealous IT governance.
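One possible way to compute a scorecard entry such as the mean time to functionally test a user story is sketched below in Python. The record layout, field names (`ready_for_test`, `feedback`) and timestamps are assumptions for illustration; in practice they would come from whatever tracking tool holds the story history.

```python
from datetime import datetime
from statistics import mean

# Hypothetical records: when a story became ready for functional testing
# and when QA feedback was delivered. All values are illustrative.
stories = [
    {"id": "US-1",
     "ready_for_test": datetime(2024, 1, 8, 9, 0),
     "feedback": datetime(2024, 1, 10, 9, 0)},
    {"id": "US-2",
     "ready_for_test": datetime(2024, 1, 9, 9, 0),
     "feedback": datetime(2024, 1, 10, 15, 0)},
    {"id": "US-3",
     "ready_for_test": datetime(2024, 1, 10, 12, 0),
     "feedback": datetime(2024, 1, 12, 12, 0)},
]


def mean_hours(records, start_key, end_key):
    """Mean elapsed hours between two timestamps across all records."""
    durations = [
        (r[end_key] - r[start_key]).total_seconds() / 3600
        for r in records
    ]
    return mean(durations)


mean_test_hours = mean_hours(stories, "ready_for_test", "feedback")
print(f"Mean time to functional feedback: {mean_test_hours:.1f} h")
```

The same helper could be reused for the integration and staging metrics by pointing it at a different pair of timestamps.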
Once we have understood the constraints, we can be prescriptive about the resolution. In this case the solution could be to implement Continuous Integration that triggers automated functional testing, and to allow the QA team to self-serve their own deployments to a cloud-based system. The key is not to have arbitrary metrics or KPIs; they must be chosen and tailored to measure the improvements (of flow) in the system. More importantly, it must be understood that once a constraint is improved, the highest impacting constraint may shift to another part of the system, as I explain below.
Sometimes we fail to understand a metric in its rightful context and choose to report local metrics, for example deployment time, without understanding the larger picture. We could improve deployment time up to a point, but its impact on the overall lifecycle (value stream) will have diminishing returns. In fact, precious resources could be wasted in trying to optimize deployment time with minimal or no impact on overall flow. We would simply have failed to focus on the highest priority constraint.
Constraints are continually shifting
As the constraints improve it is expected that the bottleneck in the system will move to another part of the system. In the deployment automation example we saw earlier, there could be a huge initial gain and overall flow would be greatly improved. However, the constraint could now be manual functional testing and the system therefore could be vastly improved by functional test automation. Don’t fall into the trap of focusing on the wrong priorities. Prioritize and re-evaluate local metrics over time, as they will change.
When developing or recommending metrics:
Map the end-to-end value stream and understand the constraints in the system. Remember “waste = (cycle time – value added time)”.
Establish a macro metric that measures user stories deployed to production (concept to cash value stream). This will require traceability from requirements tracking to production release notes.
Establish (local) micro level metrics to measure individual constraints, such as deployment time, frequency and mean time to recovery.
Prioritize local (micro) constraints based on waste.
Regularly re-evaluate constraints and compare changes in micro metrics to the macro metric for effectiveness. Simple ratios can be used.
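As one interpretation of the "simple ratios" in the last step, the Python sketch below divides the relative improvement in the macro metric by the relative improvement in a micro metric. The function names and all figures are hypothetical; the article does not prescribe a specific formula.

```python
def relative_improvement(before: float, after: float) -> float:
    """Fractional reduction from before to after (0.25 means 25% lower)."""
    return (before - after) / before


def effectiveness_ratio(micro_before: float, micro_after: float,
                        macro_before: float, macro_after: float) -> float:
    """Macro improvement gained per unit of micro improvement."""
    micro_gain = relative_improvement(micro_before, micro_after)
    macro_gain = relative_improvement(macro_before, macro_after)
    if micro_gain == 0:
        return 0.0  # no micro change, so nothing to attribute
    return macro_gain / micro_gain


# Illustrative numbers only, all in hours.
# Deployment time cut from 4 h to 1 h; concept-to-cash time barely moves.
deploy_ratio = effectiveness_ratio(4.0, 1.0, 200.0, 197.0)

# Functional test time cut from 48 h to 12 h; concept-to-cash time drops.
test_ratio = effectiveness_ratio(48.0, 12.0, 200.0, 164.0)

print(f"deployment: {deploy_ratio:.2f}, functional testing: {test_ratio:.2f}")
```

In this made-up scenario both micro metrics improve by the same 75%, yet the testing improvement moves the macro metric far more, which suggests it was the higher priority constraint.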