DevOps & OKRs: From Micromanagement Misery to Finding Flow
Organizations deploying DevOps share the goal of increasing continuous improvement and the flow of value to the customer. However, dated methods of measuring delivery that are not suitable to the age of software continue to enforce old behaviors. This creates a problematic mismatch, with DevOps teams constantly fighting against business metrics that do not support fast feedback and learning. The solution to this measurement dysfunction comes from the combination of OKRs and Flow Metrics. Focusing all stakeholders on visualizing and measuring the flow of value, and providing autonomy to value streams on how to optimize flow, creates the conditions for accelerating transformation while empowering teams.
In this talk, I will share the past two years of lessons learned from large-scale transformations that leveraged OKRs and Flow Metrics to create an end-to-end, organization-wide feedback loop, as well as pitfalls that snapped some organizations back into waterfall ways of measuring delivery.
Dr. Mik Kersten
Founder and CEO, Tasktop
Full transcript
Mik Kersten
Hello everyone. My name is Mik Kersten. I'm the founder and CEO of Tasktop and the author of "Project to Product." I'm thrilled to be part of this learning community and to share some of the main things I've learned since the April-May timeframe, when I last gave a talk on tracking DevOps metrics around flow and measuring objectives and key results, or OKRs.
I hope to share my latest learnings on how we go from micromanagement misery to finding flow for our organizations. This has been a hot topic in many executive discussions I've been having as OKRs have become more important. At the same time, over the past year, I've seen a lot of failure modes around OKRs and a lot of organizations struggling to adopt them effectively, and struggling to shift away from the ways we've been accustomed to tracking activities instead of outcomes.
The goal is to show you how you can use OKRs to help move from project to product, from outputs to tracking outcomes, and to help your organization move faster in becoming a digital and product-oriented innovator.
For many people, OKRs probably started with reading books like John Doerr's "Measure What Matters," which in 2018 defined objectives and key results. Objectives are meant to be qualitative and inspirational. They give us a way to point the direction around driving innovation and driving value to our customers and to the market. The key thing is having only two to five KRs, or key results, which are metrics of progress: how we track how we're doing.
In an interesting coincidence, this week is my tenth year of setting OKRs for Tasktop. We're finalizing them this week, and it's been a decade-long learning journey to understand how to get better and better at it. We've made significant changes in this cycle, and I've also been learning how OKRs get adopted at massive scale in organizations that have tens of thousands of people relying on alignment around OKRs.
The key thing about OKRs is that, in the end, what we want to track is outcomes, not activities. Projects tend to specify activities rather than the important outcomes for what we're delivering to the business, to the customer, to the market, and to our partners. This is where the Flow Framework came in. The Flow Framework originated out of my own work within our own organization: understanding how to connect these business metrics to actual business results, key results, and outcomes.
What the Flow Framework basically says is that we need to understand what we're measuring those outcomes for. Everyone working on that particular value stream needs to understand what outcomes they're driving and who the customer of those outcomes is. Business key results tend to be well understood already: improving things like Net Promoter Scores, retention, or financial metrics.
But the key thing I realized was missing was a way of tracking how we get there. How do we know whether we're improving? How do we know whether the DevOps initiatives we have underway are making it easier for our teams to deliver those business outcomes to the customer? The idea of flow metrics is to have a set of flow key results that tell us whether we're improving or whether we have impediments and bottlenecks slowing us from improving, in terms of how easily we're able to deliver those outcomes and understand where we need to invest. Are we lacking DevOps automation? Are we lacking important platform components?
The idea is that we track both business key results and flow key results. When used effectively, OKRs can help catalyze the shift from project to product. The challenge, and I'll speak to some pitfalls shortly, is when we snap into old behaviors while renaming them with OKRs.
Here are a few of the bigger pitfalls I've encountered with OKRs. One of the main ones is when we bring back the same old behaviors. The whole idea around OKRs is to empower teams to set their own targets, understand the outcomes, and have those aligned to business outcomes. However, OKRs often get used as a way of micromanaging teams, for example to demand that a particular feature be accelerated, to the point where OKRs become dates for things that should actually be tracked on release plans and roadmaps. We then move away from how we should be doing Agile planning and tracking those activities, and snap back into what are effectively waterfall ways of planning.
When OKR cascades become a whole bunch of projects and activities that tell teams what to do, we're completely missing the boat. Steering away from that is exactly what OKRs aligned to Agile are for.
Another pitfall I've noticed across the board is when the only key results being tracked are business and financial metrics, so metrics based around cost. If we're only ever tracking cost, we fall back into that cost-center trap rather than tracking innovation and tracking how investments are driving business outcomes and outcomes for customers. We need to complement those financial metrics.
Another common pitfall is to take team-level telemetry and metrics, such as uptime, deploys per day, user story point velocity, and those sorts of things, and say those should be organizational key results. When you take a team's telemetry, which is very important telemetry for that team, and set it as a goal for the entire organization, you fall into the local optimization of the value stream trap. As John Willis said beautifully, it's measuring a twelve-inch value stream with a two-inch ruler. We need both sets of metrics, and we need to understand the interplay between them.
Another pitfall is when OKRs are introduced and feedback cycles are much too slow. OKRs came from Silicon Valley, and by design they are about product-oriented innovation, the way digital businesses operate, rather than project management. When cycles take 120 days to deliver value and provide feedback, you can't even adopt OKRs. In a webinar and a Project to Product podcast I did recently with the OKR guru Felipe Castro, we realized that when an organization can't deliver features to customers within 90 days and deploys OKRs anyway, its feedback cycle is far too slow. You can be in a situation where the flow within the organization today doesn't support deployment of OKRs. That is a situation we want to move away from, because OKRs can be such an effective vehicle.
The main thing I want to get across is that in each of these cases, we don't have the three ways of DevOps in place: the flow, feedback, and continual learning needed to adopt OKRs. In the end, OKRs are a learning vehicle for the organization. They speed our decision-making. They help us pivot and make adjustments faster. They help us experiment and test hypotheses faster.
How do we move away from that? The fundamental thing is to understand flow. When key constraints go unmeasured, we can't see where to improve them, and we fall into traps that make it too difficult to adopt this fast-paced Agile planning and goal-setting methodology.
I'll relate experiences with some data. I'll share stories of what we've seen over the past year from organizations that have adopted a shift to value stream thinking and are starting to shift from project to product, but have challenging starting points. The majority of this data set, collected through Tasktop Viz, is from organizations adopting OKRs or very similar methodologies.
One common factor across this substantial data set is that only 80% of what's planned by Agile teams gets delivered. You can have the best-set OKRs, but if that's the ratio of what's planned to what's delivered, there is a very big mismatch between the capacity of the organization to deliver on the OKRs and the OKRs being set. These things have become very disconnected in many organizations.
Another startling finding is that 20% of features are canceled after code has been written. Plans and goals come in from the business, driven by customers and demand, and scope changes with that frequency. That 20% of capacity that has been started is being thrown away, making flow load and work in progress, or WIP, challenges even bigger. Again, there is a big disconnect between how planning is being done and how we're delivering.
Thirty-five percent of product value streams we're seeing have no capacity for new work, another disconnect between the capacity of value streams and the way those value streams are being planned. Of course there is an assumption that there is capacity for new work; otherwise why are we planning those outcomes?
Eighty-five percent of products under-invest in security and debt. This makes deploying things like OKRs harder, because there hasn't been enough investment in improving how work gets done to take on the new work being planned.
Finally, 95% of value streams don't know their actual efficiency, and in many cases don't know their capacity. Capacity is not being communicated to the business or to the planning cycle. This is some of the ground truth we're seeing. The goal is to improve these numbers and get to good alignment between the work being planned, the work being delivered, and most importantly, the feedback loop.
That's really the goal of approaches like OKRs: to establish an effective and fast feedback loop between business planning strategy, understanding and measuring and delivering to the customer base, and what we're delivering through our daily work.
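The planned-versus-delivered and canceled-after-start ratios mentioned above are simple to compute once a value stream counts its work items per planning cycle. The following is only a minimal sketch: the function name and the sample counts are made up for illustration, and measuring the canceled share against the plan (rather than against started work) is an assumption.

```python
def delivery_ratios(planned, delivered, canceled_after_start):
    """Return (delivered share of plan, canceled-after-start share of plan)."""
    if planned <= 0:
        raise ValueError("planned must be a positive count")
    return delivered / planned, canceled_after_start / planned


# Made-up counts for one planning cycle of a single value stream.
delivered_share, canceled_share = delivery_ratios(
    planned=40, delivered=30, canceled_after_start=8
)
print(f"delivered {delivered_share:.0%}, canceled after start {canceled_share:.0%}")
```

Tracking these two ratios cycle over cycle is one way to make the gap between planning and delivery capacity visible to the business.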
The thing that strikes me most is how often measuring flow has not been part of how we plan and work. In most of my presentations to leadership teams, I've gone back to Gene Kim's tried-and-true quote from "The Phoenix Project" that improving daily work is even more important than doing daily work.
If I could state one challenge around the way OKRs are being deployed today, it is that they ignore the improvement of daily work. We're asking our teams to do so much and change so much. Their backlogs tend to be so large, and if we don't add capacity to the improvement of daily work, we will just overload them even more and get even less out, as you saw in some of those statistics.
The question becomes: how can we use OKRs to drive the improvement of daily work? To do that, because OKRs are meant to speak in terms of outcomes, those key results we're delivering to the customer and to the business, we need to connect improvements in daily work to those business outcomes.
I'll share a few short examples of how I've seen this work successfully in organizations. First, we need to get on the same page about how to measure improvement and how to track flow across technology, business stakeholders, and the way we measure and operate. Fundamentally, to improve flow, we need a way of measuring flow that we can all agree on.
As a quick recap of the Flow Framework, which you can see more about on flowframework.org: the question is how we measure what flows in software delivery so that we know if we're improving it, making the daily work of our teams better, and removing burden from them versus adding burden. The goal is that, as we're creating new approaches like deploying OKRs, they support this improvement of daily work, not just put more on the teams.
What flows in software delivery according to the Flow Framework is features, defects, risks, and debts. All work and all issue types in your Agile tool map into one of these, and these things are a zero-sum game. They're mutually exclusive and collectively exhaustive. If a value stream can take on more feature work, chances are it can drive more outcomes. However, if there's too much technical debt, that takes away capacity from the value stream. If defect rates go too high, or if there are too many incidents because of lack of investment in stable platforms, that also takes capacity from the value stream.
The point of the Flow Framework is to expose and make visible this zero-sum game on every value stream, and to allow us to create organizational objectives around improving that, in addition to creating objectives around revenue, cost, and customer outcomes.
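To make the zero-sum game concrete, here is a rough sketch of how completed work could be classified into the four flow items and turned into a distribution. The issue-type names and the mapping are hypothetical; every Agile tool and team will have its own types, and this is not how any particular product computes it.

```python
from collections import Counter

# Hypothetical mapping from Agile-tool issue types to the four flow items.
# The type names on the left are illustrative; real tools and teams differ.
FLOW_ITEM_BY_ISSUE_TYPE = {
    "story": "feature",
    "epic": "feature",
    "bug": "defect",
    "incident": "defect",
    "vulnerability": "risk",
    "compliance": "risk",
    "refactoring": "debt",
    "upgrade": "debt",
}

FLOW_ITEMS = ("feature", "defect", "risk", "debt")


def flow_distribution(issue_types):
    """Return the share of completed work per flow item (fractions summing to 1)."""
    counts = Counter(FLOW_ITEM_BY_ISSUE_TYPE[t] for t in issue_types)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no completed work items to classify")
    return {item: counts.get(item, 0) / total for item in FLOW_ITEMS}


# Made-up sample: ten items completed by one value stream in one period.
completed = ["story"] * 5 + ["bug"] * 3 + ["vulnerability"] + ["refactoring"]
distribution = flow_distribution(completed)
print(distribution)
```

Because the shares always sum to one, any increase in defect or debt work shows up directly as less capacity for features, which is the trade-off described above.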
Just like there are four flow items, there are four flow metrics, plus flow distribution. Flow velocity is how much we're able to deliver across a period of time, from the business and customer's point of view: end-to-end velocity, not how long it takes one team to do it, but all the way from when work entered the value stream to when it's done. Flow efficiency asks where the bottlenecks are. Flow time asks how long it took to deliver from end to end, from when work was committed to when we had running software. Flow load is the WIP metric: how much work in progress there is.
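As a rough illustration of those four metrics, the sketch below computes them from a list of work items. The `WorkItem` fields and the sample dates are hypothetical. Treating flow efficiency as active time divided by total flow time is my reading of the Flow Framework definition, so take it as an assumption rather than a reference implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class WorkItem:
    committed: date           # when the work was committed to the value stream
    done: Optional[date]      # None while the item is still in progress
    active_days: float = 0.0  # days actively worked on; the rest is waiting


def flow_velocity(items, start, end):
    """Number of items completed within [start, end]."""
    return sum(1 for i in items if i.done and start <= i.done <= end)


def flow_times(items):
    """End-to-end days from commitment to done, for completed items only."""
    return [(i.done - i.committed).days for i in items if i.done]


def flow_load(items, on):
    """Work in progress on a given day: committed but not yet done."""
    return sum(
        1 for i in items
        if i.committed <= on and (i.done is None or i.done > on)
    )


def flow_efficiency(items):
    """Active time as a fraction of total flow time, for completed items."""
    total = sum(flow_times(items))
    active = sum(i.active_days for i in items if i.done)
    return active / total if total else 0.0


# Made-up sample data for one value stream.
items = [
    WorkItem(date(2021, 1, 4), date(2021, 1, 24), active_days=5),
    WorkItem(date(2021, 1, 11), date(2021, 2, 20), active_days=10),
    WorkItem(date(2021, 1, 18), None, active_days=3),
]

velocity = flow_velocity(items, date(2021, 1, 1), date(2021, 2, 28))
times = flow_times(items)
load = flow_load(items, date(2021, 2, 1))
efficiency = flow_efficiency(items)
print(velocity, times, load, efficiency)
```

A low efficiency number means most of the flow time is waiting rather than working, which is the signal to go looking for the bottleneck.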
As an example, we have a visual of a value stream that has a bottleneck and looks fairly clogged. Some badly structured OKRs clogged it. When OKRs are basically around "get me this next feature" and "where is my feature," too many features get put into that value stream, that set of teams delivering value for a customer. When they're being micromanaged, that causes thrashing. The canceled work you saw, that 20% of work being canceled, is exactly what's happening. When work is canceled, additional work is put in, and everyone struggles to keep up while new work is coming in.
The cause is that planning and demand are not connected to the actual flow and capacity of the value stream, nor is understanding bottlenecks part of planning. You keep cramming more into the pipe without improving the width of the pipe, and you get predictable and problematic results. When you're ignoring capacity, you keep increasing work in progress. One of the biggest findings we've seen is that overloaded value streams are endemic across enterprise organizations.
Good OKRs are an opportunity to improve that. Bad OKRs make it worse; good OKRs are a great opportunity to improve it. One of the main things good OKRs do is cascade business outcomes to the value stream so that everyone understands the outcome we're trying to drive. Are we trying to capture more customers? Keep customers happier? Grow our customer base? Make a business partner happy? Make an API more stable because so much of our business builds on it?
To do that, we need to measure the flow of value. We need to understand where bottlenecks are happening and how we can improve the situation to have capacity for more work, because most organizations are constrained by the capacity of their value streams. Then we need to prioritize learning and improvement, always making sure that once we've found one bottleneck, we're learning where the next bottleneck is and improving there.
I'll give some examples of how meaningful this approach can be. In one financial services organization, a sizable bank, one of their key results was to improve time to market by reducing flow time. They made that an objective and measured it by how many days feature flow time improved. They saw a reduction from 55 days down to 38 days across the organization. By removing bottlenecks, implementing DevOps automation, implementing security scanning automation, and fixing upstream bottlenecks, that flow key result translated to an $800 million revenue pull-forward because of all the capacity they had to deliver the digital experiences their business cases and digital transformation were based on.
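For a sense of scale, the flow time improvement in that example can be expressed as a relative reduction. This is just arithmetic on the two figures quoted above; the function name is made up, and nothing here models the revenue effect.

```python
def relative_reduction(before, after):
    """Fractional reduction from a baseline value to a new value."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (before - after) / before


# Feature flow time went from 55 days to 38 days.
reduction = relative_reduction(55, 38)
print(f"{reduction:.1%}")  # 30.9%
```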
Another example is reducing the cost of quality. Many of us work with large legacy systems, and some of those have frequent outages and incidents. Investing in test automation, test harnesses, and platform stability unlocked defect resolution time being reduced by 70%, made life much easier for those teams, and unlocked $52 million of revenue growth from a much healthier customer base.
A large securities organization drove a 40% acceleration of feature delivery by reducing technical debt. They measured their investment in technical debt and the resulting increase in feature delivery. That additional feature delivery capacity, from effectively the same size organization, drove a $140 million revenue pull-ahead. This shows how we can use objectives and key results, with flow measurements as the key results, as leading indicators of the business outcomes we're after.
We need to understand the different types of metrics and what we can measure as key results to get those results. Business metrics tend to be very well understood in every organization for the business-facing parts of the portfolio. One key thing in the shift from project to product is to provide business metrics for platform products, shared services, and delivery of the CI/CD pipeline itself. Each needs some kind of outcome metric: how many developers are on the new pipeline, how many business applications are using the new APIs or cloud environment, and so on.
Flow metrics provide a leading indicator of how we're doing. If we're able to deliver more features with less toil, less burden, and more quickly, we should be driving the customer outcome and the business metric. The interplay is that flow metrics are leading indicators, while business metrics tend to be lagging indicators. Driving additional revenue or driving costs down tends to happen in many-month cycles, whereas flow metrics can be brought into monthly and quarterly review cycles to understand whether things are moving fast or not.
Do not throw away team metrics. Organizations sometimes think deploys per day are no longer representative because they can deploy multiple times a day. You still need to measure how many times you deploy per day. You very much need that DORA metric, but chances are you need a bigger-picture metric to understand where your bottleneck is upstream or downstream of deployments. These metrics are meant to work together. Flow metrics are end-to-end and cross-team, cross-value-stream, rather than focused on improving one part of the value stream. When you find a bottleneck, then it's all about those particular metrics. If the bottleneck is in the safety and reliability of your deployment pipeline, that's where you're focusing, and those are the metrics you improve as part of your quarterly OKR for that part of your portfolio.
Roadmaps, release plans, and OKRs need to work together. Roadmaps and release plans define what gets delivered and in which order. I briefly flashed a visualization of the Suez Canal in terms of needing to think about improving flow and what happens when we have big bottlenecks. We can think of roadmaps as the order of container ships that need to go through that canal and be delivered to customers in order to minimize delays and maximize outcomes.
Roadmaps alone are not enough. They need to be aligned to OKRs, but in the end we need value stream OKRs around accelerating flow, accelerating how quickly we can deliver on those roadmaps, and finding bottlenecks, like the Ever Given stuck in the Suez Canal. Think of value stream OKRs for every value stream independently: how do we widen that canal to get more ships through? How do we make it easier to get ships through? How do we make it safer to get ships through that canal?
In addition to looking at the roadmap, which defines what will get done and whose delivery defines customer and business outcomes, always look at how we make it easier to deliver, how we make it better to deliver, and how we get to the five ideals from "The Unicorn Project" in our daily work. Becoming measured around that is where OKRs are a great tool.
Organizational OKRs are where transformations and large-scale digital transformations can use OKRs to track the improvement of work. One big challenge is that many large transformations have taken a completely waterfall approach to their Agile transformation, DevOps transformation, or digital transformation as a whole. We want to measure the transformation. How do we build new canals in terms of transformations? How do we build a new cloud platform that we move many business applications and customer-facing applications onto? How do we track whether we're getting the right results from that, whether that canal is working, whether we dug it too shallow, or whether there are unforeseen circumstances that mean it is not as productive as we thought? Understanding organizational OKRs is key to measuring iteratively and applying flow, feedback, and continual learning, the principles of DevOps, to the transformation itself.
Here's an example from an insurance company. One of their OKRs was to become the most innovative insurer in their industry. They had relatively small market share; there were larger insurers and insurtech companies, and they wanted to grow market share from 6% to 30% over the course of their planning window, which was a year. That's a significant stretch.
They created an aspirational KR to reduce the time it took to provision a policy, cutting it roughly in half, from 43 to 20 days. This is all about the customer: how can we get the customer their home insurance or car insurance in half the time? Cascading that down to every value stream and team supporting this is extremely valuable because it makes you rethink the technology architectures, sign-on, authentication, and sign-up approach you're using in case that's the bottleneck to the customer journey.
The last key result they tracked was to improve flow time by 20%. That's how they captured the need to drive improvement within the value stream itself. That last one is powerful because it's not just prioritizing work; it's helping us improve daily work.
The mobile application value stream picked this up. They said, "For us to become most innovative, customers have to love our mobile experience. Our star ratings are not great right now. We need to improve our mobile experience." They translated that into increasing Net Promoter Score for mobile applications from 31 to 60. To do that, they realized they needed to get more features to customers: features that delighted customers, made it easier to authenticate, and made it easier to buy insurance products. They needed feature flow time down from over 20 days to under 10 days.
From flow efficiency improvements at the organizational level, they already had experiments going to understand where the bottlenecks were. From those experiments and their measurements, they realized that verification of new features by business partners was the bottleneck. The bottleneck was not the development teams themselves. So they set an aspirational goal within one quarter: target zero days wait states on business input.
As a result, flow time went from nearly 30 days to well under 10 days on average for new features. Net Promoter Score started climbing, but NPS was a lagging indicator because it resulted from delivering those great features. If they had delivered the features and NPS had not improved, that would indicate another problem: a disconnect between investment in flow, building great features, and driving customer outcomes. In this case, they could see a direct correlation between improving flow and driving business outcomes.
This is the kind of feedback loop we want to put in place to make sure the activities we're doing and how we're improving work are actually driving customer and business outcomes, and then feeding back into planning cycles. This was a major success for the organization and taught them the importance of faster feedback loops.
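One way to picture the mobile value stream's OKR is as an objective with one business key result and one flow key result, each tracked from a starting value toward a target. The sketch below is illustrative only: the class names, the objective title, the equal weighting, and the `current` values are assumptions, and the start and target figures are rounded from the numbers in the example.

```python
from dataclasses import dataclass, field


@dataclass
class KeyResult:
    name: str
    start: float
    target: float
    current: float

    def progress(self):
        """Fraction of the way from start to target, clamped to [0, 1]."""
        span = self.target - self.start
        if span == 0:
            return 1.0
        return max(0.0, min(1.0, (self.current - self.start) / span))


@dataclass
class Objective:
    title: str
    key_results: list = field(default_factory=list)

    def progress(self):
        """Unweighted average of key result progress (a simplifying assumption)."""
        if not self.key_results:
            return 0.0
        return sum(kr.progress() for kr in self.key_results) / len(self.key_results)


mobile = Objective(
    "Customers love our mobile experience",
    [
        # Business key result: a lagging indicator; current value is hypothetical.
        KeyResult("Mobile NPS", start=31, target=60, current=45.5),
        # Flow key result: a leading indicator; lower is better, so target < start.
        KeyResult("Feature flow time (days)", start=20, target=10, current=9),
    ],
)

for kr in mobile.key_results:
    print(f"{kr.name}: {kr.progress():.0%}")
print(f"Objective: {mobile.progress():.0%}")
```

Because progress is measured relative to the direction from start to target, the same calculation works for a metric that should go up (NPS) and one that should go down (flow time). Seeing the flow key result complete while the business key result is still climbing is the leading-versus-lagging pattern described in the example.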
Another quick example is a platform team in the same organization, not a customer-facing team. They realized they had done such a fast lift and shift to move core platforms and systems to the cloud that they were making inefficient use of storage services, because they were not using managed services. They had moved many databases to be hosted, and per-application hosting costs were prohibitively high. Part of the ROI of cloud was supposed to be improving the cost profile, not making it dramatically worse.
The teams knew they had brought on technical debt in this fast move to cloud because they were not leveraging the new storage services available from their hosting provider. They spent a release cycle where their flow velocity was dedicated to tech debt reduction, and they tracked both that flow metric and application hosting cost as a key result. As they took down technical debt and moved to the new storage services, the hosting cost bubble was dramatically reduced. They were in a better position for scaling and bringing more of the business onto the new cloud platform.
Whether we're focused on top-line customer results or cost structure, when we're not keeping up with technical debt and not leveraging new technology platforms as intended, this gives us a feedback loop that connects investment in flow and tech debt reduction to actual business outcomes.
In summary, if you're leveraging OKRs or similar planning systems, you can use those to catalyze your shift from project to product. The main part is to make sure that as you focus on daily work, improve your roadmaps for every product value stream, and give your internal products and platforms first-class roadmaps, you prioritize and measure both daily work and improvement in daily work. Every value stream should always have a flow metric as a key result.
Those flow metrics can help you prioritize and track improvement, and as you saw in the examples, see a fast payback loop on that improvement. They empower value streams to set their own flow key results and connect those to business outcomes. The key thing I've also learned is to celebrate those successes. When you've established that feedback loop and you're driving those outcomes, let the rest of the organization know.
As for the help I'm looking for: if you're seeing results like these, I'd love for you to share them on the DevOps Enterprise channels or on flowframework.org. To learn more, check out projecttoproduct.org or the "Project to Product" book, and remember that all author proceeds go to supporting charitable programs for women and minorities in technology. With that, thank you.