Project to Product: Lessons From a Year of Data Driven Flow Diagnostics

Over the past year, Dr. Kersten and his research team have been collecting exhaustive value stream data sets from enterprise IT organizations undergoing digital transformation.


They have used the Flow Metrics defined in the Project to Product book to trace the path that hundreds of thousands of software artifacts take from inception to running software. As they sliced and diced that data to correlate flow metrics to business results, some fascinating diagnostics emerged.


In this talk, Dr. Kersten will summarize those learnings, and show the correlation of each to one of the Five Ideals from The Unicorn Project. From minor maladies to major pathologies, each of the flow diagnostics offers powerful lessons that help us understand the biggest bottlenecks to achieving DevOps at scale.


Dr. Mik Kersten

Founder and CEO, Tasktop

Transcript

00:00:07

Hello, everyone. It's great to be speaking with all of you today from Vancouver, Canada. I love the DevOps Enterprise Summit community and this event because of how it's constantly advancing the state of the practice in DevOps, in Agile, and, in the end, in how software is built. So instead of doing the typical Project to Product talk, today I'll tell you my last 12 months of lessons learned from collecting all of the different flow data that we've measured across many organizations' value streams. I've collected some of the most interesting stories, and I'm going to tell you how they actually help highlight some of the Five Ideals that we've learned from The Unicorn Project, and I'll show you some of these flow diagnostics. I've never seen data of this kind before, where it's live, real data from large organizations, not just open source projects, not just smaller product value streams, but true enterprise end-to-end data. So I really hope that this data helps us all advance the state of the art on how software is built and how we can help our organizations transform and scale.

00:01:10

Now, to really understand data and software at scale, we need to go to where the work is being done, to where production happens. We go to the factory floor. This is what's called gemba, the actual place where work happens. This is something that Taiichi Ohno said was key for executives: to take a portion of each week to learn where production happens, how production happens, where waste could be found, and to really understand what's going on on the production line, to focus on learning. And it was really directed at executives, the people who were making decisions about how production happens, because of course the people on the floor themselves tended to know what was wrong; they had things like the andon cord, and that became a key part of the management system. So what I really wanted to understand is how going to the gemba actually manifests itself for software.

00:02:03

I did that through a two-day trip of going to the gemba at the BMW Leipzig plant. This is me and my colleagues there, including René Te-Strote, one of the characters in the Project to Product book. And we had this two-day gemba walk, not just the 45 minutes that's typically recommended, but a full two days. It was fascinating. I learned amazing things about production, about how you connect business to delivery through these amazing, advanced production lines. But there was also something that struck me as very different. This is something that Don Reinertsen has pointed out, and it's the problem of not being able to see and observe directly what happens in parts of production. Now, with cars it's fairly straightforward: to understand quality problems, we just need to look at the rework portion of the production line to understand where bottlenecks are.

00:02:50

We can actually see where production slows down, or where we have the largest error rates, in things like the high complexity of wiring harnesses. But the challenge is that once we shift to intangibles, once we are no longer producing cars or physical objects, it gets much more difficult to see what production is like. And gemba walks are all about seeing flow across value streams. So unless we have a way of observing the production of intangibles, and software is not tangible, it's not something we can touch, we don't understand how that flow happens. The big challenge with that, of course, and this is something that Daniel Kahneman talked about in Thinking, Fast and Slow, is that "what you see is all there is" is a common fallacy that leaders have, but that actually everybody has. We assume that what we see is sufficient, and if we're only seeing things like org charts, if we're only seeing things like budgets and cost centers, then that's how we're making decisions on how software should be built.

00:03:51

Now, this is very different from what technologists see when they're building software, because for technologists those intangibles are very real. They're the code that we touch. They're the infrastructure that we're constantly managing and improving and debugging. So I noticed that there's this very large disconnect between what reality was, what executives saw on the software side, and what practitioners actually saw, and that we need to connect these two worlds. So the question really becomes: how does gemba work for intangibles, for building software, for building the digital and data assets in our organizations? Is it sufficient that we do management by walking around, and executives have conversations with the different staff who are working on different parts of the software portfolio? Are meetings sufficient, where you actually do reviews, where you review the software that's being built? Is it sufficient to have big room planning and all those kinds of ceremonies that we've put into agile frameworks?

00:04:47

And then we have to effectively connect the way that we do planning and conversations around business strategy to the software that's being built. So personally, having experienced all those things and worked with countless organizations to see how they implemented them, how they did things like big room planning or PI planning, or implemented frameworks to connect the business to software, I realized that these things really weren't sufficient, and that the further you were from the code, the further you were from the actual work. The closer you got to the code, the better, meaning the closer you got to understanding the flows that were actually happening, the problems that were there, the distractions and the issues and the architectural issues that developers were commonly struggling with, or that others were constantly struggling with. So the question for me became: how can we actually get this information into the right state to understand, and for people to see? How can we have the right kind of gemba walk, given that conversations simply aren't enough when you're talking about software at very large scale?

00:05:46

And really, for me, that journey produced the Flow Framework. The goal of the Flow Framework was to have a way of inspecting, and you see at the bottom layer of the Flow Framework that we're inspecting the work that's actually happening, because this magical thing happens with software, which is that we use very advanced tools, and the work happening in those tools actually represents the work that's going on. So if we could somehow inspect those tools at the right level, then we could maybe get some kind of better view that would make sense both to the technology side and to executives. And a key part of my PhD thesis about understanding value streams and flow was actually to create these abstractions of what was flowing through those tools, abstractions that represented our work as we work on defects and incidents and improve the software through features and architecture improvements and so on.
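To make that abstraction concrete, here is a minimal, hypothetical sketch of how work items pulled from delivery tools could be normalized into four flow items. The tool type names and the mapping are illustrative assumptions for this example, not Tasktop's actual model.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional

class FlowItem(Enum):
    FEATURE = "feature"   # new business value
    DEFECT = "defect"     # quality problems visible to customers
    RISK = "risk"         # security, compliance, governance work
    DEBT = "debt"         # architecture and technical-debt reduction

# Illustrative mapping from tool-specific work item types to flow items.
# A real value stream would define this per tool and per team.
TYPE_MAP = {
    "Story": FlowItem.FEATURE,
    "Epic": FlowItem.FEATURE,
    "Bug": FlowItem.DEFECT,
    "Incident": FlowItem.DEFECT,
    "Vulnerability": FlowItem.RISK,
    "Tech Debt": FlowItem.DEBT,
}

@dataclass
class WorkItem:
    tool_type: str                  # e.g. the type used in Jira or Azure DevOps
    created: datetime
    completed: Optional[datetime]   # None while still in progress

    @property
    def flow_item(self) -> FlowItem:
        # Default unmapped types to FEATURE; a real mapping would be explicit.
        return TYPE_MAP.get(self.tool_type, FlowItem.FEATURE)
```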

00:06:34

So the question became: how can we make this better model, this more abstract model of how we work? We know that we need to connect to where the work is happening; that is the whole point of the gemba. But we needed to show this at a more abstract level, because I've seen many executives do things like learn-to-code classes and such, and it's not quite enough to understand the code, because in the end we need to see the way that code flows, the way that value flows across many value streams, and across very complex value streams. So we somehow needed to create something that was value stream oriented, that was end-to-end, that allowed product value streams to be defined, and that would allow us to view the flow of work through those value streams. And that's exactly the goal of the Flow Framework.

00:07:18

And the Flow Metrics are there to define those value streams and then to measure things like flow velocity, flow efficiency, flow time, and flow load. I won't go into detail on those metrics here, but flow velocity is how much gets done; flow efficiency is the ratio of active work time to total time, including time spent waiting; flow time is how quickly work flows from end to end, all the way from the business idea or strategy to the customer; and flow load is the work in progress. The goal is that we do these for the four flow items, which provide a more abstract view that both the technology side and the business side can agree on. So rather than having all of the granularity of dozens of work item types and story points for measuring things and so on, we start seeing these things in just these buckets of features, defects, risks, and debt, so that we can make these trade-offs and see what happens over time.
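As a rough illustration of those definitions, here is a hedged sketch of how the four Flow Metrics, plus flow distribution, could be computed for one value stream over a reporting period. The field names and the simple active/waiting split are assumptions made for the example, not the Flow Framework's reference implementation.

```python
from datetime import date

# Each record is one completed or in-flight flow item for a single value stream.
# Splitting time into 'active_days' and 'waiting_days' is a simplifying assumption.
items = [
    {"type": "feature", "done": date(2020, 4, 20), "active_days": 4, "waiting_days": 12},
    {"type": "defect",  "done": date(2020, 4, 22), "active_days": 2, "waiting_days": 3},
    {"type": "debt",    "done": None,              "active_days": 1, "waiting_days": 9},
]

period_start, period_end = date(2020, 4, 1), date(2020, 4, 30)
completed = [i for i in items if i["done"] and period_start <= i["done"] <= period_end]

flow_velocity = len(completed)                          # items finished in the period
flow_load = sum(1 for i in items if i["done"] is None)  # work still in progress
flow_times = [i["active_days"] + i["waiting_days"] for i in completed]
flow_time_avg = sum(flow_times) / len(flow_times)       # average end-to-end days per item
flow_efficiency = (
    sum(i["active_days"] for i in completed)
    / sum(i["active_days"] + i["waiting_days"] for i in completed)
)                                                       # active time / total elapsed time
flow_distribution = {
    t: sum(1 for i in completed if i["type"] == t) / flow_velocity
    for t in ("feature", "defect", "risk", "debt")
}

print(flow_velocity, flow_load, round(flow_time_avg, 1), round(flow_efficiency, 2))
print(flow_distribution)
```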

00:08:05

What's happened over the past two years, and I've taken the data in this presentation from the past year, is that we've been able to see these dynamics in place. We've been able to see, for example, how flow velocity might increase when technical debt is reduced. So I'm going to tell you stories about these different lessons that we've learned by inspecting these very dynamic software value streams. The goal, of course, is that this data allows us to see these dynamics and hopefully gives you some insights on how you might apply this to your own organization. Now, one inspiration for me here was actually understanding and learning more, and then working with Gene Kim on the Five Ideals, because one of the things that was so amazing to me about The Unicorn Project is just how much of a gemba walk it is.

00:08:53

As you're reading Maxine's story, you get immersed in it, and you actually get to feel what it's like to be working with code, what it's like to be struggling, what it's like to follow the path of a ticket through that bottom layer of the value stream network that you see there, the tool network, and what it's like to struggle to get work done through this complex, disconnected, and dysfunctional value stream. So a fascinating thing that happened as we were analyzing this work is that we saw the Five Ideals emerge from the data sets; we actually saw flow stories, and these diagnostics I'm about to show you highlight the importance of the Five Ideals as we try to improve flow. And the first one that we're going to start on is locality and simplicity.

00:09:39

So the first ideal in Gene's Unicorn Project, and the flow diagnostic I'm going to show you to highlight how important this is through real data, comes from a financial services organization that was tracking its investment in modularity. They actually understood the value of locality, and the fact that ideally you only want to change one place in the code, or one place in the larger code base, to deliver some unit of value. Now, I'm showing you some of the charts of their different flow metrics right here, and I'll just highlight the key parts that you need to understand. They wanted to deliver more value; as usual, there's disruption happening, so there are some innovative fintech companies out there, much smaller, with much less baggage. And what they felt, and what you see over here in green, is their flow distribution for features.

00:10:28

So how many features are they able to deliver? That, as well as their flow velocity for features, was just too low. They needed to deliver more value to the market more quickly to really remain relevant and to really deliver what they wanted to their customers. So now the question becomes: what gets in the way of that? As soon as that's happening, as soon as the feature backlog gets too big because the business wants those features, the backlogs grow and grow. Somehow they were not able to get enough features done. And what you see here, basically, is this value stream's flow load. The work in progress over here gets so high that there's no way to finish all of it. The work is just queuing up and waiting for longer and longer periods of time. So we've got a really big problem here.

00:11:11

We've got a value stream that can't deliver on its business goals. Now, if we actually dig into what's happening, if we go deeper into the gemba, deeper down the production line, we have to start looking at things like the user stories and what's blocking the work. And a fascinating thing happens. Here's a snapshot of that. It turns out that most work is blocked on core backend services. You can see over here there's some user story, and you see it coming up all the time: other parts of the software portfolio, the actual products, are constantly blocked on core backend services, basically on this monolith. So what's going on here is that there's this lack of locality. When we dug into this further, we found those parts of the portfolio were, of course, replicating all the business logic.

00:12:00

That was logic core backend services should have been providing, and core backend services was painfully understaffed at this point. This lack of investment in the core backend, in some respects a monolith, made it very difficult for this organization to compete. So what happened here is that the data showed that the monolith had to be slayed. This initiative to slay the monolith had been around for two years, by the way, but the data was never visible enough. Of course it was visible to the technology side; it was just less visible to the business side, who needed to invest heavily in this rather than in the customer-facing parts of the product portfolio. So this data actually emphasized that ideal, how important locality and simplicity is, and how important making this piece more modular was to making everything move faster. Now, the journey of finding the second ideal in all of this data was quite interesting.

00:12:54

This one is focus, flow, and joy, definitely a personal favorite of mine. So let's see how we uncovered it in some of the flow stories that we came across. This is a transportation services company, and we were looking at the impact on their flow metrics after the COVID-19 shelter-at-home orders came into place in the offices and in the states where they operate. At first, this very interesting thing happened: the wait time increased significantly after the shelter-at-home orders. This was quite disruptive. This is a large organization, with a lot of traditional work happening there, but also very committed to transformation and to becoming a software innovator, so they're tracking these things very closely. Now, what we also see is that in April, this past April, flow efficiency decreases as well.

00:13:46

So that decreases as well; the shelter-at-home orders are really having a significant impact. But a month after those orders were put in place, flow load actually starts to decrease. So we're watching this, and for me a really big part of the gemba walk experience is working with multiple organizations to help look at their flow metrics, so I get to see how each of their production lines works. And whenever you see flow load decreasing, there could be something positive happening. What was actually happening over here, and you can actually see it if you look a little bit up, is that flow velocity went way up. What happened? This was an amazing thing. The teams took this opportunity, when there was so much disruption around different aspects of the business, to basically clear the backlogs, to take all those smaller user stories, all those things that were clogging their backlogs, and actually get them all done.

00:14:37

They stopped starting work and started finishing work. And to me this was just an amazing experience, because I saw some of our own teams do this internally at Tasktop, taking that opportunity to improve things in terms of finishing work, in a very fascinating way, when you allow all those teams to focus. They were able to take that work and finish it quickly, without being interrupted, without constant context switching; they got a ton more done, and they significantly reduced their backlog. As you can imagine, this was quite a difficult time for the organization, as it was for many, and for all these individuals and all these teams, the joy that comes from delivering this much value in this sort of time is very substantial. So that focus, which allowed the teams to have a lot more flow at a time when they started working from home, and then to deliver so much more for the organization and for their customers, was just a great thing to see.

00:15:34

So the third ideal: improvement of daily work. This is a healthcare organization, and with this healthcare organization it was fairly clear. You can see in these charts over here that debt work is purple, so they were working on reducing technical debt. But what's interesting is that the sheer amount of work in progress, as seen in the flow load chart here, was tremendous, and you can see it's trending up. Once again, and this, by the way, has been a really common thing in the last year of learnings, we're seeing that the flow load on most value streams we've measured is just too high. And when it's too high, there's a cost to that, because if your flow load is too high, it turns out your flow velocity is actually worse. This might be counterintuitive, but it's something that's been clearly established through the work on product development flow by Donald Reinertsen and all the follow-on work from that.
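One way to see why high flow load drags flow time and velocity down is Little's Law, which the product development flow literature builds on: average flow time equals average work in progress divided by average throughput. A tiny calculation with made-up numbers, purely to illustrate the relationship:

```python
# Little's Law: avg flow time = avg WIP (flow load) / avg throughput (flow velocity).
# The numbers below are invented solely to show the effect of rising flow load.
def avg_flow_time_weeks(flow_load: float, flow_velocity_per_week: float) -> float:
    return flow_load / flow_velocity_per_week

print(avg_flow_time_weeks(flow_load=30, flow_velocity_per_week=10))   # 3.0 weeks
print(avg_flow_time_weeks(flow_load=120, flow_velocity_per_week=10))  # 12.0 weeks: same velocity, 4x the wait
```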

00:16:31

You even have Dominica DeGrandis here: the cost of too much WIP is very substantial, as she documented in her book, Making Work Visible. The problem with this is that they never have a chance to catch up; more demand keeps coming into the value stream, and there is no way to keep up because the flow velocity is simply not high enough. So you end up in these scenarios where things just get worse and worse and worse. The question is, how do you fix this? Now we look at the flow efficiency, and this is why it's so important to take these four metrics together, as dynamics for measuring this complex value stream. What we see here is that the flow efficiency indicates that a lot of work is started and then gets stuck in these long wait states.

00:17:15

There's something wrong with this picture, because if work is waiting this long and more work keeps arriving, it's very hard for the development teams to keep up. So of course the first hypothesis at this point is, okay, maybe we just need more developers here. The fix, then, is to look at how we can reduce this cost of delay, because everything is queuing up. And the bottom line was that the only way to do that here, and this is a little bit similar to the last story, except it was not as obvious yet, is that the work in progress, the flow load, has to be decreased. If the flow load is not decreased, work will be delayed longer and longer, and efficiency will keep going down and down.

00:18:00

And the cost of delay in actually delivering the work to customers will get worse and worse. So this basically paved the path for the team, and think of the team as the whole, all the teams that comprise the value stream, to work together to reduce the work in progress, and by reducing that work in progress to actually get more work done at the end of the day, at the end of the sprint, at the end of the release cycle. Now we're onto the fourth ideal, and the fourth ideal is psychological safety. This has been a really interesting one, and I should mention that to me it was one of the most sophisticated ones. As we were working with Carmen DeArdo and Dominica DeGrandis on learning from this flow data, and with Naomi Lurie as well, they started talking about this.

00:18:45

This really came from Dominica, this concept of flow safety: if teams don't feel safe making the work visible, the work will not be visible, and again, you'll have the wrong assumptions being made by leadership, by executives, and by others. So let me just tell you the story and then get back to how important psychological safety is in improving and in creating that positive feedback loop. This is a telecommunications organization, and the very interesting thing going on here is that when we look at the flow distribution, so how much work is being done on defects, risks, debt, or features, all we're seeing is red, and red in these charts is defects. So on this particular product value stream, which is substantial, with numerous agile teams, it looks like only defects are being delivered.

00:19:34

That seems odd. I'd actually never, to this point, seen a chart that showed only defects. And then we see that the flow load tells a very different story: there are all these backlogs of features, and of course they're piling up. So the question becomes: what exactly is going on here? What's actually happening is that, in this particular case, the teams had not made visible how their feature work was being done. Feature work was being taken into one system where features were tracked and roadmaps were managed, but the delivery teams had not made their feature work visible. So the question becomes: why not? In one case, what we saw was actually a set of contractors who were simply not using the same agile tool the organization was using. In another case, a set of teams said they didn't want to make that part of the work visible until they were a hundred percent agile.

00:20:33

These are both very problematic situations, because they make things look much worse to the organization; it looks like nothing is being done. So you've basically got people at a higher level in your organization assuming that, okay, there's not enough progress being made here, something's fundamentally wrong. Meanwhile, all that's happening is that the teams have not been given the time to make their work visible, nor the safety, in one case, to be told: no, you don't have to be a hundred percent agile; we want to help you at an executive level, we want to support your improvement. And in that same part of the organization, leadership wanted to champion investment in tech debt, while meanwhile, of course, there was no feature work or tech debt work visible. So I think the key thing here was that this organization realized they needed to work together to make sure that each of the flow items was visible, so that the teams were getting credit not only for the features they were delivering.

00:21:25

And there were multiple teams here, actually, but they were also getting credit for the risk work they were doing, as well as for the tech debt work they were doing. To do that, you really need to put in place the psychological safety that then drives that improvement, that makes work visible across the value stream. And I think one of the key things I've learned is that this actually gives developers, testers, and ops staff credit for all of the work that they're doing. Because, again, value streams can be overloaded with work that's not visible. Once it becomes visible that the flow load is too high, that you've got too much load, you can actually have the right discussions on how to optimize throughput. If the work is not visible, you don't have the safety to do that, and you can't have the right kind of discussions.

00:22:09

Here's another interesting one, and the last, the fifth ideal. This is the issue of customer focus. So let's take a look at this. This was an interesting one at a health insurance company, which of course is also trying to address its bottlenecks. What we're seeing here is tons of flow items that are in progress. You're probably seeing the trend by now: overly high flow load seems to be a pretty consistent thing in the industry. But let's dig a little deeper. If we look at where work is actually being done, what we see is that there are multiple done states, and this, by the way, in the Azure DevOps and GitHubs and GitLabs and Jiras out there, is a very common thing in the dev tools.

00:22:55

There are multiple definitions of what done means, and the key question is: what does done actually mean? This has been one of the really fascinating learnings over the course of the past year. So often when we're measuring value streams, done means it's been implemented; it has not been delivered. When we looked at these flow metrics, we saw that something was pretty fundamentally wrong here, because it looks like all this work is being done, but the customer is not getting value. And this, to me, is just one of many examples of why it's so important to structure your value streams and the entire delivery process around customer focus and customer centricity. If you've called something done when the dev team is done with it, or when a security review is done, you're only getting a partial view of the value stream.

00:23:44

This actually manifests itself in all sorts of different ways. We've seen overly high flow efficiencies: it looks like flow is very efficient, but that's because you've got bottlenecks after code delivery, or you're only measuring until things have been delivered to a staging or development environment. And again, that's not the customer's perspective. So the key thing we've learned is that it's critical to measure flow with a customer focus, and to measure every value stream from the customer's perspective, because a value stream is fundamentally about a customer's pull. So we then reprocessed this data, and this was an interesting learning, to change what it means for work to be done to when it's actually deployed, and you get a completely different chart of what's actually happening. And what we see here is that the analysis showed that development is not at all where things are piling up.
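A hedged sketch of what that reprocessing might look like: the state names and dates below are hypothetical, and the point is simply that flow time measured to a "dev complete" state can look healthy while flow time measured to "deployed" exposes the post-development queue.

```python
from datetime import date

# Hypothetical state timestamps for a few flow items in one value stream.
items = [
    {"created": date(2020, 3, 2), "dev_complete": date(2020, 3, 12), "deployed": date(2020, 5, 4)},
    {"created": date(2020, 3, 9), "dev_complete": date(2020, 3, 20), "deployed": date(2020, 5, 11)},
]

def avg_flow_time_days(items, done_field):
    # Flow time measured from creation to whichever state we call "done".
    days = [(i[done_field] - i["created"]).days for i in items if i[done_field]]
    return sum(days) / len(days)

print("Flow time to dev complete:", avg_flow_time_days(items, "dev_complete"), "days")  # looks fast
print("Flow time to deployed:   ", avg_flow_time_days(items, "deployed"), "days")       # the customer's view
```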

00:24:35

Development has actually been doing a great job handling that flow load. The bottleneck has actually been a lack of deployment automation; it's been in the traditional challenges that DevOps addresses. So it's measuring this at the right level that allows the business as a whole to understand: where do we need to invest? Are we actually as good at our DevOps automation as we thought we were? Or is it only working for a few of our more forward-looking value streams, and do we actually need to invest a lot more in DevOps? That's absolutely one of the trends we're seeing. And what's fascinating about this is that it was uncovered just by combining simple flow metrics with the customer centricity of "done is not done until that work is delivered." Just because one value stream has gotten very good at its CI/CD pipeline doesn't mean that this is happening across the organization.

00:25:23

You might actually be just at the very start of your DevOps modernization effort, not at the end. So the bottom line is that to really find that gemba, to find what organizations are able to do with advanced manufacturing and do so effectively (this is a plant that's now back in production in China), we need to see things at the right level of granularity, and the Flow Framework is your tool to do that. What the Flow Framework allows you to do is to move away from measuring projects and cost centers to measuring these product value streams, and to inspect them to see how these very complex value streams, with the various trade-offs you're making between how work is taken in, how it's processed, and the tools, frameworks, and technologies you're using, can be measured end to end, with a customer focus, defining where you've got bottlenecks in your locality and simplicity and so on; and to move away from silos and proxy metrics to flow metrics and business results, actually connecting the flows that you're seeing to how they're driving business results, rather than measuring some subset of the value stream and getting a false sense of security.

00:26:32

We just saw that in the example of flow not being measured end to end, but only being measured until development is done. And then, of course, it's about going from this fragmented value stream to an integrated value stream network, the same thing we've got in an advanced production line, and making sure we've got that measurable network, with measurement actually baked into it, as we've seen with advanced production, so that whenever you go to the gemba, in this case through these more abstract views, because we're delivering intangibles, you actually see meaningful information that can help the teams deliver and can help you make the right investment decisions. So with that, I'll wrap up. If you're interested in the Project to Product book, just Google for Project to Product; it's been published by IT Revolution. To get in touch with me, go to LinkedIn or Twitter, and I'll also be hosting an Ask Me Anything session around this talk.

00:27:26

I'll be very happy to take your questions and share more of the learnings we've had over the course of the last year of learning how to measure these value streams, how to diagnose these flow problems, and then helping others understand these flow problems through the lens of the Flow Framework. I should also mention that the Flow Framework is licensed under Creative Commons, so you can reuse it to your heart's content, and all of the proceeds from Project to Product go to charitable programs supporting women and minorities in technology. So with that, thank you, please stay safe, and I look forward to hearing from everyone.