Why DORA Metrics and Feature Management Are a Brilliant Combination (US 2021)

Find out how the four key DORA metrics, as popularized by the book Accelerate, can be enhanced through the use of feature flagging. Michael Gillett, author of the new book Feature Management with LaunchDarkly, will talk about how decoupling deployments from feature releases, testing in production, and adopting trunk-based development will enable you to deploy more frequently. Each of the DORA metrics (Deployment Frequency, Lead Time for Changes, Change Failure Rate, and Mean Time To Restore) can benefit from these approaches, and through this talk you will discover how you can accelerate your team's performance through the use of feature management. This session is presented by LaunchDarkly.

Breakout · US · Las Vegas · 2021

(No slides available)


Michael Gillett

Head of Development, Betway Group





Okay. Hi, I'm Michael. I've recently authored a book about feature management with LaunchDarkly, and I'm head of development at a global company based in London. Today I want to talk about the ways in which I think the DORA metrics and feature management are a brilliant combination. I've used feature management for a number of years, and today I want to share with you the experiences that I've had and some of the ways in which feature management has really enabled us to accelerate the way that we work and improve the DORA metrics that we follow. Let me share my screen.


So the first thing I want to talk about are the four DORA metrics, just to make sure that we're all on the same page with them and refresh anyone's mind if you've forgotten what they are. DORA stands for the DevOps Research and Assessment team, and over the past few years they've taken a look at how the DevOps landscape looks and what a high-performing team versus a low-performing team looks like. From that, they were able to pull together these four metrics. The first is deployment frequency: how often can deployments be made? The second is lead time for changes: how long does a piece of work take to go from a developer's machine, from the idea, writing it, committing it, pull requests, all of that good stuff, to getting to production? The third is the change failure rate.


So how often, of all of these deployments, does a change result in a failure? And then finally there is the mean time to recovery. So when something has gone wrong, how long does it take to recover from that? Let's first look at deployment frequency. As I said, the DORA team has identified what a high-performing team and what a low-performing team look like, and this is really just a spectrum, and we can try to place ourselves on it. It's useful to understand this because it shows the level of our DevOps maturity, and it also gives us something to work towards once we identify where we sit. A high-performing team, for the deployment frequency metric, can actually deploy on demand, whenever they need.


So that could be multiple times a day, or it might just be once a day. But the point is, whenever they need to release, they can release, without things getting in the way or processes taking so long. Whereas a low-performing team might deploy maybe once every one to six months, and we're all going to be somewhere on this spectrum. The second one, lead time for changes, is again a spectrum, but this time for a high-performing team it's less than a day for that work to be done, committed, and taken through the various quality gates, code reviews, pull requests, all of that, before it ends up on production. Whereas for the low-performing team it's actually one to six months for that change to get to production. It can be a small change.


It can be a massive change, but the point is, even when we're dealing with small changes, we're still looking ideally at less than a day; for big changes, maybe you're doing small iterations of work. The change failure rate: again, another spectrum, but what we're looking at here is that high-performing teams have a change failure rate of only zero to 15%. So of all of their releases, all of their deployments, zero to 15% of them result in a failure in the production product. Whereas for a low-performing team it's actually much higher: between 40 and 65%. And again, I'm sure we can all look at this and get a sense of where we are. Then the final one, the mean time to recovery: how long does it take for us to recover from an issue? Well, for the high-performing teams it's less than an hour. So a deployment's gone out.


Something's not right. Let's recover from that. We need to redeploy, we need to roll back, we need to do something. Okay, it's less than an hour for a high-performing team to get back to a good state, whereas for low-performing teams it's one week to one month. Knowing these four DORA metrics and understanding where we fit within them is useful for knowing how good we are at that whole DevOps methodology. Many of us here, if not all of us, will be aspiring to be on the high-performing end and believe in good DevOps practice, and these DORA metrics are a great way to identify how mature we are with DevOps. So now let's take a look at feature management, understand what I mean by feature management, and then I'll share with you some of the things that I've learned and some of the ways in which we've been able to improve our metrics.


So the first thing to understand with feature management is that it basically boils down to a feature flag. It could also be known as a feature toggle, but the idea here is that we've got a piece of code, or an experience, and we know what that is, but we want to offer another variation of it as well. It could be a variation that improves that feature; it could be that we didn't even have the feature in the first place and we're going to introduce a new one. But the point is, we're able to encapsulate bits of logic within our app, within the customer experience, and then determine when we want to turn that on. Really, it's just an if statement where we can return variation A or variation B. The important part of a feature flagging system is how the evaluation gets determined.
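
In code, the flag really is just that conditional. Here's a minimal sketch, with a hypothetical in-memory flag store standing in for a real feature management platform (the flag name and store are illustrative, not any particular SDK's API):

```python
# A minimal feature-flag sketch. The flag store and flag name here are
# illustrative; a real platform (e.g. LaunchDarkly) evaluates flags for you.
FLAGS = {"new-checkout": False}

def checkout_experience() -> str:
    # The flag is really just an if statement at the call site:
    # serve variation B when the flag is on, variation A otherwise.
    if FLAGS.get("new-checkout", False):
        return "variation-b"  # the new implementation
    return "variation-a"      # the current, known-good experience
```

Flipping `FLAGS["new-checkout"]` to `True` switches every caller to variation B without any redeployment; the interesting part, as covered next, is who decides that evaluation and for whom.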


Now, feature flagging isn't a new concept. It's been around for a long time. It's been done with databases, it's been done with app settings, config files, all of that. But what I'm really looking at today, and what my experience is really about, is a modern feature management platform. My experience is with LaunchDarkly; there are others out there. When I talk about feature flagging and feature management, it is this idea of a modern, distributed feature management system that is actually doing the evaluation for us. But there's a lot more to the evaluation than it just being done by someone else. What's actually going on is that we are targeting a feature to a user or to a session. So we've got a person, a customer, who is using our product, and we need to be able to conditionally turn on variation A or variation B, turn this feature on or off, per customer.


And that's really what we want a feature management platform to be able to do very well for us. We need to provide data about the customer. That could be the country they're in, it could be the device they're on, but it could also be more information about the customer themselves. Maybe, if they're on a subscription, we could use an entitlement, or it could be to do with how much money they've got in their balance, or even the country that they registered in. We need to provide that, and then within a feature management platform we can target feature variations to those users and sessions that actually meet those requirements. That is key to the whole point of feature management: having that fine-grained control over who will and who won't receive any of the variations that we have. So with that in mind, let's take a look first at switches, which are perhaps the simplest feature management concept.
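
To make the targeting idea concrete, here's a sketch of a rule evaluated against customer attributes. The `User` shape and the rule itself (premium subscribers registered in GB) are hypothetical examples, not rules from the talk:

```python
from dataclasses import dataclass

@dataclass
class User:
    key: str                  # stable identifier for the user or session
    country: str              # country the user registered in
    device: str               # e.g. "mobile" or "desktop"
    entitlement: str = "free" # subscription level, if any

def evaluate_flag(user: User) -> str:
    # Hypothetical targeting rule: premium subscribers registered in GB
    # receive the new variation; everyone else stays on the default.
    if user.entitlement == "premium" and user.country == "GB":
        return "variation-b"
    return "variation-a"
```

In a real platform the rule lives in the platform's dashboard rather than in code; the application only supplies the attributes and asks for the variation.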


The feature is either on or off for all customers. Now that's very powerful, because when things aren't going right with our product, it could be that we've got increased demand, a load of traffic coming to our site that's degrading things. What if we have some non-essential functionality on our site that we could turn off to restore service? We can do the opposite even: maybe there are some customers who are experiencing a weird edge case and we can't easily identify what's going on for them. We could turn on extra debugging for them; we could serve them an unminified version of the JavaScript, which will contain more human-readable names and maybe has additional logging going on within it. It's really powerful for us to have these levers, of either turning something off for everyone or fine-tuning the customers who we want to turn things on for.


Now, this helps with the DORA metric of mean time to recovery, because if we're suffering, if we're struggling, if we've got a slightly degraded experience for our customers, what we really want to do is get back to the good state, and switches help with that. Certainly, if we can turn off expensive but non-essential pieces of functionality on our product, that should free up some resource and allow customers to experience the essential part of the product. And equally, if there are some edge cases, we can get more information by turning on additional logs for those customers, allowing us to recover faster and move towards the high-performing end of the spectrum for this metric. The next one is rollouts. This one's a bit more interesting: this isn't about turning something on or off for everyone, it's about turning something on for select parts of our customer base.


Now, this allows us to rethink how we would roll out, how we would deploy, how we would release features. The important thing here is that every new feature we develop, we should put in the off state of a feature flag, so that when it's deployed to production, that feature is not exposed to any customer. It is only exposed to those users who we want it to be. In that manner, we can decouple a deployment from a feature release, and that gives us a lot more safety when doing deployments, which is something extremely valuable. So then, okay, let's go down this scenario where we've made our deployment and the code is there. What we can do is roll this feature out first to our QA team and our stakeholders to sign this thing off; we can select those individual users.


That's great. They can validate that this works as expected, and then we can think about rolling this out to customers. We've got two ways of doing that. We could roll this out to a percentage of customers, say five or ten percent, or we could roll this out to a group of customers and do a ring rollout. Either way, these are progressive rollouts, where we're going to take a feature slowly and carefully to more and more customers. Ultimately we want this to get to a hundred percent of customers, and then we can tidy up that feature flag. This is an extremely safe way of releasing new code to our customers. And if we think about it from the DORA metrics point of view, this actually helps with the change failure rate, because we aren't going to expose new code to customers without having gone through quality gates, without having gone through our own QA processes and sign-off.
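
A common way to implement the progressive part is stable bucketing: hash each user into a fixed bucket so the same users stay enabled as the percentage grows. This sketch combines that with an allow-list "ring" for QA and stakeholders; the user keys, flag key, and percentage are all hypothetical:

```python
import hashlib

# Hypothetical rollout state: an allow-list ring for QA/stakeholders,
# then a percentage rollout for everyone else.
ALLOW_LIST = {"qa-user-1", "stakeholder-7"}
ROLLOUT_PERCENT = 10  # grow this towards 100 as confidence builds

def bucket(user_key: str, flag_key: str) -> int:
    # Hash user + flag into a stable bucket 0-99, so a given user keeps
    # the same variation across sessions and as the percentage grows.
    digest = hashlib.sha256(f"{flag_key}:{user_key}".encode()).hexdigest()
    return int(digest, 16) % 100

def is_enabled(user_key: str, flag_key: str = "new-feature") -> bool:
    if user_key in ALLOW_LIST:  # ring 0: sign-off before any customers
        return True
    return bucket(user_key, flag_key) < ROLLOUT_PERCENT
```

Raising `ROLLOUT_PERCENT` from 10 to 100 only ever adds users to the enabled set, which is what makes the rollout progressive rather than a reshuffle.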


So deployments could happen all the time, but nothing's changing for the customer. In that manner, the change failure rate can be reduced, because yes, there is still the chance that a deployment could break production, but if we're encapsulating our feature flags correctly, the chance of production actually breaking is seriously reduced, through the use of rollouts and through the pattern of all new work being done behind feature flags, with the default experience, the current code that we've got, being the default flag setting. That's what customers receive, and then we turn on the new implementation when we need to. Moving on from that, though, is experiments. Maybe with the rollout we wanted to check that something technically worked as we wanted. Maybe we need to improve some caching: we have an idea of what that new caching approach should do and how it's going to improve performance.


That's a technical hypothesis, where we think we know what the metrics are going to look like. Let's roll it out. Is it looking good? Yes? Cool, keep rolling out till we get to a hundred percent. But there's another type of experiment, another type of hypothesis, which is more of a business hypothesis, where we want to build some feature for our customers but we don't actually know if the customers are going to like it at all. Maybe there are better ways of doing it than we were originally thinking. In that regard, we've got a business hypothesis that we want to experiment with, and this becomes interesting. We can use rollouts, we can use percentage rollouts, and we can gather information about how the users are using the product. Are they clicking it more, engaging with it more, spending more? It depends on what the metric is for that particular hypothesis that would deem it a success or a failure.


And that's the interesting bit: when we're doing an experiment, it might succeed, but it might also fail. If it fails, we don't want to have spent a long time building it; we want to spend the minimum amount of time building it. With that possibility of doing work that might ultimately come back and show us we're on the wrong path, or that we need to rethink this, or that our customers aren't interested in it, well, that drives us to do things in small chunks and test those small iterations of a feature, or steps towards building out a bigger aspect of a product. That helps with deployment frequency. The reason is that the changes we're looking to make, to deploy, and then to release to customers are small. We don't want big changes; we want very small iterations. Now with that in mind, we're able to deploy more frequently.


Our code changes are going to be smaller each time, the tickets, the work items themselves, are going to be smaller each time, enabling us to deploy more frequently. But then there's the other bit as well, which is: if we are confident in our release pipeline, that we have got good unit testing and automation testing, then can we have less testing going on within the pipeline, so the actual deployment takes less time? Remember what I said about rollouts: we can decouple deployments from feature releases. So we can have deployments happening really, really quickly that are doing regression tests and smoke tests. But do we need to run the full suite of tests every time? Well, maybe not. That's an option for us, and when we're happy to take that approach, we're in a position where our deployments can speed up.


But there is a way in which we can work differently with feature management, not just in how we're doing things in the product, but in how we build the software itself. And that's called trunk-based development. The idea here is that we want to be as close to the main branch of our source code repository as possible. Some of you might be using Git flow, where there's the notion of release branches and feature branches. That's all fine, but there's a bit of admin, a bit of overhead, in managing the branches and keeping them all in sync. The idea of trunk-based development is that you work a lot closer to main: you actually get rid of the release branches and you just make feature branches off of the main branch, work on them, open a pull request, and merge back in.
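
The whole trunk-based cycle can be shown in a few git commands. This is a self-contained sketch in a scratch repository; the branch name, commit messages, and identity are illustrative:

```shell
# Trunk-based development in miniature: a short-lived feature branch off
# main, merged straight back via a PR-style merge, then deleted.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git -c user.email=dev@example.com -c user.name=dev \
    commit -q --allow-empty -m "init"
# Branch off main, do one small piece of flagged work, merge back quickly.
git switch -q -c feature/new-checkout
git -c user.email=dev@example.com -c user.name=dev \
    commit -q --allow-empty -m "add new checkout behind a flag (off)"
git switch -q main
git -c user.email=dev@example.com -c user.name=dev \
    merge -q --no-ff -m "merge PR: new checkout (flagged off)" feature/new-checkout
git branch -q -d feature/new-checkout  # branch is gone; only main remains
```

There are no long-lived release branches to keep in sync: the feature branch lives for one small change and the flag, not the branch, controls when customers see the work.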


Now, that speeds up development because you don't have so many branches to deal with, and there are also slightly fewer merge conflicts, because you're dealing with fewer changes; what's going on is a little bit simpler, a little bit quicker, a little bit easier. But then there's another opportunity as well within this idea of changing the way that we work, which is: production is the best test environment that we have. Now, most people don't like the concept of testing in production, and it's kind of seen as a joke, but when we've got feature management and we're using feature flag encapsulation in the way I described earlier, then we do actually have the opportunity to deploy untested code. If you really wanted to, and I might advise against that, it is possible to deploy untested code to production.


As long as you have a very strong set of tests within your pipeline that will check that nothing has degraded, nothing has regressed, that untested code is encapsulated in the off state of a feature flag and won't be exposed to customers. So it's actually very safe to do. You can then go along as a developer and turn it on just for yourself and see what it's like on production. So it is possible to skip test environments entirely and go straight from a local machine to production, and as long as nothing's been written outside of that feature flag encapsulation, you're going to be in a pretty safe space to make that kind of change. Now, that helps with the lead time for changes, combined with what I was saying around experiments and having things be small iterations. Changes can now be made really, really quickly.


We can iterate really, really fast. That pipeline is rock solid for us. We can deploy on demand. Ideally the work we're doing is small, and it takes a very short amount of time to get through that pipeline because we've removed some of the steps from it. Now we can get it live onto production. We can turn it on for ourselves, we can turn it on for stakeholders, we can show things as we're building them. That means that changes can happen really, really quickly, and if we want, we can turn these things on quickly for customers as well. Bug fixing becomes much better as well: there are fewer steps, fewer gates in that pipeline, which can really help with reducing that lead time for changes.


So now let's take another look at this and approach it more from the DORA metrics angle. I've touched on them already, but let's switch this around a little and look at how the DORA metrics and feature management really do go together. The first one was deployment frequency. In what ways can the deployment frequency metric be improved? Well, if we're making smaller changes, we can release more often, so we can deploy more frequently. That comes around again certainly from experimentation, but actually from everything we're learning: we always want to be doing small iterations. We don't want big bang releases.


And so with this, if we can make smaller changes, especially when it comes to experiments where the result might ultimately be a failure, then let's not invest a huge amount of time and resource, and we get to release more often. Additionally, there's less testing for each release: because we're not turning the feature on for customers when it gets deployed, we don't have to do as much testing for every deployment. That isn't to say we shouldn't do testing, but we can do less. We can cherry-pick the type of tests that we want to do for each deployment, because it is when the feature is going to be released that the testing becomes most crucial, most important. Deployments don't need all of the testing, but the feature release will, and we can determine when the feature gets enabled and do that testing just before we want to turn the feature on.


So deployment frequency can be improved through the use of feature management, moving us towards that high-performing end of the spectrum we looked at earlier. The second is the lead time for changes. This is: how long does it take to go from the idea, from someone working on it, all the way through that pipeline, with all of the quality gates we've got, to getting to production? Well, similar to deployment frequency, if we're doing smaller changes, and in that regard smaller features, that helps; it reduces development time. But that's maybe not everything about this metric, because you could have already been doing small features. What we're really talking about here is a complete mindset where everything can be seen as a small feature, but there's also the opportunity that every day, even unfinished work can still be released to production.


It's always good practice to commit your code at the end of the day, when the fire alarm goes off or whatever, to make sure that all of your work is still there. But what if every commit triggered a release? If you're doing it in the correct manner, with feature flag encapsulation, it's very safe to do. This keeps the releases small, and we can constantly be shipping changes. But equally, we'll have less testing: just like with deployment frequency, we can skip entire testing environments. Whole parts of the testing framework that you'll be using could be skipped for deployments. Regression and smoke tests, I would say, need to be there; unit tests need to be there; but there could be a huge suite of automation tests that you've got, and you don't need to run that on every release. And that does mean this lead time for changes will reduce.


The third is the change failure rate. Because all changes should be done within a feature flag and encapsulated in that manner, there's a fair degree of safety in making releases and deployments, because the known state of a feature is what is always going to be served by default to our customers, and it's the new work, the new implementation, that is being added within the flag, where we need to turn it on. So the change failure rate should come down with feature management, because it is less likely that deployments are going to disrupt the customer experience and actually have a negative impact on our site.


And that is because the flags are all off by default. As long as you follow that pattern, you can safely release work, as I say, in unfinished ways, maybe even in untested ways, but that does allow the change failure rate to come right down. Then the final metric is the mean time to recovery. How long does it take to recover if things have gone bad, or perhaps I should say when things have gone bad, as we should assume they will at some point? How quickly can we recover? Well, if we've just turned on a new feature and it's not looking very good, even though it went through all of our tests, we can turn it off: it's a click of a button to turn a feature off with feature management. So in that regard, new features can be turned off immediately, which again is bringing us towards that


high-performing end of the spectrum, where we can recover in less than an hour. There are a few other things as well. Deployments are small and rollbacks are really easy. What I mean by this is that because we're looking to ship changes as small features, if the actual deployment itself breaks production, which you would hope wouldn't happen with really good testing in your pipeline, but it can happen, well, the change was very small. It's quite quick and easy to identify what has gone wrong, because it should really only be one change per release, per deployment. The rollbacks are easy; we can redeploy and work with that. This is not a huge exercise of figuring out which particular change within a large release caused the problem: it should be quite apparent which release, and therefore which change, has actually resulted in the problem.


So that can allow us to recover more quickly as well. And then finally, with the opportunity to use switches, like I mentioned right at the beginning, as a mechanism, as a lever to pull when our site is degraded in some way, we can turn non-essential pieces of functionality off, and that can allow us to recover as well. There might be other work that needs to be done, so we're not completely out of the woods, but maybe we can return to offering our essential functionality to our customers, and that's important. Again, these are just options for us to really reduce the mean time to recovery and improve that DORA metric. So to me, that is why the DORA metrics and feature management are a brilliant combination. They complement each other very, very well. It feels to me like feature management is an extension of the ways in which the DORA metrics can be improved, an extension of CI/CD even, where not only are we looking at deployments being the crucial metric and thing that we measure, but what we're able to do with feature management is reduce the risk of deployments, not completely, but significantly.


And that does allow us to really improve those DORA metrics, because the releasing of a new feature, the releasing of new code to customers, is separated from the deployment, allowing us to deploy far more frequently. That allows us to make smaller changes, we can recover faster, and when things have gone wrong, we're in a position where we've got options available to us to improve the product that we're trying to serve to our customers.


And for me, it's a great opportunity to look at feature management, and things like experimentation are valuable as well. Within a conference like this, with the focus on DORA metrics, I think there's a fascinating opportunity for us all to really improve those DORA metrics with feature management. I have recently written a book about feature management using LaunchDarkly. It is available soon on Amazon. If you want to know more about ways in which we can use feature management, not just to improve your metrics but for a whole host of other things, then do check the book out. Thank you very much for listening. I hope I've given you something to think about.