Las Vegas 2020

From Velocity to Value – Scaling the DevOps Impact

As large organizations embrace DevOps, some are seeing the pace of innovation accelerate, but many still struggle to realize a real increase in the value delivered by their DevOps initiatives. This is often a result of attempting to move at DevOps speed while needing to work with parts of the organization that still run on legacy processes and systems.


Join us to learn about companies that have taken an approach based on policies and automation to help developers get quality code into production as rapidly and safely as possible. You’ll hear about how they leveraged existing DevOps toolsets and added centralized access to key metrics and insights to improve the entire application value stream.


This session is presented by ServiceNow.


Eric Ledyard

Sr. Principal Product Manager – DevOps, ServiceNow


Ben Riley

Advisory Solution Consultant, ServiceNow

Transcript

00:00:12

Hello and welcome to today's DevOps Enterprise Summit session from ServiceNow. This is a ServiceNow discussion around scaling to enterprise DevOps. My name is Eric Ledyard; I'm a principal product manager for the DevOps business unit. Joining me remotely today is Ben Riley, an advisory solution consultant who came to us as part of our Sweagle acquisition. There are three key things we're going to cover today: one, how to scale DevOps; two, customer case studies from customers who have been with us on our journey; and three, Q&A so we can take your questions. The business imperative for scaling DevOps is really all around the fact that cloud native is becoming the de facto standard way of doing things. Mode-two, cloud-native, microservices-based development is how most companies are organizing to deliver code to their customers.

00:01:06

The reason for this is that cloud adoption is rapidly accelerating, and when you go to that world, legacy, traditional code development methodologies are just not going to cut it. So a lot of organizations are transitioning not only to the cloud, but also transforming the way they deliver code and value to their customers, and tracking that end-to-end value stream is really what's important to most organizations adopting the DevOps culture. The second big driver is that DevOps is now a very strategic initiative. Especially post-COVID, we're seeing a lot more organizations that need to move faster, but with less risk. DevOps adoption is going to roughly double in the next year, from 41% of companies attempting it in 2017 up to as much as 80% or more attempting it in 2021.

00:01:54

The next big part of this is all around infrastructure as code and infrastructure automation. Companies are going to do more and more around treating their entire infrastructure deployments as code, and really treating pretty much everything in their organization as code, whether it's pipelines, software development, or almost anything else, so it can be managed and tracked as part of an automated pipeline. The other thing accelerating DevOps adoption and value is the set of challenges a lot of companies are facing. One of the big ones is that many companies trying to adopt DevOps have failed to see any significant increase in release frequency. A lot of folks might tell you that value is more important than release frequency, but the reality is, and I can tell you this from experience, release frequency is how all of our projects were funded.
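
Coming back to the "everything as code" idea above, here is a minimal sketch (editor's illustration, not a ServiceNow artifact; the stage names and rules are invented) of a delivery pipeline expressed as a versioned, machine-checkable definition rather than a manually configured job:

    # Minimal sketch: a delivery pipeline described as data, so it can live in
    # version control and be reviewed and validated like any other code.
    PIPELINE = {
        "name": "payments-service",
        "stages": ["build", "unit-test", "static-analysis", "deploy-staging", "deploy-prod"],
        "required_approvals": {"deploy-prod": ["change-policy"]},
    }

    REQUIRED_STAGES = {"build", "unit-test", "deploy-prod"}

    def validate_pipeline(pipeline: dict) -> list[str]:
        """Return a list of problems; an empty list means the definition is acceptable."""
        problems = []
        missing = REQUIRED_STAGES - set(pipeline["stages"])
        if missing:
            problems.append(f"missing required stages: {sorted(missing)}")
        if "deploy-prod" in pipeline["stages"] and "deploy-prod" not in pipeline.get("required_approvals", {}):
            problems.append("production deploys must declare an approval policy")
        return problems

    if __name__ == "__main__":
        print(validate_pipeline(PIPELINE))   # [] -> definition passes the checks

Because the definition is plain data under version control, the same review, audit, and automation that apply to application code apply to the pipeline itself.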

00:02:41

I'm a former executive from Bank of America who tried this at a 200-year-old bank. All of our projects were built on the premise that we wanted to go from a nine-month release cycle, from ideation to production, down to nightly builds. So our immediate metrics were all around release frequency, and we built three complete software factories. We trained 500 individuals to do their development in an agile methodology. And yet no one would actually let us push the code to production, because of governance, risk, compliance, and audit; we hadn't really solved for any of the IT service management components. That's a common theme: a lot of companies can write code quickly and develop code iteratively, but they're not able to push that code to production, so they can't get it into the hands of their customers and drive value.

00:03:26

Another big part of this is developer time: 61% of developers interviewed for the 2019 ActiveState survey said they spend less than four hours a day writing code, and less than about an hour a day writing the mission-critical, feature-differentiating code that would drive business value to their customers. A lot of the tasks they do, around 70%, are the mundane, process-driven, manual legacy carry-overs from their old world, and that is definitely a challenge for those individuals. Another problem is that roughly 90% of release problems involve configuration errors. When we do the root cause analysis, configuration management becomes a massive challenge, because there are so many configuration files traveling through all of these deployments, from application configuration all the way to infrastructure and cloud configuration.

00:04:14

Because of that challenge, we're seeing configuration management become more and more important to end-to-end service delivery. The other side of this is that it takes a long time to get changes approved in most organizations: we've seen around 23 days on average to push a normal change through. And there's a massive explosion in configuration data taking place, so the configuration challenge only gets more exacerbated as organizations grow their capabilities. So what we're doing is bringing ServiceNow and DevOps together: the enterprise service management capabilities of the ServiceNow platform together with the DevOps needs of our customers. Our customers need speed to deliver faster, visibility to see across all of their toolchains, and increased productivity.

00:05:06

The trick to doing all of this is a strong integrations framework, so that we can connect to all of the tools customers currently use. We don't want people to change any of their toolsets; we want to leave them productive in the tools they use every day, integrate with those tools, pull the data into our platform, and then deliver the functionality we provide, which is really around change automation, developer insights, and push-button audit. What we did was bring the two worlds of dev and ops together seamlessly in our platform. That's a little bit challenging, because we have to meet the needs of developers, who want to move quickly and get products to market as fast as possible, with minimal process bogging them down and minimal wait time.
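
As a hedged illustration of what an integration like this might look like underneath (the event fields, payload shapes, and tool names here are invented for the sketch, not the actual ServiceNow or vendor APIs), a receiver can normalize events from different CI/CD tools into one common record shape before pulling them into a platform:

    # Minimal sketch of toolchain integration: normalize webhook payloads from
    # different CI/CD tools into one common "pipeline event" record.
    # Field names and payload shapes are illustrative only.
    from dataclasses import dataclass, asdict

    @dataclass
    class PipelineEvent:
        tool: str          # e.g. "jenkins", "gitlab"
        app: str           # application or service name
        stage: str         # build / test / deploy ...
        status: str        # success / failure
        commit: str        # commit SHA the event relates to

    def normalize(tool: str, payload: dict) -> PipelineEvent:
        """Map each tool's (assumed) payload shape onto the common event model."""
        if tool == "jenkins":
            return PipelineEvent(tool, payload["job"], payload["stage"],
                                 payload["result"].lower(), payload["scm"]["sha"])
        if tool == "gitlab":
            return PipelineEvent(tool, payload["project"]["name"], payload["build_stage"],
                                 payload["build_status"], payload["sha"])
        raise ValueError(f"unsupported tool: {tool}")

    # A normalized record like this is what a platform-side API could then ingest:
    event = normalize("gitlab", {"project": {"name": "payments"}, "build_stage": "deploy",
                                 "build_status": "success", "sha": "ab12cd3"})
    print(asdict(event))

The point of the common model is that change automation, insights, and audit can all work off one record shape regardless of which tool emitted the event.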

00:05:50

Ops, on the other hand, needs to make sure we control and govern things so we're not introducing risk into the environment and we're not taking outages and downtime. The way we did this was to leverage ServiceNow's core data model, the CSDM (Common Service Data Model), and build a DevOps data model on top of it to pull in all the data from the CI/CD tooling, from planning to coding, to orchestration, to static code analysis, testing, and artifact management, all the way through the entire lifecycle. We bring all that data into the platform, and that allows us to do a lot of things. One is the change automation piece, where we can use decision trees to say: if this and this and this and this are all met, approve this change into production and let it deploy automatically with the CI/CD tooling.
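
A minimal sketch of what such a decision-tree check could look like, using hypothetical field names and thresholds on the change record (editor's illustration, not the actual ServiceNow change policy engine):

    # Minimal sketch: evaluate policy conditions against data pulled from the
    # CI/CD toolchain and decide whether a change can auto-deploy.
    # Field names and thresholds are invented for illustration.
    CHANGE = {
        "tests_passed": True,
        "code_coverage": 0.86,
        "static_analysis_blockers": 0,
        "artifact_signed": True,
    }

    POLICY = [
        ("all automated tests passed",  lambda c: c["tests_passed"]),
        ("coverage at least 80%",       lambda c: c["code_coverage"] >= 0.80),
        ("no static-analysis blockers", lambda c: c["static_analysis_blockers"] == 0),
        ("artifact is signed",          lambda c: c["artifact_signed"]),
    ]

    def evaluate(change: dict) -> tuple[bool, list[str]]:
        """Return (auto_approve, failed_rules)."""
        failed = [name for name, rule in POLICY if not rule(change)]
        return (not failed, failed)

    approved, failures = evaluate(CHANGE)
    print("auto-approve" if approved else f"route to manual review: {failures}")

Changes that fail any condition drop out of the automated path and go to a human reviewer, which is how speed and governance coexist.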

00:06:34

From there it goes through agile team planning to continuous compliance, both up front in pre-production, where we check compliance and governance before you even deploy, and post-production, when you're actually running in a service environment, so you can maintain governance, risk, and compliance on an ongoing basis, leveraging things like our GRC platform and SecOps. It goes all the way to service health, which ties into our IT operations management suite: doing remediation, improving mean time to repair, and handling a lot of the pieces of seeing a service degradation in production that's due to a change that just got pushed out. All of that information is brought into the platform so we can see it end to end; we help do root cause analysis, we help reduce the impact of outages, and there are a number of benefits across the service health landscape.

00:07:21

Finally, the piece I'm probably most excited about, being a former leader, is DevOps insights and analytics: the ability to see across all my toolchains which teams are high performers, which are low performers, which need some help, and where we have breakdowns in our end-to-end value stream of delivery. All of those pieces come from having analytics and insights across the entire data set. And then finally we bring in the configuration management piece, which is all around managing configuration data, securing it, and validating it, both for preventative reasons before we actually deploy, and for remediation after we deploy, when we see an issue in production that could be caused by a configuration mismatch.

00:08:06

The other part of this is that all of that end-to-end visibility culminates in the ability to see your end-to-end value stream of delivery, from ideation to planning, to commit insights, development insights, deployment insights, change acceleration, and system health. Pretty much the entire set of Accelerate metrics from the DORA report, as well as the Accelerate book, has been brought into our platform so we can measure the high performers, the low performers, and all of the teams we manage. I can tell you that as a former leader I was blind to all of this across my teams. I had no way of seeing data on how well teams were doing in their delivery cycle or where we had breakdowns in their value streams. So this is extremely important to me, and it became the reason we were named a value stream management leader in the Forrester Wave this year.
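
For reference, the four DORA/Accelerate metrics alluded to above can be computed from deployment and incident records; here is a minimal, hedged sketch using made-up record shapes (not the platform's actual schema):

    # Minimal sketch: the four DORA metrics from simple deployment/incident records.
    # Record shapes and dates are invented for illustration.
    from datetime import datetime
    from statistics import mean

    deployments = [
        {"at": datetime(2020, 9, 1), "commit_at": datetime(2020, 8, 30), "failed": False},
        {"at": datetime(2020, 9, 2), "commit_at": datetime(2020, 9, 1),  "failed": True},
        {"at": datetime(2020, 9, 4), "commit_at": datetime(2020, 9, 3),  "failed": False},
    ]
    incidents = [{"opened": datetime(2020, 9, 2, 10), "resolved": datetime(2020, 9, 2, 14)}]
    window_days = 7

    deployment_frequency = len(deployments) / window_days            # deploys per day
    lead_time_hours = mean((d["at"] - d["commit_at"]).total_seconds() / 3600
                           for d in deployments)                     # commit -> deploy
    change_failure_rate = sum(d["failed"] for d in deployments) / len(deployments)
    mttr_hours = mean((i["resolved"] - i["opened"]).total_seconds() / 3600 for i in incidents)

    print(deployment_frequency, lead_time_hours, change_failure_rate, mttr_hours)

Trend these per team over time and the "high performer / low performer" comparison described in the session falls out of the same data.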

00:08:50

That recognition is all because we have this data and we're tracking it with Performance Analytics; we have trend analysis over time, so we can see whether we're trending in the right direction and start to see the performance across our teams. This brings together a lot of benefits for us. The most robust, easiest return on investment is change automation: we estimate that for every hundred users we bring onto the platform, we save about one and a half million dollars. That's done by returning the 14% of development teams' time that was previously wasted in legacy change processes. But that's not the only place we're saving money, and it's not the only place we're driving value.
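
(As a rough sanity check on that figure, using an assumed cost rate rather than one given in the session: 14% of a roughly 2,000-hour work year is about 280 hours per person, so 100 users recover on the order of 28,000 hours; at an assumed fully loaded cost of about $50 per hour, that is roughly $1.4M, which lands in the same ballpark as the stated $1.5M per hundred users.)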

00:09:31

We're really driving value across all four planes of project execution: executing products faster and increasing flow throughout the system. We can actually drive top-line revenue with this, by getting products to market sometimes three to four times faster than before, capturing revenue from our customers, and translating that into revenue on the street; that's an incredible driver for most executives. We reduce CAB meetings, obviously, and reduce the impact of change. We've got all the reporting and analytics, which lets you keep your developers writing code rather than sitting in status meetings or updating their leadership. As a leader, I can see what all of my teams are working on and drill in proactively, without having them waste any of their time reporting to me what they're doing or how well it's going.

00:10:16

And then finally, developer productivity. It's all about getting developers back to writing code. That's what this whole platform is about: optimizing the end-to-end workflow, automating as many places as possible, and giving developers back a lot of their time so they can focus on writing more stories per sprint and increase the flow of value through the end-to-end delivery cycle. So now we have some use cases to talk about, and a lot of this is around deploying fast, but safely. One of our major lighthouse customers is DNB. They were the largest bank in the Nordics in 2019, and they're transitioning into becoming a true digital bank. When they started this project, they looked at the dev side. What they knew going in about the challenges on their dev side was that they needed to reduce the cycle time from ideation to implementation.

00:11:08

They had a large variation in their Kanban structure, so there were disjointed processes, and the status of ongoing work was hard to see and track. They had one pipeline toolset per team, so lots of disparate tools spread across those areas and no coordinated pipeline structure or policy set for driving governance around change management or any of those pieces. That causes a lot of complexity and slowdowns, because it was hard to get to any of the three big use cases we talked about: automating change, visibility and traceability across the entire landscape, and audit. All of that was very complex and challenging. What they knew from the ops side was that they had a very time-consuming change ticket form that no one liked to fill out, and a very long cycle time for change advisory boards and tech board meetings.

00:11:54

They had strict policies and little insider knowledge when they were actually making operational judgment calls, and there was a long distance between the approvers and the developers: many layers of process between the people who wanted to make a change and the people who would approve it. So they set up a carrot for rewarding good performance: we will automate all the change tickets if you put these guidelines in place. They've been seeing great results by setting criteria such as: your CI/CD pipeline tool is an accepted one, the deployment process is at least partially automated, you have a set pipeline that separates environments, and you always run the defined tests. Those became the criteria that have to be met in order to approve changes automatically.

00:12:39

They've seen a pretty great return on this. The overall business case they brought to us estimated about a 20-minute reduction per change ticket. That starts to really add up, especially when you have thousands and thousands of changes every year. For the first dev team of six, deploying two increments a day, they save two hours per day, or ten hours per week. The next team of 28 has not yet been compliant with the policies, but will still save around two hours per day, or ten per week. If you add that up across many, many developer teams, the capital savings grow dramatically, and that's where we've seen DNB be very successful in their return-on-investment numbers as they've moved forward with scaling DevOps out to their enterprise. And now we're going to go into the next step of this, which is eliminating those configuration management challenges. I'm going to pass it over to my colleague and let him tell you about our Sweagle acquisition and configuration data management.

00:13:41

Thanks, Eric. Really good session. I'm just going to go into a little bit more detail on configuration data management. We are Sweagle, and we were acquired by ServiceNow in July. We're a configuration data management solution; we really aim at improving the way our customers manage configuration, use it, test it, all of those different things. We see it as one of the biggest areas for our customer base at the moment: trying to get a handle on how they deal with configuration. By its very nature, you want configuration to be configurable, to be malleable; you want to be able to update different settings, different features, canary deployments, all that kind of stuff. The world we operate in is driven by config, yet there's still a huge proportion of outages caused by poor configuration. Mistakes happen, right?

00:14:32

But the time lost to those outages across the user base of an organization is pretty high. So what we do at Sweagle, as part of our configuration data management solution, is firstly centralize configuration and put some good practices in place. We apply tests, and we integrate with lots of different tools in quite an automated fashion, to act almost as a watcher of config. We watch it, we track it, we understand when changes are made. Based on that, we give you either dependencies or validations: can we see who else requires that config, can we see who needs to use it, or do we need to validate the quality of that change? Say somebody changed the region from EMEA to North America, and that has hurt us before:

00:15:26

we're not allowed to deploy there for whatever reason, so let's stop that change as part of the pipeline, stop it as part of an automated deployment, and make sure we're pushing a preventative methodology when it comes to configuration, rather than a reactive methodology of "an incident happened, oh dear." What we want to get to is: that incident happened, we're aware of it, and we can put in a validation policy that prevents somebody from making that change again. I just want to talk about a couple of customer journeys and use cases. One is a rather large telco using our validation engine, which tests and tracks things and makes sure they're good quality, around their data center.

00:16:14

They essentially had three principal CMDBs, a large infrastructure estate, and a huge amount of duplication of resources depending on which CMDB you went and looked at. From an SLA perspective, from a contracts perspective, and just from a risk perspective, that's not very good at all. It was very hard to synchronize those three sources, and therefore, when either an incident occurred or somebody needed to do some work, understanding where to look for the truth was really difficult. It was hard to measure, and contractual compliance was basically a very manual process to work out. Generally speaking, by the time a synchronization effort had happened, it had to start again, and that was a long, long process to get through. So you're constantly chasing your tail: poor data quality, inconsistent data, and no real source of truth.

00:17:13

So when an incident does happen, you very quickly get into the weeds trying to understand what's happening and where, and there's a lack of compliance, or a lack of ability to understand what is deployed out on the estate. And there are various other things: time, human effort, and rework going into trying to work these things out rather than adding value to the business. So, as an agnostic repository with a validation engine, we saw a really good opportunity to start doing some machine-to-machine correlation. The more high-quality automation you can do, the more you free people up for value-add tasks rather than housekeeping, and that's really good. We've got a graph data model underneath, which means we can store things in lots of different ways, and that makes the synchronization very easy for us.

00:18:10

We continuously look at modifications in any of the three sources to check whether a modification is good, bad, or ugly. Based on that: does another CMDB need to know about it? Does it already have that information? We positioned it so we could collect data from all of them and then synchronize that data between them: synchronization scripts triggering workflows to either correct something or enrich something that's missing, all in a totally automated fashion. It was a really busy estate in terms of different automation tooling and different tool preferences depending on which team you're working with, but from our perspective that doesn't matter. It's heavily API-driven, so we can integrate in a very non-intrusive way, in the background, while making sure all of that work still gets done.
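
A minimal sketch of that kind of machine-to-machine reconciliation across sources (record shapes, source names, and fields are invented here, not Sweagle's or any CMDB's actual schema):

    # Minimal sketch: reconcile the same CI (configuration item) across several
    # CMDB-like sources and report where they disagree or where data is missing.
    sources = {
        "cmdb_a": {"db-prod-01": {"owner": "payments", "region": "emea", "os": "rhel7"}},
        "cmdb_b": {"db-prod-01": {"owner": "payments", "region": "na"}},
        "cmdb_c": {"db-prod-01": {"owner": "payments", "region": "emea", "os": "rhel7"}},
    }

    def reconcile(ci_name: str) -> dict:
        """Collect every attribute value seen for a CI and flag conflicts and gaps."""
        seen: dict[str, dict[str, str]] = {}              # attribute -> {source: value}
        for src, items in sources.items():
            for attr, value in items.get(ci_name, {}).items():
                seen.setdefault(attr, {})[src] = value
        report = {}
        for attr, per_source in seen.items():
            report[attr] = {
                "conflict": len(set(per_source.values())) > 1,              # sources disagree
                "missing_from": sorted(set(sources) - set(per_source)),     # sources lacking it
                "values": per_source,
            }
        return report

    print(reconcile("db-prod-01"))   # flags the region conflict and the missing "os" in cmdb_b

The report from a check like this is what can trigger the correction or enrichment workflows described above, instead of a person chasing the differences by hand.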

00:19:09

So we reduce people's time and effort in dealing with and managing these things, and we give them an improved set of data so they can make better decisions: you can really trust that if there is an incident or a problem, you can go and investigate it without having to do a lot of research first, and we help address those SLA breaches and the contractual compliance element. The second customer I want to talk about is more finance-based. This is more of a traditional application release pipeline space, so CI/CD, with a broad technology estate, and not just broad but deep as well: a mixture of legacy and greenfield applications, a mixture of manual processes and automated deployments. The thing that really stuck with me was the quote: "we can only find our configuration incidents in production."

00:20:07

They were really struggling to get a handle on those anywhere earlier, in any of their UAT, test, or performance environments, because for them there is only one production, and that's where the issues really emerge. And testing in production is maybe not the aim. There were large amounts of human-directed configuration change, which isn't necessarily a problem, but it is something you need to track and be aware of. And it's a large application estate: 200 applications, a mixture of microservices and legacy components. Dev configurations were essentially being promoted into production slowly and not in the manner required, which was giving them some poor customer experience, some outages, a lot of incidents and rework to go back and fix these problems, and quite a lot of security issues as well, especially around tokens, passwords, those sorts of things.

00:21:05

As I mentioned a minute ago, we try to be as highly automated, with quality, as possible. Acting as a central repository meant we could slip into lots of different processes, primarily the CI/CD pipelines, as a couple of steps that collect configuration that's being added, modified, or removed, and validate the quality of those changes. Taking that data and seeing that, say, we're changing a region from EMEA to North America, or changing a port from X to Y, whatever it might be, and having the contextual information around it, means we can test whether that change is actually good. And if we've been burned by something before, then maybe that's the point where we notify or raise an incident before that deployment happens.
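
As a hedged sketch of what a pipeline validation step like that might look like (the rules, keys, and file layout are invented for illustration, not Sweagle's actual rule syntax):

    # Minimal sketch: a CI/CD step that diffs a proposed configuration against the
    # current one and applies "we've been burned before" rules before deployment.
    current  = {"region": "emea", "db.port": 5432, "feature.new_checkout": False}
    proposed = {"region": "na",   "db.port": 5432, "feature.new_checkout": True}

    RULES = [
        # (description, predicate over (key, old_value, new_value) -> True means "block")
        ("region changes are not allowed in this pipeline",
         lambda k, old, new: k == "region" and old != new),
        ("database port must stay in the approved range",
         lambda k, old, new: k == "db.port" and not (1024 <= int(new) <= 65535)),
    ]

    def validate(current: dict, proposed: dict) -> list[str]:
        """Return the description of every rule the proposed change violates."""
        violations = []
        for key in set(current) | set(proposed):
            old, new = current.get(key), proposed.get(key)
            if old == new:
                continue
            violations += [desc for desc, blocks in RULES if blocks(key, old, new)]
        return violations

    problems = validate(current, proposed)
    if problems:
        raise SystemExit(f"blocking deployment: {problems}")   # fail the pipeline step
    print("configuration change validated")

Each production incident traced to a config mistake can be turned into another rule, which is the "preventative methodology" described next.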

00:21:58

One of the big gains was that preventative methodology. Reactive is the world a lot of people live in, but we're really trying to push this customer into a more preventative space, meaning somebody can make a change in full confidence, knowing that if there's a problem we've been burned by before, it's going to get stopped as part of that pipeline. And even if the pipeline doesn't stop, the knowledge of what that change is, the auditability, the securing of secrets, and those types of things are all handled. It really allowed them to promote better good practice in their continuous delivery and continuous integration, and it also gives them the ability to start standardizing what their teams do with configuration.

00:22:47

So rather than everybody going off in their own direction, it gave them freedom and flexibility while everybody still comes through that same standardization process in the background. They're not changing tools, they're not having to leave the environments they like to work in, but it does mean that everybody ends up going through that same pipeline, and the quality that gets put through there is higher. If an incident happens, it happens; mistakes happen, that's the world we live in, but we're giving them a platform to prevent that mistake from happening again. So those are two stories we've got around configuration; there's a huge amount more, and we've fixed lots of problems in this space. Hopefully it's been interesting. If you've got any questions, now's a great time. Really appreciate you joining the session, and feel free to follow up with myself

00:23:39

or afterwards.