DevOps 2020 - The Next Decade

John has over 35 years of experience, focusing on IT infrastructure and operations. He has helped early startups such as Chef, Enstratius (now Dell), and Docker navigate the "DevOps" movement.


He is one of the original core organizers of DevOpsDays and has been a prominent keynote speaker at various DevOps events throughout the years. He is also a co-author of The DevOps Handbook along with Gene Kim, Jez Humble, and “the Godfather” of DevOps, Patrick Debois.

John Willis

Senior Director, Global Transformation Office, Red Hat

Transcript

00:00:00

<silence>

00:00:09

As many of you know, I've had the pleasure of working with John Willis on so many projects since we met in 2010. He's one of the programming committee members for this conference. He was my co-author on both the DevOps Handbook and the Beyond the Phoenix Project audiobook. We worked on an astounding panel that we did here at this conference with Dr. Richard Cook and Dr. Sidney Dekker from the Safety Culture community, as well as Dr. Steven Spear from the Lean community, who will be presenting tomorrow. Hang on to your seatbelt. He's going to take you on a wild ride describing some of his learnings from the past several years, and his belief in the importance of platforms, not just for every developer, but for every company. Please welcome John.

00:00:51

Hi everybody. This is John Willis. Um, presentation's called DevOps 2020 Rethink. Um, I'm, uh, this was a collaboration with a couple of my coworkers. I'll talk about 'em, but they're Jabe Bloom and Andrew Clay Shafer and Kevin Behr. But, uh, so I just wanna make sure there's credit for some of the slides they, uh, collaborated with me on. So this presentation is about, you know, I called it DevOps 2020 to sort of set the stage that, over the last 10 years, we've done a really good job in DevOps. And so the question is, what are we gonna do now? It's, it's the start of the next decade. And, um, so there are some things that I've been thinking about, about sort of areas that I think we need to delve into a little deeper and, and the three areas I'm gonna focus on are what I call organizational conversations, organizational design, and something that we're calling, uh, and, and my team, and I'll explain my team in a minute, uh, the three economies.

00:01:47

So, um, so this is, uh, my team. I started at Red Hat last October in, uh, 2019. And, uh, that's Andrew Shafer. He's my boss. That's Kevin Behr next to him, and that's me, the small guy. And then, um, and then there's, um, Jabe Bloom, who I've been working with, he's, uh, he's getting a, a, a PhD in transition design. So a lot of these ideas that I'm gonna be pointing out really come from him, about how design and transition design apply. As Andrew likes to say, we wrote some books. Um, I was the co-author of the DevOps Handbook, co-author of Beyond the Phoenix Project, Kevin co-author of the Phoenix Project, and Andrew, uh, co-wrote Web Operations and some of the site reliability engineering. So, here's the deal, right? This is the sort of the, the Pete Cheslock joke. We come to find out this, this, um, this slide, it was actually originally created by Patrick Debois.

00:02:38

All things lead back to Patrick. But if you think about the last 10 years, like there's sort of like, we've been this unicorn poop, if you will, on the enterprise, right? And, and I mean that in, in the best possible way, right? Like DevOps Enterprise Summit, all the accomplishments we've made. But like, it's been a struggle, right? Starting off like the enterprise, the first conversations were, I don't think the enterprise can do DevOps. And then it was, can security apply the security things around DevOps? And, and you know, here we sit at 2020, I think we've got that pretty much solved in terms of everybody has the memo, right? But if we look at digital transformation, like by all accounts, this conversation around digital transformation, it's been creeping up for the last couple of years in, in, in a modern discussion. It's been around forever. Um, there was, you see a lot of stories about failures, you know, different reports, different studies.

00:03:27

This one particular one says 70% of all digital transformations fail. So, so sort of jokingly, right? The next 10 years maybe is digital transformation, unicorn poop gonna get, but we're gonna have to, and I'm half joking because the truth of the matter is we do have to have a better, stronger, bigger conversation about what we're doing. And, and so, and again, like for most of the people that are at this conference, you've all been doing that. It's for the people that we need to educate. And so, uh, Andrew, uh, just came up with this, this idea of five elements. And I'll talk, Andrew Clay Shafer, my boss, my coworker, and, um, and so I'll talk about 'em a little bit. But if we think about sort of the five failures, right? Like leadership, right? They, they, you know, in, again, the, the people that we interact with today and at this conference, right?

00:04:16

We tend to be better than this, but in a majority of the organizations out there, leadership is still preventing change, either in the form of governance and risk or, or just general business. Um, product, building things that don't matter. We really still haven't gotten Eric Ries's memo yet, uh, from Lean Startup. Development in a lot of cases are still building the wrong things. Architecture are basically designing, or not even involved in the decision of the design. So they're building the things wrong. And then in operations, we still have a split mindset of incident, you know, operations outages and sort of half in on service management, half in on sort of DevOps or, or some of the new sort of thoughts about incident management. So what I wanted to propose is some of the areas that I've been thinking about as this rethink. We're sitting here, 2020, the, you know, we could say, like I said earlier, that we've done a really good job and we have, like we should all collectively, I'm, and I'm not, I'm being serious, pat ourselves on the back.

00:05:16

We've done an, a tremendous job in the industry, uh, improving commerce, improving people's lives. And, but the question now is at 2020, what are we gonna do for the next decade? Because if we're still talking about GitOps and CI/CD in five years from now, then we, we, we probably have failed miserably. And the digital transformation discussion will have overtaken us, right? So we really need to start thinking about how we improve. If you go back, you know, five years ago if you had a conversation about continuous delivery, that was a novel idea. Today it's table stakes. So we don't want these conversations to be, we want the new things to be table stakes. So the first thing I want to talk about is something I've been doing for the last three or four years. I, I, I call it, I've called it a lot of things.

00:06:00

I'm just gonna call it organizational conversations. And this is where I've gone into large organizations and really just literally spoken to hundreds and hundreds of people. I usually come in at the CIO level. And usually it's a champion inside of the organization that says, you, you should talk to this guy John Willis. He sort of knows what he's doing. And then I get to talk to the CIO and I convince the CIO to let me just have conversations. And what I want from the conversations is to talk to the people at the edge, the people who put their fingers on the keyboard. 'Cause I'm more interested in that form of discussion than I am in talking to leadership. And instead of top down, I wanna go bottom up. And so, uh, I've done this over the last few years, very large banks, insurance companies. And one of the things I came up with, which is this quote I've made, which is, you can't Lean, Agile, SAFe, or even DevOps your way out of a bad organizational culture.

00:06:50

So the idea is that we, you know, these ideas like Lean, Agile, SAFe are great frameworks or great pattern and practice tools for us. But the thing is, if we don't get to the bottom of how things really work, or the truth, or have the real conversations with the people doing the work, these things actually can give us false truths. And so one of the things I've been doing over the last, uh, few years, I've had this thing that I call the seven deadly sins of DevOps. I won't go into this in detail, but there, there are patterns that you find when you have these conversations. And one of the most interesting ones is they all seem to funnel down into what I call security and compliance theater. In other words, your audits are basically nonsense. And I've got full presentations on this and some of the work I've been doing, automated governance and, uh, automated cloud governance. Again, you can just look me up, you'll find that.

00:07:43

So I love this story about, uh, Abraham Wald, right? Which actually, it was a, a story that Sidney Dekker told at one of the DevOps Enterprise Summits. I've seen them, um, I've seen him give this presentation a few times. And so during World War II, there was a set of scientists and mathematicians and specifically statisticians that were looking at how to do the proper repairs of fighter planes that would come back. And so they'd figure out where the bullet holes are, the weight ratios and all that. And at one point, Abraham Wald had this aha moment. He said, we've got it wrong. What we're doing is we're looking where we're, we're repairing and fixing where the bullet holes are, those are the planes that are coming back. What we need to do is look where the bullet holes aren't because they're the ones that aren't coming back, right?

00:08:31

And it actually was sort of the original definition of survivor bias. Sidney Dekker says that we don't need to look at the absence of negatives. We need to look for the presence of capacity, the things that go right, right? So I use this in this whole conversation, this, this organizational conversation dialogue. Eli Goldratt, um, who wrote The Goal. And as most of you know, um, the Phoenix Project was a, a, a modern day rewrite of The Goal. Fantastic stories, both. He also, 20 years after he wrote The Goal, he wrote, um, he did an audio only project called Beyond the Goal. And one of the parts in there, he talks about complex systems and complex adaptive systems. And what he talks about is, if you look at these two systems, a system B and a system A, if you ask generally people, anyone, which one's more complicated?

00:09:19

Most people would say system B. But if you ask the physicist or somebody who really understands complex systems, they're more likely to say system A, because it allows more degrees of freedom. So in a sense, when I go in and have conversations with customers and, you know, working for the CIO, but literally talking to what I call the edge people who are doing the work, they tend to want to give me system B answers. They wanna say, well, that works, or my CMDB is fine, John, don't worry about it. Oh, don't worry about that, that works. And what I really need to do is get beyond that to the truth. And so I like to use this cartoon of the "this is fine" dog, right? So what people are constantly telling me is, don't worry about it, this is fine. And once you earn their trust or you create an open collaborative dialogue, a psychologically safe environment, what you actually wind up getting to is the real conversations where you're finding the places that are really on fire.

00:10:13

And what's interesting is when you get that psychological safety and trust, people will tell you the most fantastic workarounds and the, the real fire stories, which really are the sort of system A discussions that I'm looking for. And if you look at like the Equifax breach, right, in 2017, it's a classic example of a system A, system B conversation. So, um, for those who know this, it was a library called Struts 2. It was in there, it was a Jakarta, it was a parsing module. Just simply, if you did a curl on a system that had that library, the chances are you could actually compromise that system with this little, um, command thing here. When it was all said and done, the CEO said that, well, this, we know what was wrong. The breach was basically a single person who failed to deploy the patch, right?

00:10:58

That's a system B answer, right? But when you go in, first off, you look at the 2018, uh, Congress did an oversight report on the breach, and it was, it had tons of complex problems and systems. One, the chief security officer reported to the chief legal officer. So when the chief security officer was asked by Congress, and um, in review, it was asked, why didn't you notify the CEO of the breach? The answer was, I didn't think about it. Right? And they didn't think about it because they were reporting to the chief legal officer. The IDS, the intrusion detection systems on the perimeter, had certs that had been expired for 18 months, right? So there was all these things. So that, so again, what I look for is those type of things to find out all those sort of complicated, honest answers and discussions. So the second area that I've been thinking about for a 2020 focus is organizational design.

00:11:54

And a lot of this I get from Jabe Bloom in terms of transition design and thinking about design when we talk about transformation. And so if we look at just a simple evolution, right? Everybody knows this. We go from an agricultural economy to an industrial economy, to a knowledge economy. So we're in a knowledge economy, but right now, if we talk about Lean and we talk about Toyota, TPS, right? We're still in this struggle, this conflict between how do we map the things that we know work really well in an industrial economy? And what are the things in a knowledge economy like? So the knowledge economy is still sort of art. Now, things like Lean have been able to try to apply science, but we still have these debates on what really maps properly, right? So, so we really need to sort of like get over that.

00:12:38

We need to sort of, not sort of, but we need to actually start applying true science, the way operations research, all the things that we could learn from the industrial economy, truly in an gon. And I would say we're still not doing a great job there. And so I talked about earlier how Andrew, um, had come up with this idea of the five elements. So Andrew spent like five or six years over at Pivotal, really large transformations. And one of the things when we all came to Red Hat, we actually had this powwow, like, again, the glass half full, glass half empty. What are the things we've done right? But what are the things we really haven't done a great job on? And so if you think about a pre-DevOps conversation, like it was all about development. It was the Agile Manifesto, the development. The DevOps conversation opened up this sort of balance theory between operations and development, right?

00:13:29

So it was really sort of differentiation versus scale. You could say it's DevOps, and it was an engineering focused discussion, right? And we've done a really good job there. What we haven't done a good job on is architecture, enterprise architecture. At almost all large companies I talk to, I mean, the, I get from the DevOps people, they're screaming, please, if you could help us get the enterprise architects on page, we've left. In a lot of cases, maybe not your case, but in a lot of cases, enterprise architects are still working off the 1990s paradigm of architecture. And, and then product in most cases is, is a mess as well. So it's like the, you know, Chinese medicine and, and it's based on balance theory, and leadership in the middle, that we use this canvas to start a discussion about. So if, you know, if you've got too much weight here in development or development and ops, but not in architecture, we use this now to start a conversation of what's your balance, what's your balance theory amongst these five elements?

00:14:29

And so one of the other things that, if you go back to Toyota, one of the more successful parts of Toyota, you know, as we, we talk about Lean as, as a definition of what the Toyota Production System is, was something called the Toyota supply chain, and something they called the four Vs of learning. And that was variety, variability, velocity and visibility. And so I think if, when we talk about that sort of middle area between an industrial economy and a knowledge economy, could we take the, this, you know, Andrew's five elements, or what we're calling our sort of Global Transformation Office five elements, and try to map that with the four Vs to get a better sense of how we can do knowledge economy based on these, these pure principles. So if you look at, um, what I did is I created a grid here.

00:15:17

So looking at the motivation and conflicts, pretty obvious, but if you look at a developer, a developer wants increased variety, right? Sort of balance, economic, more choices. Um, variability, it doesn't really want tolerance and, and lockdown, it wants to basically expand. They want of course increased velocity, but they want, um, decreased visibility, right? They don't in general want GRC, governance, risk, compliance. They don't want the CAB, they don't want sort of NFRs, if you will. And product is, is, if, if you can remember, the grid is pretty much aligned. Leadership wants everything <laugh>, right? Like increase everything. But if you look at operations and architecture, which are reasonably aligned, at least on our five element grid, um, they wanna decrease variety. They wanna decrease optionality, they want reuse, they want scale. Um, they want to, uh, decrease variation. They don't want to give you, they want to tighten your tolerance.

00:16:11

And then velocity, they in general wanna decrease the speed. And I know that sort of the DevOps mantra, you know, everybody's gotten the memo, but in general, in large organizations still, um, a big part of the organization is trying to decrease, but they wanna increase visibility. They want more NFRs, they want more operationalization, they want, um, some form. Like again, I think a lot of people are getting better at CABs, but they do want some sort of audit and control. And, and again, architecture's the same way. So if we look at that, then we can sort of dive into, I'm gonna just talk about, um, variety and, um, variance or variability here, and I'll save the other two Vs for some other conversation. But in variety, we're talking about optionality, we're talking about balancing marketing demands and operational efficiency. Uh, the Toyota supply chain book is an excellent book if you want to understand the real details of how they competed, the Volt competed against the Prius.

00:17:06

Um, so we look at variety, we can look at some systems thinkers. Um, this is Alicia Juarrero. Um, she says, in general, constraints enable freedoms. Basically says by curtailing the potential variation in component behavior, I know this is very sort of techy, but, um, the context-dependent constraints paradoxically also create new freedoms. So in general, certain types of governance systems enable freedoms. So we need to learn more about systems thinking. We need to learn from the four Vs. We have to understand what Toyota did incredibly well with, with variety. And another great one, um, is the Tragedy of the Commons by Garrett Hardin, right? And this is self-interest behaving contrary to the common good of all users, spoiling the shared resources. So there's a balance again. And, and, and to just sort of summarize it, consumables must be managed to preserve the system. Too many cows consume all the grass and the field collapses, right?

00:18:03

And we got Ashby's law. And so ultimately, I'm gonna tell you that I think all this has to be balanced in the five elements. And then what, what my conclusion is gonna be is that you really need a platform, but I'll get to that in a little bit. So Ashby's law is, uh, for a system to be stable, the number of states of its control mechanism must be greater than or equal to the number of states of the system being controlled. So if you think about a platform, a platform does that, it does that balancing act between controlled and controller and controllee. And in, in general, in a stable system, the controls must be greater than or equal to the, the controlled systems. Uh, last but not least, um, Don Reinertsen, um, the problem in any prioritization decision is that it, it is a decision to serve one job and delay another one.
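Ashby's law as stated above (the law of requisite variety) can be illustrated with a small counting sketch. This is only a simplified textbook form, not something shown in the talk: the function names `min_outcome_variety` and `can_hold_steady` are hypothetical, and the model assumes each controller response can cancel at most one distinct disturbance per outcome.

```python
import math


def min_outcome_variety(disturbance_states: int, controller_states: int) -> int:
    """Fewest distinct outcomes a controller can guarantee.

    Simplified counting form of Ashby's law: with D possible disturbances
    and R available responses, outcomes cannot be squeezed below ceil(D / R).
    """
    if disturbance_states < 1 or controller_states < 1:
        raise ValueError("state counts must be positive integers")
    return math.ceil(disturbance_states / controller_states)


def can_hold_steady(disturbance_states: int, controller_states: int) -> bool:
    """True when the controller has enough variety to force a single outcome."""
    return min_outcome_variety(disturbance_states, controller_states) == 1


if __name__ == "__main__":
    # Hypothetical example: 12 kinds of disturbance, growing controller variety.
    for responses in (3, 6, 12, 20):
        print(responses,
              min_outcome_variety(12, responses),
              can_hold_steady(12, responses))
```

In this reading, a platform plays the controller role: it needs at least as many distinct responses as the variety it is asked to absorb, otherwise some of that variety leaks through to the outcome.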

00:18:52

So in general, what he's saying, without all the gory details or reading his, uh, you know, intense book, focus on high value, high probability items in your backlog. So the, the common theme here is economic balance and how to make those trade-offs and decisions. And we've got, uh, tons of literature, of science, from incredibly smart people to help us. And so in general, um, what you have is, you know, the constraints enable freedoms, consumables, um, must be managed to preserve the system stability, uh, Ashby's law, and then cost, by Don Reinertsen. Quickly, uh, variability, which is variation. I love this quote. This is, uh, unknown author: misunderstanding variation is the root cause of all knee-jerk reactions, over-control, micromanagement, and tampering. If you, uh, if you go to Deming's, um, writings, basically he says the importance of operational definitions in collecting data, without them, the data is suspect. Change the definition and the data changes.

00:19:53

And when you don't have a written definition, the different, I'm sorry, definition, the different opinions of those collecting data result in muddled data, right? And, and so here's the bottom line, right, we quote the hell outta Deming, but we very rarely actually listen to him, right? There's just, I mean like, like every presentation has this sort of Deming quote, but like, are we really doing operations research, right? Are we really applying the science, statistical process control, the system of profound knowledge, Deming and Shewhart's thoughts about, um, plan, do, check, act, or study, act? And then just quickly, um, another place to look for, um, variance and how to, to create opportunity in variances, uh, uh, Taguchi and the, the Taguchi loss function. Sorry, uh, cost is more important than quality, but quality is the best way to reduce costs, right? And so here's what he's saying is find the edges of your variability.
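For reference, the Taguchi loss function mentioned above is commonly written in the quadratic form below. This is the standard textbook statement, not something taken from the talk's slides, and the symbols are conventional notation assumed here: $y$ is the measured value, $m$ the target, $\Delta$ the tolerance half-width, $A$ the cost incurred at the tolerance limit, and $\mu$, $\sigma^2$ the mean and variance of $y$.

```latex
L(y) = k\,(y - m)^2, \qquad k = \frac{A}{\Delta^2}

\mathbb{E}\big[L(y)\big] = k\left[\sigma^2 + (\mu - m)^2\right]
```

The second line is why variation matters on its own: even when the mean sits on target ($\mu = m$), the expected loss is still $k\sigma^2$, so reducing variance reduces cost.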

00:20:51

It's not how tight your tolerance levels are, it's how far you can stretch them. Where can you get the value, the hidden values are at the corners. And then there's what's called the Red Queen effect. And this is, um, basically from Alice in Wonderland. But in general, when we talk about sitting here in 2020, uh, dash 2020, if we're running in the same place, we're losing. So in, in, in, in summary, uh, statistical process control, tolerance from Taguchi, and the Red Queen effect. So one of the things I wanna say here is that if you think about what's in common with all these things I just talked about, they're math, engineering, statistics. We need to do a better job in the next decade of stopping this knee-jerk reaction of like, we get a failure, let's hire a, let's, we get a breach and let's hire a hundred new security professionals.

00:21:39

And that's a true story actually from a bank, and, and doing finger in the wind. We have this knowledge and it's been used by industrial, uh, industrialization, power plants. It's a hundred years of engineering that's sitting in our face that we can actually apply and think about and be better at. And in fact, there's a great quote about, um, uh, uh, Shewhart, Walter Shewhart, and Deming said that in 1980, that it will be 50 years before we actually figure out the real value of what Shewhart was saying. So Shewhart actually created the genesis of most of Deming's work, statistical process control, plan, do, study, act, right? So he is like, we're still like 20 years away from Deming's prediction of doing this. So, so the bottom line of what I've said over the last couple of slides before I get into the next section is that there's a lot of really good information in industrial engineering, operations research.

00:22:29

We need to stop just quoting that stuff. And we actually need to start looking at the real science and make breakthroughs. And again, I, I'm not trying to trivialize the people who have done tremendous work. I'm just saying in general organizations, you know, I, I, I had some person tell me, oh, I'll never get my management to, to understand Taguchi, and I was just like, well, Toyota got their management to do it and they decimated a market for 50 years. Alright? So I'd end up with a couple of things about platforms. And so this idea we've been talking about, uh, internally, and Jabe Bloom's been writing about it for a couple years, and he calls it the three economies. And so most of our discussions around how we think about infrastructure or scale or even DevOps conversations have been bound around this idea of two economies, differentiation and scale.

00:23:16

We can call that dev or ops, right? Or infrastructure and development. It's been a bimodal discussion, right? And, and so the differentiation economy, right? Velocity, novel, niche, experimentation, incubation, right? The things that you would expect from a differentiation, development. And then scale, we could say it's sort of the ops or infrastructure. It's to regulate, reduce, uh, create resilience, reuse, consolidation, right? The, we, we understand those two economies pretty well. So one of the things I, I've, I've been fortunate enough to have great conversations with Mark Burgess and, uh, Mark wrote the foreword to the SRE book. And, you know, we were talking about sort of SRE and how Google has, um, built their infrastructure over maybe say the last 10 or 15 years. Um, and in fact, if you understand the history of Google, they started out with something called Borg. They, um, they turned that into a project called Omega.

00:24:09

Ultimately we see that as the open source project called Kubernetes. And what Mark said was one of the brilliant things that Google did, which was they made a non-deterministic infrastructure look deterministic to their developers. And what they did was developers didn't know anything about the, the, the particulars of the virtualization, of the storage platform. They just had APIs or interfaces. And in most cases, they didn't really, they, they didn't, weren't even given the ability to know those things. So they just created applications and services through interfaces that were bounded by, and what Jabe would call this is a scope economy. Now, Google didn't call this a scope economy, but it was a, it was this, this clutch, if you will, in between scale and differentiation. And it's not just a platform, it's an interface. It's an abstraction that separates, allows the developers to get the best value.

00:25:01

And as the infrastructure gets its value, you almost think of the differentiation kind of crushing in and scale crushing in towards the middle, where the scope economy is, the things that will actually create the adoption and the, the control and all those things you want. So if we look at this, like the differentiation, um, the scope economy kind of gives us the ability to enable the, the Vs, the velocity, variability and, uh, and, and, um, variety, uh, allows recombining, all those things, the, um, uh, the tragedy of the commons. Um, scale controls velocity, again, variability. So it gives the best of both worlds for the scope. And, and like I said, it, it really becomes this clutch. And, um, if you Google, um, three economies, you'll find more presentations, more details on this discussion. Jabe's written a couple of, there's, uh, there's some really good Wardley mapping examples with this as well.

00:25:53

And so, so in the end, one of, I think, the most important things we talked about for 2020, and this is self-serving, 'cause I work for a company that actually sells a platform, uh, OpenShift <laugh>. But, um, but, but I do truly believe this is true. You know, Stephen O'Grady said that the, that developers are the kingmakers. I think when I sit here in 2020, I say, if you are not thinking about the next decade platform and how that platform's gonna look, how you're gonna utilize that platform, what are the strengths of your organization, use that platform. You didn't get the memo and you're probably gonna, um, you, you're probably gonna lose. I, I do, I fundamentally believe, whether I work for Red Hat or not, I've been believing this, you know, prior to six months ago, uh, that the, the new way forward is platform.

00:26:39

So platforms are the new kingmakers. Like get the memo. Uh, yeah. So platform by design. The, the whole idea of this is if you think about what all the, what I would call cloud titans, so the early experimenters in scale of infrastructure, your Googles, your Twitters, your, uh, Netflix, right? Um, and Facebook, they all did this by platform. Now, the question is, what does a platform mean? How do you, how do you, uh, use it? And like, and don't get sort of lost in the marketing hype, which anybody can give, including my company. Um, but what we, what we're talking about here is think about what I said about what Mark Burgess, what Mark Burgess said about the brilliance of Google, right? They created this abstraction that allowed the developers to be completely divorced from the infrastructure. I mean, all they were given is a set of APIs, interfaces, and well-documented interfaces to basically do anything you needed to do.

00:27:34

And that's where the enterprises have to get to. Now, it's a long haul to get there because we have, you know, Google's one application base or one infrastructure, you know, uh, few applications. Banks are like, you know, you know, um, you know, 20, 30 lines of business, you know, thousands, maybe 10, you know, 10,000 services. So it's a much harder ask, but in the end, you have to get there, right? And to get there is to stop thinking about platform as a differentiation economy, a, a platform as a service or a self-service. Or even worse, you know, I've got a container system that manages clusters. And you start thinking about the, um, what I'm calling a scope economy and how the, um, a platform really becomes a platform as an interface. How it starts enabling the things that you need from the, um, you know, from the differentiation economy, sorry, and the scale economy.

00:28:33

Those things get collapsed in your scope economy. The platform becomes this enabler. And so you, you tend to start looking more like Google. I, I always think when, whenever I hear a conversation in a large organization, I say, hey, calm down. You're not Google. You're a bank and you're healthcare and you can like calm down because you're reading all this stuff about the way infrastructure's supposed to work. And then I say later in a presentation, by the way, you can be like Google, but you have to understand Google didn't really create a PaaS. They created a platform as an interface. And, and, you know, and as we see things like, um, service mesh, Istio, Envoy, all those things, that really the platform becomes this experience. And so, you know, if I look over the last four or five years, some of the smartest people I know used to work with software companies, they're all taking VP of engineering jobs in large, you know, Global 1000 companies, shoe companies and, and banks.

00:29:33

Why? Because these are the people that the enterprises know have to get them through the next 10 years of, and, and it has to be based on a new generation, a new way of thinking. And I would say it's a scope economy based on a platform as interface. So my ask for everybody is I would love to, uh, push this conversation about organizational conversations, about the, um, some of the things that we've learned from Toyota, some of the things we should be doing better from operations management. Are, are we applying, maybe I'm right, maybe I'm wrong, I don't think we're applying the right, you know, the right science. And I think we're a lot of platitudes and a lot of quotes. And then, um, all this bound into a discussion about platforms. So my ask is anybody who wants to have a next generation conversation about these three areas, please, um, hit me up. Help, help me help you drive that conversation. Uh, thank you very much. My name's John Willis. Uh, botchagalupe on Twitter, jwillis at redhat.com.