What is Architecture, and Why It Matters

We studied organizations that had the best project due date performance in Development, the best stability and reliability in Operations, and they also had the best posture of security and compliance.

We wanted to understand how these organizations made their “good to great” transformation, so that other organizations could replicate their outcomes.


There have been many surprises on this 20-year journey. But by far, the biggest surprise was how it brought me into the middle of the DevOps movement. The last time any industry was transformed the way ours is being disrupted today was probably manufacturing in the 1960s, when it was transformed through Lean and the Toyota Production System.

Gene Kim

Founder and Author, IT Revolution

Dr. Steve Spear

Author, The High Velocity Edge: How Market Leaders Leverage Operational Excellence to Beat the Competition

Transcript

00:00:13

Hello. I am so delighted that you're here, and I hope that you're having a fantastic conference. For the next 30 minutes or so, my mentor, Dr. Steven Spear, and I will be talking about architecture. I think this will be a broader interpretation of architecture than you might have heard before, and we'll be talking about why structure and architecture so massively impact the dynamics and performance of a system. I'm hoping that this will elevate and further illuminate the impact of structure and architecture in your own work. As you've likely heard, Steve and I have been working on this book for nearly two years, and we're hoping that it will come out next year. Wait, it will come out next year. And so we do talks like this to further clarify our own thinking. So let me first introduce Steve. Without a doubt, one of the most impactful learning moments for me was taking a workshop at MIT in 2014 taught by Steven Spear, which is why I went to the class.

00:01:10

And I cannot tell you how much he's influenced my own thinking. He is famous for many things, but he's probably most famous for writing one of the most downloaded Harvard Business Review papers of all time, published in 1999 and called "Decoding the DNA of the Toyota Production System." This was based in part on the doctoral dissertation that he did at Harvard Business School. And in support of that, <laugh>, he worked on the manufacturing plant floor of a tier-one Toyota supplier for six months. Since then he's extended his work beyond just high-repetition manufacturing work to engine design at Pratt & Whitney, to the building of a safety culture at Alcoa, and to how we can make truly safe healthcare systems for everyone. And for the last decade, he's been part of a US Navy initiative to create high-velocity learning across all aspects of that enterprise.

00:02:00

And so for the last two years, we've been talking two to three times a week or more, trying to see if we can codify what we've both observed in our careers about that amazing and magical dynamic that can fully unleash human creativity and problem solving in almost every domain. Earlier this year, we presented on aspects of fast and slow integrated problem solving and the four characteristics of great structures. This time we'll be presenting on more aspects of great architectures and structures. So Steve, I'm so delighted that you're here to teach us about architecture.

00:02:37

Yeah. Gene, thanks very much. And as far as the book coming out, remember that's early next year, not just anytime next year <laugh>. On the topic of architecture, let me just start with a reference. Back in January, I attended a symposium with my wife, Miriam, who's actually an architect, and someone opened up with a quote from Winston Churchill, who said: first we design our buildings, and then our buildings design us. What he meant by that is that in the moment of doing the drawing, doing the building, doing the construction, we think we have control over the building. And by extension, since this was a symposium about urban design, we think we have control over the layout of the streets and the placement of the buildings along those streets. But once all that stuff is in place, it determines how we behave.

00:03:25

So we behave on them, and then once we're done behaving, they behave on us. I thought that was so telling, because you and I have spent so much time talking about the architecture of technical products, and the way in which we architect them then determines back how we behave around them, towards them, and with each other towards them. The same is true of how we architect our organizations, in terms of the flows of information and the possibilities and pathways for collaboration. We design that social circuitry, but once it's designed, it designs us back in terms of how we act and how we behave. So anyway, this metaphor of architecture is actually, I think, more literal than metaphorical, and I hope to elaborate on it a little bit right now. Again, in terms of background, you made reference to some of the work I've done, but really I've now had 25 or 30 years of trying to explain anomalous outcomes, and those anomalous outcomes take the form of 2x, 200x, 2,000x.

00:04:27

What I mean by that is that back in the seventies, and certainly by the 1980s, people were observing that when you looked at, let's say, the automobile industry, there was Toyota, which had productivity that was double the world standard, and levels of quality that were somewhere between hundreds and thousands of times better than anybody else's. A few years after the first work came out about Toyota and its manufacturing systems, a group of people at the University of Michigan studied its design systems and found the same ratios: on any given day Toyota was producing twice the number of new models in half the time, with much higher manufacturable quality and much higher product quality down the road. And as people started looking across these different environments, what you started to see was that no matter where you looked in industry, you were seeing these ratios of double the productivity and hundreds or thousands of times better quality.

00:05:26

And in terms of workplace safety, there were also huge, huge differences, which we documented about Alcoa; I wrote a case about that in my book, The High Velocity Edge. But anyway, what we kept finding is that no matter where we looked (you mentioned healthcare, social services), you could literally double the output of an organization and dramatically increase its quality. With that doubling of output, you were reducing cost, increasing affordability, increasing accessibility, and so on and so forth. And it turned out that no matter where you looked (planes, trains, automobiles, tech, biotech, pharma, healthcare, education, social services, military), in every vertical, every sector, and across every phase of value creation, from way-upstream discovery all the way through development, design, production, delivery, after-sales service, and that sort of thing, you were finding these crazy ratios of 2x, 200x, 2,000x. It was every place, about everything.

00:06:26

And it sort of begs the question: where does that come from? Something we've explored previously, but that is always worth repeating, is that most everyone, when they start a venture, starts at a very, very low level of competency and capability. Take Toyota, because that was the first intersection, automobiles and manufacturing. To put this in some perspective, in 1958, when Toyota started their venture and their adventure of coming into the US market, they were arguably the worst automaker in the world. Their productivity was one-eighth the world standard, their product quality was horrendous, and there was no product variety. Within 20 years they had transitioned from worst to first, with the highest quality, the highest productivity, et cetera. So when we look at this as a journey from positions of very low capability and very low competence to very high capability and very high competence, what we have to say is that what we're really trying to do is manage in such a way as to have very high-speed, or high-velocity, learning dynamics going on.

00:07:38

It turns out that anywhere you look, across all those verticals and all those phases, what we find is that some are much, much better at creating the conditions in which people can give fuller expression to their individual potential to be creative, and have that individual potential and expression harmonized and integrated towards common purpose. And again, we see it everywhere. So it does beg the question, and this ties us back to our issue about architecture. We design things, and then the things we design influence back how we behave. In this particular case we're concerned about how they influence our behavior and our potential to be creative, individually and collectively, towards common purpose. So what are the two primary things we design? Well, one is the processes by which your work and my work and Ann's work and Erin's work get harmonized and integrated towards common purpose.

00:08:41

Those are the processes, these enterprise processes. When we start thinking about that, we want to think in terms of structure that allows better expression, and we want to think about dynamics that allow better expression of our individual and our integrated creativity. We've talked in the past about the importance of having processes that are simplified, so they're less confusing, less distracting, and less of a pull on our attention, with more standards: temporary standards, but standards nevertheless, as a capture of our best understanding of how to succeed. So again, there's less initial confusion about how to get started. That is coupled with the dynamics of stabilization, to make sure that if there is difficulty, if there is confusion, if there is aggravation, it's contained in time, so we're less distracted and less aggravated, and for a shorter period. And it's contained not only in time, but in space.

00:09:39

So it doesn't spill over, with my aggravation, my disruption, my confusion spreading to Aaron and Anne and to you. And so the architecture of our enterprise processes matters a ton in terms of our ability to bring our intellect to bear on the technical problems in front of us. The next thing, of course, is those technical products, and the question of how we can go about designing them in such a way that it's easier to bring our intellectual horsepower to bear on their design, their improvement, their maintenance, their use, their development, and their operations. So anyway, taking this a little bit further in terms of particular tactics, we can start thinking about a contrast between systems that are fragile and don't lend themselves to easy development or easy operation, and systems that are more resilient, more agile, and just overall higher performing.

00:10:37

So let's now flip this back from the enterprise process to the object. What are the qualities of an object that are really unattractive in terms of allowing us to give fuller expression to our intellectual potential in its design and its operation? Well, first, let's just make it an integrated, black-box technology. Boom: everything connected to everything else in all sorts of convoluted ways. The immediate consequence of that, the inevitable consequence, is that no matter what change we want to make, no matter how small, we have to coordinate that change with every other part of the system and with everyone else responsible for every other part of that system. And what's the impact? It means we can't be flexible. We're fragile, because if there's a disruption locally, it becomes a disruption systemically. We lose agility, we lose resilience, and eventually we lose relevancy, because the object we've engineered can't be changed to keep up with changing circumstances.

00:11:44

Now, Gene, that issue of highly integrated, intertwined design carries over from the technical object in front of us to the organization doing the design and doing the operations. We'll come back to that. But anyway, what are the qualities of a nicely architected object? It's one that is modular, and not only modular, but nested. Why is that? Because in a modular design we can make a local change without having to coordinate it with everything else, and we can make another local change without having to coordinate it with everything else. And if the modules really are nicely modular, we can actually change the overall architecture, the layout, and the configuration without changing the pieces. So what's the impact? When we design the technical object in front of us to be modular and nested, it gives us a huge opportunity to be agile, to be flexible, to be resilient, to be responsive, and otherwise to maintain the relevance of that object, that technical system. And as I was prefacing before, the same logic carries over to the design, the architecting, of the enterprise processes in front of us.
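The contrast described above, between a design where everything is connected to everything else and a modular one where a local change stays local, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the component names (`catalog`, `cart`, `payments`, `reviews`) and the `coordination_set` helper are hypothetical and do not come from the talk.

```python
from collections import deque


def coordination_set(dependents, changed):
    """Return every component that may be affected when `changed` changes.

    `dependents` maps a component to the components that reach into its
    internals. The walk is a plain breadth-first search over that graph.
    """
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for nxt in dependents.get(node, ()):
            if nxt != changed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical component names, chosen only for illustration.
parts = ["catalog", "cart", "payments", "reviews"]

# "Black box" design: every part reaches into every other part.
tangled = {p: [q for q in parts if q != p] for p in parts}

# Modular design: nobody depends on anyone else's internals, so a change
# to the internals of one module requires coordinating with no one.
modular = {p: [] for p in parts}

tangled_impact = coordination_set(tangled, "reviews")
modular_impact = coordination_set(modular, "reviews")

print(sorted(tangled_impact))  # ['cart', 'catalog', 'payments']
print(sorted(modular_impact))  # []
```

In the tangled case, a change to one part means coordinating with the owners of every other part. In the modular case the set is empty, which is the property the talk associates with agility and resilience.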

00:13:02

And we have a similar set of choices. Just as with the technical object, the technical system, where we choose between highly black-box, intertwined, and convoluted versus modular and nested, we can do the same thing, and we do the same thing, with our enterprise processes. So in some cases, you're in an organization where no matter what you want to do, you have to coordinate your changes, your actions, and your experiments with everybody else. What does that mean? You have to get everyone aligned, in agreement, and synchronized all at once. And the impact is that you have an organization, and organizational processes, in which people are persistently either confused or flummoxed. They can't change, they can't adapt, they can't be agile, they can't be resilient. And again, the organization as a consequence loses its societal relevancy, because it can't be high performing and productive.

00:14:03

Now, what's the alternative to that? The alternative is to design enterprise processes with the same mindset of creating things that are more nested, more modular, and consequently more aligned: simpler, more standardized, more stabilized, more synchronized, again to have effective enterprise processes with all these beautiful qualities of modularity and nesting. The consequence is that when we have workflows that are simplified, so they're easier to parse, and standardized, so they're easier to parse, and stabilized, so local issues don't become systemic issues and systemic issues don't necessarily become local issues, then we have an enormous amount of opportunity for people to bring their intellect and their creativity, locally focused, without being distracted by concerns about how they're going to be disrupted by, or disruptive to, the larger enterprise. And the impact is that we are creating organizations, enterprises, that can have all those beautiful qualities of agility, resilience, responsiveness, flexibility, on and on.

00:15:19

So not only can they establish a very high level of social relevancy, they can continue to maintain it regardless of how the environment around them is changing. So anyway, Gene, back to where we started: why is architecture important? Because of how we design things and how we design ourselves. We have some initial control, but once that's locked in, the architecture, the configuration of the object, what's connected to what and in what way, determines how we connect on the object. And for the system, the enterprise, and the processes in which we're embedded, who's connected to whom, in what way and in what form, determines how we connect. Of course, what we want to do is act in ways that have all these beautiful qualities that will allow us to be more creative and more productive, individually and collectively. And the way to do that is with architectures that are more modular, more nested, and more apt and more receptive to our creative intent. Over to you.

00:16:28

So what excites me so much is that Steve has shown us this problem and the direction of a solution, and I think the language of software does so much to help illuminate it and give concrete examples of it. Put another way, some systems constrain or even entirely extinguish the creativity and the full problem-solving potential of everyone within the system, versus those Steve mentioned that fully unleash the creativity and problem solving of everyone in them. I think you'll recognize one very, very famous example in DevOps history, and that is the Amazon API example. Dr. Werner Vogels described in an ACM Queue article in 2004 how amazon.com started off 10 years ago as a monolithic application running on a web server, talking to a database on the back end, and it was called Obidos. You might even remember URLs having "obidos" in them, that far back.

00:17:21

That application held all the business logic, display logic, and functionality that allowed for recommendations, Listmania, reviews, et cetera. And he said there are all these characteristics that you want in a good software environment that, not suddenly but over time, could not be achieved anymore. The pieces could not evolve independently, which increased the need to coordinate, communicate, schedule, and prioritize together, because there was no isolation and, as a result, no ownership. In other words, one small piece of the system could cause global chaos and disruption. This is what led to the famous $1 billion API re-architecture of the entire Amazon e-commerce system. And there is this very famous memo by Steve Yegge, who characterized how this took place. He recounted that Jeff Bezos sent out a memo that said all teams will henceforth expose their data and functionality through service interfaces.

00:18:20

In other words, APIs, and teams can communicate only through those interfaces. No other form of interprocess communication is allowed. It doesn't matter what technology you use, said Bezos: HTTP, CORBA, pub/sub, Bezos doesn't care. Those service interfaces, without exception, must be designed from the ground up to be externalizable. And anyone who doesn't do this will be fired. And then number seven, he writes, is: thank you, and have a good day. Steve Yegge wrote that number seven is obviously a joke, because obviously Bezos doesn't care whether you have a good day or not. And who enforces this? There's a famous story that Amazon CIO Rick Dalzell, a former US Army Ranger, was made responsible for creating these hard partitions between teams. Visually depicted, the before state of the Amazon e-commerce systems looks like this.
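As an illustration of the service-interface rule recounted above (and not of the slide being referenced), here is a minimal Python sketch. Everything in it is an assumption made for the example: the `ReviewService` interface, the `InMemoryReviewService` implementation, and the `product_page` consumer are hypothetical names, not Amazon's actual services or APIs.

```python
from typing import Protocol


class ReviewService(Protocol):
    """Hypothetical published interface owned by a 'reviews' team."""

    def average_rating(self, product_id: str) -> float: ...


class InMemoryReviewService:
    """One possible implementation; its storage is private to the team."""

    def __init__(self) -> None:
        # Other teams never read this dictionary directly.
        self._ratings = {"book-1": [5, 4, 3]}

    def average_rating(self, product_id: str) -> float:
        ratings = self._ratings.get(product_id, [])
        return sum(ratings) / len(ratings) if ratings else 0.0


def product_page(product_id: str, reviews: ReviewService) -> str:
    """A consumer on another team: it only calls the published interface."""
    return f"{product_id}: {reviews.average_rating(product_id):.1f} stars"


page = product_page("book-1", InMemoryReviewService())
print(page)  # book-1: 4.0 stars
```

Because `product_page` depends only on the interface, the reviews team could swap the in-memory store for a database or a remote service without the consuming team having to change or even be told.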

00:19:12

In other words, it was very difficult to get anything done without having to touch other pieces of the system. And given the nature of technical debt, the way that architectures can disappear as the lines between modules are blurred, this eventually turns into this <laugh>, right? Suddenly no piece can act independently. Every time you want to make a change, or change a cable, you potentially have to touch every other cable, and if you make a mistake, everything goes down. The after state of the API re-architecture at Amazon probably looks more like this: suddenly you regain independence. You can make changes independently, without the risk of having a global, disruptive impact. So now teams can develop, test, and deploy value to customers independently of each other. And this is what caused orders-of-magnitude improvements in productivity.

00:20:05

And by the way, architecture is elusive. This might look like it has all the great characteristics of the previous picture, but what you don't see is what's at the bottom: an incredibly convoluted pile of spaghetti, where it is impossible to pull or manipulate one cable without potentially touching everything else. So what was the result of that amazing investment that Amazon made? In 1999, let's say, they were doing a thousand deployments per year, but by 2001 they had ground down to a state where they could only do tens of deployments a year, because the risk of deployments was so catastrophic. By 2011, Jon Jenkins shocked the world by describing how they were doing 15,000 deployments per day. And by 2015, Ken Exner, director of dev productivity at Amazon, said they were doing 136,000 deployments a day.
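To make the scale of those figures concrete, the arithmetic can be worked through as follows. This is a rough sketch only: the 1999 number is the speaker's own "let's say" estimate, and the 365-day year is an assumption made here to put the per-year and per-day figures on the same footing.

```python
import math

DAYS_PER_YEAR = 365  # assumption: deployments happen every calendar day

deploys_per_year_1999 = 1_000    # the speaker's "let's say" figure
deploys_per_day_2011 = 15_000    # figure quoted for 2011
deploys_per_day_2015 = 136_000   # figure quoted for 2015

# Put 2015 on a per-year basis so it can be compared with 1999.
deploys_per_year_2015 = deploys_per_day_2015 * DAYS_PER_YEAR
improvement = deploys_per_year_2015 / deploys_per_year_1999
orders_of_magnitude = math.floor(math.log10(improvement))

print(deploys_per_year_2015)  # 49640000
print(improvement)            # 49640.0
print(orders_of_magnitude)    # 4
```

Under these assumptions the 2015 rate is roughly 50,000 times the 1999 estimate, which is between four and five orders of magnitude.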

00:20:57

So I think this shows how investing in architecture can truly unleash productivity and fully enable the creative problem-solving potential of tens of thousands of engineers. This is actually what we found in the State of DevOps research, and it was one of the biggest aha moments in that study for me: architecture is one of the top predictors of performance, as measured by questions like these. To what extent can we make large-scale changes to our part of the system without permission from anyone outside of our team? To what extent can we do our work without a lot of fine-grained communication and coordination with people outside of the team? That's exactly what Steve talked about. And by the way, just having to communicate and coordinate is the good case. In the bad case, we have to schedule together, prioritize together, and deconflict, and this is what causes those massive escalations up and down the org chart.

00:21:46

And if those things are present, then we can deploy and release our service on demand, independent of the services we depend upon. To what extent can we do our testing on demand, without the use of a scarce integrated test environment? Because everything is modular, through information hiding, we can localize effects. And if all those things are true, we should be able to do deployments during normal business hours with negligible downtime. It's only through that that we can have these incredible characteristics of doing tens, hundreds, or even hundreds of thousands of deployments a day, where teams can work independently, deploying value to their customers without having to be coupled to the rest of the enterprise. What I think makes this so remarkable is that in software, we already have a language for it. This is Conway's law.
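The point above about testing on demand, without a scarce integrated test environment, can be sketched as follows. This is a minimal illustration under stated assumptions: `StubInventoryService`, `can_fulfil`, and the SKU names are hypothetical and exist only to show how information hiding lets a team test its own logic against a stand-in for another team's service.

```python
class StubInventoryService:
    """Hypothetical stand-in for another team's inventory service.

    It honors the same small interface (`units_available`) as the real
    service would, so no shared test environment is needed.
    """

    def __init__(self, stock):
        self._stock = dict(stock)

    def units_available(self, sku):
        return self._stock.get(sku, 0)


def can_fulfil(order, inventory):
    """Return True if every line of the order can be filled from stock."""
    return all(
        inventory.units_available(sku) >= quantity
        for sku, quantity in order.items()
    )


def run_local_tests():
    """Exercise our own logic against the stub, on demand, on any machine."""
    inventory = StubInventoryService({"book-1": 3, "pen-7": 0})
    return {
        "in_stock": can_fulfil({"book-1": 2}, inventory),
        "out_of_stock": can_fulfil({"pen-7": 1}, inventory),
        "unknown_sku": can_fulfil({"lamp-9": 1}, inventory),
    }


results = run_local_tests()
print(results)
```

The design choice being illustrated is that `can_fulfil` knows only the interface of the inventory service, so the team that owns it can run these checks whenever they like, without scheduling time on an environment shared with everyone else.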

00:22:33

There is an isomorphic link between the communication paths of an organization and the software architecture they work within. This is based on Dr. Melvin Conway's famous 1968 experiment, where he had a compiler built in two ways: the group that was organized into three teams built a three-pass compiler, and the group that was organized into five teams built a five-pass compiler. In other words, there is an inextricable link between how the team is organized and the software they create; the one constrains the other. What I find so wonderful is that in the military community, they have words for this already. They would say this is really uniting two concepts: unity of command and/or unity of effort, ideally both. And that is made most effective.

00:23:23

When you have decentralized execution, you have an architecture that allows people at the edges to work independently of each other. And so this is really the story of Team of Teams, where they pushed decision making out to the edges and enabled it there, which allowed the time from sighting to capture to go from never to 45 minutes, where, say, a 22-year-old drone operator could set in motion a sequence of events that led to the capture of an enemy terrorist leader. So that raises a question: if these structures are so important in predicting organizational outcomes and performance, where do they come from? For this, we can go to what Dr. Westrum taught us last year, which is that this is what leaders are accountable and responsible for. And I think the most beautiful phrasing of this is Jack Rabinow's rule number 23 of leadership, that Dr.

00:24:12

Westrum told us about. He said simply this: if you have a dope at the top, you will have, or soon will have, dopes all the way down <laugh>. Right? I think that has such explanatory power, because it simultaneously explains the best working experiences I've had, where it was fully reinforced by who was at the top, and the worst working experiences I've had, again fully enabled by who was at the top. This is where he introduced the term sociotechnical maestro, or actually the technical maestro, but Steve and I are broadening it to say that this is really the sociotechnical system that the maestro creates. The five characteristics of the maestro are that they have high energy and high standards; they're great in the large, in other words, they can see the entire system; they're also great in the small, so that they can ask good questions and know when they're being lied to; and they love walking the floor.

00:25:07

And I think so many leaders in the DevOps community absolutely have these five characteristics. So really, structure is a phenomenal predictor of performance. What I find so exciting about the work that Steve and I are doing is that if we open up the aperture and look back at these pairwise comparisons over the last 150 years, where you have great organizations soundly defeating their non-great counterparts, we will see these characteristics. And what's even more interesting is that you have pairwise comparisons of before and after. For example, take the Fremont manufacturing plant that was run by General Motors. That's the before case. The after case is the same plant with the same people, but run under the Toyota Production System in the joint venture, NUMMI. They went from worst to first.

00:25:56

So again, there's more evidence that we can isolate the variables that cause great performance, and that is literally the management system, the structure, and the architecture. So I'm hoping that you have learned about, one, a broader interpretation of architecture than you might have heard before; two, why we believe so strongly that structure and architecture so massively impact the dynamics and performance of a system; and that this elevates or further illuminates the impact of structure and architecture in your own work. So stay tuned. We will have, hopefully, in fact, Steve, I should say <laugh>, we will have a draft of the book available for DevOps Enterprise live later this year, in October. <laugh> Yes, exactly. We are fully committed, and it'll be out, again, sometime next year. So with that, Steve, I want to thank you so much for this incredible opportunity to work with you on this problem and to do things just like this.

00:26:59

Yeah. Gene, this has been fantastic. Thank you very much. I think we're really getting towards giving people meaningful, useful, practical answers to a persistent problem of how to work together successfully.

00:27:12

Thank you.