Las Vegas 2023

ACCOUNTING VS. PHYSICS – Coordination Costs and How Organizations Win

Scott Prugh

Transformational Technology Leader,

Transcript

00:00:00

<silence>

00:00:13

It's really great to be here. It's great to see a lot of familiar faces, so thanks for that. It's been 10 years, so it's really a privilege to be back. I'm gonna talk a bit today about something I call accounting versus physics, and then really coordination costs, which are the physics part of this. A lot of the work that we do in software and in organizations we usually measure in very linear ways. Unfortunately, our organizations behave very differently, and that's the physics part. I also did a lot to try to integrate Gene and Steve's recent book and overlay that onto the talk. The good thing is there's just a ton of synergies there, which bring a lot of the discussion forward.

00:01:09

So I thought it was really good to overlay that. I'm gonna start with this because I view a bunch of our jobs as leaders, and at least my job as a technologist, as a CTO, as a leader, as being to bring focus, flow, and joy to our organization so that we can deliver outcomes effectively, not just to the business but to the customers, of course. To do that, we need to apply a couple of things: obviously leadership, but I also view architecture as one of our jobs as leaders. And architecture has many dimensions, as you'll see. It's not just about how we build software; it's also how we design the processes, how we think about enabling the great people that do the work, and how we design the systems to all come together. That's how we can deliver that focus, flow, and joy.

00:02:00

So the problems we start with are really the things that we see, what I call the observed problems, the ones that bubble up to us every day. We do things like try to get the capacity of our teams. We do estimates, and we get to do estimates again. And sometimes we get asked a third time, because whenever they come back they don't appear to be right. Those seem to fail miserably. They just don't work, and everyone's frustrated. Product management yells at engineering and says, you're terrible at estimating, and we need to get more productivity as IT teams. The teams struggle to make progress for a variety of reasons. It may be that builds and environments are really hard to work with.

00:02:48

It may be that their tools are inadequate. It may be that they don't understand the requirements. It may be that everything takes too long. Escalations start to become the norm, so the way folks actually get work done in those environments is they escalate to their leaders and to other organizations, and that's how they try to survive. The people in those environments are waiting or frustrated. Rework occurs often, and it comes back to us as production incidents or defects, things that our customers find and are frustrated with. That comes back to the teams and disrupts their current flow of work. And then, of course, the outcome is that the customers at the end of all this wait and are unhappy. Those are not the results that we want from our software development processes for delivering the valuable outcomes that our companies want from us.

00:03:44

So I'm gonna take us through a journey of understanding this. There's a bunch of theory up front, and then the back end of this is an actual example with real data. It's a real portfolio that we transformed, and the data is what we pulled from doing some value stream analysis. There's a bit of it that is what I would call sped up so that it's a little easier to tell, but it's a factual example. The first piece is really talking about the three layers of organizations and how to think about rewiring organizations for improvement. Then we talk about architecture and the three dimensions there, organizational, process, and system, and how important transformational leadership is, and the thought process around architecture. We'll talk about the physics of coordination costs, counterbalancing that with the idea of using accounting-based mechanisms.

00:04:46

The three Cs of coordination costs, the golden rule of dependencies, the three dimensions of architecture, and then how we apply simplification. And then we'll get into that example. There's a bit of math, and I'll ask some questions in there, and we'll see if folks are paying attention. So the three layers, and this is out of Gene and Steve's great book, and we had a workshop yesterday on it. There are three of these. Layer one, the lowest layer, is really the person that is doing the work with the technical object. In our world, those are really the developers, the architects, the testers. It's the folks that are working with the code and on the code. Layer two, the next layer up, is really the tools that you use.

00:05:36

These are things like your IDE, your version control, things like CI/CD, telemetry, infrastructure as code, work tracking. It's even the great new tools, and you have to get Copilot and AI in here, or otherwise it would not be a 2023 talk. Those are what's considered layer two. And then layer three is the organizational architecture, your system architecture, your processes, the way information flow occurs, and ideas, and even behavioral norms and cultural norms all exist up at this layer three level. That's even how we treat people and what type of culture we create. That is all that social circuitry. You have to be aware of these three levels because you have to affect things at multiple levels to make changes, right? If you only pick one of these levels, as you'll see in a minute, and make a change at layer one, it may not have that desired effect.

00:06:29

It doesn't mean you might not do it, but it just is not gonna have the desired effect on overall improvement. So let's take an example. This is real data from a value stream analysis of feature flow through a system. A feature is something valuable that we wanna deliver to a customer. It comes in and it goes through this process, and we do some analysis on that value stream. We find that it takes about 281 days from when someone asks for a feature for it to flow through the organization and actually get to a customer. If we map that to layer three, that is really that flow of work and ideas throughout the organization. If you look, you'll see the development phase marked out here, and that is only 65 days in a 281-day process.

00:07:16

And that represents some 23% of the total time. So the important thing is to really understand this. One is value stream analysis and how important it is, but also to really understand how that is hard-coded in the organization. Because if you can't affect those things at layer three, your improvements at other layers may not really have the effect that you're hoping for. So if we look at this and we say, of that time, about half is actually spent in an IDE coding, and that's actually generous. There's some analysis that we've done that puts the time developers actually spend coding between 7 and 8%. And Mik Kersten from Planview finds about that much time also. But we'll divide in half to make it easy, and we'll say about 12% of the total lead time for that feature is actually spent coding.

00:08:09

So we've got this developer all the way down at layer one. They're working and, let's say, they have some tools, right? There's this really cool tool, it's called Copilot, and it makes developers really fast. So if we now go and apply those tools, which is probably a good thing to do, what is the best result that we can hope for here? If they become a hundred percent efficient, they'll get those four hours a day back, hopefully, for coding. But the best they can do is to have 65 days of coding, which is still 23% of the total time, and you're never gonna get all of that back anyway, right? So this is also known as: Copilot might help, but it won't save you. So do it, but don't expect to get the returns.
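
The bound described here can be checked with a few lines of arithmetic. The following is a minimal sketch, not something shown in the talk: it uses only the figures quoted above (281 days of lead time, 65 days of development, roughly half of that spent coding), and the function name and the list of speedup factors are hypothetical choices for illustration.

```python
# Figures quoted in the talk's value stream analysis.
TOTAL_LEAD_TIME_DAYS = 281
DEVELOPMENT_PHASE_DAYS = 65
CODING_SHARE_OF_DEVELOPMENT = 0.5  # the talk's generous "about half" simplification


def lead_time_after_speedup(total_days, accelerated_days, speedup):
    """Lead time when only `accelerated_days` of the total become `speedup` times faster.

    Hypothetical helper: everything outside the accelerated slice is left untouched.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    untouched_days = total_days - accelerated_days
    return untouched_days + accelerated_days / speedup


coding_days = DEVELOPMENT_PHASE_DAYS * CODING_SHARE_OF_DEVELOPMENT

# Even an infinitely fast coding tool only removes the coding slice.
for speedup in (1.5, 2.0, 10.0, float("inf")):
    new_total = lead_time_after_speedup(TOTAL_LEAD_TIME_DAYS, coding_days, speedup)
    saved = 1 - new_total / TOTAL_LEAD_TIME_DAYS
    print(f"coding {speedup}x faster: {new_total:.1f} days ({saved:.1%} shorter)")
```

Under these assumptions the ceiling on the improvement is the coding slice itself, which is why the rest of the talk looks at layer three instead of at the tools.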

00:08:52

Developers will be happy and produce some better code. The thing to think about, and Mik had a blog a couple of weeks ago and talked about work at Planview, is really to analyze the data at this higher level. Scott, Ella and I had a good discussion earlier too around what we can do at this level to really analyze how the work flows through the system, get an understanding of that, and start to try to rewire it, because that's a really important thing to tackle. So this is, again, important to understand across these three layers. Now I'll flip to the DevOps picture that really came out of Accelerate, which is about architecture and transformational leadership, and how as leaders our job is to build great teams, great technology, and great organizations, and enable our teams to re-architect the systems and processes.

00:09:46

And I just flip that and think about how, in Gene's book, we're really talking about rewiring organizations. With the technical practices and lean product management this is really, really important, because this rewiring has to coexist with these other technical practices. So let's talk about the physics and what's really at play here. We had The Phoenix Project, which presented wait time equals percent busy over percent idle, which was really about how work queues up in systems and how people respond to being overloaded. The bottom line is that as you overload folks, they respond slower and slower. That's very important. The next is coordination risk, which is a one in two-to-the-n chance of arriving on time with every dependency.

00:10:34

So as you add a dependency, your odds of arriving on time halve every time. And then finally, knowledge loss, which was originally published in the Poppendiecks' book, and Jonathan Smart brought it into his book: every time you have a handoff, you lose what's called tacit knowledge, and every handoff cuts the tacit knowledge in half. These are really the physics that are at play. I counterbalance this with the accounting that's generally used in software development, which is: give me your estimates of how long something is gonna take, divide by the number of people, and now I know how many people I need, right? That's an accounting-based approach to solving the problems. We really need to think about this differently, because these are the formulas that are at play in the organization.
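
The three rules of thumb named here can be written down directly. This is a small sketch of the formulas as the talk states them (wait time as percent busy over percent idle, a one in two-to-the-n chance of arriving on time, and tacit knowledge halving per handoff); the function names are hypothetical and the rules are heuristics, not measured laws.

```python
def wait_time_factor(percent_busy):
    """Wait time rule of thumb: percent busy divided by percent idle."""
    if not 0 <= percent_busy < 100:
        raise ValueError("percent_busy must be in [0, 100)")
    return percent_busy / (100 - percent_busy)


def on_time_probability(dependencies):
    """Coordination risk: each dependency halves the odds of arriving on time."""
    return 1 / 2 ** dependencies


def tacit_knowledge_remaining(handoffs):
    """Knowledge loss: each handoff cuts the remaining tacit knowledge in half."""
    return 0.5 ** handoffs


for busy in (50, 80, 90, 95):
    print(f"{busy}% busy -> wait factor {wait_time_factor(busy):.1f}")

for n in range(5):
    print(f"{n} dependencies -> on-time odds 1 in {2 ** n}, "
          f"{n} handoffs -> {tacit_knowledge_remaining(n):.0%} tacit knowledge left")
```

None of the three grows or shrinks linearly, which is the contrast the talk draws with dividing an estimate by a headcount.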

00:11:19

So the properties that sit underneath this and cause these physics: one is contention, which is conflict over access to a shared resource. You contend for that resource, and now you have conflict. Coupling is the degree of interdependency between two teams, resources, or people in an organization. And then coherence, which is very important, is the quality of forming a unified and logical whole. We can break a system into a hundred parts, but if we can't create coherence from all those parts, we still have a problem, right? Think of that in terms of microservices; that was really where microservices were going. One of the downsides of microservices is that you over-fracture the system and make it really hard to understand. So there's a balance in all that. Now let's talk about the golden rule. You're in Vegas, we wanna double your odds: removing a dependency doubles your odds.

00:12:13

And it's very important. The way we approach it is with this layer three architecture: addressing the organizational architecture, the system architecture, and the process architecture at that level. To put this in contrast with how Gene and Steve look at it, this really sits underneath simplification, and how we look at modularization, linearization, and working incrementally in organizations to win. So now here's the example, and here's where all the math comes in. We set up this situation where we have four teams. We have different tech stacks and different time zones. You see we're all over the place, from UTC minus six to UTC plus five and UTC plus 10. The features get delivered in two-week iterations. We send all UI work to the UI team.

00:13:04

All teams need to deliver a coherent solution, and the features often have dependencies across the domains: config, order, billing, and UI. If you look at how we got here, the reason was that the team went through microservices theater: every team should choose their own language and their own database, and go off and build their own microservice. So we had this fractured domain. We also had this SPA insanity theater, building a really complicated UI with hundreds of thousands of lines of JavaScript and TypeScript using very sophisticated technologies. This is what we ended up with. So now this is where you get ChatGPT involved and you say, hey, is this a modularized, linearized structure where work can be done incrementally? And we're really not sure.

00:13:54

But you'll see in a minute that it's actually challenging to work across this. So we take round one of this picture and we load up features. We've got four features below. We go get some estimates and we balance capacity. We have some options to decide between: we start all teams at once, or we sequence the features. We'll end up with a combo where we start on some parts of the features and then coordinate the final functionality at the end. Total capacity is about 480 points, jelly beans, whatever you use to measure. Each team, and this is where we simplify just to make it easy, has the same capacity.

00:14:37

And then we load them up to 270. So all the data here says we've got plenty of capacity and one iteration should be sufficient. You might have some challenges with linear sequencing. We get to the round one results, and we find that the teams take four iterations to get done. There's lots of overtime, lots of escalations, lots of escaped defects. Accounting said it should take 270, so less than one full iteration, but it really took us seven times that effort to get done. So who wants to guess why? This is coordination. What ChatGPT couldn't see, or what we couldn't represent in the system, and this is the thing that we need to figure out, is how we surface those dependencies. How do we get them into the system and visible?

00:15:30

Those orange lines are those dependencies, the distinct coordination dependencies in the system. There are four of them, which gives us two to the fourth, which is 16. So the odds are one in 16 that you'll actually arrive on time with this type of system and process architecture in place. These codependent delays make everything late, and it's really important to figure that out. So, the round one countermeasures: in the accounting world, what do we do? What are the top three things that we do when we encounter this problem?
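
The round one numbers can be laid side by side as a quick check. This sketch uses only figures from the talk (480 points of capacity per iteration, 270 points estimated, four iterations actually taken, four dependencies). It assumes the actual cost is the full capacity consumed over those four iterations, which matches the 1920 figure quoted later; the variable names are illustrative.

```python
ITERATION_CAPACITY_POINTS = 480  # all four teams combined, per two-week iteration
ESTIMATED_POINTS = 270           # what the estimates ("accounting") said
ACTUAL_ITERATIONS = 4            # what round one actually took
DEPENDENCIES = 4                 # the distinct coordination dependencies

# Accounting view: load divided by capacity.
accounting_iterations = ESTIMATED_POINTS / ITERATION_CAPACITY_POINTS

# Physics view: capacity actually burned, and the odds implied by the dependencies.
actual_cost_points = ACTUAL_ITERATIONS * ITERATION_CAPACITY_POINTS
overrun = actual_cost_points / ESTIMATED_POINTS
on_time_odds = 1 / 2 ** DEPENDENCIES

print(f"accounting forecast: {accounting_iterations:.2f} iterations")
print(f"actual cost: {actual_cost_points} points ({overrun:.1f}x the estimate)")
print(f"odds of arriving on time: 1 in {2 ** DEPENDENCIES} ({on_time_odds:.1%})")
```

The forecast and the outcome differ by roughly the factor of seven mentioned above, and nothing in the capacity arithmetic alone would have predicted it.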

00:16:06

That's one of them. So padding the estimates next time is one of them. The other is that we add people: if only I had more people to get the work done, right? And then we add more work to catch up. Well, they obviously already did that, that's potentially how they got here. They also called the microservices consultants. But this is how you light money on fire. This is an extremely expensive way to manage a process. And look, we've seen a bunch of different companies, 10 different portfolios. We've seen this problem play out very similarly across lots of different teams, technologies, and industries. It unfortunately often plays out this way. So now we get back to this: removing a dependency doubles your odds, and we look at this layer three architecture.

00:17:06

Let's rewire this. This is where we take a little bit of literary license, because this does not happen in minutes. It happens in months and years. We look at this and we say, solve for these coordination costs: removing one dependency or handoff doubles your odds of not being delayed. Can we remove several of them? The first thing we look at is this whole config and UI problem. All of the config for the system went through the config team. Let's just say that was a good design. The problem was that we had to coordinate with that team every time we put new config knobs in the system. So we make the config self-service and add a lot of self-serve tooling. Now teams that need to put new config in the system don't have to coordinate with someone there.

00:17:52

So they become more of a platform team, and we do something similar with the UI. You can see those coordination lines go away, and we see greatly diminished coordination on the UI. We make the UI frameworks a lot easier to use, and we embed UI talent on those teams. So now we actually start to remove a bit of that coordination, and this enables modularity and linearization in the system. So what can we do next? Now we have the harder problem, which is coupling. Coupling, in the software sense, is the degree of interdependency between software modules, and cohesion is the degree to which elements in a module belong together, like they should change together in the system.

00:18:36

Technical coupling is an easier thing to comprehend. It's things like: I've got a service that depends on an API or a shared database. Those are easy to resolve; we have a bunch of patterns for that. The harder stuff is what is called functional or semantic coupling, where I have functionality in one subsystem that is dependent on another. In this case, we had a significant challenge: when billing changed, order often changed, and when order changes, billing must change. So we have this problem across time zones. If you look at that, UTC minus six and UTC plus 10 have a 16-hour time difference, between two teams that have codependent functionality they need to synchronize. That is an extremely difficult thing, because this is usually resolved by people talking to each other.
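
To make the time zone problem concrete, here is a rough sketch that counts the hours in a day when two teams are both at work. The 09:00 to 17:00 local workday is an assumption for illustration, not something stated in the talk, and the function names are hypothetical; only the UTC offsets come from the example.

```python
def working_hours_utc(utc_offset, start_local=9, end_local=17):
    """Return the set of UTC hours (0-23) during which a team is at work.

    Assumes a fixed local workday from start_local to end_local (hypothetical).
    """
    return {(hour - utc_offset) % 24 for hour in range(start_local, end_local)}


def overlap_hours(offset_a, offset_b):
    """Number of hours per day when both teams are working at the same time."""
    return len(working_hours_utc(offset_a) & working_hours_utc(offset_b))


# The offsets named in the example: UTC-6, UTC+5 and UTC+10.
for a, b in [(-6, 10), (-6, 5), (5, 10)]:
    print(f"UTC{a:+d} and UTC{b:+d}: {abs(a - b)}h apart, "
          f"{overlap_hours(a, b)}h of shared working time")
```

With an eight-hour workday, teams 16 hours apart share no working time at all, so every exchange between them costs at least a day of waiting.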

00:19:24

And if those people aren't even awake, how do you do this? Well, we put it into documents and we send it the next day. And yes, there's value in working asynchronously and doing those things, but it's hard. It's very, very hard to actually do this, and it shouldn't be your default choice. One of the things we thought through, and this again took us months to work through, is that we collapsed those modules together: we ported off of the one programming stack into the other and collapsed them into the same software modules. What that did is remove this time zone coordination for us. We hugged the coupling. We used modularity and cohesion together and put it into one time zone and one technology stack.

00:20:15

And that made it a lot easier to coordinate. People are not as good at coordinating as compilers <laugh>. So with stuff in the same module, you have code and maybe access interfaces, and you still have modularity at the code level, but the compiler will actually tell you when dependencies are not resolved. That's a lot easier than having people coordinate across 16 hours. So we look at this, and the teams now take less than one iteration to get done. It takes us about 360 points to actually finish this, where the accounting total is 235. The reason is we have one distinct dependency, so our odds are 50/50 that we'll arrive on time, which is a lot better than with the four dependencies. So if we start to summarize this, there's a bunch of actions that we took.

00:21:06

One was to make the config and UI self-serve, using a platform team pattern and self-service APIs. We implemented modularity and linearization, and we affected the org, the system, and the process. We embedded UI talent in the teams. That was the full-stack team pattern and cross-skilling; again, modularity and linearization. We affected the org and the processes there. We collapsed billing and ordering, which affected all three dimensions: we inverted the coupling by bringing those things together, with development standards and the same programming language. Coupling and cohesion were key. If you look at before, you can see all the coordination: four handoffs, a one in 16 chance of getting home, and the cost was 1920. Afterwards, we have one handoff, which is a one in two chance that there would be no delay, and the cost is 360.
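
The before and after figures can be compared in a small table. The sketch below only restates numbers from the talk (four handoffs at a cost of 1920 points, then one handoff at a cost of 360 points); the `Round` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Round:
    """One state of the portfolio, described by handoffs and total cost (hypothetical type)."""
    name: str
    handoffs: int
    cost_points: int

    @property
    def on_time_odds(self):
        # The talk's rule of thumb: each handoff halves the odds of no delay.
        return 1 / 2 ** self.handoffs


before = Round("before", handoffs=4, cost_points=1920)
after = Round("after", handoffs=1, cost_points=360)

for r in (before, after):
    print(f"{r.name:>6}: {r.handoffs} handoffs, odds {r.on_time_odds:.1%}, cost {r.cost_points}")

cost_improvement = before.cost_points / after.cost_points
odds_improvement = after.on_time_odds / before.on_time_odds
print(f"cost improved {cost_improvement:.1f}x, on-time odds improved {odds_improvement:.0f}x")
```

Removing three of the four handoffs multiplies the on-time odds by eight while the cost falls by a little more than a factor of five.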

00:22:00

So that's about five times better as a result. So the final thing, and I'll talk about the real problem, which goes beyond just doing development. This is a support flow of incidents coming through from a customer, who talks to level one, and level one talks to level two, and then level two might talk to the, and then it might go to the engineering teams, and the engineering teams have to coordinate across all that. If you look at this, there are eight distinct dependencies, so you really have a 1 in 256 chance of arriving on time; basically, there'll be delay here. So each one of these dependencies that you can remove starts to make a significant difference in how you can actually respond to the customer at the end. And we'll see this often, where multiple groups keep getting layered on to sit in front of the support process.
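
The 1 in 256 figure can also be approximated by simulation. This is an illustrative sketch only: it assumes each dependency independently has a 50% chance of being on time, which is the model behind the talk's rule of thumb rather than measured support data, and the function name, trial count, and seed are arbitrary choices.

```python
import random


def simulate_on_time_rate(dependencies, trials=100_000, seed=2023):
    """Estimate how often a flow with `dependencies` handoffs sees no delay.

    Assumption: every handoff is an independent coin flip (50% on time).
    """
    rng = random.Random(seed)
    on_time = 0
    for _ in range(trials):
        # The flow is on time only if every single handoff is on time.
        if all(rng.random() < 0.5 for _ in range(dependencies)):
            on_time += 1
    return on_time / trials


for n in (8, 7, 6, 4, 1):
    simulated = simulate_on_time_rate(n)
    exact = 1 / 2 ** n
    print(f"{n} dependencies: simulated {simulated:.4f}, rule of thumb {exact:.4f}")
```

Each dependency removed from the support flow doubles the share of incidents that get through without delay, which is the golden rule applied outside development.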

00:22:54

And you get this really diminished response time for your customers. The final thing here is these long feedback loops. They thwart any effort to refactor your architecture and system. You get these things in, and changing is just so hard because it takes so long. So the final summary here: architecture plus leadership equals that focus, flow, and joy. You wanna win by rewiring and re-architecting the system. The physics, the coordination costs, really behave exponentially. They don't behave in an accounting way. The three Cs, contention, coupling, and coherence, are really what you're battling. The golden rule: remove that dependency and double your odds. The three dimensions of architecture: the organization, the process, and the system. Then simplification, and then the patterns: platform self-service, full-stack teams, domain and team modularity and cohesion, time zone cohesion, and getting standards in place across the teams. That's it. Any questions? I have like a minute left.

00:24:11

The automation, I mean, nobody objects to a dependency on AWS. Yeah. So it's really about human-in-the-loop dependencies that introduce variation, right?

00:24:21

Yep. And it's really about the coordination costs that sit under the dependency, which aren't actually captured in the simple dependency equation. But I really think of it as self-serve, right? I don't have to meet a person at a point in time to coordinate that dependency. I can choose when to do that myself. And so that's very important.

00:24:44

And one other question: could you clarify a little bit your usage of the word linearizability here? <laugh> I had a brush with transaction theory 20 years ago, so,

00:24:52

Well, so that is a new term from Gene's book, and I'm kind of new to it too. But basically it means that multiple work paths can be executed at the same time and you can join the results together at the end. You need modularity to be able to do that, because if you think about software, I need modular interfaces, things that can be tested independently, and I can bring those together later on. So now I've decoupled the dependency in the work stream so that those two things can work linearly. But it is a new one for me. <laugh> Yes,

00:25:29

Sir. Um, the question is regarding the dependencies which are outside your sphere of influence, especially external parties. Yeah, yeah. I mean, resolving dependencies, contentions, et cetera within your sphere is probably relatively easier. I can exert my influence.

00:25:42

Yeah, you bet. What

00:25:43

About boundaries?

00:25:44

Yeah. Well, that's one of the hardest things. Damon Edwards, love tickets, put in more tickets <laugh>. Unfortunately, that ends up being the operational norm for most organizations: once you get out of your boundary of control, everything goes to ticket coordination, and it becomes extremely difficult to coordinate. It's very, very hard. You need to be aware of it, and hopefully you have some sort of influence to start to affect that. Very difficult. Shared outcomes are one of them. Yeah. Feedback loops, and even just making the data visible. Oftentimes folks don't even make the data visible. It's like, well, we put a ticket in, right? Just wait. But how long does it take? Well, 12 days. And, like, I can't do anything for 12 days. And oftentimes folks don't even know that that's what's sitting in the system. So well, so a lot of

00:26:42

This is the result of previous accounting-based decisions, right? <laugh> I put this team in this location because they're cheaper. Yeah. And I'm gonna use this third party because they got me a better deal. Yeah. It would seem that one thing we could all do is try to bring forward, in the shitty estimation process, the accurate calculation of the costs of these previously unit-cost-focused things. Because the real answer is to undo all of those crappy decisions and have co-located teams that own more rather than less. But most organizations get themselves onto a path where that's impossible, and they don't know the cost until the cost has been incurred. So I love the way you framed this up. I just think trying to find a way to do that as an upfront diagnostic rather than an after-the-fact explanation might actually serve us all really well.

00:27:39

Yeah. I mean, it decays over time, unfortunately, right? At the beginning, people didn't design it this way, and over time it progressed that way and just continues to decay. It's almost like the boiling frog problem: it gets more and more painful over time. So, the gentleman over there.

00:28:00

Yeah. So, uh, when you take these plans to different corporations, how do they respond? Because it, it

00:28:07

Feels like

00:28:07

Someone is going to have to either give up or share power to really make it go Yeah. Political issues.

00:28:14

Well, I think you highlight the challenge, right? For this type of analysis, that's layer three. One is, are you willing and able to do the analysis, and even look at how work flows? I was talking with some folks this morning about that. And then once you get to that and make that data visible, is your leadership even willing to take action on it? Or are they just willing to live in this state where all these expensive developers are only spending more or less 7% of their time actually writing code? So it's a challenge, right? You have to have support at that layer three level to actually start to address these things. Anything else?

00:28:59

I got one more. Um, what about with PI planning and the cost of coordinating

00:29:04

<laugh> Well, I mean, there's a lot there, and we can get into Bing and all kinds of other stuff. But to be honest, it really comes down to more of a question at the leadership level: do you understand that these are the types of things that can happen? It doesn't mean that PI planning is bad, but there are ways to incrementally plan and work, and to restructure the architecture and the work processes, so that you don't do it as a massive batch that produces awful outcomes. I'm not sure I answered your question per se, but there's a way to have a one-hour PI planning, and I've seen it done. But you didn't get there overnight, and you didn't get there without continually affecting a lot of other things in the organization. All right, thank you.