Europe Virtual 2024

Decoupling from Foundational LLMs

A dialogue between Dr. Mik Kersten and Gene Kim.

MK

Dr. Mik Kersten

CTO, Planview

GK

Gene Kim

Founder, Author, IT Revolution

Transcript

00:00:00

So to set up this next talk, let me talk about who it is and then what we're going to talk about. Who is Dr. Kersten? He wrote the awesome book Project to Product, now six years ago, which has been used by so many organizations to change how they think about their software efforts, and he's currently CTO of Planview. Over the years he has taught me so much about architecture, which so much dictates how organizations are wired, which is what the book I worked on with Dr. Steven Spear is all about. As so often happens when I talk to Mik, he said something that totally shocked me. And this is all about how we may inadvertently be coupling ourselves to LLMs and, more importantly, what we can do about it. So this will seem like the oddest way to set up a question, but let me set it up by telling a story, Mik.

00:00:51

And I don't have your video yet, so here's the story. I was telling him how, over the last couple of years, I've become very paranoid about shared services. I was part of the team that created the Forum paper called The Checkbox Project, which I think is an incredible description of how small things require superhero efforts. One of the people on that team is an incredible technology leader. She said: we have an SAP role security team; they have an SLA of turning around changes in two weeks, but their average is around seven to eight weeks, and they have an NPS score of negative 87. Which I thought was pretty funny, because before I heard that story, I thought NPS scores went from zero to a hundred, not negative one hundred to one hundred <laugh>.

00:01:36

And so her countermeasure was, she said: we love shared services, but not there. So what did they do about it? They took the group, broke it up, and embedded the SAP role security engineers into the business units so that they could do the prioritization themselves. And the problem just disappeared. So I shared that story with Mik, and we were laughing about it, and then you shared a story about how you're going in the opposite direction: you're centralizing certain components of your AI team. Because the way you did it, the situation you found yourselves in is that it took months to switch from GPT-4 to Claude 3. My reaction was: what? How can that be? The API shapes are almost identical; the switching costs should be minimal. So can you introduce yourself, tell us the story, verify that the switching cost wasn't small, how that happened, and what you're doing about it?

00:02:31

Hello, Gene, great to be here. Thanks for having me. And yes, this has been a fascinating journey. I think the entire DevOps and technology leadership community has, over the years, become better and better at understanding the importance of architecture, right? And especially over the last few years, we've seen more and more literature and more contributions on how architecture and team structures interact. The interesting thing for me is, I thought I knew how to deal with these things. I thought I knew, for example, that you-build-it-you-run-it team structures are more cost effective and more effective for driving team autonomy in the cloud than having things completely functionally distributed, right? Like you said, I avoided these shared-services things to help support that autonomy, to make sure value streams, nested value streams, had as much autonomy as possible.

00:03:24

And then a year and a half ago, I found myself with something we had created with our AI and data science team at the time: a very successful demo of a copilot leveraging GPT-3.5 back then. So it was time to scale this. And as we scaled, of course, more and more people came onto this new product, and we realized this product was coupled in a more interesting way, right? Because these LLMs themselves are highly capable, they're this kind of external shared service that just tends to work. And like you said, the API shape looks similar. Back then, as we were also getting our hands on Claude 2 from Anthropic, it looked like it was all kind of the same.

00:04:06

But we realized we had an interesting use case, because our use case was really more around quantitative and structured data, less around, say, images or natural language data, just because of the problem domain that this Planview Copilot existed in. So we were building up the teams, and I realized that some odd things were happening. And keep in mind, by the way, I had the privilege of reading Wiring the Winning Organization fairly early on, around the same time that we were all learning about GPT-3.5. You say I helped inspire some of the concepts, but that book really crystallized the core concepts I needed to actually understand the organizational structure and how to wire it.

00:04:54

It made me hypersensitive to the signals that were being amplified or suppressed, as well as, of course, the thing I'm used to being very sensitive to, which is the coupling and cohesion between these teams. And what was happening is that the prompt engineering parts, one of these really important new skills and super powerful new things we can do with large language models, were getting sucked into the various products, because this Planview Copilot needs to work in the context of a dozen different products that our company offers. And so the product teams, say the team working on the roadmapping tool, were starting to write their own prompts. And I realized something odd was happening, because what we of course want to do is make sure we've got as much modularization as possible, to preserve option value for the changes we want to make in the future.

00:05:47

And our principal data scientist let me know fairly early on, as he was experimenting more: GPT-4 was on the market, Claude 3 was coming, and as soon as we got our hands on it, we started experimenting with it. And lo and behold, the chain-of-thought process, how you get these LLMs to reason over data more iteratively, which is really what we rely on, turned out to be very different. The way that you feed the prompt and the data to Claude 3 was very different than GPT-4. And yet what we had done, in avoiding centralization, was basically spread out the knowledge of creating these agents and prompts into the various product teams. So now fast forward, say, six months: if we actually decided to change LLMs, or something new came onto the market and someone else brought one out at scale, it would be extremely difficult, because every one of those teams would be wired in, with their prompts, to this LLM. So by decentralizing, we would have coupled across the organization, across the teams, into GPT-4. And I realized something was wrong.
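To make the coupling concrete, here is a minimal sketch in Python of the seam that restores optionality: product teams depend on one provider interface, and the provider-specific prompt shapes live behind it. The class names, prompt formats, and stubbed vendor calls are all hypothetical illustrations, not Planview's actual code.

```python
# A minimal sketch of a provider seam; vendor SDK calls are stubbed out.
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """The single seam product teams depend on, instead of any one vendor."""

    @abstractmethod
    def complete(self, task: str, data: str) -> str:
        """Render a provider-specific prompt and return the completion."""


class GPT4Provider(LLMProvider):
    def complete(self, task: str, data: str) -> str:
        # Illustrative GPT-4-style chain-of-thought prompt: instructions
        # first, data inline, an explicit step-by-step cue.
        prompt = f"{task}\n\nData:\n{data}\n\nThink step by step."
        return self._call_vendor(prompt)

    def _call_vendor(self, prompt: str) -> str:
        return "<gpt-4 completion>"  # stand-in for the real SDK call


class Claude3Provider(LLMProvider):
    def complete(self, task: str, data: str) -> str:
        # Illustrative Claude-style prompt: data wrapped in tags before the
        # task, reflecting that the two vendors reason over inputs differently.
        prompt = f"<data>\n{data}\n</data>\n\n{task}"
        return self._call_vendor(prompt)

    def _call_vendor(self, prompt: str) -> str:
        return "<claude-3 completion>"  # stand-in for the real SDK call


# Product teams hold only the interface; switching vendors is one line.
provider: LLMProvider = Claude3Provider()
print(provider.complete("Summarize the roadmap risks.", "item A, item B"))
```

Without a seam like this, each product team bakes one vendor's prompt shape into its own code, which is exactly the cross-team coupling described above.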

00:06:55

And by the way, that is just fascinating, right? I mean, you've spent decades studying software architectures and what it takes to decouple, and to see the same phenomenon showing up in how we communicate with and use LLMs is a little bit shocking to me. So can you just confirm that this is definitely a form of coupling that prevents change, or at least makes it more difficult, that it increases the switching cost, the cost of change? Is that a correct interpretation?

00:07:22

That's exactly it. And there are a couple of really shocking things to me. First of all, there's a very interesting thing with the gen AI work that's being done, which is that demos are very easy, because you don't have to care as much about architecture. I think everyone across the industry has made super cool demos that really leverage the power of these LLMs. But to make something really unique and interesting and powerful for your customers, for your users, you actually need to do some fairly serious work with the data, because what really makes these things more interesting is the data and how you're curating and feeding that data to the LLM. Otherwise you're just getting the out-of-the-box functionality. And to do that, you really need to have the right architecture, and then of course agents that can act on that data and do interesting things in the context of your products and your applications.

00:08:13

And so in the demos, no one really has to care much about the architecture. But what happens very quickly, even more quickly in my experience: rewind back to our first experiences with cloud. We all became fairly sensitive, fairly quickly, to the cost of cloud, right? Understanding that if we lifted and shifted, and there's been amazing work and research in this community around that, things actually got more expensive in the cloud, not less. Well, after your very first gen AI demo, when you've built something on top of the APIs provided by GPT or by Claude, it becomes very clear very quickly that this thing is going to get extremely expensive if you don't actually think about how you're going to make those LLM calls. So all of a sudden, from the architect's perspective, the cost profile becomes really significant.
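A back-of-the-envelope model shows why the cost profile surfaces so quickly. The sketch below is hypothetical: the per-token prices and traffic volumes are illustrative placeholders, not any vendor's actual rates.

```python
# Rough cost model for LLM calls; all prices are illustrative placeholders.
PRICE_PER_1K_TOKENS = {
    "frontier-model": (0.01, 0.03),    # (input USD, output USD) per 1K tokens
    "small-model": (0.0005, 0.0015),
}


def call_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Dollar cost of a single call at the assumed rates."""
    price_in, price_out = PRICE_PER_1K_TOKENS[model]
    return prompt_tokens / 1000 * price_in + completion_tokens / 1000 * price_out


# One million requests a day, each ~2,000 prompt and 500 completion tokens:
daily = 1_000_000 * call_cost("frontier-model", 2_000, 500)
print(f"${daily:,.0f} per day")  # $35,000 per day on the big model alone
```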

00:08:58

And then, what we realized is even more important: the costs of coupling in the wrong ways, of not properly linearizing the work and not properly modularizing what you're building, are just massive. Because what do you do, right? Who's responsible? You've got this coupling between the understanding of what it should do for a particular product domain, like roadmapping, let's say. But then who's going to own it? Are you going to put a data scientist, or four data scientists, in every team? Where we're at today, by the way, Gene, is that we've got over two dozen people working on the centralized core of this thing, because that's how big a problem it is. And it turns out the wiring between those people is critical. So who runs it, who operates it, who does LLMOps? Is it the product teams? Is it the AI team, the data scientists who live in the world of Python and run when they see Java, because they don't like it <laugh>? Or is it the Copilot team, because they own the product front end? Those questions turn out to be very difficult to answer.

00:10:02

<laugh>. Okay, so let's just put that in the box for a second. Can you connect the dots about what you did about it? You centralized that group because you wanted to make sure the surface area of coupling was actually confined, right, so that you could bring down switching costs. Can you connect the dots on what you expect to happen by centralizing that concern, and how that will lower the cost of change and decouple you from a specific LLM provider?

00:10:26

Yeah. Well, let me tell you, because it sounds like this was all very thoughtful and planned out from the start, so let me just correct that <laugh>. I had said: no way in hell is the data science team going to do you-build-it-you-run-it. We do you-build-it-you-run-it everywhere else except data science, because who wants data scientists on call, and data scientists don't want to be on call, right? I said that some months ago. Yesterday, in our quarterly product review with our whole leadership team, our principal data scientist said: we just shifted to you-build-it-you-run-it, because that's how we're going to move fastest. And we now actually have engineers on call, including data engineers and engineers supporting those services. Some of those services, the RAG services, retrieval-augmented generation, are really simple: you throw them into a Lambda and support them. Some of them, the actual agent services that are using various LLMs and other models, are actually quite complex.
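For flavor, here is what one of those simpler services might look like as a single Lambda-style handler, a minimal retrieval-augmented generation sketch. The `retrieve` and `complete` helpers are hypothetical stand-ins for a vector-store lookup and the centralized LLM call; nothing here is the actual Planview service.

```python
# Minimal RAG sketch as a Lambda-style handler; helpers are stand-ins.
def retrieve(query: str, k: int = 5) -> list[str]:
    """Stand-in for a top-k nearest-neighbor lookup in a vector store."""
    return [f"<chunk {i} relevant to {query!r}>" for i in range(k)]


def complete(prompt: str) -> str:
    """Stand-in for the centralized, provider-agnostic LLM call."""
    return "<completion>"


def handler(event: dict, context: object = None) -> dict:
    """Lambda-style entry point: retrieve context, then generate."""
    query = event["query"]
    context_block = "\n---\n".join(retrieve(query))
    prompt = (
        f"Answer using only the context below.\n\n{context_block}\n\n"
        f"Question: {query}"
    )
    return {"statusCode": 200, "body": complete(prompt)}


print(handler({"query": "Which roadmap items slipped this quarter?"}))
```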

00:11:16

So I think the key thing is to have a set of principles, and really this is where we apply the principles of Wiring the Winning Organization in our basically monthly discussions about this, right? Because we learned as we went. And again, some of the frameworks out there, like LangChain, are great for prototyping; they're just not enough in terms of helping us understand how to create the architecture for this thing. And the architecture and team structure, of course, need to line up, right? This is where I think we've got very good words for understanding software architecture, and now we've got very good words and concepts and frameworks for understanding the wiring architecture: the Layer 3 architecture of how things wire up, from Wiring the Winning Organization, and I encourage everyone to dig into this. And we've had to change it.

00:12:04

And what we realized, and I guess to me this is the biggest lesson, is that we had to have conversations with the architects and team leaders, both from the AI and Copilot team and the product teams, on effectively a monthly basis. And today we're actually running two different wirings for two different parts of the portfolio, because we're going to see which one wins, right? Now, one thing we did do, Gene: it's already clear to us that we need basically tight cohesion in all the prompt engineering. The prompts need to be in one repository, and we need low switching costs for LLMs, because we can't predict exactly how LLMs will evolve. We don't know what's coming in GPT-5, right? Right now everyone's excited about Claude 3, both cost-wise and performance-wise and context-window-wise. But what if we then get very excited about all sorts of new things we can do in GPT-5, let's say GPT-next? So we knew we had this principle: we needed optionality, and basically tight encapsulation of all the prompt engineering. So we changed that in a centralized way, because if it were decentralized, it would be too difficult.
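One way to picture that centralization, sketched under the assumption of a simple template registry: every team's prompts live in one repository, keyed by task and provider, so a model switch becomes a single configuration change. The task names and templates below are illustrative, not the actual prompt library.

```python
# Sketch of a central prompt registry keyed by (task, provider).
PROMPTS = {
    ("roadmap_summary", "gpt-4"):
        "Summarize these roadmap items. Think step by step.\n{data}",
    ("roadmap_summary", "claude-3"):
        "<data>{data}</data>\n\nSummarize the roadmap items above.",
}

ACTIVE_PROVIDER = "claude-3"  # flipping this re-targets every product team


def render(task: str, **fields: str) -> str:
    """Look up the active provider's template for a task and fill it in."""
    return PROMPTS[(task, ACTIVE_PROVIDER)].format(**fields)


print(render("roadmap_summary", data="item A, item B"))
```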

00:13:19

So interesting <laugh>, so interesting. And maybe just to really concretely land the point: Dr. Ethan Mollick talked yesterday about how the LLMs we're using today are the worst and the most expensive they'll ever be. And we just don't know which technology is going to emerge as the best in the short or medium term. And so this is why we must enable optionality, and we must enable low switching costs. Am I capturing that correctly?

00:13:48

You are. And this, to me, has been the fascinating thing. I think there are two key lessons, and of course we're still learning, so this will be an ongoing process. One thing that's clear is that the cost aspects of the architecture, we've never seen them be this profound, where the product you're building can be completely unviable if you're making excessive use of LLM calls in the wrong way, right? We already use multiple tiers of LLMs, depending on what kind of prompt is coming in. By the way, of course, the top one, the one that's looking at which agent to pass the request to, has to be the most powerful, like Opus or GPT-4. But the architecture within that will actually determine the cost profile.
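A minimal sketch of that tiered-routing idea, with a hypothetical classifier stub and illustrative model names; a production router would itself be an LLM call, with fallbacks and evaluation.

```python
# Tiered routing sketch: classify with the capable model, then dispatch
# to the cheapest model that can handle the request.
CHEAP_MODEL = "small-model"
FRONTIER_MODEL = "opus-or-gpt-4-class"


def classify(request: str) -> str:
    """Stand-in for the top-level LLM call that picks an agent."""
    return "status_lookup" if "status" in request.lower() else "deep_analysis"


def route(request: str) -> str:
    """One expensive classification call, then the cheapest adequate model."""
    agent = classify(request)
    model = CHEAP_MODEL if agent == "status_lookup" else FRONTIER_MODEL
    return f"{agent} agent handled the request on {model}"


print(route("What is the status of the Q3 roadmap?"))
```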

00:14:36

And if you don't get the architecture right, your cost profile will be too high. Now, if you don't get the team wiring right, the organizational wiring right, then you'll put yourself into debt. Because it was very clear that just applying the architecture and team structure principles we thought were correct, decoupling and decentralizing, would have done exactly that. Which, by the way, is just what we've done for our data: we have a data mesh architecture to enable all of this, completely decentralizing the ownership and production of data catalogs to all the product teams. And it's wonderful; it's exactly in line with your story <laugh> <laugh> on the horror of shared services. If we'd centralized and made a centralized data lake, we would not be where we are today.

00:15:23

The decentralization was critical, even though of course everyone wants to centralize a bunch of the data governance and those sorts of things, just a common governance layer over the decentralization. So we were headed that way with the gen AI capabilities too. If we'd done that, we'd be locked into GPT-4 today, and it would take us months to switch to, let's say, Claude 3. Right now we've already got Claude 3 up and running, and we can switch between the two of them, because we centralized the thing where we wanted that cohesion, with loose coupling to the external shared service, which in this case is this extremely expensive thing called the LLM.

00:15:57

Yeah, this is so awesome. The reason I thought this talk was so important for this community is that it's the same principles, the same sort of symptoms, in a place where you just wouldn't, or at least I didn't, expect it. And by the way, Matt McLarty and Stephen Fishman are talking later today about their amazing book Unbundling the Enterprise, which is all about preserving optionality in those conditions where we just can't predict the future, which very much seems to characterize the world of AI right now. So Mik, we have five minutes, and you said something else to me that took me aback. You and I talked about how wiring a technology organization isn't getting easier, especially as we're adding all these new functional specialties to our products and value streams. And in fact, at one point you said: all I do these days is think about who should be doing what and who should be talking to whom. Can you, one, confirm that you actually said that and I didn't make that up? And if so, why <laugh>? Can you paint the picture of why that's so difficult, and maybe some of the lessons you've learned?

00:16:56

Yeah, I may have said who should be talking to who, or who should be talking to whom as well, right? <laugh> Because that's it: these things get such tight coupling, and that's okay for demos, that's okay for your proofs of concept, that's great. It's when you want to scale this thing that it becomes highly problematic, and you have to make the decision: are we embedding AI and data scientists in every team across the organization, or are we centralizing them? And for that, you have to have these guiding principles and the language and structures to talk about it. Again, the language we use is from Wiring the Winning Organization. So Gene, to me, what I realized through this story is, you know, we talk a lot in this community about tech debt and so on.

00:17:37

The organizational debt that we would have created by decentralizing is completely shocking to me, right? The fact that we would have made it hard to make a switch that could have given us half the cost, let's say, or just much more capability, by not creating the right cohesion and minimizing coupling in the right spots. Because you always have coupling, right? We had to make this, again, very odd decision to centralize. We now have an AI and data science team who has to do you-build-it-you-run-it and operate the software, but it's the right thing for this context. The key thing is we had the language. Now keep in mind, I'm talking about two dozen people, and it's this consequential. So when you actually scale this: I was speaking a couple of days ago on exactly this topic to the chief technology officer of one of the larger banks.

00:18:31

He's dealing with the same problem as he looks at how to roll out gen AI, how to deal with their data platforms, what to do with on-prem versus how much data moves into the cloud to support all of this, and so on. And again, this problem is now orders of magnitude larger in terms of the number of people. And what I think has been happening, and this is what the studies we've seen in this community suggest, is that some organizations are a hundred, a thousand, in the last talk we saw potentially ten thousand times more productive. I think we've attributed so much of that to tech debt, but in this case, if we'd gotten the wiring wrong, the wiring is what would have created the tech debt.

00:19:17

Yeah. Right? And it's not because the teams were doing the wrong things. It's because, as leaders, we would have put in place the wrong conditions for them to create the right architecture, the right platform, and the right investment. So for me this was a really profound shift. And this is what's so interesting: the way leaders affect us directly, every day, is by putting in place, or not putting in place, the conditions to rewire. Really restructuring the teams on a monthly basis is something you don't do that often; we try to do it quarterly, and a lot of companies only do it annually. By not putting in place the right conditions to rewire, we would have put the company into that tech debt. And we would have complained about the tech debt, not about the wiring.

00:19:54

<laugh>. So good. Mik, I always learn so much every time I talk to you. Is there any help you're looking for these days, anything you'd want people to reach out to you about?

00:20:02

Yes. The big thing is just two things, and maybe they're not two little things. One is just best practices on structuring gen AI and data science teams, and then these copilot and agent product teams, right? What are you doing? Did you decentralize? Did you evolve it over time? So that's one thing. And then the second thing is the architectures, right? Again, we've got the start of these things with <inaudible> and so on, but for scaling these architectures, I think there's not enough guidance out there yet, because I know we've had to create our own, and we're just hungry to see what others have done. And of course not just at the scale of, say, Microsoft or OpenAI, but at the scale of product and enterprise organizations.

00:20:44

Awesome. Thanks, Mik, thank you so much. I'm so delighted you made time for this today. Looking forward to catching up with you soon.

00:20:51

Thanks so much for having me, Gene.

00:20:53

See you, Mik.