Las Vegas 2019

Health Care Modernization at Scale

David Cherryhomes loves success so much he embraces failure on a daily basis. He has held multiple roles in technology where he learned the hard way to craft flexible designs, write clean code, reduce risk through automated testing, and monitor and automate everything. He dislikes Marketecture, armchair architects, and seagull managers. David is presently enjoying failures and successes as a Vice President of Software Engineering at Optum, a UnitedHealth Group company. David has a BA in Philosophy from the University of Southern California (failed to launch that career) and an MS in Software Engineering from the University of Minnesota (yay for success!).


Heather Mickman is VP Platform Engineering + Practices at Optum. Throughout her career, she has built APIs to unlock enterprise data, created awesome platforms, led Ops organizations, and built supply chain software for Fortune 50 companies. Heather has a passion for technology, building high performing teams, driving a culture of innovation, and having fun along the way.

DC

David Cherryhomes

VP, IT, Optum

HM

Heather Mickman

VP, Platform Engineering & Practices, Optum

Transcript

00:00:02

One of the next speakers is someone that the DevOps enterprise community knows very well. Heather Mickman presented at the very first DevOps Enterprise Summit in 2014 when it was in San Francisco, along with her colleague Ross Clanton, about the incredible transformation underway at Target over three years. She described the amazing API enablement effort that allowed hundreds of programs and business initiatives to succeed, which liberated developers from having to wait for integration teams to grant them access to critical data they needed. Their story was a huge inspiration for a big portion of The Unicorn Project. And Heather Mickman is back after a two-year absence to present about her amazing work at Optum as their VP of Platform Engineering. She's co-presenting with David Cherryhomes, VP IT at Optum, about their work to modernize healthcare at UnitedHealth Group, the world's largest healthcare company, serving over 230 million people. With that, Heather and David.

00:01:05

Well, good morning everyone, and thank you, Gene, for the really warm welcome. I'm really excited to be back on this stage and to be here this morning with David. We're looking forward to sharing some of the journey and learnings from our work to modernize healthcare at scale. So, I'm Heather Mickman, and I lead the Platform Engineering and Practices team at Optum Tech. I have accountability for providing many of the platforms and corresponding architectures and patterns for our development community across the enterprise, including our big data platform, data movement capabilities like streaming, and API gateways. We also work very closely with our public cloud team so that we can provide a unified public and private cloud strategy for our developers. I've been at Optum for just over two years now, and we're definitely in the early days of these capabilities. I'm here this morning with David Cherryhomes, an awesome engineering leader working to modernize and build a strategic next-generation capability for UnitedHealthcare.

00:02:04

And those are both of the stories we're gonna tell you this morning. Before we dive into that modernization journey, I'll give some context on UnitedHealth Group. So UnitedHealth Group has two distinct businesses: UnitedHealthcare, which provides healthcare coverage and benefit services, and Optum, which provides information and technology-enabled health services. Both of these businesses leverage three core competencies: clinical insights, technology, and data and information. David and I sit within the Optum organization as part of our Optum Technology group. I'm part of the infrastructure team, and David sits on the app dev side of the org, and arguably is one of my most vocal customers.

00:02:49

So this is the UnitedHealth Group mission, and I love it. In fact, it's one of the reasons why I chose to work at Optum: we help people live healthier lives and help make the health system work better for everyone. It's really inspiring to work on problems that help make the health system better for all of us. Imagine waking up every morning with that mission in mind. It's pretty awesome. So for some quick perspective on the scale of UnitedHealth Group: it's a Fortune 6 company with revenues of $226 billion, serving more than 230 million individuals globally. We have 300,000 employees, and it's the world's largest healthcare company. So with that kind of scale, you can imagine that it takes a lot of technology to make it all happen. Here are a few interesting numbers to get a feel for the size of our tech org. We have more than 21,000 employees worldwide.

00:03:41

We've been very, very focused on elevating our engineering culture through a number of different initiatives, including a large college graduate program, referred to as our technology development program, as well as creating an engineering track for fellows, distinguished engineers, and principal engineers. We spend approximately $3.5 billion annually on technology and innovation. And you can also see from the numbers here, we have a significant data center footprint as well as a large mainframe. Not called out on this slide is our journey to the public cloud. We're in the early days for sure, and I would imagine this time next year we'll have some great successes to share as we go through the learnings there. So it's a massive organization, and I'll admit that it was more daunting to join and learn how to work across the teams than I had imagined. As I mentioned before, I've been at Optum for just over two years now.

00:04:36

And it was a challenge to learn how to work effectively across such a large organization, to influence and build the relationships that I needed to in order to drive a cross-enterprise strategy and vision. And that's what I'm gonna talk through shortly. It's been an incredible experience with a lot of ups and downs, and of course I've adjusted my approach along the way. But it's been fantastic fun getting to build my amazing team, define and evangelize modernization strategies, and of course, work with a lot of great people, like David Cherryhomes.

00:05:11

Okay, so on to the stories that we're gonna share today. We're clearly working at massive scale across UnitedHealth Group and within Optum Tech, and there are endless angles to our DevOps modernization journey and a lot of great work that's been happening over the last many years that we could literally spend hours talking about. And I'm sure we'll have the opportunity to talk about that with many of you here over the next few days. We did, though, wanna focus this morning on two particular aspects of that journey: unlocking the data, and modernizing a large engineering team. I believe that unlocking data is table stakes for any successful DevOps modernization, so I wanted to share our approach and progress, because it's an essential foundational building block for modernization. And David will share his journey in modernizing a large engineering team. It was really, really overwhelming trying to fit these stories into a 30-minute window, but we'll try our best.

00:06:04

So let's get started. Accessing data reliably, consistently, and in known, reusable ways is arguably our number one developer challenge, because we have a lot of data. UHG has grown significantly over the years through acquisitions, so you can imagine that as a result there are a lot of different source systems, and these are across the primary data domains you would imagine in a healthcare company: eligibility, claims, provider, clinical. And when I say there are a lot of different source systems across those domains, I'm not talking about three or four different source systems; we're talking in the double digits. As a result, data integrations are incredibly complex and typically take many months, with significant price tags, to build. Data isn't standardized, so engineers would need to understand the internal workings of the multiple sources and solve this over and over and over again, across many of these sources, anytime data was needed. And we usually need data when we're building software, right? I shared a similar challenge on this stage, from my former organization, five years ago, and Gene mentioned it as he was doing our intro this morning. And it's fun working to tackle a similar though very different challenge at a significantly more complex scale.

00:07:25

So in addition to lower engineering productivity as a result of the integrations and having to spend so much time just getting access to the data, we also had thousands of APIs exposing a lot of that same data, which then also introduced incomplete and inconsistent data challenges. These APIs weren't inventoried or discoverable, because so many were project-specific and/or point-to-point integrations that had been built over time. To start the analysis of cataloging the APIs, we actually pulled the API gateway team into my organization, because we literally had to scour what was deployed into production to understand what we had. And saying that is a lot easier than actually doing it. That said, it's really important, of course, to understand where you're starting from, and we were starting with a lot. Also interesting to note is that many of these APIs had five to ten different versions, and in some cases even more, running in production.

00:08:23

And again, that's an artifact of our project-based work versus product, as well as not having backwards compatibility as we think about how we're building our software. So we defined the data domains in which to catalog the APIs, and we rebuilt an API certification and registry program to focus on our customers. And in this case, our customers are the engineers. The goal here was removing friction and making this as easy as possible, and automated as well, so we could turn it into an easy-to-use workflow versus what previously had existed, which was largely very manual. We still have work to do here, but we've definitely made some great progress. Lastly, as part of this work, we identified the highest-volume API, and that would be our initial use case for how to unlock and modernize access to our valuable data.

00:09:15

So in order to unlock and modernize access to the data, we needed to platform-enable it. When I say that, what do I mean? I mean standardizing, mastering, and protecting the data once, for reuse across the enterprise, in near real time. We need to insulate our source systems so that we can start to modernize those systems and continue to have access to the data. Platform-enabling companies is not a new concept, so I'm definitely not winning any kind of rocket science awards here talking about that, right? And so what we did, of course, was to look to industries that have already gone through this platform journey and digitally enabled their industries: retail, finance, and the tech giants. What we had to do, though, was to figure out how to implement that within the world's largest healthcare company.

00:10:04

And that is incredibly difficult. So we used that highest-volume API, which directly consumed from a large mainframe source system, to build the MVP implementation and patterns. We've also started to decommission some of those APIs and the point-to-point integrations. And we still have a lot of work to do to rationalize the ecosystem, but the great news is that we now have a framework in place; we understand where we are, and we can build the roadmaps for what's next. So when I say platform-enabled data, what do I really mean, and how do we think about implementing it? This is a typical data pipeline: we source, denormalize, standardize, master, protect, and then filter the data based on the data that each consumer should be seeing. And these pipelines are created across the many, many sources that we have. Remember, the number of sources is continually increasing because of the acquisitions that we continue to do. And then those pipelines are built, for many, many sources, to enable many, many different use cases, over and over again. So it's an exponential issue, right? Magnified further, like I said, by the acquisitions, as we continue to increase the number of source systems in our ecosystem.
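A pipeline like the one just described, source, denormalize, standardize, master, protect, filter, can be pictured as a chain of small stages. This is only a hedged illustration: the stage names follow the talk, but the record shapes, field names, and logic below are entirely hypothetical, not Optum's implementation.

```python
# Minimal sketch of the pipeline stages described above.
# Stage names follow the talk; everything else is hypothetical.

def denormalize(record):
    # Flatten nested source structures into one flat record.
    flat = dict(record)
    flat.update(flat.pop("details", {}))
    return flat

def standardize(record):
    # Map source-specific field names onto a canonical schema.
    aliases = {"mbr_id": "member_id", "clm_no": "claim_id"}
    return {aliases.get(k, k): v for k, v in record.items()}

def protect(record, allowed_fields):
    # Filter to only the fields this consumer is entitled to see.
    return {k: v for k, v in record.items() if k in allowed_fields}

def pipeline(records, allowed_fields):
    for r in records:
        yield protect(standardize(denormalize(r)), allowed_fields)

# Toy source record with a source-specific shape.
source = [{"mbr_id": "M1", "ssn": "xxx", "details": {"clm_no": "C9"}}]
out = list(pipeline(source, allowed_fields={"member_id", "claim_id"}))
# out → [{'member_id': 'M1', 'claim_id': 'C9'}]
```

The point of the sketch is the shape of the problem: every one of these stages has to be rebuilt for every new source and use case unless the shared work is done once on a platform.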

00:11:18

So instead, we do the shared work once, which, by the way, is incredibly difficult. And then we make consumption config-driven for our consumers.
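Doing the shared work once and making consumption config-driven might look something like this minimal sketch, where onboarding a new consumer means adding a config entry rather than building a new pipeline. All names here are hypothetical illustrations, not Optum's actual configuration.

```python
# Hypothetical sketch: each consumer is onboarded with a small config,
# not a bespoke pipeline. Names and shapes are illustrative only.
CONSUMER_CONFIGS = {
    "member-portal": {"domain": "eligibility", "fields": ["member_id", "plan"]},
    "call-center":   {"domain": "eligibility", "fields": ["member_id", "status"]},
}

# Stand-in for data the platform has already standardized,
# mastered, and protected once.
PLATFORM_DATA = {
    "eligibility": [{"member_id": "M1", "plan": "PPO", "status": "active"}],
}

def consume(consumer_name):
    # Look up the consumer's config and project only its configured fields.
    cfg = CONSUMER_CONFIGS[consumer_name]
    return [{f: rec[f] for f in cfg["fields"]}
            for rec in PLATFORM_DATA[cfg["domain"]]]

records = consume("call-center")
# records → [{'member_id': 'M1', 'status': 'active'}]
```

The design choice this illustrates: the expensive work lives behind the platform boundary, and each of the many, many consumers is just configuration on top of it.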

00:11:30

So that is how we think about platform-enabling our data, at an incredibly high level. We platform-enable the data once, to be used across many different consumers. And that provides us, of course, faster time to innovation and value; consistent, complete, and secure data across multiple channels, ensuring consistent experiences for our customers, whether that's on portals, mobile apps, within clinics, or even in our call centers; and source system insulation. I've said that a couple of times, and it's very important in this approach, because it allows us to modernize those source systems as we move into an event-based microservices architecture. Now, getting to a place where I can say that the organization has largely embraced this architecture has definitely been a labor of love across my team and many others across the organization. The MVP platform and patterns were built, with some early wins, and now we're starting to scale to the cloud and build additional tooling to enable a federated delivery approach across our large organization, so that we can incrementally deliver value.

00:12:38

So here's where we are. We've been working for the last one and a half years or so, and have had some really great initial successes. Technologies have been built to enable the platform architecture, in our data centers as well as in public clouds. They're automated with zero manual touchpoints, enabling large scale with a very small team. We also use all open source technologies, or cloud-native services where it makes sense. More than 160 teams have started streaming data, which I think is amazing in such a short amount of time. With the initial insulation of the mainframe that I mentioned, from that high-volume API, many MIPS have been offloaded, saving millions of dollars per year. We also built an API gateway, working closely with the Kong community, to replace expensive and difficult-to-operate gateways. There's been fantastic adoption and migration to this.

00:13:29

Not mentioned here, but I do wanna call it out as well: we now have a source of truth for our APIs, and our teams can now discover what is available for reuse. We've started a similar rationalization journey in our big data space as well, and so again, this time next year, I think we'll have some really interesting things to share as we learn through that. So we're definitely at the start of our unlocking-the-data journey, with a lot to do, but these early wins and adoption are a really great start. I can talk about platforms and architectures and early-day wins all day long, but what's really important is the adoption and modernization across the massive tech organization. I think we all know how difficult that is. We need to transform our teams and how we do our work, and David is gonna talk through the journey his organization has been on in that space.

00:14:22

Thank you.

00:14:29

Thank you all for having me here. I just wanted to start with a little bit about this journey. This journey has definitely intersected a lot with what Heather is doing, so our teams have been working a lot together to really evolve and advance where we're at conceptually. But we really had to start at the very beginning. And we started at the beginning with this large initiative that we decided to do several years ago. It's a multi-year initiative; quite a number of people, over 2,000 in total, are involved in this initiative. And the goal was to reduce our operational footprint, our IT investments, our business investments, to one-third of current levels, and by doing that, be able to materially affect the health and wellbeing of the healthcare system. An ambitious goal.

00:15:15

And as Heather hit on with the size and scope of our company, we don't do anything small. So we said, let's start small, for us. And that's about a million, obviously not very small. Now, we're a risk-averse company as well; it's a large, established company. So we're not gonna just jump in and release out in a big bang to a million. We also strategically roll it out as we're moving forward. But again, we had this great idea, and we said, now where are we going with this? Where do we start? And the very first thing you have to start with, of course, is blockchain <laugh>. Sorry, that's down the hall. Never mind. We start with people <laugh>.

00:15:56

So what matters to the people, right? Top-down will never work. I've seen this, actually, at our own company as well: we're going to do Agile because we've read a book and it's a great idea, and I've read The Phoenix Project and we should now do DevOps. When it comes from the top and gets pushed down, it never works. So the first thing we did, and again, we started with an existing team, was say to that team: we're going to transform the way you are functioning today. And how are we gonna transform that team, and why? That was the biggest thing. Why is it important to you? Why does it matter to you? We were taking a team that had a large QA team, and when I say QA, I really mean QC; it's all manual testing.

00:16:42

And to transform from that into a culture of automation, you tell that story and say why we're gonna do this: it's gonna create that individual feeling of success, that agility, the speed to market, the higher Net Promoter Score. So you see this from the business perspective: why am I doing this, and what's the value I'm gonna get out of it? And then as an individual, what am I gonna get from this? But of course, the team can get this, and the engineers can love the idea, embrace the idea, and say, let's run with this, and you still have a very important group to deal with. And that is the executives. Executives who look at Agile and think: it's the cowboy developers that just throw things out in production. You want to get to production? Sure, it's just gonna go out there without any testing. And of course, everybody in the Agile and DevOps community knows this is the farthest from the truth that we can possibly be.

00:17:38

But this is that entrenched orthodoxy that we also have to fight against, and educate, right? That's the way for us to actually move forward: educate them, make them understand. So we're going up the scale as well as down the scale as to why this change and transformation matters. And as we went through this journey, we started defining Agile and DevOps for ourselves. We started with Agile, and that was a little bit easier of a buy-in: it's about evolving project management, being more flexible, recognizing change as a first-class citizen. That is actually a very difficult thing to do, especially in an entrenched organization where ITIL is a first-class citizen, right? They really don't want change, so we had to say: no, no, no, change is a good thing. Smaller changes, incremental changes, this is a very good thing, and it'll create improved business value.

00:18:32

We're going to directly involve the users in everything that we are doing. And we're looking at systems. When I talk about systems improvement, I don't think of it as one computer system, right? Think of systems like Deming thinks of systems: the whole system, from ideation through business delivery, operations, and IT support. That's what we're looking at. And, for Agile, constantly questioning our processes and quality, right? That underlies everything that we do. And then we look at DevOps, and we say DevOps is really the evolution of Agile for us, where we take a lot of those ideas that we had in Agile and we just advance them, we make them faster. Like taking an hour-long presentation and putting it down to 15 minutes. And we focus on culture, right?

00:19:22

And again, underlying everything we do and every principle we have is that quality focus. So we took this team that had very few unit tests; almost everything was manual testing, and we set out to invert this pyramid. This is a snapshot of where this program, this series of applications that are working together, is building today: 158,000 tests are run on each build, over 100 builds per day, which means that there are over 16 million tests being run every single day, before we even get to the business scenario testing. At this point, we've also started embracing more of our cloud principles. We spin up whole environments, and by whole environments I mean multiple databases, multiple applications, streaming. All of this is spinning up, running through these different series of tests, validating from the business perspective that this release candidate actually is a good candidate.
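The spin-up, test, tear-down lifecycle being described can be sketched with a context manager that guarantees tear-down whether the business scenarios pass or fail. This is purely illustrative; a real implementation would drive cloud or container provisioning APIs rather than an in-memory dictionary.

```python
from contextlib import contextmanager

@contextmanager
def ephemeral_environment(name):
    # Stand-in for provisioning databases, applications, and streaming infra.
    env = {"name": name, "status": "running"}
    try:
        yield env
    finally:
        # Always tear down, whether the scenarios passed or failed.
        env["status"] = "torn-down"

def run_business_scenarios(env):
    # Stand-in for the business scenario suite run against the environment.
    return env["status"] == "running"

with ephemeral_environment("release-candidate") as env:
    candidate_is_good = run_business_scenarios(env)
# The environment is gone; only the verdict and the reports remain.
```

The `try`/`finally` shape is what makes these environments disposable: nothing about the release verdict depends on the environment outliving the test run.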

00:20:21

And then we tear it all down afterwards, get our reports, everything is good, we send everything out, everyone's happy. And we've really been able to invert this pyramid, so manual testing really comes down to that user experience piece instead of validating what we're doing. Let me just pause here for a moment. You see, we got to this point of two releases per day, which by itself is pretty fantastic. Rewind: when we started this project, it was quarterly releases. Every three months we could release; if you had defect fixes, you could do that every month. That's the mentality we started with. And we said, we're gonna do this Agile thing, we're gonna do DevOps, we're going to get to daily releases.

00:21:15

My goal is that every time a developer checks in, that can run through the whole pipeline and get out to production, fully automated. We're not there yet, still working on it, but I'm really proud that we got to two per day from a culture that was one per three months. And we did that through that investment in quality, through that investment in the people and training, and showing them, and automating all of those tests all the way through. This has enabled us to deliver 2,500 user stories per year, and resolve almost 1,500 defects; most of them are caught by these different pipelines before they ever get to production. And then critical incidents. In this case, a critical incident is the system being down, and it's actually only happened once, and that was a network problem.

00:22:06

We got it up and running in less than an hour. For a high-severity issue, I'm talking about a business scenario where a user says, I don't like that it's doing this, we got that down to about 24 hours or less, from about three days before. And 10 days for a medium, a lower priority, where before it was 30 to 90 days before anybody would even get to that medium level. So just changing that culture, changing that approach. And of course, this is a fairly sizable organization that we're working with, and we come around to: how do we do this? Because we look at a lot of the ideas in DevOps and say, individual accountability, love that idea. But owning a piece of code that you've worked on and having it for the rest of your life? Sounds good, but really boring.

00:22:57

And that was kind of the feedback, too, as we went through with the teams. So we tried something very different, after trying several different models, and one of them is where we're at right now: we take the different domain teams that we have, and on a volunteer basis, a volunteer comes from that domain team and joins the steady-state support team. Our site reliability engineers and our IT operations make up about half of the overall support team that's running at any point in time. The domain teams are coming into this, and they're getting the automated alerts, they're getting the monitors, they're getting the user tickets. They take the daily standup, they take this information, and really quickly, within hours, they have workarounds; they have defects logged, tracked, and delivered, usually within that same day, depending on the severity.

00:23:55

So this rotating model: every 10 weeks, we have our teams moving through. This is how we've been doing it; we've been running it for about two years at this point, and it has been pretty successful. I'll talk about some of the pros and cons of this approach. The biggest pro has been that we can scale on demand. And we've seen this, right? As we've been ramping up more of the business onto this new platform, we suddenly have a doubling of the business that shows up on the platform, and we need support that goes with it. So we can really rapidly move people over into that support mode, and in and out, and we've had 103 so far just going through this year. The monitors and alerts are written by the site reliability engineers, and that's great, but that's not really what we want, right?

00:24:43

We want that to go back into the domain teams, that ownership feeling. And that's what has been happening. We've been moving that further and further back in; all the way throughout, that performance expertise has been moving back in. And a side effect that a lot of people didn't think would happen at first, too, was the engineers actually finding joy in the support mode. Because before, especially with more of an ITIL type of organization, you throw things over the wall and it's them that have to deal with it after production, and I just get to do bright new code all the time; support seems boring. All of a sudden that changes, right? This is fun, with this feeling of ownership, and that engagement as well, because you get to directly see what you do and how it affects users, and you're talking to the direct users and seeing that.

00:25:28

So the engineers really got to see and experience that joy. And then the greatest thing with all of it, too, is that our quality continued to improve as they experienced those challenges of production, taking those lessons back into the domain teams. Cons, of course: at any point, at least at the beginning of those 10 weeks, about half of the team is brand new to it, so they have to learn the processes, the tools, and the technologies that we're using. And for the ones that are on support, if a defect comes up, it might not be in code that they're familiar with, because the domains can be big. But they know somebody who is familiar with it, so that's a mitigating factor.

00:26:12

So with all of that, we wanted to take a step back and say, where are we at, and why are we here? One reason is to say that we are here to learn from all of you. All of us are on different journeys. The fact that this is an enterprise summit means these are enterprise companies that are all dealing with similar challenges, and this is the fantastic thing about being here and being able to learn from all of you as well. We are in the early days of modernization, and in the early days of this, we're learning a lot. We're challenged by a lot of what we are doing. Many of our teams are still just beginning these journeys, whether it's the cloud journey, or the DevOps journey, or the data journey. Freeing the data that's locked into the different proprietary systems is a difficult thing.

00:27:02

And we are in the early days of this, and evangelizing a new architecture or process is really hard. It takes grit to stick with it and to keep on pushing against that established orthodoxy. Invest in your patterns, your tools, your technologies, your training programs. The worst thing ever is when you just say, go learn it on your own, and turn around. So actually invest in that; build up training camps within your teams. That's what we've been doing. It is hard, it is an investment, but as we've been able to see, it's paid off significantly. And of course, the very last point, as we bolded it: automation, documentation, training, and guardrails are super important. That prevents some of that cowboy nature of just releasing everything.

00:27:56

So David, before you wrap up, this is from Gene, who had a note passed to us. Can you take a minute and describe what set of applications you transformed, and why it's important?

00:28:08

Absolutely. So, impromptu questions, all right. Okay. We're rolling.

00:28:12

<laugh> This is the fun part of the talk. Yeah.

00:28:17

<laugh> What we took was really the core of the health administration platform, right? It sits really behind the scenes; it's what you're not seeing as a member. When you go and present your card to the doctor, a whole lot of things happen back there, whether it's the portals that are checking your benefits, or: how do we define those benefits? How are those structured? How are we validating the rules of what you're eligible for and what you're not eligible for? Those are the backend systems that we are developing. And we went through many different ways of looking at that data and the existing systems that we had: how are we going to transform this? Where are we gonna go with this? We could start with big data, right? That was one of the very first things everyone wanted; it was very flash and shiny at that point. Let's make a document database for everything. But I want transactionality, I want reliability. And so we had to go through this very difficult journey, from a technology standpoint as well as evolving the individuals: how are we looking at evolving all of our processes, but also our tools and technologies, as we go at that.

00:29:26

So is that what you wanted there?

00:29:29

Yeah. And as David says that, I mean, we have one more minute, so maybe I'll just build on that. In the initial days, the architecture for this important set of applications and capabilities, the strategy and approach for how we get access to the data and how we expose the data, was very much using big data technologies, and so it ran into a lot of reliability issues from a data perspective. And that was one of the big drivers for the architecture I talked through with unlocking the data: we've actually got a couple of different access patterns, three access patterns, in fact, and we need to enable those in near real time for streaming. When we have transactional types of workloads that we need to enable, let's build APIs, or let's move into events, to actually ensure that we're getting the high scale that we need in our transactional systems. And then let's use our big data technologies for what our big data technologies are helpful for. So, really happy to talk about that further and click into that with anyone here today. Thank you very much for having us. It's been a fun journey.
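The three access patterns Heather closes with, streaming for near-real-time data movement, APIs or events for transactional workloads, and the big data platform for analytics, could be summarized as a simple routing rule. This is just a hypothetical restatement of the closing point, not actual Optum code; the names are illustrative.

```python
# Hypothetical mapping from workload type to access pattern,
# summarizing the three patterns described in the talk.
ACCESS_PATTERNS = {
    "near-real-time": "streaming",          # data movement between systems
    "transactional":  "apis-or-events",     # high-scale request/response
    "analytics":      "big-data-platform",  # what big data tech is good at
}

def access_pattern(workload_type):
    # Route each workload to the pattern suited for it.
    return ACCESS_PATTERNS.get(workload_type, "unknown")

chosen = access_pattern("transactional")
# chosen → 'apis-or-events'
```

The design point is matching each workload to the technology built for it, rather than forcing one technology (like a document database for everything) to serve all three.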

00:30:33

Thank you.