How A Hotel Company Ran $30B of Revenue In Containers (Las Vegas 2020)

App modernization with containers for the rest of us.

(No slides available)

Dwayne Holmes

Vice President of Converged Applications, Large Bank

TRANSCRIPT

00:00:06

Thank you, Adam and Lauren. I have been a big fan of this next speaker for over two years. People kept coming to me and saying, you've got to meet this guy, because he is doing things with containers that will blow your mind. And it was so true. I learned that, among many other things, he was containerizing all of the revenue-generating systems at a top hotel company, which collectively supported over $30 billion of annual revenue. Dwayne Holmes did this work as a senior director of DevSecOps and enterprise platforms. For years, I've wanted him to give a talk here at this conference about what he was doing, but for a variety of reasons, we were never able to make that happen. But thanks to him moving to a new company, I'm so delighted that he's finally able to share his story, which among many other things earned him the title of Google Cloud Certified Fellow, having built and managed one of the world's largest Kubernetes installations. So I'm even happier that he is now joining our longtime friend John Rzeszotarski at PNC Bank as a VP of converged applications and cloud. Please welcome Dwayne Holmes.

00:01:26

Thanks so much, Gene. I really appreciate the opportunity to be here. I've definitely been following your work, like everyone, and I appreciate the awesome words. My name is Dwayne Holmes, and I'm talking about my DevOps journey. To protect the innocent, I will be vague in this presentation. All the way up until last month, I worked for one of the largest hotel companies in the world. In 2019, they had revenues of over $20 billion. They have 170,000 employees. They're over 90 years old, with over 7,400 locations, or hotels, and 1.4 million rooms available. In 2016, they had a large merger approved. In 2018, that merger integration was completed. And in 2019, we rolled out a massive program for our customers. So, a little about the things that we accomplished during that time: all the way up until last month, I ran a team that supported over 3,000 developers across multiple service providers.

00:02:41

Our model was that we had few FTEs but lots of service providers when it came to development, so development was done when there was a project that was green-lit by corporate. In 2016, microservices and containers were actually running. In 2017, over $1 billion was processed in containers. I didn't say microservices, because we had microservices and also micro-monoliths running in containers at the time. 90% of all new applications coming out of development were, or are, in containers. Kubernetes was actually running in production in 2017, and in 2018 we were one of the top five largest production Kubernetes clusters by revenue, according to Red Hat and JFrog. By 2020, when I left, we did thousands of builds and deployments per day, and we ended up having two Google Cloud Certified Fellows.

00:03:44

And we had experience running Kubernetes in five cloud providers. The one that most people won't guess is Alibaba. In order to know where you're going, you need to know where you've been. So, about me: in 2012, I worked for a financial company where over 95% of infrastructure was outsourced. Out of 500 employees, only five were retained. Developers thought that if they outsourced all of infrastructure to a provider, the provider would do architecture, engineering, and operations. As everyone knows, the same amount of work was still required. We had no engineers or architects, but we had a large outsourced provider contingent that was able to help us. I learned three principles when I was at this financial company. First, the CIO always talked about dial tone. Second, I'm a closet economist, so I believe in Adam Smith's division of labor and trade specialization.

00:04:53

And finally, I believe in automating everything. So what is the dial tone principle? I tell this story any time I get a new team. Basically, no one cares about any of the technology that you use to implement a phone. It could be Cisco or Avaya or whatever. It just had better work. When the business picks up the phone, they expect the dial tone. Anything less makes the business upset. To me, this means that you focus on what is important. This led me to Ruby on Rails. Most people, especially my mom, know what Etsy, Hulu, Square, Instacart, Airbnb, Twitter, and Twitch do. However, they don't know the development platform that all of these started on, which was Ruby on Rails. And the reason I love Ruby on Rails is doctrine two, where it talks about convention over configuration.

00:05:55

That basically means we should not focus on things that do not accelerate business value; we should focus on things that do. Because of that, a lot of decisions are already made before you even use the framework. The other thing: because I'm a closet economist, I love Adam Smith's The Wealth of Nations. I firmly believe in division of labor and also trade specialization. There's a story called the lawyer versus the secretary. Suppose you have a lawyer, and this lawyer can type faster, file faster, and use a computer faster than the secretary. Would the attorney choose to be a secretary or choose to be an attorney? Of course, they would choose to be the attorney. Every hour that an attorney spends doing secretarial work is an hour that they can't be a lawyer.

00:07:00

So because of that, you bring on a secretary in order to maximize your productivity. That's how I feel about my DevOps or release engineering teams: developers are like the attorney. Their job is to put out amazing code, and our job is to support them. Every hour that a developer is focusing on things that don't provide business value is an hour that they're not providing that value. The other thing that I love is: when all else fails, automate. There are only two ways that you can increase productivity. The first is automation, and the second is increasing resources. The issue is that in my career, resources have always been scarce. Because of that, in order to increase productivity, I've always had to fall back on automation. So how does this all fit in? Well, in late 2015, I was a vice president at a financial company.

00:08:12

If you look at the top left corner, that was the office I was in. I had a corner desk that overlooked the harbor and the city. On top of that, because they got rid of most of the infrastructure, I had amazing career stability, and everything was great. However, one day I went to a meetup, and this meetup filled my head with crazy ideas about containers. See, I went to this meetup to learn more about Ruby on Rails, because I was doing some work for my mom and I was doing lots of development. However, when I heard about containers, they satisfied three things. Dial tone: containers abstract infrastructure. Specialization: operations could create containers that devs could use over and over again. And automation: I could build containers over and over again, and everything would just work, which is awesome.

00:09:13

So I knew I needed to make a change. I found out that this hotel company was willing to go all in on containers. The issue was that it was probably bad for my career, or at least so I thought, and everyone told me that. I went from a VP to a contractor. I went from amazing stability to no stability at all, because, one, I was a contractor, and two, this was an experimental project, and if it didn't work, they could cut the project. Not only that, but instead of having amazing views of the city and the harbor, I was sitting at a table, not a desk, in a room with no windows. As a result, I didn't know if I had made the right decision, and I would call people and talk over and over again about whether or not this was the right thing to do.

00:10:07

And most people said I was a fool. However, the thing that allowed me to stay on was an amazing team. They formed a great cross-functional team of people who had amazing talents. We had three developers and three infrastructure people, and I love giving nicknames to people. We had our fearless leader, who rallied the troops. We had the genius, who was the superstar developer. We had the professor, who knew everything about everything. He was a developer and an infrastructure person, and he's the one who actually suggested containers. And we had Superman, who had unbelievable energy and was our doer. As for me, I didn't give myself a nickname. I was just glad to help. The goal of this team was essentially evolution versus revolution: we would take something and totally change the way the enterprise worked through this cross-functional team.

00:11:20

And I learned lots of things. One of the things I learned, especially early on, is that environments, especially lower-level environments, need to be production-like. The reason is that we were a high-performing DevOps team, but unfortunately we didn't make any money. So the only performance slot that we could get was from 12 midnight to 5:00 AM, because legacy teams had all the best slots and we got the worst one. The other great thing about having this amazing team was that at the time, we couldn't Google anything about containers. We had to create our own orchestration engine in order to orchestrate containers on multiple VMs. So we were able to prop each other up and bring ourselves along. As a result, we started creating frameworks and libraries that were based on containers. And we really thought about how we could secure these and how we could deploy these over and over again.

00:12:26

And so everything was kind of like a pyramid, where we built on things so that things would go faster. The other thing I learned about is the greatest microservices on the planet, and I'm a Linux guy. The way we thought about containers is that you have the command, which is a container, then your command-line options, which are environment variables. And any time you did a pipe, think of that as a sidecar. As a result, Linux commands make the best microservices, because you can take a command and, based on command-line options, change how it works, and then you can pipe it into another command and change it even more by adding extra commands. This is how we built a lot of our containers. And the result was that we came up with a framework for how we could deploy containers on multiple servers in multiple ways.
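
The command, options, and pipe analogy above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's framework: the names `ContainerGroup`, `grep`, `upper`, and the `PATTERN` variable are hypothetical. The main function stands in for the container, the `env` mapping for command-line options, and each sidecar for one more stage in a pipe.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A stage takes input lines plus an environment mapping and returns output lines.
Stage = Callable[[List[str], Dict[str, str]], List[str]]


def grep(lines: List[str], env: Dict[str, str]) -> List[str]:
    """Main 'container': keep lines containing PATTERN, like `grep PATTERN`."""
    pattern = env.get("PATTERN", "")
    return [line for line in lines if pattern in line]


def upper(lines: List[str], env: Dict[str, str]) -> List[str]:
    """'Sidecar': post-process the main output, like `| tr a-z A-Z`."""
    return [line.upper() for line in lines]


@dataclass
class ContainerGroup:
    """Hypothetical grouping of one main container and its sidecars."""
    main: Stage
    env: Dict[str, str] = field(default_factory=dict)
    sidecars: List[Stage] = field(default_factory=list)

    def run(self, lines: List[str]) -> List[str]:
        out = self.main(lines, self.env)
        for sidecar in self.sidecars:  # each sidecar acts like one more pipe
            out = sidecar(out, self.env)
        return out


if __name__ == "__main__":
    group = ContainerGroup(main=grep, env={"PATTERN": "err"}, sidecars=[upper])
    print(group.run(["error: disk full", "ok", "stderr closed"]))
```

Changing `PATTERN` changes the behavior without rebuilding anything, and adding a sidecar changes the output without touching the main function, which is the property the analogy is after.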

00:13:23

A lot of people believe that you have to be in Kubernetes on day one in order to use containers. That's not true. The other thing is that people believe that containers are immutable. Well, depending on how you design them, they're not, especially if you're using them on a VM. We really believe that containers are awesome, and that if you focus on container hygiene, in other words, how you build your containers, you can run anything inside of a container on a VM. So, especially when you start out, focus on putting containers on VMs first, and then you can go to Kubernetes. The reason we loved these frameworks, especially the ones that we built, is dial tone. We abstracted where we ran containers. In other words, no one knew where we were running these containers. They just knew there was a URL where they could get a microservice.

00:14:23

The other thing is, in terms of specialization, we realized that a small team can service a much larger team. And then finally, automation: we could build hundreds of times without us getting involved. After the project was successful, they asked me whether or not I wanted to come on as a full-time employee. And I was like, sure. So I asked for six things. I asked for all containerized workloads, developer tools, pipelines, platforms, and base images. If you think about it, three years later, that is actually considered modern operations. So I was actually asking to run modern operations, but I felt that containers, platforms, base images, pipelines, and developer tools were really important for me to do my job in release engineering. The reality was that, if you look at the classic DevOps issue, you have development throwing code over the wall and hitting operations.

00:15:36

Well, guess what? I was like code being thrown over the wall into operations. One, the infrastructure SVP didn't believe in the team. Two, I had few allies in operations, because most of my time was spent with development. Not only that, but multiple reorgs left me under a different VP, who thought the DevOps team, or the release engineering team, was really a QA team. On top of that, the operations team had created another DevOps team, where they were essentially implementing a service catalog with Chef. And I was just thinking to myself, oh my goodness, I'm a team of one. Even though I have all these developer friends, this is déjà vu all over again, and I'm by myself. So the issue was that we had two competing platforms, and the question was, which will devs choose? You could have a service catalog, where you have a dropdown menu and you pick compute, memory, or storage. Or you could have a platform where you go into Git and hit commit.

00:16:51

And then afterwards Jenkins does some things, creates an artifact, which is a container, and then deploys it to compute. The key thing is that the service catalog, in my mind, was TMI, too much information, where developers had to know all this compute stuff, and the pipeline was abstraction, which is what I want. The fortunate thing was that developers chose our pipelines a hundred percent of the time, which is awesome. If you think about it in terms of dial tone, infrastructure provided too much information that developers didn't want. Whereas with us, for dial tone, they just hit commit to see their code running. Once they do that, we manage Jenkins, containers, and compute. So the developers could be amazing at what they do and do what they love. And we focused on workflow, not necessarily building servers. As a result, if you look at everything that we built, this is kind of like our framework, where we had a whole process for how we built our base images, how we pulled in libraries and frameworks, how we secured everything, how developers interacted with the pipeline as well as tickets, and how we secured things, whether with code quality scans or static code analysis.
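
As a rough sketch of the commit-to-running-code flow described above, the pipeline can be modeled as three steps behind a single entry point. Everything here is an assumption made for illustration: the function names, the `registry.example` image reference, and the URL shape are hypothetical, and the scan step is a stand-in for a real scanner rather than any actual Jenkins or Aqua Security API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """A built container image, identified by a hypothetical registry reference."""
    service: str
    image: str
    scanned: bool = False


def build(service: str, commit: str) -> Artifact:
    # Tag the image with the short commit hash so every build is traceable.
    return Artifact(service=service, image=f"registry.example/{service}:{commit[:7]}")


def scan(artifact: Artifact, blocked_images: frozenset = frozenset()) -> Artifact:
    # Stand-in for a real scanner: reject images on a block list.
    if artifact.image in blocked_images:
        raise ValueError(f"scan failed for {artifact.image}")
    return Artifact(artifact.service, artifact.image, scanned=True)


def deploy(artifact: Artifact, environment: str) -> str:
    # Refuse to deploy anything that skipped the scan step.
    if not artifact.scanned:
        raise RuntimeError("unscanned artifacts cannot be deployed")
    # The developer only ever sees this URL, not where the container runs.
    return f"https://{environment}.example/{artifact.service}"


def on_commit(service: str, commit: str, environment: str = "dev") -> str:
    """The only entry point a developer triggers: commit in, URL out."""
    return deploy(scan(build(service, commit)), environment)
```

The design point is that `on_commit` is the whole developer-facing surface. Compute, memory, and storage choices never appear in its signature, which is the opposite of the dropdown-driven service catalog.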

00:18:21

And then security was everywhere, with Aqua Security basically scanning containers as well as making sure that when a container was running, it was secure. Once everyone found out that a hundred percent of things were going our team's way, we got even more work. Part of this work was merger integration work, customer product work, and also the refactoring of APIs. We were also going all in on international expansion with a partner. This is the phase which I call go faster. The issue is that even though we had done all these amazing things and we wanted to go ahead and use Kubernetes, the SVP wasn't all that sold on Kubernetes. On top of that, because we were a small team and developers were choosing us a hundred percent of the time, we ended up having an unbelievable workload. And I thought to myself, oh my goodness, did anyone get the memo?

00:19:28

So fortunately, or unfortunately, depending on who you are, we found out one day that our multi-billion-dollar website went down because of a release. Everyone's on the call. The SVP is hot. People are talking about how to roll back release changes. Of course, when these calls happen, it starts out with infrastructure, and then you bring on more and more people as time goes on. The SVP and the people on the call, who were mostly infrastructure people, assumed it was going to take hours to roll back the change. Well, we meekly shared our screen. We pushed one button, which is the easy button, and we rolled back the change. And literally, we blew everyone away, because everyone was thinking that it would take hours to roll back the change, and instead it took minutes. And this is one of my favorite diagrams that I used to put up.

00:20:30

I created it in 2007 after attending Google I/O, when I was obsessed with machine learning, and I showed this to the SVP later on. You already know how operations and development people go: infrastructure people are always trying to figure out why developers are doing stuff, and developers always say, you know, it's the network. So I go to my SVP and I say, well, guess what I own? I asked for dev tools, and I have most of them. I need some more, but in the end, all these tools can be used to build models to do three things: grade commits, grade developers, and also grade the team, and, based on the things that they do, either reject or approve a release. And then I said, in order to do that, I need a couple of tools. I need Jira, I need Jenkins, I need Git, I need Artifactory.

00:21:35

I need all these various tools. Then we can go ahead and integrate them, and then they can begin to grade these different things that developers do. Well, guess what? That blew my SVP's mind, and he loved it. As a result, we got the green light to continue to consolidate, and we formed a new infrastructure organization. This is another one of my favorite slides that I go to. In terms of value, we can talk about customer service, and customer service is what I call concierge service. It's high touch, high cost, very low value. A lot of times when people think about DevOps organizations, they think of sitting developers and operations people together. They might be in the same room, and through osmosis all this amazing stuff happens.

00:22:39

I really believe that the amazing stuff happens when there's clear communication between operations and development. For example, if we begin to create APIs that developers can use in order to consume infrastructure, developers are able to go faster, and as a result, their happiness goes up. One of the people I love listening to is Kelsey Hightower. Kelsey Hightower talks about NoOps, and it's not removing the operations organization, but forming product teams. These product teams control the end-to-end flow of how to provide a product to operations and development. As a result, we think about how to productize things. So when I think about the value scale of providing DevOps, I think of customer service, which is low value. Then you go to the platform, which is location agnostic. Not only that, but instead of being an artisan-focused team, you're now focused on process. Then CI/CD, which allows a team to be the enablers of all the people who are experts in their field.

00:23:59

So we can begin to standardize tools and platforms. We can begin to enforce good practices, and it allows people to specialize. And then finally, base images. Base images contain enterprise standards. They're opinionated, they're automated. But the great thing about them is that they are a contract between development and operations. See, operations sometimes gets involved when a development team has a hundred steps to go and they're on step 90. And no one likes being told on step 90 that they have to go back to step one. So our base images are a way to avoid that: if you use them, and everything works in conjunction with our pipeline, you can deploy to our platform as fast as possible. So the dial tone is that developers should understand how to use your platform.

00:25:01

In our case, we used key-value pairs with Git, and this controlled the pipeline as well as Kubernetes, because Kubernetes is hard. We focused on Legos versus hands-on work. The specialization was that the team could focus on innovation. In other words, developers could focus on providing business value while we worried about all the rest of the stuff. And then automation: taking all these pieces together, over and over again, to build something amazing. As a result, we went from two teams that had separate focuses to one team where we combined all these things: cloud, base images, infrastructure, automation, shared services, general programming, platform, and CI/CD, which was awesome. And the results were, again, that we were able to support lots of developers. We were able to provide microservices and containers that were running in production. We processed a lot of money on our containers, and we had Kubernetes in production when most people were just playing with it in a lab.
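
The talk does not show what the key-value pairs in Git looked like, so the following is only a guess at the general shape: a small text file of `key=value` lines that a pipeline reads and merges over platform defaults. The file format, the keys (`replicas`, `port`, `health_path`), and the function names are all hypothetical.

```python
from typing import Dict

# Hypothetical platform defaults; developers override only what they need.
DEFAULTS: Dict[str, str] = {"replicas": "2", "port": "8080", "health_path": "/health"}


def parse_pairs(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blank lines and # comments."""
    pairs: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {raw!r}")
        pairs[key.strip()] = value.strip()
    return pairs


def resolve(text: str) -> Dict[str, str]:
    """Merge a developer's pairs over the platform defaults."""
    settings = dict(DEFAULTS)
    settings.update(parse_pairs(text))
    return settings
```

With a scheme like this, a developer commits a handful of values and the platform team owns everything that turns them into pipeline and cluster configuration.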

00:26:06

As a result, we were able to do thousands of builds a day, and we were able to do what some businesses try to do, and that is be multi-cloud. Most people can't get one cloud provider right. We actually got five cloud providers right. So if I were to give people advice: one, I'd say take calculated risks. In other words, I didn't know whether or not my foray into being a contractor would be good. At the time, I thought it was actually the worst mistake I ever made. But I went ahead and did it because I firmly believed in the technology, and I thought it would be a paradigm shift. Not only that, form teams of like-minded individuals. When you feel down and you feel like you're fighting against everyone, you can look to the person right next to you and feel better.

00:27:10

Not only that, but digital transformation, unfortunately, is politics. I was offered a VP role in 2017, and I didn't take it, because I wanted to be hands-on-keyboard and I didn't like politics. The issue is that sometimes you have to take a promotion if you need to control your own destiny. For me, I love technology, but I really think I did a disservice to my team. Finally, start slow. You don't need to run Kubernetes today, but most workloads can go inside of a container. A lot of people jump to microservices and doing all this amazing stuff. It's okay to take baby steps. We took baby steps a long time ago, and that was the reason we were able to be where we are now. And then finally, dial tone: that means abstract. Specialize: have a team obsessed with release engineering. And finally, automate, automate, automate. Resources are normally scarce.

00:28:18

And so the only way to overcome that is to automate. And this is the help I'm looking for: everyone needs to convince Gene to allow me to do a container, Kubernetes, and CI/CD pipeline deep dive. Because in the end, when I was at the hotel company, we were on our gen-four containers. They were cloud portable. They were scalable. Health checks were built in. We had tests for latency versus CPU, and certs were no longer in the application or managed by developers. On top of that, we focused on circuit breaking, and we had APM built in, and zero trust. And finally, our images were very small. Again, that's all about container hygiene, and our sidecars were used to enhance everything. The other thing that I'm really passionate about is pipeline security. In other words, how do you secure an end-to-end CI/CD pipeline?

00:29:23

So have plugins in the IDE that give feedback all the time to developers. Have a remote repository for libraries and frameworks; most people use Artifactory or Nexus to do that. Have container scanning over and over again; Aqua Security is great for that. And then when you talk about running, or day-two operations of, containers, you have to think about: how do I secure the host? How do I whitelist commands? And how do I do forensics, because containers are ephemeral and they're dying all the time? The other thing I would talk about is that it's very important to use environment variables, because most of the time people do env prod or env QA, or use env equals dev, and the issue is that tree folder structures are horrible to maintain. The other thing is that configuration should be a separate pipeline from your artifact. And finally, environment variables should be used. So if you can convince Gene to allow me to do that type of deep dive, that would be amazing. And in the end, thank you so much for your time.
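
The point about environment variables versus per-environment folder trees can be illustrated with a minimal sketch. It assumes hypothetical variable names (`APP_DB_URL`, `APP_LOG_LEVEL`, `APP_TIMEOUT_SECONDS`) and is not taken from the speaker's platform: the same artifact reads its settings from the environment at startup, so dev, QA, and prod differ only in the values injected, not in the files baked into the image.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    db_url: str
    log_level: str
    timeout_seconds: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables; names here are hypothetical."""
    try:
        db_url = env["APP_DB_URL"]  # required: no safe default exists
    except KeyError:
        raise RuntimeError("APP_DB_URL must be set") from None
    return Settings(
        db_url=db_url,
        log_level=env.get("APP_LOG_LEVEL", "INFO"),
        timeout_seconds=float(env.get("APP_TIMEOUT_SECONDS", "5")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to test, and failing fast on a missing required variable surfaces a bad deployment at startup rather than at the first request.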