Las Vegas 2019

What Tech Leaders Must Know About Microservices

If your team practices DevOps, it's only a matter of time before they say "We want to move to microservices".

How do we, the tech leaders, arm ourselves with enough knowledge to make good decisions on the move to microservices? Many times engineers will emphatically push for this move because microservices represent the bleeding edge. They do not always assess the value of this change in the same way we do.


Using our real-life experiences at Comcast, we will answer the following questions to help tech leaders make good decisions:

1. What are microservices, exactly?

2. What are the main reasons to move to microservices?

3. What are reasons to NOT move to microservices?

4. What maintenance considerations should my team prepare for?

5. Where are the hidden costs that come with microservices?

At the end of this talk, you will have an appropriate level of understanding of microservices and will be prepared to make informed decisions with your team.


Leslie Chapman is a Distinguished Engineer and Architect for Comcast's X1 platform. She has a passion for inspiring young women to explore STEM fields. When she's not busy designing the future of television, she is at home taking care of her 2 cats.


Michael Winslow picked up his love for programming when he was 10 years old writing GW-BASIC code on his Tandy 1000. With his passion for designing simple solutions to complex problems, Michael has played key roles at companies like Aramark, Ortho-McNeil, Oracle and Xfinity Mobile. He is currently a DevOps advocate, Agile enthusiast, and dedicated people leader for the Core Applications group at Comcast.

Leslie Chapman

Distinguished Engineer, Comcast

Michael Winslow

Director, Comcast

Transcript

00:00:02

Thank you all so much for being here and learning about what tech leaders need to know about microservices. It's fantastic. So my name is Leslie Chapman. I am a Distinguished Engineer at Comcast. What does that mean? It means I get to write code and I don't have to worry about much other than that. So it's kind of like the best job in the entire universe. And this story, this explanation that we're going to walk you through, is super near and dear to my heart because we are going through this right now on my team. So

00:00:42

Get ready. All right. And I'm Michael Winslow. And just as an homage to Philadelphia, I'm going to do the classic DJ Jazzy Jeff and the Fresh Prince stance and say: she's the coder, I'm the manager. So I'm a director at Comcast. I'm what's called a used-to-be coder. And my fun fact is I still code, just for management tasks. I try to do it in a way that my leader doesn't realize that it's a coded report that he gets; he's just like, wow, you're so detail-oriented in your reports. And I'm like, yeah, it's generated. So feel free to keep in touch with us on Twitter or LinkedIn. And I think we'll get started. I want to get started. All right. Tell the nice people about Comcast.

00:01:30

So Comcast is in Philadelphia, right? Downtown. We work in a sparkly new building. It's kind of amazing. It's called the Comcast Technology Center and it's just full of a ton of nerds, which is super fun. We have a gym and a cafeteria, and we build amazing products that millions of people use every night when they go home and relax. And while a lot of people might think it's just television, it's not; it's kind of your friend, right? Come on. You can all admit it. Who loves their TV? Come on. Yeah. Yeah. All right. All right. So it's the most fulfilling job in the universe.

00:02:17

We love it.

00:02:19

And we have a huge footprint that you probably don't even know is ours. You know about Comcast as a cable provider. You probably know about us as a high-speed data provider. We have a lot of apps that you can use to interact with your xFi, I'm sorry, with your high-speed data. But a lot of people don't know we recently got into the mobile business. We also have Comcast Spotlight, which does advertising for local businesses in the Philadelphia region. There's too much going on on this slide. I can't explain it all.

00:03:03

All right. Let's just... And then, how did we expand recently? Who did we buy?

00:03:08

Kind of a big deal. And we have all these networks that people don't know about. We employ engineers far beyond software engineers, like mechanical engineers at our Universal theme parks. And one of the things that people don't really realize about Comcast is that we are a technology company, not a cable company. So let's just dispel that.

00:03:34

Absolutely. Absolutely. And in true DevOps form, we don't like this chart, for one main reason. This is what we want the chart to look more like: all those lines disappearing, right? There should be no difference. We should be able to talk across all those lines. And we work every day to make sure that those lines get blurred and then eventually go away. All right, now it comes to me.

00:03:58

And I'll hand it over to him. He's the DevOps guy. I'm just here for the...

00:04:02

I'm not sure how I got that moniker at Comcast, but recently I've been known as the DevOps guy all the time. I've been coming to DevOps meetups and different events for years and years. Last year was my first time at DevOps Enterprise Summit, and now I am completely honored to be up here. I'm a huge DevOps fan. The one thing I kind of didn't like was the name DevOps, because it makes you think it's just about Dev and Ops, and it's so much more than that. This is the third day, so you've probably heard this many times before. What I like is the acronym that John Willis and Damon Edwards came up with: CAMS. They've expanded it to CALMS recently with Jez Humble, but I like to keep things simple, and I don't take on new letters unless I have to.

00:04:47

And so it's culture, automation, measurement (which is the thing that so many people forget about here), and then sharing. We gave you ideas of how we want to share on the previous slide; we want to make sure that we share between companies. But I think this describes my idea of DevOps so much better than the word DevOps itself. So you're here for microservices. We're going to go through some fun slides here, with a little bit of background about how I originally came up with this deck, and then we worked on it together. I started as a principal engineer at Xfinity Mobile. I'm sorry, that's a lot smaller than I thought it would be down there; I'll just read along with you. And then one day the team decided to move to microservices, and we grew fast. And then I ended up taking a leadership role.

00:05:41

And when I was in the leadership role, I talked to one of my upstream leaders. And I said, by the way, for the last two months on our time sheets, we've been putting "microservices." Do you know what microservices are? And he said, no. And I said, what do you do if your boss asks you about this microservices thing? He's like, I just say the engineers told me we had to do it. It doesn't sound great. So this whole deck started off as a way to teach my leader what microservices are, and we're going to go through it. I had two unique perspectives because of this leadership role that I took. I thought microservices would be cool because it's cool tech, but I also wanted them to prove out their business value. When we made this change, the microservices in the end worked out well, but it wasn't always sunshine. So this is the part that I love: I get to make the audience more interactive. We get to make a choice here. Would you rather hear about the lessons that we learned along the way moving to microservices? Or do you just want to hear about how great the team was? Red pill or blue pill?

00:06:47

Sorry, I have a reputation, and there's no way I would come up here and not talk about what a great team we had putting together microservices. I promise we'll keep it short. This is my favorite t-shirt that I always wear around; you saw me in it last night, if you were there. It was just such a great, amazing team that put it together. This slide was part of my lightning talk, so it has the 15-second timer. One thing I definitely want to point out is the diversity of the team, and I mean all kinds of diversity. There's like an old guy there, right? And so, I'm old. You're old, and you're actually not in there. So, so,

00:07:33

But seriously: microservices. I don't know what that means.

00:07:40

Gotcha. So, a quote that I like to use... Do you mind if I stand over here for a second? This kind of feels like it's in the way. My uncle, one of my first mentors, had a great quote: a true seasoned professional should be able to speak intelligently for five minutes about any subject. That's something I've carried with me all my life. I don't know how many times you've sat down for the first time with somebody, and they talked about a subject that you'd never heard about before, and you just decided: I don't know anything about it, I'm not interested, that's not me. I want to get my five minutes in and figure out from that person how I can speak intelligently about it. And that's how I felt when I wanted to make this deck for my leader, who didn't know about microservices.

00:08:23

So, Michael, by the way, I am playing the role of the leader that doesn't know. I actually do know what a microservice is, but for the purposes of this talk, I do not. So, Michael, why do we need to do this? Like, what's this going to bring to our business? What's the difference between a monolith and a microservice?

00:08:45

Glad you asked. So, monolith versus microservice: these are the words you always hear going around. Many of you actually know this already, but I'm going to do it for the couple of people who won't admit that they don't. We used to call this thing just "the service," right? Then, when microservices came around, the thing that was just a normal service before was given the moniker "the monolith." All right. And what is the monolith? The monolith takes in an HTTP request. It comes to a controller; that controller calls a service, which has the business logic in it; that service calls a repository of some kind. And that repository would normally go to a database, but sometimes it may go to other services. Okay. And in your controller, you would define all kinds of endpoints, in this case eligibility, user, and device.

00:09:30

And if you're writing code in a very systematic way, you will have matching business logic for each of those endpoints, and possibly even exactly matching repository layers, which map to three particular tables that you might have in your database. Okay. Of course there's variation that can happen, but this is a good example of a standard monolith. Okay. So this little thing is a microservice, but I want you to be aware that just because it's called a microservice doesn't mean that it's physically smaller than the monolith you started with. Many times you have so much boilerplate code in a microservice that the actual file size could be larger than the monolith you started with, and that was definitely true in our case. So an HTTP request would come into a microservice and, instead of hitting a controller that had all the endpoints in it, it would hit a very specific controller that had just the one endpoint, for device. Then it looks very similar as it goes down, and it has its own data store. Like I said, things vary along the way, but this is a typical example of a microservice. And then we would have two more microservices to represent the other items that we had: eligibility and user. Okay. So you start to understand what a microservice is.
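To make the layering above concrete, here is a minimal Python sketch, assuming in-memory dictionaries in place of real databases and plain function calls in place of HTTP. Every class and function name (DeviceRepository, DeviceService, Controller) is hypothetical. It shows the same controller, service, repository stack used two ways: a monolith controller exposing eligibility, user, and device, and a device microservice controller exposing only device.

```python
# Illustrative sketch; all names are hypothetical, not from the talk's codebase.


class DeviceRepository:
    """Data access layer: stands in for the device table with an in-memory dict."""

    def __init__(self):
        self._rows = {"d1": {"id": "d1", "model": "phone-x"}}

    def find(self, device_id):
        return self._rows.get(device_id)


class DeviceService:
    """Business logic layer: validates the lookup and delegates to the repository."""

    def __init__(self, repo):
        self.repo = repo

    def get_device(self, device_id):
        device = self.repo.find(device_id)
        if device is None:
            raise LookupError(f"unknown device {device_id}")
        return device


class Controller:
    """Maps endpoint names to service methods, like an HTTP controller's routes."""

    def __init__(self, routes):
        self.routes = routes

    def handle(self, endpoint, arg):
        if endpoint not in self.routes:
            raise LookupError(f"404: no endpoint {endpoint!r}")
        return self.routes[endpoint](arg)


device_service = DeviceService(DeviceRepository())

# Monolith: one controller in one process exposes every endpoint.
monolith = Controller({
    "eligibility": lambda user_id: {"user": user_id, "eligible": True},  # placeholder logic
    "user": lambda user_id: {"id": user_id},                             # placeholder logic
    "device": device_service.get_device,
})

# Device microservice: the same controller -> service -> repository layering,
# but its controller exposes only the one device endpoint.
device_microservice = Controller({"device": device_service.get_device})
```

In a real split, the device microservice would run in its own process with its own data store; here it shares the in-process DeviceService only to keep the sketch short. Eligibility and user would each become a microservice of the same shape.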

00:10:43

Okay, but here's the thing. I think what you're saying is we have all these people on this team and we can just kind of break them off and have them work on microservices.

00:10:55

You're getting it. So this might be a reason to move to microservices: you have a great number of developers and they keep stepping on each other's feet in the monolith. You might want to break those teams up and give them independent projects to work on. Okay. So let's move this to the side a little bit and talk about a challenge that you have when you move to microservices. If you remember the monolith, there was one particular HTTP entry point that requests came in through, and from there you could get to several different endpoints. Okay? If you look at the diagram of the microservices, you actually have three distinct services here. And when you're migrating users over, you can't tell them, instead of calling one particular endpoint, call three. So you need a traffic cop there: you need to add a gateway. Okay. We used Netflix Zuul as our gateway at the time. Just to let you know, Amazon API Gateway is the equivalent of what you would build out yourself with Netflix Zuul. Okay. And Zuul is an open source API gateway. We all know what that is. All right. Any questions?
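The gateway's "traffic cop" job can be sketched as a table of path prefixes. This is a toy, in-process illustration, not Zuul's or Amazon API Gateway's actual API or configuration; ApiGateway, add_route, and the backend lambdas are hypothetical stand-ins for routed HTTP services.

```python
# Illustrative sketch; ApiGateway and its methods are hypothetical.


class ApiGateway:
    """Single public entry point that forwards each request to the owning service."""

    def __init__(self):
        self._routes = []  # (path prefix, backend callable)

    def add_route(self, prefix, backend):
        self._routes.append((prefix, backend))
        # Check longer prefixes first so the most specific route wins.
        self._routes.sort(key=lambda route: len(route[0]), reverse=True)

    def handle(self, path):
        for prefix, backend in self._routes:
            # Match "/user" and "/user/..." but not "/users".
            if path == prefix or path.startswith(prefix + "/"):
                return backend(path)
        return 404, "no route"


gateway = ApiGateway()
for name in ("eligibility", "user", "device"):
    # The default argument binds the current name; a plain closure would capture the last one.
    gateway.add_route(f"/{name}", lambda path, name=name: (200, f"{name} service handled {path}"))
```

Clients keep calling one host; only the gateway knows that `/device/d1` now lives in a separate service.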

00:11:56

Is this going to scale? I mean, our monolith is working. Why are we messing with it? Are we going to be able to deliver on the same cadence? Is this going to take us longer?

00:12:06

I'm glad you asked. Okay. Let's say, for example, that the eligibility microservice starts getting traffic that way outpaces the rest of your microservices. What you can do with microservices is just add two more eligibility microservice instances at this point. Okay. And what you're going to need to do at that point is discover that there are new instances there. Okay. If you're still using the Netflix stack, you're going to use something called Eureka to do this. All right. And if you're on the Amazon side, Amazon uses ECS for its service discovery, and there are several other things that you can put into Amazon as well. All right. So once it discovers there are two new eligibility microservices, it expands that gateway and registers them behind it so that they can start taking on traffic. And that's where Netflix Ribbon comes in. Ribbon gives you the ability to say: I want to make sure that I evenly balance the load of eligibility calls between my three eligibility microservices. Okay. In Amazon, they use Elastic Load Balancing. Okay.
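A rough sketch of what discovery plus client-side load balancing does, assuming a simple in-memory registry. ServiceRegistry and RoundRobinBalancer are hypothetical names that only imitate the roles Eureka and Ribbon play; they are not those libraries' APIs, and the addresses are placeholders. Because the balancer re-reads the registry on every call, instances registered after a traffic spike join the rotation automatically.

```python
# Illustrative sketch; class names and addresses are hypothetical.
import itertools


class ServiceRegistry:
    """Toy discovery service: service name -> list of live instance addresses."""

    def __init__(self):
        self._instances = {}

    def register(self, name, address):
        self._instances.setdefault(name, []).append(address)

    def deregister(self, name, address):
        self._instances[name].remove(address)

    def instances(self, name):
        return list(self._instances.get(name, []))


class RoundRobinBalancer:
    """Client-side round robin: spread calls evenly over registered instances."""

    def __init__(self, registry, name):
        self.registry = registry
        self.name = name
        self._counter = itertools.count()

    def choose(self):
        # Re-read the registry each time so newly registered instances are picked up.
        instances = self.registry.instances(self.name)
        if not instances:
            raise RuntimeError(f"no instances of {self.name}")
        return instances[next(self._counter) % len(instances)]


registry = ServiceRegistry()
registry.register("eligibility", "10.0.0.1:8080")
balancer = RoundRobinBalancer(registry, "eligibility")

# Traffic spikes: two more eligibility instances come up and register themselves.
registry.register("eligibility", "10.0.0.2:8080")
registry.register("eligibility", "10.0.0.3:8080")
picks = [balancer.choose() for _ in range(6)]  # each instance is chosen twice
```

A production balancer would also skip instances that fail health checks; this toy version only rotates through whatever is registered.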

00:13:16

What happens when traffic just gets... how do we really load balance this? And what happens if one of those nodes goes down?

00:13:25

Oh, I'm glad you asked: fault tolerance and the circuit breaker pattern. Let's talk about it. Okay. Let's say the database for one of your eligibility microservices decides to fail, and that causes the microservice to fail. Okay. We might retry once, retry twice, retry three times, and then you're going to want something that says: I want this out of my mix. Cheerio. Goodbye. Hystrix is what handles that on the Netflix stack. It's basically graceful fault tolerance: it will take that one eligibility instance out of the rotation. If you're hand-rolling your own, sometimes you may actually have to start up another eligibility microservice yourself. As with all things on Amazon, it's basically just a check of a box, and AWS handles it all for you.
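The retry-then-eject behavior described above can be sketched as a small circuit breaker. This is a hypothetical minimal version for illustration, not Hystrix's API; the names are made up, and the threshold of three consecutive failures simply mirrors "retry once, twice, three times."

```python
# Illustrative sketch; all names are hypothetical, not Hystrix's API.


class CircuitOpenError(Exception):
    """Raised instead of calling an instance that has been taken out of rotation."""


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; any success resets the count.

    Real libraries (Hystrix, Resilience4j) also add timeouts, a half-open state
    that probes for recovery, and fallbacks; those are omitted here.
    """

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0
        self.is_open = False

    def call(self, fn, *args):
        if self.is_open:
            raise CircuitOpenError("instance is out of rotation")
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.is_open = True
            raise
        self.failures = 0
        return result


def call_with_retries(breaker, fn, *args, attempts=3):
    """Retry once, twice, three times; stop early if the circuit opens."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return breaker.call(fn, *args)
        except CircuitOpenError:
            raise
        except Exception as exc:
            last_error = exc
    raise last_error


def in_rotation(breakers):
    """Names of instances whose circuit is still closed."""
    return [name for name, breaker in breakers.items() if not breaker.is_open]


def broken_eligibility_db(user_id):
    raise ConnectionError("eligibility database unavailable")


breakers = {"eligibility-1": CircuitBreaker(), "eligibility-2": CircuitBreaker()}
try:
    call_with_retries(breakers["eligibility-1"], broken_eligibility_db, "u1")
except ConnectionError:
    pass
print(in_rotation(breakers))  # ['eligibility-2']
```

After the third failure, eligibility-1 is out of the mix and further calls fail fast with CircuitOpenError instead of hitting the broken database again.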

00:14:11

Okay. This is sounding really great. So what you've told me so far is that we can be super fault tolerant, and we can take this big team that we already have and assign them out to work on these microservices. So it's not really going to be a hit to my productivity. Where's the "but"?

00:14:36

Where's the "but"? Well, that's a great breakdown of what you've learned so far. So now I think we should let the audience decide. Would you rather have us tell you what our team learned, or have us tell you what our team learned? Red or blue, guys? Right. This time you got me. All right. So if you go out there and try to start your journey on microservices, you might find a few very snarky comments on the internet, and I'd like to show you one of them. When I first started looking, there was a great flow chart, a decision maker on whether or not you should move to microservices. And it looks something like this: Are you Netflix? No.

00:15:18

So that was a joke back then. It's still kind of a joke, but there are definitely a lot of great use cases for moving to microservices. A lot of people make the mistake of jumping in too soon, though. Okay. I want to quote another friend of mine, a developer who was on Xfinity Mobile when we did this: "Complexity begets complexity. Soon tools are needed to manage the complexity. Then tools are needed to manage the tools." I thought that was awesome, so I decided to put it in here. And if you ever want to talk to him about his quote, our Emerly is how it's pronounced. All right, cool.

00:15:53

Here's the thing. I think our Splunk costs went up. Really? Yeah. Oh, how are we going to figure this out?

00:16:03

All right. Maybe we can figure out how that happened with microservices. Let's take an example here. We need to call our API to determine if a particular user is eligible to purchase a device. This is a very good, very real-world example. First, the scoreboards are at the top, guys. On your left, my right, this is a monolith, and here's how that would happen. These dots will come in; they represent process boundaries. Every time we cross a process boundary, we're going to make a syslog entry, so the red dots are the syslog entries. The HTTP request comes into the eligibility endpoint. It goes to the service layer, it calls the user service, and it goes to the database. We have made two hits. Now we've got the user information, and we want to get the information about the device to find out if they are eligible for that device.

00:16:58

So there's a lot of paraphrasing here, but let's say we have three process boundaries, which gives us three syslog entries. All right. Now on the microservices side, the first thing you're going to notice is a lot more process boundaries between the microservices. So we come into the API gateway; that's one. The eligibility call, that's two. Eligibility does its processing and calls the user microservice. It goes out through the gateway, because that's the pattern that's usually followed. Okay. And then it goes to the database. We're really building them up here, aren't we? Then we've got to go to the device service, and we've got to get to its database. Okay. So for the same request in a microservices architecture, you've caused six syslog entries instead of three. Okay. Also, we don't get into this very much, but the increased process boundaries also create more surface area for possible security threats.
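The syslog arithmetic from the walkthrough can be written down directly. This sketch assumes, as the talk does, one syslog entry per process-boundary crossing; the hop names are illustrative, and MONOLITH_HOPS, MICROSERVICE_HOPS, and syslog_entries are hypothetical.

```python
# Illustrative sketch; names are hypothetical. One syslog entry per boundary crossing,
# following the simplified count in the eligibility walkthrough.
MONOLITH_HOPS = [
    "client -> monolith (eligibility endpoint)",
    "monolith -> database (user lookup)",
    "monolith -> database (device lookup)",
]

MICROSERVICE_HOPS = [
    "client -> API gateway",
    "API gateway -> eligibility service",
    "eligibility service -> user service (via gateway)",
    "user service -> user database",
    "eligibility service -> device service (via gateway)",
    "device service -> device database",
]


def syslog_entries(hops, requests=1, entries_per_hop=1):
    """Syslog volume if every boundary crossing writes `entries_per_hop` lines."""
    return len(hops) * entries_per_hop * requests
```

For a million eligibility checks, `syslog_entries(MONOLITH_HOPS, requests=1_000_000)` is 3,000,000 entries versus 6,000,000 for the microservices path. Counting each pass back through the gateway as its own crossing would push the microservice number higher still.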

00:17:50

All right. So I was feeling real great about this earlier, but now you're making me a little worried. How can we make sure it's all worth it?

00:18:01

Calm down, boss. All right. We're going to be fine. Okay. But I've got to be honest with you: we're not done with the logs conversation yet. Fair enough. So, application logs.

00:18:15

Is

00:18:15

Tell me more. Okay. So let's talk about how we used to troubleshoot in production. How did we find out that things were going wrong? Using that same scenario, you come into the eligibility service and it tries to go to the user database, and there's a failure right there. Has anybody ever seen anything that looks like this?

00:18:35

I'm taking off my boss hat and putting on my actual developer hat. I love stack traces.

00:18:40

Stack traces. Yeah. Well, you might be snarky about that, but trust me, stack traces might be better than the alternative. So we did find out that the username was null, and that's why the failure was happening. But we were also able to climb the stack here and say: well, that happened in the user repository, which was called by the user service, which was called by the eligibility service, which was called by the controller. We were able to go all the way back up that stack to find the path of the error. Okay. With one app log entry, we were able to do that. Why? Because the stack trace has visibility into everything inside the monolith. All right. Let's do the same thing over here. In the API gateway, we call the eligibility microservice. It comes through and fails in the repository layer, and this is what you see. I want you to notice that the only thing it has here is the repository layer. Actually, I think somebody changed this; these should all say repository, I believe. Right. Sorry, that was correct. Anyway, the reason is that the stack trace only has visibility into the current microservice. All right. So it doesn't even know where the call originated at that point; all context from the previous microservices is lost. All right. And then we had to step in.
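The stack-trace difference can be reproduced in a few lines of Python. This sketch assumes the process boundary is simulated by giving the user service its own entry point (user_microservice_handler); in a real deployment the eligibility service would only see an HTTP error response. All function names are hypothetical.

```python
# Illustrative sketch; all function names are hypothetical.
import traceback


def user_repository(username):
    if username is None:
        raise ValueError("username is null")
    return {"name": username}


def user_service(username):
    return user_repository(username)


def eligibility_service(username):
    return user_service(username)


def controller(username):
    # Monolith: every layer runs in one process, so one stack holds the whole path.
    return eligibility_service(username)


def user_microservice_handler(username):
    # Microservice: this handler is the top of the user service's own process,
    # so its stack starts here; the eligibility caller lives in another process.
    return user_repository(username)


def failing_frames(fn, *args):
    """Call fn and return the function names in the traceback of the error it raises."""
    try:
        fn(*args)
    except Exception as exc:
        # The first frame is failing_frames itself; drop it.
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)][1:]
    return []
```

`failing_frames(controller, None)` returns `['controller', 'eligibility_service', 'user_service', 'user_repository']`, while `failing_frames(user_microservice_handler, None)` returns only `['user_microservice_handler', 'user_repository']`: nothing in the second trace says who called the user service.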

00:20:02

So we have to have a way to solve this, right? My developers need to be able to look at a full stack trace so that we can debug issues.

00:20:12

Don't worry, boss. I've got you covered. All right, and we're going to go fast right now. So way back, maybe about 10 years ago, Google came out with a paper called Dapper that foresaw this problem with microservices. They didn't release an open source implementation; they just came out with a paper that said you're going to need some sort of distributed tracing if you're going to have a very distributed system like this. Then the first to come out with something was Twitter, with Zipkin. And then we came out with something called Money. It's not used much. It's open source, but... never mind.

00:20:48

It works well for us. That's all I've got. And then more recently there's OpenTelemetry, which is probably going to be the new hotness for distributed tracing. Okay. And if you want to talk to the right person about it, Paul Claire is good. He just moved on to another thing. Okay. All right. So yeah, he's amazing. "We used to use distributed tracing to track transactions across applications. Now we need to track transactions within a single application, because everything's distributed now." That's Katherine Rodriguez who said that. All right, I'm going to just go through. Okay. So how does it look? How do we actually track application problems in a microservices world? Something comes into the API gateway, and at that moment you give it a correlation ID. If you put this in production, I would not recommend you start with the number one.

00:21:37

So when it goes to the eligibility endpoint and you log there, you log that correlation ID in your logs. Okay. Then when it does its processing, every time you log something, you should put that correlation ID in there. Then if something goes wrong and you get that final stack trace that doesn't know anything about the previous path, you have this correlation ID, with which you can actually draw out what happened the whole time. So as you can see, where you had one app log entry before with a single stack trace, now you have seven in order to get the same information. Okay. And over time, and this is a very conservative estimate, you could have four times the log file size if you're not careful when you move to microservices. Okay, I'm going to just keep going because of time.
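Here is a minimal Python sketch of correlation-ID logging, assuming the ID travels in a request header. The header name X-Correlation-ID is a common convention, not a standard; CORRELATION_HEADER, handle, and the demo.* loggers are hypothetical. It uses the standard library's contextvars and a logging.Filter so every log line a service writes while handling a request carries the same ID, and it mints a random UUID at the edge rather than starting from the number one.

```python
# Illustrative sketch; header name, function names, and logger names are hypothetical.
import contextvars
import io
import logging
import uuid

CORRELATION_HEADER = "X-Correlation-ID"
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Copies the current request's correlation ID onto every log record."""

    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True


def make_logger(name, stream):
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(CorrelationIdFilter())
    handler.setFormatter(logging.Formatter("%(correlation_id)s %(name)s %(message)s"))
    logger.addHandler(handler)
    return logger


def handle(logger, headers, work):
    """Service entry point: reuse the caller's ID, or mint one at the edge (the gateway)."""
    cid = headers.get(CORRELATION_HEADER) or str(uuid.uuid4())
    token = correlation_id.set(cid)
    try:
        logger.info("request received")
        # Pass the ID along to anything this service calls downstream.
        return work({CORRELATION_HEADER: cid})
    finally:
        correlation_id.reset(token)


# Three "services" writing to one stream, standing in for a central log store.
log_stream = io.StringIO()
gateway_log = make_logger("demo.gateway", log_stream)
eligibility_log = make_logger("demo.eligibility", log_stream)
user_log = make_logger("demo.user", log_stream)


def user_work(headers):
    user_log.error("username is null")
    return 500


def eligibility_work(headers):
    return handle(user_log, headers, user_work)


def gateway_work(headers):
    return handle(eligibility_log, headers, eligibility_work)


status = handle(gateway_log, {}, gateway_work)
```

Searching the central log store for that one ID reconstructs the gateway, eligibility, user path that a single stack trace no longer shows. Tracing tools such as Zipkin, Money, or OpenTelemetry go further with trace and span IDs, but the propagation idea is the same.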

00:22:27

All right, here we go. So what you used to do once in the monolith, you now have to do many times. What do I mean by that? This is a real spreadsheet that I put together for that same boss to show that when we had one monolith, these were all the parts that worked together; in a second, we'll talk about what those are. Basically, all you need to know is that it was green across the board. All right. But for all the other parts, when we started moving to microservices, we were so enthralled with moving to microservices that we weren't checking all the boxes of the things that needed to get done. For example, this area had to do with code review and continuous integration. Instead, in the beginning, we were just dropping WAR files and JAR files onto the server.

00:23:07

Okay. Continuous deployment to all environments with automated testing: we were like, we'll get to that later. Firewalls and observability: huge, right? Because before, everything could just talk within one service. Now you might actually have a firewall problem just talking to another piece of the system. And then security scans were something that we had in place, and we had to basically set that out, these vulnerability scans. So my suggestion would be: do not create all your microservices at once. Yesterday Scott Prugh mentioned the strangler pattern. Go with it, man. It's an awesome way to strangle the monolith and move over to microservices one at a time. Okay. How do you decide which ones to take care of first? You might want to monitor usage, find your most utilized endpoints, and prioritize your move to microservices that way.
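Here is one way to sketch both suggestions together, assuming you can pull request paths from an access log: rank endpoints by call volume to pick what to peel off first, and route migrated endpoints to their new service while everything else still goes to the monolith. migration_order and StranglerRouter are hypothetical names, and treating the first path segment as "the endpoint" is a simplifying assumption.

```python
# Illustrative sketch; all names are hypothetical.
from collections import Counter


def first_segment(path):
    """'/eligibility/42' -> '/eligibility'"""
    return "/" + path.strip("/").split("/")[0]


def migration_order(access_log_paths, already_migrated=()):
    """Rank monolith endpoints by call volume, busiest first, skipping migrated ones."""
    counts = Counter(first_segment(p) for p in access_log_paths)
    return [ep for ep, _ in counts.most_common() if ep not in already_migrated]


class StranglerRouter:
    """Sends migrated endpoints to their new microservice; everything else to the monolith."""

    def __init__(self, monolith):
        self.monolith = monolith
        self.migrated = {}  # endpoint prefix -> microservice callable

    def migrate(self, endpoint, service):
        self.migrated[endpoint] = service

    def route(self, path):
        service = self.migrated.get(first_segment(path))
        return service(path) if service else self.monolith(path)


access_log = ["/eligibility/1"] * 5 + ["/user/7"] * 3 + ["/device/d1"]
order = migration_order(access_log)  # ['/eligibility', '/user', '/device']

router = StranglerRouter(monolith=lambda path: "monolith")
router.migrate(order[0], lambda path: "eligibility microservice")
```

Each call to `migrate` peels off one more endpoint; the monolith keeps serving the rest until it has nothing left to do.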

00:23:55

Okay. And then have a template for your microservices, so that every developer doesn't have to start from scratch every single time they want to create a new microservice. That way everybody knows the starting point. Okay. The change management team wants release notes. Okay. This is an important thing, if you still have a change management team. Things got so crazy that we had to write applications to automate our release notes. And I want to just go over the parts that we had to come up with there. When we checked our code in, we had to have something that automated this section of our release notes, which laid out every single microservice that we had and said: here's the link to the tests for that microservice; here's the link to the Jenkins job that's going to deploy that microservice; here are the Veracode security scans that we were doing at the time, and here's our score for the security scans.
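The automated release-notes report described here, together with the per-release version bookkeeping it implies, could be sketched as below. This is a hypothetical illustration, not the team's actual tool: ServiceEntry, release_notes, blockers, ReleaseHistory, the ci.example.com links, and the minimum security score are all made up. Each release is recorded as a dated manifest pinning every microservice's version, so "what was live on a given date" becomes a lookup.

```python
# Illustrative sketch; all names, URLs, and thresholds are hypothetical.
import bisect
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ServiceEntry:
    name: str
    version: str
    tests_url: str        # link to the test results for this build
    deploy_job_url: str   # link to the CI job that deploys it
    security_score: int   # score from the security scan
    rollback: str         # how to roll this service back ("" if unknown)


def release_notes(entries):
    """Render the per-microservice section the change board reviews."""
    lines = []
    for e in sorted(entries, key=lambda e: e.name):
        lines += [f"{e.name} {e.version}",
                  f"  tests:    {e.tests_url}",
                  f"  deploy:   {e.deploy_job_url}",
                  f"  security: {e.security_score}",
                  f"  rollback: {e.rollback or 'MISSING'}"]
    return "\n".join(lines)


def blockers(entries, min_security_score):
    """Problems that would stop approval; the score threshold is a made-up policy."""
    problems = []
    for e in entries:
        if not e.rollback:
            problems.append(f"{e.name}: no rollback method")
        if e.security_score < min_security_score:
            problems.append(f"{e.name}: security score {e.security_score} < {min_security_score}")
    return problems


class ReleaseHistory:
    """A release is a dated manifest pinning every microservice's version."""

    def __init__(self):
        self._dates = []
        self._manifests = []

    def record(self, released_on, entries):
        i = bisect.bisect_right(self._dates, released_on)
        self._dates.insert(i, released_on)
        self._manifests.insert(i, {e.name: e.version for e in entries})

    def as_of(self, day):
        """Versions that were live on `day`: the latest release on or before it."""
        i = bisect.bisect_right(self._dates, day)
        if i == 0:
            raise LookupError(f"no release on or before {day}")
        return self._manifests[i - 1]


BASE = "https://ci.example.com"  # placeholder CI host


def entry(name, version, score, rollback):
    """Convenience constructor for the example below."""
    return ServiceEntry(name, version, f"{BASE}/tests/{name}/{version}",
                        f"{BASE}/job/deploy-{name}", score, rollback)


november = [entry("eligibility", "1.4.0", 92, "redeploy 1.3.2"),
            entry("device", "2.1.1", 70, "")]
january = [entry("eligibility", "1.4.0", 92, "redeploy 1.3.2"),
           entry("device", "2.2.0", 88, "redeploy 2.1.1")]

history = ReleaseHistory()
history.record(date(2018, 11, 15), november)
history.record(date(2019, 1, 10), january)
```

Here `blockers(november, 80)` flags the device service twice (no rollback method, low score), and `history.as_of(date(2018, 11, 30))` returns the November manifest without digging through commit history.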

00:24:46

And here's the method that we use to roll back. Those were all things that needed to be satisfied by the change control board before we were allowed to roll out. Okay. Some people would say, just get rid of the change control board. We just satisfied them with an automated report. All right. Version control and compatibility: very important to think about, because we ran into this issue in one particular release. This might be what your list of microservices looks like. Okay. Now a release is no longer tied to the version of a single binary file; a release is now an aggregation of all your microservice versions. Okay. And the versions of each microservice can all move independently of each other, so it's really tough to keep track of over time. We ran into it when we sold all of our code to Charter, and Charter said, you know what?

00:25:38

We want the version of your code from November of last year. And I said, good luck. We had to go back and find out what code was checked in at that time, and which version of each particular piece of code it was. It was not easy. It was ugly. And finally, finally, we get to the thing that you care about: burnout is real. All right. Do you mind if I stress this point real quick? All right, this is a lesson we did learn. We had a lot of people. We were under a lot of tight deadlines at Xfinity Mobile. We had not yet released to production, and we had deadlines. And at that same time, we decided we were going to go from a monolith to microservices. The World Health Organization has recently recognized burnout as an occupational phenomenon, so you can look it up. I don't want to belabor that point, but try not to overwork your people. All right. So let's end with some dos and don'ts. Do you want the dos with the don'ts?

00:26:29

Yeah. I'd like to add just a couple of things to really drive it home here. Number one, one of the great things about moving to microservices is how unit testable and automation-testable they are, because it really is separating your front end from your business logic. That's going to get you to speedier deployments, because you can actually depend on the unit tests that your people are writing as they write these microservices to make sure that they are going out with quality. The other thing that I would like to stress about moving to microservices is that, again with the separation of the front end from the business logic, it enables people to really focus on a core area of expertise. And it is important to think about a microservice as something that employs business logic. So if you have a data service that you're hitting that's giving you every piece of data that you need, and you don't need to munge it or do a union on it, then perhaps that's not a great rationale for a microservice. But when you are hitting multiple data services and having to employ your own business logic on top, that's the sweet spot for using a microservice.
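To illustrate why isolating business logic helps testing, here is a minimal sketch in which the eligibility rule receives its data services as parameters, so a unit test can pass fakes instead of making network calls. The rule itself (good standing plus a matching market), the UserClient/DeviceClient protocols, and the fake classes are all hypothetical.

```python
# Illustrative sketch; the rule, protocols, and fakes are hypothetical.
from typing import Protocol


class UserClient(Protocol):
    def get_user(self, user_id: str) -> dict: ...


class DeviceClient(Protocol):
    def get_device(self, device_id: str) -> dict: ...


def is_eligible(user_id: str, device_id: str, users: UserClient, devices: DeviceClient) -> bool:
    """Hypothetical rule: account in good standing and the device is sold in the user's market.

    The data services are injected, so the rule can be tested without any network calls.
    """
    user = users.get_user(user_id)
    device = devices.get_device(device_id)
    return bool(user["in_good_standing"]) and user["market"] in device["markets"]


class FakeUsers:
    """In-memory stand-in for the user data service."""

    def __init__(self, rows):
        self.rows = rows

    def get_user(self, user_id):
        return self.rows[user_id]


class FakeDevices:
    """In-memory stand-in for the device data service."""

    def __init__(self, rows):
        self.rows = rows

    def get_device(self, device_id):
        return self.rows[device_id]


users = FakeUsers({
    "u1": {"in_good_standing": True, "market": "PA"},
    "u2": {"in_good_standing": False, "market": "PA"},
})
devices = FakeDevices({"d1": {"markets": ["PA", "NJ"]}})
```

The logic that combines two data services is exactly the part worth owning in a microservice, and it is also the part these fakes let you test in milliseconds.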

00:28:01

All right. In 20 seconds, I'm going to tell you. Do have a reason to move to microservices, maybe resilience, cost, or application optimization. Don't practice resume-driven development. All right. Do have a plan for monitoring, telemetry, and distributed tracing. Don't count on traditional logging techniques. Do peel off one microservice at a time; try the strangler application pattern. Don't refactor everything at once. Do understand that all the teams are impacted, QA and release management; you're not in a vacuum. Don't say "we're doing microservices, YOLO." Do staff appropriately and monitor employee health. Don't ignore the human impact of your decision. And sorry, I'm going to go back in case you want a picture. All right. And the last one: do encourage DevOps teams to pursue microservices if there are benefits, and don't underestimate the time, effort, and money involved. Thank you, everybody.