Las Vegas 2019

Functional Programming for (Dev)Operations and Infrastructure

Functional programming had mostly been relegated to academic endeavors until recently. What's changed is that our apps are now distributed systems and are simply too complex for us to reason about without help. Programming intent and ceding some of the control to an interpreter makes creating solutions to very hard problems doable.


We've seen an increase in the uptake of functional programming for application development using languages such as Kotlin and Clojure, and now we are seeing some key tenets of functional programming seeping into operations.


Through a series of demonstrations, in this talk I'll draw some parallels between functional programming and tools that are increasingly used on the operations side of the house. If part of DevOps is about building empathy between developers and operators (and it is), then shared core tenets help.


Cornelia Davis

Chief Technology Officer, Weaveworks

Transcript

00:00:02

My name is Cornelia Davis. I work for Pivotal; I'm VP of technology there, which basically means that I get to work on emerging technology and help our customers and our own business tie that to business value. So what I love to do is play with new shiny objects. I worked on our PaaS product, our platform-as-a-service product, Cloud Foundry, for quite a number of years, the first three or four years, and I still work on that. But for the last two and a half years, the shiny object I've been focusing on is Kubernetes. And I do love it, and we'll talk more about that. I'm a computer scientist by background. The reason that I go all the way back to Cal State Northridge and Indiana University is that those two different experiences really shaped who I am as a technologist. As an undergraduate, I went to a greater-LA-based university where I learned lots of pragmatic things like software engineering and programming languages like C and Fortran and COBOL.

00:00:59

Yes, I am that old. Then I started a PhD, never finished it, but started a PhD in the middle of nowhere in Indiana, in a small university town, where I did research. I did theory of computing and programming languages research. And there I became a functional programmer and really came to understand the mathematical foundations of functional programming. And I'll talk more about why that comes back and why that's important. Now, I already told you what I do at Pivotal. I came there as an application developer, and then I very quickly learned that, while we talk about PaaS being about developer productivity, platforms are at least as much about operations as they are about developer productivity. And earlier this year, at the beginning of June, I published my book, which, yes, is the T-shirt that I'm wearing. And I'm going to do a little bit of crowing and jump out of my slides for a moment and come over here to the Unicorn Project offer. You heard Gene talk about it this morning, the 11 books. I could not be more proud to be part of the 11 books.

00:02:18

And Gene actually wrote the foreword to my book, and he has just been a super supporter. So thank you, Gene. All right, so let me jump in then. Sorry, I should have skipped through that slide. One more click.

00:02:40

All right. With the Mac, if I go one too far, then I can't go back. So I want to talk very quickly about the evolution of computing. When I started my career, I started at Hughes Aircraft working on embedded systems. I wrote for a single-threaded, single-processor machine that was going to be embedded on a missile system. I was doing image processing there. And so that was a relatively simple model. Then we started evolving as an industry into having multiple threads, and then we started creating multiple processors that were creating multiple threads. As that evolution went on, we continued to evolve. And now we've got not only a whole bunch of different processes that might be running on a single machine, which allowed us to make assumptions and take advantage of shared components on that machine.

00:03:39

But now we're in this highly distributed environment. So I like to say that we're all distributed systems programmers now. Thirty years ago, when I was in school, there were a few courses on distributed programming, distributed systems, but not that many. It was pretty niche and not a lot of people were doing it. Now, in terms of the languages, the languages have evolved as those physical architectures have evolved. Yes, I did a little bit of assembly language programming. And once in my 30-year career, I actually found a bug in a compiler, where I went down to the assembly level and found a machine instruction that was incorrect, and got that fixed in the compiler. So we did assembly language when it was pretty simple, but I'm sure everyone would agree that there's no way that we can do assembly language programming for distributed systems. It's simply too hard.

00:04:30

So we've had all these different languages that have evolved as the hardware has evolved, to give us a better model in which we can think through the programming. Now, all of those languages that you see at the top are what I call imperative languages. They tend to be very control oriented, and I'll talk a little bit more about what that means. I mentioned that when I went to Indiana University, that was where I discovered functional programming. And the interesting thing is that when I left Indiana and went back to industry, I pretty much left functional programming behind, except for hobbies. But in the last five or so years, maybe a few more than that, we've started to see these languages get more popular and actually get used in industry. Languages like Clojure and Scala and Kotlin and F# are being used.

00:05:23

Now, I just finished reading The Unicorn Project. I did get an advance copy. For those of you who haven't read it yet, when you read it, you'll see (and you've heard Gene talk for a few years about how in love he has fallen with functional programming) that it carries through the entire book. It's really very cool. He actually goes through programming examples in the book where he has the protagonist of the story, Maxine Chambers, convert some code from imperative into functional and gain resilience through that, getting rid of some of the edge cases. So if we look at some of those hallmarks very quickly, the hallmarks of imperative programs are that they're very control oriented. One of the most important ones I'm going to talk about today is that they allow side effects. They allow you to take a variable, set the value, and then they allow you to change that value.

00:06:19

Anytime you want. Now, most of us are so familiar with that, you might be thinking, well, yeah, of course, that's the way it works. It doesn't have to work that way. If we start moving over to a different model, which is functional programming, one of the biggest hallmarks of functional programming is no side effects. Variables are fine. You can set a value to a variable and then use that variable, but you can never change the value once it's set. Whoa. For those of you who have never worked in an environment like that, you might be thinking, how do I ever program? I'll give you an example in just a moment. There are some really great things that come from that: it allows us to do things like prove the correctness of an algorithm, it eliminates hairy edge cases, and so on. So what I'm really getting at here is that I'm challenging us to look at the models that we use to think through some of these hairy problems that we're solving today.
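To make the contrast concrete, here is a minimal Python sketch. It is not from the talk: Python does not enforce immutability, so the functional half leans on a frozen dataclass as a stand-in, and the names `Account`, `deposit_in_place`, and `deposit` are made up for illustration.

```python
from dataclasses import dataclass, replace, FrozenInstanceError

# Imperative style: the value behind a name can change at any time.
balance = {"amount": 100}

def deposit_in_place(account, extra):
    account["amount"] += extra  # side effect: the caller's data changes

deposit_in_place(balance, 50)   # balance is now {"amount": 150}

# Functional style: a value never changes once it is set.
@dataclass(frozen=True)
class Account:
    amount: int

def deposit(account, extra):
    # No side effect: build and return a new value instead.
    return replace(account, amount=account.amount + extra)

before = Account(amount=100)
after = deposit(before, 50)     # `before` still holds 100

try:
    before.amount = 999         # assigning to a frozen value is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because `before` can never change, anything that was reasoned about it earlier in the program stays true later, which is the property the correctness arguments rely on.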

00:07:21

Now, I won't go through this in detail. I'll just build out the slide and leave it for you to read at your leisure, because I do want to make sure that I've got plenty of time to go through the main content, including my demo, which may fail spectacularly. We'll see. But these are some of the kinds of comparisons between imperative programming and functional programming. And my whole point here is that while she is really badass, this 80-year-old programmer, she doesn't necessarily need to program every single control point of the applications. So what I want to do is talk about having machines do more of the reasoning for us in certain scenarios. Let me give you a very concrete example. Here's an example where I have a list and I want to filter for all of the things that are less than 30 and more than 20, and this is the way that I can program it really, really simply.

00:08:19

And then what I want to do is get the second element out of the resulting set. The way that this would work, and the way that I think it through mentally, is that I do the first filter, I do the second filter, and then I pick out the value. If I take that exact same code and I have hallmarks like immutability, then the compiler can actually do some optimizations for me. I don't have to program differently; the compiler is making the decision to do this instead: Is the first element less than 30? Yup. Is it more than 20? Yup. There's my first answer. Is the second element less than 30? Yup. Greater than 20? Yup. And now I get my answer without processing the entire list. So those types of optimizations are possible.

00:09:18

That holds if you follow certain mathematical principles, the principles that exist in functional programming. Now, I said that I'm always thinking about business value, not just shiny tech. Sure, that was some great code, but what's the business value in this? Well, for that small list, there's not that big of a difference. But if I can have my computer, my processor, my compiler apply optimizations like this, I might bring my product to market faster because I can process my big data faster. We heard Gene this morning talk about an example that he cited in The Unicorn Project, where if something took 48 hours, they knew they were doomed. If you can shrink that time, you can shrink your time to market, and you can actually win out in the marketplace. That's real business value tied to some pretty deep tech.
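The filter example described above can be sketched in Python. One caveat: Python does not make this optimization on its own the way the compiler in the talk does, so the lazy version below spells the laziness out with generator expressions. The list contents and the `inspect` helper (which only records which elements get looked at) are made up for illustration.

```python
from itertools import islice

values = [25, 22, 40, 28, 5, 21, 29, 35]   # made-up data
checked = []                                # which elements the lazy pipeline inspects

def inspect(x):
    checked.append(x)
    return x

# Eager version: each filter walks the whole list before the next step starts.
under_30 = [x for x in values if x < 30]
in_range = [x for x in under_30 if x > 20]
eager_second = in_range[1]

# Lazy version: the same two filters, but elements are pulled one at a time,
# so work stops as soon as the second match has been found.
lazy_under_30 = (x for x in map(inspect, values) if x < 30)
lazy_in_range = (x for x in lazy_under_30 if x > 20)
lazy_second = next(islice(lazy_in_range, 1, None))
```

Both versions produce the same answer, but the lazy one only ever looks at the first two elements of the list. Reordering work like this is safe because neither filter changes the data it reads.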

00:10:17

Okay. So how does all of this relate to ops and infrastructure? Well, let's look at the way that we're programming systems. By programming systems, I mean, let's take an example like an application and its deployment. We all know that we're using infrastructure as code. We all know we're using code to automate things. That's the kind of programming I'm talking about: the programming to deploy and keep our systems running. Now, in the past, we've used languages and tools like Bash and Puppet, Chef, Salt, and Ansible, which all tend to be used, even if you can do it differently, in an imperative style. But as these systems get more complex, we have a need for some new programming models, and the new programming model that I've been spending a lot of time with, as I mentioned, is Kubernetes.

00:11:14

And this programming model, which is much more functional-like, exists in a number of other places, like the Cloud Foundry platform, but most popularly, recently, in Kubernetes. And I'm actually going to show you a live demo with that. So going back to that slide that I showed you earlier on imperative versus functional programming, there are a number of different examples, left to right, that parallel what I showed when I talked about programming applications. Today I'm going to focus on two of those: declarative deployments and immutable infrastructure. And that's what I want to jump into in more detail now. So let's talk about declarative deployments to start. I want to use a very concrete example, and this is an example that runs through my entire book. And by the way, the book is written for application developers and architects.

00:12:13

So it's really about the patterns that you need to understand and implement. And sometimes you don't implement them yourself; you just leverage these patterns to create more resilient software that runs well in the constantly changing and highly distributed environment that exists in the cloud. So here's an example. I've got three microservices. Two of them connect to a backend database. They're doing simple things: connections are who the users are and who follows whom; posts are their tweets, or their posts. Those are stored in the cookbook database, because I love to cook. And then on the left-hand side, we have an aggregator that brings those things together. And of course, there are multiple instances of all of these. So for example, there are seven instances of the connections service, four of the posts service, and five of the connections' posts service, the aggregator. So if I've got this topology and it comes time now for me to deploy this, how do I decide how I'm going to deploy this across?

00:13:15

Let's say, failure domains. Is this something that I, as a human being, should be worrying about? For the last several decades, the answer has been yes. We've gone through and said, oh, well, I'm going to go ahead and distribute this across availability zones, if you were even using availability zones, because a lot of people still don't even do that. So how do I do that distribution? I'm going to suggest that we humans shouldn't be doing that at all. We should not be making those decisions. So let's let a machine do that for us. And in fact, this is the place where I'm going to do my first demo. Now, this is just a placeholder to remind me to go over and do the demo. You'll see this running live in a moment. So I'm going to need my glasses here to get started.
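As a rough illustration of letting a machine make the placement decision, here is a small Python sketch that spreads the declared instance counts across failure domains round-robin. It is only a toy: the zone names and the `spread` function are hypothetical, and the real Kubernetes scheduler weighs far more than this (resources, affinity rules, and so on).

```python
from collections import Counter
from itertools import cycle

def spread(service, replicas, zones):
    """Assign each replica of a service to a zone, round-robin."""
    zone_cycle = cycle(zones)
    return [(f"{service}-{i}", next(zone_cycle)) for i in range(replicas)]

# Declared intent: how many instances of each service (counts from the talk).
desired = {"connections": 7, "posts": 4, "connections-posts": 5}
zones = ["zone-a", "zone-b", "zone-c"]   # hypothetical failure domains

placement = {name: spread(name, count, zones) for name, count in desired.items()}

# Instances per zone for each service, to check how even the spread is.
per_zone = {
    name: Counter(zone for _, zone in placed)
    for name, placed in placement.items()
}
```

The human only states the counts; where each instance lands is computed, so no service ends up with more than one extra instance in any zone.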

00:14:07

I'm going to jump over. Oh, okay, that was my glasses. It is sharp. The first thing that I'm going to do is spend a little bit of time showing you how my application is structured. What I have here are deployment manifests for Kubernetes. What you see under the backing services, and I've already deployed those for speed, is a MySQL database (remember, that was on the far right-hand side), a token store, which is tied to the aggregator, and another Spring Cloud Services component. Those are already deployed. And then in YAML files at the top are the three microservices, and those are pulling containers from my Docker Hub and deploying those as an application.

00:15:01

As a set of applications, rather. So let me show you what that looks like. Currently, here is my console. What I have deployed, and I'll read some of these to you: oh, that's memcached, which is something else. This is that Spring Cloud service that I talked about. Down here, I have MySQL and the Redis services, so those are both deployed. And these two icons here, let's see if I can increase the size. No, it doesn't really increase too much. These two bubbles that you see in the middle are the actual Kubernetes nodes. That's where workloads are running. Okay. So what I want to do first is go ahead and deploy my workload. Now, I'm going to deploy my workload in a very interesting way, and I'll explain this in just a moment.

00:15:56

I'm just going to get it started. I am using an open source project called Flux, which comes from the CNCF. What Flux does is allow you to do deployments via GitHub. So I'm not going to actually deploy anything directly into this Kubernetes cluster. I am going to declare what I want deployed and commit that into GitHub, and then the system will take care of doing the deployment for me. You see what I mean? I don't do that. I don't make those decisions, which means that I am out of the business of making errors as well. So I'm going to go ahead and create that. Then I am going to do a kubectl apply, which is going to actually deploy that agent, that Flux agent, into my deployment. And now I need to do one other thing: I need to allow this agent, which is going to be checking the GitHub repository and orchestrating the deployment for me.

00:17:08

I need to give it privileges (fluxctl identity), privileges to actually do writes back into my GitHub repository, because not only is it going to look at what I've done, it's also going to record the things that it does in the GitHub repository for me. So I'm going to take that, and I'm going to come over here and add it into my GitHub repository as a new key, and I'm going to give it write access. And the big test: do I remember my password? Yes. And what should be happening now is it should go ahead and do the deployment. Now, I may have to give it a little bit of a bump. Yes, I did the key. I'm going to do a... there we go. So you can see it on the right-hand side.
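The idea of declaring what should be deployed and letting an agent work out the steps can be sketched as a pure function that compares declared state with observed state. This is an illustration only, not how Flux is implemented; the `plan` function and the dictionary shapes are assumptions made for the example.

```python
def plan(desired, actual):
    """Return the actions needed to move `actual` toward `desired`.

    Both arguments map a service name to an instance count. The function
    has no side effects: it only describes what should change.
    """
    actions = []
    for name in sorted(set(desired) | set(actual)):
        want, have = desired.get(name, 0), actual.get(name, 0)
        if want > have:
            actions.append(("create", name, want - have))
        elif want < have:
            actions.append(("delete", name, have - want))
    return actions

# What is committed to the repository (hypothetical contents).
desired = {"connections": 7, "posts": 4, "connections-posts": 5}
# What the cluster currently runs: nothing has been deployed yet.
actual = {}

first_plan = plan(desired, actual)
# Once the cluster matches the declaration, there is nothing left to do.
second_plan = plan(desired, desired)
```

The human edits only `desired`. Running `plan` again after every change, or after every failure, is what keeps the system converging on the declaration.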

00:18:16

Let me go over to my thing here. What you can see here is all of those instances are starting to pop up. All of the instances have been created. Now, there's also something called a service, which is a load balancer across those instances. And so what we see here is the post service. I'll move it over here on the right-hand side. What we see here is the connection service. Actually, I'll put the connections up top, just like the diagram that we saw below. I'm going to move that there. And the aggregator. So let's see if things are starting up. The containers are creating, and hopefully they start running here very, very quickly. I'm going to do a quick check. Okay, so those are doing downloads. So I am now beholden to the wireless speeds. It's downloading my containers, and my demo just failed spectacularly. I promise I will just go back to the slides, because I don't want to waste your time, but I am going to do a very quick...

00:19:52

...check. Yeah. Okay. I think I ran out of memory, is what happened. So I apologize for that. My demo's going to fail spectacularly, as I had half expected it to. But part of the reason I captured this is so that I could show you. What you'll notice here is that this is the topology that should have been deployed. As it started deploying, as I said, I ran out of memory, and actually my Kubernetes cluster crashed. The reason for that is that I'm running it all on my laptop, and I wasn't able to get it up and running in the cloud because of some very specific technology that I'm using. So I apologize for that. But what should have happened here, and this is what the deployment should have shown, is this: here's the aggregator.

00:20:40

And you'll notice that across the different machines (I've got the different machines here), it has evenly distributed those instances. Okay. So it's made that decision for me. Now, let me go ahead and go back into the mode here and talk about another thing. I'm going to come back to some more business value. Okay. So now I've done this deployment, and now we've had a spectacular failure. One of those availability zones has gone away. Now, does that mean that I, as an operator, should be paged immediately? Is that something that I need to be woken up for in the middle of the night? Maybe, but maybe not, because there are still additional nodes that are running, and the system is going to recover. Maybe I just come in the next morning and I get notifications that something's happened, but the system has repaired itself.
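The self-repair described here can be sketched as a function that takes the surviving instances and tops them back up to the declared count on the zones that are still healthy. This is a toy model under stated assumptions: the zone names, the `heal` function, and the one-zone-label-per-instance representation are invented for illustration and do not reflect Kubernetes internals.

```python
from collections import Counter

def heal(placement, healthy_zones, desired_count):
    """Return a new placement with lost instances recreated on healthy zones.

    `placement` holds one zone name per running instance.
    """
    survivors = [zone for zone in placement if zone in healthy_zones]
    load = Counter({zone: 0 for zone in healthy_zones})
    load.update(survivors)
    healed = list(survivors)
    for _ in range(desired_count - len(survivors)):
        # Pick the least-loaded healthy zone (ties broken alphabetically).
        target = min(sorted(load), key=load.__getitem__)
        load[target] += 1
        healed.append(target)
    return healed

# Five aggregator instances (count from the talk) over three hypothetical zones.
before = ["zone-a", "zone-b", "zone-c", "zone-a", "zone-b"]

# zone-c goes away; the declared count of five has not changed.
after = heal(before, healthy_zones=["zone-a", "zone-b"], desired_count=5)
```

Nobody has to decide at 3 a.m. where the lost instance goes: the declared count is still five, so the gap is filled on whatever capacity remains.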

00:21:47

Now, what's the value in that? Well, resilient systems, which means happy customers. When you read The Unicorn Project, ideal number five that you saw up on the screen today was customer focus. So, happier customers. Another one: less stressed staff. Ideal number two is about joy, focus, and flow. It's about employee and developer satisfaction. And we know that those things tie to business outcomes. All right. So I want to now turn my focus to immutable infrastructure and talk about immutability a little bit more. Now, if my demo hadn't failed so spectacularly, I would have been able to show you in a demo that what I just showed you was ops for the application, right? I was deploying my application and doing operations around that. Again, I won't be able to demo it live, but I'll show you what I would have done in the demo, to show you that you can do that for your infrastructure as well.

00:23:05

So what we had running was a Kubernetes cluster with two worker nodes; that is, the workloads are going to be distributed across those two worker nodes. Now, when the content on those worker nodes gets deployed, I have all of my application code and my runtime dependencies running on an operating system inside of a container. That container is that little dotted-line box there. Now, if something goes wrong, or if I need to update something, I can just do this: I can throw out the container and create a new one. If I need to make a change to something, some configuration value in there, do I SSH into the container and make that change? Absolutely not. I'm going to throw out the container and create a new one with that new configuration or that new source code.

00:24:08

And I can keep doing that. And there are so many ramifications, so many business values, that come from being able to do that simple thing of throwing out a container and creating a new one. One of those is when a bad actor comes in and installs malware on the system, and then the bad actor disappears. Now, while the bad actor is in your network, they're easier to detect, but malware can sit on a system for months, collecting information and looking for additional ways to exploit things. So how do we combat this malware? Well, we can try to get better at detecting it, but honestly, that's pretty hard. So there's another approach that we've started to see folks using in the industry, and that is, they proactively throw out the containers. They proactively say, you know what, on a regular basis, I'm just going to throw away this container.

00:25:23

If there was no malware in there, there's still no malware. But if there was malware, then it only lived in there for a week, three days, maybe only a day. And we can do this at the host level as well, at the infrastructure level. If there was malware, it's now gone. And so what I'm suggesting that you do here is that you repave the environment. Don't wait for the potholes to get really bad. Just go ahead and repave very often. We have a customer at Pivotal who does repaving every three days. And I can tell you who it is, because he speaks about it publicly. It's Wells Fargo, in the financial services space. He cleans up the containers and the hosts they're running on, throws them away, and recreates them every three days with zero downtime. And he's not satisfied; he's working on doing it on a daily basis.
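Repaving on a schedule can be sketched as a function that swaps every container older than some maximum age for a fresh one built from the same image, rather than patching anything in place. The `Container` type, the image names, and the dates below are hypothetical, chosen only to illustrate the three-day cadence mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)            # a container is never modified in place
class Container:
    image: str
    created: datetime

def repave(containers, now, max_age):
    """Replace every container older than `max_age` with a fresh one from the same image."""
    return [
        Container(c.image, now) if now - c.created > max_age else c
        for c in containers
    ]

now = datetime(2019, 10, 30)       # arbitrary date for the example
fleet = [
    Container("posts:1.0", now - timedelta(days=5)),   # overdue for a repave
    Container("posts:1.0", now - timedelta(days=1)),   # still fresh
]
repaved = repave(fleet, now, max_age=timedelta(days=3))
```

Anything that had been written into the five-day-old container after it started is gone with it, because the replacement is built from the image alone.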

00:26:21

So what's the mental model for humans to do this? Well, how about we just do a repave? And so in the last couple of minutes, I'm going to show you what it looks like to do a repave on this. So, the demo that I wanted to show you, and I seriously doubt that anything's come back. Nope, nothing's come back. It's still pretty hosed. But let me go ahead and show you over here what I was going to do in the demo. Here's the manifest for my Kubernetes cluster. You can see that I have one master node running here at IP address zero dot two. I have a worker running at zero dot three, and I have another worker running at zero dot four. What I was going to do was take this next worker and add a new worker. I was going to throw out a worker and start a new one. So I can do this, which is, I'm going to uncomment that. And then over here, I was going to do a git diff to just show you that I added this new machine. Okay. So the new machine was there. And then I was going to do a git add, git commit.

00:27:49

And I'm going to put in a comment, which says "repave now." And then I was going to do a git push. And that was going to be all that it took, because again, I had that GitOps agent, that CNCF Flux project, watching the GitHub repository, not for my application code, but for what my infrastructure topology looks like. And that was going to spin up the new node. And then we were going to do the repave, and it was going to move workloads onto that new node as we threw the old ones out. Okay. Oh, and let me show you one last thing before I get back to my last couple of slides in the last minute that I have left, and that is, I want to show you the log. Everything that I have done, and the things that my agents, that the computer, has done for me, are recorded in this log.

00:28:53

So we're leveraging a component like Git to not only be the place where I put the source for both my immutable infrastructure and my applications, but also to keep track of the things that these automated agents are doing for me. I have it in a single log: everything that's happened to my infrastructure. That's pretty darn cool. All right. So let me just close up. It just went to zero, but let me very quickly close on a couple of notes. Gene is always asking what we still need help with, and so I do want to spend just the last 30 seconds on this slide. This first comment is a reference to how difficult it can be for us to change our mindset from imperative thinking to functional thinking, to letting go of needing to control every single detail, and to programming in a different model.

00:29:56

And I refer to that as learning the toe turn in snowboarding. Those of you who have learned how to snowboard, you know what I mean. But more interesting is that the programming model that I'm referring to is a native thing in Kubernetes. If you think that Kubernetes is all about container scheduling, that's only the first use case that has become popular with Kubernetes. Kubernetes actually has this deep programming model that I'm referring to here, which allows you to do things in this eventually consistent and more functional style of programming your infrastructure. So that's someplace where we need help as an industry: to really understand that programming model and leverage it. And then finally, it gets very complex in managing temporal dependencies. This is an area where I'm spending a lot of time thinking about those things. And if anybody else is thinking about those things, please let me know. I'd love to chat with you. And with that, I thank you. I apologize for going a minute over, and I apologize for the spectacular fail of my live demo, but thank you so much.