Las Vegas 2018

DevOps and Jaguar Land Rover

Chris Hill, Head of Systems Engineering at Jaguar Land Rover, will share how to accelerate software development through DevOps and strategic tooling.


Chris Hill is the Head of Systems Engineering of Infotainment at Jaguar Land Rover and is responsible for executing industry-leading standards, principles, and the future vision for software development and operations throughout the entire business. Hill has over ten years of experience in the software development industry. Before joining Jaguar Land Rover, Hill held positions at Wirestorm and Shorepower Technologies.


Chris Hill

Head of Systems Engineering, Infotainment, Jaguar Land Rover

Transcript

00:00:05

There are a couple of challenges involved in building software specifically for vehicles. As you're driving, think about it: does your code distract the driver? Second, does your code work in 175 different markets, and is it legal in all 175 of them? Does your code account for every specific vehicle variant that a customer could possibly choose when they're checking out their vehicle? If I could get my slides, that'd be great. Moving on. Because of all these challenges, building software for a vehicle creates a high level of complexity. While they're getting some of these slides up... can we actually just go back one more?

00:01:08

I've got a story to share about infotainment in general. My wife and I were driving separate cars to the same destination, and, just like any other day, I was using Spotify. As I sat in my car, my phone automatically synced over Bluetooth, started Spotify, and resumed the song I had been listening to before I shut the car off. What I'm interacting with every time that happens is an infotainment system. An infotainment system is essentially your technology experience within the vehicle, or just the touchscreen in the middle, whatever you want to call it. As my wife and I are both headed to the same destination, I'm listening to Spotify, and all of a sudden it cuts out and I start to hear my sister-in-law on the speakers,

00:02:04

but only for about three seconds, and then it goes away and I'm back to my Spotify. Believe it or not, when I drive, I pay a lot of attention to the infotainment system that I help build. It happens again 15 seconds later, at which point I figured it out. As traffic moved in waves, the distance between my car and my wife's car grew as we moved and shrank as we stopped. She was on the phone with her sister, and because her phone had previously been paired with my car, it kept syncing, taking priority over my Spotify Bluetooth connection, and resuming her call through my car.

00:02:50

Instead of actually resolving the problem, which I probably could have done by just turning Bluetooth off, I sat there thinking: if I were the developer, would I have thought of this use case? If you were designing a Bluetooth stack and you discovered, while scanning, a phone with an active call, the first thing you'd probably decide is that an active phone call always takes priority, rather than letting Spotify continue to play. So I asked myself whether I would have thought of this use case, and how I would have handled it. Ultimately, the answer was that I probably wouldn't have thought of it at all. It is a very complex use case to anticipate. When I identified it to the team, we talked about looking at Bluetooth proximity, but is Bluetooth proximity really that important?
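To make the trade-off concrete, here is a hypothetical sketch of the two arbitration rules being weighed: the "an active call always wins" rule, and a proximity-aware variant. It is not JLR's Bluetooth stack. The `Phone` fields, the use of received signal strength (RSSI) as a stand-in for proximity, and the `-60` dBm cut-off are all assumptions for illustration.

```python
from dataclasses import dataclass


# Hypothetical model of a phone seen during a Bluetooth scan.
@dataclass
class Phone:
    name: str
    previously_paired: bool
    has_active_call: bool
    rssi_dbm: int  # received signal strength; values nearer 0 mean a stronger signal


# Assumed cut-off: anything weaker is treated as "probably not in this car".
IN_CABIN_RSSI_DBM = -60


def naive_audio_owner(current: Phone, discovered: Phone) -> Phone:
    """The obvious rule: an active call on any known phone always takes priority."""
    if discovered.previously_paired and discovered.has_active_call:
        return discovered
    return current


def proximity_aware_audio_owner(current: Phone, discovered: Phone) -> Phone:
    """Hand audio to a call only if the phone appears to be inside the cabin."""
    if (
        discovered.previously_paired
        and discovered.has_active_call
        and discovered.rssi_dbm >= IN_CABIN_RSSI_DBM
    ):
        return discovered
    return current


driver = Phone("driver-phone", previously_paired=True, has_active_call=False, rssi_dbm=-40)
next_car = Phone("other-phone", previously_paired=True, has_active_call=True, rssi_dbm=-85)

print(naive_audio_owner(driver, next_car).name)            # other-phone (the bug)
print(proximity_aware_audio_owner(driver, next_car).name)  # driver-phone
```

Even the second rule has trade-offs: signal strength is a noisy proxy for distance, which is part of why the team questioned whether handling proximity was worth the effort.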

00:03:48

How many times do you find yourself running into that particular use case? I don't know. It went into a backlog, and to this day we will still run into the same problem with our vehicles. The problem is that when you have so many complex use cases and your process is complex, your throughput suffers. What I want to talk about specifically is how we improved throughput within infotainment by adopting more of a DevOps methodology. Before I do that, I want to talk a little bit about who we are: 40,000 employees and 24 billion pounds a year in revenue, which is roughly $30 billion. We sold 604,000 vehicles last year, and we have approximately 5,000 software personnel. Our infotainment system, the one embedded device I was just talking about, is one of 40 embedded devices in the vehicle.

00:04:45

So imagine thousands of people developing software for a 4,000-pound aluminum box on wheels, where all of the embedded devices interact with each other in real time. The latest buzz has been the vehicle we just released, the Jaguar I-PACE. This is our first all-electric vehicle. It does zero to 60 in 4.5 seconds, it has a range of about 280 miles, and the car looks sexy. What I've come to realize, though, is that DevOps isn't always so sexy. The people are sexy, don't get me wrong, but becoming a change agent and driving transformation in a large business takes some serious, hard work. There are some qualities I've come to respect in the individuals I know are on board with the change: inspiration, motivation, persistence, and an attitude of continuous improvement. What I want to walk you through is our evolution in adopting DevOps, leading up to infotainment. I'm going to start with a little roadmap, and in this analogy I'll go through our course corrections and what we learned along the way.

00:06:16

It started with two of us in Portland, Oregon, and two chief engineers from the United Kingdom who understood that our throughput and our output weren't meeting our potential. It actually took six months for me to convince our boss to let us go to our first DevOps Enterprise Summit. This was in 2016, in San Francisco. Out of everything we had read beforehand, what we hadn't read was handed to us at this summit: The Goal by Eliyahu Goldratt (you may have heard of that one) and The Phoenix Project by Gene Kim (you've probably heard of that one too). This was finally enough inspiration for us to hang on to a foundation for what turned out to be the start of our DevOps journey. When we got back after this conference, we were highly energized and ready to take over the world from inside our company. We wanted to change the way we did software development in our business that day. But nobody else had gone to the conference, and nobody else was energized to make any sort of change whatsoever.

00:07:32

What we knew we had to do was get something working. One of the principles I first heard in the UK, from a company called Codethink, is that working software trumps everything. It was actually their CEO who said it. What he meant is that all of the design talk and conceptual talk doesn't carry as much weight as actually proving something out. So we decided to prove something out, and we needed a server.

00:08:05

The initial server wasn't actually this dusty, and I haven't seen an optical drive in about 10 years. That's amazing. It didn't have an optical drive either, and it also wasn't a big desktop like this.

00:08:18

But the software we chose could have run on anything. It could have run on this, or on any piece of hardware we had lying around, because we chose free and open source software: Linux as our operating system and GitLab for revision control. We were heavily interested in moving to Git in general from a very traditional revision control system, and GitLab seemed like the right fit specifically because it was free and open source, it could run on-prem, and it also had continuous integration. This was essentially our first prototype, for two or three different projects. Those projects started to move their bus factor from one, to two, to three, to everyone, as everyone came to know how continuous integration and the build worked for their project. Unfortunately, when you run CI on the same server as your revision control system, you end up with resource exhaustion. Ultimately we did what any software team does when it finds itself in a pickle: we bought more hardware <laugh>. We ended up buying three additional build slaves, which also introduced us to three additional personalities. We had serious configuration drift, to the point where if your continuous integration pipeline landed on anything but runner number one, you were waiting four to eight times as long.

00:09:52

As more projects started to adopt the same tool set, we got more volume and more complaints. Now, I don't like to think of complaints as just complaints. I like to think of them as feedback loops; in fact, they're your only course correction, the only way you can determine whether what you're doing is actually on the right path. Some of those complaints, some of these positive feedback notes, happened to be wrapped in explosive emails with choice words. Arguably as important as the complaint itself, or more so, is the response to it: no delusion, no denial, but psychological safety, so that people trust the complaint will actually turn into something that gets resolved and you can move forward with progress. Now, what we initially did here, just a second,

00:10:58

was decide not to continue down the same route of three additional servers with three additional personalities. We decided to move to something that allowed more self-service, and we gathered a list of our top complaints. Our favorite complaint was: "I asked the ops team three weeks ago to put a build dependency on all of our servers, and they still haven't done it. In fact, because it's not there yet, I'm just going to go back to building on my own machine." That is a knife to the heart, because it's regression when you're trying to make progress. What I really like about this complaint is that it gave us a course correction that was not only technical in nature but also behavioral. We decided to move to ephemeral Docker containers

00:11:57

and established a pattern in which all of the build infrastructure for every application was defined in code. We used a HashiCorp tool called Packer, and we defined every single bit of build infrastructure in another Git repository that we handed to the application developers. If I were an application developer, I now had full access to the build infrastructure and full access to my own code: basically full autonomy. That means the complaint that the ops team never added a build dependency is out the window, because now you can do it yourself. What then became the bottleneck was managing bare-metal servers. You've probably dealt with this before: power outages, network outages, switch outages, you name it. We weren't staffed to handle bare-metal servers. The fact that they were running was great, but we didn't have the staff to take care of these types of problems. In a standup one time, somebody raised their hand and said, "Our servers don't do anything at night. Should we be mining Bitcoin or something?" We had compute power that was essentially going to waste at night, and this led to another of our course corrections. Even though I'm a fan of cryptocurrency, as you can probably tell, we moved to the cloud. This was a technical change for us, but also a behavioral change and a process change. The ephemeral Docker container pattern was still there, but now it was an ephemeral EC2 instance

00:13:43

that served the underlying Docker container. Every time a build needed a continuous integration pipeline, we started up an EC2 instance, started Docker on top of it, fulfilled the job, and then tore the instance down and deleted it.
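The one-instance-per-job lifecycle described above can be sketched with a context manager that guarantees teardown even when a build fails. This is a toy under stated assumptions: `FakeCloud` is a hypothetical in-memory stand-in, not the AWS SDK, and starting Docker on the instance is only indicated by a comment.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


class FakeCloud:
    """In-memory stand-in for a cloud compute API (hypothetical, not boto3)."""

    def __init__(self) -> None:
        self.running: List[str] = []
        self.launched = 0

    def launch_instance(self) -> str:
        self.launched += 1
        instance_id = f"i-{self.launched:04d}"
        self.running.append(instance_id)
        return instance_id

    def terminate_instance(self, instance_id: str) -> None:
        self.running.remove(instance_id)


@contextmanager
def ephemeral_instance(cloud: FakeCloud) -> Iterator[str]:
    """Provision an instance for exactly one job, then always tear it down."""
    instance_id = cloud.launch_instance()
    try:
        yield instance_id
    finally:
        # Runs on success, failure, or exception, so no instance is left idle.
        cloud.terminate_instance(instance_id)


def run_ci_job(cloud: FakeCloud, image: str, job: Callable[[str, str], int]) -> int:
    """Run one CI job on a fresh instance and return its exit code."""
    with ephemeral_instance(cloud) as instance_id:
        # A real runner would start Docker on the instance here and execute
        # the job inside a container built from `image`; this toy just calls it.
        return job(instance_id, image)


cloud = FakeCloud()
exit_code = run_ci_job(cloud, "build-env:latest", lambda instance, image: 0)
print(exit_code, cloud.running, cloud.launched)  # 0 [] 1
```

Because nothing outlives its job, there is no long-lived build slave to drift out of configuration, and no idle compute to pay for overnight.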

00:14:00

This also got us thinking: now that we're defining EC2 instances and Docker as infrastructure as code, let's move all of our software applications to infrastructure as code as well. Now that we were in the cloud and had this abstraction level to work with, we started using another HashiCorp tool called Terraform, and we defined our whole GitLab stack and our whole runner stack, for example. Essentially every infrastructure change was made through a Git repository and a continuous integration pipeline. That gives full transparency into every change that could possibly be made in our lifecycle. Around this time, interest started to grow in the UK. I was asked to move myself and my family over to the Midlands, in the United Kingdom, though not as violently as Sub-Zero and Scorpion here, and I don't even look good in green, so that doesn't even look like me. What we had to do was continue the transformation. Even though, on my way to work, I was basically entering an environment where nobody had used any of the tool set or understood any of the principles, I knew the only way we were going to get bigger adoption was to be at our company's headquarters. That's why the move was worth it for us.
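The pattern behind this, where build environments live in a Git repository and change through CI rather than through an ops ticket, can be illustrated with a small sketch. It is not Packer or Terraform, and not how JLR implemented it: the `build_env` spec, its package names, and the content-hash tagging scheme are hypothetical. Deriving the image tag from the spec means a developer who adds a dependency automatically gets a new image, while an unchanged spec reuses the existing one.

```python
import hashlib
import json
from typing import Any, Dict, Set

# Hypothetical build-environment spec kept in the application's own repository.
# A developer adds a dependency by editing this file, not by filing a ticket.
build_env: Dict[str, Any] = {
    "base_image": "ubuntu:18.04",
    "packages": ["build-essential", "cmake", "libdbus-1-dev"],
}


def image_tag(spec: Dict[str, Any]) -> str:
    """Deterministic tag derived from the spec's contents.

    Identical specs map to the same tag (so images are reused); any change to
    the spec yields a different tag (so a new image is built).
    """
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def needs_rebuild(spec: Dict[str, Any], existing_tags: Set[str]) -> bool:
    """CI would build a new image only when no image with this tag exists yet."""
    return image_tag(spec) not in existing_tags


tag_before = image_tag(build_env)
build_env["packages"].append("protobuf-compiler")  # the "missing build dependency"
tag_after = image_tag(build_env)

print(tag_before != tag_after)                 # True
print(needs_rebuild(build_env, {tag_before}))  # True
```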

00:15:32

A coworker recently brought me a revelation. When someone is promoting a tool, they'll use phrases like, "Our engineering team spent 10 years pouring into the feature development of this tool; it will help your team become more efficient." What's interesting is that when you acquire the tool, you don't actually gain the 10 years. It's not as if, in the RPG of life, all of your software developers magically get +10 software engineering out of nowhere. What we noticed is that even though the tool may be the right move for you in the longer term, in the shorter term it may make sense for you to feel that pain first.

00:16:20

I learned that I did the same thing in the UK. There were developers to whom I just set the expectation: do this. I'm not going to tell you why; do it because we've already figured it out. I didn't transfer any of the pain, so it wasn't really understood. I remember a software engineer who was brand new in the United Kingdom saying, "Why am I messing with infrastructure as code? Why am I even looking at Ansible? I'm an application developer." He hadn't felt the pain of being an ops team maintaining a set of build slaves for hundreds of applications that all needed bespoke configurations. So I needed to do a better job of transferring some of that pain. What we were really interested in was bigger project adoption, and the first project we looked at in terms of big numbers was infotainment. Infotainment is now the same infotainment across 46 different vehicles. When we walked into its software development lifecycle, we saw some indicators that told us we had to improve. The first was that our feedback loops were four to six weeks long.

00:17:36

I don't remember what I had for breakfast yesterday, let alone the code I wrote six weeks ago. Can you imagine that big a context switch? "That last six weeks of work you've been doing, forget about it. Jump right back into your frame of mind from six weeks ago." That was painful, and we knew we had to fix it. We also had larger numbers than we were used to: a thousand developers all on one project, and we had to determine whether we could scale to support that. And we had three people on Earth who knew how the build system worked end to end. Literally only three people. It was so massively complex that we'd sometimes have multiple meetings, day after day, with 12 or more people trying to work out how to change the build server to accommodate some new change. This was drastically frustrating, because all 12 of them had a different idea of how we were actually going to implement it. So we started doing continuous integration and continuous build, and we moved the infotainment project to Git,

00:18:47

and with the introduction of that, we have much more positive indicators. Instead of continuing in the old fashion, we now have feedback loops of about 30 minutes. You're not waiting six weeks. We've automated how a thousand developers can all contribute at around the same time. When we have a release train, or a staging branch that everyone is releasing against, we found that on some days we would get a huge burst of development. We happen to build a Linux distribution, and that can take some time to build, especially when you're changing multiple components in different parts of the stack. So automating how we prepare for incoming work turned out to be important. We also made the build infrastructure super simple. Our Linux distribution contains, if you include open source components, over 50,000 applications, yet even a JavaScript developer writing code for one small part of infotainment knows exactly how the entire build system incorporates that component into the whole ecosystem. Tremendously simplified. We've used this same tool set

00:20:16

to deliver to nine different vehicles in at least the last year and a half. Each of these nine vehicles runs the same Linux distribution, customized for the things I mentioned earlier: 175 different markets and all the customer variants. One of our latest was the Range Rover Velar. For the first time, we incorporated two screens, which we call a lower blade screen and an upper three pillar screen, and we were able to drive two different touchscreens at the same time. This was revolutionary for what we were doing with infotainment, and we wouldn't have been able to do it as fast as we did unless we had moved to a different tool set. Now, this is a picture of the Jaguar I-PACE, the all-electric battery vehicle I mentioned earlier. This is the first time JLR has ever done this. In arguably one of our boldest moves, we got it out there before the established competition, other than the Model X; it is meant to be a Model X competitor. In Europe, the automotive manufacturers were fighting as hard as they could to be the first to come out with something that could compete with the Model X.
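The talk doesn't say how one distribution is customized per market and per vehicle variant. One common approach is layered configuration, sketched here under that assumption: shared defaults, overridden by a market layer, overridden by a variant layer. The keys, market codes, and variant names are hypothetical, and the matrix function shows why automation matters, since combinations multiply as markets and variants are added.

```python
from itertools import product
from typing import Any, Dict, Tuple

Config = Dict[str, Any]

# Hypothetical layers; keys, markets, and variant names are illustrative only.
BASE: Config = {"speed_units": "km/h", "language": "en-GB", "screens": 1}
MARKETS: Dict[str, Config] = {
    "US": {"speed_units": "mph", "language": "en-US"},
    "GB": {"speed_units": "mph"},
    "DE": {"language": "de-DE"},
}
VARIANTS: Dict[str, Config] = {
    "single_screen": {},
    "dual_screen": {"screens": 2},
}


def resolve(market: str, variant: str) -> Config:
    """Merge layers so later ones win: base < market < vehicle variant.

    Unknown markets or variants raise KeyError instead of silently falling back
    to defaults that may not be valid for that market.
    """
    config = dict(BASE)
    config.update(MARKETS[market])
    config.update(VARIANTS[variant])
    return config


def build_matrix() -> Dict[Tuple[str, str], Config]:
    """Every market/variant combination, e.g. for CI to check one image against all of them."""
    return {(m, v): resolve(m, v) for m, v in product(MARKETS, VARIANTS)}


print(resolve("US", "dual_screen"))
# {'speed_units': 'mph', 'language': 'en-US', 'screens': 2}
print(len(build_matrix()))  # 6
```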

00:21:44

Now, one thing we were challenged to do was this: given that we ditched combustion engines, how could we change the game in infotainment and extend the sexy? What we just developed, and released last month, is software over the air. One of my favorite features across all of infotainment is the ability to deliver a full Linux distribution, in the form of a file system delta, directly to your vehicle in a blue-green style of deployment. As you're driving, the entirely new version can be delivered in the background without you knowing it's happening. The reason I love this feature so much is that it's both a process investment and a product investment. There aren't many direct customer features that also tremendously help engineering. What I mean is that even though we may deploy over the air to customers four times a year, in engineering we now deploy software hundreds of times per day. We talk about the indicator of deploys per day per developer all the time, and I was always embarrassed to share ours because it was nonexistent. We weren't getting new software to vehicles fast enough; we simply didn't have the mechanism, and it just took too long. But now, with software over the air, I was getting the latest version from the master branch every time I drove home and every time I drove into work.
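In embedded systems, a blue-green update is often implemented as an A/B (two-slot) layout: the new version is built in the inactive slot while the active one keeps running, and the switch happens on the next start. The toy model below assumes that layout; the talk does not describe JLR's actual mechanism. File systems are modeled as in-memory dicts, and the `Delta` format, the digest check, and the health flag used for rollback are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

FileSystem = Dict[str, bytes]  # toy root file system: path -> contents


def digest(fs: FileSystem) -> str:
    """Order-independent fingerprint of a whole file system image."""
    h = hashlib.sha256()
    for path in sorted(fs):
        h.update(path.encode("utf-8") + b"\0")
        h.update(fs[path] + b"\0")
    return h.hexdigest()


@dataclass
class Delta:
    changed: FileSystem            # files added or modified since the running version
    expected_digest: str           # digest the rebuilt image must match
    removed: Tuple[str, ...] = ()  # files deleted since the running version


@dataclass
class ABSystem:
    slots: Dict[str, FileSystem]
    active: str = "A"
    boot_next: Optional[str] = None

    @property
    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    def stage_update(self, delta: Delta) -> bool:
        """Build the new version in the inactive slot; the active slot keeps running."""
        new_fs = dict(self.slots[self.active])
        for path in delta.removed:
            new_fs.pop(path, None)
        new_fs.update(delta.changed)
        if digest(new_fs) != delta.expected_digest:
            return False  # corrupt or mismatched delta: nothing is switched
        self.slots[self.inactive] = new_fs
        self.boot_next = self.inactive
        return True

    def reboot(self, new_version_healthy: bool = True) -> None:
        """Switch slots on the next start; fall back if the new version is unhealthy."""
        if self.boot_next is None:
            return
        previous, self.active, self.boot_next = self.active, self.boot_next, None
        if not new_version_healthy:
            self.active = previous  # old slot was never modified, so rollback is immediate


v1 = {"/usr/bin/hmi": b"v1", "/etc/market.conf": b"EU"}
target = {"/usr/bin/hmi": b"v2", "/etc/market.conf": b"EU"}
car = ABSystem(slots={"A": dict(v1), "B": {}})
staged = car.stage_update(Delta(changed={"/usr/bin/hmi": b"v2"}, expected_digest=digest(target)))
print(staged, car.active)  # True A  (still running v1 while driving)
car.reboot()
print(car.active, car.slots[car.active]["/usr/bin/hmi"])  # B b'v2'
```

Only the changed files travel in the delta, and because the running slot is never touched, a failed or interrupted update leaves the vehicle on its previous version.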

00:23:21

In fact, I told my wife at one point, our Land Rover has the most up-to-date software in the entire world.

00:23:27

<laugh>

00:23:28

She didn't really care. <laugh> To give you a little more context, I'd like to show you a video of the infotainment system in the I-PACE.

00:23:48

There's a wealth of innovative technology inside the new I-PACE. At the heart of the cabin is the sleek and minimalist dual-screen Touch Pro Duo interface. Everything you need is at your fingertips: an unrivaled audio experience, climate features, even driving modes and vehicle dynamics setup. With the Connect Pro Pack, you can create a 4G Wi-Fi hotspot providing connectivity for up to eight devices at a time. It also provides enhanced navigation functionality, including door-to-door route planning, search for public charge points, real-time traffic, satellite view, and the ability to share your estimated time of arrival. Software Over the Air enables software updates to be sent directly to the vehicle without having to visit the retailer. The Remote app lets you interact with the I-PACE via your smartphone, allowing you to control charging, access range information, lock or unlock, see your last parked location, and find your way back to it with on-foot directions. Compatibility with personal assistant devices will allow you to speak to your vehicle from the comfort of your own home. With the Smartphone Pack, you can share your Android or Apple smartphone screen, allowing vehicle-optimized apps to be controlled through the vehicle's touchscreen.

00:25:31

Navigation Pro comprises an integrated real-time navigation system incorporating features specific to electric vehicles, such as accurate range estimations, low-power alerts, and charge station location and routing. Smart Settings works with the intelligent key fob to allow the vehicle to automatically recognize the approaching driver and configure itself to their preferences, including seat position, climate settings, and call list contacts. Jaguar's second-generation advanced head-up display helps keep your focus on the road with crystal-clear information and prompts projected onto the windscreen. The all-new Jaguar I-PACE. Jaguar electrifies.

00:26:40

So they didn't include anything about Git and continuous integration. I always imagined us making a marketing video just for engineers, but we never ended up doing that. There are a couple of lessons we've learned throughout our journey. The first is the difference between a true strategy and a set of objectives: a strategy needs to make you more competitive in the marketplace. Going to continuous build took us from about 26 builds of our Linux distribution per year to 700 per day, and that let us springboard our ability to iterate quicker. When we took some of the complexity out of our build system, we knew we were ahead of our competition, because we had made everything simpler so throughput could excel. We learned that principle-based software delivery can sometimes mean uncomfortable and conflicting opinions, but without the principles, you don't have any foundation for how you're actually going to carry out your change. We learned that democracy isn't always the best approach. This was interesting: the reason your company is in the situation it's in is that the majority hasn't advocated for the change in the past. So if you poll everybody on what you should do, you may actually get the wrong answer. What's important is to show up with data, with evidence, and with working, proven software.
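To put the jump from 26 builds per year to 700 per day in perspective, here is a quick calculation. It assumes the 700-per-day rate holds every day of the year, which the talk doesn't state; in practice builds likely cluster on working days, so treat these as rough averages.

```python
builds_per_year_before = 26
builds_per_day_after = 700

# Assumption: the 700/day rate holds every day of the year.
builds_per_year_after = builds_per_day_after * 365
increase = builds_per_year_after / builds_per_year_before

days_between_builds_before = 365 / builds_per_year_before      # about two weeks
minutes_between_builds_after = 24 * 60 / builds_per_day_after  # about two minutes

print(f"{builds_per_year_after:,} builds per year, roughly {increase:,.0f}x more")
# 255,500 builds per year, roughly 9,827x more
print(f"{days_between_builds_before:.1f} days vs {minutes_between_builds_after:.1f} minutes between builds")
# 14.0 days vs 2.1 minutes between builds
```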

00:28:25

We learned that there were no wasted experiments. Specifically, continuous experimentation is what allows us to deliver software over the air, which essentially removes any need for technicians to do dealership updates. We learned that when you articulate the why, specifically why investing in the process is important, you can use raw numbers on what has happened historically to help make your case. We learned that if you lead with focus, sometimes laser focus on small batches, and provide positivity and transparency to everyone, that wins the battle. We also learned that acquiring sponsors and partners who are specifically there to help you with your vision helps tremendously throughout your journey. There are still things we need help with. We still have problems that we don't yet know exist, and we need help to solve them. We are also still recruiting for attitudes of continuous improvement. We would love to hear about your challenges and any problems with your transformations and your overall DevOps journey. Thank you for your time, thank you for listening, and enjoy the rest of your conference.