A DevNetOps Journey – How We Brought DevOps to Our Network Team (US 2021)

There is so much more to infrastructure and network today than ever before. Our network has evolved from physical routers and switches to a programmable network (virtual machines, containers, cloud, IoT, IaC, NETCONF, CI/CD pipelines, AI/ML and APIs). This disruption has led to a need to merge network engineers with application developers to help bridge the gaps between the infrastructure and development worlds and become one team: a DevNetOps team that is fully engaged and empowered, marching toward one goal and desired outcome – to create new opportunity for the business and enhance the customer experience. We’ll discuss our journey of recruiting developers to our infrastructure world and creating a win-win situation where our network engineers learn DevOps best practices and think in terms of automation to solve problems and automate workflows, network management and monitoring tasks, while developers see the opportunity to apply their dev skills, DevOps mindset and tools to ensure infrastructure availability, performance and resiliency. Want to know more about our DevNetOps journey? Come join us!

2021, breakout, Las Vegas, US
Hoda D Alshami

Technology Product Manager – Infrastructure & Operation, Nationwide Insurance

Mike Leuzinger

AVP, Network Technology, Nationwide Insurance

TRANSCRIPT

00:00:13

Hello everyone. My name's Hoda Alshami. I'm the technology product manager from Nationwide.

00:00:19

Hi, and I'm Mike Leuzinger, chief architect for Nationwide. About two years ago, I was leading Nationwide's network services organization and faced with updating our strategy to keep up with the growing complexity of cloud connectivity, work-from-home connectivity and the increasing expectations of near-zero downtime. We knew a core part of this updated strategy needed to focus on automation, infrastructure as code and DevNetOps practices. While we already had a good amount of scripting-style automation in place, we had stagnated in moving toward a more modern DevNetOps style of capability. I had an open manager role as well as a couple of open engineering positions, which we would typically backfill with folks with strong enterprise networking backgrounds. However, since we had the luxury of already having many excellent enterprise network engineers on staff, we decided to use our open roles to hire folks with more of an app dev background to help jumpstart and accelerate our DevNetOps efforts. This resulted in hiring Hoda Alshami as the manager for what was then known as the network monitoring team, and then giving her a few open positions to fill with developers. For the rest of this presentation, we're going to use a question and answer format to walk through the journey that Hoda and her team have been on for the past 18 months. Hoda, before coming over to manage the network monitoring team, you were a developer and then one of Nationwide's first ever scrum masters and agile coaches. What drew you to the network monitoring manager opportunity?

00:01:47

Yeah, you're right. Actually, it was the opportunity and the challenge that this position brought to me. As you mentioned, I spent more than 14 years in the app dev area, and I had the opportunity to be a scrum master and agile coach. So I worked with different teams from the app dev area, bringing DevOps principles and agile practices to those teams. And we all know, even on that side of the house, it's not one size fits all, so you need to work with a team and make sure those changes make sense to them and that they see the value in them. So I wondered if it was going to be the same situation with I&O, and with the network specifically. So I said, okay, let's try it. And Mike, I'll say you did a great job, because I remember our first discussion, when I reached out to you to know more about the position and your expectations. You really helped me understand how this position would help me and what the excitement was around the activities needed to adopt DevOps and agile, and that triggered my curiosity to know more. So I remember the first thing I did: I had to go and do some research. What's the OSI model, what is infrastructure as code, what are switches and routers, the vendors, and all that. So thank you for that.

00:03:09

Absolutely. And likewise, you came in, and obviously I gave you background on what we were aiming for with the network strategy. But there was a reason why I hired you, and one of those reasons was that you were much more familiar with DevOps than I was. So as you came on and learned some of the networking side, how did you assess the situation, and what was your vision for the team and the overall capability?

00:03:33

Yeah. So when I joined, you mentioned two things, team and capability. Well, I don't think either was there. It was more like a group of great engineers. They are SMEs, but they worked in silos. There was a one-to-one relationship between a tool and the network engineer working on it. So basically they were very heads-down, doing all the operational work. They are great SMEs, doing all the admin work for their tools, and there was no crossing of boundaries between them. And when talking about capability, at that time capability was not a term that was used; mostly we talked about tools. So: I have this tool, I work with this tool. I do network config management. I do network performance monitoring, availability monitoring and packet capture.

00:04:27

So the first activity I did with the team, I said, let's go to one room, and I wrote on the whiteboard: I'm not going to talk about tools anymore, let's talk about capability. So we laid it down: these are the tools we have, and this is the capability each brings to the table. It was a great session. And the outcome, both for me and for the team, was seeing how much overlap we have among the tools we have. The other thing: we have great tools, but because we only have one engineer working on each, we're not able to use the tool to the best of its ability. There are great features that were not implemented, because you always hear that there are not enough hours in the day to do the operations and admin work, take care of the tasks, and implement a new feature. So my vision, I would say from month one of joining this team, was that we need to be one team working together, and we need to streamline the tools we have and make the best of them.

00:05:28

Great. And knowing this would be a multi-step journey, and a long journey at that, how did you see that vision evolving over time?

00:05:37

Yeah, it's been 18 months, but it feels like a long time. I remember when I joined, as you mentioned, we were the network monitoring team, so we were responsible for the monitoring. Then we evolved and moved to the product-centric model, and we changed the branding for our team. We said we are network management, so observability and automation are under our scope. At that time I was working with the engineers to see what automation means to us and what things we need to automate, to try to build that automation backlog, and to help them think in terms of automation. I remember I reached out to the other teams in the network org, the campus team and the firewall team: if you have any idea about automation, or any manual process that you think technology would help us automate, write it on a card and put it on our board (at that time we had a physical board), and let's talk about it.

00:06:34

So we were just gathering the ideas from the whole org about what automation means for them. We started growing our backlog, and we needed to start execution. At that time, we didn't have developers. We had great engineers who do scripting, but we were lacking the development mindset, the object-oriented approach; their scripts were more like serial tasks. So we said, let's add the first developer to the team. That was a very critical decision, because I was really looking for someone who could join and help the team: humble, a self-driver, like a teacher and coach who could help them, bridge that gap, and get everyone on the same page. So I hired that first developer. It was a good choice, and it really built the backbone for the automation team. Then our backlog kept growing and growing, and I had the opportunity to add more.

00:07:33

So I reached out to more entry-level tech innovators: one of the candidates was a fresh graduate from college, and the others were people who were willing to learn and grow in the network space and excited to make a difference. I took the same approach I did with my first developer. I went to them and was very transparent: this is a different position. Ambiguity is big. You're going to wear multiple hats, because you need to help with design, requirements, coding and testing. It's not like, "I only do development," or testing, or requirements analysis. Now we are one team. We have one backlog, and we are all working together to execute that backlog. So it was quite the journey.

00:08:26

So as you mentioned, you had some very tenured network engineers on staff, as well as the newer, early-career developers coming on board. How did you guide your team through these significant changes?

00:08:39

Yeah, it is a significant change, because I was throwing a lot at those engineers: the whole DevOps, the agile principles, a different way to do things. So I was thinking, okay, the first thing I need to do is create a friendly learning environment. I always say every question is a valid question and no idea is a bad idea. Whenever we're doing design thinking, requirements or brainstorming, if you have an idea, put it on the table. Creating that environment really helped people open up and share their recommendations and ideas. The other thing: I was asking people to leave their comfort zone, and everyone was moving away from their comfort zone, including me. So I said, okay, we can all learn together. And very important here, I wanted to make sure we listen to each other, that we are active listeners.

00:09:37

So for instance, I want the network engineers to really listen to the developers, because the developers know the new technology, and maybe they have a new, smarter way to do the same work. Because again, we need to automate, but we're not going to automate the same process as it is; we'd like to optimize it, clean up that workflow, then automate it. On the other side, I want the developers to listen to the network engineers, because those people know their stuff. They have many years in that environment. They know the customer, and they know which activities are high risk, which maybe they don't feel comfortable starting with. They may want to start with lower-impact tasks to automate. So all of that was important, and we were all learning new stuff.

00:10:23

So I worked with my team to help everyone acquire the learning needed to be successful on that journey. We are very fortunate at Nationwide that we have a program with a community college, so I was able to send a couple of my network engineers to the community college to learn the basics of DevOps: containers, Python scripting, object-oriented programming, CI/CD pipelines and all that. They were able to learn for six months and come back with great knowledge, so they can talk to the developers and share the same terminology. In the same way, I worked with the developers to give them a crash course, Network and Infrastructure 101, so that when they are involved in a discussion with a network engineer, they know exactly what they are talking about. So that was very important: creating that learning environment and providing everyone with the tools and training required to feel like they are walking the journey.

00:11:22

This change is not a risk for them; it's not something that will take their jobs away. The other thing: because I'm pushing a lot on the team, sometimes you get pushback, and the famous line is, "This DevOps thing works for the app dev area. We are network, we are in production. It's not built for us." So sometimes the idea gets killed from day one. So I always say, let's try something and give it some time. If things don't work for us and we need to tweak it and change it, that's fine. Let's tweak it and make it help them. The most important thing is that we need to see the value.

00:12:09

I'll give you an example with the standup and retrospective. For the standup, maybe we did not apply it by the book, where every day everyone needs to talk about what I did yesterday, what I'm doing today and what the blockers are. We tweaked it a little bit to make sure people see value in those 15 minutes. The other thing that really helped me introduce the change to the team and sustain that change is the ADKAR model here. As you mentioned, when I started, I was assessing the current state and creating that awareness: why we need DevOps, how DevOps can help us, why we need to switch to agile, what the desire is there. So I helped create that branding: yes, we need developers to join us, we need to grow that automation backlog, we need to move.

00:12:58

Then the transition is around: okay, now what do we need to learn? Developers need to learn more about networking, and engineers need to know more about DevOps. So we give them that training to help them be successful. Then, today and in the future, how can we make sure that we grow and sustain and learn from our mistakes? Because the last thing I want is that we do it today and tomorrow we fall back to our old habits. And that includes changes in the tools we use, adopting new tools, upskilling the people, and even the process: how we intake the work, how we prioritize it, how we present the work, and how we take it from planning to execution. All of that changed.

00:13:44

So with some of those changes, you mentioned people and process. Can you walk us through some of the tools, people and processes that went through change as well?

00:13:53

Yeah. So basically DevOps, I think, is the magic here. I really like this slide, because most people think about DevOps from a tools perspective: DevOps helps us get the tools that can help us in planning, coding, version control, testing and operating. And this is true. You need to learn the DevOps tools and master them to build the right DevOps environment with your team. But DevOps really helped us prepare that mindset and build a culture of shared responsibility, transparency and faster feedback. You have your network engineer and your developer working together, owning this piece of work, and they really need to work together throughout the life cycle to make sure that it's working. So it's not silos anymore. We are all one team working toward one goal. So this is really where DevOps helps: building the DevOps mindset rather than the tools themselves.

00:14:53

Great. And I believe you had some information to share on how the team came together, as well as some of the processes.

00:15:01

Yeah. So "cross-functional team," I think, is the magic word here, because you need everyone to work together. You need your developers: they are the experts in writing code, and they are comfortable working with the DevOps tools. But you also need them to help the network engineers think in terms of automation and think about how we can use this technology to address the problems we are solving today. When it comes to the network engineers, they are the subject matter experts. They have a wealth of experience, and they know the environment very well. They really need to work with the developers to help gather the requirements, for example: I need to automate provisioning for this switch, so what tasks need to happen? They help with gathering those requirements and with design, do the user acceptance testing, define the exit criteria for the change we bring to the environment, and build that customer feedback loop.

00:15:59

So those roles come in and help address that. And as you know, we are pushing a lot of change on the team. Our backlog is growing, and we are introducing these new tasks, but you still have to do your operational tasks and keep the lights on. That sometimes creates some ambiguity. So as a leader, you have to make sure to clarify the expectations and the priorities. That's the leader's part: we have the weekly meeting where we say, okay, you are working on this, but guess what? This is not the priority for now. For this week, I want you to work on this task, because it's pressing. So the leader is very important for setting the work for them and helping them. And having a scrum master on the team, or technology delivery, is very helpful, because they can take on all the work of building and owning the backlog, owning the metrics and dashboards, and running all the agile cadence meetings, facilitating those, all that. So all four roles here need to work together to make sure you have a healthy DevOps team.

00:17:10

That's great. Can you tell me a bit more about the pipeline tool chain?

00:17:15

Yeah, the pipeline, it's an interesting story, because I remember the first time I introduced CI/CD pipelines to the team, I got pushback. "There is no way we can do this. This is network. Continuous integration, why do you need it? It's good for the app dev team, because they have 15, 20 developers sometimes accessing the same code base, and you need a mechanism to make sure you are continuously integrating those changes and there is no conflict. For me, I have a config for a switch, for instance. How many network engineers access that? Once a day, once a week? Why do I need that? It's a hassle, it's overhead that I don't need. Same thing with continuous deployment. We're not going to drive changes to our network on a daily or hourly basis. We're not Netflix, so we don't need it."

00:18:03

So the first time I introduced this to the team, it was like, "No, this is not for us." But I knew there had to be a way, and I had to go back and revise my question: what's the best way to introduce this to the team? I still needed an automated way to audit the changes to that environment. So we talked about it, and I said, let's talk about it from the task perspective: what are the activities you do today to bring a change to that environment? When we looked at the current state, we said, okay, we have a technology engineer who designs the change and decides when to put the change in. We have a change management system where we put a record for the change and we approve it. Then it's up to the technology engineer to deploy it.

00:18:47

Sometimes when you have an issue in production, it took us time to go back and understand who made that change, where the last version of that config file is, and what the last state of that environment was. But if we have what we call the delivery pipeline, using the DevOps tools, that's going to help us with that. So we tried to go and see how we could make that happen. We said the first thing we need to do is version control your config. So we said, okay, let's get our configs and put them in GitHub. Then, when we looked at the configs, they were more like Word documents. We said, okay, can we create templates for those? For the templates, we did some research. There is Jinja and others, and we said, okay, let's go with Jinja and make it our standard.

00:19:32

So now we created a template standard for our team and for the network, and we converted all those configs to the template: this is our standard. Then we created a mechanism to render the data and the templates, and now we have the actual config. Then we said, okay, once we have that actual config, we need to run some testing. We have to make sure there is a sanity check. We need static analysis. We looked at the tools we could apply for that, and we said, this is a requirement: no config goes to production before passing this step. Okay, we did that. Then sometimes, and most of the time, we need human eyes to look at that config and make sure we've captured everything.
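The render-then-check step described above might look roughly like the following. This is a minimal sketch, not the team's actual tooling: Python's standard-library `string.Template` stands in for Jinja so the example runs without extra packages, and the template text, device values and the two checks (VLAN range, IP validity) are hypothetical.

```python
import ipaddress
from string import Template

# Hypothetical switch template. In the workflow described in the talk this
# would be a Jinja template kept under version control; string.Template is
# only a stand-in so the sketch needs nothing beyond the standard library.
SWITCH_TEMPLATE = Template(
    "hostname $hostname\n"
    "interface Vlan$vlan_id\n"
    " ip address $ip $mask\n"
)


def render_config(data: dict) -> str:
    """Merge per-device data into the template (KeyError if a value is missing)."""
    return SWITCH_TEMPLATE.substitute(data)


def sanity_check(data: dict) -> list:
    """Return a list of problems; an empty list means the config may proceed."""
    problems = []
    if not 1 <= int(data["vlan_id"]) <= 4094:
        problems.append(f"VLAN {data['vlan_id']} is outside 1-4094")
    try:
        ipaddress.ip_address(data["ip"])
    except ValueError:
        problems.append(f"{data['ip']} is not a valid IP address")
    return problems


# Hypothetical device data; in practice this would live in a data file.
device = {
    "hostname": "sw-lab-01",
    "vlan_id": 120,
    "ip": "10.20.30.1",
    "mask": "255.255.255.0",
}

config = render_config(device)
problems = sanity_check(device)
if problems:
    print("Blocked before production:", problems)
else:
    print(config)
```

Keeping the data separate from the template is what makes the check cheap: the gate inspects structured values instead of parsing free-form config text.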

00:20:19

Not everything can be captured in those checks. So we added peer review as a required step in our pipeline. And we really use this peer review for knowledge sharing and knowledge transfer as well, because we always assign it so that the reviewer will be one of our senior engineers, but we also need a junior engineer to look in there. So we added that as required. Then, looking at the current state, there is the production readiness checklist that the engineer goes through one by one to make sure: I checked this, I checked that. We said, how can we automate that? If there's anything we can automate, then we automate it and put it as a step in our pipeline. Then we use the tools that we have to automatically push the change to the switches and routers. Then, once we have that, we said, okay, is there anything I need to monitor, any post-validation I need to do, any extra telemetry data I need to extract from the environment to make sure I did not break anything? And sometimes you do all your due diligence, but things happen. So how can we learn from that? How can we take that knowledge? Maybe I need to update my sanity check, I need to update my peer review stage, or I need to update my production readiness checklist. So all of that is important to build the feedback loop and continuous improvement. And the outcome is what we call the change pipeline.
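The sequence of gates described above (sanity check, peer review, readiness checklist, deploy, post-validation) can be sketched as an ordered list of checks that stops at the first failure and keeps a record of what ran. This is only an illustration under stated assumptions: the `Change` class, the stage functions and their pass/fail rules are all hypothetical, and the deploy and post-validation stages are placeholders rather than real device calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Change:
    """Hypothetical record of one config change moving through the pipeline."""
    device: str
    config: str
    author: str
    reviewers: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)  # audit trail of stage results


def sanity_check(change: Change) -> bool:
    # Placeholder rule: a rendered config must at least set a hostname.
    return "hostname" in change.config


def peer_review(change: Change) -> bool:
    # At least one reviewer other than the author must have signed off.
    return any(r != change.author for r in change.reviewers)


def readiness_checklist(change: Change) -> bool:
    # Placeholder for the automated production readiness checks.
    return bool(change.device)


def deploy(change: Change) -> bool:
    # Placeholder: a real stage would push the config to the device.
    return True


def post_validation(change: Change) -> bool:
    # Placeholder: a real stage would inspect telemetry after the push.
    return True


STAGES: List[Tuple[str, Callable[[Change], bool]]] = [
    ("sanity check", sanity_check),
    ("peer review", peer_review),
    ("readiness checklist", readiness_checklist),
    ("deploy", deploy),
    ("post-validation", post_validation),
]


def run_pipeline(change: Change) -> bool:
    """Run the stages in order, log each result, and stop at the first failure."""
    for name, stage in STAGES:
        passed = stage(change)
        change.log.append(f"{name}: {'pass' if passed else 'FAIL'}")
        if not passed:
            return False
    return True


unreviewed = Change("sw-lab-01", "hostname sw-lab-01\n", author="alex")
reviewed = Change("sw-lab-01", "hostname sw-lab-01\n", author="alex",
                  reviewers=["sam"])
print(run_pipeline(unreviewed), unreviewed.log)
print(run_pipeline(reviewed), reviewed.log)
```

Because every stage writes to the same log, the questions raised earlier in the talk (who made the change, which gates it passed) have one place to be answered, and a lesson from production becomes a change to one stage function.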

00:21:46

Wow. That's amazing. The amount of work that you and the team have done over the last 18 months is absolutely unreal. So we have about two to three minutes left. It's been 18 months: what do you see as your and the team's biggest successes over that time?

00:22:05

I think this tool chain. It really helped us. It was a great exercise with the team to look at the tools we are using today and at future adoption, connecting the network to the rest of the technology stack we have in our organization, and to say: this is what we use today, is there anything we can leverage from other areas? So that was a great exercise with the team. One of the challenges we have is just showing progress, because we are working on very foundational work that takes time to build. It takes time to build that mindset, and it takes time to see the fruit of that. So how can we show leadership that we are moving, that we are making progress? An example of that is the templatization of the configs: we're doing a show and tell that takes 15 minutes, but it took us months to come up with the idea and then execute on it.

00:23:01

Absolutely. So speaking of that, what kind of challenges do you see ahead?

00:23:07

It's just keeping everyone motivated, because it's a lot of work, and keeping them tuned in. The other thing: we built that team organically. So we have great skills now, but if someone decides to change position, then I need to start all over again. So it's keeping everyone motivated, helping them see the value in what we do, and again, showing progress.

00:23:35

Absolutely. And I can tell you, just from walking the hallways when I see your team (not as much in the hallways anymore, with most of us being remote), the energy is very noticeable. I think it's a really great story. Thank you for sharing it with us. Thank you. Right, I get a lot of requests from people who want to join the innovation team now. That's a good thing. Awesome. Thank you so much. Thank you. Bye, folks.