Employing DevSecOps for Air Force Cyberweapons (US 2021)

Many commercial companies conduct DevSecOps transformation to increase business value. The US Department of Defense currently seeks to implement Continuous Integration/Continuous Delivery in several areas to overcome previous multi-year program delays. Our corporation recently contracted to move a USAF cyber weapon system from Waterfall model to DevSecOps. Our story involves moving multiple facilities, building a culture, developing relationships, and incorporating Agile practices despite compliance, technical, and communication challenges. While our journey continues, this session shares our experiences at incorporating DevSecOps and an Agile workplace into a Department of Defense Cyber Weapon System.


Dr. Mark Peters

Technical Lead, Novetta

TRANSCRIPT

00:00:12

Good afternoon. I'm Dr. Mark Peters, and I'm presenting here today on DevSecOps for a USAF cyber weapon system. I'm excited to be here at the DevOps Enterprise Summit. This is my first year at the DevOps Enterprise Summit, and although I'd potentially rather be in Vegas, it's certainly a nice view, regardless of the situation. With me today to answer questions are Jeff Sorrell, who is business development for Technica, as well as Brian Butler, who's our system integration lead for Technica. Both of them help us advance the program and will help answer particular questions you may have. I started out working with DevSecOps at Technica as their lead IA security engineer, so I got a lot of experience looking at some of the unique problems we faced in trying to take an Air Force cyber weapon system and integrate it into this blended DevSecOps.

00:01:07

So when we're looking at what we're doing, we have to know the mission background: the US Air Force specializes in air power at the right place at the right time. That's how we made our money. I did 22 years in the Air Force as an intelligence professional, and that's how we learned what we had to do. Now, the culture that comes with delivering at the right place at the right time comes with high levels of training and high levels of risk, so that culture influences even cyber tools. I used to work with a lot of fighter squadrons. I worked with both the F-16s, which are the Falcons, or the Vipers, and the F-15, the Eagle or the Strike Eagle. They had a flying culture, and they used to say that the rules were written in blood: the standards they had in place to prevent things all came out of having fatal accidents.

00:01:54

One good example is with the F-16s, which are single-engine aircraft. When they come forward to fight, they go into a one-circle fight: they each turn off like this, their cockpits face each other, and there's one circle. In training, they were only allowed to do 180 degrees of the turn around that circle before they had to exit the circle and go off. That made sure the circle stayed a constant distance, it didn't get smaller, and they didn't crash into each other and lose pilots. Because even though that might not happen, the chances of it happening were higher, especially if you allowed it to go further in practice. So they test these practices, and they test, and they test again to prevent an error. However, this runs contrary to the culture of DevOps, where we try to move faster. So what we wanted to do here was figure out how we could accelerate that DevSecOps delivery for the weapon we were delivering while overcoming those cultural challenges, while overcoming the waterfall mindset that said stop and check at every level, checking every facet, so that we made sure we went to the right place at the right time and did what we were supposed to.

00:02:57

Now we start off with one of the challenges we faced. If you look at the slide on my right, this is the simplified and easy version of the DOD acquisition strategy that defines what you have to do in order to have a project delivered. Unfortunately, each of those little different-colored blocks probably has anywhere from six months to a year's worth of effort involved in the whole process, if you do it right. So a government waterfall program means a three-to-five-year cycle to get anything out, no matter how simple. The simplest projects often take longer and can be more complicated. I was once involved in one where they had to adapt a computer so it could function in a pressurized environment.

00:03:41

And even that was a three-to-five-year cycle, even though we could buy it off the shelf. The admin and packaging of those components was more important than actually delivering a functional product to the other end. Getting all the right paperwork associated with it, to make sure that something bad didn't happen, was the goal. And the reason that was the goal was those rules written in blood. We had to do it the right way. So we started our DevSecOps transformation. We saw that some of the higher levels of the DOD mandated the use of Scaled Agile. They liked the Scaled Agile Framework, they wanted everybody to use it, and we were going to be no exception. So when we came onto Scaled Agile and looked at some of the integrated teams, we started off and said, hey, we're going to have a team for development.

00:04:22

We're going to have multiple teams for development. We're going to have an operational setup, which from a program standpoint is more of your program manager, project management viewpoint. We're going to have a security team. We're going to have a lab team, and we're going to have a couple of other teams. And we were trying to build those teams in together the best we could to get to the right process for us. So what did we see as our mission requirements? I know I'm blocking that slide; the bottom part actually says accelerated timelines. There was no standard infrastructure. We took over the contract from someone previous. The previous contract had actually not been one contract, but three contracts: they'd run one for sustainment, one for maintenance, and one for development. Instead, when Technica picked up this contract, we ran one centralized contract. We maintained subs, but we did it all together so that everybody could talk together and work together.

00:05:13

But we didn't inherit any infrastructure, so we had to build our own infrastructure immediately. We had no infrastructure. We had no dev environment. We had no place we could go to test things. We didn't have any physical workplaces when we started. We'd said we were going to have a workspace, and we had it contracted out. But then the ownership of the realty changed, which meant our contract changed, which meant that what we expected to be there on day one didn't wind up showing until about day 180. You know, these things happen. They're part of the agile process, right? The previous contract hadn't had integrated security built in; it had different pieces of it built in. They left us with a half-built model. They hadn't finished their release when they got out.

00:05:54

And not only that, but the releases they did have had incomplete documentation. They didn't build it all the way through to what the government expected. Now, I know Agile says that you should prefer functional software to paperwork, but at the same time, there are places where the paperwork is helpful, especially if you've got users scattered all over the world, like the Air Force, and different users have to do the same thing in a different manner. So what did that requested feature look like? What were they talking to us about when they said a weapon system? Well, the weapon system we're talking about is CVA Hunter. CVA Hunter is kind of a blue-force, off-net SOC. It can come in and take a look at your functionality, whether it's someone else's functionality or the functionality for that mission owner.

00:06:38

It then provides feedback: where is it good, and where is it bad, outside of your normal operations set, outside of the normal process where you might see problems happen? So it's a very good, very valuable cyber weapon system on the defensive side of things. It's not offensive, not getting out there into the enemy's network, but sitting on our network, looking at traffic we have rights to, and trying to figure out how we can get the best results for the systems we have and how we can protect them the best way. Our flight plan, if you want to call it a flight plan, was to close some of those software and documentation loopholes, get right up to speed on the processes, and get everything out there as fast as possible. We thought we could innovate our way to successful delivery and do that through creating a winning culture.

00:07:21

In the DevOps mindset, we looked at the architecture, and we built up the systems and the architecture to mirror our integration. So in our system lab, we had not just virtual environments but hardware environments. We used those hardware systems, and we tied them to our service desk, with the service desk answering tier one and tier two issues. Tier one is problems you can fix with a checklist. Tier two is configuration issues. Tier three is when we need code to do it. Our development teams were right there in the code and in the process, so they could fix it if it ever came to a tier three. We had these blocks in that DANS program, the defense and network systems that we had, and one of the most valuable was what we talked about as being our lab, or integration lab, where we got the capability for each developer to set up their own virtual environment where they could do their own testing, or their first levels of testing, before it came out to the next point that we needed.
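As a rough illustration of the three support tiers described above, the routing rule can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not the program's actual service desk tooling: the `Ticket` fields and the `route` function are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    TIER_1 = "service desk: fix with a checklist"
    TIER_2 = "service desk: configuration issue"
    TIER_3 = "development team: needs a code change"


@dataclass
class Ticket:
    # Hypothetical fields, chosen only for this illustration.
    summary: str
    has_checklist_fix: bool
    is_configuration_issue: bool


def route(ticket: Ticket) -> Tier:
    """Return the lowest tier that can resolve the ticket."""
    if ticket.has_checklist_fix:
        return Tier.TIER_1
    if ticket.is_configuration_issue:
        return Tier.TIER_2
    # Anything the service desk cannot fix goes to the developers.
    return Tier.TIER_3
```

The point of the ordering is that a ticket only reaches the development teams once the checklist and configuration options are ruled out.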

00:08:23

We talked about our physical challenges. We actually went through three temp sites. The picture you see is the building we eventually wound up in, the one that was supposed to be there on day one. But due to the realty issues, it had to be delayed just a little bit. When we got there, it was great, though. It was smooth. You'll see another picture later on. We got the videos up on the wall for the I-track to monitor the status of the system and monitor the dev environments. We moved from having individually controlled dev environments to having a continuous infrastructure. Under the previous contract, even though they had their own environments, we hadn't inherited that. And the devs could go in and destroy, well, not destroy, but update, change, and expand the entire overall system.

00:09:06

When we brought them in, we brought them all into that lab so that they stood up their own environment and we provided the infrastructure. It was that deployable infrastructure, infrastructure as code, available whenever they wanted it, to be able to move to the next step. That really allowed us to accelerate our delivery. And here's a picture, like I said: you get the pictures up, and they've got those great technical beans. But what it let us have was that I-track and that control over what the different integrations were using. So not only could we look at the continuous infrastructure they were deploying, we could manage those requirements and resources. There were probably a couple of instances where we noticed, during the pandemic, that developers were using it from home, and certain developers were using more of the system, more of that VPN pipeline, that security back and forth, than they may have actually needed to do the job, because there were only one or two of them using that much.

00:10:00

They weren't using too much functionality. They were moving entire ISOs back and forth every time they started up and every time they booted. That was bad. But in the spirit of DevOps, once we figured out what that was, and that it was hampering our flow, we gave them the feedback. We did some experimentation to make sure that we had the right answer, and we found ways to speed it up. And we could only do that because we committed to that continuous infrastructure. We committed to that continuous way to move forward. Those were the lessons learned in the process. And of course, part of that was learning that virtual presence is not always equal to actual presence. Sometimes you actually have to get face-to-face with an individual to get the results you want out of that program.

00:10:40

We did see some varied baselines. We saw some problems in running through with the government, not because of anyone, but because no one had run this fast before. When we took that first baseline, we took that half-done software they'd been working on for two years, finished it off by the end of the first month, and pushed it out the door. After that, we were on about a three-month cycle. So it's not true Agile. Well, I say not true Agile. It was our Agile. It doesn't matter if it was true DevSecOps or true Agile or our Agile, because we made it work. Every quarter we'd be pushing something out the door. We used a system of integrated testing, and you can see the system of integrated testing up on the slide, to get functional tests, dynamic tests, and integration into the product, and to check how those things worked along the way. As we were building them, we drafted the documentation. A lot of people hate documentation, but it's a necessary factor, especially when you're distributed. We drafted the documentation, and we held it in a highly visible format.

00:11:42

We let the devs actually write their own documentation. Then, when we snapped the chalk line at the end, we would have the tech writers take it and make it more presentable to others. But as we passed it around those teams, we made sure when we were drafting the documentation that every other person could do it. This is one of our key points of security too. As members of the security team, our security guys participated in the dev process, and we saw the results of these tests. When the documentation was being drafted, we saw that, and we were extremely familiar with where the documentation was. So when it came time for an audit, our auditor back at the ATO facility comes back to me and says, hey, we're having an audit. We need to have two-factor.

00:12:25

You guys have to have two-factor. You've skipped this requirement. Show me where you have two-factor. I said, not only do we have two-factor, I can point you to where the instructions are for the two-factor. I can give you a copy of how I set it up. And I can even show you how I went into my instance and ran two-factor to show how it worked. It was run all the way across, and it worked really smoothly. Bringing people down there and bringing everybody together creates a process. Those tech writers, instead of redoing things, helped us spend our effort where we had to refactor the code base and work on the code base, so that developers could work on the dev parts and just pass enough information down for the tech writers to do their part. But those tech writers, just like the security guys, participated in the daily events.

00:13:07

They participated in the daily stand-ups, and they understood how it worked. We got the audit data we needed from security as we were running our scans, and we merged it with the infrastructure. We kept track of a standard baseline, and then we ran the scans as each item came out of the DevSecOps version, and we compared those. Again, that allowed the developers to focus on the elements that needed their help, so that they were only fixing things that needed to be changed instead of working on a blanket set. We could coordinate the resources between the devs because we had those VMs. So when you were having a problem, instead of having to drag somebody else in, they could do the virtual bit during the pandemic. They could log into your environment, see what you were working on and what the issues were, and move forward in their discussion.
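The baseline comparison described above can be sketched as a simple set difference. This is a minimal illustration, assuming each scan produces a set of finding identifiers; the function name `compare_to_baseline` and the finding IDs are hypothetical and do not reflect the scanners or formats the team actually used.

```python
def compare_to_baseline(baseline: set[str], scan: set[str]) -> dict[str, set[str]]:
    """Split the findings of a new scan relative to a known baseline."""
    return {
        # Findings introduced since the baseline: what developers should look at.
        "new": scan - baseline,
        # Findings present in the baseline but gone from the new scan.
        "fixed": baseline - scan,
        # Findings already known and still open.
        "unchanged": scan & baseline,
    }


# Hypothetical finding IDs, for illustration only.
baseline = {"FINDING-001", "FINDING-002", "FINDING-003"}
latest = {"FINDING-002", "FINDING-003", "FINDING-004"}
result = compare_to_baseline(baseline, latest)
```

Only the `new` set needs developer attention for a given item, which is the "fix only what changed" idea from the talk.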

00:13:54

So you say that's great, and that gives you some of the functional aspects, but it doesn't get you to the compliance aspects. It's true that when we inherited it, we had a large compliance problem, and we didn't think we were going to get to Agile. If you've heard about the authorization to operate in the government and the authorizing official who has to sign off on it, you know. The previous CVA Hunter version had an ATO with conditions, which meant they had to fix some issues before they could get a full authorization. Unfortunately, those several issues that we had to fix to get to a full ATO were about 1,260 when we took it over. We closed more than 90% of those. I think when we actually turned in our ATO documentation, we had fewer than 40 of those open.

00:14:43

Now, granted, a lot of those had to do with policy. We had to rewrite policy, get policies signed, and get policy approved. But the way we were able to do it was because of the integration with the teams, that integration of compliance and Agile. Having the security folks there meant we knew exactly what was going on in those teams. We knew what we were working on. We knew what the features were, and we knew where our authorization to operate, our ATO, tied into those teams' practices, which really let us have some successes on integrating our compliance and Agile. And it started with having those security folks down at the team level. We had four teams, and we had four security guys. I was supposed to be the security lead, but even I went and sat one day in each team, and every couple of months we'd rotate through.

00:15:28

And not only did we rotate through, but when we went to the teams, we came back to our own meeting, which was later in the day, talked about what we heard in the teams, and discussed the team meetings among ourselves. So we were functioning as a separate dev team, an Agile team in the structure, so that we could get all the information together. We used the structure we had on the VMs and the structure we had in the dev nets to get to a continuous monitoring solution. We were pushing out hardware for this system. It doesn't push out on a cloud; it pushes out hardware. So we could monitor how it went and what went out. At some point soon it will be on a cloud system, and getting that cloud system is going to enable even more continuous monitoring. Because even though they're still going to deploy independent servers, when they come back, they're going to check in, and that home system is going to say, hey, you're in this state, we can verify that you're in this state, that's a known good. And then we can update you and move on.

00:16:26

So as we get to the end of the process, we talk about the main things that we looked at. What are the lessons learned that I can give you, that you can take away or move forward with? We addressed debt in three different areas: cultural debt, technical debt, and process debt. When I say cultural debt, I'm talking about the shift between an Agile culture and the waterfall culture that we started with. We had to really make allowances for some of these leadership structures. One of the things was that at the end of every program increment, the program manager wanted to snap the chalk line and have a release, and he wanted to have a test to go with it. We said, you know what?

00:17:06

We're not really sure that's Agile. We'd like to do the features, deploy the features, and work with the features, and the releases will come when the releases come. And he said, nope, you know what? We're going to do this. We said, all right, you're in charge, you're the government, you've contracted us. We're happy to do it your way. We'll use our DevSecOps teams, and we'll integrate this. So every three months we've been pushing out a new release. We started with a 3.2.1, and then we pushed a 3.3, and then a 3.4, and then a 3.5. And what actually happened was we were pushing it so fast that the administrative process the government had built up couldn't actually keep up with the speed at which we were doing releases.

00:17:48

And this is only once a quarter, not once a day, like we talk about at the high end of the DORA metrics. But it got us there. We were able to show those tests and build those tests and the continuous monitoring to get us to that next step and to the level where we could move forward. We talked about some of our technical debt and how we ran some pipeline runners. We started off with Jenkins. Jenkins was what we inherited from the previous team. It was what they were familiar with. In looking at how we wanted to get bigger, because we knew we were going to get bigger and faster, we were looking at whether we wanted to go to CloudBees or GitLab, and our guys decided they'd rather be on GitLab. There are good things and bad things, right?

00:18:25

So you move. We said, we're going to use GitLab. One of the discussions I had was with one of the developers, and it was great. He said, hey, we're going to GitLab. I said, I haven't used a lot of GitLab. Let me go do some research. I'm more familiar with the other one; I'm more familiar with Jenkins. So I went and looked at it, and I talked to the GitLab guy, had those discussions, and looked at all the security features we had available in GitLab. And I said, you know what? This is great. There are so many security features in GitLab. We are going to tie these in. We are going to have monitoring across the pipeline, and life is going to get so much easier for security.

00:18:58

And I went back to our developer and said, I'm excited. This is what I need. This is the version I want. Show me where it's turned on. He said, well, we just bought the basic version, because there we can do everything we need, and we need to limit the number of seats. So we got better over time. We got better at it, and we used GitLab and the GitLab instances to create transparency and dashboards. Again, our security folks were able to go to any of the tests in the pipeline and see them all the time. They could look at the security scans, and they could see the static and dynamic scans happening as the different elements of code were being committed, both at the individual level and at the feature level as it went through. It helped us solve our technical debt.

00:19:36

It helped us figure out what it was, and not only what it was, but where it was in the cycle and how to close it quickly. We also talked about process debt. Our customer wanted things fast. Well, he thought he wanted things fast. But once we started delivering every three months, it turned out it was a little bit faster than he could handle, like I mentioned. We needed to organize the teams. There were a lot of government folks. So instead of having the government folks be in charge, we did the Scaled Agile thing. We made those government folks the product owners, where they were accepting the products, but it didn't mean that they drove the structure of each of those individual teams. And some of that took a little doing too, because that's really a cultural shift for a lot of those folks.

00:20:16

If they've been in that government structure for a long time, they're used to having that control over anybody you put them with. They're at the top of the team, not at the bottom of the team, as they move through the levels. But we organized it. We worked through it. We made sure it worked in organizing the teams to get to the processes. One of the other unique things we did, probably not unique for everybody but unique for us at the time, was that we drafted against those features in the program increment. Each of the scrum masters knew what the technical capabilities of their team were, and they got together and did a one-for-one draft, just like you would in fantasy football. They got the chance to pick when they were going to go, first, second, third, or fourth, after we added the fourth team. They picked the feature they wanted to work on, and then the next team picked, and then the next team picked, and then they were responsible for committing to it. It was kind of like Name That Tune, right?

00:21:07

You say, I can do that aspect in 35 story points. Team two says, I can do it in 30 story points. Team two, you know, builds that feature. And it worked great for us, obviously, by the production record.
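The one-for-one feature draft described above can be sketched as a round-robin pick. This is a simplified illustration under stated assumptions, not the team's actual planning tool: the function `draft_features`, the team names, and the feature labels are hypothetical, and each team is assumed to hold a ranked preference list that its scrum master brings to the draft.

```python
def draft_features(pick_order, preferences, features):
    """Assign features to teams by taking turns in a fixed pick order.

    On its turn a team takes its highest-ranked feature that is still
    available; if none of its preferred features remain, it takes the
    next available feature.
    """
    remaining = list(features)
    assignments = {team: [] for team in pick_order}
    while remaining:
        for team in pick_order:
            if not remaining:
                break
            ranked = preferences.get(team, [])
            choice = next((f for f in ranked if f in remaining), remaining[0])
            remaining.remove(choice)
            assignments[team].append(choice)
    return assignments


# Hypothetical teams, features, and rankings, for illustration only.
picks = draft_features(
    pick_order=["team1", "team2", "team3"],
    preferences={"team1": ["B", "A"], "team2": ["B", "C"], "team3": ["A"]},
    features=["A", "B", "C", "D"],
)
```

A bidding variant, where the team offering the fewest story points wins the feature, would replace the preference lookup with a comparison of estimates.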

00:21:24

So what do we see overall? What's our taxi to the hangar? What did we do with this weapon system? Well, we did six releases in 20 months. Previously, they'd been looking at about 24 months per release. They did one release in 24 months, they did a second release, or almost a second release, and then they turned it over to us. And we got it up to six releases in 20 months. That's just outstanding speed, an outstanding timeline. And we did it through applying DevSecOps processes, starting with the basics and starting with where we were compliant with the standards. Along the way, I mentioned some of our individual security wins and some of the individual processes that we got to. We're talking about NIST 800-53, which, if you haven't seen it, is a whole beast. Obviously it's a whole NIST standard, but if there are 1,200 different instances that you have to address and close, that's a lot.

00:22:22

So we got a two-year ATO for the program, a two-year authorization to operate. This year was the first time in the eight-year program history that it had ever received an ATO for that amount of time. The previous one had been that 18 months with conditions. In all fairness, we did get a six-month extension because of the pandemic, so they gave us six extra months to operate and start closing some of those issues. But we still made it. We were compliant, with no conditions this time. We moved into that new building. We went through a couple of WeWork-type situations, but we finally got into that new building. We built those dedicated environments that we're going to use, those dedicated systems, that infrastructure as code. And then we increased to four dev teams. We started with three and went to four, so we can have nonstop improvements, and we can keep adding features and keep building forward.

00:23:12

That holds even when the government isn't completely ready to get there. So what are the final lessons? What did we take away that we'll work on next? The parts I expect you to take away are the people, process, and cultural things, the kind of things you can do with your team. Our team is building to increase the continuous monitoring as we go to cloud, to be able to hook systems together and know what the different systems are doing, so we can monitor user status whenever they're connected instead of just when they tell us they're connected. We want to get to that continuous ATO, where instead of doing documents on a yearly basis, we take what we find as the vulnerabilities and what we know is the known bad on a system, and we dump it in a repo where folks can check at any time. And we want to get some AI and ML agents in our pipelines to get improvement both on the systems, as they do the network detection, and in the pipeline, to show where we're having problems in our pipeline.

00:24:05

We want to speed up from that chalk-line delivery every three months to release on demand, to just build up and build up as we're doing deliveries and put them in that feature folder. Then every time somebody says, you know what, I want that, they can go to the selection board, just like a Chinese menu, and say, I want one, three, and 13. Can you bundle this up for me to release? We can bundle up one, three, and 13, and we can release those on demand. So instead of going from a 3.4 to a 3.5, we call it a 3.4.1 or whatever we need to do for the administration, but we can move it forward. So those are our next steps. And we're going to get there by setting some clear goals.
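The release-on-demand idea above, where a user picks completed features by number and gets them bundled into a point release, can be sketched as follows. This is a minimal illustration only: the function `bundle_release`, the feature names, and the version label are hypothetical, and the real packaging and approval steps are not shown.

```python
def bundle_release(completed: dict[int, str], requested: list[int], version: str) -> dict:
    """Bundle the requested, already-completed features into one release."""
    missing = sorted(n for n in set(requested) if n not in completed)
    if missing:
        # Only features already sitting in the feature folder can ship.
        raise KeyError(f"features not ready for release: {missing}")
    ordered = sorted(set(requested))
    return {"version": version, "features": [completed[n] for n in ordered]}


# Hypothetical feature folder and request, for illustration only.
feature_folder = {1: "feature-one", 3: "feature-three", 7: "feature-seven", 13: "feature-thirteen"}
release = bundle_release(feature_folder, [13, 1, 3], version="3.4.1")
```

Keeping finished features in a folder keyed by number is what makes the menu-style selection possible: the bundle step only selects and labels work that is already done and tested.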

00:24:42

We're going to continue talking about accelerating the value delivery. We're going to have that quarterly release based on the on-prem solutions. We're going to have some rapid maintenance response with the service desk, because what I didn't talk about as much in this was that the whole time, we have a service desk. We're not only building a new system, but we're also responsible for maintaining the ones in the field. So folks are calling in, well, not constantly, because this system worked pretty well. But you've all worked with users, and they all have something they want to call in and talk about. So we had them calling in, we talked to them, and we worked with them as best we could. We secured the test integration. We had our test processes complete. And again, we were progressing so quickly that the operational test element that the Air Force uses to make sure that they all comply couldn't keep pace.

00:25:24

We actually overran them. They couldn't keep up with the speed we were going. And they said, you know what, we're going to do this, but we're comfortable with the level you're producing. We're comfortable with the quality. We're only going to test every six months. And we got through to comprehensive compliance. Now we have that two-year process, and we're building it into a more integrated process all the way, so that we can speed up and accelerate what we deliver, and deliver the best capability to defend everyone as we deliver for cyber vulnerability analysis. So that does it for my presentation. I'm excited to be here at the DevOps Enterprise Summit. Again, my name is Dr. Mark Peters. You can find me online, either on LinkedIn or on Twitter at tiny cyber, and I'll be in the Slack channel for questions, as will Jeff Sorrell and Brian Butler. Thanks for your time and have a great day.