San Francisco 2014

How DevOps Can Fix Federal Government IT

The Federal Government spends more than $80 billion each year on information technology. As the fiasco with demonstrates, the results are not always good. Government IT programs are expensive and monolithic, and the lead time from a “mission need” to a deployed capability is often measured in years (in one of our agency’s programs, about 12 years!). IT systems are often difficult to use, and the US government’s online service offerings to citizens are far from meeting the expectations of a public that is used to Google, Facebook, and Twitter.

The US government has only recently begun to adopt agile approaches, and only in a few agencies. But the results have been encouraging, and show that it is possible for the bureaucracy to be agile. DevOps, however, is a game changer. At USCIS we have moved to a continuous integration, continuous delivery approach, and have begun experimenting with a DevOps model tailored to the needs of the government.

By combining DevOps with some ideas taken from the Lean Startup movement, I believe we can cause a radical change in how the government does IT. We can dramatically reduce lead times and costs, improve the usability of systems, provide more transparency, create citizen-centric online services, and – importantly – significantly improve the government’s security posture.


Mark Schwartz

CIO, US Citizenship and Immigration Services (USCIS)



I'm going to try to convince you all that absolutely crazy is better. Gene, I noticed something about your speakers. I don't know if this has anything to do with how you recruited them, but most of the people speaking today have mentioned bureaucracy, and mentioned it as if it were a negative thing. So I thought I would talk a little bit about that, and maybe a little bit about something else that's come up a lot: cultural change. And I'd like to question a few assumptions around cultural change. I joined the federal government for the first time just over four years ago, from the Bay Area startup community. I knew it was going to be somewhat different. For one thing, people called me "sir," even when they weren't being sarcastic.


So they called me "sir." For another thing, I was going from an environment where I was managing about 30 people here to managing about 2,000 people at USCIS, and from a budget of about $3 million a year to about $500 million a year. So it was clearly going to be a big change for me. USCIS, if you're not familiar with it (I bet a lot of you are), is responsible for legal immigration to the country. So if you're confused about the different immigration arms that are part of DHS, I can clarify it now. We're not the mean and nasty people who deport people; that's ICE. We're not the grumpy people who sit at the airports and look at your passports; that's CBP. We're the ones with the long backlogs. That's our specialty. We're the ones you apply to. If you want a green card, you apply to us.


If you want an H-1B, you apply to us. If you want to become a citizen, you apply to us. We receive about 7 million applications a year, and we process them as quickly as we can; that's basically what it comes down to. So, joining the government like that, realizing it was going to be different, the thought in my head was: can we have a lean bureaucracy? Can we make bureaucracy lean? Can we make it agile? Is it possible? Because the government is bureaucratic. That's the way it is.


And it's an interesting question when you think about it. Everybody talks about bloat in the bureaucracy, and waste, and so on. Can we actually take it and figure out a way to make it lean? So there's my question for you. A couple of incidents stand out from my first few weeks in the government, or at least, you know, I think I remember things happening. The first one: I'm in a conference room with a few people on my staff, and we're talking about some important user need that has just come up, and they've shown me a Gantt chart for what's going to be involved in doing it. And the Gantt chart says that it's going to take about a year to do. I looked at them, scratched my head a little bit, and I said, at least this is the way I remember it:


"You know, this is just a few changes to a web page. How can it possibly take a year to do?" It was probably a little bit more than that, but that's what it seemed like. They seemed to get really nervous, and they chatted amongst themselves. And then they said, "OK, I think we can do it in eight months." "Eight months? How could it possibly take eight months to do this? I could do it myself in just a couple of minutes." "Yeah, we could do it ourselves in a couple of minutes too, but we can't do anything in less than eight months." And they actually meant that literally: we cannot get a release out in less than eight months. So I said, "How is that possible? Why can't you get a release out in less than eight months?" "The SDLC." And I thought, that's a little strange. I thought I knew what an SDLC is, a software development life cycle. I'd never heard it used as a reason why you can't do things.


So, eight months to get a release out. And I thought, I'm the CIO, I'm going to take charge of this. I said, "Well, then let's change the SDLC." I don't know if any of you have ever tried learning a foreign language; maybe you've experienced this. You're learning a language and you're sitting there talking, blah, blah, blah, and all of a sudden everybody gets quiet and starts to look embarrassed. And you realize, oh, I probably said something wrong there. I meant to say I'm getting a haircut, and I said that everybody's dying in the hospital or something like that. So it was like that. "We can't change the SDLC." "Why can't you change the SDLC?" I've been a troublemaker since I joined the government. You know, why not? Let me see if this works. "Why can't we change the SDLC?" "MD 102, sir." MD 102. That's Management Directive number 102. We're part of the Department of Homeland Security; that's our parent agency, and they have a policy called MD 102, and it has our software development life cycle. It's there. In fact, I never travel without it. This is actually MD 102 right here.


That's double-sided, by the way. MD 102: if anybody wants to look at it later on, I've got it here. They had sent me a copy of MD 102 before I started work and said, "You should probably read this." I had glanced through it, and I realized immediately that it had nothing to do with the way we actually develop software, so I didn't bother reading it. In retrospect, that might have been a little bit of a mistake. So, the second incident that I remember. I'm in a conference room, a big conference room, and it's filled with people. Government meetings have a lot of people; it's kind of a rule. We're discussing a legacy system that nobody is using anymore, and it's costing us a lot of money to maintain. It was called RNACS. With a name like RNACS, you know it's a legacy system, right?


It's one of those things. So nobody's using it. I'm the new CIO, and I ask a lot of questions. I make sure that nobody is using RNACS anymore and that it's costing us money to maintain. And then I say, "All right, let's decommission it." And it's that silence again, just like before. I know I said something wrong, right? Then a little voice from the back of the room: "Oh, sir?" "Yes, Tina, what is it?" "Sir, you don't have the authority to do that." And I thought, why? I'm "sir," don't you know. "So, all right, Tina, why don't I have the authority to do that?" "MD 102."


So I figured, OK, it's time. I have to read the thing. And I didn't just read it. If you look at this thing, it's highlighted, it's underlined, it's got little comments in the margins. I actually studied this thing. I studied every word, and I became the greatest expert in the government on it. You can quiz me if you want. And you know, it's a beautiful document. It's absolutely brilliant. If you wanted to instruct a large group of people that they must use the waterfall approach, you couldn't possibly write it better. I mean, it's gorgeous. It says that when you're developing software, you have to divide everything into nine phases. You have to have a solution engineering phase and a requirements phase and a design phase and then development, you know, and there's a gate review in between each of those phases.


There were 11 gate reviews altogether. My favorite was the test readiness review, which is the review to make sure that you've absolutely finished development completely before you can start testing. So there's this gorgeous waterfall document that is official policy. It says that for each release we have to write somewhere between 90 and 110 documents, depending on the kind of release it is: the functional requirements document or whatever, and the integrated logistical support plan, you know, a big mountain of documentation. I actually showed Gene this the first time he visited my office: "Here's the documentation that we've just spent two years preparing without actually touching a keyboard. It's not done yet." So clearly MD 102 was the enemy. I had to fight MD 102. So I took it on. First of all, I brought in a whole bunch of agile coaches. I made up my own policy. It was called MD 001 or something, as my first policy. And it said: from now on we're agile. Try it, see what happens.


You won't believe how well it actually worked. That's the thing. So Mark 001 says we're going to be agile. It defined agility in a very reasonable way. It had eight core practices, and it said that from now on all of our development within USCIS is going to use these eight practices: things like individually testable requirements and a few other things that wouldn't really surprise you. So I rolled that out within USCIS. And then everybody heard that there was this thing called agile methodology and wanted it somehow for DHS, so I got myself appointed to a group that was going to rewrite and improve MD 102. Boy, did they choose the right person. So, agile coaches everywhere. We have about 75 legacy systems. Within a year, we had done a hundred-some-odd agile releases.


We were starting to move. We had great product ownership from the business side; everything was going fine. And I was working on rewriting MD 102, the big enemy. And suddenly I had this weird epiphany. I started to think about MD 102 in a very different way. What happened was that I sat down with the people who had written it, and I discussed some of the changes that I wanted to make. They didn't like them, of course. But they told me a little bit about the context and what they were thinking when they wrote this thing. So what you have to imagine is that it's 2002, 2003. The Department of Homeland Security is being stood up in the wake of 9/11, and it's being stood up by putting together 22 different components, basically gluing them together into one big agency.


It includes USCIS and ICE and CBP and TSA and FEMA and the Secret Service and the Coast Guard and, you know, all this other stuff. And because of the way it's stood up, it's overseen by 104 different congressional committees. No committee in Congress wanted to give up its control over whatever it was overseeing. So when you put all this together, you've got 104 committees, and it's total chaos, right? It's a merger of 22 companies, suddenly, boom: total chaos. And there's this group of people that is being held accountable for overseeing all of the spending of the organization, especially the IT spending, which is a lot of money. There are projects that aren't working out, and there's total chaos, and nobody understands what money is being spent on what, and this poor group of people is accountable for this.


And they're good civil servants, public servants. They want to do the right thing, and they're in a sort of nightmare situation. So they turned to what they knew. They didn't know much about IT in particular, but a lot of them had come from the Department of Defense. So they went back to the way they used to do things, and they wrote what I really think is a brilliant document, if you're committed to that way of doing things. It was the best they knew; it was the best they could come up with. So I started to read MD 102 again as I was studying it, and, reading between the lines, all of a sudden it was this tremendously human document. I mean, it's a little strange to say that when you're talking about test readiness reviews and stuff, but in the words of this document, if you read it the right way, you could see the fears of the people who were writing it, the hopes, the dreams, the integrity, the commitment, because they really are trying to do the right thing and they're in an untenable situation.


So what I realized after talking to them, first of all, is that what seems like outright horrible bureaucracy, the faceless bureaucracy, is not what it actually is. It's human beings in a government organization trying to do the right things. Don't ever let anybody tell you it's a faceless bureaucracy. It's real people; I know them. And what I realized is that in order to bring agile practice into the government, what we have to do is somehow meet the needs that are expressed in MD 102. It's not the right way to do things, but it's there because of these needs: these oversight bodies, the accountability to the public, and so on. And so the challenge became, can I create an agile practice that I can map to those same needs? If you think about it, the way we handle agile requirements is that we try not to have requirements.


We try to hear a business need and figure out the best solution to meet that need. So here, underlying the culture of the government, underlying MD 102, there are real needs that are special to the government. Can we find a better solution for meeting those needs? That's what the problem became for me in trying to figure out a lean bureaucracy. The government is low-trust, necessarily. If you think about how our government is structured, from the very first moment it was really a low-trust government. We have a system of checks and balances. We have three different branches of government. The idea is that you can't trust the president, so Congress has checks and balances on the president. You can't trust Congress, so the president can veto things. And then there's the judiciary. The three are intended to control each other because of a lack of trust.


We have freedom of the press. Why? One of the good reasons for that is that we want somebody looking at the government and finding the problems. We don't really trust the government; we want to hear about the problems. You might have noticed that the press rarely writes articles about how wonderfully the government is doing. I'm hoping someday to be on Jon Stewart, by the way, being made fun of. That's one of my ambitions. So can we take this low-trust environment, this necessarily low-trust environment, where we have congressional oversight looking over our shoulders all the time, the press is looking at what we do, and all of you, the public, are looking at what we do critically? We have rules that we have to follow, compliance, all sorts of low-trust sorts of things. Can we find an agile or lean practice that will satisfy all those things?


It turns out DevOps seems like it was created to solve this need. So where we were using a Scrum-based process before, we're now moving toward more of a DevOps-based approach, and I'll give you some of the reasons why. Let me tell you a little bit about what we've done so far at USCIS, though. Now everything is agile, because of Mark 001, but we have also started tooling up. We have three different big projects going with continuous delivery pipelines. It's the same stack everybody else mentioned today: basically Jenkins and Git and Chef and Gradle. And we have some wonderful production monitoring from New Relic, and I don't know what else, the usual sort of thing. We have developers working in a Java, Spring, Hibernate sort of stack, and some working in Ruby on Rails.


We have automated testing, continuous integration, and continuous delivery into the Amazon cloud. It's all set up and waiting for the first deployment, basically, which is going to happen in a few weeks. Why is it going to wait a few more weeks? There's an election coming. We're going to wait until after the election. That will be release one. I am actually very excited to say we also have a Chaos Monkey, based on the Netflix model. The Chaos Monkey, if you're not familiar, is a script that causes havoc. Basically, it tries to screw things up in production. So if someday you are reading the Washington Post and you see on the front page that the Chaos Monkey has shut down the Department of Homeland Security, you'll take it as a success, right? It will mean that our DevOps practice is out there.


Big success. So I'm going to try to run through a few of the reasons, just to show you the logic of why a DevOps practice can help meet these underlying government needs, and how I think about these things. One thing about the federal government is that we have very strict procurement laws. Laws, actually. So what might be good strategy for a company is something that we legally have to do. It's very common for us to have to change contractors over time as a different contractor wins contracts, and we have to try to create a level playing field where different contractors can propose on things. One of the ways we've dealt with that in the past is that we've created a lot of documentation, right? Another contractor is going to have to take over for you.


So we need you to document absolutely everything you're doing, so the new contractor can get started right away. Well, with a DevOps practice, with a continuous delivery practice, really everything is scripted. There is no documentation to be written for the next contractor. The day they start, they can deploy to production. The regression test suite lets them start working right away: refactor, make changes, mess things up. It doesn't matter; they're going to learn from it right away. So to deal with the fact that we have lots of contractors coming in and out, a good, well-thought-out DevOps practice helps support it. There's one for you. Metrics: the government loves metrics, and there's a good reason for that. We have to be able to prove that every decision we made was made objectively.


Really, the idea of bureaucracy, basically, is that we follow rules. It's not arbitrary. It's not based on whim. We make our decisions objectively. Great. I can instrument my pipeline. I can instrument production. I can create lots of data that we can use for decision-making. Say we are looking at our cycle time, and we make a decision to try an experiment to reduce our cycle time, and it goes from six days to five days. If somebody asks me, "Why did you make that decision?", the answer is: we were looking at cycle time, we figured this might help improve cycle time, we tested it out, we measured it, and we can show that it really worked. Great. No arbitrariness there, no whimsy or anything like that. Compliance is another area that's been mentioned. I have a lot of thoughts on compliance. Obviously, it is a big part of what we do.
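[Editor's sketch] The cycle-time experiment described above can be illustrated with a few lines of Python. This is only a minimal sketch under stated assumptions: the commit and deploy dates are made up for illustration, and the function names `cycle_time_days` and `experiment_helped` are hypothetical, not anything from the USCIS pipeline.

```python
from datetime import datetime
from statistics import mean


def day(text):
    """Parse a YYYY-MM-DD string into a datetime."""
    return datetime.strptime(text, "%Y-%m-%d")


def cycle_time_days(changes):
    """Average number of days from commit to production deploy."""
    return mean(
        (deployed - committed).total_seconds() / 86400
        for committed, deployed in changes
    )


def experiment_helped(before, after):
    """True if the measured cycle time went down after the experiment."""
    return cycle_time_days(after) < cycle_time_days(before)


# Hypothetical (commit, deploy) pairs recorded by pipeline instrumentation.
before = [
    (day("2014-09-01"), day("2014-09-07")),
    (day("2014-09-03"), day("2014-09-09")),
    (day("2014-09-08"), day("2014-09-14")),
]
after = [
    (day("2014-10-01"), day("2014-10-06")),
    (day("2014-10-02"), day("2014-10-07")),
    (day("2014-10-06"), day("2014-10-11")),
]

print(cycle_time_days(before), cycle_time_days(after))
print(experiment_helped(before, after))
```

The point of keeping the raw pairs rather than just the averages is that the record itself becomes the justification for the decision: anyone reviewing it later can recompute the numbers.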


One of the things we try to do is this: if there is something we have to comply with, let's turn it into an automated test, sort of like test-driven development. Take FISMA compliance, the Federal Information Security Management Act, which is security compliance. If we have automated test suites that check for FISMA compliance, then compliance becomes a matter of doing exactly what the tests are testing for. It's a similar situation with Section 508 of the Rehabilitation Act, which is a law to provide accessibility for people with disabilities. In the past, we would develop a system, and the gatekeepers would come in and review it and say, "Oh no, it's not compliant with Section 508, go back to the drawing board." Now we are in the process of putting in place an automated test suite for Section 508 compliance. The developers can keep testing against that. The subtlety here is that I want the system to be in a deployable state at all times.
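[Editor's sketch] As an illustration of turning a compliance rule into an automated test, here is a minimal Python check for one common accessibility requirement: every image needs a text alternative. It is an assumption-laden sketch, not the agency's actual suite. Real Section 508 conformance covers far more than this single rule, and the names `MissingAltFinder` and `images_missing_alt` are hypothetical.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects the positions of <img> tags that have no alt attribute."""

    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "img" and "alt" not in dict(attrs):
            self.violations.append(self.getpos())  # (line, column)


def images_missing_alt(html):
    """Return the (line, column) of every <img> lacking an alt attribute."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.violations


# Hypothetical page fragments used as test fixtures.
compliant_page = '<p><img src="seal.png" alt="Agency seal"></p>'
broken_page = '<p><img src="seal.png"></p>'

print(images_missing_alt(compliant_page))
print(len(images_missing_alt(broken_page)))
```

Run on every build, a check like this fails the pipeline as soon as a non-compliant page is committed, instead of leaving the finding for a gate review at the end.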


We can do that because it's test-driven development for the compliance. Our other quality control rules, the QA rules that we follow, can be implemented as rules in Sonar, for example. We can do static code analysis. We do static code analysis for security as well. So by taking the compliance issues and incorporating them into automated tests, we know our system is always deployable. We're not going to find out at the end that it's out of compliance with something. And it's a well-defined problem: as long as it meets those tests, it's compliant by definition. A couple of other examples. Oh, security is an interesting area. I think DevOps gives us the potential to really take FISMA compliance to the next level. We can improve the government's security posture considerably, I think. One change that was already going on in the government before we started to introduce DevOps is that we were doing continuous monitoring of production to find vulnerabilities.


And we now have a process where we don't just check compliance of a system once when we release it and then every two or three years. We have ongoing authorization, ongoing monitoring in production to look for vulnerabilities. With a DevOps approach, this is just feedback from production, right? We instrument production for feedback with continuous monitoring tools. Along with our performance tools, we can have constant feedback to the developers on how secure the system is or what vulnerabilities are found. If there's a problem, we can patch it, push the button, and deploy a new version. In fact, in the cloud, we can tear down the old version that might have been compromised and spin up a new version. We can do static code analysis so that the developers are starting to produce code that already meets the security requirements that we've set up.
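[Editor's sketch] To show what a static analysis rule looks like in miniature, the Python example below scans source code, without running it, for calls that a security policy might forbid. It is a stand-in for illustration only: the talk refers to Sonar rules on a Java stack, while this sketch uses Python's standard `ast` module, and the rule set `RISKY_CALLS` and the function `find_risky_calls` are hypothetical.

```python
import ast

# Hypothetical policy: direct calls to these built-ins are not allowed.
RISKY_CALLS = {"eval", "exec"}


def find_risky_calls(source):
    """Return sorted (line, name) pairs for each forbidden call in source."""
    findings = []
    # ast.parse only builds a syntax tree; the code is never executed.
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in RISKY_CALLS
        ):
            findings.append((node.lineno, node.func.id))
    return sorted(findings)


# Hypothetical code under review.
sample = "user_input = input()\nresult = eval(user_input)\n"

print(find_risky_calls(sample))
```

Because the check is cheap and deterministic, developers can run it on every change and see the finding with a line number long before a security review would.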


It's easy enough to do static code analysis to check for the top 10 security issues, and have the developers run the static code analysis against their code constantly, find the issues in their code, and repair them right away. And over time, I hope (this is the future), the developers get used to not creating those problems, right? We can push the compliance issues right to the very beginning of the development pipeline. Risk management: the last thing any government bureaucrat wants is to be on the front page of the Washington Post with an unflattering article. But just let them try to find a problem here. We're doing tiny experiments, right? Our risks are all very tiny, and they're built right into our process, and we deal with any issues as they come up. The last thing I'll mention is reducing waste.


Everybody in the government loves reducing waste, right? It's the biggest effort everybody can agree on. Democrats, Republicans, bureaucrats: everybody says waste is bad. So to be able to go out with a lean approach and say, here's what we're doing about waste, here's how we're taking it on point by point, is a very easy sell in the government. So to bring it all together, the point that I'm making here is about cultural change. The government has a particular culture for particular reasons. It is a bureaucracy because that's the nature of it. It's something that's run by rules rather than arbitrariness. That's really the original definition of bureaucracy. Government has special needs, and we can be agile about how we bring agility into the government. We can find ways to meet those government needs just as a side effect of the way that we want to do things, agilely. And that is a cultural change.


People start to do things differently, but it's a cultural change that doesn't rip out the foundations of the culture: what it is really trying to accomplish and the reasons why it's there in the first place. So don't ever let anybody tell you that the government is a faceless bureaucracy. It is a bunch of people trying very hard to do the right thing and to meet the needs that are imposed by all of us, really, as the public. And they are trying to find solutions to the situation they find themselves in, which is that they're overseen, that they have people looking over their shoulders at all times. They have to comply with laws that are designed not based on business sense but on policy: social policy, fairness to all of the contractors who want to bid for government business, and so on.


And even though it's deliberately a low-trust environment (that's the way it's built), the best way to solve its problems is still an agile way, right? And a DevOps process in particular maps very nicely onto all of the government's special needs. So think about us that way. When you see Chaos Monkey wreaking havoc, think about what kind of a success that is, because what it means is that we brought the practices that we want into the government and at the same time met all of its needs. Thank you all.