Las Vegas 2019

Shifting QA Left - Emerging Trends in Code Quality and Security Automation

This talk will discuss various advances in program analysis technology that enable a larger class of bugs to be detected earlier in development (and even to be automatically fixed in some cases).


The talk will focus in particular on recent developments that enable tight integration of program analysis tools into DevOps processes.


These new techniques have been pioneered by academia and operationalized at scale (billions of lines of code / thousands of commits per day) by large tech companies such as Google and Facebook.


The talk will conclude with best practices for organizations interested in incorporating modern program analysis into their development workflow.


Dr. Stephen Magill

MuseDev, CEO

Transcript

00:00:02

Thanks everyone for coming to the session. I'm Stephen Magill, CEO of MuseDev. Just to tell you a little about myself: I've spent the first part of my career building code analysis tools, tools to find bugs in software, mostly on the research side of that question, first in academia and then in industrial research labs. But over the last couple of years I've gotten more and more interested in the question of how we improve the impact of these tools. The tool is part of it, but it's the workflows, the processes, and the integrations you build around it that have a large impact on how much value these tools bring. So how can we improve that value? In particular, I've been looking at what Google and Facebook are doing in this space, how they apply code analysis at scale in their companies, and the impact that has on QA processes.

00:00:53

So that's what I'll be talking about today. I like to start talks like this with a slide on why code quality and security are important, and slides like that usually have some story about a disaster, like a data breach or an intrusion, or maybe even something that blew up because of a coding error, and then some scary graph showing the impact: declining stock price, user trust going down. But I imagine we all have stories like that in our minds from our own organizations. There are always those incidents or close calls that really motivate the need for a code quality process and a focus on security. So I'm just going to assume that people in the "Shifting QA Left" talk care about QA, and talk about some of the best practices that, again, I imagine a lot of people in this room are already following.

00:01:48

Things like revision control, testing, doing peer-based code review, which is still the best way to improve code quality and catch lots of different types of bugs, using static analysis tools to find things that testing's not good at or to find errors more efficiently, and then doing instrumentation and monitoring to collect production data and learn from performance and reliability results in the deployment scenario. In addition to all of this, most organizations layer on a separate QA process, usually a separate team. Maybe it's called the QA team, maybe it's called AppSec, but they're focused on testing the software, looking for errors, looking for security problems, and then launching those back to development as issues in a bug tracker or something like that.

00:02:42

And this team is usually separate from the development team, and like I said, they interact by filing tickets. In particular, the static analysis box that I'm showing here usually lives on the QA side. Often the QA team will use static analysis tools as one of the things in their toolkit to identify issues, and then file tickets for the things that look most impactful. So a large part of the focus of this talk is on that static analysis box and shifting it away from QA and security into a proper member of this stack, where developers are interacting with it all the time, so it's not a separate workflow and there's not the overhead associated with that. And there are many choices for how you could do that integration.

00:03:31

Obviously we want to maximize the outcomes of this process while minimizing the actual process required. You could improve code quality by just doubling the size of your QA team and also adding a lot of developers so they can triage all the issues coming their way, but that's a horribly inefficient way to get maybe marginal improvements in quality. So how can we do this with less process, less communication, and less slowdown? A lot of it, as I mentioned at the beginning, has to do not so much with the tools themselves but with how they're integrated into the process, how they're orchestrated. That will be a large focus. And as I said, this is all motivated by what Google and Facebook are doing in this space.

00:04:13

Both companies have published quite a bit on what their developer productivity groups are doing, what they use in terms of tooling, and in particular how they go about incorporating those tools, what they've found to work, and what doesn't. A lot of those lessons learned and best practices I've rolled up into this session, and really I've distilled it down to four key principles. First, use multiple tools: there's generally not a single tool that will hit all the types of errors you care about. Second, integration matters: it's not just about the tools you choose, but how you incorporate them into your workflows. Third, cherish developer trust: this point is really that it's all about developers, and if developers aren't getting value from these tools and this workflow, they won't care and they'll stop responding.

00:04:59

And principle four is that these tools, when integrated the right way, can actually support productivity. There's this myth that the more tools you add into your CI process, the more checks you add along the way, the more you slow down development velocity, release velocity, and so forth. That's just not true. If you do it the right way, you can get additional checks in there and actually support more productivity, or enable new engineering efforts that wouldn't have been possible otherwise. And I have a story about that. All right. So as I said, while these key principles can be applied in a lot of aspects of DevOps, like the testing and the instrumentation and monitoring I mentioned before, I'm going to focus here on static analysis. That's my space; that's what I know best.

00:05:45

So I'm going to first say: what is static analysis? At its core, it's just saying something about the behavior of the program without actually running the program. It could be looking for errors, things like "this piece of data isn't properly encrypted in this place in the code," or it could be trying to prove the absence of errors: "everywhere in this code, we're properly encrypting customer data." Different tools will target different sides of this balance. Am I just looking for bugs that I'm very confident in, or am I trying to find all bugs, maybe reporting some things that aren't bugs along the way? Whatever approach they take, there are a number of different domains of interest you can target.

00:06:31

We all know about security; that gets the most attention, using tools to find security problems automatically. But you can also use these tools to find performance issues, evaluate readability and maintainability, look for problems that can cause reliability issues, and evaluate the overall correctness of the code. Whichever thing you're trying to check, there's then the question of what underlying analysis technology you use to answer that question. And again, there's a range of options. The simplest static analysis tool you can go run yourself right now: grep. Grep will tell you things about the code without running the code; it will look for particular syntactic patterns. So here's an example of the sort of thing you could do with grep. I've got this init_connection function here, and I'm assuming it sets up some sort of encrypted communication channel.

00:07:22

The first argument is the cipher to use, how to encrypt things, and here we're providing "3DES", which I'm assuming is Triple DES. That's still supported by many applications today but has long been known not to be a secure cryptosystem, so you might want to make sure your code doesn't do that, doesn't initiate a connection with that weak cryptosystem. You can do that by just searching for that string, maybe making some allowance for whitespace. You'll pick up direct instances like this, and you'll miss cases where maybe that constant gets assigned to a variable and then flows into the function, or maybe it flows into the surrounding function via some parameter. For those sorts of things, there are more advanced approaches based on graph analysis: looking at the code as a graph, computing various derived graphs, things that describe how data flows through the program or how control flows through the statements in the program.
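Here's a minimal Java sketch of that gap; the Net.init_connection API is hypothetical, just standing in for the function on the slide:

```java
/** Hypothetical networking API, standing in for the init_connection call on the slide. */
class Net {
  static void init_connection(String cipher, String host) {
    // ... pretend this sets up an encrypted channel using the named cipher ...
  }
}

public class ConnectionSetup {
  // A text search like  grep -rn 'init_connection("3DES"' src/  catches this direct use.
  void connectDirect() {
    Net.init_connection("3DES", "example.com");
  }

  // But it misses the same weak cipher arriving indirectly through a variable...
  void connectIndirect() {
    String cipher = "3DES";
    Net.init_connection(cipher, "example.com");
  }

  // ...or through a parameter filled in by some distant caller.
  void connectViaParameter(String cipher) {
    Net.init_connection(cipher, "example.com");
  }
}
```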

00:08:19

Those graph-based analyses can pick up the cases I just mentioned, where the value indirectly flows into the function, so you can express more general patterns. And if that's a more advanced approach to analysis, there's a still more advanced approach, a variety of tools I'm lumping together under the heading of compositional program analysis. The idea is that you view the program as a collection of graphs, you compute something over each graph, and then you join the results together. You use that to answer very deep, whole-program properties about your code, things like synchronization errors: are the UI and the networking threads properly synchronized? Is this class thread-safe? Things like that.
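For a concrete flavor of the thread-safety question, here's a small made-up Java class of the kind a compositional race detector (Facebook's RacerD analysis in Infer is one example) is designed to flag:

```java
import java.util.ArrayList;
import java.util.List;

/** Fine when used from one thread; racy once a UI thread and a network thread share it. */
public class FeedCache {
  private final List<String> stories = new ArrayList<>();

  // Called on the network thread as new stories arrive.
  public void addStory(String story) {
    synchronized (this) {
      stories.add(story);
    }
  }

  // Called on the UI thread while rendering. This read takes no lock, so it races
  // with addStory(); a race detector reports the unsynchronized access to `stories`.
  public int storyCount() {
    return stories.size();
  }
}
```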

00:09:02

So you've got this range of analyses, from simple to more advanced, deep analysis. The point I want to make here is that it's not just a matter of always using the most advanced analysis; that's not the right approach. Each tool has its place. For simple things like API misuse errors, looking for deprecated APIs, or looking for a list of authentication tokens you want to make sure don't leak, simple searches are the way to go. For deeper properties like memory safety and thread safety, you need a deeper analysis, which generally takes a bit longer to run. So just that conceptual description of the space probably brings to mind that multiple tools would be the answer, but there's also empirical evidence that using multiple tools is helpful.

00:09:50

NIST, the National Institute of Standards and Technology, has for ten years now been doing periodic evaluations of static analysis tools, both commercial and open source. They recently published a ten-year retrospective summarizing what they've learned in all those evaluations, and one of the takeaways was that the results showed limited overlap between tool reports, and that the use of multiple tools can increase overall recall and boost confidence in the results. Separately, Habib and Pradel looked at three open source static analysis tools, Error Prone, Infer, and SpotBugs, all tools you can go download today, and again found limited overlap on the set of benchmarks they looked at. So those are results from the academic literature. Google has also found this empirically in production, and they've published about how they use static analysis.
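As a rough illustration of why overlap is limited, here are two very different defect shapes in one made-up Java snippet. Exactly which tool flags which depends on configuration, but the reference-equality slip is classic territory for a fast AST-level checker like Error Prone, while the path-dependent null dereference is the kind of thing deeper tools like SpotBugs or Infer go after:

```java
import java.util.Map;

public class OverlapExample {

  // Defect 1: comparing String contents with == (reference equality).
  // Shallow, AST-level checks flag this pattern almost for free.
  static boolean isAdmin(String role) {
    return role == "admin"; // should be "admin".equals(role)
  }

  // Defect 2: a null dereference that only happens when the key is missing.
  // Catching it requires reasoning about what get() may return along each path.
  static int quotaFor(Map<String, Integer> quotas, String user) {
    Integer quota = quotas.get(user); // null when the user has no entry
    return quota.intValue();          // NullPointerException on that path
  }
}
```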

00:10:46

They have a platform called Tricorder, a static analysis platform you can plug multiple tools into, and as of January 2018 it included 146 different analyzers, the majority of which were actually written by developers: individual developers on teams looking for API-specific or application-specific bug patterns that have come up in their code review efforts. Now, I'm not suggesting anyone go run 146 analyzers; probably only Google has 146 analyzers to run. But clearly, using multiple analyses is important. So which analyses should you go run? There are commercial analyzers, of course, but when it comes to open source there's actually a collection of pretty good analyzers out there now. This would be my go-to list if you're looking for things to deploy internally.
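Before getting to that list: to make "developer-written analyzers" concrete, here's a rough sketch of a custom Error Prone check that flags the weak-cipher call from earlier. The com.example.Net class and the 3DES rule are made up, and the annotation fields and matcher APIs shift a bit between Error Prone versions, so treat this as an outline rather than a drop-in plugin:

```java
import com.google.errorprone.BugPattern;
import com.google.errorprone.BugPattern.SeverityLevel;
import com.google.errorprone.VisitorState;
import com.google.errorprone.bugpatterns.BugChecker;
import com.google.errorprone.bugpatterns.BugChecker.MethodInvocationTreeMatcher;
import com.google.errorprone.matchers.Description;
import com.google.errorprone.matchers.Matcher;
import com.google.errorprone.matchers.method.MethodMatchers;
import com.google.errorprone.util.ASTHelpers;
import com.sun.source.tree.ExpressionTree;
import com.sun.source.tree.MethodInvocationTree;

@BugPattern(
    name = "WeakCipherInit",
    summary = "init_connection should not be called with the deprecated 3DES cipher",
    severity = SeverityLevel.WARNING)
public class WeakCipherInitChecker extends BugChecker implements MethodInvocationTreeMatcher {

  // Match calls to the (hypothetical) com.example.Net.init_connection method.
  private static final Matcher<ExpressionTree> INIT_CONNECTION =
      MethodMatchers.staticMethod().onClass("com.example.Net").named("init_connection");

  @Override
  public Description matchMethodInvocation(MethodInvocationTree tree, VisitorState state) {
    if (!INIT_CONNECTION.matches(tree, state) || tree.getArguments().isEmpty()) {
      return Description.NO_MATCH;
    }
    // Resolve the first argument to a compile-time constant where javac can
    // (string literals and static final constants), then check for "3DES".
    Object cipher = ASTHelpers.constValue(tree.getArguments().get(0));
    return "3DES".equals(cipher) ? describeMatch(tree) : Description.NO_MATCH;
  }
}
```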

00:11:38

Google has released their Error Prone and clang-tidy analyzers; Error Prone supports Java, and clang-tidy is for C and C++. These are their internal workhorses for their code analysis efforts. Facebook has released their Infer tool, which supports C, C++, Java, and Objective-C. And the open source community has a variety of useful tools: PMD, SpotBugs, and Find Security Bugs all support Java analysis, and the Clang Static Analyzer gives you some good results on C and C++. So already you have a collection of tools you can go use. But it's not just about picking the right tools and running multiple tools; the way they're integrated has a huge impact. So let's talk about integration, and let's talk about how not to do static analysis.

00:12:25

One thing you could do is run these tools and never look at the results. That seems silly; why would you do that? Well, here's the answer: someone bought a license for this tool and said I should run it, so I ran it on my code and saved the results, and I'll go deal with them when I have time. Happens all the time. You can run static analysis tools and file bug reports that then get ignored and deprioritized in favor of features. Again, happens all the time. So you might think, well, we're smart, we'll be smart about this: we'll set up a process with incentives to make sure these reports get acted on. You can evaluate your QA team on the bugs they manage to get fixed.

00:13:07

But then the development team inevitably tends to be prioritized and rewarded on features shipped and new products developed. And so you have this ongoing battle between QA and dev, or security and dev, and again a lot of process overhead and waste. Much better is to present results to developers when they want to see them, which sounds so easy but isn't always done. So what does that look like? From a timing perspective, when is the best time to display results? If I'm a developer, when do I want to see results? Right after I wrote the code is the best time. It's all in my head, I just wrote it, I know what's going on, I understand everything about this piece of code and how it interacts with the rest of the system.

00:13:55

So if you see an error that you think I should address, tell me now; it's very easy for me to fix it. Another good time is maybe it's not code I wrote, but it's code I just modified: someone else wrote it, and I went in there and learned enough about it to make a code change. Again, if there's an error related to that code change, now's the time to know about it. Basically, as developers are working, they have a certain piece of the system paged into their heads that's easy for them to focus on and easy for them to change, and that's really the time to report results about that piece of code. Otherwise they have to go familiarize themselves with some other part of the system, and there's all this context-switching overhead.

00:14:36

So good integration has to be timely, but there are a couple of other important components: it has to be part of an existing developer workflow and use existing developer tools. And why am I so focused on developers in all of this, developer workflows, developer tools? Because developers are the ones who fix the bugs. Ultimately it has to be a developer making the change that fixes the bug. So if you can put together a process that works for developers, that presents things to them the way they'd like to see them, that's the best for everyone. So what does an existing workflow look like? This is a pretty typical development workflow: you have something like GitHub managing your repositories, the developer pulls the code onto their local machine, and they use their favorite IDE to make some code change.

00:15:21

I'm not going to get into which one's better; vi or Emacs works here, and there's a variety of them. Things are usually more standardized when it comes to the compiler: check that the code builds, run some tests locally, and then push the change up to GitHub or whatever repository manager you're using. That generally kicks off a CI/CD process, so there's CI-based build and test infrastructure, and generally there's a manual code review step as the last step in the deploy pipeline. Any of these places is a good place to integrate. In the IDE is great; in code review; and in CI you can even have a tool block the build, if you're very confident that the tool is always going to provide meaningful results.

00:16:08

But in general, any of these is a good place. What's not a good place? Well, follow me to a place outside your workflow. This is my awesome bug dashboard. I put it together last week; I spent a lot of time picking the right font and coming up with a great quality scale over there, I'm really proud of that. And you can see it's providing meaningful results: Thanos clearly doesn't understand what the balance operation means, I just don't think he knows what that word is. And, you know, over in central services, that method has problems that need repair. So these are meaningful results that you might want to go fix, and quality has this nice graph and it's declining; we're up against a release deadline, we're pushing pretty hard.

00:16:54

So quality is taking a hit, but we know it's Joe's fault; he's the one who's really putting some bad code in there, so we know who to blame. I'm being kind of facetious, but there's a lot of focus on dashboards like this, and they certainly have their place. A nice code quality dashboard, a list of bug results, can be a great way to get a sense of where you are from a quality perspective, a great after-action report if you've just finished some initiative to improve quality, and it can be great at the management level and higher. But this is not an interface targeted at a developer. Developers are not going to want to pull this up and start ticking off bugs here.

00:17:38

It's not part of their workflow, it's not oriented towards them, and it's just not effective. So what's more effective? This is the GitHub pull request workflow, which developers are very familiar with. If you're using GitHub, anytime you submit a code change it goes through this pull request system, and this is how code review works. It starts with a description of the code change; here, this is a performance improvement: CPU usage was 60% before the change and 35% after, so improved performance, with lots of detail on the particular scenario and how performance was impacted. Below that is a discussion, a conversation among developers on the team about what additional changes should be made before this code is merged.

00:18:26

So this person says, would it make sense to return immediately after each condition check? Yes, I think that would be good, and they make the change, and things progress. So this is a great place to insert tool results, right? There's already a conversation happening about what's good about the code, what's bad about the code, and what needs to change before it's merged. So if there's some tool that knows something should get fixed, now's the time to report it, and again, it's part of their existing workflow: no one has to go check a separate webpage, no one has to remember to do some other process. And this is exactly what Google does. Here's a screenshot from a paper that Google put out on their Tricorder system. Just like I showed before, it integrates with code review; their code review is an internal tool.

00:19:13

So it looks a little bit different, but you can see these are two analysis results, one from a linter and one from Error Prone, the open source tool they released. In each case there's this "Please fix" button that the code reviewer can click to say, hey, I think you should address this issue. If the developer disagrees, they can comment on why; again, it feeds into the conversation that's already happening. All right, here's another great story about why integration matters. At Facebook there's a team that supports a tool called Infer. They were actually an acquisition: a startup building this static analysis tool that got acquired by Facebook. They went in to apply their tool inside Facebook and initially deployed it the way I showed on one of the early slides, off to the side, as part of an overnight run, where they took the results and filed them as bug reports and associated tickets.

00:20:08

They'd spent a lot of time making sure the analysis was reporting useful bugs that people should care about, so they deployed it and thought it was going to be great. They got almost no fixes; no one responded to those bug reports. They then took the same tool and deployed it in this code review workflow, what they call the diff-time deployment, and the same analysis tool with the same results saw a 70% fix rate. Suddenly 70% of those issues were getting fixed, whereas almost none were before. Same tool, just a different integration and workflow. That's how much integration matters.

00:20:48

All right. Principle three is cherish developer trust. What do I mean by that? Well, let's look at some of the things that can go wrong even if you do the two things I've already mentioned, using multiple tools and integrating them the right way. It could be the case that suddenly there are more automated results than developer comments: you're using so many tools, they're all reporting things, and instead of having a conversation with my development team, now I feel like I'm just triaging tool results. And by the way, most of those tool results are just status updates saying, hey, the tool ran and didn't find problems, or they're style suggestions that maybe apply to someone else's code base, but we don't follow that practice, so ignore, ignore, ignore. Or the results that claim to be errors aren't actual errors,

00:21:33

because the tool doesn't understand the code well enough; it doesn't understand that we're actually doing this other thing that protects against that. At this point, if I'm on this team, I start looking for browser extensions to filter all this noise out, or asking where the off switch is: how do I get this out of my pipeline? So with a low signal-to-noise ratio there's this danger that, as more analyzers are added, the risk of tool fatigue increases. As you grow the set of analyzers you're using, it becomes even more important to maintain a high standard of quality among those tools, because once developer trust is eroded, once developers decide that these automated results are much more likely to be useless than useful, it's really hard to regain that trust.

00:22:18

And so that potential effectiveness, where you can just put the results in front of developers and they'll get fixed without anyone else being involved in a separate workflow, you've lost your shot at that. So you really want to make sure you're paying attention to this. How can you do that? Well, at Google, it's probably no surprise to learn, they're data-driven about it. In the same screenshot I showed before, each of these reports has a button labeled "Not useful." If you click that, their developer productivity team collects that data and notices that you didn't find that particular result useful, and if more than 10% of reports for a particular analysis are ever flagged not useful, that analysis gets pulled. It doesn't get re-enabled until the team can fix it.
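Just to make the 10% rule concrete, here's a toy Java sketch of that kind of feedback-driven gate; it's only the shape of the policy, not Google's actual implementation:

```java
/** Toy feedback gate: pull an analyzer once its "Not useful" rate crosses a threshold. */
public class AnalyzerFeedbackGate {
  private static final double NOT_USEFUL_THRESHOLD = 0.10;

  private long totalReports;
  private long notUsefulClicks;

  public void recordReport() {
    totalReports++;
  }

  public void recordNotUsefulClick() {
    notUsefulClicks++;
  }

  /** True while the analyzer should keep running in code review. */
  public boolean isEnabled() {
    if (totalReports == 0) {
      return true; // no feedback yet, keep running
    }
    return (double) notUsefulClicks / totalReports <= NOT_USEFUL_THRESHOLD;
  }
}
```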

00:23:06

Sometimes it's that the description of the error isn't precise enough or isn't understandable: it actually is an error, but it's just not well communicated. Sometimes it's just not a very precise analysis. But they're continually monitoring whether things get fixed, and pulling things if they're not effective. That's how you maintain a high standard of quality. All right. So if you manage to get these three principles right, there's a nice synergy between them: you've got multiple tools producing results, you've taken care to make sure those results are very likely to be useful, you've pulled any tools that aren't performing, and you've integrated into the developer's workflow. So you're essentially making the most of developer attention, and you're making sure that bugs get fixed when they're easiest and cheapest to fix

00:23:56

in terms of time and effort. All right. As an example of the outcomes this enables, here are some statistics from Google and Facebook. At Google, they've reported that approximately 50,000 code review changes get analyzed per day, and that "Please fix" button I showed gets clicked more than 5,000 times each day. By their estimates, running this system on a continual basis has prevented hundreds of bugs per day from entering the Google code base. Facebook has had similar results: static analysis has prevented thousands of vulnerabilities from being introduced in recent years, and it catches more severe security bugs than either manual security reviews or their bug bounty program. And again, they collect the data, so they know these numbers: what percentage of bugs is found by each method.

00:24:53

And just on the topic of data: data is so important. I want you to think about this: even if you're already using some analysis tools, or even just testing, how much insight do you have into how useful those results are? Can you track those results through the whole workflow to see whether they get fixed and whether they deliver value? Super important. All right. The last principle is that these tools can actually support productivity, and for this I want to relay a story. Facebook has described this story in a couple of talks and in an article they published on how they use analysis tools. Basically it's a story about the News Feed component of their Android app: if you pull up the Facebook Android app, there's this feed that shows you recent activity relevant to you. It's called News Feed, and they wanted to move it from a single-threaded architecture to a multithreaded architecture.

00:25:55

If you know anything about concurrency, just that statement should scare the crap out of you. Suddenly you've got all these classes that were written assuming they'd only be called sequentially, one at a time, in a single-threaded context, and now they're being called in a multithreaded context and everything has to be properly synchronized, or you can have errors that are very difficult to trace down; multithreading errors are notoriously hard to debug and stamp out. Hundreds of classes were impacted. And it just so happens that around the same time, the developer tooling group was working on a concurrency analysis. So they got together and collaborated on tweaking that analysis to make sure it served the needs of this product team.

00:26:45

So they worked together, they deployed the tool, and then did the rearchitecting, and it was a success. At the end of the effort, one of the Android engineers communicated to the team that without Infer, which is the tool this analysis was built into, multithreading in News Feed would not have been tenable. There was just too much risk that it wouldn't go well, that they'd get halfway through the rearchitecture and have all these lingering bugs, so the tool support was really critical. I like this story because it's not just the reasons I described at the beginning for why doing this integration right and using the right tools can improve productivity: it simplifies the workflow,

00:27:29

fewer people are involved, and it's more automated and more localized. But this shows how having the right guardrails in place, the right automated tools as part of your process, can even enable new engineering achievements that wouldn't have been possible otherwise. All right. So in terms of shifting QA left, the summary is: static analysis tools have always been a way to automate QA. I said that at the beginning; often these tools are one of the things in the QA team's toolkit. But as analysis efficiency has improved, as these tools have gotten better and better, and as we've learned more about how to do effective DevOps and how to integrate tools into DevOps, these automated tools can really be brought left, closer to developers, and integrated better into the process. And if you manage to do this the right way, you can simultaneously improve code quality, which is

00:28:19

what we're all after; you can get your QA work done while improving overall productivity. So, Gene asked everyone to close with the help they're looking for, and what I'm looking for is this: tell me all the reasons this wouldn't work in your environment. If any of you were thinking, that's great for Google and Facebook, but I'm not at Google or Facebook, this would never work for me, I want to know. What are the roadblocks? What are the differences that you see? Where do you think this could and couldn't work, and what tweaks would have to be made? I'll be in the Speakers' Corner after this if you want to have more discussion on the topic, or just come find me throughout the conference. Thank you very much.