The Minefield of Open Source: Guidance for Staying Secure (US 2021)

Did you know that 6.7% of open source Java library releases contain known vulnerabilities? And this increases to 24% when you consider only the most popular and most used projects. Navigating this minefield to keep applications secure can be a challenge. In this talk, we give a preview of our latest software supply chain research, which characterizes this risk for various languages and offers guidance for how teams can 1) choose components that help minimize their risks and 2) adopt practices that help them quickly discover and remediate security issues as they arise. This session is presented by Sonatype.


(No slides available)


Dr. Stephen Magill

Vice President, Product Innovation, Sonatype



Hi, I'm Stephen Magill, VP of Product Innovation at Sonatype, and I'm super excited to be here today at the DevOps Enterprise Summit, talking about the latest State of the Software Supply Chain report. This is a report that Sonatype puts out every year and has been publishing for the last seven years. I've been personally involved in the analysis for the last three years, and a lot of that more recent work was a collaboration with Gene Kim. It's always really exciting to go in and redo analyses from previous years, updated with an additional year's worth of data, but also to think about additional questions we can ask and additional analyses we can do, to try and gain more insight into how people manage their software supply chains. So how do enterprise software development teams, how do open source projects,


How do individuals stay on top of the constantly evolving mass of software that they're using, that they're incorporating into their applications, and how do they do that securely? That's what we're going to look at today. I'm going to hit the high points of some of the more interesting findings from this year. If you want full details, you can go access the full report; I have a link at the end of the presentation. I want to start with just an overview of the space, and in particular, start by looking at just how much open source is out there. If we look at the total projects we see in the various ecosystems, and this is for Java, JavaScript, Python, and NuGet, here is data on the total number of individual projects that exist. JavaScript is absolutely massive, at 1.9 million projects out there.


Java has a lot: 430,000. What's interesting, if you think about this, say you're a Python developer and you think about all the libraries that you use, that you pull in when you start a new project, things maybe you haven't used but you've heard about: guaranteed, it's not up at that 336,000 level. And we see that when we look at utilization. If you go and look, for a particular ecosystem, at how many of those projects actually show up in other projects' dependencies, or are utilized in applications, it's a much smaller percentage of that full set of software. Java, for example, is at the high end, with 15% of the projects in Maven Central being utilized, and JavaScript is down at the low end.


If we look at npm, only 2% of the projects that are published to npm are actually utilized by other projects. The next thing I want to focus on is the growth. Yes, it's a small portion of these projects that are used, but it's a large number and it's growing. We see 71% growth in the Java ecosystem, still a very vibrant community growing very quickly. JavaScript: 50% more, and this is growth in terms of number of projects, so 50% more projects in the JavaScript ecosystem than there were last year. You can see numbers for all of these; they're all growing at a very rapid pace. Open source obviously is a huge success, continues to be, and continues to grow, in both supply and in utilization as well.


All right, the next thing is the vulnerability landscape. If you look at where the vulnerabilities lie and how they break down in terms of usage, what you see is a really interesting situation where it's the most popular projects that have most of the known vulnerabilities. And that sort of makes sense, right? The security community, both white hat and black hat, focuses its efforts on those most popular projects. You have way more impact if you find a vulnerability, way more impact for positive or negative, for good or bad, if you find one in a highly utilized project. So those projects get the attention, and those projects are by and large where we see CVEs being published. It doesn't mean there aren't vulnerabilities in less used projects.


It doesn't mean you should go out and pick some obscure dependency to base your application on; that might not be good from a technical risk perspective, and there are probably vulnerabilities there too, they're just not known yet. But what we see, to take Java as an example, is that 23% of the releases from the 10% most popular projects are vulnerable, while if you look at the 90% least popular projects, only 4% of those releases are vulnerable. Here's a graphical representation of this, broken down by decile, so per 10% slice. For the top 10%, about a quarter (26%) of those releases are vulnerable; for the next 10%, only 7% are vulnerable; then 7%, 3%, and you can see it drops off very quickly.
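As an illustration of this decile breakdown, here is a minimal sketch of how a per-decile vulnerability rate could be computed. The input shape, pairs of (download_count, has_known_cve), and the function name are hypothetical simplifications for this talk, not the report's actual pipeline.

```python
# Sketch: rank releases by popularity, slice into ten equal buckets,
# and compute the vulnerable fraction of each bucket.

def vulnerability_by_decile(releases):
    """Decile 0 is the most popular 10%; assumes len(releases) % 10 == 0.

    releases: iterable of (download_count, has_known_cve) pairs.
    """
    ranked = sorted(releases, key=lambda r: r[0], reverse=True)
    size = len(ranked) // 10
    rates = []
    for i in range(10):
        bucket = ranked[i * size:(i + 1) * size]
        rates.append(sum(1 for _, has_cve in bucket if has_cve) / size)
    return rates
```

With synthetic data skewed the way the talk describes, the first entries of the result dominate the rest, mirroring the chart.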


The height of these bars is the popularity, so you can see popularity is very skewed: those top 10% of projects are much, much more frequently utilized than the rest, and you can see the security community's attention basically mirrors that. The next thing to mention is the huge increase we've seen in what we're calling next-generation software supply chain attacks. This is things like dependency confusion, typosquatting, and malicious code injection, sneaking code into a repository upstream. That has just exploded over the last couple of years. In 2020 it was down below 2,000 instances; in 2021 it was up above 12,000, a 650% year-over-year increase in those numbers, which is just huge. And it's been good to see that there is awareness of this. People are noticing, and there are technology solutions out there that are starting to address these attacks.


So the tools are catching up, and you can stay safe with respect to these dependency confusion attacks, but it's important to know that they are happening, and they're happening more and more. All right. For the rest of the analysis, I want to talk about, first of all, the data set it's based on. What we did is we looked at 4 million dependency upgrades, so cases where a dependency was upgraded from one version to another. There were 234,000 dependency versions represented across that data set, and this is a Java data set, so it is Java-centric, although I think a lot of the findings extend to other ecosystems as well. 40,000 of the open source projects that are out there occurred in this upgrade data. And if you go back and look, it was about 434,000 in the Maven ecosystem, so that's about 10%, right?


So about 10% of that ecosystem is utilized in this data set. What's interesting is, I said it's 4 million dependency upgrades, and across those upgrades, those 234,000 dependency versions, and those 40,000 projects, only about 25% of the dependencies were actively managed. What I mean there is, when you look at all the dependencies that show up in this data set, only 10,000 of them, only twenty-five percent, were actually updated at some point, and this data set covers the last year, so it's in the last year that that's occurred. All right, so that's the general data. Now I'm going to say, what did we do with that? First of all, we just looked at the vulnerability density. I was talking about how for the top 10% most popular projects, 26% of releases were vulnerable.


And then it drops off from there. If we look across all of the versions that were utilized in this data set, 8% of them were vulnerable. That's not small; it's not 26%, but it's not small. What's interesting is that while 8% in general were vulnerable, if you look at it at a per-project level, say you're using some library as a dependency and the version that you're on may be vulnerable, you might reasonably ask the question: can I fix that? Is there a version I can move to now that's not vulnerable, to remediate this security issue that I'm pulling in? And the answer is a resounding yes. Only 784 of these projects that were being upgraded had no remediation path.


So there was no version you could move to to fix the vulnerability. That's only 0.3%. These are very different numbers, and I think it helps to see it visually to get a sense of the scale. This is what 8% looks like compared to 0.3%. If you think about it, this talk is titled "The Minefield of Open Source," and the graphic at the left is this minefield you have to navigate: you want to avoid stepping on a red square, because those are the vulnerable dependencies, and you don't want to pull in those vulnerabilities. It looks kind of dire; there's a lot of red out there. But the figure at the right is really the thing to be concerned about. These are the places where you could get stuck.
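The 0.3% figure rests on a simple question: does any newer release exist that is free of known vulnerabilities? A hedged sketch of that check follows; `vulnerable` is an assumed lookup set for illustration, not a real vulnerability feed.

```python
# Sketch: a component is "stuck" (no remediation path) only if every
# release newer than the one in use has a known vulnerability.

def has_remediation_path(current_version, newer_versions, vulnerable):
    """newer_versions: version strings released after current_version.

    vulnerable: set of version strings with known CVEs.
    """
    return any(v not in vulnerable for v in newer_versions)
```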


And generally, I should mention, you're stuck for a narrow window of time, and I'll say more later about how to choose good, high-quality dependencies that release frequently, so that you can minimize any window of vulnerability. But if you have good tooling, if you're paying attention to what's vulnerable, what's not, and what a safe upgrade path looks like, you can be in a space that's much more like the figure at the right. All right, so that's the landscape. How are industry participants, how are companies, approaching management of their own dependencies? This is some survey data; we've done the survey two years now, asking industry software developers, team leads, et cetera: how do you approach managing your software supply chain?


What do you have in terms of controls around build and release, around consumption of open source? How do you manage risk at your organization? And what we see is this: first of all, this graphic represents how mature the participants in the survey as a whole are with respect to these various dimensions of control. You can see it goes from completely unmanaged at the left, like we're just not paying attention to that at all, to the far right, a sort of monitor-and-measure approach, where you have controls in place and you're actively monitoring to ensure those controls are working, changing things as needed to make sure that you're on top of whatever risks or threats might live in that area of the software development process.


And so we can see (right is better, right is more mature) the distribution across the data set of maturity for each of these dimensions, and we can see remediation is actually quite good. As a whole, the industry is doing a pretty good job of making sure that as vulnerabilities arise, as those squares in the space I was showing earlier turn red, they are moving on, fixing those vulnerabilities, and keeping things secure. What's interesting is that there's a lot less maturity around how you choose suppliers. The questions in this category were all around: what process, what standards do you have in place when you go to add a new open source dependency? What sort of evaluation do you do of that project? How do you evaluate what it will mean for your technology development going forward to now be depending on this project? Is it a high-quality project that it's fine to depend on?


Will it benefit you, stay up to date, and stay secure, or is it a project where you're going to run into some problems? Maybe it's breaking your build a lot, or they're changing the API, or they're not responding to security incidents, things like that. And what's interesting there is that that's how you get proactive about security. Remediation, going back here, is responding to things as they come in, being reactive: as CVEs get released, you're fixing them. This is getting proactive, saying we're going to start bringing in things that we know will cause fewer problems down the line, we're going to do some planning, we're going to get ahead of this issue. So I want to talk next about how you get proactive.


If you were to go back and say, I'm going to take my score here in the supplier category and I want to improve it, what are some things you can do? One is to pay attention to some quality metrics. These are various quality metrics that have been proposed in the last couple of years. One is mean time to update (MTTU). This was proposed by us as part of the software supply chain research a couple of years ago, developed with Gene Kim; we worked on this together, collecting this data and evaluating how useful it was as a metric. What it's measuring is the average time that it takes a project to update one of its dependencies when that dependency releases a new version.


So, how up to date are you with respect to your dependencies? This can be really important in general, but what's interesting about it is that the lower this is, the more it benefits transitive dependencies too. If you think about a deep dependency chain, where you're depending on library A, and it depends on library B, and it depends on library C, the better the MTTU is along that chain, the faster security fixes feed their way down to you. So it's not just an important metric in isolation; it's also important to think about the composition of these deep dependency chains. That's why we think it's an interesting metric. There are also a couple of metrics proposed by the Open Source Security Foundation (OpenSSF). One is the criticality metric, which measures how important a project is in the open source community.
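The MTTU idea can be sketched roughly as follows. The report's actual methodology is more involved, and the `(released, adopted)` input shape here is an assumption for illustration, not the real data model.

```python
from datetime import date

# Sketch: for each dependency release a project eventually adopted,
# measure how many days adoption took, then average those lags.

def mean_time_to_update(update_events):
    """update_events: (released, adopted) date pairs for one project."""
    lags = [(adopted - released).days for released, adopted in update_events]
    return sum(lags) / len(lags)
```

A lower result means the project tracks its dependencies more closely, which is what lets fixes propagate down a chain of transitive dependencies quickly.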


How many people, including commercial users of open source, depend on it? How many people contribute to it? Things like that. There's also a Scorecard metric from OpenSSF, which is more of a checklist of various best practices that you should be following: are you using CI, are you scanning code with static analysis tools, linters, things like that. They don't have a method for distilling that down to a metric, so there's no numerical Scorecard result. We didn't include it in the analysis just because it was not quantitative, so we couldn't do the sorts of data analysis with it that we were doing with the others, but I'd love to see a quantitative version of that. I think that'd be great.


And then there's Libraries.io's SourceRank metric, which, if you go to Libraries.io and search for a package, you'll see. It's a measure of various things, and there's a description there about what it's measuring. Over at the right here, I have a summary of the types of things these various metrics are measuring. Is it mostly popularity-based? Popularity is a big component of the SourceRank analysis. Is it measuring maturity? Is it measuring, at its heart, dev practices, or is it looking at dependencies? MTTU is very much looking at the dependency structure, so it's high on that axis. So you can see a snapshot, an overview, of what these metrics focus on. And I think what's interesting is that they focus on different things.


That makes it really interesting to analyze them and ask which one is most associated with various outcomes that we care about; that gives you a sense of which project attributes are more associated with particular good outcomes. And that's exactly what we did. We asked the question: suppose developer A chooses a high-quality component and developer B chooses a low-quality component, according to one of these quality scores. We looked at each metric in turn, individually. Is A's project less likely to have vulnerabilities? Is it less likely to experience breaking changes as they keep their dependencies up to date? Are they going to have to do a lot of work to do that? And what we see is: yes. For certain quality metrics, high quality with respect to that metric is associated with good outcomes.


So, for example, low-MTTU projects, that is, projects that update their dependencies quickly, are 1.8 times less likely to be vulnerable, while projects that are slow to update their dependencies according to MTTU are more likely to be vulnerable. This was the only metric showing statistical significance for this category. High-quality projects in general are a little less likely to have breaking changes, and this was true with respect to all the metrics; OpenSSF's criticality showed the greatest effect. Here's a summary of the different effects. As I said, with a faster MTTU you're 1.8 times less likely to be vulnerable and 3.2 times less likely to have breaking changes; projects with high criticality scores were eight times less likely to have breaking changes; and you can see the other results.


One interesting thing is Libraries.io and popularity. We included choosing based simply on popularity, because a lot of people do that. Those were not good predictors when it comes to vulnerability, or I should say, they were not associated with security. More popular projects were actually more likely to contain vulnerabilities, which exactly mirrors the findings I talked about earlier, the population-level observation that the most popular projects are where the vulnerabilities live; that's where the security community is spending its time. And Libraries.io includes a lot of popularity-type evaluations in its metric, so it makes sense that its results mirror popularity's. All right. So if you look at this chart, MTTU is positively associated with good outcomes.


That's great. Another great thing is that MTTU is improving over time. If we look at the Java ecosystem, at Maven Central, each year the circle is where the average MTTU is for that year, and we can see that decreasing over time, which is great. The community as a whole is getting better at keeping up to date and keeping secure when it comes to these vulnerabilities that come in through dependencies. All right. So that's an overview of how you might go about choosing high-quality components. Now, how do you keep those up to date once you've chosen them? What we see here is, first of all, when you think about where a developer is spending their time when it comes to updating dependencies, it's pretty narrowly focused: only 25% of the dependencies occurring in this data set


were actively being updated. This is just a graphic of that: think about all of Maven Central; only some of those projects are being used, and only some of the utilized projects are being kept up to date. And when we look at how they're being kept up to date, there's a whole lot of what we're calling suboptimal update decisions. 64% of those updates were classified as imperfect, not optimal. What does that mean? Well, we had a number of rules for what an optimal update decision looks like. Some of it is objective guidance: don't use alpha, beta, or release candidate versions (unless you want to be a beta tester, don't do that in a production app); don't upgrade to a known-vulnerable version; and if only vulnerable versions are available, at least choose one with a low-severity vulnerability, that is, try to be as secure as possible.


And try to choose the latest when there's a tie. Then there are some subjective criteria, like choosing common update paths, staying with more popular versions (that's better from a technical support perspective), choosing newer versions, minimizing breaking changes, things like that. Most updates do not qualify as optimal according to this. And what does that non-optimality look like? It looks like a lot of wasted work. If you think about an imperfect upgrade, what we're seeing here is what that looks like. Imagine these rows, 1, 2, 3, 4, are updates that a project made. It has some dependency; it updated it from 1.1 to version 1.5, and then from 1.5 to 1.8, but that whole time, 1.9 was actually the latest version.


Then it did another update, and finally landed at the latest version in update number four. What it could have done: the blue lines are the optimal path, where you go from your out-of-date version directly to the latest version, and when a new version comes out, you go directly to that one. So you can see there were four updates when you could have gotten by with two. If we assume that each update takes some fixed amount of work, that's wasted work. And how much wasted work? Well, a lot. For a medium-sized enterprise that has around 20 application development teams, we estimated you would save about 160 developer days per year, which works out to about $192,000, depending on your development costs, benefits, and things like that.
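The objective update rules described above (skip pre-release versions, never upgrade into a known-vulnerable version, otherwise prefer the newest) could be sketched like this. `known_vulnerable` is a hypothetical lookup set, and the pre-release check is a simplification; real tooling would use proper version parsing and a vulnerability feed.

```python
import re

# Sketch of the "optimal update" rules: walk candidates newest-first,
# skipping pre-releases and known-vulnerable versions.

PRE_RELEASE = re.compile(r"(alpha|beta|rc)", re.IGNORECASE)

def pick_upgrade(candidates, known_vulnerable):
    """candidates: version strings, assumed sorted oldest -> newest."""
    for version in reversed(candidates):
        if PRE_RELEASE.search(version):
            continue  # don't beta-test in a production app
        if version in known_vulnerable:
            continue  # never upgrade *into* a known vulnerability
        return version
    return None  # no clean path; fall back to the lowest-severity option
```

Applying a rule like this at every update is what keeps you from landing on 1.9 when 1.9 is the vulnerable one, and from making four hops where two would do.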


So there are real cost savings to be had by being more efficient about dependency updates. It's also important from a planning perspective: being able to make the right decisions, having the right context, and having the space to make the right decisions. This is a view of all the updates for Spring Core, which is a very commonly used enterprise framework. In this figure, time goes from the top to the bottom, and versions are on the horizontal axis. There are a few things to note about this; it's a complicated figure, but there's a lot packed in here. I suggest you maybe pause and read the text, because there's some really cool stuff here.


One is, if you look over at the right, how dark these squares are shows how many people are updating to that version. You can see the most recent versions, the ones at the right, are the darkest, so most people are staying up to date, or at least staying close to the edge. And because there are two edges that are dark, this shows you that there are actually two minor versions being officially supported by the Spring project. You can see that these vertical gaps are gaps between minor release versions. So that's great; those projects are doing well. What's happening at the other side of the figure? Well, first of all, there's a whole lot of red.


So what does red mean? Red means these people have updated to a vulnerable version. You see a lot of versions that are known to be vulnerable still being utilized, probably for a variety of reasons; we haven't gotten into an analysis of exactly what might feed into those decisions. But certainly the projects that are over there at the left are having to be in a very reactive mode. When I was talking earlier about proactive versus reactive: if you're over here at the right, you can be proactive, you can plan. You can say, okay, there's a new version of this project out, we need to update in the next 60 days, that's our standard, let's plan that work and get it done.


If you're over at the left here, you're responding to vulnerabilities as they come in. You can imagine there's this wave of red moving from left to right as additional security research happens and new vulnerabilities are discovered and disclosed. The people over there at the left, using these older versions, are much more likely to have a higher-severity CVE come out that forces them to do an update in the moment, unplanned. So I think that's a really important reason to just stay up to date in general: it gives you that space to plan. All right, so those are the findings in general. What guidance comes out of that? We observed all these things, we did this analysis; what should you do day to day as part of your software development process?


The first is: have a process to choose high-quality dependencies. I went through some of the quality metrics that are out there and what we see in terms of association between those and good outcomes; MTTU was a good one to pay attention to, but pick some process, have some quality standard, and then apply it when you pull in new dependencies. Next, have a process or tooling that lets you chart a safe course through that minefield of vulnerabilities. When you're making a decision like, hey, I need to update this, there's a vulnerability against it, make sure that you have access to the data to tell you which version you should be updating to. Finally, live near the edge so that you can be proactive, but not right at the edge.
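The "near the edge, but not at it" guidance could be expressed as a simple back-off from the latest stable release. This is only a sketch, not the report's recommendation as code; the default back-off of 3 is an assumption to tune for your own risk tolerance.

```python
# Sketch: from the stable (non-pre-release) versions of a dependency,
# sorted oldest -> newest, target a version a few steps behind latest.

def near_edge_target(stable_versions, back_off=3):
    """Return a version near, but not at, the newest stable release."""
    if not stable_versions:
        return None
    index = max(0, len(stable_versions) - 1 - back_off)
    return stable_versions[index]
```

Staying a few versions back gives a new release time to reveal breakage, supply chain tampering, or newly disclosed vulnerabilities before you adopt it in production.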


When we did the analysis, on average the optimal version was 2.7 versions from the latest. That's because the latest versions are more likely to be release candidates or still in beta; they're more likely to be subject to these supply chain attacks that I mentioned; and they're more likely to cause some breakage or bugs. Staying back a little bit lets you see how a version is landing, see how the community is reacting to it, and give it some time for any issues to be discovered before you adopt it into production. Maybe you can be aggressive for non-production apps, but for production apps you generally want to be close to the edge, but not quite there. So those are just some of the guidance points that came out of this year's report. There's a lot more data in there, a lot more analysis. If you're interested in this, I encourage you to go check some of that out; the website is on this slide here. There's the full report PDF, which is many pages, and there's also a great summary in webpage form, where you can click through and see more of the graphs and more of the analysis that came out of this year's report. So thank you.