How Twilio Scaled Through Dev-First Security and DevSecOps
As more organizations leverage cloud native technologies such as Kubernetes, IaC, containers and serverless – shifting left and adopting DevSecOps is a must-do. But how does it actually work in practice?
Meet Twilio: a billion-dollar unicorn that has mastered dev-first security. In this session, you’ll hear from Twilio’s Head of Product Security on how he built and runs an application security program that maintains high-velocity output. You’ll learn about everything from their security champions program and developer threat modeling training to their dev-friendly security tooling choices. Join us to learn from the pros.
This session is presented by Snyk.
Simon Maple
Field CTO, Snyk
Yash Kosaraju
Head of Product Security, Twilio
Full transcript
Simon Maple
Hello everyone, and welcome to this Snyk session, "How Twilio Scaled Through Dev-First Security and DevSecOps."
My name is Simon Maple. I'm the VP of DevRel and Community here at Snyk. And joining me here today is Yash Kosaraju. Yash, how are you doing?
Yash Kosaraju
Hey. I'm doing good. How are you, Simon?
Simon Maple
Very well, thank you. You're the head of product security at Twilio. Tell us a little bit about what you do.
Yash Kosaraju
Cool. So I've been at Twilio close to three years now. My team essentially is responsible for all the security activities in the SDLC, ranging from developer secure coding, training, threat modeling, security champions, all the way to running our bug bounty program, doing penetration tests, and things like that.
Simon Maple
Awesome. And covering pretty much everything in between with tooling in the build pipeline, working with engineers, all of that stuff. Sounds like you're a busy man. Sounds like it's a busy team.
Well, welcome to the session. We'll get started, and we'll go deeper into a lot of that shortly.
So in this session, we're going to have a very brief intro into how modern application development has changed DevSecOps and changed the way we think about security of an app. We'll then cover a little bit of background about Twilio and talk about what Dev-First security and DevSecOps means at Twilio, and go into depth on certain things about responsibility, how we educate developers, and how we enable developers through the pipeline.
Okay, so let's briefly jump into how applications have changed in the last number of years. Pre-cloud, of course, we looked at developers who were writing their custom code and pulling in a number of open source libraries. That then got dropped, thrown over a wall, if you will, and dropped onto a stack, a platform that was very much handled by the IT and operations team.
Today, in a much more cloud-oriented environment, we see developers actually much more responsible for a larger number of things, including parts of the platform. That can include your containers, your config, et cetera. And as developers handle a lot more of these different artifacts and different configuration files, developers need to be more inclined to maintain, more inclined to secure, and certainly have that understanding of what misconfigurations are, as well as how secure their environments are.
So, of course, there are a ton of different types of attacks that could be possible at each of these different layers. And as more of these groupings, from your cloud services to your containers, become available to the developer, developers need to think much more about the different types of attack. And that does include your config, your unpatched packages in your operating system, and ports, which maybe previously developers didn't need to think about. All of these need to be brought to the developer's attention.
So there are a number of different ways in which that can be done, and different tooling with which that can be achieved. I'm not going to go through all of this in too much depth, but you'll see in this iceberg picture that the custom code, which developers have traditionally been more focused on, is just a small piece of the iceberg.
Whereas realistically, under the covers in today's modern applications, there's open source code in libraries that we need to think about, knowing where vulnerabilities exist in those open source packages, as well as in containers and the hundreds of packages that we pull in. That means thinking about what container version we're using and which packages we're pulling in. And then, of course, there's the infrastructure that binds all of that together: the code that we use in the config and the scripts, whether it's Terraform, Kubernetes, or whatever it is that pulls all of that together. Across this entire software supply chain, we need to make sure that we're not just writing and maintaining, but securing, and being very conscious of where attacks can come in.
So with that, I did promise there'd only be a few slides. I'm going to jump across over to Yash again. And yeah, let's have a discussion about some of these pieces. Why don't we start with a little bit of background about Twilio, maybe the journey Twilio's been on and the changes in the style of application that you're looking to support, Yash. And I'd love to know who's responsible for those different areas as well.
Yash Kosaraju
Cool. So Twilio has a wide variety of services we provide, right? And that kind of trickles down to things that the security and trust team needs to secure and work with engineering.
In terms of how it's divided between the teams, so we have multiple sub-security teams. We have product security, cloud security, enterprise security, vulnerability management, and so on and so forth. More relevant to this discussion, I guess I'll focus a little more on cloud and product and how they kind of work in this cloud-native environment of Kubernetes, because you could pretty much make an argument Kubernetes is a cloud security responsibility versus it's not, and then where do containers fit in, right? That's been an interesting debate over the last few years.
So the way we try to do it at Twilio is everything that's written by our developers is product security, and then once you talk about how that's deployed on the infrastructure, that's cloud security, and that's super high level. So Kubernetes security is something we work together on. Everything that's AWS and cloud security in general, that goes to the cloud sec team. Security stuff relating to the code that's written internally, the applications we build, and containers and all that portion of it lies within product security.
Simon Maple
What are the typical interactions then between the security teams and the dev teams in those instances? So why don't we take something which is much more on the application side, maybe open source libraries or their own code as well. Where is that line between where devs find things and where the security team comes to the devs with issues? How does that work?
Yash Kosaraju
The way I like to think about it is we work with the engineering teams to help them write more secure code, and build more secure products for Twilio's customers. I don't want to draw a line of who finds issues. Anyone within Twilio can find an issue, come to us, and say, "How do we fix this?" That's probably my dream, if engineering can start thinking about that, come to us and say, "Hey, how do we fix this?" And that, to some extent, happens within Twilio.
The way we do it is we have a security champions program. So there is a security champion nominated in pretty much every engineering team, and they have a single point of contact within the security team who's their security partner, and they work pretty closely on a regular basis, talking about the changes the team is making, new products, new features, and then deciding as a team what security activities need to be done. It could be like, let's do a threat model, or let's do a quick pen test on this, or just let's talk through the flow and make sure everything's okay. So it's a team effort.
Simon Maple
And that's really interesting. I think it's a common thing to see a security champions or security mavens program, and these are the kind of things that clearly we've seen success with from a number of different people. In Twilio specifically, realistically, when we talk to a developer, security's not going to be high on their list in terms of things they want to get done that day. So how do you really energize the developers into wanting to be a part of this program or wanting to be educated in security?
Yash Kosaraju
I think as long as we make security asks reasonable and possible, I think it'll work. If I go to them and say, "Here's a vulnerability to fix, there's no public fix available. I don't know what you're going to do with it, but you need to fix this," that's not going to work.
But if I go and say, "This is the vulnerability, this is why it's important. These are the repercussions of not fixing it, and this is how we can help you fix it," and sort of show them a path forward for fixing stuff without breaking something else, I think developers are open to security. From my experience, at least at Twilio, everyone wants to do the right thing, but it's how do we help them do it? And that's the important part that we try to focus on.
Simon Maple
And as an individual who is part of a security champions program, so you mentioned there's one person from each team or each squad in Twilio that is part of this program. What should I expect? Do I get educated internally by the security teams? How do I interact with other security champions? How is my day-to-day different?
Yash Kosaraju
So we do have secure coding training for pretty much all engineers at Twilio. But when it comes to champions, we recently rolled out in-person, well, virtual of course right now, but virtual in-person security training based on what they're building.
We also have an advanced security champions program where they can enroll and do offensive, defensive, and cloud security courses/challenges in a CTF-style environment, and then sort of earn points as part of that. We also have Slack in there. And at the end of the day, if they complete all of these, we also give them more responsibilities and privileges, which usually are reserved for the security team.
Simon Maple
So it's kind of like the more they show, the more you enable them in terms of what they can do.
Yash Kosaraju
Yes.
Simon Maple
Right. In terms of then how they apply themselves within their engineering org, do they fit into the design a little bit earlier? Do they inject themselves into code reviews? How do they then engage with the engineering teams?
Yash Kosaraju
You mean the champions or the partners?
Simon Maple
The champions, yeah.
Yash Kosaraju
So the champions are part of the engineering team, right? So they would be part of the whole end-to-end process that the team works on. And once they've gone through these trainings, some things that we do there are like, okay, you do threat models from the dev side of things. Now let's switch sides. Why don't you come join us as part of the security squad when doing a threat model for another team, and then sort of enable them to think like an attacker during threat models.
And that kind of goes on for all the different challenges we have. And at the end of it, it helps them in the long run: when they're building a design for a new feature, they start to think, "Okay, how can this be broken?" And that helps reduce the number of issues that my team would find later on in the process.
Simon Maple
Okay. And for other individuals in the team who are not part of the security champions team, if they need support, if they need help, do they go to the champions, or would they come directly to the security team?
Yash Kosaraju
They can come directly to the security team. I would not turn any developer away if they come and say, "Help us do the right thing." However, they can also go to the security champion and, through them, come to the partner. I'm open to any interaction that they want to have.
Simon Maple
Awesome. And I think that's turning into a real best practice, I think, of having a security champions program and a really great way of developers almost like holding each other responsible for a high level of security practice within an engineering team, which sounds good.
So why don't we talk a little bit about how a pipeline in Twilio would work. First of all, actually, who owns that pipeline? Do you have DevOps-style teams? Are there developers who work on those pipelines as well? Where does the ownership sit?
Yash Kosaraju
So we have a platform team that owns the whole pipeline, which developers then use to sort of build their features or products on top of it. So essentially, whenever we want to put tools in the pipeline, it's basically working with the platform team that owns those pipelines to sort of enable us.
And in my opinion, the DevOps team that owns these pipelines is basically the secret sauce that enables security to succeed, because you're essentially going in saying, "I have a tool which I want to put in your stuff. Please let us do it."
Simon Maple
So a developer now wants to push some code into production, so they make their code changes. At what stage, what's the first stage they start testing, perhaps in an automated way or even manually, for security issues?
Yash Kosaraju
So for security issues, the way we are trying to do it is to have as many checks as possible in the pipeline, right? All the things from code ownership checks to secrets in code to dependency security, static code analysis, all of that stuff as early as we can, and give feedback via comments in pull requests or Slack messages or combinations of those. And if all of the automations we set up work with each other, we may even create tickets in the right team's queue.
Simon Maple
And what's the most important way to get that feedback to a developer that there's an issue? What's the feedback cycle, and how does a developer expect to resolve those?
Yash Kosaraju
I think the false positive rate should be pretty low. We've had code analysis tools submit comments on PRs, and people actually look at them immediately. One of my team's learnings from the past is that some of the tools we have used have had a higher rate of false positives than others. People immediately look at the results if they see 10 high findings that need to be fixed, because they want their code to be secure.
But at the end of the day, if they find out that eight of those 10 are false positives, they're going to lose trust in the tooling we run. So a big, important issue for me and my team is that whatever we run, whatever we tell the developers to do, and whatever type of feedback we give, there needs to be that consistently high rate of efficiency and fewer false positives, to maintain trust and get people to look at the results.
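To make the "eight of those 10" example concrete, here is a minimal sketch of how a team might track the share of a tool's findings that turn out to be false positives and decide whether that tool should comment on pull requests. The tool names, the triage counts, and the 20% threshold are assumptions for illustration and are not from the talk.

```python
from dataclasses import dataclass


@dataclass
class ToolTriage:
    """Triage outcome for one scanner over a review period (hypothetical data)."""
    tool: str
    true_positives: int
    false_positives: int

    @property
    def false_positive_share(self) -> float:
        # Share of reported findings that turned out not to be real issues.
        total = self.true_positives + self.false_positives
        return self.false_positives / total if total else 0.0


def trusted_for_pr_comments(triage: ToolTriage, max_share: float = 0.2) -> bool:
    # max_share is an assumed threshold, not a figure given in the talk.
    return triage.false_positive_share <= max_share


# Eight of ten findings were false positives, as in the example above.
noisy = ToolTriage("scanner-a", true_positives=2, false_positives=8)
quiet = ToolTriage("scanner-b", true_positives=9, false_positives=1)

print(noisy.tool, noisy.false_positive_share, trusted_for_pr_comments(noisy))
print(quiet.tool, quiet.false_positive_share, trusted_for_pr_comments(quiet))
```

A measure like this only works if someone records the triage outcome of each finding, which is itself a process decision.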
Simon Maple
Yeah. I think that's certainly one of the most important things I've seen as well in terms of when developers have time to fix. They don't want to fix things that are just frustrating them in terms of, "Yes, I know I see 20 issues here, but only two of them are real issues," kind of a thing.
Now you use Snyk today, and one of the big things we care about as a developer tool in security is how we get actionable information back to the developer. In terms of how much remediation you expect developers to do straightaway versus add to their backlog, how big a push do you have on remediation and fixing vulnerabilities within your security org?
Yash Kosaraju
It depends on the criticality of the issue. Certainly, we're not going to say, "Here's 100 issues. Go fix all of them now." The way we're trying to solve this is categorize those issues based on severity, fix availability, exploit maturity. Has this been exploited before? Is it in our edge service or not? And based on a bunch of factors, sort of categorize them, and then use our vulnerability management standard, which has defined SLAs for certain criticality within all of Twilio, and then use that to sort of drive remediation in a slow but progressive manner versus trying to tackle everything at once.
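The categorization described here could be sketched roughly as follows. This is only an illustration: the scoring rules, the one-level adjustments, and the SLA values in days are all assumptions, since the talk names the factors (severity, fix availability, exploit maturity, edge exposure) but not how they are weighted.

```python
from dataclasses import dataclass

# Severity levels from least to most urgent.
LEVELS = ["low", "medium", "high", "critical"]

# Hypothetical remediation SLAs in days; the talk does not give real values.
SLA_DAYS = {"low": 180, "medium": 90, "high": 30, "critical": 7}


@dataclass
class Finding:
    title: str
    severity: str            # scanner-reported severity, one of LEVELS
    fix_available: bool
    exploited_in_wild: bool
    on_edge_service: bool


def effective_priority(finding: Finding) -> str:
    """Adjust the reported severity using the factors named in the talk."""
    level = LEVELS.index(finding.severity)
    if finding.exploited_in_wild:
        level += 1           # assumed rule: known exploitation raises urgency
    if finding.on_edge_service:
        level += 1           # assumed rule: edge exposure raises urgency
    if not finding.fix_available:
        level -= 1           # assumed rule: nothing to upgrade to yet
    level = max(0, min(level, len(LEVELS) - 1))
    return LEVELS[level]


def sla_days(finding: Finding) -> int:
    return SLA_DAYS[effective_priority(finding)]


edge_bug = Finding("Outdated TLS library", "medium", True, True, True)
internal_bug = Finding("Unpatched parser", "high", False, False, False)

print(effective_priority(edge_bug), sla_days(edge_bug))
print(effective_priority(internal_bug), sla_days(internal_bug))
```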
Simon Maple
And we'll talk about automation in a sec as well, and go into more depth on that. But in terms of vulnerabilities that you find in your pipeline, I believe you do no blocking in your pipeline today. Talk us through some of the automation that you have that generates those tickets for you.
Yash Kosaraju
Sure. So the first problem we tried to solve was code ownership because more often than not, you find a vulnerability, you dig through to get to the source code. But then the next question is, who does this even belong to? And more than once in the past, I've done, like, a git blame to see who made changes, ping them. They may be offline, may not be in the company, then go down that sort of rabbit hole to finally figure out who owns that piece of code.
So one way we're solving that problem at Twilio is asking engineers to put an about.yaml file in their code repos with basic metadata of that code that we want to know: which team the code's owned by, what Jira project they work off of, what's their Slack channel, all of those things.
And then we're also putting in all of our security tooling into the pipeline, like container scanning, code scanning, dependencies, all of those. And we're building a sort of ticketing framework, which essentially talks to all of these tools, gets results, goes and looks for the about.yaml file, and then based on certain rules we write, it goes and files tickets into the exact queue that the teams would look at.
So essentially, the teams then don't have to go and sift through a bunch of results, figuring out which ones are actually applicable to them. They get those tickets in their Jira queue, in the backlog where their engineering managers or product managers can look at them and help prioritize who works on what.
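As a rough sketch of the idea, the snippet below reads ownership metadata from an about.yaml-style file and turns a scanner finding into a ticket for the owning team's queue. The field names (`team`, `jira_project`, `slack_channel`), the finding format, and the "only high and critical" rule are assumptions for illustration; the talk does not describe the real schema or rules. A real implementation would likely use a YAML library rather than this minimal flat parser.

```python
from typing import Optional

ABOUT_YAML = '''\
# about.yaml (field names are illustrative assumptions)
team: payments
jira_project: PAY
slack_channel: "#payments-eng"
'''


def parse_about(text: str) -> dict:
    """Parse flat 'key: value' lines only; not a general YAML parser."""
    meta = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        meta[key.strip()] = value.strip().strip('"')
    return meta


def build_ticket(finding: dict, about: dict) -> Optional[dict]:
    """Apply one example rule: only high and critical findings become tickets."""
    if finding["severity"] not in ("high", "critical"):
        return None
    return {
        "project": about["jira_project"],
        "summary": f"[{finding['tool']}] {finding['title']}",
        "labels": ["security", finding["severity"]],
        "notify": about.get("slack_channel"),
    }


about = parse_about(ABOUT_YAML)
ticket = build_ticket(
    {"tool": "dependency-scan", "title": "Vulnerable library version", "severity": "high"},
    about,
)
print(ticket)
```

Keeping the routing data in the repository itself means ownership changes travel with the code, which is the problem the git blame anecdote describes.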
Simon Maple
Awesome. So automation's absolutely key in terms of the adoption. Is that how you see it? Is there anything you do outside of the pipeline that can really engage developers in that adoption of testing earlier before they enter the pipeline?
Yash Kosaraju
So security education is a key there, right? We do in-person and online security education, and basically walk through different scenarios that have happened in the past and how they can happen again. We've also recently started a lunch-and-learn practice within Twilio where we just take an hour instead of a day-long training and talk through some of the things that we're doing.
This is not necessarily OWASP Top 10, but it's like, "Hey, how does Twilio do dependency security? Why are secrets in code bad?" And then we talk through some of the bigger projects that we have taken on, the motivations behind them, and also get developers familiar with the tools that we use and what to expect from those tools and what not to expect from them.
Simon Maple
Awesome. And I think the difference between the pipeline and educating the developers is that you can teach and educate developers, but you can't necessarily force a developer to do anything. Whereas in the pipeline, in the automation, everything gets policed and everything gets tested every time. So it's such a key piece there.
Yash Kosaraju
It is.
Simon Maple
You also did some automation around Snyk more recently, with Snyk Watcher. Can you tell us a little bit about that?
Yash Kosaraju
Sure. So we rolled out Snyk for all of our code and then realized there are code changes like creating new projects, deleting, archiving, and those weren't reflecting back into Snyk in a native fashion, and we didn't really want to go into Snyk every week or every couple of days and say, "Re-import everything to maintain state." And one of my engineers was like, "Let's build automation around this."
And then essentially Snyk Watcher keeps our code repositories in sync with Snyk. So whenever a change happens in our code, for example, say a project is archived, right? That triggers off a webhook, Snyk Watcher goes into Snyk and then makes the relevant changes to the projects within Snyk.
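The flow described here, a repository event arriving by webhook and being translated into a change on the Snyk side, might look something like the sketch below. Everything in it is a stand-in: the event shape, the action names, and the `InMemorySnykClient` methods are hypothetical and do not represent Snyk Watcher's actual code or the real Snyk API.

```python
class InMemorySnykClient:
    """Stand-in for a real Snyk API client; method names are hypothetical."""

    def __init__(self):
        self.projects = {}  # repo name -> "active" or "inactive"

    def import_repo(self, repo: str) -> None:
        self.projects[repo] = "active"

    def deactivate_repo(self, repo: str) -> None:
        if repo in self.projects:
            self.projects[repo] = "inactive"

    def remove_repo(self, repo: str) -> None:
        self.projects.pop(repo, None)


def handle_repo_event(event: dict, snyk: InMemorySnykClient) -> str:
    """Map a source-control webhook event (assumed shape) to a sync action."""
    action = event.get("action")
    repo = event["repository"]
    if action in ("created", "unarchived"):
        snyk.import_repo(repo)
        return "imported"
    if action == "archived":
        snyk.deactivate_repo(repo)
        return "deactivated"
    if action == "deleted":
        snyk.remove_repo(repo)
        return "removed"
    return "ignored"  # unrelated events leave state untouched


snyk = InMemorySnykClient()
results = [
    handle_repo_event({"action": "created", "repository": "org/billing"}, snyk),
    handle_repo_event({"action": "archived", "repository": "org/billing"}, snyk),
    handle_repo_event({"action": "renamed", "repository": "org/billing"}, snyk),
]
print(results, snyk.projects)
```

Driving the sync from events rather than periodic re-imports is what removes the need to "re-import everything to maintain state."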
Simon Maple
Awesome. And you use the Snyk API for that, I presume?
Yash Kosaraju
Yep.
Simon Maple
Yeah. Awesome. And I'd love to talk a little bit going more towards the path of success, because I think this is always a tough question in terms of how you feel you are being successful in your security programs, how you see developers being successful. What are the kind of things you measure today in terms of your security programs?
Yash Kosaraju
So that's something we have recently started working on. So being completely honest, I don't have the full answer on what a success metric looks like for a security team, but the way we're trying to approach this is to first surface a dashboard of, okay, here are all the vulnerabilities per business unit, and then let BU leaders see that and get that visibility as one single dashboard from all of our tools.
And the way we're diving deep into that is by also breaking it down by the phase of our capabilities in which those vulnerabilities were found. So basically imagine a graph of sorts which says threat modeling found X percent of your issues, versus Bug Bounty found Y percent. And the more we mature, I foresee that graph being heavy on X, which is the percentage of vulnerabilities found during threat modeling, code reviews, champion syncs, and things like that, versus Bug Bounty. The idea being we try and eliminate vulnerabilities before they even hit our code.
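The graph described here boils down to a percentage breakdown of findings by the phase that caught them. A minimal sketch, assuming each finding is tagged with a phase label (the labels and counts below are made up for illustration):

```python
from collections import Counter


def share_by_phase(phases: list) -> dict:
    """Return the percentage of findings discovered in each phase."""
    counts = Counter(phases)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {phase: round(100 * n / total, 1) for phase, n in counts.items()}


# Hypothetical sample: one phase label per vulnerability found.
found_in = ["threat_model"] * 6 + ["code_review"] * 2 + ["bug_bounty"] * 2

breakdown = share_by_phase(found_in)
print(breakdown)
```

Tracked over time, a growing share for the early phases and a shrinking share for bug bounty would be the "heavy on X" trend described above.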
Simon Maple
Awesome. And I think you talk a little bit about how much you value the dev time previously in terms of giving tools and processes, et cetera, that speed up that pipeline, reducing false positives so you know what they're working on are the critical things that are actually issues. What kind of feedback do you give to developers via dashboards and things like that? How do they know what to work on next in a backlog, for example?
Yash Kosaraju
So that's something I leave up to the security partners and champions. We are also building dashboards for each champion team or BU, which basically show a list of open security tickets in each team's queue, and also open task tickets in the security queue that relate to those teams, right? It's more of a two-way street.
The team can come and say, "Do X, Y, and Z for us, security." And those three tickets will show up in the dashboard. And every time the security partner and the security champion meet, they look at this and say, "Hey, you have five tickets in your backlog. Can you prioritize those in your next sprint planning?" Engineering can come to us and say, "Hey, you have these two tickets that we asked you to work on. Can you actually get to those soon?" So that dashboard, working with those champion syncs, is sort of the way we envision working through those tickets and dashboards and things like that.
Simon Maple
Got it. And that's interesting that you talk about different tickets getting into the next sprint or a future sprint. What kind of prioritization would a security issue get over a new feature or a functional bug? How do the teams balance that? Is it mostly based on SLAs, or are there other things involved?
Yash Kosaraju
There is SLA, and we're also working with teams to try and dedicate some portion of time for security asks on a regular basis. It also depends on the criticality of the bug. For example, there could be a bug which is pretty benign and has not been exploited. There's no known exploit, so that could take backseat. But once we have evidence that, hey, this is being exploited in the wild, we need to go and get this done now, then that's when the conversation changes to, this is why it needs to be done.
So essentially, the bottom line is, unless we have a valid reason to ask dev to stop what they're doing and work on security, we don't usually do that. Security, it should not be a hammer that we use and say, "Go do this." It should be more of a collaborative working session between engineering and us in the long term to build that trust relation and find that balance between security and feature.
Simon Maple
And I think this is really interesting, actually. If we were to even take that earlier, let's say a developer's working on a new feature and they need a new package, or they need to use a specific Docker container. There's a balance, really, between you giving the developer enough space in which they can be creative, in which they can choose different tools or different libraries that they want to use. But there's also you wanting to make sure that there are maybe guardrails, or some way in which that developer is being sensible with what they choose. How do you balance that, and how do you know when to give developers more rope to be able to express themselves more freely?
Yash Kosaraju
I think that's the change of thinking that you need to look at in this case because security is not just responsible for security of Twilio, right? We are here to enable developers to embed security into their products, and they're ultimately responsible for their products. So the way I look at it is we give them the right tools, the right results, the right guidance, and we kind of make a collaborative decision of what takes precedence.
Simon Maple
Awesome. Yash, thank you very much. This has been a really interesting chat, and thank you for joining us today.
Yash Kosaraju
Thanks, Simon.