How Twilio Scaled Through Dev-First Security and DevSecOps

As more organizations leverage cloud native technologies such as Kubernetes, IaC, containers and serverless – shifting left and adopting DevSecOps is a must-do. But how does it actually work in practice?


Meet Twilio: a billion-dollar unicorn that has mastered dev-first security. In this session, you'll hear from Twilio's Head of Product Security on how he built and runs an application security program that maintains high-velocity output. You'll learn about everything from their security champions program and developer threat modeling training to their dev-friendly security tooling choices. Join us to learn from the pros.


This session is presented by Snyk.


Simon Maple

Field CTO, Snyk


Yash Kosaraju

Head of Product Security, Twilio

Transcript

00:00:13

Hello everyone, and welcome to this Snyk session, How Twilio Scaled Through Dev-First Security and DevSecOps. My name is Simon Maple. I'm the VP of DevRel and Community here at Snyk, and joining me here today is Yash. How are you doing?

00:00:30

Hey. Yeah, I'm doing good. How are you, sir?

00:00:31

Very well, thank you. You're the Head of Product Security at Twilio. Tell us a little bit about what you do.

00:00:38

Oh, so I've been at Twilio close to three years now. My team is essentially responsible for all the security activities in the SDLC, ranging from developer secure coding training, threat modeling, and security champions all the way to running our bug bounty program, doing penetration tests, and things like that, and covering pretty much everything in between: tooling in the build pipeline, working with engineers, all of that stuff.

00:01:07

Sounds like you're a busy man, and a busy team. Well, welcome to the session. We'll get started and go deeper into a lot of that shortly. In this session, we're going to have a very brief intro into how modern application development has changed DevSecOps and changed the way we think about the security of an app. We'll then cover a little bit of background about Twilio and talk about what dev-first security and DevSecOps mean at Twilio, going into depth on things like responsibility, how we educate developers, and how we enable developers through the pipeline. So, let's briefly jump into how applications have changed over the last number of years. Pre-cloud, of course, developers wrote their custom code and pulled in a number of open source libraries, and that then got thrown over a wall, if you will, and dropped onto a stack, a platform, that was very much handled by the IT and operations teams.

00:02:13

Today, in a much more cloud-oriented environment, we see developers much more responsible for a larger number of things, including parts of the platform: your containers, your config, et cetera. As developers handle a lot more of these different artifacts and configuration files, they need to be more inclined to maintain them, more inclined to secure them, and certainly to have an understanding of what misconfigurations look like and how secure their environments are. Of course, there are a ton of different types of attack that are possible at each of these layers, and as more of these layers, from your cloud services to your containers, become the developer's to handle, developers need to think much more about the different types of attack. That includes your config, unpatched packages, and your operating system, which maybe previously developers didn't need to think about. All of this needs to be brought to the developers' attention.

00:03:16

So there are a number of different ways in which that can be done, and different tooling with which it can be achieved. I'm not going to go through all of this in too much depth, but you'll see on this iceberg picture that custom code, which developers have traditionally been more focused on, is just a small piece of the iceberg. Realistically, under the covers of modern applications there is open source code in libraries, where we need to think about where vulnerabilities exist in those open source packages, as well as containers, with the hundreds of packages we pull in: thinking about which container version we're using and which packages we're pulling in. And of course there's the infrastructure that binds all of that together, the config code and the scripts, whether it's Terraform, Kubernetes, whatever it is that pulls all of that together. We need to make sure that across this entire software supply chain we're not just writing and maintaining, but securing, and being very conscious of where attacks can come in.

00:04:20

So, with that, I did promise to only be a few slides. I'm going to jump back over to Yash, and let's have a discussion about some of these pieces. Why don't we first get a little bit of background about Twilio, maybe the journey Twilio has been on and the changes in the types of applications that you're looking to support? And I'd love to know who's responsible for those different areas as well.

00:04:48

Cool. So Twilio provides a wide variety of services, right? And that trickles down to the things that the security and trust team needs to secure and work with engineering on. In terms of how it's divided between the teams, we have multiple security sub-teams: product security, cloud security, enterprise security, vulnerability management, and so on and so forth. More relevant to this discussion, I'll focus a little more on cloud and product security and how they work in this cloud native environment of Kubernetes, because you could pretty much make an argument that Kubernetes is a cloud security responsibility, or that it's not, and then where do containers fit in, right? That's been an interesting debate over the last few years. The way we try to do it at Twilio is that everything written by our developers is product security, and once you talk about how that's deployed on the infrastructure, that's cloud security. That's the super high level. So Kubernetes security is something we work on together. Everything that's AWS, and cloud security in general, goes to the cloud team. Security relating to the code that's written internally, the applications we build, and containers, all that portion of it, lives within product security.

00:06:13

Well, what are the typical interactions between the security teams and the dev teams in those instances? Why don't we take something that's much more on the application side, maybe open source libraries, or their own code as well. Where is the line between what devs find themselves and where the security team comes to the devs with issues? How does that work?

00:06:36

The way I like to think about it is that we work with the engineering teams to help them write more secure code and build more secure products for Twilio's customers. I don't want to draw a line of who finds issues. Anyone within Twilio can find an issue, come to us, and say, how do we fix this? That's probably my dream, if engineering can start thinking about that, come to us and say, hey, how do we fix this? And that, to some extent, happens within Twilio. The way we do it is we have a security champions program. There is a security champion nominated in pretty much every engineering team, and they have a single point of contact within the security team, who is their security partner, and they work pretty closely on a regular basis, talking about the changes the team is making, new products, new features, and then deciding as a team what security activities need to be done. It could be, let's do a threat model, or let's do a quick pen test on this, or just let's talk through the flow and make sure everything's okay. So it's a team effort.

00:07:42

And that's really interesting. I think it's common now to see a security champions or security mavens program, and these are clearly the kinds of things that you've seen success with across a number of different people at Twilio specifically. Realistically, when we talk to a developer, security is not going to be high on the list of things they want to get done that day. So how do you really energize developers into wanting to be a part of this program, or wanting to be educated in security?

00:08:10

I think as long as we make security asks reasonable and possible, it will work. If I go to them and say, here's a vulnerability to fix, there's no public fix available, I don't know what you're going to do with it, but you need to fix this, that's not going to work. But if I go and say, this is the vulnerability, this is why it's important, these are the repercussions of not fixing it, and this is how we can help you fix it, and then show them a path forward of fixing things without breaking something else, I think developers are open to security. From my experience, at least at Twilio, everyone wants to do the right thing, but the question is how do we help them do it? And that's the important part that we try to focus on.

00:08:55

And say I'm an individual who is part of a security champions program. You mentioned there's one person from each team or each squad in Twilio that is part of this program. What should I expect? Do I get educated internally by the security teams? How do I interact with other security champions? What does my day-to-day look like?

00:09:15

So we do have secure coding training for pretty much all engineers at Twilio, but when it comes to champions, we recently rolled out in-person, well, virtual of course right now, but what used to be in-person security training based on what they're building. We also have an advanced security champions program where they can enroll and do offensive, defensive, and cloud security courses and challenges in a CTF-style environment, and then earn points as part of that. We also have a Slack channel for that. And at the end of the day, when they complete all of these, we give them more responsibilities and privileges, which are usually reserved for security.

00:10:03

So it's kind of like, the more they show, the more you enable them in terms of what they can do. Right. In terms of how they then apply themselves within the engineering org, do they fit into the design phase a little earlier? Do they inject themselves into code reviews? How do they engage with the engineering teams?

00:10:31

You mean the champions or my department? The champions, yeah. So the champions are part of the engineering team, right? So they would be part of the whole end-to-end process that the team works on. And once they've gone through these trainings, one of the things we do is say, okay, you do threat models from the dev side of things; now let's switch sides, why don't you come join us as part of the security squad when doing a threat model for another team, and then enable them to think like an attacker during threat models. That goes on across all of the different challenges we have. And at the end of it, it helps them in the long run: when they're building a design for a new feature, they start to think, okay, how can this be broken? And that helps reduce the number of issues that my team would find later on in the process.

00:11:22

Okay. And for other individuals in the team who are not part of the security champions program, if they need support, if they need help, do they go to the champions, or would they come directly to the security team?

00:11:34

They can come directly to the security team. I would not turn any developer away if they come and say, help us do the right thing. However, they can also go to the security champion, and through them come to the security partner. I'm open to any interaction that they want to have.

00:11:52

Awesome. And like I say, having a security champions program is turning into a real best practice, and it's a great way for developers to almost hold each other responsible for a high level of security practice within an organization, within an engineering team. Which sounds good. So let's talk a little bit about how a pipeline at Twilio would work. First of all, who actually owns that pipeline? Do you have DevOps-style teams, or is it developers who work on those pipelines as well? Where does the ownership sit?

00:12:31

So we have a platform team that owns the whole pipeline, which developers then use to build their features or products on top of. So essentially, whenever we want to put tools in the pipeline, it's a matter of working with the platform team that owns those pipelines to enable us. And in my opinion, the DevOps team that owns these pipelines is basically the secret sauce that enables security to succeed, because you're essentially going in and saying, I have a tool which I want to put into your stuff, please let us do it.

00:13:10

And so a developer now wants to push some code into production. They make their code changes. At what stage do they start testing for security issues, perhaps in an automated way or even manually?

00:13:25

So for security issues, the way we're trying to do it is to have as many checks as possible in the pipeline, right? Everything from code ownership checks to secrets in code to dependency security and static code analysis, all of that, as early as we can, and give feedback as comments on pull requests, or Slack messages, or combinations of those. If it works, we may even create tickets in the right team's queue, if all of these automations we've set up work with each other.
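As a rough illustration of the kind of pipeline feedback Yash describes, here is a minimal Python sketch that turns scanner findings into a pull request comment. The findings format, the severity threshold, and the use of GitHub's issue-comment endpoint are assumptions made for the example, not Twilio's actual tooling.

```python
"""Minimal sketch: post high-severity scan findings as a PR comment.

Assumptions (not from the talk): findings arrive as a JSON list with
"severity" and "title" fields, and the comment is posted via GitHub's
standard POST /repos/{owner}/{repo}/issues/{number}/comments endpoint.
"""
import json
import os
import sys

import requests

GITHUB_API = "https://api.github.com"


def post_findings_comment(owner: str, repo: str, pr_number: int, findings: list) -> None:
    # Only surface high-severity issues in the PR to keep noise (and false
    # positives) low, as discussed later in the session.
    high = [f for f in findings if f.get("severity") == "high"]
    if not high:
        return
    body = "Security scan found {} high-severity issue(s):\n".format(len(high))
    body += "\n".join("- {}".format(f["title"]) for f in high)
    resp = requests.post(
        f"{GITHUB_API}/repos/{owner}/{repo}/issues/{pr_number}/comments",
        headers={"Authorization": f"token {os.environ['GITHUB_TOKEN']}"},
        json={"body": body},
        timeout=30,
    )
    resp.raise_for_status()


if __name__ == "__main__":
    # Usage: python pr_feedback.py <owner> <repo> <pr_number> <findings.json>
    owner, repo, pr, findings_file = sys.argv[1:5]
    with open(findings_file) as fh:
        post_findings_comment(owner, repo, int(pr), json.load(fh))
```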

00:14:00

And what's the most important way to get that feedback to a developer when there's an issue? What's the feedback cycle, and how is a developer expected to resolve those?

00:14:13

I think what developers want is for the false positive rate to be pretty low. We have had code analysis tools submit comments on PRs, and people actually look at them immediately. One of my team's learnings in the past is that some of the tools we have used had a higher rate of false positives than others. People will immediately look at ten high findings that need to be fixed, because they want their code to be secure, but at the end of the day, if they find out that eight of those ten are false positives, they're going to lose trust in the tooling we run. So that's a big, important issue for me and my team: whatever we run, whatever we tell the developers to do, and whatever type of feedback we give, there needs to be a consistently high rate of accuracy and a low rate of false positives to maintain trust and get people to look at those results.

00:15:16

Yeah, I think that's certainly one of the most important things I've seen as well. When developers have time to fix, they don't want to fix things that are just frustrating them, the "yes, I see 20 issues here, but only two of them are real issues" kind of thing. Now, at Snyk, one of the big things we care about as a developer tool in security is how we bring actionable information back to the developer. In terms of how much remediation you expect developers to do straight away, or to put on their backlog, how strong a push do you have on remediation and fixing vulnerabilities within your security org?

00:16:12

It depends on the criticality of the issue. Certainly we're not going to say, here are a hundred issues, go fix all of them now. The way we're trying to solve this is to categorize those issues based on severity, fix availability, exploitability, whether it has been exploited before, whether it's in an edge service or not, and, based on a bunch of factors, categorize them and then use our vulnerability management standard, which defines SLAs for each criticality across all of Twilio, and use that to prioritize remediation in a slow but progressive manner, versus trying to tackle everything at once.
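To make that categorization idea concrete, here is an illustrative Python sketch of mapping a finding's attributes to an SLA. The escalation rules and day counts are invented for the example; the session only names the inputs (severity, fix availability, exploitability, exposure) and the existence of a vulnerability management standard with defined SLAs.

```python
"""Illustrative severity-to-SLA routing. Not Twilio's actual standard."""
from dataclasses import dataclass


@dataclass
class Finding:
    severity: str          # "critical" | "high" | "medium" | "low"
    fix_available: bool
    exploited_in_wild: bool
    edge_service: bool     # internet-facing service?


# Hypothetical SLA table (days to remediate) keyed by effective criticality.
SLA_DAYS = {"critical": 7, "high": 30, "medium": 90, "low": 180}

ORDER = ["low", "medium", "high", "critical"]


def effective_criticality(f: Finding) -> str:
    level = ORDER.index(f.severity)
    # Escalate issues that are actively exploited or exposed at the edge.
    if f.exploited_in_wild or f.edge_service:
        level = min(level + 1, len(ORDER) - 1)
    # De-prioritize slightly when no public fix exists yet.
    if not f.fix_available:
        level = max(level - 1, 0)
    return ORDER[level]


def sla_days(f: Finding) -> int:
    return SLA_DAYS[effective_criticality(f)]


if __name__ == "__main__":
    f = Finding(severity="high", fix_available=True,
                exploited_in_wild=True, edge_service=False)
    print(effective_criticality(f), sla_days(f))  # critical 7
```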

00:16:54

And we'll talk about automation in more depth in a second. But in terms of vulnerabilities that you find in your pipeline, you do no blocking right now, I believe. Talk us through some of the automation you have that generates those tickets for you.

00:17:13

Sure. So the first problem we tried to solve was code ownership, because more often than not, you find a vulnerability and dig through to get to the source code, but then the next question is, who does this even belong to? More than once in the past we've done a git blame to see who made changes and pinged them, but they may be offline, may not be at the company anymore, and then you go down that rabbit hole to finally figure out who owns that piece of code. So one way we're solving that problem at Twilio is asking engineers to put an "about" YAML file in their code repos with the basic metadata about that code that we want to know: which team the code is owned by, what Jira project they work off of, what their Slack channel is, all of those things.

00:17:59

And then we're also putting all of our security tooling into the pipeline: container scanning, code scanning, dependencies, all of those. And we're building a sort of ticketing framework, which essentially talks to all of these tools, gets the results, goes and looks for the "about" YAML file, and then, based on certain rules we write, files tickets into the exact queue that the teams will look at. So the teams don't have to go and sift through a bunch of results, figuring out which ones are actually applicable to them. They get those tickets in their Jira queue, in the backlog, where their engineering managers or product managers can look at them and help prioritize who works on what.
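A minimal sketch of that routing flow might look like the following, assuming a hypothetical "about.yaml" with team, jira_project, and slack_channel keys, and Jira's standard issue-creation endpoint. The file name, field names, and rules are illustrative, not Twilio's actual framework.

```python
"""Sketch: route a scanner finding to the owning team's Jira queue."""
import os

import requests
import yaml  # pip install pyyaml

JIRA_URL = os.environ.get("JIRA_URL", "https://example.atlassian.net")
JIRA_AUTH = (os.environ["JIRA_USER"], os.environ["JIRA_TOKEN"])


def load_ownership(repo_path: str) -> dict:
    # Hypothetical about.yaml contents:
    #   team: payments
    #   jira_project: PAY
    #   slack_channel: "#payments-eng"
    with open(os.path.join(repo_path, "about.yaml")) as fh:
        return yaml.safe_load(fh)


def file_ticket(repo_path: str, finding: dict) -> str:
    owner = load_ownership(repo_path)
    issue = {
        "fields": {
            "project": {"key": owner["jira_project"]},
            "summary": f"[security] {finding['title']}",
            "description": finding.get("description", ""),
            "issuetype": {"name": "Bug"},
        }
    }
    # Standard Jira REST endpoint for creating an issue.
    resp = requests.post(f"{JIRA_URL}/rest/api/2/issue",
                         json=issue, auth=JIRA_AUTH, timeout=30)
    resp.raise_for_status()
    return resp.json()["key"]  # e.g. "PAY-123"
```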

00:18:41

Awesome. So automation is absolutely key in terms of adoption, is that how you see it? Is there anything you do outside of the pipeline that can really engage developers in that adoption of testing earlier, before they even enter the pipeline?

00:18:59

So security education is key there, right? We do in-person and online security education, and basically walk through different scenarios that have happened in the past and how they can happen again. We've also recently started a lunch-and-learn practice within Twilio, where we take just an hour, instead of a day-long training, and talk through some of the things we're doing. It's not necessarily the OWASP Top 10; it's more like, hey, how does Twilio do dependency security? Why are secrets in code a problem? And then we talk through some of the bigger projects we've taken on and the motivations behind them, and also get developers familiar with the tools that we use, what to expect from those tools, and what not to expect from them.

00:19:43

Awesome. And I think the obvious difference between the pipeline and educating developers is that you can teach developers, you can educate developers, but you can't necessarily force a developer to do anything, whereas in the pipeline, with the automation, everything gets policed and everything gets tested every time. So it's such a key piece there. Now, you also recently did some automation around Snyk, with Snyk Watcher. Can you tell us a little bit about that?

00:20:14

Sure. So we rolled Snyk out for all of our code and then realized there are code changes, like creating new projects, deleting, archiving, and those weren't reflected back into Snyk in a native fashion. And we didn't really want to go into Snyk every week, or every couple of days, and re-import everything to maintain that state. One of my engineers said, let's build automation around this, and so essentially Snyk Watcher keeps our code repositories in sync with Snyk. Whenever a change happens in our code, for example, say a project is archived, that triggers a webhook, Snyk Watcher goes into Snyk, and then it makes the relevant changes to the projects within Snyk.
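A rough sketch of that sync loop might look like the following, assuming a GitHub-style repository webhook and Snyk's v1 projects API. The endpoint shapes, payload fields, and the project-name matching rule are assumptions; this is not Twilio's Snyk Watcher code.

```python
"""Sketch: keep Snyk projects in sync with repo lifecycle events."""
import os

import requests
from flask import Flask, request

app = Flask(__name__)

SNYK_API = "https://api.snyk.io/v1"   # assumed v1 base URL
SNYK_ORG = os.environ["SNYK_ORG_ID"]
HEADERS = {"Authorization": f"token {os.environ['SNYK_TOKEN']}"}


def snyk_projects() -> list:
    # List the org's projects so they can be matched against the repo name.
    resp = requests.post(f"{SNYK_API}/org/{SNYK_ORG}/projects",
                         headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.json().get("projects", [])


@app.route("/webhook", methods=["POST"])
def handle_repo_event():
    event = request.get_json(force=True)
    # GitHub repository webhooks send action == "archived" on archive events.
    if event.get("action") == "archived":
        repo_full_name = event["repository"]["full_name"]
        for project in snyk_projects():
            # Assumption: SCM-imported Snyk project names start with "owner/repo".
            if project["name"].startswith(repo_full_name):
                requests.delete(
                    f"{SNYK_API}/org/{SNYK_ORG}/project/{project['id']}",
                    headers=HEADERS, timeout=30,
                ).raise_for_status()
    return {"ok": True}
```

The same handler could be extended for created or deleted repositories by branching on the other webhook actions.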

00:20:57

Awesome. And you use the Snyk API for that, I presume? Yep, yep. Awesome. I'd love to talk a little bit now about the path to success, because I think this is always a tough question: how do you know you're being successful in your security programs, and how do you see developers being successful? What are the kinds of things you measure today in terms of your security programs?

00:21:24

So that's something we have recently started working on, so, being completely honest, I don't have the full answer to what the success metric looks like for a security team. But the way we're trying to approach it is to first surface a dashboard of, okay, here are all the vulnerabilities per business unit, so that a leader is able to see that and get that visibility as one single dashboard across all of our tools. The way we're diving deeper into that is also breaking it down by which phase of our capabilities found those vulnerabilities. So imagine a graph of sorts that says threat modeling found X percent of your issues versus bug bounty found Y percent. The more we mature, I foresee that graph being heavy on X, the percentage of vulnerabilities found during threat modeling, code reviews, champion syncs, and things like that, versus bounty, the idea being to try and eliminate vulnerabilities before they even hit our code.
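As a tiny illustration of that "found-by phase" breakdown, a sketch like the following could compute the percentages. The phase labels and data shape are invented for the example; the session only describes wanting a per-phase split such as threat modeling versus bug bounty.

```python
"""Illustrative per-phase breakdown of where vulnerabilities were found."""
from collections import Counter


def phase_breakdown(findings: list[dict]) -> dict[str, float]:
    # Each finding is assumed to carry a "found_by" label such as
    # "threat_model", "code_review", "champion_sync", or "bug_bounty".
    counts = Counter(f["found_by"] for f in findings)
    total = sum(counts.values()) or 1
    return {phase: 100.0 * n / total for phase, n in counts.items()}


if __name__ == "__main__":
    sample = [{"found_by": "threat_model"}] * 6 + [{"found_by": "bug_bounty"}] * 4
    print(phase_breakdown(sample))  # {'threat_model': 60.0, 'bug_bounty': 40.0}
```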

00:22:34

Awesome. And you talked a little earlier about how much you value dev time, in terms of giving them tools and processes that speed up the pipeline and reduce false positives, so that what they're working on are the critical things that are actually issues. What kind of feedback do you give to developers through dashboards and things like that? How do they know what to work on next in a backlog, for example?

00:23:08

So that I leave up to the security partners and champions. We're also building dashboards for each champion team, or each BU, which basically show a list of open security tickets in each team's queue, and also open tickets in the security queue that relate to those teams, right? It's more of a two-way street. A team can come and say, do X, Y, and Z for us, security, and those three tickets will show up in the dashboard, and every time the security partner and the security champion meet, they look at this and say, hey, you have five tickets in your backlog, can you prioritize those in your next sprint planning? And engineering can come to us and say, hey, you have these two tickets that we asked you to work on, can you actually get to those soon? So that dashboard, combined with those champion syncs, is the way we envision working through those tickets and dashboards and things like that.
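A sketch of pulling both sides of that two-way view from Jira might look like the following, using Jira's standard search endpoint with JQL. The project keys and the label convention tying security-queue tickets back to a team are hypothetical.

```python
"""Sketch: both sides of the 'two-way street' dashboard from Jira."""
import os

import requests

JIRA_URL = os.environ.get("JIRA_URL", "https://example.atlassian.net")
JIRA_AUTH = (os.environ["JIRA_USER"], os.environ["JIRA_TOKEN"])
SECURITY_PROJECT = "SEC"  # hypothetical security team project key


def search(jql: str) -> list:
    resp = requests.get(f"{JIRA_URL}/rest/api/2/search",
                        params={"jql": jql, "maxResults": 100},
                        auth=JIRA_AUTH, timeout=30)
    resp.raise_for_status()
    return resp.json()["issues"]


def two_way_view(team_project: str) -> dict:
    # Open security tickets sitting in the engineering team's queue...
    team_open = search(
        f"project = {team_project} AND labels = security "
        f"AND statusCategory != Done")
    # ...and open asks from that team sitting in the security team's queue
    # (assumes a hypothetical per-team label such as "team-PAY").
    security_open = search(
        f"project = {SECURITY_PROJECT} AND labels = team-{team_project} "
        f"AND statusCategory != Done")
    return {"team_backlog": team_open, "security_backlog": security_open}
```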

00:24:03

Got it. And it's interesting that you talk about different tickets getting into the next sprint or a future sprint. What kind of prioritization does a security issue get over a new feature or a functional bug? How do the teams balance that? Is it mostly based on SLAs, or are there other things involved?

00:24:24

There is an SLA, and we're also working with teams to try to dedicate some portion of time to security asks on a regular basis. It also depends on the criticality of the bug. For example, there could be a bug that is pretty benign and has not been exploited, there's no known exploit, so that could take a backseat. But once we have evidence that, hey, this is being exploited in the wild, we need to go and get this done now, that's when the conversation changes to why it needs to be done. So essentially the bottom line is, unless we have a valid reason to ask them to stop what they're doing and work on security, we don't usually do that. Security should not be a hammer that we use to say, go do this. It should be more of a collaborative working session between engineering and us, over the long term, to build that trust relationship and find that balance between security and features.

00:25:30

And I think this is pretty interesting. Let's take that even earlier: say a developer is working on a new feature and they need a new package, or they need to use a specific Docker container. There's a balance between giving a developer enough space in which they can be creative and choose the tools or libraries they want to use, and making sure there are guardrails, some way in which that developer is being sensible with what they choose. How do you balance that, and how do you know when to give developers more rope to express themselves more freely?

00:26:12

I think that's the change of thinking you need in this case, because security is not solely responsible for security at Twilio, right? We are here to enable developers to embed security into their products, and they're ultimately responsible for their products. So the way I look at it is, we give them the right tools, the right results, the right guidance, and we make a collaborative decision about what takes precedence.

00:26:42

Awesome. Yash, thank you very much. This has been a really interesting chat, and thank you for joining us today. Thanks.