How Engineering does DevOps using Slack

Around the globe, DevOps teams use Slack for everything from code reviews to cross-functional communication. Now you can learn how to unlock these capabilities, as well as how to build a DevOps culture that’s ready for change.


In this session, we’ll show you how technical teams can centralise and automate their workflows through Slack using all their favourite tools and apps. You’ll come away with an understanding of how DevOps teams can work more efficiently with Slack, leading to stronger features and faster releases.


This session is presented by Slack.

V Brennan

Regional Lead Engineering EMEA, Slack

Transcript

00:00:06

Hi, I'm V Brennan from Slack. I'm delighted to be here today to talk to you about how we leverage Slack as a platform for our DevOps activities. Please come along to our Slack channel after the session for Q&A, but let's dive in. I'm going to start by telling you a little bit about me. I've led development teams, operations teams, and even IT teams. I'm a big proponent of DevOps. I think it brings together two very important and different skill sets, and I think we build better software together. I started my technical career at BNZ. It was surprisingly modern for a 150-year-old bank. It was where I had my first introduction to agile, DevOps, test-driven development, and lean. I got to work on some very sexy features, like foreign exchange, and on digitizing and modernizing. From there,

00:01:06

I moved to Spotify and, because of my interest in agile, it was very much like going to church. It was quite a ride at Spotify. I thoroughly enjoyed my time there, but we decided to move back home to Ireland to be closer to family. When I was looking around for somewhere to go, Slack was an obvious standout candidate for me. I'm really passionate about communication and collaboration, so it resonated with me on a personal level. I've been really excited to see Slack grow, especially in recent months, and to be part of a tool that's literally helping people stay connected in times like this. So it's a very inspiring and personal connection that I feel I have with Slack. So what are we going to talk about today? Let me just move on to the agenda.

00:01:59

We have a quick intro to Slack for those of you who don't know our history. Then we're going to look back at how DevOps emerged at Slack. We're going to talk about some of the challenges that modern software teams face, and we're going to show you some examples of how Slack innovates to address those challenges. So what is Slack? For those of you who are not familiar, Slack started out as a video game called Glitch. So it's not my current day job; I kind of wish it was. This is a screenshot from the game. It was created by our founders, the same founders who created Flickr, an early photo-sharing platform back in the early 2000s. Unfortunately for Glitch and its small but loyal fan base, the game never really had broad appeal.

00:02:46

And so they decided to shut it down. But when they explained the situation to their investors, the investors asked if there was anything else that they thought they could take to market. The team had developed an app to encourage collaboration, and they really felt that they couldn't work without it, and that maybe it was something other people would enjoy too. So Slack back then was all about persistent chat. It had a few integrations, like the Google Calendar integration you can see right there, but the user experience was really clean, and anyone who interacted with it really enjoyed it. Fast-forward to today: since 2013, we've grown to over 12 million daily active users, and over half of them are outside the US. Collectively, those users send a billion messages a week, and last year we reached a significant milestone by going public. There's no doubt we benefited from a great product-market fit at the time. So let's jump into our DevOps story.

00:03:53

So we've observed, at least, that as systems become more and more complex, teams have had to change to keep up. Since the early days, DevOps has become a standard way of working, but also a competitive advantage. Operators write code, developers write config, and both need to understand how the system works under the hood. The world continues to move faster and get more complex, and the result can be really overwhelming if you're not in front of it. DevOps is about leveraging people with broad system knowledge around a particular change or problem. We've approached it from the perspective of service ownership and tooling, and I'll talk about both in turn. When we talk about service ownership, we mean that teams own their end-to-end customer experience, and we support that through tooling in Slack. This is a mindset and a culture we've fostered.

00:04:53

And it's very much a journey; it's not complete. When we first launched the idea of service ownership, it was quite scary for development teams, and it's something that we've had to work on a lot and offer more and more tooling for. We do believe it's true to our legacy of not creating silos or handoffs. But as we've evolved, it has become harder. Slack is harder to navigate: there are fewer people in the organization who understand how the full system works end to end, and we've needed to help developers feel successful and safe. For us, tooling is the key. These tools make all the difference. We've built tools to support developers in managing deploys, logging, alerting, escalations, and support. We also have an embedded SRE model, which is helping us grow broader skills in teams.

00:05:48

But assuming this ownership is not free. We don't just shift the work from one team to another. There's an emphasis on preparation and support, and the goal is to reduce the burden but still empower responsibility through visibility. We believe incident response works best when we have both the system and the developers responding together. We have provided tooling so that developers don't have to understand how Prometheus or Terraform works. What we want is to make the developer experience efficient and joyful. So what do we mean by service ownership? As I've already said, we're talking about teams being responsible for managing their customer experience end to end. This means managing their monitoring and delivering their software to users in production. But it also means that, after that, they've got service health instrumentation around service level objectives, and that they've got good monitoring and alerting for rapid response.

00:06:45

When there is a problem, we also have production readiness reviews and deployment risk assessments to make sure that, when we're putting a major new feature into production, people have taken the time and energy to make sure it's not going to break production. Capacity and performance monitoring is really important. We saw it play a huge role back in March, when we had a surge in activity as the whole world started to work from home. Thankfully, our capacity and performance planning kicked in, and we saw the majority of our systems just scale automatically. It was quite a thing to see. We also ensure that teams have PagerDuty rotations, and we've got a solid incident response process and postmortem activities. The goal is to take our teams from feeling scared, worried, and unsure to this joyful experience. It's about taking the anxiety out of operations and replacing the ambiguity by abstracting it away. There is a lot of surface area for people to cover, so we have used Slack as a platform to insulate them from all of that change and all of that knowledge they would normally need to have.

00:08:06

So this is where DevOps comes in. DevOps means reducing risk through tools and culture; at least, that's what it means for us. As systems grow, the number of platforms you need in order to prevent failures increases. Like all DevOps organizations, we look to reduce repetitive manual tasks and reduce the need for operator intervention. The difference is that we've leveraged Slack as a platform to do that. So here are some examples. This is an example from Internet Relay Chat of developers, operations, their logs, alerts, and monitoring all having a conversation with each other. The goal here is to enable transparency, collaboration, and integration. We leverage Slack as a platform to enable automation and reduce toil, connecting developers and operators and putting key context in place to unlock new workflows. In the very early days we used Internet Relay Chat, and one of the important innovations we made early on was to save this information so that it could be searched later on and provide a lot of context for teams at a later date.

00:09:20

So here's an example of one of our most popular and joyful integrations. This is Deploy Wizard. Through Deploy Wizard, we can look into our continuous integration and continuous delivery pipeline and tell developers when their PRs are being deployed. We do have automation and alerting around these things, but it is really nice for developers to know when they are actually happening. It's a really simple interface that empowers important behavior, like integrating several systems and workflows in a single place. Jira is our source of truth for work in progress. The Jira integration allows us to see the context of a ticket in Slack without having to leave the app, so no time is lost switching from one application to another. I get regular updates from Jira bot on tickets that I'm interested in, or on new tickets that have been assigned to me, new comments, and status updates. I can also create a new ticket directly within Slack using the Jira integration.

00:10:26

Escalation Bot solves one of the tried-and-true troubles that we have with operations: who owns this feature? As an incident commander myself, I know how hard it is to figure out who owns what, never mind grapple with team names. Escalation Bot helps us save time by putting the right people in front of the right problem. Previously, escalations went via a human operator. Now escalations go straight through to the right team on their PagerDuty rotation, if necessary. This is critical to save time and help teams keep up with an ever-changing landscape. We are still available as major incident commanders if teams need help, but it strikes a balance between creating independence and autonomy for folks and being there to provide support if they need us. The app is just a slash command away, anywhere in Slack.
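
To make the routing idea concrete, here is a minimal sketch of sending an escalation straight to whoever is on call for the owning team. Everything in it is an assumption for illustration: the team names, the rota contents, the weekly rotation rule, and the `on_call` and `escalate` helpers are hypothetical and are not Slack's or PagerDuty's actual API.

```python
from datetime import date

# Hypothetical on-call rotas, invented for this example.
ROTATIONS = {
    "search": ["ana", "ben", "chi"],
    "files": ["dee", "eli"],
}


def on_call(team, day):
    """Return the engineer on call for a team, assuming a weekly rotation."""
    rota = ROTATIONS[team]
    week = day.isocalendar()[1]  # ISO week number of the given day
    return rota[week % len(rota)]


def escalate(team, severity, day):
    """Build the page that would be sent, with no human operator in between."""
    return {"team": team, "severity": severity, "page": on_call(team, day)}


if __name__ == "__main__":
    print(escalate("search", 2, date(2020, 7, 1)))
```

The point of the sketch is only that the lookup from team to on-call engineer is data, so a bot can do it instantly instead of a person searching for the right contact.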

00:11:23

There's a lot of functionality here, but what I want to highlight is the centralized feature of who owns what. It's rooted in some fuzzy logic that lets people enter a description of a feature or a problem that they're seeing, and we see that the system throws up some non-obvious results. In this case, I'm going to stick with the anatomy team. Inline, we provide access to their Jira project, as well as the collection of special-purpose channels that we have related to this team and this service. So let's look at the direct actions that we can take from here.
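
A "who owns what" lookup of this kind could be sketched roughly as follows. This is not the bot's real matching logic, which the talk does not describe beyond "fuzzy logic"; it swaps in plain string similarity from Python's standard `difflib` module. The catalogue entries, Jira keys, channel names, and the `who_owns` function are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical ownership catalogue: features, teams, Jira keys, and channels are invented.
OWNERSHIP = {
    "message search": {"team": "search", "jira": "SRCH", "channels": ["#search-help"]},
    "file uploads": {"team": "files", "jira": "FILE", "channels": ["#files-help"]},
    "push notifications": {"team": "notifications", "jira": "NOTIF", "channels": ["#notif-help"]},
}


def who_owns(description, limit=2):
    """Rank catalogue entries by fuzzy similarity to a free-text description."""
    query = description.lower()
    scored = []
    for feature, owner in OWNERSHIP.items():
        score = SequenceMatcher(None, query, feature).ratio()
        scored.append((score, feature, owner))
    # Best match first; return only the top `limit` candidates.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(feature, owner) for _, feature, owner in scored[:limit]]


if __name__ == "__main__":
    for feature, owner in who_owns("file upload"):
        print(feature, "->", owner["team"], owner["jira"], owner["channels"])
```

Returning several ranked candidates rather than a single answer mirrors the point made in the talk: the person escalating picks the team, and the Jira project and channels come back alongside it.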

00:12:06

We're going to escalate. We can either page a team or start a major incident. In this case, we're going to page an engineer. We can decide the severity. Again, we see that for the team that we're looking for, we've got some choices. I would page our anatomy team, but for today, let's leave them to carry on with their job. A further innovation that we have is our Incident Bot. The purpose of Incident Bot is to speed up the creation of incidents and to establish clear communication and ownership really quickly. We automate the creation of Jira incident tickets and notifications to relevant channels, and we can also provide a technical summary based on the information that's been entered into the channel.

00:13:03

Let's have a look here. Again, it's just a slash command away. You can just type incident PDE and you can see a list of all of the different types of activities. Here, we've run the command to view all current open incidents. In this case, we're going to open a new incident, so we write a brief summary, choose the severity level, and set the incident commander. Automatically, we can see that a Jira incident ticket is being created, the Slack channel is being created, and it's posting to the incident notification channels. Depending on the severity, that could be to the team, or it could be to our exec team. So the incident channel has been created. What we're going to do now is update the incident commander. Again, it's just a slash command away: we just type IC, and from the dropdown we select the new incident commander. We can also update the severity of the incident at any time by just typing the command again; we choose from a dropdown and change it to severity two.
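
The steps in this demo (open an incident, derive a channel, notify by severity, change the commander, change the severity) can be modelled as a small record with a few operations. The sketch below is an assumption-laden illustration, not Incident Bot's implementation: the `Incident` class, the channel naming scheme, the severity range, and the notification policy are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    number: int
    summary: str
    severity: int
    commander: str
    log: list = field(default_factory=list)

    @property
    def channel_name(self):
        # Hypothetical naming scheme for the dedicated incident channel.
        return f"incident-{self.number}"

    def set_commander(self, user):
        """Hand the incident over to a new incident commander."""
        self.log.append(f"commander: {self.commander} -> {user}")
        self.commander = user

    def set_severity(self, level):
        """Change severity; the 1-3 range is an assumption for this sketch."""
        if level not in (1, 2, 3):
            raise ValueError("severity must be 1, 2 or 3")
        self.log.append(f"severity: {self.severity} -> {level}")
        self.severity = level


def notification_targets(incident):
    """Assumed policy: the most severe incidents reach a wider audience."""
    targets = ["#team-incidents"]
    if incident.severity == 1:
        targets.append("#wider-incidents")
    return targets


if __name__ == "__main__":
    inc = Incident(number=42, summary="Search is slow", severity=1, commander="ana")
    print(inc.channel_name, notification_targets(inc))
    inc.set_commander("ben")
    inc.set_severity(2)
    print(inc.log)
```

Keeping a log of each change is one way to have the raw material for a summary later, without anyone taking notes during the incident.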

00:14:20

And lastly, when it's under control or resolved, we can change the state. We just type state under control, and it automatically updates the heading with "under control". One of the other things I love about this tool is that I can get a summary of what has been happening in the channel for my technical summary. So, PDE incidents summarize: it goes to Slackbot, and Slackbot drafts a message for me. I can edit it and fill in some more of the known detail: how many responders, how many users were affected, a description of the impact, and the times. We use Pacific Time for all of our incidents, so I like that it creates that automatically, because translating from British Summer Time causes me a bit of a headache. Once that's done, I get an automatic technical summary in the format that we've agreed on, and it's automatically posted to the incident channel. So it also reduces the cognitive load on the incident commander. At any given time, the incident commander should be focusing on who's doing what and ensuring that key streams are being addressed during an incident. Trying to figure out how to format my technical summary or my conditions, actions, and needs report should be the last thing on my mind. So Incident Bot really helps us out there.
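
The time-zone part of that summary is easy to get wrong by hand, so here is a minimal sketch of converting a commander's local time to Pacific Time and dropping it into a fixed format. The summary layout, the field names, and the `to_pacific` and `format_summary` helpers are assumptions; the talk does not show the agreed format. It uses Python's standard `zoneinfo` module (Python 3.9+, with time-zone data available on the system).

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def to_pacific(local_time, source_zone="Europe/London"):
    """Interpret a naive local time in source_zone and convert it to Pacific Time."""
    aware = local_time.replace(tzinfo=ZoneInfo(source_zone))
    return aware.astimezone(ZoneInfo("America/Los_Angeles"))


def format_summary(started, impact, responders):
    """Render a summary in a made-up fixed layout with the start time in Pacific Time."""
    pacific = to_pacific(started)
    return (
        f"Started: {pacific:%Y-%m-%d %H:%M} PT\n"
        f"Impact: {impact}\n"
        f"Responders: {responders}"
    )


if __name__ == "__main__":
    # 15:30 British Summer Time on 1 July 2020 is 07:30 Pacific Daylight Time.
    print(format_summary(datetime(2020, 7, 1, 15, 30), "Search degraded", 4))
```

Using named zones rather than a fixed eight-hour offset means daylight-saving changes on either side are handled by the time-zone database.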

00:16:00

So integrations mean we don't have to leave Slack. They make life simpler, they drive particular behaviors, and they help us streamline process. And it goes beyond our dev teams too: anyone on the team can use these integrations to update their objectives or key results for a quarter, approve an expense claim, or even apply for leave. Some of the most common ones: I've already mentioned Jira. We also have code review minions, who make sure that our PRs don't hang around too long without any attention. Slackbot reminds us about our standup on a daily basis, and we can also use it to remind us to start a thread about awesome things that we've worked on this week. We can also centralize feature requests, creating a clear space for dedicated content for the development teams and connecting them directly with our customers and social media input, so that product teams can react to and hear the voice of the customer on a daily basis.

00:16:58

It's quite a powerful tool. When we think about DevOps by function, we think about teamwork, we think about observability, and we think about how we manage our pipelines. These are just some examples of the integrations that we have today. Slack integrates with thousands of applications, so even if you don't see the tools that you use here, there's a very strong likelihood that they already have an existing integration with Slack. Slack builds Slack with Slack, and that is something we're committed to doing. We're committed to evolving how we work this way, and we want to share that experience with the wider community. We believe it creates opportunities every day to improve our workflows. We also think it creates opportunities every day to help us improve Slack and how it works, and we believe we're making a more joyful experience for our developers. We know that the methodology of transparency, collaboration, and integration is applicable beyond DevOps, and we hope to continue sharing what we learn, not only across the DevOps and engineering community but with the broader community that uses Slack. Thanks so much for your time. Join us over in our Slack channel for Q&A now. I've really appreciated you taking the time to spend with me, and we'll see you there. Thanks.