Disney Global SRE — Creating Digital Magic

Disney is one of the world’s largest media companies and home to some of the most respected and beloved brands around the globe. Embracing the latest technology is an important strategic focus at Disney, allowing guests to better connect with Disney and allowing Disney to better connect with guests in innovative and delightful ways.


We will tell you a story about a century-old organization that has scaled its SRE practice to ignite digital magic across the globe. This team of SRE Jedi Knights is on a mission to foster curiosity, communities of practice, and technology awesomeness while venturing where no SRE has gone before.


In this talk, we will deliver epic stories of successes, setbacks, and failures while pushing large-scale platforms to their limit and delivering the best in-seat digital experiences, products, and content to our guests and subscribers across the globe, showcasing some of the technology and automation we have built.

Alexi Varanko

VP, Transformation Engineering, The Walt Disney Company

Jason Cox

Director, Global SRE, The Walt Disney Company

Brian Scott

Sr Staff Technical Evangelist, Global SRE, The Walt Disney Company

Transcript

00:00:14

Thank you, Ron. So I have been studying Jason Cox, Director of Platform and SRE at The Walt Disney Company, for over eight years, and I was so delighted that he was willing to be a part of this program committee from the very beginning. One of the most amazing things I've gotten to do in my career was shadow him for an entire day. It was an amazing and exhilarating experience, and I learned so much, but the biggest surprise happened at the very beginning of the day, when we were at the Starbucks on the Glendale, California campus. He was walking through his calendar, describing what was gonna happen that day, when a woman from across the room saw him and ran over to him saying, "Jason, Jason, thank you. Thank you so much for saving our butts two weeks ago." It turns out that she was one of the CTOs of one of the major business units, and something like that happened three more times throughout the day.

00:01:07

In fact, I think I got to see him thanked more times in a day than most ops practitioners get thanked in an entire career <laugh>. So I am so delighted that this year he's able to present with his boss, Alexi Varanko, VP of IT engineering strategy and operations, and with Brian Scott, Senior Staff Technical Evangelist for SRE. They will be talking about how they organize themselves to elevate developer productivity and skills around SRE across the entire Disney enterprise. This is an incredible continuation of the journey that Jason has shared with this community from the very beginning of this conference. Here are Jason, Alexi, and Brian.

00:01:52

Hello, and thank you for joining us today. My name's Alexi Varanko, and today you're gonna hear how the Enterprise Technology Transformation Engineering team helps create digital magic here at The Walt Disney Company. But first, I have a short clip from some of our top technology executives across the company about what it means to create the magic here at Disney. Keep your hands and arms inside the vehicle at all times, and hang on. You're about to be taken on a great ride.

00:02:24

And now, your host, Walt Disney. In our modern world, everywhere we look, we see the influence science has on our daily lives. Discoveries that were miracles a few short years ago are accepted as commonplace today. Many of the things that seem impossible now will become realities tomorrow.

00:02:48

Today at Disney, we have amazing teams creating magic with technology.

00:03:06

Technology has always been part of the DNA at Disney, but it's always been in service of telling the story.

00:03:13

If the guests walk away talking about the technology, we haven't done our job. It's all about the experience.

00:03:19

I care about creating technology that really builds wonderful experiences. And I think there is no company where you can do that in the way that you can here.

00:03:27

We believe that a close partnership between art and technology is essential to creating compelling stories with engaging characters in believable worlds.

00:03:35

There's so much passion on the part of all of the teams behind what we do to make sure that what we're building lives up to that Disney expectation.

00:03:44

With a big idea, there is very little friction in making it into something through the use of technology.

00:03:50

With industry-leading technologies like augmented reality and machine learning behind the scenes, we're creating awesome experiences.

00:03:57

That to me is really super exciting as a technologist to continue to move at the very front edge of the technology in so many different areas.

00:04:06

Tech at Disney is one of the most important things as we make our transition to a direct-to-consumer world. To borrow a baseball metaphor, we're in the second or third inning of this whole streaming saga. We really want to have something for everybody.

00:04:21

During the last few years, we've ventured into a lot of different teams, had the opportunity to meet and work with a lot of wonderful people. We hope that you will join us and that you'll find here a place of knowledge and happiness.

00:04:41

Wow. I always get excited when I see it and hear about all the great things we're doing across the company. First, I wanna talk a little bit about how Disney is structured. You can see the different business segments that we have here at The Walt Disney Company, and there are three main areas. First, there's a group that focuses on telling great stories and generating great content, whether it be theatrical releases, animation, news, or even sports. The Studios, General Entertainment, and Sports teams are really focused on creating the best content out there, content that lasts generations. The second group focuses on media and entertainment distribution. The streaming services that you all know and love, like Disney+, ESPN+, Hotstar, Movies Anywhere, and Hulu, are really focused on how we get you that content reliably, in the best quality, on any device. And the final group, Parks, Experiences and Products, takes the great stories and the great content created by the Studios, General Entertainment, and Sports groups and makes them a reality, bringing them to life and immersing you in those experiences.

00:05:52

And guess what? All of that is done through technology and people, to delight the guests and create magical experiences. We work in Enterprise Technology. We're here to help those teams focus on driving business-differentiating results across the industry. But sometimes when people look at us and hear "Hey, we're here to help," they view us as a Star Destroyer. What we really want to do is figure out how we can make it magical, how we can make it a great experience, so that people can optimize and make things more reliable, spin up new systems easily, innovate new ideas, and create new experiences across the board. Today you're gonna hear some stories from my team about how this has evolved over time. Jason Cox is gonna talk about the evolution of the SRE team here at The Walt Disney Company, and Brian Scott is gonna talk a little more about some of the engagements we've had and how we've helped teams deliver some of the magic here. I'm gonna turn it over to Jason now to talk about his space.

00:07:07

Thanks, Alexi. Well, hello everyone. We're here today to talk to you about SRE and how at Disney we're using it to create digital magic. I wanna roll back in time and talk to you about some of the challenges we faced early on that really shaped who we are and our identity in terms of SRE at Disney. A little over 15 years ago, we got a new CEO, Bob Iger. When Bob came in, he set down three strategic pillars that would drive growth and move the company forward. Number one, he said, creativity. We're all about telling stories, delivering compelling content, and delivering experiences and products that delight our guests around the world. Second, and second only to that, he said, will be technology. How we create that content and those experiences, and how we deliver them to our guests, will increasingly be through technology. It's how we're gonna better connect with our guests and allow our guests to connect with us.

00:08:06

And third, globalization. We're gonna expand across the globe, opening up new markets and connecting with people all across the planet. But more important than that, the content that we deliver and the stories that we tell would better represent this amazing spectrum of the human family across the globe, not only on screen, but behind the camera. With those strategies in place, what we saw was explosive growth in the company in terms of systems, in terms of connecting, in terms of the technology used to power the business forward. And of course that hit those of us who were in the operations space. We went from tens of servers to thousands, to tens of thousands of servers in very short order, and facing and managing all that scale was a big issue for us. And then there was speed. The way we're gonna better connect with our guests is by delivering features, new products, and experiences to our guests in short order, so speeding up the processes that allow us to deliver that magic was super critical for us as a business.

00:09:13

Yet we were really inhibited by all the slow, manual processes and red tape built up over the years, almost like scar tissue or calluses, that kept us from moving at the speed we needed to move. Then there was instability: because of that scale, because of that enormous growth, we suffered problems with reliability and resiliency, and things wouldn't recover after an outage occurred. And of course, security became an issue at that scale, as did quality. We weren't delivering the quality that we wanted to deliver. And of course, stagnation: the teams were so busy doing the daily work that they had no time to improve the daily work. Have you ever been in that situation before? Many of us have. Well, that's where we were, and all of that results in stress and burnout. There's so much that you have to know and manage and deal with at great scale that the cognitive load begins to result in burnout across the teams.

00:10:12

If I were to give you a glimpse into what that world looked like several years ago, it was probably not unlike many other enterprises. We were very siloed. We were processing through projects in a very waterfall way, and everything was hyper-transactional. If you needed something, it required a ticket: ticket queues everywhere, for everything. In fact, I used to joke that you couldn't even go to the bathroom without a ticket. Tickets were how we got things done. Unfortunately, tickets were slow: weeks at a time before you would ever get a response, and months at times just to get systems stood up. And of course everything was manual. That's how we did everything back then. If you needed a server, we would hand-carve you a server. And of course, when you begin to do things like that, you begin to name them, because you have all this investment in these systems.

00:11:11

And we did that at Disney. I joke about this all the time, but it's so true. We had systems named Grumpy, Sleepy, Bashful, you name it. The downside to naming servers, treating them like pets or people, is that they begin to develop a personality. And that's exactly what we saw: Grumpy throwing errors all the time, Sleepy with latency problems every day, and of course Bashful would disappear from the network for days on end <laugh>. All that to say, that wasn't a plan for success. In fact, what it began to result in was a bunch of operational heroics to handle the scale and deliver the speed that the businesses were demanding. We were starting to experience burnout. For some odd reason, teams began celebrating the number of hours you could go without sleep and how long you could spend on that release call.

00:12:09

Or how many days you could last through a service interruption. That was not good. It was not humane, and burnout, just like R2's friend here, was happening all across the enterprise. Well, we had to change. We needed to change. It was critical that we changed. And so we changed our name <laugh>. Now, I joke about that, but it's so true. We did change our name. We became systems engineers. The point was this: we didn't wanna just drive the trains; we wanted to help build them. We thought the only way to get in front of this thing was to become part of the solution. Let's engineer our way out of it. In an odd way, just using "engineering" in the title really changed our identity. We began to see ourselves as capable of solving the issue, and how we perceived ourselves and how others saw us began to change.

00:12:59

And as an enterprise, we went on this journey, and different groups and businesses began to be way more agile in terms of how they delivered value. We went from being very siloed to more embedded. One of our critical steps was embedding these engineers in the product teams in the businesses to better understand them, to really grasp what's going on. And of course that meant we could be way more integrated in the solution. No longer everything through a queue: let's platform it away, with API calls, not ticket requests. That led to automation. We began to automate the infrastructure. We began to leverage and use the cloud. We stopped naming servers and began treating them as they should be treated: pods and hosts that are just running workloads ephemerally and can be created, destroyed, and recreated, with telemetry, all defined through code. Everything began to become code, all of operations as software.
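To make "created, destroyed, recreated, all defined through code" concrete, here is a minimal Python sketch of a reconciliation step: a declared replica count is compared with the hosts actually running, and the difference becomes hosts to create or destroy. The function name `reconcile`, the `web-N` naming scheme, and the simple replica-count model are illustrative assumptions, not Disney's tooling; orchestrators such as Kubernetes run a far richer version of this loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Actions that would move the running fleet toward the declared state."""
    to_create: list
    to_destroy: list


def reconcile(desired_replicas: int, running: list, prefix: str = "web") -> Plan:
    """Hypothetical reconcile step: hosts are disposable, generated names
    (cattle, not pets), so any surplus host can go and any gap is refilled."""
    if desired_replicas < 0:
        raise ValueError("desired_replicas must be non-negative")
    current = sorted(running)
    if len(current) > desired_replicas:
        # Scale down: no host is special, so drop the surplus.
        return Plan(to_create=[], to_destroy=current[desired_replicas:])
    # Scale up (or replace failed hosts): generate names that don't collide.
    existing = set(current)
    new_hosts = []
    index = 0
    while len(current) + len(new_hosts) < desired_replicas:
        name = f"{prefix}-{index}"
        if name not in existing:
            new_hosts.append(name)
        index += 1
    return Plan(to_create=new_hosts, to_destroy=[])


# One host ("web-0") has died; the declared state still says three replicas.
print(reconcile(3, ["web-1", "web-2"]))  # Plan(to_create=['web-0'], to_destroy=[])
```

Because nothing is hand-named, a failed host is simply replaced by running the same function again; no one has to nurse "Bashful" back onto the network.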

00:13:58

And that transformation of course led us to where we are today, which is site reliability engineering. As SREs at Disney, the important part was that we began to see ourselves as part of full stack teams. You've probably heard of full stack engineers, right? In fact, there are a lot of job openings for them. Let me just say, I believe that is inhumane. Expecting one person to have the breadth and depth of every technology required to deliver value for a business is really untenable. And yet we wanna be part of teams that are empathetic, that really understand all the disciplines. So we've shifted to the concept of T-shaped teams. If you look at it from a T standpoint, you have this horizontal bar where everybody on the team understands all the disciplines.

00:14:54

You understand quality, you understand security, you understand cloud engineering, you understand systems engineering, you understand software engineering, right? But you as an individual bring vertical depth: one area in which you are the expert on the team, adding that virtue to the team. You form a full stack team through individuals who are T-shaped, and that's the direction we went. Next, self-service platforms. The idea there is that one way to abstract and remove some of that cognitive load is to platform it away. Where we can, we put an API in front of it instead of having an expert who has to deal with all of the nuts and bolts, all the Lego blocks underneath the platform. We use and leverage self-service platforms to deliver that, and where we can, we even get ourselves out of the way and just give the businesses the capabilities to access it through a good user experience and user interface.

00:15:49

And of course, software-defined everything: everything treated as software, treating operations like software. We commit code that is infrastructure. We have peer review of infrastructure as code. We have continuous delivery of infrastructure, just like we do with our code. That allowed us to begin this transformation, helping teams all across Disney move from traditional ways of working and traditional technology platforms to some of the new ways of working and new platforms. That transition is possible because we were embedded inside the businesses, understanding their needs and helping pull them into the new ways of working and new technologies: Kubernetes, for example, as a way to ship things in containers onto platforms that are just scheduling those containers, at greater and greater scale to support the growth of the business.

00:16:45

So what is a Disney Global SRE? Look, I won't read through all of these, but I think this could be helpful. There's a lot of textbook material here, and I won't try to highlight it, because Google's written the book on it. Please pick up the book; you should read it. But there are some nuances here that I would highlight that are specifically related to Disney. One is this last point I have on here: as SREs, we are technology Sherpas. What we have found is that as we're embedded in these different teams, and as we begin to work with the different product groups, the software engineering teams, and the businesses, they need help navigating over the mountain: the obstacle, the challenge. So we play the role of that Sherpa. Here's how to get over the challenge. Here's how to connect the technologies together.

00:17:30

Here's how to integrate with systems, and how to look at it systemically across the company: not with local optimizations that may only affect your throughput, but with systemic optimization, looking at all the different constraints that could be imposing friction, slowing us down, or reducing service levels for the services that we operate and deliver for the company. One of the things that was really helpful for us as we began this journey, and really this new name, SRE, at Disney, was to identify who we were. And we did that in the first person. I don't know how many of you have done this, but write out the description of your group, your job, in the first person, declaring what you do. This was helpful for us as a lens to understand our identity, our value to the organization, what we should be all about, and what we should optimize for.

00:18:30

If there's anything to take away from this talk today, please take some of these slides and consider this for yourself. You know, your mileage may vary, but this could be helpful for you too. Whatever your group is, sit down, define who you are in the first person, declare that out, and start to think of that as how you optimize for success on the team. It was super helpful for us, and I highly recommend you consider it. Well, the way that we engage at Disney is in three different areas. Man, I wanna talk about these, because they're important to us. I talk to so many people at so many other businesses that have SRE teams, and they deploy them completely differently, and you will too. I think everybody's different in this case, but at Disney, we found a lot of value in these three different areas.

00:19:16

So the first facet is teams that already have SREs. Our job, our role from the global SRE standpoint, is to partner with them and help them connect with other SREs and the larger community of practice around SRE: the shared tools and platforms, the knowledge we can gain, and the training to help continually elevate. What we have found over and over again is that when we do that, we find pockets of awesome that we want to share with the rest of the organization. So we can be, as Brian Scott often says, the cross-pollinators of cool, cross-pollinating great technology, great tools, and great knowledge to help the entire enterprise. Now, the second way that we engage, this other facet, is all about where teams have a burst need.

00:20:12

They may not have an SRE team, but they have a product they're about to launch. So they reach out to us to act almost like a consultant, to engage on a particular launch or a particular problem, or maybe just for staff augmentation. We're almost operating in an agency model internal to Disney. The advantage is that we burst into the different areas of Disney that have a need for SRE, and we can burst back out whenever the need shrinks or contracts. But by doing that, we don't lose the knowledge. It stays within Disney: even though those SREs come back to the core team and are reassigned somewhere else, they're still part of Disney, and they bring that expert understanding and history back to bear whenever they re-engage with a business where they were previously. And we often see engagements range from something short term, really just a few calls, to something that's long term.

00:21:08

Sometimes a couple of years. We do that wherever we can to help Disney move forward, using and elevating the SRE practice across the company. And then the final facet is areas where we have embedded teams. By embedding SREs deep into a product team, into a business, you really get to know the business. You really get to understand the challenges they're facing, and you're better able to represent that back to the global, enterprise perspective: what is needed to optimize tools, agreements, and platforms for those particular businesses and product teams all across Disney. By understanding each business's culture and what it values and optimizes for, we gain the knowledge to help it succeed even further from a global enterprise standpoint. So these dedicated embedded teams are super important for us. We have teams embedded in all sorts of areas across Disney, all deeply understanding the business and providing very tactical, real-time support for all those different teams.

00:22:14

Well, that gets to our goal. I think you can see that having these different facets in the way that we engage allows us to accomplish some of these goals: helping our businesses ship content, products, and experiences better, that is, higher quality. We also wanna do it faster: get rid of the friction and remove toil, that is, automatable work that keeps us from investing more in quality and in delivering experiences that delight our guests. We gotta go faster, right? Shipping faster is super important to stay market relevant. And then safer: protecting our guests' data as well as our company data is paramount. And then happier: we ship happiness, right? That's what we ship as a product, as a company. But more importantly, the way we ship that happiness is by having happy cast members, happy employees. So it's really important to us that as we optimize for delivery, we're also optimizing for happiness on both sides of the equation: those who are creating the happiness as well as those who are receiving it. And that's what we're all about. When we think about how we're gonna win, getting new technologies and new ways of working out to all the different groups all across Disney is really important to us, and it's about evangelism. Well, with that, I'm really happy to introduce Brian Scott and let him talk to you a little bit about what a technology evangelist looks like here. Take it away, Brian.

00:23:46

Thanks, Jason, for that intro. Hi everyone, I'm Brian Scott, Senior Staff Tech Evangelist at The Walt Disney Company. Today I'm gonna walk you through how we evangelize technology here at the company to empower our teams to move faster and be safer, in the cloud as well as on-prem, and how we deliver better guest experiences. I'll also walk you through some of the engagements we have throughout the enterprise and how we embed some of our SRE teams to support our businesses. One of the ways we evangelize technology is through the Jedi Engineering Training Academy. This is a forum where we invite external folks as well as internal teams to give talks about the awesome that they're building and show off some of the automation and tools they're using, to share knowledge and empower others to do the same.

00:24:33

This allows team members to engage in a community, share ideas, meet with each other, and educate others on the different ways we leverage some of the same tools. Another way we share is through communities of practice, whether that's infrastructure, to share industry trends and best practices, or secure development pipelines, which allow our security teams to evangelize ways to bring security forward into our pipelines and treat it as a first-class citizen. We also share through forums like the SRE Tech Emerge forum and Disney Tech Bites, a place for us to write blog posts, send out newsletters, and create podcasts, again to further connect teams together. Then there are awesome lists. If you're familiar with the Python awesome list or the Go awesome list on GitHub.com, we brought that same awesome-list methodology internally, allowing teams to contribute in a merge request or pull request fashion, adding their knowledge to a growing list that teams can read and curate within their own teams, and then contribute back the awesome that they're building.

00:25:40

And again, that further helps create a better community to share knowledge. Now I'm gonna walk you through some of the SRE platforms and engagements that we have here at The Walt Disney Company. In the past, it took a while for teams to be able to, you know, pull out their credit card and grab that cloud account. So to make things easy, we've developed a platform that allows teams to go to a single URL and provision a cloud account, whether that's in Google Cloud, Azure, or Amazon, with sane and safe defaults built in, so they can move faster and hit the ground running when launching their products and services. The Dr. platform is a logging platform that allows teams to easily ingest log data, create dashboards, and understand that data better.
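The single-URL account platform hinges on "sane and safe defaults built in." A minimal Python sketch of that idea, under stated assumptions, is a request handler that merges a team's overrides onto defaults while refusing to weaken guardrails. The setting names, the `LOCKED` guardrail set, the budget figure, and `build_account_config` itself are all hypothetical; the talk does not describe the platform's internals, and a real implementation would call each cloud provider's own account APIs.

```python
from dataclasses import dataclass, field

SUPPORTED_PROVIDERS = {"aws", "gcp", "azure"}

# Hypothetical "sane and safe" defaults applied to every new account.
DEFAULTS = {
    "audit_logging": True,
    "block_public_storage": True,
    "monthly_budget_alert_usd": 1000,
}

# Hypothetical guardrails that a requesting team is not allowed to weaken.
LOCKED = {"audit_logging", "block_public_storage"}


@dataclass
class AccountRequest:
    team: str
    provider: str
    overrides: dict = field(default_factory=dict)


def build_account_config(req: AccountRequest) -> dict:
    """Merge a self-service request with defaults, rejecting unsafe overrides."""
    provider = req.provider.lower()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {req.provider}")
    for key, value in req.overrides.items():
        if key not in DEFAULTS:
            raise ValueError(f"unknown setting: {key}")
        if key in LOCKED and value != DEFAULTS[key]:
            raise PermissionError(f"{key} is a required guardrail")
    config = {**DEFAULTS, **req.overrides}
    config.update(team=req.team, provider=provider)
    return config


# A hypothetical team asks for an AWS account with a lower budget alert.
print(build_account_config(AccountRequest("parks-web", "AWS", {"monthly_budget_alert_usd": 500})))
```

The design choice is that teams may tune convenience settings freely, but the safe defaults can only be tightened, never switched off, which is what lets a self-service path replace a ticket queue.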

00:26:29

That works whether teams are creating dashboards from a leadership perspective or operational dashboards to understand how their platform is running and operating. One of our embedded teams within Imagineering was instrumental in helping the Imagineers launch Galaxy's Edge, with attractions like Smugglers Run, which is actually powered by several GPUs, by creating automated pipelines that allow Imagineers to easily get builds and code changes out to these attractions in a safe and fast way, so they can iterate and improve the guest experience as fast as possible. Or take our latest attraction, Star Wars: Galactic Starcruiser, which launched at Walt Disney World. Our SRE teams are bringing an SRE mindset to how we deliver attractions within the parks. They're working side by side with Imagineers to make sure they have an instant, fast feedback loop so they can iterate on changes as they design and power tools and services in our theme parks, again to bring a better guest experience.

00:27:35

Our SRE flex team has been responsible for quickly launching products for product teams that need to get to market fast, such as our Reimagine Tomorrow site. Our SRE team at Movies Anywhere is empowering teams to deliver a streaming platform that lets users share their entitlements across different digital video retailers, using tools like Kubernetes, Terraform, and Amazon Web Services. They bring in security as a first-class citizen; work with our product owners and service owners on defining SLIs and SLOs and understanding what better availability means; plan for failure, because we all know there is no such thing as 100% uptime; reduce toil, making automation key so teams can move fast and cut down on manual tasks; and approach monitoring from a user perspective: what do our users expect when they are using a platform like MoviesAnywhere.com?
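Because "no such thing as 100% uptime" is the core of SLO thinking, a worked example may help. The Python sketch below, an illustration rather than the Movies Anywhere team's actual tooling, turns an availability SLO into an error budget and measures a request-based SLI against it; the function names, the 30-day window, and the choice to treat zero traffic as fully available are assumptions.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Downtime an availability SLO allows over a window, in minutes."""
    if not 0 < slo < 1:
        # 100% is not a meaningful target: it leaves zero room for failure.
        raise ValueError("slo must be strictly between 0 and 1")
    return (1 - slo) * window_days * 24 * 60


def availability_sli(good_requests: int, total_requests: int) -> float:
    """Request-based SLI: share of requests that users saw succeed."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic counts as no failures
    return good_requests / total_requests


def budget_remaining(slo: float, good_requests: int, total_requests: int) -> float:
    """Fraction of the error budget left; negative means it is overspent."""
    if not 0 < slo < 1:
        raise ValueError("slo must be strictly between 0 and 1")
    allowed_failure = 1 - slo
    observed_failure = 1 - availability_sli(good_requests, total_requests)
    return 1 - observed_failure / allowed_failure


print(round(error_budget_minutes(0.999), 1))  # 43.2
print(round(budget_remaining(0.999, 999_500, 1_000_000), 2))  # 0.5
```

A 99.9% SLO over 30 days allows about 43.2 minutes of downtime (0.001 × 43,200 minutes). If 500 of a million requests fail, half of that budget is spent; a team that has budget left can ship faster, and one that has overspent shifts toward reliability work.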

00:28:34

We take advantage of multiple different tools, including serverless. We launch a number of different sites completely on platforms like AWS Lambda and Google Cloud Run, helping teams understand the interconnectivity of all their microservices and the latency between them, and providing the dashboards and monitoring tools they need to be effective when building and deploying serverless applications through a CI/CD pipeline. The team over at Imagineering launched an internal platform called Axiom to empower our Imagineers to move faster. With tools like GitLab, Amazon Web Services, and Terraform, we allow our teams to instantiate their repo from the start and create basic templates via a CLI tool. When they push to GitLab, the automation takes effect and their service is fully operational, with a URL, logging, and monitoring all baked in, so our app teams can move fast and have the tools and data they need.

00:29:32

That means being able to instrument and operate their service. Within the Axiom platform, we provide solutions like secrets management, crash reporting, and monitoring, along with a way not only to containerize these applications but also to provide secure remote access via tools like Session Manager, where we don't have to expose SSH or any other type of port, making it easy to do the right thing. When you make things easy for folks to hop on and use, they'll just use it. Providing those guardrails and easy buttons so teams can move fast without all the extra toil and friction of filling out Excel spreadsheets or opening tickets is a dream come true. Teams want this. So again, it's all about making things easy so they can move fast and efficiently, while also reducing toil for your own team. I myself am also embedded in an engagement with ILM, helping ILM understand and instrument new technology so they can provide a great experience when they're developing VFX for our movies. Thank you, and with that, I wanna hand things back to Jason.

00:30:40

Well, thank you, Brian. Hopefully you saw how we are meeting our goal of helping our businesses ship content, products, and experiences better, faster, safer, and happier. I love ending with this quote from Walt; I think it really sums it up. We talked about creating magic today, but I think Walt put it best when he said, "There's really no secret about our approach. We just keep moving forward, opening up new doors and doing new things, because we're curious, and curiosity keeps leading us down new paths."

00:31:17

Wow. Weren't those some great stories? And they're really great technologists, too. Hey, I wanna thank everybody again for joining us today. We're always looking for great people to join the rebellion and help us transform how we deliver the best content, the best services, and the best experiences to our guests. Again, thank you, and have a great day.