DevOps for Pandemics

This talk looks at the ways in which DevOps principles can be applied to help manage in times of crisis. Sean takes a deep dive into Wiley's response to COVID-19 and how it draws upon DevOps best practices. We will look at what these practices are and how you can apply them to crises and major incidents of all types.


Key takeaways:

- Learn how the principles of DevOps can be applied to crises such as the COVID-19 pandemic

- Hear about Wiley's real-world experience of how these principles helped us accelerate in the face of crisis

- Understand steps you can take to build a more adaptive and resilient organization

Sean Mack

CIO & CISO, Wiley

Transcript

00:00:15

Hi, I'm Sean Mack. Welcome to DevOps for Pandemics at the DevOps Enterprise Summit. COVID-19 has transformed the way we live and work. Today we sit amid a pandemic whose effects extend well beyond today and into the future of work. This is truly an unprecedented event, and it's forced us to adapt in ways that we could not have anticipated. Key DevOps principles and practices such as collaboration, transparency, and automation stand at the core of our ability to adapt and respond. I am tremendously excited to be here today at the DevOps Enterprise Summit to share a bit about what we saw at Wiley when a global pandemic impacted businesses around the world, and how the principles of DevOps helped us respond. I'll take a little time to provide an introduction to myself and Wiley and the impact we saw immediately following COVID-19. And then I'm going to get into some of the principles at the core of crisis response.

00:01:30

Then I'll get into one example of what we saw on our applications, the impact we saw from COVID and how we were able to respond. Finally, I'll talk about next steps and where we can all go from here. I've got a lot to cover today, so let's get started. I don't want to spend a ton of time on myself or on Wiley, but I did want to introduce myself to provide a bit of background. I'm Sean Mack, I'm the CIO and CISO at Wiley. I've led global teams across a wide range of companies, from large financial companies like Experian to innovative tech companies. Now, prior to Wiley, I ran a global DevOps company that built open source software and helped other companies through DevOps transformation. I've been part of the DevOps world and the DevOps community for quite some time, which is one of the many reasons I'm excited to be here today.

00:02:34

Now, Wiley is primarily known as a publishing company, but now more than ever, we're a tech-enabled research and education company. Wiley's success has long been defined by our ability to listen to the world and to adapt in response. Today, in the face of a global pandemic, Wiley's research and education is more important than ever, and for Wiley, the only path forward, as always, is to keep evolving in pursuit of our mission to help the world heal, recover, rebuild, and thrive. Now, we've been on a journey of transformation together since well before COVID-19. I've personally been with Wiley about two years now, and when I started, so this is prior to COVID-19, our CEO, Brian Napack, said that we're not in a single market that's not going through major disruption. And he was right. If you look at our markets, everything from research to education to publishing is transforming dramatically. And those changes are only accelerated by the crisis we face today. Now, Wiley's a company that's been around for over 200 years, and we've only been around for that long because of our ability to continue to transform. And the fact that nearly 80% of Wiley's revenue is generated from digital products and tech-enabled services, and that this continues to grow, is testament to that transformation from publisher to digital disruptor.

00:04:13

So I don't know about you, but I remember clearly what was happening on March 13th. Now, we left the office March 13th. It was a Friday, and we left the office thinking we'd be back at work on Monday. I was lucky. I had some idea that maybe we wouldn't return eventually. So luckily I removed all the perishable food, fruit and whatnot, from my desk, but I thought we'd be back. But on March 14th, that Saturday, we decided we would not return to the office. What we saw at Wiley at that time was that the provisions we'd put in place to enable hundreds of remote workers, combined with planning around redundancy and resilience, enabled us to scale to thousands. We shifted resources to staff up our service desk and provide additional security. We allocated additional bandwidth. We saw applications automatically scale to meet demand, and where they did not, we were able to rapidly push changes to improve them. I'll talk more on that later. Ultimately, we were able to move 9,000-plus employees to work from home in a matter of days with almost no impact. Not only that, but this crisis opened up an opportunity for acceleration on multiple fronts. And this amazing response was only possible because of the underlying DevOps principles.

00:05:53

Now, I know you are all very familiar with DevOps, so I'm not going to spend a ton of time on the "what is" of DevOps. But I did want to say that at its core, for me, DevOps is about collaboration. DevOps has evolved over the past 10-plus years as a set of principles to respond more quickly to the rapidly changing business and technology landscape. DevOps has transformed the way we work by bridging the gap between development and operations teams and improving time to market while improving service availability. Now, when I look at this from a perspective of collaboration, the amazing thing about DevOps, and one of the reasons I'm so passionate about it, is that it shows that by working together, we can improve time to market and build more resilient systems.

00:06:51

And this is amazing. This is the holy grail that tech folks like myself have been after since, well, at least since my career began. I do also want to emphasize that DevOps is not tech. It's a set of principles for delivering value to the customer, focused on collaboration and small batch sizes. And those principles impact people, process, and technology. The same DevOps principles which have helped us deliver better market outcomes have also helped us to adapt to rapidly changing conditions brought about by the COVID-19 pandemic. The people, processes, and technology which deliver on the principles of DevOps are critical to success during this global crisis. So let's get into a little more detail on those principles.

00:07:52

People are at the center of DevOps, and never have people and collaboration been more important than right now. In times of crisis and when tackling difficult tasks, good teamwork and tight collaboration are truly what set successful teams apart. Trust and empowerment of the individual are core principles of DevOps. These same principles were also critical to success in a remote work environment. I mean, in a decentralized work-from-home, work-from-anywhere environment, it's simply not possible to maintain tight control in a bureaucratic or micromanaged environment. Teams with a trust-based culture, where individuals are empowered to act individually, are ideally suited to thrive in a work-from-home environment. At Wiley, as remote collaboration between people becomes ever more critical, so too do the tools that enable this collaboration. We must be able to socially distance our workforce to ensure that nothing we do requires that people go into the office. We use Teams as our chat platform.

00:09:13

And we saw a dramatic increase in usage when we shifted to work from home. This was a great example of both the immediate acceleration and the opportunity for additional acceleration, because we went well beyond that initial increase and used this as an opportunity to deprecate our other chat platforms, to get everyone on the same chat communications platform. Overall, we saw an increase of 55% in usage on our Teams chat platform after we moved to remote work. A learning culture is another critical element central to DevOps, which becomes increasingly important in our response to crisis. At Wiley, we are focused on education, both for our customers and for our employees. The pandemic forced us to learn new things at a rapid pace, from new tools to how to respond to new security threats. At Wiley, we rolled out a massive training effort on our collaboration tools, running 12 sessions in the two weeks that followed our decision to work from home, where we trained more than 2,700 attendees.

00:10:32

We also rolled out new security training in response to increasing cybersecurity threats related to the pandemic. As you may know, the US saw a massive increase in phishing and spear phishing activities in the weeks following the COVID-19 outbreak. While we have many security tools in place, one of the most important tools is training. In response to the increase in attacks, we launched a massive training campaign. We also initiated regular company-wide communication through our internal social networks to ensure that all employees were part of the security solution. This ability of a company to learn and adapt is a core tenet of DevOps, and it is also critical to being able to respond to COVID-19 or any crisis which disrupts the way we work.

00:11:29

I want to talk for a minute about process. Now, I know that some of you may feel that DevOps and process are antithetical, but this is not the case at all. Instead, DevOps revolves around processes that are lightweight and automated wherever possible. Jayne Groll, the CEO of DevOps Institute, said something very interesting to me one day, and that is that there are no processes that are intrinsic to DevOps. And I find that rings true for me. What that means is that DevOps takes its process from other practices, be it agile or IT service management, and modifies them based on core DevOps principles.

00:12:18

The DevOps processes that we do see are automated, often replacing manual process actions with activities that occur automatically. An example of this is our lightweight change management process, which allows for automated changes to be pushed to production without manual oversight. This doesn't mean that we have no change control, only that we enable small changes to be pushed automatically with a review and approval that are built into the development CI/CD pipeline. These sorts of automated and trustful processes were critical to Wiley's response, where we saw significant spikes in demand for our online education products as millions of students moved to online learning. Now, DevOps technologies such as CI/CD and observability help get to market quicker while increasing stability. These same technologies also help us to be significantly better in responding to crisis. Small batch sizes delivered through continuous integration and continuous deployment enable companies to rapidly pivot their delivery to market. In addition to CI/CD, agile and adaptive infrastructure help us respond quickly to changes required to respond to crisis.
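
To make the idea of a lightweight, automated change process concrete, here is a minimal sketch in Python of an approval gate built into a pipeline. It is an illustration only and not Wiley's actual process: the `Change` fields, the `MAX_AUTO_LINES` threshold, and the `can_auto_deploy` function are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """A proposed production change (all fields are hypothetical)."""
    lines_changed: int
    tests_passed: bool
    peer_approved: bool


# Assumed threshold for what counts as a "small" change.
MAX_AUTO_LINES = 200


def can_auto_deploy(change: Change) -> bool:
    """Return True if the pipeline may push the change without manual sign-off.

    Change control still exists: review and test results are checked here,
    inside the pipeline, rather than in a separate manual approval step.
    """
    return (
        change.tests_passed
        and change.peer_approved
        and change.lines_changed <= MAX_AUTO_LINES
    )
```

A small, reviewed, passing change such as `Change(40, True, True)` would flow straight through, while a large or unreviewed one would be routed to a person.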

00:13:55

I think it's interesting to note that we need to understand what a DevOps tool is. I would posit that there's no such thing, but if we think of DevOps as a culture of collaboration, then we can consider a set of tools which enable collaboration and empower developers to do their jobs. It's important to clarify that it's ultimately how we're using the tools and technology, and not the tool itself, that makes it DevOps. For example, I've seen companies that use collaboration tools like chat in ways that actually hamper collaboration. In one company I worked with, you had the business teams using HipChat, you had the developers using Slack, and the operations teams using Teams. In that way you see a tool which is arguably one of the most collaborative in nature, a chat platform, being used to ensure that teams don't collaborate and don't communicate effectively with each other. But if we use tools in a collaborative way that helps us focus on collaboration, empowerment, small batch deployment, and automation, it will enable us to respond better to crisis.

00:15:22

CI/CD is certainly one example of this. By enabling small batch sizes at Wiley, we were able to rapidly change our products to meet the demand of the pandemic. Wiley has developed standard tools and architecture for our CI/CD pipeline, allowing federated teams to manage their own deployment pipelines. In addition, we built a shared CI/CD pipeline, which teams can use if they don't want to manage or develop their own CI/CD system. Transparency and observability are other key principles that underlie DevOps, which are implemented through proper monitoring. Correlation of monitoring and observability activities enables us to see problems before they occur and resolve issues quicker when they do. If this sort of data about how our systems are performing is available, shared, and properly configured, it can provide visibility into the rapid shifts which indicate some sort of crisis. Now, we need to ensure that we're monitoring for significant shifts that are outside of normal cyclical patterns. When problems do occur, proper monitoring enables us to quickly determine the source of the problem, allowing us to make rapid changes and adjustments to provide continued service to our customers.
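
One simple way to monitor for shifts outside normal cyclical patterns is to compare a new reading only against past readings from the same point in the cycle. The Python sketch below assumes that approach and a standard-deviation test; the function name `is_significant_shift` and the default threshold of 3.0 are illustrative choices, not something described in the talk.

```python
from statistics import mean, stdev


def is_significant_shift(history, current, threshold=3.0):
    """Flag `current` if it lies more than `threshold` standard deviations
    from the mean of `history`.

    `history` should hold past readings taken at the same point in the cycle
    (for example, the same hour on previous weekdays), so that ordinary
    daily or weekly peaks are not reported as anomalies.
    """
    if len(history) < 2:
        return False  # not enough data to estimate normal variation
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # History is perfectly flat, so any change at all is a shift.
        return current != baseline
    return abs(current - baseline) / spread > threshold
```

With a history of `[100, 104, 98, 102, 96]` logins for a given hour, a reading of 103 is within normal variation, while a reading of 180 would be flagged.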

00:17:00

In addition, in order to have a culture of transparency, there must be shared visibility into the monitoring data. In order to collaborate, all people within the organization must have visibility into the data that makes the organization run. At Wiley, we rapidly developed and deployed a business continuity dashboard, which had a broad range of data extending well beyond system-level metrics to key business information. These dashboards showed everything from VPN usage to the number of new registrations on each of our e-learning platforms. The business continuity dashboard showed data about how internal collaboration tools were being used, so we could optimize our work-from-home workforce as well. It also showed the number of research articles being submitted by authors around the globe researching this new virus. By sharing data with our technical teams and our business teams, we used the concept of transparency to truly allow our business to make better decisions in light of a rapidly changing environment.

00:18:15

During unprecedented times, infrastructure as code is another DevOps design pattern which helps us adjust rapidly to changing conditions. Infrastructure as code allows us to treat infrastructure in a programmatic way by describing it with code. By including infrastructure as code in our deployment pipeline, we can easily and quickly make incremental changes to adjust to changing market conditions. In addition, elastic scaling can help us automatically scale up and down as capacity demands. When COVID-19 hit and usage spiked on our platform, this ability to rapidly and elastically scale in response to change was critical to meeting the demand.
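
The elastic scaling described above is often driven by a proportional rule: size the fleet so that the load per instance returns to a target value. The Python sketch below shows that general rule under stated assumptions. The function name, the bounds `min_n` and `max_n`, and the load units are hypothetical, and the talk does not say which scaling policy Wiley's platforms used.

```python
import math


def desired_instances(current, observed_load, target_load, min_n=2, max_n=20):
    """Proportional scaling rule.

    current:       number of instances running now
    observed_load: average load per instance right now (e.g. CPU percent)
    target_load:   load per instance we want to maintain
    The result is clamped to the range [min_n, max_n].
    """
    if current < 1 or target_load <= 0:
        raise ValueError("current must be >= 1 and target_load must be > 0")
    wanted = math.ceil(current * observed_load / target_load)
    return max(min_n, min(max_n, wanted))
```

For example, four instances running at 90% against a 60% target would scale out to six, and the same rule scales back in when load drops, never going below the configured minimum.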

00:19:09

Social distancing is keeping people healthy during the pandemic, and some appropriate level of distancing of our application components can help keep our systems healthy. Loosely coupled systems are a good way to accomplish this sort of distancing at a systemic level. Loosely coupled architectures also help provide more resilient systems which can avoid disaster. By ensuring that independent components of our system can operate independently, we ensure that failure in one portion of the system does not cause complete system failure. Loosely coupled architectures become increasingly important as some of the components and services our applications rely on are SaaS-based services hosted and managed by others. With SaaS-based services, we can't control the availability, so we must make sure the overall service continues to operate even if that SaaS-based service component fails. So in addition to socially distancing your application, it's important to test that your system is resilient to component failure. With those principles in mind, I want to tell you one story about what we saw in our applications and services when the pandemic hit, and how the people, processes, and technology of DevOps helped us respond.
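
As a small illustration of the loose coupling described above, the Python sketch below wraps a call to an external component so that its failure degrades one part of a page instead of failing the whole request. The names `call_with_fallback`, `render_page`, and `fetch_recommendations` are hypothetical and stand in for any optional SaaS-based dependency.

```python
def call_with_fallback(primary, fallback):
    """Call `primary`; if it raises, return `fallback()` instead."""
    try:
        return primary()
    except Exception:
        # The dependency is outside our control, so degrade gracefully
        # rather than letting its failure propagate.
        return fallback()


def render_page(fetch_recommendations):
    """Build a page whose core content does not depend on the optional component."""
    recommendations = call_with_fallback(fetch_recommendations, lambda: [])
    return {"content": "course material", "recommendations": recommendations}
```

Testing resilience to component failure then amounts to passing in a dependency that always raises and checking that the core content is still returned.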

00:20:47

The pandemic's impacted many companies in many different ways. It's important to understand that Wiley's a diverse company, ranging in products from research to education. Many of our markets actually saw a significant increase due to the pandemic. This was particularly true for our online education platforms. As the global impact of COVID began to escalate in March, Wiley encountered significant increases in platform activity and load levels, and response times started to be impacted because of this. On one of our products, on a single customer instance, we saw dramatic increases in response time and usage. This was due to aggressive promotion of work-from-home training to the population that was using that learning application. And this activated a known issue that started impacting response times for all instances in that hosting region. The performance degradation occurred due to problems with caching overhead in the application that were known before, but hadn't had an impact like this. Now, the response of the team was amazing. In order to address the issue, they quickly made configuration changes to isolate the impact to only one customer instance, the one that was causing the issue. These initial changes were made in a matter of minutes. Once that was complete, platform changes were developed and tested to optimize and reduce the caching overhead. The changes were pushed to production over the course of two days, resolving the problem and improving the overall performance of the platform.

00:22:49

This graph shows response times before and after the optimizations were made. This improvement is testament to the incredible work of the team and the impact of DevOps tooling and modern architectural design. Because of DevOps practices and technologies such as CI/CD, we were able to react quickly to a serious problem by reprioritizing current work and addressing the immediate need, while at the same time reducing our backlog and improving the baseline performance of the platform. We also saw a reflection of the DevOps principles in responsive teams that were engaged and energized by owning and managing the successful response and decision-making process directly. We were able to rapidly deploy changes to meet the changing demand on our systems because we had appropriately developed our CI/CD practices. What that means is that, because of the DevOps principles, we were able to ensure that learners around the world could access the Wiley platforms without interruption during a time of unprecedented change.

00:24:14

So where do we go from here? First, I want to say that this is hard. I don't want to imply otherwise or make little of the fact that people are dying, that people are suffering. There are very real impacts to people due to this pandemic at a very personal level, and to businesses around the globe. Many businesses are challenged by that. And even for the businesses that are not adversely impacted, running a business during a pandemic is challenging. I'm thankful that in the tech world we are, for the most part, extremely privileged to have jobs we can do at home or anywhere.

00:25:10

So as I look forward, well, I don't have a crystal ball, but what I can say is that the world's transforming, and we must too. What I can say is that never has technology been more important. There's for some time been an understanding that the business needs technology, but more and more, there's a recognition that technology is the business, or at the very least that the business and technology are inseparable. Never have operations and execution been more important, and never have the people, process, and technology of DevOps been more crucial to success. So look towards the key principles of DevOps and the way they're applied to people, process, and technology, and think about how you can progress these in your area.

00:26:10

While responding to crisis is tough, there are invariably opportunities. I encourage you to find the opportunity. The hope that we will return to normal may actually be detrimental to our ability to move forward. So stop waiting and embrace constant change, because there's not a back to normal. There's just the next normal, and the next one after that, and the next one after. So if you're waiting for a return to normal, you may be waiting while you go out of business. I encourage you to continue to drive forward, continue to innovate, continue to transform, because while we may never get back to normal, there's tremendous opportunity in the next change. And it's up to you to take advantage of it. Thank you very much for your time today. I hope this presentation was useful. If you're passionate about technology, I encourage you to join the Wiley team. You can find out more at wiley.com/careers. I also encourage you to connect with me on LinkedIn. You can find me at SeanDMackNYC, and I encourage you to continue the discussion on Twitter, where I'm @SeanDMackNYC. Thank you, and enjoy the rest of DevOps Enterprise Summit.