Las Vegas 2019

DevOps Confessions

Dominica DeGrandis reads the DevOps Confessions.

DD

Dominica DeGrandis

Principal Flow Advisor, Tasktop

Transcript

00:00:02

Uh, the next format is called DevOps Confessions. This came from some amazing advice that we got from Dr. Richard Cook from the safety culture community. He said there are certain types of stories that you don't hear on stage. Instead, you hear them after the sessions, most likely at the bar after there's been a few drinks. And that's where the real lessons are told, because great practice comes from experience, and experience comes from bad practice. So in the program committee, we wanted to take these stories that you might hear at the bar and bring them to you onstage, so that you don't have to be so lucky to be in the right place at the right time. And that's what we're calling the DevOps Confessions format. We collected these stories, anonymized them, and want to share them with you. So please welcome one of our program committee members, Dominica DeGrandis, who will be reading one of these anonymized stories.

00:01:06

So it is an honor to be here and to be the voice for this confessional. The willingness of this community to embrace the sharing part of DevOps is just amazing. And frankly, it's just really a relief to learn that others struggle too. And so here we go. My Fortune 1000 organization had decided to become cloud-first. We only had a handful of folks who could spell cloud, or even DevOps. In addition, we had a ton of work to deliver, including a brand-new ERP, cloud integrations, event-driven components, and removing decades of middleware and other heirloom systems. To make matters worse, there were at least 10 other high-profile projects in play within the enterprise that had the potential to transform how our supply chain works, how it delivers products, and how it generates revenue. I was freaking out, given the sheer magnitude of cultural, automation, and tooling changes that would be required.

00:02:10

As I reflected on my situation, I realized that I was also missing broad executive support for a DevOps transformation. In the week prior, one of our top executives had asked me, "What's the root cause of the failure, and who should be held accountable for the interruption in core services?" I felt like I was in the middle of a chicken-or-the-egg scenario. If DevOps is all about CALMS (culture, automation, lean, metrics, and sharing), how am I going to drive a DevOps transformation if we don't have that culture, or a tradition of automation, lean, or a focus on metrics? Is culture an input or an output of DevOps? Can culture be shaped by working differently and devoting daily time to improvements? Despite all the headwinds, I decided that we could not be successful long term without leaning into first principles from lean, DevOps, and continuous delivery. Much of the journey has been completely organic.

00:03:08

And at times it feels like pushing a big boulder uphill as I spend more time convincing teams of the benefits of loose coupling, a lean mindset, and continuous improvement. On the best of days, I've been able to inspire teams to create more automated unit tests to increase coverage and safety of deployments. And on those days, I'm gaining allies and converts to new ways of working. These teams start to work differently, although they may struggle from time to time. The bad days? Oh my goodness, there have been some bad days. Immediately, the memories come flooding back. We had written a prototype program that would remove unused resources in a non-production cloud sandbox. And everything was good for several months. We were saving money and keeping the environment clean. Then one day the cloud sandbox was removed while the cron schedule remained, and we came in the next day to a huge dumpster fire, because the original cleanup job and schedule had somehow jumped across to the production environment.

00:04:15

Hundreds of resources, such as web apps, functions, service bus subscriptions, and API management registrations, were deleted. At first, we had no idea what was going on, what to do about it, or how to stop the bleeding. After a couple of hours, we figured out what had happened, but our reputations and relationships with our delivery teams took a hit. Luckily, the job timed out and prevented more carnage, such as removing the production data lake. By the middle of the afternoon, many of the resources had been redeployed by their delivery teams, although a handful of items were never fully recovered. This event made an indelible imprint on the organization, and it's driven a large body of continuous improvement work, such as separate accounts per environment and auditing in pipelines for random blocks of code. Our bad days have been opportunities to role-model calmness and focus, teaching us how to nudge the culture and the practices and the mindsets to the next target condition.
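The talk itself contains no code, but a minimal sketch of the kind of environment guard such a cleanup job could adopt, assuming a Python script and a hypothetical CLEANUP_TARGET_SUBSCRIPTION setting (neither is described in the confession), might look like this:

    # Hypothetical sketch, not from the talk: a fail-closed guard a scheduled
    # cleanup job could run before deleting anything, so it refuses to act
    # outside an explicitly approved sandbox environment.
    import os
    import sys

    ALLOWED_SUBSCRIPTIONS = {"sandbox-nonprod"}  # assumed identifier for illustration

    def main() -> None:
        subscription = os.environ.get("CLEANUP_TARGET_SUBSCRIPTION", "")
        if subscription not in ALLOWED_SUBSCRIPTIONS:
            # Fail closed: never run against an environment not explicitly allowed.
            print(f"Refusing to run: '{subscription}' is not an approved sandbox.",
                  file=sys.stderr)
            sys.exit(1)
        print(f"Cleaning up unused resources in {subscription}...")
        # ...resource enumeration and deletion would go here...

    if __name__ == "__main__":
        main()

The point of the guard is the same as the improvements the speaker describes: the job should not be able to follow a schedule into production by accident, even if its configuration drifts.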

00:05:22

Ultimately, the new ERP system was implemented on time, along with the new event-driven cloud integrations. Was it perfect? Absolutely not. But we have indeed started to work differently, and I do see signs of different culture, practices, and beliefs. One of my favorite examples is a team that's been applying continuous improvement to the CI/CD delivery pipeline practices for Databricks and machine learning. And these outcomes, I believe, are going to serve as a foundation for more continuous improvement and, hopefully, more impactful outcomes. As a DevOps change agent in the enterprise, I seldom had everything needed to be successful. At any given time, I needed more money, C-suite support, technology, experimentation time, or open-minded teams. But looking back, it's important to note that many of our breakthroughs came from thinking about the next best things that we could work on, rather than waiting for the moon and the stars to line up perfectly. Thank you.