Getting Business Results Faster With FedEx DevSecOps Fast Lane

In this talk Matt and Ilia will explain their journey to enable DevOps in a traditional program that supports route optimisation services for FedEx package delivery, and how leadership behaviours, enterprise-wide cooperation, engineering practices and a shared quality mindset changed the hearts and minds of people, resulting in supercharged delivery in this program.


In such a large enterprise it's not possible for a single team to just start "doing DevOps" and be done with it - there are organisational procedures and functions, established historically and distributed geographically, various delivery commitments to customers, and ongoing operations.


In reality there is limited capacity for such moves. We all know the theory - everyone participating in value delivery needs to be "on board" with supporting the change:


Leaders to model behaviours and allocate time for improvement.

Change Management to make processes better so products can move fast.

Security and Compliance, to shift left and let possible violations be caught at early stages.

Infra and Architecture, to enable the ecosystem and let local architectures and solutions emerge.


This story is about such 360-degree cooperation, where leaders, various organisational functions and Agile Release Train teams joined forces and made the change real.


Matthew Pegge

Managing Director IT, FedEx


Ilia Shakitko

DevXOps, Site Reliability Engineering & Innovation Lead, Accenture

Transcript

00:00:13

Okay, well, hello everybody, and welcome to our virtual talk today on our approach to how we created a DevSecOps Fast Lane. My name's Matthew Pegge and I'm Managing Director of IT at FedEx Express. One of my main responsibilities is to lead the business agility transformation in Europe, and today I have Ilia with me. So over to you.

00:00:37

Thanks Matt. Hi all, I'm Ilia. I'm a Site Reliability Engineering and Innovation Lead at Accenture, and I'm working together with Matt on the FedEx business agility transformation in Europe as a technical coach. Matt, let's go with an overview of FedEx.

00:00:55

Thank you. So hopefully everyone has, at some point in their life, received a package from FedEx, or at the very least seen the film Cast Away, so you've at least heard of FedEx, if only in the movies. For those of you that haven't: FedEx as a corporation has over half a million employees and an annual revenue of about $70 billion, and we're made up of six operating companies - that's Express, Ground, Freight, Logistics, Office and Services. I belong to FedEx Express. Express was the original and is historically the largest of the opcos within FedEx and, significantly, the only one that operates outside of the US. As you can see, of the total half a million employees, around half are employed within the Express opco, and more or less a third of all shipments that FedEx handles globally are from within the Express operating company.

00:01:57

Next slide please. Okay. So FedEx as a corporation have been experimenting with agile for many years, but we officially launched our business agility transformation for real back in around 2018, when our CEO, Raj Subramaniam, and our CIO, Rob Carter, used the analogy that you can see on the slide of the small, fast fish eating the large, slow fish. The analogy goes that gone are the days where the big fish such as FedEx simply bought out their competitors or ate the small fish. These days, we see small, nimble startup companies, unimpeded by the years of legacy technology and layers of bureaucracy that the big fish can have, attacking small slices of the most profitable bits of our value stream, and left unchecked this could lead to death by a thousand cuts. That's why, they explained, FedEx needs to lean into business agility and become a fast, flexible and focused company. So in order to support that business agility transformation, we established an enterprise business agility office. This was stood up to build the core business agility framework based around the five key areas that you see on the slide. The model also allows the opcos and regions to then build their flexible edge while still maintaining enterprise-wide standards and a common language, so we don't all need a decoder ring to understand each other.

00:03:40

Okay. So what do we want to take away from today's session? It's common sense that we want quality, user experience, and customer and employee satisfaction to be the best, and to continuously improve the value we deliver. However, it's often a hard decision to sacrifice FTE time for non-delivery work. There's always demand for new features and capabilities, internal requests for new requirements, and BAU defects. We don't want to pause the machine or slow down delivery, right? What about ad hoc requests and deadlines? There's a seemingly never-ending cycle of continuously growing product complexity. On the other hand, when it comes to innovation and improvement, everyone thinks it's a good idea and something that they should support. We encourage teams to adopt new ways of working, apply automation and improve quality. Yet the reality is there's never time for these moves, or if there is, you stumble upon roadblocks at almost every move. Everyone else is doing their own thing in their own way, and whenever you want something, you get asked to raise a ticket. This doesn't enable change.

00:04:56

As I said, one of my roles is to stand up a European agile enablement office in order to lead the transformation in Europe as part of our broad and balanced transformation plan. One of the key things we decided to do was set up a flagship ART. This is an Agile Release Train that could be recognized as being best in class, and then we would use that to nail it and scale it and spread that across the other release trains. Again, we focused on a balanced approach to launching the train: taking an existing ART, using external coaches from Accenture SolutionsIQ to help coach the ART alongside internal coaches, and building a robust ART roadmap. Key to this was maturing our DevSecOps capability, and today we aim to share how we approached this in a bit more detail. Ilia, over to you.

00:05:55

Thanks Matt. But before diving into the details, it is important to make a little disclaimer. FedEx is a large enterprise and this is not a no-regret move done in isolation. FedEx enterprise leadership and Accenture SolutionsIQ coaches worked together for years before this event, developing an outside perspective on business agility maturity across all operating companies. Improving technical agility is one of the critical priorities of the transformation. There are more than five change events taking place in our journey, of course, but here we will focus on five main ones. Let's talk about each point now.

00:06:42

There is a selection of DevSecOps health diagnostic frameworks; together with coaches, we designed a custom approach to give us the necessary insight across all areas, which included maturity assessments, surveys, workshops and interviews. The product selected for the fast lane was new ground we entered: not all processes were established, and plenty of opportunities to improve were found. We started by developing an outside perspective on the DevSecOps maturity and made it our baseline. As you can see, a few areas at the top were making their way towards the second level. The challenge was not what some may think - "yeah, that's a low maturity area". The real challenge, and our motivation to give this talk today, is the time and effort it was taking to gain those initial improvements. You can also see some continuous integration, bits of automation and tests automated here and there. Hey, what about security? And about continuous integration, by the way: did anyone see Jez Humble's talk about real continuous integration? I'm curious, share your thoughts in the chat. Are your teams really doing CI?

00:08:05

Okay. There was plenty of room for improvement. Where to start? Which improvements can be carried out by the teams independently, and where is program or enterprise support required? In complex product development, like a system that has ongoing commitments running, our approach was to implement the improvements that make the largest impact on what impedes the flow most. Mapping the delivery pipeline helped here to reveal the main points of rework and delays. We also looked at the areas of control for each improvement opportunity. The picture above and the highlighted areas are the common struggle for teams at most maturity levels: is the path to production defined and clear to everyone who is participating in value delivery? Is quality incorporated at all stages of the delivery and collectively owned? Are the teams moving small enough, customer-centric value pieces through all of the delivery? You may have had time to glance over the selected improvement list. We will see the remeasured maturity at the end of the talk, and hopefully you'll join our humble celebration. How much time and support would you give to your teams to learn, fail and evolve in such areas? Now, moving on to psychological safety. Matt, over to you.

00:09:33

So now we have commitment to drive the change and have improvements embedded as part of our regular deliveries, but is your environment safe? I'm talking about psychological safety here: are your teams and individuals able to take risks without feeling insecure or embarrassed? The reality is often not what you may think, and creating this safe environment is a key role for an agile servant leader. An unsafe environment causes team members to share fewer ideas and to over-filter them, so they don't share because they don't feel safe. Steven Smith published a safety check article which gives excellent insight on how to conduct regular exercises to measure and improve a team's psychological safety.

00:10:22

It's also important to allocate time and space for the improvements to happen. We addressed this on various levels in our case. Firstly, program management realized the need for the improvement and supported it by incorporating it into the program backlog. Next, business owners recognized the value added and reflected it in higher enabler objective business value scores. Thirdly, the Innovation and Planning iteration was given more attention and we tried to limit or reduce the urgent things we needed to finalize; you notice I didn't say completely remove - we're also iteratively improving here. Number four, the product owners were coached on balancing the iteration backlog with user stories and enablers. And then finally, we started to experiment with the 80/20 rule. So besides the actual hours dedicated, we also considered the natural learning curve cycle: all the improvements, techniques and tools can't be applied all at once or in a row. You need time for the knowledge to settle. Something may not work for a particular team or technology. It all takes time and you need room to inspect and adapt. Back to you.

00:11:46

All right, so now, one of the key components in decreasing the risk of releases is batch sizes. When we started to move work batches through the pipeline more often, we started practicing the integration, testing and deployment process more often, and therefore were able to find and fix problems earlier. And here I always like to share a little story from my past experience about one team that was undergoing a change together with a program. They were not really giving full buy-in to shifting to smaller batch sizes. However, they were still cooperative and decided to give it a chance, but they said, you know, you want small batch sizes, you will have small batch sizes - they were still a little bit resistant. So they decided to go really wild with decomposing work into even smaller pieces. Well, that isn't the best move when you're aware of the transaction costs, but that was another learning point. The end of that story is that the team was surprised by the amount of completed work and actually fully closed their iteration goals.

00:13:08

The next point on the list is work-in-progress limits. It is easy to get this point lost if you stop at just assigning WIP limits in your ALM tool. Embed this into the team working agreements, develop a set of scenarios for what to do if a column turns red, and address this in retrospectives or along the iteration to keep respecting the work-in-progress limits. Finally, while technical feedback from integration and tests may not be the first available move, a few techniques that we implemented were moving closer to Scrum events, reducing the overhead on planning, and giving acceptance and quality feedback earlier.

00:13:57

Having an enterprise DevSecOps platform and a dedicated team who continuously evolve and adapt it to the organization's unique context is one of the key enablers for rapid feedback. We also boosted our environment with enablement and technical coaches, who are always there around the product teams to get them on board and support the adoption. Here we show our continuous delivery pipeline. You may have spotted the common parts such as quality scans, release candidate builds and packaging, automated tests and deployments. I believe it is becoming common to integrate these tools with the ALM. We went beyond and extended the capabilities to include critical enterprise functions in the process. That's one of the examples of enterprise cooperation and collaboration: in order to satisfy future audits, the enterprise platform team works together with compliance teams to understand the requirements and incorporate them into the necessary stages of the pipeline. Security is developing reusable assets and pipeline references to enable scanning in the delivery, conducted continuously at various stages. Release and dependency information is gathered from various sources and appended throughout the automated release lifecycle. And finally, project and change management records that are historically part of the processes are populated, updated and closed.
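To make that last point concrete, here is a minimal sketch of the kind of pipeline step that could populate a change record with release evidence automatically. The endpoint, payload fields and environment variables are hypothetical placeholders for illustration only, not the actual FedEx platform API.

```java
// Sketch of a post-deployment pipeline step that records release metadata in a
// change-management system. Endpoint, fields and token are hypothetical.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ChangeRecordStep {

    public static void main(String[] args) throws Exception {
        String releaseId = System.getenv("RELEASE_ID");        // injected by the pipeline
        String commitSha = System.getenv("GIT_COMMIT");
        String scanReportUrl = System.getenv("SCAN_REPORT_URL");

        // Assemble the evidence gathered earlier in the pipeline (scans, tests, dependencies)
        String payload = String.format(
            "{\"releaseId\":\"%s\",\"commit\":\"%s\",\"securityScan\":\"%s\",\"status\":\"DEPLOYED\"}",
            releaseId, commitSha, scanReportUrl);

        HttpRequest request = HttpRequest.newBuilder(
                URI.create("https://change-mgmt.example.internal/api/changes")) // hypothetical URL
            .header("Content-Type", "application/json")
            .header("Authorization", "Bearer " + System.getenv("CHANGE_API_TOKEN"))
            .POST(HttpRequest.BodyPublishers.ofString(payload))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());

        // Fail the pipeline stage if the change record could not be created
        if (response.statusCode() >= 300) {
            throw new IllegalStateException("Change record not created: " + response.body());
        }
        System.out.println("Change record created: " + response.body());
    }
}
```

The point of a step like this is that the change ticket is opened, updated and closed as a by-product of the automated release, rather than as a manual task at the end.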

00:15:32

Yes, we still have that box with the purple edge - did anyone notice? - where some of the review and approval has to take place. The good news is that once a review or approval takes place, the release continues automatically. But for now, this is our next point of improvement.

00:15:58

Now that we have connected all our pieces of the delivery into one automated process, there's a big chance of a big bottleneck and rework lying in wait at acceptance and testing. It is great if your team started greenfield, with unit tests, component tests, working in small batches, automation, all that. In our case, we started with a traditional development program where there were shared testing functions and phase gates. Imagine that the first time testers see the features and behaviors that need to be tested is when the work is considered to be done - and what is "done", by the way? We'll touch on that in a moment. So your testers switch from whatever they were testing previously for another part of the program to the new features coming in the queue. It's great if all the specs and behaviors are clear to everyone, but sometimes it requires extra communication to understand what exactly needs to be tested.

00:17:01

As you saw earlier in our story, we mapped the delivery pipeline and we clearly saw the percent complete and accurate metric was low at the testing stage. That means up to 50% of work is going back for rework all the way to development. It is a very expensive situation to have such a number on the right side of the pipeline. We know there is a test pyramid and quality shift-left technical practices; these have to be implemented too, to reduce the overhead of heavy and long UI testing, moving closer to unit, API and component testing. But today's story is to highlight cooperation and collaboration, so we will focus on points related to that. One of the first things to do was to get the various perspectives together in one room and let them talk, and this needs to happen alongside backlog refinement, before user stories are taken into development.
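As an aside, percent complete and accurate (%C&A) is simply the share of incoming work a stage can use without sending it back for rework. A tiny illustration with made-up numbers:

```java
// Illustrative calculation of the percent complete and accurate (%C&A) flow metric
// for a pipeline stage; the sample numbers are invented for demonstration only.
public class PercentCompleteAccurate {

    // %C&A = items the stage could use without rework / total items received
    static double percentCompleteAndAccurate(int itemsUsableWithoutRework, int itemsReceived) {
        return 100.0 * itemsUsableWithoutRework / itemsReceived;
    }

    public static void main(String[] args) {
        // e.g. testing received 40 stories this increment and 20 bounced back to development
        System.out.printf("Testing stage %%C&A: %.0f%%%n", percentCompleteAndAccurate(20, 40)); // 50%
    }
}
```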

00:18:07

It is not about getting three people - a PO, a tester and a developer - but about getting the three perspectives, with whatever number of people is required. So testers have a chance to ask about happy and sad scenarios, and developers to understand what the important points to think about are, clear up some assumptions, or maybe add a question that the business has to come back on. Defining what "ready" and "done" mean was a crucial exercise, as it also incorporated the various perspectives and put in place evolution number one of the agreement. We had to be careful: it was tempting to add all sorts of criteria to the definition of done by the PO and quality, as well as to the definition of ready by engineering. Our coaches helped us to start with the right balance, because if we overload the definition of ready, it will become a phased approach again, and we will wait for perfect design, requirements and definition before starting work.

00:19:15

Same with the definition of done: if a team starts with too much commitment, things will never be done. It has to be evolutionary and reviewed every few weeks. And last but not least, vertical slicing, together with the previously mentioned small batches - moving small pieces that each make a minimal value increment for the user - made the difference. In particular, it became easier for the end user to understand what's being delivered and provide feedback. Shout out if you recognize the situation where customers or stakeholders are dropping demo attendance because they don't understand what's being delivered and how to provide feedback on it.

00:20:02

The next thing was to look at where the testing is struggling and how efficient it is. We realized there is a continuous stumble at the lower environments - and we are not even talking about production yet. How can we even talk about shifting quality left if there are so many components and dependencies around? Most of the existing functionality can be easily tested without having access to the real systems: service virtualization and stubbing really helped us to speed it up. In particular, we have no need for dependent services to be up and running when system-level testing is needed, we get reusability of virtualized dependencies for other products, and we're able to capture required dependent system behaviors and generate stubs as well as simulate realistic performance. We also noticed improved quality feedback time and frequency, eliminated zero-value routine and improved overall testability.
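The talk doesn't name a specific tool, but as an illustration of the stubbing approach, here is a minimal sketch using WireMock, a common open-source service virtualization library for the JVM. The endpoint, payload and port are invented for the example.

```java
// Illustration only: a dependency stub with simulated response time using WireMock.
// The route service endpoint and payload are invented, not actual FedEx services.
import com.github.tomakehurst.wiremock.WireMockServer;
import static com.github.tomakehurst.wiremock.client.WireMock.*;

public class RouteServiceStub {

    public static void main(String[] args) {
        WireMockServer server = new WireMockServer(8089); // stand-in for the real dependency
        server.start();

        // Captured behavior of the dependent system, replayed without that system being up
        server.stubFor(get(urlEqualTo("/routes/NL-AMS-001"))
            .willReturn(aResponse()
                .withStatus(200)
                .withHeader("Content-Type", "application/json")
                .withBody("{\"routeId\":\"NL-AMS-001\",\"stops\":42,\"status\":\"PLANNED\"}")
                .withFixedDelay(250))); // simulate realistic latency of the real service

        // Tests now point at http://localhost:8089 instead of the shared environment
    }
}
```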

00:21:17

It is hard to imagine a large enterprise having no security and compliance involved in the delivery and releases. But let's face it: when are these parties usually involved in the process? I clearly remember the time when DevOps was a rising hype and everyone was looking at unifying development and operations - some just merging two teams, some looking more broadly at improving the walls of confusion, removing them between those two functions. But then we had security saying, what about us? Business intelligence, infra, compliance: hey, you forgot about us as well. It was a nice rise of DevSecOps, stressing the importance of continuous security and its incorporation across the development stream.

00:22:12

But how can we deliver fast if, in the end, there is a gate that performs verification of whether the ongoing release satisfies all necessary criteria - secure enough, compliant, et cetera - and which can reject that release? And that rejection would be right. Let's be realistic: how many times have we had all that hard work done, waiting in the queues for approvals, and then it returns all the way back to development because of some subsequent issue? But don't get me wrong, it has nothing to do with bad security folks who don't let us go fast. No, this is a very crucial moment. The problem is that the feedback we receive comes at a very late stage. So how can we shift this feedback left as well? That's what we've done. We worked together with the InfoSec and compliance teams and the enterprise DevSecOps platform, and incorporated the common and routine operational requirements and expectations for what is needed to have the complete package within the release, increasing the chances of going through on the first round. We had to think about what is possible to test and include in the pipeline at lower levels. Can we start collecting important information about the release, approvals, et cetera, and continuously append reports that can be used at the end, in a strict security environment where you can't simply commit and deploy as the same person? How do we incorporate that into our automation process as early as possible? Last but not least, we observed the ecosystem reply back to these moves, providing back to the InfoSec and enterprise platform community improved automation and pipeline pieces that benefit existing teams and everyone who has just started their DevSecOps journey.
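As a rough illustration of shifting that feedback left, a pipeline gate can fail the build on critical scanner findings long before the final review. The report path and format below are hypothetical; real scanners ship proper report parsers.

```java
// Crude illustration of a shifted-left security gate: a pipeline step that reads a
// scanner report produced earlier in the build and fails fast on critical findings,
// instead of waiting for a manual review at the end.
import java.nio.file.Files;
import java.nio.file.Path;

public class SecurityGate {

    public static void main(String[] args) throws Exception {
        // Report produced by an earlier scan stage (hypothetical path and format)
        String report = Files.readString(Path.of("target/dependency-scan.json"));

        // Count critical findings in the report
        long criticals = report.split("\"severity\":\"CRITICAL\"", -1).length - 1;

        if (criticals > 0) {
            System.err.println("Security gate failed: " + criticals + " critical findings");
            System.exit(1);   // non-zero exit fails the pipeline stage immediately
        }
        System.out.println("Security gate passed");
    }
}
```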

00:24:17

Okay, so now we've connected the delivery process into one automated delivery flow, but we were still constrained in actually rolling out the changes, because once a new feature hits production, that's it, it's live: external and internal customers get to see it. In our case, the products undergoing the change had internal customers spread across several regions and countries, and sometimes training or other business requirements are needed before they are able to start using the features. And what about release planning? And what if users in a region aren't ready or able, due to labor relations issues, to start using the new features and processes?

00:25:00

So we've automated and accelerated the move-to-production process, but now we're asking the dev teams to hold the deployment so we can first ensure everyone is educated in how to use the new functionality, or that we have the local works council approval, and then we can finally approve the release. This is a separate journey to change the hearts and minds of people: to learn that move to production does not have to mean release to the end customer. The dev teams always want feedback on the production deployment. Yes, we know the change worked and was tested on pre-prod and all that, but there's always a chance that things may not go quite as expected when integrating changes into the complex production environment. What if we could let the team deploy the changes as fast and frequently as they want, still ensuring there are proper fallback scenarios in place so there is no business disruption to the customer experience, but at the same time let the business decide when and where they want to open up the new features? This is what we did in this Fast Lane exercise. In addition, due to the complexity of our landscape, we extended the feature toggle capability with not only on and off, but also some basic business rules that specify when the toggle should be on or off, based on a location, region or even the type of user. This decoupling enabled various benefits, such as faster time to production, frequent feedback and advanced engineering practices, but still allowed us to maintain control for the business side.
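A minimal sketch of what such a rule-based toggle might look like, with hypothetical feature names, regions and user types, purely to illustrate decoupling deployment from release:

```java
// Sketch of a rule-based feature toggle: on/off plus simple business rules by
// region and user type. All names and rules here are invented for illustration.
import java.util.Map;
import java.util.Set;

public class FeatureToggles {

    record ToggleRule(boolean enabled, Set<String> regions, Set<String> userTypes) {}

    // In practice these rules would come from a toggle service or config store,
    // so the business can change them without a new deployment.
    private static final Map<String, ToggleRule> RULES = Map.of(
        "new-route-optimiser", new ToggleRule(true, Set.of("NL", "DE"), Set.of("DISPATCHER")),
        "revised-label-flow",  new ToggleRule(false, Set.of(), Set.of())
    );

    static boolean isEnabled(String feature, String region, String userType) {
        ToggleRule rule = RULES.get(feature);
        if (rule == null || !rule.enabled()) {
            return false;   // feature is deployed but not yet released to anyone
        }
        return rule.regions().contains(region) && rule.userTypes().contains(userType);
    }

    public static void main(String[] args) {
        // Dutch dispatchers see the new behaviour; everyone else keeps the old path
        System.out.println(isEnabled("new-route-optimiser", "NL", "DISPATCHER")); // true
        System.out.println(isEnabled("new-route-optimiser", "FR", "DISPATCHER")); // false
    }
}
```

The key design choice is that the code is always deployed, while the rules decide who actually sees the feature, which is what lets the business open it up region by region.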

00:26:47

So when all of this is done, did we make a difference? What changed? We've covered some of the points on this slide on previous slides, but some of the best results we got were around team engagement and satisfaction. In retros and inspect-and-adapt events, we started to see real signs of the team becoming a true learning organization, and growing employee engagement. Where previously people saw an issue but didn't feel like they owned the solution, they're now starting to understand that they're empowered to make the change and fix it themselves and not wait for management to say so. Finally, you can see that we reassessed our DevSecOps maturity against the baseline we took and saw improvements across all elements of DevSecOps. But this is not a one-time exercise; it's a continuous journey that we will carry forward using continuous self-assessment and improvement.

00:27:47

And we're not done yet. As we keep saying, this is an ongoing, iterative journey. Some of our next steps include expanding the scope to look more closely at the release and change processes and bring them closer to the teams, getting rid of that purple box we showed earlier. Now we have a flagship, but the purpose of the flagship is to use it to inspire others and hold it up as an example of what can be achieved and of what good looks like. So we need to go and replicate that across all of the value streams, across all of the agile release trains and teams in Europe.

00:28:28

Yeah, indeed. And it is very crucial to enable a collaborative and learning environment and to invest time in improvement to achieve greater success. Summarizing the story: while there are known paths and frameworks to enable DevSecOps and agile in your organization, it doesn't work to just ask your teams to innovate and improve and expect that to happen by itself. Everyone is busy with deadlines, commitments and catching up with bug fixes. It is crucial to establish a safe environment, to invest time into improvement and to drive the change. But if you want to supercharge the change enablement and scale it widely, enable a learning and collaborative environment around those who are undergoing the change and you will see the difference. Well, that was our story for today. Hope you enjoyed it. Thanks Matt, and see you later.