Virtual US 2022

Paving the Road for 30,000+ Developers

At SAP we have more than 30,000 people working in our development organization, with more than 1,000 products on our price list using various technology stacks.


How do you increase developer productivity at this scale?


Backed by in-house user research and industry trends, we decided to lower our teams' cognitive load by introducing an in-house CI/CD platform called 'Hyperspace'.


Dirk will talk about the obstacles of creating 'Hyperspace' with a "platform as a product" approach in an organization that was highly fragmented. Concepts of Paved Roads (a.k.a. Golden Paths) help us provide guidance to teams, with the aim of reducing team cognitive load and decreasing the support load on central teams.


Join Dirk's talk for lessons learned, impacts that we already see, and an outlook on what we envision for Hyperspace.


Dirk Lehmann

Chief Development Expert, SAP

Transcript

00:00:06

Hello everyone. My name is Dirk Lehmann. I work in product management for the internal CI/CD platform at SAP. I have worked for the company for over 21 years in various roles. For example, I was part of the first team that established a full continuous delivery approach at SAP. Today I want to tell you a little about our journey towards an internal CI/CD offering serving roughly 30,000 engineers. Let me bring up my slides. Here we go. Some of you might know SAP and our products, and some might not. So I thought it's a good idea to give you some context on who we are and what we do, to set the stage for the challenges that we face when we create a CI/CD platform offering for an enterprise of this scale.

00:00:59

SAP was founded over 50 years ago with the idea of standardizing enterprise software, as every company has to deal with financials, employees, customers, products and so on. The idea was to build a product to manage all of that. Today, our ERP suite covers 25 industries: oil and gas, retail, manufacturing, public sector, what have you. And even though 80% of our customer base is from the SME market, 99 of the 100 largest companies in the world run SAP. I'm not sure who's the one without SAP software, but I'm pretty sure my sales colleagues are on it. We have over 112,000 colleagues around the globe in 130 locations, and out of those, roughly 30,000 are engineers. Some of our products are written in our own proprietary technology stack, which comes with its own programming language called ABAP and has its own lifecycle methods and tooling, all integrated in the platform.

00:02:04

Other products use technologies which may be more familiar to you, like Java, which is roughly 30% of our code base, JavaScript, C, C++, Python, Go and some others. We sell our software and services through various channels: on-premise, cloud, hybrid (where cloud and on-premise are mixed) and, of course, mobile. We run our own data centers, but we also support the big hyperscalers that you all know. Why do I tell you all this? Well, the figures are quite impressive, and some of them weren't known to me before I compiled the slide deck. But they also give you some context on the situation in which we operate our CI/CD platform. If we just take the highlighted facts that I just outlined, we get some idea of the challenges. We were founded in 1972, so we have quite some heritage and

00:02:58

a huge active customer base that we have to safeguard. Whatever change or larger innovation we make, we always have to ensure that we do not disrupt our customers' business. I spoke about many industries, and you know the challenges that companies often have, especially in highly regulated industries, to comply with legal requirements and such. Well, we have all of them. We have the sum of all of that: banking, oil and gas, healthcare, what have you. We serve all of them with our software, so we need to support and adopt all their requirements in our software. And our customers come from everywhere in the world and do business with basically every country in the world. That implies the well-known internationalization concerns, like translations and supporting both left-to-right and right-to-left languages. It also implies legal requirements that our customers need to adhere to, and hence our software needs to adhere to them as well. To handle all of this, our engineering force is quite huge.

00:03:59

With 30,000 people in around 130 locations and thousands of teams, the sheer number of people and teams that we have to deal with is already a challenge on its own. And we work with very heterogeneous technologies, so tools and processes that work in one tech stack do not necessarily work in another tech stack that we support. SAP promotes certain programming languages simply through better tooling and process support, but we do not limit or mandate the programming languages or technologies that our engineers use. Yes, again and again we discuss whether we should mandate certain tech stacks, but currently we don't. I think it has pros and cons. We also serve various delivery channels for our software, and each channel implies a slightly different development model. Think about things like feature toggling, which is a common approach in cloud-based software.

00:04:58

It simply does not work in on-premise software. Shipping changes fast and as early as possible means something completely different for on-premise, cloud or a mobile app, and we have to serve all of that. Now, how did we do that in the past? Well, first of all, we offered quite a bunch of central tools to the development teams. Instead of having each development team set up their own tools, we offered them centrally to cover the needs of the various teams. For example, we run multiple GitHub instances of our own, which are quite large already. We run a farm of around 2,000 centrally provided Jenkins instances. And Jenkins is just one CI/CD orchestrator; many teams run their own instance of another CI orchestrator. We run our own central artifact repository, which serves around 250 million requests per day.

00:05:54

So my colleagues are doing pretty much the heavy lifting here. Sometimes the tools that we serve target the same thing. For pipeline orchestrators, we have multiple. For build tools, we support multiple. And sometimes we have the same tool in multiple instances, because they are in different network segments or have different configurations. Which one is best for a team simply depends on its technology stack, programming language, location, what have you. It's also worth mentioning that in the past, tool ownership was spread across the company. Security tools came from the security people, and if you had an issue with the legal tool, well, the legal colleagues were your friends to fix that. And we have broad process requirements, which are basically best practices for how to create enterprise software.

00:06:44

These simply ensure that our software follows all the legal requirements, security standards, accessibility requirements and data privacy rules that our customers around the world have in all the various industries that I mentioned earlier. Some of them are treated as best practices, like good advice, and some of them are mandatory, so they are not negotiable. And we have guidelines, advice and recommendations for all kinds of things, like architecture, support, operations handbooks, service management and such. Again, some of them are very specific and some are vague. Some apply only to certain products, and some apply globally to all products. And engineering had to deal with all of this: choosing the right tools, keeping up to date with the latest and best tooling, fulfilling all the changing requirements, making sure not to miss an update to the requirements (otherwise your delivery could be stopped), and staying close to the latest guidelines of the organization.

00:07:46

Years back, when releasing software was a matter of months or years, this was somehow manageable for the teams. They also had huge support from central organizations that took some of those requirements off their hands. Lately, we have been seeking a higher delivery frequency to get into closer feedback loops with our customers, which I think we all agree is a good idea. To deliver faster and reduce handoffs, we empowered our engineering teams to take over more responsibilities. "Shifting left" is the key phrase here, and it puts more and more load on engineering. And this is a problem: the teams' cognitive load explodes. Cognitive load can be described as the total amount of mental effort being used in a person's working memory. Because a team consists of multiple people, we can apply that idea to the whole team: the team's cognitive load. And if the team's cognitive load gets too high because of too many unrelated tasks that we put onto them, their ability to deliver customer value will go down.

00:08:55

So is shifting left the problem? Did the whole industry go wrong? I think not. James Governor, co-founder of RedMonk, put it to the point in one of his recent blog posts: you need a good developer experience in place that allows you to shift left all the things. If your developer experience is broken and you shift things left, cognitive load will blow up. Developer productivity, developer happiness, customer value: all of that goes down. And the bad things (failure rates, stress, burnout ratios) all go up. So it's important that you first have a good developer experience and only then shift things left. A broken process doesn't get fixed by shifting it left. It remains broken, and it just increases the team's cognitive load. Two years ago, we started a program to reduce the teams' cognitive load by implementing an internal CI/CD platform offering. It follows the platform-as-a-product approach, which is nicely described in the Team Topologies book by Manuel Pais and Matthew Skelton, along with many other important things, like team cognitive load.

00:09:59

So if you haven't read that, it's definitely worth reading. We named our CI/CD platform Hyperspace because we had a predecessor project called Hyper Pipe, and we somehow never got rid of the "hyper" naming thing. The idea of Hyperspace is, first of all, to have one entry point for the development teams: the developer experience portal. If you know Spotify's Backstage, which is now with the Cloud Native Computing Foundation, you know what we have here. It's one entry point where the teams get and expose all the vital information they need to deliver value. Below that, on the left, we have the tools and services that we already had in place. We have now reorganized them into one organization, alongside a newly created product management function that makes sure all the tools get integrated, harmonized and aligned.

00:10:52

And we renovated the process requirements framework so that it fits better with the tools and services, gives us a higher degree of testing, scanning, compliance checks and automation, and is better tailored to the needs and situations of the teams. Also, the ownership of that process framework now sits in the same organization as the people who own and build the tools and the whole platform. And at the center, we have a new component: the paved road. I want to elaborate a little more on this approach in the upcoming slides. Some companies call paved roads golden paths, and I've even seen mixtures like golden roads and paved paths, or whatever you call them. The idea is always the same: giving teams clear end-to-end guidance for a complete process, in our case a complete delivery process.

00:11:45

We wanted to give teams concrete, detailed answers when they asked us, "How shall we use this tool in order to comply with this and that?" We wanted to be able to say, "Well, look: in your context, the best way to use this tool to fulfill this requirement is this and that." After the reorganization we own and understand all of the tools and processes, or at least that's how we saw it. Our first idea for getting to a paved road was this: let's just bring all the experts on the tools and processes into one room, give them a canvas and some time, and then they will tell us the best end-to-end way to build and deliver software at SAP. That didn't work out very well. What you see here from far above is a simplified version of the canvas that we worked on, and we didn't even finish it.

00:12:33

Every colored box that you see here is a tool or a service, and the lines indicate implicit or explicit dependencies. As I said, this is a simplified version. At some point we left out all the obvious dependencies, such as the dependencies on basically everything, because we couldn't read the canvas anymore once we drew all those lines. And as I said, we didn't even finish. We had to admit that the situation was way messier and more complex than we thought. But this exercise also had some good points. We learned that we need to set ourselves clear guardrails and constraints to tackle the problem, given our technology stacks and our legacy. It is impossible for us to cover all combinations of tools and processes at once. But we could start with an ideal case, with few legacy constraints and clear technology choices, to have a better chance of handling the complexity.

00:13:36

So we set ourselves a clear context and took a divide-and-conquer approach. We separated the paved road into various segments that we called development procedures. In plain words, one development procedure describes one thing that an engineering team needs to do in order to ship software at SAP. Let me give you an example. A team that wanted to use an open source library in their application at SAP touched (and still touches) various tools and processes. That is sometimes a real nightmare for them, because they need to figure out what they have to do regarding licensing. The open source component might have a copyleft or infectious license, or the license might require them to disclose its usage somewhere in the application. They need to fulfill global trade and export compliance, because the component could use cryptographic methods that fall under some trade sanctions.

00:14:31

They need to store a central software bill of materials, because SAP wants to know at every point in time which software is used within its products, in case any security or legal issue pops up. On security: are there known vulnerabilities in that open source component, and is there a mechanism in place so that a newly found vulnerability can be tackled immediately? And of course, all those processes and tools are not always well integrated. Some are, some are not. A development procedure describes exactly this. We formulate it with a trigger, which describes what brings the team into action, like "Hey, we want to use an open source library." We also formulate the value that the team perceives after going through all the steps that the development procedure describes, like "You can now ship your code faster by using open source software securely and compliantly."

00:15:35

What's important is that the first version of each development procedure describes the as-is state, the situation as it is now. So we look at which configurations, which settings, and which sequence of tools need to be used today. We do not attempt to optimize the situation in the first version. Describing a development procedure while improving it at the same time increases complexity, and that complexity would just blow up in our faces. The first version simply describes the as-is situation, in all its beauty and its not-so-beautiful sides. And then, of course, we have multiple development procedures: How do I best manage my backlog? How do I release a feature to customers? And such things. The development procedures use certain tools and describe exactly in which sequence the tools are used, how to configure them, and when in the lifecycle to use them in order to fulfill certain requirements and guidelines, always in the team's given context.
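To make the structure of a development procedure more concrete (a trigger, an ordered sequence of tool steps with their configuration and the requirement each step fulfills, the perceived value, and an owner), here is a minimal Python sketch. It is not Hyperspace's actual data model. The class names, tool names, owner ID and settings are hypothetical placeholders, and CycloneDX is just one example of an SBOM format, assumed here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of a development procedure (hypothetical model)."""
    tool: str                # hypothetical tool name
    config: dict[str, str]   # settings to apply in this step
    requirement: str         # requirement or guideline this step fulfills


@dataclass
class DevelopmentProcedure:
    """Trigger -> ordered steps -> perceived value, with one owner."""
    name: str
    trigger: str
    value: str
    owner: str
    steps: list[Step] = field(default_factory=list)

    def tools_in_order(self) -> list[str]:
        """Return the tools in the sequence the team should use them."""
        return [step.tool for step in self.steps]

    def requirements_covered(self) -> set[str]:
        """Return every requirement this procedure claims to fulfill."""
        return {step.requirement for step in self.steps}


# Hypothetical as-is description of the open source example from the talk.
use_oss_library = DevelopmentProcedure(
    name="Use an open source library",
    trigger="We want to use an open source library",
    value="Ship code faster by using open source software securely and compliantly",
    owner="oss-procedure-owner",  # hypothetical owner ID
    steps=[
        Step("license-scanner", {"policy": "no-copyleft-in-shipped-code"}, "licensing"),
        Step("export-checker", {"check": "cryptography"}, "export control"),
        Step("sbom-registry", {"format": "CycloneDX"}, "software bill of materials"),
        Step("vulnerability-scanner", {"mode": "continuous"}, "security"),
    ],
)

print(use_oss_library.tools_in_order())
# ['license-scanner', 'export-checker', 'sbom-registry', 'vulnerability-scanner']
```

Keeping the steps as an ordered list mirrors the point that the first version records the as-is sequence. An improved version could later reorder or replace steps without changing the trigger or the value.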

00:16:44

Now, if you combine multiple of those development procedures, you have a paved road. Importantly, the paved road is more than the sum of its development procedures, because it also ensures that they are internally consistent. No development procedure may contradict or conflict with another. For example, a tool configuration in development procedure number one must not contradict the configuration of the same tool in development procedure number five. Also, each development procedure has clear ownership: a development procedure owner, who is the subject matter expert for the whole development procedure topic. And the paved road has an owner, someone who watches over the internal consistency of the whole paved road.
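The internal-consistency rule can be expressed as a simple check. The following minimal Python sketch assumes that each development procedure records its tool settings as key/value pairs, and it flags any setting that two procedures configure differently. The model, the procedure contents and the tool and setting names are hypothetical illustrations, not how Hyperspace actually implements the check.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DevelopmentProcedure:
    """Hypothetical minimal model: a procedure with per-tool settings."""
    name: str
    owner: str
    tool_config: dict[str, dict[str, str]] = field(default_factory=dict)


@dataclass
class PavedRoad:
    """A set of development procedures with one owner for consistency."""
    name: str
    owner: str
    procedures: list[DevelopmentProcedure] = field(default_factory=list)

    def conflicts(self) -> list[str]:
        """Report every tool setting that procedures configure differently."""
        # (tool, setting) -> {value: [names of procedures using that value]}
        seen = defaultdict(lambda: defaultdict(list))
        for proc in self.procedures:
            for tool, settings in proc.tool_config.items():
                for key, value in settings.items():
                    seen[(tool, key)][value].append(proc.name)
        report = []
        for (tool, key), values in sorted(seen.items()):
            if len(values) > 1:  # more than one distinct value: contradiction
                detail = "; ".join(
                    f"{v!r} in {', '.join(names)}" for v, names in sorted(values.items())
                )
                report.append(f"{tool}.{key}: {detail}")
        return report

    def is_consistent(self) -> bool:
        return not self.conflicts()


oss = DevelopmentProcedure(
    name="Use an open source library",
    owner="owner-a",
    tool_config={
        "sbom-registry": {"format": "CycloneDX"},
        "vulnerability-scanner": {"severity_threshold": "high"},
    },
)
release = DevelopmentProcedure(
    name="Release a feature",
    owner="owner-b",
    tool_config={
        "sbom-registry": {"format": "CycloneDX"},                    # agrees
        "vulnerability-scanner": {"severity_threshold": "medium"},   # contradicts
    },
)
road = PavedRoad(name="Hypothetical paved road", owner="road-owner",
                 procedures=[oss, release])
for line in road.conflicts():
    print(line)
# vulnerability-scanner.severity_threshold: 'high' in Use an open source library; 'medium' in Release a feature
```

Shared settings that agree, like the SBOM format here, pass silently. Only genuine contradictions land on the paved road owner's desk.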

00:17:38

Now, we are at the DevOps Enterprise Summit, and I believe you have recognized what I actually showed you in the last slides: value streams. One larger value stream that we call the paved road, and smaller ones that we call development procedures. And that's the trick. We did not invent anything new here; it's the same value stream idea that is decades old. But now it has clear benefits for all parties. For the first time, the engineering teams have an end-to-end description of how to deliver software at SAP in their context. It gives them clear advice on what to do, when to do it and how to do it. What you see here is how the paved roads and the development procedures appear to the teams as documentation. We did not copy or move existing documentation. Instead, we link to the parts of the existing tool and process documentation that are valid in the context of this specific paved road.

00:18:28

So the teams don't get lost in gazillions of documentation and wiki pages. Instead, we point them to the paragraphs that are important in this context: first go to that tool documentation, where that paragraph is important, then go there, read this paragraph and do that. As a central unit, we finally have systemic descriptions that the teams follow for their delivery process, the paved roads, and we can optimize along them. We can see the bottlenecks, we can see the constraints, and we can finally improve the system as a whole, spelled with a "w". The feedback that we have received for our first paved road is extremely positive. For example, the team that worked with us as a validator and reviewer of our work said that if this had existed one and a half years earlier, it would have saved them weeks of work.

00:19:18

I'm not sure what that is in money, but it's a lot, because multiple people would have had an easier life. So what do we want to do next with the paved roads? The first paved road has only been generally available since the end of September, so everything I tell you here is quite fresh experience. We want to extend the scope of the paved roads with more development procedures that describe operations, the portfolio process and even cultural transformation. At the same time, we want to keep the number of core development procedures as small as possible and recombine them to create more paved roads: context-specific paved roads for the various technology stacks and programming languages out there. We will always reuse the existing core development procedures, so that we only have a very small set of domain-specific development procedures that fit just one or two paved roads.

00:20:19

Our first learning in creating the paved roads: limit your scope, otherwise complexity will kill you. Focus on small parts of the overall value stream; divide and conquer. Make the first scope as narrow as feasible, even if it looks too optimistic, utopian or ideal and has too few adoption cases in the real world. Adoption is not the goal in the first place. When you start, the goal is creating transparency and gaining insight into complex structures, and adoption will follow. Start with the as-is situation and leave improvements to later versions. Don't try to improve the situation while describing the system; it just adds too much complexity. And have clear ownership for the various development procedures and paved roads, that is, for the value streams and their parts. That way you always have the expertise to move forward and improve the development procedures and the paved road. The paved roads are just one, very central, component of our Hyperspace CI/CD platform offering.

00:21:29

But let me use the remaining minutes to share some learnings that we have had on the overall platform so far. Currently we have roughly 30,500 pipelines on Hyperspace. What's important for us is that we do not mandate the platform, and we also do not mandate the paved roads. If teams want to create their own delivery process and pipelines, with or without our tools, that's just fine. The main criticism of mandating a platform, tools or paved roads is that it kills innovation. Processes and platforms describe known things: best practices, how things already worked in the past. Innovation tries to explore something unknown, where we do not yet know what works best. So innovation needs freedom, and mandating a platform kills innovation. Put all tools and processes under one ownership; this helps a lot to avoid unnecessary friction in handovers. Paved roads, or golden paths, are a pretty cool thing, as I hopefully could outline in the last minutes. And take your time.

00:22:30

We started all this two years ago, and it still feels like we just started. There are still a lot of things that we have to learn and a lot of things that we have to do. And take the whole team that works on the platform along with you. I have to confess that we could have done this better in the last years, but we are improving. One idea is that we create a storyboard, a visual comic strip that describes a day in the life of a persona and how we imagine working with Hyperspace will look in the future. This helped us check whether we all have the same vision, the same picture in our heads, of what using Hyperspace will look like for a certain persona. It also helps us communicate with stakeholders: "Look, this is how we envision developers' lives will look in the future when they use Hyperspace." We are not sure whether this is 2025 or 2030, but this is the vision we will work towards. I hope I could give you some insights into our internal CI/CD platform and how we work, and maybe one idea or another will inspire you in your daily work. If you want to reach out, feel free to use any of the listed social networks. Thank you very much for your attention.