"DevOps to the Metal": Achieving “Flow” in a Large Organization and In Cyber-physical Systems

This experience report describes the transition of a large medical organization that develops computed tomography modalities towards more agile and lean approaches that improve the speed and the “flow” of the organization and the products it produces. It describes the broad set of challenges faced, especially when developing complex software-intensive systems in a safety-critical and regulated medical domain. This practical case study details the concrete activities, insights, and “lessons learned” from this organization-wide transition, in a structure that allows people or organizations with similar challenges to apply them to their own organizations.


Furthermore, this experience report describes how we apply DevOps approaches not only to software, but to software-intensive cyber-physical systems. Minimizing the downtime of systems during updates is vital for our customers, as is the robustness of in-the-field updates. In a world of complex cyber-physical systems of systems, that is a far from easy task. It requires early continuous * activities, including continuous deployment to in-house systems before the subsequent final release, as well as extremely high transparency of the installed base through a regular feedback loop and data analytics methods.


Thomas Jachmann

Head of Software, Computed Tomography, Siemens Healthineers

Transcript

00:00:12

Welcome to the talk about bringing DevOps into a complex environment of cyber-physical systems of systems, into a large-scale organization, and into a regulated environment. Welcome to DevOps to the Metal.

00:00:30

I work at Siemens Healthineers, and Siemens Healthineers is a leader in the global health industry with impressive numbers. It provides systems, software, and solutions to clinicians worldwide for the best possible analysis and treatment for our customers and patients. I am from one of the segments, computed tomography, and you will find computed tomography in almost every radiology department worldwide. Even today we are a key lever in fighting the global pandemic, as computed tomography serves as the best radiology equipment to analyze and treat COVID patients in the best possible way. My name is Tom. I am head of software of computed tomography. As you see on this picture, computed tomography offers a strong portfolio of software and CT scanners. Most of these scanners are driven by a single software platform, which allows us to use the quality and functionality of this platform across a whole fleet of scanners and bring it to all of our customers.

00:01:56

So this is the story about DevOps and what we did. We are facing many challenges. Not only are we in a regulated environment, but we also have to cope with growing complexity, both within our product and within our growing global organization. We are transforming from a hardware-only business into a world where software becomes more and more dominant, where the key sales aspect is no longer the rotation speed of the gantry alone, but what the software makes out of the images received from the scanner. Systems also become more and more interconnected. You might have attended the talk at last year's conference where teamplay was presented as a global environment. We are incorporating not only teamplay elements, but many, many other elements that extend the boundaries of our systems.

00:03:12

We also have compatibility concerns between the parts of this large-scale system. The hardware, the software, the firmware: all of these elements are changing at their own pace. How do you get all of them together? There we see a huge uncertainty. We also see a fast pace of change, including in the components we incorporate, be it open-source software, off-the-shelf software, or components from other Siemens Healthineers units. And last but not least, innovation cycles are accelerating dramatically. All of this cries out for fast feedback, for fully automated loops, and for building into our systems the ability to leverage these benefits. You might have seen this picture of a typical deployment of a homogeneous cloud application like teamplay. If you go into the world of cyber-physical systems of systems, it looks different. You have not only the scanner itself; it consists of control systems, PC-based elements, and embedded solutions. You have a multitude of individual components associated with very different elements of a scanner. All of them are intelligent, and all of them drive the complexity of a single step, continuous deployment, through the roof.
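One way to picture the compatibility question raised here is a check that compares the component versions installed on a scanner against the ranges a software release expects. This is only a sketch under stated assumptions: the component names, the version tuples, and the rule format are hypothetical and are not taken from the talk.

```python
from typing import Dict, List, Tuple

# Hypothetical rule format: component -> (lowest allowed version, highest allowed version).
Version = Tuple[int, int]
Rules = Dict[str, Tuple[Version, Version]]


def incompatible_components(installed: Dict[str, Version], rules: Rules) -> List[str]:
    """Return components that are missing or whose version lies outside the allowed range."""
    problems = []
    for name, (lowest, highest) in rules.items():
        version = installed.get(name)
        if version is None or not (lowest <= version <= highest):
            problems.append(name)
    return problems


if __name__ == "__main__":
    # Requirements a hypothetical software release places on the rest of the system.
    release_rules: Rules = {
        "gantry_firmware": ((2, 0), (2, 9)),
        "detector_firmware": ((1, 4), (1, 8)),
    }
    # Versions found on a hypothetical in-house scanner.
    scanner = {"gantry_firmware": (2, 3), "detector_firmware": (1, 2)}
    print(incompatible_components(scanner, release_rules))
```

Running such a check before every automated installation would surface a mismatch as a pipeline failure rather than as a broken system, which is one reason the number of independently changing components matters so much.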

00:05:08

So this is a typical DevOps picture, a continuous cycle across various levels. In our world, we are in a regulated world. Of course, the healthcare business has to obey the rules of a regulated environment. We cannot drop and deliver our solutions daily to customers worldwide. There is a regulatory wall. This regulatory wall prevents us from going to the end customer, but it does not prevent us, it even encourages us, to run a fast feedback cycle internally and to deploy and test in user-equivalent environments along the way, so that the push of the final button, the release, becomes a non-event even with this regulatory wall. Of course, there needs to be diligent verification and validation. That is a formal part, but we can already make sure that everything up front is right, so that the V&V phase is simple and quick and does not reveal too many new findings.

00:06:31

And this is where we started. We realized many years ago that we had to invest. The first thing we invested in was efficient software integration. The whole organization was growing, so integration became more and more the bottleneck of continuous delivery, and the goal was to make the release a non-event. We also had to invest in continuous deployment. Automatic system installation on such a scanner, which as you have seen is full of so many subcomponents, is not an easy task. It is not like doing it in a homogeneous cloud-based environment, where you basically have everything ready in place to accommodate this. Here you are in a novel environment where you have to discover your own ways and solutions. But even after being able to do so, we were still on a six-week cycle, because many, many steps in between were still manual. So we invested even more, and we came down to a daily, fully automatic cycle and the ability to deploy our software on each CT scanner in house automatically, without any intermediate manual steps. From then on we could really call this continuous system integration: a huge step forward, a major leap, you might think. But it was already at this point that we almost lost our DevOps endeavor in our organization.

00:08:22

Why? Well, we realized very quickly that we had high goals. We set KPIs, metrics, basically our north star, but we did not reach those KPIs. Yes, we could deploy automatically on a scanner. Yes, we were very close to green-to-green. Yes, we were able to run fully automatic tests on a multitude of scanners to see what happened in between. But those tests revealed issues, and we had to fix those issues. Equally, the steps in between revealed issues. Of course, we had changing software and changing hardware, and both together did not allow us a daily green-to-green deployment at the beginning. So some people were already saying: it is not worth the effort, let's rather focus on something else. What did we do? We realized that you have to create what we call plateaus: distinct steps, and for each of them we found people.

00:09:28

We found groups, we found roles who were excited about the new ways of deployment and the new ways of integration. They basically helped us to continue our DevOps endeavor, because they realized the benefits for themselves. We brought them into the picture; we made them stakeholders of this DevOps endeavor. As you see, there might be developers or development teams in there, and there might be system engineers. Of course, in our system-focused DevOps environment, the system engineers play a wider role. Not all of those steps were fully automated in the beginning, but we had people behind every one of those layers, and every one of those layers was something people were working towards because they felt the benefit for themselves. So what did we learn in the early stages? We have to keep a balance. We almost lost ourselves in the final details of automatically deploying to a scanner.

00:10:52

Those details do not always come from the software side. A lot of times it was a hardware change or a change in the test environment that caused the problems on the pipelines, but that became the predominant perception of our DevOps endeavor. What we realized is this: if you balance that with the stimulus to reach the next plateau, because you have already conquered the previous one and are just not done with the last final steps, then you can maintain the spirit of DevOps. Don't rush to the next topic. Don't leave all of the topics behind half finished, because that will leave you with a huge bag of technical debt. But understand when you have really reached a plateau and secured it. Then, while finishing those last steps, which might take you a year or more, set a new step in a new direction to keep the spirits up.

00:11:56

The second topic where we almost lost the organization was that we set this north star as a KPI, which stayed red because we had not reached it yet. Of course we had not. But by doing so, we were also not able to celebrate intermediate successes very visibly. So the lesson we learned is that we had to create reachable, beneficial plateaus, as you have seen on the slide before. If a plateau is reachable, you can celebrate success, because you will get some success out of it. Celebrating even small victories is very important to keep the spirit of your organization directed towards your DevOps endeavor. And remember, it is all about conducting experiments and measuring the results. Some of you might know the Cynefin complexity model.

00:13:06

On the right side of this slide, starting in the lower corner, we have a simple environment and a complicated environment. Those are the ones where you already have established best practices. But if you move into the complex space, you will realize that there is no predefined solution to your problems. Even more, you cannot find one path from the start to the goal that you can plan up front very diligently, follow to the end, and be successful. There you have to stop and rewind. As Cynefin tells us: probe, sense, respond. You set up a KPI that guides you in the right direction. You run an experiment, you see what the outcome of this experiment is, and then you respond to the results. If it is leading in the right direction, great. If it is not, you learn something new and you have to readjust. But by doing so on a regular basis, you are not wandering off too far in the wrong direction, and even the wrong direction gives you insights into your organization.
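The probe, sense, respond cycle can be sketched as a small loop: try one change, measure the KPI, keep the change only if the KPI improved. This is a deliberately simplified illustration, not the team's actual process; the `Experiment` type, the experiment names, and the KPI values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Experiment:
    """A hypothetical small change to try, for example a pipeline or process tweak."""
    name: str
    apply: Callable[[float], float]  # maps the current KPI value to the value measured after the change


def probe_sense_respond(kpi: float, experiments: List[Experiment]) -> Tuple[float, List[str]]:
    """Run experiments one at a time and keep those that improve the KPI (higher is better)."""
    decisions = []
    for exp in experiments:
        measured = exp.apply(kpi)                   # probe: run the experiment
        if measured > kpi:                          # sense: compare with the current KPI
            kpi = measured                          # respond: keep the change
            decisions.append(f"keep {exp.name}")
        else:
            decisions.append(f"revert {exp.name}")  # respond: readjust, but keep the learning
    return kpi, decisions


if __name__ == "__main__":
    experiments = [
        Experiment("parallelize install", lambda k: k + 0.10),
        Experiment("bigger test batches", lambda k: k - 0.05),
        Experiment("stub slow subsystem", lambda k: k + 0.20),
    ]
    final_kpi, decisions = probe_sense_respond(0.50, experiments)
    print(round(final_kpi, 2), decisions)
```

Because each experiment is evaluated before the next one starts, a step in the wrong direction costs one iteration at most, which mirrors the point about not wandering off too far.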

00:14:25

So, as lessons learned at this first stage, let me give you one example. I am talking about the plateaus for continuous integration here. You can see this simplified chart of the number of tests we have in our platform. Yes, I could have shown you something like test coverage, but that would not have made this example as illustrative, so let's stick to this KPI for a moment, knowing there is more to watch. What you can see here is that the number of our tests is going up continuously, and we are in the range of 150,000. This is good. The second thing, after we put the check mark behind coverage, is that if you look at the numbers, the unit tests are the most numerous, then come the integration tests, then the subsystem tests. So they really form a pyramid. You might have heard about test pyramids that have the shape of an hourglass, or even a test pyramid that is standing on its tip.

00:15:45

But in our case, I think after redefining our test pyramid three times, we really reached the point where our test pyramid is a real pyramid. Wonderful, so we have a pyramid. The third part: if you look at the execution time of all of those tests, you see that we are at 80-plus hours of execution time for all of those tests. We knew up front that when we grow the number of tests, we continuously have to invest in a scaling infrastructure, and we did. But at that point we almost lost our development teams, because we got alarming signals out of the teams saying: we can still run our impact-based rolling tests without any problem, but normally every team runs a nightly build every night in which all of the tests run. Those nightly builds, unnoticed at the beginning, had grown in the time they required until they no longer fit into a night. They grew to eight hours, ten hours, twelve hours, while we continuously invested in our test infrastructure.
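A rough way to see why a nightly build can stop fitting into a night: total test execution time divided by the number of parallel test workers gives a lower bound on wall-clock time. The sketch below assumes tests parallelize perfectly. Only the 80-hour order of magnitude comes from the talk; the worker counts and the 8-hour window are illustrative assumptions.

```python
import math


def nightly_wall_clock_hours(total_exec_hours: float, workers: int) -> float:
    """Lower bound on wall-clock hours, assuming tests parallelize perfectly."""
    if workers < 1:
        raise ValueError("need at least one worker")
    return total_exec_hours / workers


def workers_needed(total_exec_hours: float, window_hours: float) -> int:
    """Smallest worker count whose lower-bound wall-clock time fits into the window."""
    return math.ceil(total_exec_hours / window_hours)


if __name__ == "__main__":
    total = 80.0   # hours of test execution, the order of magnitude quoted in the talk
    window = 8.0   # assumed length of a "night" in hours
    for workers in (4, 8, 10):  # hypothetical infrastructure sizes
        hours = nightly_wall_clock_hours(total, workers)
        print(f"{workers} workers: {hours:.1f} h, fits={hours <= window}")
    print("workers needed:", workers_needed(total, window))
```

The bound is optimistic, since real suites have setup costs and tests that cannot run in parallel, so the actual infrastructure demand grows faster than this arithmetic suggests.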

00:17:12

So we realized something was wrong. And if you look very closely, you see what is wrong: the number of subsystem tests is very, very small, but at the same time they consume almost 80% of the overall execution time. We had done something very, very good. We created a test environment and a framework, and even a modeling language for testing, which made the creation of subsystem tests very, very easy. But on the other side, people no longer thought much about it and simply created subsystem tests.
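The imbalance described here becomes visible when each test layer's share of the test count is compared with its share of execution time. The per-layer numbers below are invented for illustration; they are chosen only to resemble the shape described in the talk (roughly 150,000 tests, 80-plus hours, subsystem tests at about 80% of execution time).

```python
from typing import Dict, List, Tuple

# layer -> (number of tests, execution hours); all values are hypothetical
LAYERS: Dict[str, Tuple[int, float]] = {
    "unit": (130_000, 6.0),
    "integration": (18_000, 10.0),
    "subsystem": (2_000, 64.0),
}


def shares(layers: Dict[str, Tuple[int, float]]) -> Dict[str, Tuple[float, float]]:
    """Return layer -> (share of test count, share of execution time)."""
    total_tests = sum(count for count, _ in layers.values())
    total_hours = sum(hours for _, hours in layers.values())
    return {
        name: (count / total_tests, hours / total_hours)
        for name, (count, hours) in layers.items()
    }


def time_heavy_layers(layers: Dict[str, Tuple[int, float]], factor: float = 5.0) -> List[str]:
    """Layers whose share of execution time exceeds `factor` times their share of tests."""
    return [name for name, (count_share, time_share) in shares(layers).items()
            if time_share > factor * count_share]


if __name__ == "__main__":
    for name, (count_share, time_share) in shares(LAYERS).items():
        print(f"{name:12s} tests {count_share:6.1%}  time {time_share:6.1%}")
    print("time-heavy layers:", time_heavy_layers(LAYERS))
```

A pyramid that looks healthy by test count can still be top-heavy by execution time, which is why tracking both shares per layer is a more informative KPI than the test count alone.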

00:17:58

All of those subsystem tests require the startup of the full platform to be executed. So we thought we were almost done with the topic of continuous integration when we got drawn back to the drawing board, because we realized we were missing something in the middle, between the subsystem tests and the integration tests. And now we are investing here again. We invest in something that mocks and stubs various subsystem components in such a way that we do not have to start our whole platform to run the tests, because we realized we were running into the problem that we can no longer scale our test infrastructure as quickly as we would need to in order to run all of those subsystem tests. So we could say we failed, because we went almost all the way to the end only to realize something was wrong. But in the end, we did not fail. We just uncovered the next level we need to reach, the next plateau, in order to move forward in our endeavor. And the next plateau is, of course, to shift left even more, to make sure that we keep a balanced test pyramid, and to establish a new layer so that the pyramid is balanced not only in the number of tests but also in execution time.
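As an illustration of mocking and stubbing a subsystem so that a test does not need the whole platform running, here is a minimal sketch using Python's standard `unittest.mock`. The `ScanController` class and the gantry interface (`is_ready`, `rotate`) are hypothetical names invented for this example; they are not the platform's API.

```python
from unittest.mock import Mock


class ScanController:
    """Hypothetical component under test; it depends on a gantry subsystem."""

    def __init__(self, gantry):
        self.gantry = gantry

    def start_scan(self, rotation_time_s: float) -> str:
        if not self.gantry.is_ready():
            return "rejected"
        self.gantry.rotate(rotation_time_s)
        return "scanning"


def test_scan_starts_when_gantry_ready():
    gantry = Mock()                          # stands in for the real subsystem
    gantry.is_ready.return_value = True      # stubbed answer, no hardware or platform needed
    controller = ScanController(gantry)
    assert controller.start_scan(0.5) == "scanning"
    gantry.rotate.assert_called_once_with(0.5)


def test_scan_rejected_when_gantry_not_ready():
    gantry = Mock()
    gantry.is_ready.return_value = False
    controller = ScanController(gantry)
    assert controller.start_scan(0.5) == "rejected"
    gantry.rotate.assert_not_called()


if __name__ == "__main__":
    test_scan_starts_when_gantry_ready()
    test_scan_rejected_when_gantry_not_ready()
    print("ok")
```

Tests of this kind run in milliseconds because nothing is started, at the price of trusting that the stub behaves like the real subsystem, so they complement rather than replace a smaller set of full subsystem tests.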

00:19:36

I wish somebody had told us before: DevOps brings transparency to your whole organization, regardless of which of those boxes you open up. I am only looking very quickly into the continuous deployment view for this talk. There, you will find pieces that fit your DevOps endeavor exactly. You will find pieces where at first look you say, what is this? But then you realize: if I tweak it, if I change it slightly, it works quite well. And then you will uncover the skunks. You do not want to have them on your front porch, and you might not want to have them in your DevOps endeavor, but it is certain that you will find all of the beasts of your organization under there. Now you can do two things. One possibility is that you turn away from DevOps because you say, sorry, that is too much. Or you embrace the change and say: it is important to uncover those issues. And even going beyond that: I am not only celebrating my successes, I am also celebrating the people and the failures.

00:21:04

And if you are doing so, you are well under way with your DevOps endeavor. If you don't, then you might fall short. Let me share another story. I don't know what you do on a sunny weekend when you are chatting with your neighbor, but this happened to me just a few weeks ago, when I was chatting with my neighbor about DevOps. He works in an industrial environment, and he basically got a mandate from his boss to establish DevOps in his organization. But at the same time he was told: you have 500 grand at most, and you get this check mark behind DevOps within the next two years at most. I was trying to tell him that this is the wrong approach to DevOps. DevOps is not something where you can predefine and execute a plan with this budget and this time frame and you will succeed. It is more about the journey.

00:22:23

So, regardless of where you are coming from: you could have started in the test automation arena. You could have started with the quest to be a more efficient software development organization. You could have started somewhere in software integration; this is mostly where we started, because we felt that integration in the software realm became more and more of a burden. Or you could come from the product quality area. Or you could go towards shifting left your regulatory-required V&V activities, your verification and validation activities, and bringing them much closer to the software. You could also look from the customer perspective and say: in a clinical environment, every downtime of the scanner is a problem. You cannot turn off a scanner sitting in an intensive care unit or an emergency department and install an update for 10 or 12 hours. So it is all about minimizing the downtime and maximizing how much customers can rely on their computed tomography scanner.

00:23:44

And of course, it is also about how smoothly the installations, the deployments, work for the customers. Do they have to do it manually? Do they have too many individual steps? Or is it basically updating automatically in the background for them, and they just acknowledge the change? Regardless of where you come from, all roads lead to Rome, as they say. And if you want to reach Rome, with flow, continuous system integration, the operational ability to release continuously, trunk-based development, continuous deployment, and continuous software verification, the way is not straight. It is not well defined. It is not a paved road on which you can go full speed. It will force you to sidetrack. It will force you into detours, maybe even cycles. In some cases you might decide to add new stories next to it, like a digital twin. But it always helps your organization to grow in a continuous fashion.

00:25:09

You will uncover and learn new things about the organization, and it will drive your ability and your willingness for continuous improvement basically to the boundaries. Your organization, after going through this journey, will not look like it looked in the beginning. And yes, we in our organization are possibly facing more challenges than others who deploy in a homogeneous cloud-based environment. But this also means that we can benefit even more from this journey, because we will see more elements working together much better than before, be it system approaches or deploying to our customers in a highly operative way.

00:26:09

So if you are willing to embrace continuous change and you understand that DevOps is a journey, you will realize that DevOps will lead you somewhere better. Here are the key takeaways from my presentation. It is a culture, it is a mindset. A culture and mindset of continuous improvement is a very decisive factor that you have to have in mind, and that also has to be in the minds of the management. If your management is not buying into this as a journey but is looking at it as a check mark behind a topic, then you will struggle along the way. Rather discuss it up front than struggle in between.

00:27:09

It is also about establishing fast feedback cycles that are as automated as possible. You have seen that in some intermediate steps it might not be fully automated in the beginning, but you will work towards it. And even if you are not doing continuous delivery to your customers, as you might also be in a regulated environment, the benefits along the way are so great that you can stick to it for the greater good of the organization. And you have to learn. We are in a complex environment where probe, sense, respond is the only approach: run experiments and measure. But put the right KPIs into place. Don't set your KPIs at a level where you are managing only by a north star, or you will lose the ability to celebrate and to have success stories along the way. It is okay to occasionally look at your north star, but it is more important to have intermediate KPIs that show you that you are working for the greater good. Strive to meet the overall goals, with everybody involved, in all roles.

00:28:34

If the people do not feel what is in it for them, they will be lost over the long term. Since it is a journey, the people need to be kept entertained, and the only way to entertain them is to create wins for them. And the only way to create wins for them is by addressing their specifics, making them stakeholders, making them protagonists of your DevOps endeavor. And accept that your approaches will not show the results you expected at the beginning. You will uncover other topics in between, like the skunks on your front porch. But by doing so and treating them correctly, celebrating them in a proper way, your organization will grow on the journey towards DevOps. Thank you very much for listening to my talk. I am happy to meet you on Slack, to hear your questions, and to understand what your secrets are. Let's see if I can also learn from you. Thank you very much.