Building a Successful Platform Team

CROZ is a mid-sized professional services company with around 300 tech-oriented people. We use our technical, domain, and organizational expertise to help our clients build better products and achieve their business goals.


The problem we encountered:


With the speed of innovation higher than ever and the technology more complex than ever, we realized that our existing teams couldn't keep tabs on every moving part in the new technology stack that was emerging in the cloud.


Apart from working with our clients on designing and implementing new features, our teams became overwhelmed with the nuances of the underlying technology platform. Additionally, with each team using the platform as they saw fit, we experienced a proliferation of delivery process variations and various practices to use the platform. It wasn’t even clear what a platform is.


The net effect we have observed was people spending time designing their variation of the delivery process and reinventing existing practices. We have also observed a cumbersome team onboarding process due to the lack of standardization, and technical debt skyrocketing just before the major technology platform migration was to take place.


All of the above was reflected in a significant decrease in the flow of value to the clients.

What did we do to overcome the problem?


Looking for a different way to structure our teams and coordinate work, we have found our inspiration in four fundamental team topologies and their interaction modes from the book Team Topologies.


A year after introducing the platform team and establishing interactions with the rest of the organization, we found that the new team structure better promotes collaboration and knowledge sharing, relieves cognitive load from existing teams enabling them to focus on delivering value to our clients, and serves as additional leverage for further organizational improvement initiatives.


Thinking in terms of four fundamental team topologies and three core interaction modes cleared up some ambiguities around roles and responsibilities in the delivery process. This enabled teams to focus on what they love and do best, motivating them to further build their skills. It also made the role of every team member in the delivery process transparent. No other past initiative produced such engagement among people.


In every sociotechnical organization, technology aspects and organization aspects are tightly intertwined. Looking back, we have changed the former without considering the latter and the system pushed back with a tangible manifestation through friction and bottlenecks occurring in the delivery process.


In this talk, we will share the experience of our on-going sociotechnical transformation, and changes that worked for us, but also some that didn't.

Ivan Krnić

Director of Engineering, CROZ

Transcript

00:00:12

Hello everyone. My name is Ivan Krnić, and I'm so glad I can be part of the DevOps Enterprise Summit this year. It is truly humbling to be here with you today. I come from CROZ. We are a professional services company that works across Europe. We help our clients deliver better software solutions, and we do this by helping them in four key areas: cloud native development, complex integrations, data science, and business agility. In this talk today, I will share experience from our own journey in searching for a better way of working and how DevOps principles and the Team Topologies approach helped us come closer to that ideal. To give you a bit more context, CROZ is a professional services company. That means that we don't focus so much on our own products. Instead, we work with our clients and help them deliver the best possible software solutions that would enable them to reach their business goals.

00:01:07

Our journey has been happening in various forms and shapes for the last 12 or 13 years. Especially interesting were the last three or four years, since the speed of innovation really skyrocketed, as well as the complexity of the technical landscape. In this complex world, our delivery teams started to lose pace with the technology. What we were experiencing was not enough communication in teams and between teams, which resulted in low knowledge sharing, which further resulted in continuously reinventing the wheel. Although some teams had already solved some problems and established good practices, other teams were still struggling and, what is worse, they were inventing new ways to solve those problems. As a result, the level of standardization was low. For every new project, for example, we were asking ourselves which technology stack to use. We were a group of people with many skills, capable of delivering in any technology stack prescribed, but we were also bad at standardizing stuff.

00:02:06

And all of that further resulted in an expensive delivery process. Reinventing the wheel is, as I'm sure you know, not an easy task, and it takes both time and money to do it right. And that directly impacted our delivery schedule and our bottom line. We found this to be a huge space for improvement, and we hypothesized that we could gain improvements by changing the way our teams are structured and the ways those teams interact. The result of this experiment was pretty neat. We found improved collaboration, improved knowledge sharing, and lower cognitive load on the teams, which positively impacted our delivery process in terms of shorter lead time, less friction in the delivery process, and higher team engagement. All of this also created a positive atmosphere and prerequisites for subsequent organizational improvements, not to mention a positive impact on business results. So let's dive deep into the details of how this happened, to better understand our journey.

00:03:08

Let's go back to the early days of CROZ, in 2008. For three years in a row, we were shortlisted on the Deloitte Technology Fast lists for EMEA and Central Europe. During this period, we doubled in headcount, from 60-something to around 120 people. All this put additional stress on our organizational structure and processes dealing with managing people, managing projects, and managing knowledge in the organization. Friction was obvious. For every new project, for example, we would form a new project team, and it would take this team a solid three to four months to get into a really productive state, and all of this on a six-month project. Additionally, since we were growing fast, we had a lot of new colleagues who were missing the skills and knowledge to be effective in the team. Our first turning point happened in 2010, when we realized that we could not improve further using the concept of project teams that form for the project and adjourn after the project.

00:04:10

So we started forming long-standing cross-functional teams, with the idea of better knowledge sharing and having at least a couple of people ready to take on a particular task. In other words, we wanted to eliminate situations where people are the bottleneck and the world would stop if they went on vacation or something similar. This move to long-standing teams worked perfectly. We achieved so many benefits here, like decentralized work management, elimination of bottlenecks, increased efficiency, and better knowledge sharing. What our teams continuously aspire to is to own as much of the delivery process and supporting infrastructure as possible, in order to reduce dependencies and increase the flow of features. Apart from the delivery teams, we also had a team called the internal IT team. This team takes care of all the hardware in the company and provides servers up to the point of the operating systems. They also take care of other infrastructure services, such as email and file sharing.

00:05:13

They take care of the labs that we work on, and so on. But what is important here is that the internal IT team would provision a new virtual machine to the team, and then the team would install and further maintain all necessary middleware components, such as application servers, databases, integration components, and similar. This allowed the delivery teams to own all the environments and configure them as they saw fit. Every delivery team had the skills to open up an administrative console of, for example, an application server, to configure database connections and all other artifacts, and to deploy the application. But then cloud native happened, with all of its upsides, such as moving to the cloud, scalability, elasticity, low technical and administrative barriers to entering the cloud, a new mindset that supports modularity and loose coupling in runtime environments, and better automation tools. More automation leads to shorter feedback loops.

00:06:15

Short feedback loops, in turn, enable us to learn faster. So it's easy to see why the cloud native approach and cloud environments are so appealing to modern customer-oriented companies. But cloud native is not easy, and there are a number of challenges, especially for incumbents in the industry. Cloud native is much more complex from a technical point of view: there are literally thousands of products and providers in the current CNCF landscape. Cloud native also introduces a different development paradigm, one that is focused on decoupling and modularity, especially in runtime. It also requires a different organizational setup, one based on autonomous teams, and a different organizational culture, one that supports learning, collaboration, and knowledge sharing, one that uses mistakes as a learning mechanism and not as a trigger to get somebody fired. We jumped, of course, on this cloud native train.

00:07:13

And I remember it was early 2016 when we installed our first OpenShift cluster. At first, we were just piloting in our lab, but very soon we started running real projects on it. And soon we stumbled on each and every one of the challenges that we mentioned before. The cloud native approach changed everything and introduced new concepts, such as cloud runtimes, containers, and container orchestrators. All of that deeply affected existing practices regarding configuration management, deployment pipelines, deployment strategies, and vulnerability scanning, just to name a few. These additional moving parts were the straw that broke the camel's back. It became impossible for our teams to simultaneously deliver new features and maintain the new infrastructure stack. From our own experience, we knew that teams were a powerful organizational concept and the key to solving this problem, but still we were circling around, trying to apply it correctly to this new situation.

00:08:17

Right about that time, I heard about a book called Team Topologies, and I loaded an audio version on my mobile phone. I checked: it was the 11th day after it was first published. This book suggested a different way to form and operate teams in an organization. Team Topologies is a collection of best practices in designing an organizational structure and team interactions in order to achieve a higher goal, which is, in our case, increased business agility. The approach is very conscious of the cognitive load that is present in organizations today. To put the cognitive load under control, this approach suggests considering four distinct team types: stream-aligned teams, the ones that deliver features to the clients; platform teams, the ones that build and maintain the platform that helps the stream-aligned teams do their work better; then there are enabling teams, the ones that evaluate new things and introduce them to other teams.

00:09:17

And finally, we have complicated-subsystem teams, for cases where the skill set is so specific that it makes no sense for every person in the company to have that knowledge. There are also three standard ways in which these teams can interact, but I won't go into details here because I'm sure you're all aware of this approach. This was in 2019 and, after forming long-standing cross-functional teams, this was our second turning point. The concept that stuck with me the most was the notion of a platform team. If you have stream-aligned teams that deliver value to the end users, let's not overburden those teams with the additional cognitive load of maintaining the platform, because the platform is becoming more and more complex every day. Rather, let's build the platform and provide it as a service to stream-aligned teams. This way, stream-aligned teams can focus on delivering value to the end users, which is what they do best in the organization.

00:10:17

Anyway, the team that is providing the platform as a service is called the platform team. By doing this, we are splitting the cognitive load between the stream-aligned teams that are using the platform and the platform team that is maintaining the platform and evolving it further. Splitting the cognitive load this way improved the efficiency of our teams. Once our stream-aligned teams didn't need to focus so much on the platform, but could rather use it as something that works and is provided as a service, they found it much easier to focus on their everyday work, which is delivering value to the end users. There is always the famous question of what the platform really is. There is no single definition of what a platform is. It basically depends on the organization. As Matthew Skelton wrote, a platform is a curated experience for engineers. In our case, our platform at CROZ is based on the Red Hat OpenShift Container Platform.

00:11:16

Our platform team maintains this Kubernetes-based layer, but also some of the services and features above it that are necessary for stream-aligned teams. This includes, for example, monitoring tools, observability tools, and some security features. Also, let's not forget self-service features, because what we want here is the least possible friction between stream-aligned teams and the platform team. If stream-aligned teams need a particular resource, it would be best if they could get it through some kind of self-service platform. In our vision, a self-service portal is a component that complements the Kubernetes-based layer and brings additional value to the core platform. In that sense, the platform is larger than just the runtime engine. It also encompasses a self-service component that provides a user interface and APIs for the stream-aligned teams.

00:12:11

Our platform evolved over time to better support situations we faced in real life. For example, being a professional services company, we develop systems for our clients. Many times the client wants to move to cloud native but has no supporting infrastructure set up, and we need to help them build their platform. For that purpose, we have built automation scripts using Ansible that help us install and configure the container platform. We have also extended our platform to use GitOps principles. More specifically, we are using the Argo CD tool, so the configuration of each application can be stored and managed. Including this in the platform enabled all teams to just use it. When we help our clients in setting up their platform, they always need a basic deployment pipeline that they can quickly use, but that can also serve as an example. That's why we extended our platform with a basic deployment pipeline that can be quickly customized to a specific need.

00:13:11

Part of our platform is also, let's call it that way, a standard application stack. It is a stack that we will use if the project is greenfield and we have full autonomy to choose the technology. For this purpose, we have formed a special team called the Incubator. This team has two main goals. The first one is to define and further evolve our standard application stack based on typical use cases that we face and the current state of the technology. And the second goal is to onboard and enable junior colleagues to use this stack and to embrace good engineering practices, such as working in small batches, code review, writing tests, and so on. Our Incubator team is a good example of what Team Topologies calls an enabling team. Enabling teams increase organizational capabilities. They do so by evaluating new technology and introducing it in a controlled fashion. They also educate other teams on how to use it.

00:14:11

They are a reconnaissance team that puts in the effort to explore new areas so other teams don't have to. At CROZ, enabling teams come in various forms. The Incubator team that I mentioned is an example of a long-standing team that continuously explores the state of technology. On the other hand, we had an example of an enabling team that was exploring Groovy and Grails technology. After this team concluded that the stack was fit for purpose and enabled other teams to use it, this enabling team was dismissed. The platform team also serves as an enabling team, keeping an eye on platform-related technology and helping other teams to use the platform. Professional services companies are a bit different from product-oriented companies, that's for sure. They have a different business model, and that also reflects on the kind of support that the platform needs to provide to other teams.

00:15:09

Some of the specifics that we found are the following. First, the platform needs to support many more products because, as a professional services company, we work with various clients to solve their business problems. Different problems require different technology stacks and, consequently, not all products that we build use the same technology stack. Our clients use the technology that best suits their needs. Therefore, our internal platform aims to be the greatest common divisor that covers the needs of all these technology stacks. For us, it is a continuous effort to strike that perfect balance of standardizing as many components as possible by putting them in the platform, but also giving stream-aligned teams enough flexibility to build the best possible solution for every client by letting them use the components that best serve their needs. Standardization is good. It helps with knowledge sharing, onboarding new people, and debugging problems. But standardizing too many components by putting them in the platform imposes additional constraints on the types of solutions that stream-aligned teams can produce.

00:16:22

And we want to retain a piece of that autonomy. Second, platform development is partly constrained by clients' technical decisions. A professional services company helps client companies build better digital products to achieve their business goals, and therefore a professional services company needs to fit into constraints that are already set by the client company. Sometimes this doesn't matter, but sometimes not being able to choose your own tooling feels like solving a problem with one hand tied behind your back, and you need to find new creative solutions. Third, the platform needs to provide a wider range of services. Since we are building various products that will be used in various client environments, and in order for our development environment to be as similar as possible to the client's production environment, we want to use the same services as the client. For example, for logging purposes, we use both the ELK stack and Loki with Promtail and Grafana.

00:17:29

For some projects, for example, we need to provide OpenShift version 3.11, and for others, we need the latest version, 4.6, and the platform needs to support this. Fourth, because of all this, the cognitive load on the platform team in a professional services company is, in our experience, higher. Fifth is the interesting question of who bears the cost of building the platform in a professional services company, since it is not built for a specific product, as in a typical product company. Should projects somehow proportionally bear the cost of the platform, or should that cost be treated as a company infrastructure cost? We opted for the latter option because it is simple enough for us. And sixth is our final observation: a professional services company usually helps its clients set up their platforms and, in effect, acts as an enabling team for the client.

00:18:32

This means that people working on our platform team have significantly more external consulting engagements than they would have if we were building our own platform instead of delivering services. Who drives the platform evolution? At CROZ, initiatives come from two directions. The first direction is from the platform team itself. Since the platform team lives with the platform every day, they know its strengths and weaknesses. They can recognize room for improvement. The platform team also keeps an eye on new trends and suggests new capabilities that could help stream-aligned teams. The second direction is from stream-aligned teams. We consider this direction even more important because the sole reason the platform team is building the platform is to help the stream-aligned teams. In a way, the platform team recognizes and treats stream-aligned teams as their clients. And in true product management fashion, the platform team needs to take care of the platform and talk with stream-aligned teams to understand their needs, their struggles, and their challenges. In essence, the platform team needs to learn in which way to further evolve the platform and bring the maximum value to stream-aligned teams.

00:19:53

We also learned some things the hard way. For example, what happens when you base platform evolution too much on the bottom-up approach, that is, when it's driven from the platform team? In one specific example, the platform team, wanting to make the platform more secure, introduced the concept of sealed secrets and made it mandatory. They certainly increased the security level, but the problem was that this change came from the platform team without any consultation with the stream-aligned teams. From the platform team's point of view, this was a no-brainer, but for the stream-aligned teams, this was a major change that impacted their delivery process. It now took them way too much time to do regular stuff, and we had a problem. This dynamic is something that took us a little bit longer to figure out. It is not something that came naturally. In our case, we have traditionally pushed platform evolution more from the bottom up. As a result, the platform was on the brink of being disconnected from the stream-aligned teams, because people in the platform team were implementing some ideas that were not in line with the needs of the stream-aligned teams.

00:21:03

What we missed here, making it a good lesson for other organizations, is to start treating the platform as a product. And the only way to treat something as a product is to recognize who we are building it for. Who are our personas? We need to understand the customer journey of these people and how the platform can help them. It doesn't differ that much from typical product management work. For that, I strongly recommend Escaping the Build Trap, a book that taught me that product management is not only for people building tangible services for customers. It is a necessary skill for anybody who wants to do anything worth doing in a world full of uncertainties and distractions, and that includes running transformations and building platforms as well. All the activities and techniques that we would use if we were designing and delivering a product like a web application are also applicable here.

00:22:00

This comes naturally when we are building a web app, but it doesn't come naturally when we are building a platform, and it is something that we need to do consciously. Once we made this mental switch of treating the platform as a product, things started to move in the right direction. This little shift in perspective made the biggest change for us. We needed to change the dynamics from two teams driving the platform evolution, each from its own direction, to two teams working together to figure out what's best for the platform. And looking at this image, it seems that it is easy to do everything seemingly by the book but still repeat the same conceptual mistakes. Because when you look at this image to the left, what is this if not the classical dev versus ops problem, but this time disguised in some other keywords? To get that kind of healthy collaboration, we needed to have all communication channels open across the company.

00:23:00

Some of the practices that helped us are, for example, the platform team talking directly to its users and getting to know firsthand what it's like to use the platform. Then we have Tech Thursdays, which are sessions in which anybody in the company has an opportunity to share with others interesting things that they have learned, or share mistakes that they made so that others don't repeat them. We rely heavily on communities of practice, where people share ideas and experience and talk about ways to improve the practice. Some of the larger communities are the communities of practice for team leaders, for product management, for project management, and for people management. Flash News is a recurring event where the platform team broadcasts what has been done on the platform recently and announces general future directions of the platform. The platform team itself has weekly synchronization sessions, where they share what has been done the previous week, what we plan to do the next week, and how this is all helping us reach our goals.

00:24:08

The weekly cadence works well for us, and I'm glad I found confirmation for this approach in a great book, The Team That Managed Itself, and in a conversation with the author, Christina Wodtke. A great book, I highly recommend it. Of course, the platform is not a silver bullet. There are many challenges that we face that cannot be solved by any platform, like, for example, what to do about maintaining long-living products over time. We have been working with some of our clients for 15 years now on some of the systems. Moving away from a project mindset towards a product mindset requires a different approach that has less to do with the platform and more to do with organization, culture, and leadership. One of the challenges that we are facing is, for example, how to retain the vintage skills in the team. And I'm deliberately putting vintage in quotes here because we have seen people calling vintage everything that wasn't the latest and greatest. The same goes for developing new engineering skills as well.

00:25:10

This is where the concept of long-standing teams helped us. Having a team that is not dismissed after the project, but one that stays together, enables us to retain the skills, to always have enough people familiar with the technology, and to always have good mentors. We also need to embrace reality, and that is that people leave. Some go to other teams as a part of their personal growth, some leave the company. But if these changes happen in a controlled manner, then the team can handle it. The team can assimilate new members, adapt, and come out the other side stronger than before. The platform cannot help here, but smart leadership can: the one that understands that teams are the basic organizational building blocks, the one that is aware of the importance of cognitive load when forming teams, and the one that provides full support in terms of a healthy environment where teams can thrive.

00:26:09

I'd like to briefly go back to that issue of using technology that is currently not the latest and greatest. I'm sure that most of you know what kind of machine this is in this picture. Some of the prettiest and most reliable houses I've seen were built this year using this machine. So it's not about tools. It's about vision, skills, principles, craftsmanship, and most of all, it is about the team that you are building with. In the end, I'm very happy with the results of our little experiment. We have confirmed the blueprint for how things should be structured. We have increased collaboration and knowledge sharing. Splitting the cognitive load relieved much of the pressure from stream-aligned teams and enabled them to focus better on delivering value to end users. The platform became a catalyst for change. New technical practices were introduced and standardized, and we noticed increased engagement among people. The changes had an obvious impact on our organization and delivery process, and this was bound to be reflected in business metrics as well. My colleague, a board member at CROZ, has his own view on this. Krešo?

00:27:24

A few years ago, we were living happily and working with a limited number of local clients. But then we decided to expand to Europe and North America. Soon, we realized that our existing delivery model didn't work anymore. We got lost with the increased number of projects. We struggled with too-frequent switching between projects and with problems in knowledge sharing. So we had to reinvent the way we work. We simply had no choice. By the way, they say people are generally forced to change. In our case, the motivation was intrinsic, but very, very strong. After we implemented all those nice things Ivan has explained, we discovered a very pleasant side effect. Our revenue increased 70% while headcount grew only 10%. I guess I don't need to explain our satisfaction. It's something that every IT service delivery manager dreams about, and we keep on promoting those organizational ideas with our customers, helping them reap the same benefits.

00:28:26

Thanks, Krešo. Our journey is far from finished. There is much more to do. Right now, we are focusing on further improving self-service capabilities, automating project resource governance, driving new technology and capabilities through the platform, and implementing a registry management tool to get even better insights into our delivery process. We are also working on connecting the existing portfolio management process with delivery metrics to better tune the flow of value through the organization. We have much more confidence today than we had at the beginning of our journey, and I hope that I'll be able to share with you how things turned out next year. Thank you.