San Francisco 2014

Transform the Invisible Wall

The problem statement: at least 8 teams across departments are involved in delivering a new feature, taking 4-6 weeks on average. This becomes an issue once the senior management team has decided to move to cloud services. Should they lift and shift, or should they adopt DevOps to revolutionise the way of working? Lift and shift seems to be the default option if the team has no clear idea; if you want DevOps, you have to know what your goal is and fight hard for it. In our experience, senior management played an important role in getting people to agree.


Top 5 priorities to be addressed along the DevOps journey:


1. Different goals;

2. Ownership: Access and Permissions;

3. Job security;

4. Organisation structure;

5. Compliance requirement.


I will also cover how to address these priorities. This is a long process and it can go back and forth; people need to find their feet in the new world, and they want to be valued.


Finally, I am going to share the methodologies used to measure the results for the project as well as the team, to ensure it’s a sustainable process. The tool set includes quantifying the value of automation and a DevOps skill matrix.


May Xu

Senior Consultant, Thoughtworks Australia

Transcript

00:00:08

Good afternoon everyone. Can you hear me okay in the back? Yeah. Cool. So I'm May, a consultant from Thoughtworks. Most of you might have heard of Thoughtworks. We primarily work with our clients to go through the agile journey and move them into continuous delivery. I'm actually coming from the <inaudible> Australia Sydney office. I think in these two days we have heard a lot of good talks about unicorns, about horses. But today I'm going to talk about a story about koalas. So we have quite a giant koala, so we'll see. That's your forward button? All right, cool.

00:00:49

This right side

00:00:50

Here? Yeah, this one. Yeah. Also, can you see that? Okay. Yeah. So just before we talk about DevOps, I actually want to throw two questions to you. One question is: we are all reading the books, for example The Phoenix Project from Gene. So why do people have different takeaways from that? Another question is: why do people have different reactions when they hear DevOps? The same word. Why do people react differently? I think our brain is really acting like a program. It has input and it has output. So what decides the output? What impacts the behavior of people? That's our mindset. That's the perspective. So this is also a disclaimer from myself as well: all I'm going to share with you today is really my perspective, my understanding of DevOps, my journey, what the lessons learned were, what the accomplishments were.

00:01:52

So I hope you'll see that. Alright, so this is a very simple agenda. First I'm going to give you an introduction to the context of the client I'm working with, what kind of organization it looks like, just to confirm for you whether it's a giant koala. Another one is one of the big challenges we ran into when we started working. The first question is really, what is DevOps? Then another question is, why do we need to adopt DevOps? These are the two biggest questions when we start to work with our clients and say we need to go this way: what is DevOps, and why? Then, next step, I'm going to go through the detailed journey from start to end, and really about the future.

00:02:48

I would like to start right from the beginning. So at the beginning I can give you an abstract of what the story looks like. We started to work on this DevOps initiative for a very big finance organization a year and a half ago. When we started, nobody in the organization knew what DevOps is or what it looks like. And most people were saying, this won't work here, because we're in finance and the regulators won't allow us to do this. And a year and a half later, just a week before I came here, I was participating in one of the big company-wide workshops, which was really looking into how to employ DevOps as a company-wide initiative. So that's pretty much it, yeah. Alright, so this is what the koala looks like.

00:03:45

So it is a top 20 ASX-listed finance company. You can ping me afterwards if you can guess what the company name is. It has 15,000 employees, and it has a very good agile, lean culture. What that means is, if you talk to anyone in the business of this organization, they understand user stories and know how to split them into <inaudible> that everybody can easily deliver, those kind of things. So the business, they have story boards, they have backlogs, they know how to do that stuff. So this is an organization with a very good culture. Technology-wise, because it's giant, historically they have had several acquisitions. So that means they have several strategic legacy systems they need to deal with. For this one, we actually started from a system where they actually have 41 websites.

00:04:41

It was actually managing all the lines of business, including personal insurance, life insurance, banking, commercial insurance, all those kind of things. So that's where we got started. Alright, so looking at these beautiful pictures, hopefully you can figure out where they come from. I think the first one is actually a sea house in the Philippines, and the top right one is a tree house in New Zealand. I believe most of you can tell what the bottom one is <laugh>. I searched this from Google because, at that time, this is the first time I'm coming here. So this is a picture from San Francisco. So what can you tell from this? These houses look different. They're built with different materials, they're in different countries, and they're in different environments. So you have sea houses, you have tree houses, you have skyscrapers.

00:05:41

So how is this related to DevOps? This is actually from me: I really think there's a great similarity between houses and the current DevOps. When we ask the question, what is DevOps, some people say DevOps is about collaboration between dev and ops. Some people say it's about automation. I think there's truth in all of the answers. But what does this mean for an organization that's new to DevOps? So I'm going to introduce you to my definition of DevOps. This is my house of DevOps. If you look into this, you can structure it into three parts. The top one is business value. Any time you want to adopt DevOps, there's a reason for that. But you just cannot have DevOps as a goal, because DevOps cannot be a goal and it is not a goal.

00:06:45

It's really an approach to enable certain business value. And at the bottom is what I call the base. The base includes the environment, and the environment includes the countries, the regulations, the media. You might wonder why I put media there; I'll talk more about that later. Above that is people. Because when we talk about DevOps, we talk about technology and practices, but for me those are all less important than people. And then above that is the organization. Every organization is different, so that means we need to treat each one differently. So that's why I have those things as the base. Built upon that, I have three pillars. The first pillar is principles. The principles pillar is about what you value, what the beliefs of the organization are.

00:07:45

You have to have these, because these are the guidelines when people have conflicts or concerns about whether we shall go this way or that way. These principles are the things that give you the direction. Then in the middle is the team: how do we construct the team? And the last one is practices. I put that last because, once you have the principles and you have the team, the team will decide what kind of practices they're going to adopt. Alright, regarding why adopt DevOps, I think there are so many people talking about the reasons. For me, for the organization we're working with, when we got started it was not very clear. It was not very clear because it was triggered by the application team and the <inaudible>: oh, that's the thing that all the cool guys do.

00:08:38

We shall do this as well. But why are we doing this? I'm going to talk more about this later, but I think the main takeaway for me is really about staying relevant. Because the world is changing fast, how do you stay relevant to your customers? How do you stay relevant to the industry? How do you stay relevant to your employees? Because if you want to retain your employees, or hire some cool kids, it's really important for you to stay relevant. Alright, so these are the simple steps to build your house of DevOps. You can have your own house of DevOps: you can define your own practices, you can define your own team structures, and you can define your own principles. And certainly the number one thing you have to do is identify the goal: why do you want to adopt DevOps? Then the rest is quite easy: understand the base, develop the pillars. The last one, you have to remember this: this is not a one-off thing. That means you just cannot do this one time and say, I'm done. You have to keep iterating.

00:09:55

Alright, so now I'm going to share the detailed story about the journey. This is about discovering the business value, about why we are doing this. This is quite interesting, because it comes from when the organization made a commitment a year and a half ago. They have a partnership with AWS, and they came and said, we want to have our first batch of sites going to AWS in three months. And then they made another big commitment: within 18 months we want to migrate all our applications into the cloud. So this is the context when it happened. Certainly this kind of project is really driven by the infrastructure guys. So the infrastructure team just got the application team into a meeting room to say, yeah, this is easy, this is really a cost reduction initiative.

00:10:51

Let's do it this way, let's lift and shift. Alright, so what did the application team say? The application team said, we have already identified quite a few pain points in the current process and the current working environment, and we want to have all that shiny new architecture in AWS. Can we do it in a different way? Can we go to DevOps? Initially, I was part of the delivery team at the related meetings. Can we do this? Actually, no. It was a failed try, because we found we just could not resolve this at the same level. So we had to figure out another way to approach this, to get the infrastructure team to buy in. The approach we took is that we started talking with the executives. We worked out a business case with some amazing numbers, and presentations that really showed them what key business value we were addressing.

00:11:54

The key ones are really about reducing the time to market, improving <inaudible> and improving quality. Luckily enough, the executives were very supportive. At that time the delivery team and the infrastructure team were actually in two departments, so we needed to go through two executives to get that in. But luckily enough, the executives reached agreement, and in that case it's an example of a top-down approach. Then finally the teams agreed: yeah, we'll go this DevOps way, and this is the business value we are going to try to achieve. So that's the first step. Overall, if we look back, this is really a process: initially the team thinks they know the business value, but then we discover, when we find a conflict between two teams, that it's actually not known. Then, after several discussions and meetings, we finally have one agreed, shared business goal.

00:13:02

Yeah, so once we had the business value identified, we started to understand the base. The first thing is the environment. On the right-hand side of this picture, the first one at the top is from the Sydney Morning Herald. It's talking about robots taking away jobs. It's a big statement made by the largest online employment website. The second one is from MIT Technology Review, and it's basically saying that technology is destroying jobs faster than it is creating them. So imagine, this is really the kind of news that people are reading. And then on the right side it's all bad news about AWS, about all the outages. So now you can understand why media is important, because there are so many talks talking about people's fear.

00:14:07

So where does this fear come from? It's all coming from here. This is the environment we're living in. I think it's quite natural for people to fear, because they have no knowledge of what it looks like and this is all the information they're getting. So to get this right, I think one of the big things we learned is that change management is really, really important in adopting DevOps. Providing the right information and the right communication channels is really, really important.

00:14:46

Yeah. So after we understand the environment and people, the third level of the base is that we need to understand the organization. This is a definition of organization from the business dictionary. It's quite interesting. First of all, an organization is a group of people with a specific purpose. It's about people and purpose. What is more interesting is that it says an organization has a management structure that determines relationships, activities, all those kind of things. DevOps is supposed to change the way that people are working. So this actually indicates that the management structure has to change in order to adopt DevOps.

00:15:31

Alright, as part of that we are actually talking about the invisible wall. So what's invisible? When we started, and it's actually still happening, we got challenging questions from people: yeah, this won't work here. Why are you guys asking us to go this way? We used to work this way, and it has been working very well. So it's really about what's behind all of this. A professor has done research showing that people can be divided into two kinds of mindset. One is the fixed mindset, the other is called the growth mindset. With a fixed mindset, you believe intelligence cannot be developed, that your capability cannot be developed. People with a growth mindset believe this can be changed: you can learn by doing things and you can gain new capabilities.

00:16:25

The number one rule of the fixed mindset, the most important thing for them, is to look good. I want to look good all the time. I want to perform well, and I don't want to look stupid. What that means is that you avoid challenges. If there's something unknown, where they're not sure whether they're going to perform well, they'll probably say no. They just give you reasons to say no. Just like, security does not like that. But is that true? Have you asked why? Those kind of questions. But for people with a growth mindset, the number one rule is really learn, learn, learn. Every time they're facing a challenge, it's really, I want to give it a try and see what it looks like, so I can learn. So regarding this kind of change, because both DevOps and cloud are new to the organization, you do need people who have a growth mindset in order to get this moving, rather than people who just say no to this. Yeah.

00:17:33

So as we all know that people have mindsets, the organization has a mindset as well. It's similar. From the way an organization responds to different challenges and changes, you can actually figure out what the mindset of the organization looks like. It does not mean that if the organization has a fixed mindset, then everything is fixed mindset. Sometimes it could be a fixed mindset, sometimes it's an open mindset. So this is really about how to develop a growth mindset for the organization. The number one rule is to enable steady progress. There has been research into what motivates people the most, what makes people happy every day at work.

00:18:31

That is actually making progress every day, no matter how small it is. That's the biggest motivation for people at work. I think that's very true. Imagine people can say, I had my check-ins and my check-in is in production. That will certainly motivate people. But imagine another person who just says, yeah, I'm working on this issue, I still do not have my access, and seven days later says, I still do not have access. Will these people be motivated to come to work? No. So that's the first thing that's really important. Another thing is that this is a reminder for the management team: how do you provide an environment for your team to make progress? You should really focus on removing blockers rather than micromanaging.

00:19:28

Another one is really about a safe-to-fail environment. The value here is that it's okay to fail as long as you learn from that. In the meantime you need to lower the cost of failure, because if the failure has to happen in production, it impacts your customers. That's not acceptable. Another one is that if you can have your dev environment consistent with the production environment, you reduce the cost of failure. So this is really encouraging learning. The last one we would like to mention is an innovation-friendly environment. How can you create that? Right now most organizations are saying, we're innovative, we want to encourage innovation in all areas. How do you encourage that?

00:20:18

It is really about giving people assistance, setting clear goals, providing them enough support when they need it, and just not creating blockers. So these are the key lessons learned when we were building the growth mindset for the organization. So right now we're coming to the principles. What is most important for the organization? We started out wondering which of the practices are important, but finally, after a brainstorming session with various teams, we started to reach agreement. Number one is self-directed team over command and control. Number two is course correction over perfection. Number three is automation over manual. I will just give you a quick example of how we build teams like that. For the self-directed team, number one, the team has to be competent enough.

00:21:17

You don't want the team to make <inaudible>. In our case, the team is really very experienced, because they have been working with the customer's business directly for several years. So they know enough about the customer and about the business value. What enabled them to become a self-directed team is actually the experience I just shared, because we were the first team saying no to lift and shift. The infrastructure team was working with multiple application delivery teams, and no other team said no. They were just going with lift and shift. We were the first team saying no to that. I think the team got encouraged when they saw that the management team actually valued that, even though it was difficult, and that business value is the only goal that guides what kind of decision is made. They actually said, oh, in the future we can make more and more decisions ourselves.

00:22:11

They really started to realize they can lead the decisions rather than waiting for the decisions to be made. The management team is really providing good support as well. Whenever you have a different opinion, you start with why, instead of saying, I want you to do it this way. Do not give detailed instructions. Start by asking why: why do you want to do this? So this is really core to having a self-directed team. It's not a self-organized team. This is really about the team being able to make decisions. This will certainly help the team, because if you have to go five levels up to make certain decisions, during that waiting time people can do nothing. They just wait for the decision to happen. So, automation over manual. The way that we approached this is very interesting.

00:23:06

I did not include this in my slides, but we do have an automation roadmap. We divide the overall things that need to be automated into three levels. The first level is the configuration and infrastructure level, then we go into continuous integration, and the third level is continuous operation. So those are the three levels. We drew boxes in different colors: red is manual, yellow is automating and green is automated. We showed this to the management team. How does this help? Number one, this is going to encourage people: when we automate a certain component, for example the EC2 instance, you can see that it becomes greener. Number two, this will really help with collaboration, because we had some difficult times engaging with our network security team.

00:24:11

They usually just said, oh, we don't have capacity to support you in doing the automation. And once I released the first version of that, I actually got a call from our network friends to say, yeah, we want to do this, we want to help you on this <laugh>. Number three, it's really good for other people to have visibility, because this gives people a sense of accomplishment. Most of the DevOps work is really hard to make visible. But having this shown on the wall and demonstrated to other teams really makes the team proud.
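A minimal sketch of how the red/yellow/green roadmap described above could be tracked in code. The component names and the `ROADMAP` structure are hypothetical; the talk only names the three levels, the three colors, and the EC2 instance as an example.

```python
from collections import Counter
from enum import Enum


class Status(Enum):
    """Colors as described in the talk: red, yellow, green."""
    MANUAL = "red"
    AUTOMATING = "yellow"
    AUTOMATED = "green"


# Hypothetical roadmap entries: (level, component, status).
ROADMAP = [
    ("infrastructure", "EC2 instance", Status.AUTOMATED),
    ("infrastructure", "firewall rules", Status.MANUAL),
    ("continuous integration", "build", Status.AUTOMATED),
    ("continuous integration", "test suite", Status.AUTOMATING),
    ("continuous operation", "monitoring", Status.MANUAL),
]


def summarize(roadmap):
    """Count components per status for each level, keeping level order."""
    summary = {}
    for level, _component, status in roadmap:
        summary.setdefault(level, Counter())[status] += 1
    return summary


def render(roadmap):
    """Return one text line per level with green, yellow and red counts."""
    order = (Status.AUTOMATED, Status.AUTOMATING, Status.MANUAL)
    lines = []
    for level, counts in summarize(roadmap).items():
        cells = " ".join(f"{s.value}={counts[s]}" for s in order)
        lines.append(f"{level}: {cells}")
    return "\n".join(lines)
```

Calling `print(render(ROADMAP))` gives one line per level, so a level visibly "becomes greener" as entries move from `MANUAL` to `AUTOMATED`.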

00:24:51

Alright, so when we started the project it was quite fun. In order to deliver this one piece of work, we had eight teams engaged. And certainly don't forget our friends from risk, from auditing and from security. So eight teams were involved. I was amazed by the number of teams involved in one feature delivery. For me that means handoffs. You need to go through that many people to get things done. And the way those teams talk to each other is quite interesting as well, because they communicate with work orders. They do not talk. They say, if you want me to do some work, create a work order for me. And then another one is really the transparency from team to team.

00:25:44

So how do you know what each team is like? The green team might actually look like a black box to the blue team, because they have no idea what the skills in the green team look like. Those kind of things are really interesting. What that means is, you're already at the end of your feature delivery, and you have some people from security or auditing asking you questions. You just feel like, I don't have an answer for that <laugh>. And unfortunately, if you don't have an answer to satisfy those people, you just won't be able to release on time.

00:26:24

So what we decided on is really a self-directed, cross-functional, end-to-end delivery team. We already talked about how to build a self-directed team. Cross-functional here actually means we need the organization structure to support that. In the past 18 months we have already had three reorganizations in order to support this kind of activity. The first small one is that we had some middleware system engineers coming into the application delivery team. After a while they came to realize, oh, this is helpful, because we're able to transfer the knowledge from the middleware people into the application team and we're able to automate those things, and they said, yeah, we should do this at a bigger scope. Then they did another organization change, and later on another one. But I won't say we have already done enough to support that. This is still ongoing.

00:27:22

So this is really based on where you are with your practices. So what does an end-to-end delivery team give you? For me, the team is looking after everything from creating new features to fixing the bugs, and if they run into any infrastructure issues, they need to fix those as well. So this really gives you a big view of what end to end means. This also changed the view for the business, because previously the business really only talked to the team to get the code fixed and delivered. That's it. And right now the business knows they need to understand that security patching is part of the work as well.

00:28:08

Yeah, so when we talk about the DevOps practices, this is the practice pillar. There are just too many. So for anyone new starting, you really just wonder, what does that mean? Where shall I get started? So I basically grouped them into three categories. One is the essential ones. I really think if you do not do these, just do not say you're doing it the DevOps way. These are, for example, infrastructure as code, build for failure, continuous integration, test automation. These are really the essential parts of DevOps. Then another one is the advanced ones. I think the advanced ones are quite different from organization to organization, but it's really about visibility: dashboards for everything, monitoring, and operational metrics. This is something we are still working on.

00:29:05

Another one is really about visualization. I realize this is crucially important, because it's really important to get the team engaged. It's really important to get the business engaged as well, because they know this is something real. And the customized ones are really about the differences: different organizations might have different practices. Number one is about tools. In just these two days we heard about a lot of tools. People are using configuration management tools like Puppet and Chef. For us, we're using Ansible. The only reason blocking us from using Puppet and Chef is that they require the root password, and we still <laugh> have not got a yes from security for permission to use the root password. And another one I'm going to share a little bit more about.

00:29:58

It's one practice we introduced. I'm pretty sure most of you have heard of the Chaos Monkey from Netflix. The Chaos Monkey really just goes to your production region and randomly kills your production instances. It's really to test the resilience of your production architecture. And I'm not sure whether you have noticed that you actually have a Chaos Monkey in your organization as well. Do you know that? Have you ever run into an issue where one of your team members is sick, on leave, or just quit, so some work just cannot be done? Once we stepped into this DevOps world, the managers, directors and executives really came to ask: what does the team look like after DevOps, and what kind of skills do we need to retain? So this is really a way that we introduced to try to build resilience into the organization.

00:30:56

So the first step is really to identify the business goal, to see what kind of goal you want to have, and then the skills. The skills are different from team to team. I think this one is really important, and your definition of a skill is different as well, because what Apache means for team A could be different from team B, because they do different things with it. The next step is really simple. You just ask the team for a self-assessment. Then you measure this periodically, and you'll be able to do a lot of different analysis on this to see which is the weakest skill area. Then you can organize some activities, cross training or pairing or just job rotations, those kind of things, to make sure you have resilience built into your organization.
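A minimal sketch of the kind of analysis a self-assessed skill matrix allows, as described above. The 0 to 3 scale, the people, the skills other than Apache, and the thresholds are all assumptions for illustration; the talk does not specify them.

```python
# Hypothetical self-assessment scores: 0 = no knowledge ... 3 = can teach others.
SKILL_MATRIX = {
    "engineer_a": {"Apache": 3, "Ansible": 1, "AWS": 2},
    "engineer_b": {"Apache": 0, "Ansible": 2, "AWS": 2},
    "engineer_c": {"Apache": 1, "Ansible": 0, "AWS": 3},
}


def average_by_skill(matrix):
    """Return the mean self-assessed score for each skill."""
    scores_by_skill = {}
    for scores in matrix.values():
        for skill, score in scores.items():
            scores_by_skill.setdefault(skill, []).append(score)
    return {skill: sum(v) / len(v) for skill, v in scores_by_skill.items()}


def weakest_skill(matrix):
    """Return the skill with the lowest average score."""
    averages = average_by_skill(matrix)
    return min(averages, key=averages.get)


def single_points_of_failure(matrix, proficient=2, min_people=2):
    """Skills where fewer than min_people score at or above proficient."""
    risky = []
    for skill in average_by_skill(matrix):
        capable = [
            person
            for person, scores in matrix.items()
            if scores.get(skill, 0) >= proficient
        ]
        if len(capable) < min_people:
            risky.append(skill)
    return sorted(risky)
```

With the sample data, `weakest_skill(SKILL_MATRIX)` points at the area to target with cross training or pairing, and `single_points_of_failure(SKILL_MATRIX)` lists skills where one person going on leave would stop the work.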

00:31:51

You don't need to worry when people are just going on leave or people simply quit. Yeah, so after all this, this is what kind of business value we have achieved. I think this is really straightforward. For end-to-end infrastructure provisioning, we used to take four to six weeks, and right now it's about four hours. It actually could be less <laugh>. Another one is deployment time: it used to be 30 minutes or more, and right now 90% of deployments take less than 10 minutes. One I really want to talk about is infrastructure testing. We have a routine firewall change twice a week. Every time there's a firewall change, the network security team goes in there, adds some firewall rules, removes some firewall rules, and the next day you might just see that your application is broken.

00:32:46

Then you spend maybe several hours to figure out what went wrong, and then you figure out, oh, maybe they changed something. So what we introduced is a kind of infrastructure regression testing using Ansible. We just run this automated regression testing daily, every night. And especially on the day when they make some firewall changes, we check the job, and the job tells us whether our application is suffering or not. We actually introduced this to our network security friends to say, when you introduce some changes, run this, so we'll be able to know whether our application will be broken or not. Yeah. Another one is really about infrastructure security patching. We all know this from the old days, and some still have infrastructure done this way: if there's an upgrade, the system engineers need to log into hundreds or thousands of machines just to make sure each one is updated. And usually you cannot do that in the daytime, you have to do it at night. Right now, because all the infrastructure has been scripted, it's simply done by deploying a new version of the script. So with just one click, hundreds of hosts can be updated. This is something that our ops friends really, really like, because it significantly reduces the weekend and night support.
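The team ran their firewall regression checks with Ansible; the talk does not show the playbook. As a stand-in, here is a minimal Python sketch of the same idea: probe every network endpoint the application depends on and report the ones that are no longer reachable. The endpoint names, hosts and ports are hypothetical.

```python
import socket

# Hypothetical endpoints the application depends on: (name, host, port).
REQUIRED_ENDPOINTS = [
    ("database", "db.internal.example", 5432),
    ("payment gateway", "payments.internal.example", 443),
]


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def run_regression(endpoints, probe=can_connect):
    """Probe every endpoint and return the names of the unreachable ones."""
    return [name for name, host, port in endpoints if not probe(host, port)]
```

Run nightly, and again right after each firewall change, a non-empty result from `run_regression(REQUIRED_ENDPOINTS)` flags the broken path before users notice. The `probe` parameter is injectable so the logic can be exercised without touching the network.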

00:34:14

I think the biggest accomplishment, what really makes me happy, is starting from zero knowledge of, or resistance to, DevOps, and coming to a stage where we are considering having DevOps as a strategy for this company.

00:34:34

So what I would like some help with: the first one is regarding access control. It's quite interesting, because we have a long-lasting debate between the application delivery team and the system engineers. The application developers really think, oh, I need to have root permission so I can install whatever application I need. And our system engineer friends always come back with, this is not allowed according to our security policy. I just wonder whether someone has some ideas: how did you get root access? Another one is regarding what level of access and ownership you can have. Right now our automation has reached a certain level, but the team is really thinking, we want to have automation up to the VPC level. We want to own the end-to-end network.

00:35:33

Not only the application part. We want to have that big part, because we are asked to take care of all the risk and security issues for the entire network. Why don't we own it? So that's one of the things I really want to get some help with. Another one is about open source. Within the organization, we promote open source very highly internally. But if I step back to another level and look at all the enterprise clients, we're all looking into pretty much similar things. Are we able to open source those kind of things to a level where we can actually reduce some waste in the industry? Yeah, I think that's it. Thank you very much for your time.