San Francisco 2015

Metrics and Modeling – Helping Teams See How to Improve

Helping teams see and understand their process highlights areas that need improvement. This session shows how to make team data visible in ways that lead to improvement action and how to avoid the pitfalls and traps of managing by numbers.

With examples of good and bad techniques for showing data and coaching teams, this session will help choose a path for helping teams develop an analytical angle to their technical skills.

Troy Magennis

President, Focused Objective

Julia Wester

Improvement Coach, LeanKit

Full transcript

The complete talk — auto-generated from the talk's captions.

Hi. Thanks everybody for coming to the last breakout session slot of the day. I know it's been a long day, so we're glad you're here. I'm Julia Wester, and I'm an improvement coach at LeanKit.

I get the pleasure of working with Dominica every day, and I help people take the crazy out of work. And I'm Troy Magennis. I run a small math consulting business, so I don't do much work. All right.

We're just going to start talking about the different types of analytics that there are, because there's a lot of confusion out there. You hear about big data and statistics and stuff like that. Now, essentially, I'm dressed in statistical attire. If I was in big data, I'd have a MacBook, and I'd have a turtleneck top.

Okay? There's a lot of different types of statistics and analytics you can do. Now, we're going to just give you our definition of them here, just so you understand where we're coming from, because we hear a lot about metrics, and we hear a lot about statistics. But what we're trying to do is get you to understand the different styles, so you can make a choice about which one to apply and when.

First of all, descriptive statistics. It lets you look at the past and then make an assumption about the future. So it's saying that even though we can't say... Well, your insurance company can't say exactly when you will die, they can tell that people like you commonly die at a certain age, depending on certain factors.

How do they determine those factors? Well, that's where they look at big data. That's where they look at historically starting to subdivide the whole population to work out what factors put you at a higher risk than others. And this is the hot area.

If you want to move into big money, go into big data. Don't do statistics. It's so yesterday. Now, what we're talking about here is prescriptive analytics.

In other words, what we're looking at is improving your next decision. We're not looking at understanding. We're going to use the past to inform us, to help us make the next decision better than the previous one. But once we take an action, all bets are off.

The whole table gets reset, and now we're looking at the next decision that we're going to make. So what prescriptive analytics does is it helps you answer the question about what can we do to cause an outcome that we want. With the other two types of statistics, we're looking at, historically, what are the chances that something's going to occur? So in coaching, what we're looking at is being prescriptive and saying what we want to happen.

So metrics are very expensive. Setting up data, making decisions, understanding what's going to happen. It can cost you a lot of money and still mislead you in the end, so you get no return on that investment. So before you spend a dollar on doing any analysis, invest wisely.

Because metrics are expensive, we want to make sure that we're using the best possible metrics we can. And one common thing we see is people using too many vanity metrics. And what that means is we see people out there measuring what's easy to measure or interesting. Maybe it came in the tool you just bought, so you're like, great.

It makes you feel good because you're doing something. So vanity metrics don't answer one of the two key questions. Does it matter to my customer? Does it help me make a decision that makes my business fitter?

It's like this pawn looking into the mirror and having a grander view of itself than what it really is. We fool ourselves into thinking that these vanity metrics can proxy for the information that we really need. So one metric that I consider a vanity metric is number of tickets closed, and I bet that somewhere in here, there's someone that's being measured by that right now. But my concern with this is that it falls into a class of metrics I like to call productivity metrics.

It puts more focus on activity rather than the progress of value. So we need to look at what happened because we did the work. What value was delivered for our business or for our customers? What satisfaction did we derive because we did that work?

Those are the things that really matter, much more so than finding out that Team A closed five tickets and Team B closed 10. That actually has no meaning when you sit down and think about it, although it's interesting. Now, another metric I want to talk about is system uptime, and I know that this is a ubiquitous metric that everyone measures, and we're expected to. So I'm not suggesting that you change that.

But system uptime has varied definitions across our customer bases. If I were to look at five different contracts, I could get five different definitions of what uptime means. Unless you read the fine print, you're not really sure if your five nines is the same as your five nines. But more egregiously, it treats all downtime as the same.

Similar to the number of tickets closed, which treats all activity with the same level of value, we don't want to do that with our downtime. We know that if we have an hour of downtime when no one is using our service, it's significantly less impactful than if we have an hour of downtime when everyone is using our service. So we need to go beyond measuring these generic measures and look at the actual impact of the outages to our customer. What did it cost them?

What did it cost us? It might be a little more challenging to measure, but it provides you immensely more valuable information. And as a customer of services, I'm as much interested in your ability to recover from an outage as I am your ability to prevent it. Because I know that no matter how hard you try, one day you're going to have an outage, and the worst thing for me at that time is if you spent all of your time on prevention and no time on learning how to recover once you have that failure.
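As a rough sketch of the difference between a generic uptime number and a measure of customer impact, the Python below compares two months with the same uptime but very different impact. The `Outage` fields, the `active_users` weighting, and the sample numbers are assumptions for illustration only; the talk does not prescribe a formula.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    minutes_down: float  # how long the service was unavailable
    active_users: int    # assumed: users trying to use the service at the time


def uptime_percent(outages, period_minutes):
    """Classic uptime: every minute of downtime counts the same."""
    down = sum(o.minutes_down for o in outages)
    return 100.0 * (period_minutes - down) / period_minutes


def user_minutes_lost(outages):
    """Impact-weighted downtime: minutes down multiplied by users affected."""
    return sum(o.minutes_down * o.active_users for o in outages)


def mean_minutes_to_recover(outages):
    """Average outage length, a simple view of how quickly we recover."""
    if not outages:
        return 0.0
    return sum(o.minutes_down for o in outages) / len(outages)


PERIOD = 30 * 24 * 60  # minutes in a 30-day month

# Illustrative data: one hour down overnight vs. one hour down at peak.
quiet_month = [Outage(minutes_down=60, active_users=5)]
busy_month = [Outage(minutes_down=60, active_users=5000)]

for name, month in (("quiet", quiet_month), ("busy", busy_month)):
    print(
        name,
        round(uptime_percent(month, PERIOD), 3),
        user_minutes_lost(month),
        mean_minutes_to_recover(month),
    )
```

Both months report the same uptime percentage, yet the user-minutes lost differ by a factor of a thousand, which is the distinction the uptime figure hides.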

So we also need to measure our ability to quickly respond, as well as our ability to prevent outages and the cost when we fail. Again, you want to have metrics answer the question, so what? And you do that by asking those two questions again: Does it matter to my customer, or does it help me make a decision that makes my business fitter? If it doesn't do either one of those, then it's a vanity metric, so just discard it and look for the next thing. Now, measuring individuals is a hugely commonplace practice, but it's one of the biggest pits you can fall into as an organization.

And it really doesn't matter whether you're measuring people to give them props for doing extremely well, or if you're trying to motivate people to improve. Measuring individuals rarely works out like you'd hope. I want to talk about Mr. Carmelo Anthony.

I know he gets a bad rap, but he's a great example of why measuring individuals to give them props is a bad idea. So Carmelo Anthony, at his peak, was the eighth highest scorer in the NBA. And you'd think his team would be ecstatic because they have a superstar on their team, and you'd think that when he played, his team would win more often. But what actually happened is that when he played, his team lost more often.

And so why is that? Superstar on my team, I should have great team performance, right? Well, the fact is that Carmelo had to take significantly more shots to get the scores that he got, and effectively he stole scoring opportunities from other members of his team. And those members could've been more capable of providing an outcome that could lead to team wins.

So Carmelo's focus on individual statistics, whether it was subconscious or knowingly, had a negative effect on team outcomes. What you measure shows what you value. So you need to make sure that you're valuing and showing what's really important. You don't care if Carmelo's number one, you care if the Knicks are number one.

That's who needs to win, so measure team performance. On the flip side of that, we like to give people motivation to improve, and there's a fine line between giving someone feedback and using metrics as a sharp weapon. And so what I want to talk about here is a dashboard that Troy found in a break room, so that's in a common area where everyone can see it. And it's a list of team names and individual team member names who have 10 or more bugs assigned to them.

So we don't know anything else about these bugs. We don't know the cost of delay of working on them. We don't know any other information that helps us prioritize them. And we also don't know what else these people are working on.

All we know is that a line got drawn in the sand. 10 bugs or more, you're on the naughty list. Right? So Goldratt had a quote that said, "Tell me how you'll measure me, and I'll tell you how I'll behave." And a lot of people start there, but I really love the next part that says, "If you measure me in an illogical way, don't complain about illogical behavior." So Troy, what did people do as a result of that dashboard?

Yeah. What happens is over time there, they did nothing else to change the actual underlying problem. So all they really changed was how the data was collected. So it actually tainted the data that we had to analyze.

We no longer had accurate bug defect numbers in areas of code. We didn't know which teams were signing work off early or trying to send it to production then have the issues arise when it went to deployment. So yeah, this had a terrible impact on architecture. People started dealing with the defects one at a time rather than trying to find whole groups of defects and fix the root cause.

So basically what they did by putting this chart up was make the software more unstable and increase the number of defects they ended up having in production. Not only that, but when you hate having your name up on a naughty list, you're going to think things like, "Oh, that's not really a bug. If I think about it, that's a feature, right? So let me just change that to a feature." Or, "Hey, John, over there, he's really good at doing that kind of work, so I'll just reassign it to him so he can do that.

Get my name off that board." Even worse, other parts of your organization look and say, "Oh, they really fix bugs when they come up, and I need this thing done. So this could be a bug if I think about it in this certain way. So let me put it in as a bug, so they'll get on that really fast." These are all ways that the system gets gamed when you measure the wrong things. So we need to continue to measure team outcomes, not individual behaviors.

Give feedback in an appropriate way. Now, I want to talk about balance a little bit before I hand it over to Troy for the rest of the presentation, and I want to do that by talking about restaurant tables. So when I go sit down at a restaurant table, one of my biggest pet peeves is if it's wobbly, right? It's super frustrating.

Every time you make a move, you have this reaction, and you have no idea, because you forget between the times it moves every time you move. And if I encountered that every time I went to a restaurant, I would probably go to a different restaurant if I could. Right? So businesses have a similar need for balance.

The key is to find the pillars that you need to be concerned with, and I'm going to use Larry Maccherone's four key metric quadrants for a sample. And you need to find a balance across those. So this quadrant, you can use this if you don't have a place to start from. It talks about doing it fast.

Am I keeping pace with my business? Doing it right. When I do it, are people happy? What's the outcome when I deliver something?

Do it on time. When I promise, do I deliver what I promised and when I promised it? And then keep doing it. It's not just enough to do each one of those things for a short time. You need to do them in an ongoing manner.

Now, there's something that they don't teach you in math class, but you certainly learn when you work on projects in an organization. Now, I'm going to ask for a show of hands. How many of you have ever worked on a project where each individual component was done well, but when you put it together, it didn't work right? Yeah, a lot, right?

It's definitely happened to me. So there's this dark matter or glue or whatever we want to call it that you can't easily articulate in a project plan, but we know it needs to be done. And I like to think of this fourth quadrant as that concept. It's not just enough to do it fast and to have those skills or do it right or do it on time.

We need to figure out what skills we need and cultivate those to do them all at the same time, and then to do them in an ongoing manner. And then once we do that, we have to keep the quadrant balanced, so we're not like that wobbly table. And coaching can really come in when you need to understand how to effectively tweak one quadrant without gutting another. You know that every action has an equal and opposite reaction.

We have to sort of manage that. So define your pillars of balance, whether it's three, four, whatever. Understand how you're going to measure something in each, and then focus on keeping them in relative balance. Because if you forget one or ignore some, your customers will likely look elsewhere.

All right. I'm going to use the back button. We were discussing whether we needed it. That's right.

I guess as managers and as executives, if we're above many teams, what our job is, is just trying ... Sometimes when you change one of these, you amplify the negative impact on the others. So it's not even a one-for-one trade-off here. It can be very dire.

I'm going to show you an example of how we tried to set up a dashboard within a company which helped people see the trade-offs they were making so that they could make intelligent trade-offs economically. And emphasis on economically. So essentially, you're trying to set the scene inside the company where everyone understands the business so that the decisions they make locally don't detrimentally impact someone else in your business. So this is what we built.

Whenever we showed metrics on any of our dashboards in our public spaces or for teams to use inside the organization, we always showed something from each of those quadrants. Okay? We never showed them individually, because what we're trying to do is get people to understand when something moves, something else will also, you'll have an impact somewhere else. And might be a little bit hard to see for some of you, but you'll see the bottom corner here, which doesn't really work.

Exhibit A, dark green, bottom corner, throughput. You sort of see it was going along roughly around about sort of 22 pieces of work per week, and then we put that dashboard up in the lunchrooms. See the drop to 1-1-4-13-2-2-2? So by putting the dashboard up to sort of say we're tough on quality, what we did is stop the company delivering value work.

The light green line in the background was the number of defects teams were delivering over time. So what happened was just the sheer fact of reporting a metric that was easy to capture had a very detrimental effect on business delivery throughput. This is what you're looking for. You're trying to help your teams understand that it's not good to overemphasize any one of these.

You want to trade something you're excellent at for something you're not trending as well as your peers in. Context matters, right? So it doesn't matter what the individual number is, it matters how you compare to the rest of the company. So rather than just a number of 10 bugs, we converted it to the number of days it would take all developers in the team to get to zero defects.

It made it personal for the team. It set the team the target that, before you take on any new business work, you look at how much debt you have lying around that you could burn down, and make a decision as a team whether you should burn that down early. You've got some others there, responsiveness. But again, we're trying to soften the colors.
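A minimal sketch of that conversion, from a raw bug count to days for the whole team to reach zero defects, might look like the following. The `fix_days_per_defect` parameter is an assumed average effort per defect; the talk does not say how the real dashboard estimated it.

```python
import math


def days_to_zero_defects(open_defects, developers, fix_days_per_defect=1.0):
    """Working days for the whole team to clear its defect backlog.

    fix_days_per_defect is an assumed average effort per defect,
    used here only to make the arithmetic concrete.
    """
    if developers <= 0:
        raise ValueError("a team needs at least one developer")
    return math.ceil(open_defects * fix_days_per_defect / developers)


# The same count of 30 open defects means different things to different teams.
print(days_to_zero_defects(30, developers=3))   # small team
print(days_to_zero_defects(30, developers=10))  # large team
```

Expressed this way, the number depends on team size, so a fixed threshold such as "10 bugs or more" stops being the thing people react to.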

This is informational. It's not meant to say red and green. We don't want to sort of evoke an emotional sort of fight or flight response based on color. So we made everything a nice soft shade of pastel.

And it matters. So that was it. Bugs, we made it personal to the team: the number of days for the team members to get down to zero. Incidentally, there's a couple of more talks on metrics if you're in this room tomorrow at 11:30, so you should stay here overnight so you don't miss a seat.

Mark Mikayalis goes into a plethora of metrics in each of these measures that you might want to choose from. I'm not going to do that. We're not going to do that. What we're going to do is just tell you that you need something from each of those four quadrants, and you're going to ask your teams which ones they want.

So this was ours, bugs, responsiveness, cycle time, how quickly they fixed things, throughput, how fast they were getting through work and defects, and predictability. Were they working weekends and then they stopped working weekends and now the trend started going in a different direction. Everything needs to be aligned on the date axis. This is the team view.

The team would use this during retrospectives or ops meetings. What are they called, Dominica? Ops reviews. Ops reviews.

See? I learned that this morning. And what you're trying to get the team to do is trade something they're best at for something that they want to improve. And this is the way we did that.

What we did at another level where the rest of the company could see, is we removed the axis, so there's no numbers on this chart at all. The orange line is your team. Not your team, their team. And the gray line is everyone else in the company in the same context. So we're helping the team see their trend over time against all of their peers.

And what we're after is we're after them asking these questions. Down the bottom here in the throughput, bottom left-hand corner, they're actually above the company trend, something they might trade. On the right-hand side, you sort of see they started better than the company trend, but they're starting to cross the company. Might be something that we might be able to trade some of that good throughput for.

And the predictability, they're an outlier. They're almost the worst in the company. So we're helping them come up with some coaching advice. What we would do then is up on the top coaching list, we would give them three pre-canned coaching responses for saying, "Teams in your situation often found these things helped." So the coaching advice was synchronized to the trend balance against the rest of the company.
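The comparison just described, a team's recent trend against the company trend in each quadrant, can be sketched roughly as below. The metric names, the sample series, the three-point window, and the `higher_is_better` flags are all assumptions for illustration; the actual dashboard logic is not given in the talk.

```python
def trend_gap(team, company, higher_is_better=True, window=3):
    """Relative gap between team and company over the most recent points.

    A positive result means the team is doing better than the company trend.
    Assumes the recent company average is not zero.
    """
    recent_team = sum(team[-window:]) / len(team[-window:])
    recent_company = sum(company[-window:]) / len(company[-window:])
    gap = (recent_team - recent_company) / recent_company
    return gap if higher_is_better else -gap


def suggest_trade(gaps):
    """Pick the strongest metric to trade from and the weakest to improve."""
    strongest = max(gaps, key=gaps.get)
    weakest = min(gaps, key=gaps.get)
    return strongest, weakest


# Illustrative series, all aligned on the same dates.
series = {
    # metric: (team values, company values, higher_is_better)
    "throughput": ([20, 22, 24, 25, 26], [20, 20, 21, 21, 21], True),
    "cycle_time_days": ([4, 4, 5, 6, 6], [5, 5, 5, 5, 5], False),
    "predictability": ([0.6, 0.5, 0.5, 0.4, 0.4], [0.8, 0.8, 0.8, 0.8, 0.8], True),
}

gaps = {
    name: trend_gap(team, company, better)
    for name, (team, company, better) in series.items()
}
strongest, weakest = suggest_trade(gaps)
print(f"Consider trading some {strongest} to improve {weakest}")
```

Only relative position is used, which matches the idea of a chart with no numbers on the axis: the team sees where it sits against its peers, then chooses what to trade.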

And then we just leave it to the team to choose what they want to do. Again, they get immediate response. Next sort of sprint, next couple of weeks, next ops review, they get to see it all again, and they get to see it real-time. So that's what we're after.

We're after just trading something good for something we're not so good at, and we're just doing trends. And now we're just showing off. So what we also wanted to do as an industry, this company is particularly into data visualization. You can see the name.

This is Tableau Software. This is their development teams. And what we wanted to show in a story was the fact that we wanted to give them a little tool where they could look at all the defects that they closed and all the defects they had, and see where it goes. Do you want to do this slide?

No, you're good. And we made the colors just look like a ball of lint out of your pocket on the right-hand side because that was not really defects which actually were fixed. It was just process noise. So we wanted to show it as process noise.

And what we found just by putting this up with no instructions is that people were just enticed to sort of hover their mouse over these little bubbles and sort of see which ones were big and small, and then they would start discussing root causes in clusters. And then to help them do that, we started doing some text analysis on the bugs and the defects and trying to put them into big, small, and medium buckets based on just the text in the bug description so we could see, and the teams could see, if it was a certain unstable environment which was causing most of the defects. There we go. So if we were to distill this talk into five top takeaways, it would be these.

Keep your metrics inventory small. You guys are in ops, you know inventory incurs cost, right? So keep it small. Measure valuable outcomes, not individuals, so that you can achieve the goals that you really have.

Actively monitor a balanced set of metrics, and then monitor trends against those metrics to expose trade-offs. And then as Troy just went over, provide beautiful interaction to engage big brains to help you figure out what next steps to take forward. Now, the ask that we have is for people to share information with us. What set of metrics are you guys using and why are you using those?

And that'll tell us a lot about what's going on with you. So does anybody have any questions for us? Questions. I'll shout out.

Okay. Customer engagement. How do you successfully measure that? That's a great question.

So the question was customer engagement, how the hell do you measure that? There's a lot of metrics which you get after the fact. So measuring it is not the real problem. Often you get surveys and certain companies aren't afraid to call in, but it's a bit late.

If there's one metric, if there's one ask that I find I'm always at a loss at chasing, it's the quality aspect. I can't find a good leading indicator for quality. The one I'm most looking at as an advantage there is a lot of companies have feature flags, and I'm finding that when you get a group of people and they have to agree whether to turn a feature flag on or off, that's correlating very closely to customer satisfaction. Because I think if you can get the team who built it to agree that it should be switched on in production, then you've done a survey of informed people that this feature is actually ready to go.

So I can find out afterwards by survey and customer satisfaction sort of scores. Trying to find a way to bring that earlier is a challenge which we would love your help with. Great question. Why not try experimentation?

Just do something with some of your customers and something else with some of the rest and see which one does better. Yeah, so like A/B testing and seeing what results in higher conversions. And that would be a lagging indicator as well. It's lagging and there's investment in it, too.

And while you're running that experiment, you're not running another. So I agree with you, it's just the right- It's 50% of the time ... it's the right choice sometimes. There you go.

That's right. Anything else? One over here, Dominica. Oh, second row is...

It's hot today. Any recommendations on how to get executives away from vanity metrics? Oh. Yeah.

So it's all in asking them to tell you what they think that really means. I had a situation where the director of business ops and IT at F5 Networks, where I worked before I worked at LeanKit, he wanted to know how many enhancements each development team got done in a sprint. And I said, "Okay, so I tell you 35. What does that mean to you?

Can you tell anything from 35? What about 30?" It's making them think to the next step, and they will figure that out themselves in almost every case, because they're like, "Yeah, you're right. 35 really doesn't tell me anything." Yeah. So it's- Start measuring them by a vanity metric.

Start putting up number of restroom breaks for the executives. Yeah. And if you can get them there without making them feel stupid along the way, then that's really going to help with that. Sometimes it's great being a consultant and not an employee.

Yeah. Although if you want to get hired back, you still have to make them not feel stupid. I mentioned I'm in math. I'm not being hired back.

Yeah, listen, this is a big area. The things I also want you to look at, probabilistic forecasting. Start helping your executives understand there is no one right answer to any of these metrics. Every one of these metrics over time changes just based on the work that you choose to pull and the direction that the company goes.

Most of the value metrics are outside your company's control. Your competitors call the shots as to what they're going to do when they change the value proposition for the features in your backlog. This is an area which we need to get good at, and I guess we're here to help. Yeah.
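To make "there is no one right answer" concrete, here is a minimal Monte Carlo sketch of a probabilistic forecast. It resamples past weekly throughput to simulate how long a backlog might take. The history values, the backlog size, and the function names are assumptions for illustration, not the speakers' own tooling.

```python
import random


def forecast_weeks(backlog_items, weekly_throughput_history, trials=10_000, seed=None):
    """Simulate how many weeks it takes to finish a backlog.

    Each trial draws past weekly throughput values at random until the
    backlog is empty. Returns the sorted list of simulated week counts.
    """
    if max(weekly_throughput_history) <= 0:
        raise ValueError("history needs at least one week with completed work")
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        remaining, weeks = backlog_items, 0
        while remaining > 0:
            remaining -= rng.choice(weekly_throughput_history)
            weeks += 1
        results.append(weeks)
    return sorted(results)


def percentile(sorted_values, pct):
    """Value at or below which roughly pct percent of the trials fall."""
    index = min(len(sorted_values) - 1, int(len(sorted_values) * pct / 100))
    return sorted_values[index]


history = [22, 19, 25, 13, 21, 24, 18]  # illustrative weekly throughput samples
outcomes = forecast_weeks(100, history, seed=42)
for pct in (50, 85, 95):
    print(f"{pct}% of simulated runs finished within {percentile(outcomes, pct)} weeks")
```

The output is a range of outcomes with likelihoods attached rather than a single date, which is the conversation to have with executives.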

And so Troy mentioned a talk tomorrow that you might want to go to. Awesome. I also want to point out Dominica's talk at 3:35, same time, same bat channel, same place. Same room.

Yeah. Stay in the room. Camp here. Right.

So she's going to talk a lot about the shape of uncertainty and talk more about those probabilistic kinds of metrics. So tune in. And thank you so much. Thanks.

Thank you for coming. Thanks everyone.