Developer Productivity Engineering – The Next Big Thing in Software Development - Gradle, Inc. | US 2021

US 2021

Developer Productivity Engineering – The Next Big Thing in Software Development

By 2022, IDC has predicted that 65% of the global GDP will be digitally transformed. Two-thirds of the products and services that you pay for will be driven by software. There has never been a more important time to foster developer productivity, but many of our methods have not evolved.

In this keynote-style talk, you will learn why DPE is the most important development in the software engineering world since the introduction of Agile and DevOps concepts and tools. DPE is a new software development practice that uses acceleration technologies to speed up the software build and test process, and data analytics to improve developer efficiency by as much as 10x. The ultimate aim is to achieve faster feedback cycles, more reliable and actionable data, and a highly satisfying developer experience.

Justin Reock, Field CTO and Chief Evangelist at Gradle, is pioneering DPE as a practice and set of technologies and is one of the world’s leading advocates. Specifically, Justin will provide an overview of the key concepts and tools, the impact on key business objectives like time-to-market, cost, and quality, the business case for DPE, and the role of AI/ML in DPE moving forward.

Justin Reock

Field CTO and Chief Evangelist, Gradle, Inc.

Transcript

00:00:12

Hi, everyone. Thanks for joining today. I'm Justin Reock, Chief Evangelist and Field CTO at Gradle. You may have heard some of my other talks before around open source and developer productivity in general. My background is predominantly in software development, moving into enterprise architecture, but now I really focus more on evangelizing developer productivity engineering, which I'm going to talk to you about today. So let's jump right into it. I like to start this talk with the following quote from Eric Pearson, who was the CIO at InterContinental Hotels Group. He says, "It's no longer the big beating the small, but the fast beating the slow." And I think that this quote adequately sums up the state of the art and the state of the industry right now. It's no longer the big giant software companies that are just dominating, right? It's the disruptive, nimble companies that are able to respond to customer and market feedback quickly and bring their features to market fast.

00:01:18

Right? And that's obviously what we're here talking about today with DevOps, right? That's the whole purpose: to get around various constraints, to unstick bottlenecks, and to convert our code into throughput faster. So we're going to talk today about a very specific problem around feedback cycles for developers and how that impacts developer productivity, because the practice of developer productivity engineering is very practical, very pragmatic, and very straightforward. It's about identifying bottlenecks and friction in the software development process and then using technology to mitigate those bottlenecks. So here's your average developer doing their thing. They're in their local environment, but this could also apply to remote environments or even builds out in CI environments, and they're waiting on feedback, right? They've written some code, they're going to run a build, and they want the build system to tell them something.

00:02:15

Well, sometimes the feedback just takes too long, right? Sometimes the build takes minutes or, in extreme cases, even hours. Or sometimes the test cycle takes too long. Maybe there are thousands of tests that have to be run. Maybe the feedback arrives in time, but it's a failure, right? There's some type of failure that has to be investigated, and maybe it takes too long to fix that failure or to collaborate on the fix with other engineers. And maybe, the worst case, it's a pitfall or a problem or a bottleneck that could have been avoided altogether. Maybe a flaky test, a test with a non-deterministic outcome, or a failure that is impacting lots of developers in the organization, but nobody's talking about it because nobody's tracking it.

00:03:04

Right. So think about these pain points, these bits of friction, and these bottlenecks, then multiply that by hundreds of calendar days per year, then by hundreds of developers, and we end up sinking productivity and increasing cost, which is antithetical to constraints-based work and productivity theory. So backing it up just a second. Software development is still a creative process, right? Even as practitioners, even as folks who do this for a living, this is still creative work. Now, it's not purely creative work. It's also scientific work. I think it's fair to say that it's a bit of both. We experiment with our code. We form a hypothesis in our mind, we run the code, and then we get the results. And we want to understand whether the work we did accomplished the task that we wanted to accomplish.

00:03:57

And I think for a lot of us, when we started in this industry and we started down this path of learning how to code, the experience felt like this very often, right? These were learning moments. We were running a hello world program for the first time in a new language, and we would run it and we would get the output. And that feedback would happen so quickly, because they were very simple programs. Maybe we were doing a standard hello world, or maybe we were just changing the background color on a shape or something like that. But the point is that we would get feedback fast, and that fast feedback was part of what led to the delight of the experience. But as a lot of us have moved into more professional roles developing software, especially in the enterprise, the success of software projects ironically creates a new set of challenges for us.

00:04:55

As the projects become more successful in an enterprise, more users start to use the project, and the project grows in size: in the number of developers working on it, the number of repositories necessary to support the development, the number of dependencies being used, and the overall diversity of the tech stack. Again, as enterprise projects get larger and larger, the toolchain efficiency starts to degrade if we're not managing it right. So the build time gets longer and longer, the test cycles get longer and longer, and ultimately the delight of the process and the happiness involved in the craft begin to diminish and get replaced with frustration. So I think now, if you look at a day in the life of an average enterprise developer, their calendar might look a little bit like this for organizations that haven't fully invested in productivity engineering as a practice.

00:05:58

The developers might come in, they're ready to go, and they start coding. Great. And then we're going to wait for our local build to complete. We've made a bunch of changes and something failed in the local build, so we're going to spend some time debugging that build failure. Maybe we fix it and then go to lunch. Great. We're going to come back and code for a little while longer. We're going to wait for our local build to complete. Now, we debugged the build failure that we had last time, so this time it's successful, but now we push that build to CI and there's a flaky test, right? There's a test that passed in our local environment but did not pass in the CI environment. Now we're spending time investigating that. All right. So how much of this was spent doing what developers love to do?

00:06:42

Coding? Not a lot. A lot of this is actually just spent on various frustrations: supporting the build, troubleshooting the build, debugging the build. So an outcome of developer productivity engineering, then, is to give developers literally hours back in their week to work on valuable solutions. And I think that we are at a point where we can make a global appeal for this now. There's a study that, I think, was published in October of last year. Some of you may have seen it. You may have even been quoting it in some of your presentations today. It's a very powerful statement from IDC: a prediction that by 2022, right, coming up by the end of this year, 65% of the global GDP will be digitally transformed. 65% of the goods and services that we pay for will be software services.

00:07:36

Okay. So this is still a relatively small group of craftspeople, developers, who are quite literally lifting all boats in the harbor. And I think, as much as possible, we owe it to this part of the workforce to have a happy and productive work experience. That's very much the purpose of developer productivity engineering: to increase productivity by increasing developer happiness. Because in the current state, teams work far from their true potential, and the productivity of developers absolutely affects their happiness. Again, going back to when you first learned about coding, it's neuroscience, right? Those little rewards that you get for running the experiment and seeing the results that you want, it's a dopamine hit. It's a small reward that leads to overall joy and happiness. And if you can't be productive, if your toolchain is blocking you from getting the feedback that you want and from building code at the rate at which you want to build it, you're going to be more frustrated than you are happy.

00:08:42

And so, as a result, low developer productivity is blocking business innovation. It's not that DevOps hasn't done amazing things in terms of increasing our ability to get code to market faster. Of course it has. But what we're talking about is a new set of bottlenecks that are further left in the process, that are actually part of the developer experience of writing code. So this is not a replacement for DevOps by any means. This is more of a continuation. It's a constraints-based theory, just like just-in-time manufacturing and business process re-engineering, ultimately moving into things like change management and then evolving into practices like Agile and DevOps. We know these processes deeply. We understand Goldratt's theory of constraints. And if we fast forward now to what developer productivity engineering is trying to do, it really is just taking a look at the same types of bottlenecks that have been identified by other practices like this in the past, and then taking pragmatic steps to do something about them.

00:09:53

Okay. So let's talk about what those solutions are. That's a lot of theory, but the solutions are actually very straightforward and pragmatic. So when we say that DPE is the next thing, this continuation of DevOps, it's quite literal. We really say, okay, this is the next set of bottlenecks and friction and pain points that need to be addressed. So we use acceleration technologies and data accumulation and analysis technologies. We use acceleration technologies to speed up the build itself, to decrease the amount of time it takes for a developer to get feedback about the build. And that applies to local builds, remote builds, and CI builds. We also apply technologies to the testing process to allow more tests to take place in parallel. And we're working on a machine learning technique based on gradient boosting, called predictive test selection, which was actually pioneered at Facebook, to avoid running certain tests that probably won't produce any valuable feedback for us to begin with.

00:11:04

Right. So really, we just take these acceleration technologies and apply them to the build and the test to speed up the feedback cycles. And then, just as important, we monitor metrics like build times, test cycle times, failures, and test results so that we can identify flaky tests. We use all this to paint a picture of overall build performance so that it can be monitored over time by a group of production engineers. So that is fundamentally different from some other approaches to productivity that we've seen in the past. I think we can talk about two categories of productivity work: developer productivity management and developer productivity engineering. And I think it's really important to separate the two. We're talking about developer productivity engineering in this talk, which has a different focus, whereas productivity management might focus on the people.

00:12:05

So how many lines of code are being produced? How many story points does the team's capacity allow? What's that team's velocity? DPE focuses on the process and the technology. It asks, can we make the build times faster? Not "are they fast enough," but "how fast could they possibly be, given the right types of acceleration technologies?" And so our metrics are based on outcomes. Symptomatically, a lot of them are the same as you'll get from a productivity management solution. This will absolutely increase the team's overall velocity. It will allow them to work through more story points per sprint, or however you measure your productivity. But it's going to look at those raw, concrete outcomes. The SDLC focus for productivity engineering right now really is just on the build and test parts of the SDLC.

00:13:01

That's because that's where the majority of the developer experience happens. This landscape is changing a little bit, certainly. As we start seeing more infrastructure as code, and more developers actually getting feedback from production for very fast releases, pushing things out to a service mesh where they can weight network traffic and do things like blue-green testing or canary releases very easily as part of the release process, we may start seeing DPE creep into more of the CD and deployment side. But right now it's really focused on test feedback cycles, build feedback cycle times, and tracking that over time. And as a result, the ROI from taking on DPE initiatives is very straightforward and easy to calculate. It's very hard and proven. So, just to give an overview of the overall solution and the five pains that we try to address with developer productivity engineering as a practice: we already talked about idle and wait time as a pain.

00:14:09

And as a result, we want faster feedback cycles. Build caching and test distribution are the two technologies that we utilize. We're going to do a super quick demo of a build cache so that you can get a better idea of what that is, and I'll show you where you can link to a video on test distribution. Unfortunately, we don't have enough time today to do a demo on that one. Then the next thing: inefficient troubleshooting. We mentioned that the build time and waiting on feedback is just one part of it. What if that feedback is negative feedback? What if it's a failure? How can we improve a developer's ability to debug that failure and collaborate on it with other engineers so that they can get to a root cause? For that, we utilize something called a build scan, and you may be familiar with this.

00:14:53

If you've built with Gradle in the past but haven't run a build scan, you can do it right now. It's part of the open source tool. You can just add --scan at the end of a Gradle build, and it will run a scan. You'll get a chance to look at that; we'll see one at the very end. These are also available for Maven through the free Gradle Enterprise plugin, kind of the freemium feature for Maven, so you can run build scans against a Maven build as well. What the build scan does is collect a whole bunch of forensic data about the build itself and the context around the build, and put it in a shareable form, a URL that can be passed around the business. Failure analytics allow us to proactively detect things like flaky and non-deterministic tests, and other types of avoidable failures, failures that may be occurring again and again for multiple developers, but there's no visibility.

00:15:45

No one's actually tracking that. And because no one's tracking it, that failure effectively never leaves that developer's workstation. There are no metrics and no KPI observability. In a lot of organizations, no one's actually paying attention to how well builds are performing. And so that's another pain point. No one is really paying attention to how much time developers spend sitting and waiting for builds to complete or for test feedback to come back from testing frameworks. So part of this is making sure that you're rolling up and aggregating all that data centrally and are able to visualize it. And then there's a side effect of this, which really doesn't have anything to do with productivity. Since things like caching, failure analytics, and flaky test detection are literally making the build systems do less work, we can utilize our CI resources more efficiently. If we're building out on cloud and we're worried about CI costs on cloud, which we all know is a creeping cost, then by literally asking the build system to do less, we can save on those resources.
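As a rough illustration of the flaky test detection described above, the following Python sketch flags any test that has both passed and failed against the same commit. The record shape (test name, commit, pass/fail) and the sample data are hypothetical stand-ins for centrally aggregated build data; this is not how any particular product implements the feature.

```python
from collections import defaultdict


def find_flaky_tests(results):
    """Return sorted names of tests that both passed and failed on one commit.

    `results` is an iterable of (test_name, commit, passed) tuples. This
    record shape is an assumption made for illustration only.
    """
    outcomes = defaultdict(set)
    for test_name, commit, passed in results:
        outcomes[(test_name, commit)].add(passed)
    # Same test, same code, different outcomes: likely non-deterministic.
    return sorted({name for (name, _), seen in outcomes.items() if len(seen) == 2})


# Hypothetical results rolled up from local and CI builds.
results = [
    ("testLogin", "abc123", True),      # passed locally
    ("testLogin", "abc123", False),     # failed in CI on the same commit
    ("testCheckout", "abc123", True),
    ("testCheckout", "def456", False),  # failed only after the code changed
]
flaky = find_flaky_tests(results)
print(flaky)
```

Here only testLogin is reported: testCheckout changed outcome across different commits, which points to a real regression rather than flakiness.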

00:17:01

So it's just a side effect of DPE. It's really not part of productivity, but it's worth mentioning. All right. So those are really the five pillars of DPE. If you are aware and conscious of these pains, and if you're using technology and process to address them, then you're going along the path of developer productivity engineering. So let's look at what kind of impact this can have, because fast feedback cycles are really important. Let's take a look at two separate developer teams and do a little thought experiment. We have 11 developers on one team with a four-minute build time. Ask any Java enterprise developer whether a four-minute build time is killing them, and they're going to say no; they're going to say it's fine. But compare that to a team of six with a one-minute build time.

00:17:45

Look how much more often they're able to build. Look how many more builds they can run in the same unit of time. This second team will in all likelihood be able to ship more, better features because they're able to build more frequently. They're able to have a smaller change set per build. They're able to avoid merge conflicts more often because they build smaller change sets more frequently, and they're able to experiment on the code base more frequently. When we take this same principle and look at the savings per year for very large teams, say a hundred developers doing 12,000 local builds per week with a nine-minute build time, reducing that with acceleration technologies like caching to a five-minute build time can translate to 5,200 days a year in engineering savings. Okay. So we talked about this build cache, and it's just a tool for fast feedback cycles.
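The savings figure quoted above can be reproduced with a short calculation. The talk does not state how the number was derived; the 52 working weeks per year and 8-hour engineering day used below are assumptions that happen to yield the same 5,200 days.

```python
def engineer_days_saved_per_year(builds_per_week, minutes_before, minutes_after,
                                 weeks_per_year=52, hours_per_day=8):
    """Convert a per-build time saving into engineer-days per year.

    weeks_per_year and hours_per_day are assumed defaults, not figures
    given in the talk.
    """
    minutes_saved_per_week = builds_per_week * (minutes_before - minutes_after)
    hours_saved_per_year = minutes_saved_per_week * weeks_per_year / 60
    return hours_saved_per_year / hours_per_day


# 12,000 local builds per week, nine-minute builds reduced to five minutes.
days = engineer_days_saved_per_year(12_000, 9, 5)
print(days)  # 5200.0
```

Step by step: 12,000 builds times 4 minutes saved is 48,000 minutes (800 hours) per week, which is 41,600 hours per year, or 5,200 eight-hour days.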

00:18:43

It was introduced to the Java world by Gradle in 2017. It's not the only build cache technology out there, but it is available for Maven and Gradle. And it's important to understand that this is very different from a dependency cache. A dependency cache like Artifactory or Sonatype Nexus holds your binary dependencies, fully compiled binary dependencies that need to be downloaded to various projects. And those are useful. They're complementary to a build cache, which actually caches outputs from various tasks or goals within the build. As a Gradle task or Maven goal runs, its inputs are effectively just cryptographically hashed, and a key is generated. If the code hasn't changed, or if the tests haven't changed in a way that would affect the output, then when we go to run the build, we just generate the key and look in the cache first to see if we already have literally the exact same output that would have been generated.
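The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a task's inputs are just a set of named file contents; real build tools fingerprint more than that, and the class and function names here (BuildCache, cache_key, compile_sources) are hypothetical rather than any Gradle or Maven API.

```python
import hashlib


def cache_key(task_name, input_files):
    """Hash a task name and its input file contents into a cache key.

    `input_files` maps a path to its bytes. Paths are sorted so the key
    does not depend on dictionary ordering.
    """
    digest = hashlib.sha256()
    digest.update(task_name.encode() + b"\0")
    for path in sorted(input_files):
        digest.update(path.encode() + b"\0")
        digest.update(hashlib.sha256(input_files[path]).digest())
    return digest.hexdigest()


class BuildCache:
    """In-memory stand-in for a local or remote build cache."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def run(self, task_name, input_files, action):
        key = cache_key(task_name, input_files)
        if key in self._store:
            self.hits += 1            # identical inputs: reuse stored output
            return self._store[key]
        self.misses += 1              # new inputs: do the work, then store it
        output = action(input_files)
        self._store[key] = output
        return output


def compile_sources(files):
    # Placeholder for real work such as compilation.
    return {path: len(content) for path, content in files.items()}


cache = BuildCache()
sources = {"App.java": b"class App {}"}
cache.run("compile", sources, compile_sources)   # miss: runs the action
cache.run("compile", sources, compile_sources)   # hit: inputs unchanged
sources["App.java"] = b"class App { int x; }"
cache.run("compile", sources, compile_sources)   # miss: an input changed
print(cache.hits, cache.misses)
```

The key point is that the key is derived only from inputs, so an unchanged task can be skipped without comparing outputs or timestamps.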

00:19:47

And if we do, we just pull it from the cache, and that's usually a lot faster. Several open source projects, more than are in this list, are using this technology now, and they're seeing, I think, very dramatic results. The Commons IO Java library is using caching, and it brought their build from a minute 23 down to four seconds; Spring Boot, from 21 minutes down to six. And this is even better now, actually. We can go and look; we're going to look at this dashboard as the very last demo. We've got about five minutes left. And the Apache TomEE project, which builds lots of stuff, there's Tomcat, but then also things like ActiveMQ for JMS, went from an hour 27 minutes down to 20 minutes with caching. So we're going to take a look at a super quick demo, really fast.

00:20:36

Okay. So this is just a very simple Maven project. It's actually the Camel Spring Boot router archetype that you can get from Maven Central. And I've modified it to add the Gradle Enterprise Maven extension, a freely redistributable extension that you can just add to your project. So we're going to do a Maven clean verify, and this should just be like a normal run. I think generally this project takes 20 to 30 seconds to build and test on a normal run. All right. About 10 seconds, probably because I built it pretty recently. But let's run this again now with caching turned on, and look how much faster it is. We pulled several of these tasks from the cache, and the build only took three seconds. And we can take a quick look at a Gradle build scan here on a Maven build and take a look at our performance.

00:21:38

And we can see that we avoided 79%, almost 80%, of our overall build time using the cache. Okay. Let's jump back into the presentation, since we are getting near the end here. So build caching is one acceleration technology. Test distribution is another. Test distribution is the ability to distribute test workloads across multiple agents in an elastically scaling way. We don't have time to get into it, but we do have a really good video on how we've applied this to the Apache Cassandra project, and you can view it here. Okay. Now, the other part of this, we said, was observability and data. What gets measured gets improved, right? We all know that performance regressions are very easily introduced into any type of build infrastructure: changing office locations, refactoring the code, changing the way that we manage binary dependencies, and things like that.

00:22:41

All of this can have an impact on build performance. And so it's really important that we maintain vigilance over that part of the build and make sure that those metrics are really well understood. So the other two practical parts of this practice are doing failure analytics, really making sure that we can detect things like flaky tests, and making sure that we're watching build performance over time. So we're going to take a look at another very quick demo, and then we're going to wrap up. This is the Spring Framework Gradle Enterprise dashboard. There are a number of open source projects that we just give this technology to for free. Gradle Enterprise is an enabling technology for the practice of developer productivity engineering, and Spring has chosen to use it to augment their build process.

00:23:37

So I want to point out two things here. First, let's take a look at failures. These are failures that have been aggregated across the build process for all Spring developers. Look at this one right here. We have these two types of failures, non-verification and verification failures. A non-verification failure is like an infrastructure failure, something like a network timeout, whereas a verification failure is like an assertion that wasn't met, so a programmatic failure. And we can find out, for instance, that right now 26 builds have failed with this particular failure. We can see and drill down to all the various builds where it's happening. These are different local builds taking place for different users. And we can drill down right into the failure and really try to understand what's happened.
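The kind of aggregation shown in the dashboard above can be approximated with a simple grouping. This is a sketch only: the record fields (user, failure_type, message) and the sample data are hypothetical, and a real tool would likely normalize messages before grouping them.

```python
from collections import Counter


def top_failures(builds, limit=3):
    """Count failed builds by (failure_type, message), most frequent first.

    Each build is a dict with hypothetical keys "user", "failure_type",
    and "message"; successful builds carry a message of None.
    """
    counts = Counter(
        (b["failure_type"], b["message"]) for b in builds if b.get("message")
    )
    return counts.most_common(limit)


# Hypothetical build records collected from several developers.
builds = [
    {"user": "alice", "failure_type": "non-verification", "message": "Connection timed out"},
    {"user": "bob", "failure_type": "non-verification", "message": "Connection timed out"},
    {"user": "carol", "failure_type": "verification", "message": "expected:<200> but was:<500>"},
    {"user": "dave", "failure_type": "non-verification", "message": "Connection timed out"},
    {"user": "erin", "failure_type": None, "message": None},  # successful build
]
top = top_failures(builds)
for (failure_type, message), count in top:
    print(count, failure_type, message)
```

Grouping across users is what surfaces the timeout here: each developer saw it only once, but it accounts for three of the four failed builds.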

00:24:24

And we can link to really any part of this build scan. We can take this URL, copy it, paste it, and give it to somebody else, and it'll take them right into the failure. All right. So this is one way that Spring is using it. Another thing I want to point out is the trends dashboard. This is what allows us to take a relative amount of time, maybe the last 28 days, four working weeks, and take a look at how our build performance has been. How many builds have taken place? What's the cumulative build time? If we scroll down a little bit, we can see how much time has been saved by our cache. So this is the way that productivity teams can remain vigilant over the overall productivity of the development group.

00:25:18

Okay. Let's close this out and wrap it up. I think we're coming right up on time. So just as some next steps, we do have an ebook on this subject that you're welcome to download for free. You can go and hit this link or just look up the developer productivity engineering ebook. You'll find it. It was written by Hans Dockter, the inventor of Gradle and the person who coined the developer productivity engineering practice and term. So that's a good start. If you want to learn more, try a free Maven or Gradle build scan. If you have Gradle, just add --scan, or just try the Maven extension. Check out our documentation, read more about build scans, and of course, feel free to reach out to me, Justin Reock. That's jreock at gradle.com, if you have any questions. And we'll take some live Q&A now.