A Scalable, More Continuous Future for Performance and DevOps (US 2021)

There’s no question that enterprises today want to further integrate continuous performance testing into automated pipelines. However, many are finding it difficult to reconcile the mismatched clock speed of testing with today’s accelerated pace of development/delivery. You’ll learn, among other things, the key steps to continuous performance testing in DevOps:

- Gather the right metrics to assess your gaps
- Prioritize, then systematize across your application portfolio
- Plan for acceleration across the whole delivery cycle
- Design concrete measurements with the end in mind
- Pick the right targets to automate
- Make scripting easy for multiple teams
- Develop performance pipelines
- Use dynamic infrastructure for test environments
- Ensure trustworthy go-no-go decisions

This session is presented by Tricentis.

Las Vegas | US | Breakout | 2021

(No slides available)


Paul Bruce

Director of Customer Engineering, Tricentis NeoLoad, Tricentis



Welcome, everybody. I have the extreme pleasure of talking to you today about a passion of mine: more continuous futures for performance and DevOps. Who am I? I'm Paul Bruce, one of the directors of customer engineering for NeoLoad, now part of the Tricentis family. I also chair some of my own events, Ali Fest and DevSecOps Days, and some things around observability and OpenTelemetry. I'm a huge fan of my local Boston DevOps community, and I've been one of the core organizers of DevOpsDays Boston for the past couple of years. And in terms of DevOps in high-compliance organizations, the IEEE just released a standard that I and a bunch of other people have been working on for a while, called 2675. We're hoping to get that adopted as an ISO international standard soon as well.


So, at least for today, the scope is performance in a continuous context, but I want to tie into some other themes of transformation at the event. I'll describe what I mean when I say "more continuous," look at how some of the customers I've worked with are doing that, cover which approaches and framings are useful for this work, and briefly show how the Tricentis platform might be able to help. So let's start with transformation. Ooh, all the rage these days, right? I say that jokingly, but in reality, transformation is the new norm. It's happening in every organization, and in almost every aspect of every organization. It's what's driving a lot of IT changes. But like in every large system, various parts and pockets abound in these organizations, so it's rare to see major transformations at a day-to-day level.


Sometimes it's as unfamiliar to us as, say, the universe of the very tiny: the quantum realm. Now, if you don't mind, I love particle physics; it's just food for my nerd brain. So if I can make a quick correlation here: at the quantum level, there's no perfect measurement, no absolutes. It's all statistical distributions. Hey, like performance engineering, right? And subatomic processes are just like the teams and transformations at your organization: just when you think you know what's going to happen and you have a standard model pictured in your head, something changes. We learn something new, and we have to reconsider how our reality actually works. So when I hear people say things like "Should we change or not?", "We can't afford to change," "This is going to take too long," or, in the very worst circumstances, "Let's just keep doing business as usual," consider that the transformation ship has already sailed.


And you really don't want to be left on some distant island. You cannot afford that: not at an organizational level, for competitive industry reasons, and not even at a personal or team level, because slow doesn't work for anyone anymore. You can't afford to be on the fence about change. Now look, I'm a performance and reliability nerd, right? That's my practice, that's my scope, and that's where we're going with the rest of this talk. I also listen to hundreds of other practitioners and transformation leaders each year, from every conceivable size, geography, and cultural space we can think of. What they say can be summed up as what I call the performance imperative: it is imperative that your systems, your software, your hardware, and your peopleware are high-performing and can scale. Now look at consumers, right? Those users that produce revenue and pay your paychecks.


Well, if something is slow, it's equivalent to broken. They have no tolerance for downtime or unreliable practices, as we've seen in the past year with massive digital expansions driven by remote work, unemployment processing (well, that was a mess), and overwhelming gaps in online education and healthcare management. Those gaps abound. We don't need this stuff breaking. And remember, if it works on your machine but not on mine and tens of thousands of other people's, it's broken. For business management and leadership, this consumer expectation translates into an authoritarian demand that these systems be at their peak performance by default, not due to heroics. Superheroes in IT are single points of failure and are toxic to transformation and DevOps culture. There, I said it. But misalignments, like IT prioritizing desktop experiences over mobile performance, still happen, and they're still a big problem.


That matters, since more than half of all internet traffic is now mobile-driven. And as Conway's law suggests, our systems are an outcome of our teams, not just in terms of communication patterns but also in terms of functional and dysfunctional behaviors. The responsibility lies everywhere, not just with QA or subject matter experts. Now, for all of us in IT who are actually not superheroes but just trying to row in the same direction, it becomes essential and urgent to synthesize this demand for a high degree of performance in our systems and processes, because when somebody else's stuff goes down, we need to recover fast. It's not just about what we're building and shipping. From a performance perspective, you are only as strong as the weakest link in your chain. Multi-regionality in cloud providers exists for a reason, right? Cloud providers have this problem too.


Things like AWS, Azure, and GCP zones go down, right? Everyone has downtime, but we don't have the luxury of waiting to see whether it takes us down too, because we use them. The point is, everybody has the performance imperative. But what are organizations doing about it? We've seen decades of performance being treated as an afterthought, as a silo. In my mind, this is for two reasons. One, it wasn't factored into planning because it was long and hard, so it was left to the end. And two, because it was left to the end, very few people in the organization developed the expertise to properly apply performance engineering practices, not to mention had the time to grow that expertise in everyone else, the non-experts. So performance stayed siloed because it was siloed. Well, by now you shouldn't be shocked when I tell you I'm a strong believer in the principle that if something has to be done and it's hard, we should bring the pain forward.


Right? Bring it on. Get so good at it that it's no longer painful but simply part of our approach. This old siloed model has been disappearing rapidly in the past five years for exactly these reasons. The implementation of the new approach I see most often in at-scale enterprise performance engineering teams is consultative. Not external consultants: internal consultants and advocates, where subject matter expertise is applied intelligently across many groups, knowledge is transferred, practices are both documented and automated, and there's a concerted drive to reduce cycle time and toil, especially in testing processes. And all of this is really great, it really is. The sad news is that it only scales so far before we have to start pushing it further. The third approach is to build on those consultative wins, right?


Building on those wins means allowing various teams to use self-service processes and platforms for the easy stuff, so that they develop awareness and proficiency. This buys back time for SMEs to assist and coach performance practices even more intelligently, making sure the proper guardrails are built into these processes. This is how the next phase of performance engineering is happening right now. In all my work with the vast Fortune, let's call it 250, global and international organizations in all industries, you see not only an escalation of the performance imperative but an elevation of performance and reliability to key strategic elements of IT, business, and user experience. They understand how important it is that this not be left to the end, and that driving toward a more continuous process means bringing good performance practices along with it. Now, organizations that are successfully crossing chasms like this typically have a blend of, yes, Agile and DevOps, right?


And they have a true performance practice, with a capital P, run by some consultative experts but expanding out to self-service models. This is so that they can match the clock speeds of development cycles with right-fit performance feedback loops, providing even more value to those teams as they deliver products and smaller, more frequent patches to their consumers. The goal here, though, is not self-service. Self-service is just a tactic in the broader approach of moving toward a more continuous and scalable model for software delivery, which is itself a tactic for harmonizing the broader performance imperative with transformational change. So with this in mind, fitting performance and reliability into that more continuous model has been a key focus of mine and of the NeoLoad team, along with Tricentis, for years now. There's a lot that goes into performance, and it's not just testing when you ship new code, right?


How about rolling out new infrastructure? Say you forklift some components over to a new cloud region: what's your observability over those changes, and how is continuous monitoring factored in, not just in production but in lower environments as well? How do you pragmatically verify that your systems are resilient to operational fluctuations, like pods crashing because of ill-informed timeout defaults or unexpected autoscaling phenomena? Maybe you should be injecting faults into your continuous testing cycles to address these emerging behaviors. So the point is, it's not about testing all the things. That's a myth, an anti-pattern. It's about testing the right things and providing valuable feedback as fast as possible. It's about thinking holistically about what needs to be in place daily, not just weekly or monthly. And it's about making sure that building repeatable processes on reliable platforms gets us to the point where we can scale these things, so that it's not a firefight over infrastructure, budgeting, and planning every single time we want to run a test.


Continuous performance is a key component of scaling, yes, your systems, but also your knowledge, your velocity, your learning, and your ability to do this proactively with less and less waste, toil, and risk. Like Steve Jobs said when he rolled out the iPhone: are you getting it? The right combination of capabilities multiplies your positive impact at a day-to-day level. But hold up: what's so different about this? Haven't people been telling us to go faster and automate all the things for years now? Well, let's take a breath, especially me, and step back for a moment to ask why. Why is always a good question for good engineers to ask. This maniacal demand for automation isn't just a reaction to an increasing business and product philosophy. It's a response to the complexity of our systems and to how both our systems and our delivery processes have changed.


How have they changed? Well, look: systems and networks used to be simple; now they're highly distributed and componentized. And yes, of course, there are still plenty of monoliths, which are also complex, that those new microservices and systems depend on. There are a million ways to do everything and only a few good ways to do them. We live in an increasingly complex world, and that complexity increases every year. And hey, product delivery cycles have compressed to the point where there's a pipeline for everything now, or at least people would like to think so. How does performance testing fit into that model? And if it doesn't, what happens? Let's walk through it for a second. You ship a new change that's supposed to make users happy. Without the proper feedback on performance, happy faces turn frowny very quickly. Oops, your database scaling set isn't big enough for that new increase in query throughput.


Now we have a performance oops and an availability problem, because some of those front ends can't connect to the database: the databases are saturated. Now comes the reactionary response: "Oh, let's change some code and deploy again," which usually has to be fast-tracked through those fancy pipelines you think you have. And again, lacking proper feedback, your change to that change may not actually solve the problem; it'll probably make it worse. Without right-fitting feedback on performance and reliability into your automated delivery processes, you are basically begging for some kind of failure.


Now, we're not talking about big, fat load tests running all the time. Look, nothing ever gets into that process easily if it doesn't fit into that process, and pipelines have time budgets too. That's why, when we ask ourselves how to approach performance and reliability in a continuous context, the answer is that it's going to take a little work to pick the right things. And not all of that work has to be done by test engineers, developers, or release engineers. The easier you make something, the more people can do it. The question of who does what is something you and your teams need to figure out for your organization. Actively drive toward what works best in which pockets of engineering, but please provide a purpose and a vision that doesn't get us stuck in holy wars over what shift left is, or what it means to properly do DevOps.


What we can all agree on is that having the right feedback at the right time is what keeps us delivering on time without accruing architectural debt and unplanned work. Performance is not a checkbox. But hold up: if asking performance questions isn't in your definition of done, it's likely not going to be tested, and therefore it won't provide feedback for good release decisions. So for continuous performance feedback, we need quick sampling that's good enough in early cycles. It should not be hard for development or product teams to express their API details in a way that gives them feedback from within their own environments and tools. And it should not be hard to produce testing artifacts that harmonize with proper, more scalable testing processes when they check those artifacts in and run them in an automated pipeline.
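As a rough illustration of quick sampling that stays inside a pipeline's time budget, here is a minimal Python sketch. It is not NeoLoad's API or the speaker's method: `quick_sample`, `summarize`, and `percentile`, the 50-sample cap, and the 30-second budget are hypothetical choices, and `call` stands in for whatever single request a team would exercise against its API under test.

```python
import math
import time
from typing import Callable, Dict, List


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not values:
        raise ValueError("no samples collected")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def summarize(durations_ms: List[float]) -> Dict[str, float]:
    """Reduce raw timings to a few numbers that are cheap to compare between runs."""
    return {
        "count": float(len(durations_ms)),
        "p50_ms": percentile(durations_ms, 50),
        "p95_ms": percentile(durations_ms, 95),
        "max_ms": max(durations_ms),
    }


def quick_sample(call: Callable[[], None], max_samples: int = 50,
                 time_budget_s: float = 30.0) -> Dict[str, float]:
    """Time repeated calls until the sample cap or the time budget is reached.

    The budget is checked before each call, so a slow final call can
    overrun it by at most one call's duration.
    """
    durations_ms: List[float] = []
    deadline = time.monotonic() + time_budget_s
    while len(durations_ms) < max_samples and time.monotonic() < deadline:
        start = time.perf_counter()
        call()  # hypothetical: one request against the API under test
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return summarize(durations_ms)


if __name__ == "__main__":
    # Stand-in workload: a 5 ms sleep instead of a real HTTP request.
    print(quick_sample(lambda: time.sleep(0.005), max_samples=20, time_budget_s=5.0))
```

Capping by both sample count and wall-clock time is the design choice here: the result is "good enough" signal for an early cycle, and the step can never blow through the pipeline's time budget the way a full load test might.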


Just like with other tests, those pipelines then produce a frequent stream of baselines on various environments, which may have their own SLAs, raising the question of what the proper SLOs and metrics are. This proves that you are exercising the performance mindset, and it simplifies operational readiness for when you do transition to much larger environments. To take an analogy here: you'd never expect to run a marathon with your shoelaces untied and zero training, right? So why would you ever expect high performance from your systems, your teams, and your peopleware if you aren't exercising this continuous muscle? Going back to how we scale this out: like I said, it has to be easy. You don't start day one of marathon training by running the entire thing, either. You don't start with big things. You start with smaller goals and make progress to the point where you understand your capabilities and can apply your efforts wisely.
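One way a pipeline might turn baselines and objectives into a go/no-go signal is sketched below in Python. Treat it as an illustration under assumptions: `go_no_go`, `GateResult`, the p95 latency and error-rate objectives, and the 10% regression tolerance against a baseline are all hypothetical, not a Tricentis or NeoLoad interface.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GateResult:
    go: bool
    reasons: List[str] = field(default_factory=list)


def go_no_go(p95_ms: float, error_rate: float, slo_p95_ms: float,
             slo_error_rate: float, baseline_p95_ms: float,
             max_regression: float = 0.10) -> GateResult:
    """Return no-go if the run breaks an objective or regresses too far from its baseline."""
    reasons: List[str] = []
    # Absolute checks against this environment's (hypothetical) objectives.
    if p95_ms > slo_p95_ms:
        reasons.append(f"p95 {p95_ms:.0f} ms exceeds objective {slo_p95_ms:.0f} ms")
    if error_rate > slo_error_rate:
        reasons.append(f"error rate {error_rate:.2%} exceeds objective {slo_error_rate:.2%}")
    # Relative check: catch a slowdown that still sits inside the objective.
    if p95_ms > baseline_p95_ms * (1.0 + max_regression):
        reasons.append(
            f"p95 {p95_ms:.0f} ms is more than {max_regression:.0%} "
            f"above baseline {baseline_p95_ms:.0f} ms"
        )
    return GateResult(go=not reasons, reasons=reasons)


if __name__ == "__main__":
    result = go_no_go(p95_ms=240, error_rate=0.002, slo_p95_ms=300,
                      slo_error_rate=0.01, baseline_p95_ms=200)
    print("GO" if result.go else "NO-GO", result.reasons)
```

In a CI job, a no-go result would typically be mapped to a non-zero exit code so the step fails. Checking both an absolute objective and a relative baseline is a deliberate choice: the objective catches outright breaches, while the baseline comparison catches gradual drift that is still under the objective.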


If you want someone to get better at something, you start with easy, small wins: exercises and practices that anyone can pick up and run with, stuff that doesn't bust paradigms entirely but moves them over time toward a more continuous approach to improvement. This is how we do modern performance engineering. We build self-service processes and platforms that non-experts can use as guardrails to adopt a continuous mindset over performance and reliability. We start with APIs and microservices, these new distributed systems (sometimes distributed monoliths) that now have even more network latency, and we extend our subject matter expertise to more complex situations. At the same time, we provide speed and scale for performance engineering tasks to our teams so that they can be more continuous and transform safely. It all starts with making things easy. Why do I say this? Because I've seen it work over and over again.


I've been working with the NeoLoad team for over three years now, driving to meet this imperative. Our platform is easy to use. It's easy to learn in terms of scripting. It's easy to execute tests from anywhere in the world: on your laptop, in your browser, and in your CI pipelines. And most importantly, it's easy to understand the outcomes of that testing. There should be flexibility in all things, absolutely, but there should definitely also be ways of doing things that are already thought through. I work with enterprise performance and automation engineers every day to right-fit the patterns they build and make available to their product teams, so that it's easy to do and easy for product teams to see the positive impact. Now, from the Tricentis perspective, Tosca, NeoLoad, and qTest can accelerate those go/no-go decisions on release by putting performance feedback in the right places at the right time.


Now, we're always happy to discuss this further with folks, because everyone's slightly different. We're all in slightly different places on different journeys, and right-fitting this stuff into your overall strategy takes collaboration. But it's very cool for me when I get to see people, process, and technology working together and helping to grow better practices within organizations. However, technology is just one part of the transformation. True transformation takes time, it takes effort, and it takes the right decisions. Decisions affect people, process, and technology, so this is really a three-body problem, an n-body problem: you can't change one without affecting the others. If you want to scale the transformation, you need to make sure you're always considering all three aspects. As we've learned from the DevOps mindset, it takes effective communication, collaboration, and adaptation.


Now, the good thing is that over time, as these practices become part of your culture and you iterate, you get better. Never stop iterating, because transformation is a rolling stone: it's always moving, and so should you. Once you start getting better at these performance practices, they lead to better continuous motion, which leads to faster transformation, and the positive impact compounds. Further transformation drives demand for better performance, so new practices and better continuous outcomes need to come out of that as well. At the beginning, I mentioned that I love particle physics. I also love gardening and permaculture; that's just my thing. Folks in the permaculture space have a saying: if you want a healthy tree now, the best time to plant one was 20 years ago, and the second-best time is now. You should definitely start now, and it's the same with this stuff. Start now. Never stop looking to improve. Keep on. So on that note, I greatly appreciate your time, and I really look forward to discussing the future of performance and reliability with everyone here.