Las Vegas 2020

Is This Thing On? Instrumenting DevOps for Architecture Health, App TCO and Compliance

Large organizations are making progress towards agility yet struggling with the acceleration of technology investments and detecting desired outcomes.


DevOps and pipelines have driven efficiency gains, but what is the Total Cost of Ownership (TCO) for each application in a large portfolio? Are we achieving the ROI expected? Are compliance and architecture policy standards being met?


The next evolution of DevOps requires instrumentation that converges with compliance, Enterprise Architecture (EA) and financials (TBM) to drive health and cost insights into business planning in near real time.


In this session we will explore an approach used to unite DevOps, EA and TBM to eliminate cost and help business partners make informed decisions.


Brian McCarty

Principal Technical Architect, USAA

Transcript

00:00:13

Good morning, everyone. My name is Brian McCarty. I'm with USAA, and I'm here to talk a little bit about instrumenting DevOps for application total cost of ownership and compliance. Thanks for joining. Like I said, my name is Brian McCarty. I'm a principal technical architect with USAA's Chief Technology Office; I work for the CTO. I do a lot of specializing in the practice of the business of IT. Specifically, I focus a lot on our cloud governance practice and the tools and techniques for architecture. And I spend a lot of time on Technology Business Management, which is really the discipline, the practice, of merging technology consumption and usage information with finance. So we try to understand things like total cost of ownership for technology and the return on investment for technology-based initiatives, as well as expense management, planning, budgeting and forecasting. I also do a lot to support our agile toolchain.

00:01:30

USAA, if you're not familiar, is an organization that provides financial services to the United States military and their families. We start every meeting with a look at our mission and our USAA Standard to ensure that the decisions we're about to make are in the best interest of our membership and the association. In particular, I'd like to point out number three under the USAA Standard. For the purposes of this presentation, I think it's the most impactful: "be compliant and manage risk" is the standard that applies the most. A little bit about what we do: we provide a full range of insurance, banking and investment products to the military community, like I mentioned. We are now a Fortune 100 organization and steadily growing. We do have a very large security and innovation practice in USAA, which does translate to a fairly large technology landscape.

00:02:41

I'm going to talk about that here just a little bit more, to give you an idea, for the purposes of this discussion, of what we had to construct to try to drive out more data from our technology environment and to create automation to eliminate manual and redundant tasks that were being placed on our development community. We have over 30,000 employees, and 4,400 of them work in what we refer to as the Chief Administrative Office: basically technology, design and digital professionals. We also have a 96% retention rate for technology staff, which is pretty good. It's one of the highest in the industry. We like to think that the reason we have a high retention rate for technology staff is that we put a lot of attention, a lot of focus, on letting them do the job that they're best suited for.

00:03:44

Right? So the job of actually designing, developing, testing and supporting technology and applications that best serve the membership, as well as some internal type applications. The topics we're going to cover today: first we have to hit a little bit of background and some key terms, so you understand where we're coming from and we have a common language, a common way of understanding some of the topics we'll be covering later in the presentation. Then we're going to go straight into a demonstration of some of the controls automation that we've created, as well as the application total cost of ownership calculation. Now, this is probably the best time to stop and discuss why this guy is talking about controls automation and application total cost of ownership in the same presentation. They seem like topics that don't have much to do with each other.

00:04:46

In fact, it turns out that they do. What we've learned through this process is that the data and the discipline around understanding the entire technology landscape, from all of the applications that we manage and support down to the underlying infrastructure and supporting configuration items, that data and that discipline of having a complete and accurate inventory of our technology landscape, actually drives the outcome for both controls automation and being able to calculate the total cost of ownership for an application. We'll get into that a little more in the method, more like how we were able to achieve this. Just so you know, it's still early in our journey, but we've learned enough to understand that this does have the return on investment that we're looking for, as well as building for the future.

00:05:47

So I will show you some examples, but there are some additional growth opportunities. We'll cover those in the what's next. So, a little background. USAA has had a lot of demands placed on it the last few years: the increase in regulatory scrutiny, our desire to be absolutely as compliant as possible, as well as the organic business growth of the organization. We've doubled in size multiple times since I started working here 20 years ago. These demands, of course, were not going away. At the same time, we have some additional new desires. We really want to accelerate our technology cycle, to be even faster in our ability to develop and adopt new technologies as the business needs them, as well as to pay down technical debt that may exist in some places.

00:06:52

Another strong desire is that our business partners have become much more engaged the last few years in really trying to understand where their costs are coming from and how to make good technology investment choices. It used to be, 20 years ago, that there was business and then there was IT. The way the world operates now is that every company, if it's going to succeed, is really going to be a technology-driven company. So our business partners are much more educated. They're much more involved, and they want to understand their technology landscape: what are they paying for, what's working, and what changes can they make to drive value? That combination of the demands and the desires would be almost impossible to achieve if we hadn't adopted some of our DevSecOps principles a few years ago to create efficiencies to deal with them. However, our DevSecOps disciplines alone weren't going to achieve the efficiency gains necessary to be sustainable further into the future.

00:08:06

So we needed to make some new investments to generate more efficiency and to purge a growing set of manual tasks that had been created over the years to deal with compliance, application support and that business growth. We couldn't go any further without it. So what we've done is make some new investments in automation and in data about the technology landscape itself. We've figured out ways of extracting and aggregating information about the applications, their actual configurations and exactly what their dependencies are. We have over 3,000 applications in operation at any given time. So those islands of data about the underlying configuration ecosystem that those applications depend on, we needed to do a better job of aggregating into one place so that we could drive and build automation.

00:09:17

That's effectively business rules around managing and monitoring that technology landscape. We're talking, of course, about two of those today. So far, this investment is starting to pay off. However, we are of course going to do more; it's really a never-ending type of initiative. There are always ways of finding more efficiency and developing more capabilities. It's almost like scientific discovery: you answer one question and you end up asking many more. So, a couple of key terms. General controls: I'm referring to more of a regulatory terminology around general controls. Since USAA is also a bank, we closely monitor and adopt the IT Handbook from the FFIEC. There's a link down there at the bottom if you're not familiar with it, but I extracted some relevant talking points for this key term: ensure the proper development and implementation of systems and the integrity of programs and data sets.

00:10:26

That's the kind of control we're talking about when I say the word control. Application portfolio management: this is sort of a modified definition from some industry thought leaders, and it's the one we use internally. Basically, it's the actual inventory of all of our application landscape, describing the technical architecture, as well as information about the health of the applications, the business value that we expect from them, and some of the support information as well. Of course, value is a measure of cost as well. So the fact that we're working on application TCO is what's going to drive our value measurements for the future. Configuration management database: probably a familiar term to most people, so I won't spend a whole lot of time here. I just want to point out that what we're really focused on is understanding the configuration items and their relationships.

00:11:23

We've done a lot of work on that, as well as on methods for certifying that data over the last couple of years, monitoring for health and for completeness and accuracy, and then building some controls directly into the CMDB to ensure that the health of that relationship certification is sustainable over the long term. App TCO: lots of people probably have different definitions or opinions, but when it comes down to it, what we mean by app TCO is the sum total of all of the labor, hardware, software and services costs per application. Sum that up to get a TCO. Now, we're not actually tackling the development costs yet. That's a next step for us. The TCO calculation that we're using right now is the direct and indirect costs for managing the applications.
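As a rough illustration of that definition (not USAA's actual calculation), the sketch below sums labor, hardware, software and services costs per application. The record layout, application ids and amounts are all hypothetical, and development costs are left out, as in the talk.

```python
from collections import defaultdict

# Hypothetical run-cost records: (application_id, category, monthly_cost).
# The four categories come from the talk; ids and amounts are made up.
COST_RECORDS = [
    ("app-001", "labor", 1200.0),
    ("app-001", "hardware", 300.0),
    ("app-001", "software", 150.0),
    ("app-001", "services", 50.0),
    ("app-002", "labor", 800.0),
    ("app-002", "hardware", 200.0),
]


def app_tco(records):
    """Sum every cost category per application to get its TCO."""
    totals = defaultdict(float)
    for app_id, _category, cost in records:
        totals[app_id] += cost
    return dict(totals)


tco = app_tco(COST_RECORDS)
print(tco)
```

Keeping the category on each record, even though this sketch ignores it when summing, would let the same data answer "how much of this app's cost is labor?" later.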

00:12:21

And it's helping us estimate the business value so that we can rationalize applications from that. Just a little bit more about regulatory and compliance. This is just an example of one of the control needs that we have out there. I just picked one of these at random from the FFIEC handbook, as an example to say that regulators, and it's important to remember this, describe the requirement, not the implementation. They don't necessarily say the method you should use; they say "establish appropriate change management standards and procedures." So what you do is take that requirement and work with your first, second and third line to try to understand what a satisfactory implementation of a control is for that requirement of establishing appropriate change management standards and procedures.

00:13:26

Okay, so let's go right to some demo time. I'm going to show you a couple of views of our controls automation and the output from that, as well as app TCO. This is an example of a report that we've automated the creation of. It's specific to one control; it's the detailed evidence that control 106 is being met. There are, of course, many of them, hundreds of potential controls; I just picked this one to illustrate. If you look there in the change requests, what is it we're doing here? This control is saying that changes need to be approved by an authorized person before implementation. Out of our last change window, the last seven days, there were 666 changes that went to production. What we're saying is that out of those 666, 664 passed: they met the business rule requirement that the date of approval was prior to the date of implementation.
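The rule described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ChangeRequest` fields, the change numbers and the dates are hypothetical, and the sketch treats a missing approval as a failure and requires approval strictly before implementation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ChangeRequest:
    number: str                       # hypothetical change request id
    approved_at: Optional[datetime]   # None if no approval was recorded
    implemented_at: datetime


def approved_before_implementation(change: ChangeRequest) -> bool:
    """Pass only if an approval exists and is dated before implementation."""
    return (
        change.approved_at is not None
        and change.approved_at < change.implemented_at
    )


changes = [
    ChangeRequest("CHG0001", datetime(2020, 9, 1, 9), datetime(2020, 9, 2, 22)),
    ChangeRequest("CHG0002", datetime(2020, 9, 3, 23), datetime(2020, 9, 3, 21)),
    ChangeRequest("CHG0003", None, datetime(2020, 9, 4, 22)),
]

# The failing change numbers are what a reviewer would follow up on.
failures = [c.number for c in changes if not approved_before_implementation(c)]
print(failures)
```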

00:14:39

So that's just a business rule looking at the data, right? Compare two dates together: if one's greater than or equal to the other, you pass; if it's less than, you fail. In this case we're actually doing really well. Over time, and this view is the whole month of September, it looks like we've met that requirement; we stayed in the green the whole time. You say, but it's not a hundred percent. Well, from an organizational perspective, there's a threshold to say whether you met or did not meet those expectations. In this case, as long as it's above 98.4, we're still in the green, but that doesn't mean it's perfect. There's always room for improvement. In this case, we failed two of those 666, so what do you do about that?
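To make the threshold arithmetic concrete: 664 passing changes out of 666 is about 99.7%, which sits above the 98.4 threshold mentioned in the talk. The sketch below shows that calculation; the function names and the "red" label for a miss are assumptions, since the talk only names the green state.

```python
GREEN_THRESHOLD_PCT = 98.4  # threshold quoted in the talk


def compliance_pct(passed: int, total: int) -> float:
    """Percentage of changes that met the control."""
    if total == 0:
        raise ValueError("no changes to evaluate")
    return 100.0 * passed / total


def status(pct: float, threshold: float = GREEN_THRESHOLD_PCT) -> str:
    # Assumption: the talk says "above 98.4" is green; "red" is a made-up label.
    return "green" if pct > threshold else "red"


pct = compliance_pct(664, 666)
print(f"{pct:.1f}% -> {status(pct)}")
```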

00:15:31

Well, this view, this environment that we've created, makes it very easy to create those coaching opportunities. By presenting that failure detail right away, we can see, towards the bottom of the middle, the actual change request numbers, hyperlinked directly to the change management system so that all the details are available. There's the approver; I redacted the names there, but the approver could easily have a conversation, a coaching opportunity, with the implementer to make sure that they partner up to get that done in the future. So that's a view of all changes, the whole ecosystem, for one control. The next view I'm going to show you is by application. What I've done here is a view showing how well all of the applications in my domain are doing.

00:16:28

I'm a domain architect. You can see in the upper right-hand corner it says "display my apps." How well, for this one particular control, is my domain performing? You can see there's a risk level there; it's basically our method for understanding risk. It looks like we're doing pretty well: of 33 applications in my domain, only three are not meeting this control. You say, well, why is that, Brian? Let's find out. Let's click on one of the ones that is not meeting that expectation and look at some details. What it's saying is that for this one application, there was no assignment group specified, or it's not found in Active Directory. In other words, for this one app there is an issue with the data in the CMDB telling us who is responsible for providing that initial return to service, that first contact for restoring service to this application.
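A per-application check like this one could be sketched as follows. Everything here is hypothetical: the CMDB snapshot, the group names and the set standing in for a directory lookup are invented for illustration, and a real implementation would query the actual CMDB and directory service.

```python
from typing import Optional

# Hypothetical snapshot of known directory groups (stand-in for a real lookup).
DIRECTORY_GROUPS = {"deposits-support", "claims-support"}

# Hypothetical CMDB application records keyed by application id.
CMDB_APPS = {
    "app-001": {"name": "Deposits App", "assignment_group": "deposits-support"},
    "app-002": {"name": "Small Utility", "assignment_group": None},
    "app-003": {"name": "Legacy Tool", "assignment_group": "retired-team"},
}


def assignment_group_finding(group: Optional[str], directory: set) -> Optional[str]:
    """Return a failure reason, or None when the control is met."""
    if not group:
        return "no assignment group specified"
    if group not in directory:
        return "assignment group not found in directory"
    return None


findings = {}
for app_id, record in CMDB_APPS.items():
    reason = assignment_group_finding(record["assignment_group"], DIRECTORY_GROUPS)
    if reason is not None:
        findings[app_id] = reason

print(findings)
```

Returning the reason rather than a bare pass/fail keeps the detail a domain architect would need when following up with the application owner.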

00:17:25

Since we have all of the data available to us, we can present those views very quickly. So it's easy for me now, even though I have hundreds of controls to meet. I can look very quickly and say, show me how I'm performing for all controls or for one control, and I can take action on that. As the architect, I know the reason this is not being met: this is a low-risk, small application for which we have yet to clean up or sanitize some missing data values to make this whole. So rather than using email or spreadsheets, or trying to dig up some details, what I can do is just Slack the application owner directly with the deep-dive link, saying: hey, we failed on this control, we need to get this cleaned up, so can you please take care of that? Once that piece of data is established, which proves that there is in fact a proper support group for that application, within four hours that control will be met, and we'll go green.

00:18:39

Great. Okay, so now let's pivot and look a little bit at application total cost of ownership. I've produced two views here. This is the end result of the automation that we're running to look at the entire landscape of the underlying configurations that went into the maintenance, hosting and support of that particular application. I'm going to show you two views: the business view and an IT view. The business view is fully accessible to everyone in the company who has a need to see this kind of information. Our business partners can easily see what they are getting from our IT hosting for their apps.

00:19:27

They're very familiar with the application names. We're trying to get them more familiar with the actual ITIL service portfolio that they're receiving benefit from, as well as, very specifically out of the service catalog, the technical capabilities that the application depends on. In this case, we're seeing that it's got some distributed databases. It's got middleware: what we refer to as our private cloud environment, our more modern ability to host applications, basically based on Kubernetes. And then our classic, which is really just our traditional Java EE support for deployment to JVMs. And there's some storage and some virtual costs. We didn't give them every specific server or storage block that was used, or which individual databases. This is rolled up to the level of applications, because they understand applications.

00:20:30

This one, of course, has something to do with deposits. I'm redacting part of the name, but you can see that it's an application that revolves around deposits, as well as the unit of measurement they're being charged on. There's a little bit of need to understand some things around support. For example, this is medium-level support, because the architectural complexity of this application, looking at our application portfolio management system, tells us this is a medium-complexity application. There's also the contract between the two parties: in order for us to maintain Reg W compliance, the lines of business need to contract with our internal affiliates, IT being one of them, saying: what are you offering me? What am I paying for?

00:21:22

And so this is the contractual obligation record. After that, we can simply look at the unit of measurement: how many of that unit did you consume, and what was the price for it? Then we just multiply those two together to get a cost: price times quantity. In this case, look out for this number, 27,781. I'm going to show you the next chart to tie this out between the business view and the IT view, to show continuity. So let's look at the IT view. This IT view is saying, here's a collection of individual technology offerings. From an IT perspective, we can deep dive and actually see, from the data directly from the CMDB, the relationship between that application and where it's being deployed.
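The price-times-quantity step can be sketched as below. Only the 27,781 quantity comes from the talk; the offering names, units and unit prices are made up for illustration, and `Decimal` is used simply to avoid floating-point rounding in money amounts.

```python
from decimal import Decimal

# Hypothetical line items: (offering, unit of measurement, quantity, unit price).
# 27,781 is the quantity quoted in the talk; every price here is invented.
LINE_ITEMS = [
    ("Classic JVM hosting", "MHz", Decimal("27781"), Decimal("0.01")),
    ("Distributed database", "GB", Decimal("500"), Decimal("0.20")),
]


def line_cost(quantity: Decimal, unit_price: Decimal) -> Decimal:
    """Cost of one line item: price times quantity."""
    return quantity * unit_price


costs = {name: line_cost(qty, price) for name, _unit, qty, price in LINE_ITEMS}
total = sum(costs.values(), Decimal("0"))
print(costs, total)
```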

00:22:11

In this case, it looks like it's got four JVMs, four different JVMs that this application is deployed to. Of course we need to understand the aggregate of that. That's where that 27,781 number comes back: that's the total megahertz consumed by all of the JVMs that this application depends on. And I know it's this application because, and this is the secret sauce here, we are keeping a unique ID, a UUID, for every application in the entire ecosystem. Without doubt, without fail, this is the thumbprint of that record throughout our entire ecosystem. We do show the name for convenience, but from a data perspective, that unique ID is the important bit. So you say, well, how did it get there?
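The roll-up described here, summing consumption across every JVM tied to one application UUID, might look like the following sketch. The UUID values, configuration item names and the per-JVM split are hypothetical; only the 27,781 MHz total and the count of four JVMs come from the talk.

```python
from collections import defaultdict

# Hypothetical application UUIDs (placeholders, not real identifiers).
DEPOSITS_UUID = "11111111-1111-4111-8111-111111111111"
OTHER_UUID = "22222222-2222-4222-8222-222222222222"

# Hypothetical CMDB rows: each JVM configuration item carries its app's UUID.
JVM_CIS = [
    {"ci": "jvm-a", "app_uuid": DEPOSITS_UUID, "mhz": 7000},
    {"ci": "jvm-b", "app_uuid": DEPOSITS_UUID, "mhz": 6900},
    {"ci": "jvm-c", "app_uuid": DEPOSITS_UUID, "mhz": 6950},
    {"ci": "jvm-d", "app_uuid": DEPOSITS_UUID, "mhz": 6931},
    {"ci": "jvm-e", "app_uuid": OTHER_UUID, "mhz": 1200},
]


def mhz_by_application(cis):
    """Aggregate consumed MHz per application, keyed by UUID rather than name."""
    totals = defaultdict(int)
    for ci in cis:
        totals[ci["app_uuid"]] += ci["mhz"]
    return dict(totals)


totals = mhz_by_application(JVM_CIS)
print(totals[DEPOSITS_UUID])
```

Grouping on the UUID instead of the display name means a renamed application still rolls up to the same record.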

00:23:05

We're going to talk about that a little more. What we're looking at in the CMDB is all the JVMs for this one app. They showed up in the CMDB because of a one-time registration that was done in the CI/CD pipeline. So let's look at that method. I'm going to walk through the method a little bit. From an imaginary perspective, think of this as a new initiative. Someone in the business area has decided they need to fund or establish a new initiative. So they work with someone like myself, a technical architect, and the architect makes the determination: well, I can tell you that this is not a capability we currently offer. It's a new capability, so we're going to have to construct a new application. So the architect goes and establishes a new application record in application portfolio management.

00:23:57

They work with the DevOps team, usually a solution architect or a tech lead type of person, someone leading the development team, to establish the initial configurations for that. It's sort of like a one-time registration process for entering our pipeline. So before source code is checked in, or before a first deployment, you can't join up without that tech lead or solution architect establishing a one-time configuration. That one-time config has got that UUID, that unique ID. So downstream, any time that application is deployed, all of the dependencies that we're going to be looking for, for example the container namespace, are automatically related to that application ID, regardless of the number of times it's actually deployed. What that means is that we can pick up, in the CMDB, the relationship between application portfolio management and all of the federated and discovered configuration items that exist in the ecosystem.
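A toy model of that registration flow is sketched below. It is not the actual pipeline: the `Registry` class, its method names and the namespace are invented for illustration. It only shows the two properties described in the talk, that an unregistered application cannot deploy, and that repeated deployments keep relating the same namespace to the same application ID.

```python
import uuid


class Registry:
    """Toy stand-in for the APM record plus the CMDB relationships."""

    def __init__(self):
        self.apps = {}              # app_uuid -> application name
        self.relationships = set()  # (app_uuid, ci_type, ci_name)

    def register(self, name: str) -> str:
        """One-time registration: mint the UUID that follows the app everywhere."""
        app_uuid = str(uuid.uuid4())
        self.apps[app_uuid] = name
        return app_uuid

    def record_deployment(self, app_uuid: str, namespace: str) -> None:
        """Relate the deployed namespace to the app; refuse unregistered apps."""
        if app_uuid not in self.apps:
            raise KeyError("application is not registered; deployment refused")
        # A set makes this idempotent: redeploying adds no duplicate relationship.
        self.relationships.add((app_uuid, "container_namespace", namespace))


registry = Registry()
app_id = registry.register("new-capability-app")
for _ in range(3):
    registry.record_deployment(app_id, "new-capability-ns")
print(len(registry.relationships))
```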

00:25:08

And so that's great. This is the point where we were at a little while back. But what we've done more recently is establish the ability to collect the health, the consumption and the configuration items all in one location, a data mart, so that we can run additional automated control monitoring, as well as the full Technology Business Management calculations, which is where TCO comes from, in one aggregated location. This is very powerful. It can answer a lot of questions much quicker. It eliminates islands of automation, eliminates manual handoffs between teams, and really establishes a holistic ecosystem between application development, support, architecture and the business partners that depend on them.

00:26:02

Okay, so let's talk a little bit about what's left for us to work on. This journey, like I said at the beginning, is not complete. We've laid the groundwork and we've implemented several controls, but we still have quite a bit of work to do to enhance the data that we're getting from our public cloud providers. We're finding ways of doing discovery or federation of that data, as well as ingesting financial data there in real time. If you're not familiar with the term FinOps, it's something you can Google. It's basically a consortium, a member of the Linux Foundation, focused on really maximizing the value you get and managing costs in your public cloud ecosystem.

00:26:52

So we're going to bring that data in here in 2021 so we can start studying it. We're also going to expand control testing. Now that the framework is in place, it's just a matter of coding up as many rules as we can find to do testing in an automated fashion. That should eliminate hundreds or even thousands of manual efforts to do that control testing, for auditors, for regulators and for our internal need to understand the health of our environment. We're going to do a lot more around deepening the actual APM, application portfolio management, and Technology Business Management insights. This is where I'm going to focus for the next year: now that we have that financial and technology ecosystem data coming together, what can we do with it?

00:27:44

What are the insights you can derive from it? I want to try to find patterns so that we could automate recommendations around application rationalization, or make architectural recommendations back to our business partners. That's getting pretty exciting. There's a lot of opportunity for machine learning there, as well as some more traditional data analytics. And then I'm going to do some more work around automating the actual data certification process itself, so that we can be very assured that all of the data we're looking at in this environment is actually accurate and complete. So with that, I just want to say that hopefully what's next is I'm going to get to see you all in 2021. Hopefully everything will be going better next year and we can actually see each other in person in Las Vegas one more time. My Slack handle is at BMC. Feel free to look for that in the conference Slack channel, and if you miss me at the conference, feel free to reach out to me on LinkedIn. Thanks a lot.