San Francisco 2015

Test Automation For Mainframe Applications

Many organizations have very limited automated testing for their mainframe applications. This lack of test automation has many historical causes, but now is the time to start improving the situation. With the concepts of shift-left testing, and the new tooling specifically targeted at improving test capability on z/OS, there are opportunities to quickly address the lack of testing.

This session will discuss the state of the current environment and some options teams are using to build up their test automation, even for mainframe applications. This will include a discussion of zUnit as well as integration testing and using virtual services to improve early testing.

Rosalind Radcliffe

Distinguished Engineer, Chief Architect for DevOps and CLM, IBM

Full transcript

The complete talk — auto-generated from the talk's captions.

So today I want to talk about mainframe automation, test automation. But as Gene just said, I'm going to start first with a little bit about me and then a little bit about what the mainframe is today. So first, Rosalind Radcliffe, IBM Distinguished Engineer, currently responsible for DevOps for Z systems. I've been at IBM 28 years.

I started in ISPF development, and for those of you in the room who might not know what ISPF is, that's the menuing system, the green screen of the mainframe. So I started there. I learned how to do development there. I wrote assembler code, et cetera.

And then through my career in IBM, I've done a whole bunch of other things. I've worked in systems management. I've worked in services. I've spent my entire career working with clients, understanding their problems, and bringing it back to development.

I've done that, working with clients to understand what their challenges were so that, one, I could build better products, but two, I could really understand how things were done and how people were working. With my move into doing development tooling in IBM, it was kind of fun. I got to do development tooling for Z and improve those processes. And then when DevOps came out, why wouldn't I work in DevOps?

I've done systems management. I've worked in operations. I've worked in development. I already had the, it's not development's fault.

It's not operations' fault. I can't blame either side. So DevOps seemed like the perfect place to focus for me. So a little bit about mainframes.

Well, here's some fun facts. If you think about mainframe today, it runs the world's leading businesses. It runs the banks. It runs US retailers, insurance companies, large number of the Fortune 500 global companies.

And since I started with the mainframe, I've been hearing the stories of the mainframe is dead. I went through the stories of the dinosaur is angry and he's back. There are lots of fun things. But the mainframe is what runs our businesses today.

It's what runs the banking system. It's what you do when you take money out using an ATM. When you swipe your credit card, those go through mainframe systems, multiple mainframe systems probably. If you look at the data in the world, 80% of the world's corporate data resides on mainframes.

So we've got critical assets running on these systems, and in most cases, they're legacy systems that were designed and built quite a while ago, many cases 20, 30 years ago. And they're being updated, but not necessarily modernized. So we have business logic that's very valuable that's locked in these assets, and what we need to do is modernize that. So the mainframe today.

With the latest release of the mainframe, the z13, it's built for today's mobile workloads. It's built for reliability, availability, serviceability. It's the machine that stays up. You don't have to worry about it.

It's there. It's running. And you write your systems assuming it's running. It has security built into the system, encryption built in, so that you have the capability to do what you need to do in today's wonderful world of hackers and people who want to get your information.

The latest machines are tested to 100 Cyber Mondays each day. I don't really want to live with that, but okay. And up to 10 terabytes of memory, as an example, in the systems. Now, yesterday I was sitting in a session with Jez Humble and heard a comment about having to build your software and work on your software to make it run on unreliable systems.

Okay. So you do a lot of work to get reliability out of an unreliable system, or you can run on a reliable system. Take your choice. Now, before you say it, yes, I am a mainframe bigot.

That is true. But if you think about it, there's a set of transactions that matter. They really matter. They're the banking transactions.

They're the transactions that I want to make sure work, and they're large-scale transactions. So why not put those on a system that's reliable, that's secure, and then do other things on the distributed system working with those mainframe systems? So there's a place in the world for all of these systems. Everything is not going to run on a mainframe.

But the high transaction, guaranteed reliability kinds of systems do. And so we need to make sure that we take those assets that we have, those systems, and modernize them and make them more accessible to these systems of engagement to make them available so that you can get the business value out of those systems. And one of the problems we have today is testing these existing systems. I spend a lot of time with clients who comment, "It takes me months to test a change that's going to go into production." Months of manual testing. You're not getting a very fast release if you have to spend the last few months testing it.

One of the biggest problems I see in the mainframe space is it's exactly the same as I started 28 years ago. So when I walk into most mainframe development shops, it looks the same. The same tools, the same processes. What else is the same as 28 years ago?

Nothing? Okay, it's got to change. It's got to modernize. In the existing tooling, we didn't have continuous integration.

We didn't have the thought processes of continuous delivery. We didn't have the focus on automated testing. And if you look at my simple picture down there, everything's shared. Well, the mainframe is built for shared systems.

It is built and designed fundamentally to allow you to share, but when it comes to development, you really don't want to be sharing. You really want to be able to have your dedicated development environment or test environment so that you can do what you need to do and test it in an automated fashion. And one of the challenges and reasons why we have these shared environments is it's too costly to build additional environments. Talk to your system programmers and they'll say, "No, I'm not going to put up another system.

I'm not going to do that." Cost, MIPS charges, challenge of maintaining multiple systems becomes complex, and once I get a system, I'm not giving it back to you. You got to be kidding. I've got my test system, I'm going to keep it. So how do we deal with this challenge?

When I look at most customers today, the amount of automated testing is increasing. But when I ask them about automated testing on the mainframe, in general, the answer is, "Well, it's not zero." Okay, yes, it's close to 0%. Wrong. Not a good idea.

So what I want to do for the rest of my time is talk about some of the solutions that are available for this challenge. And I'm going to go through each one of these, and so I'm not going to spend time on this chart because in 30 minutes, it's hard to get through all these stories. So the first one, and this is the one place I'm actually going to talk about a product. One of the things that I'm somewhat surprised by, since we've had this technology for a little while, is we have a product called Rational Development and Test Environment for System z.

It's z/OS, true z/OS, running on Intel Linux. Okay? Think about that a minute. Every time I say it here at this conference, I get shocks and kind of surprises, but it means I can now run real z/OS, it's hardware emulation, so it's real z/OS on Intel hardware.

I can now provision my z/OS test environment or development environment exactly the same way that I provision my distributed environment. I can use my private cloud, public cloud, and I can provision a z/OS to do development and test on. Create a golden image of what you have in your environment and make it available. Think about that.

I don't have to share my environments anymore. Now, I can have my pipeline the same way I build my distributed pipeline. I can build a pipeline that instantiates a z/OS image, deploys the application to it, deploys the data, runs the test, validates the results, and tears it down. I have the feedback now and I have it fast, and I'm getting that information back to the developer in the same way that I do in the distributed space.
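The pipeline described above (instantiate an image, deploy the application and data, run the tests, validate, tear down) can be sketched in Python as follows. This is only an outline under stated assumptions: `FakeProvisioner`, `provisioned_environment`, `run_pipeline`, and the step names are all hypothetical, and a real pipeline would call whatever cloud or provisioning tooling the organization actually uses.

```python
from contextlib import contextmanager


class FakeProvisioner:
    """Hypothetical stand-in for the tooling that creates a z/OS test image."""

    def __init__(self):
        self.log = []

    def create_image(self, golden_image):
        self.log.append(f"create:{golden_image}")
        return {"id": "env-1", "golden_image": golden_image}

    def destroy_image(self, env):
        self.log.append(f"destroy:{env['id']}")


@contextmanager
def provisioned_environment(provisioner, golden_image):
    """Create an environment and guarantee it is torn down afterwards."""
    env = provisioner.create_image(golden_image)
    try:
        yield env
    finally:
        # Teardown runs even when a step fails or raises.
        provisioner.destroy_image(env)


def run_pipeline(provisioner, golden_image, steps):
    """Run (name, step) pairs in order; stop at the first failing step."""
    with provisioned_environment(provisioner, golden_image) as env:
        for name, step in steps:
            provisioner.log.append(f"step:{name}")
            if not step(env):
                return False
        return True


if __name__ == "__main__":
    # Placeholder steps; each would call real deployment or test tooling.
    steps = [
        ("deploy_application", lambda env: True),
        ("deploy_data", lambda env: True),
        ("run_tests", lambda env: True),
        ("validate_results", lambda env: True),
    ]
    provisioner = FakeProvisioner()
    print("passed" if run_pipeline(provisioner, "golden-v1", steps) else "failed")
    print(provisioner.log)
```

Putting the teardown in a `finally` block is the point of the sketch: the environment is released whether the tests pass or not, so nobody ends up holding a test system indefinitely.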

I don't have to do something to create a system. Now, RD&T is a way to do this. With RD&T, there are no MIPS charges. So you get RD&T and you don't have those additional software charges that everyone talks about on the mainframe.

You could do the exact same thing today using z/VM on real z hardware. Nothing to say you can't, other than the MIPS charges and your system programmer saying you won't do it. But with RD&T, it's available and you have the ability to create these images. With this technology and with the ability to create images, I now can do all of those things that I want to do for automated testing, for development.

I have access to the resources. The other thing that this actually somewhat does is has developers learn a little bit more about z/OS. So those new kids that are coming in to write new code, COBOL, PL/I, Assembler maybe if they really want to, they can learn a little bit more about z/OS on their own on a system that they don't have to worry about. They don't have to worry about breaking.

Once I have that system, once I have the capacity, and I'm not necessarily sharing my system with others, I can now start to do other things. And this is the cultural change. Okay, you've been doing development exactly the same way for 30 years. You need to move to the current century. You need to start having developers responsible for unit testing.

They can do it. You can write unit tests even for Z. There's zUnit, you can do it. It's a little harder with monolithic programs, or a lot harder with monolithic programs.

And in many cases, we do have very large monolithic programs that make this hard. But there are a lot of programs that aren't as large and monolithic. They're parts of the environment that you can start doing unit testing on. And through the use of virtual services, through isolation, through this test environment that you have, you can do the testing.

The other thing I find is since they spend months doing manual testing, the testing is all happy path testing. Well, that's not really useful. How about doing negative testing? How about causing problems in the environment?

The other thing to think about here is how about doing code scanning, code rules, all of those kinds of things that you do in the distributed space should be done against your Z code. Run the security scans to make sure that you're meeting the same security standards as you are in other environments. Your mainframe's backing your mobile transaction. You need to worry about its security as much as you do other platforms and automate the testing as part of the build.

Build the same pipeline. Build a pipeline that allows the testing of the system in a consistent way as other platforms. And by the way, integrate it into that same solution. So if you're using Jenkins as your pipeline, integrate the mainframe part into it.

When you do deployment, deploy both sides. I suspect many organizations have highly coupled environments still, mainframe and distributed. And because of that, you need to keep that deployment together. Now, over time, we want to make this more loosely coupled, but until it's there, deploy it at the same time.

Coordinate it together. Interface testing. Now, I said you can start doing unit testing, and you should. You should start doing unit testing for all that new code.

But you probably got a few million lines of code out there already, and you don't want to tell the developer that if they change a single line of code in a module that they have to do 100% code coverage or they'll never do anything right. So new code changes, they need to write unit tests for, but how do I get coverage for all of that's already out there? Interface testing. Most mainframe applications, most people don't use green screens anymore when they run an application.

Unfortunately, they do for development, but they don't generally for actually running the application. There's usually a front end to the application. So that interface is a great place to put a test in. Start testing at that level.

You can start getting coverage of your application simply by doing interface testing. And by having these test cases built up with both positive and negative testing, you now have a comfort factor that allows you to make more changes on the back end system. The other thing this gives you if you build your interface testing correctly is you can have enough of an interface that allows your distributed teams to work independently. So you don't necessarily have to have that Z resource available.
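As an illustration of interface testing with both positive and negative cases, here is a minimal Python sketch. The `AccountService` class, its methods, and the account data are hypothetical stand-ins for whatever front end sits in front of the back-end application; they are not from the talk. The tests only touch the interface, so the same cases could in principle run against a real back end or a substitute.

```python
class InsufficientFunds(Exception):
    """Raised when a withdrawal exceeds the available balance."""


class AccountService:
    """Hypothetical front-end interface to a back-end system of record."""

    def __init__(self, balances):
        self._balances = dict(balances)

    def balance(self, account):
        # Raises KeyError for an unknown account.
        return self._balances[account]

    def withdraw(self, account, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        current = self.balance(account)
        if amount > current:
            raise InsufficientFunds(account)
        self._balances[account] = current - amount
        return self._balances[account]


def rejects(call, expected_error):
    """Return True if calling `call` raises `expected_error`."""
    try:
        call()
    except expected_error:
        return True
    return False


def run_interface_tests(make_service):
    """Run positive and negative cases against a freshly built service."""
    results = {}

    # Positive (happy path) case.
    svc = make_service()
    results["withdraw_ok"] = svc.withdraw("A1", 30) == 70

    # Negative cases: each bad request must be rejected without side effects.
    svc = make_service()
    results["overdraft_rejected"] = rejects(
        lambda: svc.withdraw("A1", 500), InsufficientFunds)
    results["balance_unchanged_after_reject"] = svc.balance("A1") == 100
    results["unknown_account_rejected"] = rejects(
        lambda: svc.withdraw("ZZ", 10), KeyError)
    results["negative_amount_rejected"] = rejects(
        lambda: svc.withdraw("A1", -5), ValueError)
    return results


if __name__ == "__main__":
    for name, passed in run_interface_tests(
            lambda: AccountService({"A1": 100})).items():
        print(name, "PASS" if passed else "FAIL")
```

Passing a factory (`make_service`) rather than a fixed object is a design choice in this sketch: each case starts from known data, which matters once tests no longer share an environment.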

If you have an interface test that can provide enough of the function, then the teams can work independently of each other. It also allows you to isolate the function more efficiently, and so using RD&T on a Linux Intel box is even easier. Because one of the problems we have is these mainframe applications are large and monolithic, and they're interconnected. I have spoken to many clients in which their entire mainframe set of applications depend on each other.

And in a case like that, you don't really want that whole mess on a test environment, you really only want parts of it. So virtualize off the pieces that you don't need to allow you to get the testing done for that environment. Now that you've got the test environments, now that you've started to do unit testing, now that you have interface testing, now you can start to do what you really want to do, which is refactor those monolithic applications into services that then can be used for your systems of engagement. One, this makes it easier to test because you're building small services that are easy to test against the API.
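The idea of virtualizing off the pieces you don't need can be shown with a small Python sketch. Everything here is hypothetical: `VirtualCreditCheck` plays the role of a virtual service that returns canned responses in place of a dependent application, and `approve_loan` is an invented piece of code under test that only knows the interface it calls.

```python
class VirtualCreditCheck:
    """Virtual service: canned responses in place of a real dependent system."""

    def __init__(self, canned_scores):
        self._canned = dict(canned_scores)
        self.calls = []  # record requests so tests can check the interaction

    def score(self, customer_id):
        self.calls.append(customer_id)
        if customer_id not in self._canned:
            raise LookupError(f"no canned response for {customer_id}")
        return self._canned[customer_id]


def approve_loan(customer_id, amount, credit_check,
                 min_score=650, max_amount=50_000):
    """Code under test: depends only on the credit check interface."""
    if amount > max_amount:
        return False
    return credit_check.score(customer_id) >= min_score


if __name__ == "__main__":
    virtual = VirtualCreditCheck({"C1": 720, "C2": 580})
    print(approve_loan("C1", 10_000, virtual))
    print(approve_loan("C2", 10_000, virtual))
    print(virtual.calls)
```

Because the dependency is passed in, the same `approve_loan` code can run against the virtual service in an isolated test environment and against the real application later, without change.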

And two, now you've restructured that monolithic application into something that's easier to maintain, easier to work with, and allows you to get the business value that you need to more easily. You can compose the services together into your systems of engagement more efficiently rather than having this big monolithic system. Now, if I have the automated testing, I'm confident that by refactoring an application, I haven't broken the system. I can gain confidence in being able to do this and I can move forward with this capability. Now, most large-scale systems running today are monitored, well, in production at least.

Well, if we think about production data and production monitoring, that's really useful, but I need to know that information sooner. I need to use the same monitoring tools earlier and have the monitoring be part of the test so that as I'm running my test cases, as I'm running my tests, I get the right monitoring data, I get the right information about how this is going to perform. These systems, these systems of record, need to have sub-second response time. They need to be running fast.

They need to be running reliably. And so I need to understand how they're performing early. One of the things that's important to remember about the use of RD&T, it's a dev test box. It's running on Intel Linux.

It's not going to perform the same way z hardware is going to perform. Not at all. It works great for 20 users doing development. It's not a performance test environment.

But you can still use the data to understand, compare run to run, to understand how my application is performing, how things are working in the environment. This helps you not only understand how the application's performing, but helps you identify the areas for improvement. Because if you're shooting for that sub-second response time, you need to understand each part and improve its performance. The other piece that we look at in these environments, I've got this running system that I need to interact with, but I have a lot of operations data that tells me what's going on in the environment.
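Comparing run to run, rather than against production numbers, could look something like the following Python sketch. The function name, the 20% tolerance, the transaction names, and the sample timings are all assumptions made for illustration; the timings would come from whatever monitoring tool is attached to the test run.

```python
from statistics import median


def find_regressions(baseline, current, tolerance=0.20):
    """Return transactions whose median response time grew by more than
    `tolerance` (a fraction) between two runs on the same test environment.

    Both arguments map a transaction name to a list of response times (ms).
    """
    regressions = {}
    for name, base_samples in baseline.items():
        if name not in current:
            continue  # transaction not exercised in the current run
        base = median(base_samples)
        now = median(current[name])
        if base > 0 and (now - base) / base > tolerance:
            regressions[name] = (base, now)
    return regressions


if __name__ == "__main__":
    baseline = {"INQ": [100, 110, 105], "UPD": [200, 210, 190]}
    current = {"INQ": [104, 108, 106], "UPD": [300, 310, 290]}
    for name, (before, after) in find_regressions(baseline, current).items():
        print(f"{name}: median {before} ms -> {after} ms")
```

The absolute numbers from an emulated environment say little about production speed, which is why the sketch compares relative change between runs taken on the same kind of box.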

I know which modules are using the most CPU time. I've got that data, it's in the system. So use that data to optimize how those things are working, to optimize those programs. One of the differences we have in mainframe environments is when I do a build, I'm very efficient with my build.

I only build the program that's changed or the affected programs. What does that translate to? It translates to there's a high probability you have programs running that haven't been recompiled in maybe 20 years. Think about that.
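To make the build behavior concrete, here is a small Python sketch of "build only what changed or is affected" and of how that leaves some programs uncompiled for years. The program names, copybook names, and compile years are invented, and the dependency map only covers direct dependencies, which is a simplification.

```python
def affected_programs(changed, uses):
    """Programs to rebuild: those that changed, plus those that directly
    use a changed source member (for example a copybook).

    `uses` maps each program name to the set of members it includes.
    """
    changed = set(changed)
    return sorted(p for p, deps in uses.items()
                  if p in changed or deps & changed)


def stale_programs(last_compiled, cutoff_year):
    """Programs whose last compile year is earlier than `cutoff_year`."""
    return sorted(p for p, year in last_compiled.items()
                  if year < cutoff_year)


if __name__ == "__main__":
    uses = {
        "PAYROLL": {"EMPREC"},
        "BILLING": {"CUSTREC"},
        "REPORTS": {"EMPREC", "CUSTREC"},
    }
    # A change to one copybook rebuilds only the programs that include it.
    print(affected_programs(["EMPREC"], uses))

    # Everything else keeps its old load module, possibly for a long time.
    last_compiled = {"PAYROLL": 1996, "BILLING": 2014, "REPORTS": 2009}
    print(stale_programs(last_compiled, cutoff_year=2010))
```

A report like `stale_programs` is one way to find candidates for a recompile once automated tests exist to confirm nothing broke.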

Okay, I ran into a company that said they probably had recompiled everything in the last seven. Ooh. Okay? Got it?

Now that's not a good idea, especially when you think about how the compilers update with the machine hardware. And I mentioned something about the machine hardware changes and the latest release of the machine hardware supports a lot more memory, supports a number of other things, floating point, whole bunch of numerical calculations. If you haven't recompiled in 20 years, well, you don't recompile because you don't have tests. You don't have automated testing.

So now that you have automated testing, now that you can build up your test cases, you can now recompile and you can get optimization without actually having to change the code, just recompile it. Simple things through the use of automated testing allow you to do more on the system, allow you to take advantage of the latest and greatest hardware in the environment. The other piece is understanding what goes wrong. So I've heard, during the discussions, about deployments and the issues with deployments.

That's true everywhere. So what goes wrong? What's really going wrong in those deployments? What's going wrong in production?

Feed that data back so it gets fixed so that you don't have those problems continuing to come up. So I want to tell some success stories of what people have done in this space and some other fun fact stories about the z system. So first, I'm going to use this example about a bank that through automated testing reduced software development time by over 90% and decreased time to market by 40%. Now I went and triple-checked that reference when I saw over 90%, because how do you decrease development time by over 90%?

Well, if it takes you two weeks to write the code, but it takes you three months to test it, it's pretty easy. So by doing automated testing, they've significantly decreased the time for development. If we look at the financial services provider here, they used a common set of modern tools, moved to common practices, virtual services, and reduced from weeks to hours in deployments. I have an example down in the bottom corner.

We have a client, a financial services company. They deploy their z/OS environments through a single click. It's the same portal that their developers use to do a distributed system. They click, 40 minutes later, they have a full z/OS system.

Now, that's an actual number, 40 minutes later. It's a lot longer than it should be. We've been having the conversation about what they're doing in their deployment. It should be able to be much faster, but in their case, they're doing a lot of extra steps.

But 40 minutes is a whole lot better than never, which was the answer that their teams got before. "No, you can never have your own test system." Now, 40 minutes later, you have your own test system. I also have down here an internal reference. Our internal teams are going through the same transformation.

They're modernizing, they're transforming their systems so that we can build our systems faster and do the same kinds of things through automated testing. Now, I have to admit that when I was in ISPF development all those 28 years ago, we actually had a full automated regression bucket. And every time I made a change, I would kick off the regression bucket before I went home, and I would run it because I didn't want to break anything. Now, those were the days when you could go home at night.

Okay. I come back to work, come back home in the morning, yeah. Come back to work in the morning, and I'd know whether or not I'd broken something. Now, I had to add a test case to the regression bucket when we added function, but we had it even back then.

If we look at what the CICS team has been doing, the CICS team has automated their environment. They've built a set of tooling, Jat, built on Ant. Ant technology sound familiar? Yeah.

Common tooling, common capabilities. They've built a set of tooling to allow them to do development efficiently, allow them to instantiate their environments and run through their tests so their developers can get feedback in minutes. Now, I have a couple of other things I want to mention here, a few stories that Gene wanted to make sure I said. First one I want to say is when I was talking about the reliability of a Z system, a few things I want to point out.

VisaNet is a really good example. VisaNet's been up for 19 years without being down. Sound like a useful thing? I know many banks, and a specific number of banks that have been up for more than 10 years without being down.

That doesn't mean they had a planned outage. It means they've not been down. Because of the reliability of the system, they can roll in new hardware, they can update the environment without ever actually being down. So they have the capability built into the system to allow them to do that.

The other story I want to tell, you'll have to go out to YouTube to hear the full and complete story. But Walmart has been talking about their transition, and they're using z/OS as a cloud-based environment. They have transitioned to building services in Z in support of their organization. And there's a particular story that they like to tell about a caching service.

They had a problem. Their distributed applications needed a reliable cache. Their distributed applications run all across the globe, and they needed to be able to have a caching service that they could depend on, even through Cyber Mondays, Black Fridays, et cetera. And they'd had a lot of trouble building one or getting one in the distributed space.

So the Z guys said, "We can write you a cache." So they did. They wrote a caching service. The way you request it is a web interface, ask for it, and you get a REST callback. Here's the REST interface that you use.

All of their distributed applications can now use this REST service as a caching service, and it's part of their application. They don't have a clue-- Well, they know it's running on Z because it always works, but they don't have a clue where it's running, and they don't have to care because it's just a REST call. They need the caching service, they have it. They have a number of other services, and they have YouTube videos to describe all of the different services they have.

But that caching service has been up for years and has had zero failures, and it's at one-fifth the cost of a distributed implementation. Just for fun. How much is that? More than a trillion transactions.

Okay, that thing's way used. And more and more parts of the company use it. You would think people would say, "Mainframe distributed? They don't talk together?" Well, they still might not talk together, but if you have a service that works and is reliable, why on earth wouldn't you use it?

So there are a couple of key takeaways. First, you've got to modernize the mainframe development practices, bring automated testing to them. We can't keep doing what we've been doing for 30 years. You need to have the capacity.

You need to give the teams the development test capacity that they need. Use virtual services. Use interface testing to get this to work. And what do I need help with?

So I work with a lot of companies and they build automated testing, and it's great, but how do you get them to stop doing the manual testing? What's the metric to prove they don't need to do it? If you can help me with that, I appreciate it. One more thing.

If you're interested, since I only could tell a few of the stories, there are a bunch more stories in "Mobile to Mainframe DevOps for Dummies," and we'll be handing them out, and I'll sign them downstairs in the IBM booth. And here is your signed copy. Thank you so much.