Las Vegas 2018

Seven Steps to Move a DevOps Team into the ML & AI World

Nobody questions the potential for machine learning and artificial intelligence to revolutionize DevOps productivity and business efficiency as a whole. However, organizations leveraging ML and AI to their full potential are few and far between. Done right, ML/AI and DevOps can turn enterprises into “digital attackers” that can release higher-quality software faster and at lower cost. Torsten Volk will outline the seven basic rules to follow to implement ML and AI in real-life DevOps situations. This session will cover: Getting started with AI and ML. The role of parallel pipelines. Which metrics to use and how to use them effectively. Why patience and experimentation are key to making it work.


With over 15 years of enterprise IT experience, including a two-and-a-half-year stint leading ASG Technologies' cloud business unit, Torsten returns to EMA to help end users and vendors leverage the opportunities presented by today's hybrid cloud and software-defined infrastructure environments in combination with advanced machine learning.


Torsten Volk

Managing Research Director, Hybrid Cloud, Software Defined Infrastructure and Machine Learning, EMA Research

Transcript

00:00:05

Can everybody hear me? Okay? There we go. Or do you hear me with a German accent? Yes. <laugh>. Yeah, that's the microphone. What we wanna talk about today is AI and DevOps. And believe me, it is difficult for me. I have a hard time putting the term artificial intelligence, AI, in there. I think we all know what we are really talking about, and I wanna start and set the frame a little bit to have that joint understanding, because Enterprise Management Associates, EMA, is an analyst firm, and we are all about pragmatic solutions, trying to get people from point A to point B with the least possible pain. And artificial intelligence, what you can see here, and I'm gonna turn all of those cool videos on, can help us there. It can help us produce better software, it can help us do that cheaper.

00:01:06

It can help us do it faster and eventually get to a continuous release process. And what you see here are all of the different options, or not options, but expressions of AI, if you will. So at the top left, we see different types of cars that are all learning to go around the track. And they couldn't learn that in traditional code. I mean, they can, but then they would only work for this track, right? And if you change the track, the little blue and green and red cars would have no idea where to go and how to drive around that track efficiently. And that comes down to the whole challenge of artificial intelligence. And that is, I have three components. I have features, and those are the input variables. Do I drive left? Do I drive right? For example, for those cars.

00:02:02

Then I have actions. Sorry, <laugh>, let me start over. I have features, where my car can drive to the left and it can drive to the right, and I can detect what it's exactly doing currently. And then I have actions, where I turn the wheel to the left and to the right and steer around that course. And then I have a reward. If I get around the course, I get a higher reward, just like in the Mario game. If I get further down in the level in Mario, I get a higher reward for my artificial intelligence model. And those are things that I can really only do if I use artificial intelligence models instead of structured coding. And what you can see, for example, in the Mario model, there are a lot of individual decisions, basically pushing the individual buttons on the gamepad: A, B, X, Y, up, down, left, right.
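The features, actions, and rewards loop described here is, in essence, reinforcement learning. As an illustrative sketch, and not the code behind the demos shown on stage, here is tabular Q-learning on a toy one-dimensional track, where the feature (state) is the car's position, the actions are move left or right, and the reward comes only from reaching the end:

```python
import random

def train_q_table(track_length=6, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a toy 1-D track.

    Features: the car's position on the track (the state).
    Actions:  0 = move left, 1 = move right.
    Reward:   1.0 for reaching the end of the track, 0.0 otherwise.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(track_length)]  # one [left, right] row per position
    for _ in range(episodes):
        pos = 0
        while pos < track_length - 1:
            # Explore sometimes (and on ties); otherwise exploit the best known action.
            if rng.random() < epsilon or q[pos][0] == q[pos][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[pos][0] > q[pos][1] else 1
            nxt = max(0, pos - 1) if action == 0 else pos + 1
            reward = 1.0 if nxt == track_length - 1 else 0.0
            # Q-update: nudge the estimate toward reward + discounted future value.
            q[pos][action] += alpha * (reward + gamma * max(q[nxt]) - q[pos][action])
            pos = nxt
    return q

q = train_q_table()
# The learned policy: which action looks better at each position short of the goal.
policy = [0 if row[0] > row[1] else 1 for row in q[:-1]]
```

Early on the policy looks hopeless, exactly as with the cars in the video; only after many crashes do the reward updates propagate back from the goal.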

00:03:07

And I get a reward for pushing them in a specific situation. And the situation you see in the Mario example at the bottom, or at the left in that gray box there, where it shows the obstacles and shows basically how Mario sees the world. And then Mario gets rewarded for responding to that limited world view differently. And what you can see in this next example is a computer game that probably a lot of you recognize. It's Asteroids, and this player is a neural network that is playing Asteroids. And you can see it just lost, right, very quickly. So it changed its strategy and it's now driving around much more and shooting and driving and turning and shooting, and is checking out if that is a strategy that works out or not.

00:04:08

And once that one is hit by an asteroid, it'll try the next thing and the next thing, and it'll get rewarded the more of those asteroids it hits, and it'll basically get punished the quicker it dies and the less it achieves in this level. And the reason why I put this example there is, this thing actually almost plays like a human being, even though it has no idea what Asteroids is. This is just a purely pattern-based trial-and-error effort. If you hit more asteroids, you get rewarded more, and if you don't, you get rewarded less. And to end the history, or the science part, of this talk: this is from 1993, and this is the first convolutional neural network that really changed the economics of how checks were processed.

00:05:12

Before, you had no way of automatically reading people's handwriting. You got an 80%, 85% hit rate when you tried to do that, and that was just not enough. And in 1993, Yann LeCun, he's now, I think, chief AI scientist at Facebook, came up with this convolutional neural network algorithm and model that could for the first time do this. And that was kind of the turning point that makes us compare artificial intelligence and machine learning to the industrial revolution, to everything that really fundamentally changed the economics of how we do business. For example, instead of having lots and lots of people transcribing checks into computers, we now scan them in and they're all done. And that's the same thing in a lot of other disciplines as well.

00:06:12

So the reason why we are having this talk today is, AI is not trivial, even though it seems trivial in many ways. We can have a large number of different kinds of neural networks, for example, and those are just neural networks, right? And then for every single neural network, I have a lot of parameters, hyperparameters, and a lot of data that's attached to it. Each one of them has its limitations, performance requirements, hardware requirements. You need to configure it properly, otherwise it's not gonna work, right? My favorite example is, I always get almost emotional here, right? I've been doing this stuff for a long time. And you talk to data scientists and they talk you through the individual steps that it takes to set something like this up.

00:07:10

And this is a very much simplified overview <laugh>. But every time you do a step wrong, for example with the hyperparameters, you make the network too deep or not deep enough, or you pick just the wrong configuration, that thing will still run for a week or for three days and cost you $50,000, and you get nothing out of it, basically zero. Which is why we are often so dependent on data scientists helping with these things, which then prevents everybody from being able to do it. And that is really the big goal: everybody needs to be able to leverage this. And here's why. You can see again this car, it's a little bit of a different model. It has three sensors in the front, one in the middle, one on the left, one on the right, and it just randomly tries out how to get through that maze.

00:08:03

It doesn't know anything about driving. It just learns by crashing. And then the reward function has a low output, and it tries again with a different strategy. Successful behavior, the right speed, where it should stay on the street, how it should turn, how much it should brake before turning, all of this is rewarded. Versus, you know, if it goes too fast in the corner, or does crazy things, or hits the wall immediately, then that gets punished. And at a certain point you will see that car starts driving around the corner. But in the beginning, it looks very hopeless, right? It looks like this car will never do anything. And the reason for that is, it doesn't know anything about the world. It doesn't know that it's a car.

00:08:52

It just looks at sensor data. How far away is the wall, in comparison to what I do to the gas pedal and what I do to the steering wheel? That's all it knows. So it sees, for example, that it's a bad idea, if I'm very close to the wall, to push the throttle all the way through, but it'll try that anyway, right? Because it doesn't know anything about reality. It explores basically all the different options that it has and goes through that whole evolutionary process over and over again. It doesn't have a memory from another car or from a different model that gives it a head start, right? And that's one of the issues that makes this a little bit tricky. But what is important? And that's really my rule number one.

00:09:40

And we've been playing with these models since '99, basically. And the interesting piece is, they're still pretty much the same. They are just a little bit more accessible to everybody. But to get them into the enterprise and to get people to benefit from them, people have to have a basic understanding and not be intimidated by data scientists saying, yeah, you can't do any of this, you have to pre-process, you have to do all of this configuration, and it takes a few months, and at the end, I don't even know if it'll work. So that's why, at the end of the day, you have to really be able to understand the advantages of this whole features, actions, and rewards model. And that's not all that difficult.

00:10:31

But the difficult part is in the next step, or actually, I have an example here for this step. We did some research and we looked at this; it's called Driverless AI, software from a company called H2O.ai. And that is really interesting because it builds the model, and you can see it does all of the features, it parameterizes how it measures the errors, it does basically everything iteratively by itself. All it needs is resources. And then it shows you in real time how important the input variables are that it found, for example. And that is something that lets you try things out. You download data sets randomly, and you see what you find. For example, what kind of person likes a certain kind of microbrewery; I ran that with 60,000 examples.

00:11:33

That is public domain data. You can get all kinds of interesting data from websites like Kaggle and run it through a model like that, and you just get a feel for what you can achieve. And at the end stands a predictive model, right? If this worked, and if the error is acceptable, then you can provide this as an API, and you can start using it from any kind of software, like a microservice. Number two is: start with narrow challenges. And that goes hand in hand with number one. We have played and experimented with a lot of those technologies. And this is the most prominent, and to me upsetting, example. IBM Watson for Oncology was in the news a few months ago, and the story basically said, yeah, that thing doesn't work.
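That hand-off, a trained model provided as an API that other software calls like a microservice, can be sketched in a few lines. The one-feature threshold model and the JSON handler below are hypothetical stand-ins, not the actual export format of a tool like Driverless AI:

```python
import json
from statistics import mean

def train_threshold_model(examples):
    """'Train' a one-feature classifier: the decision threshold is the
    midpoint between the two class means. A stand-in for a real model."""
    positives = [x for x, label in examples if label == 1]
    negatives = [x for x, label in examples if label == 0]
    return {"threshold": (mean(positives) + mean(negatives)) / 2}

def predict_endpoint(model, request_body):
    """JSON in, JSON out: the shape a microservice prediction handler has."""
    payload = json.loads(request_body)
    prediction = 1 if payload["feature"] >= model["threshold"] else 0
    return json.dumps({"prediction": prediction})

# Toy training data: e.g. beer bitterness vs. whether a reviewer liked it.
examples = [(10, 0), (20, 0), (30, 0), (60, 1), (70, 1), (80, 1)]
model = train_threshold_model(examples)
response = predict_endpoint(model, json.dumps({"feature": 75}))
```

Once wrapped like this, the consuming service neither knows nor cares that a model, rather than hand-written logic, sits behind the endpoint.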

00:12:27

It's all crap, we should have never <laugh> started with it, all the hospitals, all the doctors, nobody likes it, it doesn't recognize cancer, it's terrible. But in the beginning, when it started, everybody thought, wow, it's the biggest breakthrough. It changes the way we cure cancer. It'll change everything, right? And the interesting part is, it works perfectly within the parameters that I would expect it to work in. I mean, I haven't dug into the individual details, but what happened here was that a whole big problem was made out of this AI-driven software not independently learning to come up with a cancer treatment, right? And to be compliant, and to be secure, and to just work the way it should work.

00:13:20

Almost like a doctor works. And that is just a very high ambition. And there were no checks and balances, and no limitations of AI considered. This is basically the opposite of a narrow approach. And rule number three: treat AI as an experiment. This is my own experiment, for my aquarium. I used a technology that I'm not gonna name, but the product managers of that technology thought it should be able to read out this industrial display here and just transcribe it into a JSON API. You see a preview of that to the right, but you can see the numbers don't match. It doesn't recognize a lot of the text. Those three numbers are the most important thing, and they just don't work, right? And the funny thing is that on the vendor's example, it works just fine.

00:14:18

And the example looks at least every bit as difficult, or easy, as what I did in my own little lab when I tried to automate my aquarium. But it shows that artificial intelligence is not intelligent. It cannot read, it cannot know. Look, the 62 here, that's really a 6.2, and the 51 is a 5.1. It's a pH value, so it has to have a decimal point; a pH cannot be 51. Things like that I would expect if it was truly intelligent. But at the end of the day, it's finding patterns of pixels on a high-resolution photo and is basically transcribing those into API output. And it does it exclusively on correlations. It doesn't know what it's talking about, whether that's a number, or a letter, or a picture. That's not at all what it's doing.
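The pH point suggests a practical guard: wrap the AI's output in domain-rule checks that the model itself knows nothing about. A hypothetical sketch for this case, where pH must live on a 0-14 scale:

```python
def sanity_check_ph(raw_text):
    """Validate an OCR'd pH reading against domain rules the model
    doesn't know: pH lives on a 0-14 scale with a decimal point.

    Returns (value, ok); ok=False means the reading needs human review.
    """
    try:
        value = float(raw_text)
    except ValueError:
        return (None, False)  # not even a number
    if 0.0 <= value <= 14.0:
        return (value, True)
    # A plausible OCR failure mode: the decimal point was dropped ("62" for "6.2").
    repaired = value / 10.0
    if 0.0 <= repaired <= 14.0:
        return (repaired, False)  # candidate repair, but flag it for review
    return (None, False)
```

This doesn't make the model smarter, it just keeps obviously impossible values out of the controller downstream.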

00:15:14

But if I had done this, and this was just my own project, if I had done this as a production project for a customer, for example, and there are a lot of conceivable similar scenarios for industrial controllers that have these displays, I would have said, yes, absolutely, that's easier than most of the stuff that they do in the demo, so this will not be a problem. But it didn't work. It couldn't be made to work. It was just a dead end. And in the end, we had to use a standard library that had nothing to do with AI and machine learning, and was a lot more labor-intensive to implement. And there are a couple of interesting examples that we always need to remember when we talk about what to do with AI in building our own software.

00:16:01

We can see at the top left one of the image recognition tools totally mischaracterizing weapons here. At the bottom right, somebody managed to build a mask that was fooling a facial recognition software. At the top right, there were two computers that were learning how to chat together, having fun in their own language, which didn't do anything. And here we have a couple of other issues that showed the difference in expectations versus what can really be achieved. And that is, from a project management perspective, a really important piece. And that's why I really always recommend to start with turnkey APIs. If you talk about AI/ML artifacts, turnkey APIs are really, really cool, and they do a ton of stuff. And if they work, just like one of the guys in the keynote said, it's not core for us to configure TensorFlow and to build the model and train it ourselves, if we can get something out of the box like this one here.

00:17:07

So I also wanna show something from IBM that actually was positively in the headlines. This is Visual Insights. What this does is basically replace human quality controllers. It finds issues that it hasn't seen before. What it requires, obviously, is that you give it some examples in the beginning, but the whole overall AI modeling process and software is out of the box. And that is an excellent way to start. And you can start at a few different types of starting points. Basically, you have pre-trained APIs where you can do a whole bunch of commodity stuff. You can get them from Google, from Amazon, from Microsoft, from IBM, from a ton of other vendors. They're kind of commodity. Then in the middle, those are, to me, the very interesting ones.

00:18:03

H2O.ai Driverless AI, IBM Visual Insights, the Splunk Machine Learning Toolkit. There are a lot of interesting things that you can use turnkey without a ton of implementation. And then to the right, we have things like Algorithmia and SageMaker from Amazon, where it's basically a bunch of APIs, where you at least don't have to deploy your own tooling. And the next point, rule five: make it modular. And I'll keep this short because we don't have a ton of time. Make it modular just means: use a different color on the board for AI, because you really don't know if your model will actually produce the results that you think it will. And believe me, I've done it so many times when I was still doing work, and not just talking about work as an analyst. We really thought, oh yeah, that's a slam dunk, that will absolutely, a hundred percent, work.

00:19:02

Like with that aquarium thing. I mean, I would have bet a lot of money that that would have worked, you know, and so would the guys who did the API, but in the end, it didn't work. So we have to have a different color for this so that we see, yeah, that's AI, and we have to have some contingency planning in place in case something doesn't work immediately, right? And funnily enough, the aquarium thing now works, or would work; I haven't redone it yet, but it was fixed, right? The vendor retrained it. It took months to retrain it for such a relatively simple thing, and it is now fixed. And now I could use this API very reliably for automating my aquarium controller. Number six: treat it as code. And that's actually something I said this morning for the first time: AI as code.

00:19:52

And it truly, I think, is absolutely critical that we think of AI as code, because those issues that we had, you know, with the cancer treatments, with the Tesla, with all of those things, it comes down to the same thing as having infrastructure as code. If we have it as code, we can version control it, we can reproduce issues, and we can basically turn it into artifacts, let other people use it as well, and let everybody benefit from what we found out. And ideally, we have a pre-trained model that people can use. So yeah, it's basically a service, right? We have a REST API, and we deploy to a Docker container, a serverless function, a streaming framework like Apache Spark. We can do anything with it. And what this "it's just another service" piece should really show is that it is a service that my main service relies on for a certain thing.

00:20:50

Like in my case, I read out the pH of my aquarium and open up the controller to raise or lower it if it's off, right? So that's a capability that depends on AI, but I should have a plan B, where I have a static library or something that I can use in the meantime to make sure that I have at least that capability available to some degree. And then there's another thing that we as an analyst firm see a lot, and that is the project metrics. If you have an AI project, it's always very easy to justify overruns. It's very easy to justify that, you know, the ROI was not there yet in the first six months, or at milestone one or two, because it's so cool and the implications are so large. But that's how AI is currently losing a lot of trust and a lot of enthusiasm with the people whose budget is being spent on this, right?
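The plan B described here, an AI-backed capability with a static fallback so the feature stays available, is a straightforward wrapper pattern. All function names and values below are hypothetical placeholders, not the actual aquarium setup:

```python
def read_ph_via_ai(image_bytes):
    """Placeholder for the AI-based OCR call; in real life this would hit
    the vendor's API and may fail or return an implausible value."""
    raise RuntimeError("AI OCR service unavailable")

def read_ph_via_static_lib(image_bytes):
    """Placeholder for the labor-intensive but dependable static library."""
    return 6.8

def read_ph(image_bytes):
    """The capability the main service relies on: try the AI path first,
    fall back to the static library so the feature degrades gracefully."""
    try:
        value = read_ph_via_ai(image_bytes)
        if 0.0 <= value <= 14.0:  # domain sanity check on the AI's answer
            return value, "ai"
    except Exception:
        pass
    return read_ph_via_static_lib(image_bytes), "fallback"

value, source = read_ph(b"")
```

The caller never sees the AI outage; it just gets a pH reading tagged with where it came from.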

00:21:52

So it's absolutely critical to manage expectations, to have the same milestones, the same metrics that we would have for other projects as well. And another interesting piece now is rule eight, the advantage of public data. <laugh> I just looked at the Kaggle website, which has all this public data available. You can basically get a data set for most problems that you can solve with data. So in this case, if you go and explore this and combine it with your corporate data, you can fill in a lot of gaps that you have, where you don't know something about reality and can't really train your model properly. But if you add this open source data, public domain data in a lot of cases, then you can actually predict something really, really well.

00:22:52

So that's another thing to consider. You know, if you say there's no data available: there is data available. I just reviewed a whole bunch of exploration tools. Literally, you can get all the Yelp reviews for all of the US, everything, with all the data, the columns, the comments, with absolutely everything. You can get that and learn a lot about people within your corporate context. You know, if you wanna achieve a certain goal, you can do sentiment analysis on Amazon reviews. They're all available as a file, as a text file, and you can add them into your own models. So that is number eight. And number nine, this is really interesting in that, with AI, you basically build a virtual reality, because you perceive the world through that AI.

00:23:50

The AI basically presorts what you're seeing. And there are a lot of entry points that, if a competitor wants to exploit them, can be very detrimental, because the character of AI is generally entirely opaque. You can't see into the model and see why it made a decision. So if somebody takes a memory stick and feeds a whole bunch of data into that model that is racist, that follows a certain agenda, that is there to discredit my company or to make my products work less well, you will not see that anymore. You will not be able to prove that anymore once the memory stick is removed and somewhere in the trash, unless you find it, because that stuff is then all in the training process.

00:24:44

And you have to basically continuously test your model to see what it's doing, right? In two months, will my aquarium example still work? I said it works now; well, that means it worked four weeks ago when I tried it last, and I thought, wow, great, that worked. But I have no guarantee that, you know, four weeks from now, it'll still work. So there's a lot that I can inject. There are dictionaries, for example, that show the algorithm alternative words for a certain term, and if I manipulate those dictionaries, I can get you into big trouble, because instead of a nice, balanced evaluation of a text, or of anything that you wanna do, you might get something entirely biased and skewed. So feedback loops are another thing, right? A lot of people, when Microsoft, I think it was, launched their chatbot, had a lot of fun manipulating that chatbot through external inputs so that, you know, it would be very politically incorrect.
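Continuously testing a model can be as simple as re-scoring a frozen "golden" data set on a schedule and failing the pipeline when accuracy drifts below an agreed floor. A minimal sketch, with a stand-in model and made-up numbers:

```python
def regression_test(predict, golden_cases, min_accuracy=0.9):
    """Re-score a frozen 'golden' data set against the live model and
    report whether its accuracy still clears the agreed floor."""
    hits = sum(1 for features, expected in golden_cases if predict(features) == expected)
    accuracy = hits / len(golden_cases)
    return accuracy, accuracy >= min_accuracy

def stand_in_model(x):
    # Stand-in for the deployed model: classify positive above 50.
    return 1 if x > 50 else 0

# Frozen cases whose expected answers were verified by a human once.
golden = [(10, 0), (40, 0), (55, 1), (70, 1), (90, 1)]
accuracy, passed = regression_test(stand_in_model, golden, min_accuracy=0.8)
```

Run from CI on a timer, this turns "it worked four weeks ago" into a standing guarantee, and catches silently retrained or poisoned models.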

00:25:47

<laugh>. And rule number 10, and that goes really in line with all of this: AI is a strategic investment. You can continuously improve your product, you can affect your entire business, you are definitely gonna positively impact your competitive position with it. But it's a little bit like quantum computing, like other things that are today, you know, a little bit of a hype, quite visionary, and you think, oh yeah, I'm almost there, right? But in a lot of cases, 80, 90% is fantastic and makes for a fantastic demo. And I've done those, right? Incredible demos. And then you want to use it in production, and they say, yeah, 85%? No, can't do that. We need 93%. And you have no way to get from 85 to 93%. There is just no way that you can see today.

00:26:39

And it may end up that you have to throw away your whole software, you know, and I've thrown away a lot of that software, sadly, in the past. So today, what we have is a lot better systems, a lot easier-to-use systems, and it's all much more accessible. So at the end of the day, the takeaway is: features, actions, rewards versus instruction-based coding. Teach your guys to think in those dimensions, right? What are the features that I wanna measure? What are the actions that I wanna take? And how do I feed back information on whether something worked or not? Like, I asked VMware, why does vSphere not come with an AI-driven administrator built in?

00:27:31

Why do I have to have my own administrator, right? It's all technology that I could do. But the answer is, and that's the truth, and it tells you a lot about how AI works: you have no way of reliably feeding back reinforcements for your actions, right? So you add a storage volume and you configure that thing, and you do a hundred thousand things at the same time, and at the end of the day, maybe in six months, you have a horrible disaster because of that, but it could be because of something else. So you have no real easy way to say, oh yeah, this belongs to that, and train that algorithm by feeding the outcome back to the model. So yeah, at the end of the day, it requires creativity, discipline. It's a transition, almost like the transition to DevOps.

00:28:26

It's a painful transition, because you can spend a lot of hours and a lot of money on not achieving all that much. And that's why those rules really apply if you want to find out whether you wanna use AI to optimize your DevOps process, right? You don't optimize the whole DevOps process. You look at one metric, you look to find a certain thing that can help you become a little bit better and learn something from it, and you do that over and over again. And then you get to a point where you solve a lot of individual problems with AI, but you're not solving the global problem with AI. Because if you try and do that, then again, there are a lot of headlines that show what happens when you try and do that, because we are just not there from a tech perspective.

00:29:18

And the key here is, AI thinks fundamentally differently from humans. A human reasons. You know, if you ask me a question and I don't know what I'm talking about, at least if it's in my field, chances are I can extrapolate. The AI can't do that. It'll just give you garbage and not even know it. So yeah, that's really the key. And it's really a gradual transition to AI, just like it is to DevOps. And sadly, I didn't leave any time for questions, but that's it. Thank you.