From DevOps to DataOps

Financial Services has taken 20 years to recognise that it is a software business. It does not have another 20 years to recognise that it is enabled by data. Unlike software natives or data natives, traditional businesses are learning and retrofitting practices in tandem with their service or product offering.


The rise of the Chief Data (and Analytics) Officer in the mid-2010s was due to the need to safeguard data and evidence control over its usage. This was rapidly followed by traditional businesses wanting to extract value from data to stay competitive and relevant. As we have become data and digital citizens, getting data right is a social responsibility.


For professionals in this field, getting decisions based on data right is a moral obligation.

Since the Sarbanes-Oxley (SOX) regulations of the early 2000s and Dodd-Frank that followed, developers have been kept very separate from live operations. The disciplines of DevOps made it easier to protect production environments through engineering rigour and advanced practices and tools. The emergence of analytics professionals in the mid-2010s has been mistakenly bundled with software development.


For data scientists, data engineers and data management professionals, real data is their raw material and their product. They happen to exhibit coding skills to build those data products. New interactions with IT teams are emerging (with new team topologies) and with those, new practices, tools and technologies that enable DataOps.



This session is about accelerating understanding and proficiency in the field of data and analytics. In highly regulated geographies and industries, every single outcome of a decision (increasingly automated) must be within the boundaries of the law and - in retrospect - meeting consumers' needs. For that to happen, we must bridge the chasm between software engineers and analytics professionals.



What is DataOps? It is the practice of orchestrating human and automated activities as the data flows through a software-enabled production line, in order to guarantee the integrity of data and of decisions based on data. The stations in this production line may be discrete applications or a complex system of people and software. Conceptually, these flows are data pipelines in a data factory. DataOps wraps new disciplines around DevOps, from the interactions with customers (iterative model design) through to operations (statistical process control for drift/bias detection).
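To make the "production line" idea concrete, here is a minimal sketch of what one station in such a pipeline could look like in Python. It is illustrative only: the field names, the threshold and the choice to raise an error on failure are assumptions for the example, not a description of any particular system.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical mandatory fields for a record flowing through the pipeline.
MANDATORY_FIELDS = ("member_id", "product", "balance")

@dataclass
class QualityReport:
    rows_in: int
    rows_passed: int
    completeness: float  # share of incoming rows with all mandatory fields populated

def validation_station(records: Iterable[dict],
                       min_completeness: float = 0.98) -> Tuple[List[dict], QualityReport]:
    """One 'station' on the data production line: forward only records that pass
    basic integrity checks, and report what happened for downstream monitoring."""
    records = list(records)
    passed = [r for r in records
              if all(r.get(f) is not None for f in MANDATORY_FIELDS)]
    completeness = len(passed) / len(records) if records else 1.0
    report = QualityReport(len(records), len(passed), completeness)
    if completeness < min_completeness:
        # In a real pipeline this would stop the flow or alert an operator
        # rather than silently passing degraded data downstream.
        raise ValueError(f"Integrity gate failed: {report}")
    return passed, report
```

A real production line would chain many such stations - ingestion, transformation, model scoring, reporting - with the same pattern of checking and recording quality at each step.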


Simone Steel

Chief Data and Analytics Officer & CIO for Enterprise Data Platforms, Nationwide Building Society

Transcript

00:00:14

Thank you, Moira. Over the past three years, Nationwide Building Society has been a fixture here at the DevOps Enterprise Summit, with their teams presenting seven times. Two years ago, Patrick Eltridge, their Chief Operating Officer, presented with Janet Chapman, one of the three mission leaders, on their transformation and its importance to the Society. I'm so delighted that this year Simone Steel, their Chief Data and Analytics Officer, is presenting. She is passionate about the mission of the Society, which includes providing homes and building stronger, kinder communities, but also about how data must be used to help everyone in the organization make better decisions and move the needle on the outcomes that matter most. Talking with her, I was frankly in awe of the personal mission she is on, from which the Society is benefiting. She has a very strong view on what every developer, every technologist and every leader needs to know, and what their responsibilities around data are. So in this talk, she'll be teaching us not only what she expects from everyone at Nationwide Building Society, but also how to achieve it. I think you'll agree with me that she is pointing to something that is elusive but profound and important, and relevant to everyone. Here's Simone.

00:01:33

Thank you, Gene, for the introduction. Let me give a little bit more of an insight into what I do. I'm Simone Steel, Chief Data and Analytics Officer at Nationwide, and I also have a role as CIO for Enterprise Data Platforms. What does it all mean? It means that in the business that we run, we would like to make the best possible use of data for our membership, which I'm going to explain a little later, but at the same time, like any financial institution, keep all the data that we hold on our platforms absolutely safe, secure and compliant.

00:02:15

So what does Nationwide do, and what does the data and analytics function that I manage as Chief Data Officer do? Nationwide is a UK-only building society, which means that our business is run for the benefit of the members, and the members are customers that save or bank with us or have mortgages with us. So everything we do is for their benefit and for the communities where we operate. It's an old business, dating from 1884, and one of the biggest financial institutions here in the UK, equivalent to a FTSE 250 company. We have over 17 million members, plus more customers that have fewer product holdings with us, and about 15,000 colleagues that support them via our branch network, our call centers and our administrative offices. So how does data, and data and analytics specifically, connect with this enterprise?

00:03:23

We do five big things, or what I'll call value streams. We manage, as a master data management function, our member and customer data. We manage the data flows between all channels where we trade, all product lines that we offer and all types of membership and customer, and we are responsible for bringing all this enterprise data together for functions that require total visibility, like finance and risk management, for example, or marketing. We have the responsibility for governing data: it means that we need to know the meaning of everything and where it is coming from - the meaning and the lineage of data - so that we can evidence to our members, to regulators and to our own management that our books and records and our enterprise risk view are all integral.

00:04:23

We have a responsibility, as well, for two main analytics functions. One is business insight and the integration of those insights into everything we do - let's say that's a very sophisticated way of saying we need business intelligence at the front line, when action is happening. But there's also a flip side to this, which is the analytical work that we do with more advanced techniques. I will touch upon a few examples later on from our stakeholders where we use machine learning and AI to unveil behaviors, and opportunities, that we didn't know existed, through data. So these are the five areas where we are embedded into the business, through more or less 400 colleagues across those five key functions, who deliver everything from understanding where business questions are coming from and contributing to those questions, to operating a very large technology estate with very large and multiple technologies of data lakes, warehouses and marts, many business intelligence tools, and increasingly more specialized tools for analytics.

00:05:59

I came to Nationwide about two years ago, and with that I brought my baggage, which is academically computing and data science, plus a lot of technology experience, but mostly working for investment banks and global banks; this is my first UK-only experience. So what I'm going to share with you is a mix of all that. And my goal today, as much as I can possibly teach about data operations, is actually to make better connections with you, our DevOps community, and any other network that you may have access to and can influence. But to begin with, let me position what I believe is the need for those connections. You might not have seen this picture before: it is an aerial photo of the Amazon river, which begins with the confluence of two other big rivers, the Negro and the Solimões. For miles, the waters of these two rivers don't mix. For me, this is the perfect analogy for the space that I occupy here at Nationwide. Being a chief data officer means that sometimes you are fully embodying the business and how data enables or hinders doing good business, and sometimes we are doing work that is almost purely a technology challenge, because right now most of our data flows through technology applications and systems. And I say most because, believe it or not, we still have some data in other forms, in true analog fashion.

00:07:58

So my job, in a nutshell, is to bring those two things together. That's why connections are so important for me, and how we use and manage data needs to be very carefully articulated for these 15,000 colleagues that work with us. We want to do this in an ethical way that really brings value to the membership, but is also safe and compliant with ever more complex regulations in the data space. Now, the perspective from our internal clients: I brought a couple of perspectives here for you to contextualize why this is important. For example, our Chief Strategy and Sustainability Officer recently talked to our society about a study done with data around our carbon emissions. As we mortgage a lot of properties here in the UK, we do have access to a lot of information about how they perform in their carbon emissions and energy efficiency as a whole, and that could help us help the members finance improvements to their homes, or even inform them of better choices of where to get their first mortgage.

00:09:16

There is also another perspective, which is very operational - not just forward looking, what we could do with data, but the here and now. So I also chose a little perspective from proposition and marketing, from our Chief Proposition and Marketing Officer, whereby we wanted to give back some value to the members following our last financial year and, with a good understanding of our membership and who would be eligible, very carefully created a member prize draw that we now run monthly, with a £1 million prize every month. So data, in the eyes of these stakeholders, is the business. But clearly, when we look under the hood, we have to pay attention to how we do that: technically how we deliver the data, how we care for it and how we evidence that we are compliant at all times. So why do I need those better connections with different communities of business and technologists? Because data is the operation; it is how those business results materialize. So I'm going to explore a little bit today how we do this, under the acronym of DataOps, but I was trying, at the beginning, to set the tone for data operations without the acronym, so as not to get confused with maybe a very defined set of practices.

00:10:55

So, a brief history of data that helped us develop the thinking here at Nationwide. If we go back far enough - and maybe it isn't necessary to go back that far - we'll see that data is not a new thing. It has been recorded systematically since at least 200 BCE, and this is an example that's in the British Museum; I will add it to the references later on in the presentation. These are among the earliest data records that we have found, and they record periodic motions of celestial bodies in precise ways. Fast forwarding to where we are now, what we should really be thinking about are those basic practices of capturing data correctly and accurately and knowing what the purpose is - not just data as a self-serving entity.

00:11:54

More recent history is that we have had the dawn of the database, not that very long ago in relative terms. We have had a myriad of tools that now help us use the data that we gathered for the business benefit that we are pursuing. We have had a huge explosion, I would say, of tools and technologies that help us handle ever greater amounts of data at ever faster speeds, and we have developed a bit of an industry of management information systems over the last couple of decades. We have seen the emergence of business intelligence tools - business insight, as they're sometimes called as well - and that has now become part of our portfolio of tooling and ideas for how to use data. And very recently, in the last, say, five to ten years: big data technologies, coupled with much cheaper and accessible computing - cloud computing specifically - and many specialist languages for statisticians and mathematicians.

00:13:12

With these, we have seen the dawn of enterprise data science, and I put a little star here, which you can read later if these slides are available: the job of the data scientist, which is so crucial to delivering value from data today, is arguably the job of the geneticist or the geophysicist or the economist of the past. But now we couple information technology with better tools, and this has led to an enterprise-wide capability that we now call data science. In the context of financial services, just very quickly: it has taken us, the big enterprises, a long time to recognize that we are software enabled, and we can't afford another 20 years to recognize that what is flowing through those software systems is the data that we call the business. So we have some catching up to do in the financial sector, and we have been somewhat pushed towards a number of those practices being industrialized.

00:14:27

My experience, around 2002, was the SOX and Dodd-Frank regulations that pushed us towards much stricter disciplines in managing our cycle of development and operations, and in how we evidence the chain of correctness in those environments. We have had financial crises on top of each other over the last 10 years or so, which have created the need for a chief data officer in some financial institutions to control the data well - in other terms, to govern data - which is a big, challenging deed. And because professionals in the data space have straddled technology and business for such a long time, they now need to be categorized not as end users but almost as citizen developers. So those tools and those practices now need to permeate a much broader cohort of professionals in the enterprise. And why all of that? Because I am operating right now in those two camps - I am firmly in the data job - but I experience every day relationships a bit like this.

00:15:51

I need some data to create some insights for our members or our business stakeholders, but often the relationship between technology and data goes a little bit like this: lots of practices have matured over the last 10 years, from source code management all the way to continuous monitoring, performance tuning, scaling and many sophisticated DevOps practices. But the data professional may not have witnessed, may not have had, that rigor in their practices, and yet is given the job of gathering insights from an ever more sophisticated technology platform that holds all these data points - looking at technology and thinking: I just want to get hold of the data, I just want to get hold of an experimental sandbox to be able even to say what the data is telling me. Not even remotely close to the software automation concepts that our technology colleagues are accustomed to.

00:17:06

So these are the two camps that are trying to work together, but there is a little bit of mistrust, or misunderstanding, happening. And what I'm trying to do here with data operations is to really elevate the common goal, which I'm only exemplifying here - you can replace it with any goal that you might feel appropriate for your business. But in this case, both the software development population and the data analysis and data science population want really good outcomes for our customer base, and we want them in real time. So it means that we need to be more sophisticated together at getting the data pipelines working. This is all going to be about how we get the organization to embrace and accept not data warehouses, not data marts, not operational databases, but data pipelines. So let's assume for a moment this is our common goal: we want to offer financial products in real time that are tailored to our customer needs. And by the way, this is going to become even more demanding as regulation obliges us to use the data for the best possible outcome for our customers.

00:18:32

So in reality, how do we create these data pipelines if we work so differently? If I am creating software products, I'm probably thinking about user stories and user experience. I'm thinking about regulatory obligations and the policies that I have internally in my company. I'm also thinking about non-functional requirements like performance and resilience, as well as security. And I'm also, hopefully, taking feedback from my software operations and bringing it back to pay down technical debt, fix bugs and all that good stuff. DevOps has helped me improve those practices, together with some behavioral changes, in the quest for business agility. But at the end, I'm producing software products. On the data camp, I'm thinking: what is the hypothesis that the business is trying to prove or disprove?

00:19:40

What are my out-of-the-box ideas where I think I should really be looking at the data and figuring out whether or not there are some gems in there? I'm thinking: how do I connect data domains that may be really far apart in the organization? How do I connect, for example, security data from this building to my organizational chart in my HR system - and, you know, perhaps even those employees are clients of my own company? How do I take into account the long-term sustainability objectives that have become front and center in most businesses these days - from an accessibility perspective, a bias towards certain demographics, how do we take into account our environmental responsibilities? And at the end of the day, what I want to produce is some actionable insight. That could be a report, that could be some learning algorithm, or that could be some less sophisticated set of insights that we want to use in a repetitive way.

00:21:00

So those needs are different, and I wanted to make sure we don't use the data operations, or DataOps, expression thinking that it is just DevOps for data, which is the criticism that I think some of us have heard over a few years. So take the software product and the data product - the insights, the learning systems - and put them into your company's production line. What we know is happening day in, day out, 24 by 7 depending on your type of business, is that we are taking data from place to place, passing it through those algorithms, either deterministic software or predictive, and these are the outcomes that we take to our clients, good or bad. And for that reason - for that confluence of predictive and deterministic software to work hand in glove, to provide the best possible outcome - we need to understand each other's jobs and have an appreciation for each other's contribution to business results; more specifically, learning from each other how to automate those data pipelines and how to monitor those data pipelines.

00:22:27

And here I'm going to take a tiny digression and postulate about data observability: what's in those data packages that come out of your production line? Observing the behavior of data is not the same as observing the behavior of the system that produced the data. Without going too far down a rabbit hole, what I want to take us through now is that yes, of course, we would benefit from observing how data is evolving over time. We know that learning systems have a feedback loop and that the outcomes will determine the behavior of machine learning and AI; therefore observability is key to managing drift or bias that is unintended.
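As a purely illustrative sketch of what observing the data itself (rather than the system) can mean, the snippet below compares the distribution of a model input or score in production against a reference window using the population stability index. The 0.2 threshold, the bucketing and the variable names are common conventions and assumptions for the example, not a prescription from the talk.

```python
import numpy as np

def population_stability_index(reference, live, bins: int = 10) -> float:
    """PSI between a reference sample (e.g. data at model training time) and live data.
    Values above roughly 0.2 are often read as meaningful drift."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf          # capture out-of-range live values
    ref_pct = np.histogram(reference, edges)[0] / len(reference)
    live_pct = np.histogram(live, edges)[0] / len(live)
    ref_pct = np.clip(ref_pct, 1e-6, None)         # avoid division by, or log of, zero
    live_pct = np.clip(live_pct, 1e-6, None)
    return float(np.sum((live_pct - ref_pct) * np.log(live_pct / ref_pct)))

# Hypothetical usage: scores seen at training time versus this week's production scores.
rng = np.random.default_rng(0)
reference_scores = rng.normal(0.0, 1.0, 10_000)
live_scores = rng.normal(0.3, 1.1, 10_000)         # shifted distribution: the drift we want to catch
if population_stability_index(reference_scores, live_scores) > 0.2:
    print("Drift detected - review the learning system before trusting its outcomes")
```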

00:23:20

Now, life is not so simple, with one workstation, data in, some software and some decisions, and data out. In most of our enterprises we live in a very, very complex ecosystem - an ecosystem made of automated systems and human systems - so I can't linearly extrapolate that thinking. What do we do instead? My reminder to you is that I'm here to establish those connections between technology and data so we can get better outcomes. So take a moment to think, in your role, what is the opportunity to create those better connections - human connections and technical connections - to improve the flow of data through your production line.

00:24:13

So hold that thought while I cover a little bit about DataOps specifically. I'm sure you've heard this before in different guises, but very quickly: why do we still need to talk about this topic? Let's understand how we got here. Again, I'm postulating from the literature, but what is the reason why those data pipelines are so important? There is still a perception - and there is also data that shows it - that getting value from data has been treated almost as an IT project for a long time, for almost 10 years. And not that all IT projects were super successful either, because, you know, we discuss this every year here. But data and analytics is a power-user domain; they are different animals, as postulated in the HBR article - they are not IT products, and that's why they were failing when we tried to run them as if they were. After that was published, there have been many attempts to make it more structured and formulated, such as this one from Tamr, from Andy Palmer: a working definition of DataOps as the intersection of data engineering, data integration, data quality and data security - you know, what is this new discipline that is emerging? But that's from a practitioner's point of view.

00:25:49

But as we have had a greater cohort of data scientists over the last 10 years - I think it has now been recognized as a mainstream profession - we have a different perspective, which again plays to the dichotomy of technology-versus-data dynamics within the organization. The insight here from the data scientist is that we have conflicting goals. Those goals may, on the surface, be articulated as better outcomes for your clients, internal and external, but inside the organization there is a pull towards controlling environments and software, and a pull towards liberating the data, albeit in a controlled way, and those internal incentives are not aligned. So there are studies that you can see maturing with a good cadence, but still not as mature as our technology disciplines.

00:27:02

Again, those references will be shared with you for further reading if you wish. And finally, there is still a technology perspective to DataOps, which is how we learn from this loop of development and operations to design and implement better data architectures, and distributed data architectures in particular. This is looking at how my data pipelines are actually created physically, so that they bring into the DataOps discipline some hardcore practices of development work that were not present before, given the fact that data scientists were power users. And where we are now, I think, is this maturity emerging: continuous design - as in, make use of all the lessons we've learned from good business agility, having stakeholders present at all times and iterating quickly - and continuous operation, with observability being the key to success and tuning. And how do I now take all of this and use it as my DataOps? What's common? What's my north star? How do we share the north star? How do we start to get those communities to speak the same language that we have been trying to define for almost 10 years?

00:28:40

So that's the north star, then: integrity of data and timeliness of decisions is really the ultimate goal. Just timeliness of decisions based on data is not good enough if you can't trust it; it's all about ensuring your pipeline is encoded with those safety mechanisms. And I'll volunteer here a few ideas that I am trying here at Nationwide. The first is to get teams to work closely together. Multidisciplinary teams are best for a reason, but not always achievable, so how do we create proximity even if they're not in the same place in the organization, and how do we create in data science the same pace that we can impute into software delivery? Because we have very depressing stats from Gartner that only about 8% of our data science applications reach production, and we aim as an industry to get to about 75% by 2030. Secondly, data analytics is now detaching itself from the static management information systems that we could produce once every month, for example for our board of directors to look at the health of the business through many lenses. We were very accustomed to ETL and brute-force data wrangling. That's not the expectation anymore: we want dynamic data insight with rigorous metadata management, by which I mean quality, timeliness and completeness measured on the wire as the data pipelines get executed.
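As one hedged illustration of what "measured on the wire" could look like, the sketch below wraps a pipeline step so that completeness, row counts and timing are emitted every time the step executes, rather than compiled into a monthly report. The step, the field contents and the print destination are invented for the example.

```python
import time
from datetime import datetime, timezone

def with_quality_metadata(step):
    """Wrap a pipeline step so quality metadata is captured on every execution."""
    def wrapper(rows):
        started = time.perf_counter()
        out = step(rows)
        complete_rows = sum(1 for r in out if all(v is not None for v in r.values()))
        metadata = {
            "step": step.__name__,
            "executed_at": datetime.now(timezone.utc).isoformat(),
            "rows_in": len(rows),
            "rows_out": len(out),
            "completeness": complete_rows / len(out) if out else 1.0,
            "duration_s": round(time.perf_counter() - started, 3),
        }
        print(metadata)  # in practice, published to a metadata or observability store
        return out
    return wrapper

@with_quality_metadata
def enrich_accounts(rows):
    # Hypothetical transformation step on the pipeline.
    return [dict(r, segment="retail") for r in rows]

enrich_accounts([{"member_id": 1, "balance": 100.0}, {"member_id": 2, "balance": None}])
```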

00:30:26

So, just to wrap it all together, DataOps is shaping up for us as a mix of how we organize ourselves to design and continuously evolve solutions that combine data and software. We are also trying to cross-pollinate: bring our DevOps colleagues, who have more experience in operational settings and in operational, sometimes highly resilient, systems, into the large data problems of warehouses, lakes and sandpits for data analysis. What are the good practices of versioning, of controlling, of rolling back that can be automated, to enable more colleagues to benefit from that speed? And then finally, something that is still quite in its infancy, I would say, in terms of ideas for us at Nationwide: how do we implement observability in the context of a business outcome? To simplify that entire sentence: how do I know what's happening with the data in real time or near real time, so my internal constituents here in the business understand that they can trust the data from an accuracy, timeliness and responsiveness point of view? Because once you erode the content, or you erode the trust in the content, that's very, very hard to retrofit.

00:32:05

So, for example, a system might be running well enough throughout the day and overnight, but the actual content is deteriorating in quality. We want to bring the same rigor that we have for monitoring the system to monitoring the data, and provide those kinds of business controls based on rigorous statistical process control - to detect anomalies, in short.
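A minimal sketch of that statistical process control idea, with made-up numbers: flag a day whose content quality metric falls outside three standard deviations of its recent history. Real controls would be tuned per metric and business tolerance, and would typically apply fuller control-chart rules rather than this single check.

```python
import statistics

def out_of_control(history, today, sigmas: float = 3.0) -> bool:
    """Shewhart-style check: is today's value outside mean +/- 3 sigma of recent history?"""
    mean = statistics.fmean(history)
    sigma = statistics.pstdev(history)
    return abs(today - mean) > sigmas * sigma

# Hypothetical metric: share of records passing validation each day over the last week.
daily_pass_rate = [0.991, 0.989, 0.993, 0.990, 0.992, 0.988, 0.991, 0.990]
if out_of_control(daily_pass_rate, today=0.962):
    print("Content anomaly: the system is up, but the data itself has deteriorated")
```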

00:32:30

And if I have the opportunity to come back at some point in the future, I can see that the DevOps knowledge base, practice and maturity of practitioners is a stepping stone for us to achieve similar maturity in machine learning operations, and analytics operations in general, as we adopt more and more learning systems through many forms of AI - deep learning, linear regression and the various other methods - which absolutely require data to be at your fingertips quickly and reliably. There are some sister conferences happening in the summer - the Data Summit in San Francisco is one of them - and some of this thinking, if you are interested, is going to be explored further in those conferences. As I promised, I will share with you some of the references that I made here today.

00:33:33

And just to call out that none of this is particularly my thinking: it's a confluence of a lot of thinkers that came before me, and I think we don't have time as humans to learn everything from first principles. So it's a reminder, very close to my heart, that the more we read and exchange experiences, the quicker we'll get there with DataOps. So, just to wrap up, and in case you forgot: I'm asking for your help here in creating better connections between your technology and your data functions, through this data operations lens, those data pipeline lenses. The help that I would like to ask of you is to create a bidirectional mentorship - almost like pair programming, but not with your other technology colleagues, with your data colleagues - perhaps to accelerate the speed of data exchange in your organization with the safety that it needs. I'm very curious to know, if you try those experiments, what the results for your business might be this time next year, when we perhaps will meet again. And with that, I would like to thank you very much for listening to me, and I'm looking forward to learning your views and hearing your questions.