Cloud Data Journey: Story Of Adopting Cloud Technology and Modernization of Data Pipeline

We want to share our cloud data journey at Discover, which improved our business insights and delivered them faster. Cloud technology today has a vast scope, and once you reach the point of understanding exactly what it is, "to move or not to move" becomes the basic, challenging question. Our solution focuses on an event-based pipeline, which proved critical to our business while creating reporting solutions. Built on a data lake, these event-based data pipelines let you know the status of your data in time.

Prajakta Yerpude

Senior Software Engineer, Discover Financial Services

Shivani Anand

Sr. Principal Solution Architect, Discover Financial Services

Transcript

00:00:13

Hello everyone. Are you on a journey to migrate data and analytics to cloud, or planning to start one? If yes, start changing your mindset and get excited about adopting emerging cloud technologies. We are here to share our cloud journey with you, which is a story of the adoption of cloud technology and the modernization of the data pipeline. Here is our expedition story from ground to cloud, which will help you in building your runway. Discover Financial Services is a company whose mission is to help people spend smarter, manage debt better, and save more so they achieve a brighter financial future. Its vision is to be the leading digital bank and payments partner. Discover has a presence in more than 200 countries. Before I get started, let me introduce myself. I'm Shivani Anand, senior principal solution architect at Discover Financial Services, focused on providing technology and data-driven solutions for data and analytics, also called the D&A organization. I have almost two decades of diversified experience in D&A, helping companies by providing thought leadership and building strategies and solutions for large-scale data product projects. Fun fact about me: lately I've developed a huge passion for gardening. If my family do not find me inside the home, they know where I am: in the backyard. With no further ado, I'm passing the mic over to my colleague Prajakta, who will be walking you through the agenda of this presentation. Prajakta.

00:01:48

Thank you, Shivani. Hello everyone. My name is Prajakta Yerpude and I'm a senior software engineer at Discover in the cloud data products team. I have been working extensively in cloud technologies for over five years, especially implementing a variety of data ingestion pipelines for driving faster analytics. Apart from this, I love weight training, and I've been doing it for the last two years with a dedicated personal trainer. Moving on to today's agenda and what Shivani and I will be covering in our cloud data journey: we will go through the very initial basics of how we started to learn to fly from ground to cloud, what challenges we came across in the whole process, and how we overcame them with the four principles that form the base of today's talk, the mighty four Cs: cloud technologies, collaboration with each other, changing mindsets, and continuous learning. We are going to share some of our accomplishments and wins on how Discover is changing the way financial services work in IT, as well as how you can learn from our journey through our lessons learned. Finally, we'll also share some of the scope of how things are moving fast at Discover to achieve efficient results. I'll let Shivani take over from here.

00:03:06

Yeah. Thank you, Prajakta. It's quite evident: moving data to cloud is vital for companies to stay relevant in today's competitive business landscape because of multiple factors, the key ones being increasing speed and quality and decreasing cost for the business. To be a leading digital bank and payments partner, these were the driving factors that propelled us to move to cloud at Discover. In the era of data explosion, managing the volume, variety, and velocity of data on premise is challenging and costly. For example, in order to scale your on-prem data warehouse, you have to go through the estimation process and then accordingly pay the licensing, storage, and compute costs up front. We were in a similar situation, and that triggered our cloud data journey. Our flight to cloud was bumpy initially. There was some skidding and sliding in the beginning, as embracing new technologies is always a challenge and many factors need to be taken into consideration.

00:04:07

And cloud technology is not an exception. The culture of letting engineers fly, in combination with cloud technologies, built an environment of innovation, which contributed to the development of a factory of homegrown products boosting our cloud data pipeline. The company has also been awarded a 2021 CIO 100 award for its innovation in cloud data platforms. The CDF platform helps deliver a distinct advantage to our business and improves customer experience by bringing information to market faster, with higher quality and reliability. For us, this journey has just begun, and it has the scope of making a breakthrough in how data is ingested, managed, and used on cloud. We have reinvented the way we build our solutions and products. Because we are an early adopter of cloud, our use cases help the vendors in enhancing their products, best practices, and standards. The power of cloud computing is making possibilities reality. Some of the analytical processes and machine learning models that were not possible to run on premise due to volume and complexity are, after migrating to cloud, providing huge value to our business partners. Let's hear about the challenges we faced from Prajakta.

00:05:30

Thank you, Shivani. When we think of technology, we always look for the challenges that we have in our current tech stack. We came across the following challenges, which, in my opinion, everyone has had in their tech experience when thinking about why they should be moving to cloud. First and foremost is the large amount of legacy data. Today, as we know, data gives you about everything you need at an enterprise to move things faster for the business. And this data grows by an exorbitant amount daily, which becomes hard to maintain and manage, as the data grows not just in gigabytes but in terabytes and petabytes. In a fast-paced financial company like Discover, we need to find a tech stack that is able to manage this in an efficient way. The second thing that follows is the high storage cost: storing this humongous data requires a lot of resources to maintain, manage, and process it, which ultimately causes an increase in the overall cost.

00:06:32

It is hard to keep up with the costs of resources like hardware, infrastructure, and power consumption needed for this ever-growing data. On-prem systems usually require a large upfront purchase, which means capital expenditure is often required. On top of that, you need to include maintenance costs to ensure support and functionality upgrades. This brings me to my next point, and that is capacity and scalability constraints. For example, with on-prem infrastructure, we sometimes need the resources to scale up or down based on dynamic requirements. To do it manually, you need help from actual human resources to spin up additional infrastructure, like spinning up servers, which requires downtime and can affect your application performance and its delivery to the business. You also have to maintain data centers, which are quite expensive, and manage all the on-prem data, which usually requires specialized resources like mechanical engineers and electrical engineers who have to take care of these data centers.

00:07:38

When required, of course, data centers have their own advantages. At Discover we follow a hybrid approach of data centers and cloud, where we use data centers for critical data and use the cloud for less confidential information. Because the cloud is so easily accessible and scalable, using the cloud for additional capacity might be a good solution for some organizations. You may find that certain workloads are better suited for your data center, but others run more effectively in the cloud. In the end, your flexibility, workload, and security needs will dictate whether a data center or the cloud is the best fit for your organization. Talking about on-prem solutions, due to the above challenging factors, when we go for a deployment, it takes quite a good amount of downtime that can impact certain business applications. Also, since a lot of resources are involved in taking care of the infrastructure and data applications as a whole, it eventually causes a lot of dependencies between teams that can delay your overall timelines. Our journey started with all of these points taken into account, and Shivani will talk about what exactly our four Cs are in the next slide. Thank you, Prajakta.

00:08:55

The four Cs (cloud technologies, collaboration, changing mindsets, and continuous learning) have been the driving factors of our success in moving to cloud. Cloud technology, of course, is fundamental to moving to cloud, but without true collaboration with partners on both the business and technology sides, you cannot fully adopt and implement cloud solutions. A successful partnership starts with educating your partners and building an understanding of their processes, because building any cloud solution or moving to cloud is a digital transformation journey that starts with changing the mindset of every person in the enterprise. This is a very critical step, as the implementation and usage of cloud technologies today are different from the traditional way of doing data warehousing and analytics. To be successful in this climate, one has to be a lifelong learner. As we all know, technology is changing at a rapid pace. Continuous learning is the mantra for staying ahead in the game and handling unprecedented challenges. Prajakta will now expand on the first C.

00:10:06

Thanks, Shivani. Let's begin with the base for today's session, our four Cs. I'll be talking about how some of the challenges that I explained earlier could be resolved by the cloud technologies that are out there today. First of all, with a cloud-based subscription model, there's no need to purchase any additional infrastructure or licenses, as explained by Shivani. In exchange for an annual fee, a cloud provider maintains servers, network, and software for you. This gives the big advantage of flexibility to use software, platform, and infrastructure as a service any time we need. In our journey, it provided numerous advantages to employees by greatly reducing the time and money spent on tedious tasks such as installing, managing, and upgrading software. This has also enabled rapid development and deployment of applications on cloud and ultimately helped in reaching our business goals and satisfying customers.

00:11:03

For example, in our case, when it was required to add a new server to a data center, it took two to three months to get it completely set up and running, and that involved quite a good amount of downtime, causing delays. This prompted us to look for a data solution that provided flexibility in scaling up resources. Talking about performance, I would like to quote the basics here, and that is cloud computing. It is simply the use of large-scale computer networks: the use of network-hosted servers to do several tasks like storage, processing, and management of data. In this cloud computing environment, we have multiple network load balancers that distribute workloads and compute resources. This load balancing allowed our users to manage application and workload demands by allocating resources among multiple computers, networks, or servers. And guess what? We didn't have to explicitly do it on our own.

00:12:00

It was all taken care of by the cloud. For example, we have ETL pipelines today at Discover that scale up or down, and that has saved us a lot of cost. One more example is the Snowflake compute resources. Today we spin up additional compute resources to perform ETL operations automatically, saving us time and cost. With the increasing need for more storage, one big benefit of the cloud is that the infrastructure can be extended transparently when needed. The scalability of the cloud allows your organization to add or reduce capacity as your needs change. This took away a lot of our effort in keeping up with the growing data, which is now in petabytes at Discover. The best advantage of cloud is that most cloud computing services are pay as you go. This means that if you don't take advantage of what cloud has to offer, then at least you won't be dropping money on it. For example, the pay-as-you-go system also applies to the data storage space needed to service your stakeholders and clients, which means that you'll get exactly as much space as you need and not be charged for any space that you don't use. Taken together, these factors result in lower costs and higher returns. I'll let Shivani take over for the next C.
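The pay-as-you-go point can be made concrete with a little arithmetic. The sketch below is only an illustration and not Discover's billing model: the monthly usage figures and the per-terabyte rate are made-up numbers, both function names are hypothetical, and the same rate is assumed for both models purely to keep the comparison simple.

```python
def provisioned_cost(peak_tb: float, rate_per_tb: float, months: int) -> float:
    """Capacity bought up front: pay for peak capacity every month."""
    return peak_tb * rate_per_tb * months


def pay_as_you_go_cost(usage_tb_by_month: list[float], rate_per_tb: float) -> float:
    """Metered billing: pay only for the storage actually used each month."""
    return sum(used_tb * rate_per_tb for used_tb in usage_tb_by_month)


# Illustrative numbers only (assumptions, not figures from the talk).
usage_tb = [10, 12, 15, 20]   # storage used in each of four months
rate = 20.0                   # hypothetical cost per TB per month

fixed = provisioned_cost(max(usage_tb), rate, len(usage_tb))
metered = pay_as_you_go_cost(usage_tb, rate)
print(f"provisioned for peak: {fixed:.2f}")
print(f"pay as you go:        {metered:.2f}")
```

With growing usage, the provisioned model pays for the final month's peak from the first month onward, while the metered model tracks actual consumption; that gap is the saving the speakers describe.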

00:13:21

Thank you, Prajakta. Collaboration is the second mighty C. As we have already mentioned, without true collaboration with partners on both the business and technology sides, one cannot fully adopt and use the cloud platform. Let me share one of our use cases to emphasize the importance of collaboration. At the very initial stage of our data migration journey, we focused on the technical aspect of moving data faster and securely to the cloud without much engagement with our business partners. Once the data was available on cloud, some of the important analytics and machine learning processes were not working as expected. Why? Because the sensitive data sets were tokenized or masked, and there was no solution available for the business or data science teams to securely detokenize and use these values, which were very important for the processes. As a lesson learned, we changed our way of working and started engaging our business partners throughout the journey, which has been immensely helpful in driving the cloud data migration initiative.

00:14:24

Similarly, partnering with our cross-functional technology teams helped us overcome blockers and build reliable, secure, and cost-effective enterprise solutions. Most of the blockers are due to a lack of understanding of new technologies. One of the blockers we faced was the inability to load wide tables in cloud, as the cloud data warehouse, which is Snowflake, has a soft limit on the maximum number of columns per table for better performance. A joint team effort by the cross-functional technology experts not only resolved the issue but also created a solution that enabled faster data onboarding for multiple teams. For true collaboration to work, we built small, autonomous, cross-functional teams working shoulder to shoulder with shared goals and objectives. Each team member has clear definitions and a common understanding of terminology, processes, metrics, and goals. In a nutshell, we leveraged the best of each technology to build an award-winning project.

00:15:25

That can only be possible with true partnership. The famous Michael Jordan quote, "Talent wins games, but teamwork and intelligence win championships," seems fitting in this context. Let's move on to the next C, which is changing mindsets. Cloud computing is a disruptive technology, and in order to adopt it, every individual in an enterprise has to go through the exercise of changing mindsets. I reiterate: cloud computing is very different from on-premise data warehousing. For instance, you have all heard it before: cloud computing is based on the pay-as-you-go model, the allocation is elastic, availability of the resources is on the fly, there is no need to splash a huge investment up front, and so on. Our engineers embrace this flexibility of the cloud and work like a startup by quickly prototyping innovative ideas, leading to enterprise-ready products, which are the key components of our cloud data pipeline. Trust is built in the teams.

00:16:29

We encourage testing innovative and imperfect prototypes in an agile approach. It's always good to fail fast. This helps in getting the expected outcome faster than spending time on building a perfect solution. Daily problem-solving huddles in the cross-functional autonomous teams help in moving quickly and efficiently. Engineers are given more authority to make decisions, with less dependency on hierarchies. Changing the mindsets of business, product owners, and so forth is helping change the mindsets of engineers, leading to product growth by building reusable and high-quality solutions. Engineers are thinking out of the box, embedding extreme automation in their designs, and being creative by branding outputs. There is an unwillingness from teams to accept the status quo. On the flip side, a lack of understanding and of a shift in mindset can cause unexpected costs and impact the cycle time. For instance, during migration, some processes that had SQL built in adopted the lift-and-shift approach of migrating data to cloud.

00:17:45

These SQL statements were designed for on-premise data warehousing and not for cloud computing. When the same processes ran in cloud, guess what happened? The result was the reverse of what was expected. The cycle time increased, which elevated the compute cost and defeated the whole goal of cloud migration. Clearly, it's not just about getting there, but also about how we get there. Cloud is not just a place to run and store your data; it is how we now work, which is making extreme automation, reliability, and availability the way we deliver every product or solution in cloud. It's the joint responsibility of all parties involved to efficiently use these technologies and solutions to achieve the common goal of gaining speed and quality at an optimal cost. Let's hear from Prajakta about the last C, which is continuous learning.

00:18:45

As Shivani talked about collaboration and changing mindsets, this is what follows. And yes, with the ever-growing technology stacks, it's very important to keep a learning curve in an organization to implement innovative cloud data solutions. I definitely agree with William Pollard, a scholar in the 1800s, who said: "Learning and innovation go hand in hand. The arrogance of success is to think that what you did yesterday will be sufficient for tomorrow." At Discover, for example, in my team of cloud data products, all the engineers spend a good amount of time experimenting and educating themselves on new technologies that can improve our current processes. We give priority to learning new things. Innovation is aimed not only at totally new or unique products; the great bulk of innovation is aimed at simply improving existing products and delivering them more efficiently. And that is exactly what we are learning in our cloud journey.

00:19:48

Every day we are enabling our business users with faster data analytics solutions. Discover provides multiple platforms to employees within the company, with opportunities to learn about existing and upcoming technologies. That helps tremendously in bringing about innovations in our products. Let's move on to where we are in our cloud data journey. Having discussed how these four Cs were involved in our successful journey, I would like to mention some of our wins at Discover that brought about innovation, improving our business experience holistically. The crux of any cloud data journey is a pipeline, whose job is not just to move your data from a source to a destination; it involves a lot of steps like extraction, transformation, and eventually loading of the data. At Discover, we are leveraging AWS cloud and have created products around its services, and we are using Snowflake as the destination warehouse.

00:20:48

For example, we have a solution called Universal Data Loader, which is an event-driven pipeline that automates data ingestion on cloud to move data from an S3 data lake to its corresponding Snowflake table. Our journey started with using on-prem ETL solutions to move data to cloud, and then we moved on to a complete event-based data ingestion pipeline. To design such a pipeline, we decided to make use of AWS components like Lambda functions to perform operations, DynamoDB to store these events, EC2 for computing complex SQL queries, SNS and SQS to store messages, and so on. We have dedicated engineering platform teams consisting of software engineers, data engineers, and DevOps engineers that handle the development and deployment of these pipelines. At Discover, collaboration happens at its best, and the teams decide on these features every quarter according to the business requirements. With all the above innovations being made within the company, Discover is all set to be one of the top financial services companies to be a true enterprise data-centric organization. I'll let Shivani take over from here.
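To make the event-driven idea concrete, here is a minimal sketch of how an S3 "object created" notification could be turned into a Snowflake load statement. It is not the Universal Data Loader itself, whose internals the talk does not describe. The stage name `@data_lake_stage`, the `<schema>/<table>/<file>` key layout, and all function names are assumptions made for illustration; the event tracking in DynamoDB and the SNS/SQS messaging mentioned in the talk are left out, and the statement is only built, not executed.

```python
from urllib.parse import unquote_plus

# Hypothetical Snowflake external stage that points at the S3 data lake.
STAGE = "@data_lake_stage"


def table_for_key(key: str) -> str:
    """Map an S3 object key to a Snowflake table name.

    Assumed (hypothetical) key layout: <schema>/<table>/<file name>.
    """
    parts = key.split("/")
    if len(parts) < 3:
        raise ValueError(f"unexpected key layout: {key!r}")
    schema, table = parts[0], parts[1]
    # Accept only plain identifiers so a crafted key cannot inject SQL.
    if not (schema.isidentifier() and table.isidentifier()):
        raise ValueError(f"unsafe schema or table name in key: {key!r}")
    return f"{schema}.{table}".upper()


def copy_statements(event: dict) -> list[str]:
    """Build one COPY INTO statement per object in an S3 event notification."""
    statements = []
    for record in event.get("Records", []):
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if "'" in key:
            raise ValueError(f"unsupported character in key: {key!r}")
        statements.append(f"COPY INTO {table_for_key(key)} FROM '{STAGE}/{key}'")
    return statements


def handler(event, context):
    """Lambda-style entry point. A real loader would run the statements
    against Snowflake; this sketch only returns them."""
    return {"statements": copy_statements(event)}


# Trimmed-down sample of an S3 event notification payload.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-data-lake"},
                "object": {"key": "sales/orders/2021-10-01.csv"},
            }
        }
    ]
}
print(handler(sample_event, None)["statements"])
```

The point of the design is that the arrival of a file is the trigger: nothing polls or runs on a schedule, and the target table is derived from where the file landed in the data lake.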

00:22:04

Thank you, Prajakta. So what have we learned so far from this journey? Definitely, in most cases, it is not lift and shift. As we have mentioned before, cloud technologies are different from on-prem solutions. Take, for example, the lift-and-shift SQL that I mentioned earlier: instead of reducing the processing cost and cycle time, it had the opposite effect. It is essential to put in upfront effort on learning and understanding this new technology before adoption. Engage your partners from business and technology to understand the dependencies and the enablement of all the features, which will help not only in successfully onboarding the data on cloud but also, equally important, in easier adoption by business partners. The recommended approach is to start with both a simple and a complex business use case.

00:23:00

It's important to train your partners on cloud adoption. The sooner you adopt this technology, the easier it will be to sunset the legacy processes, leading to a significant cost reduction. Last but not least, the whole process is not set and forget. It's important to keep monitoring and collecting vital statistics for continuous improvement related to cost, security, efficiency, and reliability. The key takeaway for you from our lessons learned is to spend upfront time understanding your use cases and technologies, as it's a continuous journey of how efficiently we fly. To conclude, storing and managing data on cloud, and adopting cloud data for multiple use cases like next-generation machine learning models, artificial intelligence, and analytics, is a journey. We started the journey with the four mighty Cs, and we have continued it by embracing and enhancing cloud technologies, extending collaboration with our business partners, encouraging teams to change mindsets regarding new ways of working, and lastly, by expanding the continuous learning culture. To quote our CIO: "This is a journey and requires a culture of continuous learning and discontent with the status quo." Thanks for hearing our story. We hope this helps in defining part of your cloud journey. One last note: Discover is hiring talented engineers across domains including cybersecurity, data, DevOps, infrastructure, and software. To learn more about how technology is driving business at Discover, please check out the link shared by Prajakta in the Slack chat. Thank you and stay safe. Thank you.