Las Vegas 2018

Targeting 100 Million Users: Continuous Improvement at TODO1

TODO1 is a leading mobile and online banking platform serving 10 Million users and over 2 Billion transactions per year. They began their DevOps journey in order to support the business’ goal to grow their user base 10x in three years.


Ironically, huge initial success with DevOps (like an 83% reduction in deployment times) revealed unexpected downstream challenges and barriers.


In this talk, TODO1’s Director of Technology Architecture will share insights into their successes and their plans for continuing to optimize their release process, including:


- How agility gains in one place revealed bottlenecks elsewhere–and how they tackled them.

- The benefits of a “pipeline-first” rather than “stack-first” approach to automation (horizontal vs. vertical).

- How automation naturally forced adoption of “leaner” deployment processes.

- Successful approaches to getting other teams on board.


Juan Felipe is the Director of Technology Architecture at TODO1 Services. He leads the Architecture Team and helps drive the company-wide DevOps Automation initiative. With more than 18 years in software development for the Financial Industry, he joined TODO1 in 2007 and has held several roles, including Developer, Technical Lead, Solutions Architect, Product Design Manager, and agile sponsor.


With strong experience in Digital Transformation in the Latin American Banking Industry, Juan Felipe has worked on the design and implementation of Digital Banking in the Retail, Corporate, and Private Banking segments.


Juan Felipe plays the guitar (tries at least), reads sci-fi, and loves superheroes.


Juan Felipe Cardona

Director of Technology Architecture, TODO1 Services

Transcript

00:00:04

Welcome. My name is Juan Felipe Cardona. I'm going to share our journey with continuous improvement at TODO1. First of all, I want to thank Electric Cloud for inviting me to do this presentation, and I also want to thank my team back at <inaudible> and John, our continuous automation engineer, who is here in the audience as well. Thank you for being here. TODO1 is a company in which we are focused on humanizing the relationships between people and their financial institutions. We are not a FinTech, since we don't build tools to sell to the banks; we create solutions as a service. That's our business. We offer solutions as a service to banks, for them to complete their digital channels strategy. And we are focused on banks in Latin America. We have more than 18 years of experience in the market.

00:01:34

We work with world-class tools and best-of-breed platforms that we can offer our clients. We are focused on the adoption of our solutions by our clients. We promise our clients the best user experience possible. We recently acquired a company that is expert in user experience and does all the development in that area and all the user journeys for us to apply in our solutions. We're experts in integration. We integrate with different backends, the cores of our clients, and we also integrate different business applications to provide the best solutions for our clients. Our business model is risk-based. That means that we provide our users with our solutions, and we work very closely with them so that their success also becomes our success. We don't just deliver the platforms or the tools for them to operate. We become business partners to our bank clients.

00:03:07

We are located in Miami and in Colombia, in South America. We have offices in Central America and in Mexico, and we have clients in Colombia, Central America, and the Caribbean, and we are now opening new markets in Peru and Argentina. The type of applications that I'm going to speak about in this presentation are basically online and mobile banking, both for the retail and the business segments. We also deliver PFM, that is, personal finance management solutions, and BFM, business finance management solutions. The architecture of our applications, and this is to give you more context of what we were trying to automate in our automation journey, is basically Java based. Almost 90 to 95% of our applications are Java based, and we have a UI layer for the mobile and web applications in which we have our web installers.

00:04:34

There we basically deploy standard Java web artifacts. In that layer we also deploy, or generate, the installers for the mobile applications, for Android and for iOS. We also deploy configuration files in that layer. In the next layer, the backend layer, in which we have our APIs and our business logic, we basically deploy standard EAR files, which are Java installers, and configuration files. In the database layer, we work with DDL (data definition) scripts and data modeling scripts. And in our integration layer, in which we basically have an SOA appliance, we deploy scripts, and in our deployments we basically put configuration files.

00:05:39

The business goal that was given to us in IT was to target 100 million users in three to five years. We're currently serving 15 million users among all of our clients in South America, Central America, and the Caribbean. So we still have a long way to go, but we're working on it. Two additional business requirements were given to us. The first was basically to increase our release cadence by four times. In general, in Latin America, payrolls are usually paid on the 15th and at the end of the month. So those dates are frozen for us. We cannot deploy into production on those days, and we also cannot deploy a couple of days before and a couple of days after those payroll payment dates. So that gives us only a couple of weeks to do our deployments in our production environment.
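The freeze rule described here can be sketched in a few lines of Python. This is only an illustration, not TODO1's tooling: it assumes that "a couple of days" means two calendar days on either side, that "the end of the month" means the last calendar day, and the names `payroll_dates` and `is_frozen` are hypothetical.

```python
import calendar
from datetime import date


def payroll_dates(year, month):
    """Return the two payroll dates of a month: the 15th and the last day."""
    last_day = calendar.monthrange(year, month)[1]
    return [date(year, month, 15), date(year, month, last_day)]


def is_frozen(day, margin_days=2):
    """True if `day` falls on a payroll date or within `margin_days` of one.

    The previous and next months are checked too, because the margin around
    a month-end payroll date spills into the following month.
    """
    candidates = []
    for offset in (-1, 0, 1):
        # Work in "months since year 0" so the year rolls over correctly.
        month_index = day.year * 12 + (day.month - 1) + offset
        year, month_zero_based = divmod(month_index, 12)
        candidates.extend(payroll_dates(year, month_zero_based + 1))
    return any(abs((day - payroll).days) <= margin_days for payroll in candidates)


if __name__ == "__main__":
    for probe in (date(2018, 10, 12), date(2018, 10, 13), date(2018, 11, 2)):
        print(probe, "frozen" if is_frozen(probe) else "open")
```

A pipeline could call a check like this before promoting a release to production and simply refuse to proceed on a frozen day.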

00:07:04

Additionally, the business was asking us for better service quality. Of course, with our previous manual process, we were prone to human error, and the service availability and our credibility with our clients were being affected. So that's, of course, another business requirement that was asked of us. To achieve those business requirements, we needed to get deployment automation in place. That's the only solution that we had. And additionally, we had to adjust our processes. We have been doing continuous integration for more than 10 years. When we started, we were doing manual builds, then we went forward to continuous integration, and our teams have been doing that for all that time. Now that we started our DevOps journey almost one year ago, our teams had to learn to do things again in a different way. And that's one of the challenges that we have as well.

00:08:34

Our initial plan for the deployment automation was to go with several applications, and we decided as well to focus on one of the layers of our architecture. We decided to go with the backend layer, because that's the one in which most of the bugs were found. That's the most difficult one to change in terms of solving the bugs, because those are the API and logic layers. Our initial plan then was to allow the developers to deploy into their integration environment and to allow the QA team to deploy into their QA environment without the intervention of our operations team. In a second phase, we decided to go into the production environment, to push this first plan of deploying only that layer into the production environment.

00:09:50

So now we start to see that with that initial plan, which initially looked good to us, there were some smells. Some things were not good, but we started very shy, because this was our first time doing DevOps. So we didn't want to mess with the production environment from the beginning. This was our situation before doing automation: it was basically the development team doing all manual deployments in all of our layers and components, and the operations team was in charge of both the QA and the production environments, also manually. With that initial plan that we decided to go with, we allowed the development team to automatically deploy into the backend layer. And we let the QA team deploy into the backend layer.

00:11:13

This part was deployed by the QA team, but the rest of the layers and the production environment were still being deployed by the operations team. So even with that first approach, we had very good results, but we also became aware of some challenges, and we were not delivering what the business was asking from us. For achieving this first part, we needed company-wide commitment. That means that we had to break the silos. We had all our teams working together with our DevOps team to put that plan in place. That was the first time that we had the developers talking with the operations team and with the QA team, and we were all working together to achieve the same goal. Additionally, the management sponsorship was very important. These kinds of initiatives, because they go across the whole company, cannot be achieved by only one team pulling and doing all the effort. So management sponsorship was very important for this initiative as well.

00:13:12

And of course, as we have seen in some of the other talks about compliance, we still have to comply with our compliance, auditing, and security teams. So we relied on our automation tool, ElectricFlow, to provide us with the information that we needed in terms of user auditing. The reporting capabilities of the tool gave us the chance to know who did what and when. That's the user auditing. To know what we changed between releases, the execution tracking means that during the audits, the auditors are able to know exactly what artifacts we changed and what components were changed, and to go exactly to the level that they need to check for any change in our environments. Additionally, we had to work with the infrastructure team to make some changes in our network segments. From the beginning, we have always had a separation between the development or integration network segment, the QA network segment, and the production network segment.

00:14:53

Now we needed to somehow send signals from the integration environment to the QA environment for the pipeline to continue its deployments. But we needed to stay compliant with the condition that those segments were completely separated. So we worked with our infrastructure team and added an additional segment that is transverse to all the other environments. From that new network segment, we were able to connect, and to put in place all the security requirements that the security and compliance teams were asking from us, to be able to push or to handle the deployments in each of the separate environments.

00:15:52

We used a feature of the automation tool called entry gate rules that allows us to put stops in the pipeline. So when a developer starts a deployment, the tool stops, and only an authorized person from the next environment is able to do the review and the approval for deploying in the next environment. With that first phase, we got very good results. In our integration environment, we were initially deploying in 45 minutes, and we went from 45 to 12 minutes in two of our applications. With the new automation, we were able to have savings of 40 to 73% in time. With the number of deployments that we have per month, that goes to almost 150 hours of savings, hours that the developers are now not spending doing deployments. That's basically almost one developer working full time during a month. So he should probably be creating more functionalities and adding more value to the business. In our QA environment, we had even better results. We went from 120 minutes, two hours, to 20 minutes per deployment. That's an 83% savings of time that the operations team is now not spending doing manual processes. With the number of deployments that we are doing per month, that's more than 400 hours that we are saving the operations team. That's basically... Can you

00:18:08

Elaborate on that? Is that manual testing? Is that automated testing?

00:18:12

Say again, please.

00:18:13

Is that automated testing or manual testing?

00:18:16

No, this is only... I'm talking here only about the deployment. This is not testing. This is just

00:18:28

The QA environment.

00:18:29

Yes. This is deployment into the QA environments, not the testing. Okay. And those two hours also represent the queues that we were adding to the process, to the operations team. So now that we took 100 minutes per deployment from the operations team, that's basically three operations team members who should be spending their time either doing more deployments or improving the monitoring in their environments.
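The percentages quoted for the two environments follow directly from the before-and-after deployment times, as the short Python sketch below shows. The function names are hypothetical, and the talk does not state how many deployments happen per month; the figure of 240 used in the example is an illustrative assumption, chosen only to show how 100 minutes saved per deployment adds up to 400 hours.

```python
def percent_reduction(before_minutes, after_minutes):
    """Percentage of time saved when a deployment goes from before to after."""
    return 100.0 * (before_minutes - after_minutes) / before_minutes


def hours_saved(before_minutes, after_minutes, deployments_per_month):
    """Total hours saved per month for a given number of deployments."""
    return (before_minutes - after_minutes) * deployments_per_month / 60.0


if __name__ == "__main__":
    # Integration environment: 45 minutes down to 12 minutes.
    print(f"integration: {percent_reduction(45, 12):.1f}% faster")
    # QA environment: 120 minutes down to 20 minutes.
    print(f"QA: {percent_reduction(120, 20):.1f}% faster")
    # Hypothetical monthly volume, not a figure from the talk.
    print(f"QA hours saved at 240 deployments: {hours_saved(120, 20, 240):.0f}")
```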

00:19:11

Can you go back?

00:19:13

Sure.

00:19:14

So you're telling us the what, but how did you achieve this?

00:19:19

Okay, for this, we have to work with processes. Let me go back to...

00:19:29

Yeah, this one. So we optimized our deployment process. We automated it, and we are only talking here about one of the layers of the architecture. So the first part was automating. In the second part, we made our processes leaner. We created less documentation. We relied on the capabilities of the tool to give the information that our manual processes were generating. In our previous manual process, we were generating a lot of manual documentation that we had to deliver to the next team to operate. So we got rid of some of that documentation, and we started generating some of that documentation as evidence, because the auditors were still requiring that information. And some of the reporting that we generated manually, for example, evidence of the deployments or evidence of the restarting of a server, is now given automatically by the tool in its reporting capabilities. So basically two points there: automation of the deployment and leaner processes.

00:20:57

So with these good results, we also generated a bottleneck in the QA environment, because now the developers were deploying faster in their environment and were delivering faster to QA. But the operations team, even with the savings that I showed, which should be almost three people doing other things, was not able to keep up with the new speed that the developers had in their deployments. And despite the speed that the QA team had in that backend layer, the QA team was still depending on the operations team for the deployment of the other layers. So we went from automating a single layer in multiple applications to automating all layers in a single application. That's the change of strategy that we had to make, and that's what we changed from our original plan. I'll say it again: we went from automating a single layer in multiple applications to automating all layers in a single application. The application that we chose was decided with the operations and the business teams, and it was one of the applications that the business was requiring us to deliver faster and to increase the cadence of the deployments in production.
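The difference between the two strategies can be made concrete with a small Python sketch. It is a toy model under stated assumptions, not a description of TODO1's pipeline: the four layer names come from the architecture described earlier in the talk, while the application names and the functions `one_click_apps` and `manual_handoffs` are hypothetical.

```python
LAYERS = ("ui", "backend", "database", "integration")


def one_click_apps(automated):
    """Applications whose every layer is automated, so no manual handoff remains."""
    return sorted(app for app, layers in automated.items()
                  if set(LAYERS) <= set(layers))


def manual_handoffs(automated):
    """For each application, the layers that still wait on a manual deployment."""
    return {app: [layer for layer in LAYERS if layer not in layers]
            for app, layers in automated.items()}


# Original plan: one layer automated across several applications.
single_layer = {"app_a": {"backend"}, "app_b": {"backend"}, "app_c": {"backend"}}
# Revised plan: every layer automated for one application.
single_app = {"app_a": set(LAYERS), "app_b": set(), "app_c": set()}

if __name__ == "__main__":
    print(one_click_apps(single_layer))
    print(one_click_apps(single_app))
    print(manual_handoffs(single_layer)["app_a"])
```

Under the first plan every application still has three layers waiting on the operations team, so no release gets through end to end any faster; under the second, one application can move through its environments without a manual handoff.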

00:22:48

The goal then was to have all the environments and all the layers for that single application, and to give the teams the ability to deploy with only one click. So the developers are deploying in all their layers with only one click of the tool. The QA team is now deploying in all their layers with only one click. In this first phase, we finished that part, and we are now working towards pushing that into the production environment. In about two to three weeks, we are planning to do our first fully automated deployment in production. That's something that we're working on right now with our coworkers back at TODO1. The second challenge that we found, and that we had to deal with, was the resistance of the teams.

00:23:56

With that change, adjusting to this new process and to these new tools required us to give them coaching and to generate documentation for the new members of the development teams and QA teams that were coming. We created demos, and we created support, and the support increased because a lot of teams were wanting to start doing automation. Once they saw the other teams deploy automatically, they started to come to us and ask us to include them, but we already had our plan, and the business was the one giving us our priorities at that moment. The good thing is that once the teams started, they really loved it. It's the same phenomenon that happened 10 years ago when we started doing continuous integration.

00:25:09

Two important lessons that we learned in our process. Since we are not in the business of creating DevOps and automation platforms, we learned that we need a unified platform that we could build on and that provided us with enterprise-grade features, like multi-tenancy and high availability, and that we could work with. The other important lesson is that we underestimated the number of nodes that we needed to automate at the beginning. In our first plan, we only looked at a couple of applications and only one layer. When we changed our strategy to focus on one application and all the layers, the number of nodes that we needed to automate increased. So that was something that we had to deal with in terms of budgeting as well. So our advice, or my advice, for everyone is: plan and budget for the best from the beginning.

00:26:32

Our next steps in our process in <inaudible>, in our continuous improvement, are to keep working towards the goal of serving 100 million users in three to five years. We want to implement deployment automation in all of our applications. We want to increase the integration with the functional testing suites that we're currently working on. We also want to go forward with the building and the delivery of the mobile applications, in terms of delivering them to the testing markets or even delivering them to our final clients for them to deploy into their production environments in the markets by themselves. And finally, in our next steps, we want to go forward with our database automation and the DML scripts, so we can have automation of both the data manipulation and the data definition. That's our other goal. So thank you very much. That's all I have to share with you. If you have any questions, I'll be happy. Can

00:28:07

You talk a little bit more about EL automation?

00:28:11

Sure, sure. So in our current state, we are deploying only scripts for data manipulation. We learned that we still have a long journey to go there, because we have, like, mega scripts. And one of the things that we realized, and that we learned reading the documentation and reading about other experiences, is that we have to have smaller pieces for our scripts. That way you can put those scripts in the automation pipeline and have them deployed and installed in the database without any problem. If you have smaller pieces, it's going to be easier even for doing rollbacks if you have to. Okay. Did I answer?
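The point about smaller scripts and easier rollbacks can be illustrated with a minimal Python sketch using the standard `sqlite3` module. It is not how TODO1 deploys database changes: the table `app_config`, the script names, and the function `apply_scripts` are all hypothetical, and an in-memory SQLite database stands in for the real one. Each small script runs in its own transaction, so a failure undoes only that script and leaves the earlier ones applied.

```python
import sqlite3


def apply_scripts(conn, scripts):
    """Apply named scripts in order, one transaction per script.

    `scripts` is a list of (name, [sql statements]). On the first failure the
    failing script is rolled back as a whole and no later script is run.
    Returns the names of the scripts that were applied successfully.
    """
    applied = []
    for name, statements in scripts:
        try:
            # The connection context manager commits on success and
            # rolls back if any statement in the block raises.
            with conn:
                for statement in statements:
                    conn.execute(statement)
        except sqlite3.Error:
            break
        applied.append(name)
    return applied


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE app_config (key TEXT PRIMARY KEY, value TEXT)")
    conn.commit()
    scripts = [
        ("001_set_timeout", ["INSERT INTO app_config VALUES ('timeout', '30')"]),
        # The second statement repeats an existing key, so this script fails.
        ("002_broken", ["INSERT INTO app_config VALUES ('retries', '3')",
                        "INSERT INTO app_config VALUES ('timeout', '60')"]),
        ("003_never_runs", ["INSERT INTO app_config VALUES ('locale', 'es')"]),
    ]
    applied = apply_scripts(conn, scripts)
    rows = conn.execute("SELECT key, value FROM app_config ORDER BY key").fetchall()
    return applied, rows
```

With one large script, the same failure would force a choice between rolling back everything or repairing the database by hand; with small pieces, the pipeline knows exactly which script failed and what state the database is in.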

00:29:19

How do you validate?

00:29:31

How do we validate that information? The information that we are deploying there is basically configuration. It's not transaction information; it's not information about the transactions of our solutions. It is basically configuration files, or configurations for our applications that are stored in the database. So that's the part that we are automating.

00:30:09

Do you have cases when you need to handle <inaudible> because one of the layers failed?

00:30:15

Oh, sorry. We got the time out. We can speak here. Thank you very much.