Productizing the Network: Square Peg, Round Hole? (US 2021)

We've been on an amazing journey at Capital One since productizing our network infrastructure teams in 2019. Listen as Girija Rao, Vice President Software Engineering; Denee Ferguson, Director-Edge Network Services; and Jennifer Miles, Director-Agile Portfolio discuss the drivers for productization, unique challenges with productizing network infrastructure teams, changes made, outcomes, and lessons learned.

las vegas, us, plenary, vegas2021

Girija Rao

Vice President, Card Technology, Capital One


Denée Ferguson

Director, Technology, Capital One


Jennifer Miles

Director, Technology, Capital One



I love this story because it challenges the notion that DevOps principles and practices are only for customer-facing software applications. So up next is Girija Rao. She is a vice president who, up until very recently, led enterprise connectivity, presenting with Denée Ferguson, director of edge network services, and Jennifer Miles, director of agile portfolio. So here are Girija, Denée, and Jennifer.


Hi everyone. We're here today to share with you the product-oriented, agile-driven transformation we performed in our network organization at Capital One two years ago, and a reflection on the outcomes we achieved. I'm Girija Rao, and with me are Denée Ferguson and Jennifer Miles. We each played a key role in this effort. I was responsible for the enterprise connectivity organization and, along with my leadership team, initiated and led the transformation. Denée Ferguson was responsible for tier three operations at the time and was one of the key leaders in driving the effort with teams along the change curve. She now leads edge network services, providing wired and wireless connectivity to over 50,000 associates across over 450 locations. Jennifer Miles leads agile portfolio and program management for cloud and connectivity, and was a key contributor to figuring out how to make all work visible and evolve our delivery model. Before I get into the details of what we did, I would like to share some key facts about Capital One. We were founded in 1994 and are led by our founder, Richard Fairbank. We're a top 10 bank based on U.S. deposits, the third largest credit card issuer in the U.S., and a Fortune 100 company. From the start, Capital One has been a leader in technology-driven banking. And in 2020, we became the first U.S. bank to exit our data centers and go all in on public cloud.


Our organization, enterprise connectivity, has approximately 350 associates. We've divided our team into three major towers: security and application services, connectivity, and horizontal services, with technology scopes that range from proxy, VPN, data firewall, DNS, load balancing, and network admission control to contact centers and voice. And let's not forget my personal favorites: wireless LANs, routing and switching, and software-defined networking, which includes SD-WAN and STXs. We support connectivity for over 50,000 employees, over a hundred offices, and hundreds of retail bank branches and cafes. And we have in excess of 14,000 on our network and 185,000 carrier assets, which include circuits, toll-free lines, and POTS lines.


So let's start by talking about why we felt change was needed. We had an organizational structure that had been in place for a few years, and there were several pain points and opportunities that had become clearly apparent to us in this structure. There were two primary network organizations, one focused on network architecture and engineering and the other focused on network operations, each in their own tower and under separate executive leadership. There was fragmentation in our approach and, despite all of our best efforts, a significant engineering-versus-operations dynamic. There was no unified sense of ownership or clear lines of accountability. It was too easy for things to get thrown across the fence without adequate consideration of the end-to-end life cycle, sustainability, or customer experience. Due to this, for any project or issue, multiple network teams needed to be engaged, each focused on their own piece of the puzzle.


Practically speaking, this meant that teams were working in parallel without sufficient visibility into what others were doing, absorbing the unplanned impact of upstream or interdependent projects, and lacking standardized knowledge and best practices for the same platforms. There was inefficient resourcing due to the siloed nature of the roles. And finally, not all work was being tracked, or it was being tracked in inconsistent ways. People leaders did not have full visibility into all the work that their team members were doing, and prioritization was a challenge. So in 2019, as we embarked upon this change, our drivers were to unify our vision and strategy, improve efficiency, make delivery predictable, and improve the overall quality of work delivered. To accomplish this, we needed to do several things: reorganize around key products and services with clear accountability; make all work visible to enable effective prioritization, dependency mapping, and resourcing; centralize our intake; and set limits on work in progress.


Combined together, we felt these actions would strengthen us, position us better to deal with our complex and dynamic environment, and help us to achieve predictable, high-quality outcomes. Our transformation involved changes in three major categories: organizational structure, prioritization, and reporting. We started with our organizational structure. This was huge and ended up with over 160 people changing managers. As I mentioned earlier, and as you can see on this slide, engineering and operations ran as two separate organizations under different executive leadership while supporting the same network platforms and services. We consolidated all of it into one organization under a single accountable executive. We identified each of the distinct products and services and grouped them into product portfolios. For each product, for example wireless, we created dev and ops teams aligned under a single product owner who also served as the people leader for the team. Agile delivery and program support were matrixed to the teams as a horizontal service. This provided a sense of ownership and accountability within each team for their product and the work delivered, and helped establish consistent practices across teams. Where it didn't make sense to do this, we left it as is. For example, we kept our 24x7 tier one and tier two ops teams separate as horizontal functions, engaging with all the other teams.


The second change we made was related to the way we prioritize our work. Before our transformation, work was prioritized at the team level with very little visibility into what was coming next. Operations and engineering prioritized work separately within their own silos, even though they may have been working on the same platform. This led to teams working on different goals and even different customer pain points. Context switching also became a challenge: as team members were distributed across multiple teams within a silo, they had difficulty knowing where to focus as priorities varied between teams. We were also frequently disrupted by internal and external distractions, such as shoulder tapping and leadership high-priority asks. These things became a huge problem. Whoever was the most visible at that moment received the attention of the team, regardless of existing priorities. So what did we do about this? As part of the product team reorg, we took the opportunity to adjust multiple aspects of our planning cycles.


We started by creating an annual initiative prioritization process. We used a bottom-up approach to define our work pipeline, with teams contributing potential initiatives to start the list. Leadership review forums were then held to prioritize and refine the list based on business need and organizational goals. Once the final list was prioritized and distributed, all teams were able to see the established priorities across the entire organization. Work was then broken down into achievable epics and stories, with tracking and course corrections made as needed using quarterly increment and shorter sprint planning sessions. This more robust work breakdown process helped our teams realize several benefits, including better alignment of capacity and resources, the establishment of work-in-progress limits, and the ability to understand historical unplanned work cycles. Work-in-progress limits were an immediate outcome of the priority list, with above-the-line work taking priority and below-the-line work only being pulled in once teams had open capacity. Teams were then able to say no, or not now, to work that previously may have caused disruption. In summary, better prioritization ultimately reduced team overcommitment, allowing them to deliver on what they said, when they said.
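The above-the-line and below-the-line idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the process or tooling used at Capital One: the `Initiative` fields, the effort units, and the function names `draw_the_line` and `pull_next` are all hypothetical, as are the sample initiative names.

```python
from dataclasses import dataclass


@dataclass
class Initiative:
    name: str
    rank: int    # position on the prioritized list; 1 is the highest priority
    effort: int  # hypothetical effort estimate, e.g. story points


def draw_the_line(initiatives, capacity):
    """Split a ranked list into above-the-line and below-the-line work.

    Walks the list in rank order and commits work until the next item no
    longer fits in the remaining capacity. Everything from that item on
    stays below the line, so a small low-priority item cannot jump ahead.
    """
    above, below = [], []
    remaining = capacity
    line_drawn = False
    for item in sorted(initiatives, key=lambda i: i.rank):
        if not line_drawn and item.effort <= remaining:
            above.append(item)
            remaining -= item.effort
        else:
            line_drawn = True
            below.append(item)
    return above, below


def pull_next(below, open_capacity):
    """Return the top below-the-line item if it fits, else None ("not now")."""
    if below and below[0].effort <= open_capacity:
        return below[0]
    return None


if __name__ == "__main__":
    # Invented sample data for illustration only.
    backlog = [
        Initiative("Wireless refresh", rank=1, effort=5),
        Initiative("Firewall rule cleanup", rank=2, effort=8),
        Initiative("DNS automation", rank=3, effort=3),
    ]
    above, below = draw_the_line(backlog, capacity=10)
    print("Above the line:", [i.name for i in above])
    print("Below the line:", [i.name for i in below])
    print("Pull with 5 open:", pull_next(below, open_capacity=5))
```

The strict cut-off is a design choice in this sketch: a team could instead let smaller items fill leftover capacity, at the cost of working out of priority order.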


Before we get into more specifics about our outcomes, I'd like to spend a few moments clarifying why this transformation was anything but a guaranteed slam-dunk move. As mentioned previously, our organization is focused on network infrastructure, and network delivery has some key differences from application delivery. First, common roles within our engineering teams include network engineer, wireless engineer, firewall engineer, DNS engineer, load balancing engineer, et cetera. While many of our engineers have some software development skills, most have learned them over the course of this multi-year transformation. Most are not software developers by trade. Consequently, some agile constructs that are well known and commonly used in the software development world were not consistently understood or employed across the entire organization prior to the start of this transformation. Second, while we've eliminated physical infrastructure wherever we can, many network technologies are not at a point where they can be fully eliminated.


As an example, you can't connect your laptop to a wireless LAN without a physical access point. Consequently, we still had a significant physical footprint. One of the consequences of having a physical footprint is that many projects are waterfall-like and have heavy interdependencies. For example, establishing connectivity to a new office location involves ordering, delivery, and turn-up of a circuit; extending the circuit from the telco room to the equipment location; provisioning DHCP scopes, DNS entries, and firewall rules; and ordering, configuring, and installing the equipment. Multiple teams are involved in performing these tasks, and some of them require an onsite presence. So there's a lot of scheduling and a lot of coordination. Third, agile delivery methodologies are not a perfect fit for how we've historically operated. Many of our tasks don't neatly fit inside short, iterative, time-boxed sprints or even regular PI cadences. Work tends to oscillate between short-duration tasks and long-duration tasks.


And sometimes we have waiting times within the same effort. Work to accomplish a particular objective tends to span multiple teams and organizations. And finally, we often have a heavy unplanned workload component, which needed a cultural shift to address, forcing us to plan much further in advance and learn to say no or not yet, which is a muscle that just wasn't well exercised or even well tolerated at the time. Product management constructs also often prove challenging, bringing to mind the image of a square peg in a round hole. First, our customers are often not aware of the role our services play in their daily experience. This makes common product management constructs tricky, such as doing product- or service-focused customer surveys and empathy interviews, developing north star metrics, and actually measuring on them. Second, there is no customer choice element: there's no competitor that our employees can choose from. New features and capabilities deployed often aren't driven by end-user requests or market share concerns. Instead, many are security and automation driven, and consequently they're frequently invisible to our customer base. Last but not least, customer experience, meaning how our customers interact with the products and services we provide, was not historically at the forefront of our mind when we designed new solutions.


The third pillar of our transformation relates to reporting. We updated, created, or consolidated multiple reporting mechanisms to provide a better flow of information to our teams and leaders. Specifically, we created more robust operational reporting metrics that measured incident frequency and severity, helping us to understand change impacts. We also adjusted and consolidated agile metrics used at the team, portfolio, and tower levels, helping us identify improvement opportunities. And finally, we created mechanisms to make initiative progress visible at all levels, ensuring accountability for our outcomes.
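As a rough illustration of the operational metrics mentioned above, the sketch below counts incidents per quarter, optionally by severity, and computes the share of changes that resulted in an incident. The record layout, the sample numbers, and the function names are made up for the example, and it assumes each change-caused incident maps to exactly one change, which may not hold in practice.

```python
from collections import Counter

# Made-up sample records: (quarter, severity, caused_by_change)
SAMPLE_INCIDENTS = [
    ("2019Q1", "critical", True),
    ("2019Q1", "minor", True),
    ("2019Q1", "minor", False),
    ("2019Q2", "minor", True),
    ("2019Q2", "minor", False),
]

# Made-up number of network changes implemented in each quarter
SAMPLE_CHANGES = {"2019Q1": 100, "2019Q2": 50}


def incidents_per_quarter(incidents, severity=None):
    """Count incidents by quarter, optionally for a single severity."""
    return Counter(
        quarter
        for quarter, sev, _ in incidents
        if severity is None or sev == severity
    )


def change_incident_rate(incidents, changes):
    """Percentage of changes per quarter that resulted in an incident.

    Assumes each change-caused incident maps to exactly one change.
    """
    caused = Counter(quarter for quarter, _, by_change in incidents if by_change)
    return {q: 100.0 * caused[q] / n for q, n in changes.items() if n > 0}


if __name__ == "__main__":
    print(dict(incidents_per_quarter(SAMPLE_INCIDENTS)))
    print(dict(incidents_per_quarter(SAMPLE_INCIDENTS, severity="critical")))
    print(change_incident_rate(SAMPLE_INCIDENTS, SAMPLE_CHANGES))
```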


Primary operational metrics focused on items directly related to stability, such as incident frequency, severity, and incidents caused by change, which historically is our number one driver of incidents. On the top half of the slide, we're focusing on incident statistics. The left-hand side shows network incident counts for all severities by quarter. The volume, as indicated by the trend line in red, has clearly declined over the course of the past two and a half years. The right-hand side shows network incident counts for only the critical incidents. Historically, we haven't had a high number of these; each of them is in the single digits. But when they do happen, they have an outsized impact on the organization due to our desire to quickly resolve the impact, perform root cause analysis, and identify and implement remediation actions to prevent a recurrence. Critical incidents, as you can see, have also declined significantly, at a bit sharper pace than the overall network incident volumes, as indicated by the steeper slope of the red trend line.


At this point, we've had in excess of 300 days without a critical incident, which has been huge for us. The bottom half of this slide focuses on network changes. The left-hand side shows the percentage of network changes resulting in incidents, and that has declined significantly, nearly a 60% drop over the course of the past two and a half years. The right-hand side shows how network change volume is trending, and you can see that overall it's relatively steady. That little decrease in Q2 of 2020 is the result of change freezes that were put in place right after the pandemic forced everyone to start working from home. So in conclusion, our testing and change validation procedures have accomplished the desired effect of improving network stability.


Two of our agile metrics are sprint commitment reliability and velocity. Sprint commitment reliability indicates the percentage of the work the team committed to accomplishing at the beginning of a sprint that actually was completed during the sprint. Velocity shows the trending number of story points that were completed each sprint and measures the amount of work completed by the team. Both metrics illustrate how the teams were initially disrupted by the transformation and became much more reliable as time progressed. We attribute this to better strategic alignment on our outcomes (everyone on the team is now working toward the same goal); efficiency gains from organizational restructuring; visibility into previously hidden work (all work is now captured and tracked); permission to say no or not yet, so we can focus on the right priorities; and finally, an enhanced continual improvement mindset.
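The two agile metrics described above can be sketched as follows. This is a minimal example under assumptions, not the team's actual reporting: it measures committed work in story points, and the `Sprint` fields, the function names, and the rolling-average window used to show the velocity trend are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sprint:
    name: str
    committed_points: int            # points committed at sprint planning
    completed_committed_points: int  # of those, points finished in the sprint
    completed_points: int            # all points finished, including unplanned work


def commitment_reliability(sprint):
    """Percentage of committed work that was completed during the sprint."""
    if sprint.committed_points == 0:
        return 0.0
    return 100.0 * sprint.completed_committed_points / sprint.committed_points


def velocity_trend(sprints, window=3):
    """Rolling average of completed points over the last `window` sprints."""
    points = [s.completed_points for s in sprints]
    return [
        mean(points[max(0, i - window + 1): i + 1])
        for i in range(len(points))
    ]


if __name__ == "__main__":
    # Invented sample data for illustration only.
    history = [
        Sprint("S1", 20, 10, 12),
        Sprint("S2", 20, 15, 18),
        Sprint("S3", 20, 18, 21),
    ]
    for s in history:
        print(s.name, f"{commitment_reliability(s):.0f}%")
    print(velocity_trend(history, window=2))
```

Keeping the two numbers separate matters: a team can post a high velocity by absorbing unplanned work while still missing most of what it committed to.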


So let's talk briefly about our initiative progress tracking improvements. Our updated prioritization cycles created an environment of greater visibility for all of our associates. We implemented initiative progress tracking through monthly leadership reviews and quarterly planning readouts. By doing this, our teams and leaders were able to see tangible progress occurring. Readout conversations and reviews included not only milestones achieved but, more importantly, what value was delivered to support the overall organizational goal. Previously, teams had difficulty delivering on the predicted timelines, and slippages were a regular occurrence. Also, leadership visibility into team challenges was limited to team members self-identifying an issue and then having to search for the right forum to raise it in. This process was not easy to navigate or transparent in any way. In the new organization, regular review cycles created an open environment that fostered transparent communication between teams and leadership. Leaders engaged early and often to help in issue resolution, leading to a reduction in delivery date movement.


Overall, we have also made better use of our agile tools and applications to help teams raise visibility on dependencies and impediments, so leaders and teams can see where work is at any time in the cycle. The most important outcome of our enhanced progress tracking is that our teams, leaders, and partners all have greater clarity into when work is planned to be delivered. They're also able to address challenges before they become critical. We've talked about all the great things we accomplished. However, we also learned some valuable lessons. What we found out is that the devil is really in the details. Our first lesson learned was that it is very important to get buy-in at all levels, not just with the leadership team. We made this organizational change with input primarily from several levels of leadership and basically pulled everyone else along the change curve with us.


In hindsight, we should have invested additional time to explain the why of the change to everyone. A few examples of the why include articulating to all levels a clear vision of the end state from the very beginning; determining at the outset how agile would be adapted for use in a non-software-development organization like our own; explaining why it is important to make all work visible; illustrating how projects that normally span weeks to months can work in a sprint-based model and why it makes sense to do this; and finally, better explaining the "you build it, you own it" model and why we created on-call groups that included all team members and not just operations. With all that being said, at some point our teams had to make the leap of faith, and they did. Our second lesson was that the product model may not fit all teams. We encountered this challenge with select support teams. These included circuit provisioning and, as Girija mentioned, tier one and two operations, where historically the workload is heavily ticket based. To that end, we did keep those teams, plus our architecture and agile delivery, as horizontal services. Our third lesson was that we did not make enough skill retooling assumptions. We would have been better off assuming that all roles would need some sort of skill retooling, whether it be operations, engineering, or delivery. This lack of skill enhancement led to some confusion around roles and responsibilities.


Our fourth lesson involved focusing on reporting needs. It would have been much more helpful if we'd done it from the start, incorporating required status reporting into planning sessions. As a key lesson learned, the right Jira structure can make status reporting a breeze, and the wrong Jira reporting structure can make it extraordinarily painful. It's far better to invest the time early on to determine what structure makes the most sense before you start the work than to try to adapt your structure while work is in flight. For example, I had a recent effort with my team to perform wireless site surveys at a large number of locations. This involved performing site surveys, analyzing the data generated from the site surveys, and then identifying required remediation actions, which either took the form of deploying additional access points at each location or simply making tuning changes to the RF parameters.


Initially, we had a single epic that focused on all of this work, and we rapidly learned that it wasn't going to work for us. So we had to course correct during the project to shift our stories into four different epics: one focused on actually performing the site surveys, the second on analyzing the data, and a third and fourth focusing on the remediation actions, one for access point deployments and one for actually doing the RF optimization. Our fifth lesson learned involved agile training. While we did have some agile training, more robust agile training was really necessary. We needed to have a refresher for those who were already familiar with the agile construct, a bit of a deeper dive for those who were not, and more mock-up discussions that focused on how our work would change on a day-to-day basis, from what it historically was to what it needed to be. We found initially that it was a struggle to convince our engineers that the additional work we were asking them to do, writing stories for all their work, provided value.


So looking back at this two years later: wow, we did achieve the outcomes we had envisioned. The enterprise connectivity organization has a unified sense of mission, clear lines of accountability, and, as you saw from the metrics, improved delivery with higher quality of work. Our partners and stakeholders have also weighed in with their feedback on how much they appreciate the clarity of ownership and accountability and the stronger partnerships that we have been able to form with them as a result. This model has stood the test of time and also enabled us to easily incorporate additional infrastructure functions over the past two years. Jen, I believe you said the other day that we would never choose to go back to the old model. This foundational structure that we have created is one that we continue to iterate and evolve upon. Ultimately, it's important to note that this transformation was by us, for us, and purely driven by a strong desire to address our pain points, improve how we were operating, and enable ourselves to be the best that we could be. I would encourage anyone considering something similar to take the leap. And finally, the help that we are looking for is that we are actively hiring. So come on over and check out our Capital One career website.