Establishing SRE Foundations: Aligning The Organization On Ops Concerns Using SRE Team Topologies
Establishing SRE in a larger software delivery organization requires an SRE organizational structure that aligns everyone involved on operational concerns in a sustainable manner. Different structures can achieve this goal, and choosing the appropriate one is not straightforward: it requires weighing several dimensions against each other. A core dimension is who should run the services, which gives rise to 4 core options:
- You build it, you run it
- You build it, you & SRE run it
- You build it, SRE run it
- You build it, ops run it
The options differ in how strongly they incentivize development teams to implement reliability. The incentives are strongest with “you build it, you run it” because developers fully own their operations workload, which might require them to wake up in the middle of the night. The incentives diminish as dedicated SREs share or take over operations responsibilities from the developers.
Another dimension is the organizational function in which the SRE teams and SRE infrastructure teams are placed. There are 3 core options:
- Development organization
- Operations organization
- SRE organization
Sensible permutations of all the options above give rise to 9 SRE team topologies presented in the talk:
SRE Team Topology 1:
- Development organization: You build it, you run it with no dedicated SRE role. Every developer is an SRE on rotation
- Operations organization: SRE infrastructure team
- SRE organization: none
SRE Team Topology 2:
- Development organization: You build it, you run it with a dedicated SRE role in the team
- Operations organization: SRE infrastructure team
- SRE organization: none
SRE Team Topology 3:
- Development organization: You build it, you run it with a dedicated SRE role in the team and a dedicated developer on rotation
- Operations organization: SRE infrastructure team
- SRE organization: none
SRE Team Topology 4:
- Development organization: You build it, you & SRE run it with a dedicated SRE team
- Operations organization: SRE infrastructure team
- SRE organization: none
SRE Team Topology 5:
- Development organization: You build it, you & SRE run it
- Operations organization: Dedicated SRE team and SRE infrastructure team
- SRE organization: none
SRE Team Topology 6:
- Development organization: You build it, you & SRE run it
- Operations organization: SRE tool chain procurement and administration
- SRE organization: Dedicated SRE team and SRE infrastructure team
SRE Team Topology 7:
- Development organization: You build it, SRE run it with a dedicated SRE team
- Operations organization: Dedicated SRE infrastructure team
- SRE organization: none
SRE Team Topology 8:
- Development organization: You build it, SRE run it
- Operations organization: Dedicated SRE team and SRE infrastructure team
- SRE organization: none
SRE Team Topology 9:
- Development organization: You build it, SRE run it
- Operations organization: SRE tool chain procurement and administration
- SRE organization: Dedicated SRE team and a dedicated SRE infrastructure team
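For side-by-side comparison, the nine topologies listed above can be written down as plain data. The following is only a sketch: the `Topology` class, its field names, and the two helper functions are hypothetical names chosen for illustration, and the field values are shortened from the list above. `None` stands for an entry given as "none".

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the three run models that appear in the nine topologies.
YOU_RUN = "you build it, you run it"
SHARED = "you build it, you & SRE run it"
SRE_RUN = "you build it, SRE run it"


@dataclass(frozen=True)
class Topology:
    number: int
    run_model: str
    development: str        # what sits in the development organization
    operations: str         # what sits in the operations organization
    sre: Optional[str]      # what sits in the SRE organization, None if "none"


TOPOLOGIES = [
    Topology(1, YOU_RUN, "no dedicated SRE role, every developer is an SRE on rotation",
             "SRE infrastructure team", None),
    Topology(2, YOU_RUN, "dedicated SRE role in the team",
             "SRE infrastructure team", None),
    Topology(3, YOU_RUN, "dedicated SRE role in the team and a dedicated developer on rotation",
             "SRE infrastructure team", None),
    Topology(4, SHARED, "dedicated SRE team",
             "SRE infrastructure team", None),
    Topology(5, SHARED, "development teams",
             "dedicated SRE team and SRE infrastructure team", None),
    Topology(6, SHARED, "development teams",
             "SRE tool chain procurement and administration",
             "dedicated SRE team and SRE infrastructure team"),
    Topology(7, SRE_RUN, "dedicated SRE team",
             "dedicated SRE infrastructure team", None),
    Topology(8, SRE_RUN, "development teams",
             "dedicated SRE team and SRE infrastructure team", None),
    Topology(9, SRE_RUN, "development teams",
             "SRE tool chain procurement and administration",
             "dedicated SRE team and a dedicated SRE infrastructure team"),
]


def group_by_run_model(topologies):
    """Map each run model to the numbers of the topologies that use it."""
    groups = defaultdict(list)
    for t in topologies:
        groups[t.run_model].append(t.number)
    return dict(groups)


def with_sre_organization(topologies):
    """Return the numbers of the topologies that have a separate SRE organization."""
    return [t.number for t in topologies if t.sre is not None]


if __name__ == "__main__":
    for model, numbers in group_by_run_model(TOPOLOGIES).items():
        print(f"{model}: topologies {numbers}")
    print("Separate SRE organization:", with_sre_organization(TOPOLOGIES))
```

Grouping the data this way makes visible that each of the three run models used in the list accounts for three topologies, and that only topologies 6 and 9 place teams in a separate SRE organization.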
The team topologies have different reporting lines and produce different cultural identities for SRE. The cultural identities can be placed in a triangle with three vertices:
- product-centric identity
- reliability user experience-centric identity
- incident-centric identity
Depending on the reporting lines, the SREs lean more towards one of the vertices of this SRE cultural triangle. A comparison of the 9 SRE team topologies above will put listeners in a position to evaluate the options well, helping them drive better SRE organizational decisions in their companies.
Dr. Vladyslav Ukis
Head of R&D, Teamplay Digital Health Platform, Siemens Healthineers