Las Vegas 2022
How Google SREs Modify Production Resources Securely & Safely
Most outages are caused by changes to production resources. Automated production changes are typically fast and secure, but they can't address every use case, especially during an incident. An SRE with production access can fill this gap, but that access introduces reliability and security risk if the SRE makes a mistake or their account is compromised. To balance this risk, Google developed a framework that automates the majority of production operations while providing routes for manual changes when necessary.
Brett Beekley
SRE, Google
Michael Bird
SRE, Google