Tag: site reliability engineer management

  • Where in team topologies does Site Reliability Engineering fit in?

    We will explore the workings of the Team Topologies model and how Site Reliability Engineering (SRE) teams can fit into it. In more detail, I will share with you the following: Let’s get started. Overview of team topologies: Team Topologies is a relatively new model/framework, having been officially introduced in 2019. It’s a response by…

  • Rundown of Uber’s SRE practice

    Introduction: Every time you push a button like the one below to request an Uber ride, you activate a sequence of (micro)service requests. You’d never know unless you looked under the hood, because most of these services run solely in the background. Yet every service contributes to the start and completion of the Uber ride…

  • Building the case for starting a software reliability team

    This article aims to help engineering leaders consider issues before starting a software reliability team. Since I am an advocate for Site Reliability Engineering (SRE), we will now refer to such a team as the “SRE team”. Besides creating a new team, leaders face many responsibilities that are often invisible to individual contributors and their…

  • How cloud infrastructure teams evolve – from start to maturity

    I recently read a post by Will Larson, who started SRE at Uber. The post is called “Trunks and branches model for scaling infrastructure organizations”. Several passages in the post covered how infrastructure teams can evolve from the startup phase. I felt it would be easier to comprehend the dense, rich advice with a visual…

  • Cloud infrastructure success is a fine balance of budget and service quality

    The visual summary below is based on a post by Will Larson, who started the SRE function at Uber. His post elaborates on a “trunks and branches” model for developing infrastructure-facing teams. It also covers an interesting perspective on the balancing act of budget and service quality. I will explain the visual summary underneath it.…

  • 25+ Site Reliability Engineering OKRs

    Incident Response OKRs; System performance and resilience OKRs; Developer support OKRs; DevSecOps OKRs; FinOps (Cloud Cost Control) OKRs; Work practices OKRs. Feel free to reach out if you have any questions about the above OKRs or want us to add a new OKR.

  • How SREs are unique in their approach to work

    Site Reliability Engineers (SREs) are a rare bunch in the software community. But there’s little denying that the approach of Site Reliability Engineering is the future of software operations. Here are some things that make SREs a unique breed in software work: SREs look at the broader picture: Ask any developer what they’re working on…