Category: Team Development

We take a deep dive into the problems involved in building a reliability-focused team. Get insights to help you fix team issues ⬇️⬇️⬇️

Case Studies, Podcast, Team Development

#9 Inside Booking.com’s Site Reliability Engineering Practice

Episode 9 [SREpath Podcast] Ash Patel interviews Samuele Tonon and Yoann Fouquet about their experiences in managing and growing the Site Reliability Engineering (SRE) function at Booking.com. Booking.com is one of the world’s largest travel sites with a market capitalization of over $100 billion and over 1.5 million bookings per day. Here are key highlights…

October 3, 2023
Podcast, Team Development

#8 Software Reliability Ninja Who is NOT An SRE (with Pablo Bouzada)

Episode 8 [SREpath Podcast] Ash Patel interviews Pablo Bouzada about his beliefs on software reliability as a non-SRE software engineering leader. They discuss the importance of leadership in driving effective reliability changes in the software system, as well as the challenges of providing a reliable service at video streaming giant ViaPlay. Read the Episode Transcript Don’t…

September 12, 2023
Articles, Team Development

10 Tips for Onboarding New SRE Hires

How new SRE hires can get stuck: There’s more than one way to mess up your new SRE hire and get them stuck in a loop. There are 6 ways new hires will know you’ve made this mistake. This article will unpack these 6 sticking points and show how to solve them. Later on, I…

August 23, 2023
Articles, Team Development

Starting SRE at startups and smaller organizations

Who should pay attention to this article: ❌ SRE at a very small startup with few users rarely makes a difference until you’ve reached a fair userbase size or have growing pains ❌ Many organizations without a strong financial or legal incentive, e.g. SLAs tied to their operations, cannot justify diving into a complex field like SRE…

August 1, 2023
Articles, Team Development

How to convert developers into Site Reliability Engineers (SREs)

Introduction: Hiring in the Site Reliability Engineering (SRE) space is notoriously difficult. So it makes sense to figure out how to expand the hiring pool beyond existing SREs. One way to increase the hiring pool is to recruit developers (also known as SWEs) and gradually advance them…

February 9, 2023
Articles, Team Development

Analysis of SRE and platform setup at 10+ tech companies

In this article, you will see a breakdown of the platform setup and SRE practices within 12 non-FAANG technology companies. This is based on the case studies by Andrios Robert. “There is a lot of content available on how Google did [Site Reliability Engineering]; let’s uncover what happens with the rest of the world.” —…

November 22, 2022
Articles, Team Development

Where in team topologies does Site Reliability Engineering fit in?

We will explore the workings of the Team Topologies model and how Site Reliability Engineering (SRE) teams can fit into it. Overview of team topologies: Team topologies is a relatively new model/framework, having been officially introduced in 2019. It’s a response by…

October 12, 2022
Articles, Team Development

Building the case for starting a software reliability team

This article aims to help engineering leaders consider issues before starting a software reliability team. Since I am an advocate for Site Reliability Engineering (SRE), we will now refer to such a team as the “SRE team”. Besides creating a new team, leaders face many responsibilities that are often invisible to individual contributors and their…

May 31, 2022
Articles, Team Development, Visual Summaries

How cloud infrastructure teams evolve – from start to maturity

I recently read a post by Will Larson, who started SRE at Uber. The post is called “Trunks and branches model for scaling infrastructure organizations”. Several passages in the post cover how infrastructure teams can evolve from the startup phase. I felt it would be easier to comprehend the dense, rich advice with a visual…

April 19, 2022
Articles, Team Development, Visual Summaries

Cloud infrastructure success is a fine balance of budget and service quality

The visual summary below is based on a post by Will Larson, who started the SRE function at Uber. His post elaborates on a “trunks and branches” model for developing infrastructure-facing teams. It also covers an interesting perspective on the balancing act between budget and service quality. I will explain the visual summary underneath it.…

April 12, 2022