
#9 Inside Booking.com’s Site Reliability Engineering Practice

Episode 9 [SREpath Podcast]

Ash Patel interviews Samuele Tonon and Yoann Fouquet about their experiences in managing and growing the Site Reliability Engineering (SRE) function at Booking.com.

Booking.com is one of the world’s largest travel sites with a market capitalization of over $100 billion and over 1.5 million bookings per day.


Here are key highlights from our conversation:

SRE Anti-patterns:

  • Yoann highlights the anti-pattern of embedded SREs doing the work in place of the product teams, which leads to a lack of scalability and automation.
  • Samuele emphasizes the importance of a blameless culture but warns against the misconception that it means “no accountability”.

Challenges Faced in SRE:

  • Samuele discusses challenges like teams treating SREs as a shield or SREs being treated like software developers.
  • Yoann mentions challenges in maintaining consistent SRE standards, especially as the company grows.

Technical Challenges in SRE:

  • Yoann talks about the challenges of data reliability, such as ensuring data correctness and preventing data-related outages.

Advice for New SRE Managers:

  • Yoann advises new SRE managers to balance reactivity with a long-term vision, set clear KPIs, and foster motivation.
  • Samuele encourages managers to support team members and create room for personal and professional growth.

Transcript

⚠️ We apologize, but due to a technical issue, this episode’s transcript is not available ⚠️

Related article: 10 Tips for Onboarding New SRE Hires