Articles on better software operations practices

Rundown of Netflix’s SRE practice

Introduction: A lot goes on in the background every time you load up your favorite Netflix movie or series. Engineers spread across Chaos Engineering, Performance Engineering, and Site Reliability Engineering (SRE) are working non-stop to ensure the magic keeps happening. 📊 Here are some performance statistics for Netflix: When it was alone on top of … Read More

Runbooks for better incident response

Introduction: Runbooks form a critical part of the incident response toolkit, and SREs are well placed to start and oversee their development. If you don’t have a runbook yet, let me entice you with the thought of checklist-type documentation to follow when you’re … Read More

SRE is not a monolithic role

SRE is gaining traction, and so is a misconception among senior stakeholders: that SRE is a monolithic role, much like “programmer” was in the 90s. Let’s burst that misconception… SRE is a broad, overarching responsibility that takes a multitude of roles to pull off properly. It is not a monolithic role where … Read More

How SREs are unique in their approach to work

Site Reliability Engineers (SREs) are a rare bunch in the software community. But there’s little denying that the approach of Site Reliability Engineering is the future of software operations. Here are some things that make SREs a unique breed in software work: SREs look at the broader picture. Ask any developer what they’re working on … Read More