Introduction
A lot goes on in the background every time you load up your favorite Netflix movie or series. Engineers spread across Chaos Engineering, Performance Engineering, and Site Reliability Engineering (SRE) work non-stop to ensure the magic keeps happening.
📊 Here are some performance statistics for Netflix
When it was alone on top of …
Incident Response OKRs
System performance and resilience OKRs
Developer support OKRs
DevSecOps OKRs
FinOps (Cloud Cost Control) OKRs
Work practices OKRs

Feel free to reach out if you have any questions about the above OKRs or want us to add a new OKR.
Introduction
I can confidently tell you that runbooks form a critical part of the incident response toolkit. I will also tell you that SREs are well-placed to start and oversee the development of runbooks. If you don’t have a runbook yet, let me entice you with the thought of checklist-type documentation to follow when you’re …
As SRE gains traction, a misconception is gaining steam among senior stakeholders: that SRE is a monolithic role, like what “programmers” were in the 90s. Let’s burst that misconception. SRE is a broad, overarching responsibility that needs a multitude of role considerations to pull off properly. It is not a monolithic role where …
Site Reliability Engineers (SREs) are a rare bunch in the software community. But there’s little denying that the approach of Site Reliability Engineering is the future of software operations. Here are some things that make SREs a unique breed in software work:
SREs look at the broader picture
Ask any developer what they’re working on …