Runbooks are a Site Reliability Engineer’s best friend. They are most useful when you expect to put out the same fires again and again, or at least want to do it without a 🤯 feeling.
Why runbooks are useful in SRE incident response
Here are 3 reasons why:
Ways that teams have set up their runbooks
Confluence: not designed specifically for managing runbooks, but an open-ended tool that works well if you have a solid enough idea of runbook design
Jupyter Notebooks: an open-source tool that combines text, images and live code snippets, so a decent option if you are happy to install and maintain it
Markdown files hosted in a git repo: maintenance might become an issue over time without strict guidelines within the team
Err… this ➝ “Sticky notes on someone’s desk. We’re thinking about getting a laminator to keep the coffee spills from being too serious of a problem.” 😅
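For the Markdown-in-git option, one way to keep "strict guidelines" from drifting is to check each runbook for the sections your team has agreed on. The snippet below is a minimal sketch, not a standard tool: the section names in `REQUIRED_SECTIONS` and the helper `missing_sections` are made up for illustration, and your team's template will likely differ.

```python
import re

# Hypothetical team guideline: every runbook must contain these headings.
REQUIRED_SECTIONS = ["Symptoms", "Diagnosis", "Mitigation", "Escalation"]


def heading_titles(markdown_text):
    """Collect lower-cased titles of '#'-style headings, line by line.

    Simplification: headings inside fenced code blocks are counted too.
    """
    titles = set()
    for line in markdown_text.splitlines():
        match = re.match(r"#{1,6}\s+(.*)", line)
        if match:
            titles.add(match.group(1).strip().lower())
    return titles


def missing_sections(markdown_text, required=REQUIRED_SECTIONS):
    """Return the required section titles that have no matching heading."""
    found = heading_titles(markdown_text)
    return [title for title in required if title.lower() not in found]


example = """# Database failover
## Symptoms
Replica lag alerts firing.
## Mitigation
Promote the standby.
"""

print(missing_sections(example))  # ['Diagnosis', 'Escalation']
```

A check like this could run in a pre-commit hook or CI job over the runbook files in the repo, so incomplete runbooks get flagged before the next incident rather than during it.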
Factors to consider in your own runbook setup