{"id":637,"date":"2022-05-08T14:50:34","date_gmt":"2022-05-08T04:50:34","guid":{"rendered":"https:\/\/sysmit.com\/cf22\/?p=637"},"modified":"2023-12-13T15:28:02","modified_gmt":"2023-12-13T05:28:02","slug":"postmortems-software-outages-psychological-safety","status":"publish","type":"post","link":"https:\/\/sysmit.com\/cf22\/postmortems-software-outages-psychological-safety\/","title":{"rendered":"Renaming “post-mortems” of software outages for psychological safety"},"content":{"rendered":"\n

As a generative leader and mental health advocate, I am wary of seeing such a morbid term being thrown around for what should be a learning experience that advances culture.<\/em><\/p>\n\n\n\n

This post will differ from my usual positive posts about Site Reliability Engineering (SRE). Please bear with me; I\u2019m otherwise a forward thinker.<\/p>\n\n\n\n

Two issues I have with the term\u00a0post-mortem<\/em>:<\/p>\n\n\n\n

    \n
  1. It compromises the psychological safety of novice SREs<\/li>\n\n\n\n
  2. It risks your job security in pathological organizations<\/li>\n<\/ol>\n\n\n\n

    Let\u2019s unpack this.<\/p>\n\n\n\n

    Imagine being a new SRE and hearing all these fascinating terms like SLOs, observability, APM, Chaos Engineering, etc. <\/p>\n\n\n\n

    Then a term \u2014 typically reserved for\u00a0gritty crime dramas \u2014 makes its way into the SRE lingo<\/strong>.<\/p>\n\n\n\n

    \u201cWe are doing a post-mortem on yesterday\u2019s outage\u201d.<\/em><\/p>\n\n\n\n

    A what? You and I know what it means: figuring out what went wrong after an outage or performance degradation event in the production software system.<\/p>\n\n\n\n

    But let\u2019s consider others for a second.<\/p>\n\n\n\n

    It\u2019s a ghastly connotation for people who are averse to negative metaphors. Even more so for those mentally scarred from seeing post-mortem scenes on TV.<\/p>\n\n\n\n

    Yes, they exist, but many will not be vocal about it.<\/p>\n\n\n\n

    I can understand the term\u2019s origins. A lot of my friends are pure engineers, and many of them have a dark sense of humor that they use to shock and delight each other.<\/p>\n\n\n\n

    But Site Reliability Engineering spans well beyond the figurative IT basement<\/strong>. It has begun to draw in diverse \u2014 in particular, neurodiverse \u2014 talent. Do we need to have lingo like this?<\/p>\n\n\n\n

    The other issue I have with this term is that it can risk your job in companies that don\u2019t practice, and likely never will, a\u00a0Westrum generative culture<\/a>\u00a0as outlined by Forsgren, Humble, and Kim in their book,\u00a0Accelerate<\/em>.<\/p>\n\n\n\n

    A generative culture of accepting failure does not translate well to companies like the ones where I\u2019ve seen managers chastise people for unavoidable mistakes.<\/p>\n\n\n\n

    Sonja Blignaut is a complexity science thought leader and has written about the use of\u00a0dark metaphors<\/em>\u00a0in organizations. Dark metaphors are words loaded with some other meaning that is likely to cause friction in organizational dynamics.<\/p>\n\n\n\n

    Here\u2019s an excerpt from\u00a0Sonja\u2019s writing<\/a>:<\/p>\n\n\n\n

    \n

    \u201c\u2026 we examine the nuts and bolts of what makes a powerful learning experience.\u201d<\/em><\/p>\n\n\n\n

    Nuts & Bolts \u2026 so are learning experiences like a machine?<\/p>\n<\/blockquote>\n\n\n\n

    and another one:<\/p>\n\n\n\n

    \n

    Looking \u201cunder the hood\u201d to understand culture; Firing up, fixing or fine-tuning your culture. Creating a culture change \u201cdashboard\u201d.<\/em><\/p>\n\n\n\n

    So is culture like a car that can be taken apart, fixed and tuned? Again the metaphor implies predictability and mechanistic certainty.<\/p>\n<\/blockquote>\n\n\n\n

    So what will happen when a Site Reliability Engineer talks with a sociopathic manager who takes metaphors literally? \u201cWe\u2019re doing a post-mortem on the outage,\u201d the SRE says.<\/p>\n\n\n\n

    The manager will think, \u201cPost-mortem? Sounds like something went bad, and someone must have messed it up big time,\u201d and then say, \u201cOkay, so who\u2019s involved in that?\u201d<\/p>\n\n\n\n

    In an act of self-preservation, many such managers will create a scapegoat for a revenue-losing or bad-PR downtime incident<\/strong>.<\/p>\n\n\n\n

    In essence, the word\u00a0post-mortem\u00a0<\/em>is a trigger word that implies malicious intent.<\/p>\n\n\n\n

    The institutional meaning will evolve into something that takes us away from what we want in all organizations \u2014 even the non-tech ones. We want a blame-free environment that lets us learn and improve systems.<\/strong><\/p>\n\n\n\n

    On that note, I propose \u201cRetrospective\u201d for describing post-incident analyses.<\/p>\n\n\n\n

    In a world already deeply fatigued by negativity, do we need to be reminded of it every time we want to work to improve our systems?<\/p>\n","protected":false},"excerpt":{"rendered":"

    As a generative leader and mental health advocate, I am wary of seeing such a morbid term being thrown around for what should be a learning experience that advances culture. This post will differ from my usual positive posts about Site Reliability Engineering (SRE). Please bear with this because I\u2019m an otherwise forward thinker. Two […]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[60,1],"tags":[12],"_links":{"self":[{"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/posts\/637"}],"collection":[{"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/comments?post=637"}],"version-history":[{"count":7,"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/posts\/637\/revisions"}],"predecessor-version":[{"id":5017,"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/posts\/637\/revisions\/5017"}],"wp:attachment":[{"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/media?parent=637"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/categories?post=637"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sysmit.com\/cf22\/wp-json\/wp\/v2\/tags?post=637"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}