
Reaffirming the value of SREs amid ongoing tech layoffs

I’ve been curious about the prospects for Site Reliability Engineers (SREs) as companies scale back headcount across the board. This opinion piece will unpack the pressing issue.

Many experts predict an ongoing downturn in the tech job market that could last for the next 3-5 years.

This is an unfortunate turn for many people employed in the tech industry, and a time for us to come together and support one another.

As someone who has been observing this industry for over two decades, I get the feeling that the recent spate of layoffs is not a temporary problem.

It is unlikely that the situation will change in the near term.

This is understandably leading to a prevailing sense of pessimism about the future of careers in Site Reliability Engineering.

I feel the need to address this issue to provide clarity to those who may be feeling uncertain about their future in this field.

Let’s begin with a reasonable statement…

Laying off SREs is a risky proposition

Done without justification, it can have serious consequences for an organization.

One SRE told me in no uncertain terms:

Companies getting rid of SREs are penny wise, pound foolish. SRE talent is difficult to attract and keep. Getting rid of them would hamper the ability to keep a product competitive in the long term in the vast majority of organizations.

For example, many executives are now aware of Twitter’s outage woes, as new management reduced its SRE headcount to less-than-ideal numbers.

Even before the pandemic, many companies struggled to find and hire qualified SREs, leaving many teams already operating with limited resources.

A 2020 report from Indeed stated there were 9 job postings for every one qualified SRE candidate.

It’s vital to note that SREs were not hired in droves, so it’s imperative to keep the ones that a company has already invested time and resources in.


If a company were to lay off its existing SREs, it would be a short-sighted and potentially disastrous decision.

SREs play a critical role in ensuring the smooth operation of production services, and their absence can lead to unforeseen downtime and lost revenue.

A study by Gartner found that software downtime costs businesses anywhere from $300,000 per hour all the way up to $1 million per hour.

SREs can have a direct impact on cost savings and an indirect impact on revenue growth.

With the above in mind, cutting SRE positions would be a mistake. It could have long-lasting implications for a company’s reputation and bottom line.

To avoid this situation, companies need to invest in training and developing their existing SREs and work to attract new talent to the field.

There is a simple truth to all of this…

Modern applications require SREs to ensure their continuous operation, especially in lean organizations with smaller development teams.

By investing in their SREs, companies can ensure that their production services remain stable and reliable, even in the face of economic uncertainty.

Nonetheless, as an SRE it’s important to show your value, not just believe in it.

I have noticed a consistently high demand (and low supply) for high-quality DevOps/SRE professionals, based on recruiter feedback.

However, if the SRE title is used merely for the sake of it, i.e. the role is a rehashed SysAdmin position, there may be trouble.

Similarly, holding the SRE title while working solely on one-off projects like cloud migrations or Kubernetes (K8s) transitions is a high-risk proposition.

When the “project” stops, so does your work.

A higher quality SRE deeply understands infrastructure, can code well, and knows how to scale & architect systems.

In this case, even if your current employer decides to let you go, you can expect to be in between roles for a few days or weeks at the most.


Let’s explore some specific ways to show your quality and value as an SRE.

Ways to show value as an SRE in a downturn

Automate the work

One of the best ways to demonstrate your value as a Site Reliability Engineer (SRE) is through automation.

This not only improves the efficiency of the processes but also helps with cost control.

By automating certain processes, you can lessen vendor sprawl, which can save significant amounts of money.

Automation also ensures that developers are spending less time on non-value-adding work, freeing them up to focus on more important tasks.

Support a financial view of operations

Another way to show your value is by focusing on FinOps.

If your organization is going into cost-cutting mode, you can lean hard into FinOps to help your organization save money.

Adding FinOps labels to workloads based on things like “cost center”, “product feature”, etc. can provide better visibility into expenses.

Talk to your finance people to figure out how they categorize things, what they care about, and what they would like to have better visibility into.

Once you have added the labels, you can create FinOps dashboards to track costs based on said labels.
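As a rough illustration of the labelling idea, the sketch below sums spend per label value, which is the kind of aggregation a FinOps dashboard would chart. The label keys, workload names, and dollar figures are all hypothetical; in practice the rows would come from your cloud provider's billing export, and the label names from whatever categories your finance team uses.

```python
from collections import defaultdict

# Hypothetical billing rows: every name and figure here is made up for illustration.
billing_rows = [
    {"workload": "checkout-api",
     "labels": {"cost_center": "payments", "product_feature": "checkout"},
     "monthly_cost_usd": 4200.0},
    {"workload": "fraud-scoring",
     "labels": {"cost_center": "payments", "product_feature": "fraud"},
     "monthly_cost_usd": 1800.0},
    {"workload": "search-indexer",
     "labels": {"cost_center": "discovery", "product_feature": "search"},
     "monthly_cost_usd": 2500.0},
    # A workload nobody has labelled yet.
    {"workload": "legacy-cron", "labels": {}, "monthly_cost_usd": 300.0},
]


def cost_by_label(rows, label_key):
    """Sum monthly cost per value of `label_key`.

    Workloads missing the label are grouped under "unlabeled" so that
    untracked spend stays visible instead of silently disappearing.
    """
    totals = defaultdict(float)
    for row in rows:
        value = row["labels"].get(label_key, "unlabeled")
        totals[value] += row["monthly_cost_usd"]
    return dict(totals)


if __name__ == "__main__":
    for center, total in sorted(cost_by_label(billing_rows, "cost_center").items()):
        print(f"{center:<12} ${total:,.2f}")
```

Keeping an explicit "unlabeled" bucket is a deliberate choice in this sketch: shrinking that bucket over time is itself a number you can put in front of finance.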

C-level executives love dashboards with dollar signs on them, so this can be an effective way to demonstrate your value.

Finding ways to make your pretty FinOps dashboards trend down and to the right can also demonstrate your value to the organization.

It’s worth noting that while there may be some overlap, DevOps and FinOps are different concepts.

DevOps is about continuous delivery, shortening development cycles, fast feedback loops, etc., while FinOps is more about improving visibility into and management of operational expenses.


That being said, you could certainly approach FinOps using DevOps principles, with your finance department as a primary stakeholder.

Save money through capacity planning

Lastly, some organizations completely overlook certain aspects of their software operations when times are good. These areas include:

  • autoscaling for higher resource use only when required
  • utilizing preemptible VMs at a lower price point
  • adjusting workload size for lower resource impact

If you know how to implement these properly, you can prove your financial value to your current employer or impress a potential employer.
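To make the three areas above a little more concrete, here is a minimal sketch of the arithmetic behind each. The replica calculation follows the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler (desired replicas = ceil(current replicas × current metric / target metric)). The function names, headroom percentage, replica bounds, hourly price, and discount are assumptions chosen purely for illustration; real preemptible pricing varies by provider and machine type.

```python
import math


def desired_replicas(current_replicas, current_util_pct, target_util_pct,
                     min_replicas=1, max_replicas=10):
    """Autoscaling: scale replicas in proportion to observed vs. target utilization.

    Utilization is given in whole percent to keep the arithmetic exact.
    The result is clamped to the (assumed) min/max replica bounds.
    """
    raw = math.ceil(current_replicas * current_util_pct / target_util_pct)
    return max(min_replicas, min(max_replicas, raw))


def rightsized_millicores(peak_millicores, headroom_pct=20):
    """Right-sizing: request observed peak CPU plus a safety headroom (assumed 20%)."""
    return math.ceil(peak_millicores * (100 + headroom_pct) / 100)


def preemptible_savings_usd(on_demand_hourly_usd, discount, hours, instances):
    """Preemptible VMs: estimated savings versus on-demand pricing.

    `discount` is the fraction saved per instance-hour (hypothetical value).
    """
    return on_demand_hourly_usd * discount * hours * instances


if __name__ == "__main__":
    # 4 replicas running at 90% CPU against a 60% target.
    print("replicas:", desired_replicas(4, 90, 60))
    # A workload that peaked at 1500 millicores.
    print("cpu request (m):", rightsized_millicores(1500))
    # 5 instances for ~730 hours at $0.10/hour with an assumed 60% discount.
    print("savings ($):", round(preemptible_savings_usd(0.10, 0.6, 730, 5), 2))
```

None of this replaces measuring real workloads, but being able to attach numbers like these to a proposal is what turns capacity planning into a cost conversation.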

The bottom line is that developing expertise in these areas lets you help your organization save money and improve its overall efficiency.

These are traits that protect a job in tough times.

Wrapping it up

If a tech company has to lay off employees, SRE and DevOps teams should logically be the last to go.

These positions are difficult to fill and even harder to replace, especially when a company’s service availability depends solely on these professionals.

Nonetheless, be sure to assess the value you give the organization against what high-quality SREs are expected to deliver. It can mean a lot for the future of your job.

Ash Patel