{"id":729,"date":"2022-06-15T17:55:08","date_gmt":"2022-06-15T07:55:08","guid":{"rendered":"https:\/\/sysmit.com\/cf22\/?p=729"},"modified":"2023-12-13T15:28:02","modified_gmt":"2023-12-13T05:28:02","slug":"jaeger-tracing-software-observability","status":"publish","type":"post","link":"https:\/\/sysmit.com\/cf22\/jaeger-tracing-software-observability\/","title":{"rendered":"How Jaeger tracing fits into software observability"},"content":{"rendered":"\n

In this article, I will share how tracing, and more specifically Jaeger tracing, can fit into your wider software observability strategy. <\/p>\n\n\n\n

Before we get into tracing, let’s define observability.<\/p>\n\n\n

What is observability?<\/h2>\n\n\n

Observability is a comprehensive means of gaining data on how software services perform in production. <\/p>\n\n\n\n

This data gives you a picture of the health and performance of individual services<\/strong>, as well as the cloud infrastructure that supports them. <\/p>\n\n\n\n

Observability can be broken down into three actions: logging, tracing, and monitoring. Our focus in this article will be on tracing. <\/p>\n\n\n

What is tracing?<\/h2>\n\n\n

Tracing is an action that tracks a request from initiation to completion<\/strong> within a microservices architecture. <\/p>\n\n\n\n

A trace usually begins when a user or service initiates a request, which then moves along the chain of interconnected services needed to fulfill it. <\/p>\n\n\n\n

With tracing enabled, software engineers and SREs can pinpoint any issues within the chain of requests among the various involved services. <\/p>\n\n\n
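
To make the idea concrete, here is a minimal sketch of a request moving along a chain of services while every hop records a span under one shared trace ID. The service names and the call chain are hypothetical, and this is a toy model of the concept, not how Jaeger or its client libraries are implemented.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str          # shared by every span that belongs to the same request
    service: str           # hypothetical service name
    parent: Optional[str]  # service that made the call, None for the entry point


def handle(service, downstream, trace_id, parent, collected):
    # Record this service's span, then pass the same trace_id to the services it calls.
    collected.append(Span(trace_id, service, parent))
    for child in downstream.get(service, []):
        handle(child, downstream, trace_id, service, collected)


def trace_request(entry, downstream):
    # A new trace starts when the request enters the system.
    trace_id = uuid.uuid4().hex
    collected = []
    handle(entry, downstream, trace_id, None, collected)
    return collected


# Hypothetical call chain for a single ride request.
CALLS = {
    'api-gateway': ['rides'],
    'rides': ['pricing', 'driver-matching'],
    'driver-matching': ['maps'],
}

spans = trace_request('api-gateway', CALLS)
for span in spans:
    print(span.service, 'called by', span.parent)
```

Because every span carries the same trace ID and names its caller, the full path of the request can be reassembled afterwards, which is what lets an engineer see where in the chain a problem occurred.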

Where Jaeger fits into the tracing paradigm<\/h2>\n\n

What is Jaeger tracing?<\/h3>\n\n\n

Jaeger is an open-source tracing tool that allows engineers to track request performance and issues among tens, hundreds, and even thousands of services<\/strong> and their dependencies. It <\/strong>collects tracing data, which engineers can then explore in the Jaeger UI or in Grafana dashboards.<\/p>\n\n\n\n

The key benefit of this is that it highlights downtime\/load-time risks and errors. This makes it an essential component of a strong observability practice. <\/p>\n\n\n

Jaeger’s origin story <\/h3>\n\n\n

Jaeger was created in 2015 by an engineer at Uber, Yuri Shkuro, who wanted to help engineers work out where<\/em> issues were popping up. This emerged as a critical need at Uber over time.<\/p>\n\n\n\n

Above: a glimpse of services that support the Uber app. Many of these services get triggered every time you request an Uber ride. (Source: YouTube, Jaeger Intro \u2013 Yuri Shkuro<\/a>)<\/em><\/figcaption><\/figure>\n\n\n\n

The Uber app may seem simple to its end users, but behind the facade runs a complex network of microservices. Many of these services depend on other services and their sub-services.<\/p>\n\n\n\n

A weakness anywhere in the service chain can cause the whole user request to fail, i.e. no ride. <\/p>\n\n\n\n

In business terms, Uber risks losing ride fares at a large scale if one or more component services fail or slow down.<\/p>\n\n\n\n

\n

\u201cIn deep distributed systems, finding <\/em>what<\/strong> is broken and <\/em>where<\/strong> is often more difficult than <\/em>why<\/strong>\u201d<\/em><\/p>\n\u2014 Yuri Shkuro, Founder & Maintainer, CNCF Jaeger<\/cite><\/blockquote>\n\n\n\n

Jaeger tracing helps engineers find out what services are experiencing issues and where. That way, they can fix small issues before they snowball into serious problems or crises.<\/p>\n\n\n

Do your observability needs justify using Jaeger?<\/h3>\n\n\n

You might be wondering whether you even need Jaeger. After all, your use case might not be as complex as Uber\u2019s. Jaeger was designed to make sense of a complex web of services and up to millions of daily requests<\/strong>.<\/p>\n\n\n\n

Tracing is not an absolute must-have for simpler software architectures. However, it is useful for finding bottlenecks if you have more than a handful of services. Having more than 10 services is a fair threshold of need.<\/p>\n\n\n\n
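
As an illustration of what finding a bottleneck from trace data can look like, the sketch below averages span durations per service and picks out the slowest one. The span records, service names, and durations are made up for the example, and a real tracing backend would do this kind of analysis for you.

```python
from collections import defaultdict

# Hypothetical spans: (trace_id, service, duration in milliseconds).
SPANS = [
    ('t1', 'api-gateway', 12),
    ('t1', 'pricing', 30),
    ('t1', 'driver-matching', 480),
    ('t2', 'api-gateway', 10),
    ('t2', 'pricing', 28),
    ('t2', 'driver-matching', 520),
]


def slowest_service(spans):
    # Average each service's span duration and return the worst offender.
    totals = defaultdict(lambda: [0, 0])  # service -> [sum of ms, span count]
    for _trace_id, service, duration_ms in spans:
        totals[service][0] += duration_ms
        totals[service][1] += 1
    averages = {name: total / count for name, (total, count) in totals.items()}
    worst = max(averages, key=averages.get)
    return worst, averages[worst]


service, avg_ms = slowest_service(SPANS)
print(service, avg_ms)
```

With only three services the answer is obvious by eye. The value of tracing shows up when the same question has to be answered across dozens of services at once.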

Would the following situation ever pose a problem for your software? Your application has more than 10 services and suddenly gets a traffic spike. A large volume of requests fails to complete. <\/p>\n\n\n\n

How will you find the culprit fast enough to fix the issue? <\/p>\n\n\n\n

If this scenario makes the case for tracing, let’s explore how Jaeger tracing works from a high-level view: <\/p>\n\n\n

How Jaeger tracing works<\/h3>\n\n

Step 1<\/strong><\/h4>\n\n\n
<\/figure>\n\n\n\n

Jaeger Agent<\/strong> gathers \u201cspan data\u201d from the UDP packets that instrumented microservices send to it.<\/p>\n\n\n\n
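
The hand-off in this step can be pictured with a small sketch using only Python's standard library: one UDP socket stands in for the agent and another for a service reporting a finished span. This is an illustration under loud assumptions. The span fields are hypothetical, and the payload is JSON purely for readability, whereas real Jaeger clients send Thrift-encoded spans to the agent.

```python
import json
import socket

# Stand-in for the agent: a UDP socket on localhost. Port 0 lets the OS pick a free port.
agent = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
agent.bind(('127.0.0.1', 0))
agent.settimeout(2.0)  # avoid blocking forever if the datagram is lost
agent_address = agent.getsockname()

# Stand-in for an instrumented service reporting one finished span (hypothetical fields).
span = {'trace_id': 'abc123', 'service': 'pricing', 'duration_ms': 30}
service = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
service.sendto(json.dumps(span).encode('utf-8'), agent_address)
service.close()

# The agent receives the datagram and decodes the span data.
payload, _sender = agent.recvfrom(4096)
agent.close()
received = json.loads(payload.decode('utf-8'))
print(received['service'], received['duration_ms'])
```

UDP is fire-and-forget, so the reporting service does not wait for an acknowledgement. That keeps the overhead on the service low, at the cost of occasionally losing a span.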