Sign In Start Free Trial
Account

Add to playlist

Create a Playlist

Modal Close icon
You need to login to use this feature.
  • Book Overview & Buying Observability with Grafana
  • Table Of Contents Toc
  • Feedback & Rating feedback
Observability with Grafana

Observability with Grafana

By : Rob Chapman, Peter Holmes
4 (4)
close
close
Observability with Grafana

Observability with Grafana

4 (4)
By: Rob Chapman, Peter Holmes

Overview of this book

To overcome application monitoring and observability challenges, Grafana Labs offers a modern, highly scalable, cost-effective Loki, Grafana, Tempo, and Mimir (LGTM) stack along with Prometheus for the collection, visualization, and storage of telemetry data. Beginning with an overview of observability concepts, this book teaches you how to instrument code and monitor systems in practice using standard protocols and Grafana libraries. As you progress, you’ll create a free Grafana cloud instance and deploy a demo application to a Kubernetes cluster to delve into the implementation of the LGTM stack. You’ll learn how to connect Grafana Cloud to AWS, GCP, and Azure to collect infrastructure data, build interactive dashboards, make use of service level indicators and objectives to produce great alerts, and leverage the AI & ML capabilities to keep your systems healthy. You’ll also explore real user monitoring with Faro and performance monitoring with Pyroscope and k6. Advanced concepts like architecting a Grafana installation, using automation and infrastructure as code tools for DevOps processes, troubleshooting strategies, and best practices to avoid common pitfalls will also be covered. After reading this book, you’ll be able to use the Grafana stack to deliver amazing operational results for the systems your organization uses.
Table of Contents (22 chapters)
close
close
1
Part 1: Get Started with Grafana and Observability
5
Part 2: Implement Telemetry in Grafana
10
Part 3: Grafana in Practice
15
Part 4: Advanced Applications and Best Practices of Grafana

Observability in a nutshell

The term observability is borrowed from control theory. It’s common to use the term interchangeably with the term monitoring in IT systems, as the concepts are closely related. Monitoring is the ability to raise an alarm when something is wrong, while observability is the ability to understand a system and determine whether something is wrong, and why.

Control theory was formalized in the late 1800s on the topic of centrifugal governors in steam engines. This diagram shows a simplified view of such a system:

Figure 1.1 – James Watt’s steam engine flyweight governor (source: https://www.mpoweruk.com)

Figure 1.1 – James Watt’s steam engine flyweight governor (source: https://www.mpoweruk.com)

Steam engines use a boiler to boil water in a pressure vessel. The steam pushes a piston backward and forward, which converts heat energy to reciprocal energy. In steam engines that use centrifugal governors, this reciprocal energy is converted to rotational energy via a wheel connected to a piston. The centrifugal governor provides a physical link backward through the system to the throttle. This means that the speed of rotation controls the throttle, which, in turn, controls the speed of rotation. Physically, this is observed by the balls on the governor flying outward and dropping inward until the system reaches equilibrium.

Monitoring defines the metrics or events that are of interest in advance. For instance, the governor measures the pre-defined metric of drive shaft revolutions. The controllability of the throttle is then provided by the pivot and actuator rod assembly. Assuming the actuator rod is adjusted correctly, the governor should control the throttle from fully open to fully closed.

In contrast, observability is achieved by allowing the internal state of the system to be inferred from its external outputs. If the operating point adjustment is incorrectly set, the governor may spin too fast or too slowly, rendering the throttle control ineffective. A governor spinning too fast or too slowly could also indicate that the sliding ring is stuck in place and needs oiling. Importantly, this insight can be gained without defining in advance what too fast or too slow means. The insight that the governor is spinning too fast or too slowly also needs very little knowledge of the full steam engine.

Fundamentally, both monitoring and observability are used to improve the reliability and performance of the system in question.

Now that we have introduced the high-level concepts, let’s explore a practical example outside of the world of software services.

Case study – A ship passing through the Panama Canal

Let’s imagine a ship traversing the Agua Clara locks on the Panama Canal. This can be illustrated using the following figure:

Figure 1.2 – The Agua Clara locks on the Panama Canal

Figure 1.2 – The Agua Clara locks on the Panama Canal

There are a few aspects of these locks that we might want to monitor:

  • The successful opening and closing of each gate
  • The water level inside each lock
  • How long it takes for a ship to traverse the locks

Monitoring these aspects may highlight situations that we need to be alerted about:

  • A gate is stuck open because of a mechanical failure
  • The water level is rapidly descending because of a leak
  • A ship is taking too long to exit the locks because it is stuck

There may be situations where the data we are monitoring are within acceptable limits, but we can still observe a deviation from what is considered normal, which should prompt further action:

  • A small leak has formed near the top of the lock wall:
    • We would see the water level drop but only when it is above the leak
    • This could prompt maintenance work on the lock wall
  • A gate in one lock is opening more slowly because it needs maintenance:
    • We would see the time between opening and closing the gate increase
    • This could prompt maintenance on the lock gate
  • Ships take longer to traverse the locks when the wind is coming from a particular direction:
    • We could compare hourly average traversal rates
    • This could prompt work to reduce the impact of wind from one direction

Now that we’ve seen an example of measuring a real-world system, we can group these types of measurements into different data types to best suit the application. Let’s introduce those now.

bookmark search playlist download font-size

Change the font size

margin-width

Change margin width

day-mode

Change background colour

Close icon Search
Country selected

Close icon Your notes and bookmarks

Delete Bookmark

Modal Close icon
Are you sure you want to delete it?
Cancel
Yes, Delete

Confirmation

Modal Close icon
claim successful

Buy this book with your credits?

Modal Close icon
Are you sure you want to buy this book with one of your credits?
Close
YES, BUY