Sign In Start Free Trial
Account

Add to playlist

Create a Playlist

Modal Close icon
You need to login to use this feature.
  • Book Overview & Buying Observability with Grafana
  • Table Of Contents Toc
  • Feedback & Rating feedback
Observability with Grafana

Observability with Grafana

By : Rob Chapman, Peter Holmes
4 (4)
close
close
Observability with Grafana

Observability with Grafana

4 (4)
By: Rob Chapman, Peter Holmes

Overview of this book

To overcome application monitoring and observability challenges, Grafana Labs offers a modern, highly scalable, cost-effective Loki, Grafana, Tempo, and Mimir (LGTM) stack along with Prometheus for the collection, visualization, and storage of telemetry data. Beginning with an overview of observability concepts, this book teaches you how to instrument code and monitor systems in practice using standard protocols and Grafana libraries. As you progress, you’ll create a free Grafana cloud instance and deploy a demo application to a Kubernetes cluster to delve into the implementation of the LGTM stack. You’ll learn how to connect Grafana Cloud to AWS, GCP, and Azure to collect infrastructure data, build interactive dashboards, make use of service level indicators and objectives to produce great alerts, and leverage the AI & ML capabilities to keep your systems healthy. You’ll also explore real user monitoring with Faro and performance monitoring with Pyroscope and k6. Advanced concepts like architecting a Grafana installation, using automation and infrastructure as code tools for DevOps processes, troubleshooting strategies, and best practices to avoid common pitfalls will also be covered. After reading this book, you’ll be able to use the Grafana stack to deliver amazing operational results for the systems your organization uses.
Table of Contents (22 chapters)
close
close
1
Part 1: Get Started with Grafana and Observability
5
Part 2: Implement Telemetry in Grafana
10
Part 3: Grafana in Practice
15
Part 4: Advanced Applications and Best Practices of Grafana

Introducing the user personas of observers

Observability deals with understanding a system, identifying whether something is wrong with that system, and understanding why it is wrong. But what do we mean by understanding a system? The simple answer would be knowing the state of a single application or infrastructure component.

In this section, we will introduce the user personas that we will use throughout this book. These personas will help to distinguish the different types of questions that people use observability systems to answer.

Let’s take a quick look at the user personas that will be used throughout the book as examples, and their roles:

Name and role

Description

Diego Developer

Frontend, backend, full stack, and so on

Ophelia Operator

SRE, DevOps, DevSecOps, customer success, and so on

Steven Service

Service manager and other tasks

Pelé Product

Product manager, product owner, and so on

A picture containing vector graphics

Description automatically generated

Masha Manager

Manager, senior leadership, and so on

Table 1.1 – User persona introductions

Now let’s look at each of these users in greater detail.

Diego Developer

Diego Developer works on many types of systems, from frontend applications that customers directly interact with, to backend systems that let his organization store data in ways that delight its customers. You might even find him working on platforms that other developers use to get their applications integrated, built, delivered, and deployed safely and speedily.

Goals

He writes great software that is well tested and addresses customers’ actual needs.

Interactions

When he is not writing code, he works with Ophelia Operator to address any questions and issues that occur.

Pelé Product works in his team and provides insight into the customer’s needs. They work together closely, taking those needs and turning them into detailed plans on how to deliver software that addresses them.

Steven Service is keen to ensure that the changes Diego makes are not impacting customer commitments. He’s also the one who wakes Diego up if there is an incident that needs attention. The data provided to Masha Manager gives her a breakdown of costs. When Diego is working on developer platforms, he also collects data that helps her get investment from the business into teams that are not performing as expected.

Needs

Diego really needs easy-to-use libraries for the languages he uses to instrument the code he produces. He does not have time to become an expert. He wants to be able to add a few lines of code and get results quickly.

Having a clear standard for acceptable performance measures makes it easy for him to get the right results.

Pain points

When Diego’s systems produce too much data, he finds it difficult to sort signal from noise. He also gets frustrated having to change his code because of an upstream decision to change tooling.

Ophelia Operator

Ophelia Operator works in an operations-focused environment. You might find her in a customer-facing role or as part of a development team as a DevOps engineer. She could be part of a group dedicated to the reliability of an organization’s systems, or she could be working in security or finance to ensure the business runs securely and smoothly.

Goals

Ophelia wants to make sure a product is functioning as expected. She also likes it when she is not woken up early in the morning by an incident.

Interactions

Ophelia will work a lot with Diego Developer; sometimes it’s escalating customer tickets when she doesn’t have the data available to understand the problem; at other times it’s developing runbooks to keep the systems running. Sometimes she will need to give Diego clear information on acceptable performance measures so that her team can make sure systems perform well for customers.

Steven Service works closely with Ophelia. They work together to ensure there are not many incidents, and that they are quickly resolved. Steven makes sure that business data on changes and incidents is tracked, and tweaks processes when things aren’t working.

Pelé Product likes to have data showing the problematic areas of his products.

Needs

Good data is necessary to do the job effectively. Being able to see that a customer has encountered an error can make the difference between resolving a problem straight away or having them wait maybe weeks for a response.

During an incident seeing that a new version of a service was deployed at the time a problem started can change an hours-long incident into a brief blip, and keep customers happy.

Pain points

Getting continuous alerts but not being empowered to fix the underlying issue is a big problem. Ophelia has seen colleagues burn out, and it makes her want to leave the organization when this happens.

Steven Service

Steven Service works in service delivery. He is interested in making sure the organization’s services are delivered smoothly. Jumping in on critical incidents and coordinating actions to get them resolved as quickly as possible is part of the job. So is ensuring that changes are made using processes that help others do it as safely as possible. Steven also works with third parties who provide services that are critical to the running of the organization.

Goals

He wants services to run as smoothly as possible so that the organization can spend more time focused on customers.

Interactions

Diego Developer and Ophelia Operator work a lot with the change management processes created by Steven and the support processes he manages. Having accurate data to hand during change management really helps to make the process as smooth as possible.

Steven works very closely with Masha Manager to make sure she has access to data showing where processes are working smoothly and where they need to spend time improving them.

Needs

He needs to be able to compare the delivery of different products and provide that data to Masha and the business.

During incidents, he needs to be able to get the right people on the call as quickly as possible and keep a record of what happened for the incident post-mortem.

Pain points

Being able to identify the right person to get on a call during an incident is a common problem he faces. Seeing incidents drag on while different systems are compared and who can fix the problem is argued about is also a big concern to him.

Pelé Product

Pelé Product works in the product team. You’ll find him working with customers to understand their needs, keeping product roadmaps in order, and communicating requirements back to developers such as Diego Developer so they can build them. You might also find him understanding and shaping the product backlog for the internal platforms used by developers in the organization.

Goal

Pelé wants to understand customers, give them products that delight them, and keep them coming back.

Interactions

He spends a lot of time working with Diego when they can look at the same information to really understand what customers are doing and how they can help them do it better.

Ophelia Operator and Steven Service help Pelé keep products on track. If too many incidents occur, they ask everyone to refocus on getting stability right. There is no point in providing customers with lots of features on a system that they can’t trust.

Pelé works closely with Masha Manager to ensure the organization has the right skills in the teams that build products. The business depends on her leadership to make sure that these developers have the best tools to help them get their code live in front of customers where it can be used.

Needs

Pelé needs to be able to understand customers’ pain points even when they do not articulate them clearly during user research.

He needs data that gives him a common language with Diego and Ophelia. Sometimes they can get too focused on specific numbers such as shaving off a couple of milliseconds from a request, when improving a poor workflow would improve the customer experience more significantly.

Pain points

Pelé hates not being able to see at a high level what customers are doing. Understanding which bits of an application have the most usage, and which bits are not used at all, lets him know where to focus time and resources.

While customers never tell him they want stability, if it’s not there they will lose trust very quickly and start to look at alternatives.

Masha Manager

Masha works in management. You might find her leading a team and working closely with them daily. She also represents middle management, setting strategy and making tactical choices, and she is involved, to some extent, in senior leadership. Much of her role involves managing budgets and people. If something can make that process easier, then she is usually interested in hearing about it. What Masha does not want to do is waste the organization’s money, because that can directly impact jobs.

Goals

Her primary goals are to keep the organization running smoothly and ensure the budget is balanced.

Interactions

As a leader, Masha needs accurate data and needs to be able to trust the teams who provide that data. The data could be the end-to-end cycle time of feature concept to delivery from Pelé Product, the lead time for changes from Diego Developer, or even the MTTR from Steven Service. Having that data helps her to understand where focus and resources can have the biggest impact.

Masha works regularly with the financial operations staff and needs to make sure they have accurate information on the organization’s expenditure and the value that expenditure provides.

Needs

She needs good data in a place where she can view it and make good decisions. This usually means she consumes information from a business intelligence system. To use such tools effectively, she needs to be clear on what the organization’s goals are, so that the correct data can be collected to help her understand how her teams are tracking to that goal.

She also needs to know that the teams she is responsible for have the correct data and tools to excel in their given areas.

Pain points

High failure rates and long recovery time usually result in her having to speak with customers to apologize. Masha really hates these calls!

Poor visibility of cloud systems is a particular concern. Masha has too many horror stories of huge overspending caused by a lack of monitoring; she would rather spend that budget on something more useful.

You now know about the customers who use observability data, and the types of data you will be using to meet their needs. As the main focus of this book is on Grafana as the underlying technology, let’s now introduce the tools that make up the Grafana stack.

bookmark search playlist download font-size

Change the font size

margin-width

Change margin width

day-mode

Change background colour

Close icon Search
Country selected

Close icon Your notes and bookmarks

Delete Bookmark

Modal Close icon
Are you sure you want to delete it?
Cancel
Yes, Delete

Confirmation

Modal Close icon
claim successful

Buy this book with your credits?

Modal Close icon
Are you sure you want to buy this book with one of your credits?
Close
YES, BUY