Setting expectations with RPO

RPO is a common term in business continuity known as Recovery Point Objective. In the context of a database system, it describes the amount of data that may be lost following an unexpected outage before it is once again operational. It's important to understand this at an early stage because it will drive decisions such as node count, data synchronization methods, and backup technologies.

In this recipe, we will examine the ingredients for concocting a comprehensive RPO that will influence the PostgreSQL cluster composition itself.

Getting ready

The first thing we need to do is set expectations. These are most often defined by upper management or some other decision-making entity. Data loss is never desirable but is unavoidable in catastrophic scenarios. How much data loss can the business tolerate under these circumstances? Seconds, minutes, or hours' worth?

This recipe will mainly focus on information gathering from key individuals, so make sure it's possible to at least email anyone involved with the application stack. Hardware purchases depend on budget proposals, so it may even be necessary to interact with VP and C-level executives as well. Even if we don't do this right away, try to determine the extent of influence available to you.

How to do it...

Since we're dealing with many vectors, we should iterate them if possible. Try to follow a process like this:

Seek the input of major decision makers:

VP and C-level executives involved with technology
Product manager
Application designers and architects
Infrastructure team lead

Find an amount of time that will satisfy most or all of the above.
Follow the rest of the advice in this chapter to find a suitable architecture.
Try to determine a rough cost for this and the closest alternative.
Present one or more designs and cost estimates to decision makers.
Document the final RPO decision and architecture as reference material.

How it works...

Decision makers such as the technology VP, CEO, CTO, and such are the final word in most cases. Their input is vital and should be considered a requirement before ever taking a step further. Keep in mind that these people are likely not familiar with the technical feasibility of their demands at this extreme implementation level. When asked a question such as How much data can we lose in a major outage? they're probably going to say None! Regardless, this is a vital first step for reasons that will shortly become apparent.

Then, we simply traverse the stack of people who helped define the features the application stack fulfills, those who designed and implemented it, and whoever may be in charge of the requisite hardware and network where everything runs. Perhaps the design has a built-in tolerance for certain amounts of loss. Perhaps inherent queues or caches act as a sort of buffer for data backend difficulties. Maybe the design assumes there are multiple data systems all ingesting the same stream for redundancy. The architecture and those who built it are the best sources of this information.

Once we know the maximum amount of data the backend can lose before being restored, we must apply what we learn from the rest of this chapter and choose one or two best-case designs that can deliver that promise. The point here is that we will be executing this recipe several times until everyone agrees to all inherent design costs and limitations before continuing.

The best way to estimate cost is to take the chosen database server architectures and iterate a gross cost for each element. The next chapter on Hardware Planning describes in detail how to do this. We don't have to be exact here; the goal is to have some numbers we can present to decision makers. Do they still want zero RPO if it costs 10x as much as ten seconds of data loss? Are they willing to compromise on a hybrid design?

Once we have chosen a final structure, possibly the most important step is to produce a document describing that architecture, why it was chosen, the known limitations, and the RPO it delivers. Present this document to decision makers and encourage them to sign it if possible. Save it in any corporate documentation management system available, and make sure it's one of the first things people see regarding the database cluster layer. This document will single-handedly answer multiple questions about the capabilities of the database cluster, all while acting as a reference specification.

There's more...

RPO is considered a vital part of business continuity planning. Entire books have been written on this subject, and what we've presented here is essentially a functional summary. The subject is deep and varied, rich with its own inherent techniques beyond simply architecture and design. It is the language of business and resource management, so it can be a key component when interacting with decision makers.

Learning these concepts in depth can help influence the overall application stack to a more sustainable long-term structure. We'll cover more of these techniques in this chapter, but don't be afraid to proactively incorporate these techniques into your repertoire.

PostgreSQL 12 High Availability Cookbook

By : Shaun Thomas

PostgreSQL 12 High Availability Cookbook

By: Shaun Thomas

Overview of this book

Setting expectations with RPO

Getting ready

How to do it...

How it works...

There's more...

Unlock full access

Continue reading for free

PostgreSQL 12 High Availability Cookbook

By : Shaun Thomas

PostgreSQL 12 High Availability Cookbook

By: Shaun Thomas

Overview of this book

Unlock full access

Continue reading for free

Create a Note

Delete Bookmark

Delete Note

Edit Note

Confirmation

Buy this book with your credits?