Sign In Start Free Trial
Account

Add to playlist

Create a Playlist

Modal Close icon
You need to login to use this feature.
  • PostgreSQL 12 High Availability Cookbook
  • Toc
  • feedback
PostgreSQL 12 High Availability Cookbook

PostgreSQL 12 High Availability Cookbook

By : Shaun Thomas
4.5 (2)
close
PostgreSQL 12 High Availability Cookbook

PostgreSQL 12 High Availability Cookbook

4.5 (2)
By: Shaun Thomas

Overview of this book

Databases are nothing without the data they store. In the event of an outage or technical catastrophe, immediate recovery is essential. This updated edition ensures that you will learn the important concepts related to node architecture design, as well as techniques such as using repmgr for failover automation. From cluster layout and hardware selection to software stacks and horizontal scalability, this PostgreSQL cookbook will help you build a PostgreSQL cluster that will survive crashes, resist data corruption, and grow smoothly with customer demand. You’ll start by understanding how to plan a PostgreSQL database architecture that is resistant to outages and scalable, as it is the scaffolding on which everything rests. With the bedrock established, you'll cover the topics that PostgreSQL database administrators need to know to manage a highly available cluster. This includes configuration, troubleshooting, monitoring and alerting, backups through proxies, failover automation, and other considerations that are essential for a healthy PostgreSQL cluster. Later, you’ll learn to use multi-master replication to maximize server availability. Later chapters will guide you through managing major version upgrades without downtime. By the end of this book, you’ll have learned how to build an efficient and adaptive PostgreSQL 12 database cluster.
Table of Contents (17 chapters)
close

Picking redundant copies

How many database servers should any architecture have as part of the inherent design? There are several factors that contribute to this answer, including the design of the final architecture itself. The number of redundant data copies ultimately determines how many nodes must exist, irrespective of whether we require more data centers, irrespective of whether we should account for latency, and so on.

The goal in this recipe is to consult our needs to derive a node count that won't break the bank, but still deliver the level of availability we want. In other words, aside from our primary data node, we will explain how to figure out the number of redundant data nodes necessary to adequately represent the entire cluster safely and ensure high availability.

Getting ready

Luckily, this recipe is a simple counting exercise. The only necessary elements are a healthy imagination and perhaps some idea of the budgetary constraints before we begin. Just consider that, for any of our reference designs, we will always require more than one server node as a minimum.

How to do it...

Observe the following steps when considering node counts driven by high availability architectures:

  1. Always add one separate server for backups.
  2. Always allocate one server for a logical or physical replica.
  3. For automated failover, allocate the following:
  • An additional small VM / node to act as a voter
  • OR a fully qualified replica
  1. For every active data center beyond the first two, allocate one replica.
  2. If non-local data access latency is a concern, allocate the following:
  • An additional replica in the primary location
  • An additional replica in each location for symmetric clusters

How it works...

Why do we demand at least one backup server? The full answer to this question actually has its own recipe in this chapter. However, catastrophic failure is a fact of life and we must be ready for such an event. Even if the separate server is not a fully operational PostgreSQL node, it must exist and should be part of the reference design.

Likewise, we must have at least one PostgreSQL replica. Some of our designs work with either physical or logical replicas, so we won't differentiate between them here. Simply assume that every highly active PostgreSQL cluster must have at least two nodes that can fulfill the role of a primary database. Backups take time to restore, whereas replicas are generally writable in a minute or less.

One replica only really covers the case where switching from one PostgreSQL node to the alternate is a manual procedure. Fully automated failure detection mechanisms require an odd number of nodes for voting purposes. This third node can either be a mere voting entity, or a full PostgreSQL replica. We cover this in greater depth in the Considering quorum recipe.

Once we start accounting for multiple geographic locations at different data centers, things don't change exceptionally. By now, we have at least one PostgreSQL replica that is probably at the first alternate location. If we have three or more active data centers where the application is using PostgreSQL, we'll want a local replica for each.

Then, consider the implications of limiting ourselves to merely one PostgreSQL node per location. This means any minor upgrade or other maintenance task will mean switching to an alternate data center while the maintenance is active. This can introduce unwanted latency that will affect the application. To reduce this, add one replica to the primary location to account for this effect. For symmetrical data centers that have no primary location, add a PostgreSQL replica to each location for the same reasons.

As a quick example, consider two scenarios. Our first company (Company A) only uses two data centers, and doesn't need automated database failover, nor is it necessary to worry about downtime caused by minor upgrades. In this case, they decided to use two PostgreSQL servers and a backup system. This is a minimum of three nodes related to PostgreSQL, and their cluster looks like this:

Company A

In the second case (Company B), we have a much more demanding financial institution that requires all three of their data centers to be active at all times. They chose to have one Primary PostgreSQL server, two Replicas per data center, a Witness node, and a Backup server. In that extreme case, they used a total of eight nodes dedicated to PostgreSQL.

Their cluster would look like this:

Company B
Nodes denoted here are specifically meant to be running PostgreSQL. Architectures discussed in this chapter will include various abstraction layers and, as a result, will likely require even more nodes in the cluster. PostgreSQL nodes tend to be much larger and do far more work, so are the focus of this recipe.

There's more...

Chosen node counts will naturally suggest certain architectures over others. The reason we're performing this recipe so early is to get an idea of compatible architectures. It may be necessary to revise these counts once we learn information from other recipes in this chapter.

bookmark search playlist download font-size

Change the font size

margin-width

Change margin width

day-mode

Change background colour

Close icon Search
Country selected

Close icon Your notes and bookmarks

Delete Bookmark

Modal Close icon
Are you sure you want to delete it?
Cancel
Yes, Delete