
You might wonder why theory is being covered so prominently in a book that is supposed to be highly practical. Well, there is a very simple reason for that: the nice-looking marketing papers of some commercial database vendors might leave you with the impression that everything is possible and easy to do, without any serious limitations. This is not the case; there are physical limitations that every software vendor has to cope with. There is simply no way around the laws of nature, and shiny marketing cannot overcome them. The laws of physics and logic are the same for everybody, regardless of the power of someone's marketing department.
In this section, you will learn about the so-called CAP theorem. Understanding the basic ideas of this theorem is essential in order to avoid requirements that simply cannot be turned into reality.
The CAP theorem was first described by Eric Brewer back in the year 2000. It has quickly developed into one of the most fundamental concepts in the database world. Especially with the rise of NoSQL database systems, Brewer's theorem (as the CAP theorem is often called) has become an important cornerstone of every distributed system.
Before we dig into the details, we need to discuss what CAP actually means. CAP is an abbreviation for the following three concepts:
Consistency: All the nodes in a cluster see the same data at the same time.
Availability: Every request to the system receives a response, for reads as well as writes.
Partition tolerance: The system keeps working even if network failures split the cluster into parts that can no longer talk to each other.
Why are these three concepts relevant to normal users? Well, the bad news is that a replicated (or distributed) system can provide only two out of these three features at the same time.
Keep in mind that only two out of the three promises can be fulfilled.
It is theoretically impossible to offer consistency, availability, and partition tolerance at the same time. As you will see later in this book, this can have a significant impact on system layouts that are safe and feasible to use. There is simply no such thing as the solution to all replication-related problems. When you are planning a large-scale system, you might have to come up with different concepts, depending on your specific requirements.
PostgreSQL, Oracle, DB2, and so on will provide you with CAp ("consistent" and "available"), while NoSQL systems, such as MongoDB and Cassandra, will provide you with cAP ("available" and "partition tolerant"). This is why NoSQL is often referred to as eventually consistent.
Consider a financial application. You really want to be consistent and partition tolerant. Keeping balances in sync is the highest priority.
Or consider an application collecting a log of weather data from some remote locations. If the data is a couple of minutes late, it is really no problem. In this case, you might want to go for cAP. Availability and partition tolerance might really be the most important things in this case.
Depending on the use case, you have to decide what is really important and which attributes (consistency, availability, or partition tolerance) are crucial and which can be neglected.
Keep in mind that there is no system that can fulfill all of these wishes at the same time (neither open source nor commercial software).
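To make this trade-off a bit more tangible, here is a minimal sketch using PostgreSQL's synchronous replication settings; the standby name standby1 is purely an assumption for illustration:

-- Favor consistency: commits are acknowledged only after the standby
-- named 'standby1' (an assumed name) has confirmed them. If that
-- standby becomes unreachable, commits wait, so availability suffers.
ALTER SYSTEM SET synchronous_standby_names = 'standby1';
ALTER SYSTEM SET synchronous_commit = 'on';
SELECT pg_reload_conf();

-- Favor availability: commits no longer wait for the standby,
-- which may therefore lag behind (eventual consistency).
ALTER SYSTEM SET synchronous_commit = 'local';
SELECT pg_reload_conf();

Neither variant gives you all three properties at once; you merely choose which one to sacrifice.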
The speed of light is not just a theoretical issue; it really does have an impact on your daily life. And more importantly, it has a serious implication when it comes to finding the right solution for your cluster.
We all know that there is some sort of cosmic speed limit called the speed of light. So why care? Well, let's do a simple mental experiment. Let's assume for a second that our database server is running at 3 GHz clock speed.
How far can light travel within one clock cycle of your CPU? If you do the math, you will figure out that light travels around 10 cm per clock cycle (in a pure vacuum). We can safely assume that an electrical signal inside a CPU is a lot slower than light in a vacuum. The core idea is this: "10 cm in one clock cycle? Well, that is not much at all."
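If you want to verify this back-of-the-envelope figure, the division can be done straight from psql; the constants are simply the speed of light in a vacuum and the 3 GHz clock speed from the mental experiment:

-- Distance light travels in a vacuum during one clock cycle at 3 GHz
SELECT 299792458.0 / 3000000000 AS metres_per_cycle;

The result is roughly 0.0999 metres, that is, about 10 cm.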
For the sake of our mental experiment, let's now consider various distances:
Considering the size of a CPU core on a die, you can assume that you can send a signal (even if it is not traveling anywhere close to the speed of light) from one part of the CPU to some other part quite fast. It simply won't take 1 million clock cycles to add up two numbers that are already in your first level cache on your CPU.
But what happens if you have to send a signal from one server to some other server and back? You can safely assume that sending a signal from server A to server B next door takes a lot longer because the cable is simply a lot longer. In addition to that, network switches and other network components will add some latency as well.
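To get a feeling for these numbers, here is a small sketch that you can run in psql. The distances are purely illustrative assumptions (a cable to the next rack, a data center across town, and a very rough Europe-to-China figure), and the calculation only accounts for the light travel time in a vacuum; real cables, switches, and routers add considerably more:

-- Round-trip light travel time for some assumed distances,
-- expressed in milliseconds and in 3 GHz clock cycles
SELECT distance_km,
       2 * distance_km * 1000 / 299792458.0 * 1000        AS round_trip_ms,
       2 * distance_km * 1000 / 299792458.0 * 3000000000  AS clock_cycles
FROM (VALUES (0.05), (50.0), (9000.0)) AS t (distance_km);

Even under these ideal conditions, the long-haul round trip costs about 60 milliseconds, or roughly 180 million clock cycles at 3 GHz, while the next-door round trip stays in the range of a thousand cycles.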
We are talking about the length of the cable here, not about its bandwidth.
Sending a message (or a transaction) from Europe to China is, of course, many times more time-consuming than sending some data to a server next door. Again, the important thing here is that the amount of data is not as relevant as the so-called latency. Consider the following criteria:
The basic problems of latency described in this section are not PostgreSQL-specific. The very same concepts and physical limitations apply to all types of databases and systems. As mentioned before, this fact is sometimes silently hidden and neglected in shiny commercial marketing papers. Nevertheless, the laws of physics will stand firm. This applies to both commercial and open source software.
The most important point you have to keep in mind here is that bandwidth is not always the magical fix to a performance problem in a replicated environment. In many setups, latency is at least as important as bandwidth.
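As a final back-of-the-envelope illustration, take the roughly 60 ms intercontinental round trip from the sketch above and assume (as a simplification) that every synchronous commit has to wait for one such round trip. The commit rate of a single session is then capped regardless of how much bandwidth is available:

-- Rough upper bound on synchronous commits per second for one session,
-- assuming one ~60 ms network round trip per commit
SELECT 1 / 0.060 AS commits_per_second;

That is roughly 16 commits per second, and adding bandwidth does not change it; only reducing latency (or relaxing the consistency requirements) does.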