Redis Stack for Application Modernization
The core data structures available out of the box in the Redis server solve a variety of problems when it comes to mapping entities and relationships. To start with concrete examples of modeling in Redis, the usual option for storing an object is the Hash data structure, while collections can be stored using Sets, Sorted Sets, or Lists, among other options. In this section, we will introduce the multi-model features of Redis Stack using a comprehensive approach, which may be especially useful for readers used to the relational paradigm of organizing data in the rows and columns of a table.
Consider the requirement to model a list of cities. Using the relational data model, we can define a table using the SQL data definition language (DDL) instruction CREATE TABLE as follows:
CREATE TABLE `city` (
  `ID` int NOT NULL AUTO_INCREMENT,
  `Name` char(35) NOT NULL DEFAULT '',
  `CountryCode` char(3) NOT NULL DEFAULT '',
  `District` char(20) NOT NULL DEFAULT '',
  `Population` int NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`),
  KEY `CountryCode` (`CountryCode`)
)
This table definition defines attributes for the city entity and specifies a primary key on an integer identifier (a surrogate key, in this case, since uniqueness is not guaranteed by the city's natural attributes). The DDL command also defines a secondary index on the CountryCode attribute. Data encoding, collation, and the specific storage engine are not relevant in this context; we are focused on understanding the model and our ability to query it.
Primary key lookup is the most efficient way to access data in a relational table. Filtering the table on the primary key attribute is as easy as executing the SQL SELECT statement:
SELECT * FROM city WHERE ID=653;
+-----+--------+-------------+----------+------------+
| ID  | Name   | CountryCode | District | Population |
+-----+--------+-------------+----------+------------+
| 653 | Madrid | ESP         | Madrid   | 2879052    |
+-----+--------+-------------+----------+------------+
1 row in set (0.00 sec)
Modeling a city using one of the Redis core data structures leads to mapping the data in the SQL table to Hashes, so we can store the attributes as field-value pairs, with the key name including the primary key:
127.0.0.1:6379> HSET city:653 Name "Madrid" CountryCode "ESP" District "Madrid" Population 2879052
The HGETALL command can be used to retrieve the entire hash with minimal overhead (HGETALL has direct access to the value in the Redis keyspace):
HGETALL city:653
1) "Name"
2) "Madrid"
3) "CountryCode"
4) "ESP"
5) "District"
6) "Madrid"
7) "Population"
8) "2879052"
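To make the mapping explicit, here is a minimal Python sketch of the row-to-Hash translation; the row_to_hash helper and its key-naming scheme are our own illustration, not part of any client library:

```python
def row_to_hash(table: str, row: dict, pk: str = "ID"):
    """Map a relational row to a Redis key name plus HSET field-value pairs."""
    key = f"{table}:{row[pk]}"  # the primary key becomes part of the key name
    # every non-key column becomes a field-value pair in the Hash
    fields = {column: str(value) for column, value in row.items() if column != pk}
    return key, fields

row = {"ID": 653, "Name": "Madrid", "CountryCode": "ESP",
       "District": "Madrid", "Population": 2879052}
key, fields = row_to_hash("city", row)
print(key)     # city:653
print(fields)  # {'Name': 'Madrid', 'CountryCode': 'ESP', 'District': 'Madrid', 'Population': '2879052'}
```

The key and fields returned are exactly the arguments the HSET command in the previous example takes.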
In addition, we can limit the bandwidth usage caused by the entire row transfer to the client and select only specific attributes. The SQL syntax is as follows:
SELECT Name, Population FROM city WHERE ID=653;
+--------+------------+
| Name   | Population |
+--------+------------+
| Madrid | 2879052    |
+--------+------------+
1 row in set (0.00 sec)
In this analogy between the relational model and Redis, the command is HGET (or HMGET for multiple values):
127.0.0.1:6379> HMGET city:653 Name Population
1) "Madrid"
2) "2879052"
As long as we only need to extract data based on the primary key identifier, the solution is at hand both in the relational database and in Redis. Things get more complicated if we want to perform lookup and search queries on the dataset. In the next examples, we'll see how the complexity and performance of such operations can vary substantially.
Primary key lookups are efficient: after all, the primary key is an index, and it guarantees direct access to the table row. But what if we want to search for cities by filtering on an attribute? Let’s try an indexed search against our relational database over the CountryCode column, which has a secondary index:
mysql> SELECT Name FROM city WHERE CountryCode = "ESP";
+--------------------------------+
| Name                           |
+--------------------------------+
| Madrid                         |
| Barcelona                      |
| [...]                          |
+--------------------------------+
59 rows in set (0.02 sec)
This is an efficient search because the table defines an index on the CountryCode column. To continue the comparison of the relational database versus Redis, we will need to execute the same query against the stored Hashes. For this demonstration, we will assume that we have migrated the city table to Hashes in the Redis server. By design, Redis has no secondary indexing feature for any of the core data structures, which means that we should scan all the Hashes prefixed by the “city:” namespace, then read the city name from every Hash and check whether it matches our search term. The following example performs a non-blocking scan of the keyspace, filtering on the key name (“city:*”) in batches of configurable size (three, in the example):
127.0.0.1:6379> SCAN 0 MATCH city:* COUNT 3
1) "512"
2) 1) "city:4019"
   2) "city:9"
   3) "city:103"
The client should now extract the CountryCode value from every city, compare it to the search term, and repeat until the scan is concluded. This is obviously a time-consuming and expensive approach. There are ways to improve the efficiency of such batched operations. We will explore three standard options and then show how to resolve the problem using the Redis Stack capabilities:
- Pipelining the commands to reduce the number of network round trips
- Moving the search to the server with Lua scripting and functions
- Building secondary indexes manually with the core data structures

We will look at these in detail next.
The first approach to reducing the overhead of the search operation is to use pipelining, which is supported by all major client libraries. Pipelining collects a batch of commands, delivers them to the server, and collects the outputs from the server before returning the result to the client. This option dramatically reduces the latency of the overall operation, as it saves on the round trips to the server (an analogy that works is going to the supermarket once to purchase 30 items rather than going 30 times and purchasing one item on every visit). The pros and cons of pipelining are as follows:
- Pros: fewer network round trips, a dramatically lower overall latency, and support in all major client libraries
- Cons: the client still has to scan and transfer the entire keyspace, the filtering happens on the client side, and no atomicity is guaranteed across the batched commands
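The supermarket analogy can be quantified with simple arithmetic. The following Python sketch compares the total time of individual round trips against a single pipelined batch; the latency figures are assumptions chosen only for illustration:

```python
def total_time_ms(commands: int, rtt_ms: float, per_cmd_ms: float, pipelined: bool) -> float:
    """Round trips dominate latency: pipelining pays the RTT once per batch."""
    round_trips = 1 if pipelined else commands
    return round_trips * rtt_ms + commands * per_cmd_ms

# Assume a 1 ms network round trip and 0.05 ms of server work per command
unbatched = total_time_ms(30, rtt_ms=1.0, per_cmd_ms=0.05, pipelined=False)
batched = total_time_ms(30, rtt_ms=1.0, per_cmd_ms=0.05, pipelined=True)
print(unbatched, batched)  # 30 separate trips take ~31.5 ms; one pipelined batch ~2.5 ms
```

With these assumed figures, pipelining the 30 commands is more than 10 times faster, even though the server does exactly the same amount of work.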
Lua scripting and functions (functions were introduced in Redis 7.0 and represent an evolution of Lua scripting for remote server execution) help to offload the client and remove network latency. The search is local to the server and close to the data (equivalent to the concept of stored procedures). The following function is an example of local search:
#!lua name=mylib

local function city_by_cc(keys, args)
  local match, cursor = {}, "0";
  repeat
    local ret = redis.call("SCAN", cursor, "MATCH", "city:*", "COUNT", 100);
    local cities = ret[2];
    for i = 1, #cities do
      local keyname = cities[i];
      local ccode = redis.call('HMGET', keyname, 'Name', 'CountryCode')
      if ccode[2] == args[1] then
        match[#match + 1] = ccode[1];
      end;
    end;
    cursor = ret[1];
  until cursor == "0";
  return match;
end

redis.register_function('city_by_cc', city_by_cc)
In this function, we do the following:
- Scan the keyspace iteratively with the SCAN command, matching the key names prefixed by "city:" in batches of 100
- For every key returned, read the Name and CountryCode fields with HMGET
- Compare the country code to the first argument of the function and, if it matches, append the city name to the result table
- Repeat until the cursor returned by SCAN is "0", then return the list of matching names
- Register the function under the name city_by_cc
Type the code into the mylib.lua file and import the library as follows:
cat mylib.lua | redis-cli -x FUNCTION LOAD
The function can be invoked using the following command:
127.0.0.1:6379> FCALL city_by_cc 0 "ESP"
 1) "A Coru\xf1a (La Coru\xf1a)"
 2) "Almer\xeda"
[...]
59) "Barakaldo"
The pros and cons of using functions are as follows:
- Pros: the search runs on the server, close to the data, so intermediate results never travel over the network and the client is offloaded
- Cons: the server still scans the entire keyspace, the script or function keeps the server busy while it runs, and the extra CPU load is shifted onto the Redis process
Data scans, wherever they are executed (client or server side), are slow and ineffective in satisfying real-time requirements. This is especially true when the keyspace stores millions of keys or more. An alternative approach for search operations using the Redis core data structures is to create a secondary index. There are many options to do this using Redis collections. As an example, we can create an index of Spanish cities using a Set as follows:
SADD city:esp "Sevilla" "Madrid" "Barcelona" "Valencia" "Bilbao" "Las Palmas de Gran Canaria"
This data structure has interesting properties for our needs. We can retrieve all the Spanish cities in a single command:
127.0.0.1:6379> SMEMBERS city:esp
1) "Madrid"
2) "Sevilla"
3) "Valencia"
4) "Barcelona"
5) "Bilbao"
6) "Las Palmas de Gran Canaria"
Or we can check whether a specific city is in Spain using SISMEMBER, a constant time-complexity command:
127.0.0.1:6379> SISMEMBER city:esp "Madrid"
(integer) 1
And we can even search the index for cities having a name that matches a pattern:
127.0.0.1:6379> SSCAN city:esp 0 MATCH B*
1) "0"
2) 1) "Barcelona"
   2) "Bilbao"
We can refine our search requirements and design an index that considers the population. In that case, we can use a Sorted Set and set the population as the score:
127.0.0.1:6379> ZADD city:esp 2879052 "Madrid" 701927 "Sevilla" 1503451 "Barcelona" 739412 "Valencia" 357589 "Bilbao" 354757 "Las Palmas de Gran Canaria"
(integer) 6
The main feature of the Sorted Set data structure is that its members are kept ordered by score (Redis uses a skip list internally), which makes low-complexity range searches possible. As an example, let's retrieve the Spanish cities with more than 2 million inhabitants:
127.0.0.1:6379> ZRANGE city:esp 2000000 +inf BYSCORE
1) "Madrid"
We can also check whether a city belongs to the index of Spanish cities:
127.0.0.1:6379> ZRANK city:esp Madrid
(integer) 5
In the previous example, the ZRANK command tells us that the city Madrid belongs to the index and occupies position 5 (zero-based, in ascending score order), making it the most populated of the six cities. This solution removes the overhead of scanning the entire keyspace looking for matches.
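To see the idea behind score-based range searches, the following Python sketch mimics ZRANGE ... BYSCORE over the same six cities by keeping (score, member) pairs sorted and binary searching the bounds; this is a conceptual illustration only, as Redis actually uses a skip list:

```python
import bisect

# (population, city) pairs kept sorted by score, as in the ZADD example
cities = sorted([
    (2879052, "Madrid"), (701927, "Sevilla"), (1503451, "Barcelona"),
    (739412, "Valencia"), (357589, "Bilbao"),
    (354757, "Las Palmas de Gran Canaria"),
])

def range_by_score(pairs, low, high=float("inf")):
    """Conceptual equivalent of ZRANGE ... BYSCORE: binary search both bounds."""
    i = bisect.bisect_left(pairs, (low,))             # first pair with score >= low
    j = bisect.bisect_right(pairs, (high, "\uffff"))  # last pair with score <= high
    return [member for _, member in pairs[i:j]]

print(range_by_score(cities, 2_000_000))         # ['Madrid']
print(range_by_score(cities, 700_000, 800_000))  # ['Sevilla', 'Valencia']
```

Because the pairs are kept ordered, both bounds are found in logarithmic time, which is the property that makes Sorted Set range queries cheap.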
The drawback of such a manual approach to indexing is that the indexes must reflect the data at all times. In scenarios where we want to remove a city from our database, we need to perform two operations, deleting the city Hash and updating the index, atomically. We can use a Redis transaction to apply changes to both the data and the index atomically:
127.0.0.1:6379> MULTI
OK
127.0.0.1:6379(TX)> DEL city:653
QUEUED
127.0.0.1:6379(TX)> ZREM city:esp "Madrid"
QUEUED
127.0.0.1:6379(TX)> EXEC
1) (integer) 1
2) (integer) 1
Custom secondary indexes come at a price, though, because complex searches become hard to manage using multiple data structures. Indexes must be maintained, and the complexity of such solutions may get out of hand, putting the consistency of search operations at risk. The pros and cons of using indexing are as follows:
- Pros: lookups and range searches become fast, with no need to scan the entire keyspace
- Cons: every write must update both the data and its indexes, ideally in a transaction, and as the search requirements grow, maintaining several hand-built indexes becomes complex and error-prone
Next, we will examine the capabilities of Redis Stack.
Caching is one of the most frequent use cases for which Redis shines as a best-in-class storage solution: it stores data in memory and offers real-time performance, and it is lightweight, as its data structures are optimized to consume little memory. Redis does not need complex configuration or maintenance and it is open source, so there is no reason not to give it a try. As a real-time data store, complex search operations may not seem like the primary use case users are interested in; after all, fast retrieval of data by key is what made Redis so versatile as a cache or session store.
However, if in addition to the ability to use core data structures to store the data, we ensure that fast searches can be performed (besides primary key lookup), it is possible to think beyond the basic caching use case and start looking at Redis as a full-fledged database, capable of high-speed searches.
So far, we have presented simple and common search problems, together with solutions using the traditional SQL approach and possible data modeling strategies using the Redis core data structures. In the following sections, we will show how Redis Stack resolves query and search use cases and extends the core features of Redis with an integrated modeling and development experience. We will introduce the following capabilities:
- Query and search, with secondary indexing of Hash and JSON documents
- JSON document storage
- Time series
- Probabilistic data structures
- Programmability, with server-side JavaScript functions and triggers
Let’s discuss each of these capabilities in detail.
Redis Stack complements Redis with the ability to create secondary indexes on Hashes or JSON documents, the two document types supported by Redis Stack. The search examples seen so far can be resolved with the indexing features. To perform an indexed search, we create an index against the hashes modeling the cities using the following syntax:
FT.CREATE city_idx ON HASH PREFIX 1 city: SCHEMA Name AS name TEXT CountryCode AS countrycode TAG SORTABLE Population AS population NUMERIC SORTABLE
The FT.CREATE command instructs the server to perform the following operations:
- Create an index named city_idx over the Hash data structure
- Restrict indexing to the keys matching the single prefix city:
- Index the Name field as TEXT (full-text searchable), aliased as name
- Index the CountryCode field as a SORTABLE TAG (exact-match searchable), aliased as countrycode
- Index the Population field as SORTABLE NUMERIC (range searchable), aliased as population
As soon as the indexing operation against the relevant data, all the keys prefixed by "city:", is completed, we can execute the queries and searches seen so far, and more. The query in the following example searches for all the cities with the value "ESP" in the TAG field and returns only the names of the cities, sorted in lexicographical order; the first three results are returned using the LIMIT option. Note that this query is executed against the new city_idx index, not directly against the data:
127.0.0.1:6379> FT.SEARCH city_idx '@countrycode:{ESP}' RETURN 1 name SORTBY name LIMIT 0 3
1) (integer) 59
2) "city:670"
3) 1) "name"
   2) "A Coru\xc3\xb1a (La Coru\xc3\xb1a)"
4) "city:690"
5) 1) "name"
   2) "Albacete"
6) "city:687"
7) 1) "name"
   2) "Alcal\xc3\xa1 de Henares"
It is possible to combine several textual queries/filters in the same index. Using exact-match and full-text search, we can verify whether Madrid is a Spanish city:
127.0.0.1:6379> FT.SEARCH city_idx '@name:Madrid @countrycode:{ESP}' RETURN 1 name
1) (integer) 1
2) "city:653"
3) 1) "name"
   2) "Madrid"
In a previous example, the range search was executed with the ZRANGE command against a Sorted Set. Using the indexing capability of Redis Stack, we can execute range searches using the NUMERIC field type. So, if we want to retrieve the Spanish cities with more than 2 million inhabitants, we write the following search query:
127.0.0.1:6379> FT.SEARCH city_idx '@countrycode:{ESP}' FILTER population 2000000 +inf RETURN 1 name
1) (integer) 1
2) "city:653"
3) 1) "name"
   2) "Madrid"
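The semantics of this query can be expressed as a plain filter over the documents. The following Python sketch shows what the query computes over a tiny sample (the document set and population values are illustrative data, not the full dataset); note that the real index avoids visiting non-matching documents at all:

```python
docs = {
    "city:653": {"name": "Madrid", "countrycode": "ESP", "population": 2879052},
    "city:670": {"name": "A Coruna (La Coruna)", "countrycode": "ESP", "population": 243402},
    "city:5":   {"name": "Amsterdam", "countrycode": "NLD", "population": 731200},
}

def search(documents, countrycode, min_population=0):
    """What '@countrycode:{...}' plus a numeric FILTER computes, as a plain filter."""
    return sorted(
        key for key, doc in documents.items()
        if doc["countrycode"] == countrycode and doc["population"] >= min_population
    )

print(search(docs, "ESP", min_population=2_000_000))  # ['city:653']
print(search(docs, "ESP"))                            # ['city:653', 'city:670']
```

The index performs this selection without a linear pass over every document, which is exactly what the manual approaches in the previous section could not avoid.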
Redis Stack offers a flexible and concise syntax to combine several field types, of which we have seen only a limited but representative set of examples. Once the index is created, the user can simply use it, adding new documents or updating existing ones; the database keeps the index synchronized as documents are created or changed.
Besides full-text, exact-match, and range searches, we can also perform data aggregation (as we would in a relational database using the GROUP BY statement). If we would like to retrieve the three most populated countries, sorted in descending order, we would solve the problem in SQL as follows:
SELECT CountryCode, SUM(Population) AS sum FROM city GROUP BY CountryCode ORDER BY sum DESC LIMIT 3;
+-------------+-----------+
| CountryCode | sum       |
+-------------+-----------+
| CHN         | 175953614 |
| IND         | 123298526 |
| BRA         | 85876862  |
+-------------+-----------+
3 rows in set (0.01 sec)
We can perform complex aggregations with the FT.AGGREGATE command. Using the following command, we can perform a real-time search and aggregation to compute the total population of the top three countries by summing up the inhabitants of the cities per country:
127.0.0.1:6379> FT.AGGREGATE city_idx * GROUPBY 1 @countrycode REDUCE SUM 1 @population AS sum SORTBY 2 @sum DESC LIMIT 0 3
1) (integer) 232
2) 1) "countrycode"
   2) "chn"
   3) "sum"
   4) "175953614"
3) 1) "countrycode"
   2) "ind"
   3) "sum"
   4) "123298526"
4) 1) "countrycode"
   2) "bra"
   3) "sum"
   4) "85876862"
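The aggregation maps naturally onto a group-by in application code. This Python sketch reproduces the GROUPBY, REDUCE SUM, SORTBY, and LIMIT semantics over a toy dataset; the rows and populations below are illustrative samples, not the full dataset:

```python
from collections import defaultdict

# (countrycode, population) rows; sample values for illustration only
rows = [
    ("CHN", 7855044), ("CHN", 12600000), ("IND", 10500000),
    ("IND", 4216000), ("BRA", 9968485), ("NLD", 731200),
]

def top_countries(rows, limit=3):
    """GROUPBY countrycode, REDUCE SUM population, SORTBY sum DESC, LIMIT 0 limit."""
    sums = defaultdict(int)
    for code, population in rows:
        sums[code] += population
    return sorted(sums.items(), key=lambda item: item[1], reverse=True)[:limit]

print(top_countries(rows))  # [('CHN', 20455044), ('IND', 14716000), ('BRA', 9968485)]
```

FT.AGGREGATE runs this pipeline inside the server, against the index, so no intermediate rows need to be shipped to the client.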
To summarize this brief introduction to the search and aggregation capabilities, it is worth mentioning that there are additional types of searches, such as phonetic matching, auto-complete suggestions, geo searches, and spellchecking, to help design great applications. We will cover them in depth in Chapter 5, Redis Stack as a Document Store.
Besides modeling objects as Hash, it is possible to store, update, and retrieve JSON documents. The JSON format needs no introduction, as it permeates data pipelines including heterogeneous subsystems, protocols, databases, and so on. Redis Stack delivers this capability out of the box and manages JSON documents in a similar way to Hashes, which means that it is possible to store, index, and search JSON objects and work with them using JSONPath syntax:
JSON.SET city:653 $ '{"Name":"Madrid", "CountryCode":"ESP", "District":"Madrid", "Population":2879052}'
JSON.SET city:5 $ '{"Name":"Amsterdam", "CountryCode":"NLD", "District":"Noord-Holland", "Population":731200}'
JSON.SET city:1451 $ '{"Name":"Tel Aviv-Jaffa", "CountryCode":"ISR", "District":"Tel Aviv", "Population":348100}'
127.0.0.1:6379> JSON.GET city:653
"{\"Name\":\"Madrid\",\"CountryCode\":\"ESP\",\"District\":\"Madrid\",\"Population\":2879052}"
127.0.0.1:6379> JSON.GET city:653 $.Name
"[\"Madrid\"]"
127.0.0.1:6379> JSON.GET city:653 $.Name $.CountryCode
"{\"$.Name\":[\"Madrid\"],\"$.CountryCode\":[\"ESP\"]}"
FT.CREATE city_idx ON JSON PREFIX 1 city: SCHEMA $.Name AS name TEXT $.CountryCode AS countrycode TAG SORTABLE $.Population AS population NUMERIC SORTABLE
127.0.0.1:6379> FT.SEARCH city_idx '@countrycode:{ESP}' FILTER population 2000000 +inf RETURN 1 name
1) (integer) 1
2) "city:653"
3) 1) "name"
   2) "Madrid"
Unlike Hashes, which are flat, JSON documents support nesting (up to 128 levels) and can store properties, objects, arrays, and geographical locations at any level of a tree-like structure, so the JSON format opens up a variety of use cases using a compact and flexible data structure.
Time series do not need a long introduction: a time series stores data points, each indicated by a Unix timestamp expressed in milliseconds with an associated numeric value, typically in double precision. This data structure applies to many use cases, such as monitoring entities over time or tracking user activity for a given service. Redis Stack has an integrated time series database that offers many useful features to manage data points, query and search them, and format them conveniently for data processing and visualization. Getting started with time series modeling is straightforward:
TS.CREATE "app:monitor:temp"
127.0.0.1:6379> "TS.ADD" "app:monitor:temp" "*" "20"
(integer) 1675632813307
127.0.0.1:6379> "TS.ADD" "app:monitor:temp" "*" "20"
(integer) 1675632818179
127.0.0.1:6379> "TS.ADD" "app:monitor:temp" "*" "20"
(integer) 1675632824174
127.0.0.1:6379> "TS.ADD" "app:monitor:temp" "*" "20.1"
(integer) 1675632829519
127.0.0.1:6379> "TS.ADD" "app:monitor:temp" "*" "20"
(integer) 1675632835052
127.0.0.1:6379> "TS.RANGE" "app:monitor:temp" "1675632818179" "1675632829519"
1) 1) (integer) 1675632818179
   2) 20
2) 1) (integer) 1675632824174
   2) 20
3) 1) (integer) 1675632829519
   2) 20.1
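Conceptually, a time series is an append-only sequence of (timestamp, value) pairs. The following Python sketch mirrors the TS.ADD and TS.RANGE calls above using the same timestamps; it illustrates only the semantics, not how Redis Stack actually stores the samples:

```python
import bisect

series = []  # (timestamp_ms, value) pairs, appended in timestamp order

def ts_add(timestamp_ms, value):
    series.append((timestamp_ms, value))

def ts_range(start, stop):
    """Inclusive timestamp range, like TS.RANGE."""
    i = bisect.bisect_left(series, (start, float("-inf")))
    j = bisect.bisect_right(series, (stop, float("inf")))
    return series[i:j]

for ts, value in [(1675632813307, 20), (1675632818179, 20), (1675632824174, 20),
                  (1675632829519, 20.1), (1675632835052, 20)]:
    ts_add(ts, value)

print(ts_range(1675632818179, 1675632829519))
# [(1675632818179, 20), (1675632824174, 20), (1675632829519, 20.1)]
```

Because timestamps only grow, range queries are binary searches over an ordered sequence, which is why time series reads stay fast as the series grows.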
We have just scratched the surface of using time series with Redis Stack, because data may be aggregated, down-sampled, and indexed to address many different uses.
Deterministic data structures, that is, all the structures that store and return exactly the data that was stored (such as Strings, Sets, Hashes, and the rest of the Redis structures), are a good solution for moderate amounts of data, but they may become inadequate for the constantly growing volumes of data that systems must handle. Redis offers several options to store and present data to extract different types of insights. Strings are an example, because they can be encoded as integers and used as counters:
127.0.0.1:6379> INCR cnt
(integer) 1
127.0.0.1:6379> INCRBY cnt 3
(integer) 4
Strings can also be manipulated down to the bit level to store multiple integer counters of variable width at different offsets of a single String, reducing storage overhead, using the BITFIELD command:
127.0.0.1:6379> BITFIELD bfcnt INCRBY i5 0 5
1) (integer) 5
127.0.0.1:6379> BITFIELD bfcnt INCRBY i5 0 5
1) (integer) 10
127.0.0.1:6379> BITFIELD bfcnt GET i5 0
1) (integer) 10
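The packing that BITFIELD performs can be illustrated with a toy Python model that stores signed fixed-width counters at arbitrary bit offsets inside a single integer; note that this is a simplification of our own (for instance, real BITFIELD offsets are counted from the most significant bit of the String, and overflow behavior is configurable):

```python
class ToyBitField:
    """Signed fixed-width counters packed at bit offsets inside one integer."""
    def __init__(self):
        self.bits = 0  # stands in for the bytes of a Redis String

    def get(self, width, offset):
        raw = (self.bits >> offset) & ((1 << width) - 1)
        # sign-extend: an i5 counter holds values in [-16, 15]
        return raw - (1 << width) if raw >= (1 << (width - 1)) else raw

    def incrby(self, width, offset, delta):
        raw = (self.get(width, offset) + delta) & ((1 << width) - 1)  # wrap on overflow
        mask = ((1 << width) - 1) << offset
        self.bits = (self.bits & ~mask) | (raw << offset)
        return self.get(width, offset)

bf = ToyBitField()
print(bf.incrby(5, 0, 5))  # 5
print(bf.incrby(5, 0, 5))  # 10
print(bf.get(5, 0))        # 10
```

A 5-bit counter occupies 5 bits instead of the dozens of bytes a separate key would cost, which is the whole point of bit-level packing.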
Regular counters, Sets, and Hash tables are exact for any amount of data, but handling very large volumes becomes a challenge for the resources of the machine where Redis Stack is running, because of the memory they require.
Deterministic data structures have given way to probabilistic data structures because of the need to scale up to large quantities of data and give a reasonably approximated answer to questions such as the following:
- How many unique pages has a user visited?
- Has an element already been seen in a stream of events?
- What are the most frequent items in a dataset?
To answer the first question in the list, we could calculate the hash of the URL of each visited page, store it in a Redis collection such as a Set, and then retrieve the cardinality of the structure using the SCARD command. While this solution works very well (and is deterministically exact), scaling it to many users and many visited pages becomes expensive in memory.
Let's consider an example with a probabilistic data structure. HyperLogLog estimates the cardinality of a set using a small, constant amount of memory and low computational overhead, at the price of a bounded approximation error, so you could count the visited pages and obtain an estimate as follows:
127.0.0.1:6379> PFADD pages "https://redis.com/" "https://redis.io/docs/stack/bloom/" "https://redis.io/docs/data-types/hyperloglogs/"
(integer) 1
127.0.0.1:6379> PFCOUNT pages
(integer) 3
Redis reports the following memory usage for HyperLogLog:
127.0.0.1:6379> MEMORY USAGE pages
(integer) 96
Attempting to resolve the same problem using a Set and storing the hashes for these URLs would be done as follows:
127.0.0.1:6379> SADD hashpages "522195171ed14f78e1f33f84a98f0de6" "f5518a82f8be40e2994fdca7f71e090d" "c4e78b8c136f6e1baf454b7192e89cd1"
(integer) 3
127.0.0.1:6379> MEMORY USAGE hashpages
(integer) 336
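To see how a sketch can estimate cardinality in constant space, here is a toy linear-counting estimator in Python, a much simpler cousin of HyperLogLog. This is not the algorithm Redis implements; it only illustrates the accuracy-for-memory trade-off: hash each item into a fixed bitmap and estimate the count from the fraction of empty slots.

```python
import hashlib
import math

class LinearCounter:
    """Constant-memory cardinality estimator (linear counting, not HyperLogLog)."""
    def __init__(self, m=4096):
        self.m = m
        self.bitmap = bytearray(m)  # fixed size, no matter how many items are added

    def add(self, item):
        digest = hashlib.sha1(item.encode()).digest()
        slot = int.from_bytes(digest[:8], "big") % self.m
        self.bitmap[slot] = 1

    def count(self):
        empty = self.m - sum(self.bitmap)
        return self.m * math.log(self.m / empty)  # n is approximately m * ln(m/empty)

counter = LinearCounter()
for i in range(1000):
    counter.add(f"https://example.com/page/{i}")  # 1,000 distinct URLs
estimate = counter.count()  # close to 1000, from a fixed 4 KB bitmap
```

The bitmap never grows, so a thousand or a million additions cost the same memory; the estimate just gets noisier as the bitmap fills up.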
Probabilistic data structures trade a small amount of accuracy for time and space efficiency: they answer this and similar questions over big amounts of data and, most relevantly, do so efficiently.
Redis Stack embeds a serverless engine for event-driven data processing, allowing users to write and run their own functions on data stored in Redis. The functions are implemented in JavaScript and executed by the engine upon user invocation or in response to events, such as changes to data, execution of commands, or new entries added to a Redis Stream data structure. It is also possible to configure timed executions, so periodic maintenance operations can be scheduled.
Redis Stack minimizes the execution time by running the functions as close as possible to the data, improving data locality, minimizing network congestion, and increasing the overall throughput of the system.
With this capability, it is possible to implement event-driven data flows, thus opening the doors to many use cases, such as the following:
- Keeping derived data, such as counters and secondary indexes, in sync with the primary data
- Auditing and logging changes to the keyspace
- Scheduling periodic maintenance operations

As a first example, the following library registers a simple function:
#!js api_version=1.0 name=lib

redis.registerFunction('hello', function(){
    return 'Hello Gears!';
});
Import the library as follows:
redis-cli -x TFUNCTION LOAD < ./lib.js
The function can be invoked using the following command:
127.0.0.1:6379> TFCALL lib.hello 0
"Hello Gears!"
Functions can also be registered as keyspace triggers, which react to events in the keyspace. The following function counts and logs the deletions of user keys:
redis.registerKeySpaceTrigger("key_logger", "user:", function(client, data){
    if (data.event == 'del'){
        client.call("INCR", "removed");
        redis.log(JSON.stringify(data));
        redis.log("A user has been removed");
    }
});
In this function, we do the following:
- Register a keyspace trigger named key_logger for the keys prefixed by user:
- Check whether the event that fired the trigger is a del
- If it is, increment the counter stored at the key removed
- Log the event payload and a human-readable message to the Redis server log
Let's create a user Hash and then delete it:
127.0.0.1:6379> HSET user:123 name "John" last "Smith"
(integer) 2
127.0.0.1:6379> DEL user:123
(integer) 1
The Redis server log shows that the trigger was executed:
299:M 05 Feb 2023 19:13:09.004 * <redisgears_2> {"event":"del","key":"user:123","key_raw":{}}
299:M 05 Feb 2023 19:13:09.005 * <redisgears_2> A user has been removed
And the counter has increased:
127.0.0.1:6379> GET removed
"1"
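The control flow of the trigger, matching a key prefix, filtering on the event type, and updating a counter, can be simulated without a server. The following Python sketch mirrors the key_logger function; the dispatcher and the store dictionary are stand-ins of our own, not a Redis API:

```python
store = {"removed": 0}
triggers = []  # (key prefix, callback) pairs

def register_keyspace_trigger(prefix, callback):
    triggers.append((prefix, callback))

def fire_event(event, key):
    """Stand-in for the server notifying registered triggers of a keyspace event."""
    for prefix, callback in triggers:
        if key.startswith(prefix):
            callback({"event": event, "key": key})

def key_logger(data):
    # same logic as the JavaScript trigger: count deletions of user keys
    if data["event"] == "del":
        store["removed"] += 1
        print("A user has been removed:", data["key"])

register_keyspace_trigger("user:", key_logger)
fire_event("hset", "user:123")  # ignored: not a del event
fire_event("del", "user:123")   # increments the counter
print(store["removed"])         # 1
```

The essential property is that the reaction happens where the event occurs; in Redis Stack, that place is the server itself, next to the data.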
Throughout this book, we will come to understand the differences between Lua scripts, Redis functions, and JavaScript functions, and we will explore the many programmability features, along with proposals to resolve challenging problems with simple solutions.
Redis Stack combines the speed and stability of the Redis server with a set of well-established capabilities and integrates them into a compact solution that is easy to install and manage – Redis Stack Server. The RedisInsight desktop application is a visualization tool and data manager that complements Redis Stack Server with a set of functionalities useful for visualizing data stored by different models as well as providing interactive tutorials with popular examples, and more.
To complete the picture, the Redis Stack Client SDK includes the most popular client libraries to develop against Redis Stack in the Java, Python, and JavaScript programming languages.
Figure 1.1 – The Redis Stack logo
Redis Stack is free to use in development and production environments; it merges the open source BSD-licensed Redis with search and query capabilities, JSON support, time series handling, and probabilistic data structures. These additional capabilities are available under a dual license, specifically the Redis Source Available License (RSALv2) and the Server Side Public License (SSPL).
So, in a few examples, we have introduced new possibilities to modernize applications, and now we owe you an answer to the original question, “What is Redis Stack?”
To define what Redis Stack is, we need to go back for a moment to its origins, because Redis is the spinal cord of Redis Stack. Redis was born as in-memory storage to accelerate massive amounts of queries and achieve sub-millisecond latency while optimizing memory usage and maximizing the ease of adoption and administration. It appeared at the same time as other solutions taking part in the NoSQL wave and deviating from relational modeling. While the key-value Memcached store was an already established solution, Redis became popular too as a type of key-value storage. So, we can surely say that Redis Stack can be used as a key-value store.
However, considering Redis Stack as a simple key-value data store is reductive. Redis is best known for its flexibility in storing collections such as Hashes, Sets, Sorted Sets or Lists, Bitmaps and Bitfields, Streams, HyperLogLog probabilistic data structures, and geo indexes. And, together with data structures, its efficient low-complexity algorithms make storing and searching data a joy for developers. We can certainly say that Redis Stack is also a data structure store.
The features introduced so far are integrated into Redis Stack Server and extend the Redis server, turning the data structure server into a multi-model database. This provides a rich data modeling experience where multiple heterogeneous data structures, such as documents, vectors, and time series, coexist in the same database. Software architects will appreciate the variety of possibilities for designing new solutions without multiple specialized databases, and software developers will be empowered with a rich set of client libraries that ease software design. Database administrators will discover how gentle the learning curve is for administering a single database, rather than installing, configuring, and maintaining several data stores.
The characteristics discussed so far, together with stream processing and the possibility to execute JavaScript functions for event-driven development, push Redis Stack beyond the boundaries of the multi-model database definition. Combining Redis, the key-value data store that is popular as a cache, with advanced data structures and multi-model design, and with the capability of a message broker with event-driven programming features, turns Redis Stack into a powerful data platform.
We have completed the Redis Stack walk-through, and to conclude this chapter, we will briefly discuss how to install it using different methods.