
Elasticsearch 8.x Cookbook
While Elasticsearch makes it possible to start ingesting data quickly using a schemaless approach, without being concerned about field types, achieving better results and better indexing performance requires manually defining a mapping.
Fine-tuning mapping brings some advantages, such as the following:
Elasticsearch allows you to use base fields with a wide range of configurations.
You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.
To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. I suggest using the Kibana console, which provides code completion and better character escaping for Elasticsearch.
To execute this recipe's examples, you will need to create an index named test, where you can put mappings, as explained in the Using explicit mapping creation recipe.
Let's use a semi real-world example of a shop order for our eBay-like shop:
Figure 2.1 – Example of an order
The order record must be converted into an Elasticsearch mapping definition, as follows:

PUT test/_mapping
{
  "properties": {
    "id": {"type": "keyword"},
    "date": {"type": "date"},
    "customer_id": {"type": "keyword"},
    "sent": {"type": "boolean"},
    "name": {"type": "keyword"},
    "quantity": {"type": "integer"},
    "price": {"type": "double"},
    "vat": {"type": "double", "index": false}
  }
}
Now, the mapping is ready to be put in the index. We will learn how to do this in the Putting a mapping in an index recipe of Chapter 3, Basic Operations.
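As a minimal sketch, the same mapping body can be assembled programmatically before sending it to the cluster. The following Python snippet (standard library only; it builds and serializes the request body rather than contacting a real Elasticsearch endpoint) reproduces the order mapping above:

```python
import json

# The order mapping from the recipe, expressed as a Python dict.
order_mapping = {
    "properties": {
        "id": {"type": "keyword"},
        "date": {"type": "date"},
        "customer_id": {"type": "keyword"},
        "sent": {"type": "boolean"},
        "name": {"type": "keyword"},
        "quantity": {"type": "integer"},
        "price": {"type": "double"},
        # "index": False keeps vat in _source but makes it non-searchable.
        "vat": {"type": "double", "index": False},
    }
}

# Serialize to the JSON body of the PUT test/_mapping request.
body = json.dumps(order_mapping, indent=2)
print(body)
```

Building the body as a plain dictionary makes it easy to validate or templatize mappings before issuing the PUT request with your HTTP client of choice.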
Field types must be mapped to one of the Elasticsearch base types, and options on how the field must be indexed need to be added.
The following table is a reference for the mapping types:
Figure 2.2 – Base type mapping
Depending on the data type, it's possible to give explicit directives to Elasticsearch when you're processing the field for better management. The most used options are as follows:
- store (default false): This marks the field to be stored in a separate index fragment for fast retrieval. Storing a field consumes disk space but reduces computation if you need to extract it from a document (that is, in scripting and aggregations). The possible values for this option are true and false. Stored fields are always returned as an array of values for consistency. Stored fields are faster than other fields in aggregations.
- index: This defines whether or not the field should be indexed. The possible values for this parameter are true and false. Fields that are not indexed are not searchable (the default is true).
- null_value: This defines a default value to be indexed if the field is null.
- boost: This is used to change the importance of a field (the default is 1.0). boost works on a term level only, so it's mainly used in term, terms, and match queries.
- search_analyzer: This defines an analyzer to be used during the search. If it's not defined, the analyzer of the parent object is used (the default is null).
- analyzer: This sets the default analyzer to be used (the default is null).
- norms: This controls the Lucene norms. This parameter is used to score queries better. If the field is only used for filtering, it's a best practice to disable it to reduce resource usage (the default is true for analyzed fields and false for not_analyzed ones).
- copy_to: This allows you to copy the content of a field into another one to achieve functionalities similar to the _all field.
- ignore_above: This allows you to skip indexing a string if it's longer than this value. This is useful for processing fields for exact filtering, aggregations, and sorting. It also prevents a single term token from becoming too big and prevents errors due to the Lucene term's byte-length limit of 32,766. The maximum suggested value is 8191 (https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html).

From Elasticsearch version 6.x onward, as shown in the Using explicit mapping creation recipe, the implicitly inferred type for a string is a multifield mapping:

- text: This mapping allows textual queries (that is, term, match, and span queries). In the example provided in the Using explicit mapping creation recipe, this was name.
- keyword: This subfield is used for keyword mapping. This field can be used for exact term matching and for aggregation and sorting. In the example provided in the Using explicit mapping creation recipe, the referred field was name.keyword.

Another important parameter, available only for text mapping, is term_vector (the vector of terms that compose a string). Please refer to the Lucene documentation for further details at https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/index/Terms.html.
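To tie the options above together, here is a hedged Python sketch (standard library only; the field names are invented for this illustration) that assembles a mapping combining store, index, null_value, copy_to, ignore_above, and the text-plus-keyword multifield pattern, and checks the byte-length arithmetic behind the suggested ignore_above value of 8191:

```python
import json

# A hypothetical mapping exercising the options described above.
mapping = {
    "properties": {
        # Stored keyword with a default value indexed when the field is null.
        "code": {"type": "keyword", "store": True, "null_value": "NA"},
        # Kept in _source but not indexed, so it is not searchable.
        "vat": {"type": "double", "index": False},
        # Multifield: full-text "text" plus an exact-match "keyword" subfield.
        "name": {
            "type": "text",
            "copy_to": "all_text",  # copied into a catch-all field
            "fields": {
                "keyword": {"type": "keyword", "ignore_above": 8191}
            },
        },
        # Catch-all target for copy_to, similar to the old _all field.
        "all_text": {"type": "text"},
    }
}

# Why 8191? A UTF-8 character can take up to 4 bytes, and Lucene rejects
# terms longer than 32,766 bytes: 8191 * 4 = 32,764, just under the limit.
assert 8191 * 4 <= 32766

print(json.dumps(mapping, indent=2))
```

This is a sketch of how the parameters compose, not a recommended production mapping; which options you enable should follow from how each field is actually queried.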
term_vector can accept the following values:

- no: This is the default value; that is, the term vector is skipped.
- yes: This stores the term vector.
- with_offsets: This stores the term vector with token offsets (the start and end positions in a block of characters).
- with_positions: This stores the position of the token in the term vector.
- with_positions_offsets: This stores all the term vector data.
- with_positions_payloads: This stores the position and payloads of the token in the term vector.
- with_positions_offsets_payloads: This stores all the term vector data with payloads.

Term vectors allow fast highlighting but consume disk space due to storing additional text information. It's a best practice to only activate them in fields that require highlighting, such as title or document content.
You can refer to the following sources for further details on the concepts of this chapter: