
Elastic Stack 8.x Cookbook

In this recipe, we are going to learn how to set up and use a specific analyzer for text analysis. Indexing data in Elasticsearch, especially for search use cases, requires that you define how text should be processed before it is indexed; this is what analyzers accomplish.
Analyzers in Elasticsearch handle tokenization and normalization functions. Elasticsearch offers a variety of ready-made analyzers for common scenarios, as well as language-specific analyzers for English, German, Spanish, French, Hindi, and so on.
In this recipe, we will see how to configure the standard analyzer with the English stopwords filter.
Make sure that you completed the Adding data from the Elasticsearch client recipe. Also, make sure to download the following sample Python script from the GitHub repository: https://github.com/PacktPublishing/Elastic-Stack-8.x-Cookbook/blob/main/Chapter2/python-client-sample/sampledata_analyzer.py.
The command snippets of this recipe are available at https://github.com/PacktPublishing/Elastic-Stack-8.x-Cookbook/blob/main/Chapter2/snippets.md#using-analyzer.
In this recipe, you will learn how to configure your Python code to interface with an Elasticsearch cluster, define a custom English text analyzer, create a new index with the analyzer, and verify that the index uses the specified settings.
Let’s look at the provided Python script:
First, the script connects to the Elasticsearch cluster using the Cloud ID and basic authentication credentials:

es = Elasticsearch(
    cloud_id=ES_CID,
    basic_auth=(ES_USER, ES_PWD)
)
To avoid conflicts with an existing movies index, the script includes code that deletes any such index:

if es.indices.exists(index="movies"):
    print("Deleting existing movies index...")
    es.options(ignore_status=[404, 400]).indices.delete(index="movies")
Next, the index settings define a standard analyzer configured with the built-in English stopword list:

index_settings = {
    "analysis": {
        "analyzer": {
            "standard_with_english_stopwords": {
                "type": "standard",
                "stopwords": "_english_"
            }
        }
    }
}
The script then creates the movies index with these settings:

es.indices.create(index='movies', settings=index_settings)
Finally, it retrieves the index settings to verify that the analyzer is configured as expected:

settings = es.indices.get_settings(index='movies')
analyzer_settings = settings['movies']['settings']['index']['analysis']
print(f"Analyzer used for the index: {analyzer_settings}")
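The response returned by get_settings nests the analysis block several levels deep. The following sketch shows how to navigate that structure, using a hard-coded sample response in place of a live cluster (a real response contains additional index settings alongside the analysis block):

```python
# Sample of the structure returned by es.indices.get_settings(index='movies').
# Real responses include further index settings (number_of_shards, uuid, etc.).
sample_settings = {
    "movies": {
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        "standard_with_english_stopwords": {
                            "type": "standard",
                            "stopwords": "_english_",
                        }
                    }
                }
            }
        }
    }
}

# Drill down to the analyzer definitions, mirroring the script's access path.
analyzers = sample_settings["movies"]["settings"]["index"]["analysis"]["analyzer"]
print(list(analyzers))  # → ['standard_with_english_stopwords']
```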
Run the script from your terminal:

$ python sampledata_analyzer.py
Figure 2.10 – The output of the sampledata_analyzer.py script
Alternatively, you can go to Kibana | Dev Tools and issue the following request:
GET /movies/_settings
In the response, you should see the settings currently applied to the movies index with the configured analyzer, as shown in Figure 2.11:
Figure 2.11 – The analyzer configuration in the index settings
The settings block of the index configuration is where the analyzer is set. As we are modifying the built-in standard analyzer in our recipe, we give it a unique name (standard_with_english_stopwords) and set the type to standard. Text indexed from this point on will undergo analysis by the modified analyzer. To test this, we can use the _analyze endpoint on the index:
POST movies/_analyze
{
  "text": "A young couple decides to elope.",
  "analyzer": "standard_with_english_stopwords"
}
It should yield the results shown in Figure 2.12:
Figure 2.12 – The index result of a text with the stopword analyzer
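For intuition, here is a rough pure-Python sketch of what the standard analyzer with English stopwords does to that sentence. This is only an illustration, not how Elasticsearch is implemented, and the stopword set below is a small illustrative subset of the built-in _english_ list:

```python
import re

# Illustrative subset of Elasticsearch's _english_ stopword list.
ENGLISH_STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such",
    "that", "the", "their", "then", "there", "these", "they", "this",
    "to", "was", "will", "with",
}

def analyze(text):
    # Roughly mimic the standard tokenizer plus lowercasing:
    # split on non-alphanumeric boundaries and lowercase each token,
    # then drop English stopwords.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in ENGLISH_STOPWORDS]

print(analyze("A young couple decides to elope."))
# → ['young', 'couple', 'decides', 'elope']
```

Note how "A" and "to" are removed, matching the token list the _analyze endpoint returns in Figure 2.12.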
While Elasticsearch offers many built-in analyzers for different languages and text types, you can also define custom analyzers. These allow you to specify how text is broken down and modified for indexing or searching, using components such as tokenizers, token filters, and character filters – either those provided by Elasticsearch or custom ones you create. For example, you can design an analyzer that converts text to lowercase, removes common words, substitutes synonyms, and strips accents.
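As a sketch of that idea, the settings below define a hypothetical custom analyzer (the name folded_english is illustrative, not from this recipe) that combines the standard tokenizer with the built-in lowercase, stop, and asciifolding token filters to lowercase text, remove common English words, and strip accents:

```python
# Hypothetical custom analyzer combining built-in components.
# It would be applied exactly like the recipe's settings, e.g.:
#   es.indices.create(index="movies", settings=custom_settings)
custom_settings = {
    "analysis": {
        "analyzer": {
            "folded_english": {  # illustrative name
                "type": "custom",
                "tokenizer": "standard",
                # lowercase the tokens, drop stopwords, strip accents
                "filter": ["lowercase", "stop", "asciifolding"],
            }
        }
    }
}

print(custom_settings["analysis"]["analyzer"]["folded_english"]["filter"])
```

A synonym filter could be added to the filter chain as well, but it requires its own filter definition listing the synonym mappings.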
Reasons for needing a custom analyzer may include the following: