Elastic Stack 8.x Cookbook

By: Huage Chen, Yazid Akadiri
Overview of this book

Learn how to make the most of the Elastic Stack (ELK Stack) products—including Elasticsearch, Kibana, Elastic Agent, and Logstash—to take data reliably and securely from any source, in any format, and then search, analyze, and visualize it in real time. This cookbook takes a practical approach to unlocking the full potential of the Elastic Stack through detailed, step-by-step recipes. Starting with installing and ingesting data using Elastic Agent and Beats, this book guides you through data transformation and enrichment with various Elastic components and explores the latest advancements in search applications, including semantic search and Generative AI. You'll then visualize and explore your data and create dashboards using Kibana. As you progress, you'll advance your skills with machine learning for data science, get to grips with natural language processing, and discover the power of vector search. The book covers Elastic Observability use cases for log, infrastructure, and synthetics monitoring, along with essential strategies for securing the Elastic Stack. Finally, you'll gain expertise in Elastic Stack operations to effectively monitor and manage your system.

Using an analyzer

In this recipe, we are going to learn how to set up and use a specific analyzer for text analysis. Indexing data in Elasticsearch, especially for search use cases, requires defining how text should be processed before it is indexed; this is what analyzers do.

Analyzers in Elasticsearch handle tokenization and normalization functions. Elasticsearch offers a variety of ready-made analyzers for common scenarios, as well as language-specific analyzers for English, German, Spanish, French, Hindi, and so on.
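To see tokenization and normalization in action before configuring anything, you can run the built-in standard analyzer against a sample sentence in Kibana Dev Tools (the sample text here is our own, chosen for illustration):

```
POST _analyze
{
  "analyzer": "standard",
  "text": "The QUICK Brown-Foxes jumped!"
}
```

The response lists the tokens the standard analyzer produced: lowercased terms with punctuation stripped, that is, the, quick, brown, foxes, and jumped.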

In this recipe, we will see how to configure the standard analyzer with the English stopwords filter.

Getting ready

Make sure that you have completed the Adding data from the Elasticsearch client recipe. Also, make sure to download the following sample Python script from the GitHub repository: https://github.com/PacktPublishing/Elastic-Stack-8.x-Cookbook/blob/main/Chapter2/python-client-sample/sampledata_analyzer.py.

The command snippets of this recipe are available at https://github.com/PacktPublishing/Elastic-Stack-8.x-Cookbook/blob/main/Chapter2/snippets.md#using-analyzer.

How to do it…

In this recipe, you will learn how to configure your Python code to interface with an Elasticsearch cluster, define a custom English text analyzer, create a new index with the analyzer, and verify that the index uses the specified settings.

Let’s look at the provided Python script:

  1. At the beginning of the script, we create an instance of the Elasticsearch client:
    es = Elasticsearch(
        cloud_id=ES_CID,
        basic_auth=(ES_USER, ES_PWD)
    )
  2. To ensure that we do not use an existing movies index, the script includes code that deletes any such index:
    if es.indices.exists(index="movies"):
        print("Deleting existing movies index...")
        es.options(ignore_status=[404, 400]).indices.delete(index="movies")
  3. Next, we define the analyzer configuration:
    index_settings = {
        "analysis": {
            "analyzer": {
                "standard_with_english_stopwords": {
                    "type": "standard",
                    "stopwords": "_english_"
                }
            }
        }
    }
  4. We then create the index with settings that define the analyzer:
    es.indices.create(index='movies', settings=index_settings)
  5. Finally, to verify the successful addition of the analyzer, we retrieve the settings:
    settings = es.indices.get_settings(index='movies')
    analyzer_settings = settings['movies']['settings']['index']['analysis']
    print(f"Analyzer used for the index: {analyzer_settings}")
  6. After reviewing the script, execute it with the following command, and you should see the output shown in Figure 2.10:
    $ python sampledata_analyzer.py
Figure 2.10 – The output of the sampledata_analyzer.py script

Alternatively, you can go to Kibana | Dev Tools and issue the following request:

GET /movies/_settings

In the response, you should see the settings currently applied to the movies index with the configured analyzer, as shown in Figure 2.11:

Figure 2.11 – The analyzer configuration in the index settings

How it works...

The settings block of the index configuration is where the analyzer is set. As we are modifying the built-in standard analyzer in our recipe, we will give it a unique name (standard_with_english_stopwords) and set the type to standard. Text indexed from this point will undergo analysis by the modified analyzer. To test this, we can use the _analyze endpoint on the index:

POST movies/_analyze
{
  "text": "A young couple decides to elope.",
  "analyzer": "standard_with_english_stopwords"
}

It should yield the results shown in Figure 2.12:

Figure 2.12 – The index result of a text with the stopword analyzer
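To build intuition for what this analysis does, it can be roughly simulated in plain Python. This is a simplified sketch, not Elasticsearch's actual implementation: the real standard analyzer uses Unicode text segmentation, and the `_english_` stopword list is longer than the abbreviated set used here.

```python
import re

# Abbreviated English stopword set; Elasticsearch's _english_ list is longer.
ENGLISH_STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such",
    "that", "the", "their", "then", "there", "these", "they", "this",
    "to", "was", "will", "with",
}

def analyze(text):
    """Approximate the standard analyzer with English stopwords:
    split on non-word characters, lowercase, then drop stopwords."""
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    return [t for t in tokens if t not in ENGLISH_STOPWORDS]

print(analyze("A young couple decides to elope."))
# ['young', 'couple', 'decides', 'elope']
```

Note how the stopwords a and to are dropped, matching the behavior shown in Figure 2.12.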

There’s more…

While Elasticsearch offers many built-in analyzers for different languages and text types, you can also define custom analyzers. These allow you to specify how text is broken down and modified for indexing or searching, using components such as tokenizers, token filters, and character filters – either those provided by Elasticsearch or custom ones you create. For example, you can design an analyzer that converts text to lowercase, removes common words, substitutes synonyms, and strips accents.

Reasons for needing a custom analyzer may include the following:

  • Handling various languages and scripts that require special processing, such as Chinese, Japanese, and Arabic
  • Enhancing the relevance and comprehensiveness of search results using synonyms, stemming, lemmatization, and so on
  • Unifying text by removing punctuation, whitespace, and accents and making it case-insensitive
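As an illustration of these ideas, a custom analyzer combining several standard building blocks might be defined as follows. This is a sketch: the analyzer name, filter name, and synonym list are made up for this example and are not part of the recipe's script.

```python
# Index settings defining a hypothetical custom analyzer that lowercases,
# removes English stopwords, strips accents, and expands synonyms.
custom_settings = {
    "analysis": {
        "filter": {
            "movie_synonyms": {
                "type": "synonym",
                "synonyms": ["film, movie, picture"],
            }
        },
        "analyzer": {
            "movie_text_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": [
                    "lowercase",       # case-insensitive matching
                    "stop",            # remove English stopwords (the default list)
                    "asciifolding",    # strip accents: café -> cafe
                    "movie_synonyms",  # expand the synonym list defined above
                ],
            }
        },
    }
}

# Creating an index with it would mirror the earlier step in this recipe:
# es.indices.create(index="movies", settings=custom_settings)
```

As with the recipe's analyzer, you could then verify the behavior with the `_analyze` endpoint, passing `"analyzer": "movie_text_analyzer"`.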