Elasticsearch 7.0 Cookbook

By Alberto Paro

Overview of this book

Elasticsearch is a Lucene-based distributed search server that allows users to index and search unstructured content at petabyte scale. With this book, you'll be guided through comprehensive recipes on what's new in Elasticsearch 7, and see how to create and run complex queries and analytics. Packed with recipes on performing index mapping, aggregation, and scripting using Elasticsearch, this fourth edition of Elasticsearch Cookbook will get you acquainted with numerous solutions and quick techniques for performing both everyday and uncommon tasks, such as deploying Elasticsearch nodes, integrating other tools with Elasticsearch, and creating different visualizations. You will install Kibana to monitor a cluster and also extend it using a variety of plugins. Finally, you will integrate your Java, Scala, Python, and big data applications, such as Apache Spark and Pig, with Elasticsearch, and create efficient data applications powered by enhanced functionalities and custom plugins. By the end of this book, you will have gained in-depth knowledge of implementing Elasticsearch architecture, and you'll be able to manage, search, and store data efficiently and effectively using Elasticsearch.

Setting up an ingestion node

The main goals of Elasticsearch are indexing, searching, and analytics, but it's often necessary to modify or enrich documents before storing them in Elasticsearch.

The following are the most common scenarios in which this is needed:

  • Preprocessing the log string to extract meaningful data
  • Enriching the content of textual fields with Natural Language Processing (NLP) tools
  • Enriching the content using machine learning (ML) computed fields
  • Adding data modification or transformation during ingestion (see the pipeline sketch after this list), such as the following:
    • Converting IP addresses into geolocations
    • Adding datetime fields at ingestion time
    • Building custom fields (via scripting) at ingestion time
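
The following is a minimal sketch of an ingest pipeline that covers the last group of scenarios; the pipeline name (logs-enrich) and the source fields (client_ip and url) are hypothetical, and the geoip processor may require installing the ingest-geoip plugin on older releases:

PUT _ingest/pipeline/logs-enrich
{
  "description": "Geo-localize an IP, add an ingest timestamp, and build a custom field via scripting",
  "processors": [
    { "geoip": { "field": "client_ip", "target_field": "geo" } },
    { "set": { "field": "ingested_at", "value": "{{_ingest.timestamp}}" } },
    { "script": { "lang": "painless", "source": "ctx.url_length = ctx.url == null ? 0 : ctx.url.length()" } }
  ]
}

Documents can then be indexed through this pipeline by adding the pipeline=logs-enrich query parameter to the index request.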

Getting ready

You need a working Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe, as well as a simple text editor to change configuration files.

How to do it…

To set up an ingest node, you need to edit the config/elasticsearch.yml file and set the node.ingest property to true, as follows:

node.ingest: true
Every time you change your elasticsearch.yml file, a node restart is required.
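
After the restart, you can check which roles the node exposes as a quick sanity check; the v and h parameters of the _cat nodes API only control the output columns:

GET _cat/nodes?v&h=name,node.role

A node whose node.role column contains the letter i has the ingest role enabled (m stands for master-eligible and d for data).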

How it works…

By default, an Elasticsearch node is configured as an ingest node (refer to Chapter 12, Using the Ingest Module, for more information on ingestion pipelines).

As with the coordinator node, the ingest node is a way to add functionality to Elasticsearch without compromising cluster safety.

If you want to prevent a node from being used for ingestion, you need to disable it with node.ingest: false. It's a best practice to disable ingestion on the master and data nodes to prevent ingestion errors and to protect the cluster. The coordinator node is the best candidate for the ingest role.

If you are using NLP, attachment extraction (via the ingest-attachment plugin), or log ingestion, the best practice is to have a pool of coordinator nodes (no master, no data) with ingestion active.
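
The following is a minimal sketch of config/elasticsearch.yml for such an ingest-enabled coordinator node (the node name is hypothetical):

# A coordinator node dedicated to ingestion: not master-eligible, holds no data
node.name: ingest-coordinator-1
node.master: false
node.data: false
node.ingest: true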

In previous versions of Elasticsearch, the attachment and NLP plugins ran on standard data or master nodes. This caused a lot of problems for Elasticsearch, for the following reasons:

  • High CPU usage by NLP algorithms, which saturates all the CPUs on the data node and degrades indexing and search performance
  • Instability caused by malformed attachments and/or bugs in Apache Tika (the library used for document extraction)
  • NLP or ML algorithms that require a lot of CPU or stress the Java garbage collector, decreasing the node's performance

The best practice is to have a pool of coordinator nodes with ingestion enabled, as this provides the best safety for both the cluster and the ingestion pipeline.

There's more…

Now that you know about the four kinds of Elasticsearch nodes (master, data, ingest, and coordinator), you can easily see that a robust architecture designed around Elasticsearch keeps these roles separate: dedicated master nodes, dedicated data nodes, and a pool of coordinator nodes with ingestion enabled.
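
A minimal sketch of how the remaining roles could be configured in config/elasticsearch.yml (node names are hypothetical; the ingest-enabled coordinator nodes are configured as in the previous snippet):

# Dedicated master-eligible node: cluster management only
node.name: master-1
node.master: true
node.data: false
node.ingest: false

# Dedicated data node: holds shards, no ingestion
node.name: data-1
node.master: false
node.data: true
node.ingest: false

Keeping the ingest role off the master and data nodes avoids the stability issues described in the previous section.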
