
Elasticsearch Essentials

Elasticsearch is a distributed, full-text search and analytics engine built on top of Lucene, a search engine library written in Java that is also the basis of Apache Solr. Since its first release in 2010, Elasticsearch has been widely adopted by large and small organizations alike, including NASA, Wikipedia, and GitHub, for a variety of use cases. Recent releases of Elasticsearch focus more on resiliency, which gives users confidence in using Elasticsearch as a data storage tool in addition to a full-text search engine. Elasticsearch ships with sensible default configurations and settings and hides much of its complexity from beginners, so anyone can become productive quickly by learning just the basics.
Lucene is a blazing-fast search library, but it is hard to use directly and offers very limited support for scaling beyond a single machine. Elasticsearch overcomes these limitations. Apart from providing a simple HTTP/JSON API, which enables language interoperability in comparison to Lucene's bare Java API, it adds a number of features on top of Lucene. For example, many settings can be changed at runtime through the REST API, without editing the elasticsearch.yml file and restarting the node. This is possible because each resource (index, document, node, and so on) in Elasticsearch is accessible via a simple URI. There are many more features available in Elasticsearch, such as multitenancy and percolation, which are discussed in detail in the upcoming chapters.
Elasticsearch follows the REST design pattern: all operations, such as document insertion, deletion, updating, and searching, as well as various monitoring and management tasks, can be performed using the REST endpoints that Elasticsearch provides.
In a REST-based web API, data and services are exposed as resources with URLs. Every request is routed to a resource, which is represented by a path. Each resource has a resource identifier, called a URI. All the possible actions on a resource can be performed using the simple request types (methods) provided by the HTTP protocol. The following examples show how CRUD operations are performed with a REST API:
POST /user fname=Bharvi&lname=Dixit&age=28&id=123
GET /user/123
PUT /user/123 fname=Lavleen
DELETE /user/123
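To make the mapping from HTTP method to CRUD operation concrete, here is a minimal Python sketch, using only the standard library, that builds (but does not send) the four requests above. The base URL http://localhost:8080 is a hypothetical placeholder, and form-encoding the request bodies is an assumption chosen to mirror the key=value notation in the examples.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # hypothetical server; nothing is sent


def build_crud_requests():
    """Build one Request per CRUD operation from the examples above."""
    create_body = urlencode(
        {"fname": "Bharvi", "lname": "Dixit", "age": 28, "id": 123}
    ).encode()
    update_body = urlencode({"fname": "Lavleen"}).encode()
    return [
        Request(f"{BASE_URL}/user", data=create_body, method="POST"),       # Create
        Request(f"{BASE_URL}/user/123", method="GET"),                      # Read
        Request(f"{BASE_URL}/user/123", data=update_body, method="PUT"),    # Update
        Request(f"{BASE_URL}/user/123", method="DELETE"),                   # Delete
    ]


for req in build_crud_requests():
    print(req.get_method(), req.full_url, req.data)
```

Only the method and the path change between operations; the resource itself is always addressed by its URI.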
Many Elasticsearch users get confused between the POST and PUT request types. The difference is simple: POST is used to create a new resource, while PUT is used to update an existing resource. PUT can also be used to create a resource in some cases, but only when the complete URI of the resource (including its identifier) is available.
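The distinction can be sketched with a toy in-memory store. This is a hypothetical illustration of the general REST convention, not Elasticsearch's implementation: POST is sent to the collection and the server picks the identifier, while PUT is sent to the complete URI chosen by the client and creates or replaces (one form of updating) the resource found there.

```python
import itertools


class ToyResourceStore:
    """Hypothetical in-memory store illustrating POST vs PUT semantics."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)

    def post(self, collection, body):
        """POST /collection: the server chooses a free identifier."""
        while True:
            resource_id = str(next(self._ids))
            uri = f"/{collection}/{resource_id}"
            if uri not in self._docs:  # skip IDs already taken via PUT
                break
        self._docs[uri] = dict(body)
        return resource_id

    def put(self, uri, body):
        """PUT /collection/id: the client supplies the complete URI.

        Creates the resource if it is absent, otherwise replaces it.
        Returns True if a new resource was created."""
        created = uri not in self._docs
        self._docs[uri] = dict(body)
        return created

    def get(self, uri):
        return self._docs.get(uri)


store = ToyResourceStore()
new_id = store.post("user", {"fname": "Bharvi"})    # server assigns "1"
store.put(f"/user/{new_id}", {"fname": "Lavleen"})  # existing URI: replaced, returns False
store.put("/user/123", {"fname": "Dixit"})          # new URI: created, returns True
```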
Real-world data generally comes in object form. Every entity (object) has some properties. These properties can be simple key-value pairs or complex data structures, and a property can itself have nested properties, and so on.
Elasticsearch is a document-oriented data store in which objects, called documents, are stored and retrieved as JSON. These documents are not only stored; their content is also indexed to make them searchable.
JavaScript Object Notation (JSON) is a lightweight data interchange format, and in the NoSQL world it has become the standard data serialization format. The primary reasons for its adoption as a standard are its language independence and its support for complex, nested data structures. JSON supports the following data types:
Array, Boolean, Null, Number, Object, and String
The following JSON object shows how each of these data types is stored as key-value pairs:
{
  "int_array": [1, 2, 3],
  "string_array": ["Lucene", "Elasticsearch", "NoSQL"],
  "boolean": true,
  "null": null,
  "number": 123,
  "object": { "a": "b", "c": "d", "e": "f" },
  "string": "Learning Elasticsearch"
}
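Because JSON is language independent, any JSON-aware client can read such a document. As an illustration (Python is chosen here only for brevity), the standard-library json module decodes each JSON type into a native type: arrays into lists, booleans into bool, null into None, numbers into int or float, objects into dict, and strings into str.

```python
import json

doc_text = """
{ "int_array": [1, 2, 3], "string_array": ["Lucene", "Elasticsearch", "NoSQL"],
  "boolean": true, "null": null, "number": 123,
  "object": { "a": "b", "c": "d", "e": "f" }, "string": "Learning Elasticsearch" }
"""

doc = json.loads(doc_text)

# Print the Python type each top-level JSON value decoded to
for key, value in doc.items():
    print(f"{key:13} -> {type(value).__name__}")

# Serializing back yields compact JSON, e.g. for use as a request body
compact = json.dumps(doc, separators=(",", ":"))
```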
The following are the most common terms you need to know when starting with Elasticsearch:
A shard can be either a primary shard or a replica (secondary) shard. A primary shard is the one to which all operations that change the index are directed. A replica shard holds a copy of the primary shard's data; it can serve search requests, which speeds up searching, and it provides high availability: if the node that holds a primary shard goes down, a replica of that shard is automatically promoted to primary.
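The shard layout is chosen per index through the number_of_shards and number_of_replicas index settings. The sketch below builds such a settings body (the index name my_index is hypothetical), counts the resulting shard copies, and models failover with a deliberately simplified toy function. That function is an assumption for illustration only, not Elasticsearch's real allocation logic.

```python
import json


def index_settings_body(primary_shards: int, replicas: int) -> dict:
    """Request body for creating an index with explicit shard settings,
    e.g. sent as PUT /my_index (hypothetical index name)."""
    return {"settings": {"number_of_shards": primary_shards,
                         "number_of_replicas": replicas}}


def total_shard_copies(primary_shards: int, replicas: int) -> int:
    """Each primary shard has `replicas` replica copies."""
    return primary_shards * (1 + replicas)


def fail_node(copies, failed_node):
    """Toy failover model for ONE shard (not real Elasticsearch logic).

    copies: list of (node_name, is_primary) tuples.
    Drops copies on the failed node; if the primary was lost, the first
    surviving replica is promoted to primary."""
    survivors = [(node, is_primary) for node, is_primary in copies
                 if node != failed_node]
    if survivors and not any(is_primary for _, is_primary in survivors):
        node, _ = survivors[0]
        survivors[0] = (node, True)
    return survivors


print(json.dumps(index_settings_body(3, 1)))
print(total_shard_copies(3, 1))  # 3 primaries + 3 replicas = 6
print(fail_node([("node-1", True), ("node-2", False)], "node-1"))
```

The replica survives the loss of node-1 only because it lives on a different node; a replica placed on the same node as its primary would add no availability.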
Elasticsearch is first and foremost a search engine, but because of its rich functionality, organizations have started using it as a NoSQL data store as well. However, it was not designed to maintain the complex relationships offered by traditional relational databases.
If you want to understand Elasticsearch in relational database terms, then, as shown in the following image, an index in Elasticsearch is similar to a database that consists of multiple types (similar to tables). A single row is represented as a document, and columns are similar to fields.
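The row-to-document analogy can be sketched with Python's built-in sqlite3 module: one row of a hypothetical users table becomes one JSON document, and each column becomes a field. Using the primary key as the document ID is an assumption of this sketch, not a rule.

```python
import json
import sqlite3

# A hypothetical relational "users" table with one row
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fname TEXT, lname TEXT, age INTEGER)")
conn.execute("INSERT INTO users VALUES (123, 'Bharvi', 'Dixit', 28)")

row = conn.execute("SELECT * FROM users WHERE id = 123").fetchone()

# Row -> document, column -> field; the primary key becomes the document ID
doc_id = row["id"]
document = {column: row[column] for column in row.keys() if column != "id"}
conn.close()

print(doc_id, json.dumps(document))
```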
Elasticsearch has no concept of referential integrity constraints such as foreign keys. However, despite being a search engine and NoSQL data store, it does allow us to maintain some relationships among documents, which are discussed in the upcoming chapters.
With these theoretical concepts covered, we are ready to learn the practical steps of working with Elasticsearch.
First, you need to be aware of the basic requirements for installing and running Elasticsearch, which are listed as follows:
Check the following URL for the JVM and OS dependencies of Elasticsearch: https://www.elastic.co/subscriptions/matrix.
The most common error that appears when you use an incompatible Java version with Elasticsearch is the following:
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/elasticsearch/bootstrap/Elasticsearch : Unsupported major.minor version 51.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:637)
at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
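The 51.0 in the message is the class-file format version (major.minor) that the Elasticsearch classes were compiled to. Major version 51 corresponds to Java 7, so the error means the JVM being used is older than Java 7. The following sketch reads the version from the first 8 bytes of a .class file (magic number, then minor and major version, big-endian) and maps the major version to a Java release; the mapping formula assumes Java 5 (major 49) or later.

```python
import struct


def class_file_version(header: bytes):
    """Return (major, minor) from the first 8 bytes of a .class file."""
    magic, minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor


def java_release(major: int) -> int:
    """Map a class-file major version to a Java release (major 49 = Java 5 onward)."""
    if major < 49:
        raise ValueError("pre-Java 5 class file")
    return major - 44


# Header of a class compiled for version 51.0, as in the error above
major, minor = class_file_version(bytes.fromhex("cafebabe00000033"))
print(f"{major}.{minor} requires Java {java_release(major)} or newer")
```

To inspect a real file, pass the first 8 bytes of it, for example class_file_version(open("Example.class", "rb").read(8)), where Example.class is a hypothetical file name.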
If you see the preceding error while installing or working with Elasticsearch, it most probably means that the JAVA_HOME variable points to an incompatible (older) Java version, or is not set at all. Many users install the latest version of Java but forget to point the JAVA_HOME variable to the new installation. If this variable is not set, Elasticsearch looks for Java in the following directories and uses the first one that exists:
/usr/lib/jvm/jdk-7-oracle-x64, /usr/lib/jvm/java-7-oracle, /usr/lib/jvm/java-7-openjdk, /usr/lib/jvm/java-7-openjdk-amd64/, /usr/lib/jvm/java-7-openjdk-armhf, /usr/lib/jvm/java-7-openjdk-i386/, /usr/lib/jvm/default-java
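The lookup order described above can be sketched as follows. This mirrors the described behaviour only; the real logic lives in Elasticsearch's startup and service scripts and may differ between versions and packages. The is_dir parameter is a hypothetical hook added so the function can be exercised without a real filesystem.

```python
import os

# Fallback directories listed in the text, checked in order
CANDIDATE_JAVA_DIRS = [
    "/usr/lib/jvm/jdk-7-oracle-x64",
    "/usr/lib/jvm/java-7-oracle",
    "/usr/lib/jvm/java-7-openjdk",
    "/usr/lib/jvm/java-7-openjdk-amd64/",
    "/usr/lib/jvm/java-7-openjdk-armhf",
    "/usr/lib/jvm/java-7-openjdk-i386/",
    "/usr/lib/jvm/default-java",
]


def resolve_java_home(env=None, candidates=CANDIDATE_JAVA_DIRS, is_dir=os.path.isdir):
    """Return JAVA_HOME if set, else the first existing candidate, else None."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if java_home:
        return java_home
    for path in candidates:
        if is_dir(path):
            return path
    return None


print(resolve_java_home())
```

Running it on the affected machine shows which Java installation would be picked up; if that is an older Java, setting JAVA_HOME explicitly to the newer installation avoids the error.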