Mastering Apache Solr 7.x

By: Sandeep Nair, Chintan Mehta, Dharmesh Vasoya

Overview of this book

Apache Solr is a standalone enterprise search server with a REST-like application interface, providing highly scalable, distributed search and index replication for many of the world's largest internet sites. The book begins by introducing full-text search, multiple-filter search, dynamic clustering, and related techniques to help you brush up on the basics of Apache Solr. You will also explore the new features and advanced options released in Apache Solr 7.x, which bring numerous performance improvements and make data investigation simpler, easier, and more powerful. You will learn to build complex queries and extensive filters, and see how they are compiled in your system to bring relevance to your search tools. You will learn how Solr scoring works, which elements affect a document's score, and how you can optimize or tune the score for the application at hand. You will learn to extract features of documents and write complex queries for re-ranking them. You will also learn advanced options that help you know what content is indexed and how the extracted content is indexed. Throughout the book, you will work through complex problems and their solutions, along with varied approaches to tackling your business needs. By the end of this book, you will have gained the advanced proficiency needed to build out-of-the-box smart search solutions for your enterprise demands.

Introduction to Solr

Solr is one of the most popular enterprise search servers and is widely used across the world. It is written in Java and uses the Lucene Java search library. Solr is an open source project from the Apache Software Foundation (ASF) and is fast, scalable, and well suited to searching for relevant data. Some of the major Solr users are Netflix, SourceForge, Instagram, CNET, and Flipkart. You can check out more such use cases at https://wiki.apache.org/solr/PublicServers.

Some of the features included are as follows:

Full-text search
Faceted search
Dynamic clustering
Hit highlighting
Near-real-time indexing
Rich document handling
Geospatial search
Structured Query Language (SQL) support
Textual search
REST API
JSON, XML, PHP, Ruby, Python, XSLT, Velocity, and custom Java binary output formats over HTTP
GUI admin interface
Replication
Distributed search
Caching of queries, documents, and filters
Auto-suggest
Streaming
Many more features

Solr powers many internet, government, and intranet sites, providing search solutions for e-commerce, blogs, science, research, and so on. Solr can index billions of documents or rows supplied as XML, JSON, or CSV through its HTTP APIs. It can secure your data with authentication, refined down to role-based authorization. Solr is now an integral part of many big data solutions too.
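To make the HTTP indexing and search interface more concrete, here is a minimal sketch in Python that builds, but does not send, an indexing request and a search URL. The host and port assume a default local installation, and the collection name `books` and the helper function names are hypothetical, chosen only for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumptions: a local Solr instance on the default port and a
# collection named "books" (a hypothetical name for this sketch).
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "books"


def build_update_request(docs, commit=True):
    """Build (but do not send) a POST that indexes a list of JSON documents."""
    query = urlencode({"commit": "true" if commit else "false"})
    url = f"{SOLR_BASE}/{COLLECTION}/update?{query}"
    body = json.dumps(docs).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


def build_select_url(query, rows=10):
    """Build the URL for a search against the /select handler, asking for JSON."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{SOLR_BASE}/{COLLECTION}/select?{params}"


request = build_update_request([{"id": "1", "title": "Mastering Apache Solr 7.x"}])
print(request.get_method(), request.full_url)
print(build_select_url("title:solr", rows=5))
```

Passing the request to `urllib.request.urlopen` would send it to a running Solr instance; the sketch stops short of that so it can be read and run without a server.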

History of Solr

Doug Cutting created Lucene, the core technology behind Solr, in 2000.

Solr was created in 2004 by Yonik Seeley at CNET Networks as an in-house project to provide search capability for the CNET Networks website.

Later, in 2006, CNET Networks donated the Solr source code to the ASF. By early 2007, Solr had found its place in some of the top projects. From then on, Solr kept adding new features to attract users and contributors.

Solr 1.3 was released in September 2008. It included major performance enhancements and features such as distributed search.

In January 2009, Yonik Seeley, Grant Ingersoll, and Erik Hatcher, three of the most prominent figures in Solr and enterprise search, joined Lucidworks. Lucidworks started providing commercial support and training for Solr.

Solr 1.4 was released in November 2009. Like the releases before it, 1.4 brought enhancements to indexing, searching, faceting, rich document processing, database integration, plugins, and more.

In 2010, the Lucene and Solr projects were merged, and Solr became an integral subproject of Lucene. Solr downloads were still available separately, but both were now developed by the same set of contributors. In 2011, Solr's versioning was revised to match Lucene's, and the next Solr release was numbered 3.1.

Solr 4.0, released in October 2012, introduced the SolrCloud feature. A number of follow-up releases came out in the 4.x line over the next couple of years. Solr kept adding new features, becoming more scalable and focusing further on reliability.

Solr 5.0 was released in February 2015. With this release, official support for the WAR bundle package ended, and Solr was packaged as a standalone application. Later, version 5.3 added an authentication and authorization framework.

Solr 6.0 was released in April 2016. It included support for executing parallel SQL queries across SolrCloud. It also included stream expression support and a JDBC driver for the SQL interface.

Finally, Solr 7.0 was released in September 2017, followed by 7.1.0 in October 2017, as shown in the following diagram. We will discuss the new features later in this chapter, in the What's new in Solr 7? section.

The preceding image depicts the history of Solr for a better view and understanding.

By now, we have a brief understanding of Solr and its history. We should also understand why we should be using Solr, so let's answer that question too.

Lucene – the backbone of Solr

Lucene is an open source project that provides text search engine libraries. It is widely adopted as the basis of many search engine technologies, and strong community contributions make it a robust technology backend. Lucene is a code library: you write your own application code against its API for searching, indexing, and much more.

For Lucene, a document consists of a collection of fields, which are name-value pairs holding either text or numbers. Lucene can be configured with a text analyzer that tokenizes a field's text into a series of words. The analyzer can also do further processing, such as substituting synonyms. Lucene stores its index on the server's disk. The index is an inverted index: for each field, it maps every term to the documents that contain it, along with the positions of the term within each document's text. Once the index is in place, you can search for documents with a query string, which is parsed according to Lucene's query syntax. Lucene computes a score for each matching document, and the highest-scoring documents are returned first.
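The idea of an inverted index with term positions can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not Lucene's implementation: the tokenizer is a simple lowercase word splitter standing in for a real analyzer, the score is a plain count of matching term occurrences rather than Lucene's actual similarity function, and all function names and sample documents are made up for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (a toy analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map (field, term) -> {doc_id: [positions]} for every field of every document."""
    index = defaultdict(dict)
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for position, term in enumerate(tokenize(text)):
                index[(field, term)].setdefault(doc_id, []).append(position)
    return index


def search(index, field, query):
    """Score documents by how many query-term occurrences the field contains."""
    scores = defaultdict(int)
    for term in tokenize(query):
        for doc_id, positions in index.get((field, term), {}).items():
            scores[doc_id] += len(positions)
    # Highest score first; ties are broken by document id for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


docs = {
    1: {"title": "Solr in action", "body": "Solr builds on Lucene. Lucene indexes text."},
    2: {"title": "Search basics", "body": "An inverted index maps terms to documents."},
}
index = build_index(docs)
print(index[("body", "lucene")])              # {1: [3, 4]}
print(search(index, "body", "lucene index"))  # [(1, 2), (2, 1)]
```

Keeping positions alongside document ids is what makes phrase and proximity matching possible in a real engine; the sketch stores them but only uses their count for scoring.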
