Introduction
An index in a search engine can make or break an application. A well-tuned index built by a well-thought-out indexing process not only reduces future maintenance costs but also lowers the risk of expensive application failures caused by data corruption or a breakdown in the data processing pipeline. Understanding the indexing process is therefore key to building a stable search application.
So far, we have covered the basics of setting up Lucene, adding data to an index, and configuring the analysis process. In this chapter, we will explore the indexing process in more depth and learn advanced techniques for configuring and tuning it.
Let's first review what we have learned so far about Lucene's internal index structure, the inverted index. Consider the following sentences, which pass through StandardAnalyzer before being added to our index:
Humpty Dumpty sat on a wall, Humpty Dumpty had a great fall. All the king's horses...
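As a rough illustration of the structure we are reviewing, the following Java sketch treats each of the three sentences as a separate document and builds a minimal in-memory inverted index: a sorted map from each term to the set of document IDs that contain it. It does not use Lucene's API. The class and method names (InvertedIndexSketch, tokenize, buildIndex) are hypothetical, and tokenize only approximates StandardAnalyzer by lowercasing the text and splitting on characters other than letters, digits, and apostrophes. Depending on the Lucene version, StandardAnalyzer may also remove English stop words such as "a" and "on"; this sketch keeps them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, self-contained sketch of an inverted index; not Lucene's implementation.
class InvertedIndexSketch {

    // Rough approximation of StandardAnalyzer (assumption): lowercase the text and split
    // on anything that is not a letter, digit, or apostrophe. Lucene's real tokenizer
    // follows Unicode word-break rules (UAX #29) and is more sophisticated.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+")) {
            // Strip apostrophes at the edges of a token (e.g. quoting), keep inner ones ("king's")
            String token = raw.replaceAll("^'+|'+$", "");
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Build the inverted index: each term maps to the sorted set of document IDs
    // (its postings list) in which the term occurs.
    static Map<String, SortedSet<Integer>> buildIndex(List<String> docs) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String term : tokenize(docs.get(docId))) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Each sentence is treated as its own document, with IDs 0, 1, and 2
        List<String> docs = List.of(
                "Humpty Dumpty sat on a wall,",
                "Humpty Dumpty had a great fall.",
                "All the king's horses...");
        // Print one line per term, in alphabetical order, with its postings list
        buildIndex(docs).forEach((term, postings) -> System.out.println(term + " -> " + postings));
    }
}
```

Running main prints one line per term in alphabetical order; for example, humpty and dumpty both point to documents 0 and 1, while king's appears only in document 2. Lucene's real index can also record term frequencies, positions, and offsets in compressed on-disk segments, but the mapping from each term to its postings list is the same basic idea.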