Obtaining a common analyzer
Lucene provides a set of default analyzers in the lucene-analyzers-common
module. Let's take a look at them in detail.
Getting ready
The following are five common analyzers Lucene provides in the lucene-analyzers-common
module:
WhitespaceAnalyzer: Splits text at whitespace, just as the name indicates. In fact, this is the only thing this analyzer does.
SimpleAnalyzer: Splits text at non-letter characters and lowercases the resulting tokens.
StopAnalyzer: Splits text at non-letter characters, lowercases the resulting tokens, and removes stopwords. This analyzer works well for plain prose but is a poor fit for content containing words with non-letter characters, such as product model numbers. It comes with a default set of stopwords, but you can supply your own set instead.
StandardAnalyzer: Splits text using grammar-based tokenization, normalizes and lowercases tokens, removes stopwords, and discards punctuation. It can be...
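To make the differences between the first three analyzers concrete, here is a minimal plain-Java sketch. It does not use Lucene at all: the class AnalyzerSketch, its methods, and the short stopword set are hypothetical stand-ins that only approximate the visible token output of WhitespaceAnalyzer, SimpleAnalyzer, and StopAnalyzer for simple input. StandardAnalyzer's grammar-based tokenization is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Plain-JDK approximation (NOT Lucene code) of how three common analyzers
 * split and filter text. Lucene's real analyzers also track offsets and
 * positions; this sketch only mimics the resulting token strings.
 */
class AnalyzerSketch {

    // Hypothetical, abbreviated stopword set; Lucene's default list is longer.
    static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "is", "of"));

    // WhitespaceAnalyzer-like: split on whitespace only; case and punctuation are kept.
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // SimpleAnalyzer-like: split at any run of non-letter characters, then lowercase.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) tokens.add(t.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // StopAnalyzer-like: the SimpleAnalyzer behaviour plus stopword removal.
    static List<String> stop(String text, Set<String> stopwords) {
        List<String> tokens = new ArrayList<>();
        for (String t : simple(text)) {
            if (!stopwords.contains(t)) tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The X-100 model is a Lucene-based tool";
        System.out.println("Whitespace: " + whitespace(text));
        System.out.println("Simple:     " + simple(text));
        System.out.println("Stop:       " + stop(text, STOPWORDS));
    }
}
```

Running main prints [The, X-100, model, is, a, Lucene-based, tool] for the whitespace split, [the, x, model, is, a, lucene, based, tool] for the letter-only split, and [x, model, lucene, based, tool] once stopwords are removed. The model number X-100 survives intact only under the whitespace split; splitting at non-letter characters reduces it to x, which is why StopAnalyzer is a poor choice for such content. Passing a different Set to stop mirrors the idea of supplying your own stopwords.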