Implementing the BM25 model
Let's take a look at how to use the BM25 model in Lucene. Lucene implements this model as BM25Similarity, and we can start using it simply by instantiating it with the default parameters. The constructor also accepts two tuning parameters. The first, k1, controls nonlinear term frequency normalization (saturation); its default value is 1.2. The second, b, controls the degree to which document length normalizes tf values; its default value is 0.75.
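To make the roles of k1 and b concrete, here is a minimal sketch of the textbook BM25 weight for a single term in a single document. The class Bm25Sketch and its methods are hypothetical illustrations, not part of Lucene. Lucene's own computation may differ in details; for example, it stores document length in a compressed norm rather than as an exact value.

```java
/**
 * Hypothetical helper (not part of Lucene): textbook BM25 weight for one term
 * in one document, so the effect of k1 and b can be inspected by hand.
 */
public class Bm25Sketch {

    // Probabilistic IDF in the "1 +" form, which stays positive even for very common terms.
    static double idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Saturating, length-normalized term frequency.
    // k1 controls how quickly repeated occurrences stop adding weight;
    // b controls how strongly docLen / avgDocLen shifts that saturation point.
    static double tfNorm(double tf, double k1, double b, double docLen, double avgDocLen) {
        double lengthNorm = 1.0 - b + b * (docLen / avgDocLen);
        return tf * (k1 + 1.0) / (tf + k1 * lengthNorm);
    }

    // Full per-term score: IDF times the normalized term frequency.
    static double score(double tf, long docFreq, long docCount,
                        double docLen, double avgDocLen, double k1, double b) {
        return idf(docFreq, docCount) * tfNorm(tf, k1, b, docLen, avgDocLen);
    }

    public static void main(String[] args) {
        double k1 = 1.2, b = 0.75; // the default parameter values

        // Saturation: going from tf=1 to tf=10 raises tfNorm far less than tenfold,
        // and tfNorm can never exceed k1 + 1.
        for (int tf : new int[] {1, 2, 5, 10}) {
            System.out.printf("tf=%2d  tfNorm=%.3f%n", tf, tfNorm(tf, k1, b, 100, 100));
        }

        // Length normalization: same tf, but a document twice the average length scores lower.
        System.out.printf("avg-length doc: %.3f%n", tfNorm(3, k1, b, 100, 100));
        System.out.printf("2x-length doc:  %.3f%n", tfNorm(3, k1, b, 200, 100));

        // b = 0 switches length normalization off entirely.
        System.out.printf("2x-length, b=0: %.3f%n", tfNorm(3, k1, 0.0, 200, 100));

        // A rare term (docFreq=1) outweighs a common one (docFreq=50) at equal tf.
        System.out.printf("rare term:   %.3f%n", score(3, 1, 100, 100, 100, k1, b));
        System.out.printf("common term: %.3f%n", score(3, 50, 100, 100, 100, k1, b));
    }
}
```

Raising k1 lets repeated occurrences keep adding weight for longer, while lowering b toward 0 makes long and short documents compete on raw term frequency alone.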
How to do it…
Here is our sample code demonstrating how to use BM25Similarity:
StandardAnalyzer analyzer = new StandardAnalyzer();
Directory directory = new RAMDirectory();
IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
// k1 = 1.2 and b = 0.75, the same values as the defaults
BM25Similarity similarity = new BM25Similarity(1.2f, 0.75f);
// Index with BM25 instead of the default similarity
config.setSimilarity(similarity);
IndexWriter indexWriter = new IndexWriter(directory, config);
Document doc = new Document();
TextField textField = new TextField("content", "", Field.Store.YES);
String[] contents = {"Humpty Dumpty sat...