
Natural Language Processing with Java

Boolean retrieval works, but its output is binary: a term either matches a document or it does not. That is adequate when the collection is small. As the number of documents grows, the results become difficult for a person to work through. Suppose a term X is searched for in 1 million documents and half of them match. The next step is to order those documents on some basis, such as a rank, before showing the results.
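To make the binary nature of Boolean retrieval concrete, here is a minimal sketch. The class name BooleanMatch, its helper methods, and the sample documents are all hypothetical, and the tokenizer is deliberately naive (it only handles lowercase ASCII letters). Note how a document that mentions the term once and one that mentions it three times get exactly the same answer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class; not taken from any library.
class BooleanMatch {

    // Naive tokenizer (an assumption): lowercase, then split on anything that is not a-z.
    static Set<String> terms(String text) {
        Set<String> result = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    // Boolean retrieval: the answer is only true or false, with no notion of "how well".
    static boolean matches(String document, String term) {
        return terms(document).contains(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "Java is used for natural language processing",
                "Java, Java and more Java",
                "Python is popular too");
        for (String doc : docs) {
            // The first two documents both report a match and cannot be told apart.
            System.out.println(matches(doc, "java") + " : " + doc);
        }
    }
}
```

With a handful of documents this is enough. With half a million matches, a flat list of "true" answers gives the reader no place to start.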
If ranking is required, the search engine needs to attach some kind of score to each document. Writing a Boolean query is itself a difficult task for an ordinary user, who has to build the query using and, or, and not. In practice, queries can be as simple as a single word or as complex as a sentence containing many words.
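The idea of attaching a score to each document and ordering by it can be sketched as follows. This is only an illustration under stated assumptions: the class SimpleRanker is hypothetical, and the score used here (the number of document tokens that appear in a free-text query) is a stand-in chosen for brevity, not the weighting scheme of the vector space model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not taken from any library.
class SimpleRanker {

    // Naive tokenizer (an assumption): lowercase, then split on anything that is not a-z.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Stand-in score: how many tokens of the document occur in the query.
    static int score(String document, String query) {
        List<String> queryTerms = tokenize(query);
        int score = 0;
        for (String token : tokenize(document)) {
            if (queryTerms.contains(token)) {
                score++;
            }
        }
        return score;
    }

    // Returns a copy of the documents ordered from highest to lowest score.
    static List<String> rank(List<String> documents, String query) {
        List<String> ranked = new ArrayList<>(documents);
        ranked.sort(Comparator.comparingInt((String d) -> score(d, query)).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "Python is popular too",
                "Java is used for natural language processing",
                "Java, Java and more Java");
        // The query is plain words; the user does not have to write and, or, or not.
        for (String doc : rank(docs, "java language")) {
            System.out.println(score(doc, "java language") + " : " + doc);
        }
    }
}
```

Unlike the Boolean case, the query here is just a few words typed by the user, and every document receives a number that can be sorted on.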
The vector space model can be divided into three stages: