
Apache Solr for Indexing Data

Solr provides a way to prevent duplicate or near-duplicate documents from being indexed by using a signature (fingerprint) field. It supports this kind of deduplication natively through the Signature class, which can also be extended to add new hash and signature implementations.
Let's see how we can implement deduplication in Solr. We'll use our musicCatalog core, which we used in the previous chapter as well, and will modify it:
Copy the musicCatalog core and create a new core called musicCatalog-dedupe from it. After we have created the new core, we'll change schema.xml to add a signature field that will contain the document signature/fingerprint:
<!-- Field to store the fingerprint/signature -->
<field name="signature" type="string" indexed="true" stored="true" required="true" multiValued="false" />
After adding the field, we'll add a new UpdateRequestProcessor element to the solrconfig.xml configuration file, which will...
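As a rough illustration of what such a configuration can look like, the sketch below defines an update processor chain that computes a signature with Solr's SignatureUpdateProcessorFactory and stores it in the signature field. It is not taken from the original text: the chain name dedupe and the field names artistName and songName are hypothetical placeholders, and the values chosen for overwriteDupes and signatureClass are assumptions that should be adapted to the actual musicCatalog-dedupe schema.

```xml
<!-- Sketch only: chain name and field names are assumed, not from the source -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <!-- Turn signature generation on -->
    <bool name="enabled">true</bool>
    <!-- Schema field that receives the computed fingerprint -->
    <str name="signatureField">signature</str>
    <!-- When true, a new document replaces existing ones with the same signature -->
    <bool name="overwriteDupes">true</bool>
    <!-- Hypothetical fields used to compute the signature -->
    <str name="fields">artistName,songName</str>
    <!-- Exact-match hash; TextProfileSignature is the fuzzy alternative -->
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<!-- Route update requests through the chain defined above -->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">dedupe</str>
  </lst>
</requestHandler>
```

With a chain like this in place, two documents whose artistName and songName values are identical would produce the same signature, so the second one overwrites the first instead of being indexed as a separate document. Setting overwriteDupes to false would keep both documents while still recording the signature for later inspection.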