
Neural Search - From Prototype to Production with Jina

The previous section provided an overview of the representation and principles of dense vectors. This section focuses on how these vectors are applied. In our daily work and study, every file has a modality: text, image, audio, video, and so on. If documents of any modality can be represented as dense vectors and mapped into the same vector space, their similarity can be compared across modalities. This also allows us to use one modality to search for data in another.
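As a minimal sketch of cross-modal comparison, the snippet below ranks a few images against a text query by cosine similarity. The three-dimensional vectors and file names are hypothetical stand-ins; in a real system the vectors would come from text and image encoders trained to share one vector space.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings (assumption): made-up values for illustration only.
text_query = [0.9, 0.1, 0.0]  # e.g. the text "red dress"
image_library = {
    "red_dress.jpg": [0.8, 0.2, 0.1],
    "blue_jeans.jpg": [0.1, 0.9, 0.2],
    "city_street.jpg": [0.0, 0.2, 0.9],
}

# Rank images from most to least similar to the text query.
ranked = sorted(
    image_library,
    key=lambda name: cosine_similarity(text_query, image_library[name]),
    reverse=True,
)
best_match = ranked[0]
```

Because both modalities live in the same space, the same similarity function works regardless of which modality the query or the indexed items came from.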
This scenario was first put into practice extensively in e-commerce, where image search is common. Its main application there is using a product photo to find related or similar products, both offline and online.
An e-commerce image search primarily consists of the following steps: preprocessing, feature extraction, and similarity search.
During preprocessing, techniques such as resizing, normalization, and semantic segmentation may be applied to the images. Resizing and normalization make the input image match the input format expected by the pre-trained neural network. Semantic segmentation removes background noise from the image and leaves only the product itself. Of course, we also need to pre-train a neural network for feature extraction, which will be elaborated on shortly. By the same token, if the e-commerce product dataset to be retrieved contains a large amount of noise, such as buildings and pedestrians in the background of fashion photos, it will be necessary to train a semantic segmentation model that can accurately extract the product outline from the photos.
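The resizing and normalization steps can be sketched as follows. This is an illustration on a toy grayscale grid using only the standard library; the nearest-neighbor sampling, the `mean` and `std` values, and the function names are assumptions chosen for the example, and semantic segmentation is omitted because it requires a trained model.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2-D grid of pixel values with nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

def normalize(image, mean, std):
    """Scale 0-255 pixels to [0, 1], then standardize with mean and std."""
    return [[(p / 255.0 - mean) / std for p in row] for row in image]

# Toy 4x4 grayscale image (assumption: real inputs are usually RGB photos).
raw = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [255, 255, 0, 0],
    [255, 255, 0, 0],
]

small = resize_nearest(raw, 2, 2)                # match the network's input size
prepared = normalize(small, mean=0.5, std=0.5)   # values now lie in [-1, 1]
```

In practice the target size and the normalization statistics are dictated by the pre-trained network, so they should be taken from that model's documentation rather than from this sketch.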
During feature extraction, the output of a fully connected (FC) layer of a deep neural network is generally used as the feature vector. Common backbone models include AlexNet, VGGNet, Inception, and ResNet. These models are usually pre-trained on a large-scale dataset (such as ImageNet) for classification tasks. Transfer learning is then carried out on a dataset from the e-commerce domain to adapt the feature extractor to that domain, for example, to fashion. A feature extractor built on deep learning can be regarded as a global feature extractor. In some applications, traditional computer vision features, such as SIFT or VLAD, are also extracted as local features and fused with the global features to enhance the vector representation. The feature extractor transforms the preprocessed image into a dense vector representation.
When users search for images with images, the query itself is also an image. The system generates a dense vector representation of that image and can then find the most similar images by comparing the query vector against the vectors of all images in the library. This is feasible in theory. In reality, however, with the rapid growth in the number of products, there may be tens of millions of indexed image vectors. Comparing the query against every one of them will fail to meet users' expectations of a quick response from the retrieval system.
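The exhaustive comparison described above can be sketched as a brute-force nearest neighbor search. The random vectors, the dimensionality, and the choice of Euclidean distance are assumptions for illustration; the point is that every query touches every indexed vector, so the cost grows linearly with the size of the library.

```python
import math
import random

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_search(query, library, top_k=3):
    """Compare the query against every indexed vector: O(N * d) per query."""
    scored = [(l2_distance(query, vec), idx) for idx, vec in enumerate(library)]
    scored.sort()
    return [idx for _, idx in scored[:top_k]]

random.seed(0)
dim = 8
# Stand-in for the dense vectors of indexed product images.
library = [[random.random() for _ in range(dim)] for _ in range(1000)]

query = list(library[42])  # a copy of an indexed vector, so item 42 must rank first
results = brute_force_search(query, library)
```

With 1,000 vectors this is instantaneous, but the same loop over tens of millions of vectors is what makes exhaustive search too slow for interactive use.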
Therefore, large-scale similarity search techniques, such as product quantization, are generally used to divide the indexed vectors into multiple buckets and match the query against only the most relevant buckets. This greatly speeds up vector matching at the cost of a small loss in recall. Because the results are no longer guaranteed to be exact, this technique is commonly referred to as approximate nearest neighbor (ANN) retrieval. Commonly used ANN libraries include FAISS, maintained by Facebook, and Annoy, maintained by Spotify.
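The bucket idea can be illustrated with a small sketch. Note that this is a simple inverted-file-style partition (each vector is assigned to its closest centroid), not product quantization itself and not the API of FAISS or Annoy. The centroids are sampled at random as a crude stand-in for trained ones, and names such as `n_probe` are assumptions made for the example.

```python
import math
import random

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, candidates, vectors, k):
    """Indices of the k candidates closest to the query."""
    return sorted(candidates, key=lambda i: l2(query, vectors[i]))[:k]

def build_buckets(vectors, centroids):
    """Assign every vector to the bucket of its closest centroid."""
    buckets = {c: [] for c in range(len(centroids))}
    for i, vec in enumerate(vectors):
        closest = min(range(len(centroids)), key=lambda c: l2(vec, centroids[c]))
        buckets[closest].append(i)
    return buckets

def ann_search(query, vectors, centroids, buckets, k=5, n_probe=2):
    """Search only the n_probe buckets whose centroids are nearest the query."""
    order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [i for c in order[:n_probe] for i in buckets[c]]
    return nearest(query, candidates, vectors, k), len(candidates)

random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(2000)]
centroids = random.sample(vectors, 20)  # assumption: untrained, randomly picked
buckets = build_buckets(vectors, centroids)

query = [random.random() for _ in range(8)]
exact = nearest(query, range(len(vectors)), vectors, 5)
approx, n_compared = ann_search(query, vectors, centroids, buckets)

# Fraction of the true top-5 that the approximate search also returned.
recall = len(set(exact) & set(approx)) / len(exact)
```

Only `n_compared` vectors are scored instead of all 2,000, which is where the speed-up comes from. Raising `n_probe` scans more buckets and tends to recover more of the exact neighbors, which is the speed versus recall trade-off behind ANN retrieval.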
The search for images by images is not limited to e-commerce. It is also applicable to other scenarios, such as tourism landmark retrieval (using a picture of a tourist attraction to quickly locate other pictures of that attraction or of similar attractions) and celebrity retrieval (using a photo of a celebrity to retrieve other pictures of them). In the field of search engines, there are many such applications, which are collectively referred to as reverse image search.
Another interesting application is question answering. Neural-network-based search can be powerful when building a question-answering (QA) system for different tasks. First, the questions and answers that are currently available are used as a training dataset to develop a pre-trained text model. When a user enters a question, the model encodes it into a dense vector representation, matches it by similarity against the dense vector representations in the existing repository of answers, and quickly helps the user find an answer. Second, many question-and-answer sites, such as Quora, Stack Overflow, and Zhihu, already hold a large number of previously asked questions. When a user wants to ask a question, the system first determines whether it has already been asked by someone else. If so, the user is advised to check the answers to the similar questions instead of asking again. This also involves similarity matching and is normally referred to as deduplication or paraphrase identification.
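The deduplication check can be sketched as a similarity search with a threshold. The question vectors below are hypothetical values standing in for the output of a pre-trained text encoder, and both the `0.9` threshold and the `find_duplicate` helper are assumptions for illustration.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical dense vectors (assumption): a real system would compute them
# by encoding each stored question with a pre-trained text model.
existing_questions = {
    "How do I reverse a list in Python?": [0.9, 0.1, 0.1],
    "What is the capital of France?": [0.1, 0.9, 0.2],
}

def find_duplicate(new_vector, indexed, threshold=0.9):
    """Return the most similar stored question if it clears the threshold."""
    best = max(indexed, key=lambda q: cosine(new_vector, indexed[q]))
    score = cosine(new_vector, indexed[best])
    return (best, score) if score >= threshold else (None, score)

# Assumed embedding of a paraphrase such as "Python: how to reverse a list?"
match, score = find_duplicate([0.85, 0.15, 0.1], existing_questions)

# Assumed embedding of an unrelated question: nothing clears the threshold.
no_match, low_score = find_duplicate([0.1, 0.1, 0.9], existing_questions)
```

When `match` is not `None`, the system can point the user to the existing question; otherwise the new question is accepted and its vector is added to the index. The threshold controls how aggressively paraphrases are merged and would need tuning on real data.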
Meanwhile, in the real world, a large number of unexplored applications can be built using neural information retrieval. For instance, to search untagged music with text, the text and music representations must be mapped into the same vector space. Likewise, an image can be used to locate the time at which a scene appears in a video. Conversely, while a user is watching a video, a product that appears in it can be retrieved so that the purchase can be completed. Deep learning can also be applied to specialized data retrieval, such as source code retrieval, DNA sequence retrieval, and more!