
The Handbook of NLP with Gensim

Because the text preprocessing steps described earlier are fundamental to NLP, the NLP community has long wanted an open source library that makes them available to more researchers. spaCy was developed and open sourced to meet that need. It is designed particularly for production use, so applications built on it can process massive volumes of text efficiently. Its NLP pipeline runs each assigned NLP task in turn and stores the results as attributes of each token.
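To make the idea of results stored as token attributes concrete, here is a minimal sketch. It assumes spaCy is installed and uses `spacy.blank("en")`, which builds a tokenizer-only English pipeline, so no trained model is required. The sample sentence is made up for illustration.

```python
import spacy

# Assumption: spaCy is installed. spacy.blank("en") creates an English
# pipeline that contains only the tokenizer (no tagger or parser).
nlp = spacy.blank("en")

# Calling the pipeline on raw text returns a Doc made of tokens.
doc = nlp("spaCy processes raw text efficiently.")

# Each token exposes its results as attributes.
for token in doc:
    print(token.text, token.is_alpha, token.is_punct)

# A blank pipeline has no additional components after the tokenizer.
print(nlp.pipe_names)
```

A trained pipeline, loaded with `spacy.load()` and a separately installed model package, adds components such as the tagger and parser, which fill in further attributes such as `token.pos_` and `token.dep_`.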
Figure 3.1 shows how the nlp() pipeline of spaCy works. It takes the raw text, tokenizes it with its tokenizer, tags each tokenized word with its tagger, and so on. The results are stored as attributes:
Figure 3.1 – The spaCy pipeline
Let’s see what they are:
tokenizer: This tokenizes the text and turns a string of text into an NLP object.
tagger and parser: These assign part-of-speech (PoS) tags and dependency labels. The PoS...