Mastering spaCy
Every NLP application consists of several steps of processing the text. As you saw in the first chapter, we always created instances called nlp and doc. But what did we do exactly?
When we call nlp on our text, spaCy applies a series of processing steps. The first step is tokenization, which produces a Doc object. The Doc object is then processed further by a tagger, a parser, and an entity recognizer. This way of processing the text is called a language processing pipeline. Each pipeline component returns the processed Doc and then passes it on to the next component:
Figure 2.1 – A high-level view of the processing pipeline
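The flow in Figure 2.1 can be sketched in plain Python. This is a simplified, illustrative model only, not spaCy's actual implementation: the tokenizer, tagger, parser, and entity_recognizer functions below (and the dict-based Doc stand-in) are hypothetical stand-ins that just show each component annotating the same object and returning it for the next one.

```python
# A simplified, illustrative model of a spaCy-style pipeline.
# (Hypothetical stand-ins, not spaCy's real classes or components.)

def tokenizer(text):
    # Step 1: tokenization produces the Doc (here, a dict holding token dicts)
    return {"tokens": [{"text": t} for t in text.split()]}

def tagger(doc):
    # Annotate each token with a (dummy) part-of-speech tag
    for token in doc["tokens"]:
        token["pos"] = "X"
    return doc

def parser(doc):
    # Annotate each token with a (dummy) dependency label
    for token in doc["tokens"]:
        token["dep"] = "dep"
    return doc

def entity_recognizer(doc):
    # Attach a (dummy, empty) list of named entities
    doc["ents"] = []
    return doc

def nlp(text):
    # The pipeline: tokenize first, then pass the Doc through each
    # component in order; every component returns the processed Doc
    doc = tokenizer(text)
    for component in (tagger, parser, entity_recognizer):
        doc = component(doc)
    return doc

doc = nlp("I went there")
print([t["text"] for t in doc["tokens"]])  # ['I', 'went', 'there']
```

The key point this sketch mirrors is that there is one Doc object flowing through the whole pipeline: each component mutates and returns it rather than producing a new copy.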
A spaCy pipeline object is created when we load a language model. We load an English model and initialize a pipeline in the following code segment:
import spacy
nlp = spacy.load("en_core_web_md")
doc = nlp("I went there")
What exactly happened in the preceding code is as follows...