
The Handbook of NLP with Gensim

As the saying goes, “All great endeavors commence from ground zero,” and NLP’s ground zero is encoding. There are many encoding techniques for representing words together with their context or meaning. Let’s start with the three simplest encoding methods: one-hot encoding, BoW, and Bag of N-grams.
We can apply one-hot encoding to texts. It is closely related to count vectorizing: a one-hot vector records only whether a word is present (1) or absent (0), whereas a count vector records how many times the word occurs. The idea is very simple: we create a vector whose length is the number of unique words in the entire text. At the time of writing this chapter, on a quiet evening, I am listening to the song Never Enough from The Greatest Showman (https://www.imdb.com/title/tt1485796/). Let me use its lyrics as an example:
“All the stars we steal from the night sky
Will never be enough, never be enough, never be enough for me.”
Here, there are two sentences. Each sentence will be converted to a vector. The length...
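To make the idea concrete, here is a minimal sketch in plain Python that one-hot encodes the two lyric lines above. It is an illustration under stated assumptions, not the chapter's own code: the tokenize and one_hot helpers are hypothetical names, and lowercasing the text, stripping punctuation, and sorting the vocabulary alphabetically are choices made here only to keep the example deterministic.

```python
import re

# The two lyric lines quoted above, treated as two separate sentences.
sentences = [
    "All the stars we steal from the night sky",
    "Will never be enough, never be enough, never be enough for me.",
]

def tokenize(text):
    """Lowercase the text and return its word tokens (assumption: punctuation is dropped)."""
    return re.findall(r"[a-z']+", text.lower())

# Vocabulary: every unique word across the entire text.
# Sorting is an assumption that gives each word a stable position in the vector.
vocabulary = sorted({word for s in sentences for word in tokenize(s)})

def one_hot(text):
    """Return a binary vector: 1 if the vocabulary word appears in the text, else 0."""
    present = set(tokenize(text))
    return [1 if word in present else 0 for word in vocabulary]

vectors = [one_hot(s) for s in sentences]

print(vocabulary)
for sentence, vector in zip(sentences, vectors):
    print(vector, "<-", sentence)
```

Each vector has one position per unique word, so both vectors share the same length. Note that "never", "be", and "enough" each appear three times in the second line but still contribute a single 1, because this sketch records presence rather than frequency.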