Data Processing with Optimus

When using text in machine learning, we need to convert text to a list of features a machine learning algorithm can understand. This means that we need to convert text to numbers. To accomplish this, there are two approaches that can be used with Optimus:
Let's see how you can use these methods in Optimus.
In the bag of words approach, we take all the words in the text and count the number of occurrences of each one.
Because a corpus can contain millions of distinct words, it is often useful to keep only the most frequent words after counting, as shown in the following figure:
Figure 9.2 – Bag of words example
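To make the counting and selection steps concrete, here is a minimal sketch in plain Python, independent of Optimus. The helper names `tokenize` and `bag_of_words`, the `top_k` parameter, and the simple regular-expression tokenizer are all assumptions made for illustration; they are not part of the Optimus API, and Optimus may tokenize text differently.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(documents, top_k=None):
    """Return (vocabulary, rows): one row of word counts per document.

    Hypothetical helper for illustration only, not an Optimus function.
    """
    tokenized = [tokenize(doc) for doc in documents]

    # Count every word across the whole corpus.
    corpus_counts = Counter(word for tokens in tokenized for word in tokens)

    # Keep only the top_k most frequent words (all words if top_k is None).
    vocabulary = [word for word, _ in corpus_counts.most_common(top_k)]

    # Build one row per document with the count of each vocabulary word.
    rows = []
    for tokens in tokenized:
        doc_counts = Counter(tokens)
        rows.append([doc_counts[word] for word in vocabulary])
    return vocabulary, rows


docs = ["the cat sat on the mat", "the dog sat"]
vocabulary, rows = bag_of_words(docs, top_k=2)
print(vocabulary)  # ['the', 'sat']
print(rows)        # [[2, 1], [1, 1]]
```

Each entry in `vocabulary` plays the role of a column, and each inner list in `rows` holds the counts for one document.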
To apply bag of words in Optimus, you can use the following code:
_df = df.cols.bag_of_words("text")
This returns a large dataframe in which every word becomes a column name and each row holds the word counts. Because...