This recipe is a thought experiment that should help clarify what machine learning does. Recall the Training your own language model classifier recipe in Chapter 1, Simple Classifiers, in which you trained your own sentiment classifier. Consider what a conservative approach to the same problem might be: build a Map<String,String> from the inputs to the correct class. This recipe will explore how this might work and what its consequences might be.
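A minimal sketch of that lookup-table idea follows. It assumes nothing about how the book's OverfittingClassifier is actually written: the class name MemorizingClassifier, its train and classify methods, and the training pair are hypothetical, and only standard java.util collections are used (no LingPipe API). It shows that "training" reduces to storing exact strings and that classification is an exact-match lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a "memorize everything" classifier. This is NOT the
// book's OverfittingClassifier source; it only illustrates building a
// Map<String,String> from training inputs to their correct classes.
public class MemorizingClassifier {

    // Exact training text -> category label (for example "e" for English).
    private final Map<String, String> memory = new HashMap<>();

    // "Training" is nothing more than remembering the answer for this exact string.
    public void train(String text, String category) {
        memory.put(text, category);
    }

    // Returns the memorized category, or null when this exact string was never
    // seen during training: there is no generalization at all.
    public String classify(String text) {
        return memory.get(text);
    }

    public static void main(String[] args) {
        MemorizingClassifier classifier = new MemorizingClassifier();
        // Assumed training pair, chosen only for demonstration.
        classifier.train("When all else fails #Disney", "e");

        // Seen verbatim in training: the lookup succeeds and prints "e".
        System.out.println(classifier.classify("When all else fails #Disney"));
        // Differs by one character (lowercase 'd'): the lookup fails and prints "null".
        System.out.println(classifier.classify("When all else fails #disney"));
    }
}
```

The design is deliberately naive: exact-match lookup gives a perfect answer for every memorized input and no answer at all for anything else, even a change in capitalization.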
Brace yourself; this will be spectacularly stupid but hopefully informative.
Enter the following on the command line:
java -cp lingpipe-cookbook.1.0.jar:lib/lingpipe-4.1.0.jar:lib/opencsv-2.4.jar com.lingpipe.cookbook.chapter3.OverfittingClassifier
The usual anemic prompt appears, with some user input:
Training
Type a string to be classified. Empty string to quit.
When all else fails #Disney
Category is: e
It correctly identifies the language as e, or English. However, everything else is about to fail. Next, we will...