Learning Data Mining with Python

By: Robert Layton

Overview of this book

If you are a programmer who wants to get started with data mining, then this book is for you.

Improving accuracy using a dictionary


Rather than simply returning the predicted word, we can check whether it actually exists in our dictionary. If it does, that is our prediction. If it does not, we can try to find a similar word in the dictionary and predict that instead. Note that this strategy relies on the assumption that all CAPTCHA words are valid English words, so it would not work for a random sequence of characters. This is one reason why some CAPTCHAs don't use words.
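
The lookup-then-fallback strategy described above can be sketched as follows. This is a minimal illustration, not the book's implementation: the word list `VALID_WORDS` and the function name `improve_prediction` are hypothetical, and the standard library's `difflib.get_close_matches` is used as a stand-in similarity measure.

```python
import difflib

# Hypothetical word list standing in for the real dictionary of valid words.
VALID_WORDS = ["GENE", "GONE", "STEP", "STOP", "SHOP", "BATS"]


def improve_prediction(predicted_word, valid_words=VALID_WORDS):
    """Return the prediction if it is a known word, else the closest known word."""
    if predicted_word in valid_words:
        # The prediction is already a valid word, so keep it unchanged.
        return predicted_word
    # Assumption: difflib's ratio-based matching stands in for whatever
    # similarity measure is eventually chosen. cutoff=0.0 means the best
    # candidate is always returned, however weak the match.
    matches = difflib.get_close_matches(predicted_word, valid_words, n=1, cutoff=0.0)
    # Fall back to the raw prediction if the word list is empty.
    return matches[0] if matches else predicted_word
```

Passing the word list as a parameter keeps the sketch independent of where the dictionary comes from. A misread such as `improve_prediction("STEQ")` is replaced by a word from the list, while a valid word is returned as is.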

There is one issue here: how do we determine the closest word? There are many ways to do this. For instance, we can compare the lengths of words, treating two words of similar length as more similar. However, we more commonly consider words to be similar if they have the same letters in the same positions. This is where the edit distance comes in.
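
To make the two simple notions of similarity mentioned above concrete, here is a small sketch. The function names `length_difference` and `matching_positions` are chosen for illustration and do not come from the book.

```python
def length_difference(word_a, word_b):
    """Absolute difference in length; 0 means the words are equally long."""
    return abs(len(word_a) - len(word_b))


def matching_positions(word_a, word_b):
    """Count the positions at which both words have the same letter."""
    # zip stops at the shorter word, so extra trailing letters are ignored.
    return sum(a == b for a, b in zip(word_a, word_b))
```

Comparing "STEP" with "PETS" shows why length alone is a weak signal: the two words have the same length and even the same letters, yet no position holds the same letter in both.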

Ranking mechanisms for words

The Levenshtein edit distance is a commonly used method for comparing two short strings...
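
The Levenshtein distance is conventionally defined as the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. A minimal pure-Python sketch of the standard dynamic-programming computation follows. The function name `levenshtein` is chosen here for illustration and is not taken from the book, which may rely on a library routine instead.

```python
def levenshtein(source, target):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    # previous[j] is the distance between the already processed prefix of
    # source and target[:j]. For the empty prefix, that is j insertions.
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        # Turning source[:i] into the empty string takes i deletions.
        current = [i]
        for j, target_char in enumerate(target, start=1):
            substitution_cost = 0 if source_char == target_char else 1
            current.append(min(
                previous[j] + 1,                      # delete source_char
                current[j - 1] + 1,                   # insert target_char
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]
```

Only two rows of the distance table are kept at a time, which is enough for short strings such as CAPTCHA words. For example, "STEP" and "STOP" differ by one substitution, so their distance is 1.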
