Sign In Start Free Trial
Account

Add to playlist

Create a Playlist

Modal Close icon
You need to login to use this feature.
  • Book Overview & Buying Learning Data Mining with Python
  • Table Of Contents Toc
  • Feedback & Rating feedback
Learning Data Mining with Python

Learning Data Mining with Python

By : Robert Layton
3.7 (7)
close
close
Learning Data Mining with Python

Learning Data Mining with Python

3.7 (7)
By: Robert Layton

Overview of this book

If you are a programmer who wants to get started with data mining, then this book is for you.
Table of Contents (15 chapters)
close
close
14
Index

Character n-grams


We saw how function words can be used as features to predict the author of a document. Another feature type is character n-grams. An n-gram is a sequence of n objects, where n is a value (for text, generally between 2 and 6). Word n-grams have been used in many studies, usually relating to the topic of the documents. However, character n-grams have proven to be of high quality for authorship attribution.

Character n-grams are found in text documents by representing the document as a sequence of characters. These n-grams are then extracted from this sequence and a model is trained. There are a number of different models for this, but a standard one is very similar to the bag-of-words model we have used earlier.

For each distinct n-gram in the training corpus, we create a feature for it. An example of an n-gram is <e t>, which is the letter e, a space, and then the letter t (the angle brackets are used to denote the start and end of the n-gram and aren't part of it)....

Unlock full access

Continue reading for free

A Packt free trial gives you instant online access to our library of over 7000 practical eBooks and videos, constantly updated with the latest in tech

Create a Note

Modal Close icon
You need to login to use this feature.
notes
bookmark search playlist download font-size

Change the font size

margin-width

Change margin width

day-mode

Change background colour

Close icon Search
Country selected

Close icon Your notes and bookmarks

Delete Bookmark

Modal Close icon
Are you sure you want to delete it?
Cancel
Yes, Delete

Delete Note

Modal Close icon
Are you sure you want to delete it?
Cancel
Yes, Delete

Edit Note

Modal Close icon
Write a note (max 255 characters)
Cancel
Update Note

Confirmation

Modal Close icon
claim successful

Buy this book with your credits?

Modal Close icon
Are you sure you want to buy this book with one of your credits?
Close
YES, BUY