Sign In Start Free Trial
Account

Add to playlist

Create a Playlist

Modal Close icon
You need to login to use this feature.
  • Book Overview & Buying Learning Data Mining with Python
  • Table Of Contents Toc
  • Feedback & Rating feedback
Learning Data Mining with Python

Learning Data Mining with Python

By : Robert Layton
close
close
Learning Data Mining with Python

Learning Data Mining with Python

By: Robert Layton

Overview of this book

This book teaches you to design and develop data mining applications using a variety of datasets, starting with basic classification and affinity analysis. This book covers a large number of libraries available in Python, including the Jupyter Notebook, pandas, scikit-learn, and NLTK. You will gain hands on experience with complex data types including text, images, and graphs. You will also discover object detection using Deep Neural Networks, which is one of the big, difficult areas of machine learning right now. With restructured examples and code samples updated for the latest edition of Python, each chapter of this book introduces you to new algorithms and techniques. By the end of the book, you will have great insights into using Python for data mining and understanding of the algorithms as well as implementations.
Table of Contents (14 chapters)
close
close

Applying of Naive Bayes


We will now create a pipeline that takes a tweet and determines whether it is relevant or not, based only on the content of that tweet.

To perform the word extraction, we will be using the spaCy, a library that contains a large number of tools for performing analysis on natural language. We will use spaCy in future chapters as well.

Note

To get spaCy on your computer, use pip to install the package: pip install spacyIf that doesn't work, see the spaCy installation instructions at https://spacy.io/ for information specific to your platform.

We are going to create a pipeline to extract the word features and classify the tweets using Naive Bayes. Our pipeline has the following steps:

  • Transform the original text documents into a dictionary of counts using spaCy's word tokenization.
  • Transform those dictionaries into a vector matrix using the DictVectorizertransformer in scikit-learn. This is necessary to enable the Naive Bayes classifier to read the feature values extracted...

Unlock full access

Continue reading for free

A Packt free trial gives you instant online access to our library of over 7000 practical eBooks and videos, constantly updated with the latest in tech

Create a Note

Modal Close icon
You need to login to use this feature.
notes
bookmark search playlist download font-size

Change the font size

margin-width

Change margin width

day-mode

Change background colour

Close icon Search
Country selected

Close icon Your notes and bookmarks

Delete Bookmark

Modal Close icon
Are you sure you want to delete it?
Cancel
Yes, Delete

Delete Note

Modal Close icon
Are you sure you want to delete it?
Cancel
Yes, Delete

Edit Note

Modal Close icon
Write a note (max 255 characters)
Cancel
Update Note

Confirmation

Modal Close icon
claim successful

Buy this book with your credits?

Modal Close icon
Are you sure you want to buy this book with one of your credits?
Close
YES, BUY