Learning Data Mining with Python

By : Robert Layton

Overview of this book

This book teaches you to design and develop data mining applications using a variety of datasets, starting with basic classification and affinity analysis. It covers a large number of libraries and tools available in Python, including the Jupyter Notebook, pandas, scikit-learn, and NLTK. You will gain hands-on experience with complex data types, including text, images, and graphs. You will also discover object detection using Deep Neural Networks, one of the most challenging areas of machine learning right now. With restructured examples and code samples updated for the latest version of Python, each chapter of this book introduces you to new algorithms and techniques. By the end of the book, you will have gained insight into using Python for data mining, along with an understanding of both the algorithms and their implementations.
Table of Contents (14 chapters)

Introducing data mining

Data mining provides a way for a computer to learn how to make decisions with data. This decision could be predicting tomorrow's weather, blocking a spam email from entering your inbox, detecting the language of a website, or finding a new romance on a dating site. There are many different applications of data mining, with new applications being discovered all the time.

Data mining is part algorithm design, part statistics, part engineering, part optimization, and part computer science. Combined with these base skills, however, we also need to apply domain knowledge (expert knowledge) of the area in which we are applying data mining. Domain knowledge is critical for going from good results to great results. Applying data mining effectively usually requires this domain-specific knowledge to be integrated with the algorithms.

Most data mining applications work with the same high-level view, where a model learns from some data and is applied to other data, although the details often change quite considerably.

Data mining applications involve creating datasets and tuning the algorithm, as explained in the following steps:

  1. We start our data mining process by creating a dataset that describes an aspect of the real world. Datasets comprise the following two aspects:
     • Samples: These are objects in the real world, such as a book, photograph, animal, person, or any other object. Samples are also referred to as observations, records, or rows, among other naming conventions.
     • Features: These are descriptions or measurements of the samples in our dataset. Features could be the length, the frequency of a specific word, the number of legs on an animal, the date it was created, and so on. Features are also referred to as variables, columns, attributes, or covariates, again among other naming conventions.
  2. The next step is tuning the data mining algorithm. Each data mining algorithm has parameters, either within the algorithm or supplied by the user. This tuning allows the algorithm to learn how to make decisions about the data.

As a simple example, we may wish the computer to be able to categorize people as short or tall. We start by collecting our dataset, which includes the heights of different people and whether they are considered short or tall:

Person  Height (cm)  Short or tall?
1       155          Short
2       165          Short
3       175          Tall
4       185          Tall
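As a minimal sketch, the table above can be written down in Python using the common convention (the one scikit-learn follows) of one row per sample and one column per feature, with the known classes kept in a separate list. The names `X` and `y` are our own choice for illustration, not something defined in the text.

```python
# Each inner list is one sample (a person); each column is one feature.
# Here there is a single feature: height in centimetres.
X = [[155], [165], [175], [185]]

# The known class of each sample, in the same order as the rows of X.
y = ["Short", "Short", "Tall", "Tall"]

n_samples = len(X)      # number of rows
n_features = len(X[0])  # number of columns

print(f"{n_samples} samples, {n_features} feature(s)")
```

Keeping the labels separate from the features makes it easy to hand the features to an algorithm and later check its decisions against the known answers.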

As explained above, the next step involves tuning the parameters of our algorithm. As a simple algorithm: if the height is more than x, the person is tall; otherwise, they are short. Our training algorithm will then look at the data and decide on a good value for x. For the preceding data, a reasonable value for this threshold would be 170 cm, because it lies between the tallest short person (165 cm) and the shortest tall person (175 cm) and therefore classifies every person in the dataset correctly. A person taller than 170 cm is considered tall by the algorithm; anyone else is considered short by this measure. This then lets our algorithm classify new data, such as a person with a height of 167 cm (short, by this rule), even though we may never have seen a person with those measurements before.
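The threshold search described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the book's code: the function names are hypothetical, and using the midpoints between neighbouring heights as the candidate values of x is our assumption about how a "good value" is chosen. Each candidate is scored by how many training samples it misclassifies, and the one with the fewest errors is kept.

```python
def predict(height, threshold):
    """Apply the rule: taller than the threshold is Tall, otherwise Short."""
    return "Tall" if height > threshold else "Short"


def training_errors(threshold, heights, labels):
    """Count how many training samples the given threshold misclassifies."""
    return sum(
        1 for height, label in zip(heights, labels)
        if predict(height, threshold) != label
    )


def fit_threshold(heights, labels):
    """Choose the candidate threshold with the fewest training errors.

    Assumption: candidates are the midpoints between consecutive sorted
    heights. Requires at least two samples. Ties go to the smallest candidate.
    """
    sorted_heights = sorted(heights)
    candidates = [(a + b) / 2 for a, b in zip(sorted_heights, sorted_heights[1:])]
    return min(candidates, key=lambda x: training_errors(x, heights, labels))


heights = [155, 165, 175, 185]
labels = ["Short", "Short", "Tall", "Tall"]

x = fit_threshold(heights, labels)
print(x)                # 170.0
print(predict(167, x))  # Short
```

Of the three candidates (160, 170, and 180 cm), only 170 cm makes no errors on the training data, which matches the value chosen in the text; the new 167 cm person is then classified as short.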

In the preceding data, we had an obvious feature type. We wanted to know if people are short or tall, so we collected their heights. Deciding which features to collect, known as feature engineering, is a critical problem in data mining. In later chapters, we will discuss methods for choosing good features to collect in your dataset. Ultimately, this step often requires some expert domain knowledge, or at least some trial and error.

In this book, we will introduce data mining through Python. In some cases, we favor clarity of code and workflows over the most optimized way to perform every task. This clarity sometimes involves skipping some details that could improve the algorithm's speed or effectiveness.
