Learning Data Mining with Python

By : Robert Layton

Overview of this book

This book teaches you to design and develop data mining applications using a variety of datasets, starting with basic classification and affinity analysis. It covers a large number of libraries available in Python, including the Jupyter Notebook, pandas, scikit-learn, and NLTK. You will gain hands-on experience with complex data types including text, images, and graphs. You will also discover object detection using Deep Neural Networks, which is one of the big, difficult areas of machine learning right now. With restructured examples and code samples updated for the latest version of Python, each chapter of this book introduces you to new algorithms and techniques. By the end of the book, you will have gained insight into using Python for data mining and an understanding of the algorithms as well as their implementations.

Using Python and the Jupyter Notebook

In this section, we will cover installing Python and the environment that we will use for most of the book, the Jupyter Notebook. Furthermore, we will install the NumPy module, which we will use for the first set of examples.

The Jupyter Notebook was, until very recently, called the IPython Notebook. You'll notice the term in web searches for the project. Jupyter is the new name, representing a broadening of the project beyond using just Python.

Installing Python

Python is a fantastic, versatile, and easy-to-use programming language.

For this book, we will be using Python 3.5, which is available for your system from the Python Organization's website https://www.python.org/downloads/. However, I recommend that you use Anaconda to install Python, which you can download from the official website at https://www.continuum.io/downloads.

There will be two major versions to choose from, Python 3.5 and Python 2.7. Remember to download and install Python 3.5, which is the version tested throughout this book. Follow the installation instructions on that website for your system. If you have a strong reason to learn version 2 of Python, then do so by downloading the Python 2.7 version. Keep in mind that some code may not work as in the book, and some workarounds may be needed.

In this book, I assume that you have some knowledge of programming and Python itself. You do not need to be an expert with Python to complete this book, although a good level of knowledge will help. I will not be explaining general code structures and syntax in this book, except where it is different from what is considered normal Python coding practice.

If you do not have any experience with programming, I recommend that you pick up the Learning Python book from Packt Publishing, or the book Dive Into Python, available online at www.diveintopython3.net.

The Python organization also maintains two lists of online tutorials for those new to Python:

  • For non-programmers who want to learn to program through the Python language: https://wiki.python.org/moin/BeginnersGuide/NonProgrammers
  • For those who already know how to program, but need to learn Python specifically: https://wiki.python.org/moin/BeginnersGuide/Programmers

Windows users will need to set an environment variable to use Python from the command line, whereas on other systems Python will usually be immediately executable. We set it in the following steps:

  1. First, find where you installed Python 3 on your computer; the default location is C:\Python35.
  2. Next, enter this command into the command line (cmd program): set PATH=%PATH%;C:\Python35

Remember to change C:\Python35 if your installation of Python is in a different folder.
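
Whether typing python at a prompt works depends on the folders listed in the PATH environment variable. The following is a minimal sketch, not part of the book's own code, for checking whether a folder appears there. The helper name dir_on_path is hypothetical, and the case-insensitive comparison assumes Windows-style paths.

```python
import os
import shutil


def dir_on_path(directory, path_value, sep=os.pathsep):
    """Return True if `directory` is one of the entries in `path_value`."""
    def normalise(entry):
        # Assumption: Windows-style matching, so ignore surrounding
        # whitespace, trailing slashes, and letter case.
        return entry.strip().rstrip("\\/").lower()

    wanted = normalise(directory)
    entries = [e for e in path_value.split(sep) if e.strip()]
    return any(normalise(entry) == wanted for entry in entries)


if __name__ == "__main__":
    # C:\Python35 is the folder named in the text; change it if yours differs.
    print("On PATH:", dir_on_path(r"C:\Python35", os.environ.get("PATH", "")))
    # shutil.which reports the executable the shell would run, or None.
    print("python resolves to:", shutil.which("python"))
```

Note that a variable set with the set command only lasts for the current cmd window, so the check is best run from that same window.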

Once you have Python running on your system, you should be able to open a command prompt and run the following code to be sure it has installed correctly.

    $ python
    Python 3.5.1 (default, Apr 11 2014, 13:05:11)
    [GCC 4.8.2] on Linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print("Hello, world!")
    Hello, world!
    >>> exit()

Note that we will be using the dollar sign ($) to denote a command that you type into the terminal (also called a shell, or cmd on Windows). You do not need to type this character (or retype anything that already appears on your screen). Just type in the rest of the line and press Enter.
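
Since the book's code is tested on Python 3.5, it can help to confirm the interpreter version from inside Python as well. This is a minimal sketch; the function name is_supported is hypothetical, and the (3, 5) minimum is taken from the version named in the text.

```python
import sys


def is_supported(version_info, minimum=(3, 5)):
    """Return True if the (major, minor, ...) tuple is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # sys.version starts with the version number, e.g. "3.5.1 (default, ...)".
    print("Running Python", sys.version.split()[0])
    if not is_supported(sys.version_info):
        print("Older than Python 3.5; some examples may need workarounds.")
```

Save the code to a file and run it with $ python followed by the file name, or paste it into the interactive interpreter.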

After you have the above "Hello, world!" example running, exit the program and move on to installing a more advanced environment to run Python code, the Jupyter Notebook.

Python 3.5 includes a program called pip, a package manager that helps you install new libraries on your system. You can verify that pip is working by running the $ pip freeze command, which lists the packages you have installed. Anaconda also installs its own package manager, conda. If unsure, use conda first, and use pip only if that fails.
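
The output of pip freeze is a list of name==version lines, which is easy to inspect from Python. The sketch below is an illustration rather than the book's code. The helper names parse_freeze and installed_packages are hypothetical, and only plain name==version lines are handled.

```python
import subprocess
import sys


def parse_freeze(text):
    """Turn `pip freeze` output into a {package_name: version} dict."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Only plain "name==version" lines are handled; comments, editable
        # installs ("-e ...") and lines without "==" are skipped.
        if not line or line.startswith(("#", "-")) or "==" not in line:
            continue
        name, version = line.split("==", 1)
        packages[name.strip().lower()] = version.strip()
    return packages


def installed_packages():
    """Run pip for the current interpreter and parse what it reports."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_freeze(result.stdout)
```

Calling installed_packages().get("numpy") would then return the installed NumPy version, or None if pip does not list it.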

Installing Jupyter Notebook

Jupyter is a platform for Python development that contains some tools and environments for running Python and has more features than the standard interpreter. It contains the powerful Jupyter Notebook, which allows you to write programs in a web browser. It also formats your code, shows output, and allows you to annotate your scripts. It is a great tool for exploring datasets and we will be using it as our main environment for the code in this book.

To install the Jupyter Notebook on your computer, you can type the following into a command line prompt (not into Python):

    $ conda install jupyter notebook

You will not need administrator privileges to install this, as Anaconda keeps packages in the user's directory.
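
To confirm that an install succeeded without launching anything, you can ask Python whether a module is importable. This is a minimal sketch with a hypothetical helper name, is_installed. It assumes the Jupyter Notebook package is importable as notebook and NumPy as numpy.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level `module_name` can be found for import."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in ("notebook", "numpy"):
        print(name, "found" if is_installed(name) else "missing")
```

The check only locates the module and does not import it, so it is quick even for large libraries.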

With the Jupyter Notebook installed, you can launch it with the following:

    $ jupyter notebook

Running this command will do two things. First, it will create a Jupyter Notebook instance (the backend) that will run in the command prompt you just used. Second, it will launch your web browser and connect to this instance, allowing you to create a new notebook. The page that opens lists the files in your current working directory (shown as /home/bob in the book's screenshot).

To stop the Jupyter Notebook from running, open the command prompt that has the instance running (the one you used earlier to run the jupyter notebook command). Then, press Ctrl + C and you will see the prompt Shutdown this notebook server (y/[n])?. Type y and press Enter, and the Jupyter Notebook will shut down.

Installing scikit-learn

The scikit-learn package is a machine learning library, written in Python (but also containing code in other languages). It contains numerous algorithms, datasets, utilities, and frameworks for performing machine learning. Scikit-learn is built upon the scientific Python stack, including libraries such as NumPy and SciPy for speed. Scikit-learn is fast and scalable in many instances and useful for all skill ranges from beginners to advanced research users. We will cover more details of scikit-learn in Chapter 2, Classifying with scikit-learn Estimators.

To install scikit-learn, you can use the conda utility that comes with Anaconda, which will also install the NumPy and SciPy libraries if you do not already have them. Open a terminal with administrator/root privileges and enter the following command:

    $ conda install scikit-learn

Users of major Linux distributions such as Ubuntu or Red Hat may wish to install the official package from their package manager.

Not all distributions have the latest versions of scikit-learn, so check the version before installing it. The minimum version needed for this book is 0.14. My recommendation for this book is to use Anaconda to manage this for you, rather than installing using your system's package manager.
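
One way to check an installed version against the 0.14 minimum is to compare version numbers as tuples. The sketch below is an illustration with hypothetical helper names, version_tuple and meets_minimum. It ignores pre-release suffixes such as rc1 or .dev0, which is a simplification.

```python
import re


def version_tuple(version_string):
    """Extract the leading numeric components, e.g. '0.18.1' -> (0, 18, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError("no leading version number in %r" % version_string)
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version_string, minimum=(0, 14)):
    """Return True if `version_string` is at least `minimum`."""
    return version_tuple(version_string) >= tuple(minimum)


if __name__ == "__main__":
    try:
        import sklearn  # the import name of the scikit-learn package
    except ImportError:
        print("scikit-learn is not installed")
    else:
        status = "OK" if meets_minimum(sklearn.__version__) else "too old"
        print("scikit-learn", sklearn.__version__, status)
```

Comparing tuples of integers avoids the trap of comparing version strings directly, where "0.9" would sort after "0.14".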

Those wishing to install the latest version by compiling the source, or to view more detailed installation instructions, can go to http://scikit-learn.org/stable/install.html and refer to the official documentation on installing scikit-learn.
