IPython Interactive Computing and Visualization Cookbook

In this recipe, we will give an introduction to IPython and Jupyter for data analysis. Most of the subject has been covered in the prequel of this book, Learning IPython for Interactive Computing and Data Visualization, Second Edition, Packt Publishing, but we will review the basics here.
We will download and analyze a dataset about attendance on Montreal's bicycle tracks. This example is largely inspired by a presentation from Julia Evans (available at https://github.com/jvns/talks/blob/master/2013-04-mtlpy/pistes-cyclables.ipynb). Specifically, we will introduce the following:
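We begin by importing NumPy, pandas, and Matplotlib, and we instruct Matplotlib to render figures inline in the Notebook:
>>> import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    %matplotlib inline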
We can enable high-resolution Matplotlib figures on Retina display systems with the following commands:
>>> from IPython.display import set_matplotlib_formats
    set_matplotlib_formats('retina')
We create a new Python variable called url that contains the address of a Comma-separated Values (CSV) data file. This standard text-based file format is used to store tabular data:
>>> url = ("https://raw.githubusercontent.com/"
           "ipython-books/cookbook-2nd-data/"
           "master/bikes.csv")
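pandas provides a read_csv() function that reads CSV files. Here, we pass the URL of the file. pandas will automatically download the file, parse it, and return a DataFrame object. We specify a few options to make sure that the dates are parsed correctly: index_col='Date' uses the Date column as the index, parse_dates=True converts that index to dates, and dayfirst=True reads ambiguous dates in day/month order:
>>> df = pd.read_csv(url, index_col='Date',
                     parse_dates=True, dayfirst=True)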
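To see what the dayfirst option changes, here is a minimal sketch that parses a small in-memory CSV rather than the bikes.csv file. The two sample rows and the sample_csv name are made up for illustration; they only mimic a day/month/year date layout.

```python
import io

import pandas as pd

# Hypothetical two-row sample with day/month/year dates (not the real data).
sample_csv = "Date,Berri1\n01/02/2013,10\n02/02/2013,20\n"

# With dayfirst=True, "01/02/2013" is read as 1 February 2013.
day_first = pd.read_csv(io.StringIO(sample_csv), index_col="Date",
                        parse_dates=True, dayfirst=True)

# Without it, the same string is read as 2 January 2013.
month_first = pd.read_csv(io.StringIO(sample_csv), index_col="Date",
                          parse_dates=True)

print(day_first.index[0].month, month_first.index[0].month)
```

The same text yields different dates depending on the option, which is why it matters for files that store the day first.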
The df variable contains a DataFrame object, a specific pandas data structure that contains 2D tabular data. The head(n) method displays the first n rows of this table. In the Notebook, pandas displays a DataFrame object as an HTML table:
>>> df.head(2)
Here, each row corresponds to one day and contains the number of bicycles counted on each of the city's tracks.
We can get summary statistics of the table with the describe() method:
>>> df.describe()
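As a rough illustration of what describe() returns for numeric columns, the following sketch runs it on a tiny table. The counts DataFrame and its values are invented for the example; only the column names echo the dataset.

```python
import pandas as pd

# Hypothetical counts for two tracks over four days (not the real data).
counts = pd.DataFrame({"Berri1": [10, 20, 30, 40],
                       "PierDup": [1, 2, 3, 4]})

# describe() returns a DataFrame with one row per statistic
# and one column per numeric column of the input.
stats = counts.describe()
print(stats.loc[["count", "mean", "min", "max"]])
```

Because the result is itself a DataFrame, individual statistics can be read with the usual indexing, for example stats.loc["mean", "Berri1"].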
Let's plot the data. We first select two columns, Berri1 and PierDup. Then, we call the plot() method:
>>> df[['Berri1', 'PierDup']].plot(figsize=(10, 6),
                                   style=['-', '--'],
                                   lw=2)
The index attribute of the DataFrame object contains the dates of all rows in the table. This index has a few date-related attributes and methods. For example, day_name() returns the name of the weekday of each row (older pandas versions exposed this as the weekday_name attribute, which was removed in pandas 1.0):
>>> df.index.day_name()
    Index(['Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday', 'Monday', 'Tuesday', ... 'Friday', 'Saturday', 'Sunday', 'Monday', 'Tuesday', 'Wednesday'], dtype='object', name='Date', length=261)
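Here is a small sketch of the weekday helpers on a DatetimeIndex, using three made-up dates. It relies on day_name(), the method available in pandas 0.23 and later; the weekday_name attribute no longer exists in pandas 1.0 and later.

```python
import pandas as pd

# Three consecutive days; 1 January 2013 was a Tuesday.
dates = pd.date_range("2013-01-01", periods=3, freq="D")

names = dates.day_name()   # Weekday names as strings.
numbers = dates.weekday    # Integers, with Monday as 0.

print(list(names))
print(list(numbers))
```

The names are convenient for display, whereas the integers sort in calendar order.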
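To get the attendance as a function of the weekday, we need to group the rows of the table by weekday. The groupby() method lets us do just that. We group by the integer weekday attribute rather than by the weekday names to keep the weekday order (Monday is 0, Tuesday is 1, and so on). Once grouped, we can sum all rows in every group:
>>> df_week = df.groupby(df.index.weekday).sum()
>>> df_week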
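The grouping step can be sketched on synthetic data, which makes the result easy to check by hand. The counts table below is hypothetical: two full weeks with one bicycle counted per day.

```python
import pandas as pd

# Hypothetical data: 14 consecutive days starting on a Monday,
# with one bicycle counted per day.
dates = pd.date_range("2013-01-07", periods=14, freq="D")
counts = pd.DataFrame({"Berri1": [1] * 14}, index=dates)

# Group the rows by integer weekday (Monday is 0) and sum each group.
by_weekday = counts.groupby(counts.index.weekday).sum()
print(by_weekday["Berri1"].tolist())
```

Each weekday occurs twice in the two-week range, so every group sums to 2, and the resulting index runs from 0 to 6 in calendar order.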
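We can now display this information in a figure. We use the plot() method of DataFrame to create our plot:
>>> fig, ax = plt.subplots(1, 1, figsize=(10, 8))
    df_week.plot(style='-o', lw=3, ax=ax)
    ax.set_xlabel('Weekday')
    # We replace the labels 0, 1, 2... by the weekday
    # names. Fixing the tick positions first keeps the
    # seven labels aligned with the seven weekdays.
    ax.set_xticks(range(7))
    ax.set_xticklabels(
        ('Monday,Tuesday,Wednesday,Thursday,'
         'Friday,Saturday,Sunday').split(','))
    ax.set_ylim(0)  # Set the lower y-axis limit to 0.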
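Finally, to make the plot interactive, we add the @interact decorator above our plotting function. The n=(1, 30) default turns the argument into an integer slider ranging from 1 to 30, which here sets the window of a rolling mean:
>>> from ipywidgets import interact
    @interact
    def plot(n=(1, 30)):
        fig, ax = plt.subplots(1, 1, figsize=(10, 8))
        df['Berri1'].rolling(window=n).mean().plot(ax=ax)
        ax.set_ylim(0, 7000)
        plt.show()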
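The smoothing applied inside the plotting function is a rolling mean. A minimal sketch on a short made-up series shows what rolling(window=n).mean() computes for n equal to 3:

```python
import pandas as pd

# Hypothetical series standing in for the Berri1 column.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Each value is the mean of the current row and the two rows before it.
# The first two entries are NaN because the window is not yet full.
smoothed = s.rolling(window=3).mean()
print(smoothed.tolist())
```

A larger window averages over more days, which gives a smoother curve with more missing values at the start.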
To create Matplotlib figures, it is good practice to create a Figure object (fig) and one or several Axes objects (subplots, ax) with the plt.subplots() command. The figsize keyword argument lets us specify the size of the figure, in inches. Then, we call plotting methods directly on the Axes instances. Here, for example, we set the y limits of the axes with the set_ylim() method. If there are existing plotting commands, like the plot() method provided by pandas on DataFrame instances, we can pass the relevant Axes instance with the ax keyword argument.
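The pattern described above can be sketched outside the Notebook as follows. The data table is invented for the example, and the Agg backend is selected only so that the code runs without a display; in the Notebook, %matplotlib inline plays that role.

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend, so no display is needed.
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical table with two columns to plot.
data = pd.DataFrame({"a": [1, 3, 2], "b": [2, 1, 3]})

# Create the Figure and a single Axes explicitly.
fig, ax = plt.subplots(1, 1, figsize=(6, 4))

# Let pandas draw on our Axes instead of creating its own figure.
returned = data.plot(ax=ax)

# Further customization goes through the Axes methods.
ax.set_ylim(0, 4)
```

Passing ax keeps all drawing on one Axes object, so pandas plotting and direct Matplotlib calls can be combined on the same figure.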
pandas is the main data wrangling library in Python. Other tools and methods are generally required for more advanced analyses (signal processing, statistics, and mathematical modeling). We will cover these steps in the second part of this book, starting with Chapter 7, Statistical Data Analysis.