Pandas 1.x Cookbook

We can select a single column by passing the column name to the index operator of a DataFrame. This was covered in the Selecting a column recipe in Chapter 1, Pandas Foundations. It is often necessary to focus on a subset of the current working dataset, which is accomplished by selecting multiple columns.
In this recipe, all the actor and director columns will be selected from the movie dataset.
>>> import pandas as pd
>>> import numpy as np
>>> movies = pd.read_csv("data/movie.csv")
>>> movie_actor_director = movies[
...     [
...         "actor_1_name",
...         "actor_2_name",
...         "actor_3_name",
...         "director_name",
...     ]
... ]
>>> movie_actor_director.head()
  actor_1_name actor_2_name actor_3_name director_name
0  CCH Pounder  Joel Dav...    Wes Studi   James Ca...
1  Johnny Depp  Orlando ...  Jack Dav...   Gore Ver...
2  Christop...  Rory Kin...  Stephani...    Sam Mendes
3    Tom Hardy  Christia...  Joseph G...   Christop...
4  Doug Walker   Rob Walker          NaN   Doug Walker
A single column can be selected either as a DataFrame, by passing a one-element list, or as a Series, by passing a string:
>>> type(movies[["director_name"]])
<class 'pandas.core.frame.DataFrame'>
>>> type(movies["director_name"])
<class 'pandas.core.series.Series'>
We can also use .loc to pull out a column by name. Because this index operation requires that we pass in a row selector first, we use a colon (:) to indicate a slice that selects all of the rows. This can also return either a DataFrame or a Series:
>>> type(movies.loc[:, ["director_name"]])
<class 'pandas.core.frame.DataFrame'>
>>> type(movies.loc[:, "director_name"])
<class 'pandas.core.series.Series'>
The DataFrame index operator is flexible and accepts several kinds of objects. If a string is passed, it returns a one-dimensional Series. If a list is passed, it returns a DataFrame containing the listed columns in the order given in the list.
Step 2 shows how to select a single column as a DataFrame and as a Series. Usually, a single column is selected with a string, resulting in a Series. When a DataFrame is desired, put the column name in a single-element list.
Step 3 shows how to use the loc attribute to pull out a Series or a DataFrame.
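The rules described above can be checked on a small DataFrame. The following is a minimal sketch, not part of the recipe itself: the films DataFrame and its placeholder values are hypothetical stand-ins for the movie dataset, so the code runs without data/movie.csv.

```python
import pandas as pd

# Hypothetical stand-in for the movie dataset (placeholder values).
films = pd.DataFrame(
    {
        "director_name": ["Director A", "Director B"],
        "actor_1_name": ["Actor A", "Actor C"],
        "actor_2_name": ["Actor B", "Actor D"],
    }
)

# A string key returns a one-dimensional Series.
as_series = films["director_name"]

# A one-element list returns a DataFrame with a single column.
as_frame = films[["director_name"]]

# A longer list returns the columns in the order given in the list,
# not the order in which they appear in the original DataFrame.
reordered = films[["actor_2_name", "director_name"]]

# .loc with ":" as the row selector follows the same string-versus-list rule.
loc_series = films.loc[:, "director_name"]
loc_frame = films.loc[:, ["director_name"]]

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
print(list(reordered.columns))
```

The shape of each result makes the difference visible: the Series has a single dimension, while the one-column DataFrame keeps both a row and a column axis.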
Passing a long list inside the indexing operator might cause readability issues. To help with this, you may save all your column names to a list variable first. The following code achieves the same result as step 1:
>>> cols = [
...     "actor_1_name",
...     "actor_2_name",
...     "actor_3_name",
...     "director_name",
... ]
>>> movie_actor_director = movies[cols]
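Once the names live in a list variable, the list can also be built from the DataFrame's own column index rather than typed by hand. This is one possible sketch of that idea, assuming the wanted columns share the actor_ prefix or are named director_name; the films DataFrame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the movie dataset (placeholder values).
films = pd.DataFrame(
    {
        "title": ["Film A", "Film B"],
        "actor_1_name": ["Actor A", "Actor C"],
        "actor_2_name": ["Actor B", "Actor D"],
        "director_name": ["Director A", "Director B"],
    }
)

# Build the list of names from the column index instead of typing each one.
cols = [
    name
    for name in films.columns
    if name.startswith("actor_") or name == "director_name"
]

# Passing the list variable works exactly like passing a literal list.
people = films[cols]
print(cols)
```

Because cols is an ordinary Python list, it can be inspected, reordered, or reused in later selections before it is passed to the index operator.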
One of the most common exceptions raised when working with pandas is KeyError. It is usually caused by mistyping a column or index name. The same error is raised when a multiple-column selection is attempted without a list, because the comma-separated names reach the index operator as a single tuple, which pandas looks up as one column label:
>>> movies[
...     "actor_1_name",
...     "actor_2_name",
...     "actor_3_name",
...     "director_name",
... ]
Traceback (most recent call last):
...
KeyError: ('actor_1_name', 'actor_2_name', 'actor_3_name', 'director_name')
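The failure above and its fix can be reproduced side by side. This is a minimal sketch on a hypothetical films DataFrame with placeholder values; the raised flag exists only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the movie dataset (placeholder values).
films = pd.DataFrame(
    {
        "actor_1_name": ["Actor A", "Actor C"],
        "director_name": ["Director A", "Director B"],
    }
)

# Without the inner brackets, Python packs the names into one tuple,
# and pandas searches for a single column labeled with that tuple.
raised = False
try:
    films["actor_1_name", "director_name"]
except KeyError as err:
    raised = True
    print("KeyError:", err)

# Wrapping the names in a list selects both columns as a DataFrame.
fixed = films[["actor_1_name", "director_name"]]
print(list(fixed.columns))
```

Catching KeyError like this is mainly useful for demonstration; in ordinary code, the fix is simply to add the inner brackets.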