Pandas 1.x Cookbook

Many DataFrame methods have an axis parameter. This parameter controls the direction in which the operation takes place. The axis parameter can be 'index' (or 0) or 'columns' (or 1). I prefer the string versions, as they are more explicit and tend to make the code easier to read.
Nearly all DataFrame methods default the axis parameter to 0, which applies the operation along the index. This recipe shows you how to invoke the same method along both axes.
>>> import pandas as pd
>>> college = pd.read_csv(
... "data/college.csv", index_col="INSTNM"
... )
>>> college_ugds = college.filter(like="UGDS_")
>>> college_ugds.head()
              UGDS_WHITE  UGDS_BLACK  ...  UGDS_NRA  UGDS_UNKN
INSTNM                                ...
Alabama A...      0.0333      0.9353  ...    0.0059     0.0138
Universit...      0.5922      0.2600  ...    0.0179     0.0100
Amridge U...      0.2990      0.4192  ...    0.0000     0.2715
Universit...      0.6988      0.1255  ...    0.0332     0.0350
Alabama S...      0.0158      0.9208  ...    0.0243     0.0137
The .count method returns the number of non-missing values. By default, its axis parameter is set to 0:
>>> college_ugds.count()
UGDS_WHITE 6874
UGDS_BLACK 6874
UGDS_HISP 6874
UGDS_ASIAN 6874
UGDS_AIAN 6874
UGDS_NHPI 6874
UGDS_2MOR 6874
UGDS_NRA 6874
UGDS_UNKN 6874
dtype: int64
The axis parameter almost always defaults to 0, so the call above is equivalent to both college_ugds.count(axis=0) and college_ugds.count(axis='index').
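A minimal sketch of that equivalence, using a small made-up DataFrame rather than the college data (the name toy and its values are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for college_ugds; one value is missing.
toy = pd.DataFrame(
    {"UGDS_WHITE": [0.5, 0.7, np.nan], "UGDS_BLACK": [0.5, 0.3, 0.4]},
    index=["School A", "School B", "School C"],
)

default_count = toy.count()         # axis left at its default
by_zero = toy.count(axis=0)         # integer spelling
by_index = toy.count(axis="index")  # string spelling

# All three spellings count non-missing values down each column.
print(default_count.equals(by_zero) and by_zero.equals(by_index))
```

The result has one entry per column, because the operation runs along the index and collapses it.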
Setting the axis parameter to 'columns' changes the direction of the operation so that we get back a count of non-missing items in each row:
>>> college_ugds.count(axis="columns").head()
INSTNM
Alabama A & M University 9
University of Alabama at Birmingham 9
Amridge University 9
University of Alabama in Huntsville 9
Alabama State University 9
dtype: int64
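Every row in the output above has all nine values present, so the effect of missing data is not visible there. The following sketch uses a small made-up DataFrame with one missing value (the name toy and its contents are assumptions) to show how the row-wise count reacts:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; School C is missing its UGDS_WHITE value.
toy = pd.DataFrame(
    {"UGDS_WHITE": [0.5, 0.7, np.nan], "UGDS_BLACK": [0.5, 0.3, 0.4]},
    index=["School A", "School B", "School C"],
)

# Counting along the columns yields one number per row.
row_counts = toy.count(axis="columns")
print(row_counts)
```

The result is indexed by school rather than by column name, and the row with the missing value reports a lower count.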
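Instead of counting non-missing values, we can sum the values in each row. Each row of percentages should add up to 1; the .sum method may be used to verify this: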
>>> college_ugds.sum(axis="columns").head()
INSTNM
Alabama A & M University 1.0000
University of Alabama at Birmingham 0.9999
Amridge University 1.0000
University of Alabama in Huntsville 1.0000
Alabama State University 1.0000
dtype: float64
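Because the published percentages are rounded, a row can total 0.9999 rather than exactly 1. One way to check the totals programmatically is to compare them against 1 with a tolerance. This is only a sketch: the toy data and the tolerance of 1e-3 are assumptions, not values taken from the recipe.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the second row is slightly short of 1 due to rounding.
toy = pd.DataFrame(
    {
        "UGDS_WHITE": [0.60, 0.25, 0.10],
        "UGDS_BLACK": [0.30, 0.25, 0.80],
        "UGDS_HISP": [0.10, 0.4999, 0.10],
    },
    index=["School A", "School B", "School C"],
)

# Sum across the columns to get one total per row.
row_totals = toy.sum(axis="columns")

# Allow a small absolute tolerance (an assumed value) for rounding error.
close_to_one = np.isclose(row_totals, 1.0, atol=1e-3)
print(close_to_one.all())
```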
To summarize the distribution of each column, the .median method can be used:
>>> college_ugds.median(axis="index")
UGDS_WHITE 0.55570
UGDS_BLACK 0.10005
UGDS_HISP 0.07140
UGDS_ASIAN 0.01290
UGDS_AIAN 0.00260
UGDS_NHPI 0.00000
UGDS_2MOR 0.01750
UGDS_NRA 0.00000
UGDS_UNKN 0.01430
dtype: float64
The direction of operation on the axis is one of the more confusing aspects of pandas. Many pandas users have difficulty remembering the meaning of the axis parameter. I remember it by reminding myself that a Series has only one axis, the index (or 0). A DataFrame has both an index (axis 0) and columns (axis 1).
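This mnemonic can be checked directly with the .axes attribute. The sketch below uses a tiny made-up DataFrame (the names df and s and their values are assumptions) and also shows which labels survive a reduction along each axis:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [10, 20, 30]},
    index=["x", "y", "z"],
)
s = df["a"]

print(len(s.axes))   # a Series has a single axis: the index
print(len(df.axes))  # a DataFrame has two: the index and the columns

# axis="index" (0) collapses the index, leaving one value per column.
col_totals = df.sum(axis="index")

# axis="columns" (1) collapses the columns, leaving one value per row.
row_totals = df.sum(axis="columns")

print(col_totals)
print(row_totals)
```

A useful way to read the parameter: the axis you name is the one that gets collapsed, and the labels of the other axis become the index of the result.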
The .cumsum method with axis=1 accumulates the race percentages across each row. It gives a slightly different view of the data. For example, the UGDS_BLACK column now shows the combined percentage of white and black students for each school:
>>> college_ugds_cumsum = college_ugds.cumsum(axis=1)
>>> college_ugds_cumsum.head()
              UGDS_WHITE  UGDS_BLACK  ...  UGDS_NRA  UGDS_UNKN
INSTNM                                ...
Alabama A...      0.0333      0.9686  ...    0.9862     1.0000
Universit...      0.5922      0.8522  ...    0.9899     0.9999
Amridge U...      0.2990      0.7182  ...    0.7285     1.0000
Universit...      0.6988      0.8243  ...    0.9650     1.0000
Alabama S...      0.0158      0.9366  ...    0.9863     1.0000
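The relationship between the row-wise cumulative sum and the row totals can be sketched on a small made-up DataFrame (the name toy and its values are assumptions): the last column of the cumulative sum should match the result of summing across the columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; each row adds up to 1.
toy = pd.DataFrame(
    {
        "UGDS_WHITE": [0.5, 0.25],
        "UGDS_BLACK": [0.25, 0.5],
        "UGDS_HISP": [0.25, 0.25],
    },
    index=["School A", "School B"],
)

# Running total from left to right within each row.
running = toy.cumsum(axis=1)

# The second column now holds white + black combined.
combined = running["UGDS_BLACK"]

# The final column of the running total equals the plain row sum.
matches = np.allclose(running.iloc[:, -1], toy.sum(axis=1))
print(running)
print(matches)
```

Note that the result keeps the original shape, since .cumsum accumulates values rather than reducing them to one number per row.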