- Let's begin by importing the required Python libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
- Let's load a few categorical variables from the dataset:
# Load only the five categorical variables of interest, listing each name once
cols = ['GENDER', 'RFA_2', 'MDMAUD_A', 'DOMAIN', 'RFA_15']
data = pd.read_csv('cup98LRN.txt', usecols=cols)
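The cardinality output later in this recipe lists the variables in a different order from the cols list. That is expected, because read_csv() with usecols ignores the order of the list and returns the columns in the order in which they appear in the file. The following minimal sketch shows this on a small in-memory CSV; the values and the extra ID column are made up for illustration and do not come from cup98LRN.txt:

```python
from io import StringIO

import pandas as pd

# Hypothetical miniature CSV standing in for cup98LRN.txt (values are invented).
csv_text = (
    "ID,DOMAIN,GENDER,RFA_2,MDMAUD_A\n"
    "1,T2,F,L4E,X\n"
    "2,S1,M,L2G,X\n"
    "3,R2,U,L4E,C\n"
)

# Request a subset of the columns in an arbitrary order.
cols = ["GENDER", "RFA_2", "MDMAUD_A", "DOMAIN"]
data = pd.read_csv(StringIO(csv_text), usecols=cols)

# usecols ignores the list order: columns come back in file order.
print(list(data.columns))  # ['DOMAIN', 'GENDER', 'RFA_2', 'MDMAUD_A']

# Select with the list afterwards if the requested order matters.
print(list(data[cols].columns))  # ['GENDER', 'RFA_2', 'MDMAUD_A', 'DOMAIN']
```

Columns that are not listed in usecols, such as ID here, are never loaded, which keeps memory usage low for wide files.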
- Let's replace the empty strings with NaN values and inspect the first five rows of the data:
data = data.replace(' ', np.nan)
data.head()
After loading the data, this is what the output of head() looks like when we run it from a Jupyter Notebook:

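The replacement step matters because pandas does not treat a cell containing a single space as missing, so isnull() would not flag it. The following sketch uses a small made-up DataFrame to show the effect of data.replace(' ', np.nan). Note that replace() matches whole cell values here, so a cell holding two spaces, for example, would not be converted:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical) in which blank categories are stored as a single space.
data = pd.DataFrame({
    "GENDER": ["F", "M", " ", "F"],
    "DOMAIN": ["T2", " ", "S1", "T2"],
})

# The spaces are ordinary strings, so no missing values are detected yet.
print(data.isnull().sum().tolist())  # [0, 0]

# Replace cells that are exactly one space with NaN.
data = data.replace(" ", np.nan)

print(data.isnull().sum().tolist())  # [1, 1]
```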
- Now, let's determine the number of unique categories in each variable:
data.nunique()
The output of the preceding code shows the number of distinct categories per variable, that is, the cardinality:
DOMAIN      16
GENDER       6
RFA_2       14
RFA_15      33
MDMAUD_A     5
dtype: int64
The nunique() method ignores missing values by default. If we want to consider missing values as an additional category, we should set the dropna argument to False: data.nunique(dropna=False).
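To see the difference the dropna argument makes, here is a minimal sketch on a made-up DataFrame in which GENDER has one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value in GENDER.
data = pd.DataFrame({
    "GENDER": ["F", "M", np.nan, "F", "U"],
    "DOMAIN": ["T2", "S1", "S1", "R2", "T2"],
})

# Default: missing values are not counted as a category.
print(data.nunique().to_dict())  # {'GENDER': 3, 'DOMAIN': 3}

# dropna=False counts NaN as one extra category.
print(data.nunique(dropna=False).to_dict())  # {'GENDER': 4, 'DOMAIN': 3}
```

Only GENDER changes, because DOMAIN has no missing values.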
- Now, let's print out the unique categories of the GENDER variable:
data['GENDER'].unique()
We can see the distinct values of GENDER in the following output:
array(['F', 'M', nan, 'C', 'U', 'J', 'A'], dtype=object)
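The pandas nunique() method can be applied to an entire DataFrame as well as to a single Series. The unique() method, on the other hand, exists only for a pandas Series, so we need to select the column whose unique values we want to return. Note also that unique() includes NaN among the values it returns, which is why the preceding array has seven entries while nunique() reported six categories for GENDER.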
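The following sketch, again on made-up data, contrasts the two methods: the length of the unique() output matches nunique(dropna=False) rather than the default nunique(), and because a DataFrame has no unique() method, the unique values of several variables have to be collected column by column. The dictionary comprehension is just one way to do this, not the only one:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value in GENDER.
data = pd.DataFrame({
    "GENDER": ["F", "M", np.nan, "F", "U"],
    "DOMAIN": ["T2", "S1", "S1", "R2", "T2"],
})

# Series.unique() returns values in order of appearance and keeps NaN.
gender_values = data["GENDER"].unique()
print(gender_values)

# Its length therefore matches nunique(dropna=False), not the default nunique().
print(len(gender_values), data["GENDER"].nunique())  # 4 3

# DataFrame has no unique() method, so apply it to each column in turn.
print(hasattr(data, "unique"))  # False
uniques = {col: data[col].unique().tolist() for col in data.columns}
print(uniques)
```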
- Let's make a plot with the cardinality of each variable:
data.nunique().plot.bar(figsize=(12, 6))
plt.ylabel('Number of unique categories')
plt.xlabel('Variables')
plt.title('Cardinality')
plt.show()
The following is the output of the preceding code block:

We can change the figure size with the figsize argument, and add axis labels and a title with plt.xlabel(), plt.ylabel(), and plt.title() to make the plot easier to read.
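As a variation, the following sketch sorts the cardinality in descending order before plotting, so that the highest-cardinality variables stand out, and sets the labels through the Axes object returned by plot.bar(). It assumes a non-interactive setting: it selects matplotlib's Agg backend and saves the figure to a file instead of displaying it. The toy values and the file name cardinality.png are hypothetical:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the KDD Cup 98 variables.
data = pd.DataFrame({
    "GENDER": ["F", "M", np.nan, "F", "U"],
    "DOMAIN": ["T2", "S1", "S1", "R2", "T2"],
    "RFA_2": ["L4E", "L2G", "L1F", "L3D", "L4E"],
    "MDMAUD_A": ["X", "X", "X", "C", "X"],
})

# Number of distinct categories per variable (NaN excluded by default).
cardinality = data.nunique()

# Sort so the highest-cardinality variables appear first, then plot.
ax = cardinality.sort_values(ascending=False).plot.bar(figsize=(12, 6))
ax.set_ylabel("Number of unique categories")
ax.set_xlabel("Variables")
ax.set_title("Cardinality")

plt.tight_layout()
plt.savefig("cardinality.png")  # hypothetical output file name
plt.close()
```

In a Jupyter Notebook, you would typically skip the backend selection and savefig() and let the plot render inline.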