Let's begin by importing the necessary libraries and getting the data ready:
- Import the required Python libraries:
import pandas as pd
import matplotlib.pyplot as plt
- Let's load the Car Evaluation dataset, add the column names, and display the first five rows:
data = pd.read_csv('car.data', header=None)
data.columns = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']
data.head()
We get the following output when the code is executed from a Jupyter Notebook:

By default, pandas read_csv() uses the first row of the file as the column names. If the raw data does not include column names, we need to explicitly tell pandas not to take them from the first row by passing the header=None argument. Otherwise, the first car in the file would be turned into column labels and lost as a data row.
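To see what header=None changes, here is a small sketch that parses two made-up rows in the Car Evaluation layout from an in-memory string. io.StringIO stands in for the car.data file, and the row values are illustrative only. The sketch also uses read_csv()'s names parameter, which assigns the column names while loading instead of in a separate step:

```python
import io

import pandas as pd

# Two illustrative rows in the Car Evaluation layout, with no header line
raw = "vhigh,vhigh,2,2,small,low,unacc\nlow,low,4,more,big,high,vgood\n"

# Default behaviour: the first row is consumed as column names
default_df = pd.read_csv(io.StringIO(raw))
print(default_df.shape)  # only one data row is left

# header=None keeps every row as data; names= sets the labels during loading
cols = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']
data = pd.read_csv(io.StringIO(raw), header=None, names=cols)
print(data.shape)
print(data)
```

When names is passed, pandas already behaves as if header=None, so writing both is redundant but makes the intent explicit.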
- Let's display the unique categories of the variable class:
data['class'].unique()
We can see the unique values of class in the following output:
array(['unacc', 'acc', 'vgood', 'good'], dtype=object)
- Let's count the cars in each category of the class variable and divide these counts by the total number of cars in the dataset to obtain the percentage of cars per category. Then, we'll print the result:
label_freq = data['class'].value_counts() / len(data)
print(label_freq)
The output of the preceding code block is a pandas Series, with the percentage of cars per category expressed as decimals:
unacc 0.700231
acc 0.222222
good 0.039931
vgood 0.037616
Name: class, dtype: float64
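The division above can also be delegated to pandas: value_counts() accepts normalize=True and then returns relative frequencies instead of raw counts. The sketch below checks that both routes agree on a small, made-up label column; the labels Series is hypothetical and only stands in for data['class']:

```python
import pandas as pd

# Hypothetical toy target column standing in for data['class']
labels = pd.Series(
    ['unacc', 'unacc', 'unacc', 'acc', 'good', 'unacc', 'acc', 'vgood'],
    name='class',
)

# Approach used in the recipe: counts divided by the number of rows
manual = labels.value_counts() / len(labels)

# Built-in alternative: normalize=True yields relative frequencies directly
normalized = labels.value_counts(normalize=True)

print(manual)
print(normalized)
```

In pandas 2.0 and later, the resulting Series is named count (manual division) or proportion (normalize=True), so the Name line of the printed output may differ from the one shown above.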
- Let's make a bar plot showing the frequency of each category and highlight the 5% mark with a red line:
# plot.bar() returns a Matplotlib Axes object, so we name it ax
ax = label_freq.sort_values(ascending=False).plot.bar()
# Horizontal red line marking the 5% threshold
ax.axhline(y=0.05, color='red')
ax.set_ylabel('percentage of cars within each category')
ax.set_xlabel('Variable: class')
ax.set_title('Identifying Rare Categories')
plt.show()
The following is the output of the preceding code block:

The good and vgood categories each account for less than 5% of the cars: their bars fall below the red line in the preceding plot.
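The plot makes the rare categories easy to spot, but they can also be selected in code. A minimal sketch, assuming the same 5% cut-off as the red line: it rebuilds label_freq from the values printed earlier (in the recipe itself you would reuse the Series computed from data) and keeps only the categories below the threshold:

```python
import pandas as pd

# Relative frequencies copied from the output printed earlier for class
label_freq = pd.Series(
    {'unacc': 0.700231, 'acc': 0.222222, 'good': 0.039931, 'vgood': 0.037616},
    name='class',
)

# Same 5% cut-off as the red line in the bar plot
threshold = 0.05

# Boolean mask keeps the categories whose share is below the threshold
rare_categories = label_freq[label_freq < threshold].index.tolist()
print(rare_categories)  # ['good', 'vgood']
```

With the full dataset loaded, data['class'].isin(rare_categories) would then flag the rows that belong to these rare categories.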