📚 Ordinal Encoding

Description: Encodes categories based on meaningful order.
Relevant Class/Implementation: sklearn.preprocessing.OrdinalEncoder
Category Encoders: ✅
Scikit-learn: ✅

Ordinal Encoding

Ordinal Encoding is a technique in Machine Learning used to convert categorical variables with a meaningful order into numerical values. It assigns an integer label to each category based on its rank, allowing models to learn from the inherent hierarchy in the data.

Why Use Ordinal Encoding?

Some categorical features have a natural order—for example:

Education Level: High School < Bachelor's < Master's < PhD
Size: Small < Medium < Large

Using One-Hot Encoding on such features loses the order information, while Ordinal Encoding preserves it.
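
To make the contrast concrete, here is a minimal sketch using pandas. The `sizes` Series and the variable names are illustrative assumptions, not part of the original example. One-hot encoding produces one indicator column per category with no notion of rank, while the ordinal mapping keeps Small < Medium < Large comparable as numbers.

```python
import pandas as pd

# Illustrative data; the names here are assumptions for this sketch.
sizes = pd.Series(['Small', 'Medium', 'Large'], name='Size')

# One-hot encoding: one indicator column per category, no notion of order.
one_hot = pd.get_dummies(sizes)

# Ordinal encoding: a single integer column that keeps Small < Medium < Large.
ordinal = sizes.map({'Small': 0, 'Medium': 1, 'Large': 2})

print(one_hot.columns.tolist())
print(ordinal.tolist())
```

The one-hot result needs three columns to describe the feature, and nothing in those columns says that Large is greater than Small. The ordinal result is a single column whose values can be compared directly.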

How Ordinal Encoding Works

Each ordered category is mapped to an integer, starting from 0 (or 1, depending on implementation). The higher the number, the greater the rank or level.
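
The mapping described above can be sketched in plain Python. This is only an illustration of the idea, not how any particular library implements it; the names `order`, `rank`, and `values` are assumptions made for the sketch.

```python
# Categories listed from lowest to highest rank (illustrative values).
order = ['Small', 'Medium', 'Large']

# The position in the list becomes the integer label, starting from 0.
rank = {category: index for index, category in enumerate(order)}

values = ['Small', 'Medium', 'Large', 'Medium', 'Small']
encoded = [rank[value] for value in values]

print(rank)
print(encoded)
```

For labels that start from 1 instead, `enumerate(order, start=1)` would give the same ranking shifted by one.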

Example 1

Example Dataset (Before Encoding)

Size
Small
Medium
Large
Medium
Small

Ordinal Encoded Representation

Size
0
1
2
1
0

The order is defined: Small < Medium < Large
Each row now has a numeric rank based on the size.

Implementing Ordinal Encoding in Python


import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Sample Data
data = pd.DataFrame({'Size': ['Small', 'Medium', 'Large', 'Medium', 'Small']})

# Define order explicitly
size_order = [['Small', 'Medium', 'Large']]

# Initialize and apply OrdinalEncoder
encoder = OrdinalEncoder(categories=size_order)
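# fit_transform returns a 2-D array of shape (n_rows, 1); ravel() flattens it to one column
data['Size_encoded'] = encoder.fit_transform(data[['Size']]).ravel()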

print(data)

Output:
     Size  Size_encoded
0   Small           0.0
1  Medium           1.0
2   Large           2.0
3  Medium           1.0
4   Small           0.0
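
The explicit `categories` argument matters. As a hedged sketch of what happens without it: when no order is given, `OrdinalEncoder` infers the categories from the data and sorts them, so string categories end up ranked alphabetically. The variable names below are assumptions for illustration; `dtype=int` is used only to show integer output instead of the default floats.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

data = pd.DataFrame({'Size': ['Small', 'Medium', 'Large', 'Medium', 'Small']})

# No explicit order: categories are inferred from the data and sorted,
# so for strings the ranks follow alphabetical order.
auto_encoder = OrdinalEncoder()
auto_codes = auto_encoder.fit_transform(data[['Size']]).ravel()

# Explicit order, plus an integer dtype for the encoded output.
ordered_encoder = OrdinalEncoder(categories=[['Small', 'Medium', 'Large']], dtype=int)
ordered_codes = ordered_encoder.fit_transform(data[['Size']]).ravel()

print(auto_encoder.categories_)  # the category order the encoder learned
print(auto_codes)
print(ordered_codes)
```

With the inferred order, Large receives the lowest code and Small the highest, which reverses the intended ranking. Checking `categories_` after fitting is a quick way to confirm the order the encoder actually used.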

Using `map()` for Custom Encoding in pandas

data['Size_encoded'] = data['Size'].map({'Small': 0, 'Medium': 1, 'Large': 2})

This approach is quick and flexible, especially when you know the category order ahead of time.
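
One thing to watch with `map()` is a value that is missing from the dictionary. The sketch below uses a hypothetical extra category, 'Extra Large', that is not in the original data, to show that `map()` leaves such values as NaN. It also shows an ordered `pd.Categorical` as one possible alternative, which marks unlisted values with the code -1.

```python
import pandas as pd

# 'Extra Large' is a hypothetical value added only for this illustration.
data = pd.DataFrame({'Size': ['Small', 'Medium', 'Large', 'Extra Large']})

# map() leaves any value that is missing from the dictionary as NaN.
mapped = data['Size'].map({'Small': 0, 'Medium': 1, 'Large': 2})

# An ordered Categorical yields integer codes; unlisted values get -1.
categorical = pd.Categorical(
    data['Size'], categories=['Small', 'Medium', 'Large'], ordered=True
)

print(mapped.isna().tolist())
print(categorical.codes.tolist())
```

Because a NaN forces the mapped column to a float dtype, checking `mapped.isna().any()` after encoding is a cheap way to catch categories that were left out of the mapping.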

Example 2

import pandas as pd

df = pd.DataFrame({
    'Education': ['High School', 'PhD', 'Bachelor', 'Master', 'High School']
})

# Define custom mapping
edu_map = {'High School': 0, 'Bachelor': 1, 'Master': 2, 'PhD': 3}

df['Education_encoded'] = df['Education'].map(edu_map)
print(df)

Education	Education_encoded
High School	0
PhD	3
Bachelor	1
Master	2
High School	0