📚

Target Encoding (Mean Encoding)

Category Encoders

✅

Description

Replaces category with mean of target.

Relevant Class/Implementation

category_encoders.TargetEncoder

Scikit-learn

❌

Target Encoding (Mean Encoding)

What is Target Encoding?

Target Encoding (also called Mean Encoding) replaces each category in a categorical feature with the mean of the target variable corresponding to that category.

It’s particularly useful in supervised learning when categorical features are related to the target and you want to extract that information.

Why Use Target Encoding?

✅ Great for High-Cardinality Features (e.g., 1,000+ city names)

✅ Captures Relationship Between Feature and Target

✅ Keeps Model Input Compact (unlike One-Hot Encoding)

🚫 Caution:

Can cause data leakage if not handled properly (e.g., target values influencing training).
Must be used with cross-validation or out-of-fold encoding.

How Target Encoding Works

Group the dataset by the categorical feature.
Calculate the mean target value for each category.
Replace the category with its corresponding mean value.

Example

Let’s say we’re predicting sales and have a City column:

Input Dataset

City	Sales
London	200
Paris	250
New York	300
London	220
Paris	270

Step 1: Compute Mean Target per Category

London → (200 + 220) / 2 = 210
Paris → (250 + 270) / 2 = 260
New York → 300

Step 2: Replace Category with Mean Target

City	Sales	City_Encoded
London	200	210
Paris	250	260
New York	300	300
London	220	210
Paris	270	260

Implementing Target Encoding in Python

✅ Using `category_encoders` library

python
CopyEdit
import pandas as pd
import category_encoders as ce

# Sample Data
df = pd.DataFrame({
    'City': ['London', 'Paris', 'New York', 'London', 'Paris'],
    'Sales': [200, 250, 300, 220, 270]
})

# Initialize encoder
encoder = ce.TargetEncoder(cols=['City'])

# Fit and transform
df['City_Encoded'] = encoder.fit_transform(df['City'], df['Sales'])

print(df)

Handling Data Leakage

To avoid data leakage, especially during model validation:

Use out-of-fold encoding for training data
Apply encoding separately on validation/test sets

✅ Using Cross-Validation Encoding

python
CopyEdit
from sklearn.model_selection import KFold
import numpy as np

df['City_Encoded'] = np.nan
kf = KFold(n_splits=5, shuffle=True, random_state=42)

for train_idx, val_idx in kf.split(df):
    train_df = df.iloc[train_idx]
    val_df = df.iloc[val_idx]

    means = train_df.groupby('City')['Sales'].mean()
    df.loc[val_idx, 'City_Encoded'] = val_df['City'].map(means)

Comparison with Other Encoders

Encoding Method	Preserves Target Relationship?	Risk of Leakage	Suitable for High Cardinality
One-Hot Encoding	❌ No	❌ No	❌ No
Label Encoding	❌ No	❌ No	✅ Yes
Frequency Encoding	❌ Not direct	❌ No	✅ Yes
Target Encoding	✅ Yes	⚠️ Yes	✅✅ Highly Recommended

When to Use Target Encoding

✅ When categorical feature is strongly related to the target

✅ When dealing with high-cardinality columns

✅ In regression or classification tasks

🚫 Avoid when:

You're working with test data that has unseen categories
You're not using proper validation techniques

Conclusion

Target Encoding is a powerful technique for handling categorical features with many unique values. It captures the relationship between categories and the target, but must be used carefully to prevent leakage. Always validate properly. 🚀

What It Does: Replaces categories with the mean of the target variable for each category.
Best For: Commonly used for high-cardinality categorical features.
Caution: Risk of data leakage; apply carefully with validation techniques.
Example:

For a "city" feature with values ["London", "Paris", "New York"], calculate the mean target value for each city.

mean_encoding = df.groupby('city')['target'].mean().to_dict()
df['city_target_mean'] = df['city'].map(mean_encoding)
print(df[['city', 'city_target_mean']])

city	city_target_mean
London	1.0
Paris	0.0
New York	0.5
Paris	0.0
London	1.0
New York	0.5
London	1.0

Target Encoding (Mean Encoding)

Target Encoding (Mean Encoding)

What is Target Encoding?

Why Use Target Encoding?

How Target Encoding Works

Example

Input Dataset

Step 1: Compute Mean Target per Category

Step 2: Replace Category with Mean Target

Implementing Target Encoding in Python

✅ Using category_encoders library

Handling Data Leakage

✅ Using Cross-Validation Encoding

Comparison with Other Encoders

When to Use Target Encoding

Conclusion

✅ Using `category_encoders` library