✅
Converts category to binary and splits digits into columns.
category_encoders.BinaryEncoder
❌
Binary Encoding
What is Binary Encoding?
Binary Encoding is a categorical encoding technique that converts categories into binary numbers and then splits them into separate features. This method reduces dimensionality compared to One-Hot Encoding while preserving category information.
Why Use Binary Encoding?
- ✅ Efficient for High-Cardinality Features (categories with many unique values)
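- ✅ Lower Dimensionality than One-Hot Encoding
- ✅ Spreads Each Category Across Several Bit Columns, so no single ordered integer column is created (unlike Label Encoding); the bit patterns themselves do not encode a true ordinal relationship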
How Binary Encoding Works
- Convert categories into unique integer labels (like Label Encoding).
- Convert those integers into binary representation.
- Split each binary digit into separate columns.
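The three steps above can be sketched in plain Python without any library. This is only a minimal illustration and not necessarily how category_encoders implements it: the function name `binary_encode`, the 1-based integer codes, the first-seen ordering, and the sample colour values are all assumptions made for this example.

```python
def binary_encode(values):
    """Encode a list of categories as lists of bits (most significant bit first)."""
    # Step 1: assign integer labels in order of first appearance, starting at 1.
    codes = {}
    for value in values:
        if value not in codes:
            codes[value] = len(codes) + 1

    # Number of bits needed to represent the largest integer label.
    width = max(len(codes).bit_length(), 1)

    # Steps 2 and 3: write each label in binary, then split it into one bit per column.
    rows = []
    for value in values:
        bits = format(codes[value], f"0{width}b")
        rows.append([int(bit) for bit in bits])
    return codes, rows


colours = ["red", "green", "blue", "green", "yellow", "red"]
codes, rows = binary_encode(colours)

print(codes)
for colour, row in zip(colours, rows):
    print(colour, row)
```

Four distinct colours receive the labels 1 to 4, so three bit columns are enough, and repeated values such as "green" always map to the same bit pattern.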
Example
Consider the feature "City" with four unique values:
["New York", "London", "Paris", "Tokyo"]
- Assign an integer to each category:
New York → 1
London → 2
Paris → 3
Tokyo → 4
- Convert integers to binary (padded to three bits, because 4 is 100 in binary):
1 → 001
2 → 010
3 → 011
4 → 100
- Split each bit into separate columns:
City | Binary 1 | Binary 2 | Binary 3 |
New York | 0 | 0 | 1 |
London | 0 | 1 | 0 |
Paris | 0 | 1 | 1 |
Tokyo | 1 | 0 | 0 |
Implementing Binary Encoding in Python
We use the category_encoders library.
Installation
pip install category_encoders
Example in Python
import pandas as pd
import category_encoders as ce
# Sample categorical data
df = pd.DataFrame({'City': ['New York', 'London', 'Paris', 'Tokyo', 'London', 'Paris']})
# Initialize BinaryEncoder
encoder = ce.BinaryEncoder(cols=['City'])
# Transform data
encoded_df = encoder.fit_transform(df)
# Display result
print(encoded_df)
Output
City_0 City_1 City_2
0 0 0 1
1 0 1 0
2 0 1 1
3 1 0 0
4 0 1 0
5 0 1 1
Comparison with Other Encoding Techniques
Encoding Method | Handles High-Cardinality? | Adds Extra Columns? | Retains Ordinal Information? |
One-Hot Encoding | ❌ No | 🔴 Many | ❌ No |
Label Encoding | ✅ Yes | ✅ Single Column | 🔴 Introduces Unwanted Order |
Binary Encoding | ✅ Yes | 🟢 Fewer Columns | ❌ No (bit patterns are arbitrary) |
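To make the "Adds Extra Columns?" column concrete, the sketch below counts the output columns of each method for a few cardinalities. It assumes integer codes that start at 1, as in the City example; the helper name `column_counts` is hypothetical, and a real library may allocate a slightly different number of binary columns.

```python
def column_counts(n_categories):
    """Return how many output columns each encoding produces for n categories."""
    return {
        "one_hot": n_categories,              # one column per category
        "label": 1,                           # a single integer column
        "binary": n_categories.bit_length(),  # bits needed for the codes 1..n
    }


for n in (4, 10, 100, 1000):
    print(n, column_counts(n))
```

For 1000 categories this gives 1000 one-hot columns against 10 binary columns, which is the gap the table summarises as "Many" versus "Fewer Columns".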
When to Use Binary Encoding?
✅ High-cardinality categorical variables (hundreds or thousands of unique values)
✅ When One-Hot Encoding causes too many columns
✅ When a single integer column (Label Encoding) would impose an order the categories do not have
🚫 Avoid if:
- The feature has only a few unique categories (One-Hot Encoding might be better).
- The model requires interpretable categorical features (OHE is clearer).
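The interpretability point can be illustrated with a short sketch that lists which cities set each bit column. It reuses the integer codes from the City example (New York → 1 through Tokyo → 4); the variable names and the "Binary N" column labels are assumptions for illustration only.

```python
# Integer codes taken from the City example.
codes = {"New York": 1, "London": 2, "Paris": 3, "Tokyo": 4}
width = max(codes.values()).bit_length()  # 3 bits are enough for the code 4

# For each bit column (most significant first), collect the cities whose bit is 1.
active = {}
for position in range(width):
    shift = width - 1 - position
    name = f"Binary {position + 1}"
    active[name] = [city for city, code in codes.items() if (code >> shift) & 1]

for name, cities in active.items():
    print(name, cities)
```

A one-hot column stands for exactly one city, whereas here two of the three bit columns are shared by unrelated cities, so a model coefficient on a single bit column has no direct per-category reading.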
Conclusion
Binary Encoding is a useful alternative to One-Hot Encoding when dealing with high-cardinality categorical variables. It reduces the column count from one per category to roughly log2 of the number of categories while still giving every category a distinct code. 🚀
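# category_encoders is a standalone package, not part of sklearn.preprocessing
import category_encoders as ce
# The column name must match the DataFrame defined earlier ('City')
binary_encoder = ce.BinaryEncoder(cols=['City'])
df_binary = binary_encoder.fit_transform(df)
print(df_binary)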