Binary Encoding

Category Encoders: ✅
Description: Converts category to binary and splits digits into columns.
Relevant Class/Implementation: category_encoders.BinaryEncoder
Scikit-learn: ❌

Binary Encoding

What is Binary Encoding?

Binary Encoding is a categorical encoding technique that converts each category into a binary number and then splits its digits into separate features. For N categories it needs roughly log2(N) columns instead of the N that One-Hot Encoding creates, while each category still gets a distinct code.

Why Use Binary Encoding?

  • ✅ Efficient for High-Cardinality Features (categories with many unique values)
  • ✅ Less Dimensionality than One-Hot Encoding
  • ✅ Avoids the Single Numeric Order Imposed by Label Encoding (the bit patterns themselves are arbitrary and carry no true ordinal meaning)
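
To make the dimensionality point concrete, the sketch below compares column counts for a few cardinalities. It assumes integer labels start at 1 and that the encoder uses just enough bits for the largest label. `one_hot_columns` and `binary_columns` are illustrative helpers, not part of category_encoders, and the library's own column count may differ slightly between versions.

```python
def one_hot_columns(n_categories: int) -> int:
    """One-Hot Encoding creates one column per category."""
    return n_categories


def binary_columns(n_categories: int) -> int:
    """Bits needed to write the largest label when labels run 1..n_categories."""
    return max(1, n_categories.bit_length())


for n in (4, 100, 1000, 50000):
    print(f"{n:>6} categories: one-hot={one_hot_columns(n):>6}, binary={binary_columns(n):>3}")
```

Under these assumptions, 1000 categories need 10 binary columns instead of 1000 one-hot columns.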

How Binary Encoding Works

  1. Convert categories into unique integer labels (like Label Encoding).
  2. Convert those integers into binary representation.
  3. Split each binary digit into separate columns.
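
The three steps can be sketched in plain Python. This is a minimal illustration, not the category_encoders implementation: the `binary_encode` helper is hypothetical, and it assumes labels are assigned from 1 in order of first appearance.

```python
def binary_encode(values):
    """Return one list of bits per input value."""
    # Step 1: assign integer labels, starting at 1, in order of first appearance.
    labels = {}
    for value in values:
        if value not in labels:
            labels[value] = len(labels) + 1

    # Step 2: every code is padded to the bit width of the largest label.
    width = max(1, len(labels).bit_length())

    # Step 3: split each zero-padded binary string into separate bit columns.
    return [[int(bit) for bit in format(labels[value], f"0{width}b")]
            for value in values]


rows = binary_encode(["red", "green", "blue", "green"])
print(rows)  # [[0, 1], [1, 0], [1, 1], [1, 0]]
```

Repeated values ("green" here) always receive the same bit pattern, which is what lets a model treat them as the same category.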

Example

Consider the feature "City" with four unique values:

["New York", "London", "Paris", "Tokyo"]

  1. Assign an integer to each category:

    New York → 1
    London   → 2
    Paris    → 3
    Tokyo    → 4

  2. Convert integers to binary, padded to the width of the largest value (3 bits):

    1 → 001
    2 → 010
    3 → 011
    4 → 100

  3. Split each bit into separate columns:

    City      Binary 1  Binary 2  Binary 3
    New York  0         0         1
    London    0         1         0
    Paris     0         1         1
    Tokyo     1         0         0

Implementing Binary Encoding in Python

We use the category_encoders library.

Installation


pip install category_encoders

Example in Python


import pandas as pd
import category_encoders as ce

# Sample categorical data
df = pd.DataFrame({'City': ['New York', 'London', 'Paris', 'Tokyo', 'London', 'Paris']})

# Initialize BinaryEncoder
encoder = ce.BinaryEncoder(cols=['City'])

# Transform data
encoded_df = encoder.fit_transform(df)

# Display result
print(encoded_df)

Output


   City_0  City_1  City_2
0       0       0       1
1       0       1       0
2       0       1       1
3       1       0       0
4       0       1       0
5       0       1       1

Comparison with Other Encoding Techniques

Encoding Method    Handles High-Cardinality?  Adds Extra Columns?  Retains Ordinal Information?
One-Hot Encoding   ❌ No                       🔴 Many              ❌ No
Label Encoding     ✅ Yes                      ✅ Single Column     🔴 Introduces Unwanted Order
Binary Encoding    ✅ Yes                      🟢 Fewer Columns     ❌ No (bit patterns are arbitrary)

When to Use Binary Encoding?

✅ High-cardinality categorical variables (hundreds or thousands of unique values)

✅ When One-Hot Encoding causes too many columns

✅ When a single integer column (Label Encoding) would impose a misleading numeric order on the categories

🚫 Avoid if:

  • The feature has only a few unique categories (One-Hot Encoding might be better).
  • The model requires interpretable categorical features (OHE is clearer).
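
The interpretability cost shows up when reading an encoded row: a bit pattern means nothing until it is mapped back to its label. The sketch below assumes labels were assigned from 1 in the order given by `categories`. `decode_bits` is a hypothetical helper for illustration, not a category_encoders function.

```python
def decode_bits(bits, categories):
    """Map a list of bits back to its category, or None if no label matches."""
    # Join the bit columns back into one binary number.
    label = int("".join(str(bit) for bit in bits), 2)
    if not 1 <= label <= len(categories):
        return None  # pattern was never assigned to a category
    return categories[label - 1]


cities = ["New York", "London", "Paris", "Tokyo"]
print(decode_bits([0, 1, 1], cities))  # Paris
print(decode_bits([0, 0, 0], cities))  # None
```

With One-Hot Encoding the column name alone identifies the category, so no lookup of this kind is needed.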

Conclusion

Binary Encoding is a practical alternative to One-Hot Encoding for high-cardinality categorical variables. It cuts the column count from one per category to roughly log2 of the number of categories while still giving every category a distinct code. 🚀

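import category_encoders as ce  # standalone package, not part of sklearn.preprocessing

# Reuses the `df` defined above; the column name must match its case exactly
binary_encoder = ce.BinaryEncoder(cols=['City'])
df_binary = binary_encoder.fit_transform(df)
print(df_binary)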