Here’s a compact reference list of common outlier-removal methods, organized by approach and suited for study or quick lookup:
📊 1. Statistical Methods
| Method | Description | Python Example |
|---|---|---|
| Z-Score | Flags data points with Z-scores > 3 or < -3 | `abs(stats.zscore(df)) > 3` |
| IQR (Interquartile Range) | Removes values outside the 1.5×IQR fences (below Q1 − 1.5×IQR or above Q3 + 1.5×IQR) | `df[(df[col] > Q1 - 1.5*IQR) & (df[col] < Q3 + 1.5*IQR)]` |
| Modified Z-Score (Median Absolute Deviation) | More robust for skewed data | `modified_z = 0.6745 * (x - median) / MAD` |
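The IQR rule from the table can be sketched end to end; the DataFrame below is a made-up example with one obvious outlier:

```python
import pandas as pd

# Hypothetical data: mostly small values plus one extreme outlier (200)
df = pd.DataFrame({"value": [10, 12, 11, 13, 12, 11, 14, 200]})

Q1 = df["value"].quantile(0.25)
Q3 = df["value"].quantile(0.75)
IQR = Q3 - Q1

# Keep only rows inside the 1.5*IQR fences
filtered = df[(df["value"] >= Q1 - 1.5 * IQR) & (df["value"] <= Q3 + 1.5 * IQR)]
```

After filtering, the 200 falls outside the upper fence and is dropped while the other seven rows survive.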
🧠 2. Machine Learning-Based Methods
| Method | Description | Library |
|---|---|---|
| Isolation Forest | Isolates anomalies via random tree partitioning | `from sklearn.ensemble import IsolationForest` |
| One-Class SVM | Learns a boundary around the normal class | `from sklearn.svm import OneClassSVM` |
| Elliptic Envelope | Assumes a Gaussian distribution | `from sklearn.covariance import EllipticEnvelope` |
| LOF (Local Outlier Factor) | Density-based local anomaly detection | `from sklearn.neighbors import LocalOutlierFactor` |
| Autoencoders | Neural nets that learn normal patterns; high reconstruction error flags outliers | `keras`, PyTorch |
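A minimal Isolation Forest sketch, using synthetic data (a tight Gaussian cluster plus three planted outliers) purely for illustration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
# Hypothetical data: 100 inliers around the origin, 3 obvious outliers
X_inliers = rng.normal(loc=0.0, scale=0.5, size=(100, 2))
X_outliers = np.array([[8.0, 8.0], [-9.0, 7.0], [10.0, -10.0]])
X = np.vstack([X_inliers, X_outliers])

clf = IsolationForest(contamination=0.03, random_state=42)
labels = clf.fit_predict(X)  # 1 = inlier, -1 = outlier

X_clean = X[labels == 1]  # drop the flagged rows
```

The `contamination` parameter is the expected outlier fraction; here 0.03 roughly matches the three planted points.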
🔎 3. Visualization-Based Methods
| Method | Description |
|---|---|
| Box Plot | Spot outliers beyond the whiskers |
| Scatter Plot | Detect unusual observations visually |
| Histogram | Spot extreme values in the distribution |
| QQ Plot | Compare the distribution against normality |
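The box-plot approach can also be used programmatically: matplotlib returns the points beyond the whiskers as "fliers", so you can read the same outliers a reader would spot visually. The data here is a hypothetical example:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

data = [10, 12, 11, 13, 12, 11, 14, 200]

fig, ax = plt.subplots()
result = ax.boxplot(data)  # default whiskers at 1.5*IQR

# Points beyond the whiskers are returned as "fliers"
fliers = result["fliers"][0].get_ydata()
```

With the default `whis=1.5`, the fliers here are exactly the IQR-rule outliers from Section 1.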
🛠️ 4. Domain-Specific Rules
| Method | Description |
|---|---|
| Threshold Filtering | Define upper/lower bounds manually from domain knowledge |
| Winsorizing | Replace outliers with the nearest accepted value |
| Transformation | Apply log, sqrt, or Box-Cox to reduce outlier influence |
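Winsorizing is available directly in SciPy; this sketch caps the top 10% of a hypothetical array at the next-highest accepted value instead of dropping rows:

```python
import numpy as np
from scipy.stats.mstats import winsorize

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# limits=[lower, upper] are proportions: cap the top 10% (here, one value)
capped = winsorize(data, limits=[0, 0.1])
```

Unlike filtering, the sample size is preserved: the 100 becomes a 9, and every other value is untouched.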
⚙️ 5. PyOD Library Methods
Advanced detection via the `pyod` library:
```python
from pyod.models.knn import KNN
from pyod.models.iforest import IForest
from pyod.models.auto_encoder import AutoEncoder
```
Supports multiple algorithms: `KNN`, `HBOS`, `ABOD`, `AutoEncoder`, etc.