Outliers Remover

Here's a compact list of methods for removing outliers, categorized by approach and suited for study or quick reference; a short code sketch follows each table:

1. Statistical Methods

| Method | Description | Python Example |
| --- | --- | --- |
| Z-Score | Flags data points with Z-scores > 3 or < -3 | `abs(stats.zscore(df)) > 3` |
| IQR (Interquartile Range) | Removes values outside 1.5×IQR of the quartiles (below Q1 - 1.5×IQR or above Q3 + 1.5×IQR) | `df[(df[col] > Q1 - 1.5*IQR) & (df[col] < Q3 + 1.5*IQR)]` |
| Modified Z-Score (Median Absolute Deviation) | Better suited to skewed data | `modified_z = 0.6745 * (x - median) / MAD` |
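
A minimal sketch of the Z-score, IQR, and modified Z-score filters from the table above, assuming a single numeric column named `value` on hypothetical data:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: mostly normal values plus two injected outliers
rng = np.random.default_rng(0)
values = np.append(rng.normal(loc=50, scale=5, size=200), [150, -40])
df = pd.DataFrame({"value": values})

# Z-score filter: keep rows with |z| <= 3
z = np.abs(stats.zscore(df["value"]))
df_z = df[z <= 3]

# IQR filter: keep rows inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
Q1, Q3 = df["value"].quantile([0.25, 0.75])
IQR = Q3 - Q1
df_iqr = df[(df["value"] > Q1 - 1.5 * IQR) & (df["value"] < Q3 + 1.5 * IQR)]

# Modified Z-score based on the median absolute deviation (MAD)
median = df["value"].median()
mad = (df["value"] - median).abs().median()
modified_z = 0.6745 * (df["value"] - median) / mad
df_mad = df[modified_z.abs() <= 3.5]
```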

🧠 2. Machine Learning-Based Methods

| Method | Description | Library |
| --- | --- | --- |
| Isolation Forest | Isolates anomalies by randomly partitioning the data with an ensemble of trees | `from sklearn.ensemble import IsolationForest` |
| One-Class SVM | Learns a boundary around the normal class | `from sklearn.svm import OneClassSVM` |
| Elliptic Envelope | Assumes a Gaussian distribution and flags points far from a robust covariance fit | `from sklearn.covariance import EllipticEnvelope` |
| LOF (Local Outlier Factor) | Density-based local anomaly detection | `from sklearn.neighbors import LocalOutlierFactor` |
| Autoencoders | Neural nets that learn normal patterns; high reconstruction error flags outliers | Keras / PyTorch |
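
A minimal sketch using scikit-learn's Isolation Forest and Local Outlier Factor on synthetic 2-D data; the contamination rate and the data are assumptions for illustration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data: a normal cluster plus a few injected outliers
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(200, 2)),
               rng.uniform(6, 8, size=(5, 2))])

# Isolation Forest: fit_predict returns 1 for inliers and -1 for outliers
iso = IsolationForest(contamination=0.05, random_state=0)
X_clean_iso = X[iso.fit_predict(X) == 1]

# Local Outlier Factor: same 1 / -1 labeling convention
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.05)
X_clean_lof = X[lof.fit_predict(X) == 1]
```

OneClassSVM and EllipticEnvelope follow the same `fit_predict` pattern, so swapping detectors is mostly a one-line change.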

🔎 3. Visualization-Based Methods

| Method | Description |
| --- | --- |
| Box Plot | Visualize and spot outliers via the whiskers |
| Scatter Plot | Detect unusual observations by eye |
| Histogram | Spot extreme values visually |
| QQ Plot | Compare the distribution against normality |
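
A minimal plotting sketch with matplotlib and SciPy on hypothetical data:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical data with two injected outliers
rng = np.random.default_rng(0)
data = np.append(rng.normal(50, 5, size=200), [120, -10])

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

axes[0].boxplot(data)        # points beyond the whiskers are candidate outliers
axes[0].set_title("Box plot")

axes[1].hist(data, bins=30)  # isolated bars in the tails stand out
axes[1].set_title("Histogram")

stats.probplot(data, dist="norm", plot=axes[2])  # QQ plot vs. a normal distribution
axes[2].set_title("QQ plot")

plt.tight_layout()
plt.show()
```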

🛠️ 4. Domain-Specific Rules

| Method | Description |
| --- | --- |
| Threshold Filtering | Define upper/lower bounds manually |
| Winsorizing | Replace outliers with the nearest accepted value |
| Transformation | Use log, sqrt, or Box-Cox to reduce the influence of extreme values |
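
A minimal sketch of manual thresholds, winsorizing, and a log transform; the `income` column and the bounds are illustrative assumptions:

```python
import numpy as np
import pandas as pd
from scipy.stats.mstats import winsorize

# Hypothetical income data with a couple of extreme values
rng = np.random.default_rng(0)
income = np.append(rng.normal(50_000, 8_000, size=100), [400_000, 950_000])
df = pd.DataFrame({"income": income})

# Threshold filtering: keep rows inside domain-defined bounds (bounds are assumptions)
df_thresh = df[(df["income"] >= 10_000) & (df["income"] <= 200_000)]

# Winsorizing: clip the lowest and highest 5% to the nearest accepted value
df["income_winsorized"] = np.asarray(winsorize(df["income"].to_numpy(), limits=[0.05, 0.05]))

# Log transformation: compress the scale so extreme values have less influence
df["income_log"] = np.log1p(df["income"])
```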

⚙️ 5. PyOD Library Methods

Advanced detection via pyod:
```python
from pyod.models.knn import KNN
from pyod.models.iforest import IForest
from pyod.models.auto_encoder import AutoEncoder
```

Supports multiple algorithms:

  • KNN, HBOS, ABOD, AutoEncoder, etc.
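
A minimal usage sketch with PyOD's KNN detector; the contamination rate and the synthetic data are assumptions, and the other detectors follow the same `fit` / `labels_` pattern:

```python
import numpy as np
from pyod.models.knn import KNN

# Synthetic 2-D data: a normal cluster plus a few injected outliers
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(200, 2)),
               rng.uniform(5, 7, size=(6, 2))])

clf = KNN(contamination=0.05)   # expected fraction of outliers (an assumption)
clf.fit(X)

labels = clf.labels_            # 0 = inlier, 1 = outlier on the training data
scores = clf.decision_scores_   # raw outlier scores (higher = more anomalous)
X_clean = X[labels == 0]
```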
