Here’s a compact reference list of common outlier-removal methods, organized by approach and suited for study or quick lookup:
📊 1. Statistical Methods
| Method | Description | Python Example |
|---|---|---|
| Z-Score | Flags data points with Z-scores > 3 or < -3 | `abs(stats.zscore(df)) > 3` |
| IQR (Interquartile Range) | Removes values outside the 1.5×IQR fences (below Q1 − 1.5×IQR or above Q3 + 1.5×IQR) | `df[(df[col] > Q1 - 1.5*IQR) & (df[col] < Q3 + 1.5*IQR)]` |
| Modified Z-Score (Median Absolute Deviation) | More robust for skewed data | `modified_z = 0.6745 * (x - median) / MAD` |
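The IQR rule from the table can be sketched end to end; the DataFrame below is a made-up example with one obvious outlier:

```python
import pandas as pd

# Hypothetical data: mostly small values plus one extreme outlier (200)
df = pd.DataFrame({"value": [10, 12, 11, 13, 12, 11, 14, 200]})

Q1 = df["value"].quantile(0.25)
Q3 = df["value"].quantile(0.75)
IQR = Q3 - Q1

# Keep only rows inside the 1.5*IQR fences
filtered = df[(df["value"] >= Q1 - 1.5 * IQR) & (df["value"] <= Q3 + 1.5 * IQR)]
```

After filtering, the 200 falls outside the upper fence and is dropped while the other seven rows survive.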
🧠 2. Machine Learning-Based Methods
| Method | Description | Library |
|---|---|---|
| Isolation Forest | Isolates anomalies via random tree partitioning | `from sklearn.ensemble import IsolationForest` |
| One-Class SVM | Learns a boundary around the normal class | `from sklearn.svm import OneClassSVM` |
| Elliptic Envelope | Assumes a Gaussian distribution | `from sklearn.covariance import EllipticEnvelope` |
| LOF (Local Outlier Factor) | Density-based local anomaly detection | `from sklearn.neighbors import LocalOutlierFactor` |
| Autoencoders | Neural nets that learn normal patterns; high reconstruction error flags outliers | `keras`, PyTorch |
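A minimal Isolation Forest sketch, using synthetic data (a tight Gaussian cluster plus three planted outliers) purely for illustration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
# Hypothetical data: 100 inliers around the origin, 3 obvious outliers
X_inliers = rng.normal(loc=0.0, scale=0.5, size=(100, 2))
X_outliers = np.array([[8.0, 8.0], [-9.0, 7.0], [10.0, -10.0]])
X = np.vstack([X_inliers, X_outliers])

clf = IsolationForest(contamination=0.03, random_state=42)
labels = clf.fit_predict(X)  # 1 = inlier, -1 = outlier

X_clean = X[labels == 1]  # drop the flagged rows
```

The `contamination` parameter is the expected outlier fraction; here 0.03 roughly matches the three planted points.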
🔎 3. Visualization-Based Methods
| Method | Description |
|---|---|
| Box Plot | Spot outliers beyond the whiskers |
| Scatter Plot | Detect unusual observations visually |
| Histogram | Spot extreme values in the distribution |
| QQ Plot | Compare the distribution against normality |
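The box-plot approach can also be used programmatically: matplotlib returns the points beyond the whiskers as "fliers", so you can read the same outliers a reader would spot visually. The data here is a hypothetical example:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

data = [10, 12, 11, 13, 12, 11, 14, 200]

fig, ax = plt.subplots()
result = ax.boxplot(data)  # default whiskers at 1.5*IQR

# Points beyond the whiskers are returned as "fliers"
fliers = result["fliers"][0].get_ydata()
```

With the default `whis=1.5`, the fliers here are exactly the IQR-rule outliers from Section 1.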
🛠️ 4. Domain-Specific Rules
| Method | Description |
|---|---|
| Threshold Filtering | Define upper/lower bounds manually from domain knowledge |
| Winsorizing | Replace outliers with the nearest accepted value |
| Transformation | Apply log, sqrt, or Box-Cox to reduce outlier influence |
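Winsorizing is available directly in SciPy; this sketch caps the top 10% of a hypothetical array at the next-highest accepted value instead of dropping rows:

```python
import numpy as np
from scipy.stats.mstats import winsorize

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# limits=[lower, upper] are proportions: cap the top 10% (here, one value)
capped = winsorize(data, limits=[0, 0.1])
```

Unlike filtering, the sample size is preserved: the 100 becomes a 9, and every other value is untouched.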
⚙️ 5. PyOD Library Methods
Advanced detection via the `pyod` library:
```python
from pyod.models.knn import KNN
from pyod.models.iforest import IForest
from pyod.models.auto_encoder import AutoEncoder
```
Supports multiple algorithms: `KNN`, `HBOS`, `ABOD`, `AutoEncoder`, etc.