1. Relationship & Dispersion Metrics
- df['series'].corr(other) → Pearson correlation with another Series (-1 to 1)
- df['series'].cov(other) → Covariance with another Series
- df['series'].autocorr(lag=1) → Autocorrelation at the specified lag (for time series)
2. Shape & Distribution Metrics
- df['series'].skew() → Skewness (measure of asymmetry)
- df['series'].kurt() / df['series'].kurtosis() → Kurtosis (tailedness; kurtosis() is an alias of kurt())
3. Aggregation & Error Metrics
- df['series'].prod() → Product of all values
- df['series'].sem() → Standard error of the mean (s/√n, where s is the sample standard deviation, ddof=1 by default)
Key Notes:
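- corr() and cov() align the two Series on their index and use only the pairs where both values are present, so the Series do not need to be the same length
- Positive skew() = right-tailed, negative = left-tailed
- High kurtosis() = heavy tails (leptokurtic)
- autocorr() helps detect periodicity in time series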
Advanced Statistical Metrics
Sample Dataset
```python
import pandas as pd
import numpy as np

data = {
    'temperature': [22, 25, 28, 24, 27, 23, 26, 29],
    'ice_cream_sales': [110, 150, 180, 130, 170, 120, 160, 190],
    'time_series': [10, 15, 12, 18, 14, 20, 16, 22]  # Simulated time-series data
}
df = pd.DataFrame(data)
```
1. Relationship & Dispersion Metrics
1.1 df['series'].corr(other)
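A minimal sketch of corr() on the sample dataset, rebuilt here so the block runs on its own. The shorter Series `partial_sales` is a hypothetical addition, included only to illustrate that the two inputs are aligned on their index before the correlation is computed.

```python
import pandas as pd

df = pd.DataFrame({
    'temperature': [22, 25, 28, 24, 27, 23, 26, 29],
    'ice_cream_sales': [110, 150, 180, 130, 170, 120, 160, 190],
})

# Pearson correlation between temperature and sales (the default method)
r = df['temperature'].corr(df['ice_cream_sales'])
print(f"Pearson r: {r:.4f}")

# Hypothetical shorter Series: corr() aligns on the index and uses
# only the rows present in both, so the lengths may differ
partial_sales = df['ice_cream_sales'].iloc[:5]
r_partial = df['temperature'].corr(partial_sales)
print(f"Pearson r on the 5 shared rows: {r_partial:.4f}")
```

A value close to 1 here indicates that sales rise almost linearly with temperature in this small sample.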
1.2 df['series'].cov(other)
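A short sketch of cov() on the sample dataset, with a manual check against the sample covariance formula (sum of the products of deviations, divided by n - 1). The variable names `dx`, `dy`, and `manual_cov` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'temperature': [22, 25, 28, 24, 27, 23, 26, 29],
    'ice_cream_sales': [110, 150, 180, 130, 170, 120, 160, 190],
})

# Sample covariance (normalized by n - 1 by default)
cov = df['temperature'].cov(df['ice_cream_sales'])

# Manual check: deviations from each mean, multiplied pairwise
dx = df['temperature'] - df['temperature'].mean()
dy = df['ice_cream_sales'] - df['ice_cream_sales'].mean()
manual_cov = (dx * dy).sum() / (len(df) - 1)

print(f"cov(): {cov:.3f}, manual: {manual_cov:.3f}")
```

The positive sign says the two columns move in the same direction; the magnitude is in "degrees times units sold" and is hard to interpret on its own.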
1.3 df['series'].autocorr(lag=1)
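A small sketch of autocorr() on the simulated time series from the sample dataset. It also shows the equivalent manual form, which correlates the Series with a shifted copy of itself.

```python
import pandas as pd

ts = pd.Series([10, 15, 12, 18, 14, 20, 16, 22], name='time_series')

# Correlation of the series with itself shifted by 1 and by 2 steps
lag1 = ts.autocorr(lag=1)
lag2 = ts.autocorr(lag=2)

# Equivalent manual form: correlate with a shifted copy
manual_lag1 = ts.corr(ts.shift(1))

print(f"lag 1: {lag1:.3f}, lag 2: {lag2:.3f}")
```

Because the values alternate up and down while trending upward, the lag-2 autocorrelation is far stronger than the lag-1 value, which hints at a repeating pattern every two steps.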
2. Shape & Distribution Metrics
2.1 df['series'].skew()
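A brief sketch contrasting a symmetric column with a right-skewed one. The `temperature` values come from the sample dataset; `right_skewed` is a hypothetical Series with one large value, added purely for illustration.

```python
import pandas as pd

temperature = pd.Series([22, 25, 28, 24, 27, 23, 26, 29])
right_skewed = pd.Series([1, 2, 2, 3, 3, 3, 4, 20])  # hypothetical data

# Temperature values are symmetric around their mean, so skew is about 0
temp_skew = temperature.skew()

# One large value pulls the tail to the right, so skew is positive
right_skew = right_skewed.skew()

print(f"temperature skew: {temp_skew:.3f}")
print(f"right-skewed skew: {right_skew:.3f}")
print("mean > median:", right_skewed.mean() > right_skewed.median())
```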
2.2 df['series'].kurt() / kurtosis()
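A minimal sketch of kurt() and its alias kurtosis(). pandas reports excess kurtosis, so a normal distribution would score about 0. The `with_outlier` Series is hypothetical and exists only to show a heavy-tailed case.

```python
import pandas as pd

temperature = pd.Series([22, 25, 28, 24, 27, 23, 26, 29])
with_outlier = pd.Series([1, 2, 2, 3, 3, 3, 4, 20])  # hypothetical data

# Evenly spread values have lighter tails than a normal distribution,
# so the excess kurtosis is negative
flat_kurt = temperature.kurt()

# A single extreme value produces a heavy tail and a positive result
heavy_kurt = with_outlier.kurt()

# kurtosis() is an alias and returns the same number
alias_kurt = with_outlier.kurtosis()

print(f"temperature: {flat_kurt:.3f}, with outlier: {heavy_kurt:.3f}")
```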
3. Aggregation & Error Metrics
3.1 df['series'].prod()
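A short sketch of prod(). The product of raw temperatures is rarely meaningful, so this also uses a hypothetical Series of growth factors, where multiplying values gives the compounded growth. The `min_count` argument is shown because the product of an empty Series otherwise defaults to 1.

```python
import pandas as pd

temperature = pd.Series([22, 25, 28, 24, 27, 23, 26, 29])
growth_factors = pd.Series([1.10, 0.95, 1.20])  # hypothetical data

# Product of every value in the column
temp_product = temperature.prod()

# Compounded growth across three periods
total_growth = growth_factors.prod()

# An empty Series returns 1 unless min_count requires at least one value
empty = pd.Series([], dtype=float)
empty_default = empty.prod()
empty_strict = empty.prod(min_count=1)

print(temp_product, round(total_growth, 3), empty_default, empty_strict)
```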
3.2 df['series'].sem()
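A small sketch of sem() on the sales column of the sample dataset, checked against the formula s/√n, where s is the sample standard deviation (ddof=1). The name `manual_sem` is illustrative only.

```python
import numpy as np
import pandas as pd

sales = pd.Series([110, 150, 180, 130, 170, 120, 160, 190])

# Standard error of the mean (uses ddof=1 by default)
sem = sales.sem()

# Manual check: sample standard deviation divided by sqrt(n)
manual_sem = sales.std(ddof=1) / np.sqrt(sales.count())

print(f"sem(): {sem:.3f}, manual: {manual_sem:.3f}")
```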
Summary Table
| Method | Description | Example (sample dataset) |
|---|---|---|
| .corr() | Pearson correlation (-1 to 1) | df['temperature'].corr(df['ice_cream_sales']) |
| .cov() | Covariance | df['temperature'].cov(df['ice_cream_sales']) |
| .autocorr() | Autocorrelation (time series) | df['time_series'].autocorr(lag=1) |
| .skew() | Distribution asymmetry | df['ice_cream_sales'].skew() |
| .kurt() / .kurtosis() | Tailedness (outlier propensity) | df['ice_cream_sales'].kurt() |
| .prod() | Product of all values | df['temperature'].prod() |
| .sem() | Standard error of the mean | df['ice_cream_sales'].sem() |
Key Notes
- Correlation vs. Covariance:
  - Use .corr() for standardized relationships (unitless, always between -1 and 1).
  - Use .cov() for the direction of a relationship; its magnitude depends on the units of both inputs, so it is not comparable across datasets.
- Skewness Interpretation:
  - Right skew (positive): typically Mean > Median (e.g., income data).
  - Left skew (negative): typically Mean < Median (e.g., age at retirement).
- Kurtosis:
  - pandas reports excess kurtosis, so compare to 0 (normal distribution). High kurtosis → heavier tails and more outliers.
- Autocorrelation:
  - Critical for time-series analysis (e.g., detect seasonality at lag=12 for monthly data).
- SEM:
  - Estimates how far the sample mean is likely to deviate from the true population mean; it shrinks as the sample size grows.
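The correlation vs. covariance note can be illustrated with a unit change. This is a sketch, assuming the temperatures in the sample dataset are in Celsius (the source does not state the unit); converting them to Fahrenheit rescales the covariance but leaves the correlation unchanged.

```python
import pandas as pd

temp_c = pd.Series([22, 25, 28, 24, 27, 23, 26, 29])  # assumed to be Celsius
sales = pd.Series([110, 150, 180, 130, 170, 120, 160, 190])

# Same temperatures expressed in Fahrenheit
temp_f = temp_c * 9 / 5 + 32

# Covariance scales with the unit (factor 1.8 here)
cov_c = temp_c.cov(sales)
cov_f = temp_f.cov(sales)

# Correlation is covariance divided by both standard deviations,
# so the unit cancels out
corr_c = temp_c.corr(sales)
corr_f = temp_f.corr(sales)
corr_from_cov = cov_c / (temp_c.std() * sales.std())

print(f"cov: {cov_c:.2f} (C) vs {cov_f:.2f} (F)")
print(f"corr: {corr_c:.4f} (C) vs {corr_f:.4f} (F)")
```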
Practical Applications
- Business: corr() to validate "temperature vs. sales" hypotheses.
- Finance: autocorr() to detect stock price patterns.
- Engineering: skew()/kurt() to assess sensor data quality.