1. Relationship & Dispersion Metrics
- df['series'].corr(other) → Pearson correlation with another Series (-1 to 1)
- df['series'].cov(other) → Covariance with another Series
- df['series'].autocorr(lag=1) → Autocorrelation at specified lag (for time series)
2. Shape & Distribution Metrics
- df['series'].skew() → Skewness (measure of asymmetry)
- df['series'].kurt() / kurtosis() → Kurtosis (tailedness; kurtosis is an alias of kurt)
3. Aggregation & Error Metrics
- df['series'].prod() → Product of all values
- df['series'].sem() → Standard error of the mean (s/√n, where s is the sample standard deviation with ddof=1)
Key Notes:
- corr()/cov() align the two Series on their index and use only pairs where both values are present; the Series need not be the same length
- Positive skew() = right-tailed, negative = left-tailed
- High kurtosis() = heavy tails (leptokurtic)
- autocorr() helps detect periodicity in time series
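The index-alignment behaviour of corr() can be illustrated with a small sketch. The Series `a` and `b` below are hypothetical and exist only for this example; they are not part of the sample dataset.

```python
import pandas as pd

# Two hypothetical Series of different lengths with partly overlapping labels
a = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=[0, 1, 2, 3, 4])
b = pd.Series([2.0, 4.0, 6.5, 8.0], index=[1, 2, 3, 4])

# corr() aligns on the index, so only labels 1..4 contribute
r_aligned = a.corr(b)

# Same result when the overlap is selected by hand (label slicing is inclusive)
r_manual = a.loc[1:4].corr(b)

print(r_aligned, r_manual)
```

A practical consequence is that two Series with the same length but different index labels may be compared on fewer rows than expected, so it can be worth checking the index before interpreting the result.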
Advanced Statistical Metrics
Sample Dataset
python
import pandas as pd
import numpy as np
data = {
    'temperature': [22, 25, 28, 24, 27, 23, 26, 29],
    'ice_cream_sales': [110, 150, 180, 130, 170, 120, 160, 190],
    'time_series': [10, 15, 12, 18, 14, 20, 16, 22]  # Simulated time-series data
}
df = pd.DataFrame(data)
1. Relationship & Dispersion Metrics
1.1 df['series'].corr(other)
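A minimal sketch of corr() on the sample dataset, rebuilt here so the block runs on its own. The rank-based variant is an illustrative addition: ranking both columns first and then taking the Pearson correlation yields a Spearman-style measure without extra dependencies.

```python
import pandas as pd

df = pd.DataFrame({
    'temperature': [22, 25, 28, 24, 27, 23, 26, 29],
    'ice_cream_sales': [110, 150, 180, 130, 170, 120, 160, 190],
})

# Pearson correlation between the two columns (method='pearson' is the default)
r = df['temperature'].corr(df['ice_cream_sales'])

# Rank-transform both columns first to get a rank (Spearman-style) correlation
r_rank = df['temperature'].rank().corr(df['ice_cream_sales'].rank())

print(f"Pearson r: {r:.3f}")
print(f"Rank correlation: {r_rank:.3f}")
```

Both values come out close to 1 for this data because sales rise steadily with temperature.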
1.2 df['series'].cov(other)
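A short sketch of cov() on the sample columns. It also shows how covariance relates to correlation and how it reacts to a change of units; the Fahrenheit conversion (`temp_f`) is an assumption added for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'temperature': [22, 25, 28, 24, 27, 23, 26, 29],
    'ice_cream_sales': [110, 150, 180, 130, 170, 120, 160, 190],
})
temp = df['temperature']
sales = df['ice_cream_sales']

# Sample covariance (normalized by n - 1 by default)
cov_cs = temp.cov(sales)

# Dividing by both sample standard deviations recovers the Pearson correlation
r_from_cov = cov_cs / (temp.std() * sales.std())

# Rescaling an input changes the covariance but leaves the correlation unchanged
temp_f = temp * 9 / 5 + 32
cov_fs = temp_f.cov(sales)
r_f = temp_f.corr(sales)

print(cov_cs, cov_fs)
print(r_from_cov, r_f)
```

The covariance in Fahrenheit is 1.8 times the Celsius value, while the correlation is identical, which is why corr() is usually easier to compare across datasets.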
1.3 df['series'].autocorr(lag=1)
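A sketch of autocorr() on the simulated time-series column. The comparison with shift() is included to show what the method computes: the Pearson correlation between the series and a copy of itself shifted by `lag` rows.

```python
import pandas as pd

ts = pd.Series([10, 15, 12, 18, 14, 20, 16, 22], name='time_series')

# Correlation between each value and the one immediately before it
ac_1 = ts.autocorr(lag=1)

# Correlation between each value and the one two steps before it
ac_2 = ts.autocorr(lag=2)

# Equivalent manual form: correlate the series with a shifted copy of itself
ac_1_manual = ts.corr(ts.shift(1))

print(f"lag 1: {ac_1:.3f}, lag 2: {ac_2:.3f}")
```

For this series the lag-2 value is much higher than the lag-1 value, which reflects its alternating up-down pattern on top of an upward trend. With only eight points these estimates are very noisy, so they are best read as an illustration.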
2. Shape & Distribution Metrics
2.1 df['series'].skew()
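A sketch of skew() on the sales column, next to a hypothetical right-skewed series. The `incomes` values are invented for this example and are not part of the sample dataset.

```python
import pandas as pd

sales = pd.Series([110, 150, 180, 130, 170, 120, 160, 190], name='ice_cream_sales')

# Hypothetical incomes (in thousands) with one very large value
incomes = pd.Series([30, 32, 35, 38, 40, 45, 60, 250], name='incomes')

skew_sales = sales.skew()      # bias-corrected sample skewness
skew_incomes = incomes.skew()

print(f"sales:   skew={skew_sales:.3f}, mean={sales.mean()}, median={sales.median()}")
print(f"incomes: skew={skew_incomes:.3f}, mean={incomes.mean()}, median={incomes.median()}")
```

The sales column is only mildly left-skewed (its mean sits slightly below its median), whereas the single large income pulls the mean well above the median and produces a clearly positive skew.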
2.2 df['series'].kurt() / kurtosis()
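A sketch of kurt() and its alias kurtosis(). pandas reports excess kurtosis (a normal distribution scores 0). The `with_outlier` series is a hypothetical variant of the temperature column, created here only to show the effect of one extreme value.

```python
import pandas as pd

temperature = pd.Series([22, 25, 28, 24, 27, 23, 26, 29], name='temperature')

# Hypothetical variant: the last reading replaced by an extreme value
with_outlier = pd.Series([22, 25, 28, 24, 27, 23, 26, 60], name='with_outlier')

kurt_temp = temperature.kurt()
kurt_alias = temperature.kurtosis()   # same method under its longer name
kurt_outlier = with_outlier.kurt()

print(f"evenly spread: {kurt_temp:.3f}")
print(f"with outlier:  {kurt_outlier:.3f}")
```

The evenly spread temperatures give a negative value (flatter than normal, light tails), while the single extreme reading pushes the value well above 0.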
3. Aggregation & Error Metrics
3.1 df['series'].prod()
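A sketch of prod(). Multiplying raw temperatures is rarely meaningful, so the block also shows a more typical use: compounding period-over-period growth factors. The `growth_factors` series is hypothetical.

```python
import math
import pandas as pd

temperature = pd.Series([22, 25, 28, 24, 27, 23, 26, 29], name='temperature')

# Product of every value in the column
temp_product = temperature.prod()

# Hypothetical growth factors: +2%, -1%, +5% over three periods
growth_factors = pd.Series([1.02, 0.99, 1.05])
total_growth = growth_factors.prod()   # overall multiplier across all periods

print(temp_product)
print(f"Compound growth: {(total_growth - 1) * 100:.2f}%")
```

Products of long integer columns can grow very quickly, so for large inputs it may be safer to sum logarithms instead.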
3.2 df['series'].sem()
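A sketch of sem() on the sales column, checked against the formula s/√n, where s is the sample standard deviation (ddof=1, the pandas default).

```python
import math
import pandas as pd

sales = pd.Series([110, 150, 180, 130, 170, 120, 160, 190], name='ice_cream_sales')

# Standard error of the mean as computed by pandas
sem_sales = sales.sem()

# Manual equivalent: sample standard deviation divided by sqrt(n)
sem_manual = sales.std(ddof=1) / math.sqrt(len(sales))

print(f"mean = {sales.mean():.2f}, SEM = {sem_sales:.2f}")
```

A smaller SEM indicates a more precise estimate of the mean; because of the √n term, halving it requires roughly four times as many observations.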
Summary Table
Method | Description | Example Use Case |
.corr() | Pearson correlation (-1 to 1) | df['temperature'].corr(df['ice_cream_sales']) |
.cov() | Covariance | df['temperature'].cov(df['ice_cream_sales']) |
.autocorr() | Autocorrelation (time series) | df['time_series'].autocorr(lag=1) |
.skew() | Distribution asymmetry | df['ice_cream_sales'].skew() |
.kurt() / .kurtosis() | Tailedness (outlier propensity) | df['ice_cream_sales'].kurt() |
.prod() | Product of all values | df['temperature'].prod() |
.sem() | Standard error of the mean | df['ice_cream_sales'].sem() |
Key Notes
- Correlation vs. Covariance:
  - Use .corr() for standardized relationships (unitless, always between -1 and 1).
  - Use .cov() for the direction of co-movement; its magnitude depends on the units of the inputs.
- Skewness Interpretation:
  - Right skew (positive): typically Mean > Median (e.g., income data).
  - Left skew (negative): typically Mean < Median (e.g., age at retirement).
- Kurtosis:
  - pandas reports excess kurtosis, so compare to 0 (normal distribution). High kurtosis → heavier tails and more outliers.
- Autocorrelation:
  - Critical for time-series analysis (e.g., detect seasonality at lag=12 for monthly data).
- SEM:
  - Reflects how far the sample mean may deviate from the true population mean.
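The lag=12 seasonality check mentioned above can be sketched on synthetic data. The series below is an assumption made up for this example (a 12-month sine cycle plus small random noise, four years long), not real monthly data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical monthly series: yearly cycle plus a little noise
months = pd.date_range('2020-01-01', periods=48, freq='MS')
t = np.arange(48)
monthly = pd.Series(
    np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.1, size=48),
    index=months,
)

ac_12 = monthly.autocorr(lag=12)  # same month, one year apart
ac_6 = monthly.autocorr(lag=6)    # opposite point in the yearly cycle

print(f"lag 12: {ac_12:.2f}, lag 6: {ac_6:.2f}")
```

A strongly positive value at lag 12 together with a strongly negative value at lag 6 is the pattern one would expect from a yearly cycle. Real data usually also contains trend, which is often removed (for example by differencing) before reading autocorrelations this way.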
Practical Applications
- Business: corr() to validate "temperature vs. sales" hypotheses.
- Finance: autocorr() to detect stock price patterns.
- Engineering: skew()/kurt() to assess sensor data quality.