Symmetrical, bell-shaped distribution defined by mean and standard deviation.
Understanding norm.cdf and norm.ppf from SciPy's stats Module
1. What is norm.cdf?
The Cumulative Distribution Function (CDF) for a normal distribution represents the probability that a random variable X will be less than or equal to a given value x.
norm.cdf(x, loc, scale) calculates the CDF at a specific value x for a normal distribution defined by:
- loc: the mean (average) of the distribution.
- scale: the standard deviation (spread or "width") of the distribution.
Formula for CDF:
For a given value x,
CDF(x) = P(X ≤ x)
The CDF gives the probability that a random draw from the distribution will be less than or equal to x.
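To make the formula concrete, here is a minimal sketch of the same quantity computed with only the Python standard library. It assumes the standard identity Φ(z) = 0.5 · (1 + erf(z / √2)) for the standard normal CDF, applied to the standardized value z = (x − loc) / scale. The helper name normal_cdf is hypothetical and is not part of SciPy.

```python
import math


def normal_cdf(x, loc=0.0, scale=1.0):
    """Hypothetical helper: P(X <= x) for a normal distribution.

    loc is the mean and scale is the standard deviation,
    mirroring the argument names used by norm.cdf.
    """
    z = (x - loc) / scale  # standardize to the standard normal
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Probability that a draw from N(mean=100, sd=15) is <= 110
p_110 = normal_cdf(110, loc=100, scale=15)
print(round(p_110, 4))
```

The printed value should agree with what norm.cdf(110, loc=100, scale=15) returns, up to floating-point rounding.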
Example of norm.cdf:
Let’s say we have a normal distribution with a mean of 100 and a standard deviation of 15. We want to know the probability that a random value is less than or equal to 110.
from scipy.stats import norm
prob = norm.cdf(110, loc=100, scale=15)
print(prob) # Output: ~0.7475
Interpretation: The result, 0.7475, tells us there's approximately a 74.75% chance that a random value from this distribution will be less than or equal to 110.
2. What is norm.ppf?
The Percent Point Function (PPF) is the inverse of the CDF. It’s also known as the quantile function. It gives us the value x at which a specified cumulative probability (percentile) occurs.
norm.ppf(q, loc, scale) finds the value of x for a given cumulative probability q in a normal distribution defined by:
- loc: the mean of the distribution.
- scale: the standard deviation.
Formula for PPF:
For a given probability q (between 0 and 1),
x = PPF(q), the value for which CDF(x) = q
The PPF gives us the point x that corresponds to the cumulative probability q.
Example of norm.ppf:
Suppose we want to find the value at the 95th percentile (or 0.95 probability) for a normal distribution with a mean of 100 and a standard deviation of 15.
from scipy.stats import norm
value_at_95th_percentile = norm.ppf(0.95, loc=100, scale=15)
print(value_at_95th_percentile) # Output: ~124.67
Interpretation: The result, 124.67, means that 95% of values from this distribution will be less than or equal to approximately 124.67.
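Because the PPF is the inverse of the CDF, feeding the output of one into the other should return the starting value. The short check below is a sketch of that round trip using the same distribution (mean 100, standard deviation 15). It also shows that the 95th percentile can be obtained from the standard normal quantile as loc + scale * z. The variable names are illustrative only.

```python
from scipy.stats import norm

loc, scale = 100, 15

# Round trip: value -> cumulative probability -> value
x = 110
q = norm.cdf(x, loc=loc, scale=scale)
x_back = norm.ppf(q, loc=loc, scale=scale)
print(q, x_back)  # x_back should match x up to floating-point error

# Same percentile via the standard normal (loc=0, scale=1 are the defaults)
z_95 = norm.ppf(0.95)
value_from_z = loc + scale * z_95
print(value_from_z)
```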
Quick Summary
- norm.cdf(x, loc, scale): Calculates the probability that a value is less than or equal to x.
- norm.ppf(q, loc, scale): Finds the value x that corresponds to the given cumulative probability q.
Combined Example
Let’s say we have test scores that are normally distributed with a mean of 70 and a standard deviation of 10. Here’s what norm.cdf and norm.ppf can help us find:
- Using norm.cdf: What’s the probability a student scored 85 or below?
- Using norm.ppf: What score is the cutoff for the top 5% of students? Since 95% of scores fall at or below that cutoff, it is the 95th percentile, so q = 0.95.
from scipy.stats import norm
prob_below_85 = norm.cdf(85, loc=70, scale=10)
print(prob_below_85) # Output: ~0.9332
Interpretation: About 93.32% of students scored 85 or below.
from scipy.stats import norm
top_5_percent_score = norm.ppf(0.95, loc=70, scale=10)
print(top_5_percent_score) # Output: ~86.45
Interpretation: The score of 86.45 is the cutoff for the top 5% of students in this distribution.
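Questions about the upper tail ("what fraction scored above 85?", "what is the cutoff for the top 5%?") can also be answered directly. As a sketch, SciPy's norm.sf (survival function, 1 - CDF) and norm.isf (inverse survival function) give the same answers as the CDF/PPF calls above without subtracting from 1 by hand. The variable names here are illustrative.

```python
from scipy.stats import norm

mean, sd = 70, 10

# Probability of scoring above 85: equivalent to 1 - norm.cdf(85, ...)
prob_above_85 = norm.sf(85, loc=mean, scale=sd)

# Cutoff for the top 5%: equivalent to norm.ppf(0.95, ...)
cutoff_top_5 = norm.isf(0.05, loc=mean, scale=sd)

print(prob_above_85, cutoff_top_5)
```

The two results should line up with the earlier ones: roughly 6.68% of students above 85, and a top-5% cutoff near 86.45.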
Visualization Insight
Visualizing both the CDF and PPF can provide more intuition:
- CDF Plot: Shows the probability increasing as you move right along the x-axis.
- PPF Plot: Shows the x-value corresponding to increasing probability levels.
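As a rough sketch of the data behind those two plots, the snippet below evaluates both functions on small grids for the test-score distribution (mean 70, standard deviation 10). The grid sizes and ranges are arbitrary choices for illustration. The resulting arrays could be handed to any plotting library.

```python
import numpy as np
from scipy.stats import norm

mean, sd = 70, 10

# CDF curve: x-values on the horizontal axis, probabilities on the vertical
xs = np.linspace(mean - 3 * sd, mean + 3 * sd, 7)  # 40, 50, ..., 100
cdf_values = norm.cdf(xs, loc=mean, scale=sd)

# PPF curve: probabilities on the horizontal axis, x-values on the vertical
qs = np.linspace(0.05, 0.95, 7)
ppf_values = norm.ppf(qs, loc=mean, scale=sd)

for x, p in zip(xs, cdf_values):
    print(f"x = {x:5.1f}  ->  CDF = {p:.4f}")
for q, x in zip(qs, ppf_values):
    print(f"q = {q:4.2f}  ->  PPF = {x:.2f}")
```

Both sequences increase from left to right, and the CDF passes through 0.5 at the mean, which is the shape the two plots would show.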
These functions are essential in probability and statistics, especially for working with normal distributions. Use norm.cdf to find probabilities and norm.ppf to find values for specific probabilities.