Normal (Gaussian) Distribution:

A symmetric, bell-shaped distribution fully specified by its mean (center) and standard deviation (spread).
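In SciPy, these two parameters correspond to the `loc` (mean) and `scale` (standard deviation) arguments. The following minimal sketch uses a "frozen" `norm` object. The mean of 100 and standard deviation of 15 are assumed purely for illustration.

```python
import math

from scipy.stats import norm

# Freeze a normal distribution: loc is the mean, scale is the standard deviation.
dist = norm(loc=100, scale=15)

print(dist.mean())  # 100.0
print(dist.std())   # 15.0

# Symmetry: points one standard deviation below and above the mean
# have the same density.
print(math.isclose(dist.pdf(85), dist.pdf(115)))  # True
```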

Understanding norm.cdf and norm.ppf from SciPy's stats Module

1. What is norm.cdf?

The Cumulative Distribution Function (CDF) of a normal distribution gives the probability that a random variable X will be less than or equal to a given value x.

  • norm.cdf(x, loc, scale) evaluates the CDF of a normal distribution at a specific value x. Its arguments are:
    1. x: the value (or array of values) at which to evaluate the CDF.
    2. loc: the mean (average) of the distribution (default 0).
    3. scale: the standard deviation (spread or "width") of the distribution (default 1).
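Two details of this signature are easiest to see in code. First, `loc` and `scale` default to 0 and 1, which gives the standard normal distribution. Second, any normal CDF equals the standard-normal CDF evaluated at the z-score (x - loc) / scale. The sketch below reuses the mean of 100 and standard deviation of 15 from the example that follows.

```python
import math

from scipy.stats import norm

x, mu, sigma = 110, 100, 15

# With loc and scale omitted, norm.cdf uses the standard normal (loc=0, scale=1).
p_standard = norm.cdf(0)  # half the probability mass lies below the mean

# Any normal CDF can be computed from the standard one via the z-score.
z = (x - mu) / sigma
p_direct = norm.cdf(x, loc=mu, scale=sigma)
p_via_z = norm.cdf(z)

print(p_standard)                       # 0.5
print(math.isclose(p_direct, p_via_z))  # True

# x may also be an array; the CDF is evaluated element-wise.
print(norm.cdf([85, 100, 115], loc=mu, scale=sigma))
```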

Formula for CDF:

For a given value x,

$$\mathrm{CDF}(x) = P(X \le x)$$

The CDF gives the probability that a random draw from the distribution will be less than or equal to x.
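The normal CDF also has a closed form in terms of the error function: CDF(x) = 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2)))). The sketch below uses only Python's standard `math.erf` and compares the result against `norm.cdf`. The helper name `normal_cdf` is hypothetical and exists only for illustration.

```python
import math

from scipy.stats import norm


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Hypothetical helper: normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# Compare the erf-based formula with SciPy at a few points.
for x in (70, 100, 110, 130):
    ours = normal_cdf(x, mu=100, sigma=15)
    scipy_value = norm.cdf(x, loc=100, scale=15)
    print(f"x={x}: erf-based {ours:.6f}, scipy {scipy_value:.6f}")
```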

Example of norm.cdf:

Let’s say we have a normal distribution with a mean of 100 and a standard deviation of 15. We want to know the probability that a random value is less than or equal to 110.

```python
from scipy.stats import norm

# P(X <= 110) for X ~ Normal(mean=100, sd=15)
prob = norm.cdf(110, loc=100, scale=15)
print(prob)  # Output: ~0.7475
```

Interpretation: The result, 0.7475, tells us there's approximately a 74.75% chance that a random value from this distribution will be less than or equal to 110.
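A Monte Carlo check can make this interpretation concrete. It draws many random values from the same distribution and counts how often they fall at or below 110. The sketch assumes NumPy is available alongside SciPy, and the seed and sample size are arbitrary choices.

```python
import numpy as np
from scipy.stats import norm

mu, sigma, threshold = 100, 15, 110

# Draw many samples from Normal(100, 15); a fixed seed makes the run reproducible.
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=mu, scale=sigma, size=200_000)

# The fraction of draws at or below the threshold approximates P(X <= 110).
empirical = np.mean(samples <= threshold)
exact = norm.cdf(threshold, loc=mu, scale=sigma)

print(f"empirical: {empirical:.4f}, norm.cdf: {exact:.4f}")
```

With 200,000 draws, the empirical fraction typically agrees with `norm.cdf` to about two or three decimal places.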

2. What is norm.ppf?

The Percent Point Function (PPF) is the inverse of the CDF and is also known as the quantile function. Given a cumulative probability (percentile) q, it returns the value x such that P(X ≤ x) = q.

  • norm.ppf(q, loc, scale) finds the value x at which a normal distribution reaches the cumulative probability q. Its arguments are:
    1. q: the cumulative probability, a number between 0 and 1.
    2. loc: the mean of the distribution (default 0).
    3. scale: the standard deviation (default 1).
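Because q is a cumulative probability, `norm.ppf` returns a finite value only for 0 < q < 1. The sketch below shows SciPy's behaviour at the boundaries and outside that range. It reuses a mean of 100 and a standard deviation of 15 for illustration.

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 100, 15

# q = 0 and q = 1 map to the (infinite) ends of the distribution.
lowest = norm.ppf(0.0, loc=mu, scale=sigma)   # -inf
highest = norm.ppf(1.0, loc=mu, scale=sigma)  # inf

# Probabilities outside [0, 1] are invalid and yield nan.
invalid = norm.ppf(np.array([-0.1, 1.5]), loc=mu, scale=sigma)

print(lowest, highest, invalid)
```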

Formula for PPF:

For a given probability q,

$$x = \mathrm{PPF}(q) = \mathrm{CDF}^{-1}(q)$$

The PPF gives the point x whose cumulative probability is q.
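For a normal distribution with mean mu and standard deviation sigma, the PPF can be written as x = mu + sigma * z_q, where z_q is the standard-normal quantile. The sketch below checks that identity. It also cross-checks SciPy against Python's built-in `statistics.NormalDist` (Python 3.8+), which is a separate implementation and not part of SciPy.

```python
import math
from statistics import NormalDist

from scipy.stats import norm

q, mu, sigma = 0.95, 100, 15

# SciPy's quantile function for Normal(mu, sigma).
x_scipy = norm.ppf(q, loc=mu, scale=sigma)

# The same value by shifting and scaling the standard-normal quantile z_q.
z_q = norm.ppf(q)
x_shifted = mu + sigma * z_q

# Independent check with the standard library.
x_stdlib = NormalDist(mu=mu, sigma=sigma).inv_cdf(q)

print(x_scipy, x_shifted, x_stdlib)
print(math.isclose(x_scipy, x_stdlib, rel_tol=1e-9))  # True
```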

Example of norm.ppf:

Suppose we want to find the value at the 95th percentile (or 0.95 probability) for a normal distribution with a mean of 100 and a standard deviation of 15.

```python
from scipy.stats import norm

# Value below which 95% of a Normal(mean=100, sd=15) distribution lies
value_at_95th_percentile = norm.ppf(0.95, loc=100, scale=15)
print(value_at_95th_percentile)  # Output: ~124.67
```

Interpretation: The result, 124.67, means that 95% of values from this distribution will be less than or equal to approximately 124.67.
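`norm.ppf` is vectorized over q, so several percentiles can be computed in one call. The sketch below uses the same mean and standard deviation. The list of probabilities is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 100, 15
qs = np.array([0.05, 0.25, 0.50, 0.75, 0.95])

# One cutoff per probability; the 0.50 entry is the median.
cutoffs = norm.ppf(qs, loc=mu, scale=sigma)

for q, x in zip(qs, cutoffs):
    print(f"{q:.0%} of values fall at or below {x:.2f}")
```

Because the distribution is symmetric, the 5th and 95th percentiles (and the 25th and 75th) sit at equal distances on either side of the mean.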

Quick Summary

  • norm.cdf(x, loc, scale): Calculates the probability that a value is less than or equal to x.
  • norm.ppf(q, loc, scale): Finds the value x that corresponds to the given cumulative probability q.
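The two functions undo each other. Passing the output of `norm.cdf` into `norm.ppf` with the same `loc` and `scale` recovers the original values, up to floating-point rounding. The sketch below assumes a mean of 70 and a standard deviation of 10, and the sample scores are arbitrary.

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 70, 10
scores = np.array([50.0, 65.0, 70.0, 85.0, 95.0])

# cdf: score -> cumulative probability
probs = norm.cdf(scores, loc=mu, scale=sigma)

# ppf: cumulative probability -> score (inverts cdf)
recovered = norm.ppf(probs, loc=mu, scale=sigma)

print(np.round(probs, 4))
print(recovered)
```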

Combined Example

Let’s say we have test scores that are normally distributed with a mean of 70 and a standard deviation of 10. Here’s what norm.cdf and norm.ppf can help us find:

  1. Using norm.cdf: What’s the probability a student scored 85 or below?

    ```python
    from scipy.stats import norm

    # P(score <= 85) when scores ~ Normal(mean=70, sd=10)
    prob_below_85 = norm.cdf(85, loc=70, scale=10)
    print(prob_below_85)  # Output: ~0.9332
    ```

    Interpretation: About 93.32% of students scored 85 or below.

  2. Using norm.ppf: What score corresponds to the top 5% of students?

    ```python
    from scipy.stats import norm

    # The 95th percentile: 5% of scores lie above this value
    top_5_percent_score = norm.ppf(0.95, loc=70, scale=10)
    print(top_5_percent_score)  # Output: ~86.45
    ```

    Interpretation: The score of 86.45 is the cutoff for the top 5% of students in this distribution.
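Upper-tail questions such as "scored above" or "top 5%" can also be answered directly with `norm.sf` and `norm.isf`. SciPy provides these as the survival function (1 - cdf) and its inverse. The sketch below uses the same test-score parameters. It offers an alternative to the `1 - cdf` and `ppf(0.95)` approach above, and that alternative tends to be more precise far out in the tail.

```python
from scipy.stats import norm

mu, sigma = 70, 10

# Probability of scoring ABOVE 85: the survival function, sf(x) = 1 - cdf(x).
p_above_85 = norm.sf(85, loc=mu, scale=sigma)

# Cutoff for the top 5%: the inverse survival function, isf(q) = ppf(1 - q).
top_5_cutoff = norm.isf(0.05, loc=mu, scale=sigma)

print(f"P(score > 85) = {p_above_85:.4f}")
print(f"top-5% cutoff = {top_5_cutoff:.2f}")
```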

Visualization Insight

Visualizing both the CDF and PPF can provide more intuition:

  • CDF Plot: Shows the probability increasing as you move right along the x-axis.
  • PPF Plot: Shows the x-value corresponding to increasing probability levels.
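The sketch below draws the two plots described above and assumes matplotlib is installed. The mean, standard deviation, plotting ranges, and the output filename `norm_cdf_ppf.png` are illustrative choices. The PPF panel is the CDF panel with its axes swapped, which is what it means for one function to be the inverse of the other.

```python
import matplotlib

matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

mu, sigma = 100, 15

# CDF: evaluate over mu +/- 4 standard deviations.
xs = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 400)
cdf_values = norm.cdf(xs, loc=mu, scale=sigma)

# PPF: evaluate strictly inside (0, 1) to avoid the infinite endpoints.
qs = np.linspace(0.001, 0.999, 400)
ppf_values = norm.ppf(qs, loc=mu, scale=sigma)

fig, (ax_cdf, ax_ppf) = plt.subplots(1, 2, figsize=(10, 4))

ax_cdf.plot(xs, cdf_values)
ax_cdf.set_title("CDF: P(X <= x)")
ax_cdf.set_xlabel("x")
ax_cdf.set_ylabel("cumulative probability")

ax_ppf.plot(qs, ppf_values)
ax_ppf.set_title("PPF: x for a given probability q")
ax_ppf.set_xlabel("q")
ax_ppf.set_ylabel("x")

fig.tight_layout()
fig.savefig("norm_cdf_ppf.png")  # hypothetical output filename
plt.close(fig)
```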

Together, these two functions answer the most common questions about a normal distribution. Use norm.cdf to find the probability that a value is at or below x, and use norm.ppf to find the value that corresponds to a given cumulative probability.
