The Bell Curve: Normal Distribution

What is it? The Shape of Many Things

The Normal distribution is one of the most common and important patterns in data. You might have heard it called the "bell curve" because of its shape: it's highest in the middle and smoothly tapers off on both sides. Many things in nature and everyday life follow this pattern, at least approximately, when you collect enough data. For example:

  • People's heights or weights.
  • Scores on a test (like IQ scores).
  • Measurement errors when you try to measure something very precisely.

A Normal distribution is described by two main numbers:

  • The Mean (μ, pronounced "myoo"): This is the average value, and it's right at the center (the peak) of the bell curve.
  • The Standard Deviation (σ, pronounced "sigma"): This tells you how spread out the data is.
    • A small standard deviation means the bell curve is tall and skinny (most values are close to the average).
    • A large standard deviation means the bell curve is short and wide (values are more spread out from the average).

When we say a variable X follows a Normal distribution, we write X ~ N(μ, σ²). (Note: the second parameter in this notation is the variance σ², which is just the standard deviation squared.)
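
To make the "tall and skinny" versus "short and wide" idea concrete, here is a small sketch using Python's built-in statistics.NormalDist. The two curves (σ = 1 and σ = 3, both with mean 0) are made-up illustration values, not from any dataset. Note that NormalDist takes the standard deviation σ, while the N(μ, σ²) notation lists the variance.

```python
from statistics import NormalDist

# Two hypothetical bell curves with the same mean but different spreads.
narrow = NormalDist(mu=0, sigma=1)   # N(0, 1): variance 1
wide = NormalDist(mu=0, sigma=3)     # N(0, 9): variance 9

# Height of each curve at its peak, which sits at the mean.
peak_narrow = narrow.pdf(0)
peak_wide = wide.pdf(0)

print(f"peak height, sigma=1: {peak_narrow:.4f}")
print(f"peak height, sigma=3: {peak_wide:.4f}")
print(f"variance of the wide curve: {wide.variance}")
```

Tripling the standard deviation makes the peak one third as high, because the total area under the curve always stays equal to 1.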

Making it Simple: Standardization and Z-Scores

Every Normal distribution can have a different mean and standard deviation, which means there are infinitely many different bell curves! That would make it very hard to calculate probabilities for each one.

Luckily, we can standardize any Normal distribution. This means we convert our original value (like a specific height) into a Z-score. A Z-score tells us something very useful: how many standard deviations our value is away from the mean.

The formula to calculate a Z-score is:

Z = (X - μ) / σ

Where:

  • X is our specific value (e.g., a height of 182cm).
  • μ is the average of our original distribution.
  • σ is the standard deviation of our original distribution.
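
The formula above can be sketched as a tiny Python function. The name z_score is just a hypothetical helper for illustration, and the example numbers (mean 175cm, standard deviation 7cm) are borrowed from the heights exercise later in this section.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# A height of 182cm is one standard deviation above the mean of 175cm.
print(z_score(182, 175, 7))   # 1.0
# A height of 168cm is one standard deviation below the mean.
print(z_score(168, 175, 7))   # -1.0
```

A positive Z-score means the value is above the mean, and a negative one means it is below.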

Once values are converted to Z-scores, they follow the Standard Normal Distribution. This special bell curve always has:

  • A mean of 0.
  • A standard deviation of 1.

Probabilities for the Standard Normal Distribution are widely available in Z-tables (or can be found using calculators or software). These tables give the area under the curve to the left of a given Z-score z, usually written Φ(z), which is the probability of getting a value less than or equal to z.
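
If you don't have a Z-table handy, the same left-tail area (often written Φ(z)) can be computed in software. The sketch below uses the error function from Python's math module. The function name phi is an assumption chosen for this example, not a standard library name.

```python
import math

def phi(z):
    """Area under the standard Normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Print a few entries of a miniature Z-table.
for z in (-1.0, 0.0, 1.0, 1.96):
    print(f"Phi({z:+.2f}) = {phi(z):.4f}")
```

This prints roughly 0.1587, 0.5000, 0.8413 and 0.9750, matching what a printed Z-table shows for those Z-scores.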

Heights - Normal Distribution

Difficulty: Moderate

Assume that adult heights are normally distributed with a mean of 175cm and a standard deviation of 7cm. What is the probability that a randomly selected adult is taller than 182cm?
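
One way to work this out, sketched in Python: convert 182cm to a Z-score, look up the area to the left of it, and subtract from 1, because "taller than" asks for the area to the right. The helper phi below is a hypothetical name for the left-tail area of the standard Normal curve.

```python
import math

def phi(z):
    """Area under the standard Normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 175.0, 7.0   # mean and standard deviation from the question
x = 182.0                # the height we are asking about

z = (x - mu) / sigma     # (182 - 175) / 7 = 1.0
p_taller = 1.0 - phi(z)  # right-tail area: P(X > 182) = 1 - P(Z <= 1)

print(f"z = {z:.2f}")
print(f"P(X > 182) = {p_taller:.4f}")   # about 0.1587
```

So roughly 16% of adults in this model are taller than 182cm.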

Explore Further: Using the same distribution (mean=175cm, std. dev=7cm), what is the probability that a randomly selected adult's height is between 168cm and 182cm?
(Hint: This is P(168 < X < 182). You'll need to find the Z-scores for both 168cm and 182cm; call them z₁₆₈ and z₁₈₂. Then the probability is P(z₁₆₈ < Z < z₁₈₂) = Φ(z₁₈₂) - Φ(z₁₆₈).)
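
The hint can be checked with a short Python sketch. As before, phi is a hypothetical helper for the left-tail area of the standard Normal curve, and the variable names are chosen just for this example.

```python
import math

def phi(z):
    """Area under the standard Normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 175.0, 7.0

z_low = (168.0 - mu) / sigma    # -1.0
z_high = (182.0 - mu) / sigma   # +1.0

# Area between the two Z-scores: left area up to z_high minus left area up to z_low.
p_between = phi(z_high) - phi(z_low)

print(f"z-scores: {z_low:.2f} and {z_high:.2f}")
print(f"P(168 < X < 182) = {p_between:.4f}")   # about 0.6827
```

The two heights sit exactly one standard deviation on either side of the mean, so the answer is the familiar "about 68% of values fall within one standard deviation" figure.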

Nerchuko Academy · Free DS Interview Prep