PEP - An Unbiased Estimator of the Variance

Overview

The purpose of this document is to explain in the clearest possible language why the "n-1" is used in the formula for computing the variance of a sample.

The Mean of a Probability Distribution (Population)

The Mean of a distribution is its long-run average. Alternatively, you could say it is the probability-weighted average of all possible values. The symbol for the Mean of a distribution is μ.

We could restate the above by using this formula: μ = Σ [(x_i) * p(x_i)]

Σ is the symbol meaning "sum up for all values"
x_i is a particular value of x. If there are n possible values then "i" is every integer between 1 and n inclusive. So if n is 3 then "i" would be [1,2,3] and all of the x_i values would be [x₁, x₂, x₃].
* is the symbol for multiplication
p(x_i) is the probability of x_i occurring.
In the case of a single roll of a fair six-sided die, the formula would be written out the long way as: (1 * 1/6) + (2 * 1/6) + (3 * 1/6) + (4 * 1/6) + (5 * 1/6) + (6 * 1/6) = 3.5
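The die-roll calculation above can be reproduced in a few lines of Python. This is a minimal sketch, assuming a fair six-sided die; the names `faces`, `p`, and `mu` are chosen here for illustration and are not part of the original text. Exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction

# Possible values of a fair six-sided die (an assumption of this sketch)
# and the probability of each one.
faces = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)

# mu = sum of x_i * p(x_i) over every possible value
mu = sum(x * p for x in faces)

print(mu)         # 7/2
print(float(mu))  # 3.5
```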

Further Notes On This Formula:

If there is an equal probability of all items occurring then you can write the formula as: μ = [Σ(x_i)] / n, which for the die example would be (1+2+3+4+5+6)/6 = 3.5.
This formula can only be used if you are dealing with a discrete probability distribution (like a die roll). There are other analogous ways to find the expected value for a continuous distribution (like the normal distribution).
This formula can only be used if you have precise knowledge of every possible value (every possible x_i) and also know the exact probability of each occurrence.
Some probability distributions have well understood properties and have shortcut formulas for the Mean. For example, the formula for the Mean of the Binomial Distribution is np, where n is the number of trials and p is the probability of success on each trial. So if n is 10 and p is 0.6 then the expected value is 6. Whether you use the shortcut formula or the long way you'll get the same Mean.
Do not confuse this formula with the formula for the sample mean, which looks similar at first glance. The sample mean is a random variable with a well-understood distribution. The Mean is not a random variable; it is a single number that can be calculated with 100% certainty for any distribution in which you know all values and probabilities.
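The binomial note above says the shortcut and the long way give the same Mean. A small sketch of that check, assuming n = 10 trials and p = 0.6 (written as the exact fraction 3/5); the variable names `long_way` and `shortcut` are illustrative only.

```python
from fractions import Fraction
from math import comb  # available in Python 3.8 and later

n = 10               # number of trials
p = Fraction(3, 5)   # probability of success on each trial (0.6)

# Long way: sum of k * P(exactly k successes) for k = 0..n
long_way = sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

# Shortcut formula for the binomial distribution
shortcut = n * p

print(long_way, shortcut)  # 6 6
```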

The Sample Mean Of A Sample Taken From A Probability Distribution

The Sample Mean from a distribution is the probability-weighted average of the values in a sample. Typically we assume we are dealing with an unbiased sample, which means that we are assuming that the probability of each sample value occurring is 1/n, where n is the number of values in the sample. So if you roll the die 8 times, your sample size (your n) is 8 and the probability of each sample value is ⅛. The symbol for the Sample Mean is x̄.

We could restate the above by using this formula: x̄ = Σ [(x_i) * 1/n]

We could also write the above formula as x̄ = Σ(x_i)/n
Σ is the symbol meaning "sum up for all values"
n is the sample size
x_i is a particular sample value. If there are n sample values then "i" is every integer between 1 and n inclusive. So if n is 3 then "i" would be [1,2,3].
* is the symbol for multiplication
In the case of a die rolled 3 times, n would be 3. If the values are [2,3,6] then the sample mean would be: (2 * 1/3) + (3 * 1/3) + (6 * 1/3) = 11/3 ≈ 3.6667.
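The sample-mean example above can be sketched as follows. The list `sample` holds the three rolls from the text; the name `x_bar` is simply this sketch's label for the sample mean.

```python
from fractions import Fraction

sample = [2, 3, 6]   # the three die rolls used in the text
n = len(sample)

# Each sample value is weighted by 1/n.
x_bar = sum(Fraction(x, n) for x in sample)

print(x_bar)  # 11/3
```

Rolling again would produce a different `sample` and therefore, in general, a different `x_bar`, which is why the sample mean is treated as a random variable.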

Further Notes On This Formula:

This formula can be used whether the distribution from which you are taking the sample is discrete or continuous.
Sometimes your Sample Mean will equal the Expected Value of the distribution. This does not need to be the case, so when it does happen it is typically just a coincidence (depending on what the original population looks like). For example, if you roll the die twice and you roll a 1 and a 6 then the sample mean would be 3.5, which is coincidentally the Mean of the original distribution (also called the True Mean).

The Variance of a Probability Distribution (Population)

The Variance is the Expected Value of the squared deviations from the mean.

By "deviations from the mean" we are talking about (x_i - μ) where x_i is a single particular value from a distribution and μ is the mean of the distribution. If we think about the roll of a single die then x_i might be 1 through 6 and μ is 3.5. In the die roll example, since there are only 6 possible values there are also 6 possible deviations from the mean. They are:

1-3.5 = -2.5

2-3.5 = -1.5

3-3.5 = -0.5

4-3.5 = +0.5

5-3.5 = +1.5

6-3.5 = +2.5

By "squared deviation from the mean" we are talking about the previous set of numbers squared. Squaring the numbers has the beneficial effect of making every number positive. Without squaring, the expected value of the deviations from the mean would be zero. In other words, the average of -2.5, -1.5, -0.5, +0.5, +1.5, +2.5 is zero. Here are the numbers for the die roll example squared.

(1-3.5)² = (-2.5)²= 6.25

(2-3.5)² = (-1.5)²= 2.25

(3-3.5)² = (-0.5)²= 0.25

(4-3.5)² = (+0.5)²= 0.25

(5-3.5)² = (+1.5)²= 2.25

(6-3.5)² = (+2.5)²= 6.25

By "Expected Value" we mean the long-run average. It is the probability-weighted average of the values. For the die roll example, in the long run each side will have a 1/6 chance of appearing. So the expected value of the squared deviations from the mean is (6.25 * 1/6) + (2.25 * 1/6) + (0.25 * 1/6) + (0.25 * 1/6) + (2.25 * 1/6) + (6.25 * 1/6) = 2 ¹¹/₁₂ ≈ 2.9167.

In the simple case where every x_i has the same probability you could write this as (6.25 + 2.25 + 0.25 + 0.25 + 2.25 + 6.25) / 6.

The Variance is denoted using this symbol: σ²

Expected Value is denoted by this symbol: E(x)

You could write the expected value of the squared deviations from the mean like this:

σ² = E[(x_i - μ)²]

You could also write it like this:

σ² = Σ[(x_i - μ)² * p(x_i)]

Σ is a symbol meaning to sum all values. You would sum all values from "i" equals 1 to n, where n is the total number of values.

* just means to multiply

p(x_i) is the probability that a particular x_i will occur.

We applied this formula for the die roll example above (repeated here):

((1-3.5)² * 1/6) + ((2-3.5)² * 1/6) + ((3-3.5)² * 1/6) + ((4-3.5)² * 1/6) + ((5-3.5)² * 1/6) + ((6-3.5)² * 1/6) = 2.916666.
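The probability-weighted formula σ² = Σ[(x_i - μ)² * p(x_i)] can be sketched directly. The dictionary `dist`, mapping each value to its probability, is a representation chosen for this example and assumes a fair die.

```python
from fractions import Fraction

# Each possible value mapped to its probability (fair die assumed).
dist = {x: Fraction(1, 6) for x in range(1, 7)}

# Mean: probability-weighted average of the values.
mu = sum(x * p for x, p in dist.items())

# Variance: probability-weighted average of the squared deviations.
sigma2 = sum((x - mu) ** 2 * p for x, p in dist.items())

print(mu, sigma2)  # 7/2 35/12
```

35/12 is the exact value of the 2.916666 shown above. Because the probabilities are stored per value, the same two lines would also handle a loaded die; only `dist` would change.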

For the purposes of this document, we'll only be looking at cases where the probability of each occurrence is equal. So for the rest of the document we'll be using a slightly simpler version of the above formula that can only be used if the probability of each occurrence is equal. Again, we are using this because it is simpler to read and understand and it is all we'll need.

The variation on the formula is:

σ² = Σ[(x_i - μ)²] / n

You can also write the above formula like this:

σ² = Σ(x_i²)/n - μ²

See Appendix A for a derivation of this alternate form of the formula.

In this special case of each x_i having the probability of 1/n (meaning p(x_i) is 1/n for all "i"), the same formula can be written over a single denominator and then split back apart:

σ² = [Σ(x_i²) - nμ²] / n

σ² = Σ(x_i²)/n - μ²

For the die roll example, that would be:

Step 1) (1² + 2² + 3² + 4² + 5² + 6²)/6 - 3.5²

Step 2) (1 + 4 + 9 + 16 + 25 + 36)/6 - 12.25

Step 3) (91/6) - 12.25

Step 4) 15.16666 - 12.25 = 2.916666
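As a quick check that the two forms of the formula agree, the following sketch computes both for the die. The names `definition` and `shortcut` are illustrative labels, not terms from the original text.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]   # equally likely values of a fair die
n = len(faces)
mu = Fraction(sum(faces), n)

# sigma^2 = sum((x_i - mu)^2) / n
definition = sum((x - mu) ** 2 for x in faces) / n

# sigma^2 = sum(x_i^2) / n - mu^2
shortcut = Fraction(sum(x ** 2 for x in faces), n) - mu ** 2

print(definition, shortcut)  # 35/12 35/12
```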

Note: In this case we have a probability distribution that has an equal probability for each possibility (for each x). That is just a coincidence. Later we will look at samples from a distribution (with the die roll example we would be talking about rolling a single die two or more times to get a sample). In general, no matter what the main population looks like, we will assume samples from that population are equally likely, that each sample has an equal probability of occurring (this is synonymous with saying our sampling is unbiased).

The Variance of a Sample from a Probability Distribution

The formula for the variance of a sample taken from a Probability Distribution is:

s² = Σ[(x_i - x̄)²] / n

s² is the symbol for the variance of a sample.
n is the sample size
x_i is a particular sample value. If there are n sample values then "i" is every integer between 1 and n inclusive. So if n is 3 then "i" would be [1,2,3].
x̄ is the sample mean calculated using this formula: x̄ = Σ(x_i)/n

Important Note: In general, Σ[(x_i - μ)²] ≠ Σ[(x_i - x̄)²]

Why? The main reason is that the sample mean (x̄) is generally not equal to the "true" mean (μ) of a population (x̄ ≠ μ).

μ is the true population mean and is a constant number that can be computed when you know all of the possible x_i values.

x̄ is the sample mean for a particular sample of size n. It is a random variable that has an expected value of μ and a standard deviation (σ_x̄) that is related to the standard deviation of the population (σ) by this formula:

σ_x̄ = σ / √n, or equivalently, in terms of variances, σ_x̄² = σ² / n
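The claim that the sample mean has expected value μ and variance σ²/n can be checked exactly for a small case. This sketch assumes a fair die and independent rolls, and lists every equally likely sample of n = 2 rolls; all variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = [1, 2, 3, 4, 5, 6]
n = 2  # sample size: two independent rolls per sample

mu = Fraction(sum(faces), len(faces))
sigma2 = sum((x - mu) ** 2 for x in faces) / len(faces)

# Every equally likely sample of n rolls, and the mean of each one.
means = [Fraction(sum(s), n) for s in product(faces, repeat=n)]

mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)

print(mean_of_means)             # 7/2, the same as mu
print(var_of_means, sigma2 / n)  # 35/24 35/24
```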

The sample mean is normally a random variable with a particular mean and variance of its own. However, when used in the context of the s² formula (Σ[(x_i - x̄)²] / n), the sample mean should not be thought of as a separate random variable. It is completely determined by the x_i values. In fact, you can rewrite the formula to get rid of the x̄ term entirely by replacing it with the formula used to calculate it.

I.e., s² = Σ[(x_i - (Σ(x_j)/n))²] / n, where the inner sum runs over every sample value x_j

For example, when n equals 2, this becomes:

s² = Σ[(x_i - (x₁ + x₂)/2)²] / 2

which is equivalent to this:

s² = [(x₁ - x̄)² + (x₂ - x̄)²] / 2

You can further reduce this (when n = 2) to:

s² = (½x₁ - ½x₂)²

See Appendix B for more details.
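The n = 2 reduction can also be checked by brute force. The sketch below compares the divide-by-n formula with (½x₁ - ½x₂)² for every pair of die rolls; the helper name `s2_divide_by_n` is made up for this example.

```python
from fractions import Fraction
from itertools import product

def s2_divide_by_n(sample):
    """Sample variance with n in the denominator, as defined in the text."""
    n = len(sample)
    x_bar = Fraction(sum(sample), n)
    return sum((x - x_bar) ** 2 for x in sample) / n

# Compare against (x1/2 - x2/2)^2 for every possible pair of die rolls.
pairs = list(product(range(1, 7), repeat=2))
all_match = all(
    s2_divide_by_n([x1, x2]) == (Fraction(x1, 2) - Fraction(x2, 2)) ** 2
    for x1, x2 in pairs
)

print(all_match)  # True
```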

So the question is: if s² = Σ[(x_i - x̄)²]/n, then what is the expected value of s², or what is E(s²)? If it is equal to σ² then s² is an unbiased estimator of σ². As it turns out, s² is not an unbiased estimator of σ².

First let's write this formula:

s² = Σ[(x_i - x̄)²] / n

like this:

s² = [Σ(x_i²) - nx̄²] / n

(you can see Appendix A for more details; the algebra is the same with x̄ in place of μ)

Next, let's subtract μ from each x_i. This will leave s² unchanged as long as we also subtract it from x̄.

So we start with this:

s² = [Σ(x_i²) - nx̄²] / n

and get this:

s² = [Σ(x_i - μ)² - n(x̄ - μ)²] / n

(See Appendix C for details)
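Before following the algebra, it can help to see the identity Σ[(x_i - x̄)²] = Σ[(x_i - μ)²] - n(x̄ - μ)² hold on actual numbers. This sketch reuses the sample [2, 3, 6] from earlier and the die's true mean of 3.5; the names `lhs` and `rhs` are illustrative.

```python
from fractions import Fraction

sample = [2, 3, 6]     # three die rolls
mu = Fraction(7, 2)    # true mean of a fair die
n = len(sample)
x_bar = Fraction(sum(sample), n)

# Left side: squared deviations from the sample mean.
lhs = sum((x - x_bar) ** 2 for x in sample)

# Right side: squared deviations from the true mean, minus n(x_bar - mu)^2.
rhs = sum((x - mu) ** 2 for x in sample) - n * (x_bar - mu) ** 2

print(lhs, rhs)  # 26/3 26/3
```

The subtracted term n(x̄ - μ)² is never negative, so the deviations measured from x̄ can never add up to more than the deviations measured from μ.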

Here we'll find the expected value of s²:

Step 1) s² = Σ[(x_i - x̄)²] / n

This is the starting point.

Step 2) ns² = Σ[(x_i - x̄)²]

Multiply both sides by n to make the formulas easier to read.

Step 3) ns² = Σ[(x_i - μ - x̄ + μ)²]

Add and subtract μ, the population mean. Adding and subtracting the same number nets to zero, so this is OK.

Step 4) ns² = Σ[(x_i - μ)²] - n(x̄ - μ)²

The right side is shown in Appendix C to be the same as n times the formula for s². In other words, Σ[(x_i - μ)²] - n(x̄ - μ)² = Σ[(x_i - x̄)²] is proven in Appendix C.

Step 5) E(ns²) = nσ² - E[n(x̄ - μ)²]

Take the expected value of both sides, and replace E(Σ[(x_i - μ)²]) with nσ².

Why? By definition σ² = E[(x_i - μ)²]. Every x_i in the sample is drawn from the same distribution, so each of the n squared deviations (x_i - μ)² has an expected value of σ².

The expected value of a sum is the sum of the expected values, so E(Σ[(x_i - μ)²]) = nσ² and we are able to replace this term in the equation.

Step 6) E(ns²) = nσ² - E(Σ[(x̄ - μ)²])

Since n(x̄ - μ)² = Σ[(x̄ - μ)²] (summing the same value n times is the same as multiplying it by n)

Step 7) E(ns²) = nσ² - nσ_x̄²

Replace E(Σ[(x̄ - μ)²]) with nσ_x̄², where σ_x̄² is the variance of the sample mean.

Why? If σ² = E[(x_i - μ)²] then σ_x̄² = E[(x̄ - μ)²], because the expected value of x̄ is μ.

The sum adds that same term n times, so E(Σ[(x̄ - μ)²]) = nE[(x̄ - μ)²] = nσ_x̄².

Step 8) E(ns²) = nσ² - σ²

Replace nσ_x̄² with σ².

Why? We have seen previously that σ_x̄² = σ²/n. That is, the variance of the sample mean is equal to the variance of the original probability distribution divided by n, where n is the sample size (this holds when the n sample values are independent).

Since σ_x̄² = σ²/n then σ² = nσ_x̄²

Step 9) E(ns²) = (n-1) σ²

Factor out σ²: nσ² - σ² = (n-1)σ².

Step 10) E(s²) = (n-1) σ²/ n

Divide both sides by n.

Therefore the expected value of s² is not σ². To get an unbiased estimator use this instead:

s² = Σ[(x_i - x̄)²]/(n-1)

since E(Σ[(x_i - x̄)²]/(n-1)) = σ²
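The result can be confirmed without any algebra by averaging over every possible sample. The sketch below assumes a fair die and independent rolls, lists all samples of size n, and computes the exact expected value of the estimator with each divisor. The function name `expected_s2` and its `divisor` parameter are made up for this illustration.

```python
from fractions import Fraction
from itertools import product

faces = [1, 2, 3, 4, 5, 6]
mu = Fraction(sum(faces), len(faces))
sigma2 = sum((x - mu) ** 2 for x in faces) / len(faces)  # 35/12

def expected_s2(n, divisor):
    """Average of sum((x_i - x_bar)^2) / divisor over every sample of n rolls."""
    samples = list(product(faces, repeat=n))
    total = Fraction(0)
    for s in samples:
        x_bar = Fraction(sum(s), n)
        total += sum((x - x_bar) ** 2 for x in s) / divisor
    return total / len(samples)

for n in (2, 3, 4):
    biased = expected_s2(n, n)        # divide by n
    unbiased = expected_s2(n, n - 1)  # divide by n - 1
    print(n, biased, unbiased)
# 2 35/24 35/12
# 3 35/18 35/12
# 4 35/16 35/12
```

Dividing by n gives (n-1)/n of the true variance every time, while dividing by n - 1 gives exactly 35/12. Python's standard `statistics` module reflects the same distinction: `statistics.pvariance` divides by n and `statistics.variance` divides by n - 1.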

Appendix A

Going from this:

σ² = Σ[(x_i - μ)²] / n

to this:

σ² = Σ(x_i²)/n - μ²

Summary

Step 1) σ² = Σ[(x_i - μ)²] / n

Step 2) nσ² = Σ[(x_i - μ)²]

Step 3) nσ² = Σ[(x_i - μ) * (x_i - μ)]

Step 4) nσ² = Σ[(x_i² - μx_i - μx_i + μ²)]

Step 5) nσ² = Σ[(x_i² - 2μx_i + μ²)]

Step 6) nσ² = Σ(x_i²) - Σ(2μx_i) + Σ (μ²)

Step 7) nσ² = Σ(x_i²) - [2μ *Σ(x_i)] + Σ (μ²)

Step 8) nσ² = Σ(x_i²) - [2μ *Σ(x_i)] + nμ²

Step 9) nσ² = Σ(x_i²) - [2μ * nμ] + nμ²

Step 10) nσ² = Σ(x_i²) - 2nμ² + nμ²

Step 11) nσ² = Σ(x_i²) - nμ²

Step 12) σ² = Σ(x_i²)/n - μ²

Details

Step 1) σ² = Σ[(x_i - μ)²] / n

This is just the normal formula for variance of a population

Step 2) nσ² = Σ[(x_i - μ)²]

Multiply both sides by n. The only reason to do this is to make it easier to read. Our last step is to undo this by dividing both sides by n.

Step 3) nσ² = Σ[(x_i - μ) * (x_i - μ)]

Write this out in a longer form. So instead of writing a², write a * a.

Step 4) nσ² = Σ[(x_i² - μx_i - μx_i + μ²)]

Perform the multiplication. Remember FOIL (First, Outer, Inner, Last)? So instead of writing (a - b) * (a - b), write: (a² - ab - ab + b²)

Step 5) nσ² = Σ[(x_i² - 2μx_i + μ²)]

Combine the two middle terms. This completes the expansion begun in Step 4.

Step 6) nσ² = Σ(x_i²) - Σ(2μx_i) + Σ (μ²)

Move the summation signs next to each value. You can do this because you are just adding or subtracting each term. You couldn't do this if you were multiplying or dividing each term.

Step 7) nσ² = Σ(x_i²) - [2μ *Σ(x_i)] + Σ(μ²)

Move the 2μ to the outside of the summing of the x_i terms. Why is this OK? Remember that the Σ symbol means sum all of the terms for all x_i for "i" equals 1 to n. Since the 2 and the μ are constant and therefore unaffected by the particular value of x_i, you can move them to the outside of the summation notation. You wind up multiplying once at the end of the summing rather than multiplying for each loop in the summing process, but you get the same result.

Step 8) nσ² = Σ(x_i²) - [2μ *Σ(x_i)] + nμ²

Σ(μ²) becomes nμ². Why? Remember that the Σ symbol specifically means to sum for all values of x_i from "i" equals 1 to n. Since μ is a constant across all values of x_i, you can just multiply μ² by n to get the same result as you would get by summing it n times.

Step 9) nσ² = Σ(x_i²) - [2μ * nμ] + nμ²

Σ(x_i) becomes nμ. Why? When every x_i is equally likely, μ = Σ(x_i)/n, so multiplying both sides by n gives Σ(x_i) = nμ.

Step 10) nσ² = Σ(x_i²) - 2nμ² + nμ²

This is just rewriting the formula to make the middle term easier to read.

Step 11) nσ² = Σ(x_i²) - nμ²

Combine the last two terms on the right side of the equation: -2nμ² + nμ² = -nμ².

Step 12) σ² = Σ(x_i²)/n - μ²

Divide both sides by n to get the result we wanted, the alternate formula for σ².

Appendix B

This: [(x₁ - x̄)² + (x₂ - x̄)²] / 2

becomes: (½x₁ - ½x₂)²

Summary

Step 1) [(x₁ - x̄)² + (x₂ - x̄)²] / n

Step 2) [(x₁-(x₁+x₂)/n)² + (x₂-(x₁+x₂)/n)²] / n

Why? x̄ = (x₁ + x₂) / n

Step 3) [(x₁-(x₁+x₂)/2)² + (x₂-(x₁+x₂)/2)²] / 2

Step 4) [(x₁ - (x₁/2) - (x₂/2))² + (x₂ - (x₁/2) - (x₂/2))²] / 2

Step 5) [(½x₁-(x₂/2))² + (½x₂-(x₁/2))²] / 2

Step 6) [(½x₁-½x₂)² + (½x₂-½x₁)²] / 2

Step 7) [(½x₁-½x₂) * (½x₁-½x₂) + (½x₂-½x₁) * (½x₂-½x₁)]/2

(a - b) * (a - b) = a² - ba - ba + b²

Step 8) [((½x₁)² - (½ * ½ * x₁*x₂) - (½ * ½ * x₁*x₂) + (½x₂)²) + ((½x₂)² - (½ * ½ * x₂*x₁) - (½ * ½ * x₂*x₁) + (½x₁)²)] / 2

Step 9) [((½x₁)² - ¼x₁x₂ - ¼x₁x₂ + (½x₂)²) + ((½x₂)² - ¼x₂x₁ - ¼x₂x₁ + (½x₁)²)] / 2

Step 10) [(½x₁)² + (½x₂)² + (½x₂)² + (½x₁)² - x₁x₂] / 2

Step 11) [¼x₁² + ¼x₂² + ¼x₂² + ¼x₁² - x₁x₂] / 2

Step 12) [½x₁² + ½x₂² - x₁x₂] / 2

Step 13) (¼x₁² - ½x₁x₂ + ¼x₂²)

Step 14) (¼x₁² - ¼x₁x₂ - ¼x₁x₂ + ¼x₂²)

Step 15) (½x₁ - ½x₂) * (½x₁ - ½x₂)

Step 16) (½x₁ - ½x₂)²

With n = 3 you get:

1) s² = [(x₁ - x̄)² + (x₂ - x̄)² + (x₃ - x̄)²] / n

2) s² = [(x₁-(x₁+x₂+x₃)/n)² + (x₂-(x₁+x₂+x₃)/n)² + (x₃-(x₁+x₂+x₃)/n)²] / n

(Since x̄ = (x₁ + x₂ + x₃) / n)

3) s² = [(x₁-(x₁+x₂+x₃)/3)² + (x₂-(x₁+x₂+x₃)/3)² + (x₃-(x₁+x₂+x₃)/3)²] / 3

4) s² = [(x₁ - (x₁/3) - (x₂/3) - (x₃/3))² + (x₂ - (x₁/3) - (x₂/3) - (x₃/3))² + (x₃ - (x₁/3) - (x₂/3) - (x₃/3))²] / 3

5) s² = [(⅔x₁-(x₂/3)-(x₃/3))² + (⅔x₂-(x₁/3)-(x₃/3))² + (⅔x₃-(x₁/3)-(x₂/3))²] / 3

6) The above formula can be reduced further (but not here due to space constraints).
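One candidate for the reduced n = 3 form, worked out for this illustration rather than taken from the original text, is s² = (2/9)(x₁² + x₂² + x₃² - x₁x₂ - x₁x₃ - x₂x₃). The sketch below treats that expression as an assumption and tests it against the definition for every triple of die rolls; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def s2_divide_by_n(sample):
    """Sample variance with n in the denominator, straight from the definition."""
    n = len(sample)
    x_bar = Fraction(sum(sample), n)
    return sum((x - x_bar) ** 2 for x in sample) / n

def reduced_n3(x1, x2, x3):
    """Candidate closed form for n = 3 (the assumption being tested here)."""
    return Fraction(2, 9) * (x1**2 + x2**2 + x3**2 - x1*x2 - x1*x3 - x2*x3)

# Check the candidate against the definition for all 216 triples of die rolls.
all_match = all(
    s2_divide_by_n(t) == reduced_n3(*t)
    for t in product(range(1, 7), repeat=3)
)

print(all_match)  # True
```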

Appendix C

This: [Σ(x_i - μ)² - n(x̄ - μ)²] / n

Is equivalent to this: [Σ(x_i²) - nx̄²] / n

Here are the steps to go from one to the other:

Step 1) s² = [Σ(x_i - μ)² - n(x̄ - μ)²] / n

Step 2) s² = [Σ[(x_i - μ) * (x_i - μ)] - n[(x̄ - μ) * (x̄ - μ)]] / n

Step 3) s² = [Σ(x_i² - 2μx_i + μ²) - n(x̄² - 2μx̄ + μ²)] / n

Step 4) s² = [Σ(x_i²) - Σ(2μx_i) + Σ(μ²) - nx̄² + 2nμx̄ - nμ²] / n

Step 5) s² = [Σ(x_i²) - 2μΣ(x_i) + nμ² - nx̄² + 2nμx̄ - nμ²] / n

Step 6) s² = [Σ(x_i²) - 2μΣ(x_i) - nx̄² + 2nμx̄] / n

Step 7) s² = [Σ(x_i²) - 2μnx̄ - nx̄² + 2nμx̄] / n

Step 8) s² = [Σ(x_i²) - nx̄²] / n
