Looking at Data in 10 Minutes#1
This is Part 1 of 3 for what you might find in a typical Intro to Probability and Statistics course.
This covers Mean, Median, Mode, Variance and Standard Deviation, Frequency Distributions and Histograms.
Click this link for Part 2: Probability in 10 Minutes
Click this link for Part 3: Statistics in 10 Minutes
Who Is This For?
Someone without any prior experience, such as a college freshman about to take her first Statistics class.
Key Goals for Each of the 3 Parts
1) Introduce the Concepts, Terminology and Symbols that would be covered in a typical Intro to Statistics course.
2) Provide an intuitive understanding of things versus just providing formulas.
3) Present the information step-by-step in the best order for learning.
Looking at Data Part 1: Mean, Median and Mode
1) Mean: This is short for the Arithmetic Mean#2.
1.1) The Mean is the average of a set of numbers#3.
1.2) The formula is to sum up all values and divide by the total number of values#4.
1.3) The number of values is typically written as N for ‘Number’. Uppercase ‘N’.
1.4) We’ll write out the values as x1 for the first value, x2 for the second value and so on, all the way up to xn for the last value with a lowercase ‘n’.
1.5) The Mean is typically written using Greek letter mu, which is µ
1.6) The uppercase Greek letter sigma which looks like this ∑ is typically used to mean ‘the sum of’.
1.7) We will therefore write the formula for the Mean as:
µ = (x1 + x2 + x3 + ... + xn) / N
µ = ∑x / N
1.8) As an example, the Mean of these 3 numbers is 105.333333:
Number 1) 110
Number 2) 98
Number 3) 108
µ = (110 + 98 + 108) / 3
µ = 105.333333
1.9) The Mean of the values on a 6-sided die (singular of dice) is:
µ = (1 + 2 + 3 + 4 + 5 + 6) / 6
µ = 3.5
1.10) Observe that the Mean does not need to be one of the source numbers. The Mean of 3.5 is not one of the values on a die.
1.11) For the Mean, the order of the numbers does not matter.
The Arithmetic Mean of this set of numbers:
110, 98, 108
Is the same as the Arithmetic Mean of this (same) set of numbers (written in a different order):
98, 108, 110
1.12) Excel function to use is ‘AVERAGE’
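As a rough sketch of the same calculation outside of Excel, the two Mean examples above can be reproduced in Python with the standard library's statistics module. The variable names are only illustrative.

```python
from statistics import mean

# The three example numbers from item 1.8
numbers = [110, 98, 108]
# The six faces of a die from item 1.9
die_faces = [1, 2, 3, 4, 5, 6]

# mean() sums the values and divides by how many there are
numbers_mean = mean(numbers)
die_mean = mean(die_faces)

# Reordering the values does not change the Mean (item 1.11)
reordered_mean = mean([98, 108, 110])

print(round(numbers_mean, 6))  # 105.333333
print(die_mean)                # 3.5
print(numbers_mean == reordered_mean)
```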
2) Median: This is the middle number in a set of numbers. This requires that you order the numbers from smallest to largest.
2.1) For example, for this set of 5 numbers, the Median is 8.
Number 1) 1
Number 2) 8
Number 3) 8
Number 4) 9
Number 5) 9
By comparison, the Mean is:
µ = (1 + 8 + 8 + 9 + 9) / 5
µ = 7
2.2) Since the Median is just the middle number, in this example the Median would still be 8.
Number 1) -101
Number 2) 2
Number 3) 8
Number 4) 55
Number 5) 999
In the prior example, the Mean was 7, which is pretty close to the Median of 8. In this example, the Mean is intentionally very different to illustrate the point. The Mean here is 192.6.
2.3) The Median is used instead of the Mean in cases where a small number of very large values (or very small values) can Skew things one way or another.
The most common example is looking at incomes, how much money people earn in a year.
For example, if you have a room of 11 people with 10 average earners and Elon Musk (one of the richest people in the world), the Arithmetic Mean of their incomes, which would include Elon’s, might be over a million dollars. Their Median income, which is the income of the 6th person when ranked from least income to most income, might be $50,000, well under the Mean of over $1m.
2.4) Excel function to use is ‘MEDIAN’
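The two five-number examples above can be checked with a short Python sketch. It uses the standard library's statistics module, and the list names are only illustrative.

```python
from statistics import mean, median

first_set = [1, 8, 8, 9, 9]
second_set = [-101, 2, 8, 55, 999]

# median() sorts the values internally, so the input order does not matter
first_median = median(first_set)
second_median = median(second_set)

# The Means differ a lot even though the Medians are the same
first_mean = mean(first_set)
second_mean = mean(second_set)

print(first_median, first_mean)
print(second_median, second_mean)
```

Both sets share a Median of 8, while the extreme values of -101 and 999 pull the second Mean far away from it.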
3) Mode: This is the most common number in a set of numbers#5.
As with the Mean, the order of the numbers does not matter in determining the Mode. Although ordering the numbers is not a requirement, we’ll present the numbers ordered from lowest to highest to make it easier to see the most common numbers.
3.1) For this example of 6 numbers, the Mode is 9. The number 9 repeats 3 times, more than any other number.
Number 1) 1
Number 2) 8
Number 3) 8
Number 4) 9
Number 5) 9
Number 6) 9
Alternately, we can write this set of numbers as a list like this:
1, 8, 8, 9, 9, 9
3.2) If there are two numbers that repeat the same number of times, and especially if they are non-adjacent, the set of numbers is sometimes called ‘Bimodal’.
For example, this set of 10 numbers is Bimodal with Modes of 2 and 9.
1, 2, 2, 2, 6, 8, 8, 9, 9, 9
3.3) Excel function to use is ‘MODE’.
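Here is a small Python sketch of both Mode examples. It assumes Python 3.8 or later, since that is when multimode() was added to the statistics module.

```python
from statistics import mode, multimode

single_mode_set = [1, 8, 8, 9, 9, 9]
bimodal_set = [1, 2, 2, 2, 6, 8, 8, 9, 9, 9]

# mode() returns one most common value
most_common = mode(single_mode_set)

# multimode() returns every value tied for most common
all_modes = multimode(bimodal_set)

print(most_common)  # 9
print(all_modes)    # [2, 9]
```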
Looking at Data Part 2: Variance and Standard Deviation
1) Variance: A measure of variation or variability of a set of numbers. This is a hugely important concept in many areas such as finance and accounting.
1.1) In finance, a stock or bond that has a small range of values is considered less risky than one that has a larger range of values.#6
1.2) In manufacturing, if you need to make a metal bar exactly one meter long, a machine that makes them 0.999 to 1.001 meters long (small variation) will be better than one that makes them 0.98 to 1.02 meters long (larger variation). This is especially important when you are making many pieces that need to fit together, like parts of a car.
2) Standard Deviation: This is another measure of variation or variability of something. This is related to the Variance. The Standard Deviation is defined as the square root of the variance.
3) In practice, sometimes the Standard Deviation will be used and sometimes the Variance will be used. For the finance and manufacturing examples above, measurements of variation are more commonly presented as Standard Deviations. You might see the Variance used in some formulas in Statistics.
The point is to make sure which one you are working with in any particular context. Of course, if you have one of them, you can easily get the other by either squaring the value or taking the square root, as appropriate.
4.1) Variance: The symbol is the lowercase Greek letter sigma squared: σ²
4.2) Standard Deviation: The symbol is the lowercase Greek letter sigma (without the squared): σ
5.1) The Variance of a population is the Average of the Squared Deviations from the Mean.
We’ll write that out as#8:
σ² = [(x1 - µ)² + (x2 - µ)² + (x3 - µ)² + … + (xn - µ)²] / N
x1 = the first value
xn = the last value. Lowercase ‘n’
µ = the Mean, i.e., the Arithmetic Mean
N = the total number of values. Uppercase ‘N’.
5.2) The Standard Deviation is just the square root of that.
We’ll write that as either this:
σ = √( [(x1 - µ)² + (x2 - µ)² + (x3 - µ)² + … + (xn - µ)²] / N )
Or like this, which is a bit easier to do on a computer:
σ = ( [(x1 - µ)² + (x2 - µ)² + (x3 - µ)² + … + (xn - µ)²] / N )^(1/2)
6) Variance of a Die (singular of Dice).
Values are 1, 2, 3, 4, 5, 6.
N, the number of values, is 6.
Step 1) Figure out the Mean. As previously shown:
µ = (1 + 2 + 3 + 4 + 5 + 6) / 6
µ = 3.5
Step 2) Use this Formula for Variance.
σ² = [(x1 - 3.5)² + (x2 - 3.5)² + (x3 - 3.5)² + (x4 - 3.5)² + (x5 - 3.5)² + (x6 - 3.5)²] / 6
σ² = [(1 - 3.5)² + (2 - 3.5)² + (3 - 3.5)² + (4 - 3.5)² + (5 - 3.5)² + (6 - 3.5)²] / 6
σ² = [(-2.5)² + (-1.5)² + (-0.5)² + (+0.5)² + (+1.5)² + (+2.5)²] / 6
σ² = [6.25 + 2.25 + 0.25 + 0.25 + 2.25 + 6.25] / 6
σ² = 17.5 / 6
σ² = 2.916667
Step 3) And then take the square root to calculate the Standard Deviation.
σ = 2.916667^(1/2)
σ = 1.707825
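The three steps above can be sketched in Python. The manual calculation is compared against pvariance() and pstdev() from the standard library's statistics module, which are the population versions. The variable names are only illustrative.

```python
from statistics import mean, pvariance, pstdev

die_faces = [1, 2, 3, 4, 5, 6]

# Step 1: the Mean
mu = mean(die_faces)

# Step 2: the average of the squared deviations from the Mean
squared_deviations = [(x - mu) ** 2 for x in die_faces]
variance_by_hand = sum(squared_deviations) / len(die_faces)

# Step 3: the square root of the Variance
std_dev_by_hand = variance_by_hand ** 0.5

print(round(variance_by_hand, 6))  # 2.916667
print(round(std_dev_by_hand, 6))   # 1.707825

# The library's population functions should agree with the manual steps
library_variance = pvariance(die_faces)
library_std_dev = pstdev(die_faces)
```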
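7) Excel functions to use depending on your version of Excel are:
For Variance: =VAR.P(A1:A6) or, in older versions, =VARP(A1:A6)
For Standard Deviation: =STDEV.P(A1:A6) or, in older versions, =STDEVP(A1:A6)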
7.1) The ‘P’ in the function names is for Population. There are similarly named functions, e.g., STDEV (without the ‘P’) which will be used later when we discuss topics on Statistics. The formulas are slightly different.#9
7.2) The Excel functions were given assuming that the data was located in Cells A1 to A6. You’ll need to vary those values for your data as appropriate.
Looking at Data Part 3: Frequency Distributions
1) Frequency Distribution: A table of data showing counts (frequencies) of numbers for single numbers or for a range.
We’ll look at the count of the sum of 2 dice. Each die has values 1 to 6 which allows for the sum ranging from 2 to 12.
We’ll use this notation (Dice1, Dice2) to show the values. For example, if Dice1 is a 2 and Dice2 is a 5, that would look like this:
(2, 5)
Importantly, if Dice1 is 5 and Dice2 is 2, then that would look like this:
(5, 2)
In other words (2, 5) and (5, 2) are considered different, even though they both sum up to 7. Using this approach there are 36 possible variations for 2 dice.
Here are the sums of two dice, with details shown for sums from 2 to 7:
Sums to 2: Frequency = 1
(1, 1)
Sums to 3: Frequency = 2
(1, 2), (2, 1)
Sums to 4: Frequency = 3
(1, 3), (3, 1), (2, 2)
Sums to 5: Frequency = 4
(1, 4), (4, 1), (2, 3), (3, 2)
Sums to 6: Frequency = 5
(1, 5), (5, 1), (2, 4), (4, 2), (3, 3)
Sums to 7: Frequency = 6
(1, 6), (6, 1), (2, 5), (5, 2), (3, 4), (4, 3)
Sums to 8: Frequency = 5
Sums to 9: Frequency = 4
Sums to 10: Frequency = 3
Sums to 11: Frequency = 2
Sums to 12: Frequency = 1
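The frequency table above can be rebuilt with a short Python sketch that lists all 36 ordered pairs and counts each sum. The names rolls and sum_frequencies are only illustrative.

```python
from collections import Counter
from itertools import product

faces = range(1, 7)

# product() lists every ordered pair, so (2, 5) and (5, 2) both appear
rolls = list(product(faces, repeat=2))

# Count how many ordered pairs produce each sum
sum_frequencies = Counter(die1 + die2 for die1, die2 in rolls)

for total in sorted(sum_frequencies):
    print(f"Sums to {total}: Frequency = {sum_frequencies[total]}")
```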
3) Classes: Classes are the groupings used in a Frequency Distribution. Other terms for this concept are ‘Ranges’ or ‘Buckets’.
For the prior example of the sum of two dice, there are 11 Classes, ranging from 2 to 12.
If we wanted, we could reduce the number of classes from 11 down to just 6, with values based on ranges like this:
Class1: Dice Sum to 1 to 2: Frequency = 1
Class2: Dice Sum to 3 to 4: Frequency = 5
Class3: Dice Sum to 5 to 6: Frequency = 9
Class4: Dice Sum to 7 to 8: Frequency = 11
Class5: Dice Sum to 9 to 10: Frequency = 7
Class6: Dice Sum to 11 to 12: Frequency = 3
See that if you sum up the frequencies, you get the same total of 36.
36 = 1 + 5 + 9 + 11 + 7 + 3
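As a sketch of how the six classes can be derived, the Python below regroups the per-sum counts into the ranges listed above. The (low, high) tuples are just one illustrative way to represent a class.

```python
from collections import Counter
from itertools import product

# Count of each sum over all 36 ordered pairs of two dice
sum_frequencies = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))

# Six classes as (low, high) ranges, inclusive at both ends
classes = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12)]

# Add up the counts of every sum that falls inside each class
class_frequencies = [
    sum(sum_frequencies[total] for total in range(low, high + 1))
    for low, high in classes
]

print(class_frequencies)       # [1, 5, 9, 11, 7, 3]
print(sum(class_frequencies))  # 36
```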
Looking at Data Part 4: Histogram
1) Histogram: A graphical representation of a Frequency Distribution
Based on the original example of the sum of 2 dice.
The Y-axis is the count. The X-axis is the Class.
For example, see the bar in the lower left. That shows that there is 1 case when dice sum to 2. In the middle it shows there are 6 cases when dice sum up to 7.
See that the total number of possible sums of 2 dice is still 36 even when viewed as a graph.
This is based on the 2nd example of having 6 Classes for the Sum of 2 Dice Rolls.
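A rough text-only version of a Histogram can be sketched in Python without any plotting library. This is an illustrative stand-in for a chart: the bars run sideways, with one row per Class and one '#' per count. The helper name text_histogram is hypothetical.

```python
from collections import Counter
from itertools import product

# Count of each sum over all 36 ordered pairs of two dice
sum_frequencies = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))

def text_histogram(frequencies):
    """Return one line per class, with one '#' per count."""
    return [
        f"{label:>2} | {'#' * count}"
        for label, count in sorted(frequencies.items())
    ]

histogram_lines = text_histogram(sum_frequencies)
print("\n".join(histogram_lines))
```

The longest row is the one for a sum of 7, and the '#' characters across all rows add up to 36.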
3) Looking at Histograms, Mean, Median and Mode
Typically in Intro to Statistics classes, you will be asked to look at a Histogram and answer questions about the Mean, Median and Mode.
We’ll provide the frequencies here in table form (Frequency Distribution), though for a test you’ll likely just get the Histogram.#10
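1 to 10: Frequency = 4
11 to 20: Frequency = 8
21 to 30: Frequency = 15
31 to 40: Frequency = 25
41 to 50: Frequency = 43
51 to 60: Frequency = 63
61 to 70: Frequency = 75
71 to 80: Frequency = 62
81 to 90: Frequency = 41
91 to 100: Frequency = 31
That totals to 367 values:
367 = 4 + 8 + 15 + 25 + 43 + 63 + 75 + 62 + 41 + 31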
Q1) What is the Mode?
A1) This is typically the easiest to discern. This is the number or range of numbers that has the highest frequency. In this case, that is the range from 61 to 70, which has a total of 75 values within that range.
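Finding the Mode from a table of classes can be sketched in Python as below. The class frequencies used here are an assumption, chosen to be consistent with the stated total of 367 values and with 75 values in the 61 to 70 class.

```python
# Assumed class frequencies, keyed by (low, high) range
class_frequencies = {
    (1, 10): 4,
    (11, 20): 8,
    (21, 30): 15,
    (31, 40): 25,
    (41, 50): 43,
    (51, 60): 63,
    (61, 70): 75,
    (71, 80): 62,
    (81, 90): 41,
    (91, 100): 31,
}

# The modal class is the class with the highest frequency
modal_class = max(class_frequencies, key=class_frequencies.get)
total_values = sum(class_frequencies.values())

print(modal_class)   # (61, 70)
print(total_values)  # 367
```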
Q2) Which is higher, the Median or the Mean?
A2) This is a bit harder to figure out just by looking at the above Histogram. We’ll show a trick on how to figure this out.
As a reminder, the Median is the middle value, ranked from smallest to largest. There are 367 values in total, so the middle value would be the 184th number, although we are not going to use that directly here.
For the trick, we start by observing the values are skewed to the left. Then we can create a smaller set of values that we can more easily wrap our heads around.
Let’s use these 5 values:
1, 8, 8, 9, 9
That looks like this as a Histogram. This is also skewed to the left though somewhat simplified.
The Median is 8, as that is the middle number, the third of the 5 numbers:
1, 8, 8, 9, 9
For the Mean, we have:
µ = (1 + 8 + 8 + 9 + 9) / 5
µ = 35 / 5
µ = 7
The answer in this case is the Mean is less than the Median, which will be true for Distributions that are skewed to the left.
For distributions that are skewed to the right, that would be reversed and the Mean would be higher than the Median.
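For example, this set is skewed to the right: 1, 1, 2, 2, 9. Its Median is 2 and its Mean is 3, so the Mean is higher than the Median.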
The key here is that you may be asked this question just by looking at a Histogram and without having access to the underlying table of data.
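The Mean versus Median rule of thumb can be checked with a small Python sketch. The left-skewed set is the one used above. The right-skewed set is an illustrative assumption, built by mirroring each value around 5.

```python
from statistics import mean, median

left_skewed = [1, 8, 8, 9, 9]

# Mirror each value around 5 to get a right-skewed set (illustrative only)
right_skewed = sorted(10 - x for x in left_skewed)  # [1, 1, 2, 2, 9]

left_mean, left_median = mean(left_skewed), median(left_skewed)
right_mean, right_median = mean(right_skewed), median(right_skewed)

# Skewed to the left: Mean below Median
print(left_mean, left_median)
# Skewed to the right: Mean above Median
print(right_mean, right_median)
```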
#1) The title of this page was inspired by ‘Learn Python in 10 Minutes’ at:
#2) Another ‘mean’ is the ‘Geometric Mean’. That has a formula where you multiply all of the numbers and then take the nth root, where n is how many numbers there are (a cube root for three numbers). For example, for these three numbers which represent the assumed returns on a stock for three years:
Year 1: 110%
Year 2: 98%
Year 3: 108%
We are writing this to show the ending value relative to the starting value, so by 110% we mean that the stock went up 10%. And for 98%, we mean that the stock went down 2%.
This is called the ‘one plus’ approach, since we are adding one to the return. i.e., take -2% and add one and get +98%.
This makes the math nicer for Geometric Means because we don’t need to worry about negative numbers, since we are assuming stock prices can’t go below zero.
For the above example, we get this Geometric Mean:
Geometric Mean = (110% * 98% * 108%)^(1/3)
Geometric Mean = (116.42400%)^(1/3)
Geometric Mean = 105.19962%
Compare that to the Arithmetic Mean of 105.33333%, which is just summing the numbers and dividing by 3.
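The Geometric Mean calculation above can be sketched in Python. This assumes Python 3.8 or later for math.prod(), and it writes the ‘one plus’ returns as decimals (110% as 1.10).

```python
from math import prod
from statistics import mean

# 'One plus' returns: 110% = 1.10, 98% = 0.98, 108% = 1.08
one_plus_returns = [1.10, 0.98, 1.08]

# Multiply all values, then take the nth root, where n is the number of values
n = len(one_plus_returns)
geometric_mean = prod(one_plus_returns) ** (1 / n)

# For comparison: sum the values and divide by n
arithmetic_mean = mean(one_plus_returns)

print(f"{geometric_mean:.5%}")
print(f"{arithmetic_mean:.5%}")
```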
#3) The Mean is the ‘Simple Average’ of a set of numbers. There is another kind of average called the ‘Weighted Average’, in which each of the values you are averaging is given a weighting other than 1/n.
#4) While it is not important for now, try to remember for later that the concept of the Mean (Arithmetic Mean) applies only when we have a set of numbers which represent the complete set of things. The term for this is ‘Population’. The average of a Population is known as the Mean and we use the symbol µ.
When we are working with a Sample of data, there is a different term used for the average and a different symbol. The term is Sample Mean and the symbol is an x with a bar over it:
Sample Mean = x̄
An example of a sample would be to pick 10 students at random from a class of 30 and take their average height. The Sample Mean would be the average of the 10 selected students. In the Statistics Part, we’ll show how to use Statistical Inference to Estimate the Population Mean of the overall population of 30 students and provide a range around it called a Confidence Interval where we can be 95% sure, based on some assumptions, that the true Population Mean of the 30 students is within that range.
#5) The Mode as a value is as a practical matter used far less than the Mean or the Median. The Mean is by far the most used and so most important concept to know.
#6) For stocks, bonds and commodities, finance professionals typically measure the change in values from one day to another when determining how risky something is. For simplicity, we’ll say they use the percentage change. Finance professionals will use the term ‘Volatility’ to describe the Standard Deviation of the percent price change on an annualized basis.
6.1) A stock that trades $100 one day, then $110 the next and then $105 on the third day has day-over-day percentage moves of up 10% and then down 4.5%.
That is considered less risky than a stock that trades like this:
6.2) $4 on day 1, $5 on day 2 and $4.25 on day 3. Those percentages are much higher. 25% up move from Day 1 to Day 2 and 15% down move from Day 2 to Day 3.
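The day-over-day percentage moves in 6.1 and 6.2 can be reproduced with a short Python sketch. The helper name percent_changes is hypothetical.

```python
def percent_changes(prices):
    """Return each day's change relative to the prior day's price."""
    return [(today - prior) / prior for prior, today in zip(prices, prices[1:])]

# Prices from 6.1 and 6.2
steadier_stock = percent_changes([100, 110, 105])
jumpier_stock = percent_changes([4, 5, 4.25])

print([f"{change:+.1%}" for change in steadier_stock])  # ['+10.0%', '-4.5%']
print([f"{change:+.1%}" for change in jumpier_stock])   # ['+25.0%', '-15.0%']
```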
#7) While it is not important for now, try to remember for later that the symbols σ² for Variance and σ for Standard Deviation are to be used only when talking about a Population, meaning the complete set of numbers.
When we are working with a Sample of data, there is a different symbol used. For a Sample of Data, we use lowercase s for Standard Deviation and then square it for Variance.
Variance of a Sample: s²
Standard Deviation of a Sample: s
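Python's statistics module has separate functions for the Population and Sample versions, much like the Excel functions with and without the ‘P’. The sketch below reuses the die values purely as an illustration, treating them as if they were a Sample. The Sample functions divide by n - 1 instead of N, a detail left for the Statistics part.

```python
from statistics import pstdev, pvariance, stdev, variance

values = [1, 2, 3, 4, 5, 6]

# Population versions (σ² and σ) divide by N
population_variance = pvariance(values)
population_std_dev = pstdev(values)

# Sample versions (s² and s) divide by n - 1
sample_variance = variance(values)
sample_std_dev = stdev(values)

print(round(population_variance, 6), round(population_std_dev, 6))
print(round(sample_variance, 6), round(sample_std_dev, 6))
```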
#8) This formula for Variance should be used only when working with a Population, meaning you know all of the possible values. When dealing with a Sample of data from a bigger Population, there is a slightly different formula. That will be discussed in the Statistics section. As a comparison, the formulas for Arithmetic Mean (of a population) and Sample Mean are basically the same. The formulas only diverge when we get to Population Variance versus Sample Variance.
8.1) Some may wonder why we take the square of the difference. That is to get rid of negatives. Suppose we just took the average of the differences for each value from the mean, without the squaring. That would look like this using the single die as an example:
??? = [(x1 – 3.5) + (x2 - 3.5) + (x3 - 3.5) + (x4 - 3.5) + (x5 - 3.5) + (x6 - 3.5)] / 6
??? = [(1 – 3.5) + (2 - 3.5) + (3 - 3.5) + (4 - 3.5) + (5 - 3.5) + (6 - 3.5)] / 6
??? = [(–2.5) + (–1.5) + (–0.5) + (+0.5) + (+1.5) + (+2.5)] / 6
??? = 0 / 6
??? = 0
An average deviation of 0 (zero) would not make much sense.
An alternate approach might be to take the average of the absolute value of the differences.
Average Deviation = [abs(x1 – 3.5) + abs(x2 - 3.5) + abs(x3 - 3.5) + abs(x4 - 3.5) + abs(x5 - 3.5) + abs(x6 - 3.5) ] / 6
Average Deviation = [abs(1 – 3.5) + abs(2 - 3.5) + abs(3 - 3.5) + abs(4 - 3.5) + abs(5 - 3.5) + abs(6 - 3.5)] / 6
Average Deviation = [2.5 + 1.5 + 0.5 + 0.5 + 1.5 + 2.5] / 6
Average Deviation = 9 / 6
Average Deviation = 1.5
A value of 1.5 is more reasonable. That said, we won’t be saying anything more about the Average Deviation as it is rarely used relative to the Variance and the Standard Deviation.
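The comparison in 8.1 can be sketched in Python using the same die values. The variable names are only illustrative.

```python
die_faces = [1, 2, 3, 4, 5, 6]
n = len(die_faces)
mu = sum(die_faces) / n

# Plain deviations: the negatives and positives cancel out
average_plain_deviation = sum(x - mu for x in die_faces) / n

# Absolute deviations: the Average Deviation
average_deviation = sum(abs(x - mu) for x in die_faces) / n

# Squared deviations: the Variance
variance = sum((x - mu) ** 2 for x in die_faces) / n

print(average_plain_deviation)  # 0.0
print(average_deviation)        # 1.5
print(round(variance, 6))       # 2.916667
```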
#9) See this link for an explanation of the difference in the formulas for the Population Variance versus the Sample Variance. Note: When we say ‘Sample Variance’ we really mean to say that we are using the data from a Sample to derive an Unbiased Estimator of Population Variance for the overall Population from which we took the samples.
#10) Looking at the Frequency Distribution and at the first class, which is for values ranging from 1 to 10, note that we don’t know exactly what those values are.
They could be:
1, 1, 5, 6
7, 8, 8, 9
Or non-whole numbers like
1.5, 6.5, 7.343, 8.9
All we know about them is that there are 4 of them and they are greater than or equal to 1 and less than or equal to 10.