Statistics Final, Sample #2, Answer Key

Intro to Probability and Statistics

Sample Final #2 – Questions And Answers (Answer Key)

Professor Brian Shydlo

brian@shydlo.com

Instructions:

1) Please write your name: _____________________________________

2) There are 7 questions totaling 100 points. Please be careful to answer all questions. Partial credit will be given.

Question 1) 12 Points (Correlation and Covariance)

Question 2) 8 Points (Expected Value and Standard Deviation of a Portfolio of Two Assets)

Question 3) 15 Points (Sample Means and Confidence Intervals)

Question 4) 15 Points (T Distribution)

Question 5) 29 Points (Linear Regression)

Question 6) 12 Points (Multiple Linear Regression)

Question 7) 9 Points (Miscellaneous)

Total 100 Points

Question 1) (12 points in Total)

You have the following table of X and Y values. (For example, there is a 40% chance that X will be 8 and Y will be 11, and so on…)

X	Y	Prob(y,x)
2	3	20%
3	5	20%
4	10	20%
8	11	40%

To help you out I have calculated the Variance and Mean (or Expected Value) of each.

μ_x = 5

μ_y = 8

s_x² = 6.4

s_y² = 11.2

Question 1a) (6 Points)

What is Covariance(X,Y)?

Answer: ___________________________

Answer 1a)

Covariance = 7.4

X	Y	Prob(y,x)	X- μ_x	Y - μ_y	Prob(y,x) * (X- μ_x) * (Y - μ_y)
2	3	20%	-3	-5	3
3	5	20%	-2	-3	1.2
4	10	20%	-1	2	-0.4
8	11	40%	3	3	3.6

					Sum = 7.4
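
The table above is a probability-weighted sum of the products of the deviations. As a minimal sketch (the variable names are illustrative and not part of the exam), the same calculation can be written in Python using the joint distribution from the question:

```python
# Joint distribution from Question 1: (x, y, probability)
joint = [(2, 3, 0.20), (3, 5, 0.20), (4, 10, 0.20), (8, 11, 0.40)]

# Expected values are probability-weighted means
mu_x = sum(p * x for x, y, p in joint)
mu_y = sum(p * y for x, y, p in joint)

# Covariance is the probability-weighted sum of (x - mu_x) * (y - mu_y)
cov_xy = sum(p * (x - mu_x) * (y - mu_y) for x, y, p in joint)

print(round(mu_x, 4), round(mu_y, 4), round(cov_xy, 4))
```

The printed covariance should agree with the sum of the last column of the table.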

Question 1b) (4 Points)

What is the Correlation Coefficient of X,Y?

Answer: ___________________________

Answer 1b)

Correlation(X,Y) = Covariance(X,Y) / (s_x * s_y)

The Variances were given, so you need to figure out the Standard Deviations:

s_x= √s_x² = √6.4 = 2.530

s_y= √s_y² = √11.2 = 3.347

so Correlation(X,Y) = 7.4 / (2.530 * 3.347) = 0.874
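
The same steps can be checked with a short Python sketch. It assumes the Covariance of 7.4 from Answer 1a and the two Variances given in the question; the variable names are illustrative only.

```python
import math

cov_xy = 7.4    # from Answer 1a
var_x = 6.4     # given in the question
var_y = 11.2    # given in the question

# Standard Deviations are the square roots of the Variances
sd_x = math.sqrt(var_x)
sd_y = math.sqrt(var_y)

corr_xy = cov_xy / (sd_x * sd_y)

# A valid Correlation Coefficient always lies between -1 and 1
is_valid = -1.0 <= corr_xy <= 1.0

print(round(sd_x, 3), round(sd_y, 3), round(corr_xy, 3), is_valid)
```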

Question 1c) (2 Points)

Suppose your answer to Question 1b had been a Correlation Coefficient of 1.2. What would you conclude about your answer?

Answer: _________________________________________________

Answer 1c)

You would conclude you made a mathematical error. Correlation Coefficients must be between -1 and 1.

Question 2) (8 Points in Total)

A certain stock, X, has an expected return of 40% per year and a standard deviation of 50%.

A certain bond, Y, has an expected return of 10% per year and a standard deviation of 10%.

They have a Correlation Coefficient of 0.6.

You could write this as:

μ_x = 40%, μ_y = 10%, s_x = 50%, s_y = 10%, and ρ_x,y = 0.6

Question 2a) (3 Points)

You decide to invest $40 dollars in X and $60 in Y ($100 in total).

How much money do you expect to have in one year?

Answer: ___________________________

Answer 2a)

$40 * (1 + 0.40) + $60 * (1 + 0.10) = $56 + $66 = $122

Question 2b) (5 Points)

You decide to invest $40 dollars in X and $60 in Y ($100 in total).

What is the Standard Deviation of your portfolio?

Answer: ___________________________

Answer 2b)

Let P = 0.4 be the fraction of the portfolio invested in X, so 1-P = 0.6 is the fraction invested in Y.

Variance(whole portfolio) = (P * s_x)² + ((1-P) * s_y)² + 2 * (P * s_x) * ((1-P) * s_y) * ρ_x,y

Variance(whole portfolio) = (0.4 * 0.50)² + (0.6 * 0.10)² + 2 * (0.4 * 0.50) * (0.6 * 0.10) * 0.6

Variance(whole portfolio) = (0.20)² + (0.06)² + 2 * (0.20) * (0.06) * 0.6

Variance(whole portfolio) = 0.0400 + 0.0036 + 0.0144

Variance(whole portfolio) = 0.0580

Standard Deviation of Whole Portfolio = sqrt(0.0580) = 0.241 or 24.1%
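
The arithmetic for a two-asset portfolio is easy to slip on, so here is a minimal Python sketch. It uses only the inputs stated in the question (weights of 40% and 60%, Standard Deviations of 50% and 10%, Correlation Coefficient of 0.6); the variable names are illustrative.

```python
import math

w_x, w_y = 0.40, 0.60      # fractions invested in X and Y ($40 and $60 of $100)
mu_x, mu_y = 0.40, 0.10    # expected returns of X and Y
sd_x, sd_y = 0.50, 0.10    # Standard Deviations of X and Y
rho_xy = 0.6               # Correlation Coefficient of X and Y

# Expected dollar value of the $100 portfolio after one year
expected_value = 100 * (w_x * (1 + mu_x) + w_y * (1 + mu_y))

# Two-asset portfolio Variance: two squared terms plus the cross term
variance = (
    (w_x * sd_x) ** 2
    + (w_y * sd_y) ** 2
    + 2 * (w_x * sd_x) * (w_y * sd_y) * rho_xy
)
std_dev = math.sqrt(variance)

print(round(expected_value, 2), round(variance, 4), round(std_dev, 3))
```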

Question 3) (15 Points in Total)

A random sampling of 100 American shrubs revealed the average height of a shrubbery to be 60 centimeters. The Standard Deviation of shrubberies is well known to be 15 centimeters (meaning that the Standard Deviation of the population of shrubberies is known with certainty to be 15 centimeters).

Question 3a) (5 Points)

What is the Standard Error (also called the Standard Deviation) of the Sample Mean?

Answer: ___________________________

Answer 3a)

s_x = 15

s_x-bar = 15 / Sqrt(100) = 15 / 10 = 1.5

s_x-bar = 1.5

Question 3b) (5 Points)

What is a 95% (roughly 2 standard deviation) Confidence Interval for the mean height of American shrubs? Please assume that the appropriate Z-score to use is 1.96.

Answer: ___________________________

Answer 3b)

P[60 - (1.96*1.5) < μ < 60 + (1.96*1.5) ] = 95%

P[60 - 2.94< μ < 60 + 2.94] = 95%

P[57.06 < μ < 62.94] = 95%
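
As a quick check of Questions 3a and 3b, the Standard Error and the Confidence Interval can be computed in a few lines of Python. This is a sketch using the numbers from the question; the variable names are illustrative.

```python
import math

n = 100              # sample size
sample_mean = 60.0   # centimeters
sigma = 15.0         # known population Standard Deviation, centimeters
z = 1.96             # Z-score for 95% confidence, as given

# Standard Error of the Sample Mean
std_error = sigma / math.sqrt(n)

# Margin of error and the resulting Confidence Interval
margin = z * std_error
ci = (sample_mean - margin, sample_mean + margin)

print(std_error, round(margin, 2), tuple(round(v, 2) for v in ci))
```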

Question 3c) (5 Points)

What would the sample size need to be to get a 95% Confidence Interval that is exactly 9.8 centimeters wide? (Please use a Z-score of exactly 1.96)

Answer 3c)

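A Confidence Interval that is 9.8 centimeters wide has a margin of error of 9.8 / 2 = 4.9 centimeters, so:

n = (1.96² * 15²) / 4.9² = 36 shrubs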
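
The sample-size formula comes from setting the margin of error, Z * s / sqrt(n), equal to half the desired width and solving for n. A minimal Python sketch with illustrative variable names:

```python
sigma = 15.0    # known population Standard Deviation, centimeters
z = 1.96        # Z-score for 95% confidence, as given
width = 9.8     # desired total width of the Confidence Interval

# The margin of error is half of the total width
margin = width / 2

# Solve z * sigma / sqrt(n) = margin for n
n_required = (z * sigma / margin) ** 2

print(round(n_required, 6))
```

Here the result is a whole number. In general a fractional result would be rounded up to the next whole shrub.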

Question 4) (15 points in Total)

I took a sample and created 2 Confidence Intervals each with the same Standard Deviation and Point Estimate of the mean of a distribution. Both Confidence Intervals were for 95%.

The only thing that was different between the two Confidence Intervals was that for one of them I used the Z-distribution (Standard Normal) and for the other one I used the T-distribution:

Confidence Interval A: P[68.24 ≤ μ ≤ 91.76] = 95%

Confidence Interval B: P[60.91 ≤ μ ≤ 99.10] = 95%

Question 4a) (4 Points)

For which Confidence Interval did I use the T distribution (A or B)?

Answer: __________________

Answer 4a)

Confidence Interval B is wider, so it must use the T distribution.

Question 4b) (4 Points)

For the Confidence Interval that used the Z distribution, I used a Z-score of exactly 1.96 (meaning Z_α/2 is 1.96). What was the Standard Deviation I used?

Answer: __________________

Answer 4b)

This is the Confidence Interval as written:

P[68.24 ≤ μ ≤ 91.76] = 95%

Here is the formula:

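P[x-bar - (Z_α/2 * s) ≤ μ ≤ x-bar + (Z_α/2 * s)] = 95%

Which becomes:

P[x-bar - (1.96 * s) ≤ μ ≤ x-bar + (1.96 * s)] = 95%

The width of the interval is the upper limit minus the lower limit:

91.76 - 68.24 = 23.52

x-bar + (1.96 * s) - [x-bar - (1.96 * s)] = 23.52

x-bar + (1.96 * s) - x-bar + (1.96 * s) = 23.52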

(1.96* s) + (1.96* s) = 23.52

2 * (1.96* s) = 23.52

3.92s = 23.52 so s = 6

FYI…. The Sample Mean I used was 80, which is at the midpoint of the Confidence Interval.

(If you mistakenly put Confidence Interval A as the answer to Question 4a, you would have got that one wrong, but you would get this one correct if you put 9.74 as the answer.)
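
The algebra above amounts to dividing half the width of the interval by the Z-score. A short Python sketch of that shortcut, with illustrative variable names:

```python
lower, upper = 68.24, 91.76   # Confidence Interval A
z = 1.96                      # Z-score, as given

# The Point Estimate sits at the midpoint of the interval
midpoint = (lower + upper) / 2

# Half the width equals z * s, so s is half the width divided by z
half_width = (upper - lower) / 2
s = half_width / z

print(round(midpoint, 2), round(half_width, 2), round(s, 2))
```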

Question 4c) (4 Points)

I didn’t say how many degrees of freedom I used for calculating the T distribution, but given the following choices, which one do you think I used?

Choice A: 3 Degrees of Freedom

Choice B: 103 Degrees of Freedom

Answer: __________________

Answer 4c)

The answer is Choice A.

For Degrees of Freedom of 3, the T-score is 3.18, which is what I used for Confidence Interval B as written.

For Degrees of Freedom of 103, the T-score is 1.98, which is very close to the 1.96 that I would have got had I used the Standard Normal Distribution (Z-score).

This question was designed to test whether you knew that the T-score approaches the Z-score as n gets large. If the Degrees of Freedom were 103, then the 2 Confidence Intervals would have been almost identical.
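
One way to check this without a T table is to back out the multiplier implied by Confidence Interval B. The sketch below assumes the Standard Deviation of 6 found in Answer 4b; the variable names are illustrative.

```python
lower_b, upper_b = 60.91, 99.10   # Confidence Interval B
s = 6.0                           # Standard Deviation from Answer 4b
z = 1.96                          # Z-score used for Confidence Interval A

# Half the width of the interval equals (T-score * s)
half_width_b = (upper_b - lower_b) / 2
implied_t = half_width_b / s

# The implied multiplier is far above 1.96, so the Degrees of Freedom must be small
print(round(implied_t, 2), implied_t > z)
```

The implied multiplier matches the T-score for 3 Degrees of Freedom rather than the 1.98 quoted for 103.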

Question 4d) (3 Points)

If I had used the Degrees of Freedom of 103, then what must n have been (or how many items were in the sample or what was the sample size)?

Answer: __________________

Answer 4d)

The answer is 104 in the sample.

The formula for Degrees of Freedom for the T distribution is:

df = n -1

df + 1 = n

103 + 1 = 104

Question 5) (29 Points in Total)

I did a regression on the following data:

#	X	Y
1	2.060	2.044
2	0.064	0.059
3	1.202	1.124
4	0.800	0.500
5	4.099	5.080
6	3.516	3.500
7	2.157	2.362
8	1.378	1.410
9	3.481	4.084
10	0.301	0.319

and got the following information:

The regression equation is

Y = - 0.227 + 1.19 X

Predictor	Coef	StDev	T	P
Constant	-0.2268	0.1494	-1.52	0.167
X	1.19374	0.06413	18.62	0.000

S = 0.272 R-Sq = XXxXXX

Analysis of Variance

Source	DF	SS	MS	F	P
Regression	1	25.558	25.558	XXXxXX	0.000
Residual Error	XX	0.590	0.074
Total	XX	26.148

Unfortunately, the printer is an old model and it smudged the output. I would print another copy, but I realized that it was the last piece of paper. Please help me reassemble the original output.
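
For reference, the slope and intercept in the output can be reproduced from the 10 data points with the usual least-squares formulas. The following is a minimal Python sketch using only the standard library; it is not the statistics package that produced the printout, and the variable names are illustrative.

```python
# The 10 (X, Y) observations from the table above
xs = [2.060, 0.064, 1.202, 0.800, 4.099, 3.516, 2.157, 1.378, 3.481, 0.301]
ys = [2.044, 0.059, 1.124, 0.500, 5.080, 3.500, 2.362, 1.410, 4.084, 0.319]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of squares and cross products about the means
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)

# Least-squares slope and intercept
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

print(round(intercept, 4), round(slope, 4))
```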

Question 5a) (3 Points)

What is the Degrees of Freedom Total?

Answer: __________________

Answer 5a)

The Degrees of Freedom Total = n - 1 = 10 - 1 = 9

Question 5b) (3 Points)

What is the R-Squared?

Answer: __________________

Answer 5b)

R-Squared = SSR/SST = 25.558/26.148 = 97.7%

Question 5c) (3 Points)

What is the F-score?

Answer: __________________

Answer 5c)

F-Score = MSR/MSE = 25.558/0.074 = 345.38
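
The three smudged values in Questions 5a to 5c all follow from the numbers that survived on the printout. A minimal Python sketch, with illustrative variable names:

```python
n = 10                   # number of observations in the data table
ss_regression = 25.558   # Regression sum of squares (SSR)
ss_total = 26.148        # Total sum of squares (SST)
ms_regression = 25.558   # Regression mean square (MSR)
ms_error = 0.074         # Residual Error mean square (MSE)

# Degrees of Freedom: Total is n - 1; Residual Error is n - 2 with one predictor
df_total = n - 1
df_error = n - 2

r_squared = ss_regression / ss_total
f_score = ms_regression / ms_error

print(df_total, df_error, round(r_squared, 3), round(f_score, 2))
```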

Question 5d) (4 Points)

Please predict Y using the Regression Equation when X = 3.

Answer: __________________

Answer 5d)

Y = - 0.227 + 1.19 X

Y = - 0.227 + 1.19 * 3

Y = - 0.227 + 3.57 = 3.343

Question 5e) (4 Points)

Please give me a 95% Confidence Interval for Y when X = 3. Please use a Z-score of 1.96 for your Confidence Interval.

Answer: __________________

Answer 5e)

The Point Estimate of Y was calculated in Question 5d to be 3.343.

The Standard Error of the regression is: 0.272.

P [3.343 - (Z_α/2 * s) < μ < 3.343 + (Z_α/2 * s)] = 95%

P [3.343 - (1.96 * 0.272) < μ < 3.343 + (1.96 * 0.272)] = 95%

P [3.343 - (0.533) < μ < 3.343 + (0.533)] = 95%

P [2.810 < μ < 3.876] = 95%
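
The prediction and the interval can be checked together. This Python sketch mirrors the simplified approach used in this answer (Point Estimate plus or minus the Z-score times the Standard Error of the regression); the variable names are illustrative.

```python
intercept, slope = -0.227, 1.19   # rounded coefficients from the Regression Equation
s = 0.272                         # Standard Error of the regression (S in the output)
z = 1.96                          # Z-score, as given
x_new = 3

# Point Estimate from the Regression Equation
y_hat = intercept + slope * x_new

# Interval: Point Estimate plus or minus z * s
margin = z * s
interval = (y_hat - margin, y_hat + margin)

print(round(y_hat, 3), round(margin, 3), tuple(round(v, 3) for v in interval))
```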

Question 5f) (4 Points)

How might you respond to someone who asked you to predict a Y with an X of 30?

Answer: ___________________________________________

Answer 5f)

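This would not be valid, as you would be extrapolating: the X values used to fit the regression range from 0.064 to 4.099, so an X of 30 is far outside the data.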

Question 5g) (3 Points)

You do another, unrelated regression and get the following information:

R-Squared = 81%

The regression equation is

Y = 10.1 - 0.879 X

What is the value of R (the correlation coefficient)?

Answer: __________________

Answer 5g)

ρ = √(R²) = √0.81 = 0.90

You need to set the sign to negative since the slope of the Regression Equation is negative, so the answer is -0.90
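
The rule "take the square root, then copy the sign of the slope" can be written directly in Python. A minimal sketch with illustrative variable names:

```python
import math

r_squared = 0.81
slope = -0.879   # slope from the Regression Equation

# The square root gives the magnitude; the slope supplies the sign
r = math.copysign(math.sqrt(r_squared), slope)

print(round(r, 2))
```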

Question 5h) (5 Points)

I did another, unrelated regression and got the chart below. Is there anything about this chart, which shows the errors (also called the residuals), which would make you question the validity of this Regression?

Answer: ___________________________________________

Answer 5h)

The errors seem to follow a pattern. An assumption of the model is that the errors are random, hence this picture suggests a violation of one of the assumptions of the Linear Regression Model.

Question 6) (12 points in Total)

The following is output from a Multiple Linear Regression:

The regression equation is

Y = - 13.4 + 4.78 X + 0.674 Z

Predictor	Coef	StDev	T	P
Constant	-13.376	3.867	-3.46	0.005
X	4.7839	0.2727	17.54	0.000
Z	0.6738	0.6479	XXXX	0.319

S = 4.451 R-Sq = 96.5%

Analysis of Variance

Source	DF	SS	MS	F	P
Regression	2	6598.0	3299.0	166.54	0.000
Residual Error	12	237.7	19.8
Total	14	6835.7

Question 6a) (4 Points)

Please predict Y when X = 5 and Z = 6

Answer: __________________

Answer 6a)

Use the Multiple Regression Equation:

Y = - 13.4 + 4.78 * 5 + 0.674 * 6

Y = - 13.4 + 23.9 + 4.044 = 14.544
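
A prediction from a Multiple Regression Equation is the constant plus each coefficient times its variable's value. A small Python sketch of that, using the rounded coefficients from the equation (the dictionary layout is just one illustrative way to organize it):

```python
# Rounded coefficients from the Regression Equation
constant = -13.4
coefficients = {"X": 4.78, "Z": 0.674}

# Values at which to predict Y
values = {"X": 5, "Z": 6}

# Constant plus the sum of coefficient * value over all predictors
y_hat = constant + sum(coefficients[name] * values[name] for name in coefficients)

print(round(y_hat, 3))
```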

Question 6b) (5 Points)

Please comment on the value of adding the Z variable to the Regression Model (versus leaving it out).

Answer: ________________________________________

Answer 6b)

Since the p-value of Z is so high (it is 0.319, which is above 0.05), this suggests that the Regression should be run again without including Z. I did and got an R-Squared of 96.2%, which is very close to the R-Squared I got using 2 variables.

Question 6c) (3 Points)

What is the T-score of the variable "Z". (In the regression output, it is the value with the XXXX in it.)

Answer: __________________

Answer 6c)

The T-score = (Coefficient - 0) / Standard Deviation =

( 0.6738 - 0 ) / 0.6479 = 1.04
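
The same formula can be checked against the X row of the output, where the T-score was not smudged. A minimal Python sketch with illustrative names:

```python
def t_score(coefficient, std_dev, hypothesized=0.0):
    """T-score for testing whether a coefficient differs from a hypothesized value."""
    return (coefficient - hypothesized) / std_dev

# Z row: the smudged value we need to recover
t_z = t_score(0.6738, 0.6479)

# X row: the printout shows 17.54, which serves as a sanity check on the formula
t_x = t_score(4.7839, 0.2727)

print(round(t_z, 2), round(t_x, 2))
```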

Question 7) (9 points in Total)

Question 7a) (3 Points)

It is possible to have Covariance(X,Y) = 200 and Correlation Coefficient(X,Y) = -0.6.

Please indicate if this is True or False

Answer: __________________

Answer 7a)

False. The sign of the Covariance and the Correlation Coefficient must be the same.

Question 7b) (3 Points)

You do 2 simple Linear Regressions:

Y = 0 + 3X

Y = 0 + 5Z

If you do a Multiple Linear Regression you would expect to see this as your Regression Equation: Y = 0 + 3X + 5Z

Please indicate if this is True or False

Answer: __________________

Answer 7b)

False. In general, the slopes of a Multiple Linear Regression of X,Y,Z would be expected to be different compared to a simple Linear Regression of X,Y and Z,Y.

Question 7c) (3 Points)

You run a Multiple Linear Regression of three variables (e.g., A, B, C) and then add a fourth variable (e.g., D). It is possible for your new R-Squared to be higher, even if there is actually no "true" relationship between D and any other variables.

Please indicate if this is True or False

Answer: __________________

Answer 7c)

True. You would expect the R-Squared to increase, since random chance would tend to show at least a small correlation between two (or more) variables even if the long-run value of the correlation were zero.
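
The random-chance point can be illustrated in the simplest possible setting, which is a simplification of the four-variable case in the question: start with a model that has no predictors at all (R-Squared of 0) and add one predictor D that is pure noise. The sketch below is an assumption-laden illustration, not part of the exam; the seed, sample size, and function name are arbitrary choices.

```python
import random

random.seed(0)   # arbitrary seed, used only so the run is repeatable
n = 30           # arbitrary sample size

# Y and D are generated independently, so their true correlation is zero
y = [random.gauss(0, 1) for _ in range(n)]
d = [random.gauss(0, 1) for _ in range(n)]

def r_squared_simple(x, y):
    """R-Squared of a simple Linear Regression of y on x (the squared correlation)."""
    m = len(x)
    mean_x, mean_y = sum(x) / m, sum(y) / m
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy ** 2 / (s_xx * s_yy)

r2_without_d = 0.0                     # a constant-only model explains no variation
r2_with_d = r_squared_simple(d, y)     # adding the noise variable D

print(r2_without_d, r2_with_d)
```

Because the sample correlation of two unrelated variables is almost never exactly zero, the R-Squared with D included comes out at or above the R-Squared without it.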