Statistics Final, Sample #3, Answer Key

Intro to Probability and Statistics

Sample Final #3 – Questions And Answers (Answer Key)

Professor Brian Shydlo

brian@shydlo.com

Instructions:

1) Please write your name: _____________________________________

2) There are 7 questions totaling 100 points. Please be careful to answer all questions. Partial credit will be given.

Question 1) 16 Points

Question 2) 18 Points

Question 3) 6 Points

Question 4) 15 Points

Question 5) 18 Points

Question 6) 21 Points

Question 7) 6 Points

Total: 100 Points

Question 1) (15 points in total)

A certain stock, X, has an expected return of 20% per year and a standard deviation of 25%.

A certain bond, Y, has an expected return of 5% per year and a standard deviation of 9%.

The have a correlation of -0.2.

You could write this as:

m_x = 20, m_y = 5, s_x = 25, s_y = 9, and r_xy = -0.2.

Question 1a) (3 Points)

You decide to invest $100 dollars in either X or Y or some combination of both. How do you allocate your $100 to maximize your expected return?

Answer 1a) Invest all $100 in X.

Question 1b) (3 Points)

You remember hearing something about diversification in an investments class. So you decide to split your money and invest $50 in X and $50 in Y. How much money do you expect to have after one year (your initial investment of $100 + the expected return of your portfolio of X and Y).

Answer 1b)

$100 * [ 50% * 20% + 50% * 5%] =

$100 * [ 10% + 2.5%] =

$100 * [ 12.5%] =

$112.50

Question 1c) (6 Points)

What is the standard deviation and variance of the portfolio from part b?

Answer 1c)

Variance(whole portfolio) = (Ps_x)² + ((1-P)s_y)² + 2 * (Ps_x) * ((1-P)s_y)) * r_xy

Variance(whole portfolio) = (.5 * 25)² + (.5 * 9)² + 2 * (.5 * 25) * (.5 * 9)) * -0.2

Variance(whole portfolio) = (12.5)² + (4.5)² + 2 * (12.5) * (4.5) * -0.2

Variance(whole portfolio) = 156.25 + 20.25 + -22.5

Variance(whole portfolio) = 154

Standard Deviation of Whole Portfolio = sqrt(154) = 12.41

Question 1d) (4 Points)

What is a 95% (2 standard deviation) confidence interval for your return? That is, give me a confidence interval for 50% in X and 50% in Y.

Answer 1d)

P[12.5 - (2 * 12.41) < μ < 12.5 + (2 * 12.41)] = 95%

P[12.5 - 24.82 < μ < 12.5 + 24.82] = 95%

P[-12.32 < μ < 37.32] = 95%

Question 2) (18 Points in total)

A city decides to determine the mean expenditures per tourist per visit. A random sample of 100 finds that the average expenditure is $800. The standard deviation of expenditures for all tourists is $120.

Question 2a) (6 Points)

What is the standard deviation of the mean, given that the standard deviation of the whole population is $120 and the number of people sampled is 100?

Answer 2a)

s_x = 120

s_x-bar = 120 / Sqrt(100) = 120 / 10 = 12

s_x-bar = 12

Question 2b) (6 Points)

What is a 95% (2 standard deviation) confidence interval for the value of the expenditures per tourist?

Answer 2b)

P[800 - (2*12) < μ < 800 + (2*12) ] = 95%

P[800 - 24 < μ < 800 + 24] = 95%

P[776 < μ < 824] = 95%

Question 2c) (6 Points)

If the city wants the error of estimation to be $20, how many people does it need to sample? (If this helps, remember that the error of estimation is equal to half the width of the confidence interval.)

Answer 2c)

n = (2² x 120²) / 20² = 144

They would have to sample 144 people.

Question 3) (6 points in total)

On the following graph, one of the lines is the Z distribution and the other is the T Distribution with 2 degrees of freedom.

Question 3a) (6 Points)

Is Line 1 the Z distribution or the T distribution with 2 degrees of freedom?

Answer 3a)

Line 1 is the Z Distribution.

Line 2 is the T Distribution with 2 degrees of freedom.

Question 4) (15 Points)

The average distance of stopping a certain make of automobile is 65 feet. A company deigns a new brake thought to be more effective than the type they currently use. To test this brake, they install it on 64 cars. The new brakes give a stopping distance of 63 feet with a standard deviation of 4 feet.

Question 4a) (5 Points)

Formulate a hypothesis for a lower-tail test. (i.e. write the null hypothesis and the alternate hypothesis).

Answer 4a)

H₀: μ = 65 or μ >= 65

H₁: μ < 65 or H_a: μ < 65

Line 1 is the Z Distribution.

Line 2 is the T Distribution with 2 degrees of freedom.

Question 4b) (5 Points)

Construct the Z statistic to test how many standard deviations is the sample mean of 63 away from the original mean of 65.

Answer 4b)

s_x-bar = 4 / Sqrt(64) = 0.5

Z = (63 - 65) / 0.5 = -4.00

Question 4c) (5 Points)

Are the new brakes significantly better than the old brakes? Significant is this case means 95%.

Answer 4c)

Yes, the new brakes are significantly better. -4 is well to the left of the -1.65 threshold required for a lower-tail test.

Question 5) (18 points) Someone starts a new mutual fund that invests based on the winner of the Superbowl.

Quoted from a newspaper from January 2001:

"One of the quirkiest stock market indicators is the so called Super Bowl predictor for the market. If the Super Bowl is won by a team from the original, pre-merger National Football League, the market will close higher by the end of the year. If a team from the old American Football League wins, then the market will fall."

Here is data from the market over the last 10 years:

#	Year	Stock Market Return	Old NFL Wins (Yes/No)	Old NFL Wins (1/0)	Prediction Correct
1	1991	30	Yes	1	Yes
2	1992	-2	Yes	1	No
3	1993	20	Yes	1	Yes
4	1994	-2	No	0	Yes
5	1995	50	Yes	1	Yes
6	1996	40	Yes	1	Yes
7	1997	-20	Yes	1	No
8	1998	40	Yes	1	Yes
9	1999	60	Yes	1	Yes
10	2000	-5	No	0	Yes

If the old NFL (National Football League)) wins the mutual fund puts their money into the stock market, that is, they bet it will go up. Otherwise if they bet it will go down. If you look at the table above, you'll see that the prediction was correct 8 out of 10 times or 80% of the time. It was only wrong in 1992 and 1997.

You decide to test out this theory by doing a simple linear regression. For the X (predictor variable) you use Old NFL Wins (0 for no and 1 for yes). For the Y you use Stock Market Return. You run this through Minitab and this is the output you get.

The regression equation is

Stock Return = - 3.5 + 30.7 Old NFL Wins?

Predictor Coef StDev T P

Constant -3.50 17.80 -0.20 0.849

NFL Wins 30.75 19.90 1.55 0.161

S = 25.17 R-Sq = 23.0% R-Sq(adj) = 13.4%

Analysis of Variance

Source DF SS MS F P

Regression 1 1512.9 1512.9 2.39 0.161

Residual Error 8 5068.0 633.5

Total 9 6580.9

Question 5a) (6 Points) Based on the Minitab output, comment and interpret on whether or not it the regression model is valid in this case.

Answer 5a) The P score is too high. We typically want something below 0.05. This is above 0.05, in fact, it is 0.161. This suggests that random fluctuations could have caused the stock market movements and not necessarily the Superbowl.

Of course, as we know, correlation does not imply causality.

Question 5b) (6 Points) This year, the Baltimore Ravens won, and they are not part of the old NFL (they were part of the AFL), so the rule says that the stock market should go down.

What is your prediction of the return on the stock market based on the regression equation from Minitab from part a?

(Regardless of whether or not the regression is valid, I still want your prediction.)

Answer 5b) The regression equation is:

Stock Return = -3.5 + ( 30.7 * NFL Wins? )

So in this case, NFL Wins is 0. Plug into the formula:

Predicted Stock Return = -3.5 + ( 30.7 * 0 )

Predicted Stock Return = -3.5 + 0

Predicted Stock Return = -3.5

Question 5c) (6 Points) What is a 95% (2 standard deviation) confidence interval for your estimate from part b?

Answer 5c) The standard error of the regression is: 25.17.

So your confidence interval is:

P[-3.5 - 2*25.17 < μ < -3.5 + 2*25.17] = 95%

P[-3.5 - 50.34 < μ < -3.5 + 50.34] = 95%

P[-53.84 < μ < 46.84] = 95%

Question 6) (21 points in total)

You are the head of marketing for BSEOC Consulting. You are summoned to the CEO's office on the other side of the building. He asks to see the Minitab analysis of the regression of sales based on the amount of advertising. He thinks he is extremely proficient in statistics, and is capable of forming his own conclusions based on the data, so he doesn't just want your interpretation of the Minitab output, he wants to see it for himself.

You print out the Minitab output and rush over to the CEO's office. It is a long walk and just as you reach the office your clumsy coffee-drinking pal bumps into you, knocking you over and causing your sheet of paper with the output to fall to the floor AND your clumsy pal spills coffee on your Minitab output. It is too late to run back to your desk to print out another copy. You decide to hand write in the fields that are blurred by the coffee.

The regression equation is

Sales = 29.0 + 1.50 Advert

Predictor Coef StDev T P

Constant 28.959 6.637 XXXX¹ 0.002

Advert 1.4951 0.6355 XXXX² 0.046

S = XXXX³ R-Sq = XXXXX⁴ R-Sq(adj) = 33.5%

Analysis of Variance

Source DF SS MS F P

Regression 1 XXXXXX⁶ 220.40 XXXX⁹ 0.046

Residual Error X⁵ XXXXXX⁷ 39.82

Total 9 XXXXXX⁸

Question 6a) (9 Points)

What is the values of

1) The T-statistic for the constant?

2) The T-statistic for advertising?

3) S, the Standard Error of the Regression?

4) R-Squared?

5) The Degrees of Freedom for the Residual Error

6) SS-Regression (Sum of Squared of the Regression)?

7) SS-Residual Error (Sum of Squared of the Residual Error)?

8) SS-Total (Sum of Squared of the Total)?

9) The F-statistic?

Answers 6a)

The regression equation is

Sales = 29.0 + 1.50 Advert

Predictor Coef StDev T P

Constant 28.959 6.637 4.36 0.002

Advert 1.4951 0.6355 2.35 0.046

S = 6.311 R-Sq = 40.9% R-Sq(adj) = 33.5%

Analysis of Variance

Source DF SS MS F P

Regression 1 220.40 220.40 5.53 0.046

Residual Error 8 318.58 39.82

Total 9 538.98

1) The T-statistic for the constant? 4.36 = 28.959 / 6.637

2) The T-statistic for advertising? 2.35 = 1.4951 / 0.6355

3) S, the Standard Error of the Regression? 6.311 = (39.82)^1/2

4) R-Squared? 40.9% = 220.4 / 538.98

5) The Degrees of Freedom for the Residual Error 8 = 9 - 1

6) SS-Regression? 220.40 = 220.40 * 1

7) SS-Residual Error? 318.58 = 39.82 * 8

8) SS-Total? 538.98 = 220.40 + 318.58

9) The F-statistic? 5.53 = 220.40 / 39.82

Question 6b) (6 Points)

You and the CEO both agree that the linear regression model is appropriate for the data originally used.

The CEO decides to predict sales based on the advertising money spent.

You have 10 years worth of data as shown below. This is the data that you used for the previous regression.

(All Numbers in Millions)

Year	Sales	Advert
1	32.2	3.2
2	33	5.4
3	41.8	12.1
4	38.2	9.6
5	45.8	12.8
6	44.5	12.6
7	46	13.3
8	49	9.6
9	52	11
10	56	10

The CEO says,

"Well, based on the regression equation of:

Sales = 29.0 + 1.50 Advertising

if our budget for Advertising is 100, then we should be able to pull in sales of 29 + 1.50 * 100 which equals 179.

179 is great for us.

What do you tell your CEO with regards to his prediction?

(What is the main warning you would give the CEO why the prediction may not match the actual value?)

Answers 6b)

The problem is extrapolation. The 100 in advertising is well above the highest observed data point, which was only 13.3. We cannot say that the linear relation will hold for data outside the range. Predictions should only be made based on Advertising spend between 3 and 13.

Question 6c) (6 Points) You decide to try to improve the relatively small R-squared by (it is only 40.9%) by adding another predictor equation, in other words, you decide to perform a multiple-linear regression. You add in promotions and get the following output.

The regression equation is

Sales = 22.5 + 1.42 Advert + 2.06 Promotions

Predictor Coef StDev T P

Constant 22.489 7.602 2.96 0.021

Advert 1.4187 0.5960 2.38 0.049

Promotion 2.062 1.401 1.47 0.185

S = 5.896 R-Sq = 54.9% R-Sq(adj) = 42.0%

Analysis of Variance

Source DF SS MS F P

Regression 2 295.65 147.82 4.25 0.062

Residual Error 7 243.34 34.76

Total 9 538.98

Is it appropriate to add promotions to the regression model? Answer "Yes" or "No" and give a reason.

Answers 6c)

No. We would want the p-score to be below 0.05 for the Promotion variable. It is above .05, it is 0.185.

The r-squared does go up, but we know that R-Squared will always go up when we add another variable, so this does not tell the whole story.

Question 7) (6 Points) You run a simple linear regression on tree width versus age. Minitab produces the following graph for you of the residuals.

What does this graph tell you concerning the validity of the Linear Regression Model in this case.

Answer 7) The original data is not linear.

You may want to see if taking the log of Y and re-running the regression gives a more random looking residual plot, in other words, the log of Y and X may be linearly related.