9-3 Basic Skills and Concepts

9-3 Regression
Making Predictions. In Exercises 1–4, use the given data to find the best predicted value
of the dependent variable. Be sure to follow the prediction procedure described in this
section.
1. In each of the following cases, find the best predicted value of y given that x = 3.00.
The given statistics are summarized from paired sample data.
a. r = 0.987, ȳ = 5.00, n = 20, and the equation of the regression line is
ŷ = 6.00 + 4.00x.
b. r = 0.052, ȳ = 5.00, n = 20, and the equation of the regression line is
ŷ = 6.00 + 4.00x.
2. In each of the following cases, find the best predicted value of y given that x = 2.00.
The given statistics are summarized from paired sample data.
a. r = -0.123, ȳ = 8.00, n = 30, and the equation of the regression line is
ŷ = 7.00 - 2.00x.
b. r = -0.567, ȳ = 8.00, n = 30, and the equation of the regression line is
ŷ = 7.00 - 2.00x.
3. Chest Sizes and Weights of Bears When eight bears were anesthetized, researchers
measured the distances (in inches) around the bears’ chests and weighed the bears (in
pounds). Minitab was used to find that the value of the linear correlation coefficient is
r = 0.993 and the equation of the regression line is ŷ = -187 + 11.3x, where x
represents chest size. Also, the mean weight of the eight bears is 234.5 lb. What is the
best predicted weight of a bear with a chest size of 52 in.?
4. Stocks and Super Bowl Data Set 25 in Appendix B includes pairs of data for the
Dow-Jones Industrial Average (DJIA) high value and the total number of points
scored in the Super Bowl for 21 different years. Excel was used to find that the value
of the linear correlation coefficient is r = -0.133 and the regression equation is
ŷ = 53.3 - 0.000442x, where x is the high value of the DJIA. Also, the mean number of
Super Bowl points is 51.4. What is the best predicted value for the total number of
Super Bowl points scored in a year with a DJIA high of 1200?
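The prediction procedure itself is described elsewhere in the section and is not reproduced here. The sketch below assumes it is the usual rule: use the regression equation only when the linear correlation is significant, and otherwise use the sample mean ȳ. The function name, the parameter names, and the critical value 0.444 (taken to be the two-tailed critical value of r for n = 20 at the 0.05 significance level) are assumptions made for illustration, using the statistics from Exercise 1.

```python
def best_predicted_value(x, r, r_critical, b0, b1, y_mean):
    """Return b0 + b1*x if |r| exceeds the critical value, else the mean of y.

    ASSUMPTION: "significant linear correlation" is judged by comparing
    |r| with a critical value looked up for the sample size.
    """
    if abs(r) > r_critical:
        return b0 + b1 * x
    return y_mean


# Exercise 1: n = 20, regression line y-hat = 6.00 + 4.00x, mean of y = 5.00.
# ASSUMPTION: 0.444 is the critical value of r for n = 20, alpha = 0.05.
R_CRITICAL_N20 = 0.444

pred_a = best_predicted_value(3.00, 0.987, R_CRITICAL_N20, 6.00, 4.00, 5.00)
pred_b = best_predicted_value(3.00, 0.052, R_CRITICAL_N20, 6.00, 4.00, 5.00)
print(pred_a, pred_b)
```

The two calls differ only in r, which is what makes parts (a) and (b) of Exercises 1 and 2 give different answers from the same regression equation.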
Finding the Equation of the Regression Line. In Exercises 5 and 6, use the given data
to find the equation of the regression line.
5. x   0   1   2   3   4
   y   4   1   0   1   4
6. x   1   2   2   5    6
   y   2   5   4   15   15
7. Effects of an Outlier Refer to the Minitab-generated scatterplot given in Exercise 7 of
Section 9-2.
a. Using the pairs of values for all 10 points, find the equation of the regression line.
b. After removing the point with coordinates (10, 10), use the pairs of values for the
remaining nine points and find the equation of the regression line.
c. Compare the results from parts (a) and (b).
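As a way to check hand calculations for exercises of this kind, the following sketch computes the least-squares slope and intercept from the standard summation formulas. The function name `regression_line` is hypothetical; the data are the five pairs from Exercise 5.

```python
def regression_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: b1 = (n*Sxy - Sx*Sy) / (n*Sx2 - Sx^2)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: b0 = y-bar - b1 * x-bar
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


# Paired data from Exercise 5.
b0, b1 = regression_line([0, 1, 2, 3, 4], [4, 1, 0, 1, 4])
print(f"y-hat = {b0:.2f} + {b1:.2f}x")
```

For Exercise 7, the same function could be called twice, once with all 10 points and once with the point (10, 10) removed, to compare the two lines.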
Copyright © 2005 Pearson Education, Inc., publishing as Pearson Addison-Wesley
CHAPTER 9
Correlation and Regression
Finding the Equation of the Regression Line and Making Predictions. Exercises 8–24
use the same data sets as the exercises in Section 9-2. In each case, find the regression
equation, letting the first variable be the independent (x) variable. Find the indicated predicted values. Caution: When finding predicted values, be sure to follow the prediction
procedure described in this section.
8. Fires and Acres Burned Find the best predicted value for the number of acres burned
given that there were 80 fires.
Fires          73    69    58    48    84    62    57    45    70    63    48
Acres burned   6.2   4.2   1.9   2.7   5.0   1.6   3.0   1.6   1.5   2.0   3.7
9. Buying a TV Audience Find the best predicted value for the number of viewers (in
millions), given that the salary (in millions of dollars) of television star Jennifer
Aniston is $16 million. How does the predicted value compare to the actual number of
viewers, which was 24 million?
Salary    100   14    14    35.2   12     7     5     1
Viewers   7     4.4   5.9   1.6    10.4   9.6   8.9   4.2
10. Supermodel Heights and Weights Find the best predicted weight of a supermodel
who is 69 in. tall.
Height (in.)   71    70.5   71    72    70    70    66.5   70    71
Weight (lb)    125   119    128   128   119   127   105    123   115
11. Blood Pressure Measurements Find the best predicted diastolic blood pressure for a
person with a systolic reading of 122.
Systolic    138   130   135   140   120   125   120   130   130   144   143   140   130   150
Diastolic    82    91   100   100    80    90    80    80    80    98   105    85    70   100
12. Temperatures and Marathons Find the best predicted winning time for the 1990
marathon given that the temperature was 73 degrees. How does the predicted value
compare to the actual winning time of 150.750 min?
x (temperature)   55        61        49        62        70        73        51        57
y (time)          145.283   148.717   148.300   148.100   147.617   146.400   144.667   147.533
13. Smoking and Nicotine Find the best predicted level of cotinine for a person who
smokes 40 cigarettes per day.
x (cigarettes per day)   60    10    4      15    10    1      20    8      7      10     10    20
y (cotinine)             179   283   75.6   174   209   9.51   350   1.85   43.4   25.1   408   344
14. Tree Circumference and Height Find the best predicted height of a tree that has a circumference of 4.0 ft. What is an advantage of being able to determine the height of a
tree from its circumference?
x (circ.)   1.8    1.9    1.8    2.4    5.1    3.1    5.5    5.1    8.3    13.7   5.3    4.9    3.7    3.8
y (ht)      21.0   33.5   24.6   40.7   73.2   24.9   40.4   45.3   53.5   93.8   64.0   62.7   47.2   44.3
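When only raw paired data are given, as in Exercises 8 through 14, the linear correlation coefficient has to be computed before the Caution above can be applied. The sketch below does this for the Exercise 8 data (fires and acres burned). The helper names are hypothetical, and the critical value 0.602 is an assumption: it is taken to be the two-tailed critical value of r for n = 11 at the 0.05 significance level.

```python
import math


def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired sample data."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sx2 - sx ** 2) * (n * sy2 - sy ** 2)
    )


def regression_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    b1 = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    return sy / n - b1 * sx / n, b1


# Exercise 8 data.
fires = [73, 69, 58, 48, 84, 62, 57, 45, 70, 63, 48]
acres = [6.2, 4.2, 1.9, 2.7, 5.0, 1.6, 3.0, 1.6, 1.5, 2.0, 3.7]

# ASSUMPTION: critical value of r for n = 11, alpha = 0.05 (two-tailed).
R_CRITICAL_N11 = 0.602

r = pearson_r(fires, acres)
b0, b1 = regression_line(fires, acres)
y_mean = sum(acres) / len(acres)

# Use the regression line only if the correlation is significant.
prediction = b0 + b1 * 80 if abs(r) > R_CRITICAL_N11 else y_mean
print(f"r = {r:.3f}, best predicted value = {prediction:.2f}")
```

The other data sets in this group can be substituted for `fires` and `acres`, provided the critical value is changed to match the new sample size.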
T 15. Cereal Killers Refer to Data Set 16 in Appendix B and use the amounts of fat (x) and
the measured calorie counts (y). Find the best predicted calorie count for a cereal with
0.05 grams of fat per gram of cereal.
T 16. Tobacco and Alcohol in Children’s Movies Refer to Data Set 7 in Appendix B and
use the times that the animated children’s movies showed tobacco use (x) and alcohol
use (y). Find the best predicted time for alcohol use, given that a movie does not show
any tobacco use.
T 17. Cholesterol and Body Mass Index Refer to Data Set 1 in Appendix B and use the
cholesterol levels (x) and body mass index values (y) of the 40 women. What is the
best predicted value for the body mass index of a woman having a cholesterol level of
500?
T 18. Readability Levels Refer to Data Set 14 in Appendix B and use the Flesch Reading
Ease scores (x) and the Flesch-Kincaid Grade Level values (y) for Tom Clancy’s The
Bear and the Dragon. Find the best predicted Flesch-Kincaid Grade Level value for a
page with a Flesch Reading Ease score of 50.0.
T 19. Home Selling Prices, List Prices, and Taxes Refer to Data Set 24 in Appendix B.
Caution: The sample values of list prices and selling prices are in thousands of dollars, but the tax amounts are in dollars.
a. Use the paired data consisting of home list price (x) and selling price (y). What is
the best predicted selling price of a home with a list price of $200,000?
b. Use the paired data consisting of home selling price (x) and the amount of taxes
(y). What is the best predicted tax bill for a home that sold for $400,000?
T 20. Tar and Nicotine Refer to Data Set 5 in Appendix B.
a. Use the paired data consisting of tar (x) and nicotine (y). What is the best predicted
nicotine level for a cigarette with 15 mg of tar?
b. Use the paired data consisting of carbon monoxide (x) and nicotine (y). What is the
best predicted nicotine level for a cigarette with 15 mg of carbon monoxide?
T 21. Forecasting Weather Refer to Data Set 10 in Appendix B.
a. Use the five-day forecast high temperatures (x) and the actual high temperatures
(y). What is the best predicted actual high temperature if the five-day forecast high
temperature is 28°?
b. Use the one-day forecast high temperatures (x) and the actual high temperatures
(y). What is the best predicted actual high temperature if the one-day forecast high
temperature is 28°?
c. Which predicted value is better: the result from part (a) or the result from part (b)?
Why?
T 22. Florida Everglades Refer to Data Set 12 in Appendix B.
a. Use the bottom temperatures (x) and the conductivity measurements (y). What is
the best predicted conductivity measurement for a time when the bottom temperature is 30.0°C?
b. Use the rainfall amounts (x) and the conductivity measurements (y). What is the
best predicted conductivity measurement for a time when the rainfall amount is
0.00 in.?
c. After identifying the best predicted conductivity measurement from parts (a) and
(b), is either of the predicted values likely to be accurate? Why or why not?
T 23. Old Faithful Refer to Data Set 13 in Appendix B.
a. Use the paired data for durations (x) and intervals (y) after eruptions of the geyser.
What is the best predicted time before the next eruption if the previous eruption
lasted for 210 sec?
b. Use the paired data for heights of eruptions (x) and intervals (y) after eruptions of
the Old Faithful geyser. What is the best predicted time before the next eruption if
the previous eruption had a height of 275 ft?
c. Which predicted time is better: the result from part (a) or the result from part (b)?
Why?
T 24. Diamond Prices, Carats, and Color Refer to Data Set 18 in Appendix B.
a. Use the paired data consisting of the carat weight (x) and the price (y). What is the
best predicted price of a diamond with a weight of 1.5 carats?
b. Use the paired color (x) and price (y) data. What is the best predicted price of a diamond with a color rating of 3?
c. Which predicted price is better: the result from part (a) or the result from part (b)?
Why?
25. Identifying Outliers and Influential Points Refer to the sample data listed in Table
9-1. If we include another pair of values consisting of x = 120 (for 1,200,000 boats)
and y = 160 (manatee deaths from boats), is the new point an outlier? Is it an influential
point?
26. Identifying Outliers and Influential Points Refer to the sample data listed in Table
9-1. If we include another pair of values consisting of x = 120 (for 1,200,000 boats)
and y = 10 (manatee deaths from boats), is the new point an outlier? Is it an influential
point?
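One way to judge whether an added point is influential is to fit the regression line with and without it and compare the results. The sketch below assumes the usual meanings (an outlier lies far from the other points; an influential point strongly changes the regression line). Table 9-1 is not reproduced here, so the data are hypothetical and chosen only to make the effect easy to see.

```python
def regression_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    b1 = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    return sy / n - b1 * sx / n, b1


# HYPOTHETICAL data (not Table 9-1): four points lying exactly on y = 2x.
base_x = [1, 2, 3, 4]
base_y = [2, 4, 6, 8]

# HYPOTHETICAL added point, far from the others in x and well off the line.
extra_x, extra_y = 10, 2

b0_before, b1_before = regression_line(base_x, base_y)
b0_after, b1_after = regression_line(base_x + [extra_x], base_y + [extra_y])

print(f"without the point: y-hat = {b0_before:.2f} + {b1_before:.2f}x")
print(f"with the point:    y-hat = {b0_after:.2f} + {b1_after:.2f}x")
```

A large change in slope or intercept between the two fits suggests the point is influential. A point can be an outlier without being influential if it lies far from the others but close to the line they define.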
9-3 Beyond the Basics
27. How Is a Regression Equation Affected by Change in Scale? Large numbers, such as
those in the accompanying table, often cause computational problems. First use the
given data to find the equation of the regression line, then find the equation of the
regression line after each x-value has been divided by 1000. How are the results
affected by the change in x? How would the results be affected if each y-value were
divided by 1000?
x   924,736   832,985   825,664   793,427   857,366
y   142       111       109       95        119
28. Testing Least-Squares Property According to the least-squares property, the
regression line minimizes the sum of the squares of the residuals. We noted that with the
paired data below, the regression equation is ŷ = 5 + 4x and the sum of the
squares of the residuals is 364. Show that the equation ŷ = 8 + 3x results in a sum of
squares of residuals that is greater than 364.
x   1   2    4   5
y   4   24   8   32
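The comparison asked for in Exercise 28 can be checked numerically. This sketch computes the sum of squared residuals for any candidate line; the function name is hypothetical, and the data are the four pairs given for Exercise 28 (x = 1, 2, 4, 5 and y = 4, 24, 8, 32).

```python
def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of (observed y - predicted y)^2 for the line y-hat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Paired data for Exercise 28.
xs = [1, 2, 4, 5]
ys = [4, 24, 8, 32]

ssr_regression = sum_squared_residuals(xs, ys, 5, 4)   # y-hat = 5 + 4x
ssr_alternative = sum_squared_residuals(xs, ys, 8, 3)  # y-hat = 8 + 3x

print(ssr_regression, ssr_alternative)
```

Trying other intercepts and slopes in the second call gives a feel for the least-squares property: no other line should produce a smaller sum than the regression line does.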
29. Using Logarithms to Transform Data If a scatterplot reveals a nonlinear (not a
straight line) pattern that you recognize as another type of curve, you may be able to
apply the methods of this section. For the data given below, find the linear
equation (y = b0 + b1x) that best fits the sample data, and find the logarithmic
equation (y = a + b ln x) that best fits the sample data. (Hint: Begin by replacing each
x-value with ln x.) Which of these two equations fits the data better? Why?
x   2.0    2.5    4.2    10.0
y   12.0   18.7   53.0   225.0
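The hint in Exercise 29 amounts to running the same least-squares fit twice, once on x and once on ln x. The sketch below does that and reports r for each fit. Comparing the two values of r is one reasonable way to judge which equation fits better, though it is an assumption here that this is the comparison the exercise intends. The function name `fit_and_r` is hypothetical.

```python
import math


def fit_and_r(xs, ys):
    """Return (intercept, slope, r) for a least-squares fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


# Data for Exercise 29.
xs = [2.0, 2.5, 4.2, 10.0]
ys = [12.0, 18.7, 53.0, 225.0]

# Linear model: y = b0 + b1*x
b0, b1, r_linear = fit_and_r(xs, ys)

# Logarithmic model: y = a + b*ln(x), fitted by replacing each x with ln x.
a, b, r_log = fit_and_r([math.log(x) for x in xs], ys)

print(f"linear:      y = {b0:.1f} + {b1:.1f}x       (r = {r_linear:.3f})")
print(f"logarithmic: y = {a:.1f} + {b:.1f} ln x  (r = {r_log:.3f})")
```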
30. Equivalent Hypothesis Tests Explain why a test of the null hypothesis H0: ρ = 0 is
equivalent to a test of the null hypothesis H0: β1 = 0, where ρ is the linear correlation
coefficient for a population of paired data, and β1 is the slope of the regression line
for that same population.
31. Residual Plot A scatterplot is a plot of the paired (x, y) sample data. A residual plot
is a graph of the points with the same x-coordinates, but the corresponding y-coordinates are the residual values. To construct a residual plot, use the same x-axis as the
scatterplot, but use a vertical axis of residual values. Draw a horizontal reference line
through the residual value of 0, then plot the paired values of (x, residual). Residual
plots are helpful in identifying patterns suggesting that the relationship between the
variables is not linear, or that the assumption of constant variances is not satisfied.
Construct a residual plot for the data in Table 9-1. Are there any noticeable patterns?