9-3 Basic Skills and Concepts
Making Predictions. In Exercises 1–4, use the given data to find the best predicted value of the dependent variable. Be sure to follow the prediction procedure described in this section.
1. In each of the following cases, find the best predicted value of y given that x = 3.00. The given statistics are summarized from paired sample data.
   a. r = 0.987, ȳ = 5.00, n = 20, and the equation of the regression line is ŷ = 6.00 + 4.00x.
   b. r = 0.052, ȳ = 5.00, n = 20, and the equation of the regression line is ŷ = 6.00 + 4.00x.
2. In each of the following cases, find the best predicted value of y given that x = 2.00. The given statistics are summarized from paired sample data.
   a. r = −0.123, ȳ = 8.00, n = 30, and the equation of the regression line is ŷ = 7.00 − 2.00x.
   b. r = −0.567, ȳ = 8.00, n = 30, and the equation of the regression line is ŷ = 7.00 − 2.00x.
3. Chest Sizes and Weights of Bears. When eight bears were anesthetized, researchers measured the distances (in inches) around the bears’ chests and weighed the bears (in pounds). Minitab was used to find that the value of the linear correlation coefficient is r = 0.993 and the equation of the regression line is ŷ = −187 + 11.3x, where x represents chest size. Also, the mean weight of the eight bears is 234.5 lb. What is the best predicted weight of a bear with a chest size of 52 in.?
4. Stocks and Super Bowl. Data Set 25 in Appendix B includes pairs of data for the Dow-Jones Industrial Average (DJIA) high value and the total number of points scored in the Super Bowl for 21 different years. Excel was used to find that the value of the linear correlation coefficient is r = −0.133 and the regression equation is ŷ = 53.3 − 0.000442x, where x is the high value of the DJIA. Also, the mean number of Super Bowl points is 51.4.
What is the best predicted value for the total number of Super Bowl points scored in a year with a DJIA high of 1200?

Finding the Equation of the Regression Line. In Exercises 5 and 6, use the given data to find the equation of the regression line.
5. x  0  1  2  3  4
   y  4  1  0  1  4
6. x  1  2  2  5  6
   y  2  5  4  15  15
7. Effects of an Outlier. Refer to the Minitab-generated scatterplot given in Exercise 7 of Section 9-2.
   a. Using the pairs of values for all 10 points, find the equation of the regression line.
   b. After removing the point with coordinates (10, 10), use the pairs of values for the remaining nine points and find the equation of the regression line.
   c. Compare the results from parts (a) and (b).

Copyright © 2005 Pearson Education, Inc., publishing as Pearson Addison-Wesley

Finding the Equation of the Regression Line and Making Predictions. Exercises 8–24 use the same data sets as the exercises in Section 9-2. In each case, find the regression equation, letting the first variable be the independent (x) variable. Find the indicated predicted values. Caution: When finding predicted values, be sure to follow the prediction procedure described in this section.
8. Fires and Acres Burned. Find the best predicted value for the number of acres burned given that there were 80 fires.
   Fires         73   69   58   48   84   62   57   45   70   63   48
   Acres burned  6.2  4.2  1.9  2.7  5.0  1.6  3.0  1.6  1.5  2.0  3.7
9. Buying a TV Audience. Find the best predicted value for the number of viewers (in millions), given that the salary (in millions of dollars) of television star Jennifer Aniston is $16 million. How does the predicted value compare to the actual number of viewers, which was 24 million?
   Salary   100  14   14   35.2  12    7    5    1
   Viewers  7    4.4  5.9  1.6   10.4  9.6  8.9  4.2
10. Supermodel Heights and Weights. Find the best predicted weight of a supermodel who is 69 in. tall.
   Height (in.)  71   70.5  71   72   70   70   66.5  70   71
   Weight (lb)   125  119   128  128  119  127  105   123  115
11. Blood Pressure Measurements. Find the best predicted diastolic blood pressure for a person with a systolic reading of 122.
   Systolic   138  130  135  140  120  125  120  130  130  144  143  140  130  150
   Diastolic  82   91   100  100  80   90   80   80   80   98   105  85   70   100
12. Temperatures and Marathons. Find the best predicted winning time for the 1990 marathon given that the temperature was 73 degrees. How does the predicted value compare to the actual winning time of 150.750 min?
   x (temperature)  55       61       49       62       70       73       51       57
   y (time)         145.283  148.717  148.300  148.100  147.617  146.400  144.667  147.533
13. Smoking and Nicotine. Find the best predicted level of cotinine for a person who smokes 40 cigarettes per day.
   x (cigarettes per day)  60   10   4     15   10   1     20   8     7     10    10   20
   y (cotinine)            179  283  75.6  174  209  9.51  350  1.85  43.4  25.1  408  344
14. Tree Circumference and Height. Find the best predicted height of a tree that has a circumference of 4.0 ft. What is an advantage of being able to determine the height of a tree from its circumference?
   x (circ.)  1.8   1.9   1.8   2.4   5.1   3.1   5.5   5.1   8.3   13.7  5.3   4.9   3.7   3.8
   y (ht)     21.0  33.5  24.6  40.7  73.2  24.9  40.4  45.3  53.5  93.8  64.0  62.7  47.2  44.3
T 15. Cereal Killers. Refer to Data Set 16 in Appendix B and use the amounts of fat (x) and the measured calorie counts (y). Find the best predicted calorie count for a cereal with 0.05 grams of fat per gram of cereal.
T 16. Tobacco and Alcohol in Children’s Movies. Refer to Data Set 7 in Appendix B and use the times that the animated children’s movies showed tobacco use (x) and alcohol use (y). Find the best predicted time for alcohol use, given that a movie does not show any tobacco use.
T 17.
Cholesterol and Body Mass Index. Refer to Data Set 1 in Appendix B and use the cholesterol levels (x) and body mass index values (y) of the 40 women. What is the best predicted value for the body mass index of a woman having a cholesterol level of 500?
T 18. Readability Levels. Refer to Data Set 14 in Appendix B and use the Flesch Reading Ease scores (x) and the Flesch-Kincaid Grade Level values (y) for Tom Clancy’s The Bear and the Dragon. Find the best predicted Flesch-Kincaid Grade Level value for a page with a Flesch Reading Ease score of 50.0.
T 19. Home Selling Prices, List Prices, and Taxes. Refer to Data Set 24 in Appendix B. Caution: The sample values of list prices and selling prices are in thousands of dollars, but the tax amounts are in dollars.
   a. Use the paired data consisting of home list price (x) and selling price (y). What is the best predicted selling price of a home with a list price of $200,000?
   b. Use the paired data consisting of home selling price (x) and the amount of taxes (y). What is the best predicted tax bill for a home that sold for $400,000?
T 20. Tar and Nicotine. Refer to Data Set 5 in Appendix B.
   a. Use the paired data consisting of tar (x) and nicotine (y). What is the best predicted nicotine level for a cigarette with 15 mg of tar?
   b. Use the paired data consisting of carbon monoxide (x) and nicotine (y). What is the best predicted nicotine level for a cigarette with 15 mg of carbon monoxide?
T 21. Forecasting Weather. Refer to Data Set 10 in Appendix B.
   a. Use the five-day forecast high temperatures (x) and the actual high temperatures (y). What is the best predicted actual high temperature if the five-day forecast high temperature is 28°?
   b. Use the one-day forecast high temperatures (x) and the actual high temperatures (y). What is the best predicted actual high temperature if the one-day forecast high temperature is 28°?
   c. Which predicted value is better: the result from part (a) or the result from part (b)? Why?
T 22.
Florida Everglades. Refer to Data Set 12 in Appendix B.
   a. Use the bottom temperatures (x) and the conductivity measurements (y). What is the best predicted conductivity measurement for a time when the bottom temperature is 30.0°C?
   b. Use the rainfall amounts (x) and the conductivity measurements (y). What is the best predicted conductivity measurement for a time when the rainfall amount is 0.00 in.?
   c. After identifying the best predicted conductivity measurement from parts (a) and (b), is either of the predicted values likely to be accurate? Why or why not?
T 23. Old Faithful. Refer to Data Set 13 in Appendix B.
   a. Use the paired data for durations (x) and intervals (y) after eruptions of the geyser. What is the best predicted time before the next eruption if the previous eruption lasted for 210 sec?
   b. Use the paired data for heights of eruptions (x) and intervals (y) after eruptions of the Old Faithful geyser. What is the best predicted time before the next eruption if the previous eruption had a height of 275 ft?
   c. Which predicted time is better: the result from part (a) or the result from part (b)? Why?
T 24. Diamond Prices, Carats, and Color. Refer to Data Set 18 in Appendix B.
   a. Use the paired data consisting of the carat weight (x) and the price (y). What is the best predicted price of a diamond with a weight of 1.5 carats?
   b. Use the paired color (x) and price (y) data. What is the best predicted price of a diamond with a color rating of 3?
   c. Which predicted price is better: the result from part (a) or the result from part (b)? Why?
25. Identifying Outliers and Influential Points. Refer to the sample data listed in Table 9-1. If we include another pair of values consisting of x = 120 (for 1,200,000 boats) and y = 160 (manatee deaths from boats), is the new point an outlier?
Is it an influential point?
26. Identifying Outliers and Influential Points. Refer to the sample data listed in Table 9-1. If we include another pair of values consisting of x = 120 (for 1,200,000 boats) and y = 10 (manatee deaths from boats), is the new point an outlier? Is it an influential point?

9-3 Beyond the Basics
27. How Is a Regression Equation Affected by Change in Scale? Large numbers, such as those in the accompanying table, often cause computational problems. First use the given data to find the equation of the regression line, then find the equation of the regression line after each x-value has been divided by 1000. How are the results affected by the change in x? How would the results be affected if each y-value were divided by 1000?
   x  924,736  832,985  825,664  793,427  857,366
   y  142      111      109      95       119
28. Testing Least-Squares Property. According to the least-squares property, the regression line minimizes the sum of the squares of the residuals. We noted that with the paired data in the margin, the regression equation is ŷ = 5 + 4x and the sum of the squares of the residuals is 364. Show that the equation ŷ = 8 + 3x results in a sum of squares of residuals that is greater than 364.
   x  1  2   4  5
   y  4  24  8  32
29. Using Logarithms to Transform Data. If a scatterplot reveals a nonlinear (not a straight line) pattern that you recognize as another type of curve, you may be able to apply the methods of this section. For the data given in the margin, find the linear equation (y = b0 + b1x) that best fits the sample data, and find the logarithmic equation (y = a + b ln x) that best fits the sample data. (Hint: Begin by replacing each x-value with ln x.) Which of these two equations fits the data better? Why?
   x  2.0   2.5   4.2   10.0
   y  12.0  18.7  53.0  225.0
30.
Equivalent Hypothesis Tests. Explain why a test of the null hypothesis H0: ρ = 0 is equivalent to a test of the null hypothesis H0: β1 = 0, where ρ is the linear correlation coefficient for a population of paired data, and β1 is the slope of the regression line for that same population.
31. Residual Plot. A scatterplot is a plot of the paired (x, y) sample data. A residual plot is a graph of the points with the same x-coordinates, but the corresponding y-coordinates are the residual values. To construct a residual plot, use the same x-axis as the scatterplot, but use a vertical axis of residual values. Draw a horizontal reference line through the residual value of 0, then plot the paired values of (x, residual). Residual plots are helpful in identifying patterns suggesting that the relationship between the variables is not linear, or that the assumption of constant variances is not satisfied. Construct a residual plot for the data in Table 9-1. Are there any noticeable patterns?
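
The (x, residual) pairs that a residual plot displays can be computed with a few lines of code. The following is a minimal sketch, not a prescribed method: because Table 9-1 is not reproduced in these exercises, it uses the four data pairs from the margin of Exercise 28 instead, and the helper names (`least_squares_line`, `residuals`, `sum_sq`) are hypothetical names chosen for illustration. The slope and intercept come from the usual least-squares formulas, and the last two lines compare the sum of squared residuals for the fitted line with that of the alternative line ŷ = 8 + 3x from Exercise 28.

```python
# Paired data from the margin table of Exercise 28 (Table 9-1 is not available here).
xs = [1, 2, 4, 5]
ys = [4, 24, 8, 32]

def least_squares_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residuals(xs, ys, b0, b1):
    """Residual = observed y minus predicted y-hat, one value per data pair."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def sum_sq(values):
    """Sum of the squares of the given values."""
    return sum(v * v for v in values)

b0, b1 = least_squares_line(xs, ys)
res = residuals(xs, ys, b0, b1)

print(b0, b1)                      # 5.0 4.0
for x, r in zip(xs, res):          # the points of the residual plot
    print(x, r)                    # (1, -5.0), (2, 11.0), (4, -13.0), (5, 7.0)

print(sum_sq(res))                            # 364.0 for the least-squares line
print(sum_sq(residuals(xs, ys, 8, 3)))        # 374 for y-hat = 8 + 3x
```

Plotting each printed (x, residual) pair against a horizontal reference line at 0 gives the residual plot described in Exercise 31. With only four points, no conclusion about a pattern should be drawn from this small example.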