Some of the following were adapted from problems suggested by Kathleen Wong Nirei of Iolani School, Honolulu and Bill Harrington.
1. Circle the correct answer:
If a correlation coefficient is 0.80, then:
a. The explanatory variable is usually less than the response variable.
b. The explanatory variable is usually more than the response variable.
c. Below average values of the explanatory variable are more often associated with below average values of the response variable.
d. Below average values of the explanatory variable are more often associated with above average values of the response variable.
e. None of the above.
2. Circle the correct answer:
a. The closer a correlation coefficient is to 1
or -1, the more evidence there is of a causal relationship between the explanatory
variable and the response variable.
b. The closer a correlation coefficient is to 0,
the more evidence there is of a causal relationship between the explanatory
variable and the response variable.
c. The closer the value of r^2 is to 1 or -1, the
more evidence there is of a causal relationship between the explanatory
variable and the response variable.
d. The closer the value of r^2 is to 0, the more
evidence there is of a causal relationship between the explanatory variable
and the response variable.
e. None of the above.
3. One of the following statements is better than
the others. Circle that statement. VERY BRIEFLY explain why you did not
choose each of the other statements:
When comparing the size of the residuals from two different models for the same data:
a. Use the range of each set of residuals as a basis for comparison.
b. Use the mean of each set of residuals as a basis for comparison.
c. Use the sum of each set of residuals as a basis for comparison.
d. Use the standard deviation of each set of residuals as a basis for comparison.
4. Below is a plot of the 1986 profits versus sales
(each in tens of thousands of dollars) of 12 large US companies, the results
of a least squares regression performed on a TI-83, and some other summary
data. Note that some of the data with lower Sales values overlap on the
graph.
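[Plot: 1986 profits versus sales for 12 large US companies, with TI-83 least squares regression output and summary data]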
a. Demonstrating your knowledge of the definition of r^2, explain what the value of r^2 means in the context of this problem.
b. Annotate, i.e. fully add labels and lines, at any one point on the plot to help a reader understand what r^2 measures.
c. The teacher who supplied this data set suggested that even though
r^2 is close to one there is reason to doubt some of the interpolative predictive
value of this model. He came to this conclusion with no further computation
or residual analysis. Explain his reasoning.
5. Note: The data for this problem is stored in a program named AIDS which is available from Mr. Coons. Do NOT enter this data by hand.
Consider the following data on the number of AIDS cases reported in the US by state health departments between 1982 and 1986:
| Year | 1982 | 1983 | 1984 | 1985 | 1986 |
| Number of Cases | 434 | 1,416 | 3,196 | 6,242 | 10,620 |
a. Using year as the independent variable, state the value of and interpret the slope of the least squares regression line in the context of this data.
b. State the value of and interpret the y-intercept of the regression line in the context of this data.
c. Use the least squares regression line to predict the number of AIDS cases in the year 2000.
d. Assuming this data was an adequate and representative sample, how confident are you in the prediction you made in part c? Your answer must include conclusions from a residual analysis. Include a rough residual plot.
From Rachel Apfel: I am not very confident of this prediction (although the correlation coefficient shows a very strong positive association, 0.965, and r^2 is 0.93, a very large proportion) for 2 reasons: 1) The residual plot shows a pattern, signifying the relationship is not best shown with a linear model, and 2) the danger of extrapolation: The year 2000 is a value beyond what is contained in this data set so we have no way of knowing that this relationship will remain the same for values outside this data set.
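The figures quoted in the answer above (r = 0.965, r^2 = 0.93) and the pattern in the residuals can be checked with a short script. What follows is a minimal Python sketch, not part of the original problem set (which used a TI-83); the variable names and the helper `predict` are illustrative assumptions.

```python
from math import sqrt

# Data from the table in problem 5
years = [1982, 1983, 1984, 1985, 1986]
cases = [434, 1416, 3196, 6242, 10620]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(cases) / n

# Sums of squares and cross products about the means
sxx = sum((x - mean_x) ** 2 for x in years)
syy = sum((y - mean_y) ** 2 for y in cases)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, cases))

# Least squares line: cases = intercept + slope * year
slope = sxy / sxx
intercept = mean_y - slope * mean_x
r = sxy / sqrt(sxx * syy)


def predict(year):
    """Predicted number of cases from the least squares line."""
    return intercept + slope * year


# Residual = observed - predicted
residuals = [y - predict(x) for x, y in zip(years, cases)]

print(f"slope     = {slope:.1f} cases per year")
print(f"intercept = {intercept:.1f}")
print(f"r = {r:.3f}, r^2 = {r * r:.2f}")
print(f"prediction for 2000 = {predict(2000):.1f}")
for x, e in zip(years, residuals):
    print(x, f"{e:+.1f}")
```

The residuals come out positive, negative, negative, negative, positive. That curved pattern is the one the answer refers to, and it appears even though r is close to 1.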
e. State the equation of a quadratic model and compare it fully to your previous model. Include a rough plot(s).
cases = 575.57(year)^2 - 2281347(year) + 2260600436
From Leah Temple: "The residual plot for the quadratic regression shows no pattern and its standard deviation is 87.8, as opposed to the linear regression's residual plot, which showed a pattern and had a standard deviation of 1080.4. This indicates that the quadratic is a better model. Also the line seems to fit better as it is graphed."
[Plot: quadratic model]
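The comparison of residual standard deviations in the answer to part e can be reproduced without a calculator. The sketch below is one possible way to do it in plain Python, under a few assumptions: the helpers `solve`, `polyfit`, and `residuals` are hypothetical names written for this illustration, the fit uses the normal equations, and the years are centered at 1984 to keep the arithmetic well conditioned before the coefficients are converted back to uncentered years.

```python
from statistics import stdev

years = [1982, 1983, 1984, 1985, 1986]
cases = [434, 1416, 3196, 6242, 10620]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            factor = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def polyfit(xs, ys, degree):
    """Least squares polynomial coefficients [c0, c1, ...] from the normal equations."""
    size = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve(a, b)


def residuals(xs, ys, coeffs):
    """Observed minus fitted values for a polynomial with the given coefficients."""
    return [y - sum(c * x ** k for k, c in enumerate(coeffs)) for x, y in zip(xs, ys)]


t = [x - 1984 for x in years]  # centered years

lin = polyfit(t, cases, 1)
quad = polyfit(t, cases, 2)

# Sample standard deviation of each set of residuals
sd_lin = stdev(residuals(t, cases, lin))
sd_quad = stdev(residuals(t, cases, quad))

# Expand c0 + c1*(year - 1984) + c2*(year - 1984)^2 into a*year^2 + b*year + c
c0, c1, c2 = quad
a = c2
b = c1 - 2 * 1984 * c2
c = c0 - 1984 * c1 + 1984 ** 2 * c2

print(f"linear residual SD    = {sd_lin:.1f}")
print(f"quadratic residual SD = {sd_quad:.1f}")
print(f"cases = {a:.2f}(year)^2 + {b:.0f}(year) + {c:.0f}")
```

Under these assumptions the two standard deviations agree with the 1080.4 and 87.8 quoted in the answer, and the expanded coefficients agree with the stated quadratic equation up to rounding. Both sets of residuals sum to essentially zero, so the spread of the residuals is what separates the two models.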
b) Create a numerical example of Simpson's Paradox. Briefly point out how your example demonstrates this deceiving situation.
| Town 1 | |
| Town 2 | |
| Town 3 | |
| Town 4 | |
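For readers who want to see the reversal in numbers, here is a small Python sketch of Simpson's Paradox. The counts are invented purely for illustration and are not the data from the table above; the labels "Method A", "Method B", "easy cases", and "hard cases" are likewise hypothetical.

```python
from fractions import Fraction

# Hypothetical (successes, attempts) counts, invented for illustration only
data = {
    "Method A": {"easy cases": (18, 20), "hard cases": (32, 80)},
    "Method B": {"easy cases": (64, 80), "hard cases": (6, 20)},
}


def rate(successes, attempts):
    """Exact success rate as a fraction."""
    return Fraction(successes, attempts)


def overall(groups):
    """Success rate after pooling every group for one method."""
    total_successes = sum(s for s, _ in groups.values())
    total_attempts = sum(n for _, n in groups.values())
    return Fraction(total_successes, total_attempts)


# Rates within each group
for group in ("easy cases", "hard cases"):
    ra = rate(*data["Method A"][group])
    rb = rate(*data["Method B"][group])
    print(f"{group}: A = {float(ra):.0%}, B = {float(rb):.0%}")

# Rates after pooling the groups
overall_a = overall(data["Method A"])
overall_b = overall(data["Method B"])
print(f"overall: A = {float(overall_a):.0%}, B = {float(overall_b):.0%}")
```

In this made-up data Method A has the higher success rate within each group (90% vs 80% and 40% vs 30%), yet Method B has the higher rate overall (70% vs 50%). The reversal happens because Method A was mostly applied to hard cases and Method B mostly to easy ones, so the pooled totals hide the lurking variable.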