1. Two companies, The Tool Company and The Machine Company, have made prototype devices to automatically throw softballs a fixed distance. Below are the results of 100 throws for each device. Each device was set to throw each ball a distance of 55 feet.
a. Fill in the chart below with comparisons of The Tool Company and The Machine Company data for "the six features that are often of interest when analyzing a distribution." Do this by simply looking at the dotplots. Do not do any counting or calculations.
|Feature||Compare The Tool vs The Machine Company for this Feature|
|Center||The center of the Tool distribution is about 72 while that of the Machine distribution is much lower at approximately 54.|
|Variability||The Tool distribution has much less variability than the Machine distribution.|
|Shape||The Tool distribution is mound shaped and fairly symmetrical while the Machine distribution is skewed left.|
|Peaks and Clusters||The Tool distribution has a single peak, while the Machine distribution may have two peaks (this is quite subjective).|
|Outliers||The Tool distribution does not have outliers while the Machine distribution has a probable outlier at 12.|
|Granularity||Neither distribution appears granular.|
b. Each company argued that its prototype is better. In a sentence or two, write what you think each company's argument was.
The Tool Company: Our machine has low variability. Clearly we have a much better, more reliable design. All we have to do is make an adjustment on the distance scale and we will have a great machine.
The Machine Company: Our machine is set at 55 feet and, as you can see, our "average" throw is much closer to that number than our competitor's.
c. Below are Minitab's descriptive statistics for the Machine Company's data. Formally determine if The Machine Company's minimum data value is an outlier. Show your work.
1.5 * IQR = 1.5 * (59.00 - 45.25) = 20.625
A "left-side outlier" must be less than Q1 - 20.625 = 45.25 - 20.625 = 24.625. Thus 12, the minimum value, is an outlier.
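The fence calculation above can be checked with a minimal Python sketch. The quartiles and minimum are the values quoted in the worked answer; the function name `outlier_fences` is hypothetical and is not part of Minitab.

```python
# Quartiles and minimum as quoted in the worked answer above.
q1, q3 = 45.25, 59.00
minimum = 12


def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences of the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = outlier_fences(q1, q3)
print(lower, upper)     # 24.625 79.625
print(minimum < lower)  # True: 12 lies below the lower fence
```

Any value below the lower fence or above the upper fence would be flagged by the same rule.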
d. Demonstrate your understanding of the empirical rule for three standard deviations by applying it to the Machine Company's data and explaining whether or not the rule seems to hold reasonably well. You can use the descriptive statistics displayed above. Show your work.
The empirical rule suggests that virtually all of the observed data should be within 3 standard deviations of the mean when the distribution is mound-shaped:
(mean - 3s, mean + 3s) = (51.25 - 3*11.64, 51.25 + 3*11.64) = (16.33, 86.17)
Consistent with the empirical rule, all but one of the observed data values (the minimum, 12) fall within 3 standard deviations of the sample mean. This might happen by chance alone or because of the skewness of the distribution.
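As a rough check of the three-standard-deviation interval, here is a small Python sketch using the mean and standard deviation quoted above. The helper name `empirical_interval` is an assumption made for illustration.

```python
# Sample mean and standard deviation as quoted in the answer above.
mean, sd = 51.25, 11.64
minimum = 12  # smallest observed throw


def empirical_interval(mean, sd, k=3):
    """Return the interval (mean - k*sd, mean + k*sd)."""
    return mean - k * sd, mean + k * sd


low, high = empirical_interval(mean, sd)
print(round(low, 2), round(high, 2))
print(low <= minimum <= high)  # False: the minimum falls outside the interval
```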
e. Draw, without counting, a very rough boxplot of The Tool Company's data by looking at the dotplot of that data. There is no need to label values or worry about possible outliers; simply make the relative sizes of the parts clear. Careful: do not use the Machine Company's data.
2. A study considers the variable: The weight of an automobile. Circle the true answer:
a) The case of the variable is automobile and it is a measurement variable.
b) The case of the variable is automobile and it is a categorical variable.
c) The case of the variable is weight and it is a measurement variable.
d) The case of the variable is weight and it is a categorical variable.
e) None of the above.
3. Circle the answer in which all entries are resistant statistics. [Motivated by questions from Jamie Bard and Brian O'Connor]
4. The two boxplots to the right show the distributions of red and orange M&Ms from the 17 bags of M&Ms which were inspected by the 1996-7 AP Stats class. Write a paragraph or two to a knowledgeable statistician at The Mars Candy Company explaining what might be expected about the number of red and orange M&Ms in an 18th bag taken from the same stock.
In our sample of 17 bags, the number of red M&Ms was very frequently more than the number of orange M&Ms. In fact, 75% of the bags had more red M&Ms than the number of orange M&Ms that appeared in any bag. In addition, the median number of orange M&Ms is close to the minimum number of red M&Ms that appeared in any bag. Thus these data suggest that it is very likely that the number of red M&Ms in an 18th bag would be more than the number of orange M&Ms.
5. The formula for a z-score is z = (observation - mean) / (standard deviation). [Motivated by a question from Mike Dimella]
a. Explain the purpose of z-scores.
From Jamie: "The z-score is useful for taking two observations of different scales and converting the data to a common scale. Called standardization, this allows for two or more observations taken in different scales to be compared."
b. In detail, explain how the formula actually fulfills your answer in part a.
From Leah: "By subtracting the mean from a specific observation you are given how far away from the mean this particular observation is. The standard deviation is the typical distance [all] the observations of the data set are from the mean. When you divide the particular distance by the typical distance you are given a proportion. This proportion tells you how far away the specific observation is in relation to the typical distance, because you are creating a ratio between the particular and the typical."
c. The best male long jumpers for State College since 1973 have averaged a jump of 263.0 inches with a standard deviation of 14.0 inches. The best female long jumpers have averaged 201.2 inches with a standard deviation of 7.7 inches. Which athlete is more impressive within their class, a male with a jump of 275 inches or a female with a jump of 207 inches? Prove your answer with appropriate calculations. [From Bill Harrington, Teacher, State College Area High School, State College, PA]
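Male: z = (275 - 263.0) / 14.0 = 0.86 (approximately). Female: z = (207 - 201.2) / 7.7 = 0.75 (approximately). Therefore the male's jump was more impressive.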
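The comparison in part c can be reproduced with a short Python sketch. It uses the means and standard deviations given in the problem; the function name `z_score` is chosen here only for illustration.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mean) / sd


# Values taken from the problem statement in part c.
male_z = z_score(275, 263.0, 14.0)
female_z = z_score(207, 201.2, 7.7)

print(f"male z = {male_z:.2f}, female z = {female_z:.2f}")
# male z = 0.86, female z = 0.75
print("male" if male_z > female_z else "female", "jump is further above its class mean")
```

The larger z-score marks the jump that is further above the average for its own class, in units of that class's typical spread.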
6. This problem asks you to make a generalization based on the empirical rule:
Assume that after applying the empirical rule to data that form a mound-shaped distribution, about 95% of the data lie between the numbers a and b. Determine a formula for the sample standard deviation of the data in terms of a and b.
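One possible line of reasoning, offered as a sketch and not as the official answer: if the 95% interval is taken to be the mean plus or minus 2 standard deviations, then a = mean - 2s and b = mean + 2s, so b - a = 4s and s = (b - a) / 4. The Python below checks that relationship on made-up numbers; the function name and the example values are assumptions.

```python
def sd_from_95_interval(a, b):
    """Estimate s assuming (a, b) is mean - 2s to mean + 2s."""
    return (b - a) / 4


# Hypothetical example: mean 50 and s = 5 give the interval (40, 60).
mean, s = 50, 5
a, b = mean - 2 * s, mean + 2 * s

print(sd_from_95_interval(a, b))  # 5.0
print((a + b) / 2)                # 50.0, the midpoint recovers the mean
```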