Chapter 1: Exploring Data
AP Statistics Standards I. Exploring Data: Describing patterns and departures from patterns (20%-30%)
A. Constructing and interpreting graphical displays of distributions of univariate data (dotplot, stemplot, histogram, cumulative frequency plot)
- Center and spread
- Clusters and gaps
- Outliers and other unusual features
- Shape
B. Summarizing distributions of univariate data
- Measuring center: median, mean
- Measuring spread: range, interquartile range, standard deviation
- Measuring position: quartiles, percentiles, standardized scores (z-scores)
- Using boxplots
- The effect of changing units on summary measures
C. Comparing distributions of univariate data (dotplots, back-to-back stemplots, parallel boxplots)
- Comparing center and spread: within-group and between-group variation
- Comparing clusters and gaps
- Comparing outliers and other unusual features
- Comparing shapes
Essential Question: How many numbers are needed to describe a complex event or object?
Introduction--what is statistics about?
- Given a complex system or object, describe it adequately with a limited number of indicators or measurements.
Describe yourself with 2 words and 2 numbers, and see whether your classmates can identify you from the description. Formative assessment: What did you learn about the power of indicators or measurements from doing this exercise?
- State the key elements used for answering a research question in a statistically acceptable manner. Statistical analysis is an internationally recognized way of answering research questions and communicating data. It is a powerful international communication tool.
Design--the systematic way in which the data is collected.
Analysis--the systematic use of graphical and mathematical tools to describe and evaluate the data.
Conclusions--the systematic manner in which inferences are drawn from the data and uncertainties are evaluated.
- Evaluate information to determine whether it is anecdotal evidence. Anecdotal evidence is based on data collected in a haphazard manner. It usually involves a small sample, often a single data point, frequently chosen for its emotional impact.
Evidence consisting of a single data point is always considered anecdotal.
Conclusions based on anecdotal evidence are not statistically defensible.
Homefun (formative/summative assessment): Find an article that uses anecdotal evidence. Briefly describe the evidence and how it is used. Provide a reference to the source.
Essential Question: Is data always expressed as numbers?
- State the difference between categorical and quantitative variables and give examples of each.
Quantitative variable: consists of numerical values that could reasonably be averaged. Examples: height, weight, age.
Categorical variable: places each individual into a category (a classification system). Examples: zip codes (numbers, but averaging them makes no sense), grade (freshmen, sophomores, juniors, seniors), size (small, medium, large).
Note: Categorical data are displayed with bar graphs or pie charts, not with histograms or other plots meant for quantitative data.
- Evaluate the effectiveness of bar charts and other graphs.
Formative assessment: Evaluate the effectiveness of the above charts.
- Create frequency tables for categorical data; in other words, convert the counts to percentages (relative frequencies).
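Converting counts to percentages can be sketched in a few lines of Python. This is a minimal illustration only: the class-standing responses are made up, and `frequency_table` is a hypothetical helper, not part of any course software.

```python
from collections import Counter

# Hypothetical (made-up) survey: class standing of 20 students
responses = (["freshman"] * 6 + ["sophomore"] * 5 +
             ["junior"] * 5 + ["senior"] * 4)

def frequency_table(values):
    """Hypothetical helper: map each category to (count, percent of total)."""
    counts = Counter(values)
    n = len(values)
    return {category: (count, 100 * count / n) for category, count in counts.items()}

table = frequency_table(responses)
for category, (count, percent) in table.items():
    print(f"{category:<10} {count:>3} {percent:6.1f}%")
```

The percent column is the relative frequency distribution; it sums to 100% and is what a bar chart of these data would display.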
- Convert the above tables into bar charts. A two-way table will contain:
  - a vertical and a horizontal marginal distribution (the row totals and the column totals)
  - multiple conditional distributions
- Use conditional distributions based on relative frequencies to identify associations. This is typically done by comparing bar charts of the conditional distributions.
Association: a pattern exists between the values of one variable and the values of another. Association does not establish that one variable causes the other.
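The marginal and conditional distributions of a two-way table can be sketched as follows. The counts are hypothetical (class year vs. how students get to school), chosen only to illustrate the arithmetic.

```python
# Hypothetical (made-up) two-way table of counts: class year vs. how students get to school
counts = {
    "junior": {"bus": 30, "car": 10, "walk": 10},
    "senior": {"bus": 15, "car": 30, "walk": 5},
}
rows = list(counts)
cols = list(counts["junior"])
grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distributions: row totals and column totals as percents of the grand total
row_marginal = {r: 100 * sum(counts[r].values()) / grand_total for r in rows}
col_marginal = {c: 100 * sum(counts[r][c] for r in rows) / grand_total for c in cols}

# Conditional distributions of travel mode, given class year (each row sums to 100%)
conditional = {
    r: {c: 100 * counts[r][c] / sum(counts[r].values()) for c in cols}
    for r in rows
}

print("row marginal:", row_marginal)
print("column marginal:", col_marginal)
for r in rows:
    print(f"given {r}:", conditional[r])
```

Because the two conditional distributions differ (most juniors ride the bus, most seniors come by car), these made-up data show an association between class year and travel mode; they say nothing about what causes it.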
Homefun (formative/summative assessment): Read section 1.1; work exercises 1, 11, and 17, pages 22-24.
Essential Question: Can data sets be added together to obtain a larger sample size and hence a more meaningful conclusion?
Simpson's Paradox
- Analyze data for Simpson's paradox.
- Conclusions based on parts can be reversed when considering the whole.
- Conclusions based on the parts are more likely to be valid.
- State two conditions that must exist for Simpson's paradox to occur.
- One or more lurking variables
- Data from unequal-sized groups combined into a single group.
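A small numeric sketch of Simpson's paradox, using hypothetical counts chosen for illustration: two treatments, with success counts split by a lurking variable (case severity). Treatment A wins within each group, yet B wins after the groups are combined, because A handles mostly severe cases (263 of its 350) while B handles mostly mild ones (270 of its 350). Both conditions listed above are present.

```python
# Hypothetical counts: (successes, attempts) for two treatments,
# split by a lurking variable, the severity of the case.
data = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}

# Success rate within each part (each severity group)
by_group = {
    treatment: {group: s / a for group, (s, a) in groups.items()}
    for treatment, groups in data.items()
}

# Success rate for the whole (groups combined into a single group)
combined = {
    treatment: sum(s for s, _ in groups.values()) / sum(a for _, a in groups.values())
    for treatment, groups in data.items()
}

for treatment in data:
    parts = ", ".join(f"{g}: {r:.1%}" for g, r in by_group[treatment].items())
    print(f"{treatment}: {parts}; combined: {combined[treatment]:.1%}")
```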
Homefun (formative/summative assessment):
- Read "Simpson's Paradox - When Big Data Sets Go Bad."
- Read "A closer Look at SAT Scores Decline" and summarize in a paragraph how Simpson's paradox might be involved.
- Work exercises 20 and 35, pages 25-26.
Essential Question: When using a number to describe a complex event or object, is there a difference between using a single number and using a single data point?
Ch1.2 Describing Distributions
- Define distribution and state two key pieces of information required to produce a distribution.
Distribution: the pattern of variation of a single variable.
- Quantitative data (values along the horizontal or x-axis)
- Frequency--how often the various values occur (along the vertical or y-axis)
- State the 3 key ways a distribution can be described.
- Central tendency or center
- Spread or variability
- Shape
- Name and define the 3 key measures of central tendency.
- Mean - $\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$
- Median - midpoint, 50% above, 50% below
- Mode - most common data point or highest peak
- Given a set of data, determine the mean, median, and mode.
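Here is a sketch of computing the three measures of center in Python for a made-up set of quiz scores. The mean is computed by hand from the formula above; `statistics.median` and `statistics.multimode` come from Python's standard library (`multimode` requires Python 3.8 or later).

```python
import statistics

# Hypothetical (made-up) quiz scores
scores = [4, 7, 7, 8, 9, 10, 10, 10]

mean = sum(scores) / len(scores)        # sum of x_i divided by n
median = statistics.median(scores)      # average of the two middle values here, since n is even
modes = statistics.multimode(scores)    # most common value(s); returns a list in case of ties

print(f"mean = {mean}, median = {median}, mode(s) = {modes}")
```

The three measures disagree here: the single low score of 4 pulls the mean (8.125) below the median (8.5), and the mode is 10.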
- Define and ID outliers.
Outliers are data points thought to belong to a different distribution; hence, any influence they have on the properties of a distribution introduces errors.
- Data point not in distribution
- Gaps
Outliers and skew are not the same thing: skew is part of a distribution; outliers are not.
Conclusions unduly influenced by a single data point are statistically indefensible! Such data points are outliers.
- State which measure of central tendency is generally most influenced by outliers.
- Using the Mr. Rogers Rat Tail Rule, state whether a distribution is skewed left or right (low or high).
The Mr. Rogers Rat Tail Rule--FAQ
Skewed distributions often look like a rat with a long tail.
The tail points in the direction of skew.
What gets skewed? The mean gets skewed, or pulled, in the direction the rat tail points.
Why does skew matter? For a skewed distribution, the mean poorly represents the bulk of the data points.
What gets skewed very little? The median. It represents the bulk of the data points better than the mean does.
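The rat tail rule can be checked numerically. This sketch uses made-up, right-skewed income data (in $1000s): most values are low, and a few large values form the tail. It then makes the largest value far more extreme to show which measure of center an outlier influences most.

```python
import statistics

# Hypothetical (made-up) right-skewed incomes, in $1000s: a long "rat tail" to the right
incomes = [30, 32, 35, 38, 40, 45, 55, 70, 95, 130]

mean = statistics.mean(incomes)
median = statistics.median(incomes)
print(f"skewed data: mean = {mean}, median = {median}")  # mean is pulled toward the tail

# Make the largest value far more extreme: the mean jumps, the median does not move
extreme = incomes[:-1] + [1300]
print(f"with outlier: mean = {statistics.mean(extreme)}, median = {statistics.median(extreme)}")
```

For these data the mean (57) sits well above the median (42.5); replacing 130 with 1300 triples the mean while the median stays at 42.5.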
- Give examples of data that would tend to be symmetrical and data that would be skewed left or right.
- Easy Test - skewed left or skewed low
- Hard Test - skewed right or skewed high
- Normal Test - symmetrical
- Incomes - skewed right or skewed high
Homefun (formative/summative assessment): Read section 1.2.
Stats Investigation: School Evaluation (approx. 3 class periods; individual work)
Purpose: Determine whether it is reasonable for 50% of all schools receiving a school report card to be scored below average.
Instructions: Perform the simulation of school ratings using the Excel spreadsheet provided.
Questions/Conclusions: See the Excel spreadsheet.
Essential Question: Is there a difference between looking at tables of numbers and looking at plots or graphs of numbers?
- Make dot plots.
  - gasoline consumption analysis
  - Old Faithful analysis
  - foreign-born analysis
  - Is IQ a bell-curve distribution?
- Make histograms using the TI-83 calculator and in Minitab.
- State the key weakness of histograms (see "Four Histograms"): their appearance depends on the bin width (interval size) chosen.
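To see why the bin width matters, here is a sketch that bins the same made-up data two ways and draws a text histogram for each. The `bin_counts` helper is hypothetical and assumes integer data with the starting edge at or below the minimum; the TI-83 and Minitab do this binning for you.

```python
# Hypothetical (made-up) data with two clusters and a gap between them
data = [52, 54, 55, 57, 58, 61, 63, 74, 77, 78, 80, 81, 82, 84, 86, 88, 91]

def bin_counts(values, start, width):
    """Hypothetical helper: counts for bins [start, start+width), [start+width, start+2*width), ..."""
    n_bins = (max(values) - start) // width + 1
    counts = [0] * n_bins
    for v in values:
        counts[(v - start) // width] += 1
    return counts

for width in (5, 20):
    print(f"bin width {width}:")
    for i, count in enumerate(bin_counts(data, 50, width)):
        left = 50 + i * width
        print(f"  {left:>3}-{left + width - 1:<3} {'*' * count}")
```

With a width of 5, the empty 65-69 bin reveals the gap between the two clusters; with a width of 20, the gap disappears and the data look like a single lump.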
Homefun (formative/summative assessment): Work exercises 37, 41, 55, and 57, pages 42-46.
Essential Question: Can the type of plot influence the conclusions drawn, and if so, how can this be prevented?
Stem and Leaf Plots
- Draw and interpret stem and leaf plots.
- clusters
- skew
- gaps
- multiple modes--these suggest that the data come from more than one distribution.
- Draw and interpret back-to-back stem and leaf plots.
- State why a time plot should always be used in an analysis of data.
Virtually everything is a function of time.
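The stem and leaf objectives above can be sketched as a one-sided text plot, using the tens digit as the stem and the ones digit as the leaf. The test scores and the `stem_and_leaf` helper are hypothetical, and this sketch handles non-negative integers only. A back-to-back plot would put a second data set's leaves to the left of the same stems.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Hypothetical helper for non-negative integers: stem = tens digit(s), leaf = ones digit."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    # Include empty stems so that gaps in the data remain visible
    return {stem: plot.get(stem, []) for stem in range(min(plot), max(plot) + 1)}

# Hypothetical (made-up) test scores
scores = [62, 67, 71, 73, 73, 78, 81, 84, 85, 85, 88, 90, 94, 45]
for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 4 | 5 means 45")
```

The empty 5 stem shows a gap, which sets the 45 apart as a possible outlier, and the long upper stems show the cluster in the 70s and 80s.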
Homefun (formative/summative assessment): Read section 1.3; work exercises 45, 47, and 49, pages 44-45.
Essential Question: Is there a difference between skew and outliers?
Box and Whiskers Plots
- Calculate quartiles, Q1 and Q3.
- Interpret 5-number summaries.
Low, Q1, Median, Q3, High
- Find the IQR (interquartile range) for a data set.
IQR = Q3 - Q1
- Draw a box and whiskers plot.
- State the Mr. Rogers Rat Whisker Rule for determining skew using a box and whiskers plot.
The longer whisker indicates the direction of skew.
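- State the % of the data expected in each whisker and in the box for a box and whiskers plot.
Each whisker contains about 25% of the data; the box contains about 50% (25% on each side of the median).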
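The quartile, 5-number summary, and IQR objectives above can be sketched in Python. Quartile conventions differ between textbooks and software. This sketch assumes the median-of-halves rule: Q1 and Q3 are the medians of the lower and upper halves, and the overall median is left out of both halves when n is odd. We believe this matches the TI-83, but other packages may give slightly different quartiles. `five_number_summary` is a hypothetical helper.

```python
import statistics

def five_number_summary(values):
    """Hypothetical helper: (Low, Q1, Median, Q3, High) for at least 2 values.

    Q1 and Q3 are the medians of the lower and upper halves of the sorted data;
    when n is odd, the overall median is left out of both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + n % 2:]
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

# Hypothetical (made-up) data set with n = 9
data = [9, 2, 4, 15, 5, 7, 11, 8, 4]
low, q1, med, q3, high = five_number_summary(data)
iqr = q3 - q1
print(f"Low={low} Q1={q1} Med={med} Q3={q3} High={high} IQR={iqr}")
```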
Homefun (formative/summative assessment):
Essential Question: Why are outliers important?
Modified Box and Whiskers Plot
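- Identify outliers using a modified box and whiskers plot.
- Whisker's end = the most extreme data point within 1.5 IQR of the box (above Q3 for the upper whisker, below Q1 for the lower whisker)
- Outlier = any data point beyond the whisker's end, that is, more than 1.5 IQR above Q3 or below Q1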
- Create box and whisker plots on the TI-83.
- Create and interpret parallel box and whisker plots on the TI-83 and in Minitab.
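The 1.5 IQR rule behind a modified box and whiskers plot can be sketched as follows. The quartiles use the median-of-halves convention (an assumption; software may differ), the data are made up, and `quartiles` and `modified_boxplot` are hypothetical helpers.

```python
import statistics

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (overall median excluded when n is odd)."""
    data = sorted(values)
    half = len(data) // 2
    return statistics.median(data[:half]), statistics.median(data[half + len(data) % 2:])

def modified_boxplot(values):
    """Hypothetical helper: whisker ends and outliers using the 1.5 x IQR rule."""
    data = sorted(values)
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "whisker_low": inside[0],    # most extreme point within 1.5 IQR below Q1
        "whisker_high": inside[-1],  # most extreme point within 1.5 IQR above Q3
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

# Hypothetical (made-up) data with one suspiciously large value
data = [2, 4, 4, 5, 7, 8, 9, 11, 15, 30]
print(modified_boxplot(data))
```

Here Q1 = 4 and Q3 = 11, so IQR = 7 and the upper cutoff is 11 + 10.5 = 21.5; the upper whisker stops at 15, and 30 is plotted as an outlier.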
Note that a box and whiskers plot cannot detect gaps, clusters, or multiple modes. Other types of graphs, such as dot plots, stem and leaf plots, and histograms, have a different problem: their ability to reveal patterns depends on the interval size. There's no perfect plot for visualizing distributions.
Formative assessments:
- Which type(s) of plot is (are) best at identifying clusters?
- Which type(s) of plot is (are) best at identifying multiple modes?
- Which type(s) of plot is (are) best at identifying gaps?
Homefun (formative/summative assessment): Work exercises 91, 93, and 95, p. 71. Work the Chapter 1 practice test, TI.1 to TI.15, pages 78-81.
Essential Question: Ideally, how many data points in a set of data are needed to characterize spread?
Standard Deviation
- Quantities represented by Greek letters (such as μ and σ) are considered the true population values (known by Zeus).
- Quantities represented by letters of our normal alphabet (such as xbar and s; known by mere mortals) are estimates of the ones represented by Greek letters.
- Calculate the range and explain why it is a poor indicator of spread.
- Write the mathematical definition for standard deviation from memory and explain its meaning.
Calculated from an entire population:
$\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{n}}$
Calculated from a sample:
$s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$
The standard deviation expresses how much a typical data point differs from the mean, but because the deviations are squared, it is weighted so that large deviations have more influence.
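The two formulas can be written out directly in Python and checked against the standard library's `statistics.pstdev` (population, divides by n) and `statistics.stdev` (sample, divides by n - 1). The data are made up; `population_sd` and `sample_sd` are hypothetical helpers that mirror the formulas term by term.

```python
import math
import statistics

def population_sd(values):
    """sigma: square root of the sum of squared deviations from mu, divided by n."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

def sample_sd(values):
    """s: square root of the sum of squared deviations from xbar, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))

# Hypothetical (made-up) data
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(data), statistics.pstdev(data))  # by hand vs. library: sigma
print(sample_sd(data), statistics.stdev(data))        # by hand vs. library: s
```

For these data the mean is 5 and the squared deviations sum to 32, so σ = √(32/8) = 2 and s = √(32/7) ≈ 2.14. Dividing by n - 1 always makes s slightly larger than σ for the same numbers.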
- State how standard deviation and variance are related.
$\text{variance} = (\text{standard deviation})^2$
- Calculate standard deviations by hand and with a calculator.
- Explain the difference between s and σ. (s is computed from a sample and estimates σ; σ is computed from the entire population.)
- State why the standard deviation is a better indicator of spread than range.
Std dev uses all the data points; range uses only 2 points.
- State an approximate relationship between range and standard deviation.
(For roughly bell-shaped data, range is roughly 6σ, since nearly all such data lie within 3σ of the mean.)
Rank the distributions shown here from lowest to highest standard deviation.
Formative assessment: What does a distribution with a high or low standard deviation look like?
Homefun (formative/summative assessment): Work exercises 97 and 99, p. 72. Work the Chapter 1 practice test, TI.1 to TI.15, pages 78-81.
Essential Question: How can I make an "A" on the test?
Exploring Data Review
- Work the practice test.
- Review the objectives.
- Correctly interpret 5-number summaries.
- Look over free response problems from previous years.
- Memorize the mathematical definitions of variance and standard deviation for samples and populations.
- Master the vocabulary (see example below).
Descriptive Term: Comments
Central Tendency
- Mean: sensitive to outliers and skew
- Median: good when outliers or skew are present
- Mode: rarely used
Spread
- Range: very sensitive to outliers and skew
- Variance: sensitive to outliers and skew
- Standard deviation: sensitive to outliers and skew
- IQR: good when outliers or skew are present
Shape
- Symmetrical: can have multiple peaks
- Skewed left: skewed low; easy test
- Skewed right: skewed high; hard test, income
Summative Assessment: Test--Objectives 1-36