Chapter 4: Producing Data
AP Statistics Standards
II. Sampling and Experimentation: Planning and conducting a study (10%–15%)
A. Overview of methods of data collection
- Census
- Sample survey
- Experiment
- Observational study
B. Planning and conducting surveys
- Characteristics of a well-designed and well-conducted survey
- Populations, samples, and random selection
- Sources of bias in sampling and surveys
- Sampling methods, including simple random sampling, stratified random sampling, and cluster sampling
C. Planning and conducting experiments
- Characteristics of a well-designed and well-conducted experiment
- Treatments, control groups, experimental units, random assignments, and replication
- Sources of bias and confounding, including placebo effect and blinding
- Completely randomized design
- Randomized block design, including matched pairs design
D. Generalizability of results from observational studies, experimental studies, and surveys

Objectives
Essential Question: Can bad data be corrected with good statistical analysis?
Designing Samples
- Distinguish between a population and a sample and tell which one forms the basis of statistics.
Population: the entire group of individuals
Sample: a subset of the population used for drawing inferences about the population
- Define bias. A flaw in the design of a study that causes the study's results to systematically favor a particular outcome.
- Define voluntary response and convenience sampling. Explain why they invariably produce biased results.
- Identify when confounding is present.
- Explain why confounding and bias make statistical inference impossible. Inference implies that there is no other reasonable explanation for the data.
- State the key difference between a statistical study and a non-statistical study. Proper sampling technique
- State the two basic forms of statistical studies.
Homefun (formative/summative assessment): Exercises 1, 5, 7, 9 p. 226

Essential Question: What is an SRS?
Creating an SRS
- Describe an SRS (simple random sample) and state what it seeks to eliminate.
With a sample size of n, every set of n individuals has an equal chance of being chosen from the population.
- State how an SRS is formed.
- Label: each subject is assigned a number. (This is not a random process.)
- Table: numbers are randomly drawn from the above list in order to select the subjects.
- Use a table of random digits to create an SRS.
Seed a random number generator (such as a table) only once.
- State the problem which the magic word "randomization" helps prevent. Bias.
Henceforth and forevermore, be informed that anyone who hath not lost his or her mind nor desireth to fail this or any other statistics course shall observe the practice of randomization when sampling. One who forgets to include randomization when describing a sampling operation shall wear the dreaded scarlet R.
- State the primary weakness of an SRS.
- Variability from study to study.
Occasionally a properly obtained random sample will select a single type of subject instead of a representative sample. For example, a random sample of the United States could, in principle, end up being 100% motorcycle gang members or 100% kindergarten teachers. Such a sample will most likely answer questions in ways that do not properly represent the entire population.
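The label-then-draw procedure and the study-to-study variability described above can be sketched in Python. This is only an illustration under stated assumptions: the population of 30 teachers and 70 students, the sample size, and the helper name `simple_random_sample` are all hypothetical, and a seeded software generator stands in for the table of random digits.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw an SRS of size n: label every individual, then pick n labels at random."""
    labels = range(len(population))   # Label step: 0, 1, 2, ... (not random)
    chosen = rng.sample(labels, n)    # Random step: n distinct labels, no repeats
    return [population[i] for i in chosen]

rng = random.Random(42)               # seed the generator only once

# Hypothetical population: 30 teachers and 70 students.
population = ["teacher"] * 30 + ["student"] * 70

sample = simple_random_sample(population, 10, rng)

# Repeat the study five times to see study-to-study variability:
# the number of teachers changes from sample to sample.
counts = [simple_random_sample(population, 10, rng).count("teacher")
          for _ in range(5)]
print("teachers per sample of 10:", counts)
```

The generator is created once and reused for every draw, which mirrors the advice to seed only once. Re-seeding before each sample would reproduce the same "random" sample every time.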
Homefun (formative/summative assessment): Exercises 11, 15, 17 pp. 226-227
Relevance: Proper sampling is the foundation statistics rests on.

Essential Question: What is the difference between preventing variability and preventing bias in a statistical study?
- Describe the key method for preventing variability in observational studies and surveys. Stratification:
- Identify key characteristics of subjects that could favor different results.
- Divide the population into homogeneous groups, or strata, based on these characteristics.
- Select a sample from each stratum using an SRS so that the fraction of subjects in the sample from each stratum matches that stratum's fraction of the population.
- Describe a cluster sample (p. 218) and how it differs from a stratified sample.
Clusters are supposed to be representative of the entire population almost like small-sized random samples in themselves. By contrast, strata are deliberately set up to be homogeneous groups that ensure that no group goes unrepresented in a sample.
Cluster sampling is usually done to simplify or speed up the sampling process, but it is not convenience sampling: the clusters are chosen at random, whereas subjects in a convenience sample are not representative of the entire population.
Example: In a theater, the seats become less expensive the farther the row is from the stage. A cluster sample could be obtained by randomly selecting columns (seats from the front of the theater to the back), so that each selected cluster spans every price level.
- Describe how a multistage sample design is used for preventing variability (p. 219). These typically combine 2 or more sampling methods.
- Describe 4 ways to do a perfectly good job of sampling and still get worthless results. W - RUN
- Wording Effects - Asking the wrong question or biasing the result -- Do you favor universal health insurance or would you rather let small children die who can't afford medical care?
- Response Bias - a) Intimidating interviewer -- a policeman in uniform conducting a survey about cocaine use. b) Intimidating question -- do you cheat on tests?
- Undercoverage - Leaving groups out of the sample selection process. Homeless people are particularly difficult to reach since they typically have neither phone numbers nor addresses.
- Nonresponse - Mr. Rogers Syndrome (he doesn't do surveys)
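The contrast between stratified and cluster sampling described earlier in this list can be sketched with the theater example. Everything specific here is an assumption made for illustration: the 12-by-20 seating grid, the three price tiers, the 10% sampling fraction, and the function names are hypothetical.

```python
import random

rng = random.Random(7)   # seed once

ROWS, COLS = 12, 20      # hypothetical theater: 12 rows of 20 seats
seats = [(row, col) for row in range(ROWS) for col in range(COLS)]

def price_tier(seat):
    """Hypothetical strata: seats get cheaper farther from the stage."""
    row, _ = seat
    if row < 3:
        return "premium"
    if row < 8:
        return "standard"
    return "economy"

def stratified_sample(units, stratum_of, fraction, rng):
    """Take an SRS of the same fraction from every stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        k = round(fraction * len(members))   # proportional allocation
        sample.extend(rng.sample(members, k))
    return sample

def cluster_sample(units, cluster_of, n_clusters, rng):
    """Pick whole clusters at random and keep every unit inside them."""
    clusters = sorted({cluster_of(unit) for unit in units})
    chosen = set(rng.sample(clusters, n_clusters))
    return [unit for unit in units if cluster_of(unit) in chosen]

strat = stratified_sample(seats, price_tier, 0.10, rng)
clust = cluster_sample(seats, lambda seat: seat[1], 2, rng)  # clusters = columns

print(len(strat), "seats from strata;", len(clust), "seats from 2 columns")
```

In this sketch the strata are internally alike (one price tier each) and every stratum contributes to the sample, while each column cluster runs front to back and so contains all three tiers, like a small version of the whole theater.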
Homefun (formative/summative assessment): Exercises 21, 23, 27, 31, 33, 35 pp. 228-229
Relevance: Our democratic political system depends on properly conducted sampling. It is the way politicians can judge how people would vote on an issue if given the chance. A politician can then choose to follow the will of the majority or attempt to educate the majority to a more correct point of view.

Essential Question: Which is more reliable, a census or a sample?
- State the key advantage of a census over a sample.
- No sampling error.
Otherwise they both suffer from the same kinds of problems.
- Explain the key disadvantages of a census vs. a sample-based survey.
- Slow -- conditions can change before the census is complete, for example during elections.
- Resource intensive -- expensive in terms of time, money, and personnel requirements.
- State the problems common to both a sample-based survey and a census, with potential solutions.
W-RUN yet again
- Wording Effects: use focus groups or preliminary surveys to screen questions. When giving the survey, ask the same question in more than one way and test to see if the answers differ.
- Response Bias: carefully select and train interviewers.
- Undercoverage: use a variety of techniques to reach the sample group.
- Nonresponse: resurvey non-responders, often with a different contact method such as a phone call or personal interview.
Relevance: Taking a census every 10 years is mandated by the U.S. Constitution. It is an important factor in our political system because it determines things such as the number of congressional representatives. It also determines many types of funding.

Essential Question: Why are experiments considered more convincing than observational studies?
Experiments
- Correctly use the following terms:
- Experimental unit/subject
- Treatment
- Factor/level
- Placebo effect -- a response to a dummy treatment (placebo) that has no active ingredient; the response comes from the subject's expectations rather than from the treatment itself.
- Control group
- Completely randomized design -- all subjects are assigned to the treatments by a random process; the treatment groups need not be exactly the same size.
- State the magic word which is used in all experiments and state why and how it is used. Hint: remember the "R".
- Explain the conditions which make an experimental result statistically significant. Assuming that the experiment has been properly done, the observed effect is too large to be plausibly explained by random chance alone.
- Be as one with the three basic principles of experimental design.
- Control - limit the effects of lurking variables.
- Randomization - prevents bias in how subjects are chosen and assigned. The treatment a subject gets is chosen by a random process. When creating a sample from a larger population, the subjects are chosen by a random process, to the extent possible.
- Replication - collect numerous data points.
- Describe how double blind testing is used.
- Discuss the ethical considerations of double blind testing.
- Correctly use blocking in an experimental design. Note that in an experiment human subjects are often volunteers who are provided some form of motivation, often payment. They cannot be considered to be randomly selected.
- Identify key lurking variable(s).
- Set up the blocks to be as homogeneous as possible with respect to the key lurking variables.
- Randomize the treatments: randomly assign subjects from within each block to the treatment groups, so that the fraction of subjects in each treatment group from the various blocks matches the fraction of each block in the population.
Formative Assessment: Answer question 2 from free response section of the 1997 exam--table groups.
- Explain why blocking reduces study-to-study variability.
It helps ensure that lurking variables affect all treatment groups equally. Hence, the effects tend to cancel out.
- State the problem that blocking does not solve.
Bias
- Set up matched pairs designs. In a matched pairs design, all of the subjects get all of the treatments. For example, in educational situations (such as Spanish Camp) all of the subjects take the course, and evaluation is done by comparing a pre-test to a post-test. In taste tests, all the subjects taste each item.
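Random assignment, with and without blocking, can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the 12 volunteers, the age-group lurking variable, the two treatments, and the function names are all hypothetical. Dealing shuffled subjects out in turn is one simple way to randomize, chosen here because it keeps group sizes as equal as possible.

```python
import random

rng = random.Random(11)   # seed once

def completely_randomized(subjects, treatments, rng):
    """Shuffle all subjects, then deal them out to the treatments in turn."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

def randomized_block(subjects, block_of, treatments, rng):
    """Form blocks on a lurking variable, then randomize within each block."""
    blocks = {}
    for subject in subjects:
        blocks.setdefault(block_of(subject), []).append(subject)
    groups = {t: [] for t in treatments}
    for members in blocks.values():
        within = completely_randomized(members, treatments, rng)
        for t, assigned in within.items():
            groups[t].extend(assigned)
    return groups

# Hypothetical volunteers: (id, age group); age group is the lurking variable.
subjects = [(i, "under 40" if i < 8 else "40 and over") for i in range(12)]
treatments = ["drug", "placebo"]

crd = completely_randomized(subjects, treatments, rng)
rbd = randomized_block(subjects, lambda s: s[1], treatments, rng)

for t in treatments:
    ages = [age for _, age in rbd[t]]
    print(t, "under 40:", ages.count("under 40"),
          "40 and over:", ages.count("40 and over"))
```

With blocking, each treatment group is guaranteed the same age mix (the block sizes here divide evenly), so age affects both groups equally. The completely randomized version only balances age on average, so the mix can differ from one run of the experiment to the next.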
Homefun (formative/summative assessment): Exercises 45, 47, 51, 65, 73 pp. 228-229, Chapter 4 AP Statistics Practice Test T4.1 to T4.11 pp. 274-275
Summative Assessment: Test Objectives 1 - 27
Relevance: Double blind testing is the required standard for drug approval in the U.S. Any benefits claimed for a drug, supplement or remedy that has not been double blind tested have to be viewed with great skepticism. Example: facilitated communication with autistic people.