Mr Rogers AP Statistics Objectives for Regression Analysis

Mr. Rogers - AP Statistics Objectives


Lesson Plan

Practice Test

Practice Test Answers

Chapter 3: Regression Analysis

AP Statistics Standards

I. Exploring Data: (continued)

D. Exploring bivariate data

Analyzing patterns in scatterplots

Correlation and linearity

Least-squares regression line

Residual plots, outliers, and influential points

Objectives

Essential Question: How can we establish and quantify a cause and effect relationship between two variables?

Chap 3 2 Variable (Bivariate) Relationships

Identify the response and explanatory variables from a plot.

response: y-variable, dependent

explanatory: x-variable, independent

Identify positive and negative associations from scatter plots.

Note: an association does not establish cause and effect

Detect linear and non-linear relationships using scatter plots.

Note: ALWAYS make a scatter plot when analyzing bivariate data

Judge the relative strength of a relationship by the amount of scatter around the curve of best fit.
Identify outliers on scatter plots.

Within the expected range of X-values
Outside the expected range of Y-values for a given X-value

Identify "influential outliers" on scatter plots.

Outside the expected range of X-values
Note: in this region the expected range of Y-values is undefined

Make scatter plots using the TI-83 calculator and Excel.
State why any analysis of 2 variable (bivariate) data should always begin with a scatter plot regardless of which tools are used to further analyze the data.

identifies outliers

reveals gaps and clusters in the data

displays patterns such as linearity or non-linearity

Note: in the ideal situation all the data points would have equal influence and be uniformly distributed.

Homefun (formative/summative assessment): Exercises 1, 3, 5, 7 pp. 158-159

Relevance: Many scientific constants and predictions are based on measurements of the slopes of lines.

Essential Question: Why is it important to quantify correlation instead of just estimating it by looking at a graph?

Correlation

State the meaning of correlation and how it is typically indicated.

r = correlation coefficient, range goes from -1 to +1
Strength: the closer the absolute value of r is to 1, the stronger the linear association
Direction: the sign of r (positive or negative association)
Assumes a linear relationship

Calculate r using the formula:

r = [1 / (n - 1)] ∑ [(x_i - xbar) / s_x] [(y_i - ybar) / s_y]
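As a rough sketch of the calculation, the Python below standardizes each x and y, multiplies the pairs, sums them, and divides by n - 1. The function name `correlation` and the five data points are made up for illustration; they do not come from the textbook.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation coefficient r from standardized x- and y-values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


# Made-up illustrative data (an assumption, not course data)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 4))
```

The TI-83 and Excel do the same arithmetic internally; working it once by hand or in code shows why r has no units.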

Be as one with the following facts about correlation:

r-square is bullet-proof

adding a constant to either y-variable or x-variable or both has no effect on r-square or slope.

multiplying either the y-variable or the x-variable or both by a nonzero constant has no effect on r-square

r is dimensionless; in other words, it has no units.

Correlation makes no distinction between explanatory and response variables.
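The facts above can be checked numerically. This is a minimal sketch using made-up data and arbitrary constants (10, -3, 2.54, 100); the helper `r_squared` is a hypothetical name chosen for illustration.

```python
from statistics import mean


def r_squared(xs, ys):
    """r-square computed from sums of squares and cross-products."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Made-up illustrative data (an assumption, not course data)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

base = r_squared(xs, ys)
# Adding constants to x and y
shifted = r_squared([x + 10 for x in xs], [y - 3 for y in ys])
# Multiplying x and y by constants (a change of units)
scaled = r_squared([2.54 * x for x in xs], [100 * y for y in ys])
# Swapping the explanatory and response variables
swapped = r_squared(ys, xs)

print(base, shifted, scaled, swapped)
```

All four values agree to within rounding error, which is what "bullet-proof" means here.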

Homefun (formative/summative assessment): Exercise 9, 15, 17 pp.159-160

Essential Question: Why would we need to find a mathematical relationship between variables? Isn't correlation enough?

Regression

Explain the difference between correlation and regression.

correlation: denotes the strength of an association

regression: yields a mathematical model (regression equation) of the association.

Perform regression/correlation analysis with the TI-83 calculator and Excel spreadsheets.
What type of error does least squares regression minimize?

Error measured in y-dimension (y = response variable)

x-dimension (explanatory variable) considered error-free

Interpret regression equations.

Single: yhat = bx + a (b = slope, a = intercept)
Multiple: yhat = a₀ + a₁x₁ + a₂x₂ + ... + a_nx_n

Calculate ybar using a regression equation, given xbar.
Properly state the meaning of slope according to the official statistics definition. (p155)

For every increase of one unit in the x-variable, the predicted y changes by the amount of the slope

Properly interpret the intercept.

example: (sales) = 50 (advertising dollars) + 87

What are the sales with no advertising? Answer: the intercept or 87

Describe the region where a given regression equation will give a meaningful association: within the range of the x-values used to fit it.
Define and decry the use of extrapolation. Extrapolation is the act of drawing a conclusion based on the regression line in a region significantly outside of the range of x-values. These conclusions can be highly misleading.

example: (bushels tomatoes) = 2 (lb fertilizer) + 10 ,

x-range 0 to 5

If Bob puts 100 lb of fertilizer on his plants, how many bushels of tomatoes will he get? Answer: zero--he kills his tomato plants.
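One way to build the habit of checking the x-range is to have the prediction report whether it is an extrapolation. The sketch below uses the tomato equation and its x-range of 0 to 5 from the example; the function name `predict_tomatoes` and the returned flag are illustrative assumptions.

```python
def predict_tomatoes(fertilizer_lb, x_min=0.0, x_max=5.0):
    """Return (predicted bushels, is_extrapolation) for the tomato example.

    The equation (bushels) = 2 (lb fertilizer) + 10 is only trusted
    for x-values between x_min and x_max.
    """
    bushels = 2 * fertilizer_lb + 10
    is_extrapolation = not (x_min <= fertilizer_lb <= x_max)
    return bushels, is_extrapolation


inside = predict_tomatoes(3)     # within the x-range
outside = predict_tomatoes(100)  # far outside the x-range
print(inside, outside)
```

The equation happily returns 210 bushels for 100 lb of fertilizer. The flag is the reminder that the number means nothing out there.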

Be aware that the point (xbar, ybar) always lies on the regression line, at its center: ybar = b (xbar) + a

Homefun (formative/summative assessment): Exercise 35, 37, 41 p.191

Essential Question: What happens to the regression analysis when we change units?

Solve problems using the following equations (b = slope, a = intercept):

b = r (s_y / s_x)

a = ybar - b (xbar)
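The two equations can be sketched in Python as follows. The data are made up for illustration and `slope_intercept` is a hypothetical helper name. The last lines check that (xbar, ybar) falls on the fitted line.

```python
from statistics import mean, stdev


def slope_intercept(xs, ys):
    """Return (b, a) using b = r (s_y / s_x) and a = ybar - b (xbar)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    b = r * (s_y / s_x)    # slope
    a = y_bar - b * x_bar  # intercept
    return b, a


# Made-up illustrative data (an assumption, not course data)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b, a = slope_intercept(xs, ys)

# The point (xbar, ybar) should satisfy ybar = b (xbar) + a
on_line = abs((b * mean(xs) + a) - mean(ys)) < 1e-9
print(b, a, on_line)
```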

Effect of each action on the regression results. The slope and intercept columns are derived from the above 2 equations and the other four columns; the s_y, s_x, ybar, and xbar columns are based on review information from previous chapters.

Action                  Slope              Intercept        s_y              s_x              ybar             xbar

Multiply by constant = k
  x-data points         multiply by 1/k    none             none             multiply by k    none             multiply by k
  y-data points         multiply by k      multiply by k    multiply by k    none             multiply by k    none
  both                  none               multiply by k    multiply by k    multiply by k    multiply by k    multiply by k

Add a constant = k
  x-data points         none               adds -bk         none             none             none             adds k
  y-data points         none               adds k           none             none             adds k           none
  both                  none               adds (k - bk)    none             none             adds k           adds k
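Several rows of the table can be verified with a short script. This is a sketch under assumptions: the data are made up, k = 4 is arbitrary, and `fit_line` is a hypothetical helper that computes the least squares slope and intercept directly.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least squares slope b and intercept a for yhat = b x + a."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return b, a


# Made-up illustrative data and constant (assumptions)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
k = 4

b, a = fit_line(xs, ys)

# Multiply x-data by k: slope multiplied by 1/k, intercept unchanged
b_mx, a_mx = fit_line([k * x for x in xs], ys)
# Multiply y-data by k: slope and intercept both multiplied by k
b_my, a_my = fit_line(xs, [k * y for y in ys])
# Add k to x-data: slope unchanged, intercept adds -bk
b_ax, a_ax = fit_line([x + k for x in xs], ys)
# Add k to both: slope unchanged, intercept adds (k - bk)
b_ab, a_ab = fit_line([x + k for x in xs], [y + k for y in ys])

print(b, a)
print(b_mx, a_mx)
print(b_my, a_my)
print(b_ax, a_ax)
print(b_ab, a_ab)
```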

Homefun (formative/summative assessment): Exercise 47 p.192

Relevance: Regression and correlation are the mathematical tools on which much of the social sciences, as well as many business tools, are founded.

Essential Question: What does R-Square really mean?

The Meaning of R-Square

State the meaning of SST and SSE. Use them to calculate R-square.

SST = ∑ (y_i - ybar)² SST (Sum of Squares Total) is a measure of the scatter or variability of the y-data points about the y-data's mean.

SSE = ∑ (y_i - yhat)² SSE (Sum of Squared Errors) is a measure of the scatter or variability of the y-data points about the regression line. Remember, the x-data is assumed to be error free.

(SST - SSE) is a measure of the amount of variability in the y-data points explained by the regression line.

r²= (SST - SSE) / SST is a measure of the fraction of the variation in the values of y that is accounted for by the regression line of y on x.
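The definitions above translate directly into code. The sketch below fits a least squares line to made-up data, then computes SST, SSE, and r-square from them; the function name `sums_of_squares` is an illustrative assumption.

```python
from statistics import mean


def sums_of_squares(xs, ys):
    """Return (SST, SSE, r-square) for the least squares line of y on x."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    # Scatter of the y-data about its mean
    sst = sum((y - y_bar) ** 2 for y in ys)
    # Scatter of the y-data about the regression line
    sse = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    return sst, sse, (sst - sse) / sst


# Made-up illustrative data (an assumption, not course data)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
sst, sse, r2 = sums_of_squares(xs, ys)
print(sst, sse, r2)
```

For these five points the line accounts for 60% of the variation in y; the remaining 40% is the scatter left over in SSE.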

Give the official interpretation of r-square (coefficient of determination).

Use the proper magic words p.180: r² is the fraction of the variation in the values of y that is accounted for by the regression line of y on x.

r-square evaluates the entire equation

Explain why care must be taken in using the official interpretation of r-squared. Remember, even when correlating data from random sources, r-squared can sometimes be reassuringly high.

Susceptible to outliers, especially influential outliers
Data points furthest from the center of the line have more influence. It's similar to a playground see-saw or teeter-totter: a person seated on the end will have more influence than a person seated close to the middle.
There may be no causative relationship between explanatory and response variables. A high r-square does not establish causation!
r-square applies to linear relationships. A low r-square value does not establish that there is no association. The association could be non-linear.

Homefun (formative/summative assessment): Exercise 53, 59, 63 pp.192-194

Relevance: Sometimes major political decisions are made or social theories proposed based on questionable evidence from correlation/regression analysis. It's difficult to evaluate this evidence without knowing something about the meaning of r-square.

Stats Investigation (formative/summative assessment): Meaning of R-Square - time approx 2 class periods (individual work)

Purpose: Determine if a regression analysis using random numbers can yield an r-square value of 50% or more.

Instructions: Set up a regression analysis in Excel using integer x-values from 0 to 9. Use a random number from 0 to 10 for the y-values. Run this simulation 100 times. Calculate the average r-square and record the highest r-squared value. Record the three highest r-square values obtained in the class.

Save the data sets from your 4 regression/correlation results with the highest R-square values. You will use them again at the end of the year.
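The investigation is meant to be done in Excel, but the same simulation can be sketched in Python for comparison. Assumptions here: y-values are drawn as uniform real numbers from 0 to 10 (the instructions do not say whether they should be integers), the seed is fixed only so a run can be repeated, and `simulate` and `r_squared` are hypothetical helper names.

```python
import random
from statistics import mean


def r_squared(xs, ys):
    """r-square computed from sums of squares and cross-products."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


def simulate(trials=100, seed=None):
    """Run the random-number regression `trials` times; return all r-squares."""
    rng = random.Random(seed)
    xs = list(range(10))  # integer x-values 0 through 9
    results = []
    for _ in range(trials):
        ys = [rng.uniform(0, 10) for _ in xs]  # random y-values, no real association
        results.append(r_squared(xs, ys))
    return results


results = simulate(trials=100, seed=1)
print("average r-square:", mean(results))
print("highest r-square:", max(results))
print("runs at 50% or more:", sum(1 for v in results if v >= 0.5))
```

Changing or removing the seed gives a different set of 100 runs, much like each student's spreadsheet will.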

Questions /Conclusions:

Based on your data, does a high r-square value by itself indicate a meaningful association or causation?

Is the random number generator used in this investigation truly random?

Is it possible to get a high r-squared value merely from random events?

What does it really mean when we say that r-square represents the fraction of the variation in the values of y that is "explained" by the least squares regression of y on x? Discuss things like the SST and SSE.

Essential Question: Can a regression equation with a high R-square be inappropriate?

Residuals

Define what is meant by a residual.

Mathematically: resid = y_i - yhat

English: a residual is the difference between the measured y-value and the y-value predicted by the regression equation.

Calculate residuals using a TI-83 calculator.
State 2 ways to plot residuals.

Residuals vs. x: commonly used with a straight line equation

Residuals vs. predicted y (yhat): commonly used with multiple regression analysis

State the major assumptions concerning distribution of y-data points about regression lines.

Y-data normally distributed: If y-data points were repeatedly gathered for a given x-value, the y-data would form a normal distribution with its mean corresponding to the yhat value calculated with the given x-value. Remember, y-values have random measurement errors in them. Repeated measurements of a y-value will not give the exact same number.

Uniform spread in y-data from one end of the line to the other: The spread in the above distribution would be the same for every possible x-value. (See objective 32 to estimate the size of the spread.)

Interpret residual plot patterns.

Residual Plot conclusion: either appropriate or inappropriate

Random--appropriate
Smiley or Frowning Face (Mr. R's Terms)--inappropriate
Pattern in the scatter--inappropriate

Note: residual plots merely magnify the patterns that can be observed in a scatter plot. The horizontal line at the origin of a residual plot represents the regression line. A person skilled at interpreting scatter plots will arrive at the same conclusions that can be drawn from a residual plot.

Make residual plots using a TI-83.

Store x-data in L1 and y-data in L2

First perform the regression analysis for L1, L2

Keystrokes: 2nd > STAT > RESID > ENTER > RESID > STO > L3

Create a scatter plot of L1 on the horizontal axis and L3 on the vertical axis

State the sum of the residuals: zero.

Correctly interpret the standard error of the least squares regression line. The standard error of the least squares regression line is related to the residuals as shown below and is a measure of the spread of the data around the regression line. It can be considered an estimate of the standard deviation of the normal distribution described in objective 28.

Most computer printouts will report a value for s. (see Minitab Output )

s = [ ∑(y - yhat)² / (n - 2) ]^(1/2)

s = [ ∑(residual)² / (n - 2) ]^(1/2)

s = [ SSE / (n - 2) ]^(1/2)
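The residuals and the standard error s can be sketched together in Python. The data are made up for illustration and `residuals_and_s` is a hypothetical helper name; the calculation follows the three forms of s given above.

```python
from math import sqrt
from statistics import mean


def residuals_and_s(xs, ys):
    """Return (list of residuals, standard error s) for the least squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    resid = [y - (b * x + a) for x, y in zip(xs, ys)]  # resid = y_i - yhat
    sse = sum(e ** 2 for e in resid)
    s = sqrt(sse / (len(xs) - 2))
    return resid, s


# Made-up illustrative data (an assumption, not course data)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
resid, s = residuals_and_s(xs, ys)
print(resid)
print("sum of residuals:", sum(resid))
print("s:", s)
```

The printed sum of residuals is zero apart from rounding error, matching the objective above.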

Homefun (formative/summative assessment): prob. 46, 60, 61, 71, 73 pp.192-196

Relevance: Even though the world is largely non-linear, parts of it can often be accurately described with linear models. Knowing when a linear model is inappropriate is essential to building effective models.

Various types of regression models are used in everything from predicting grades on AP tests to computer control of chemical plants.

Essential Question: How can I make an "A" on the test?

Regression/Correlation Analysis Review

Work the practice test.

Review the objectives.

Look over free response problems from previous years.

Master the vocabulary (see example below).

Summative Assessment: Test--Objectives 1-32

Stats Investigation (formative/summative assessment): Determining if a Regression Equation is Appropriate - time approx 1 class period (individual work)

Purpose: Determine if a linear regression equation is appropriate for two different situations.

Background: Commercial resistors follow Ohm's law, while light bulbs, due to their high temperatures, do not. Ohm's law is as follows:

I = (1/R) V

Where: I = current, V= voltage and R = resistance.

Plotting I vs. V will theoretically yield a straight line passing through the origin.

Instructions: Set up a least squares linear regression analysis in Excel to find the association between current (response variable) and voltage (explanatory variable) for a commercial resistor and for a light bulb. Remember that this means a scatter plot as well as finding the slope, intercept, and R-square for the data. Set up the formulas needed to plot a residual plot and make such a plot for the two sets of data.
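For comparison with the Excel setup, here is a rough Python sketch of the same analysis. All numbers are synthetic and are assumptions made for illustration: the resistor data are generated exactly from Ohm's law with R = 100 ohms, and the bulb data follow an invented concave curve that merely stands in for a bulb whose resistance rises as it heats. Real measurements will include scatter.

```python
from statistics import mean


def fit_and_residuals(xs, ys):
    """Return (slope, intercept, residuals) for the least squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    resid = [y - (b * x + a) for x, y in zip(xs, ys)]
    return b, a, resid


volts = [1.0, 2.0, 3.0, 4.0, 5.0]

# Synthetic resistor: I = (1/R) V with R = 100 ohms (assumed value)
resistor_amps = [v / 100.0 for v in volts]
# Synthetic bulb: an invented concave curve, not a physical model
bulb_amps = [0.05 * v ** 0.5 for v in volts]

slope_r, intercept_r, resid_r = fit_and_residuals(volts, resistor_amps)
slope_b, intercept_b, resid_b = fit_and_residuals(volts, bulb_amps)

resistance = 1.0 / slope_r  # slope of I vs. V is 1/R
print("resistor resistance (ohms):", resistance)
print("resistor residuals:", resid_r)
print("bulb residuals:", resid_b)
```

The synthetic bulb residuals are negative at both ends and positive in the middle, the kind of curved pattern that marks a linear model as inappropriate.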

Questions /Conclusions:

Based on your data, does a high r-square value by itself indicate a meaningful association or causation?

Find the resistance value in ohms for the commercial resistor.

Is a linear equation appropriate for the commercial resistor? How about the light bulb? Explain your answers.
