8/14/2019 ma_chap5
Regression Diagnostics
Regression diagnostics ask three questions:
Are the assumptions of multiple regression complied with?
Is the model adequate?
Is there anything unusual about any data points?
Checking for Non-violation of Assumptions
Linearity of the relationship between each X and Y can be checked by a scatter plot of Y against each X.
Normality of the random variation in Y can be checked by plotting a histogram of the residuals.
Independence of the explanatory variables from each other can be checked by a scatter matrix and the Variance Inflation Factor. The Durbin-Watson statistic checks a related assumption: that successive residuals are independent of each other.
Diagnosis of Multi-collinearity
Check by means of correlation matrix
Significant F but non-significant t-ratios.
Variance inflation: large changes in regression coefficients when variables are added or deleted.
Variance Inflation Factor (VIF) > 4 or 5 suggests multi-collinearity; VIF > 10 is strong evidence that collinearity is affecting the regression coefficients.
The Durbin-Watson statistic ranges from 0 to 4, and a value near 2 indicates no autocorrelation. Note that it checks independence of the residuals rather than collinearity among the predictors.
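The VIF rule of thumb above can be illustrated with a minimal sketch. It assumes the simplest case of exactly two predictors, where the R-squared from regressing one predictor on the other equals their squared correlation, so VIF = 1 / (1 - r^2). The function names and the data are hypothetical and are not taken from the slides.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model.

    Assumption: only two predictors, so R^2 from regressing one on
    the other is the squared correlation and VIF = 1 / (1 - r^2).
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical data: x2 is roughly twice x1, so the two are nearly collinear.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

vif = vif_two_predictors(x1, x2)
print(f"VIF = {vif:.1f}", "(above 10: strong evidence of collinearity)" if vif > 10 else "")
```

With more than two predictors, each predictor would be regressed on all the others and its own R-squared used in the same formula.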
Diagnosis of Violation of Assumptions
Residual plots are used to check for:
Variance not being constant across the explanatory variables.
Fitted relationship not being linear.
Random variation not having a Normal distribution.
Fitted Values and Residuals
Fitted values (Fits) are the estimates of Y as
determined by the regression equation.
Residuals (Resids) are the differences between
each observed value and the corresponding
fitted value.
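The two definitions above can be sketched in a few lines. This is a minimal illustration that assumes a single predictor fitted by ordinary least squares; the helper names and the data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def fits_and_residuals(x, y):
    """Return (fitted values, residuals) for a one-predictor regression."""
    b0, b1 = fit_line(x, y)
    fits = [b0 + b1 * xi for xi in x]               # estimates of Y
    resids = [yi - fi for yi, fi in zip(y, fits)]   # observed minus fitted
    return fits, resids

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

fits, resids = fits_and_residuals(x, y)
for xi, yi, fi, ei in zip(x, y, fits, resids):
    print(f"x={xi:.0f}  y={yi:.1f}  fit={fi:.2f}  resid={ei:+.2f}")
```

When the model includes a constant, the residuals sum to zero, which is why residual plots are centred on a horizontal line at zero.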
Residual Plots
[Figure: four residual plots (response is Crimrate): Residuals Versus the Fitted Values (Residual against Fitted Value); Residuals Versus the Order of the Data (Residual against Observation Order); Histogram of the Residuals (Frequency against Residual); Normal Probability Plot of the Residuals (Normal Score against Residual).]
Abnormal Patterns in Residual Plots
Figures a) and b) suggest a non-linear relationship between X and Y.
Fig. c) suggests autocorrelation.
Fig. d) suggests that the variance is not constant, since the spread of Y values is far greater for larger values of X.
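Autocorrelation of the kind suggested by Fig. c) is commonly quantified with the Durbin-Watson statistic: the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below assumes the residuals are listed in observation order. Both residual series are hypothetical and chosen only to show the two extremes.

```python
def durbin_watson(resids):
    """Sum of squared successive differences over the sum of squares."""
    num = sum((resids[t] - resids[t - 1]) ** 2 for t in range(1, len(resids)))
    den = sum(e ** 2 for e in resids)
    return num / den

# Hypothetical residuals in observation order.
drifting = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]   # slow drift: positive autocorrelation
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]     # sign flips: negative autocorrelation

dw_drifting = durbin_watson(drifting)
dw_alternating = durbin_watson(alternating)
print(f"drifting:    DW = {dw_drifting:.2f}")
print(f"alternating: DW = {dw_alternating:.2f}")
```

The statistic always lies between 0 and 4. Values well below 2 point to positive autocorrelation, values well above 2 to negative autocorrelation, and values near 2 to none.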
Checking Unusual Data Points
Check for outliers that lie a long distance from the rest of the data. They exercise leverage, which is measured by h_i. Leverage is considered large if h_i is more than 3p/n (p = number of predictors including the constant). Such points are flagged by X in the printout.
Cook's Distance measures the influence of a data point on the regression equation. Cook's D > 1 requires careful checking; D > 4 suggests potentially serious outliers.
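Both checks can be sketched for the one-predictor case, where the leverage has the closed form h_i = 1/n + (x_i - mean of x)^2 / Sxx and p = 2 (constant plus slope). Cook's D is computed here with the usual formula D_i = e_i^2 / (p s^2) × h_i / (1 - h_i)^2, which is an assumption about the definition intended by the slides. The function name and the data are hypothetical.

```python
def leverage_and_cooks(x, y):
    """Leverage h_i and Cook's D_i for a one-predictor least-squares fit."""
    n = len(x)
    p = 2  # parameters: constant and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    resids = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resids) / (n - p)  # residual mean square
    h = [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    cooks = [(e ** 2 / (p * s2)) * (hi / (1.0 - hi) ** 2)
             for e, hi in zip(resids, h)]
    return h, cooks

# Hypothetical data: the last point lies far from the others in X.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0]
y = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9, 7.0, 12.0]

h, cooks = leverage_and_cooks(x, y)
p, n = 2, len(x)
cutoff = 3 * p / n  # the 3p/n rule stated above
flagged = [i for i, hi in enumerate(h) if hi > cutoff]

for i, (hi, di) in enumerate(zip(h, cooks)):
    mark = " X" if i in flagged else ""
    print(f"obs {i + 1}: h = {hi:.3f}  Cook's D = {di:.2f}{mark}")
```

The leverages always sum to p, so an average point has leverage p/n and the 3p/n rule flags points with three times the average.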
Patterns of Outliers
a) Outlier is extreme in both X and Y but not in pattern. Removal is unlikely to alter the regression line.
b) Outlier is extreme in both X and Y as well as in the overall pattern. Inclusion will strongly influence the regression line.
c) Outlier is extreme for X, nearly average for Y.
d) Outlier is extreme in Y, not in X.
e) Outlier is extreme in pattern, but not in X or Y.