
Nonparametric Smoothing Methods and Model Selections
T.C. Lin (tsair@mail.ntpu.edu.tw), Dept. of Statistics, National Taipei University, 5/4/2005


Outline

Outline:

1. Motivations

2. Smoothing

3. Degrees of Freedom

Motivation: Why nonparametric?

Simple Linear Regression: E(Y|X)=α+βX

• assume that the mean of Y is a linear function of X

• (+) easy to compute, describe, and interpret

• (-) limited range of use

Note that the hat matrix in LSE of regression, defined by $\hat{Y} = X(X'X)^{-1}X'Y =: SY$, is

I. symmetric and idempotent

II. constant preserving, i.e. $S\mathbf{1} = \mathbf{1}$

III. $\operatorname{tr}(S) = \operatorname{tr}(SS^T) = \operatorname{rank}(S)$ = # of linearly independent predictors in a model = # of parameters in a model

If the dependence of E(Y) on X is far from linear,

• one can extend straight-line regression by adding terms like X² to the model

• but it is difficult to guess the most appropriate functional form just from looking at the data.

Example: Diabetes data

1. Diabetes data (Sockett et al., 1987): a study of the factors affecting patterns of insulin-dependent diabetes mellitus in children.

• Response: logarithm (C-peptide concentration at diagnosis)

• Predictors: age and base deficit.

What is a smoother?

A tool for summarizing the trend of a response Y as a function of one or more predictor measurements X1,X2,…,Xp.

Idea of smoothers

Simplest Smoothers occur in the case of a categorical predictor,

Example: sex (male, female),

Example: color (red, blue, green)

To smooth Y simply average the values of Y in each category

What about a non-categorical predictor?

• usually lacks replicates at each predictor value

• mimic category averaging through “local averaging”, i.e. average the Y values in neighborhoods around each target value

Two main uses of smoothers

I. Description: to enhance the visual appearance of the scatterplot of Y vs. X.

II. Estimate the dependence of the mean of Y on the predictor

Two main decisions to be made in scatterplot smoothing

1. how to average the response values in each neighborhood? (which brand of smoother?)

2. how big to take the neighborhoods? (smoothing parameter = ?)

Scatterplot Smoothing

Notations:

• $y = (y_1, y_2, \dots, y_n)^T$

• $x = (x_1, x_2, \dots, x_n)^T$ with $x_1 < x_2 < \dots < x_n$

• Def: $s(x_0) = S(y \mid x = x_0)$

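Some scatterplot smoothers:

1. Bin smoothers

• Choose cut points $c_0 < c_1 < \dots < c_K$

• Def: $R_k = \{i : c_k \le x_i < c_{k+1}\}$, the indices of data points in each region; the fit in each region is the average of the $y_i$ with $i \in R_k$.

• (-): estimate is not smooth (jumps at each cut point).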
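A minimal sketch of a bin smoother may help fix the idea. It assumes the outermost cut points are minus and plus infinity, so only interior cut points are passed in; the function name `bin_smoother` and the toy data are illustrative, not from the slides.

```python
from bisect import bisect_right

def bin_smoother(x, y, cuts):
    """Average y within each region R_k = {i : c_k <= x_i < c_{k+1}}.

    `cuts` holds only the interior cut points (sorted); the outer ones
    are taken to be -inf and +inf, which is an assumption of this sketch.
    """
    sums = [0.0] * (len(cuts) + 1)
    counts = [0] * (len(cuts) + 1)
    region = [bisect_right(cuts, xi) for xi in x]  # region index of each point
    for k, yi in zip(region, y):
        sums[k] += yi
        counts[k] += 1
    # Every region that appears in `region` has count >= 1.
    return [sums[k] / counts[k] for k in region]

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 3.0, 2.0, 8.0, 9.0, 10.0]
fit = bin_smoother(x, y, cuts=[3.5])
```

The fitted values are constant within each region and jump at the cut point, which is the lack of smoothness noted above.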

2. Running-mean smoothers (moving average)

• Choose a symmetric nearest neighborhood $N^S(x_i)$

• define the running mean $s(x_i) = \operatorname{ave}_{j \in N^S(x_i)}(y_j)$

• (+): simple

• (-): does not work well (wiggly), severely biased near the end points
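The running mean can be sketched in a few lines. This sketch assumes the data are already sorted by x and that the neighborhood (k points on each side) is simply truncated at the ends; `running_mean` is an illustrative name.

```python
def running_mean(y, k):
    """Running mean over the symmetric nearest neighborhood of each point.

    The neighborhood of point i holds up to k points on each side and is
    truncated at the ends of the data (an assumption of this sketch).
    Assumes y is ordered by increasing x.
    """
    n = len(y)
    fit = []
    for i in range(n):
        lo, hi = max(0, i - k), min(n, i + k + 1)
        fit.append(sum(y[lo:hi]) / (hi - lo))
    return fit

y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # a noise-free straight line
fit = running_mean(y, k=1)
```

Even on a noise-free line the interior fits are exact while the two end fits are pulled toward the interior, which illustrates the end-point bias.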

3. Running-line smoothers

Def: $s(x_0) = \hat\alpha(x_0) + \hat\beta(x_0)\, x_0$

where $\hat\alpha(x_0)$ and $\hat\beta(x_0)$ are the LSE for the data points in $N^S(x_0)$

(-): jagged

=> weighted LSE

4. Kernel smoothers

Def: $S_{0j} = \frac{C_0}{\lambda}\, d\!\left(\frac{|x_0 - x_j|}{\lambda}\right)$

where d(t) is a smooth even function decreasing in |t|, λ = bandwidth, and $C_0$ is chosen so that the weights sum to 1

Example. Gaussian kernel

Example. Epanechnikov kernel: $d(t) = \frac{3}{4}(1 - t^2)$ for $|t| \le 1$; 0 otherwise

Example. the minimum variance kernel: $d(t) = \frac{3}{8}(3 - 5t^2)$ for $|t| \le 1$; 0 otherwise
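As a rough illustration of kernel weights, the sketch below uses the Epanechnikov kernel and normalizes the raw weights to sum to 1 (which plays the role of the constant in the definition). The names `kernel_weights` and `kernel_smooth` and the toy data are assumptions made for this example.

```python
def epanechnikov(t):
    """d(t) = 3/4 (1 - t^2) for |t| <= 1, and 0 otherwise."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0

def kernel_weights(x0, x, lam, d=epanechnikov):
    """Weights proportional to d(|x0 - xj| / lam), rescaled to sum to 1."""
    raw = [d(abs(x0 - xj) / lam) for xj in x]
    total = sum(raw)   # assumed > 0: some xj lies strictly within lam of x0
    return [r / total for r in raw]

def kernel_smooth(x0, x, y, lam):
    """Kernel-weighted average of y at the target point x0."""
    return sum(w * yj for w, yj in zip(kernel_weights(x0, x, lam), y))

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 0.0, 2.0, 1.0]
w = kernel_weights(2.0, x, lam=2.0)
s = kernel_smooth(2.0, x, y, lam=2.0)
```

Points at distance equal to the bandwidth or beyond receive zero weight, so a larger bandwidth averages over more points and gives a smoother fit.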

5. Running-median smoothers

Def: $s(x_i) = \operatorname{med}_{j \in N^S(x_i)}(y_j)$

• makes the smoother resistant to outliers in the data

• nonlinear smoother
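A small sketch can show the resistance to outliers. It assumes data sorted by x and a neighborhood of k points on each side, truncated at the ends; the data contain one deliberately gross outlier.

```python
from statistics import median

def running_median(y, k):
    """Median of y over up to k neighbors on each side of every point."""
    n = len(y)
    return [median(y[max(0, i - k):min(n, i + k + 1)]) for i in range(n)]

y = [1.0, 2.0, 100.0, 4.0, 5.0]     # one gross outlier at position 2
med = running_median(y, k=1)
mean_at_outlier = sum(y[1:4]) / 3   # running mean at the same point, for contrast
```

The running median at the outlier stays within the range of its neighbors, while the running mean there is dragged far above them.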

6. Regression splines

The regions are separated by a sequence of knots $\{\xi_1, \dots, \xi_K\}$

Piecewise polynomial, e.g. piecewise cubic polynomial

joined smoothly at these knots

ps. more knots, more flexible

6a. piecewise-cubic spline

(1) s is a cubic polynomial in any subinterval $[\xi_j, \xi_{j+1})$

(2) s has two continuous derivatives

(3) s has a third derivative that is a step function with jumps at the knots

(Continued) its parametric expression

$s(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \sum_{j=1}^{K} \theta_j (x - \xi_j)_+^3$

where $a_+$ denotes the positive part of a

• it can be rewritten as a linear combination of K+4 basis functions $P_1(x) = 1,\ P_2(x) = x,\ \dots,\ P_{K+4}(x) = (x - \xi_K)_+^3$

• de Boor (1978): B-spline basis functions
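To make the K+4 count concrete, here is a sketch that evaluates the truncated power basis of a cubic spline at a point. The coefficient names `beta` and `theta` and the knot values are illustrative assumptions; fitting the coefficients by least squares is not shown.

```python
def truncated_power_basis(x, knots):
    """Return [1, x, x^2, x^3, (x - xi_1)_+^3, ..., (x - xi_K)_+^3]."""
    def pos(a):
        return a if a > 0 else 0.0   # positive part a_+
    return [1.0, x, x ** 2, x ** 3] + [pos(x - xi) ** 3 for xi in knots]

def spline_value(x, beta, theta, knots):
    """Evaluate s(x) from polynomial coefficients beta (4 of them)
    and knot coefficients theta (one per knot)."""
    coef = list(beta) + list(theta)
    return sum(c * p for c, p in zip(coef, truncated_power_basis(x, knots)))

knots = [1.0, 2.0]                       # K = 2 hypothetical knots
basis = truncated_power_basis(1.5, knots)
value = spline_value(1.5, beta=[0, 0, 0, 1], theta=[-1, 0], knots=knots)
```

Each knot term is zero to the left of its knot, so adding a knot changes the cubic only to the right of it while keeping two continuous derivatives.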

6b. Natural spline

Def: Regression spline (see 6a) + $f'' = f''' = 0$ in the boundary regions

7. Cubic smoothing splines

Find f that minimizes the penalized residual sum of squares

$\sum_{i=1}^{n} \{y_i - f(x_i)\}^2 + \lambda \int_a^b \{f''(t)\}^2\, dt$

• first term: closeness to the data

• second term: penalizes curvature in the function

λ: (1) large values produce smoother curves

(2) small values produce wiggly curves

8. Locally-weighted running-line smoothers (loess)

Cleveland (1979)

• define N(x0)=k nearest neighbors of x0

• Use the tri-cube weight function in WLSE
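The two steps can be sketched for a single target point. The sketch assumes the usual tri-cube form W(u) = (1 - u³)³ on [0, 1), scales distances by the distance to the k-th nearest neighbor, and fits a weighted straight line; it omits the robustness iterations of the full loess procedure, and `loess_at` is an illustrative name.

```python
def tricube(u):
    """Tri-cube weight W(u) = (1 - u^3)^3 for 0 <= u < 1, else 0."""
    return (1.0 - u ** 3) ** 3 if 0.0 <= u < 1.0 else 0.0

def loess_at(x0, x, y, k):
    """Locally weighted straight-line fit at x0 using its k nearest neighbors.

    Assumes k is large enough that at least two distinct x values
    receive positive weight.
    """
    nearest = sorted(range(len(x)), key=lambda i: abs(x[i] - x0))[:k]
    h = max(abs(x[i] - x0) for i in nearest)   # distance to the k-th neighbor
    w = [tricube(abs(x[i] - x0) / h) for i in nearest]
    xs = [x[i] for i in nearest]
    ys = [y[i] for i in nearest]
    # Weighted least squares for the line y = a + b * x.
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, xs)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, xs))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, xs, ys))
    b = sxy / sxx
    return ybar + b * (x0 - xbar)

x = [float(i) for i in range(10)]
y = [2.0 * xi + 1.0 for xi in x]   # exact straight line
fit = loess_at(2.5, x, y, k=5)
```

Because the local model is a line, a straight-line trend is reproduced without bias, including near the ends of the data.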

Smoothers for multiple predictors

1. multiple-predictor smoothers, example: kernel (see figure)

(-): difficulty of interpretation and computation

2. Additive model

3. semi-parametric model

“Curse of dimensionality”

Neighborhoods with a fixed number of points become less local as the dimensions increase (Bellman, 1961)

• For p=1 and span=.1, the neighborhood should have side length .1.

• For p=10, the side length needs to be about .8.
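The two figures follow from a one-line calculation, sketched here under the assumption that the points are spread uniformly over the unit cube, so a cubical neighborhood holding a fraction `span` of them has side length span^(1/p).

```python
def side_length(span, p):
    """Side of a cubical neighborhood holding a fraction `span` of points
    spread uniformly over the unit cube in p dimensions (assumed setup)."""
    return span ** (1.0 / p)

sides = {p: round(side_length(0.1, p), 2) for p in (1, 2, 10)}
```

With ten predictors, a neighborhood capturing 10% of the data spans roughly 80% of the range of every coordinate, so it is no longer local.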

additive model

Additive: Y = f1(X1) + ... + fp(Xp) + e

• Selection and estimation are usually based on smoothing, backfitting, BRUTO, ACE, Projector, etc. (Hastie, 1990)

• Backfitting (see HT 90)

• BRUTO Algorithm (see HT 90)

is a forward model selection procedure using a modified GCV, defined later, to choose the significant variables and their smoothing parameters.

(Smoothing in detail)

assume $Y = f(X) + \varepsilon$

where $E(\varepsilon) = 0$, $V(\varepsilon) = \sigma^2$,

X independent of $\varepsilon$

The bias-variance trade-off

Example. running-mean

$\hat f_k(x_i) = \sum_{j \in N_k^S(x_i)} \frac{y_j}{2k+1}$

$E\{\hat f_k(x_i)\} = \sum_{j \in N_k^S(x_i)} \frac{f(x_j)}{2k+1}$

$V\{\hat f_k(x_i)\} = \frac{\sigma^2}{2k+1}$

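To expand f in a Taylor series

$f(x_j) = f(x_i) + (x_j - x_i) f'(x_i) + \frac{(x_j - x_i)^2}{2} f''(x_i) + R$

assuming the data are equally spaced with $x_{i+1} - x_i = \Delta$, and ignoring R,

$\operatorname{Bias}(\hat f_k) = E\{\hat f_k(x_i) - f(x_i)\} \approx \frac{k(k+1)}{6} f''(x_i)\, \Delta^2$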

and the optimal k is chosen by minimizing $MSE(x_i) = E\{\hat f_k(x_i) - f(x_i)\}^2$ as

$k_{opt} = \left\{ \frac{9\sigma^2}{2\Delta^4 \{f''(x_i)\}^2} \right\}^{1/5}$
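The intermediate steps can be sketched as follows, writing Δ for the spacing of the x values and σ² for the error variance, and assuming k is large enough that 2k+1 ≈ 2k and k(k+1) ≈ k². This is a heuristic outline of the minimization rather than a rigorous derivation.

```latex
\begin{align*}
MSE(x_i) &\approx \frac{\sigma^2}{2k+1}
            + \left\{ \frac{k(k+1)}{6}\, f''(x_i)\, \Delta^2 \right\}^2
          \approx \frac{\sigma^2}{2k}
            + \frac{k^4 \Delta^4 \{f''(x_i)\}^2}{36}, \\
0 &= \frac{\partial\, MSE(x_i)}{\partial k}
   = -\frac{\sigma^2}{2k^2} + \frac{k^3 \Delta^4 \{f''(x_i)\}^2}{9}
   \;\Longrightarrow\;
   k^5 = \frac{9\sigma^2}{2 \Delta^4 \{f''(x_i)\}^2}.
\end{align*}
```

The variance term falls as k grows while the squared bias rises like k⁴, so the best k is larger when the noise is large and smaller where the curvature of f is large.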

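Automatic selection of smoothing parameters

(1) Average mean-squared error

$MSE(\lambda) = \frac{1}{n} \sum_{i=1}^{n} E\{\hat f_\lambda(x_i) - f(x_i)\}^2$

(2) Average predictive squared error

$PSE(\lambda) = \frac{1}{n} \sum_{i=1}^{n} E\{Y_i^* - \hat f_\lambda(x_i)\}^2$

where $Y_i^*$ is a new observation at $x_i$ and λ denotes the smoothing parameter.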

Some estimates of PSE:

1. Cross Validation (CV)

$CV(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \{y_i - \hat f_\lambda^{-i}(x_i)\}^2$

where $\hat f_\lambda^{-i}(x_i)$ indicates the fit at $x_i$, computed by leaving out the ith data point.
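As an illustration, leave-one-out CV can be used to pick the neighborhood size of a running mean. The sketch assumes data sorted by x, neighborhoods truncated at the ends, and a small made-up data set; `cv_score` and the candidate values of k are illustrative.

```python
def loo_running_mean(y, i, k):
    """Fit at point i from its neighbors within k positions, leaving y[i] out."""
    lo, hi = max(0, i - k), min(len(y), i + k + 1)
    others = [y[j] for j in range(lo, hi) if j != i]
    return sum(others) / len(others)

def cv_score(y, k):
    """CV = (1/n) * sum over i of (y_i - fit at x_i computed without point i)^2."""
    n = len(y)
    return sum((y[i] - loo_running_mean(y, i, k)) ** 2 for i in range(n)) / n

# Made-up, roughly linear data ordered by x.
y = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0, 5.9, 7.2, 7.9, 9.1]
scores = {k: cv_score(y, k) for k in (1, 2, 3, 4)}
best_k = min(scores, key=scores.get)
```

The value of k with the smallest score is the cross-validation choice; here the end-point bias of the running mean penalizes the larger neighborhoods.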

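Fact: $E\{CV(\lambda)\} \approx PSE(\lambda)$

Since

$E\{y_i - \hat f_\lambda^{-i}(x_i)\}^2 = E\{y_i - f(x_i) + f(x_i) - \hat f_\lambda^{-i}(x_i)\}^2$

$= \sigma^2 + E\{f(x_i) - \hat f_\lambda^{-i}(x_i)\}^2 \approx \sigma^2 + E\{f(x_i) - \hat f_\lambda(x_i)\}^2$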

2. Average squared residual (ASR)

$ASR(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \{y_i - \hat f_\lambda(x_i)\}^2$

• is not a good estimate of PSE

linear smoothers

def1: $S(a y_1 + b y_2 \mid x) = a\,S(y_1 \mid x) + b\,S(y_2 \mid x)$

def2: $\hat f = S y$, where $S = [S_{ij}]$ is called the smoother matrix (free of y)

e.g. running-mean, running-line, smoothing spline, kernel, loess and regression spline

The Bias-variance trade-off for linear smoothers

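$MSE(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \operatorname{var}(\hat f_i) + \frac{1}{n} \sum_{i=1}^{n} b_i^2 = \frac{\operatorname{tr}(SS^T)}{n}\, \sigma^2 + \frac{b^T b}{n}$

$PSE(\lambda) = \left\{ 1 + \frac{\operatorname{tr}(SS^T)}{n} \right\} \sigma^2 + \frac{b^T b}{n}$

where $b = (b_1, \dots, b_n)^T$ is the vector of biases of the fitted values.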

Cross Validation (CV)

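$CV(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \{y_i - \hat f_\lambda^{-i}(x_i)\}^2$

constant preserving: weights $S_{ij}$ => $S_{ij}/(1 - S_{ii})$, so

$\hat f_\lambda^{-i}(x_i) = \sum_{j=1,\, j \ne i}^{n} \frac{S_{ij}}{1 - S_{ii}}\, y_j$

=> $\hat f_\lambda^{-i}(x_i) = \sum_{j=1,\, j \ne i}^{n} S_{ij}\, y_j + S_{ii}\, \hat f_\lambda^{-i}(x_i)$

=> $y_i - \hat f_\lambda^{-i}(x_i) = \frac{y_i - \hat f_\lambda(x_i)}{1 - S_{ii}}$

=> $CV(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \left\{ \frac{y_i - \hat f_\lambda(x_i)}{1 - S_{ii}} \right\}^2$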
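The shortcut can be checked numerically. The sketch below builds the smoother matrix of a running mean (an illustrative choice whose rows sum to 1) and compares CV computed from a single fit with CV computed by dropping each point and rescaling the remaining weights in its row; the function names and data are assumptions of this example.

```python
def running_mean_matrix(n, k):
    """Smoother matrix S of a running mean; each row sums to 1."""
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lo, hi = max(0, i - k), min(n, i + k + 1)
        for j in range(lo, hi):
            S[i][j] = 1.0 / (hi - lo)
    return S

def cv_shortcut(S, y):
    """CV from one fit: mean of ((y_i - fhat_i) / (1 - S_ii))^2."""
    n = len(y)
    fit = [sum(S[i][j] * y[j] for j in range(n)) for i in range(n)]
    return sum(((y[i] - fit[i]) / (1.0 - S[i][i])) ** 2 for i in range(n)) / n

def cv_brute_force(S, y):
    """CV by dropping point i and rescaling row i by 1 / (1 - S_ii)."""
    n = len(y)
    total = 0.0
    for i in range(n):
        loo = sum(S[i][j] * y[j] for j in range(n) if j != i) / (1.0 - S[i][i])
        total += (y[i] - loo) ** 2
    return total / n

y = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0, 5.9, 7.2, 7.9, 9.1]
S = running_mean_matrix(len(y), k=1)
fast, slow = cv_shortcut(S, y), cv_brute_force(S, y)
```

The two values agree up to rounding, so the n leave-one-out refits can be replaced by a single fit and the diagonal of S.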

Generalized Cross Validation

$GCV(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \left\{ \frac{y_i - \hat f_\lambda(x_i)}{1 - \operatorname{tr}(S)/n} \right\}^2$
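A short sketch of the GCV computation for a linear smoother given as a matrix. The example smoother, which fits every point by the overall mean, is chosen only because its answer is easy to check by hand; it is not from the slides.

```python
def gcv(S, y):
    """GCV = (1/n) * sum over i of ((y_i - fhat_i) / (1 - tr(S)/n))^2."""
    n = len(y)
    fit = [sum(S[i][j] * y[j] for j in range(n)) for i in range(n)]
    trace = sum(S[i][i] for i in range(n))
    rss = sum((y[i] - fit[i]) ** 2 for i in range(n))
    return rss / n / (1.0 - trace / n) ** 2

# Illustrative smoother: fit every point by the overall mean, S = (1/n) 1 1^T.
y = [1.0, 2.0, 3.0, 6.0]
n = len(y)
S = [[1.0 / n] * n for _ in range(n)]
score = gcv(S, y)
```

GCV replaces each diagonal element of S by their average tr(S)/n, so it coincides with CV whenever all diagonal elements are equal, as in this example.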

Degrees of freedom of a smoother

Why do we need df here?

The same data set and the computational power of modern computers are used routinely in the formulation, selection, estimation, diagnostics and prediction of statistical models.

1. $df = \operatorname{tr}(S)$

2. $df^{err} = n - \operatorname{tr}(2S - SS^T)$

3. $df^{var} = \operatorname{tr}(SS^T)$
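The three definitions can be compared on small matrices. In this sketch `S_kernel` is a made-up kernel-like smoother matrix and `S_mean` is the projection onto the constant; both are illustrative assumptions.

```python
def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def trace_sst(S):
    """tr(S S^T), which equals the sum of squared entries of S."""
    return sum(s * s for row in S for s in row)

def degrees_of_freedom(S):
    """Return (df, df_err, df_var) for an n x n smoother matrix S."""
    n = len(S)
    df = trace(S)
    df_err = n - (2.0 * trace(S) - trace_sst(S))   # n - tr(2S - S S^T)
    df_var = trace_sst(S)
    return df, df_err, df_var

# Hypothetical kernel-like smoother matrix (rows sum to 1).
S_kernel = [[0.60, 0.30, 0.10],
            [0.25, 0.50, 0.25],
            [0.10, 0.30, 0.60]]
# Projection onto the constant: S = (1/n) 1 1^T, symmetric and idempotent.
S_mean = [[0.25] * 4 for _ in range(4)]

df_kernel = degrees_of_freedom(S_kernel)
df_mean = degrees_of_freedom(S_mean)
```

For a symmetric idempotent S the three definitions are consistent (df = df_var and df_err = n - df), while for a general smoother they differ.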

EDF (Ye, 1998)

Idea: A modeling / forecast procedure $\hat Y = f(Y) = (f_1(Y), \dots, f_n(Y))' : R^n \to R^n$ is said to be stable if small changes in Y produce small changes in the fitted values $\hat Y = f(Y)$.

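More precisely (EDF), for $\delta = (\delta_1, \dots, \delta_n)' \in R^n$,

we would like to have $\hat Y(Y + \delta) = f(Y + \delta) \approx f(Y) + H\delta$,

where $H = [h_{ij}] = [\partial \hat Y_i / \partial Y_j]$, $i, j = 1, \dots, n$, is a small matrix.

=> $h_{ii}$ can be viewed as the slope of the straight line $f_i(Y + \delta) \approx f_i(Y) + h_{ii}\,\delta_i$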

Data Perturbation Procedure

For an integer m > 1 (the Monte Carlo sample size), generate δ1, ..., δm as i.i.d. N(0, t²In), where t > 0 and In is the n×n identity matrix.

• Use the “perturbed” data Y + δj to refit the model and obtain the fitted values Ŷ(Y + δj).

• For i = 1, 2, ..., n, the slope of the LS line fitted to (Ŷi(Y + δj), δij), j = 1, ..., m, gives an estimate of hii.
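The procedure can be sketched for any fitting routine that maps a response vector to fitted values. The running mean used as the fitting procedure, the values m = 2000 and t = 0.1, the seed, and the final step of summing the slopes into a single degrees-of-freedom figure are all assumptions of this example; the slides only state the per-point slope estimates.

```python
import random

def running_mean(y, k):
    """Stand-in fitting procedure: running mean with truncated ends."""
    n = len(y)
    return [sum(y[max(0, i - k):min(n, i + k + 1)]) /
            (min(n, i + k + 1) - max(0, i - k)) for i in range(n)]

def estimate_h(fit, y, m=2000, t=0.1, seed=0):
    """Monte Carlo estimates of h_ii for the fitting procedure `fit`."""
    rng = random.Random(seed)
    n = len(y)
    deltas, fits = [], []
    for _ in range(m):
        d = [rng.gauss(0.0, t) for _ in range(n)]          # delta_j ~ N(0, t^2 I_n)
        deltas.append(d)
        fits.append(fit([yi + di for yi, di in zip(y, d)]))  # refit on Y + delta_j
    h = []
    for i in range(n):
        dx = [d[i] for d in deltas]
        fy = [f[i] for f in fits]
        mx, my = sum(dx) / m, sum(fy) / m
        sxx = sum((a - mx) ** 2 for a in dx)
        sxy = sum((a - mx) * (b - my) for a, b in zip(dx, fy))
        h.append(sxy / sxx)   # LS slope of the i-th fitted value on delta_ij
    return h

y = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0, 5.9, 7.2, 7.9, 9.1]
h = estimate_h(lambda v: running_mean(v, k=1), y)
edf = sum(h)   # assumed aggregation: sum of the estimated h_ii
```

For a linear smoother the slopes should be close to the diagonal of S (here 1/2 at the ends and 1/3 elsewhere), which gives a way to sanity-check the Monte Carlo settings before applying the procedure to a nonlinear fit.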

An application

Table 1. MSE & SD of five models fitted to lynx data

About SD: Fit the same class of models to the first 100 obs., keeping the last 14 for out-of-sample predictions (the lynx series has 114 observations). SD = the standard deviation of the multi-step ahead prediction errors.

Model AR(2) SETAR ADD(1,2) ADD(1,2,9) PPR

MSE 0.0459 0.0358 0.0455 0.038 0.0194

MSEadj 0.0443 0.0365 0.0377

SD 0.295 0.136 0.100 0.347 0.247