Learning From Data, Lecture 13
Validation and Model Selection

The Validation Set
Model Selection
Cross Validation

M. Magdon-Ismail, CSCI 4100/6100
recap: Regularization

Regularization combats the effects of noise by putting a leash on the algorithm:

Eaug(h) = Ein(h) + (λ/N) Ω(h)

Ω(h) → smooth, simple h; noise is rough, complex.

Different regularizers give different results; we can choose λ, the amount of regularization.
[Figure: four fits of the same noisy data (data, target, fit) for λ = 0, 0.0001, 0.01, 1. From left to right: Overfitting → Underfitting.]
Optimal λ balances approximation and generalization, bias and variance.
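As a minimal numerical sketch of this tradeoff (the data, target, and polynomial setup below are illustrative assumptions, not from the lecture), sweeping λ in a weight-decay fit shows the in-sample error rising as the leash tightens:

```python
# Illustrative sketch: fit a degree-10 polynomial to noisy data with
# weight decay, sweeping lambda from 0 (overfit) to 1 (underfit).
# All names and values here are assumptions, not from the lecture.
import numpy as np

rng = np.random.default_rng(0)
N, degree = 15, 10
x = np.sort(rng.uniform(-1, 1, N))
y = np.sin(np.pi * x) + 0.2 * rng.standard_normal(N)  # noisy target

Z = np.vander(x, degree + 1)  # polynomial feature matrix

E_in = {}
for lam in [0.0, 1e-4, 1e-2, 1.0]:
    # Minimizing Eaug = Ein + (lambda/N)|w|^2 gives (Z^T Z + lambda I) w = Z^T y
    w = np.linalg.solve(Z.T @ Z + lam * np.eye(degree + 1), Z.T @ y)
    E_in[lam] = np.mean((Z @ w - y) ** 2)
    print(f"lambda = {lam:g}: Ein = {E_in[lam]:.4f}")
```

Ein grows with λ; which λ makes Eout smallest is exactly the question validation will answer.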
© AML Creator: Malik Magdon-Ismail. Validation and Model Selection, 29 slides.
Validation: A Sneak Peek at Eout

Eout(g) = Ein(g) + overfit penalty

The VC bound controls the overfit penalty with a complexity error bar for H; regularization estimates it through a heuristic complexity penalty for g.

Validation goes directly for the jugular: it estimates Eout(g) itself.

An in-sample estimate of Eout is the Holy Grail of learning from data.
The Test Set

D (N data points) −→ learning −→ final hypothesis g.
Dtest (K test points): ek = e(g(xk), yk), giving e1, e2, ..., eK.

Etest = (1/K) Σ_{k=1}^{K} ek −→ estimates Eout(g)
Etest is an estimate for Eout(g): since EDtest[ek] = Eout(g),

E[Etest] = (1/K) Σ_{k=1}^{K} E[ek] = (1/K) Σ_{k=1}^{K} Eout(g) = Eout(g).

Because e1, ..., eK are independent,

Var[Etest] = (1/K²) Σ_{k=1}^{K} Var[ek] = (1/K) Var[e],

which decreases like 1/K.

Bigger K =⇒ more reliable Etest.
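The 1/K decrease can be checked empirically. In this sketch (the hypothesis g(x) = 0.8x and the data distribution are hypothetical), averaging K independent pointwise errors shrinks the variance of the average by roughly the ratio of the K's:

```python
# Illustrative check that Var[Etest] = Var[e]/K: average K independent
# pointwise errors and watch the variance of the average fall like 1/K.
import numpy as np

rng = np.random.default_rng(1)

def pointwise_errors(K):
    # Fresh test points; g(x) = 0.8x is a fixed, already-learned hypothesis
    x = rng.uniform(-1, 1, K)
    y = x + 0.3 * rng.standard_normal(K)      # noisy target f(x) = x
    return (0.8 * x - y) ** 2                 # e_k = e(g(x_k), y_k)

def var_of_Etest(K, trials=2000):
    # Empirical variance of Etest over many independent test sets
    return np.var([pointwise_errors(K).mean() for _ in range(trials)])

v_small, v_large = var_of_Etest(10), var_of_Etest(100)
print(f"Var[Etest] at K=10: {v_small:.5f}, at K=100: {v_large:.5f}")
```

The ratio v_small/v_large should come out near 10, the ratio of the test-set sizes.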
The Validation Set

D (N data points) −→ Dtrain (N − K training points) −→ g⁻.
Dval (K validation points): ek = e(g⁻(xk), yk), giving e1, e2, ..., eK.

Eval = (1/K) Σ_{k=1}^{K} ek −→ estimates Eout(g⁻)

1. Remove K points from D: D = Dtrain ∪ Dval.
2. Learn using Dtrain −→ g⁻.
3. Test g⁻ on Dval −→ Eval.
4. Use the error Eval to estimate Eout(g⁻).
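Steps 1-4 can be sketched in a few lines (the data, the cubic model, and the split below are illustrative assumptions):

```python
# A minimal sketch of steps 1-4: split D into Dtrain and Dval,
# fit g-minus on Dtrain, and report Eval.
import numpy as np

rng = np.random.default_rng(2)
N, K = 100, 20

X = rng.uniform(-1, 1, N)
Y = np.sin(np.pi * X) + 0.2 * rng.standard_normal(N)

# 1. Remove K points from D: D = Dtrain ∪ Dval
perm = rng.permutation(N)
train, val = perm[K:], perm[:K]

# 2. Learn using Dtrain -> g_minus (here, a cubic fit by least squares)
w = np.polyfit(X[train], Y[train], deg=3)

# 3. Test g_minus on Dval -> Eval
E_val = np.mean((np.polyval(w, X[val]) - Y[val]) ** 2)

# 4. Use Eval to estimate Eout(g_minus)
print(f"Eval = {E_val:.4f}")
```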
The Validation Set: Reliability of Eval

Eval is an estimate for Eout(g⁻): since EDval[ek] = Eout(g⁻),

E[Eval] = (1/K) Σ_{k=1}^{K} E[ek] = (1/K) Σ_{k=1}^{K} Eout(g⁻) = Eout(g⁻).

Because e1, ..., eK are independent,

Var[Eval] = (1/K²) Σ_{k=1}^{K} Var[ek] = (1/K) Var[e(g⁻)],

which decreases like 1/K, but depends on g⁻, not H.

Bigger K =⇒ more reliable Eval? (A larger K leaves fewer training points, so a worse g⁻.)
Choosing K

[Figure: expected Eval versus the size of the validation set, K.]

Rule of thumb: K∗ = N/5.
Restoring D

Having computed Eval(g⁻) from Dtrain (N − K points) and Dval (K points), restore D and learn g from all N data points; g is what goes to the CUSTOMER.

Primary goal: output the best hypothesis. g was trained on all the data.
Secondary goal: estimate Eout(g). But validation tested g⁻, not g; g⁻ stays behind closed doors.

Eout(g)     Eout(g⁻)
  ↓            ↓
Ein(g)      Eval(g⁻)

Which should we use?
Eval Versus Ein

Eout(g) ≤ Ein(g) + O(√((dvc/N) log N))    ← biased error bar; depends on H.

Eout(g) ≤ Eout(g⁻) ≤ Eval(g⁻) + O(1/√K)    ← unbiased error bar; depends on g⁻.

The first inequality in the second line holds because the learning curve is decreasing (a practical truth, not a theorem).

Eval(g⁻) usually wins as an estimate for Eout(g), especially when the learning curve is not steep.
Model Selection

The most important use of validation.

Dtrain −→ learn g1⁻, g2⁻, g3⁻, ..., gM⁻ from H1, H2, H3, ..., HM.
Dval −→ evaluate each: Em = Eval(gm⁻), giving E1, E2, ..., EM.
Pick the Best Model According to Validation Error

Dtrain −→ g1⁻, g2⁻, g3⁻, ..., gM⁻ from H1, H2, H3, ..., HM.
Dval −→ E1, E2, E3, ..., EM, where Em = Eval(gm⁻).

Pick the model m∗ with the smallest validation error Em∗.
Eval(gm∗) Is Not Unbiased for Eout(gm∗)

[Figure: expected Eval(gm∗) lies below expected Eout(gm∗) for all validation set sizes K.]

The bias arises because we choose one of the M finalists:

Eout(gm∗) ≤ Eval(gm∗) + O(√(ln M / K))

This is the VC error bar for selecting a hypothesis from M using a data set of size K.
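A quick simulation makes the optimistic bias visible (all numbers here are hypothetical): give every one of M finalists the same true binary error 0.5, estimate each error on K validation points, and keep the smallest estimate. The winner's estimate systematically undershoots 0.5:

```python
# Illustrative simulation of the selection bias: M finalists all have
# true error 0.5; each Eval is a mean of K Bernoulli pointwise errors,
# and we keep the smallest of the M estimates.
import numpy as np

rng = np.random.default_rng(6)
M, K, trials = 10, 25, 4000

winning_evals = []
for _ in range(trials):
    E_val = rng.binomial(K, 0.5, size=M) / K   # M validation estimates
    winning_evals.append(E_val.min())          # Eval(gm*) after selection

bias_mean = np.mean(winning_evals)
print(f"E[Eval(gm*)] ~ {bias_mean:.3f} versus true Eout = 0.5")
```

The gap below 0.5 shrinks as K grows, consistent with the √(ln M / K) error bar.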
Restoring D

After selecting model m∗, restore D: learn gm∗ from all of D and output it.

The model with the best g⁻ also has the best g ← a leap of faith.
We can find the model with the best g⁻ using validation ← true modulo the Eval error bar.
Comparing Ein and Eval for Model Selection

[Figure: expected Eout versus validation set size K, comparing in-sample selection (pick by Ein), validation-based selection (both gm∗⁻ and the retrained gm∗), and the optimal choice. The pipeline: train g1, ..., gM on Dtrain, compute E1, ..., EM on Dval, and pick the best (Hm∗, Em∗).]
Application to Selecting λ

Which regularization parameter should we use? Given candidates

λ1, λ2, ..., λM,

this is a special case of model selection over M models,

(H, λ1), (H, λ2), (H, λ3), ..., (H, λM) −→ g1, g2, g3, ..., gM.

Picking a model amounts to choosing the optimal λ.
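Selecting λ this way can be sketched directly (the data, the degree-8 model, and the λ grid below are illustrative assumptions): one weight-decay fit per candidate on Dtrain, each scored on Dval, keep the smallest validation error.

```python
# Sketch of choosing lambda by validation: train one weight-decay model
# per candidate lambda on Dtrain, score each on Dval, pick the best.
import numpy as np

rng = np.random.default_rng(3)
N, K, degree = 60, 12, 8
x = rng.uniform(-1, 1, N)
y = np.sin(np.pi * x) + 0.3 * rng.standard_normal(N)
Z = np.vander(x, degree + 1)

perm = rng.permutation(N)
tr, va = perm[K:], perm[:K]                      # Dtrain and Dval indices

lambdas = [0.0, 1e-4, 1e-2, 1e-1, 1.0]
E = []
for lam in lambdas:
    # g_m trained on Dtrain with regularization lambda_m
    w = np.linalg.solve(Z[tr].T @ Z[tr] + lam * np.eye(degree + 1),
                        Z[tr].T @ y[tr])
    E.append(np.mean((Z[va] @ w - y[va]) ** 2))  # E_m = Eval(g_m)

m_star = int(np.argmin(E))
print(f"chosen lambda = {lambdas[m_star]:g} with Eval = {E[m_star]:.4f}")
```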
The Dilemma When Choosing K

Validation relies on the following chain of reasoning:

Eout(g) ≈ Eout(g⁻) ≈ Eval(g⁻)

The first ≈ requires a small K (so that g⁻ is trained on almost all of D); the second requires a large K (so that Eval is reliable).
Can we get away with K = 1?
Yes, almost!
The Leave-One-Out Error (K = 1)

[Figure: the fit g1 trained on all points except (x1, y1), with e1 the error of g1 on the left-out point.]

E[e1] = Eout(g1)

... but it is a wild estimate.
The Leave-One-Out Errors

[Figure: three fits g1, g2, g3, each trained with a different point left out, with errors e1, e2, e3 on the left-out points.]

Averaging the N leave-one-out errors gives the cross validation error,

Ecv = (1/N) Σ_{n=1}^{N} en.
Cross Validation Is Unbiased

Theorem. Ecv is an unbiased estimate of Ēout(N − 1), the expected Eout when learning with N − 1 data points.
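A small leave-one-out sketch (the linear data and model below are illustrative assumptions): for each n, train gn on the N − 1 points with (xn, yn) removed, score en on the held-out point, and average.

```python
# Leave-one-out cross validation by explicit retraining: N fits,
# each on N - 1 points, averaged into Ecv.
import numpy as np

rng = np.random.default_rng(4)
N = 20
x = rng.uniform(-1, 1, N)
y = 2 * x + 0.3 * rng.standard_normal(N)

errors = []
for n in range(N):
    keep = np.arange(N) != n                     # leave point n out
    w = np.polyfit(x[keep], y[keep], deg=1)      # g_n on N - 1 points
    errors.append((np.polyval(w, x[n]) - y[n]) ** 2)

E_cv = np.mean(errors)
print(f"Ecv = {E_cv:.4f}")
```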
Reliability of Ecv

en and em are not independent: en depends on gn, which was trained on (xm, ym), while em is evaluated on (xm, ym).

[Figure: the effective number of fresh examples giving an estimate comparable to Ecv lies somewhere between 1 and N.]
Cross Validation Is Computationally Intensive

Naively, Ecv requires N sessions of learning, each on a data set of size N − 1. Two ways to reduce the burden:

• Analytic approaches. For example, linear regression with weight decay,

wreg = (ZᵀZ + λI)⁻¹Zᵀy,

admits the leave-one-out error from a single fit:

Ecv = (1/N) Σ_{n=1}^{N} ((yn − ŷn) / (1 − Hnn(λ)))²,  where ŷ = H(λ)y and H(λ) = Z(ZᵀZ + λI)⁻¹Zᵀ.
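The shortcut is worth checking numerically (the data dimensions and λ below are illustrative assumptions): the hat-matrix formula from one fit should match the N explicit leave-one-out retrainings.

```python
# Verify the analytic leave-one-out formula for weight decay:
# Ecv from a single fit via H(lambda) equals Ecv from N retrainings.
import numpy as np

rng = np.random.default_rng(5)
N, d, lam = 30, 5, 0.1
Z = rng.standard_normal((N, d))
y = Z @ rng.standard_normal(d) + 0.2 * rng.standard_normal(N)

# One fit: H(lambda) = Z (Z^T Z + lambda I)^-1 Z^T, y_hat = H y
H = Z @ np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T)
y_hat = H @ y
E_cv_analytic = np.mean(((y - y_hat) / (1 - np.diag(H))) ** 2)

# N explicit leave-one-out fits, for comparison
errs = []
for n in range(N):
    keep = np.arange(N) != n
    w = np.linalg.solve(Z[keep].T @ Z[keep] + lam * np.eye(d),
                        Z[keep].T @ y[keep])
    errs.append((Z[n] @ w - y[n]) ** 2)
E_cv_loop = np.mean(errs)

print(E_cv_analytic, E_cv_loop)  # the two should agree
```

The agreement is exact (up to floating point), which is what makes the analytic route an O(1)-retraining alternative.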
• 10-fold cross validation. Partition D into 10 folds, D1, D2, ..., D10; in turn, validate on one fold while training on the other nine, and average the 10 validation errors.
Restoring D

[Figure: leave out each (xn, yn) in turn to train g1, g2, ..., gN and collect e1, e2, ..., eN; take the average to get Ecv. Then learn g from all of D and output g to the CUSTOMER.]

Eout(g(N)) ≤ Ēout(N − 1) ≤ Ecv + O(1/√N)

The first inequality uses the decreasing learning curve; the second holds because the en are nearly independent.

Ecv can be used for model selection just as Eval, for example to choose λ.
Digits Problem: '1' Versus 'Not 1'

[Figure: left, the digits data in the (average intensity, symmetry) plane, '1' versus 'not 1'; right, the errors Ein, Ecv, and Eout versus the number of features used.]

x = (1, x1, x2)
z = (1, x1, x2, x1^2, x1x2, x2^2, x1^3, x1^2x2, x1x2^2, x2^3, ..., x1^5, x1^4x2, x1^3x2^2, x1^2x2^3, x1x2^4, x2^5)

5th order polynomial transform −→ 20-dimensional nonlinear feature space.
Validation Wins in the Real World

[Figure: decision boundaries on the digits data, without validation (20 features) and with cross validation (6 features).]

no validation (20 features): Ein = 0%, Eout = 2.5%
cross validation (6 features): Ein = 0.8%, Eout = 1.5%