
Page 1:

Deep Learning for Computer Vision, Fall 2020

http://vllab.ee.ntu.edu.tw/dlcv.html (Public website)

https://cool.ntu.edu.tw/courses/3368 (NTU COOL; for grade, etc.)

Yu-Chiang Frank Wang 王鈺強, Professor

Dept. Electrical Engineering, National Taiwan University

2020/09/22

Page 2:

Tight (yet tentative) Schedule

Week  Date     Topic                                                                 Remarks
1     9/15     Course Logistics
2     9/22     Machine Learning 101
3     9/29     Convolutional Neural Network (I)                                      HW #1 out
4     10/6     Convolutional Neural Network (II): Visualization & Extensions of CNN
5     10/13    Tutorials on Python, Github, etc. (by TAs);                           HW #1 due
               Invited Talk by Prof. Philipp Krähenbühl (UT Austin)
6     10/20    Object Detection & Segmentation                                       HW #2 out
7     10/27    Generative Models & Generative Adversarial Network (GAN)
8     11/3     Transfer Learning for Visual Classification & Synthesis (I)
9     11/10    Transfer Learning for Visual Classification & Synthesis (II);         HW #2 due; HW #3 out
               Representation Disentanglement
10    11/17    TBD (CVPR Week)
11    11/24    Recurrent Neural Networks & Transformer
12    12/1     Meta-Learning; Few-Shot and Zero-Shot Classification (I)              HW #3 due
13    12/8     Meta-Learning; Few-Shot and Zero-Shot Classification (II)             HW #4 out
14    12/15    From Domain Adaptation to Domain Generalization                       Team-up for Final Projects
15    12/22    Beyond 2D Vision (3D and Depth)
16    12/29    Image Inpainting and Outpainting; Guest Lecture                       HW #4 due
17    1/5      Guest Lectures
      1/18-22  Presentation for Final Projects                                       TBD

Page 3:

What’s to Be Covered Today…

• From Probability to Bayes Decision Rule
• Unsupervised vs. Supervised Learning
• Clustering & Dimension Reduction
• Training, testing, & validation
• Linear Classification

Page 4:

Example: Testing/Screening of COVID-19

[Figure: two class-conditional distributions of test results, one labeled "Positive for COVID" and one labeled "Negative for COVID"]

• Distributions between positive/negative test results (e.g., PCR, antibody, etc.): the further away they are from each other, the more accurate the COVID diagnosis.

Page 5:

Bayesian Decision Theory

• Fundamental statistical approach to classification/detection tasks

• Take a 2-class classification/detection task as an example:
  • Let's see if a student would pass or fail the course of DLCV.
  • Define a probabilistic variable ω to describe the case of pass or fail.
  • That is, ω = ω1 for pass, and ω = ω2 for fail.

• Prior Probability
  • The a priori or prior probability reflects the knowledge of how likely we expect a certain state of nature before observation.
  • P(ω = ω1), or simply P(ω1), is the prior that the next student would pass DLCV.
  • The priors must exhibit exclusivity and exhaustivity, i.e., P(ω1) + P(ω2) = 1.

• Equal priors
  • If we have equal numbers of students pass/fail DLCV, then the priors are equal; in other words, the priors are uniform.

Page 6:

Prior Probability (cont’d)

• Decision rule based on priors only
  • If the only available info is the prior, and the cost of any type of incorrect classification is equal, what would be a reasonable decision rule?
  • Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2.
  • What's the incorrect classification rate (or error rate) Pe? (It equals min{P(ω1), P(ω2)}.)

Page 7:

Class-Conditional Probability Density (or Likelihood)

• The probability density function (PDF) for input/observation x given a state of nature ω is written as p(x|ω).

• Here are (hopefully) hypothetical class-conditional densities, reflecting the time spent on DLCV by the students who eventually pass/fail this course.

Page 8:

Posterior Probability & Bayes Formula

• If we know the prior distribution and the class-conditional density, can we come up with a better decision rule?
  • Yes We Can! By calculating the posterior probability.

• Posterior probability P(ω|x):
  • The probability of a certain state of nature ω given an observable x.

• Bayes formula:
  P(ωj|x) = P(ωj, x) / p(x) = p(x|ωj) P(ωj) / p(x), where the evidence p(x) = Σ_{j=1}^C p(x|ωj) P(ωj).
  And, we have Σ_{j=1}^C P(ωj|x) = 1.
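To make the formula concrete, here is a minimal numpy sketch of the pass/fail example; the priors and the Gaussian likelihood parameters below are made-up illustrative values, not numbers from the course.

    import numpy as np

    def gauss_pdf(x, mu, sigma):
        # Univariate Gaussian density N(x; mu, sigma^2), used as p(x | wj).
        return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

    priors = np.array([0.6, 0.4])             # P(w1) = pass, P(w2) = fail (made-up values)
    x = 8.0                                   # observation: hours spent on DLCV per week
    lik = np.array([gauss_pdf(x, 10, 2),      # p(x | w1), made-up parameters
                    gauss_pdf(x, 5, 2)])      # p(x | w2), made-up parameters

    evidence = lik @ priors                   # p(x) = sum_j p(x | wj) P(wj)
    posteriors = lik * priors / evidence      # Bayes formula: P(wj | x)

    print(posteriors, posteriors.sum())       # the posteriors sum to 1
    print("decide w1 (pass)" if posteriors[0] > posteriors[1] else "decide w2 (fail)")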

Page 9:

Decision Rule & Probability of Error

• For a given observable x (e.g., # of GPUs), the decision rule (to take DLCV or not) will now be based on the posteriors:
  Decide ω1 if P(ω1|x) > P(ω2|x); otherwise decide ω2.

• What's the probability of error P(error) (or Pe)? For each x, P(error|x) = min{P(ω1|x), P(ω2|x)}.

Page 10:

From Bayes Decision Rule to Detection Theory

• Hit (detection, TP), false alarm (FA, FP), miss (false reject, FN), rejection (TN)

• Receiver Operating Characteristics (ROC)
  • To assess the effectiveness of the designed features/classifiers
  • False alarm (PFA or FP) vs. detection (PD or TP) rates
  • Which curve/line makes sense? (a), (b), or (c)?

[Figure: candidate ROC curves (a), (b), and (c) plotting the detection rate PD against the false alarm rate PFA]

Page 11:

What’s to Be Covered Today…

• From Probability to Bayes Decision Rule
• Unsupervised vs. Supervised Learning
• Clustering & Dimension Reduction
• Training, testing, & validation
• Linear Classification

Page 12:

Clustering

• Clustering is an unsupervised algorithm.
  • Given: a set of N unlabeled instances {x1, …, xN}; # of clusters K
  • Goal: group the samples into K partitions

• Remarks:
  • High within-cluster (intra-cluster) similarity
  • Low between-cluster (inter-cluster) similarity
  • But… how to determine a proper similarity measure?

Page 13:

Similarity is NOT Always Objective…


Page 14:

Clustering (cont’d)

• Similarity:
  • A key component/measure to perform data clustering
  • Inversely proportional to distance
  • Example distance metrics:
    • Euclidean distance (L2 norm): d(x, z) = ||x − z||_2 = ( Σ_{i=1}^D (x_i − z_i)^2 )^{1/2}
    • Manhattan distance (L1 norm): d(x, z) = ||x − z||_1 = Σ_{i=1}^D |x_i − z_i|
  • Note that the p-norm of x is defined as ||x||_p = ( Σ_{i=1}^D |x_i|^p )^{1/p}.

Page 15:

Clustering (cont'd)

• Similarity:
  • A key component/measure to perform data clustering
  • Inversely proportional to distance
  • Example distance metrics:
    • Kernelized (non-linear) distance:
      d(x, z) = ||Φ(x) − Φ(z)||_2^2 = ||Φ(x)||_2^2 + ||Φ(z)||_2^2 − 2 Φ(x)^T Φ(z)
    • Taking the Gaussian kernel for example: K(x, z) = Φ(x)^T Φ(z) = exp(−||x − z||_2^2 / (2σ^2)),
      we have d(x, z) = 2 − 2 exp(−||x − z||_2^2 / (2σ^2)), since K(x, x) = 1.
      And, the distance is more sensitive to larger/smaller σ. Why?
• For example, L2 or kernelized distance metrics for the following two cases?
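As a quick illustration, here is a numpy sketch of these metrics on an arbitrary pair of points; note how the kernelized distance saturates at 2 when σ is small and shrinks toward 0 when σ is large, which is the σ-sensitivity asked about above.

    import numpy as np

    def l2(x, z):
        # Euclidean distance (L2 norm)
        return np.sqrt(np.sum((x - z) ** 2))

    def l1(x, z):
        # Manhattan distance (L1 norm)
        return np.sum(np.abs(x - z))

    def kernel_dist(x, z, sigma):
        # Kernelized distance with a Gaussian kernel: ||Phi(x) - Phi(z)||_2^2 = 2 - 2 K(x, z)
        K = np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))
        return 2 - 2 * K

    x, z = np.array([0.0, 0.0]), np.array([3.0, 4.0])
    print(l2(x, z), l1(x, z))            # 5.0 and 7.0
    for sigma in (0.5, 2.0, 10.0):       # small sigma: distance saturates near 2
        print(sigma, kernel_dist(x, z, sigma))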

Page 16:

K-Means Clustering

• Input: N examples {x1, . . . , xN} (xn ∈ R^D); number of partitions K
• Initialize: K cluster centers µ1, . . . , µK. Several initialization options:
  • Randomly initialize µ1, . . . , µK anywhere in R^D
  • Or, simply choose any K examples as the cluster centers
• Iterate:
  • Assign each example xn to its closest cluster center
  • Recompute the new cluster centers µk (mean/centroid of the set Ck)
  • Repeat until convergence
• Possible convergence criteria:
  • Cluster centers do not change anymore
  • Max. number of iterations reached
• Output:
  • K clusters (with centers/means of each cluster)
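A minimal numpy sketch of the procedure above (an illustration, not the course's reference code), using the "choose K examples as centers" initialization:

    import numpy as np

    def kmeans(X, K, max_iters=100, seed=0):
        # X: (N, D) data matrix; returns cluster labels and the K cluster centers.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=K, replace=False)]  # init: pick K examples
        for _ in range(max_iters):
            # Assign each example to its closest cluster center (Euclidean distance).
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
            labels = dists.argmin(axis=1)
            # Recompute each center as the mean/centroid of its assigned set
            # (assumes no cluster ends up empty; real code would re-seed empty ones).
            new_centers = np.array([X[labels == k].mean(axis=0) for k in range(K)])
            if np.allclose(new_centers, centers):  # converged: centers unchanged
                break
            centers = new_centers
        return labels, centers

    # Toy usage: two well-separated blobs, K = 2.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
    labels, centers = kmeans(X, K=2)
    print(centers)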

Page 17:

K-Means Clustering

• Example (K = 2): Initialization, iteration #1: pick cluster centers


Page 18:

K-Means Clustering

• Example (K = 2): iteration #1-2, assign data to each cluster


Page 19:

K-Means Clustering

• Example (K = 2): iteration #2-1, update cluster centers


Page 20:

K-Means Clustering

• Example (K = 2): iteration #2, assign data to each cluster


Page 21:

K-Means Clustering

• Example (K = 2): iteration #3-1


Page 22:

K-Means Clustering

• Example (K = 2): iteration #3-2


Page 23:

K-Means Clustering

• Example (K = 2): iteration #4-1


Page 24:

K-Means Clustering

• Example (K = 2): iteration #4-2


Page 25:

K-Means Clustering

• Example (K = 2): iteration #5, cluster means are not changed.


Page 26:

K-Means Clustering (cont'd)

• Limitations
  • Preferable for round-shaped clusters with similar sizes
  • Sensitive to initialization; how to alleviate this problem?
  • Sensitive to outliers; possible change from K-means to…
  • Hard assignment only. Mathematically, we have…

• Remarks
  • Expectation-maximization (EM) algorithm
  • Speed-up possible by hierarchical clustering (e.g., 100 = 10^2 clusters)

Page 27:

What’s to Be Covered Today…

• From Probability to Bayes Decision Rule
• Unsupervised vs. Supervised Learning
• Clustering & Dimension Reduction
• Training, testing, & validation
• Linear Classification

Page 28:

Dimension Reduction

• Principal Component Analysis (PCA)
  • Unsupervised & linear dimension reduction
  • Related to feature extraction and classification techniques such as Eigenfaces
  • Still very popular due to its simplicity and effectiveness
  • Goal: determine the projection so that the variation of the projected data is maximized.

[Figure: 2D data in the x-y plane with the principal axis drawn through it, i.e., the axis that describes the largest variation for data projected onto it]

Page 29:

Formulation & Derivation for PCA

• Input: a set of instances x without label info

• Output: a projection vector ω maximizing the variance of the projected data

[Figure: 2D data in the x-y plane with the candidate projection direction shown]

Page 30:

Formulation & Derivation for PCA (cont’d)

• We need to maximize var(ω^T x) subject to ||ω|| = 1. With Σ = Cov(x), we have var(ω^T x) = ω^T Σ ω.

• Lagrangian optimization for PCA:
  L(ω, λ) = ω^T Σ ω − λ (ω^T ω − 1)
  ∂L/∂ω = 2 Σ ω − 2 λ ω = 0  ⇒  Σ ω = λ ω
  That is, ω is an eigenvector of the covariance matrix Σ, and the attained variance ω^T Σ ω = λ is the corresponding eigenvalue.
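A quick numerical sanity check of this derivation on synthetic data (a sketch, nothing course-specific): the top eigenvector of the covariance matrix attains projected variance equal to the largest eigenvalue, and a random unit vector never beats it.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])  # correlated 2D data
    Xc = X - X.mean(axis=0)                  # center the data
    cov = Xc.T @ Xc / len(Xc)                # covariance matrix Sigma

    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh for symmetric matrices; ascending order
    w = eigvecs[:, -1]                       # principal direction (largest eigenvalue)

    print(np.var(Xc @ w), eigvals[-1])       # projected variance equals the top eigenvalue
    u = rng.normal(size=2); u /= np.linalg.norm(u)   # any other unit vector
    print(np.var(Xc @ u) <= eigvals[-1] + 1e-9)      # True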

Page 31:

Eigenanalysis & PCA

• Eigenanalysis for PCA = find the eigenvectors ωi and the corresponding eigenvalues λi of Σ.
• In other words, the direction ωi captures a variance of λi.
• But which eigenvectors to use? All of them?
  • A d x d covariance matrix Σ contains a maximum of d eigenvector/eigenvalue pairs.
  • Do we need to compute all of them? Which ωi and λi pairs to use?
  • Assuming you have N images of size M x M pixels, we have dimension d = M^2.
  • What is the rank of Σ? Is it N, M, or d?
• Thus, at most N (in fact, N − 1 after mean subtraction) non-zero eigenvalues can be obtained. Why?

Page 32:

Eigenanalysis & PCA (cont'd)

• A d x d covariance matrix contains a maximum of d eigenvector/eigenvalue pairs.
  • How is dimension reduction realized? How do we reconstruct the input data?

• Expanding a signal via eigenvectors as bases
  • With symmetric matrices (e.g., a covariance matrix), the eigenvectors are orthogonal.
  • They can be regarded as unit basis vectors spanning any instance in the d-dim space:
    x = μ + Σ_{i=1}^d c_i ωi with c_i = ωi^T (x − μ); keeping only the top-k terms realizes dimension reduction and approximate reconstruction.

Page 33:

Practical Issues in PCA (if time permits…)

• Assume we have N = 100 images of size 200 x 200 pixels (i.e., d = 40000).• What is the size of the covariance matrix? What’s its rank?

• What can we do? Gram Matrix Trick!

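Here is a sketch of the trick under the stated assumptions (random data standing in for the 100 images): eigendecompose the small N x N Gram matrix instead of the huge d x d covariance, then map each Gram eigenvector v back to a covariance eigenvector via X^T v.

    import numpy as np

    rng = np.random.default_rng(0)
    N, d = 100, 40000                        # e.g., 100 images of 200 x 200 pixels
    X = rng.normal(size=(N, d))              # stand-in for the image data
    Xc = X - X.mean(axis=0)                  # center; rank is now at most N - 1

    # Work with the N x N Gram matrix (100 x 100) instead of the 40000 x 40000 covariance.
    G = Xc @ Xc.T / N
    eigvals, V = np.linalg.eigh(G)

    # If G v = lambda v, then (Xc^T Xc / N)(Xc^T v) = lambda (Xc^T v),
    # so each covariance eigenvector is Xc^T v, normalized to unit length.
    # (Columns with near-zero eigenvalues are numerically meaningless.)
    W = Xc.T @ V
    W /= np.linalg.norm(W, axis=0, keepdims=True)

    # Check Sigma w = lambda w for the top pair, without ever forming Sigma.
    w, lam = W[:, -1], eigvals[-1]
    print(np.allclose(Xc.T @ (Xc @ w) / N, lam * w, atol=1e-8))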

Page 34:

Let’s See an Example (CMU AMP Face Database)

• Let's take 5 face images x 13 people = 65 images, each of size 64 x 64 = 4096 pixels.

• # of eigenvectors expected to be needed for perfectly reconstructing the input = 65 − 1 = 64.

• Let's check it out!

Page 35:

What Do the Eigenvectors/Eigenfaces Look Like?

[Figure: the mean face and the first 15 eigenfaces, labeled Mean and V1–V15]

Page 36:

All 64 Eigenvectors, do we need them all?


Page 37:

Use only 1 eigenvector, MSE = 1233.16

Page 38:

Use 2 eigenvectors, MSE = 1027.63

Page 39:

Use 3 eigenvectors, MSE = 758.13

Page 40:

Use 4 eigenvectors, MSE = 634.54

Page 41:

Use 8 eigenvectors, MSE = 285.08

Page 42:

With 20 eigenvectors, MSE = 87.93

Page 43:

With 30 eigenvectors, MSE = 20.55

Page 44:

With 50 eigenvectors, MSE = 2.14

Page 45:

With 60 eigenvectors, MSE = 0.06

Page 46:

All 64 eigenvectors, MSE = 0.00
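The following sketch reproduces this kind of reconstruction experiment on synthetic data standing in for the 65 face images (which are not bundled here); the exact MSE values will differ, but the monotone drop to 0 at 64 (= 65 − 1) eigenvectors is the point.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(65, 4096))          # stand-in for 65 faces of 64 x 64 = 4096 pixels
    mu = X.mean(axis=0)
    Xc = X - mu                              # centered data; rank at most 64

    # Eigenvectors of the covariance via SVD of the centered data matrix.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

    for k in (1, 2, 4, 8, 20, 30, 50, 60, 64):
        W = Vt[:k].T                         # top-k eigenvectors as columns, (4096, k)
        X_rec = mu + (Xc @ W) @ W.T          # project onto k dims, then reconstruct
        mse = np.mean((X - X_rec) ** 2)
        print(k, round(mse, 4))              # MSE decreases; ~0 at k = 64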

Page 47:

Final Remarks

• PCA is a linear & unsupervised dimension reduction technique.

• PCA can be applied as a feature extraction/preprocessing technique.
  • E.g., use the top 3 eigenvectors to project data into a 3D space for classification.

[Figure: 3D scatter plot of data projected onto the top 3 eigenvectors, with points labeled Class1, Class2, and Class3]

Page 48:

Final Remarks (cont’d)

• How do we perform classification? For example…
  • Given a test face input, project it into the same 3D space (by the same 3 eigenvectors).
  • The resulting 3D vector is the feature vector for this test input.
  • We can do a simple Nearest Neighbor (NN) classification (with Euclidean distance), which calculates the distance between the input and all training data in this space.
  • If NN (1-NN), the label of the closest training instance determines the classification output.
  • If k-nearest neighbors (k-NN), then the k nearest neighbors vote for the decision.

[Figure: k-NN decision boundaries for k = 1, 3, 5]

Image credit: Stanford CS231n

Demo available at http://vision.stanford.edu/teaching/cs231n-demos/knn/
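A minimal sketch of this pipeline on synthetic data (the helper names are mine, not from the course): fit PCA on the training set, project both training and test points with the same top-3 eigenvectors, then classify by k-NN with Euclidean distance.

    import numpy as np

    def pca_fit(X, k):
        # Returns the mean and the top-k eigenvectors (as columns) of the covariance.
        mu = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
        return mu, Vt[:k].T

    def knn_predict(f, feats, labels, k=1):
        # Distances from the input feature f to ALL training features; k nearest vote.
        d = np.linalg.norm(feats - f, axis=1)
        nearest = labels[np.argsort(d)[:k]]
        return np.bincount(nearest).argmax()

    rng = np.random.default_rng(0)
    X_train = np.vstack([rng.normal(c, 1, (20, 50)) for c in (0, 3, 6)])  # 3 classes, d = 50
    y_train = np.repeat([0, 1, 2], 20)

    mu, W = pca_fit(X_train, k=3)            # the same 3 eigenvectors for train & test
    F_train = (X_train - mu) @ W

    x_test = rng.normal(3, 1, 50)            # a test point drawn from class 1
    f_test = (x_test - mu) @ W               # project into the same 3D space
    print(knn_predict(f_test, F_train, y_train, k=5))   # expected: 1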

Page 49:

Final Remarks (cont’d)

• If labels for the data are provided → Linear Discriminant Analysis (LDA)
  • LDA is also known as Fisher's discriminant analysis.
  • Eigenface vs. Fisherface (IEEE Trans. PAMI 1997)

• If linear DR is not sufficient, and non-linear DR is of interest…
  • Isomap, locally linear embedding (LLE), etc.
  • t-distributed stochastic neighbor embedding (t-SNE) (by G. Hinton & L. van der Maaten)

Page 50:

What’s to Be Covered Today…

• From Probability to Bayes Decision Rule
• Unsupervised vs. Supervised Learning
• Clustering & Dimension Reduction
• Training, testing, & validation
• Linear Classification

Page 51:

Hyperparameters in ML

• Recall that for k-NN, we need to determine the k value in advance.
  • What is the best k value?
  • Or, take PCA for example: what is the best reduced dimension number?

• Hyperparameters: parameter choices for the learning model/algorithm
  • We need to determine such hyperparameters instead of guessing.
  • Let's see what we can and cannot do…

[Figure: k-NN decision boundaries for k = 1, 3, 5]

Image credit: Stanford CS231n

Page 52:

How to Determine Hyperparameters?

• Idea #1
  • Let's say you are working on face recognition.
  • You come up with your very own feature extraction/learning algorithm.
  • You take a dataset to train your model, and select your hyperparameters (e.g., k of k-NN) based on the resulting performance.
  • Might not generalize well.

[Diagram: the entire dataset is used for both training and hyperparameter selection]

Page 53:

How to Determine Hyperparameters? (cont’d)

• Idea #2
  • Let's say you are working on face recognition.
  • You come up with your very own feature extraction/learning algorithm.
  • For a dataset of interest, you split it into training and test sets.
  • You train your model with possible hyperparameter choices (e.g., k in k-NN), and select those that work best on the test set data.
  • That's called cheating…

[Diagram: dataset split into a training set and a test set]

Page 54:

How to Determine Hyperparameters? (cont’d)

• Idea #3
  • Let's say you are working on face recognition.
  • You come up with your very own feature extraction/learning algorithm.
  • For the dataset of interest, it is split into training, validation, and test sets.
  • You train your model with possible hyperparameter choices (k in k-NN), and select those that work best on the validation set.
  • OK, but…

[Diagram: dataset split into a training set, a validation set, and a test set]

Page 55:

How to Determine Hyperparameters? (cont’d)

• Idea #3.5
  • What if only training and test sets are given, not the validation set?
  • Cross-validation (or k-fold cross-validation):
    • For each hyperparameter choice, split the training set into k folds.
    • Keep 1 fold as the validation set and the remaining k − 1 folds for training.
    • After each of the k folds is evaluated, report the average validation performance.
    • Choose the hyperparameter(s) which result in the highest average validation performance.
  • Take 4-fold cross-validation as an example (see the sketch below)…

[Diagram: the training set is split into Folds 1–4; in each of the 4 rounds, a different fold serves as the validation set while the other 3 are used for training, with the test set always held out]
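A numpy sketch of this 4-fold procedure for selecting k in k-NN (toy data; `knn_acc` is my own helper, and a held-out test set would be kept separate in practice):

    import numpy as np

    def knn_acc(Xtr, ytr, Xva, yva, k):
        # Validation accuracy of a k-NN classifier with Euclidean distance.
        d = np.linalg.norm(Xva[:, None, :] - Xtr[None, :, :], axis=-1)
        pred = np.array([np.bincount(ytr[np.argsort(row)[:k]]).argmax() for row in d])
        return (pred == yva).mean()

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(c, 1.5, (40, 10)) for c in (0, 2)])  # toy 2-class training set
    y = np.repeat([0, 1], 40)
    folds = np.array_split(rng.permutation(len(X)), 4)             # shuffle, then 4 folds

    for k in (1, 3, 5, 7):
        accs = []
        for i in range(4):
            val = folds[i]                                         # fold i = validation set
            trn = np.concatenate([folds[j] for j in range(4) if j != i])
            accs.append(knn_acc(X[trn], y[trn], X[val], y[val], k))
        # Choose the k with the highest average validation accuracy.
        print(k, np.mean(accs))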

Page 56:

Minor Remarks on NN-based Methods

• k-NN is easy to implement but not of much interest in practice. Why?
  • Choice of distance metrics might be an issue (see the example below).
  • Measuring distances in high-dimensional spaces might not be a good idea.
  • Moreover, NN-based methods require lots of data and computation! (NN-based methods are viewed as data-driven approaches.)

Image credit: Stanford CS231n
[Figure: an original image and three modified versions; all three images have the same Euclidean distance to the original one.]

Page 57:

What We’ve Covered Today…

• From Probability to Bayes Decision Rule
• Unsupervised vs. Supervised Learning
• Clustering & Dimension Reduction
• Training, testing, & validation
• Linear Classification (next lecture)