1/104
Computer Vision
3. Machine Vision Algorithms
Computer Engineering, Sejong University
Dongil Han
2/104
Contents
• SIFT (Scale Invariant Feature Transform)
• AdaBoost (Adaptive Boosting)
• SVM (Support Vector Machine)
• Precision and Recall
• DL (Deep Learning)
3/104
Scale Invariant Feature Transform
Scale-invariant feature transform (or SIFT) is an algorithm in computer vision to detect and describe local features in images. The algorithm was published by David Lowe in 1999.
Applications include object recognition, robotic mapping and navigation, image stitching, 3D modeling, gesture recognition, video tracking, and match moving.
The algorithm is patented in the US; the owner is the University of British Columbia.
David Lowe, Computer Science Department, University of British Columbia
4/104
Challenges
• Scale change
• Rotation
• Occlusion
• Illumination
……
5/104
Overview of SIFT
6/104
Scale Space extrema detection
• Find the points whose surrounding patches (at some scale) are distinctive
• An approximation to the scale-normalized Laplacian of Gaussian
7/104
Maxima and minima in a 3×3×3 neighborhood
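A minimal sketch of the two steps above, assuming NumPy and SciPy are available; the number of scales and the σ values are illustrative, not Lowe's exact settings:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_extrema(image, num_scales=5, sigma0=1.6, k=2 ** 0.5):
    # Stack of Gaussian-blurred images at increasing scales.
    gaussians = [gaussian_filter(image.astype(float), sigma0 * k ** i)
                 for i in range(num_scales)]
    # Difference-of-Gaussians approximates the scale-normalized Laplacian.
    dogs = np.stack([g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])])

    keypoints = []
    for s in range(1, dogs.shape[0] - 1):
        for y in range(1, image.shape[0] - 1):
            for x in range(1, image.shape[1] - 1):
                patch = dogs[s-1:s+2, y-1:y+2, x-1:x+2]  # 3x3x3 neighborhood
                v = dogs[s, y, x]
                # Keep the point if it is the max or min of its 26 neighbors
                # (ties accepted for simplicity in this sketch).
                if v == patch.max() or v == patch.min():
                    keypoints.append((x, y, s))
    return keypoints
```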
8/104
Locate DoG Extrema
• There are still a lot of points; some of them are not good enough.
• The locations of the keypoints may not be accurate.
9/104
Edge Response Elimination
• Such a point has a large principal curvature across the edge but a small one in the perpendicular direction
• The principal curvatures can be calculated from the Hessian matrix
• The eigenvalues of H are proportional to the principal curvatures, so the two eigenvalues should not differ too much (see the test below)
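Concretely, Lowe's test uses the 2×2 Hessian of the DoG function D at the keypoint; since only the ratio of the eigenvalues matters, the trace and determinant suffice (the paper uses r = 10):

$$\mathbf{H} = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}, \qquad \frac{\operatorname{Tr}(\mathbf{H})^2}{\operatorname{Det}(\mathbf{H})} = \frac{(\lambda_1 + \lambda_2)^2}{\lambda_1 \lambda_2} < \frac{(r+1)^2}{r}$$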
10/104
11/104
Orientation assignment
• Assign an orientation to each keypoint; the keypoint descriptor can then be represented relative to this orientation, achieving invariance to image rotation
• Compute gradient magnitude and orientation on the Gaussian-smoothed images, as below
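On the Gaussian-smoothed image L, the magnitude and orientation at each pixel are computed from pixel differences:

$$m(x,y) = \sqrt{\big(L(x{+}1,y) - L(x{-}1,y)\big)^2 + \big(L(x,y{+}1) - L(x,y{-}1)\big)^2}$$
$$\theta(x,y) = \tan^{-1}\frac{L(x,y{+}1) - L(x,y{-}1)}{L(x{+}1,y) - L(x{-}1,y)}$$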
12/104
Orientation assignment
• A histogram is formed by quantizing the orientations into 36 bins;
• Peaks in the histogram correspond to the orientations of the patch;
• For the same scale and location, there could be multiple keypoints with different orientations;
13/104
Feature Descriptor
• Based on 16×16 patches
• 4×4 subregions
• 8 bins in each subregion
• 4×4×8 = 128 dimensions in total
14/104
Application : Object recognition
• The SIFT features of training images are extracted and stored
• For a query image
1. Extract SIFT feature
2. Efficient nearest neighbor indexing
3. Geometric verification (using at least 3 matched keypoints)
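A minimal sketch of steps 1–2 with OpenCV, assuming a build where cv2.SIFT_create is available (OpenCV ≥ 4.4); the ratio threshold is Lowe's usual heuristic:

```python
import cv2

def match_sift(train_img, query_img, ratio=0.75):
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(train_img, None)
    kp2, des2 = sift.detectAndCompute(query_img, None)

    # Nearest-neighbor matching with Lowe's ratio test to reject
    # ambiguous matches.
    matcher = cv2.BFMatcher()
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < ratio * n.distance]
    return kp1, kp2, good
```

Step 3 could then, for example, fit a geometric transform to the matched coordinates with RANSAC and keep only objects supported by enough consistent matches.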
15/104
16/104
SURF (Speeded Up Robust Features)
• Approximate version of SIFT
• Works almost equally well
• Very fast
17/104
SIFT Summary
• The most successful feature (probably the most successful paper in computer vision)
• A lot of heuristics: the parameters are optimized on a small, specific dataset, so different tasks should have different parameter settings.
• Learning local image descriptors (Winder et al 2007): tuning parameters given their dataset.
• We need a universal objective function.
18/104
Adaptive Boosting(Adaboost)
• Introduced by Schapire and Freund in the 1990s.
• “Boosting”: convert a weak learning algorithm into a strong one.
• Main idea: combine many weak classifiers to produce a powerful committee.
• Algorithms:
– AdaBoost: adaptive boosting
– Gentle AdaBoost
– BrownBoost
– …
19/104
Classifiers
• Classifier Examples
– Linear Classifier
– Quadratic Classifier
– Nonlinear Classifier
[Figure: a test set of +/− points with example linear, quadratic, and nonlinear decision boundaries]
20/104
Adaboost Learning Algorithm
21/104
Adaboost Learning Algorithm • Round 1 of 3
[Figure: weak classifier h1 splits the training points; weighted error ε1 = 0.300, classifier weight α1 = 0.424; misclassified points are up-weighted to form the new distribution D2]
22/104
Adaboost Learning Algorithm • Round 2 of 3
[Figure: weak classifier h2 on distribution D2; weighted error ε2 = 0.196, classifier weight α2 = 0.704; reweighted points form D3]
23/104
Adaboost Learning Algorithm • Round 3 of 3
[Figure: weak classifier h3 on distribution D3; weighted error ε3 = 0.344, classifier weight α3 = 0.323. STOP]
24/104
Adaboost Learning Algorithm • Final Hypothesis
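With each round's vote weight computed as $\alpha_t = \tfrac{1}{2}\ln\tfrac{1-\epsilon_t}{\epsilon_t}$, the final strong classifier is the sign of the weighted vote of the three weak classifiers, using the round weights above:

$$H(\mathbf{x}) = \operatorname{sign}\Big(\sum_{t=1}^{3} \alpha_t h_t(\mathbf{x})\Big) = \operatorname{sign}\big(0.424\,h_1(\mathbf{x}) + 0.704\,h_2(\mathbf{x}) + 0.323\,h_3(\mathbf{x})\big)$$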
25/104
Adaboost Learning Algorithm • Learning Process
26/104
Adaboost Learning Algorithm • Final Classifier and Parameters
27/104
Algorithm Recapitulation
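A minimal sketch of discrete AdaBoost with decision stumps, assuming NumPy and ±1 labels; the exhaustive stump search is illustrative, not optimized:

```python
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    return polarity * np.where(X[:, feature] < threshold, 1, -1)

def adaboost(X, y, rounds=3):
    n = len(y)
    D = np.full(n, 1.0 / n)            # initial uniform distribution D_1
    ensemble = []
    for _ in range(rounds):
        # Pick the stump with the smallest weighted error.
        best = None
        for f in range(X.shape[1]):
            for thr in np.unique(X[:, f]):
                for pol in (1, -1):
                    pred = stump_predict(X, f, thr, pol)
                    err = D[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol, pred)
        err, f, thr, pol, pred = best
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))  # vote weight
        D *= np.exp(-alpha * y * pred)   # up-weight misclassified points
        D /= D.sum()
        ensemble.append((alpha, f, thr, pol))
    return ensemble

def predict(ensemble, X):
    scores = sum(a * stump_predict(X, f, t, p) for a, f, t, p in ensemble)
    return np.sign(scores)
```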
28/104
29/104
30/104
Adaboost in Face Detection
31/104
Adaboost in Face Detection
• Adaboost Learning Algorithm by Fröba and Ernst
– The detector has four classifier stages of increasing complexity
• analyzes image patches W of size 22×22 after MCT: Γ
• location within the analysis window W: x
• Adaboost classifier stage: j
• classifier: H, pixel classifier: hx
32/104
Adaboost Summary
• Advantages
– Very simple to implement
– Does feature selection, resulting in a relatively simple classifier
– Fairly good generalization
• Disadvantages
– Suboptimal solution
– Sensitive to noisy data and outliers
33/104
Support Vector Machine(SVM)
• A classifier derived from statistical learning theory by Vapnik et al. in 1992
• SVM became famous when, using images as input, it gave accuracy comparable to neural networks with hand-designed features in a handwriting recognition task
• Currently, SVM is widely used in object detection & recognition, content-based image retrieval, text recognition, biometrics, speech recognition, etc.
• Also used for regression
• Demo of SVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
V. Vapnik
These slides are courtesy of www.iro.umontreal.ca/~pift6080/documents/papers/svm_tutorial.ppt
34/104
Discriminant Function
• It can be an arbitrary function of x, such as:
– Nearest Neighbor
– Decision Tree
– Linear functions: $g(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b$
– Nonlinear functions
35/104
Linear Discriminant Function
• g(x) is a linear function:
$$g(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b$$
• A hyper-plane in the feature space: $\mathbf{w}^T\mathbf{x} + b = 0$ separates the region $\mathbf{w}^T\mathbf{x} + b > 0$ from $\mathbf{w}^T\mathbf{x} + b < 0$
• (Unit-length) normal vector of the hyper-plane: $\mathbf{n} = \mathbf{w}/\|\mathbf{w}\|$
36/104
Linear Discriminant Function
• How would you classify these points using a linear discriminant function in order to minimize the error rate?
=> Infinite number of answers!
[Figure: points of class +1 and class −1 in the (x1, x2) plane]
37/104
Linear Discriminant Function
• How would you classify these points using a linear discriminant function in order to minimize the error rate?
=> Infinite number of answers!
=> Which one is the best?
[Figure: points of class +1 and class −1 in the (x1, x2) plane]
38/104
Large Margin Linear Classifier
• The linear discriminant function (classifier) with the maximum margin is the best
• Margin is defined as the width that the boundary could be increased by before hitting a data point
• Why is it the best?
– Robust to outliers and thus strong generalization ability
[Figure: the separating line with its margin (“safe zone”) between the two classes]
39/104
Large Margin Linear Classifier
• Given a set of data points $\{(\mathbf{x}_i, y_i)\},\ i = 1, 2, \ldots, n$, where
$$\mathbf{w}^T\mathbf{x}_i + b > 0 \ \text{for}\ y_i = +1, \qquad \mathbf{w}^T\mathbf{x}_i + b < 0 \ \text{for}\ y_i = -1$$
• With a scale transformation on both w and b, the above is equivalent to
$$\mathbf{w}^T\mathbf{x}_i + b \ge 1 \ \text{for}\ y_i = +1, \qquad \mathbf{w}^T\mathbf{x}_i + b \le -1 \ \text{for}\ y_i = -1$$
40/104
Large Margin Linear Classifier
• We know that
$$\mathbf{w}^T\mathbf{x}^+ + b = 1, \qquad \mathbf{w}^T\mathbf{x}^- + b = -1$$
• The margin width is:
$$M = (\mathbf{x}^+ - \mathbf{x}^-) \cdot \mathbf{n} = (\mathbf{x}^+ - \mathbf{x}^-) \cdot \frac{\mathbf{w}}{\|\mathbf{w}\|} = \frac{2}{\|\mathbf{w}\|}$$
[Figure: the margin between the two classes; the points x+ and x− on its boundaries are the support vectors]
41/104
Large Margin Linear Classifier
• Formulation:
$$\text{maximize}\ \frac{2}{\|\mathbf{w}\|}$$
such that
$$\mathbf{w}^T\mathbf{x}_i + b \ge 1 \ \text{for}\ y_i = +1, \qquad \mathbf{w}^T\mathbf{x}_i + b \le -1 \ \text{for}\ y_i = -1$$
42/104
Large Margin Linear Classifier
• Formulation:
$$\text{maximize}\ \frac{2}{\|\mathbf{w}\|} \quad \text{s.t.} \quad \mathbf{w}^T\mathbf{x}_i + b \ge 1 \ \text{for}\ y_i = +1, \quad \mathbf{w}^T\mathbf{x}_i + b \le -1 \ \text{for}\ y_i = -1$$
• Or, equivalently:
$$\text{minimize}\ \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{s.t.} \quad y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1$$
43/104
Solving the Optimization Problem
$$\text{minimize}\ \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{s.t.} \quad y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1$$
• Quadratic programming with linear constraints
• Lagrangian function:
$$\text{minimize}\ L_p(\mathbf{w}, b, \alpha_i) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{n} \alpha_i \big( y_i(\mathbf{w}^T\mathbf{x}_i + b) - 1 \big) \quad \text{s.t.} \quad \alpha_i \ge 0$$
44/104
Solving the Optimization Problem
$$\text{minimize}\ L_p(\mathbf{w}, b, \alpha_i) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{n} \alpha_i \big( y_i(\mathbf{w}^T\mathbf{x}_i + b) - 1 \big) \quad \text{s.t.} \quad \alpha_i \ge 0$$
• Setting the partial derivatives to zero:
$$\frac{\partial L_p}{\partial \mathbf{w}} = 0 \ \Rightarrow\ \mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i, \qquad \frac{\partial L_p}{\partial b} = 0 \ \Rightarrow\ \sum_{i=1}^{n} \alpha_i y_i = 0$$
45/104
Solving the Optimization Problem
• The solution has the form:
$$\mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i = \sum_{i \in \mathrm{SV}} \alpha_i y_i \mathbf{x}_i$$
• From the KKT condition:
$$\alpha_i \big( y_i(\mathbf{w}^T\mathbf{x}_i + b) - 1 \big) = 0$$
• Thus, only support vectors have $\alpha_i \neq 0$
• Get b from $y_i(\mathbf{w}^T\mathbf{x}_i + b) - 1 = 0$, where $\mathbf{x}_i$ is a support vector
[Figure: support vectors x+ and x− lying on the margin boundaries]
KKT conditions: https://en.wikipedia.org/wiki/Karush%E2%80%93Kuhn%E2%80%93Tucker_conditions
46/104
Solving the Optimization Problem
• The linear discriminant function is:
$$g(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b = \sum_{i \in \mathrm{SV}} \alpha_i y_i \mathbf{x}_i^T\mathbf{x} + b$$
• Notice it relies on a dot product between the test point x and the support vectors xi
• Also keep in mind that solving the optimization problem involved computing the dot products $\mathbf{x}_i^T\mathbf{x}_j$ between all pairs of training points
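Substituting $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$ back into $L_p$ yields the standard dual problem, which likewise touches the data only through those dot products:

$$\max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T \mathbf{x}_j \quad \text{s.t.} \quad \alpha_i \ge 0,\ \ \sum_{i=1}^{n} \alpha_i y_i = 0$$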
47/104
Large Margin Linear Classifier
• What if the data is not linearly separable? (noisy data, outliers, etc.)
• Slack variables ξi can be added to allow mis-classification of difficult or noisy data points
[Figure: two classes in the (x1, x2) plane with slack variables ξ1, ξ2 marking points that violate the margin]
48/104
Large Margin Linear Classifier
• Formulation:
$$\text{minimize}\ \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \xi_i$$
such that
$$y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0$$
• Parameter C can be viewed as a way to control over-fitting.
49/104
Non-linear SVMs
• Datasets that are linearly separable with some noise work out great:
[Figure: 1-D points along the x axis, separable by a single threshold]
• But what are we going to do if the dataset is just too hard?
[Figure: 1-D points along the x axis that no single threshold separates]
• How about… mapping data to a higher-dimensional space:
[Figure: the same points mapped to (x, x²), where a line now separates them]
50/104
Non-linear SVMs: Feature Space
• General idea: the original input space can be mapped to some higher-dimensional feature space where the training set is separable:
Φ: x→ φ(x)
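A small sketch of this idea, assuming scikit-learn is available; the dataset and kernel choices are illustrative. Points labeled by distance from the origin defeat a linear SVM in the input space, while an RBF kernel separates them easily:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (np.linalg.norm(X, axis=1) > 1.0).astype(int)  # label by radius: not linearly separable

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)
print("linear accuracy:", linear.score(X, y))  # noticeably lower
print("rbf accuracy:", rbf.score(X, y))        # near 1.0
```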
51/104
Applications of SVMs
• Bioinformatics
• Machine Vision
• Text Categorization
• Ranking (e.g., Google searches)
• Handwritten Character Recognition
• Time series analysis
Lots of very successful applications!!!
52/104
Car license plate extraction
• Extract SIFT features first
• Classification using SVM
• Results: Precision = 62%, Recall = 50.2%, False negative rate = 49.8%, False positive rate = 1.1%

Classification results vs. ground truth:
                        Classified Outside    Classified Inside
Ground truth: Outside   293734 (TN)           3398 (FP)
Ground truth: Inside    5505 (FN)             5545 (TP)
53/104
Summary: Support Vector Machine
• Large Margin Classifier
– Better generalization ability & less over-fitting
• The Kernel Trick
– Map data points to a higher-dimensional space in order to make them linearly separable
– Since only the dot product is used, we do not need to represent the mapping explicitly
54/104
Precision and Recall
• Precision: fraction of retrieved docs that are relevant = P(relevant|retrieved)
• Recall: fraction of relevant docs that are retrieved = P(retrieved|relevant)
Precision P = tp/(tp + fp)
Recall R = tp/(tp + fn)
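A quick check of these formulas in Python, using the license-plate confusion matrix from the earlier slide:

```python
# Counts from the car license plate extraction slide.
tp, fp, fn, tn = 5545, 3398, 5505, 293734
precision = tp / (tp + fp)   # 5545/8943  ≈ 0.620
recall = tp / (tp + fn)      # 5545/11050 ≈ 0.502
print(f"precision = {precision:.3f}, recall = {recall:.3f}")
```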
55/104
Precision and Recall
• Car license plate extraction example
- TP (True Positive): a plate classified as a plate
- FP (False Positive): background classified as a plate
- FN (False Negative): a plate classified as background
- TN (True Negative): background classified as background
Precision: of the regions classified as plates, the fraction that actually are plates
Recall (detection rate): of all actual plates, the fraction classified as plates
56/104
DL(Deep Learning)
• What exactly is deep learning?
- ‘Deep learning’ means using a neural network with several layers of nodes between input and output
• Why is it generally better than other methods on image, speech and certain other types of data?
- The series of layers between input & output do feature identification and processing in a series of stages, just as our brains seem to.
These slides are courtesy of https://www.macs.hw.ac.uk/~dwcorne/
57/104
DL(Deep Learning)
• Multilayer neural networks have been around for 25 years. What's actually new?
- We have always had good algorithms for learning the weights in networks with 1 hidden layer
- But these algorithms are not good at learning the weights for networks with more hidden layers
- What's new: algorithms for training many-layer networks
58/104
Brain vs. Computer
Brain:
1. 10 billion neurons
2. 60 trillion synapses
3. Distributed processing
4. Nonlinear processing
5. Parallel processing

Computer:
1. Faster than a neuron (10⁻⁹ sec; cf. neuron: 10⁻³ sec)
3. Central processing
4. Arithmetic operation (linearity)
5. Sequential processing
59/104
Neuron vs Artificial Neuron
[Figure: biological neuron – dendrites, cell body, axon]
60/104
Simple Perceptron
61/104
Classic Perceptron
Nonlinear Neuron: Sigmoid Unit
• The sigmoid function $\sigma(x) = \dfrac{1}{1 + e^{-x}}$ is differentiable:
$$\frac{d\sigma(x)}{dx} = \sigma(x)\big(1 - \sigma(x)\big)$$
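A minimal NumPy sketch of the sigmoid unit and its derivative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # sigma'(x) = sigma(x) * (1 - sigma(x))
```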
62/104
Multilayer Perceptron
63/104
[Figure: a single unit with three inputs, weights w1 = 1.4, w2 = -2.5, w3 = -0.06, and activation function f(x)]
64/104
[Figure: the same unit with inputs 2.7, -8.6 and 0.002 on the three weighted connections]
x = -0.06×2.7 + 2.5×8.6 + 1.4×0.002 = 21.34
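The same arithmetic in NumPy, assuming the figure's inputs were (2.7, -8.6, 0.002) with weights (-0.06, -2.5, 1.4), which reproduces the slide's sum:

```python
import numpy as np

inputs = np.array([2.7, -8.6, 0.002])
weights = np.array([-0.06, -2.5, 1.4])
x = float(inputs @ weights)   # weighted sum fed into f(x)
print(round(x, 2))            # 21.34
```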
65/104
Training the neural network
Training data (fields → class):
1.4 2.7 1.9 → 0
3.8 3.4 3.2 → 0
6.4 2.8 1.7 → 1
4.1 0.1 0.2 → 0
etc …
66/104
Feed a training instance through to get the output: inputs (1.4, 2.7, 1.9) → output 0.8
67/104
Compare with the target output: inputs (1.4, 2.7, 1.9) → output 0.8, target 0, error 0.8
68/104
Adjust the weights based on the error: inputs (1.4, 2.7, 1.9) → output 0.8, target 0, error 0.8
69/104
Feed the next instance through to get the output: inputs (6.4, 2.8, 1.7) → output 0.9
70/104
Compare with the target output: inputs (6.4, 2.8, 1.7) → output 0.9, target 1, error -0.1
71/104
Adjust the weights based on the error: inputs (6.4, 2.8, 1.7) → output 0.9, target 1, error -0.1
72/104
And so on ….
Repeat this thousands, maybe millions of times – each time taking a random training instance, and making slight weight adjustments. Algorithms for weight adjustment are designed to make changes that will reduce the error.
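A minimal sketch of this loop, assuming NumPy; the one-hidden-layer network, layer sizes, and learning rate are illustrative, and the weight adjustments are standard backpropagation on the squared error:

```python
import numpy as np

X = np.array([[1.4, 2.7, 1.9], [3.8, 3.4, 3.2],
              [6.4, 2.8, 1.7], [4.1, 0.1, 0.2]])   # the slide's training data
t = np.array([0.0, 0.0, 1.0, 0.0])                 # target classes

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(3, 4))            # input -> hidden weights
W2 = rng.normal(scale=0.5, size=4)                 # hidden -> output weights
lr = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(100_000):                        # "thousands, maybe millions"
    i = rng.integers(len(X))                       # random training instance
    h = sigmoid(X[i] @ W1)                         # hidden activations
    y = sigmoid(h @ W2)                            # network output
    err = y - t[i]                                 # compare with target
    # Slight weight adjustments that reduce the error (backpropagation).
    dW2 = err * y * (1 - y) * h
    dW1 = np.outer(X[i], err * y * (1 - y) * W2 * h * (1 - h))
    W2 -= lr * dW2
    W1 -= lr * dW1
```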
73/104
The decision boundary perspective…
Initial random weights
74/104
The decision boundary perspective…
Present a training instance / adjust the weights
75/104
The decision boundary perspective…
Eventually ….
76/104
The point I am trying to make
• weight-learning algorithms for NNs are dumb
• they work by making thousands and thousands of tiny adjustments, each making the network do better at the most recent pattern, but perhaps a little worse on many others
• but, by dumb luck, eventually this tends to be good enough to
learn effective classifiers for many real applications
77/104
Some other points
If f(x) is non-linear, a network with 1 hidden layer can, in theory, learn perfectly any classification problem. A set of weights exists that can produce the targets from the inputs. The problem is finding them.
78/104
Some other ‘by the way’ points
If f(x) is linear, the NN can only draw straight decision boundaries (even if there are many layers of units)
79/104
Some other ‘by the way’ points
NNs use nonlinear f(x) so they can draw complex boundaries, but keep the data unchanged
80/104
Some other ‘by the way’ points
NNs use nonlinear f(x) so they can draw complex boundaries, but keep the data unchanged. SVMs only draw straight lines, but they transform the data first in a way that makes that OK.
81/104
Feature detectors
82/104
what is this unit doing?
83/104
Hidden layer units become self-organised feature detectors
[Figure: map of incoming weight strengths for hidden units (1–63) across the input pixels (1, 5, 10, 15, 20, 25, …)]
84/104
What does this unit detect?
[Figure: the unit's incoming weight strengths across the input pixels]
85/104
What does this unit detect?
[Figure: the unit's incoming weight strengths across the input pixels]
it will send a strong signal for a horizontal line in the top row, ignoring everything else
86/104
What does this unit detect?
[Figure: the unit's incoming weight strengths across the input pixels]
87/104
What does this unit detect?
[Figure: the unit's incoming weight strengths across the input pixels]
Strong signal for a dark area in the top left corner
88/104
What features might you expect a good NNto learn, when trained with data like this?
89/104
vertical lines
90/104
Horizontal lines
91/104
Small circles
92/104
Small circles
But what about position invariance?
Our example unit detectors were tied to specific parts of the image
93/104
successive layers can learn higher-level features …
[Figure: first-layer units detect lines in specific positions; higher-level detectors combine them: “horizontal line”, “RHS vertical line”, “upper loop”, etc. …]
94/104
successive layers can learn higher-level features …
[Figure: first-layer units detect lines in specific positions; higher-level detectors combine them: “horizontal line”, “RHS vertical line”, “upper loop”, etc. …]
What does this unit detect?
95/104
So: multiple layers make sense
Your brain works that way
96/104
So: multiple layers make sense
Many-layer neural network architectures should be capable of learning the true underlying features and ‘feature logic’, and therefore generalise very well …
97/104
But, until very recently, our weight-learning algorithms simply did not work on multi-layer architectures
98/104
The new way to train multi-layer NNs…
99/104
The new way to train multi-layer NNs…
Train this layer first
100/104
The new way to train multi-layer NNs…
Train this layer first
then this layer
then this layer
then this layer
finally this layer
101/104
Final layer trained to predict class based on outputs from previous layers
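A minimal sketch of this greedy layer-wise scheme, assuming PyTorch; the layer sizes, toy data, and the use of autoencoders for the unsupervised stage are illustrative assumptions, not the slides' exact recipe:

```python
import torch
import torch.nn as nn

def pretrain_layer(layer, data, epochs=50):
    # Train 'layer' plus a throwaway decoder to reconstruct its own input.
    decoder = nn.Linear(layer.out_features, layer.in_features)
    opt = torch.optim.Adam(list(layer.parameters()) + list(decoder.parameters()))
    for _ in range(epochs):
        recon = decoder(torch.sigmoid(layer(data)))
        loss = nn.functional.mse_loss(recon, data)
        opt.zero_grad(); loss.backward(); opt.step()
    return torch.sigmoid(layer(data)).detach()  # input for the next layer

X = torch.randn(256, 64)                 # toy data (hypothetical)
y = (X.sum(dim=1) > 0).float()           # toy labels (hypothetical)
sizes = [64, 32, 16]

layers, h = [], X
for n_in, n_out in zip(sizes, sizes[1:]):   # train this layer first, then the next…
    layer = nn.Linear(n_in, n_out)
    h = pretrain_layer(layer, h)
    layers.append(layer)

# Finally: train the output layer to predict the class from the top-level features.
clf = nn.Linear(sizes[-1], 1)
opt = torch.optim.Adam(clf.parameters())
for _ in range(200):
    loss = nn.functional.binary_cross_entropy_with_logits(clf(h).squeeze(1), y)
    opt.zero_grad(); loss.backward(); opt.step()
```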
102/104
And that’s that
• That’s the basic idea
• There are many, many types of deep learning: different kinds of autoencoder, variations on architectures and training algorithms, etc…
• Very fast growing area …
103/104
DL Application
104/104
DL Summary
• Much recent excitement, still much to be discovered
• "Google-Brain"• Sum of Products Nets• Biological Plausibility• Potential for significant improvements• Good in structured spaces
– Important research question: To what extent can we use Deep Learning in more arbitrary feature spaces?
– Recent deep training of MLPs with BP shows potential in this area