Page 1: Introduction to  Pattern Recognition

Introduction to Pattern Recognition

For the Korean Information Science Society (정보과학회), Pattern Recognition Winter School

February 2011

Jin Hyung Kim (김진형), Dept. of Computer Science, KAIST, http://ai.kaist.ac.kr/~jkim

Page 2: Introduction to  Pattern Recognition

2

What is Pattern Recognition?

A pattern is an object, process or event that can be given a name.

Pattern Recognition: the assignment of a physical object or event to one of several prespecified categories -- Duda & Hart

A subfield of Artificial Intelligence

human intelligence is based on pattern recognition

Page 3: Introduction to  Pattern Recognition

3

Examples of Patterns

Page 4: Introduction to  Pattern Recognition

4

Pattern Recognition

Related fields: machine learning, mathematical statistics, neural networks, signal processing, robotics and vision, cognitive science, nonlinear optimization, exploratory data analysis, fuzzy and genetic algorithms, detection and estimation theory, formal languages, structural modeling, biological cybernetics, computational neuroscience, …

Application areas: image processing/segmentation, computer vision, speech recognition, automated target recognition, optical character recognition, seismic analysis, man-machine dialogue, fingerprint identification, industrial inspection, medical diagnosis, ECG signal analysis, data mining, gene sequence analysis, protein structure analysis, remote sensing, aerial reconnaissance, …

Page 5: Introduction to  Pattern Recognition

5

Example Applications of Pattern Recognition

Computer-aided diagnosis

Medical imaging, EEG, ECG, X-ray mammography

Image recognition: factory automation, robot navigation, face identification, gesture recognition, automatic target recognition

Speech recognition: speaker identification, speech recognition, Google Maps Navigation (Beta): search by voice

Page 6: Introduction to  Pattern Recognition

6

Biometric Recognition (생체 인식)

Identifying people by invariant biometric features

Static patterns: fingerprint, iris, face, palm print, …, DNA

Dynamic patterns: signature, voiceprint, typing patterns

Uses: access control, e-commerce authentication

Applications of Pattern Recognition

Page 7: Introduction to  Pattern Recognition

7

Gesture Recognition

Text editing on pen computers

Tele-operations: remote control by gesture input, TV control by hand motion

Sign language interpretation

Pipeline: camera, 2D projection, hand tracking, gesture spotting

Page 8: Introduction to  Pattern Recognition

8

Extracting Patterns from Data: Data Mining

Demographics, point-of-sale data, ATM records, financial statistics, credit information, documents, intelligence data, medical records, physical examination records

Data → Information → Decision

80% of buyers of product A also buy product B (CRM); US car purchasing power declined for 6 months; sales growth of product A is twice that of product B; dehydration symptoms signal danger

What advertising strategy? How should products be displayed? What is the optimal budget allocation? How can market share be expanded? How can customer churn be prevented? What treatment?

Korean example: preventing the use of lost credit cards by learning card-usage patterns

Page 9: Introduction to  Pattern Recognition

9

e-Book, Tablet PC, iPad, Smart-phone


Page 10: Introduction to  Pattern Recognition

Smart Phone with Rich Sensors

Page 11: Introduction to  Pattern Recognition

Comparison of Online Hangul Recognizers

11


Page 12: Introduction to  Pattern Recognition

KAIST Math Expression Recognizer : Demo

12


Page 13: Introduction to  Pattern Recognition

MathTutor-SE Demo

13


Page 14: Introduction to  Pattern Recognition

14

Historical Document Recognition: the Seungjeongwon Ilgi (承政院日記)


Page 15: Introduction to  Pattern Recognition

15

Document Recognition: Verification & Correction

Interface

Page 16: Introduction to  Pattern Recognition

16

Mail Sorter

Page 17: Introduction to  Pattern Recognition

Scene Text Recognition

17


Page 18: Introduction to  Pattern Recognition

18

Autonomous Land Vehicle

(DARPA’s GrandChallenge contest)

http://www.youtube.com/watch?v=yQ5U8suTUw0


Page 19: Introduction to  Pattern Recognition

19

Protein Structure Analysis

Page 20: Introduction to  Pattern Recognition

Protein Structure Analysis

20


Page 21: Introduction to  Pattern Recognition

21

Types of PR problems

Classification: assigning an object to a class. Output: a class label. Ex: classifying a product as 'good' or 'bad' in quality control.

Clustering: organizing objects into meaningful groups. Output: a (hierarchical) grouping of objects. Ex: taxonomy of species.

Regression: predicting a value based on observations. Ex: predicting stock prices; forecasting.

Description: representing an object in terms of a series of primitives. Output: a structural or linguistic description. Ex: labeling ECG signals, video indexing, protein structure indexing.

From Ricardo Gutierrez-Osuna, Texas A&M Univ.

Page 22: Introduction to  Pattern Recognition

22

Pattern Class

A collection of “similar” (not necessarily identical) objects

Inter-class variability

Intra-class variability

Pattern Class Model: a description of each class/population (e.g., a probability density such as a Gaussian)

Page 23: Introduction to  Pattern Recognition

23

Classification vs Clustering

Classification (known categories) vs. Clustering (creation of new categories)

Category "A", Category "B"

Classification (Recognition): supervised classification

Clustering: unsupervised classification

Page 24: Introduction to  Pattern Recognition

24

Pattern Recognition : Key Objectives

Process the sensed data to eliminate noise

Data vs Noise

Hypothesize models that describe each class population

Then we may recover the process that generated the patterns.

Choose the best-fitting model for given sensed data to assign the class label associated with the model.

Page 25: Introduction to  Pattern Recognition

25

A Typical Classification Pipeline

Sensor → signal → Feature Extractor → feature → Classifier → class membership

Page 26: Introduction to  Pattern Recognition

26

Example : Salmon or Sea Bass

Sort incoming fish on a belt according to two classes:

Salmon or Sea Bass

Steps: preprocessing (segmentation), feature extraction (measuring features or properties), classification (making the final decision)

Page 27: Introduction to  Pattern Recognition

27

Sea bass vs Salmon (by Image)

Length, lightness, width, number and shape of fins, position of the mouth, …

Page 28: Introduction to  Pattern Recognition

28

Salmon vs. Sea Bass (by length)

Page 29: Introduction to  Pattern Recognition

29

Salmon vs. Sea Bass (by lightness)

Best Decision Strategy with lightness

Page 30: Introduction to  Pattern Recognition

30

Cost of Misclassification

There are two possible classification errors: (1) deciding a sea bass is a salmon; (2) deciding a salmon is a sea bass.

Which error is more important? This is generalized as a loss function; we then look for the decision of minimum risk.

Risk = Expected Loss

Loss function (rows: decision, columns: truth):

                 Salmon   Sea bass
Salmon              0       -10
Sea bass          -20         0
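The minimum-risk decision can be sketched as follows. The cost magnitudes (10 and 20) follow the slide's table, stated here as positive costs, and the posterior probabilities are made-up numbers for illustration:

```python
# Minimum-risk decision: pick the class whose expected loss is lowest.
# Hypothetical costs: mistaking a sea bass for a salmon costs 10,
# mistaking a salmon for a sea bass costs 20 (magnitudes from the slide).
LOSS = {                     # LOSS[decision][truth]
    "salmon":   {"salmon": 0,  "sea_bass": 10},
    "sea_bass": {"salmon": 20, "sea_bass": 0},
}

def min_risk_decision(posterior):
    """posterior: dict mapping true class -> P(class | x)."""
    def risk(decision):
        # Risk = expected loss under the posterior.
        return sum(LOSS[decision][truth] * p for truth, p in posterior.items())
    return min(LOSS, key=risk)

print(min_risk_decision({"salmon": 0.6, "sea_bass": 0.4}))
```

Because misclassifying a salmon is twice as costly here, the rule is biased toward deciding "sea bass" unless the salmon posterior is comfortably high.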

Page 31: Introduction to  Pattern Recognition

31

Classification with more features(by length and lightness)

It is possibly better.

Really ??

Page 32: Introduction to  Pattern Recognition

32

How Many Features and Which?

The choice of features determines the success or failure of the classification task. For a given feature, we can compute the best decision strategy from the (training) data.

This is called training, parameter adaptation, or learning: these are machine learning issues.

Page 33: Introduction to  Pattern Recognition

Issues with feature extraction:

Correlated features do not improve performance. It might be difficult to extract certain features. It might be computationally expensive to extract many features. The "curse" of dimensionality …

33

Page 34: Introduction to  Pattern Recognition

Feature and Feature Vector

34

x = (length, lightness, width, number and shape of fins, position of the mouth, …)

Page 35: Introduction to  Pattern Recognition

Goodness of Feature

35

Features and separability

Page 36: Introduction to  Pattern Recognition

36

Developing PR system

Sensors and preprocessing. Feature extraction aims to create discriminative features good for classification. A classifier. A teacher provides information about the hidden state (supervised learning). A learning algorithm builds the PR system from training examples.

Diagram: pattern → sensors and preprocessing → feature extraction → classifier → class assignment, with a learning algorithm and teacher guiding the components.

Page 37: Introduction to  Pattern Recognition

37

PR Approaches

Template matching: the pattern to be recognized is matched against a stored template.

Statistical PR: based on an underlying statistical model of patterns (features) and pattern classes.

Structural PR (syntactic pattern recognition): pattern classes are represented by formal structures such as grammars, automata, strings, etc. Used not only for classification but also for description.

Neural networks: the classifier is represented as a network of cells modeling neurons of the human brain (connectionist approach). Knowledge is stored in the connectivity and strength of synaptic weights.

Statistical structure analysis: combines structural and statistical analysis, using probabilistic frameworks such as Bayesian networks and MRFs.

… (Modified from Vojtěch Franc)

Page 38: Introduction to  Pattern Recognition

38

Template Matching

Template

Input scene


Page 39: Introduction to  Pattern Recognition

39

Deformable Template Matching: Snake

Prototype registration to the low-level segmented image

Shape training set; prototype and variation learning; prototype warping

Example : Corpus Callosum Segmentation


Page 40: Introduction to  Pattern Recognition

40

From Ricardo Gutierrez-Osuna, Texas A&M Univ.

Page 41: Introduction to  Pattern Recognition

41

Classifier

The task of a classifier is to partition the feature space into class-labeled decision regions.

Borders between decision regions are decision boundaries. Classification determines the decision region of a feature vector x.

Page 42: Introduction to  Pattern Recognition

42

Representation of classifier

A classifier is typically represented as a set of discriminant functions

G_i(x) : X → R,  i = 1, …, |Y|

The classifier assigns a feature vector x to the i-th class if

G_i(x) > G_j(x)  for all j ≠ i

Feature vector x → discriminant functions G_1(x), G_2(x), …, G_|Y|(x) → max → class identifier y

From Vojtěch Franc
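As a minimal sketch, the discriminant-function classifier reduces to an argmax over the G_i. The two linear discriminants below are hypothetical, not taken from the slides:

```python
# Assign x to the class whose discriminant function G_i(x) is largest.
def classify(x, discriminants):
    """discriminants: list of functions G_i; returns the argmax index i."""
    scores = [G(x) for G in discriminants]
    return scores.index(max(scores))

# Two made-up linear discriminants G_i(x) = w_i . x + b_i.
G = [
    lambda x: 1.0 * x[0] + 0.0 * x[1],        # G_1
    lambda x: 0.5 * x[0] + 0.5 * x[1] - 1.0,  # G_2
]
print(classify([2.0, 1.0], G))
```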

Page 43: Introduction to  Pattern Recognition

43

Classification of Classifiers by Form of Discriminant Function G_i(x)

Discriminant function → classifier:

A posteriori probability P(y_i | x) → Bayesian

Linear function → Linear Discriminant Analysis, Support Vector Machine

Non-linear function → Non-linear Discriminant Analysis

Output of artificial neuron → Artificial Neural Network

Page 44: Introduction to  Pattern Recognition

44

Bayesian Decision Making

A statistical approach: the optimal classifier with minimum error. Assumes the complete statistical model is known.

Decision given the posterior probabilities, where x is an observation:

if P(ω1 | x) > P(ω2 | x), decide state of nature = ω1
if P(ω1 | x) < P(ω2 | x), decide state of nature = ω2

Page 45: Introduction to  Pattern Recognition

45

Searching Decision Boundary

Page 46: Introduction to  Pattern Recognition

46

Bayes' Rule: from P(x | ωi) to P(ωi | x)

P(ωi | x) = P(x | ωi) P(ωi) / P(x) = P(x | ωi) P(ωi) / Σj P(x | ωj) P(ωj)
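A small numeric sketch of this rule; all likelihoods and priors below are made-up numbers:

```python
# Bayes' rule: turn class-conditional likelihoods P(x|w_i) and priors P(w_i)
# into posteriors P(w_i|x); the evidence P(x) is the sum over all classes.
def posteriors(likelihoods, priors):
    joint = [l * p for l, p in zip(likelihoods, priors)]
    evidence = sum(joint)
    return [j / evidence for j in joint]

# Hypothetical numbers: P(x|w1)=0.6, P(x|w2)=0.2, priors 0.3 / 0.7.
post = posteriors([0.6, 0.2], [0.3, 0.7])
print(post)
```

Note that the less likely prior class can still win the posterior when its likelihood is high enough, which is exactly what the decision rule above exploits.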

Page 47: Introduction to  Pattern Recognition

47

Limitations of Bayesian approach

The statistical model p(x,y) is mostly not known; learning estimates p(x,y) from training examples {(x1, y1), …, (xℓ, yℓ)}.

Usually p(x,y) is assumed to have a parametric form.

Ex: multivariate normal distribution

Non-parametric estimation of p(x,y) requires a large set of training samples. Non-Bayesian methods offer equally good results (??)

From Vojtěch Franc

Page 48: Introduction to  Pattern Recognition

48

Polynomial Discriminant Function Approaches

Assume that G(x) is a polynomial function:

Linear function: Linear Discriminant Analysis (LDA)
Quadratic function

Classifier design is the determination of the separating hyperplane.

From Vojtěch Franc

Page 49: Introduction to  Pattern Recognition

49

LDA Example: Separating Jockeys (J) from Basketball Players (H)

Task: separate jockeys (J) from basketball players (H) using height and weight.

The set of hidden states is Y = {H, J}; the feature space is X = R².

Training examples: {(x1, y1), …, (xℓ, yℓ)}

Linear classifier:

q(x) = H if w·x + b ≥ 0
q(x) = J if w·x + b < 0

The decision boundary is the line w·x + b = 0.

From Vojtěch Franc
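The linear rule q(x) can be sketched directly; the weights and bias below are illustrative placeholders, not values learned from data:

```python
# Linear classifier from the slide: q(x) = H if w.x + b >= 0 else J.
def q(x, w, b):
    s = sum(wi * xi for wi, xi in zip(w, x))
    return "H" if s + b >= 0 else "J"

# Hypothetical weights: height and weight as features; tall/heavy -> H.
w, b = (0.5, 0.5), -120.0
print(q((200, 100), w, b), q((150, 50), w, b))
```

Training (e.g., LDA or the perceptron) would choose w and b from the examples; here they are fixed by hand to show the decision rule itself.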

Page 50: Introduction to  Pattern Recognition

50

Artificial Neural Network Design

For a given structure, find the weight set w that minimizes the sum-of-squares error J(w) over training examples {(x1, y1), …, (xℓ, yℓ)}:

J(w) = (1/2) Σ_{k=1}^{ℓ} (t_k − z_k)²
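The error J(w) itself is straightforward to evaluate; the target and output values below are made-up:

```python
# Sum-of-squares error J(w) = 1/2 * sum_k (t_k - z_k)^2 between targets t_k
# and network outputs z_k; training searches for weights that minimize it.
def sse(targets, outputs):
    return 0.5 * sum((t - z) ** 2 for t, z in zip(targets, outputs))

print(sse([1.0, 0.0, 1.0], [0.8, 0.1, 0.6]))
```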

Page 51: Introduction to  Pattern Recognition

51

PR design cycle

Data collection: probably the most time-intensive component of a project. How many examples are enough?

Feature choice: critical to the success of the PR project. Requires basic prior knowledge and engineering sense.

Model choice and design: statistical, neural, and structural; parameter settings.

Training: given a feature set and a 'blank' model, adapt the model to explain the training data. Supervised, unsupervised, reinforcement learning.

Evaluation: how well does the trained model do? Overfitting vs. generalization.

Page 52: Introduction to  Pattern Recognition

52

Learning for PR system

Which Feature is good for classifying given classes ?

Feature analysis

Can we get required probabilities or boundaries ?

Learning from training Data

Diagram: pattern → sensors and preprocessing → feature extraction → classifier → class assignment, with a learning algorithm and teacher guiding the components.

Page 53: Introduction to  Pattern Recognition

53

Learning

A change in the contents and organization of a system's knowledge enabling it to improve its performance on a task (Simon), as the system acquires new knowledge from its environment.

Learning from observation: from trivial memorization to the creation of scientific theories. Inductive inference: a new consistent interpretation of data (observations); general conclusions from examples; inferring associations between input and output with some confidence.

Data mining: learning rules from large data sets.

The availability of large databases allows applying machine learning to real problems.

Page 54: Introduction to  Pattern Recognition

54

Learning Algorithm Categorization by Available Feedback

Supervised learning: examples of correct input/output pairs are available (induction).

Unsupervised learning: no hint at all about the correct outputs; clustering or consistent interpretation.

Reinforcement learning: receives no examples, but rewards or punishments at the end.

Semi-supervised learning: training with both labeled and unlabeled examples.

Page 55: Introduction to  Pattern Recognition

55

Issues on Learning Algorithm

Prior knowledge: prior knowledge can help in learning, e.g., assumptions on parametric forms and ranges of values.

Incremental learning: update old knowledge whenever a new example arrives.

Batch learning: apply the learning algorithm to the entire set of examples.

Analytic approach: find the optimal parameter values by analysis. Iterative adaptation: improve parameter values from an initial guess.

Page 56: Introduction to  Pattern Recognition

56

Learning Algorithms

General idea: tweak parameters so as to optimize a performance criterion. In the course of learning, the parameter vector traces a path that (hopefully) ends at the best parameter vector.

Page 57: Introduction to  Pattern Recognition

57

Inductive Learning

For given training examples (correct input-output pairs), recover the unknown underlying function from which the training data were generated.

Generalization ability for unseen data is required.

Forms of the function: logical sentences, polynomials, sets of weights (neural networks), …

Given the form of the function, adjust its parameters to minimize error.

Page 58: Introduction to  Pattern Recognition

58

Theory of Inductive Inference

Inductive bias: constraints on the hypothesis space. A table of all observations is not a choice.

Restricted hypothesis-space biases; preference biases.

Occam's razor (Ockham): the simplest hypothesis is best.

Concept C ⊆ X. Examples are given as (x, y), where x ∈ X and y = 1 if x ∈ C, y = 0 if x ∉ C. Find F such that F(x) = 1 if x ∈ C and F(x) = 0 if x ∉ C.

Page 59: Introduction to  Pattern Recognition

59

Consistent hypotheses

William of Ockham (also Occam), 1285-1349, English scholastic philosopher.

Prefer the simplest hypothesis consistent with the data. Defining 'simple' is not easy.

Tradeoff between complexity of hypothesis and degree of fit

Page 60: Introduction to  Pattern Recognition

60

Model Complexity

Decision boundary of salmon vs. sea bass: which is better, A or B?

Page 61: Introduction to  Pattern Recognition

61

Model Complexity

We can get perfect classification performance on the training data by choosing complex models.

Issue of generalization

Page 62: Introduction to  Pattern Recognition

62

Generalization

The main goal of a pattern classification system is to predict the class of objects yet unseen: generalization.

Some complex decision boundaries are not good at generalization; some simple boundaries are not good either.

The tradeoff between performance and simplicity is the core of statistical pattern recognition.

Page 63: Introduction to  Pattern Recognition

63

Generalization Strategy

How can we improve generalization performance ?

More training examples (i.e., better pdf estimates).

Simpler models (i.e., simpler classification boundaries) usually yield better performance.

Simplify the decision boundary!

Page 64: Introduction to  Pattern Recognition

64

Overfitting and underfitting

(figure: underfitting, good fit, overfitting)

The problem of generalization: a small empirical risk R_emp does not imply a small true expected risk R.

From Vojtěch Franc

Page 65: Introduction to  Pattern Recognition

65

Curse of Dimensionality

Increasing the number of functions reduces error and improves classifier performance on the training data.

With a limited amount of training data, however, increasing the number of features reduces generalization ability; the amount of training data required for adequate generalization grows rapidly with feature dimension.

For a finite set of training data, finding the optimal set of features is a difficult problem.

Page 66: Introduction to  Pattern Recognition

66

Maximize outcomes from two slot machines of unknown return rates

How many coins should be spent to find the better machine?

Two Slot Machine Problem

Page 67: Introduction to  Pattern Recognition

67

Optimal Number of Cells (example)

Page 68: Introduction to  Pattern Recognition

68

Implications of the Curse of Dimensionality for PR System Design

With finite training samples, be cautious of adding features

Use features of high discrimination power first

Feature analysis is mandatory

Simple neural networks are generally better:

a small number of hidden nodes and links

Tips for structure simplification: parameter tying; eliminating links during learning

Page 69: Introduction to  Pattern Recognition

69

Cross-Validation

Validate the learned model on a different set to assess generalization performance, guarding against overfitting.

Partition the training set into an estimation subset for learning parameters and a validation subset.

Cross-validation is used for best model selection and to determine when to stop training.

Leave-one-out validation: N−1 samples for training, 1 for validation, taking turns; overcomes a small training set.
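Leave-one-out validation can be sketched as follows; a 1-nearest-neighbour rule stands in for the learner (an assumption; any train/predict pair would do):

```python
# 1-NN prediction on 1-D data: each sample is a (value, label) pair.
def nn_predict(train, x):
    return min(train, key=lambda s: abs(s[0] - x))[1]

# Leave-one-out: train on N-1 samples, test on the held-out one, average.
def loo_accuracy(data):
    hits = 0
    for i in range(len(data)):
        held_out = data[i]
        train = data[:i] + data[i + 1:]
        hits += nn_predict(train, held_out[0]) == held_out[1]
    return hits / len(data)

data = [(0.1, "a"), (0.2, "a"), (0.3, "a"), (0.9, "b"), (1.0, "b")]
print(loo_accuracy(data))
```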

Page 70: Introduction to  Pattern Recognition

70

Unsupervised learning. Input: training examples {x1, …, xℓ} without information about the hidden state.

Clustering: the goal is to find clusters of data sharing similar properties.

{x1, …, xℓ} → Learning algorithm → θ → Classifier q : X × Θ → Y → {y1, …, yℓ}

A broad class of unsupervised learning algorithms.

From Vojtěch Franc

Page 71: Introduction to  Pattern Recognition

71

Example of unsupervised learning algorithm

k-Means clustering:

Classifier: q(x) = argmin_{i=1,…,k} ||x − m_i||

Goal: minimize Σ_{i=1}^{ℓ} ||x_i − m_{q(x_i)}||²

Learning algorithm: m_i = (1/|I_i|) Σ_{j ∈ I_i} x_j,  where I_i = {j : q(x_j) = i}

Input {x1, …, xℓ}; parameters θ = {m1, …, mk}; outputs {y1, …, yℓ}.

From Vojtěch Franc
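A minimal 1-D sketch of the two alternating steps (assignment and mean update); the data points and initial means are made-up:

```python
# k-means sketch in 1-D: alternate between assigning each point to its
# nearest mean (the classifier q) and recomputing each cluster mean,
# which decreases the objective sum_i ||x_i - m_q(x_i)||^2.
def kmeans(xs, means, iters=10):
    for _ in range(iters):
        clusters = [[] for _ in means]
        for x in xs:
            nearest = min(range(len(means)), key=lambda j: (x - means[j]) ** 2)
            clusters[nearest].append(x)
        # Keep the old mean if a cluster ends up empty.
        means = [sum(c) / len(c) if c else means[j]
                 for j, c in enumerate(clusters)]
    return means

print(kmeans([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], [0.0, 6.0]))
```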

Page 72: Introduction to  Pattern Recognition

Other Issues in Pattern Recognition

Page 73: Introduction to  Pattern Recognition

73

Difficulty of Class Modeling

Page 74: Introduction to  Pattern Recognition

74

Context Processing Is Essential for Recognition

Page 75: Introduction to  Pattern Recognition

75

Context Processing in Character Recognition

Without context, the human recognition rate for English handwriting is about 95%.

Page 76: Introduction to  Pattern Recognition

76

Global Consistency

Local decision is not enough

Page 77: Introduction to  Pattern Recognition

77

Combining Multiple Classifiers

Approaches for improving the performance of the group of experts

Best single classifier vs. combining multiple classifiers: two heads (experts, classifiers) are better than one.

Classifier output is either the best (single) class, a ranking, or a score for each class.

Methods for generating multiple classifiers: correlated classifiers would not help.

Methods for combining multiple classifiers: majority rule, Borda count, decorrelated combination, etc.

Page 78: Introduction to  Pattern Recognition

78

Evaluating Pattern Recognition Performance

Confusion counts (recognition result vs. ground truth):

                   truth A    truth not-A
recognized a          p            s
recognized not-a      r            q

Recognition rate = (p+q)/(p+q+r+s)
Error rate = (r+s)/(p+q+r+s)
Miss detection = r/(p+r)
False alarm = s/(p+s)
Recall = p/(p+r)
Precision = p/(p+s)
Rejection rate (refusing to make a decision); throughput

Case A: 20% rejected, with 0.5% error on the rest. Case B: 10% rejected, with 1.0% error. Which is better?
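The counting measures above can be computed directly from the four cells; the counts below are made-up:

```python
# Metrics from the slide's 2x2 table: p = true 'a' recognized as 'a',
# q = true not-a correctly rejected, r = missed 'a', s = false alarms.
def metrics(p, q, r, s):
    total = p + q + r + s
    return {
        "recognition_rate": (p + q) / total,
        "error_rate": (r + s) / total,
        "recall": p / (p + r),
        "precision": p / (p + s),
    }

print(metrics(p=90, q=85, r=10, s=15))
```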

Page 79: Introduction to  Pattern Recognition

79

Improving Pattern Recognition Performance

(figure: performance vs. time and effort, approaching 100%)

Page 80: Introduction to  Pattern Recognition

Appendix

Page 81: Introduction to  Pattern Recognition

81

Resources

Professional association: International Association for Pattern Recognition (IAPR)

Text books: Pattern Classification by Richard O. Duda, Peter E. Hart, and David G. Stork

Journals: IEEE Transactions on Pattern Analysis and Machine Intelligence; Pattern Recognition; Pattern Recognition Letters; Artificial Intelligence and Pattern Recognition; …

Conferences and workshops: International Conference on Pattern Recognition; Int'l Conference on Document Analysis and Recognition; Int'l Workshop on Frontiers in Handwriting Recognition; IEEE Computer Vision and Pattern Recognition; …