
Confusion Matrices for Improving Performance of Feature Pattern Classifier Systems


Ignas Kukenys, Will N. Browne, Mengjie Zhang. "Confusion Matrices for Improving Performance of Feature Pattern Classifier Systems". IWLCS, 2011


Feature Pattern Classifier System Handwritten Digit Classification with LCS

Evolutionary Computation Research Group

Ignas Kukenys, Victoria University of Wellington (now University of Otago), [email protected]
Will N. Browne, Victoria University of Wellington, [email protected]
Mengjie Zhang, Victoria University of Wellington, [email protected]


Context

• Machine learning for robotics:
  • Needs to be reinforcement-based and online
  • Preferably also adaptive and transparent

• Learning from visual input is hard:
  • High dimensionality vs. sparseness of data

• Why Learning Classifier Systems?
  • Robust reinforcement learning
  • Limited applications for visual input so far


Goals

• Adapt LCS to learn from image data
• Use image features that enable generalisation
• Tweak the evolutionary process
• Use a well-known vision problem for evaluation

• Build a classifier system for handwritten digit classification


Learning Classifier Systems

• LCS model an agent interacting with an unknown environment:
  • The agent observes a state of the environment
  • The agent performs an action
  • The environment provides a reward

• This contract constrains learning:
  • Online: one problem instance at a time
  • Ground truth not available (not supervised)



Basics of LCS

• LCS evolve a population of rules: if condition(s) then action

• Each rule also has associated properties:
  • Predicted reward for the advocated action
  • Accuracy based on prediction error
  • Fitness based on relative accuracy


Simple rule conditions

• Traditionally, LCS use a 'don't care' (#) encoding:
  • e.g. condition #1# matches states 010, 011, 110 and 111
  • Enables rules to generalise over multiple states
• Varying levels of generalisation:
  • ### matches all possible states
  • 010 matches a single specific state
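Ternary matching is simple to implement; a minimal sketch (the function name `matches` is mine, not from the slides):

```python
def matches(condition: str, state: str) -> bool:
    """A condition matches a binary state when every non-'#' character equals the state bit."""
    return len(condition) == len(state) and all(
        c == '#' or c == s for c, s in zip(condition, state)
    )

# '#1#' matches exactly the 3-bit states whose middle bit is 1.
middle_one = [s for s in ('000', '001', '010', '011', '100', '101', '110', '111')
              if matches('#1#', s)]
```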


Naïve image classification

• Consider binary 3x3 pixel patterns:
  • How to separate them into two classes based on the colour of the centre point?


Naïve image classification

• Environment states: 9-bit messages
  • e.g. 011100001 and 100010101
• Two actions represent the two classes: 0 and 1
• Two rules are sufficient to solve the problem:
  [### #0# ###] → 0
  [### #1# ###] → 1
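Flattening each 3x3 pattern row by row, the two rules become nine-character ternary conditions over the bit at index 4 (the centre). A small sketch of the resulting classifier (helper names are mine):

```python
def matches(condition: str, state: str) -> bool:
    """Ternary match: '#' is a wildcard, any other character must equal the state bit."""
    return all(c == '#' or c == s for c, s in zip(condition, state))

# [### #0# ###] -> 0 and [### #1# ###] -> 1, written as flat 9-bit conditions.
RULES = [('####0####', 0), ('####1####', 1)]

def classify(state: str) -> int:
    """Return the action of the first matching rule (here: the centre pixel's colour)."""
    for condition, action in RULES:
        if matches(condition, state):
            return action
    raise ValueError('no rule matches ' + state)
```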


Naïve image classification

• Example 2: how to classify 3x3 patterns that have "a horizontal line of 3 white pixels"?
  [111 ### ###] → 1
  [### 111 ###] → 1
  [### ### 111] → 1

• Example 3: how to deal with 3x3 patterns with "at least one 0 on every row"?
  • 27 unique rules are needed to fully describe the problem
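The 27-rule count can be checked by enumeration: each maximally general rule fixes one 0-position in each of the three rows and leaves every other cell as '#', giving 3 × 3 × 3 = 27 conditions. A sketch (all names are mine, not from the slides):

```python
from itertools import product

def matches(condition: str, state: str) -> bool:
    """Ternary match: '#' is a wildcard, any other character must equal the state bit."""
    return all(c == '#' or c == s for c, s in zip(condition, state))

# One rule per choice of a 0-position in each of the three rows,
# e.g. positions (0, 0, 0) gives the condition '0##0##0##'.
rules = []
for positions in product(range(3), repeat=3):
    cond = ''.join(
        '0' if col == positions[row] else '#'
        for row in range(3) for col in range(3)
    )
    rules.append(cond)

def has_zero_in_every_row(state: str) -> bool:
    """Direct definition of the target concept, for cross-checking."""
    return all('0' in state[r * 3:(r + 1) * 3] for r in range(3))

def covered(state: str) -> bool:
    return any(matches(r, state) for r in rules)
```

Checking `covered` against the direct definition over all 512 patterns confirms the 27 rules describe the concept exactly.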


Naïve image classification

• The number of rules explodes for complex patterns
  • Consider 256 pixel values for grey-scale images, ...
• Such conditions allow very limited generalisation
• Photographic and other "real world" images:
  • Differ significantly at the "pixel level"
  • Need more flexible conditions



Haar-like features

• Compute differences between pixel sums in rectangular regions of the image

• Very efficient to compute using the "integral image"
• Widely used in computer vision
  • e.g. the state-of-the-art Viola & Jones face detector

• Can be flexibly placed at different scales and positions in the image

• Enable varying levels of generalisation
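The integral image reduces any rectangular pixel sum to four array lookups, so a Haar-like feature (a difference of two such sums) costs a handful of operations regardless of its scale. A minimal sketch using plain nested lists (function names are mine):

```python
def integral_image(img):
    """Build ii with one extra row/column of zeros: ii[y][x] = sum of img[0..y-1][0..x-1]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the w-by-h rectangle at (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_feature(ii, x, y, w, h):
    """A simple two-rectangle Haar-like feature: left half minus right half."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)
```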


Haar-like feature rules

• To obtain LCS-like rules, feature outputs need to be thresholded:

  if (feature(type, position, scale) > threshold) then action

• Flexible direction of comparison: < and >
• Range: t_low < feature < t_high
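In code, such a condition is just an interval test on the feature's output, with the single-sided '<' and '>' comparisons as special cases. A hedged sketch (class and parameter names are mine):

```python
class FeatureCondition:
    """One thresholded feature: fires when t_low < feature(image) < t_high.

    Leaving t_low at -inf or t_high at +inf recovers the single-sided
    '<' and '>' comparisons.
    """
    def __init__(self, feature_fn, t_low=float('-inf'), t_high=float('inf')):
        self.feature_fn = feature_fn
        self.t_low = t_low
        self.t_high = t_high

    def matches(self, image) -> bool:
        return self.t_low < self.feature_fn(image) < self.t_high

# Toy feature for illustration: total intensity of the image.
total = lambda img: sum(sum(row) for row in img)
bright = FeatureCondition(total, t_low=10)            # feature > 10
mid_range = FeatureCondition(total, t_low=2, t_high=8)  # 2 < feature < 8
```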


“Messy” encoding

• Multiple features form stronger rules:

  if (feature_1 && feature_2 && feature_3 && ...) then action

• There seems to be a limit to the useful number of features per rule
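A "messy" rule is then a variable-length conjunction of such feature conditions. A sketch (all names mine), with each condition given as a boolean predicate over the image:

```python
class MessyRule:
    """A rule whose condition is an AND over a variable-length list of feature predicates."""
    def __init__(self, conditions, action):
        self.conditions = conditions  # list of callables: image -> bool
        self.action = action

    def matches(self, image) -> bool:
        return all(cond(image) for cond in self.conditions)

# Toy predicates over a 2x2 image, for illustration only.
total = lambda img: sum(sum(row) for row in img)
rule = MessyRule(
    [lambda img: total(img) > 4,      # thresholded "feature 1"
     lambda img: img[0][0] == 1],     # thresholded "feature 2"
    action=1,
)
```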


MNIST digits dataset

• Well-known handwritten digits dataset
• 60,000 training examples, 10 classes
• Examples from 250 subjects
• 28x28 pixel grey-scale (0..255) images
• 10,000 evaluation examples (test set, from different subjects)


MNIST results

• Performance:
  • Training set: 92% after 4M observations
  • Evaluation set: 91%

• Supervised and off-line methods reach 99%
• An encouraging initial result for reinforcement learning


Adaptive learning



Improving the FPCS

• Tournament selection
  • Performs better than proportionate (roulette-wheel) selection

• Crossover only at the feature level
  • Rules swap whole features, not individual attributes

• Features start at the "best" position, then mutate
  • Instead of a random position, place the feature where its output is highest

• With all of these fixes, performance is still at 94%


Why not 100% performance?

• Online reinforcement learning
  • Cannot adapt rules based on known ground truth

• Forms a complete map of all states to all actions and their rewards, e.g. learns "not a 3"
  • Rather than just the correct state:action mapping

• Only uses Haar-like features
  • Could use an ensemble of different feature types


Future work

• Inner confusion matrix to "guide" learning towards "hard" areas of the problem

• Test with a supervised-learning LCS, e.g. UCS
  • Only learn accurate positive rules, rather than a complete mapping

• How to deal with outliers?
• Testing on harder image problems will likely reveal further challenges


Confusion matrix
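A confusion matrix tabulates, for each true class, how the system's predictions are distributed over all classes; the off-diagonal cells show which digit pairs are hardest. A minimal sketch (function names are mine):

```python
def confusion_matrix(true_labels, predictions, n_classes=10):
    """cm[i][j] = number of examples of true class i that were predicted as class j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predictions):
        cm[t][p] += 1
    return cm

def hardest_confusions(cm, k=3):
    """Return the k largest off-diagonal cells as (count, true_class, predicted_class)."""
    cells = [(cm[i][j], i, j)
             for i in range(len(cm)) for j in range(len(cm))
             if i != j and cm[i][j] > 0]
    return sorted(cells, reverse=True)[:k]
```

Kept internally during training, such a matrix could be used to bias instance sampling towards the most-confused classes, which is one reading of the "inner confusion matrix" idea above.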



Conclusions

•  LCS can successfully work with image data.

•  They autonomously learn the number, type, scale and threshold of features to use, in a transparent manner.

•  Challenges remain to bridge the 5% gap to supervised-learning performance.


Demo

•  Handwritten digit classification with FPCS


Questions?


Basics of LCS

• For an observed state s, all conditions are tested
• Matching rules form the match set [M]
• For every action, a reward is predicted
• An action a is chosen (random vs. best)
• Rules in [M] advocating a form the action set [A]
• [A] is updated according to the reward received
• Rule discovery, e.g. a GA, is performed in [A] to evolve better rules
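The cycle above can be sketched in a few lines. This is a simplified, XCS-flavoured sketch, not the FPCS implementation; the class, the learning rate, and the exploration scheme are my assumptions:

```python
import random
from dataclasses import dataclass

@dataclass
class Rule:
    condition: str        # ternary string, '#' = don't care
    action: int
    prediction: float = 10.0
    fitness: float = 0.1

    def matches(self, state: str) -> bool:
        return all(c == '#' or c == s for c, s in zip(self.condition, state))

def lcs_step(population, state, reward_fn, explore=0.5, beta=0.2):
    """One match -> predict -> act -> update cycle.

    Assumes the population covers the state; rule discovery (GA) is omitted.
    """
    match_set = [r for r in population if r.matches(state)]          # [M]
    predictions = {}                                                 # fitness-weighted prediction per action
    for a in {r.action for r in match_set}:
        rules = [r for r in match_set if r.action == a]
        total_fitness = sum(r.fitness for r in rules)
        predictions[a] = sum(r.fitness * r.prediction for r in rules) / total_fitness
    if random.random() < explore:
        action = random.choice(sorted(predictions))                  # explore: random action
    else:
        action = max(predictions, key=predictions.get)               # exploit: best predicted
    action_set = [r for r in match_set if r.action == action]        # [A]
    reward = reward_fn(state, action)
    for r in action_set:                                             # Widrow-Hoff update
        r.prediction += beta * (reward - r.prediction)
    return action, reward
```

On a toy problem (reward 1000 for reporting the middle bit) the predictions of both rules converge towards the true reward after a few hundred steps.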