Feature Pattern Classifier System: Handwritten Digit Classification with LCS
Evolutionary Computation Research Group

Ignas Kukenys, Victoria University of Wellington (now University of Otago), [email protected]
Will N. Browne, Victoria University of Wellington, [email protected]
Mengjie Zhang, Victoria University of Wellington, [email protected]
Context
• Machine learning for robotics:
  • Needs to be reinforcement-based and online
  • Preferably also adaptive and transparent
• Learning from visual input is hard:
  • High dimensionality vs. sparseness of data
• Why Learning Classifier Systems?
  • Robust reinforcement learning
  • Limited applications to visual input so far
Goals
• Adapt LCS to learn from image data
• Use image features that enable generalisation
• Tweak the evolutionary process
• Use a well-known vision problem for evaluation
• Build a classifier system for handwritten digit classification
Learning Classifier Systems
• LCS model an agent interacting with an unknown environment:
  • The agent observes a state of the environment
  • The agent performs an action
  • The environment provides a reward
• This contract constrains learning:
  • Online: one problem instance at a time
  • Ground truth is not available (non-supervised)
Basics of LCS
• LCS evolve a population of rules: if condition(s) then action
• Each rule also has associated properties:
  • Predicted reward for the advocated action
  • Accuracy based on prediction error
  • Fitness based on relative accuracy
• Traditionally LCS use a 'don't care' (#) encoding:
  • e.g. condition #1# matches states 010, 011, 110 and 111
  • Enables rules to generalise over multiple states
  • Varying levels of generalisation:
    • ### matches all possible states
    • 010 matches a single specific state
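The '#' match test above can be sketched in a few lines (an illustrative sketch, not the talk's implementation; conditions and states are assumed to be equal-length strings):

```python
def matches(condition: str, state: str) -> bool:
    """A ternary condition matches a binary state when every
    non-'#' position agrees with the corresponding state bit."""
    return all(c == '#' or c == s for c, s in zip(condition, state))

# Varying levels of generalisation:
matches('###', '110')  # True for any 3-bit state
matches('#1#', '110')  # True: middle bit is 1
matches('#1#', '001')  # False: middle bit is 0
matches('010', '010')  # True only for this exact state
```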
Simple rule conditions
Naïve image classification

• Consider binary 3x3 pixel patterns:
• How to separate them into two classes based on the colour of the centre pixel?
Naïve image classification
• Environment states: 9-bit messages
  • e.g. 011100001 and 100010101
• Two actions represent the two classes: 0, 1
• Two rules are sufficient to solve the problem:
  [### #0# ###] → 0
  [### #1# ###] → 1
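This two-rule solution can be sketched as a tiny rule interpreter (illustrative only; states are written as 9-character strings in row-major order):

```python
# Conditions as 9-character strings, row-major; only the
# centre pixel (index 4) is specified, the rest are don't-cares
RULES = [('####0####', 0), ('####1####', 1)]

def matches(cond: str, state: str) -> bool:
    return all(c in ('#', s) for c, s in zip(cond, state))

def classify(state: str) -> int:
    for cond, action in RULES:
        if matches(cond, state):
            return action

classify('011100001')  # centre pixel '0' -> class 0
classify('100010101')  # centre pixel '1' -> class 1
```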
Naïve image classification

• Example 2: how to classify 3x3 patterns that have "a horizontal line of 3 white pixels"?
  [111 ### ###] → 1
  [### 111 ###] → 1
  [### ### 111] → 1
• Example 3: how to deal with 3x3 patterns with "at least one 0 on every row"?
  • 27 unique rules are needed to fully describe the problem
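The 27 rules of Example 3 can be enumerated mechanically (a sketch, assuming the same 9-character row-major encoding): each row contributes one of three building blocks, giving 3^3 = 27 conjunctions.

```python
from itertools import product

# One building block per row: a '0' forced at one of three
# positions, don't-cares elsewhere
row_patterns = ['0##', '#0#', '##0']

# Every combination of one block per row is one rule condition
rules = [''.join(rows) for rows in product(row_patterns, repeat=3)]
len(rules)  # 27
```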
Naïve image classification

• The number of rules explodes for complex patterns
  • Consider 256 pixel values for grey-scale, …
• Such conditions allow very limited generalisation
• Photographic and other "real world" images:
  • Differ significantly at the "pixel level"
  • Need more flexible conditions
Haar-like features
• Compute differences between pixel sums in rectangular regions of the image
• Very efficient to evaluate using the "integral image"
• Widely used in computer vision
  • e.g. the state-of-the-art Viola & Jones face detector
• Can be flexibly placed at different scales and positions in the image
• Enable varying levels of generalisation
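A minimal sketch of the integral-image trick and a two-rectangle Haar-like feature (function names are illustrative; the point is that any rectangle sum costs four table lookups regardless of rectangle size):

```python
import numpy as np

def integral_image(img):
    # s[y, x] holds the sum of img[:y, :x]; the zero padding
    # means rectangle sums need no boundary checks
    s = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    s[1:, 1:] = img.cumsum(0).cumsum(1)
    return s

def rect_sum(s, x, y, w, h):
    # Sum of the w x h rectangle with top-left corner (x, y):
    # four lookups, independent of rectangle size
    return s[y + h, x + w] - s[y, x + w] - s[y + h, x] + s[y, x]

def haar_two_rect_horizontal(s, x, y, w, h):
    # Difference between left and right halves of a 2w x h window
    return rect_sum(s, x, y, w, h) - rect_sum(s, x + w, y, w, h)
```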
Haar-like feature rules
• To obtain LCS-like rules, feature outputs need to be thresholded:
  if (feature(type, position, scale) > threshold) then action
• Flexible direction of comparison: < and >
• Range form: t_low < feature < t_high
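The thresholded condition can be sketched as follows (the names FeatureCondition, t_low and t_high are illustrative, not the paper's actual encoding; a one-sided < or > comparison is just the range form with one bound left at infinity):

```python
import math
from dataclasses import dataclass
from typing import Callable

@dataclass
class FeatureCondition:
    # 'feature' maps an image to a scalar response, e.g. a
    # Haar-like feature at a fixed type, position and scale
    feature: Callable
    t_low: float = -math.inf
    t_high: float = math.inf

    def matches(self, image) -> bool:
        # Range form t_low < feature < t_high; leaving t_high
        # at +inf gives a '>' test, leaving t_low at -inf a '<'
        return self.t_low < self.feature(image) < self.t_high
```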
“Messy” encoding
• Multiple features form stronger rules:
  if (feature_1 && feature_2 && feature_3 …) then action
• There appears to be a limit to the useful number of features per rule
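The conjunction in a "messy" rule can be sketched as a variable-length list of condition predicates (an illustrative class, not the paper's encoding):

```python
class Rule:
    # "Messy" encoding sketch: a rule holds a variable-length
    # list of feature conditions (predicates) and an action
    def __init__(self, conditions, action):
        self.conditions = conditions
        self.action = action

    def matches(self, image) -> bool:
        # The rule fires only when every condition is satisfied
        return all(cond(image) for cond in self.conditions)
```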
MNIST digits dataset
• Well-known handwritten digits dataset
• 60,000 training examples, 10 classes
• Examples from 250 subjects
• 28x28 pixel grey-scale (0..255) images
• 10,000 evaluation examples (test set, different subjects)
MNIST results
• Performance:
  • Training set: 92% after 4M observations
  • Evaluation set: 91%
• Supervised and off-line methods reach 99%
• An encouraging initial result for reinforcement learning
Adaptive learning
Improving the FPCS
• Tournament selection
  • Performs better than proportional roulette-wheel selection
• Crossover only at the feature level
  • Rules swap whole features, not individual attributes
• Features start at the "best" position, then mutate
  • Instead of a random position, place the feature where its output is highest
• With all these fixes, performance is still at 94%
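Tournament selection can be sketched as below (XCS-style tournaments commonly sample a fixed fraction of the action set; the fraction tau here is an illustrative default, not a value from the talk):

```python
import random

def tournament_select(action_set, fitness, tau=0.4):
    # Sample a fraction tau of the action set and return the
    # fittest contestant; fitter rules win more often than
    # under proportional roulette-wheel selection
    k = max(1, round(tau * len(action_set)))
    contestants = random.sample(action_set, k)
    return max(contestants, key=fitness)
```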
Why not 100% performance?

• Online reinforcement learning
  • Cannot adapt rules based on known ground truth
• Forms a complete map from all states to all actions to their rewards, e.g. learns "not a 3"
  • Rather than just the correct state → action mapping
• Only uses Haar-like features
  • Could use an ensemble of different feature types
Future work
• Inner confusion matrix to "guide" learning towards "hard" areas of the problem
• Test with a supervised-learning LCS, e.g. UCS
• Only learn accurate positive rules, rather than a complete mapping
• How to deal with outliers?
• Testing on harder image problems will likely reveal further challenges
Confusion matrix
Conclusions
• LCS can successfully work with image data
• Autonomously learn the number, type, scale and threshold of features to use, in a transparent manner
• Challenges remain to bridge the 5% gap to supervised-learning performance
Demo
• Handwritten digit classification with FPCS
Questions?
Basics of LCS
• For an observed state s, all conditions are tested
• Matching rules form the match set [M]
• For every action, a reward is predicted
• An action a is chosen (random vs. best)
• Rules in [M] advocating a form the action set [A]
• [A] is updated according to the reward received
• Rule discovery, e.g. a GA, is performed in [A] to evolve better rules
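One pass of this loop can be sketched as follows (a simplified illustration: covering, accuracy-based fitness and the GA call are omitted, and a plain mean replaces XCS's fitness-weighted reward prediction):

```python
import random

def lcs_step(population, state, env_reward, explore=True, beta=0.2):
    # Match set [M]: all rules whose condition matches the state
    match_set = [r for r in population if r.matches(state)]
    # Predict a reward for every advocated action (plain mean here)
    predictions = {}
    for r in match_set:
        predictions.setdefault(r.action, []).append(r.prediction)
    predictions = {a: sum(p) / len(p) for a, p in predictions.items()}
    # Explore: random action; exploit: best predicted action
    if explore:
        action = random.choice(list(predictions))
    else:
        action = max(predictions, key=predictions.get)
    # Action set [A]: matching rules that advocate the chosen action
    action_set = [r for r in match_set if r.action == action]
    reward = env_reward(state, action)
    # Widrow-Hoff update of each rule's reward prediction
    for r in action_set:
        r.prediction += beta * (reward - r.prediction)
    return action, reward
```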