Rotation Forest: A New Classifier Ensemble Method
National Chiao Tung University, Institute of Electronics, 蕭晴駿, 2007.3.7
Juan J. Rodríguez and Ludmila I. Kuncheva
Outline
- Introduction
- Rotation forests
- Experimental results
- Conclusions
Introduction(1)
Why a classifier ensemble?
- combine the predictions of multiple classifiers instead of relying on a single classifier
Motivation
- reduce variance: less dependent on the peculiarities of a single training set
- reduce bias: learn a more expressive concept class than a single classifier
Introduction(2)
Key step:
formation of an ensemble of diverse classifiers from a single training set
It’s necessary to modify the data set (Bagging, Boosting) or the learning method (Random Forest) to create different classifiers
Performance evaluation:
diversity, accuracy
Bagging(1)
Bagging(2)
Bootstrap sample
- the individual classifiers have high classification accuracy
- low diversity
1. for m = 1 to M // M ... number of iterations
   a) draw (with replacement) a bootstrap sample Sm of the data
   b) learn a classifier Cm from Sm
2. for each test example
   a) try all classifiers Cm
   b) predict the class that receives the highest number of votes
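The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the WEKA implementation used in the experiments: the base learner `learn_nearest_mean` (nearest class mean on 1-D inputs), the toy data, and all function names are assumptions made for the example.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Step 1a: draw len(data) examples with replacement."""
    return [rng.choice(data) for _ in data]

def learn_nearest_mean(sample):
    """Toy base learner (an assumption): predict the class whose mean is closest."""
    sums, counts = {}, Counter()
    for x, y in sample:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] += 1
    means = {y: sums[y] / counts[y] for y in sums}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))

def bagging_fit(data, M, seed=0):
    """Step 1: learn M classifiers, each from its own bootstrap sample."""
    rng = random.Random(seed)
    return [learn_nearest_mean(bootstrap_sample(data, rng)) for _ in range(M)]

def bagging_predict(classifiers, x):
    """Step 2: query every classifier and return the majority vote."""
    votes = Counter(c(x) for c in classifiers)
    return votes.most_common(1)[0][0]

# Hypothetical 1-D data set with two classes.
data = [(0.1, "a"), (0.3, "a"), (0.2, "a"), (0.9, "b"), (1.1, "b"), (1.0, "b")]
ensemble = bagging_fit(data, M=11)
```

An odd M avoids ties in the two-class vote.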
Boosting
Basic idea:
- later classifiers focus on examples that were misclassified by earlier classifiers
- weight the prediction of each classifier according to its error (lower error, larger weight)
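To make the two ideas concrete, here is a sketch of one reweighting round in the style of the standard binary AdaBoost update. The slide does not give formulas, so the exact update rule, the function name `adaboost_round`, and the four-example data are assumptions chosen for illustration.

```python
import math

def adaboost_round(weights, correct):
    """One boosting round.

    weights: current example weights, summing to 1.
    correct: one bool per example, True if the current classifier got it right.
    Returns (alpha, new_weights), where alpha is the classifier's vote weight.
    """
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    if not 0.0 < err < 0.5:
        raise ValueError("weighted error must lie strictly between 0 and 0.5")
    # Lower error gives a larger vote weight.
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified examples gain weight, correct ones lose weight.
    raw = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(raw)
    return alpha, [w / total for w in raw]

# Hypothetical round: four equally weighted examples, the last one misclassified.
weights = [0.25, 0.25, 0.25, 0.25]
alpha, new_weights = adaboost_round(weights, [True, True, True, False])
```

After the update the misclassified example carries half of the total weight, so the next classifier is pushed to focus on it.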
Bagging vs. Boosting
Making the classifiers diverse will reduce individual accuracy: the accuracy-diversity dilemma
AdaBoost creates inaccurate classifiers by forcing them to concentrate on difficult objects and ignore the rest of the data; this gives large diversity, which boosts the ensemble performance
Rotation Forest(1)
Rotation forest transforms the data set while preserving all information
PCA is used to transform the data
- subset of the instances
- subset of the classes
- subset of the features: low computation, low storage
Rotation Forest(2)
Base classifiers: decision trees, hence "Forest"
PCA is a simple rotation of the coordinate axes, hence "Rotation Forest"
Method(1)
X: the objects in the training data set
x = [x1, x2, …, xn]T a data point with n features
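\[
X = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1n} \\
\vdots & \vdots & \ddots & \vdots \\
x_{N1} & x_{N2} & \cdots & x_{Nn}
\end{bmatrix}
\quad (N \times n \text{ matrix})
\]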
Y = [y1, y2, …, yN]T : class labels of the N objects, drawn from c classes
Method(2)
Given:
- L : the number of classifiers in the ensemble (D1, D2, …, DL)
- F : the feature set
- X, Y
All classifiers can be trained in parallel
Method(3)
For i = 1 … L (to construct the training set for classifier Di):
- split the feature set F randomly into K subsets Fi,1, Fi,2, Fi,3, …, Fi,K (Fi,j, j = 1 … K)
- each subset has M = n/K features
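The feature split for one classifier can be sketched as follows. The function name `split_features` and the handling of leftover features when K does not divide n (they are appended to the last subset) are assumptions for this example; the slide only covers the case M = n/K.

```python
import random

def split_features(n, K, seed=None):
    """Randomly partition the feature indices 0..n-1 into K disjoint subsets."""
    rng = random.Random(seed)
    features = list(range(n))
    rng.shuffle(features)
    M = n // K  # features per subset
    subsets = [features[j * M:(j + 1) * M] for j in range(K)]
    # Assumption: any leftover features go to the last subset.
    subsets[-1].extend(features[K * M:])
    return subsets

# Hypothetical case: n = 9 features, K = 3 subsets, so M = 3.
subsets = split_features(n=9, K=3, seed=1)
```

Using a different seed for each classifier Di gives each one its own partition of F.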
Method(3)
For j = 1 … K (shown here for i = 1, j = 1):
F1,1, F1,2, F1,3, …, F1,K
- X1,1: data set X for the features in F1,1
- Eliminate a random subset of classes
- Select a bootstrap sample from X1,1 to obtain X'1,1
- Run PCA on X'1,1 using only M features
- Principal components: \( a_{1,1}^{(1)}, \ldots, a_{1,1}^{(M_1)} \)
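The per-subset steps above can be sketched with NumPy. This is an illustration under stated assumptions, not the authors' code: the function names, the bootstrap size (`sample_frac`, set here to 75% of the retained objects), and the synthetic data are all choices made for the example. PCA is done through an eigendecomposition of the covariance matrix.

```python
import numpy as np

def pca_loadings(X_sub):
    """Return a matrix whose columns are the principal axes, by decreasing variance."""
    centered = X_sub - X_sub.mean(axis=0)
    cov = np.atleast_2d(np.cov(centered, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric matrix, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order]

def subset_components(X, y, feature_idx, rng, sample_frac=0.75):
    """Principal components for one feature subset F_{i,j}."""
    classes = np.unique(y)
    # Eliminate a random subset of classes (keep at least one).
    n_keep = rng.integers(1, len(classes) + 1)
    kept = rng.choice(classes, size=n_keep, replace=False)
    rows = np.flatnonzero(np.isin(y, kept))
    # Bootstrap sample of the remaining objects (size is an assumption).
    n_boot = max(2, int(sample_frac * len(rows)))
    boot = rng.choice(rows, size=n_boot, replace=True)
    # PCA restricted to the M features of this subset.
    return pca_loadings(X[np.ix_(boot, feature_idx)])

# Hypothetical data: 60 objects, 6 features, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 6))
y = np.repeat([0, 1, 2], 20)
A = subset_components(X, y, [0, 2, 5], rng)
```

All M components are kept, so `A` is a square orthogonal matrix for this subset.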
Method(4)
Arrange the principal components for all j to obtain the rotation matrix R1:
\[
R_1 = \begin{bmatrix}
a_{1,1}^{(1)}, a_{1,1}^{(2)}, \ldots, a_{1,1}^{(M_1)} & [0] & \cdots & [0] \\
[0] & a_{1,2}^{(1)}, a_{1,2}^{(2)}, \ldots, a_{1,2}^{(M_2)} & \cdots & [0] \\
\vdots & \vdots & \ddots & \vdots \\
[0] & [0] & \cdots & a_{1,K}^{(1)}, a_{1,K}^{(2)}, \ldots, a_{1,K}^{(M_K)}
\end{bmatrix}
\]
Rearrange the rows of R1 so as to match the order of features in F, to obtain \( R_1^a \)
Build classifier D1 using \( X R_1^a \) as a training set, where X is the N×n data matrix
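Assembling the block-diagonal matrix and rearranging its rows can be done in one pass, by writing each loading directly into the row of the feature it belongs to. The sketch below uses plain Python lists; the function names and the tiny 3-feature example are assumptions for illustration, and each loading block is assumed to be square (no components discarded).

```python
def build_rotation_matrix(n, subsets, loadings):
    """Build the rearranged n x n rotation matrix.

    subsets[j]:  feature indices of the j-th subset.
    loadings[j]: square list-of-lists whose columns are that subset's
                 principal components; row r belongs to feature subsets[j][r].
    """
    R = [[0.0] * n for _ in range(n)]
    col = 0  # first column of the current diagonal block
    for feats, A in zip(subsets, loadings):
        for r, f in enumerate(feats):
            for c in range(len(A[r])):
                R[f][col + c] = A[r][c]  # row index = original feature position
        col += len(A[0])
    return R

def rotate(X, R):
    """Compute the matrix product X R for list-of-lists inputs."""
    return [[sum(x[k] * R[k][c] for k in range(len(R))) for c in range(len(R[0]))]
            for x in X]

# Hypothetical example: features {2, 0} form one subset, feature {1} the other.
subsets = [[2, 0], [1]]
loadings = [[[0, 1], [1, 0]], [[1]]]
R = build_rotation_matrix(3, subsets, loadings)
rotated = rotate([[10, 20, 30]], R)
```

The rotated rows, together with the labels Y, form the training set for the tree.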
How It Works ?
Diversity
- Each decision tree uses a different set of axes.
- Trees are sensitive to rotation of the axes.
Accuracy
- No principal components are discarded.
- The whole data set is used to train each classifier (with different extracted features).
Experimental Results(1)
Experimental settings:
1. the parameters of Bagging, AdaBoost, and Random Forest were kept at their default values in WEKA
2. for Rotation Forest, M is fixed to be 3
3. all ensemble methods have the same L
4. base classifier: tree classifier J48 (WEKA)
5. database: UCI Machine Learning Repository
(WEKA: Waikato Environment for Knowledge Analysis)
Database
Experimental Results(2)
TABLE 2 Classification Accuracy and Standard Deviation of J48 and Ensemble Methods without Pruning
15 runs of 10-fold cross validation
Experimental Results(3)
Fig. 1. Percentage diagram for the four studied ensemble methods with unpruned J48 trees. (Values shown in the diagram: 3.03%, 24.24%, 3.03%, 69.70%.)
Experimental Results (4)
Fig. 2. Comparison of accuracy of Rotation Forest ensemble (RF) and the best accuracy from any of a single tree, Bagging, Boosting, and Random Forest ensembles.
Diversity-Error Diagram
Pairwise diversity measures were chosen
Kappa (κ) evaluates the level of agreement between two classifier outputs
Diversity-error diagram
- x-axis: κ for the pair
- y-axis: averaged individual error of Di and Dj, Ei,j = (Ei + Ej)/2
- small values of κ indicate better diversity and small values of Ei,j indicate better accuracy
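One point of the diversity-error diagram can be computed as sketched below. The slide does not give the formula for κ; this example assumes Cohen's kappa (observed agreement corrected for chance agreement), and the function names and the four-object example are hypothetical.

```python
from collections import Counter

def kappa(pred_i, pred_j):
    """Cohen's kappa between the label outputs of two classifiers."""
    N = len(pred_i)
    observed = sum(a == b for a, b in zip(pred_i, pred_j)) / N
    count_i, count_j = Counter(pred_i), Counter(pred_j)
    # Agreement expected by chance, from each classifier's label frequencies.
    chance = sum(count_i[k] * count_j[k] for k in count_i) / (N * N)
    if chance == 1.0:  # both classifiers always output the same single label
        return 1.0
    return (observed - chance) / (1.0 - chance)

def pair_error(pred_i, pred_j, truth):
    """Averaged individual error E_ij = (E_i + E_j) / 2."""
    N = len(truth)
    e_i = sum(p != t for p, t in zip(pred_i, truth)) / N
    e_j = sum(p != t for p, t in zip(pred_j, truth)) / N
    return (e_i + e_j) / 2.0

# Hypothetical outputs of two classifiers on four objects.
truth = [0, 0, 1, 1]
pred_i = [0, 0, 1, 0]
pred_j = [0, 1, 1, 1]
k = kappa(pred_i, pred_j)                # x-coordinate of the point
e = pair_error(pred_i, pred_j, truth)    # y-coordinate of the point
```

Here the two classifiers agree on half of the objects, giving κ = 0.2, and each makes one error, giving Ei,j = 0.25.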
Experimental Results (5)
Rotation Forest has the potential to improve on diversity significantly without compromising the individual accuracy
Fig. 3. Kappa-error diagrams for the vowel-n data set.
Experimental Results (6)
Rotation Forest is not as diverse as the other ensembles but clearly has the most accurate classifiers
Rotation Forest is similar to Bagging, but more accurate and diverse
Fig. 4. Kappa-error diagrams for the waveform data set.
Conclusions
Rotation Forest transforms the data onto different axes while preserving the information completely, thereby achieving both diversity and accuracy
Rotation Forest gives a scope for ensemble methods “on the side of Bagging”
References
J.J. Rodríguez, L.I. Kuncheva, and C.J. Alonso, "Rotation Forest: A New Classifier Ensemble Method," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 10, pp. 1619-1630, Oct. 2006.
J.J. Rodríguez and C.J. Alonso, "Rotation-based ensembles," Proc. Current Topics in Artificial Intelligence: 10th Conference of the Spanish Association for Artificial Intelligence, LNAI 3040, Springer, pp. 498-506, 2004.
J. Fürnkranz, "Ensemble Classifiers" (class notes).