Upload
kylee-vang
View
33
Download
2
Embed Size (px)
DESCRIPTION
Statistical Tests for Computational Intelligence Research and Human Subjective Tests. Slides are at http://www.design.kyushu-u.ac.jp/~takagi/TAKAGI/StatisticalTests.html. Hideyuki TAKAGI Kyushu University, Japan http://www.design.kyushu-u.ac.jp/~takagi /. ver. July 15, 2013 - PowerPoint PPT Presentation
Citation preview
Statistical Tests for Computational Intelligence Research and
Human Subjective Tests
Hideyuki TAKAGIKyushu University, Japan
http://www.design.kyushu-u.ac.jp/~takagi/ver. March 26, 2015ver. July 15, 2013ver. July 11, 2013ver. April 23, 2013
Slides are downloadable fromhttp://www.design.kyushu-u.ac.jp/~takagi
ContentsContents2 groups n groups (n > 2)
datadistribution
・ unpaired t -test
・ sign test
・ Wilcoxon signed-ranks test ・ Friedman test
・ Kruskal-Wallis test
・ one-way ANOVA
・ two-way ANOVA
(no
norm
ality
)
one-waydata
two-waydata
Par
am
etric
Tes
tN
on-p
ara
met
ric T
est
(nor
mal
ity)
unp
aire
d(in
dep
ende
nt)
pair
ed(r
ela
ted
)un
pai
red
(ind
epen
dent
)pa
ired
(rel
ate
d)
・ paired t -test
・ Mann-Whitney U-test
AN
OV
A(A
naly
sis
of V
aria
nce
)
+ Scheffé's method of paired comparison for Human Subjective Tests
fitne
ss
generations
fitne
ss
generations
conventionalEC
proposed EC1 proposed EC2
How to Show Significance?How to Show Significance?
conventionalEC
Just compare averages visually?It is not scientific.
Fig. XX Average convergence curves of n times of trial runs.
How to Show Significance?How to Show Significance?
sound made byconventional IEC
sound made byproposed IEC1
sound made byproposed IEC2
Sound design concept: exiting
Which method is good to make exiting sound?How to show it?
You cannot show the superiority of your method without statistical tests.You cannot show the superiority of your method without statistical tests.
Papers without statistics tests may be rejected.
statistical test
My method is significantly better!
Which Test Should We Use?Which Test Should We Use?2 groups n groups (n > 2)
datadistribution
・ unpaired t -test
・ sign test
・ Wilcoxon signed-ranks test ・ Friedman test
・ Kruskal-Wallis test
・ one-way ANOVA
・ two-way ANOVA
(no
norm
ality
)
one-waydata
two-waydata
Pa
ram
etr
ic T
est
No
n-p
ara
me
tric
Tes
t(n
orm
ality
)
unpa
ired
(ind
epen
dent
)pa
ired
(rel
ated
)un
paire
d(in
dep
ende
nt)
paire
d(r
elat
ed)
・ paired t -test
・ Mann-Whitney U-test
AN
OV
A(A
naly
sis
of V
aria
nce)
Which Test Should We Use?Which Test Should We Use?2 groups n groups (n > 2)
datadistribution
・ unpaired t -test
・ sign test
・ Wilcoxon signed-ranks test ・ Friedman test
・ Kruskal-Wallis test
・ one-way ANOVA
・ two-way ANOVA
(no
norm
ality
)
one-waydata
two-waydata
Pa
ram
etr
ic T
est
No
n-p
ara
me
tric
Tes
t(n
orm
ality
)
unpa
ired
(ind
epen
dent
)pa
ired
(rel
ated
)un
paire
d(in
dep
ende
nt)
paire
d(r
elat
ed)
・ paired t -test
・ Mann-Whitney U-test
AN
OV
A(A
naly
sis
of V
aria
nce)
n-th generation n-th generation
2 groups n groups (n > 2)
datadistribution
・ unpaired t -test
・ sign test
・ Wilcoxon signed-ranks test ・ Friedman test
・ Kruskal-Wallis test
・ one-way ANOVA
・ two-way ANOVA
one-waydata
two-waydata
unpa
ired
(ind
epen
dent
)pa
ired
(rel
ated
)un
paire
d(in
dep
ende
nt)
paire
d(r
elat
ed)
・ paired t -test
・ Mann-Whitney U-test
AN
OV
A(A
naly
sis
of V
aria
nce)
(no
norm
ality
)P
ara
me
tric
Te
stN
on
-pa
ram
etr
ic T
est
(nor
mal
ity)
Which Test Should we Use?Which Test Should we Use?
• Anderson-Darling test• D'Agostino-Pearson test• Kolmogorov-Smirnov test• Shapiro-Wilk test• Jarque–Bera test ・・・・
Normality Test
Find a free Excel add-in or software.
2 groups n groups (n > 2)
datadistribution
・ unpaired t -test
・ sign test
・ Wilcoxon signed-ranks test ・ Friedman test
・ Kruskal-Wallis test
・ one-way ANOVA
・ two-way ANOVA
one-waydata
two-waydata
・ paired t -test
・ Mann-Whitney U-test
AN
OV
A(A
naly
sis
of V
aria
nce)
(no
norm
ality
)P
ara
me
tric
Te
stN
on
-pa
ram
etr
ic T
est
(nor
mal
ity)
Which Test Should We Use?Which Test Should We Use?
unpa
ired
(ind
epen
dent
)pa
ired
(rel
ated
)un
paire
d(in
dep
ende
nt)
paire
d(r
elat
ed)
initialdata
#
conventional
proposed
1 4.23 2.51
2 3.21 3.30
3 3.63 3.75
4 4.42 3.22
5 4.08 3.99
6 3.98 3.65
unpaired data(independent)
paired data(related)
group A group B
4.23 2.51
3.21 3.3
3.63 3.75
4.42 3.22
4.08 3.99
3.98 3.65
2 groups n groups (n > 2)
datadistribution
・ unpaired t -test
・ sign test
・ Wilcoxon signed-ranks test ・ Friedman test
・ Kruskal-Wallis test
・ one-way ANOVA
・ two-way ANOVA
one-waydata
two-waydata
・ paired t -test
・ Mann-Whitney U-test
AN
OV
A(A
naly
sis
of V
aria
nce)
(no
norm
ality
)P
ara
me
tric
Te
stN
on
-pa
ram
etr
ic T
est
(nor
mal
ity)
Which Test Should We Use?Which Test Should We Use?
unpa
ired
(ind
epen
dent
)pa
ired
(rel
ated
)un
paire
d(in
dep
ende
nt)
paire
d(r
elat
ed)
unpaired data(independent)
paired data(related)
A group data B group data initialdata
#GA proposed
2 groups n groups (n > 2)
datadistribution
・ unpaired t -test
・ sign test
・ Wilcoxon signed-ranks test ・ Friedman test
・ Kruskal-Wallis test
・ one-way ANOVA
・ two-way ANOVA
one-waydata
two-waydata
・ paired t -test
・ Mann-Whitney U-test
AN
OV
A(A
naly
sis
of V
aria
nce)
(no
norm
ality
)P
ara
me
tric
Te
stN
on
-pa
ram
etr
ic T
est
(nor
mal
ity)
Which Test Should We Use?Which Test Should We Use?
unpa
ired
(ind
epen
dent
)pa
ired
(rel
ated
)un
paire
d(in
dep
ende
nt)
paire
d(r
elat
ed)
unpaired data(independent)
paired data(related)
A group data B group data initialdata
#GA proposed
Q1: Which tests are more sensitive, those for unpaired data or paired data?
A1: Statistical tests for paired data because of more data information.
2 groups n groups (n > 2)
datadistribution
・ unpaired t -test
・ sign test
・ Wilcoxon signed-ranks test ・ Friedman test
・ Kruskal-Wallis test
・ one-way ANOVA
・ two-way ANOVA
one-waydata
two-waydata
・ paired t -test
・ Mann-Whitney U-test
AN
OV
A(A
naly
sis
of V
aria
nce)
(no
norm
ality
)P
ara
me
tric
Te
stN
on
-pa
ram
etr
ic T
est
(nor
mal
ity)
Which Test Should We Use?Which Test Should We Use?
unpa
ired
(ind
epen
dent
)pa
ired
(rel
ated
)un
paire
d(in
dep
ende
nt)
paire
d(r
elat
ed)
Q2: How should you design your experimental conditions to use statistical tests for paired data and reduce the # of trial runs?A2: Use the same initialized data for the set of (method A, method B) at each trial run.
n-th generation
significant?
2 groups n groups (n > 2)
datadistribution
・ unpaired t -test
・ sign test
・ Wilcoxon signed-ranks test ・ Friedman test
・ Kruskal-Wallis test
・ one-way ANOVA
・ two-way ANOVA
one-waydata
two-waydata
unpa
ired
(ind
epen
dent
)pa
ired
(rel
ated
)un
paire
d(in
dep
ende
nt)
paire
d(r
elat
ed)
・ paired t -test
・ Mann-Whitney U-test
AN
OV
A(A
naly
sis
of V
aria
nce)
(no
norm
ality
)P
ara
me
tric
Te
stN
on
-pa
ram
etr
ic T
est
(nor
mal
ity)
Which Test Should we Use?Which Test Should we Use?Q3: Which statistical tests are sensitive, parametric tests or non-parametric ones and why?
A3: Parametric tests which can use information of assumed data distribution.
t -Testt -Test2 groups n groups (n > 2)
datadistribution
・ unpaired t -test
・ sign test
・ Wilcoxon signed-ranks test ・ Friedman test
・ Kruskal-Wallis test
・ one-way ANOVA
・ two-way ANOVA
(no
norm
ality
)
one-waydata
two-waydata
Pa
ram
etr
ic T
est
No
n-p
ara
me
tric
Tes
t(n
orm
ality
)
unpa
ired
(ind
epen
dent
)pa
ired
(rel
ated
)un
paire
d(in
dep
ende
nt)
paire
d(r
elat
ed)
・ paired t -test
・ Mann-Whitney U-test
AN
OV
A(A
naly
sis
of V
aria
nce)
t -test
t-TestHow to Show Significance?
t-TestHow to Show Significance?
n-th generation
significant?
A B
12 10
14 9
14 7
11 15
16 11
19 10
significantdifference?
Conditions to use t-tests:(1) normality(2) equal variances (not essential though )
Test this difference with assuming no difference.( null hypothesis )
t-Testt-Test
Conditions to use t-tests:(1) normality(2) equal variances (not essential though )
A B
12 10
14 9
14 7
11 15
16 11
19 10
significantdifference?
Test this difference with assuming no difference.( null hypothesis )
Normality Test• Anderson-Darling test• D'Agostino-Pearson test• Kolmogorov-Smirnov test• Shapiro-Wilk test• Jarque–Bera test ・・・・
F-Test
When (p > 0.05), we assume that there is no significant difference between σ2
A and σ2B .
t-Testt-Test
t-Testt-TestExcel (32 bits version only?) has t-tests and ANOVA in Data Analysis Tools. You must install its add-in. (File -> option -> add-in, and set its add-in.)
t-Testt-Test
(1) t-Test: Pairs two sample for means
(2) t-Test: Two-sample assuming equal variances
(3) t-Test: Two-sample assuming unequal variances: Welch's t-test
n-th generation
significant? This is a case when each pair of two methods with the same initial condition.
t-Testt-Test
A B 4.23 2.51 3.21 3.31 3.63 3.75 4.42 3.22 4.08 3.99 3.98 3.65 3.68 3.35 4.18 3.93 3.85 3.91 3.71 3.82
Variable 1 Variable 2
Mean 3.897 3.544
Variance 0.125823333 0.208693333
Observations 10 10
Pearson Correlation -0.161190073Hypothesized Mean Difference
0
df 9
t Stat 1.794964241
P(T<=t) one-tail 0.053116886
t Critical one-tail 1.833112933
P(T<=t) two-tail 0.106233772
t Critical two-tail 2.262157163
sample data t-Test: Paired Two Sample for Means
t-Testt-Test
A B 4.23 2.51 3.21 3.31 3.63 3.75 4.42 3.22 4.08 3.99 3.98 3.65 3.68 3.35 4.18 3.93 3.85 3.91 3.71 3.82
Variable 1 Variable 2
Mean 3.897 3.544
Variance 0.125823333 0.208693333
Observations 10 10
Pearson Correlation -0.161190073Hypothesized Mean Difference
0
df 9
t Stat 1.794964241
P(T<=t) one-tail 0.053116886
t Critical one-tail 1.833112933
P(T<=t) two-tail 0.106233772
t Critical two-tail 2.262157163
sample data t-Test: Paired Two Sample for Means
2.5% 2.5% 5%
When p-value is less than 0.01 or 0.05, we assume that there is significant difference with the level of significance of (p < 0.01) or (p < 0.05).
A > B A < BA ≈ B When A>B never happens, you may use a one-tail test.
t-Testt-Test
(1) t-Test: Pairs two sample for means (2) t-Test: Two-sample assuming equal variances
Difference between two groups is significant (p < 0.01).
We cannot say that there is a significant difference between two group.
ANOVA: Analysis of VarianceANOVA: Analysis of Variance2 groups n groups (n > 2)
datadistribution
・ unpaired t -test
・ sign test
・ Wilcoxon signed-ranks test ・ Friedman test
・ Kruskal-Wallis test
・ one-way ANOVA
・ two-way ANOVA
(no
norm
ality
)
one-waydata
two-waydata
Pa
ram
etr
ic T
est
No
n-p
ara
me
tric
Tes
t(n
orm
ality
)
unpa
ired
(ind
epen
dent
)pa
ired
(rel
ated
)un
paire
d(in
dep
ende
nt)
paire
d(r
elat
ed)
・ paired t -test
・ Mann-Whitney U-test
AN
OV
A(A
naly
sis
of V
aria
nce)
ANOVA
ANOVA: Analysis of VarianceANOVA: Analysis of Variance
n-th generation
sign
ifican
t?
If data have
(1) normality
(2) equal variances
A B C
11.0 12.8 9.4
9.3 11.3 12.4
11.5 9.5 16.8
16.4 14.0 14.3
16.0 15.2 17.0
15.0 13.0 14.6
12.8 12.4 17.0
13.6 15.0 14.3
13.0 12.4 15.6
12.0 17.8 15.0
13.4 12.6 18.6
10.0 13.4 12.4
10.8 16.8 15.4
1. Analysis of more than two data groups.2. Normality and equal variance are required.
C A B
Excel has ANOVA in Data Analysis Tools.
ANOVA: Analysis of VarianceANOVA: Analysis of Variance
A B C
11.0 12.8 9.4
9.3 11.3 12.4
11.5 9.5 16.8
16.4 14.0 14.3
16.0 15.2 17.0
15.0 13.0 14.6
12.8 12.4 17.0
13.6 15.0 14.3
13.0 12.4 15.6
12.0 17.8 15.0
13.4 12.6 18.6
10.0 13.4 12.4
10.8 16.8 15.4
1. Analysis of more than two data groups.2. Normality and equal variance are required.
C A B
three t-tests one ANOVA=1-(1-0.05)3 = 0.14
Three times of t-test with (p<0.05) equivalent one ANOVA (p<0.14).
Excel has ANOVA in Data Analysis Tools.
ANOVA: Analysis of VarianceANOVA: Analysis of Variance
Check it using the Bartlett test.
ANOVA: Analysis of VarianceANOVA: Analysis of Variance
n-th generation
When data are independent, use one-way ANOVA (single factor ANOVA).
When data correspond each other, use two-way ANOVA (two-factor ANOVA).
ANOVA: Analysis of VarianceANOVA: Analysis of Variance
When data are independent, use one-way ANOVA (single factor ANOVA).
When data correspond each other, use two-way ANOVA (two-factor ANOVA).
Q1: What are "single factor" and "two factors"?
A1: A column factor (e.g. three groups) and a sample factor (e.g. initialized condition).
column factor
sam
ple
fact
or
column factor
ANOVA: Analysis of VarianceANOVA: Analysis of Variance
column factor
sam
ple
fact
or
column factor
one-factor (one-way) ANOVA two-factor (two-way) ANOVA
group A group B group C4.23 2.51 3.043.21 3.3 2.893.63 3.75 3.554.42 3.22 4.394.08 3.99 3.863.98 3.65 3.53.75 2.62 3.63.22 2.93 3.21
initial condition
group A group B group C
#1 4.23 2.51 3.04#2 3.21 3.3 2.89#3 3.63 3.75 3.55#4 4.42 3.22 4.39#5 4.08 3.99 3.86#6 3.98 3.65 3.5#7 3.75 2.62 3.6#8 3.22 2.93 3.21
We cannot say that three groups are significantly different. (p=0.089)
There are significant difference somewhere among three groups. (p<0.05)
ANOVA: Analysis of VarianceANOVA: Analysis of Variance
Output of the one-way ANOVASource of Variation SS df MS F P-value F crit
Between Groups 6.11342 2 3.05671 15.30677 3.6E-05 3.354131Within Groups 5.39181 27 0.199697
Total 11.50523 29
Output of the two-way ANOVASource of Variation SS df MS F P-value F crit
Sample 0.755233 2 0.377617 2.755097 0.103596 3.885294Columns 3.582272 1 3.582272 26.13631 0.000256 4.747225Interaction 0.139411 2 0.069706 0.508573 0.613752 3.885294Within 1.644733 12 0.137061
Total 6.12165 17
When (p-value < 0.01 or 0.05), there is(are) significant differencesomewhere among data groups.
• Significant difference among Sample (e.g. initial conditions) cannot be found (p > 0.05).
• Significant difference can be found somewhere among Columns (e.g. three methods) (p < 0.01).
• We need not care an interaction effect between two factors (e.g. initial condition vs. methods) (p > 0.05).
A B C
11.0 12.8 9.4
9.3 11.3 12.4
11.5 9.5 16.8
16.4 14.0 14.3
16.0 15.2 17.0
15.0 13.0 14.6
12.8 12.4 17.0
13.6 15.0 14.3
13.0 12.4 15.6
12.0 17.8 15.0
13.4 12.6 18.6
10.0 13.4 12.4
10.8 16.8 15.4
Sam
ple
fact
or
Columnfactor
ANOVA: Analysis of VarianceANOVA: Analysis of Variance
Source of Variation SS df MS F P-value F critSample 0.755233 2 0.377617 2.755097 0.103596 3.885294Columns 3.582272 1 3.582272 26.13631 0.000256 4.747225Interaction 0.139411 2 0.069706 0.508573 0.613752 3.885294Within 1.644733 12 0.137061
Total 6.12165 17
A B C
11.0 12.8 9.4
9.3 11.3 12.4
11.5 9.5 16.8
16.4 14.0 14.3
16.0 15.2 17.0
15.0 13.0 14.6
12.8 12.4 17.0
13.6 15.0 14.3
13.0 12.4 15.6
12.0 17.8 15.0
13.4 12.6 18.6
10.0 13.4 12.4
10.8 16.8 15.4
Sam
ple
fact
or
Columnfactor
Q1: Where is significant among A, B, and C?
A1: Apply multiple comparisons between all pairs among columns.(Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett method, Williams method, Tukey method, Nemenyi test, Tukey-Kramer method, Games/Howell method, Duncan's new multiple range test, Student-Newman-Keuls method, etc. Each has different characteristics.)
significant?
Non-Parametric TestsNon-Parametric Tests
2 groups n groups (n > 2)
datadistribution
・ unpaired t -test
・ sign test
・ Wilcoxon signed-ranks test ・ Friedman test
・ Kruskal-Wallis test
・ one-way ANOVA
・ two-way ANOVA
(no
norm
ality
)
one-waydata
two-waydata
Pa
ram
etr
ic T
est
No
n-p
ara
me
tric
Tes
t(n
orm
ality
)
unpa
ired
(ind
epen
dent
)pa
ired
(rel
ated
)un
paire
d(in
dep
ende
nt)
paire
d(r
elat
ed)
・ paired t -test
・ Mann-Whitney U-test
AN
OV
A(A
naly
sis
of V
aria
nce)
If normality and equal variances are not guaranteed, use non-parametric tests.
Mann-Whitney U-testMann-Whitney U-test2 groups n groups (n > 2)
datadistribution
・ unpaired t -test
・ sign test
・ Wilcoxon signed-ranks test ・ Friedman test
・ Kruskal-Wallis test
・ one-way ANOVA
・ two-way ANOVA
(no
norm
ality
)
one-waydata
two-waydata
Pa
ram
etr
ic T
est
No
n-p
ara
me
tric
Tes
t(n
orm
ality
)
unpa
ired
(ind
epen
dent
)pa
ired
(rel
ated
)un
paire
d(in
dep
ende
nt)
paire
d(r
elat
ed)
・ paired t -test
・ Mann-Whitney U-test
AN
OV
A(A
naly
sis
of V
aria
nce)
Mann-Whitney U-test(Wilcoxon-Mann-Whitney test, two sample Wilcoxon test)
Mann-Whitney U-test(Wilcoxon-Mann-Whitney test, two sample Wilcoxon test)
n-th generation
?
?no normality
1. Comparison of two groups.2. Data have no normality.3. There are no data corresponding
between two groups (independent).
Mann-Whitney U-test(Wilcoxon-Mann-Whitney test, two sample Wilcoxon test)
Mann-Whitney U-test(Wilcoxon-Mann-Whitney test, two sample Wilcoxon test)
1. Calculate a U value.
0
2
34
U = 0 + 2 + 3 + 4 = 9U' = 11 (U + U' = n1n2)
when two values are the same, count as 0.5.( )
2. See a U-test table.
Mann-Whitney U-test (cont.)(Wilcoxon-Mann-Whitney test, two sample Wilcoxon test)
Mann-Whitney U-test (cont.)(Wilcoxon-Mann-Whitney test, two sample Wilcoxon test)
• Use the smaller value of U or U'.• When n1 ≤ 20 and n2 ≤ 20 , see a Mann-Whitney test table.
(where n1 and n2 are the # of data of two groups.)• Otherwise, since U follows the below normal distribution roughly,
normalize U as and check a standard normal distribution table
with the , where and .
12
)1(,
2, 2121212 nnnnnn
NN UU
U
UUz
221nn
U 12
)1( 2121
nnnnUz
Use an Excel function to calculate the p-value for the z-value: p-value = 1 - NORM.S.DIST( z )
Examples: Mann-Whitney U-test(Wilcoxon-Mann-Whitney test, two sample Wilcoxon test)
Examples: Mann-Whitney U-test(Wilcoxon-Mann-Whitney test, two sample Wilcoxon test)
U = 9U' = 11
0
2
34
00.5
2.54
5
3.5
55
5
U = 23.5U' = 1.5
U = 12U' = 13
Ex.1 Ex.2 Ex.3
4 5 6・・・
・・・
ー・・・
・・・
・・・
4 0 1 2・・・
5 2 3・・・
・・・
・・・
・・・
n1
n2
(p < 0.05)
4 5 6・・・
・・・
ー・・・
・・・
・・・
4 ー ー 0・・・
5 1 1・・・
・・・
・・・
・・・
n1
n2
(p < 0.01)
5
(p > 0.05)
(p > 0.05)
(p >
0.0
5)
significant (p < 0.05)
Exercise: Mann-Whitney U-test(Wilcoxon-Mann-Whitney test, two sample Wilcoxon test)
Exercise: Mann-Whitney U-test(Wilcoxon-Mann-Whitney test, two sample Wilcoxon test)
2.5
45
6
66
U = 29.5U' = 6.5
4 5 6 7
3 ー 0 1 1
4 0 1 2 3
5 2 3 5
6 5 6
n1
n2
(p < 0.05)
4 5 6 7
3 ー ー ー ー
4 ー ー 0 0
5 1 1 1
6 2 3
n1
n2
(p < 0.01)
Since U' > 5, (p > 0.05): significance is not found( )
Sign TestSign Test2 groups n groups (n > 2)
datadistribution
・ unpaired t -test
・ sign test
・ Wilcoxon signed-ranks test ・ Friedman test
・ Kruskal-Wallis test
・ one-way ANOVA
・ two-way ANOVA
(no
norm
ality
)
one-waydata
two-waydata
Pa
ram
etr
ic T
est
No
n-p
ara
me
tric
Tes
t(n
orm
ality
)
unpa
ired
(ind
epen
dent
)pa
ired
(rel
ated
)un
paire
d(in
dep
ende
nt)
paire
d(r
elat
ed)
・ paired t -test
・ Mann-Whitney U-test
AN
OV
A(A
naly
sis
of V
aria
nce)
( 1 ) Sign Test
( 2 ) Wilcoxon's Signed Ranks Test
significance test between the # of winnings and losses
173 174
143 137
158 151
156 143
176 180
165 162
- ++ -+ -+ -- ++ -
-1+6+7
+13-4+3
data of 2 groups
# of winnings and losses
the level of winnings/losses
Sign TestSign Test
significance test using both the # of winnings and losses and the level of winnings/losses
n-th generation
Sign TestSign Test
1. Calculate the # of winnings and losses by comparing runs with the same initial data.
2. Check a sign test table to show significance of two methods.
Sign TestSign TestFig.3 in Y. Pei and H. Takagi, "Fourier analysis of the fitness landscape for evolutionary search acceleration," IEEE Congress on Evolutionary Computation (CEC), pp.1-7, Brisbane, Australia (June 10-15, 2012).
The (+,-) marks show whether our proposed methods converge significantly better or poorer than normal DE, respectively, (p ≤0.05).
Fig.2 in the same paper.
Task ExampleWhether performances of pattern recognition methods A and B are significantly different?
n1 cases: Both methods succeeded.n2 cases: Method A succeeded, and method B failed.n3 cases: Method A failed, and method B succeeded.n4 cases: Both methods failed.
1. Set N = n2 + n3.2. Check the right table with the N.3. If min(n2, n3) is smaller than the number for the N,
we can say that there is significant difference with the significant risk level of XX.
How to check?
Whether there is significant difference for n2 = 12 and n3 = 28?
Exercise
level of significance
Sign TestSign Test %%
ANSWER:Check the right table with N = 40.As n2 is bigger than 11 and smaller than 13, we can say that there is a significant difference between two with (p < 0.05) but cannot say so with (p < 0.01).
% %
level of significance
level of significance
%%Sign TestSign Test
Let's think about the case of N = 17.
To say that n1 and n2 are significantly different,
(n1 vs. n2) = (17 vs. 0), (16 vs. 1), or (15 vs. 2) (p < 0.01)
or(n1 vs. n2) = (14 vs. 3) or (13 vs. 4) (p < 0.05)
Check the significance of:
Exercise: Sign TestExercise: Sign Test level of significance
%%
16 vs. 4
14 vs. 1
9 vs. 3
18 vs. 5
Wilcoxon Signed-Ranks TestWilcoxon Signed-Ranks Test2 groups n groups (n > 2)
datadistribution
・ unpaired t -test
・ sign test
・ Wilcoxon signed-ranks test ・ Friedman test
・ Kruskal-Wallis test
・ one-way ANOVA
・ two-way ANOVA
(no
norm
ality
)
one-waydata
two-waydata
Pa
ram
etr
ic T
est
No
n-p
ara
me
tric
Tes
t(n
orm
ality
)
unpa
ired
(ind
epen
dent
)pa
ired
(rel
ated
)un
paire
d(in
dep
ende
nt)
paire
d(r
elat
ed)
・ paired t -test
・ Mann-Whitney U-test
AN
OV
A(A
naly
sis
of V
aria
nce)
n-th generation
Wilcoxon Signed-Ranks TestWilcoxon Signed-Ranks Test
Q: When a sign test could not show significance, how to do?A: Try the Wilcoxon signed-ranks test. It is more sensitive
than a simple sign test due to more information use.
( 1 ) Sign Test
( 2 ) Wilcoxon's Signed Ranks Test
significance test between the # of winnings and losses
173 174
143 137
158 151
156 143
176 180
165 162
- ++ -+ -+ -- ++ -
-1+6+7
+13-4+3
data of 2 groups
# of winnings and losses
the level of winnings/losses
significance test using both the # of winnings and losses and the level of winnings/losses
Wilcoxon Signed-Ranks TestWilcoxon Signed-Ranks Test
Wilcoxon Signed-Ranks TestWilcoxon Signed-Ranks Test
v (system A) v (system B) difference d rank of |d| add sign to the ranks
rank of fewer # of signs
182 163 19 7 7169 142 27 8 8172 173 - 1 1 - 1 1143 137 6 4 4158 151 7 5 5156 143 13 6 6176 172 4 3 3165 168 - 3 2 - 2 2
(step 1) (step 2) (step 3) (step 4)
(step 5) )4(# StepofT
3
8n
(step 6)
Wilcoxon test table
Example:
(step 6)
n = 8T = 3
Wilcoxon Test Table: significance point of T
T=3 ≤ 3 (n=8, p<0.05), then difference between systems A and B is significant.
T=3 > 0 (n=8, p<0.01), then we cannot say there is a significant difference.
one-tail p < 0.025 p < 0.005
two-tail p < 0.05 p < 0.01
n = 6789
10111213141516171819202122232425
02358
101317212529344046525865738189
013579
121519232732374248546168
When n > 25As T follows the below normal distribution roughly,
normalize T as the below and check a standard normal distribution table with the z; see andin the above equation.
24
)12)(1(,
4
)1(, 2 nnnnn
NN TT
T
TTz
T T
Wilcoxon Signed-Ranks TestWilcoxon Signed-Ranks Test
v (system A) v (system B) difference d rank of |d| add sign to the ranks
rank of fewer # of signs
176 163 13 7 → 6.5 6.5142 142 0172 173 - 1 1 - 1 1143 137 6 4 4158 151 7 5 5156 143 13 6 → 6.5 6.5176 172 4 3 3165 168 - 3 2 - 2 2
(step 1) (step 2) (step 3) (step 4)
Tips:1. When d = 0, ignore the data.2. When there are the same ranks of |d|,
give average ranks.
10
12
3
45
6
7 89
Give the average rank 6.5 = (5+6+7+8)/4.
Tip #1Tip #2
Tip #2
Exercise 1: Wilcoxon Signed-Ranks TestExercise 1: Wilcoxon Signed-Ranks Test(step 1) (step 2) (step 3) (step 4)
v (system A) v (system B) difference d rank of |d| add sign to the ranks
rank of fewer # of signs
182 163
169 142
173 172
143 137
158 151
156 143
176 172
165 168
n = (step 5) )4(# StepofT
(step 6)
Wilcoxon test table
T =
n = 8
Exercise 1: Wilcoxon Signed-Ranks TestExercise 1: Wilcoxon Signed-Ranks Test(step 1) (step 2) (step 3) (step 4)
(step 5) )4(# StepofT
(step 6)
Wilcoxon test table
v (system A) v (system B) difference d rank of |d| add sign to the ranks
rank of fewer # of signs
182 163 19 7 7
169 142 27 8 8
173 172 1 1 1
143 137 6 4 4
158 151 7 5 5
156 143 13 6 6
176 172 4 3 3
165 168 -3 2 -2 2
T = 2
As T(=2) < 3, there is a significant difference between A and B (p<0.05).But, as 0 < T(=2), we cannot say so with the significance level of (p<0.01).
(step 1) (step 2) (step 3) (step 4)
(step 5) )4(# StepofT
(step 6)
Wilcoxon test table
v (system A) v (system B) difference d rank of |d| add sign to the ranks
rank of fewer # of signs
27 31
20 25
34 33
25 27
31 31
23 29
26 27
24 30
35 34
n =
Exercise 2: Wilcoxon Signed-Ranks TestExercise 2: Wilcoxon Signed-Ranks Test
n = 8 (no count for d = 0.)
(step 1) (step 2) (step 3) (step 4)
(step 5) )4(# StepofT
(step 6)
Wilcoxon test table
v (system A) v (system B) difference d rank of |d| add sign to the ranks
rank of fewer # of signs
27 31 -4 5 -5
20 25 -5 6 -6
34 33 1 2 2 2
25 27 -2 4 -4
31 31 0
23 29 -6 7.5 -7.5
26 27 -1 2 -2
24 30 -6 7.5 -7.5
35 34 1 2 2 2
(No need to care the case of d = 0.)
T = 4
As T > 3, we cannot say that there is a significant difference between A and B.
Exercise 2: Wilcoxon Signed-Ranks TestExercise 2: Wilcoxon Signed-Ranks Test
n-th generation
Explain how to apply this test to test whether two groups are significantly different at the below generation?
Exercise 3: Wilcoxon Signed-Ranks TestExercise 3: Wilcoxon Signed-Ranks Test
Kruskal-Wallis TestKruskal-Wallis Test2 groups n groups (n > 2)
datadistribution
・ unpaired t -test
・ sign test
・ Wilcoxon signed-ranks test ・ Friedman test
・ Kruskal-Wallis test
・ one-way ANOVA
・ two-way ANOVA
(no
norm
ality
)
two-waydata
Pa
ram
etr
ic T
est
No
n-p
ara
me
tric
Tes
t(n
orm
ality
)
unpa
ired
(ind
epen
dent
)pa
ired
(rel
ated
)un
paire
d(in
dep
ende
nt)
paire
d(r
elat
ed)
・ paired t -test
・ Mann-Whitney U-test
AN
OV
A(A
naly
sis
of V
aria
nce)
Kruskal-Wallis TestKruskal-Wallis Test
n-th generation
1. Comparison of more than two groups.2. Data have no normality.3. There are no data corresponding
among groups (independent). ??? no
nor
mal
ity
Kruskal-Wallis TestKruskal-Wallis Test
Let's use ranks of data.
1
23
45
6 7
8 910
11 121314
1516
17
Kruskal-Wallis TestKruskal-Wallis TestN: total # of datak: # of groupsni: # of data of group iRi : sum of ranks of group i
R1 = 38 R2 = 69 R3 = 46
1. Rank all data.2. Calculate N, k, ni and Ri .3. Calculate statistical value H.
4. If k = 3 and N ≤ 17, compare the H with a significant point in a Kruskal-Wallis test table.Otherwise, assume that H follows the χ2 distribution and test the H using a χ2 distribution table of (k-1) degrees of freedom
)1(3)1(
12
1
2
Nn
R
NNH
k
i i
i
1
23
4 56 7
8 91011 1213
141516
17
How to Test
Example: Kruskal-Wallis TestExample: Kruskal-Wallis Testn1 n2 n3 p < 0.05 p < 0.01 n1 n2 n3 p < 0.05 p < 0.012 2 2 - - 3 3 3 5.606 7.200 2 2 3 4.714 - 3 3 4 5.791 6.746 2 2 4 5.333 - 3 3 5 6.649 7.079 2 2 5 5.160 6.533 3 3 6 5.615 7.410 2 2 6 5.346 6.655 3 3 7 5.620 7.228 2 2 7 5.143 7.000 3 3 8 5.617 7.350 2 2 8 5.356 6.664 3 3 9 5.589 7.422 2 2 9 5.260 6.897 3 3 10 5.588 7.372 2 2 10 5.120 6.537 3 3 11 5.583 7.418 2 2 11 5.164 6.766 3 4 4 5.599 7.144 2 2 12 5.173 6.761 3 4 5 5.656 7.445 2 2 13 5.199 6.792 3 4 6 5.610 7.500 2 3 3 5.361 - 3 4 7 5.623 7.550 2 3 4 5.444 6.444 3 4 8 5.623 7.585 2 3 5 5.251 6.909 3 4 9 5.652 7.614 2 3 6 5.349 6.970 3 4 10 5.661 7.617 2 3 7 5.357 6.839 3 5 5 5.706 7.578 2 3 8 5.316 7.022 3 5 6 5.602 7.591 2 3 9 5.340 7.006 3 5 7 5.607 7.697 2 3 10 5.362 7.042 3 5 8 5.614 7.706 2 3 11 5.374 7.094 3 5 9 5.670 7.733 2 3 12 5.350 7.134 3 6 6 5.625 7.725 2 4 4 5.455 7.036 3 6 7 5.689 7.756 2 4 5 5.273 7.205 3 6 8 5.678 7.796 2 4 6 5.340 7.340 3 7 7 5.688 7.810 2 4 7 5.376 7.321 4 4 4 5.692 7.654 2 4 8 5.393 7.350 4 4 5 5.657 7.760 2 4 9 5.400 7.364 4 4 6 6.681 7.795 2 4 10 5.345 7.357 4 4 7 5.650 7.814 2 4 11 5.365 7.396 4 4 8 5.779 7.853 2 5 5 5.339 7.339 4 4 9 5.704 7.910 2 5 6 5.339 7.376 4 5 5 5.666 7.823 2 5 7 5.393 7.450 4 5 6 5.661 7.936 2 5 8 5.415 7.440 4 5 7 5.733 7.931 2 5 9 5.396 7.447 4 5 8 5.718 7.992 2 5 10 5.420 7.514 4 6 6 5.724 8.000 2 6 6 5.410 7.467 4 6 7 5.706 8.039 2 6 7 5.357 7.491 5 5 5 5.780 8.000 2 6 8 5.404 7.522 5 5 6 5.729 8.028 2 6 9 5.392 7.566 5 5 7 5.708 8.108 2 7 7 5.398 7.491 5 6 6 5.765 8.124 2 7 8 5.403 7.571
Kruskal-Wallis Test Table(for k = 3 and N ≤17)
N = n1+n2+n3 = 17 datak = 3 groups(n1, n2, n3) = (6, 5, 6)(R1, R2, R3) = (38, 69, 46)
)1(3)1(
12
1
2
Nn
R
NNH
k
i i
i
)117(36
46*46
5
69*69
6
38*38
)117(17
12
= 6.609
Since significant points of (p<0.05) and (p<0.01) for (n1, n2, n3) = (6, 5, 6) are 5.765 and 8.124, respectively, there are significant difference(s) somewhere among three groups (p<0.05).
6.6098.1245.765
significance point of (p<0.05)
significance point of (p<0.01)
Example: Kruskal-Wallis TestExample: Kruskal-Wallis Testn1 n2 n3 p < 0.05 p < 0.01 n1 n2 n3 p < 0.05 p < 0.012 2 2 - - 3 3 3 5.606 7.200 2 2 3 4.714 - 3 3 4 5.791 6.746 2 2 4 5.333 - 3 3 5 6.649 7.079 2 2 5 5.160 6.533 3 3 6 5.615 7.410 2 2 6 5.346 6.655 3 3 7 5.620 7.228 2 2 7 5.143 7.000 3 3 8 5.617 7.350 2 2 8 5.356 6.664 3 3 9 5.589 7.422 2 2 9 5.260 6.897 3 3 10 5.588 7.372 2 2 10 5.120 6.537 3 3 11 5.583 7.418 2 2 11 5.164 6.766 3 4 4 5.599 7.144 2 2 12 5.173 6.761 3 4 5 5.656 7.445 2 2 13 5.199 6.792 3 4 6 5.610 7.500 2 3 3 5.361 - 3 4 7 5.623 7.550 2 3 4 5.444 6.444 3 4 8 5.623 7.585 2 3 5 5.251 6.909 3 4 9 5.652 7.614 2 3 6 5.349 6.970 3 4 10 5.661 7.617 2 3 7 5.357 6.839 3 5 5 5.706 7.578 2 3 8 5.316 7.022 3 5 6 5.602 7.591 2 3 9 5.340 7.006 3 5 7 5.607 7.697 2 3 10 5.362 7.042 3 5 8 5.614 7.706 2 3 11 5.374 7.094 3 5 9 5.670 7.733 2 3 12 5.350 7.134 3 6 6 5.625 7.725 2 4 4 5.455 7.036 3 6 7 5.689 7.756 2 4 5 5.273 7.205 3 6 8 5.678 7.796 2 4 6 5.340 7.340 3 7 7 5.688 7.810 2 4 7 5.376 7.321 4 4 4 5.692 7.654 2 4 8 5.393 7.350 4 4 5 5.657 7.760 2 4 9 5.400 7.364 4 4 6 6.681 7.795 2 4 10 5.345 7.357 4 4 7 5.650 7.814 2 4 11 5.365 7.396 4 4 8 5.779 7.853 2 5 5 5.339 7.339 4 4 9 5.704 7.910 2 5 6 5.339 7.376 4 5 5 5.666 7.823 2 5 7 5.393 7.450 4 5 6 5.661 7.936 2 5 8 5.415 7.440 4 5 7 5.733 7.931 2 5 9 5.396 7.447 4 5 8 5.718 7.992 2 5 10 5.420 7.514 4 6 6 5.724 8.000 2 6 6 5.410 7.467 4 6 7 5.706 8.039 2 6 7 5.357 7.491 5 5 5 5.780 8.000 2 6 8 5.404 7.522 5 5 6 5.729 8.028 2 6 9 5.392 7.566 5 5 7 5.708 8.108 2 7 7 5.398 7.491 5 6 6 5.765 8.124 2 7 8 5.403 7.571
Kruskal-Wallis Test Table(for k = 3 and N ≤17)
N = n1+n2+n3 = 17 datak = 3 groups(n1, n2, n3) = (6, 5, 6)(R1, R2, R3) = (38, 69, 46)
)1(3)1(
12
1
2
Nn
R
NNH
k
i i
i
)117(36
46*46
5
69*69
6
38*38
)117(17
12 3
1
2
i i
i
n
R
= 6.609
Since significant points of (p<0.05) and (p<0.01) for (n1, n2, n3) = (6, 5, 6) are 5.765 and 8.124, respectively, there are significant difference(s) somewhere among three groups (p<0.05).
6.6098.1245.765
significance point of (p<0.05)
significance point of (p<0.01)
Q1: Where is significant among A, B, and C?
A1: Apply multiple comparisons between all pairs among columns.(Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett method, Williams method, Tukey method, Nemenyi test, Tukey-Kramer method, Games/Howell method, Duncan's new multiple range test, Student-Newman-Keuls method, etc. Each has different characteristics.)
R1 = R2 = R3 =
1
2
4
7
10
8
11
12
13
3
5
6
9
N = n1+n2+n3 =k =(n1, n2, n3) =(R1, R2, R3) =
)1(3)1(
12
1
2
Nn
R
NNH
k
i i
i
=
Exercise: Kruskal-Wallis TestExercise: Kruskal-Wallis Test
24 44 23
13 samples 3 groups( 5, 4, 4)(24, 44, 23)
6.227
7.7605.657
significance point of (p<0.05)
significance point of (p<0.01)
6.227
There is/are significant difference(s) somewhere among three groups (p<0.05).
Friedman TestFriedman Test2 groups n groups (n > 2)
datadistribution
・ unpaired t -test
・ sign test
・ Wilcoxon signed-ranks test ・ Friedman test
・ Kruskal-Wallis test
・ one-way ANOVA
・ two-way ANOVA
(no
norm
ality
) one-waydata
Pa
ram
etr
ic T
est
No
n-p
ara
me
tric
Tes
t(n
orm
ality
)
unpa
ired
(ind
epen
dent
)pa
ired
(rel
ated
)un
paire
d(in
dep
ende
nt)
paire
d(r
elat
ed)
・ paired t -test
・ Mann-Whitney U-test
AN
OV
A(A
naly
sis
of V
aria
nce)
Friedman TestFriedman TestWhen (1) more than two groups, (2) data have correspondence (not independent), but (3) the conditions of two-way ANOVA are not satisfied,Let' use ranks of data and Friedman test.
benchmarktasks
methods
a b c d
A 0.92 0.75 0.65 0.81
B 0.48 0.45 0.41 0.52
C 0.56 0.41 0.47 0.50
D 0.61 0.50 0.56 0.54
(ex.) Comparison of recognition rates.a b c d
methods
12
3
4
1
2 3
4 123
4
1
234
Friedman TestFriedman Test
Step 1: Make a ranking table.Step 2: Sum ranks of the factor that you want to test.
Step 3: Calculate the Friedman test value, χ2r .
Step 4: If k =3 or 4, compare χ2r with a significant point
in a Friedman test table. Otherwise, use a χ2 table of (k-1) degrees of freedom.
methods
12
3
4
1
23
4 1 234
123
4
a b c d
)1(3)1(
12
1
22
knRknk
k
iir
where (k, n) are the # of levels of factors 1 and 2.#
of d
ata
(n =
4)
# of methods (k = 4)
benchmarktasks
method
a b c d
A 4 2 1 3 B 3 2 1 4 C 4 1 2 3 D 4 1 3 2
Σ 15 6 7 12 ranking among methods
Example: Friedman TestExample: Friedman TestStep 1: Make a ranking table.Step 2: Sum ranks of the factor that you want to test.
Step 3: Calculate the Friedman test value, χ2r .
Step 4: Since significant point for (k,n) = (4,4) is7.80, there is/are significant difference(s) somewhere among four methods, a, b, c, and d (p<0.05).
1.8
5*4*31276155*4*4
12
)1(3)1(
12
2222
1
22
knRknk
k
iir
# of
dat
a (n
= 4
)
# of methods (k = 4)
benchmarktasks
method
a b c d
A 4 2 1 3 B 3 2 1 4 C 4 1 2 3 D 4 1 3 2
Σ 15 6 7 12
k n p<0.05 p<0.01
3
3 6.00 -4 6.50 8.00
5 6.40 8.40
6 7.00 9.00
7 7.14 8.86
8 6.25 9.00 9 6.22 9.56 ∞ 5.99 9.21
4
3 7.40 9.00
4 7.80 9.60
5 7.80 9.96
∞ 7.81 11.34
Friedman test table.
8.19.67.8
significance point of (p<0.05)
significance point of (p<0.01)
Example: Friedman TestExample: Friedman TestStep 1: Make a ranking table.Step 2: Sum ranks of the factor that you want to test.
Step 3: Calculate the Friedman test value, χ2r .
Step 4: Since significant point for (k,n) = (4,4) is7.80, there is/are significant difference(s) somewhere among four methods, a, b, c, and d (p<0.05).
1.8
5*4*31276155*4*4
12
)1(3)1(
12
2222
1
22
knRknk
k
iir
# of
dat
a (n
= 4
)
# of methods (k = 4)
benchmarktasks
method
a b c d
A 4 2 1 3 B 3 2 1 4 C 4 1 2 3 D 4 1 3 2
Σ 15 6 7 12
k n p<0.05 p<0.01
3
3 6.00 -4 6.50 8.00
5 6.40 8.40
6 7.00 9.00
7 7.14 8.86
8 6.25 9.00 9 6.22 9.56 ∞ 5.99 9.21
4
3 7.40 9.00
4 7.80 9.60
5 7.80 9.96
∞ 7.81 11.34
Friedman test table.
8.19.67.8
significance point of (p<0.05)
significance point of (p<0.01)
Q1: Where is significant among a, b, c, or d?
A1: Apply multiple comparisons between all pairs among columns.(Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett method, Williams method, Tukey method, Nemenyi test, Tukey-Kramer method, Games/Howell method, Duncan's new multiple range test, Student-Newman-Keuls method, etc. Each has different characteristics.)
Multiple ComparisonsMultiple Comparisons
When there is significant difference among groups, multiple comparison is used to know which group is significantly difference from others.
4C2 = 6 times of pair comparisons with (p < 0.05)
1 - (1 - 0.05)6 = significance level 26.5%!Don't apply multiple pair comparisons simply. Example
Multiple ComparisonsMultiple Comparisons
When there is significant difference among groups, multiple comparison is used to know which group is significantly difference from others.
4C2 = 6 times of pair comparisons with (p < 0.05)
1 - (1 - 0.05)6 = significance level 26.5%!
Solution is to apply multiple pair comparisons with more strict significance level.
Example
Multiple Comparisons
-- Bobferroni method -- Multiple Comparisons
-- Bobferroni method --
When pair comparisons are applied m times,let's use a significance level of p / m.
4C2 = 6 times of pair comparisons with (p < )0.05
6
Features:(1) Simple.(2) Rather strict, i.e. showing significances is rather hard.
Multiple Comparisons
-- Holm method -- Multiple Comparisons
-- Holm method --
Corrected Bonferroni method to detect significances easily.
Example pair comparisons
p-valuecorrected
p-value eqn.corrected
p-value
0.0076 = p-value* 6 0.0456
0.0095 = p-value* 5 0.0475
0.0280 = p-value* 4 0.1120
0.0320 = p-value* 3 0.0960
0.0380 = p-value* 2 0.0760
0.0410 = p-value* 1 0.0410
vs.
vs.
vs.
vs.
vs.
vs.
normality(parametric)
2 groups n groups (n > 2)
datadistribution
t -test
・ sign test
・ Wilcoxon Signed-Ranks Test
ANOVA(Analysis of Variance)
・ Friedman test
・ kruskal-wallis test
one-way ANOVA
two-way ANOVA
no normality(non-parametric)
one-waydata
two-waydata
Scheffé's Method of Paired ComparisonScheffé's Method of Paired Comparison
+ Scheffé's method of paired comparison for Human Subjective Tests
IEC
lighting design of 3-D CG
measuring mental scale
geological simulation
hearing-aid fitting
Corridor
W
K
Wall
L
B
Verenda
MEMS design
EvolutionaryComputation
Target System
subjectiveevaluations
Interactive Evolutionary Computation
imag
e en
hanc
emen
t pr
oces
sing
room layoutplanning design
room lighting design byoptimizing LED assignments
??
Scheffé's Method of Paired ComparisonScheffé's Method of Paired Comparison
ANOVA based on nC2 paired comparisons for n objects.
ANOVAevenbetter betterslightly better
slightly better
evenbetter betterslightly better
slightly better
evenbetter betterslightly better
slightly better
significance checkusing a yardstick
Scheffé's Method of Paired ComparisonScheffé's Method of Paired Comparison
Original method and three modified methods
All subjects must evaluate all pairs.
no yes
yes original( 原法 , 1952)
Ura's variation(浦の変法 , 1956)
no Haga's variation(芳賀の変法)
Nakaya's variation(中屋の変法 , 1970)or
der
effe
ct
Order Effect
(1) and then
(2) and then
may result different evaluation.
Scheffé's Method of Paired ComparisonScheffé's Method of Paired Comparison
Scheffé's Method of Paired ComparisonScheffé's Method of Paired Comparison
1. Ask N human subjects to evaluate t objects in 3, 5 or 7 grades. 2. Assign [-1, +1], [-2, +2] or [-3, +3] for these grades.3. Then, start calculation (see other material).
evenbetter betterslightly better
slightly better
evenbetter betterslightly better
slightly better
evenbetter betterslightly better
slightly better
O1 O2 O3 O4 O5 O6
A1 - A2 2 1 1 2 1 2
A1 - A3 2 2 1 1 1 1
A2 - A3 1 0 1 1 -1 0
Six subjects (N = 6)
Pai
red
com
paris
ons
for
t=
3 ob
ject
s.
Questionnaire Total row data
・・・
strap for a mobile phone
invitation to a dinner
tea /coffee stuffed animal fountain pen
Ex. Q.
Application Example:
What is the best present to be her/his boy/girl friend?
[SITUATION] He/he is my longing. I want to be her/his boy/girl friend before we graduate from our university. To get over my one-way love, I decided to present something of about 3,000 JPY and express my heart.
I show you 5C2 pairs of presents. Please compare each pair and mark your relative evaluation in five levels.
・・
・・
present from a male
(significant difference)
Results of Scheffé's Method of Paired Comparison (Nakaya's variation)
What is the best present to be her/his boy/girl friend?
present from a female
I th
ink
effe
ctiv
e.R
ealit
y is
...
-1 -0.5 0 0.5 1
I will catch her heart by dinner.
mor
eef
fect
ive
less
effe
ctiv
e
-1 -0.5 0 0.5 1
How about tea leave or a stuffed anima?
mor
eef
fect
ive
less
effe
ctiv
e
-1 -0.5 0 0.5 1
Eat! Eat! Eat!
mor
eef
fect
ive
less
effe
ctiv
e
-1 -0.5 0 0.5 1
mor
eef
fect
ive
less
effe
ctiv
e
I hesitate to accept it as we have not gone about with him.
Original method and three modified methods
All subjects must evaluate all pairs.
no yes
yes original( 原法 , 1952)
Ura's variation(浦の変法 , 1956)
no Haga's variation(芳賀の変法)
Nakaya's variation(中屋の変法 , 1970)or
der
effe
ct
Scheffé's Method of Paired Comparison
Modified methods by Ura and NakayaScheffé's Method of Paired Comparison
Modified methods by Ura and Nakaya
Scheffé's Method of Paired Comparison
Modified method by UraScheffé's Method of Paired Comparison
Modified method by UraPairwise comparisons for objects which are effected by display order (order effect).
evenbetter betterslightly better
slightly better
evenbetter betterslightly better
slightly better
evenbetter betterslightly better
slightly better
evenbetter betterslightly better
slightly better
evenbetter betterslightly better
slightly better
evenbetter betterslightly better
slightly better
-2 -1 0 1 2
-2 -1 0 1 2
-2 -1 0 1 2
-2 -1 0 1 2
-2 -1 0 1 2
-2 -1 0 1 2
evenbetter betterslightly better
slightly better
evenbetter betterslightly better
slightly better
evenbetter betterslightly better
slightly better
evenbetter betterslightly better
slightly better
evenbetter betterslightly better
slightly better
evenbetter betterslightly better
slightly better
-2 -1 0 1 2
-2 -1 0 1 2
-2 -1 0 1 2
-2 -1 0 1 2
-2 -1 0 1 2
-2 -1 0 1 2
Scheffé's Method of Paired Comparison
Modified method by UraScheffé's Method of Paired Comparison
Modified method by UraAsk N human subjects to evaluate 2×tC2 pairs for t objects in 3, 5 or 7 grades and assign [-1, +1], [-2, +2] or [-3, +3], respectively.
Scheffé's Method of Paired Comparison
Modified method by UraScheffé's Method of Paired Comparison
Modified method by Ura
Step 1: Make paired comparison table of each human subject.
evenbetter betterslightly better
slightly better
-2 -1 0 1 2
-2 -1 0 1 2
-2 -1 0 1 2
-2 -1 0 1 2
-2 -1 0 1 2
-2 -1 0 1 2
A1
A2
A1
A1
A2
A4
A4
A3
A3
A2
A4
A3
・・・・・・ ・・・
A1 A2 A3 A4
A1 0 -1 -1
A2 3 0 0
A3 3 1 -1
A4 3 3 1
ijlx : evaluation value when the l-th human subject compares the i-th object with the j-th object.
SubjectO1
Subject O3
SubjectO2
Scheffé's Method of Paired Comparison
Modified method by UraScheffé's Method of Paired Comparison
Modified method by Ura
Step 1: Make paired comparison table of each human subject.
)(2
1ˆ iii xx
tN
-1.1667 0.5417-0.5000 1.1250
A4 A3 A2 A1
1̂2̂3̂4̂
Average of four objects
where t: # of object (4) N: # of human subjects (3)
Scheffé's Method of Paired Comparison
Modified method by UraScheffé's Method of Paired Comparison
Modified method by UraStep 2: Make a table summing all subjects' data and calculate the average evaluations for all objects.
27 13 -12 -28
SxxN
S
Sxxt
S
xxtN
S
i ijjiij
l iilliB
iii
2
2)(
2
)(2
1
)(2
1
)(2
1
l i ijijlT
BBT
ilB
xS
SSSSSSS
Sxtt
S
xtNt
S
2
)()(
2)(
2
)1(
1
)1(
1
freedom. of degree and
,,,,,, where )()(
f
SSSSSSSS TBB
Scheffé's Method of Paired Comparison
Modified method by UraScheffé's Method of Paired Comparison
Modified method by UraStep 3: Make a ANOVA table.
unbiased variance = S/f
F = unbiased variance
unbiased variance of S
for F tests.
SxxN
S
Sxxt
S
xxtN
S
i ijjiij
l iilliB
iii
2
2)(
2
)(2
1
)(2
1
)(2
1
l i ijijlT
BBT
ilB
xS
SSSSSSS
Sxtt
S
xtNt
S
2
)()(
2)(
2
)1(
1
)1(
1
Scheffé's Method of Paired Comparison
Modified method by UraScheffé's Method of Paired Comparison
Modified method by Ura
ANOVA table.
-1.1667 0.5417-0.5000 1.1250
A4 A3 A2 A1
1̂2̂3̂4̂There are significant difference among A1 - A4
Scheffé's Method of Paired Comparison
Modified method by UraScheffé's Method of Paired Comparison
Modified method by Ura
ANOVA table.
Scheffé's Method of Paired Comparison
Modified method by UraScheffé's Method of Paired Comparison
Modified method by UraStep 4: Apply multiple comparisons.
Q1: Where is significant among A1, A2, and A3?
A1: Apply multiple comparisons between all pairs.(Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett method, Williams method, Tukey method, Nemenyi test, Tukey-Kramer method, Games/Howell method, Duncan's new multiple range test, Student-Newman-Keuls method, etc. Each has different characteristics.)
Step 4: Apply multiple comparisons between all pairs and find which distance is significant.
Example of a simple multiple comparison.• Calculate a studentized yardstick • When a difference of average > a studentized
yardstick, the distance is significant.
Scheffé's Method of Paired Comparison
Modified method by UraScheffé's Method of Paired Comparison
Modified method by Ura
-1.1667 0.5417-0.5000 1.1250
A4 A3 A2 A1
1̂2̂3̂4̂
(Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett method, Williams method, Tukey method, Nemenyi test, Tukey-Kramer method, Games/Howell method, Duncan's new multiple range test, Student-Newman-Keuls method, etc. Each has different characteristics.)
(See in the next slide.)
Scheffé's Method of Paired Comparison
Modified method by UraScheffé's Method of Paired Comparison
Modified method by Ura
tNftqY 2/ˆ),( 2
Step 4: Example of a simple multiple comparisons.
(studentized yardstick)
When (t, f) = (4,21), studentized yardsticks for significance levels of 5% and 1% are:
),,ˆ( 2 Ntwhere are an unbiased variance of Sε, the # of objects, and the #ofhuman subjects; is a studentized range obtained is a statistical test table for t, the degree of freedom of Sε ( ), and the significant level of φ; see these variables in an ANOVA table.
),( ftqf
)21,4(05.0q
Studentized yardstick ),(05.0 ftq 2 3 4 5 6 7 8 9 10 12 15 20
1 18.0 27.0 32.8 37.1 40.4 43.1 45.4 47.4 49.1 52.0 55.4 59.6
2 6.09 8.30 9.80 10.9 11.7 12.4 13.0 13.5 14.0 14.7 15.7 16.8
3 4.50 5.91 6.82 7.50 8.04 8.48 8.85 9.18 9.46 9.95 10.5 11.2
4 3.93 5.04 5.76 6.29 6.71 7.05 7.35 7.60 7.83 8.21 8.66 9.23
5 3.64 4.60 5.22 5.67 6.03 6.33 6.58 6.80 6.99 7.32 7.72 8.21
6 3.46 4.34 4.90 5.31 5.63 5.89 6.12 6.32 6.49 6.79 7.14 7.59
7 3.34 4.16 4.68 5.06 5.36 5.61 5.82 6.00 6.16 6.43 6.76 7.17
8 3.26 4.04 4.53 4.89 5.17 5.40 5.60 5.77 5.92 6.18 6.48 6.87
9 3.20 3.95 4.42 4.76 5.02 5.24 5.43 5.60 5.74 5.98 6.28 6.64
10 3.15 3.88 4.33 4.65 4.91 5.12 5.30 5.46 5.60 5.83 6.11 6.47
11 3.11 3.82 4.26 4.57 4.82 5.03 5.20 5.35 5.49 5.71 5.99 6.33
12 3.08 3.77 4.20 4.51 4.75 4.95 5.12 5.27 5.40 5.62 5.88 6.21
13 3.06 3.73 4.15 4.45 4.69 4.88 5.05 5.19 5.32 5.53 5.79 6.11
14 3.03 3.70 4.11 4.41 4.67 4.83 4.99 5.10 5.25 5.46 5.72 6.03
15 3.01 3.67 4.08 4.37 4.60 4.78 4.94 5.08 5.20 5.40 5.66 5.96
16 3.00 3.65 4.05 4.33 4.56 4.74 4.90 5.03 5.15 5.35 5.59 5.90
17 2.98 3.63 4.02 4.30 4.52 4.71 4.86 4.99 5.11 5.31 5.55 5.84
18 2.97 3.61 4.00 4.28 4.49 4.67 4.82 4.96 5.07 5.27 5.50 5.79
19 2.96 3.59 3.98 4.25 4.47 4.65 4.79 4.92 5.04 5.23 5.46 5.75
20 2.95 3.58 3.96 4.23 4.45 4.62 4.77 4.90 5.01 5.20 5.43 5.71
24 2.92 3.53 3.90 4.17 4.37 4.54 4.68 4.81 4.92 5.10 5.32 5.59
30 2.89 3.49 3.84 4.10 4.30 4.46 4.60 4.72 4.83 5.00 5.21 5.48
40 2.86 3.44 3.79 4.04 4.23 4.39 4.52 4.63 4.74 4.91 5.11 5.36
60 2.83 3.40 3.74 3.98 4.16 4.31 4.44 4.55 4.65 4.81 5.00 5.24
120 2.80 3.36 3.69 3.92 4.10 4.24 4.36 4.48 4.56 4.72 4.90 5.13
∞ 2.77 3.31 3.63 3.86 4.03 4.17 4.29 4.39 4.47 4.62 4.80 5.01
f t
Scheffé's Method of Paired Comparison
Modified method by UraScheffé's Method of Paired Comparison
Modified method by UraStep 4: Example of a simple multiple comparisons.
Original method and three modified methods
All subjects must evaluate all pairs.
no yes
yes original( 原法 , 1952)
Ura's variation(浦の変法 , 1956)
no Haga's variation(芳賀の変法)
Nakaya's variation(中屋の変法 , 1970)or
der
effe
ct
Scheffé's Method of Paired Comparison
Modified methods by Ura and NakayaScheffé's Method of Paired Comparison
Modified methods by Ura and Nakaya
evenbetter betterslightly better
slightly better
evenbetter betterslightly better
slightly better
evenbetter betterslightly better
slightly better
-2 -1 0 1 2
-2 -1 0 1 2
-2 -1 0 1 2
Scheffé's Method of Paired Comparison
Modified method by NakayaScheffé's Method of Paired Comparison
Modified method by NakayaPairwise comparisons for objects that can be compared without order effect.
Scheffé's Method of Paired Comparison
Modified method by NakayaScheffé's Method of Paired Comparison
Modified method by Nakaya1. Ask N human subjects to evaluate t objects in 3, 5 or 7 grades. 2. Assign [-1, +1], [-2, +2] or [-3, +3] for these grades, respectively.3. Then, start calculation (see other material).
O1 O2 O3 O4 O5 O6
A1 - A2 2 3 3 2 0 1
A1 - A3 2 0 0 1 1 0
A2 - A3 -3 -2 -1 -1 -3 -2
Six human subjects (N = 6)
Pai
red
com
paris
ons
for
t=3
obje
cts.
Questionnaire
evenbetter betterslightly better
slightly better
evenbetter betterslightly better
slightly better
evenbetter betterslightly better
slightly better
-2 -1 0 1 2
-2 -1 0 1 2
-2 -1 0 1 2
Scheffé's Method of Paired Comparison
Modified method by NakayaScheffé's Method of Paired Comparison
Modified method by NakayaStep 1: Make paired comparison table of each human subject.
: evaluation value when the l-th human subject compares the i-th object with the j-th object. ijlx
ii xtN
1̂Average of four objects
where t: # of object (3) N: # of human subjects (6)
Scheffé's Method of Paired Comparison
Modified method by NakayaScheffé's Method of Paired Comparison
Modified method by NakayaStep 2: Make a table summing all subjects' data and calculate the average evaluations for all objects.
ii
l iliB
ii
xtN
S
Sxt
S
xtN
S
2
2
..
2.)(
..
1
1
1
SF
SSSSS
Sxt
S
BT
l iliB
of varianceUnbariased
varianceUnbiased
1
)(
2.)(
There are significant difference among A1 - A3
Scheffé's Method of Paired Comparison
Modified method by NakayaScheffé's Method of Paired Comparison
Modified method by NakayaStep 3: Make a ANOVA table.
ANOVA table.
Scheffé's Method of Paired Comparison
Modified method by NakayaScheffé's Method of Paired Comparison
Modified method by NakayaStep 4: Apply multiple comparisons.
Q1: Where is significant among A1, A2, and A3?
A1: Apply multiple comparisons between all pairs among columns.(Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett method, Williams method, Tukey method, Nemenyi test, Tukey-Kramer method, Games/Howell method, Duncan's new multiple range test, Student-Newman-Keuls method, etc. Each has different characteristics.)
ANOVA table.
Scheffé's Method of Paired Comparison
Modified method by NakayaScheffé's Method of Paired Comparison
Modified method by NakayaStep 4: Apply multiple comparisons between all pairs and find which distance is significant.
Example of a simple multiple comparison.• Calculate a studentized yardstick • When a difference of average > a studentized
yardstick, the distance is significant.
(Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett method, Williams method, Tukey method, Nemenyi test, Tukey-Kramer method, Games/Howell method, Duncan's new multiple range test, Student-Newman-Keuls method, etc. Each has different characteristics.)
1980.263/79.197.6
4506.163/79.160.4
01.0
05.0
Y
Y
Scheffé's Method of Paired Comparison
Modified method by NakayaScheffé's Method of Paired Comparison
Modified method by NakayaStep 4: Example of a simple multiple comparisons.
tNftqY /ˆ),( 2 (studentized yardstick)
),,ˆ( 2 Ntwhere are an unbiased variance of Sε, the # of objects, and the #ofhuman subjects; is a studentized range obtained is a statistical test table for t, the degree of freedom of Sε ( ), and the significant level of φ; see these variables in an ANOVA table.
),( ftqf
(See in the next slide.))5,3(05.0q
Studentized yardstick ),(05.0 ftq 2 3 4 5 6 7 8 9 10 12 15 20
1 18.0 27.0 32.8 37.1 40.4 43.1 45.4 47.4 49.1 52.0 55.4 59.6
2 6.09 8.30 9.80 10.9 11.7 12.4 13.0 13.5 14.0 14.7 15.7 16.8
3 4.50 5.91 6.82 7.50 8.04 8.48 8.85 9.18 9.46 9.95 10.5 11.2
4 3.93 5.04 5.76 6.29 6.71 7.05 7.35 7.60 7.83 8.21 8.66 9.23
5 3.64 4.60 5.22 5.67 6.03 6.33 6.58 6.80 6.99 7.32 7.72 8.21
6 3.46 4.34 4.90 5.31 5.63 5.89 6.12 6.32 6.49 6.79 7.14 7.59
7 3.34 4.16 4.68 5.06 5.36 5.61 5.82 6.00 6.16 6.43 6.76 7.17
8 3.26 4.04 4.53 4.89 5.17 5.40 5.60 5.77 5.92 6.18 6.48 6.87
9 3.20 3.95 4.42 4.76 5.02 5.24 5.43 5.60 5.74 5.98 6.28 6.64
10 3.15 3.88 4.33 4.65 4.91 5.12 5.30 5.46 5.60 5.83 6.11 6.47
11 3.11 3.82 4.26 4.57 4.82 5.03 5.20 5.35 5.49 5.71 5.99 6.33
12 3.08 3.77 4.20 4.51 4.75 4.95 5.12 5.27 5.40 5.62 5.88 6.21
13 3.06 3.73 4.15 4.45 4.69 4.88 5.05 5.19 5.32 5.53 5.79 6.11
14 3.03 3.70 4.11 4.41 4.67 4.83 4.99 5.10 5.25 5.46 5.72 6.03
15 3.01 3.67 4.08 4.37 4.60 4.78 4.94 5.08 5.20 5.40 5.66 5.96
16 3.00 3.65 4.05 4.33 4.56 4.74 4.90 5.03 5.15 5.35 5.59 5.90
17 2.98 3.63 4.02 4.30 4.52 4.71 4.86 4.99 5.11 5.31 5.55 5.84
18 2.97 3.61 4.00 4.28 4.49 4.67 4.82 4.96 5.07 5.27 5.50 5.79
19 2.96 3.59 3.98 4.25 4.47 4.65 4.79 4.92 5.04 5.23 5.46 5.75
20 2.95 3.58 3.96 4.23 4.45 4.62 4.77 4.90 5.01 5.20 5.43 5.71
24 2.92 3.53 3.90 4.17 4.37 4.54 4.68 4.81 4.92 5.10 5.32 5.59
30 2.89 3.49 3.84 4.10 4.30 4.46 4.60 4.72 4.83 5.00 5.21 5.48
40 2.86 3.44 3.79 4.04 4.23 4.39 4.52 4.63 4.74 4.91 5.11 5.36
60 2.83 3.40 3.74 3.98 4.16 4.31 4.44 4.55 4.65 4.81 5.00 5.24
120 2.80 3.36 3.69 3.92 4.10 4.24 4.36 4.48 4.56 4.72 4.90 5.13
∞ 2.77 3.31 3.63 3.86 4.03 4.17 4.29 4.39 4.47 4.62 4.80 5.01
f t
SUMMARYSUMMARY
2 groups n groups (n > 2)
datadistribution
・ unpaired t -test
・ sign test・ Wilcoxon signed-ranks test ・ Friedman test
・ Kruskal-Wallis test
・ one-way ANOVA
・ two-way ANOVA
(no
no
rma
lity) one-way
data
two-waydata
Pa
ram
etr
ic T
est
No
n-p
ara
me
tric
Te
st(n
orm
alit
y)
unp
aire
d(in
depe
nden
t)pa
ired
(rel
ated
)un
pai
red
(inde
pend
ent)
pair
ed(r
elat
ed)
・ paired t -test
・ Mann-Whitney U-test
AN
OV
A(A
naly
sis
of V
aria
nce)+
Scheffé's method of paired comparison for Human Subjective Tests
1. We overview which statistical test we should use for which case.
2. We can appeal the effectiveness of our experiments with correct use of statistical tests.