What Is Multivariate Analysis? - wolfpack.hnu.ac.kr/2014_fall/LN_MDA_SAS 2014f.pdf


  • CHAPTER 1

    Multivariate analysis has two broad aims: 1) studying causal relationships (e.g., Multiple Regression, Multivariate ANOVA) and 2) reduction and classification.

    .

    (large), (complicated & complex) .

    . .

    (1)

    , , , 4

    . 4 ()

    (), () () .

    (common entity).

    . (

    ) .

    4 (, , , ) 2 (, )

    . . 4

    .


    (2)

    100 IQ, , , , , ,

    . 100

    . .

    .

    (3)

    (, , ) (, , )

    , .

    .

    ,

    (confirmatory)

    , (exploratory) .

    .

    1.1.

    1.1.1.

    Statistics is about data: its collection, summarization, analysis, and presentation.

    (: population)

    (sample) (: , , ) (, : IQ, , ,

    , ) . .

    (: ,

    ), (: ),

    (: IQ, , ), (: ,


    ) .

    .

    (1) (2)

    .

    .

    . 30

    . , ,

    .

    ,

    (data matrix) .

    . n p (, ,

    , , , ) .

    . () () .

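    Here \(x_{ij}\) denotes the value of the j-th variable for the i-th observation.

    \[ X_{n \times p} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix} = \begin{pmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{pmatrix} \]

    obs   var 1        var 2       var 3            ...   var p
    1     X11 = 180    X12 = 82    X13 = Married    ...   X1p
    2     X21 = 163    X22 = 56    X23 = Single     ...   X2p
    ...
    n     Xn1 = 173    Xn2 = 75    Xn3 = Single     ...   Xnp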


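    . \(x_i\)

    p .

    \[ x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix}, \qquad x_j = \begin{pmatrix} x_{j1} \\ x_{j2} \\ \vdots \\ x_{jp} \end{pmatrix}, \quad j = 1, 2, \ldots, n \]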

    1.1.2.

    , .

    , , ,

    . (history)

    (trail) . ,

    .

    .

    1.1.3.

    () (random experiment,

    ) (sample space, S ) (element)

    .

    . .

    (1)(continuous)


    .

    , ,

    .

    (2)(discrete)

    , , , .

    (1) (metric, measurable, quantitative)

    , , , IQ, ,

    .

    . (: )

    (2), (Non-metric, categorical, classified)

    , , , (, , )

    . .

    (nominal)

    (, ), (, ) .

    (ordinal)

    (A>B>C>D>E) .

    (time series)

    (Cross-section: ) . (,

    ) ( ) .

    (causal relationship) ( ) (explanatory)

    (independent) ( )

    (dependent) (response) . Y , X


    . .

    ,

    .

    .

    1.2.

    . 1.2.2

    .

    1.2.1.

    (1) (Multiple Regression)

    , (

    .) 2

    (Multivariate Regression),

    (Simultaneous Equation Regression) .

    ||||

    (: ) ,

    , , . 1) ,

    , , ( ) 2)

    ( ) ( ) 3)

    ( ) .

    (2) (Logistic Regression)

    (binary, dichotomous)

    . .

    ||||


    , ,

    (: ),

    .

    (3) (ANOVA: Analysis Of Variance)

    .

    ||||

    (: ) ( , , ), (,

    , ), (30 , 40 , 50 )

    .

    (4) (Multivariate ANOVA)

    2

    .

    ||||

    () , , (, , ),

    (/)

    .

    1.2.2.

    (variable directed

    technique) .

    (component), (factor), (canonical variate) .

    (1) (Principal Component Analysis: PCA)

    (2) (Factor Analysis: FA)

    (3) (Canonical Analysis: CA)


    .

    1.2.3.

    (Individual Directed) .

    (1) (Cluster Analysis: CA): Multi-Dimensional Scaling (MDS: )

    (2) (Discriminant Analysis: DA): (Canonical Discriminant

    Analysis: CDA), (Logistic Discriminant Analysis: LDA)

    .

    1.3.

    . 30

    ( 15, 15) , , , .

    1.3.1.

    (abnormality) (outliers:

    ) (

    ) () .

    .

    . p )( pk

    .

    , , , 2~3

    .


    1.3.2.

    . (, , , )

    ( .) .

    , ( .)

    , .

    ( )

    (variable-directed technique)

    .

    .

    1.3.3.

    2 (

    ) . ,

    , ,

    (, , , ) .

    () ,

    , , .

    Logistic Logistic .

    30 (, , , )

    , , ,

    () .

    1.3.4.

    .

    .

    (2 ) (MDS: Multi-Dimensional Scaling)

    .


    ,

    . 30 (, )

    . , , , 30 .

    .

    1.3.5.

    2 . ,

    ( 3 ) , , (, , ),

    (, ) . ,

    , 3 (ANOVA)

    .

    (1) , , ( , )

    .

    (2)1 . k 1 ( )

    . ()

    \(1 - (1 - \alpha)^k\) . .

    (false

    significant) 1

    .

    1.3.6.

    , .

    .

    .


    S D N N N S

    D S N S N N

    Yes Yes No No No Yes

    No No Yes Yes No No

    P P R R Yes No

    P P N N N D

    D P N N N N

    N: , P: , R: , S: , D:

  • CHAPTER 3.

    .

    (box-whisker plot) - (stem and leaf

    plot) , , .

    (W-), (, ) ( )

    . 2

    (scatter plot) .

    .

    3 ? 3

    4

    . .

    . )3(p 2

    .

    2 ( ) .

    3.1.

    p ix

    .


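    \[ x = (x_1, x_2, \ldots, x_p)' \]

    p (Multivariate Normal Distribution): \(x_{p \times 1} \sim N_p(\mu_{p \times 1}, \Sigma_{p \times p})\)

    \[ f(x; \mu, \Sigma) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left[-\tfrac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\right] \]

    Mean vector:

    \[ \mu = E(x) = (E(x_1), E(x_2), \ldots, E(x_p))' = (\mu_1, \mu_2, \ldots, \mu_p)' \]

    Covariance matrix:

    \[ \Sigma = Cov(x) = E[(x-\mu)(x-\mu)'] = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix} \]

    \[ \sigma_{ij} = Cov(x_i, x_j) = E[(x_i-\mu_i)(x_j-\mu_j)] \quad \text{for } i \ne j \]

    \[ \sigma_{ii} = Cov(x_i, x_i) = Var(x_i) = E(x_i-\mu_i)^2 \]

    Correlation matrix:

    \[ \rho_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\sigma_{jj}}} \ \text{for } i \ne j, \qquad \rho_{ij} = 1 \ \text{for } i = j \]

    \[ R = \begin{pmatrix} \rho_{11} & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & \rho_{22} & \cdots & \rho_{2p} \\ \vdots & \vdots & & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & \rho_{pp} \end{pmatrix} \]

    \(\mu_j\) is estimated by \(\bar{x}_j\), and \(\sigma_{jj}\) by \(s_{jj}\).

    Bivariate case (p = 2):

    \[ f(x; \mu, \Sigma) = \frac{1}{(2\pi)^{2/2} |\Sigma|^{1/2}} \exp\left[-\tfrac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\right], \quad \mu = \begin{pmatrix} E(x_1) \\ E(x_2) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix} \]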

    3.2.

    3.2.1.

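    \[ r = \frac{cov(X, Y)}{\sqrt{var(X)\,var(Y)}} = \frac{\sum (X-\bar{X})(Y-\bar{Y})/(n-1)}{\sqrt{\sum (X-\bar{X})^2/(n-1) \cdot \sum (Y-\bar{Y})^2/(n-1)}} = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^2 \sum (Y-\bar{Y})^2}} \]

    [Figure: four scatter plots, two with |r| = 0.9 and two with |r| = 0.1]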


    (linear association) .

    (1)1 1 .

    (2)1 . ()

    ().

    (3)1 . ()

    ().

    (4)0 .

    (comparable)

    .

    , .

    3.2.2.

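    (1) \(H_0: \rho = 0\)

    \[ T = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t(n-2), \quad \text{where } r = \frac{E[(X-\bar{X})(Y-\bar{Y})]}{\sqrt{E(X-\bar{X})^2\,E(Y-\bar{Y})^2}} \]

    (2) \(H_0: \rho = \rho_0\)

    \[ z = 0.5\ln\frac{1+r}{1-r} \overset{app}{\sim} Normal\left(0.5\ln\frac{1+\rho}{1-\rho},\ \frac{1}{n-3}\right) \]

    (3)

    \[ 0.5\ln\frac{1+r}{1-r} \pm z_{\alpha/2}\sqrt{\frac{1}{n-3}} = (L, U), \qquad \rho_L = \frac{e^{2L}-1}{e^{2L}+1}, \quad \rho_U = \frac{e^{2U}-1}{e^{2U}+1} \]

    Example: \(n = 50\), \(r = 0.527\). \(z = 0.5\ln\frac{1+0.527}{1-0.527} = 0.586\), and the 95% limits on the z scale are \((L, U) = (0.3, 0.872)\).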
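    The interval calculation above can be checked with a short DATA step. This is only a sketch: the data set name FISHER_CI and all variable names are illustrative assumptions, not taken from the notes.

    ```sas
    /* Fisher z confidence interval for a correlation (n = 50, r = 0.527). */
    /* Data set and variable names are hypothetical.                        */
    DATA FISHER_CI;
      N = 50;
      R = 0.527;
      ALPHA = 0.05;
      Z     = 0.5 * LOG((1 + R) / (1 - R));        /* Fisher z transform          */
      SE    = SQRT(1 / (N - 3));                   /* approximate std. error of z */
      ZCRIT = PROBIT(1 - ALPHA / 2);               /* standard normal quantile    */
      L = Z - ZCRIT * SE;                          /* lower limit on the z scale  */
      U = Z + ZCRIT * SE;                          /* upper limit on the z scale  */
      RHO_L = (EXP(2 * L) - 1) / (EXP(2 * L) + 1); /* back to the correlation scale */
      RHO_U = (EXP(2 * U) - 1) / (EXP(2 * U) + 1);
    RUN;

    PROC PRINT DATA=FISHER_CI NOOBS;
      VAR Z L U RHO_L RHO_U;
    RUN;
    ```

    On the z scale this should give limits of roughly (0.30, 0.87); back-transformed, the interval for the correlation itself is roughly (0.29, 0.70).
    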

    3.2.3.

    0

    . 0

    ( 1212 4.0100 XXX ) .

    (p-) .

    .

    (1) control 0.9 .

    (2) control 0.7 .

    (3) 20-30 0.6 .

    (4) (1-5 ,

    )

    . ( \(R^2 = SSR/SST\) ) .


    . Spearman , Kendall's Tau

    .

    3.2.4. SAS

    DELIMITER='09'x MISSOVER DSD Tab

    .

    (1)NOPRINT output . NOPRINT

    (, ) , ( p -) .

    NOSIMPLE .

    (2)OUTP=OUT1 OUT1 SAS data

    . .
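    A minimal sketch of how the options discussed in this section might be combined. The file path, the layout of the input file, and the variable names NAME, HEIGHT and WEIGHT are assumptions made for illustration.

    ```sas
    /* Hypothetical tab-delimited file with a name and two numeric columns. */
    DATA CLASS;
      /* '09'x is the tab character; DSD treats consecutive delimiters as    */
      /* missing values; MISSOVER keeps INPUT from reading past a short line */
      INFILE "C:\TEMP\CLASS.TXT" DELIMITER='09'x MISSOVER DSD;
      INPUT NAME $ HEIGHT WEIGHT;
    RUN;

    /* NOSIMPLE suppresses the descriptive statistics;            */
    /* OUTP=OUT1 stores the Pearson correlations in data set OUT1 */
    PROC CORR DATA=CLASS NOSIMPLE OUTP=OUT1;
      VAR HEIGHT WEIGHT;
    RUN;

    PROC PRINT DATA=OUT1;
    RUN;
    ```

    Adding NOPRINT to the PROC CORR statement would suppress the printed output entirely while still creating OUT1.
    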

    3.3.

    3.3.1.


    A 48 10 15

    6 . [APPLICANT.TXT]/ [Applied

    Multivariate Methods for Data Analysts, Dallas E. Johnson, p. 101]

    ID, Letter (X1), Appearance (X2),
    Academic Ability (X3), Likeability (X4), Self-Confidence (X5),
    Lucidity (X6), Honest (X7), Experience (X9),
    Drive (X10), Ambition (X11), Potential (X13),
    Keenness to Join (X14), Suitability (X15),
    Salesmanship (X8), Grasp Concept (X12)

    3.3.2.

    15 \(AVG = (L + AP + AA + \cdots + SU)/15\) 6

    . .

    (1/15)

    .

    HOMEWORK#2-2 . .

    /* */ , /* */ mean

    .


    3.3.3.

    ( ) (weight)

    . ( \(Avg_w = w_1 L + w_2 AP + \cdots + w_{15} SU\), where \(\sum_i w_i = 1\) )

    .

    A , , ,

    , . 5 2

    . 5 2

    20.

    3.3.4.

    .

    (grouping)

    . 15 .

    .

    . ,

    .


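    Group 1: X5, X6, X8, X10, X11, X12, X13
    Group 2: X1, X9, X15
    Group 3: X4, X7, X14
    Group 4: X2
    Group 5: X3

    15 (, ) 5

    . group 1 ( ) 7

    1

    .

    \[ AVG_w = \left[ \frac{X_5 + X_6 + \cdots + X_{13}}{7} + \frac{X_1 + X_9 + X_{15}}{3} + \frac{X_4 + X_7 + X_{14}}{3} + X_2 + X_3 \right] / 5 \]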
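    The simple and the group-weighted averages can be computed in one DATA step. The sketch below assumes a data set APPLICANT whose variables were read in the order ID L AP AA LI SC LC HO SM EX DR AM GC PO KJ SU, so that L is X1 and SU is X15; the names APPL_AVG, G1, G2, G3 and AVGW are hypothetical.

    ```sas
    DATA APPL_AVG;
      SET APPLICANT;
      AVG  = MEAN(OF L--SU);                    /* simple average of the 15 scores  */
      G1   = MEAN(SC, LC, SM, DR, AM, GC, PO);  /* group 1: X5, X6, X8, X10 to X13  */
      G2   = MEAN(L, EX, SU);                   /* group 2: X1, X9, X15             */
      G3   = MEAN(LI, HO, KJ);                  /* group 3: X4, X7, X14             */
      AVGW = (G1 + G2 + G3 + AP + AA) / 5;      /* groups 4 and 5 are X2 and X3     */
    RUN;

    /* List the six applicants with the highest weighted average */
    PROC SORT DATA=APPL_AVG;
      BY DESCENDING AVGW;
    RUN;

    PROC PRINT DATA=APPL_AVG (OBS=6);
      VAR ID AVG AVGW;
    RUN;
    ```

    Averaging within each group first gives every group the same weight (1/5), so the seven variables of group 1 no longer dominate the index.
    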

    3.4.

    3.4.1.

    (scatter

    plot). 3 .

    (scatter plot matrix) . SAS SAS/INSIGHT

    .


    . CTRL

    . .

    .


    .

    .

    3.4.2. Bubble

    2 3

    Bubble ( blob ) . 2

    . GC y , PO

    x AM Bubble plot .

    . X5 ( y ) X6 ( x )

    . X5 X6

    ( X5 X6 . r = 0.80755 ) . X5 X8

    X5 ( y )

    ( r = 0.79963 ) . X6 X8 X6 ( x )

    ( r = 0.81802 ) .

    Chernoff , , , , , , ,

    3

    .


    [EXERCISE]

    (1) . [CLASS.txt]

    , . (=0.05)

    95% .

    (2) . [APPLICANT.txt]

    6 .

    6

    .

    6 .

    (3) 50 15 . [POLICE.txt]

    ID: REACT:

    HEIGHT (cm) WEIGHT (kg)

    SHLDR: (cm) PELVIC: (cm)

    CHEST: (cm) THIGH: (mm)

    PULSE: DIAST:

    CHNUP: BREATH: (liter)

    RECVR: (treadmill) 5

    SPEED:

    ENDUR: ()

    FAT:

    15 .

    , , Bubble plot .

  • CHAPTER 4

    ? .

    . PCA (Principal Component Analysis: )

    ? . ,

    , ,

    (, ) .

    ( Big & Tall) .

    2

    .

    .

    3 ( 15 )

    . 15

    .

    , , , 30

    . 30

    . 1-

    2 .

    (1) 3p 1-2 ( 3

    .) (2) ( )

    . p p

    () 1-2 () .

    . 80% .


    4.1.

    4.1.1.

    19 (: pound) IQ .

    IQ ( ) .

    IQ

    .

    .

    .[2 SAS/IML ]


    \(\lambda_1 = 518.69\), \(\lambda_2 = 1.18\)

    . () .

    (, IQ) ()

    (99.77%) .
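    As a quick check of the 99.77% figure, assuming the two eigenvalues reported for this example are 518.69 and 1.18, the share of the total variance carried by the first component is:

    ```latex
    \[
      \frac{\lambda_1}{\lambda_1 + \lambda_2}
      = \frac{518.69}{518.69 + 1.18}
      = \frac{518.69}{519.87}
      \approx 0.9977
    \]
    ```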

    4.1.2.

    (1)

    - , -

    .

    . 3

    ? 3 Bubble

    3

    . 3p (1-2 )

    .

    (2)

    () 1-2 .

    4.1.1 IQ () ,

    . 2

    3p

    .


    (3)

    .

    -

    .

    () .

    .

    (4)

    (multicollinearity)

    ( \((X'X)^{-1}\)

    \(|X'X| \approx 0\) \(s^2(b) = MSE\,(X'X)^{-1}\) )

    .

    (Ridge Regression:

    )

    . (

    ) .

    ..

    4.2.

    (principal components) .

    (1) . ()

    (2) (, ) 2, 3,

    .

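    p

    \[ x = (x_1, x_2, \ldots, x_p)' \]

    .

    \[ y = (y_1, y_2, \ldots, y_p)', \qquad y = Lx, \qquad L = \begin{pmatrix} l_{11} & l_{12} & \cdots & l_{1p} \\ l_{21} & l_{22} & \cdots & l_{2p} \\ \vdots & \vdots & & \vdots \\ l_{p1} & l_{p2} & \cdots & l_{pp} \end{pmatrix} \]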

    ( )

    .

    .

    .

    4.2.1.

    p

    .

    . ?

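    \[ x = (x_1, x_2, \ldots, x_p)', \qquad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix} \]

    Let \(\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p\) be the eigenvalues of \(\Sigma\) and \(e_i\) the eigenvector belonging to \(\lambda_i\). The i-th principal component is \(y_i = e_i'x\), with

    \[ Var(Y_i) = e_i'\Sigma e_i = \lambda_i, \qquad Cov(Y_i, Y_k) = e_i'\Sigma e_k = 0 \ \text{for } i \ne k . \]

    With \(P\) the matrix of eigenvectors and \(\Lambda\) the diagonal matrix of eigenvalues, the total variance of the \(X_i\) is

    \[ \sum_{i=1}^{p} Var(x_i) = tr(\Sigma) = tr(P \Lambda P') = tr(\Lambda P'P) = tr(\Lambda) = \sum_{i=1}^{p} \lambda_i . \]

    The share of the total variance explained by the k-th principal component \(y_k = e_k'x = e_{k1}x_1 + e_{k2}x_2 + \cdots + e_{kp}x_p\) is

    \[ \lambda_k / \sum_{i=1}^{p} \lambda_i . \]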

    4.2.2.

    (1) (first principal component)

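    Among all \(a_1\) with \(a_1'a_1 = 1\), take the \(a_1\) that maximizes the variance \(V(a_1'(x-\mu))\) of \(a_1'(x-\mu)\);

    then \(y_1 = a_1'(x-\mu)\) .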

    p

    2

    1

    i

    ix .

    (2) (second principal component)

    \(a_2'a_2 = 1\), \(a_1'a_2 = 0\) ( .

    ) \(a_2'(x-\mu)\) \(a_2\)

    \(y_2 = a_2'(x-\mu)\) .

    (3) (third principal component)

    \(a_3'a_3 = 1\), \(a_1'a_3 = 0\), \(a_2'a_3 = 0\) ( , )

    \(a_3'(x-\mu)\) \(a_3\) \(y_3 = a_3'(x-\mu)\)

    .


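    ( \(y_1, y_2, \ldots, y_p\) ) .

    ( ).

    .

    \[ y_{p \times 1} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{pmatrix} = \begin{pmatrix} a_1' \\ a_2' \\ \vdots \\ a_p' \end{pmatrix}_{p \times p} (x - \mu)_{p \times 1} \]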

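    (4) How is \(a_j\) obtained?

    - Let \(\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p\) be the eigenvalues of \(\Sigma\) and \(e_1, e_2, \ldots, e_p\) the corresponding eigenvectors. Then

    \[ a_1 = e_1,\ a_2 = e_2,\ \ldots,\ a_p = e_p, \qquad e_i'e_j = 1 \ (i = j), \quad e_i'e_j = 0 \ (i \ne j) . \]

    The variance of \(y_j\) is \(\lambda_j\) .

    - The trace \(tr(\Sigma)\) is the total variance of \(x_1, x_2, \ldots, x_p\):

    \[ tr(\Sigma) = \sum_i V(x_i) = \sigma_{11} + \sigma_{22} + \cdots + \sigma_{pp}, \]

    and the share explained by \(y_j\) is \(\lambda_j / \sum_{j=1}^{p} \lambda_j\) .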
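    The eigenvalue solution can be reproduced in SAS/IML. The covariance matrix below is hypothetical and chosen only for illustration; LAMBDA, E, TOTAL, PROP and CHECK are illustrative names.

    ```sas
    PROC IML;
      /* Hypothetical 2 x 2 covariance matrix, for illustration only */
      S = {4 2,
           2 3};
      CALL EIGEN(LAMBDA, E, S);      /* eigenvalues (descending) and eigenvectors (columns of E) */
      TOTAL = TRACE(S);              /* total variance = sum of the diagonal of S               */
      PROP  = LAMBDA / SUM(LAMBDA);  /* share of total variance for each component              */
      CHECK = T(E) * E;              /* orthonormal eigenvectors give the identity matrix       */
      PRINT LAMBDA E TOTAL PROP CHECK;
    QUIT;
    ```

    For this matrix the two eigenvalues are (7 ± √17)/2, about 5.56 and 1.44, and they add up to the trace 7, which illustrates that the components redistribute the total variance without changing it.
    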

    (5)

    - -

    S . - )( S

    . (, IQ)

    S .


    , .

    ,

    (518.65+1.228) .

    1 \((0.009142)^2 + (0.999958)^2 = 1\) 0 .

    .

    4.3.

    4.3.1.

    . ()

    . r j .

    \(y_{rj} = e_j'(x_r - \mu)\), \(x_r\) r- ( \(r = 1, 2, \ldots, n\) )

    (IQ, Weight)

    68.1192.100 x .


    j () 110(pound) IQ=125 j

    \[ Y_{1j} = 0.999958\,(110 - 100.02) + 0.009142\,(125 - 119.68) \]

    \[ Y_{2j} = -0.009142\,(110 - 100.02) + 0.999958\,(125 - 119.68) . \]

    . SAS OUT=

    . (4.5 )

    4.3.2.

    \(e_j\) \(c_j = \sqrt{\lambda_j}\, e_j\), \(j = 1, 2, \ldots, p\)

    (component loading vector) .

    .

    .

    .

    ,

    .

    .

    0.99 , IQ 0.009

    . IQ

    .

    .

    (1)

    .

    ,

    .(4.5 )

    (2)

    .


    4.4.

    . .

    () () 100% .

    (, IQ) IQ 2

    () 1 . 4.1

    y (2 1 )

    115 3 . IQ

    3

    . .

    (

    80%) 2 .

    .

    4.4.1.

    \(\lambda_k / \sum_{j=1}^{p} \lambda_j\) .

    ( 1 ) , ,2

    . 80%

    ?

    \[ 0.7 \le \frac{\lambda_1}{tr(S)} + \frac{\lambda_2}{tr(S)} + \cdots \le 0.9 \]

    2-3 90%,

    5-6 70% .

    1 80% 1

    .


    99.8%

    2 .

    . .

    4.4.2. SCREE plot

    \((1, \lambda_1), (2, \lambda_2), \ldots\) SCREE 0

    . 9 SCREE plot .

    4 0 3 .

    [Figure: SCREE plot of the eigenvalues (vertical axis, 0 to 70) against the component number (horizontal axis, 0 to 10)]

    518.69 1.18

    1 . SCREE

    plot . 4.5 80%

    1 .

    4.5.


    .

    ,

    .

    ,

    .

    (, ) ,

    .

    . (:

    ) ,

    .

    .

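    \[ R = \begin{pmatrix} \frac{\sigma_{11}}{\sqrt{\sigma_{11}}\sqrt{\sigma_{11}}} & \frac{\sigma_{12}}{\sqrt{\sigma_{11}}\sqrt{\sigma_{22}}} & \cdots & \frac{\sigma_{1p}}{\sqrt{\sigma_{11}}\sqrt{\sigma_{pp}}} \\ \frac{\sigma_{21}}{\sqrt{\sigma_{22}}\sqrt{\sigma_{11}}} & \frac{\sigma_{22}}{\sqrt{\sigma_{22}}\sqrt{\sigma_{22}}} & \cdots & \frac{\sigma_{2p}}{\sqrt{\sigma_{22}}\sqrt{\sigma_{pp}}} \\ \vdots & \vdots & & \vdots \\ \frac{\sigma_{p1}}{\sqrt{\sigma_{pp}}\sqrt{\sigma_{11}}} & \frac{\sigma_{p2}}{\sqrt{\sigma_{pp}}\sqrt{\sigma_{22}}} & \cdots & \frac{\sigma_{pp}}{\sqrt{\sigma_{pp}}\sqrt{\sigma_{pp}}} \end{pmatrix} = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & 1 \end{pmatrix} \]

    Sample correlation matrix:

    \[ R = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix} \]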

    - - S

    R R .

    - S

    .

    (1) S R .

    (2) 1 80% .

    (3) .


    R \(\lambda_j / tr(S) = \lambda_j / \sum_{i=1}^{p} \lambda_i\) \(\lambda_j / p\)

    4.6.

    4.6.1. 2

    .

    (1)IQ(y ) (x ) .

    (2)IQ .

    (3) . \(\lambda_1 = 545.25\), \(\lambda_2 = 16.27\)

    (4) , (0.0666, 0.998)

    (0.998, -0.0666). () y1 y , y2 x

    .

    (5)(1) (4) .

    SAS & (4)

    (, score) . )( x

    x .


    (2)

    y1 y2 0 .

    (3)

    (1), (5)


    , .

    ( ) (, IQ)

    545.25/(545.25+16.27)=97(%) .

    y (IQ) x () .

    4.6.2.

    48 .

    2~3

    . [APPLICANT.txt]

    15

    . 15 6 ( )

    . 15 1~2 (


    )

    .

    DATA APPLICANT;
      INFILE "C:\TEMP\APPLICANT.TXT";
      INPUT ID L AP AA LI SC LC HO SM EX DR AM GC PO KJ SU;
    RUN;
    PROC PRINCOMP DATA=APPLICANT OUT=SCORE COVARIANCE;
      VAR L--SU;
    RUN;
    PROC PRINT DATA=SCORE;
      VAR PRIN1-PRIN15;
    RUN;

    (1)OUT SCORE SAS . SAS

    PRIN1( ), PRIN2, .

    SCORE SAS . SCORE

    . 15

    4.3040, 0.3819, .

    )( x .

    (2)COVARIANCE(COV) -

    . Default ( R ) .

    (3)VAR L--SU . VAR L AP AA SU;

    . .


    .

    p .

    \(s_{11} = 7.15\), \(s_{12} = s_{21} = 1.26\), S

    \(\lambda_1 = 66.5,\ \lambda_2 = 18.2,\ \lambda_3 = 10.6,\ \lambda_4 = 6.8, \ldots\)


    (15x15) (66.53 + 18.18 + ... + 0.30) 15

    ( \(122.53 = \sum_{i=1}^{15} s_{ii} = 7.15 + 3.87 + 3.95 + \cdots\) ) .

    ( ii / ) . Difference

    . ( S ) 66.53

    54%(=66.54/122.54) . 48.36 (66.54-18.18) .

    Cumulative = . 4

    83% 15 4 .

    80%

    eigen-value 1 . 10

    1~2 80% .

    ( 69%)

    . 6

    .

    PROC FACTOR DATA=APPLICANT SCREE COVARIANCE;
      VAR L--SU;
    RUN;


    SCREE plot

    54% .

    1-10

    . 1~2 80%

    .

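    \(a_1 = e_1,\ a_2 = e_2,\ \ldots,\ a_{15} = e_{15}\) .

    r () j \(y_{rj} = e_j'(x_r - \mu)\), ( \(x_r\) r -

    ). OUT=SCORE SAS score

    PROC PRINT .

    (4.304)

    \[ 0.149\,(L - \bar{L}) + 0.132\,(AP - \overline{AP}) + 0.0296\,(AA - \overline{AA}) + \cdots + 0.2745\,(SU - \overline{SU}) \]

    \[ 0.149\,(6 - 6) + 0.132\,(7 - 7.08) + 0.0296\,(2 - 7.08) + \cdots + 0.2745\,(10 - 5.96) = 4.304 \]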


    4.6.3.

    , .

    . (

    ) .

    .

    DATA APPLICANT;
      INFILE "D:\TEMP\APPLICANT.TXT";
      INPUT ID L AP AA LI SC LC HO SM EX DR AM GC PO KJ SU;
    RUN;
    PROC PRINCOMP DATA=APPLICANT OUT=SCORE1;
      VAR L--SU;
    RUN;
    PROC PRINT DATA=SCORE1;
      VAR PRIN1-PRIN15;
    RUN;

    [ ]

    . (L, AP) .

    8652.3149.72553.12388.0

    ( S ) .

    1 (4 ) 80% .

    i \(\lambda_i / p\) .

    1 .


    .

    .

    .

    .

    4.7.

    4.7.1.


    ( )

    .

    .

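    \[ y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{pmatrix} = \begin{pmatrix} e_1' \\ e_2' \\ \vdots \\ e_p' \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix} = Lx \]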

    .

    ( ) ( )

    .

    .

    (index)

    . .

    . (

    80% ) 4 .


    1 LC(), SM(), DR(), AM(),

    GC(), PO() 1 &

    .

    2 EX(), SU()

    .

    3 LI(), HO(), KJ()

    .

    4 AA(), KJ() .

    ? .

    .

    2 (,

    ) .

    4.7.2.

    .

    . (Prin1)

    (Prin2) .

    SAS data OUT1 . [OUTSTAT=OUT1]

    OUT .

    OUT1 SAS .


    () _TYPE_ SCORE .

    (transpose) .

    4.6.2 PRINCOMP .

    Prin1 Prin2 _NAME_ ID . SAS

    PLOTIT MACRO .


    , .

    72 .

    . 3 4

    . %PLOTIT PRIN3 PRIN4

    .


    4.7.3.

    (1)

    . (2.7.4 )

    . OUT=SCORE

    .

    PLOTIT macro . DATA=,

    LABELID=, PLOTVARS= .


    4 . 1 ( 1 2

    ), 2 ( 1 2 ), 3 , 4

    . 0 2

    . (80% ) 1

    (0 ) (0 ) .

    3 (10, 11, 12, 37, 38) (mild) (41,

    42) ( ) . (41, 42)

    () . . (10, 11, 12, 37, 38)

    .


    (2)

    & ( 1) 6 .

    PRIN1 .

    .

    ( 2) 6 .


    ( 3) .

    6 . 3

    .


    4.8.

    ()

    . y , x \(y = Lx\) .

    . 1) ?

    . 2) ( L )

    ? (

    ) \(e_i'e_i = 1,\ e_i'e_j = 0\)

    . 3) ? 80%

    ( 1 )

    . . ( \(\lambda_i / \sum_{i=1}^{p} \lambda_i\) , \(\lambda_i / p\) )

    (1)

    (multivariate normal dist)

    . (Why?

    ) ( ? Normal plot, Shapiro-Wilk W statistic)

    - (Box-whisker plot)

    .

    4

    . OUT SAS data

    SCORE . PROC UNIVARIATE

    (STEM-LEAF PLOT), -

    (BOX-WHISKER PLOT) .
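    A minimal sketch of the check described above, assuming the component scores were saved in a data set SCORE by PROC PRINCOMP with OUT=SCORE and that the first four components are of interest:

    ```sas
    /* NORMAL requests tests of normality, including the Shapiro-Wilk W statistic; */
    /* PLOT requests the stem-and-leaf, box-whisker and normal probability plots.  */
    PROC UNIVARIATE DATA=SCORE NORMAL PLOT;
      VAR PRIN1-PRIN4;
      ID ID;   /* label extreme observations with the applicant ID */
    RUN;
    ```

    The ID statement is optional; it only makes it easier to see which observations appear among the extreme values.
    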


    W- .

    1 4 2, 3 .

    .

    SAS/INSIGHT Box-whisker plot . SCORE data

    .


    SAS data

    .

    ( CTRL

    ) .


    Box-whisker plot . Prin2

    Prin3 .

    bullet Prin2 (41, 42)

    . Prin2 (EX. SU)

    .

    (2)

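    ( \(Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip} + e_i\) )

    () \(|X'X| \approx 0\) . ( \(|X'X| = 0\) : X

    .

    . ( \(X_k = aX_j\) )

    0 .)

    \[ (X'X)^{-1} = \frac{1}{|X'X|}\, adj(X'X) \]

    \((X'X)^{-1}\) .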


    \[ b = (X'X)^{-1}X'Y, \qquad s^2(b) = MSE\,(X'X)^{-1} \]

    ( ) t-

    .

    .

    (

    .)

    VIF (Variance Inflation Factor) Condition Index ,

    10 .

    .

    (Ridge Regression:

    MSE(Mean Square of Error)

    ) .

    .

    .

    .

    ( )

    .

    . .

    \[ Y_i = \beta_0 + \beta_1 Prin1_i + \beta_2 Prin2_i + \cdots + \beta_p Prinp_i + e_i \]

    PRIN1, PRIN2 OUT= .

    .

    .
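    The workflow above (diagnose multicollinearity, then regress on component scores) might look as follows. This is a sketch only: the data set MYDATA, the response Y and the predictors X1-X5 are hypothetical names.

    ```sas
    /* Step 1: collinearity diagnostics for the original predictors */
    PROC REG DATA=MYDATA;
      MODEL Y = X1-X5 / VIF COLLIN;   /* variance inflation factors and condition indices */
    RUN;
    QUIT;

    /* Step 2: principal components of the predictors; OUT= keeps Y and adds PRIN1, PRIN2, ... */
    PROC PRINCOMP DATA=MYDATA OUT=SCORE;
      VAR X1-X5;
    RUN;

    /* Step 3: regression on the (mutually uncorrelated) component scores */
    PROC REG DATA=SCORE;
      MODEL Y = PRIN1 PRIN2;
    RUN;
    QUIT;
    ```

    How many components to keep in step 3 is a judgment call; using the first two here is an assumption for illustration, not a recommendation from the notes.
    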


    , .

    2~3 .

    (stepwise, backward, forward)

    . ,

    .

    .

    ( y )

    .

    4.9.

    .

    (1)

    (covariance

    matrix) . (: kg pound, :

    : ) (correlation matrix)

    .

    (2) / , ]

    DATA APPLICANT;
      INFILE "C:\TEMP\APPLICANT.TXT";
      INPUT ID X1-X20;
    RUN;
    PROC PRINCOMP DATA=APPLICANT OUT=SCORE OUTSTAT=OUT1 COVARIANCE;
      VAR X1-X20;
    RUN;


    (3)

    80%,

    1 . xLy L .

    OUT1 . PRIN1, PRIN2 .

    (4)

    ()

    . .

    (5)

    xLy x

    y OUT=SCORE SCORE . PRIN1, PRIN2,

    .

    PRIN1, PRIN2, ( PRIN3)

    .

    , PRIN1, PRIN2, ( PRIN3) ,

    - .

    ()

    . PRIN1(, ), PRIN2(, ) 4

    .


    PRIN1, PRIN2

    .

    SAS data


    [EXERCISE]

    (1)1990 (MMH) 25

    7 (ADORMS) . [Applied Multivariate

    Methods for Data Analysts, Dallas E. Johnson, 1998, p94] NAVY.txt(

    )

    7 MMH .

    .

    ( 1).

    ( ).

    (

    2). . ?

    ?

    .

    2)( ii yy .

    (2) 1994 BIG8 8 football .

    [http://lib.stat.cmu.edu/DASL] BIG8.TXT

    , , Rushing (RO_YDS), Rushing , Passing ,

    Passing , , (TD_YDS), (Scoring Offence),

    (SD), (Turn-over margin per game), (WINS)


    6 .

    - ? ?

    .

    .

    .

    4) .

    .

    (ID=)

    ? ?

    1 , 2 ,

    . . .

  • CHAPTER 5

    5.1.

    (FA: Factor Analysis )

    Galton( )

    (1888) Spearman(1904 )

    .

    (1) ( ) (2)

    ( ) .

    ,

    (Likert)

    . ( ) Cronbach .

    , , , ,

    , , , . 8

    ? . A 48

    15 () .


    20 (, , , )

    .

    5.1.1. Spearman (1904)

    Spearman 6

    .(,

    ) . (

    )

                Classic  French  English  Math  Discover  Music
    Classic     1        .83     .78      .7    .66       .63
    French               1       .67      .67   .65       .57
    English                      1        .64   .54       .51
    Math                                  1     .45       .51
    Discover                                    1         .4
    Music                                                 1

    Spearman () (f:

    factor ) () .

    f j .

    f .

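    \[ \begin{aligned} classic &= \lambda_1 f + \varepsilon_1 \\ french &= \lambda_2 f + \varepsilon_2 \\ english &= \lambda_3 f + \varepsilon_3 \\ math &= \lambda_4 f + \varepsilon_4 \\ discover &= \lambda_5 f + \varepsilon_5 \\ music &= \lambda_6 f + \varepsilon_6 \end{aligned} \]

    ( fact )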

    . ( factor1, factor2 ) 6 2


    .

    ()

    () . (classic), (French), (English)

    (Math), (Discovery), (Music) .

    5.1.2.

    p 2~3

    p )( pm

    .

    .

    .

    .

    \(x = (x_1, x_2, \ldots, x_p)'\) , R .


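    Principal component analysis: \(Y = LX\), with coefficients \(l_{ij}\):

    \[ \begin{aligned} y_1 &= l_{11}x_1 + l_{12}x_2 + \cdots + l_{1p}x_p \\ y_2 &= l_{21}x_1 + l_{22}x_2 + \cdots + l_{2p}x_p \\ &\ \ \vdots \\ y_p &= l_{p1}x_1 + l_{p2}x_2 + \cdots + l_{pp}x_p \end{aligned} \]

    Factor analysis: \(X = LF + \varepsilon\), where \(l_{ij}\) is a loading ():

    \[ \begin{aligned} x_1 &= l_{11}f_1 + l_{12}f_2 + \cdots + l_{1p}f_p + \varepsilon_1 \\ x_2 &= l_{21}f_1 + l_{22}f_2 + \cdots + l_{2p}f_p + \varepsilon_2 \\ &\ \ \vdots \\ x_p &= l_{p1}f_1 + l_{p2}f_2 + \cdots + l_{pp}f_p + \varepsilon_p \end{aligned} \]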

    ij =

    .

    -

    .

    ijl . ,

    . ( ) )( jiij el

    ( ) )( jiiij el

    ijl

    ijl

    .

    .

    5.2.

    5.2.1.

    (1) () .

    (2) () .

    (3) .


    (4) .

    5.2.2.

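    p \(x = (x_1, x_2, \ldots, x_p)'\) , -

    .

    \[ x = Lf + \varepsilon: \qquad \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix} = \begin{pmatrix} l_{11} & l_{12} & \cdots & l_{1m} \\ l_{21} & l_{22} & \cdots & l_{2m} \\ \vdots & \vdots & & \vdots \\ l_{p1} & l_{p2} & \cdots & l_{pm} \end{pmatrix} \begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_m \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_p \end{pmatrix} \]

    (1) \(f_1, f_2, \ldots, f_m\): common factors

    (2) \(l_{ij}\): factor loading of the i-th variable on the j-th factor; \(L\) is the factor loading matrix

    (3) \(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_p\): specific factors

    Assumptions:

    (1) Each \(f_k\) has mean 0 and variance 1 ( \(k = 1, 2, \ldots, m\) ): \(f \sim (0, I)\)

    (2) Each \(\varepsilon_j\) has mean 0 and variance \(\psi_j\) ( \(j = 1, 2, \ldots, p\) ): \(\varepsilon \sim (0, \Psi)\), \(\Psi = diag(\psi_1, \psi_2, \ldots, \psi_p)\)

    (3) \(f_k\) and \(\varepsilon_j\) are uncorrelated: \(Cov(f, \varepsilon) = 0\)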


    5.2.3.

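    \[ x = Lf + \varepsilon \]

    (1) \(\Sigma = Cov(x) = Cov(Lf + \varepsilon) = L\,Cov(f)\,L' + \Psi = LL' + \Psi\), with \(\varepsilon \sim (0, \Psi = diag(\psi_1, \psi_2, \ldots, \psi_p))\)

    ( \(x_1, x_2, \ldots, x_p\) ) .

    0 .

    (2) Variance of the j-th variable:

    \[ Var(x_j) = \sigma_{jj} = l_{j1}^2 + l_{j2}^2 + \cdots + l_{jm}^2 + \psi_j = \sum_{k=1}^{m} l_{jk}^2 + \psi_j \]

    \(\sum_{k=1}^{m} l_{jk}^2\) is the communality and \(\psi_j\) the specific variance. The communality is the part of the variance of \(x_j\) explained by the common factors. When the correlation matrix is analyzed,

    \[ 1 = \sum_{k=1}^{m} l_{jk}^2 + \psi_j . \]

    (3) Covariance of the i-th and j-th variables:

    \[ cov(x_i, x_j) = \sum_{k=1}^{m} l_{ik} l_{jk} \]

    The covariance of \(x_j\) (j-th variable) and \(f_k\) (k-th factor) is \(cov(x_j, f_k) = l_{jk}\) .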

    5.3.

    5.3.1.

    ( R ) . ( )

    (1) R \(R = LL' + \Psi\), L .

    P \(R = LL' + \Psi = LIL' + \Psi = LPP'L' + \Psi = (LP)(LP)' + \Psi = L^* L^{*\prime} + \Psi\)

    L .

    (2) L , )( ppm P

    \(p(p+1)/2\) .


    3 6 (

    3 , 3 ) 9 . pm LL

    0 .

    (3) )( pm L ?

    (factor rotation) . (: SAS

    ROTATE=VARIMAX )

    5.3.2.

    principal factoring w/ or w/o iteration, Rao's canonical

    factoring, alpha factoring, image factoring, maximum likelihood, un-weighted least square factor

    analysis, Harris factoring .

    / . (principal factoring

    w/ or w/o iteration) .

    (1) (Principal Component (factor) method)

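    Let \(R\) have eigenvalues \(\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p\) and eigenvectors \(e_1, e_2, \ldots, e_p\). Then

    \[ R = \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \cdots + \lambda_p e_p e_p' = \left[ \sqrt{\lambda_1} e_1 \,|\, \sqrt{\lambda_2} e_2 \,|\, \cdots \,|\, \sqrt{\lambda_p} e_p \right] \begin{pmatrix} \sqrt{\lambda_1} e_1' \\ \sqrt{\lambda_2} e_2' \\ \vdots \\ \sqrt{\lambda_p} e_p' \end{pmatrix} = LL' \]

    Using only the first \(m < p\) pairs \((\lambda_1, e_1), (\lambda_2, e_2), \ldots\) to build \(L\):

    \[ R \approx \left[ \sqrt{\lambda_1} e_1 \,|\, \sqrt{\lambda_2} e_2 \,|\, \cdots \,|\, \sqrt{\lambda_m} e_m \right] \begin{pmatrix} \sqrt{\lambda_1} e_1' \\ \sqrt{\lambda_2} e_2' \\ \vdots \\ \sqrt{\lambda_m} e_m' \end{pmatrix} + \begin{pmatrix} \psi_1 & 0 & \cdots & 0 \\ 0 & \psi_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \psi_p \end{pmatrix} = LL' + \Psi, \qquad \psi_i = s_{ii} - \sum_{j=1}^{m} l_{ij}^2 \]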


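    \(tr(S) = s_{11} + s_{22} + \cdots + s_{pp}\)

    \(tr(R) = p\) . j

    \(l_{1j}^2 + l_{2j}^2 + \cdots + l_{pj}^2 = (\sqrt{\lambda_j} e_j)'(\sqrt{\lambda_j} e_j) = \lambda_j\) . j

    .

    S: \(\lambda_j / (s_{11} + s_{22} + \cdots + s_{pp})\), R: \(\lambda_j / p\)

    . \(y_1, y_2, \ldots, y_p\)

    \(Var(y_1) = \lambda_1\), \(Var(y_2) = \lambda_2\), ..., \(Var(y_p) = \lambda_p\)

    .

    \(f_1 = y_1 / \sqrt{\lambda_1}\), \(f_2 = y_2 / \sqrt{\lambda_2}\), ..., \(f_p = y_p / \sqrt{\lambda_p}\)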

    () 1 75

    . . fLx

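    \[ \begin{aligned} x_1 &= \sqrt{\lambda_1} e_{11} f_1 + \sqrt{\lambda_2} e_{21} f_2 + \cdots + \sqrt{\lambda_p} e_{p1} f_p \\ x_2 &= \sqrt{\lambda_1} e_{12} f_1 + \sqrt{\lambda_2} e_{22} f_2 + \cdots + \sqrt{\lambda_p} e_{p2} f_p \\ &\ \ \vdots \\ x_p &= \sqrt{\lambda_1} e_{1p} f_1 + \sqrt{\lambda_2} e_{2p} f_2 + \cdots + \sqrt{\lambda_p} e_{pp} f_p \end{aligned} \]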

    0

    0)...( 22221

    2 jmjjjjj lll . )( pm

    .

    (2) (Maximum Likelihood Method)

    MLE L f

    . jf j jx

    ,L ,

    )|,()|,(max,

    xLLLxLL


    MLE . L

    0 1 . MLE

    1

    Heywood . . Heywood

    .

    5.4.

    principal factoring (factor)

    . \(f_i = y_i / \sqrt{\lambda_i}\)

    .

    (S) ,

    .

    , . ,

    , .

    15 ( ) .

    SAS (default) principal factoring

    . Maximum

    Likelihood . PROC FACTOR DATA=APPLICANT METHOD=ML;

    .


    ( x ) ( y )

    xLy .

    .


    i j Loading() ijf = iji e . 1y

    ap ( 2131.05138.7 ) 1 ap . fLx L . (Loading)

    . () .

    .

    . 1

    ( : 1 )

    ( ) . (LC, SM, DR, AM, GC, PO)

    ? 1 ? (LC=, SM=, DR=,

    AM=, GC= , PO=) 1

    .

    (Marketing Ability)=(LC+SM+DR+AM+GC+PO)/6 ( )

    (LC, SM, DR, AM, GC, PO)

    . .

    5.5.

    (loading) ()

    .


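    \[ \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix} = \begin{pmatrix} l_{11} & l_{12} & \cdots & l_{1m} \\ l_{21} & l_{22} & \cdots & l_{2m} \\ \vdots & \vdots & & \vdots \\ l_{p1} & l_{p2} & \cdots & l_{pm} \end{pmatrix} \begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_m \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_p \end{pmatrix} \]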

    ( )

    . .

    (1)trivial . 1-2 .

    1-2 .

    (2)Kaiser ( )

    0 ( ) R I

    1

    1 .

    1 1

    . SAS . (5.4 4

    .)

    (3)SCREE

    SCREE ( 47)

    . 80% ()

    . APPLICANT

    Kaiser 4 7.51, 2.05, 1.46, 1.19

    1 2 .

    (4)Large-sample Test( 2 -)

    MLE

    2 - . p , m

    .

    \[ H_0: \Sigma_{p \times p} = L_{p \times m} L'_{m \times p} + \Psi_{p \times p} \]

    factor1 factor2 Error

    Common factors


    :

    (positive definite)

    nSnSn

    )1( 20 )(~]

    ||||ln[

    L max maxlnln2 app

    SnHunderLL

    n

    . 5.9 .

    5.6.

    ( )

    . (1)

    : 2 (2) 0

    .

    (rotate)

    QUARTIMAX rotation, OBLIQUE rotation, PROMAX rotation

    VARIMAX . VARIMAX Kaiser

    .

    m p

    LL L .

    .

    . , LL L

    \(L^* = LP\) ( P ) \(L^* L^{*\prime} = LL'\) .

    ( 21, ff ) . 20o

    .


    [Figure: plot of the loadings on factor 1 (horizontal axis, 0 to 1) against factor 2 (vertical axis, -0.5 to 0.5)]

          1       2
    X1    0.55    0.43
    X2    0.56    0.29
    X3    0.39    0.45
    X4    0.74    -0.27
    X5    0.72    -0.21
    X6    0.59    -0.13

    5.7.

    5.7.1. [POLICE.txt]

    50 15 . [Applied Multivariate Methods for

    Data Analysts, Dallas E. Johnson, 1998, p160]

    ID: /REACT: /HEIGHT (cm) / WEIGHT (kg)

    SHLDR: (cm)/PELVIC: (cm)/CHEST: (cm)

    THIGH: (mm)/PULSE: /DIAST: /CHNUP:

    BREATH: (liter)/RECVR: (treadmill) 5

    SPEED:

    ENDUR: ()/FAT:

    5.7.2. SAS


    principal factoring

    (default) VARIMAX . ( )

    . REACT FAT

    Maximum Likelihood .
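    A minimal sketch of a program matching this description. The file path and the assumption that POLICE.txt lists the variables in the order ID, REACT, ..., FAT are illustrative and not confirmed by the notes.

    ```sas
    DATA POLICE;
      INFILE "C:\TEMP\POLICE.TXT";   /* hypothetical location and layout */
      INPUT ID REACT HEIGHT WEIGHT SHLDR PELVIC CHEST THIGH PULSE DIAST
            CHNUP BREATH RECVR SPEED ENDUR FAT;
    RUN;

    /* Principal factoring is the default extraction method;             */
    /* ROTATE=VARIMAX rotates the loadings, REORDER sorts the variables  */
    /* by their largest loading, SCREE adds the scree plot.              */
    PROC FACTOR DATA=POLICE ROTATE=VARIMAX REORDER SCREE;
      VAR REACT--FAT;
    RUN;
    ```

    Adding METHOD=ML to the PROC FACTOR statement would switch the extraction to maximum likelihood; in that case the number of factors is usually set explicitly with NFACTORS=.
    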

    5.7.3.


    (1)PROC FACTOR COVARIANCE default

    .

    p ( 15 )

    1 .

    (2) ? Kaiser . SAS default

    1 .

    1

    .

    (3)(MINEIGEN= minimum eigen value) (NFACTOR=)

    default 1 .

    80% .

    . [ ] 1 5 .

    15 5

    .

    (4) ( ) 1

    . 5 .

    .

    5.7.4.

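    \[ \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix} = \begin{pmatrix} l_{11} & l_{12} & \cdots & l_{1m} \\ l_{21} & l_{22} & \cdots & l_{2m} \\ \vdots & \vdots & & \vdots \\ l_{p1} & l_{p2} & \cdots & l_{pm} \end{pmatrix} \begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_m \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_p \end{pmatrix}, \qquad L = \begin{pmatrix} l_{11} & l_{12} & \cdots & l_{1m} \\ l_{21} & l_{22} & \cdots & l_{2m} \\ \vdots & \vdots & & \vdots \\ l_{p1} & l_{p2} & \cdots & l_{pm} \end{pmatrix} \]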

    .

    Principal factoring

    ( ie : ) i .

    . 1 5.21

    .

    15p . ( )


    5.7.5.

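    \[ Var(x_j) = \sigma_{jj} = l_{j1}^2 + l_{j2}^2 + \cdots + l_{jm}^2 + \psi_j = \sum_{k=1}^{m} l_{jk}^2 + \psi_j \]

    \(\sum_{k=1}^{m} l_{jk}^2\) (communality)

    \(\psi_j\) (specific) . 5

    \(m = 5\) REACT REACT 1~ 5(5

    ) . , \(0.11577^2 + 0.23649^2 + 0.12762^2 + 0.06168^2 + 0.90082^2 = 0.9009\)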

    1(100%)

    . SPEED ENDUR

    5 ( 1 ) 80% . ENDUR

    5 5

    .


    5.7.6. (Rotate )

    ( p ) Kaiser 1

    ( m )

    specific .

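    \[ \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix} = \begin{pmatrix} l_{11} & l_{12} & \cdots & l_{1m} \\ l_{21} & l_{22} & \cdots & l_{2m} \\ \vdots & \vdots & & \vdots \\ l_{p1} & l_{p2} & \cdots & l_{pm} \end{pmatrix} \begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_m \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_p \end{pmatrix} \]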

    fLx fL, L

    f . L

    f

    .

    ( pxxx ,...,, 21 ) ( mfff ,,, 21 )

    .

    .

    () .

    .

    VARIMAX . REORDER

    .


    (1)

    .

    (2)

    . WEIGHT 1(Factor1) 0.65 2

    0.68 2 . WEIGHT 1

    FAT, THIGH .

    (3)(FAT , THIGHP , CHEST , CHNUP )

    . . (obesity)

    . ?

    . .

    .

    . (FAT+THIGH+CHEST-CHNUP)/4 ( ) .

    .

    (4) ENDURE 0.38966 1

    ( 2~ 5)

    . ENDURE 1~ 5


    38% . ENDURE 1~ 5

    .

    (5)(HEIGHT , SHLDR , PELVIC , WEIGHT , BREATH

    ) . 1

    0.65 2 0.68

    1 . ( )

    (6)WEIGHT 1 0.65

    1 . 2 .

    (7)(PULSE , RECVR 5 )

    .

    (8)DIAST , REACT . 4-5

    . 3 2

    ? 82 1-2 15

    .

    .

    =(FAT , THIGHP , CHEST , CHNUP )

    =(HEIGHT , SHLDR , PELVIC , WEIGHT ,

    BREATH )

    (9) (FAT_INDEX), (BODY), 6

    8 . 15 2 ( ,

    , ) 8

    .

    .

    . CHNUP

    .


    . .

    . ROTATE

    .

    5.7.7.

    OUTSTAT F_STAT SAS data

    .

    F_STAT _TYPE_ UNROTATE , PATTERN

    .


    SUBSET . _TYPE_=PATTERN : TEMP

    1 2 . 3 2 ,

    4 5 2 .

    MACRO PLOTIT . 1

    2 .


    5.8. (Factor score)

    pxxx ,...,, 21

    .

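    \[ x = Lf + \varepsilon: \qquad \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix} = \begin{pmatrix} l_{11} & l_{12} & \cdots & l_{1m} \\ l_{21} & l_{22} & \cdots & l_{2m} \\ \vdots & \vdots & & \vdots \\ l_{p1} & l_{p2} & \cdots & l_{pm} \end{pmatrix} \begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_m \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_p \end{pmatrix} \]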

    f

    f .

    .

    ( : factor score) . .

    ( )

    .


    2

    .

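    5.8.1. Bartlett's Method (Weighted Least Squares Method)

    r \(z_r = (x_r - \mu)\) . \((z_r - Lf_r)'\Psi^{-1}(z_r - Lf_r)\)

    f r . \(\hat{f}_r = (L'\Psi^{-1}L)^{-1} L'\Psi^{-1} z_r\)

    5.8.2. Thompson's Method (Regression Method)

    \[ \begin{pmatrix} f \\ z \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} I & L' \\ L & P \end{pmatrix} \right), \qquad E(f \mid z) = L'P^{-1}z, \qquad \hat{f}_r = L'R^{-1}z_r \]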

    5.8.3. SAS

    (1)Default Regression . SCORE

    .

    (2) . (NFACTORS=2) 2

    5 .

    .

    (3) 2 . 1 .8

    WEIGHT, FAT, CHEST

    .


    (4) \(\hat{f}_r = L'R^{-1}z_r\) \(L'R^{-1}\) (standardized scoring

    coefficient) . OUT

    SAS data .

    (5) SAS data OUT

    . SCORE1 SAS data

    PRINT procedure .

    NFACTOR .

    . SAS data SAS


    1 SAS data

    .
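    The steps above might be coded as follows. This is a sketch under the assumption that a data set POLICE with the variables ID and REACT through FAT already exists; SCORE1 is the output data set name used in the text.

    ```sas
    /* SCORE prints the standardized scoring coefficients;                     */
    /* OUT=SCORE1 saves the original data plus the factor scores.              */
    /* NFACTORS= is specified so that OUT= knows how many scores to compute.   */
    PROC FACTOR DATA=POLICE NFACTORS=2 ROTATE=VARIMAX SCORE OUT=SCORE1;
      VAR REACT--FAT;
    RUN;

    /* The scores are stored in variables named Factor1 and Factor2 */
    PROC PRINT DATA=SCORE1;
      VAR ID FACTOR1 FACTOR2;
    RUN;
    ```

    The data set SCORE1 can then be passed to other procedures, for example to plot the two factor scores against each other.
    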

    (7) ?

    . ( 1, 2)

    21, yy .

    2 ( , ,

    ) .

    5.9. Comment

    5.9.1.

    1 ,

    . 1 5

    5 . ENDURE 5

    .

    2 - .

    ML (Maximum Likelihood)

    .

    HEYWOOD 1

    . ML . ( )


    (5 ) 5 .

    1 .

    ?

    5 2

    .

    ML ( 2 )

    . 2 . NFACTORS=3, 4

    5 Kaiser .

    .

    5.9.2.

    . .

    .

    POLICE 1 2 .


    WEIGHT 1 2 .

    1, 2

    .

    .

    . FAT_INDEX, BODY

    .


    5.10.

    ? (Likert)

    . (index)

    . pxxx ,...,, 21

    . Q4-Q13

    . 10 2-3

    . 10 ()

    . 2

    .

    1

    .

    2 .

    (1) .

    (2) .

    5.10.1.

    .[CODING.TXT]

    Q41 ?

    7 6 5 4 3 2 1

    Q52 ?

    7 6 5 4 3 2 1

    Q63 ~

    7 6 5 4 3 2 1

    Q74 ~

    7 6 5 4 3 2 1


    Q85 ?

    7 6 5 4 3 2 1

    Q96 ?

    7 6 5 4 3 2 1

    Q107 ?

    7 6 5 4 3 2 1

    Q118 ?

    7 6 5 4 3 2 1

    Q129 ?

    7 6 5 4 3 2 1

    Q4-Q12 .

    (Likert) (4 , 5 , 7 )

    .

    . DATA = ~ VAR = ~ ~

    .

    ROTATE=VARIMAX VARIANCE

    .

    REORDER ()

    .

    COVARIANCE () .

    (COVARIANCE . default)

    .


    (METHOD) default= .

    1 80%

    1, 2, 3, 4, 5

    .(53%)

    .

    0.6 ( ) () () .

    1(factor 1) Q5-Q8 , 2

    Q10-Q11 . Q4, Q12 2

    0.75 0.55 .

    ( : Q5, Q6, Q7, Q8), ( : Q10, Q11)

    .

    5.10.2.

    .


           Factor1   Factor2
    Q7     0.74      0.22
    Q8     0.70      -0.04
    Q6     0.64      0.20
    Q5     0.62      0.37
    Q9     0.56      0.49
    Q10    0.13      0.80
    Q11    0.22      0.75
    Q12    0.09      0.55
    Q4     0.51      0.53
           0.69      0.68

    . 1(factor 1) Q5-Q8

    2 Q10, Q11 .

    ( , , ) (

    . (1) (2) )

    .

    2 55% Q8

    1 Q4, Q6, Q8 .

    .

    .

    .

    ( ) . .


    5.10.3.

    (index)

    (internal consistency)

    Cronbach alpha( ) . .

    (: observed value)

    (measurement error) . \(Y = T + E\), \(cov(T, E) = 0\) .

    (reliability coefficient) .

    \[ \rho^2(Y, T) = \frac{cov(Y, T)^2}{var(Y)\,var(T)} = \frac{var(T)^2}{var(Y)\,var(T)} = \frac{var(T)}{var(Y)} \]

    , (

    ) Cronbach . p

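    \(Y_j = T_j + E_j\) \((j = 1, 2, \ldots, p)\), \(Y_O = \sum_j Y_j\), \(T_O = \sum_j T_j\) .

    \[ \alpha = \frac{p}{p-1}\left[ 1 - \frac{\sum_j var(Y_j)}{var(Y_O)} \right] = \frac{p}{p-1} \cdot \frac{\sum_{i \ne j} cov(Y_i, Y_j)}{var(Y_O)} \]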

    Cronbach 0 1 1 .

    ? 0.6 ? 0.7 ? .

    Cronbach ,

    .

    Cronbach ,

    .

    (Cronbach ) .


    CRONBACH

    .

    2 (binary, dichotomous (0,1)) Cronbach Kuder-Richardson 20 (KR-20) .

    Cronbach

    .

    NOCORR () .

    NOSIMPLE (, ) .

    ALPHA CRONBACH .
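    A minimal sketch combining the three options, assuming the survey responses were read into a data set named CODING (a hypothetical name taken from the file CODING.TXT) and that items Q5 through Q8 form one scale:

    ```sas
    /* ALPHA requests Cronbach's coefficient alpha, including the value of  */
    /* alpha when each item is deleted in turn;                              */
    /* NOCORR and NOSIMPLE suppress the correlation matrix and the           */
    /* descriptive statistics so that only the reliability output remains.   */
    PROC CORR DATA=CODING ALPHA NOCORR NOSIMPLE;
      VAR Q5-Q8;
    RUN;
    ```

    Running the same step with VAR Q10 Q11 would give the reliability of the second scale.
    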

    . Q5-Q8 4

    0.69 . Q5 Q6-Q8

    0.59 .() Q8 Q5-Q7


    0.68 . 4 () ( )

    0.68 .

    2 . (Q10, Q11)

    0.68 .

    5.10.4.

    (1)Q5. ? (5 )

    (2)Q6. ? (5 )

    (3)Q7. ? (5 )

    (4)Q8. ? (5 )

    4 Q5-Q8

    (factor analysis) .


    Q5, Q6, Q7

    .

    Q6 . Q6=1( ), ,

    5( ). ()

    .

    Q5, Q6, Q7 ( , ) .

    .

    Q6 . .


    [EXERCISE]

    (1)

    Maximum Likelihood

    . .

    (2) 26 9 .

    [http://lib.stat.cmu.edu/DASL]JOB.txt

    9 .

    9 .

    9 .

    ( )

    .

    Country:

    Agr:

    Min:

    Man:

    PS:

    Con:

    SI:

    Fin:

    SPS:

    TC:

  • CHAPTER 6

    6.1.

    $x_1, x_2, \ldots, x_p$ (

    )

    (Variable-directed techniques) . (Discriminant Analysis)

    (Clustering Analysis) ()

    (individual directed techniques) .

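    $$\begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}$$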

    :

    () .

    :

    .

    : (, )

    , ( ) .

    : .

  • Chapter 6.

    128

    6.1.1.

    40 , (=/: ), , ,

    , ( ), ( ) .

    7 (distance) (similarity)

    .

    .

    .

    () 7

    ( , ) .

    .

    16 ( 9 , 7 )

    .

    (, ) ( )

    . ( 3

    . )

    . .

    ,

    . 2 ( )

    3 .


    ( ) .

    .

    (:, :) 1)()

    2)

    . ( ) (1) (2)

    . (2)

    . (Fisher Linear Discriminant

    Function ) (1) .

    6.1.2.

    , ,

    () . (1)

    ?

    . (2) ,

    (misclassification) ?

    2 2 . (1)

    () (2)

    () .

    .

    .

    .

    6.2.


    2 .

    1: $x \sim N_p(\mu_1, \Sigma_1)$ , 2: $x \sim N_p(\mu_2, \Sigma_2)$

    $n_1$ , $n_2$ $p$

    $x_0$ . .

    ()

    ( ) ( x ), - ( ) -( S ) .

    (pooled) - $\Sigma = \dfrac{(n_1-1)\Sigma_1 + (n_2-1)\Sigma_2}{n_1+n_2-2}$ , $S = \dfrac{(n_1-1)S_1 + (n_2-1)S_2}{n_1+n_2-2}$

    .

    6.2.1.

    .

    (1)(Likelihood)

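    If $L(x_0; \mu_1, \Sigma_1) > L(x_0; \mu_2, \Sigma_2)$, classify $x_0$ into population 1;

    if $L(x_0; \mu_1, \Sigma_1) < L(x_0; \mu_2, \Sigma_2)$, classify $x_0$ into population 2.

    $$L(x; \mu, \Sigma) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\left[-\tfrac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\right]$$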

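    (2) (Fisher's Linear Discriminant Function)

    When the two variance-covariance matrices are equal ($\Sigma_1 = \Sigma_2 = \Sigma$), the likelihood rule reduces to a linear discriminant function: classify $x_0$ into population 1 if $b'x_0 - k > 0$, and into population 2 otherwise, where

    $$b' = (\mu_1 - \mu_2)'\Sigma^{-1}, \qquad k = \tfrac{1}{2}(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 + \mu_2)$$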


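    (3)Mahalanobis

    When the two variance-covariance matrices are equal, the rule can also be stated with Mahalanobis distances: classify $x_0$ into population 1 if $d_1 < d_2$, and into population 2 otherwise.

    Mahalanobis Distance: $d_i = (x_0 - \mu_i)'\Sigma^{-1}(x_0 - \mu_i)$, $i = 1, 2$.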

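    (4) (Posterior Probability)

    Classify $x_0$ into population 1 if $P(\Pi_1 \mid x_0) > P(\Pi_2 \mid x_0)$, and into population 2 otherwise, where $d_i$ is the Mahalanobis distance of $x_0$ from population $i$ and

    $$P(\Pi_i \mid x_0) = \frac{\exp(-\tfrac{1}{2} d_i)}{\exp(-\tfrac{1}{2} d_1) + \exp(-\tfrac{1}{2} d_2)}$$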
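    Rules (2) to (4) can be sketched for two variables as follows. This is an illustrative Python sketch, not the SAS DISCRIM computation: the means, the common covariance matrix, the new observation and the helper names (`inverse_2x2`, `quad_form`, `classify`) are all hypothetical, and equal priors are assumed. It also shows why the rules agree: the linear score b'x0 - k equals (d2 - d1)/2.

```python
import math

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def quad_form(u, m, v):
    """Compute u' m v for 2-vectors u, v and a 2x2 matrix m."""
    return sum(u[i] * m[i][j] * v[j] for i in range(2) for j in range(2))

def classify(x0, mu1, mu2, sigma):
    s_inv = inverse_2x2(sigma)
    diff = [mu1[i] - mu2[i] for i in range(2)]
    total = [mu1[i] + mu2[i] for i in range(2)]
    # Rule (2): linear discriminant score b'x0 - k
    score = quad_form(diff, s_inv, x0) - 0.5 * quad_form(diff, s_inv, total)
    # Rule (3): Mahalanobis distance of x0 from each population mean
    d = []
    for mu in (mu1, mu2):
        r = [x0[i] - mu[i] for i in range(2)]
        d.append(quad_form(r, s_inv, r))
    # Rule (4): posterior probabilities (equal priors assumed)
    w = [math.exp(-0.5 * di) for di in d]
    posterior = [wi / sum(w) for wi in w]
    return {"score": score, "d": d, "posterior": posterior,
            "group": 1 if score > 0 else 2}

# Hypothetical parameter values and new observation
result = classify(x0=[2.5, 3.5], mu1=[2.0, 3.0], mu2=[4.0, 5.0],
                  sigma=[[1.0, 0.5], [0.5, 2.0]])
print(result["group"], [round(p, 3) for p in result["posterior"]])
```

    In practice the population means and covariance are replaced by the sample means and the pooled sample covariance.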

    6.2.2.

    2 1 2

    2 1 .

    .

    .

    (1)Re-substitution

    (overestimate)

    .

    (2)

    ,

    . 1/2

    .


    (3)Cross-validation

    Lachenbruch (1968) .

    ,

    . Jackknife . 2

    .

    2 Cross-validation

    .

    1 2

    95 10 90 5

    5 90 10 90

    1 .

    (equal cost function) (ratio cost function) .

    6.2.3.

    Kansas Dr. Michael Finnegan

    82 9 .

    TURKEY.txt/[Applied Multivariate Methods for Data Analysts, Dallas E. Johnson, 1998, p223]

    ID: id HUM: RAD:

    ULN: ) FEMUR: TIN:

    CAR: carp metacarpus D3P: COR:

    SCA: TYPE: (WILD), (DOMESTIC)


    HUM, ULN

    .

    .

    (1)

    HUM( ), ULN( ) TYPE

    .

    SYMBOL V Value( ), C

    color( ) .

    GOPTIONS RESET=ALL

    . .


    .

    (2)Fisher

    CROSSLIST cross-VALIDATION (

    ) . Resubstitution .

    METHOD=NORMAL NPAR NORMAL

    Fisher

    default .

    Nonparametric .

    OUT SAS data .


    CLASS VAR .

    (3)

    82 HUM, ULN 40 .

    2 2 .

    () 22 , 18

    .

    prior prob. ( ) .

    default .

    .

    Fisher


    (constant) $k = \tfrac{1}{2}(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 + \mu_2)$ / (coefficient) $b' = (\mu_1 - \mu_2)'\Sigma^{-1}$

    (classification function): $b'x_0 > k$ 1 ,

    2 . $b'x_0$

    .

    Resubstitution .

    Re-substitution under-estimate

    .


    19 Fisher

    . From type Classified into

    .

    Posterior Prob. ( ) ()

    . 2 (Obs=2) 0.13,

    0.87 .

    1 .


    , .

    Fisher , cross-validation .

    6 6/22=0.2727 .

    3 3/18=0.1667 .

    (0.2727+0.1667)/2=0.2197 .
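    The arithmetic above can be reproduced with a few lines of Python. The helper name `error_rates` is hypothetical; the counts (6 of 22 and 3 of 18 misclassified) are the ones quoted above, and equal priors are assumed unless priors are passed explicitly.

```python
def error_rates(misclassified, group_sizes, priors=None):
    """Per-group misclassification rates and their prior-weighted average."""
    rates = [m / n for m, n in zip(misclassified, group_sizes)]
    if priors is None:
        priors = [1 / len(rates)] * len(rates)  # equal priors
    overall = sum(p * r for p, r in zip(priors, rates))
    return rates, overall

rates, overall = error_rates([6, 3], [22, 18])
print([round(r, 4) for r in rates], round(overall, 4))
# prints: [0.2727, 0.1667] 0.2197

# With priors proportional to the group sizes the overall rate is 9/40
_, overall_prop = error_rates([6, 3], [22, 18], priors=[22 / 40, 18 / 40])
```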

    .

    SAS data OUT1 . (OUT=OUT1

    )


    _INTO_ .

    .

    IF (GROUP^=.) .

    , .


    .

    (3 ) . 6 3

    .( .)

    (4)

    2

    . (HUM, ULN) .

    (HUM, ULN) = (145, 150) , (HUM, ULN) = (150, 145)

    SAS . OUTPUT

    NEW 2 HUM, ULN .

    TYPE

    .


    (HUM, ULN) = (145, 150) 0.64

    (HUM, ULN) = (150, 145) 0.56 . (

    0.5 )

    2 .

    .


    6.3.

    6.3.1.

    3 .

    Resubstitution, Test Data , Cross-Validation. SAS Posterior Prob.

    Cross-validation . SAS data

    OUTCROSS= . OUT=(:OUT1)

    Resubstitution . Test data

    OUTTEST= .


    6.3.2.

    2 .

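    (1) $C(2|1)$ : cost of classifying an observation from population 1 into population 2

    (2) $C(1|2)$ : cost of classifying an observation from population 2 into population 1

    Let $p_1$ be the prior probability of population 1 and $p_2$ that of population 2.

    $$d_i^* = \tfrac{1}{2}(x_0 - \mu_i)'\Sigma^{-1}(x_0 - \mu_i) - \ln(p_i^*), \quad i = 1, 2$$

    $$p_1^* = \frac{p_1 C(2|1)}{p_1 C(2|1) + p_2 C(1|2)}, \qquad p_2^* = \frac{p_2 C(1|2)}{p_1 C(2|1) + p_2 C(1|2)}$$

    If $d_1^* < d_2^*$, classify into population 1; if $d_1^* > d_2^*$, classify into population 2.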
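    A short sketch of the cost- and prior-adjusted rule, assuming the Mahalanobis distances of the new observation from the two populations have already been computed. The function name `adjusted_rule`, the argument names and all numbers are hypothetical.

```python
import math

def adjusted_rule(d1, d2, p1, p2, c21=1.0, c12=1.0):
    """d1, d2: Mahalanobis distances of x0 from populations 1 and 2.
    c21 = C(2|1): cost of sending a population-1 observation to population 2.
    c12 = C(1|2): cost of sending a population-2 observation to population 1.
    """
    denom = p1 * c21 + p2 * c12
    p1_star = p1 * c21 / denom
    p2_star = p2 * c12 / denom
    d1_star = 0.5 * d1 - math.log(p1_star)
    d2_star = 0.5 * d2 - math.log(p2_star)
    return 1 if d1_star < d2_star else 2

# Equal priors and equal costs: the nearer population wins
print(adjusted_rule(d1=1.0, d2=1.5, p1=0.5, p2=0.5))   # prints 1
# A strong prior for population 2 overturns the decision
print(adjusted_rule(d1=1.0, d2=1.5, p1=0.2, p2=0.8))   # prints 2
```

    With equal priors and equal costs the adjustment terms cancel and the rule is the plain Mahalanobis rule.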

    Wild Domestic .

    .

    ( p1 p2

    $C(1|2) = C(2|1)$ ), ( $p_1 = p_2$ ) .

    .

    . .

    (1)PRIORS EQUAL

    (default)


    (2)PRIORS PROPORTIONAL

    (3)PRIORS WILD=0.4 DOMESTIC=0.6

    0.6, 0.4

    6.2.3 => 6->2 4 , =>

    3 4 .

    .

    6.3.3.

    SAS

    - (within) -

    . -

    POOL=YES .

    .


    ( 0.1)

    .

    6.4.

    () .

    (1) ? (2) ?

    . Forward , Backward

    , Stepwise .

    6.4.1. Forward

    .

    (1) () ( )

    (ANOVA) . F- .

    .


    (2) ? (covariate)

    (ANCOVA) SS3 F- .

    .

    : .

    20 2

    , .

    .

    . (Y) ( / )

    (ANOVA). .

    . .

    . .

    .

    ? , (covariate)

    (ANCOVA) SS3 F- .

    . F- (SS3 p-

    ) . 0.25 0.5 . SAS

    SLE(Significant Level for Entry) .

    6.4.2. Backward

    .

    (1) , , ( )

    (Type III, Partial SS) F-

    .

    (2) . SS3 F- (p-

    ) . 0.15 . SAS

    SLS(Significant Level for Stay) .


    6.4.3. Stepwise

    Forward .

    .

    .

    ,

    . SAS SLS, SLE

    .

    6.4.4.

    15 Backward , 15 Stepwise

    . .

    F-

    . (X1, X2) X1

    X2 . .

    .

    X2

    X1 .

    . .


    (1) (2)

    .

    6.4.5.

    .

    .

    . (TURKEY0.TXT)

    (1)


    CROSSLIST

    OUTCROSS . 8 cross-

    validation (, ) OUT1 .

    9 32 19 , 13

    . PRIOR equal (0.5) .

    (HUM, ULN) . 6.2.3

    9 .


    (2)

    Forward, Backward, Stepwise

    Stepwise

    Fisher .

    0.25-0.4, 0.15

    . 0.25, 0.15

    Stepwise . SAS

    STEPDISC procedure .


    . TIN, COR,

    D3P, ULN . Fisher .

    4 =>, => .

    .

    .

    (3)

    4 9

    .

    . 4 . ()

    OUT2 TYPE _INTO_ .

    .

    .


    B710 COR L684 L750 4

    . WILD COR


    6.4.6.

    . (TIN, COR, D3P, ULN) = (140, 105, 300, 145) .

    (domestic) 0.957 .

    6.5.

    Fisher Fishers between-within method .

    (Canonical)

    . ( p ) p -

    . (BOX-PLOT )

    . ( )

    .

    .


    6.5.1.

    .

    $x_i \sim N_p(\mu_i, \Sigma)$ , $n_i$ . $i = 1, 2, \ldots, m$ (m )

    p - .

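    Between sample mean: $B = \sum_{i=1}^{m} n_i(\hat\mu_i - \hat\mu_o)(\hat\mu_i - \hat\mu_o)'$ , $\hat\mu_o = \frac{1}{n_o}\sum_{i=1}^{m} n_i\hat\mu_i$ , $n_o = \sum_{i=1}^{m} n_i$

    Within sample mean: $W = \sum_{i=1}^{m}\sum_{r=1}^{n_i}(x_{ri} - \hat\mu_i)(x_{ri} - \hat\mu_i)'$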

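    $$\max_{b \neq 0} \frac{b'Bb}{b'(B+W)b}$$

    The maximizing $b$ is the eigenvector of $(B+W)^{-1}B$ that belongs to the largest eigenvalue; call it $b_1$. The first canonical function is $y_1 = b_1'x$.

    An observation $x$ is assigned to the group with the smallest distance $d_i = |b_1'x - b_1'\hat\mu_i|$ , $i = 1, 2, \ldots, m$ .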

    6.5.2.

    $b_2$ $(B+W)^{-1}B$ . $(B+W)^{-1}B$

    2 . 2

    . $d_i^2 = (b_1'x - b_1'\hat\mu_i)^2 + (b_2'x - b_2'\hat\mu_i)^2$ , $i = 1, 2, \ldots, m$

    ,

    () .

    .


    6.5.3.

    ( 80% SCREE plot

    . p

    2 .

    6.5.4.

    PROC CANDISC . 8

    . (

    .) NCAN . NCAN=2

    2 OUT CANSCORE SAS CAN1 CAN2

    .

    (1)

    $$\max_{b \neq 0} \frac{b'Bb}{b'(B+W)b}$$


    NCAN=2 100%

    . . (

    100%) .

    .

    .

    2 .

    2


    3969.0128*0234.0...140*1906.0153*1172.022016.0128*0033.0...140*022.0153*029.01

    CanCan

    (3)

    , .

    (4)

    2 SYMBOL 2 . (

    ) .

    . Bullet

    SYMBOL1 V=DOT V=CIRCLE .


    8 100%

    . 2

    .

    2

    .

    . SAS default

    .

    ?

    .

    .


    CAN1 2.09 CAN1 -1.54.

    .

    6.6. K Nearest Neighbor

    ( ) Mahalanobis ( $d_i = (x_0 - \mu_i)'\Sigma^{-1}(x_0 - \mu_i)$ )

    . K nearest neighbor .

    (1) Mahalanobis

    .

    (2) 2 .

    (3)2

    3 . k nearest neighbor

    Mahalanobis k k

    . 3

    .
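    The k nearest neighbor rule described above can be sketched as follows: compute the Mahalanobis distance from the new observation to every training observation, take the k closest, and assign the majority group. This is an illustrative Python sketch for two variables, not the SAS DISCRIM implementation; the data, the covariance matrix and the helper names are hypothetical.

```python
from collections import Counter

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mahalanobis_sq(x, y, s_inv):
    """Squared Mahalanobis distance between two 2-variable observations."""
    diff = [x[0] - y[0], x[1] - y[1]]
    return sum(diff[i] * s_inv[i][j] * diff[j] for i in range(2) for j in range(2))

def knn_classify(x0, train, labels, sigma, k=3):
    s_inv = inverse_2x2(sigma)
    # Rank training observations by distance from x0
    ranked = sorted(zip((mahalanobis_sq(x0, x, s_inv) for x in train), labels))
    # Majority vote among the k nearest
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training data: three observations per group
train = [(1, 1), (2, 1), (1, 2), (5, 5), (6, 5), (5, 6)]
labels = ["WILD"] * 3 + ["DOMESTIC"] * 3
sigma = [[1.0, 0.3], [0.3, 1.0]]  # assumed pooled covariance

print(knn_classify((2, 2), train, labels, sigma, k=3))  # prints WILD
print(knn_classify((5, 4), train, labels, sigma, k=3))  # prints DOMESTIC
```

    An odd k avoids tied votes when there are two groups.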

    6.6.1.

    LIST CROSSVALIDATE

    .


    K=3

    K=2

    K=5

    K=3, K=5 () K

    . TURKEY K=5 nearest

    neighbor .


    Fisher (6.4.5 ) 3

    . Fisher

    2 (10.53%), 1 (6.25%), 8.39% K Nearest

    neighbor 8.88% Fisher

    .

    6.6.2.

    ( ) , , Binary Classification Trees

    . Breiman, Friedman, Olshen, Stone (1984)

    CART(Classification And Regression Trees) . J. A. Hartigan

    CHAID(Chi-square Automatic Interaction Detector) . Data

    Mining . SAS E-Miner (Enterprise Miner), SPSS Clementine Data

    Mining Tool .

    6.7. 3

    6.7.1.

    (Wheat) Arthur (soft ) Arkan (hard ) Group 1, 2 Group 3,

    4 . 4 .


    . [WHEAT.txt]

    (Right) (Area), (Perimeter), (Length), (breadth) (down)

    (Area), (Perimeter), (Length), (breadth) . [Applied

    Multivariate Methods for Data Analysts, Dallas E. Johnson, 1998, p237]

    6.7.2.

    2 .

    Fisher

    . .


    OUT1 data

    . Obs. 10 1

    2 0.621 2 . ()

    6.7.3.

    BACKWARD .

    3 (D_L, R_P, D_B) default 0.15

    . SLS=0.2 .


    .

    .

    (1)

    (2)


    K-nearest neighbor

    .

    6.8.

    (: continuous, measurement, metric))

    . decision tree (CART, CHAID)

    .

    (Logistic Regression) (binary, dichotomous:

    0 1 ) (ordinal: //)

    . .

    .

    0, 1(: , ) .

    . 1y , 0y

    . 1 0 .

    .

    (ordinal)

    . (, , )

    . 3 LOGISTIC

    CATMOD . CATMOD

    CATegorical data MODeling LOGISTIC

    CATMOD

    6.8.1

    $y_i = f(x) = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + e_i$ , $e_i \sim iid\ N(0, \sigma^2)$

    0 1 (

    ) ( 2R ) F- t- ,


    . iy ( 0, 1 ) OLS

    .

    (1)ODDS: $\text{odds} = \dfrac{p}{1-p}$

    [p=0.5 1 . ] .

    2002 16 0.1 1/9 Odds . 1$ betting

    9$ return 2002 16 0.8 4 Odds .

    4$ betting 1$ return
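    The two odds quoted above (probability 0.1 gives odds 1/9, probability 0.8 gives odds 4) can be verified with exact fractions. The function name `odds` is only an illustrative choice.

```python
from fractions import Fraction

def odds(p):
    """Odds p / (1 - p) of an event with probability p, as an exact fraction."""
    p = Fraction(p)
    return p / (1 - p)

print(odds(Fraction(1, 10)))  # prints 1/9
print(odds(Fraction(8, 10)))  # prints 4
print(odds(Fraction(1, 2)))   # prints 1
```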

    (2)ODDS transformation () $p^* = \dfrac{p}{1-p}$

    (3)

    $p_i = \Pr(Y = 1)$

    ( $Y = 1$ ) . odds Odds .

    $$p_i^* = \frac{p_i}{1 - p_i}$$

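    Since $p_i$ lies in $(0, 1)$, the odds $p_i^*$ lie in $(0, \infty)$ and the log odds $\ln(p_i^*)$ range over $(-\infty, \infty)$, which matches the range of a response with errors $e_i \sim \text{Normal}(0, \sigma^2)$. This leads to the Logistic model .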

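    $$y_i = \ln\left(\frac{p_i}{1 - p_i}\right) = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + e_i, \quad e_i \sim \text{Normal}(0, \sigma^2)$$ :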

    .

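    $$p_i = \Pr(Y = 1 \mid x) = \frac{e^{\{\beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}\}}}{1 + e^{\{\beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}\}}} = \frac{1}{1 + e^{-\{\beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}\}}}$$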


    ip (: 1Y )

    ip

    .
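    The link between the linear predictor and the event probability can be sketched in Python. The coefficients and the observation below are hypothetical, not estimates from any data set in these notes, and the function names are illustrative.

```python
import math

def event_probability(x, betas):
    """Pr(Y = 1 | x) for a logistic model with linear predictor sum(beta_j * x_j)."""
    eta = sum(b * v for b, v in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-eta))

def log_odds(p):
    """Logit transformation ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

betas = [0.8, -0.5]   # hypothetical coefficients
x = [1.5, 1.0]        # hypothetical observation
p = event_probability(x, betas)
print(round(p, 4), round(log_odds(p), 4))
```

    Taking the log odds of the returned probability recovers the linear predictor (0.7 here), and a linear predictor of 0 gives probability 0.5.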

    (4)

    -2 Log L, AIC (Akaike Information Criterion), Schwarz Criterion

    (Adjusted ) Wald Chi-

    square .

    6.8.2. OLS

    EXAMPLE

    Remission.txt (cell, smear, infil, li, blast, temp)

    () .

    OLS (Li ) .

    OLS .

    remiss=1

    remiss=0 .

    OLS Y OUT1


    Li

    (p=0.0035) .

    ( Y )?


    6.8.3.

    OLS Li

    .

    descending ? SAS event()

    non-event . 1 ( event )

    descending . output

    ( Y ) OUT2 .


    2 - . p- 0.0146

    Li .

    OLS Y ( oYhat _ ), Y ( lYhat _ )

    event non-event

    Li

    (i=join) .


    OLS ( ) .

    0 1 )|1Pr( xYpY ii , event(

    ) 0 1 .

    6.8.3.

    Logistic . ,

    (Cross Table) .

    . stepwise Entry

    0.2, Stay 0.1 . ( SLE=0.25 , SLS=0.15

    )


    SAS Order Value=1 (event) , Order Value=2 (non-event)

    . (domestic) .

    }*5.03.73{11)|Pr( TINi e

    xDomesticYp . TIN

    .

    (1)INCLUDE

    INCLUDE . TURKEY

    Fisher TIND3PCORULN TIN

    D3P Logistic . (TURKEY

    )

    INCLUDE=2 2 . TIN,

    D3P .


    Tentative }*61.23*67.0*55.72.544{11)|Pr( FEMURPDTINi e

    xDomesticYp

    . TIN , D3P FEMUR

    .

    (3 . 3 0 )

    (: )

    (2) (Standardized Beta Coefficient)

    CTABLE (cross-

    tabulation) . STB (

    ) . ( TIN, D3P ).

    .

    p-

    .


    TIN 4.37 1.54

    (Event) . (-0.92,

    0.23) .

    CTABLE (cross-

    tabulation) . STB . (

    TIN, D3P ).

    (DOMESTIC) EVENT (=1)

    . Pr(Y=1) Pr(Y=Event) .


    Prob. Level Logit Pr(Event) Event

    . Pr(Event) 0.0 Event

    . EVENT 19

    Event non-EVENT 18 Event

    .

    Correct Event(Domestic) Event Non-event Non-event

    In-Correct Event(Domestic) non-Event Non-event event

    Correct . 51.4

    19/(19+18). 81.1 (19+7)/(19+18) .

    Sensitivity Event() Event()

    False Pos. Event() non-Event() ,

    Sensitivity + False Pos. 1 .

    Specificity non-Event() non-Event()

    False Neg. non-Event() Event() ,

    Specificity + False Neg. 1 .

    Correct Prob. Level .

    Sensitivity Specificity . ( )

    Prob. Level 0.24~0.8 Correct,

    Sensitivity, Specificity . Prob. Level 0.4 .

    Pr(Event) 0.4 Event() non-

    Event() . 0.4 3 ( 8.1%=3/37,

    Non-event Event Event Non-event ) Fisher K-

    nearest .
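    The quantities in a classification table can be computed from the four cell counts. The sketch below uses the one case that is fully determined above: at a probability level of 0 every observation is predicted as an Event, so with 19 events and 18 non-events the correct rate is 19/(19+18) = 51.4%. The function name is hypothetical, and the false positive and false negative rates are assumed to be taken among the predicted events and predicted non-events respectively; check the definition used by your SAS release.

```python
def classification_summary(tp, fn, tn, fp):
    """tp: events predicted as events, fn: events predicted as non-events,
    tn: non-events predicted as non-events, fp: non-events predicted as events."""
    def ratio(num, den):
        return num / den if den else None  # undefined when the denominator is 0

    total = tp + fn + tn + fp
    return {
        "correct": ratio(tp + tn, total),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "false_pos": ratio(fp, tp + fp),   # among predicted events (assumed definition)
        "false_neg": ratio(fn, tn + fn),   # among predicted non-events (assumed definition)
    }

# Probability level 0: all 37 observations are predicted as events
summary = classification_summary(tp=19, fn=0, tn=0, fp=18)
print(round(100 * summary["correct"], 1))  # prints 51.4
```

    Raising the probability level trades sensitivity for specificity, which is why an intermediate level is chosen.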

    6.8.4.


    Pr(Event) Phat SAS data OUT1

    .

    _LEVEL_ EVENT(Domestic) . Prob.

    Level 0.4 . PHAT 0.4

    WILD() _LEVEL_ WILD . PHAT

    .

    .


    CTABLE 2 . . .

    () 2 logistic ( 2 : TIN, D3P)

    .

    2 Pr(EVENT) .

    0.4 (TIN=145, D3P=320) Domestic() (TIN=150, D3P=300)

    Wild () .


    6.8.5.

    Fisher K nearest neighbor

    . ( CART, CHAID ) Logistic

    1) ( ) 2) ( )

    . 2 ( )

    Logistic . ()

    (: ) .

    3 Logistic

    . WHEAT GROUP (1


    1( 1Y ) 3 Phat .

    467.0))89,226,,56,54(|1Pr( 1 rlrpradaxY

    297.0467.0764.0))89,226,,56,54(|2Pr( 1 rlrpradaxY

    175.0764.0939.0))89,226,,56,54(|3Pr( 1 rlrpradaxY

    06.0939.01))89,226,,56,54(|4Pr( 1 rlrpradaxY

    ( )


    [EXERCISE]

    [CAR.TXT : http:// lib.stat.cmu.edu/DASL]

    (1)CAR.TXT 5 (MPGDISPLACEMENT) 2 (

    ) () .

    . 2 , -

    . 3 ? 2 .

    (2) (US, non-US) HOMEWORK #7-1

    ( )

    .

    (3)CAR.TXT 2 (MPG, HORSEPOWER) (US, non-

    US) .

    MPG, HORSEPOWER .

    FISHER CROSS-VALIDATION

    .

    (1) .

    (4) 2 . (3)

    .

    (MPG, HORSEPOWER)=(20, 100) (MPG, HORSEPOWER)=(25, 120)

    (5)CAR.TXT , 2 MPG, HORSEPOWER, (US, non-US) 2

    , non- 70%, 30% .

    .

    .


    (6)CAR.TXT , (MPGDISPLACEMENT) 5, (US, non-US)

    PROC STEPDISC . ( &

    ) =0.2 (Type III SS Group .)

    (7)CAR.TXT , (MPGDISPLACEMENT) 5, 2(US, non-US)

    5 .

    .

    MPG, MANPOWER 2 , .

    (8)CAR.TXT 5 (MPGDISPLACEMENT)

    .

    (9)CAR.TXT 5 (MPGDISPLACEMENT) K nearest

    K .

    (10)Fisher , K-nearest ,

    .

    (11)(Wheat) [WHEAT.txt] 4 , 8

    Fisher . (8 )

    Fisher .

    K-nearest . K . (

    .)

    Location (

    ) cut-off .

    -

    .( ) 2

    ,

  • CHAPTER 7

    , , , , ,

    . (cluster)

    (Clustering Analysis).

    () .

    . grouping classification

    .

    ? 2

    . 12 2 () .

    Euclidean () ( ) .

  • Chapter 7.

    184

    1) 2 (scatter

    plot) 2) 3 Bubble Plot 3) 4

    2-3 .

    .

    7.1.

    7.1.1. Euclidean

    . (similarity)

    . r s Euclidean .

    $$d_{rs} = [(x_r - x_s)'(x_r - x_s)]^{1/2}$$

    7.1.2. Euclidean

    . r s Euclidean

    .


    $$d_{rs} = [(z_r - z_s)'(z_r - z_s)]^{1/2}$$

    7.1.3. Mahalanobis

    r s Mahalanobis within - . .

    $$d_{rs} = [(x_r - x_s)'\Sigma^{-1}(x_r - x_s)]^{1/2}$$
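    The three distances of this section can be compared on a small example. This is an illustrative Python sketch; the function names and the numbers are hypothetical, and the Mahalanobis version is written for two variables only to keep the matrix inverse explicit.

```python
import math

def euclidean(x, y):
    """Ordinary Euclidean distance between observations x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def standardized_euclidean(x, y, sds):
    """Euclidean distance after dividing each variable by its standard deviation."""
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(x, y, sds)))

def mahalanobis(x, y, sigma):
    """Mahalanobis distance for two variables; sigma is a 2x2 covariance matrix."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [x[0] - y[0], x[1] - y[1]]
    q = sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))
    return math.sqrt(q)

r, s = (0.0, 0.0), (3.0, 4.0)
print(euclidean(r, s))                                   # prints 5.0
print(standardized_euclidean(r, s, sds=(3.0, 4.0)))      # sqrt(2)
print(mahalanobis(r, s, [[9.0, 0.0], [0.0, 16.0]]))      # sqrt(2) as well
```

    With a diagonal covariance matrix the Mahalanobis distance coincides with the standardized Euclidean distance; the two differ only when the variables are correlated.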

    7.2.

    7.2.1.

    seed seed (

    ) () . 3 . 1)

    () . 2) seed

    . 3) seed

    . SAS

    procedure FASTCLUS .

    7.2.2.

    () single-linkage clustering

    . Neighbor Method single-linkage clustering

    .

    (1) (n) . 6

    Euclidean () . 6 .


         1     2     3     4
    1          0.1   0.7   0.2
    2                0.4   0.6
    3                      0.3
    4

    .

    (2) ( ) . (3.5)

    .

              (1, 2)   3     4
    (1, 2)             ?     ?
    3                        0.3
    4

    ?: (1, 2) ) () ?

    (3) .

    ( ) () 5 .

    Nearest neighbor: ()

    Furthest neighbor:

    Centroid neighbor:

    Average neighbor:

    Ward's minimum variance:

    .

    Nearest, Furthest, Centroid neighbor, Average neighbor, Ward's minimum variance

    ? Nearest

    Furthest

    . 2-3


    .

    Average neighbor .

    (4) Nearest neighbor .

    1 3 0.7, 2 3 0.4 (1, 2) 3 0.4 . 1

    4 0.2 2 4 0.6 0.2 (1, 2) 4

    .

              (1, 2)   3     4
    (1, 2)             0.4   0.2
    3                        0.3
    4

    (1, 2) 0.4 3 4 0.3 (1, 2, 4) 3 0.3

    .

                 (1, 2, 4)   3
    (1, 2, 4)                0.3
    3
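    The nearest neighbor (single linkage) merging worked through above can be reproduced with a short Python sketch. The distances are the ones in the 4-object table (d12 = 0.1, d13 = 0.7, d14 = 0.2, d23 = 0.4, d24 = 0.6, d34 = 0.3); the function name `single_linkage` is hypothetical.

```python
def single_linkage(dist, labels):
    """Agglomerative clustering where the distance between two clusters is the
    smallest distance between any pair of their members (nearest neighbor)."""
    d = {frozenset(pair): value for pair, value in dist.items()}
    clusters = [(label,) for label in labels]

    def cluster_distance(c1, c2):
        return min(d[frozenset((a, b))] for a in c1 for b in c2)

    history = []
    while len(clusters) > 1:
        # Find the closest pair of clusters
        height, i, j = min(
            (cluster_distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = tuple(sorted(clusters[i] + clusters[j]))
        history.append((merged, height))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history

dist = {(1, 2): 0.1, (1, 3): 0.7, (1, 4): 0.2,
        (2, 3): 0.4, (2, 4): 0.6, (3, 4): 0.3}
for merged, height in single_linkage(dist, [1, 2, 3, 4]):
    print(merged, height)
# prints:
# (1, 2) 0.1
# (1, 2, 4) 0.2
# (1, 2, 3, 4) 0.3
```

    Replacing `min` with `max` inside `cluster_distance` gives the furthest neighbor (complete linkage) method.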

    7.3.

    ? Tree diagram

    Hotelling's $T^2$ Cubic Clustering Criterion .

    7.3.1.


    (Tree Diagram) , ,

    (). (diagram) 2 (2, 4), (3, 5, 6, 1)

    .

    7.3.2. Pseudo Hotelling's $T^2$

    Hotelling's $T^2$ .

    .

    7.3.3. CCC

    Sarle (1983) CCC (Cubic Clustering Criterion)

    CCC 3 .

    .


    7.4.

    (Discriminant Analysis) (Clustering Analysis)


    .

    .

    ( ),.....,, 21 PXXX

    .

    (1)

    Fisher method (

    , )

    K Nearest Discriminant Analysis

    Logistic Regression (

    ,

    )

    (2)

    (by )

    . 2

    .

    (3)

    ()

    .

    (1) .

    Nearest neighbor

    Furthest neighbor

    Centroid neighbor

    Average neighbor

    Ward's minimum variance

    (2) .

    CCC

    Pseudo Hotelling's $T^2$ Tree Diagram

    (3)

    . 3

    .

    .

    (4) .


    7.5.

    56 MOIS( ), PROT( ), FAT( ),

    ASH(ash ), SODIUM( ), CARB( ), CAL()

    . 56 . PIZZA.txt/[Applied Multivariate

    Methods for Data Analysts, Dallas E. Johnson, 1998, p331]

    7.5.1.

    (1)STANDARD Standardized Euclidean Distance

    . STANDARD .

    (2) Average neighbor( )

    . (METHOD=AVERAGE)

    AVERAGE | AVE average linkage vs. CENTROID | CEN centroid method

    COMPLETE | COM complete linkage (furthest neighbor, maximum method, diameter

    method, rank order typal analysis).

    DENSITY | DEN density linkage, which is a class of clustering methods using

    nonparametric probability density estimation. You must also specify one of the K=, R=, or

    HYBRID options

    EML maximum-likelihood hierarchical clustering


    MEDIAN | MED Gower's median method

    SINGLE | SIN single linkage (nearest neighbor, minimum method, connectedness

    method, elementary linkage analysis, or dendritic method).

    WARD | WAR Ward's minimum-variance method

    (3)CCC Cubic Clustering Criterion , PSEUDO Pseudo Hotelling's $T^2$ . .

    (4) TREE SAS data .

    7.5.2.

    7 . 7

    2 .

    .

    . ()

    .

    7.5.3.


    (1) .

    (2) . 34021 34026 .

    () (Norm Distance): ) 0.0203 .

    (3) 24107 34022 . 0.0304

    (4) 14072 24030 . 0.0366

    (5) CL54( 54: 24107, 34022) 55( 34021, 34026)

    .

    (6) 24049, 24033 14118, 14143

    .

    (7) 52( 54, 55) 14067 . .

    (8)FREQ . 2

    .

    7.5.4.


    .

    . NCL Number of Clustering . Tie()

    . Tie (Tie) .

    (1)CCC(Cubic Clustering Criterion, Sarle; 1983) 20% .

    CCC 3 . CCC

    4 .

    (2)Hotelling's $T^2$

    . PST2(Pseudo Hotelling's $T^2$ ) . PST2

    .

    . PST2 NCL=6, NCL=3

    (NCL) . NCL=6

    7 NCL=3 7 .

    (3)PSF Pseudo F PST2 Pseudo Hotelling's $T^2$ . CCC PST2 .

    (4)CCC 4 , PST2 4 7 .


    7.5.5. Hierarchical Tree

    7.5.6.

    ( 4 7 )

    () . 7

    .

    NCL=2

    NCL=7

    NCL=4


    =7

    1 7 .

    2001 1 2001 .

    .

    .

    7.5.7.

    .

    .

    7 (MOIS, PROT, FAT, ASH, SODIUM, CARB, CAL)

    . ( )

    .

    ? 2 .

    . .


    . 2

    3 2 .

    HPOS=50 (H) 50%

    VPOS=50 (V) 25% .

    ( : ) .

    152 2 6 91.8%

    . Prin1 (PROT, FAT, ASH, SODIUM, CARB( ))

    Prin2 ( ) . 1

    , 2 .

    ()

    .

    () .

    .


    . 4 .

    7 4 .

    4 .

    7.6. Faster Cluster

    Faster clustering (hierarchical clustering)

    [() ] (non-

    hierarchical clustering) . seed seed

    . (number of clusters)

    (size: radius ) .

    SAS FASTCLUS .

    Faster Cluster .

    ,

    NCLUSTER=7

    ,

    ,

    ,


    (STEP1) seeds . seed .

    MAXCLUSTERS=2 SEED 2 .

    (STEP2) seeds . DRIFT

    seed .

    (1) (2) .

    (STEP3) SEED .


    (STEP4) STEP 2)-STEP3) . STEP SEED

    . STEP . MAXITER

    . MAXITER=3

    STEP 3 .
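    The four steps can be sketched as a small k-means style loop, assuming they are: choose seeds, assign each observation to the nearest seed, replace each seed by its cluster mean, and repeat up to a maximum number of iterations. This is a simplified Python illustration, not the FASTCLUS algorithm itself: the seed selection (first k observations), the function name `fast_cluster` and the data are assumptions.

```python
import math

def fast_cluster(points, k, max_iter=10):
    # STEP 1: take the first k observations as initial seeds (simplifying assumption)
    seeds = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # STEP 2: assign every observation to its nearest seed
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, seeds[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # STEP 4 stops early once no observation changes cluster
        assignment = new_assignment
        # STEP 3: replace each seed by the mean of the observations assigned to it
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                seeds[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, seeds

# Hypothetical two-variable data with two obvious groups
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
assignment, seeds = fast_cluster(points, k=2, max_iter=3)
print(assignment)  # prints [0, 1, 0, 0, 1, 1]
```

    Because the result depends on the initial seeds, it is worth rerunning with different seeds or a different observation order.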

    Non-hierarchical clustering

    RANDOM .

    FASTCLUS . (

    3 PRIN1, PRIN2 .)

    .

    7.6.1.

    Fisher Iris() [IRIS.TXT]. 4 .

    (VARIETY: S, C, V 3 )

    .


    (petal length), (petal width), (sepal length), (sepal

    width)

    7.6.2.

    PROC CLUSTER FASTCLUS () .

    procedure

    .

    (SL, SW), (PL, PW)

    .


    7.6.3.

    Fast cluster . IRIS

    (3 : S, C, V)

    .

    2~3

    .

    1 (SL, PL, PW: ), 2 SW( ) .

    VAXIS Y , HAXIS X . VPOS 25%, HPOS 50%

    PLOT .

    2


    PRIN1 . (

    , ) .

    7.6.4. Fast clustering

    3 .

    2 3

    1 .

    SAS data (IRIS_PRIN)

    PRIN1, PRIN2 .


    RMS Std Deviation

    .

    Maximum Distance from Seed to Observation

    seed seed

    R-Square

    RSQ/(1-RSQ)

    ?

    Distance Between Cluster Centroids

    ()

    PL () . PW

    . 1

    .


    .

    A

    B


    2 2 .(A

    , B ) 1, 2

    .

    ID

    .

    7.6.5. 1

    3 , SEED (DRIFT), 3

    168 RMS .


    7.6.6. 2

    .

    2 .

    .


    7.7.

    (MDS: Multi Dimensional Scaling) n (

    2 ) . (similarity)

    . .

    Under-arm Deodorant

    () .

    . , , ,

    10 . 2

    .

    2

    7

    ,

    .

    3(RADIUS=3)

    3

    RMS

    .

    ? Intuition & Trial Error


    metric non-metric .

    (1)Euclidean distance (Metric )

    (2) () . (Metric/non-Metric )

    (3) . (non-

    Metric )

    .

    (1)

    (2)

    (3) ,

    7.7.1.

    n ( 2 )

    () .

    () .

    (1)metric

    metric () Euclidean distance .

          X1     X2    ...   Xp
    1     x11    x12   ...   x1p
    2     x21    x22   ...   x2p
    ...
    n     xn1    xn2   ...   xnp


    ( $D_1$ , $D_2$ ) $d_{ij} = \sqrt{(x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2 + \cdots + (x_{ip} - x_{jp})^2}$ (Euclidean distance)

    () (10 , 100 )

    . 1x , ,2x , px

    ( ) . Deodorant ,

    , , 10

    .

    (2)non-metric

    . . 50 20

    (1, 2) 30, (1, 20)

    25 , (2, 20) 45

           1     2    ...   20
    1      0
    2     30     0
    ...
    20    25    45          0

    . . MDS

    (1-/) .

    7.7.2.

    . n k=n(n-1)/2

    .

    (1) . ikjkjiji SSS ...2211

    (2) m( 2) .

    2) . 2

    STRESS .


    ji

    ji

    ij

    ijij

    S

    SS

    STRESS 2)2

    2)3)2

    )(

    )(

    2 Stress .

    Stress    Goodness of fit
    20%       Poor
    10%       Fair
    5%        Good
    2.5%      Excellent
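    A stress value can be computed by comparing the given dissimilarities with the distances in the fitted configuration. The sketch below assumes Kruskal's stress-1 form, the square root of sum (d - s)^2 / sum d^2 with d the configuration distance and s the observed dissimilarity; whether this matches the exact formula of these notes is an assumption, as are the function name and the coordinates. The dissimilarities are the three quoted earlier for objects 1, 2 and 20 (30, 25, 45).

```python
import math

def stress(dissimilarities, config):
    """dissimilarities: {(i, j): observed value}; config: {i: (x, y)} fitted coordinates."""
    num = den = 0.0
    for (i, j), s in dissimilarities.items():
        d = math.dist(config[i], config[j])  # distance in the fitted map
        num += (d - s) ** 2
        den += d ** 2
    return math.sqrt(num / den)

observed = {(1, 2): 30.0, (1, 20): 25.0, (2, 20): 45.0}

# A two-dimensional map that reproduces the three dissimilarities exactly
x = (25.0 ** 2 + 30.0 ** 2 - 45.0 ** 2) / (2 * 30.0)
y = math.sqrt(25.0 ** 2 - x ** 2)
perfect = {1: (0.0, 0.0), 2: (30.0, 0.0), 20: (x, y)}

# A cruder map that puts all three objects on one line
crude = {1: (0.0, 0.0), 2: (30.0, 0.0), 20: (-15.0, 0.0)}

print(round(stress(observed, perfect), 6), round(stress(observed, crude), 3))
```

    The crude one-dimensional map gives a stress near 18%, which falls between Fair and Poor in the table above.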

    7.7.3. 1 (Metric)

    .

    (5.) 5 @56 56 city .

    10 MDS 2

    . . non-metric

    (1-/) .


    (1)MDS

    LEVEL=absolute .

    LEVEL=ordinal . default ordinal .

    SHAPE=square .

    SHAPE=triangle . triangle default .

    PLOTIT . () 2 .

    MDS ()

    .

    2 STRESS

    . SAS data OUT

    .

    (2)MDS


    . ()

    (Dim1, Dim2) .


    7.7.4. 2

    . 6 15 15

    . 15 .

    cheek() . mouth=>face=>head ..

    .


    (1)CONDITION data ROW ,

    MATRIX(default) . . (

    )

    (2)LEVEL data ORDINAL(default) , ABSOLUTE

    .

    (3)DIMENSION .

    (4)PFINAL OUT .


    7.7.5. 3

    34 HP(), TIM1( 60 :) TIM2(1/4

    :), TS() BRAK1(60 ) BRAK2 (80

    ) SP( ) SS( ) MPG(

    ) . MDS . [CARS.txt]

    CARS4 CARS2

    .


    ONE I . ORIG J

    ( ji, )=(1,2), (1,3), (34,34) 34*34 .

    DUP i j NN .

    .

    . .


    =SHAPE( , nrow, ncol) (nrow, ncol)

    . ( DIST nn DD .)


    [EXERCISE]

    (1)ORANGE.txt 5 .

    Boron(B), Barium(BA), Calcium(CA), Potassium(PO), Magnesium(MA), Manganese(MN),

    Phosphorous(PH), Rubidium(RU), Zinc(ZN)

    ( Centroid( ), Average( ) )

    .

    .

    . (

    ?)

    (2)ORANGE.txt FASTCLUSTER (1) Hierarchical

    .

    (3) (WHEAT.txt) (, , )

    (1) (2) .

    (4) .

    Metric . .

    Non-metric .

  • CHAPTER 8

    (Canonical Correlation Analysis)

    . (, , ) (,

    , )

    .

    ( $X_1, X_2, \ldots, X_m$ ) ( $Y_1, Y_2, \ldots, Y_n$ ) . Xs

    Ys $V = a_1X_1 + a_2X_2 + \cdots + a_mX_m$ , $W = b_1Y_1 + b_2Y_2 + \cdots + b_nY_n$ 1)

    2) $\mathrm{Corr}(V, W)$ . p

    2 .

    )

    ,(~

    ...2221

    1211

    2

    1

    2

    1

    4

    3

    2

    1

    Normal

    xx

    x

    xxxx

    x

    p

    . 1) ( 21, xx )

    2)

  • Chapter 8.

    222

    (coefficient of determination = $SSR/SST$) ( $R^2$ ).

    ( ) ( $a_1X_1 + a_2X_2 + \cdots + a_pX_p$ )

    .

    8.1.

    8.1.1.

    .

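    $$\rho_1 = \max_{a_1, b_1 \neq 0} \operatorname{corr}(V_1, W_1) \quad \text{where } V_1 = a_1'x_1,\; W_1 = b_1'x_2$$

    $a_1, b_1$ (first canonical variate)

    $a_1, b_1$ . $\rho_1$ (first canonical correlation) .

    $\operatorname{var}(V_1) = \operatorname{var}(W_1) = 1$ : $a_1'\Sigma_{11}a_1 = 1$ , $b_1'\Sigma_{22}b_1 = 1$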

    8.1.2.

    $V_2 = a_2'x_1$ , $W_2 = b_2'x_2$ $a_2, b_2$ .

    (1)V2 W2 V1 W1 .

    (2) $\operatorname{var}(V_2) = \operatorname{var}(W_2) = 1$

    $\rho_2 = \operatorname{corr}(V_2, W_2)$ .

    .

    2-3 .


    8.1.3.

    .

    . p

    q $\min(p, q)$ .

    8.1.4.

    .

    (1) 0: 101 H vs. 0: 101 H 0: 1201 H vs. 0: 1201 H

    )1(||||

    || 212211

    ik

    iT

    , ),min( qpqk

    (2) 0:0 rrH vs. 0:0 rrH

    )1( 2ik

    rirT

    , 2 )1)(1(,~)log( rqprqrT
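    The sequential tests can be illustrated numerically. The sketch below assumes the statistic is the product of (1 - rho_i^2) over the canonical correlations from the r-th onward, referred to a chi-square distribution with (p - r + 1)(q - r + 1) degrees of freedom; the multiplier -(n - 1 - (p + q + 1)/2) is Bartlett's usual approximation and is an assumption here, as are the function names and the canonical correlations used.

```python
import math

def wilks_lambda(canonical_corrs, r=1):
    """Product of (1 - rho_i^2) for i = r, ..., k (r counted from 1)."""
    lam = 1.0
    for rho in canonical_corrs[r - 1:]:
        lam *= 1.0 - rho ** 2
    return lam

def bartlett_chi_square(canonical_corrs, n, p, q, r=1):
    """Approximate chi-square statistic and its degrees of freedom for
    H0: the r-th and all later canonical correlations are zero."""
    lam = wilks_lambda(canonical_corrs, r)
    stat = -(n - 1 - (p + q + 1) / 2) * math.log(lam)
    df = (p - r + 1) * (q - r + 1)
    return stat, df

corrs = [0.5, 0.0]  # hypothetical canonical correlations
print(wilks_lambda(corrs))                                  # prints 0.75
print(bartlett_chi_square(corrs, n=50, p=4, q=4, r=1)[1])   # prints 16
```

    The statistic is compared with the upper tail of the chi-square distribution; a large value rejects the hypothesis that the remaining canonical correlations are zero.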

    8.2.

    (WHEAT.txt) (, , )

    (, , ) .

    8.2.1.

    SAS .

    . NOSIMPLE

    (, ) .


    .

    . (pair-wise)

    .

    (1)OUT SAS SCORES .

    (2)NCAN=2 2 . (V1, W1), (V2,W2)

    (3)CORR . PROC CORR

    .

    (4)VPREFIX=DOWN V DOWN . WPREFIX=RIGHT W

    RIGHT .

    (5) VAR V, WITH W

    .


    8.2.2.

    , .

    .

    CANONICAL

    4 . ( 4 )

    $\mathrm{Corr}(V1, W1)$, $\mathrm{Corr}(V2, W2)$ . $\mathrm{Corr}(V1, W2)$ ? 0

    .

    1

    2

    3


    CANONICAL

    .

    0 . 0

    . 3 0.03 0.05

    . . 4

    0.9617 .

    ,

    2'111

    '11 , xbWxaV , 2

    '221

    '22 , xbWxaV 11,ba , 22 , ba .

    DBZDLZDPZDAZDOWNV _*2537.0_*2544.0_*7768.0_*0797.011

    DBZDLZDPZDAZDOWNV _*4587.0_*2224.1_*6688.0_*2867.021


    RBZRLZRPZRAZRIGHTW _*0407.0_*1601.0_*8941.0_*0165.011

    RBZRLZRPZRAZRIGHTW _*4412.0_*8096.0_*1905.0_*8764.021

    .

    **_Z .

    RAW STANDARDIZED

    . ()

    .

    . V1 , V2

    , W1 , W2 ( )

    . .

    . V1 V2, W1 W2 .

    .

    DOWN1 , ,

    DOWN2 .

    (RIGHT1) , (RIGHT2) .


    (RIGHT1) , , .

    , , . (RIGHT2)

    , , .

    (DOWN1) , , ,

    (DOWN2) , , .

    . .

    OUT SCORE .

    (DOWN1, RIGHT1) (DOWN2, RIGHT2) (DOWN1, RIGHT2), (DOWN2, RIGHT1)

    0. V1, W1, V2, W2 .

    .


    0.88 0.39 1

    . .


    [EXERCISE]

    6 .

    ( n =20)

    191  36  50   5  162   60
    189  37  52   2  110   60
    193  38  58  12  101  101
    162  35  62  12  105   37
    189  35  46  13  155   58
    182  36  56   4  101   42
    211  38  56   8  101   38
    167  34  60   6  125   40
    176  31  74  15  200   40
    154  33  56  17  251  250
    169  34  50  17  120   38
    166  33  52  13  210  115
    154  34  64  14  215  105
    247  46  50   1   50   50
    193  36  46   6   70   31
    202  37  62  12  210  120
    176  37  54   4   60   25
    157  32  52  11  230   80
    156  33  54  15  225   73
    138  33  68   2  110   43

    (, , ) . 8

    .
