Methodology 5


  • Chapter 5. Data Entry and Processing

    Course: Economic Research Methods, Faculty of Development Economics

    University of Economics Ho Chi Minh City

    5.1 Introduction

    This chapter shows students:

    How to enter, process, and analyze data.

    Exploratory data analysis techniques.

    How to use cross-tabulation to test the relationship between categorical variables.

    How to use analytical statistics to test hypotheses.

    5.2 The data analysis process

    Figure 5.1 Steps of exploration, testing, and analysis in the research process

    [Flowchart with the following boxes: research proposal; data collection and preparation; data analysis and interpretation; descriptive analysis of the variables; cross-tabulation of the variables; data display (histograms, boxplots, Pareto charts, stem-and-leaf displays, AID, etc.); data analysis; research report; decision making; preliminary analysis planning; hypothesis revision; visual presentation of the data; hypothesis testing.]

    5.3 Data entry

    5.3.1 Laying out the data on the computer

    Objectives: to make data entry convenient, and to make the data easy to correct.

    In practice:

    General rule: give variables short, abbreviated names (in unaccented Vietnamese or in English). Variable names should follow a consistent convention.

    Using Excel: easy to operate and edit, but storage space is limited and the statistical and econometric tools are not sufficient for analysis.

    Using SPSS: storage space is almost unlimited, and the statistical and econometric tools are fully developed for analytical needs. However, the data must be declared before use, which takes time.

    Figure 5.2 Entering data into the SPSS spreadsheet

    Figure 5.3 Defining the attributes of qualitative and quantitative variables

    Defining the variable type

    Defining the variable label (description)

    Defining the category values of the variable

    Defining the measurement scale of the variable
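    The four attributes listed above (type, label, category values, measurement scale) can also be recorded outside SPSS. The following is a minimal Python sketch of such a codebook. The class `VariableDef`, the variable names `bike` and `age`, and the numeric codes are hypothetical and chosen only for illustration; they do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class VariableDef:
    """One row of a hypothetical codebook, mirroring the attributes listed above."""
    name: str          # short, unaccented variable name
    var_type: type     # storage type, e.g. int or float
    label: str         # descriptive label
    measure: str       # "nominal", "ordinal", or "scale"
    values: dict = field(default_factory=dict)  # code -> category label

    def is_valid(self, x) -> bool:
        """Check a raw entry against the declared type and category codes."""
        if not isinstance(x, self.var_type):
            return False
        # Variables without declared codes accept any value of the right type
        return not self.values or x in self.values


# Hypothetical definitions (the codes 1..3 are assumptions)
bike = VariableDef("bike", int, "Motorbike name", "nominal",
                   {1: "Honda Air Blade", 2: "Honda Future Neo", 3: "Yamaha Sirius"})
age = VariableDef("age", int, "Age of motorbike user", "scale")

print(bike.is_valid(2))     # a declared code
print(bike.is_valid(9))     # an undeclared code
print(age.is_valid("35"))   # wrong type: text instead of a number
```

    Declaring the variables first costs time, as noted for SPSS, but it lets entry errors be caught as soon as a value is typed.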

    5.4 Data cleaning

    5.4.1 Detecting outlying values in the data

    a. Using Excel: the MAX and MIN functions, the AutoFilter tool, and scatter charts.
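    The Max/Min check can be reproduced in a few lines of Python. This is a minimal sketch: the sample values and the plausible range of 0 to 31 days are assumptions for illustration, not data from the lecture.

```python
def out_of_range(values, low, high):
    """Return (index, value) pairs that fall outside the plausible range [low, high]."""
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]


# Hypothetical entries for "number of days used in a month"; 300 is a typing error
days_used = [30, 28, 15, 300, 22, 0, 31]

# The extremes alone already reveal a suspicious entry
print(min(days_used), max(days_used))

# Listing the offending rows makes them easy to find and correct
print(out_of_range(days_used, 0, 31))
```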

    Figure 5.4 The scatter chart tool in Excel

    b. Using SPSS: scatter plots, the Frequencies tool, bar charts, pie charts, and the box plot in Explore.

    b. Using SPSS: scatter plot

    [Scatter plot. Axis label: Number of used days in a month (ticks from 0 to 40); the other axis has ticks from 10 to 80. Legend of motorbike names: Others, Honda @, Honda Dream, SYM Attila, Yamaha Cygnus, Honda Wave, Yamaha Jupiter, Yamaha Sirius, Honda Future Neo, Honda AirBlade.]

    b. Using SPSS: the Frequencies and Explore tools

    Figure 5.6 The Frequencies and Explore tools in SPSS

    b. Using SPSS: the Frequencies tool

                        Frequency  Percent  Valid Percent  Cumulative Percent
    Honda Air Blade            10     10.0           10.0                10.0
    Honda Future Neo            8      8.0            8.0                18.0
    Yamaha Sirius               7      7.0            7.0                25.0
    Yamaha Jupiter             13     13.0           13.0                38.0
    Honda Wave                 24     24.0           24.0                62.0
    Yamaha Cygnus               4      4.0            4.0                66.0
    SYM Attila                 11     11.0           11.0                77.0
    Honda Dream                 6      6.0            6.0                83.0
    Honda @                     7      7.0            7.0                90.0
    Others                     10     10.0           10.0               100.0
    Total                     100    100.0          100.0
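    The percent and cumulative percent columns follow directly from the counts. The sketch below recomputes them in Python from the frequencies in the table above; the function name `frequency_table` is illustrative and is not an SPSS command.

```python
# Counts taken from the frequency table above
counts = {
    "Honda Air Blade": 10, "Honda Future Neo": 8, "Yamaha Sirius": 7,
    "Yamaha Jupiter": 13, "Honda Wave": 24, "Yamaha Cygnus": 4,
    "SYM Attila": 11, "Honda Dream": 6, "Honda @": 7, "Others": 10,
}


def frequency_table(counts):
    """Return rows of (category, frequency, percent, cumulative percent)."""
    total = sum(counts.values())
    rows, running = [], 0
    for name, n in counts.items():
        running += n  # cumulative count up to and including this category
        rows.append((name, n,
                     round(100 * n / total, 1),
                     round(100 * running / total, 1)))
    return rows


table = frequency_table(counts)
for name, n, pct, cum in table:
    print(f"{name:<18}{n:>4}{pct:>7.1f}{cum:>7.1f}")
```

    A category code that appears in such a table but was never declared, or a category with an implausibly small count, is a candidate data-entry error.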

    b. Using SPSS: the Pie Chart and Bar Chart tools

    [Pie chart of motorbike names: Others 10.0%, Honda @ 7.0%, Honda Dream 6.0%, SYM Attila 11.0%, Yamaha Cygnus 4.0%, Honda Wave 24.0%, Yamaha Jupiter 13.0%, Yamaha Sirius 7.0%, Honda Future Neo 8.0%, Honda AirBlade 10.0%.]

    [Bar chart of the same categories. Category axis: Motorbike Names; count axis from 0 to 30.]

    b. Using SPSS: the Histogram tool

    The histogram is a conventional way of displaying ratio or interval data.

    A histogram groups the data values of a variable into intervals.

    It is drawn as a set of bars that represent the data values.

    Histograms are very useful for (1) displaying all the intervals in a distribution, and (2) examining the shape of the distribution, such as its skewness and kurtosis.

    Note: histograms cannot be used for nominal variables.
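    Grouping values into intervals, as a histogram does, can be sketched as follows. The ages and the interval width of 10 are hypothetical choices for illustration; SPSS chooses its own intervals.

```python
from collections import Counter


def bin_counts(values, width):
    """Count how many values fall in each interval [k*width, (k+1)*width)."""
    return Counter((v // width) * width for v in values)


# Hypothetical ages
ages = [18, 19, 22, 25, 27, 31, 33, 34, 41, 45, 52, 67]
width = 10
bins = bin_counts(ages, width)

# Text histogram: one bar per interval, one '#' per case
for start in sorted(bins):
    print(f"{start:>3}-{start + width - 1:<3} {'#' * bins[start]}")
```

    The grouping only makes sense when the values are numeric and ordered, which is why the technique does not apply to nominal variables.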

    b. Using SPSS: the Histogram tool

    [Histogram of Age of motorbike user. Horizontal axis from 20 to 75 in steps of 5; frequency axis from 0 to 30. Std. Dev = 14.42, Mean = 39, N = 100.]

    Example 5.2 Distribution of the age of motorbike users

    b. Using SPSS: stem-and-leaf displays

    Each row of the display is called a stem, and each data value shown on a stem is called a leaf.

    When a stem-and-leaf display is rotated 90° to the left, its shape resembles that of a histogram.

    b. Using SPSS: stem-and-leaf displays

    Age of motorbike user Stem-and-Leaf Plot

    Frequency    Stem &  Leaf

         6.00       1 .  889999
        18.00       2 .  000111122222233344
         8.00       2 .  55677788
        13.00       3 .  0012233334444
         4.00       3 .  5556
        12.00       4 .  123333334444
        13.00       4 .  5555566777789
        10.00       5 .  0123344444
         9.00       5 .  566667779
         2.00       6 .  03
         4.00       6 .  5567
          .00       7 .
         1.00       7 .  6

    Stem width: 10
    Each leaf: 1 case(s)

    5.3 Stem-and-leaf display of the age of motorbike users
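    Because every leaf is one case, a stem-and-leaf display keeps the original values, unlike a histogram. The Python sketch below rebuilds the ages from the rows of the display above and recomputes the summary statistics, which can be compared with the N = 100, Mean = 39 and Std. Dev = 14.42 shown on the age histogram. The row transcription is the only input; the variable names are illustrative.

```python
from statistics import mean, stdev

# (stem, leaves) rows transcribed from the display above; stem width is 10
rows = [
    (1, "889999"),
    (2, "000111122222233344"),
    (2, "55677788"),
    (3, "0012233334444"),
    (3, "5556"),
    (4, "123333334444"),
    (4, "5555566777789"),
    (5, "0123344444"),
    (5, "566667779"),
    (6, "03"),
    (6, "5567"),
    (7, ""),
    (7, "6"),
]

# Each leaf is one case: value = stem * 10 + leaf digit
ages = [stem * 10 + int(leaf) for stem, leaves in rows for leaf in leaves]

print(len(ages))               # number of cases
print(round(mean(ages), 2))    # sample mean
print(round(stdev(ages), 2))   # sample standard deviation (n - 1 denominator)
```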

    b. Using SPSS: box plots

    The box plot, also called the box-and-whisker plot, gives another visual picture of a distribution's location, spread, shape, tail length, and outliers.

    A box plot summarizes a distribution with five statistics: the median, the upper and lower quartiles, and the largest and smallest observed values.

    The main components of a box plot are:

    The rectangular box, which contains 50% of the data values.

    The line in the centre of the box, which marks the median.

    The two edges of the box, which mark the first and third quartiles (the 25th and 75th percentiles of the data).

    The whiskers, which extend from the upper and lower edges of the box to the largest and smallest values. These values lie no more than 1.5 times the interquartile range from the edges of the box.

    [Annotated box plot diagram, from top to bottom:]

    Extremes: values more than 3 box lengths above the third quartile (75th percentile).

    Outliers: values between 1.5 and 3 box lengths above the third quartile (75th percentile).

    Largest observed value that is not an outlier.

    Third quartile (75th percentile).

    Median.

    First quartile (25th percentile).

    Smallest observed value that is not an outlier.

    Outliers: values between 1.5 and 3 box lengths below the first quartile (25th percentile).

    Extremes: values more than 3 box lengths below the first quartile (25th percentile).

    50% of cases have values within the box.
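    The outlier and extreme rules in the diagram can be expressed as a short Python sketch. The data are hypothetical, and the quartiles are computed with `statistics.quantiles(..., method="inclusive")`; statistical packages may compute quartiles slightly differently, so borderline values can be classified differently there.

```python
from statistics import quantiles


def classify(values):
    """Flag outliers (1.5 to 3 box lengths from the box) and extremes (beyond 3)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    box = q3 - q1  # box length, i.e. the interquartile range
    flagged = {"outliers": [], "extremes": []}
    for v in values:
        dist = max(q1 - v, v - q3)  # distance beyond the nearer box edge
        if dist > 3 * box:
            flagged["extremes"].append(v)
        elif dist > 1.5 * box:
            flagged["outliers"].append(v)
    return q1, q3, flagged


# Hypothetical sample with two suspicious values
data = [10, 12, 13, 14, 15, 16, 17, 18, 30, 60]
q1, q3, flagged = classify(data)
print(q1, q3)
print(flagged)
```

    Flagged values are not necessarily errors; they are the cases to check against the original questionnaires before deciding whether to correct or keep them.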

    b. Using SPSS: box plots

    [Box plots of Age of motorbike user and Number of used days, N = 100 for each; vertical axis from 0 to 100.]

    5.4 Box plots of the age of motorbike users and the number of days used per month

    5.5 Descriptive statistical analysis

    5.5.1 Quantitative descriptive statistics

    Using Excel: the Descriptive Statistics tool under Data Analysis.

    Using SPSS: the Frequencies, Descriptives, and Explore tools under the Descriptive Statistics menu.

    Descriptive statistics summarize three aspects of the data: central tendency, variability, and the shape of the distribution.

    Measures of central tendency

    The mean is the sum of all data values divided by the number of values.

    The median is the value in the middle position of the data when sorted in order; it is the midpoint of the distribution. When the number of observations is even, the median is the average of the two central observations.

    The mode is the value that occurs most frequently in the data set.

    The range is the difference between the largest and smallest values in the data set; strictly speaking, it describes spread rather than central tendency.
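    These definitions map directly onto Python's standard `statistics` module. The sample below is hypothetical and has an even number of observations, so the median is the average of the two central values.

```python
from statistics import mean, median, mode

# Hypothetical sample with an even number of observations
data = [2, 3, 3, 5, 7, 10]

print(mean(data))              # sum of values divided by number of values
print(median(data))            # average of the two central values, 3 and 5
print(mode(data))              # most frequent value
print(max(data) - min(data))   # range: largest minus smallest
```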

    Measures of variability

    The variance (σ²) is the average of the squared deviations of the observations from the mean.

    The standard deviation (SD; σ) measures how widely the data are spread around the mean; it is the square root of the variance.

    The standard error of the mean (s.e.) measures the range within which the population mean (μ) can be expected to lie, with a given probability, based on the sample mean.
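    A minimal Python sketch of the three measures follows. Two points are assumptions, since the slides give no formulas: the variance uses the sample form with an n - 1 denominator, and the standard error is estimated as the standard deviation divided by the square root of n. The small sample is hypothetical; the last line applies the same formula to the summary values shown on the age histogram (Std. Dev = 14.42, N = 100).

```python
from math import sqrt
from statistics import stdev, variance


def standard_error(sd, n):
    """Standard error of the mean, estimated as sd / sqrt(n) (assumed formula)."""
    return sd / sqrt(n)


# Hypothetical small sample
sample = [4, 8, 6, 5, 7]
var = variance(sample)                 # sample variance (n - 1 denominator)
sd = stdev(sample)                     # square root of the variance
se = standard_error(sd, len(sample))
print(var, round(sd, 3), round(se, 3))

# Same formula applied to the summary values from the age histogram
print(round(standard_error(14.42, 100), 3))
```

    The standard error shrinks as the sample grows, which is why a mean based on 100 respondents is more precise than the spread of individual ages would suggest.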

    Measures of shape

    Skewness measures how far a distribution leans toward one of its two sides.

    A distribution is left-skewed (negative skew) when the left tail is longer and most of the data are concentrated on the right side of the distribution.

    A distribution is right-skewed (positive skew) when the right tail is longer and most of the data are concentrated on the left side of the distribution.

    For a right-skewed distribution the skewness value is positive; for a left-skewed distribution it is negative. The stronger the skew, the further the skewness value lies from 0.
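    The sign convention can be checked with a short Python sketch. It uses the simple moment-based definition (third central moment divided by the cube of the standard deviation); this choice is an assumption, and statistical packages typically report a sample-adjusted version whose value differs slightly but whose sign is the same. The samples are hypothetical.

```python
def skewness(values):
    """Moment-based skewness: third central moment over the SD cubed."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - m) ** 3 for v in values) / n   # third central moment
    return m3 / m2 ** 1.5


right_skewed = [1, 1, 2, 2, 3, 10]          # long right tail
left_skewed = [-v for v in right_skewed]    # mirror image: long left tail
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_skewed) > 0)   # positive for a right-skewed sample
print(skewness(left_skewed) < 0)    # negative for a left-skewed sample
print(skewness(symmetric))          # zero for a symmetric sample
```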

    Figure 5.10 The normal distribution curve and its properties

    Figure 5.11 Left-skewed and right-skewed distributions compared with the normal distribution

    Kurtosis measures how peaked or flat a distribution is compared with the normal distribution (whose kurtosis is 0). A distribution has a peaked shape when the kurtosis value is positive and has a