Topic 2: Stata and VHLSS


  • Faculty of Development Economics, Project, University of Economics Ho Chi Minh City. Quantitative Research with VHLSS

    Topic 2. STATA & VHLSS

    1. Introduction to Stata

    With Stata you can:

    1. Work quickly by typing commands;

    2. Save results:

    Choose where results are stored (the working directory) through File > Change working directory, or with the cd command:

    . cd "C:\...\Desktop\OutputStata11"

    Use the command log using file_output, text to save results:

    . log using 7Aug2010, text

    Here file_output is the name of the results file (7Aug2010), and the text option writes it in plain-text format. While working, use log off to pause logging temporarily, log on to resume it, and log close to close the log file.

    3. Create and run a do-file: use the doedit command to open a new do-file, give it a name (for example, vhlss2008.do), and save it in the working directory. To run the do-file, use do dofile_name (for example, do vhlss2008).
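    The logging and do-file steps can be combined in one short do-file. The sketch below is illustrative only: the log name session_log is hypothetical, and the auto dataset that ships with Stata stands in for the VHLSS files so that the sketch runs anywhere.

    ```stata
    * Minimal do-file sketch (hypothetical log name: session_log)
    capture log close                 // close any log left open by an earlier run
    log using session_log, text replace

    sysuse auto, clear                // example dataset shipped with Stata (assumption)
    summarize price                   // recorded in the log

    log off                           // pause logging
    describe                          // not recorded
    log on                            // resume logging
    summarize mpg                     // recorded again

    log close                         // finish and close the log file
    ```

    Saved as, say, vhlss2008.do in the working directory, it would be run with do vhlss2008.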

    2. Basic tools in Stata

    Thanh Vũ. Lecture notes.

    2.1 Descriptive statistics

    Command       Description
    describe      Describe a dataset
    summarize     Descriptive statistics
    tabulate      One- and two-way frequency tables
    table         Tables of summary statistics
    tabstat       Table of descriptive statistics
    histogram     Histogram (frequency chart)
    graph         Draw graphs

    2.2 Data management

    Command                           Description
    describe                          Describe the data
    save; edit                        Save the data; edit the data
    generate/egen; replace; recode    Create new variables; replace variable values; recode values
    keep; drop                        Keep or delete variables/observations
    label; format                     Label variables; set the display format
    append; merge                     Combine observations and files
    rename                            Rename a variable
    sort; gsort; order                Sort observations by value; reorder the variables

    2.3 Analysis of variance

    Command              Description
    ttest                t-test
    oneway; anova        ANOVA
    correlate; pwcorr    Correlation

    2.4 Regression analysis

    Command                    Description
    regress                    Linear regression
    logistic; logit; probit    Logistic, logit, and probit regression

    3. Stata practice


    The examples use the file hhexpe08.dta.

    3.1 The use command: loading data

    . use "D:\VHLSS\VHLSS08\Data\Hhold\hhexpe08.dta", clear

    . use http://fmwww.bc.edu/ec-p/data/wooldridge/CEOSAL1

    (Introductory Econometrics: A Modern Approach by Jeffrey M. Wooldridge (1st &

    2d eds.) , Chapter 2 - The Simple Regression Model; Example 2.3: CEO Salary and

    Return on Equity)

    . use http://fmwww.bc.edu/ec-p/data/wooldridge/WAGE1

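    3.2 The describe command (abbreviated d) describes the dataset: the number of observations, the number of variables, and the variable names.

    . d

    . d hh*

    The tabmiss command shows which variables have missing values. It is a user-written command, so it has to be installed before use.

    . tabmiss

    . tabmiss hh*

    3.3 if and in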

    . su hhexp1rl if reg==8

    . su hhexp1rl if reg>7

    . sort tinh huyen xa diaban hoso

    . list tinh huyen xa diaban hoso riceexp in 1/10

    . list tinh huyen xa diaban hoso riceexp in -10/-1
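    The same if and in qualifiers can be tried without the VHLSS files. A minimal sketch, assuming the auto dataset that ships with Stata as a stand-in for hhexpe08.dta:

    ```stata
    * Sketch using the auto dataset shipped with Stata (a stand-in for hhexpe08.dta)
    sysuse auto, clear

    * if: restrict a command to observations that satisfy a condition
    summarize price if foreign == 1
    summarize price if mpg > 25 & !missing(mpg)

    * in: restrict a command to a range of observation numbers
    sort price
    list make price in 1/5            // five cheapest cars
    list make price in -5/-1          // five most expensive cars (negative = count from the end)
    ```

    Because in refers to positions in the current sort order, sorting first is what gives in 1/5 and in -5/-1 their meaning.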

    3.4 The generate and replace commands

    Create a new variable with the generate command (abbreviated gen):

    generate newvar = exp [if exp]

    exp = the expression that defines the contents of the new variable.

    Example:

    . gen weexp = waterexp + elecexp
    creates the variable weexp, the household's total spending on electricity and water.

    . gen pcexp = hhexp1rl / hhsize
    creates the variable pcexp, the household's per capita expenditure.

    Creating dummy variables in Stata:


    . gen delta1 = 1 if region==2 | region==8

    . replace delta1 = 0 if delta1==.
    creates a variable indicating whether the household lives in a delta region (1) or not (0).

    Or, equivalently:

    . gen delta2 = (region==2 | region==8)

    . gen poor = 0

    . replace poor = 1 if pcexp < 2930
    creates a variable indicating whether the household is poor (poor=1) or not poor (poor=0).
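    One caveat applies when a dummy is built from a comparison: Stata treats a missing value as larger than any number, so x >= c is true and x < c is false when x is missing. A minimal sketch, using rep78 from the auto dataset shipped with Stata as a stand-in because it contains missing values; the names high1 and high2 are hypothetical.

    ```stata
    sysuse auto, clear

    * Naive dummy: observations with missing rep78 are coded 1, because missing > 4
    gen byte high1 = (rep78 >= 4)

    * Safer dummy: left missing wherever rep78 is missing
    gen byte high2 = (rep78 >= 4) if !missing(rep78)

    tab high1 high2, missing          // the two versions differ only where rep78 is missing
    count if missing(rep78)
    ```

    For the household data the same guard would read if !missing(pcexp), so that households with unknown expenditure are not silently classified.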

    Creating new variables with egen: egen builds new variables from statistical functions. Syntax:

    egen newvar = func_stat(varname)

    Commonly used statistical functions func_stat():

    func_stat()              Description
    mean()                   Mean
    median()                 Median
    count()                  Number of observations
    sd()                     Standard deviation
    rowmean(var1 ... varn)   Row mean of the n variables var1 to varn
    total()                  Sum

    Example:

    . egen avghhexp1=mean(hhexp1rl)
    mean household expenditure.

    . egen avghhexp2=mean(hhexp1rl), by(reg8)
    mean household expenditure within each of the 8 regions.

    . egen totalexp1=total(hhexp1rl)

    . egen totalexp2=total(hhexp1rl), by(urban08)

    . egen exp123=rowmean(waterexp elecexp garbexp)

    . order waterexp elecexp garbexp exp123

    . display (840+3720+108)/3
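    The display line works out one row mean by hand. The sketch below reproduces that check on a small made-up dataset: only the first row uses the figures from the display command; the second row and its missing value are invented to show that rowmean() averages over the non-missing values only.

    ```stata
    clear
    input waterexp elecexp garbexp
    840 3720 108
    600 2400   .
    end

    egen exp123 = rowmean(waterexp elecexp garbexp)
    list

    * Row 1: (840 + 3720 + 108)/3
    display (840+3720+108)/3
    assert exp123 == 1556 in 1

    * Row 2: the missing garbexp is ignored, so the mean is (600 + 2400)/2
    assert exp123 == 1500 in 2
    ```

    If a row with any missing component should itself be missing, a plain gen with (waterexp+elecexp+garbexp)/3 is the alternative, since arithmetic on a missing value returns missing.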

    3.5 The sort and gsort commands

    . sort hhexp1rl

    . gsort +hhexp1rl

    . format %9.0f hhexp1rl


    . gsort -hhexp1rl

    3.6 The drop and keep commands

    Delete or keep variables. Syntax:

    drop varlist
    keep varlist

    Delete or keep observations. Syntax:

    drop if exp
    keep if exp
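    The two forms act on different dimensions of the data: a varlist removes columns, an if condition removes rows. A minimal sketch on the auto dataset shipped with Stata (a stand-in; the cut-off 10000 is arbitrary):

    ```stata
    sysuse auto, clear                // stand-in dataset shipped with Stata

    keep make price mpg foreign       // keep four variables, drop the rest
    drop mpg                          // drop one more variable

    count                             // observations before filtering
    keep if foreign == 1              // keep only foreign cars
    drop if price > 10000             // then drop the expensive ones
    count                             // observations remaining
    ```

    Neither command can be undone, so the reduced data should be saved under a new file name rather than over the original.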

    3.7 The summarize command (abbreviated su): basic descriptive statistics.

    It computes descriptive statistics either for all variables in the dataset or for one or more selected variables. Syntax:

    su
    su var1 var2 ... varn
    su, detail

    The detail option produces detailed descriptive statistics, including percentiles, the median, the mean, the standard deviation (sd), the variance, skewness, and kurtosis.

    Example:

    . su hhexp1rl

    . su hh*

    . su waterexp if urban08==1

    . su waterexp elecexp if urban08==2

    Note: the condition if urban08 == 1 uses two equals signs (==).

    . su hhexp1rl hhsize if reg8 == 8

    . su hhexp1rl hhsize if reg8 == 8 & urban08 == 1

    . su hhexp hhsize if region > 7 & urban08 == 1
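    summarize also leaves its results in r(), where later commands can pick them up. A short sketch on the auto dataset shipped with Stata (a stand-in for the household file; price_c is a hypothetical variable name):

    ```stata
    sysuse auto, clear

    summarize price, detail
    return list                       // shows everything summarize left behind in r()

    display "mean   = " r(mean)
    display "median = " r(p50)
    display "sd     = " r(sd)

    * Stored results can feed later commands, e.g. a centered variable
    summarize price
    gen price_c = price - r(mean)
    ```

    The percentile results such as r(p50) are only stored when the detail option is used.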

    3.8 One- and two-way frequency tables

    The tab command produces one-way and two-way frequency tables. Syntax:

    One-way table: tab var

    Two-way table: tab var1 var2

    . tab reg8

    . tab urban08


    . tab urban08 reg8

    The tab3way command (user-written, so it must be installed first) produces multi-way frequency tables.

    Example:

    . tab3way reg8 urban08 poor

    3.9 Combining tab and summarize

    Combining the tab and su commands produces descriptive statistics for a quantitative variable (cont_var) within each category of a categorical variable (cat_var). Syntax:

    . tab cat_var, su(cont_var)

    . tab cat_var1 cat_var2, su(cont_var)

    Example:

    . tab urban08, su(hhexp1rl)

    . tab reg8, su(hhexp1rl)

    . tab reg8 urban08, su(hhexp1rl)

    To show only the mean of the quantitative variable for each combination of two categorical variables, use the syntax: tab cat_var1 cat_var2, su(cont_var) mean

    Example:

    . tab reg8 urban08, su(hhexp1rl) mean

    Some options of the tab command:

    Syntax: tab var1 var2, [options]

    Option    Description
    cell      Percentage of the total in each cell
    row       Row percentages
    col       Column percentages
    chi2      Pearson chi-squared test
    all       Report several tests of association

    Example:

    . tab poor urban08

    . tab poor urban08, cell

    . tab poor urban08, row col
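    The chi2 option can be tried in the same way. A minimal sketch on the auto dataset shipped with Stata; the dummy expensive and its cut-off of 6000 are invented for illustration and play the role that poor plays in the household data.

    ```stata
    sysuse auto, clear

    * Hypothetical dummy for illustration: 1 if the car is priced above 6000
    gen byte expensive = (price > 6000) if !missing(price)

    tab expensive foreign, row col    // row and column percentages
    tab expensive foreign, chi2       // Pearson chi-squared test of independence
    display "chi2 = " r(chi2) ",  p-value = " r(p)
    ```

    A small p-value would indicate that the two classifications are not independent.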

    Note: cat_var denotes a qualitative (categorical) variable.


    3.10 Other descriptive statistics commands

    The tabstat command with stats()

    Syntax: tabstat var, stats(options_stats) computes the requested descriptive statistics for the variable var.

    Some statistics available in stats():

    option_stat   Description
    mean          Mean
    median        Median
    iqr           Interquartile range
    sd            Standard deviation
    n             Number of observations
    max           Maximum
    min           Minimum
    range         Range = max - min
    var           Variance
    cv            Coefficient of variation, cv = sd/mean
    skewness      Skewness
    kurtosis      Kurtosis

    Example:

    . tabstat hhexp1rl, stats(mean median iqr sd)

    . tabstat hhexp1rl, stats(mean median iqr sd n max min range var)

    . tabstat hhexp1rl, stats(mean median iqr sd n max min range var cv skewness kurtosis)

    The command can be abbreviated and the display format set:

    . tabstat hhexp1rl, s(mean median iqr sd) format(%7.2f)

    tabstat with by():

    . tabstat hhexp1rl, stats(mean median iqr sd n max min range var) by(urban)
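    tabstat also accepts several variables at once. A short sketch on the auto dataset shipped with Stata (a stand-in for the household file):

    ```stata
    sysuse auto, clear

    * Several variables at once, broken down by a categorical variable
    tabstat price mpg weight, s(mean median sd n) by(foreign) format(%9.1f)

    * columns(statistics) puts the statistics in columns and the variables in rows
    tabstat price mpg weight, s(mean sd min max) columns(statistics)
    ```

    The second layout is usually easier to read when there are more variables than statistics.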

    The table command with c(): builds tables of statistics. Syntax:

    . table cat_var, c(option_stat cont_var)

    . table cat_var1 cat_var2, c(option_stat cont_var)

    . table cat_var1 cat_var2, c(option_stat1 cont_var1 option_stat2 cont_var2)

    Example:


    . table poor, c(mean hhsize)

    . table poor, c(mean hhsize sd hhsize)

    . table poor, c(mean hhsize sd hhexp1rl)

    . table poor urban08, c(mean hhsize)

    . table poor urban08, c(mean hhsize sd hhexp1rl)

    table with by():

    . table poor, c(mean hhsize sd foodnom) by(urban)
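    The c() syntax belongs to older Stata releases (these notes save output to a folder named for Stata 11). Stata 17 redesigned the table command, and the sketch below shows what the equivalent request is assumed to look like there; it should be checked against help table in the installed version. It uses the auto dataset shipped with Stata and will not run on releases before 17.

    ```stata
    sysuse auto, clear

    * Older syntax (as in these notes); under Stata 17+ it needs version control
    version 16: table foreign, c(mean price sd price)

    * Stata 17+ syntax: rows defined by foreign, one statistic() per summary
    table foreign, statistic(mean price) statistic(sd price)
    ```

    On Stata 16 and earlier, only the c() form (without the version prefix) applies.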

    3.11 Histograms: show the shape of a distribution, symmetric or skewed (right-skewed or left-skewed). Syntax:

    histogram var, freq      bar heights show the number of observations.
    histogram var, percent   bar heights show the percentage of observations.
    histogram var, norm      overlays a normal density curve on the histogram.

    Example:

    . histogram pcexp, freq

    . histogram lnpcexp, freq norm

    . histogram pcexp, percent

    . histogram pcexp, norm

    . histogram lnpcexp, norm

    [Figure: histogram of lnpcexp. Vertical axis: Frequency (0 to 800); horizontal axis: lnpcexp (6 to 12).]
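    The variable lnpcexp is not created anywhere in these notes; presumably it is the natural log of pcexp, which would be generated with gen lnpcexp = ln(pcexp). The sketch below shows the same before-and-after comparison on the auto dataset shipped with Stata; lnprice is a hypothetical name.

    ```stata
    sysuse auto, clear

    gen lnprice = ln(price)           // log transform; ln() returns missing for values <= 0

    histogram price, freq normal      // raw variable: typically right-skewed
    histogram lnprice, freq normal    // logged variable: usually closer to symmetric
    ```

    Expenditure and price variables are often right-skewed, which is the usual reason for examining them on the log scale.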


    3.12 Box plots: graph box

    A box plot shows central tendency, dispersion, symmetry, and outliers. Syntax:

    graph box var1
    graph box varlist

    Example:

    . graph box edu

    . graph box edu workepx


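    iqr = Q3 - Q1
    Max = Q3 + 1.5*iqr
    Min = Q1 - 1.5*iqr

    Here Q1 and Q3 are the first and third quartiles; observations above Max or below Min are plotted as outliers.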
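    These limits can be computed directly from the results that summarize, detail stores. A minimal sketch on the auto dataset shipped with Stata; the scalar names q1, q3, iqr, upper, and lower are hypothetical.

    ```stata
    sysuse auto, clear

    summarize price, detail
    scalar q1  = r(p25)
    scalar q3  = r(p75)
    scalar iqr = q3 - q1

    scalar upper = q3 + 1.5*iqr       // "Max" in the formulas
    scalar lower = q1 - 1.5*iqr       // "Min" in the formulas
    display "iqr = " iqr ",  lower = " lower ",  upper = " upper

    count if (price > upper | price < lower) & !missing(price)   // candidate outliers
    graph box price
    ```

    The count should correspond to the points drawn individually beyond the whiskers in the box plot.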

    3.13 The merge command

    merge joins two data files into one. Suppose file1 (the master file) is to be merged with file2 (the using file):

    - Identify the variable (or variables) common to file1 and file2;

    - Sort both files by the common variable(s);

    - Syntax: merge [varlist] using file2
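    The syntax above is the older form of merge. From Stata 11 onward the documented form states the match type explicitly (1:1, m:1, 1:m) and does not require the files to be sorted beforehand. A self-contained sketch with two tiny invented files; every variable name and value in it is made up.

    ```stata
    clear
    input idhh hhsize08
    101 4
    102 3
    103 5
    end
    tempfile file2
    save `file2'

    clear
    input idhh hhsize06
    101 4
    102 2
    104 6
    end

    merge 1:1 idhh using `file2'
    tab _merge
    * _merge == 1: only in master (104); == 2: only in using (103); == 3: in both (101, 102)
    assert _merge == 3 if inlist(idhh, 101, 102)
    list, sepby(_merge)
    ```

    Stating 1:1 makes Stata stop with an error if the key is not unique in either file, which the older syntax does not guarantee.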


    Source: Nicholas Minot (2002).

    Example:

    ***** merge

    * VHLSS 2008: build a household ID, keep the hh* variables, add the prefix v08
    use "D:\VHLSS\VHLSS08\Data\Hhold\hhexpe08.dta", clear
    gen double idhh= (10^2)*hoso+(10^4)*diaban+(10^7)*xa+(10^9)*huyen+(10^11)*tinh
    format idhh %15.0f
    order idhh
    keep idhh hh*
    foreach var of varlist hh* {
        rename `var' v08`var'
    }
    sort idhh
    * replace lets the do-file be rerun without stopping on an existing file
    save "C:\Users\TRUONGTHANHVU\Desktop\OutputStata11\hhexpe08s.dta", replace

    * VHLSS 2006: same steps, prefix v06
    use "D:\VHLSS\VHLSS06\Data\hhold\hhexpe06.dta", clear
    gen double idhh= (10^2)*hoso+(10^4)*diaban+(10^7)*xa+(10^9)*huyen+(10^11)*tinh
    format idhh %15.0f
    order idhh
    keep idhh hh*
    foreach var of varlist hh* {
        rename `var' v06`var'
    }
    sort idhh
    save "C:\Users\TRUONGTHANHVU\Desktop\OutputStata11\hhexpe06s.dta", replace

    * MERGE: the 2006 file in memory is the master, the 2008 file is the using file
    merge idhh using "C:\Users\TRUONGTHANHVU\Desktop\OutputStata11\hhexpe08s.dta"
    tab _merge

         _merge |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |      5,062       35.52       35.52
              2 |      5,062       35.52       71.04
              3 |      4,127       28.96      100.00
    ------------+-----------------------------------
          Total |     14,251      100.00
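    The gen double idhh line in the example packs the five location codes into one number by giving each code its own block of digits. The sketch below works through the arithmetic for a single invented household (the codes are not taken from the survey) and shows why the variable is declared double: a float cannot hold a 12-digit integer exactly. The name idhh_f is hypothetical.

    ```stata
    clear
    * Invented codes for one household (not taken from the survey)
    input tinh huyen xa diaban hoso
    1 1 1 1 13
    end

    gen double idhh   = (10^2)*hoso+(10^4)*diaban+(10^7)*xa+(10^9)*huyen+(10^11)*tinh
    gen float  idhh_f = (10^2)*hoso+(10^4)*diaban+(10^7)*xa+(10^9)*huyen+(10^11)*tinh
    format idhh idhh_f %15.0f
    list idhh idhh_f

    * 1300 + 10000 + 10000000 + 1000000000 + 100000000000
    assert idhh == 101010011300
    * stored as float, the 12-digit ID is rounded and no longer matches
    assert idhh_f != idhh

    isid idhh                         // errors out if idhh does not uniquely identify rows
    ```

    Running isid idhh on the real household files before merging is a cheap way to confirm that the constructed ID is unique.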

    * MERGE: member-level file (master) with the household-level file (using)
    use "D:\VHLSS\VHLSS08\Data\Hhold\muc123a.dta", clear
    gen double idhh= (10^2)*hoso+(10^4)*diaban+(10^7)*xa+(10^9)*huyen+(10^11)*tinh
    format idhh %15.0f
    order idhh
    sort idhh
    * member ID: the household ID plus the member code matv
    gen double idmem= matv+(10^2)*hoso+(10^4)*diaban+(10^7)*xa+(10^9)*huyen+(10^11)*tinh
    format idmem %15.0f
    order idmem
    merge idhh using "C:\Users\TRUONGTHANHVU\Desktop\OutputStata11\hhexpe08s.dta"
    tab _merge
    count
    count if matv==1
    tab matv m1ac3
    * keep a single record per household
    duplicates drop idhh, force
    count
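    Attaching household-level data to a member-level file is a many-to-one merge. The sketch below shows it with the newer merge m:1 syntax on two tiny invented files; all names and values are made up, and keeping the member with matv == 1 is offered only as one alternative to duplicates drop, force, under the assumption that every household has such a member.

    ```stata
    clear
    input idhh hhsize
    101 2
    102 1
    end
    tempfile hh
    save `hh'

    clear
    input idhh matv age
    101 1 45
    101 2 43
    102 1 67
    end

    merge m:1 idhh using `hh'         // many members per household, one household row
    assert _merge == 3
    drop _merge

    * One row per household: keep the member with matv == 1
    keep if matv == 1
    isid idhh
    count
    ```

    Unlike duplicates drop, force, this makes it explicit which member's record represents the household.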


    Appendix

    Mathematical functions

    Function                              Description
    abs(x)                                Absolute value
    sin(x), cos(x), tan(x)                Sine, cosine, tangent
    int(x), round(x)                      Integer part (truncation); rounding
    comb(n, k)                            Number of combinations of n items taken k at a time
    exp(x)                                Exponential function
    ln(x)                                 Natural logarithm
    logit(x), invlogit(x)                 Log of the odds and its inverse
    max(x1, ..., xn), min(x1, ..., xn)    Largest and smallest of the arguments
    sqrt(x)                               Square root
    sum(x)                                Running (cumulative) sum
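    A few of these functions are easy to confuse, in particular int() versus round() and sum() versus a grand total. A small sketch on three invented values; the variable names are hypothetical.

    ```stata
    clear
    input x
    1.5
    -2.7
    4
    end

    gen ix   = int(x)                 // truncates toward zero: 1, -2, 4
    gen rx   = round(x)               // nearest integer: 2, -3, 4
    gen run  = sum(x)                 // running sum down the rows: 1.5, -1.2, 2.8
    egen tot = total(x)               // the same grand total on every row: 2.8
    list

    display comb(5, 2)                // 10
    display invlogit(logit(0.3))      // approximately .3
    ```

    Within generate, sum() accumulates row by row; a single overall total comes from egen with total().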

    Data types

    Type      Kind             Precision / range
    float     Real number      8 digits
    double    Real number      16 digits
    int       Integer          -32,767 to 32,740
    long      Integer          -2,147,483,647 to 2,147,483,620
    byte      Integer          -127 to 100
    str#      String (text)    str1 to str244

    Operators

    Type          Description
    Arithmetic    + (addition), - (subtraction), * (multiplication), / (division), ^ (power)
    Comparison    > (greater than), >= (greater than or equal to), < (less than),


    References:

    1. Long, J. Scott, and Jeremy Freese. 2003. Regression Models for Categorical Outcomes Using Stata, revised edition. College Station, TX: Stata Press.

    2. Nicholas Minot (2002).
