Faculty of Development Economics, University of Economics Ho Chi Minh City | Project: Quantitative Research with VHLSS
Topic 2. STATA & VHLSS
1. Introduction to Stata
Working with Stata:
1. Work (quickly) by typing commands;
2. Save results:
Choose where results are saved (the working directory): File > Change working directory, or
. cd "C:\...\Desktop\OutputStata11"
Use the command log using file_output, text to save the results:
. log using 7Aug2010, text
Here file_output is the name of the results file (7Aug2010), and the text option saves it as plain text. While working, log off pauses logging temporarily, log on resumes it, and log close closes the log file.
3. Create and run a do-file: type doedit to open a new do-file, give it a name (e.g., vhlss2008.do) and save it in the working directory. To run the do-file, use do dofile_name (e.g., do vhlss2008).
2. Basic tools in Stata
Thanh Vũ, Email: [email protected], Lecture notes
2.1 Descriptive statistics
Command     Description
describe    Describe a dataset
summarize   Descriptive statistics
tabulate    One- and two-way frequency tables
table       Tables of (descriptive) statistics
tabstat     Compact table of descriptive statistics
histogram   Histogram (frequency distribution)
graph       Draw graphs
2.2 Data management
Command                          Description
describe                         Describe data
save; edit                       Save data; edit data
generate/egen; replace; recode   Create new variables; replace values of a variable; recode values
keep; drop                       Keep or drop variables/observations
label; format                    Label variables; set display formats
append; merge                    Append observations; merge files
rename                           Rename variables
sort; gsort; order               Sort observations by value; reorder variables
2.3 Analysis of variance
Command             Description
ttest               t-test
oneway; anova       ANOVA
correlate; pwcorr   Correlation
2.4 Regression analysis
Command                   Description
regress                   Linear regression
logistic; logit; probit   Logistic, logit and probit regression
3. Stata practice
Data file used: hhexpe08.dta
3.1 The use command: loading data
. use "D:\VHLSS\VHLSS08\Data\Hhold\hhexpe08.dta", clear
. use http://fmwww.bc.edu/ec-p/data/wooldridge/CEOSAL1
(Introductory Econometrics: A Modern Approach by Jeffrey M. Wooldridge (1st &
2d eds.) , Chapter 2 - The Simple Regression Model; Example 2.3: CEO Salary and
Return on Equity)
. use http://fmwww.bc.edu/ec-p/data/wooldridge/WAGE1
3.2 The describe command (abbreviation: d) describes a dataset: the number of observations, the number of variables and the variable names.
. d
. d hh*
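The tabmiss command shows missing values (tabmiss is a user-written command that must be installed separately; in Stata 11 and later the built-in misstable summarize gives a similar overview):
. tabmiss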
. tabmiss hh*
3.3 if and in
. su hhexp1rl if reg==8
. su hhexp1rl if reg>7
. sort tinh huyen xa diaban hoso
. list tinh huyen xa diaban hoso riceexp in 1/10
. list tinh huyen xa diaban hoso riceexp in -10/-1
3.4 The generate and replace commands
Create a new variable with generate (abbreviation: gen):
generate newvar = exp [if exp]
exp = an expression defining the values of the new variable.
Examples:
. gen weexp= waterexp+elecexp
creates weexp, the household's total expenditure on water and electricity.
. gen pcexp= hhexp1rl/ hhsize
creates pcexp, the household's per-capita expenditure.
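generate evaluates the expression observation by observation, and in Stata any arithmetic involving a missing value (.) gives a missing result; division by zero also gives missing. The minimal Python sketch below mimics these two rules for weexp and pcexp on two hypothetical households. The helper names stata_add and stata_div, the use of None for a missing value, and all numbers are illustrative assumptions, not VHLSS data.

```python
# Minimal sketch of how generate fills new variables row by row.
# None stands in for Stata's missing value "."; all numbers are hypothetical.


def stata_add(a, b):
    """Addition that returns missing (None) if either operand is missing."""
    if a is None or b is None:
        return None
    return a + b


def stata_div(a, b):
    """Division that returns missing for a missing operand or a zero divisor."""
    if a is None or b is None or b == 0:
        return None
    return a / b


households = [
    {"waterexp": 840, "elecexp": 3720, "hhexp1rl": 30000, "hhsize": 4},
    {"waterexp": None, "elecexp": 1200, "hhexp1rl": 18000, "hhsize": 3},
]

for h in households:
    h["weexp"] = stata_add(h["waterexp"], h["elecexp"])  # gen weexp = waterexp + elecexp
    h["pcexp"] = stata_div(h["hhexp1rl"], h["hhsize"])   # gen pcexp = hhexp1rl / hhsize

print([(h["weexp"], h["pcexp"]) for h in households])
# [(4560, 7500.0), (None, 6000.0)]
```

In the second household weexp is missing because waterexp is missing, even though elecexp is known; egen's rowtotal() would instead treat the missing value as 0.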
Creating dummy variables in Stata:
. gen delta1 =1 if region==2 | region==8
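. replace delta1 =0 if delta1==.
creates a variable indicating whether the household lives in a delta region (delta1 = 1) or not (delta1 = 0).
Or, in one step (a logical expression equals 1 when true and 0 when false):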
. gen delta2 = (region==2 | region==8)
. gen poor = 0 if !missing(pcexp)
. replace poor = 1 if pcexp < 2930
creates a variable indicating whether the household is poor (poor = 1) or non-poor (poor = 0); households with missing pcexp remain missing.
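Both ways of building the delta dummy should give the same result, and the poverty dummy should equal 1 only below the cut-off. The Python sketch below checks this on four hypothetical households; None stands in for a missing value, while the region codes 2 and 8 and the cut-off 2930 come from the commands above. It also reflects a Stata rule worth remembering: a missing value counts as larger than any number, so a condition such as pcexp < 2930 is false when pcexp is missing.

```python
# Sketch of the dummy-variable logic; the four households are hypothetical.
POVERTY_LINE = 2930        # cut-off from the poverty example
DELTA_REGIONS = {2, 8}     # region codes treated as delta regions above

rows = [
    {"region": 2, "pcexp": 2500},
    {"region": 5, "pcexp": 4100},
    {"region": 8, "pcexp": 2930},
    {"region": 7, "pcexp": None},  # missing per-capita expenditure
]

for r in rows:
    # gen delta1 = 1 if region==2 | region==8 ; replace delta1 = 0 if delta1==.
    delta1 = 1 if r["region"] in DELTA_REGIONS else None
    if delta1 is None:
        delta1 = 0
    # gen delta2 = (region==2 | region==8): a true condition is 1, false is 0
    delta2 = int(r["region"] == 2 or r["region"] == 8)
    # poor = 1 strictly below the line, 0 otherwise, missing if pcexp is missing
    poor = None if r["pcexp"] is None else int(r["pcexp"] < POVERTY_LINE)
    r.update(delta1=delta1, delta2=delta2, poor=poor)

print([(r["delta1"], r["delta2"], r["poor"]) for r in rows])
# [(1, 1, 1), (0, 0, 0), (1, 1, 0), (0, 0, None)]
```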
Create new variables with egen: egen creates new variables from statistical functions. Syntax:
egen newvar = func_stat(varname)
Commonly used functions func_stat():
Function                 Description
mean()                   Mean
median()                 Median
count()                  Number of (non-missing) observations
sd()                     Standard deviation
rowmean(var1 ... varn)   Mean of the n variables var1 to varn, computed for each observation
total()                  Sum (total)
Examples:
. egen avghhexp1=mean(hhexp1rl)
mean household expenditure (the same value for every household).
. egen avghhexp2=mean(hhexp1rl), by(reg8)
mean household expenditure in each of the 8 regions.
. egen totalexp1=total(hhexp1rl)
. egen totalexp2=total(hhexp1rl), by(urban08)
. egen exp123=rowmean(waterexp elecexp garbexp)
. order waterexp elecexp garbexp exp123
. display (840+3720+108)/3
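egen computes a statistic over observations (optionally within groups) and copies it back to every observation; mean(), total() and rowmean() skip missing values. The Python sketch below reproduces avghhexp1, avghhexp2, totalexp2 and exp123 for three hypothetical households; group_stat is an illustrative helper, not a Stata function. The first household's water, electricity and garbage values are set to 840, 3720 and 108 so that its exp123 matches the display check above.

```python
from statistics import mean

# Hypothetical households; None stands in for a missing value.
data = [
    {"reg8": 1, "urban08": 1, "hhexp1rl": 30000, "waterexp": 840, "elecexp": 3720, "garbexp": 108},
    {"reg8": 1, "urban08": 2, "hhexp1rl": 20000, "waterexp": None, "elecexp": 1200, "garbexp": 0},
    {"reg8": 8, "urban08": 2, "hhexp1rl": 16000, "waterexp": 300, "elecexp": 900, "garbexp": None},
]


def group_stat(rows, value, by, func):
    """Apply func to the non-missing values of `value` within each group of `by`."""
    groups = {}
    for r in rows:
        if r[value] is not None:
            groups.setdefault(r[by], []).append(r[value])
    return {g: func(vals) for g, vals in groups.items()}


avg_all = mean(r["hhexp1rl"] for r in data if r["hhexp1rl"] is not None)  # mean(hhexp1rl)
avg_by_reg = group_stat(data, "hhexp1rl", "reg8", mean)      # mean(hhexp1rl), by(reg8)
tot_by_urban = group_stat(data, "hhexp1rl", "urban08", sum)  # total(hhexp1rl), by(urban08)

for r in data:
    r["avghhexp1"] = avg_all                       # same value on every row
    r["avghhexp2"] = avg_by_reg[r["reg8"]]         # value of the row's region
    r["totalexp2"] = tot_by_urban[r["urban08"]]    # value of the row's urban/rural group
    # rowmean(waterexp elecexp garbexp): mean of the non-missing values in the row
    present = [r[v] for v in ("waterexp", "elecexp", "garbexp") if r[v] is not None]
    r["exp123"] = mean(present) if present else None

print(data[0]["exp123"], (840 + 3720 + 108) / 3)
```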
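3.5 The sort and gsort commands
sort var and gsort +var arrange observations in ascending order of var; gsort -var arranges them in descending order. format %9.0f displays hhexp1rl with no decimal places.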
. sort hhexp1rl
. gsort +hhexp1rl
. format %9.0f hhexp1rl
. gsort -hhexp1rl
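In Stata a missing value is treated as larger than any number, so sort and gsort +var put missing values last; according to the gsort documentation, gsort -var also places missing values last by default (its mfirst option puts them first instead). The Python sketch below imitates both orderings on a hypothetical list of expenditures, with None standing in for a missing value; stata_key is an illustrative helper, not part of Stata.

```python
import math

# Hypothetical expenditures; None stands in for Stata's missing value ".".
hhexp1rl = [12500.0, None, 8300.0, 20100.0, 8300.0]


def stata_key(value):
    """Sort key that treats missing as larger than any number, as Stata does."""
    return math.inf if value is None else value


# sort hhexp1rl  /  gsort +hhexp1rl: ascending, missing values last
ascending = sorted(hhexp1rl, key=stata_key)

# gsort -hhexp1rl: descending numbers, missing values still last by default
numbers = sorted((v for v in hhexp1rl if v is not None), reverse=True)
descending = numbers + [v for v in hhexp1rl if v is None]

print(ascending)   # [8300.0, 8300.0, 12500.0, 20100.0, None]
print(descending)  # [20100.0, 12500.0, 8300.0, 8300.0, None]
```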
3.6 The drop and keep commands
Drop or keep variables. Syntax:
drop varlist
keep varlist
Drop or keep observations. Syntax:
drop if exp
keep if exp
3.7 The summarize command (abbreviation: su): basic descriptive statistics.
summarize reports descriptive statistics either for all variables in the dataset or for one or more selected variables. Syntax:
su
su var1 var2 ... varn
su, detail
The detail option gives detailed statistics: percentiles, median, mean, standard deviation (sd), variance, skewness and kurtosis.
Examples:
. su hhexp1rl
. su hh*
. su waterexp if urban08==1
. su waterexp elecexp if urban08==2
Note: a condition such as if urban08 == 1 uses two equals signs (==); a single = is used for assignment, as in generate.
. su hhexp1rl hhsize if reg8 == 8
. su hhexp1rl hhsize if reg8 == 8 & urban08 == 1
. su hhexp hhsize if region > 7 & urban08 == 1
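The moment-based statistics reported by su, detail can be reproduced by hand. The Python sketch below assumes the formulas given in Stata's documentation for summarize: the variance and sd use the divisor n - 1, while skewness and kurtosis are built from central moments with divisor n, so the reported kurtosis is not "excess" kurtosis (a normal distribution gives about 3). The input values are hypothetical.

```python
import math


def summarize_detail(values):
    """Moment statistics of the non-missing values (None = missing).

    Assumed definitions: variance = sum((x - mean)^2) / (n - 1);
    skewness = m3 / m2**1.5 and kurtosis = m4 / m2**2, where
    m_k = sum((x - mean)^k) / n.
    """
    x = [v for v in values if v is not None]
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    m2, m3, m4 = (sum(d ** k for d in dev) / n for k in (2, 3, 4))
    variance = sum(d ** 2 for d in dev) / (n - 1)
    return {
        "N": n,
        "mean": mean,
        "variance": variance,
        "sd": math.sqrt(variance),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


stats = summarize_detail([1, 2, 3, 4, 10, None])
print(stats)
```

With one large value (10) among small ones, the skewness comes out positive, the signature of a right-skewed variable such as household expenditure.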
3.8 One- and two-way frequency tables
The tab command creates one-way or two-way frequency tables. Syntax:
One-way table: tab var
Two-way table: tab var1 var2
. tab reg8
. tab urban08
. tab urban08 reg8
The tab3way command (user-written, installed separately from official Stata) creates three-way frequency tables.
Example:
. tab3way reg8 urban08 poor
3.9 Combining tab and summarize
Combining tab and su produces descriptive statistics for a quantitative variable (cont_var) for each category of a categorical variable¹ (cat_var). Syntax:
. tab cat_var, su(cont_var)
. tab cat_var1 cat_var2, su(cont_var)
Examples:
. tab urban08, su(hhexp1rl)
. tab reg8, su(hhexp1rl)
. tab reg8 urban08, su(hhexp1rl)
To show only the mean of the quantitative variable for each combination of categories of the two categorical variables, use the syntax: tab cat_var1 cat_var2, su(cont_var) mean
Example:
. tab reg8 urban08, su(hhexp1rl) mean
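Some options of the tab command:
Syntax: tab var1 var2, [options]
Option   Description
cell     Percentage of the total in each cell
row      Percentages within each row
col      Percentages within each column
chi2     Pearson chi-square test
all      Report all available tests and measures of association (chi2, lrchi2, V, gamma, taub)
Examples: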
. tab poor urban08
. tab poor urban08, cell
. tab poor urban08, row col
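The row, col and cell percentages and the Pearson chi-square statistic can be computed directly from the cell counts. The Python sketch below does this for a hypothetical 2 x 2 table of poor by urban08 (100 hypothetical households); the function name crosstab is illustrative. Expected counts are row total x column total / n, and chi2 sums (observed - expected)^2 / expected over all cells.

```python
from collections import Counter

# Hypothetical (poor, urban08) pairs for 100 households.
pairs = [(1, 2)] * 30 + [(0, 2)] * 50 + [(1, 1)] * 5 + [(0, 1)] * 15


def crosstab(pairs):
    """Cell frequencies with row, column and cell percentages, plus Pearson chi2."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    n = len(pairs)
    row_tot = {r: sum(counts[(r, c)] for c in cols) for r in rows}
    col_tot = {c: sum(counts[(r, c)] for r in rows) for c in cols}

    table, chi2 = {}, 0.0
    for r in rows:
        for c in cols:
            obs = counts[(r, c)]
            expected = row_tot[r] * col_tot[c] / n
            chi2 += (obs - expected) ** 2 / expected
            table[(r, c)] = {
                "freq": obs,
                "row_pct": 100 * obs / row_tot[r],   # option row
                "col_pct": 100 * obs / col_tot[c],   # option col
                "cell_pct": 100 * obs / n,           # option cell
            }
    dof = (len(rows) - 1) * (len(cols) - 1)
    return table, chi2, dof


table, chi2, dof = crosstab(pairs)
print(table[(1, 2)], round(chi2, 4), dof)
```

The p-value Stata prints next to chi2 comes from the chi-square distribution with dof degrees of freedom; the sketch stops at the statistic itself.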
¹ Qualitative (categorical) variable.
3.10 Other descriptive statistics commands
tabstat with stats(). Syntax: tabstat var, stats(options_stats) computes descriptive statistics for var.
Some statistics available in stats():
Option_stat   Description
mean          Mean
median        Median
iqr           Interquartile range (p75 - p25)
sd            Standard deviation
n             Number of observations
max           Maximum
min           Minimum
range         Range = max - min
var           Variance
cv            Coefficient of variation, cv = sd/mean
skewness      Skewness
kurtosis      Kurtosis
Examples:
. tabstat hhexp1rl, stats(mean median iqr sd)
. tabstat hhexp1rl, stats(mean median iqr sd n max min range var)
. tabstat hhexp1rl, stats(mean median iqr sd n max min range var cv skewness kurtosis)
The command options can be abbreviated, and a display format can be set:
. tabstat hhexp1rl, s(mean median iqr sd) format(%7.2f)
tabstat with by():
. tabstat hhexp1rl, stats(mean median iqr sd n max min range var) by(urban)
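iqr, range and cv are simple functions of the sorted data. The Python sketch below computes them for hypothetical values. The percentile rule in stata_pctile is an assumption based on Stata's documented default for summarize, detail (with P = n*p/100, average the P-th and (P+1)-th sorted values when P is a whole number, otherwise take the next value up), and we assume tabstat uses the same rule. Other software, including Python's statistics.quantiles, uses different rules, so small differences are expected.

```python
import math
import statistics


def stata_pctile(sorted_x, p):
    """p-th percentile (0 < p < 100) of already sorted data.

    Assumed rule: P = n*p/100; if P is a whole number, average the P-th and
    (P+1)-th values (1-based), otherwise take the (floor(P)+1)-th value.
    """
    n = len(sorted_x)
    P = n * p / 100
    if P.is_integer():
        i = int(P)
        return (sorted_x[i - 1] + sorted_x[i]) / 2
    return sorted_x[math.floor(P)]  # 0-based index of the (floor(P)+1)-th value


def tabstat(values):
    """Some tabstat statistics for the non-missing values (None = missing)."""
    x = sorted(v for v in values if v is not None)
    q1, med, q3 = (stata_pctile(x, p) for p in (25, 50, 75))
    mean = statistics.mean(x)
    sd = statistics.stdev(x)  # divisor n - 1
    return {
        "n": len(x), "mean": mean, "median": med, "iqr": q3 - q1,
        "sd": sd, "min": x[0], "max": x[-1], "range": x[-1] - x[0],
        "cv": sd / mean,
    }


print(tabstat([1, 2, 3, 4, 10, None]))
```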
The table command with c() creates tables of statistics. Syntax:
. table cat_var, c(option_stat cont_var)
. table cat_var1 cat_var2, c(option_stat cont_var)
. table cat_var1 cat_var2, c(option_stat1 cont_var1 option_stat2 cont_var2)
Examples:
. table poor, c(mean hhsize)
. table poor, c(mean hhsize sd hhsize)
. table poor, c(mean hhsize sd hhexp1rl)
. table poor urban08, c(mean hhsize)
. table poor urban08, c(mean hhsize sd hhexp1rl)
table with by():
. table poor, c(mean hhsize sd foodnom) by(urban)
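table poor urban08, c(mean hhsize sd hhsize) reports one mean and one standard deviation for each combination of the two categorical variables. The Python sketch below groups hypothetical (poor, urban08, hhsize) records the same way; a cell with a single observation has no standard deviation, which the sketch represents as None.

```python
import statistics
from collections import defaultdict

# Hypothetical households: (poor, urban08, hhsize)
households = [
    (1, 2, 6), (1, 2, 5), (1, 1, 4),
    (0, 2, 4), (0, 2, 3), (0, 1, 3), (0, 1, 4),
]

# Collect hhsize values for each (poor, urban08) cell
cells = defaultdict(list)
for poor, urban08, hhsize in households:
    cells[(poor, urban08)].append(hhsize)

# Like: table poor urban08, c(mean hhsize sd hhsize)
summary = {}
for key in sorted(cells):
    vals = cells[key]
    sd = statistics.stdev(vals) if len(vals) > 1 else None  # sd needs 2+ observations
    summary[key] = (statistics.mean(vals), sd)
    print(key, summary[key])
```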
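3.11 Histograms: show the shape of a distribution, i.e. whether it is symmetric or skewed (to the right or to the left). Syntax:
histogram var, freq      bar heights show the number of observations.
histogram var, percent   bar heights show the percentage of observations.
histogram var, norm      overlays a normal density curve on the histogram.
Without freq or percent, bar heights are densities.
Examples: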
. histogram pcexp, freq
. histogram lnpcexp, freq norm
. histogram pcexp, percent
. histogram pcexp, norm
. histogram lnpcexp, norm
[Figure: frequency histogram of lnpcexp (x-axis: lnpcexp; y-axis: Frequency)]
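The freq, percent and default (density) scales differ only in how each bin count is expressed: freq shows the count, percent shows 100 x count / N, and density divides the count by N x bin width so that the bar areas add up to 1. The Python sketch below computes all three for equal-width bins on hypothetical per-capita expenditures and on their natural logs; the choice of 4 bins and the helper name histogram_heights are illustrative, not Stata's default binning. On these hypothetical values the log transform spreads the observations more evenly across the bins, which illustrates why a log scale is often used for right-skewed expenditure data.

```python
import math


def histogram_heights(values, nbins):
    """Bar heights for nbins equal-width bins between the min and max value.

    Returns (freq, percent, density, width); the maximum falls in the last bin.
    """
    x = [v for v in values if v is not None]
    n = len(x)
    lo, hi = min(x), max(x)
    if hi == lo:
        raise ValueError("all values are equal; bins would have zero width")
    width = (hi - lo) / nbins
    freq = [0] * nbins
    for v in x:
        k = min(int((v - lo) / width), nbins - 1)  # bin index of v
        freq[k] += 1
    percent = [100 * c / n for c in freq]           # histogram var, percent
    density = [c / (n * width) for c in freq]       # default scale: areas sum to 1
    return freq, percent, density, width


pcexp = [2000, 2500, 3000, 3500, 4000, 5000, 8000, 16000]  # hypothetical values
print(histogram_heights(pcexp, 4)[:2])
print(histogram_heights([math.log(v) for v in pcexp], 4)[0])
```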
3.12 Box plots (graph box)
Box plots show central tendency, dispersion, symmetry and outliers. Syntax:
graph box var1
graph box varlist
Examples:
. graph box edu
. graph box edu workepx
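iqr = Q3 - Q1
Upper limit = Q3 + 1.5*iqr; lower limit = Q1 - 1.5*iqr. The whiskers end at the most extreme observations inside these limits; observations outside them are plotted individually as outliers.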
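These limits can be computed directly. The Python sketch below finds the quartiles, the two limits, the whisker ends (the most extreme observations inside the limits) and the outliers for hypothetical years of schooling. It uses the percentile rule assumed earlier for summarize, detail; whether graph box uses exactly this rule is itself an assumption, so quartiles may differ slightly for some data.

```python
import math


def stata_pctile(sorted_x, p):
    """p-th percentile (0 < p < 100), assuming Stata's default averaging rule."""
    P = len(sorted_x) * p / 100
    if P.is_integer():
        i = int(P)
        return (sorted_x[i - 1] + sorted_x[i]) / 2
    return sorted_x[math.floor(P)]


def box_plot_parts(values):
    """Quartiles, 1.5*iqr limits, whisker ends and outliers (None = missing)."""
    x = sorted(v for v in values if v is not None)
    q1, median, q3 = (stata_pctile(x, p) for p in (25, 50, 75))
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "lower_limit": lower_limit, "upper_limit": upper_limit,
        # whiskers stop at the most extreme observations inside the limits
        "lower_whisker": min(v for v in x if v >= lower_limit),
        "upper_whisker": max(v for v in x if v <= upper_limit),
        "outliers": [v for v in x if v < lower_limit or v > upper_limit],
    }


edu = [4, 6, 7, 8, 9, 9, 10, 12, 20]  # hypothetical years of schooling
print(box_plot_parts(edu))
```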
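3.13 The merge command
merge combines two data files into one. Suppose file1 (the master file) must be merged with file2 (the using file):
- Identify the common (key) variable or variables of file1 and file2;
- Sort both files by the key variable(s);
- Syntax: merge [varlist] using file2
In Stata 11 and later the preferred syntax states the match type, e.g. merge 1:1 varlist using file2 (or m:1, 1:m), and the files do not need to be sorted beforehand.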
Source: Nicholas Minot (2002).
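The logic of a one-to-one merge and the _merge codes (1 = only in master, 2 = only in using, 3 = in both) can be sketched in a few lines. The Python sketch below matches hypothetical 2006 and 2008 household records on idhh; merge_1to1 is an illustrative name. Like Stata's default behaviour, it keeps the master's value when a variable exists in both files, and like merge 1:1 it refuses keys that do not identify observations uniquely.

```python
def merge_1to1(master, using, key):
    """One-to-one merge on `key`, adding Stata-style _merge codes.

    _merge = 1: key only in master; 2: key only in using; 3: key in both.
    For variables present in both files, the master value is kept.
    """
    m = {row[key]: row for row in master}
    u = {row[key]: row for row in using}
    if len(m) != len(master) or len(u) != len(using):
        raise ValueError(f"{key} does not uniquely identify observations")
    merged = []
    for k in sorted(m.keys() | u.keys()):
        row = dict(u.get(k, {}))
        row.update(m.get(k, {}))  # master values win on common variables
        row["_merge"] = 3 if (k in m and k in u) else (1 if k in m else 2)
        merged.append(row)
    return merged


# Hypothetical household files from two survey rounds
hh06 = [{"idhh": 101, "v06hhsize": 4}, {"idhh": 102, "v06hhsize": 5}, {"idhh": 103, "v06hhsize": 3}]
hh08 = [{"idhh": 102, "v08hhsize": 5}, {"idhh": 103, "v08hhsize": 4}, {"idhh": 104, "v08hhsize": 2}]

merged = merge_1to1(hh06, hh08, "idhh")
print([(r["idhh"], r["_merge"]) for r in merged])
# [(101, 1), (102, 3), (103, 3), (104, 2)]
```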
Example:
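***** merge
* VHLSS 2008: build the household ID, keep the hh* variables, prefix them with v08
use "D:\VHLSS\VHLSS08\Data\Hhold\hhexpe08.dta", clear
* double is required: the ID has more digits than a float stores exactly
gen double idhh= (10^2)*hoso+(10^4)*diaban+(10^7)*xa+(10^9)*huyen+(10^11)*tinh
format idhh %15.0f
order idhh
keep idhh hh*
foreach var of varlist hh* {
    rename `var' v08`var'
}
sort idhh
* replace lets the do-file be rerun when the file already exists
save "C:\Users\TRUONGTHANHVU\Desktop\OutputStata11\hhexpe08s.dta", replace
* VHLSS 2006: same steps, prefix v06
use "D:\VHLSS\VHLSS06\Data\hhold\hhexpe06.dta", clear
gen double idhh= (10^2)*hoso+(10^4)*diaban+(10^7)*xa+(10^9)*huyen+(10^11)*tinh
format idhh %15.0f
order idhh
keep idhh hh*
foreach var of varlist hh* {
    rename `var' v06`var'
}
sort idhh
save "C:\Users\TRUONGTHANHVU\Desktop\OutputStata11\hhexpe06s.dta", replace
* MERGE: 2006 data in memory (master) with the 2008 file (using)
* Stata 11+ syntax: merge 1:1 idhh using "C:\Users\TRUONGTHANHVU\Desktop\OutputStata11\hhexpe08s.dta"
merge idhh using "C:\Users\TRUONGTHANHVU\Desktop\OutputStata11\hhexpe08s.dta"
tab _merge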
     _merge |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      5,062       35.52       35.52
          2 |      5,062       35.52       71.04
          3 |      4,127       28.96      100.00
------------+-----------------------------------
      Total |     14,251      100.00
_merge = 1: households only in the master data (2006); 2: only in the using file (2008); 3: found in both.
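The idhh formula packs the location codes into separate digit positions: hoso starts at 10^2, diaban at 10^4, xa at 10^7, huyen at 10^9 and tinh at 10^11, leaving the two lowest digits free for matv in idmem. The ID is unique only if every code fits in its slot (hoso, xa and huyen below 100, diaban below 1000); whether the VHLSS codes always satisfy this is an assumption worth checking in Stata, e.g. with duplicates report idhh. The Python sketch below builds and decomposes such an ID; make_idhh and split_idhh are illustrative names, and the codes are hypothetical.

```python
# Digit weights taken from the idhh formula in the Stata example
WEIGHTS = {"tinh": 10**11, "huyen": 10**9, "xa": 10**7, "diaban": 10**4, "hoso": 10**2}
# Upper bounds each code must respect so that it does not spill into the next slot
LIMITS = {"huyen": 100, "xa": 100, "diaban": 1000, "hoso": 100}


def make_idhh(tinh, huyen, xa, diaban, hoso):
    """Build idhh the same way as the gen double idhh = ... line."""
    codes = {"tinh": tinh, "huyen": huyen, "xa": xa, "diaban": diaban, "hoso": hoso}
    for name, limit in LIMITS.items():
        if not 0 <= codes[name] < limit:
            raise ValueError(f"{name}={codes[name]} does not fit in its digit slot")
    return sum(WEIGHTS[name] * value for name, value in codes.items())


def split_idhh(idmem):
    """Recover (tinh, huyen, xa, diaban, hoso, matv) from idhh or idmem."""
    tinh, rest = divmod(idmem, 10**11)
    huyen, rest = divmod(rest, 10**9)
    xa, rest = divmod(rest, 10**7)
    diaban, rest = divmod(rest, 10**4)
    hoso, matv = divmod(rest, 10**2)
    return tinh, huyen, xa, diaban, hoso, matv


idhh = make_idhh(tinh=1, huyen=2, xa=3, diaban=45, hoso=13)  # hypothetical codes
idmem = idhh + 2                                             # member number matv = 2
print(idhh, split_idhh(idmem))
# 102030451300 (1, 2, 3, 45, 13, 2)
```

With a two-digit tinh the ID stays below 10^13, far below 2^53, so a double stores it exactly.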
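* MERGE: member file (one row per member) with the household file (one row per household)
use "D:\VHLSS\VHLSS08\Data\Hhold\muc123a.dta", clear
gen double idhh= (10^2)*hoso+(10^4)*diaban+(10^7)*xa+(10^9)*huyen+(10^11)*tinh
format idhh %15.0f
order idhh
sort idhh
* member ID: matv fills the two lowest digits left free in idhh (assuming matv < 100)
gen double idmem= matv+(10^2)*hoso+(10^4)*diaban+(10^7)*xa+(10^9)*huyen+(10^11)*tinh
format idmem %15.0f
order idmem
* many-to-one merge; Stata 11+ syntax: merge m:1 idhh using "C:\Users\TRUONGTHANHVU\Desktop\OutputStata11\hhexpe08s.dta"
merge idhh using "C:\Users\TRUONGTHANHVU\Desktop\OutputStata11\hhexpe08s.dta"
tab _merge
count
count if matv==1
tab matv m1ac3
* keep only the first observation of each idhh, in the current sort order
duplicates drop idhh, force
count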
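Merging the member file with the household file is a many-to-one match: every member row receives its household's variables, so the result still has one row per member. duplicates drop idhh, force then keeps the first row of each household in the current sort order, so which member survives depends on that order. The Python sketch below shows both steps on hypothetical data; sorting by idhh and matv first (as sort idhh matv would do in Stata) makes the kept row the member with the smallest matv, so the final count matches count if matv==1, assuming every household has a member with matv == 1.

```python
# Hypothetical member rows (idhh, matv) and household-level variables by idhh
members = [(101, 1), (101, 2), (101, 3), (102, 1), (102, 2), (103, 1)]
households = {101: {"v08hhsize": 3}, 102: {"v08hhsize": 2}}

# Many-to-one merge: each member row gets its household's variables
merged = []
for idhh, matv in members:
    row = {"idhh": idhh, "matv": matv}
    row.update(households.get(idhh, {}))
    row["_merge"] = 3 if idhh in households else 1  # 1 = member file only
    merged.append(row)

# Sort by idhh and matv, then keep the first row per idhh,
# as duplicates drop idhh, force keeps the first in the current order
merged.sort(key=lambda r: (r["idhh"], r["matv"]))
seen = set()
one_per_household = []
for r in merged:
    if r["idhh"] not in seen:
        seen.add(r["idhh"])
        one_per_household.append(r)

print(len(merged), len(one_per_household))
```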
Appendix
Mathematical functions
Function                             Description
abs(x)                               Absolute value
sin(x), cos(x), tan(x)               Sine, cosine, tangent
int(x), round(x)                     Integer part (truncated toward 0); rounding to the nearest integer
comb(n, k)                           Combinatorial function (number of ways to choose k of n)
exp(x)                               Exponential function
ln(x)                                Natural logarithm
logit(x), invlogit(x)                Log of the odds, ln(x/(1-x)), and its inverse
max(x1, x2, ...), min(x1, x2, ...)   Largest and smallest of the arguments
sqrt(x)                              Square root
sum(x)                               Running (cumulative) sum of x
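A few of these functions behave differently from what their names might suggest: sum(x) used in generate gives a running sum, int(x) truncates toward zero, and max()/min() compare their arguments within one observation rather than over all observations. The Python sketch below mirrors those behaviours with standard-library equivalents (itertools.accumulate, math.trunc, round, math.comb); the correspondence is our own illustration. Halfway cases such as 2.5 are avoided because Python's round() sends them to the nearest even integer.

```python
import math
from itertools import accumulate

x = [3, 5, 2, 10]  # hypothetical values of one variable, in sort order

running_sum = list(accumulate(x))                  # like gen s = sum(x): 3, 8, 10, 20
truncated = [math.trunc(v) for v in (2.7, -2.7)]   # like int(): 2, -2
rounded = [round(v) for v in (2.7, -2.7)]          # like round(): 3, -3
pairs = math.comb(5, 2)                            # like comb(5, 2): 10
row_max = max(3, 7, 5)                             # like max(3, 7, 5) within one row


def logit(p):
    """Log of the odds, ln(p / (1 - p)), for 0 < p < 1."""
    return math.log(p / (1 - p))


def invlogit(z):
    """Inverse of logit: exp(z) / (1 + exp(z))."""
    return math.exp(z) / (1 + math.exp(z))


print(running_sum, truncated, rounded, pairs, row_max, invlogit(logit(0.25)))
```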
Data types
Type     Description
float    Real number, about 7 significant digits (4 bytes)
double   Real number, about 16 significant digits (8 bytes)
int      Integer, -32,767 to 32,740
long     Integer, -2,147,483,647 to 2,147,483,620
byte     Integer, -127 to 100
str#     String (text), str1 to str244
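The difference between float and double matters for long IDs such as idhh: a 4-byte float keeps only about 7 significant digits, so nearby 12-digit IDs can collapse to the same stored value, while an 8-byte double stores every integer up to 2^53 exactly. Because generate creates float variables unless another type is given, an ID like idhh needs gen double. The Python sketch below round-trips two hypothetical IDs through 4-byte and 8-byte IEEE storage with the struct module, assuming Stata's float and double follow these IEEE formats.

```python
import struct


def as_float(value):
    """Round-trip a number through 4-byte IEEE storage (like Stata's float)."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


def as_double(value):
    """Round-trip a number through 8-byte IEEE storage (like Stata's double)."""
    return struct.unpack("d", struct.pack("d", float(value)))[0]


# Two hypothetical household IDs that differ by 100
id_a = 102030451300
id_b = 102030451400

print(as_float(id_a), as_float(id_b))    # both IDs map to the same float value
print(as_double(id_a), as_double(id_b))  # doubles keep them apart and exact
```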
Operators
Type         Description
Arithmetic   + (addition), - (subtraction), * (multiplication), / (division), ^ (power)
Comparison   > (greater than), >= (greater than or equal to), < (less than),
References:
1. Long, J. Scott, and Jeremy Freese. 2003. Regression Models for Categorical Dependent Variables Using Stata, revised edition. College Station, TX: Stata Press.
2. Minot, Nicholas. 2002.