Apr-15 H.S. Stata Introduction, Short v2. Hein Stigum. Presentation, data and programs at: ...


04/18/23 H.S.

Stata Introduction, Short v2

Hein Stigum

Presentation, data and programs at:

http://folk.uio.no/heins/

courses

Stata introduction

• General use
– Interface and menu
– Do-files and syntax
– Data handling

• Analysis
– Descriptive
– Graphs
– Bivariate


Why Stata

• Pro
– Aimed at epidemiology
– Many methods, growing
– Graphics
– Structured, programmable
– Coming soon to a course near you

• Con
– The whole data file must fit in memory (memory > file size)

Interface


Interface Stata 9

Interface Stata 12


Do-file editor

Data editor


Menu


Do-file example

New do-file: toolbar icon or Ctrl-9

Run: mark the lines to run, then Ctrl-D


Syntax

• Syntax
[bysort varlist:] command [varlist] [if exp] [in range] [, opts]

• Examples
– mean age
– mean age if sex==1
– bysort sex: summarize age
– summarize age, detail
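The syntax pattern above can be tried on an example dataset that ships with Stata. This is a minimal sketch, assuming the file auto.dta (loaded with sysuse) and its variables mpg and foreign; these are stand-ins and not part of the course data.

```stata
* Load an example dataset shipped with Stata (assumption: not the course data)
sysuse auto, clear

* command [varlist]
summarize mpg

* command [varlist] [if exp]
summarize mpg if foreign == 1

* [bysort varlist:] command [varlist]
bysort foreign: summarize mpg

* command [varlist] [in range] [, opts]
summarize mpg in 1/10, detail
```

Each line adds one optional element of the general syntax, so the pieces can be combined freely in the same order.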

Data handling


Import data

• Using SPSS 14.0-17.0
– Save as, Stata Version 8 SE


Use and save data

• Open data
– use "C:\Course\Myfile.dta", clear

• Describe
– describe        describe all variables
– list x1 x2 in 1/20        list observations 1 to 20

• Save data
– save "C:\Course\Myfile.dta", replace
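The open, describe and save steps can be rehearsed without touching any real file. The sketch below assumes the shipped example dataset auto.dta and a temporary file (the macro name mycopy is hypothetical); run the block as a whole, since the temporary file name lives in a local macro.

```stata
* Open an example dataset shipped with Stata (assumption: not the course data)
sysuse auto, clear

* Describe all variables, then list two of them for observations 1 to 5
describe
list make price in 1/5

* Save to a temporary file; replace overwrites the file if it already exists
tempfile mycopy
save "`mycopy'", replace

* Open the saved copy again; clear drops the data currently in memory
use "`mycopy'", clear
```

With real data, the temporary file is replaced by a path such as the one shown on the slide.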


Use data from web

• webuse "file"        use data from Stata homepage

1. webuse set "http://www.med.uio.no/forskning/doktorgrad-karriere/forskerutdanning/kurs/biostatistikk/mf9510-logistisk-regresjon-overlevelsesanalyse-cox/"        set homepage

2. webuse "birth1"        data for exercise 1


Generate, replace

• Index
– generate index=0
– replace index=1 if sex==1 & age<30

• Young/Old
– generate old=(age>50) if age<.

• Serial numbers, lags
– generate id=_n
– generate age1=age[_n-1]
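A small made-up dataset shows what these commands produce. This is only a sketch: the five observations typed in below are invented for illustration, including one missing age.

```stata
* Invented data for illustration: 5 persons, the last one with missing age
clear
input sex age
1 25
1 45
2 28
2 60
1 .
end

* Index: 1 for persons with sex==1 under 30, otherwise 0
generate index = 0
replace index = 1 if sex == 1 & age < 30

* Young/Old: 1 if older than 50, 0 otherwise, missing when age is missing
generate old = (age > 50) if age < .

* Serial number, and the age of the previous observation (lag)
generate id = _n
generate age1 = age[_n-1]

list
```

The lag is missing for the first observation because there is no observation before it.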


Dates

• From numeric to date
ex: m=12, d=2, y=1987

generate birth=mdy(m,d,y)

format birth %td

• From string to date
ex: bstr="01.12.1987"

generate birth=date(bstr,"DMY")

format birth %td
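Both conversions can be checked side by side on one invented observation. The sketch below uses the example values from the slide; the variable names birth1 and birth2 are hypothetical and only keep the two results apart.

```stata
* One invented observation holding the slide's example values
clear
input m d y str10 bstr
12 2 1987 "01.12.1987"
end

* From numeric month, day, year to a Stata date
generate birth1 = mdy(m, d, y)

* From a day.month.year string to a Stata date
generate birth2 = date(bstr, "DMY")

* Show both as dates instead of as day counts
format birth1 birth2 %td
list birth1 birth2
```

Without the format command the variables display as the number of days since 1 January 1960, which is how Stata stores dates.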


Missing

• Note!
– Represented as "."
– Missing values are treated as larger than any number
– age>30 will include missing.
– age>30 if age<. will not.

• Test
– replace age=0 if (age==.)

• Remove
– drop if age==.

• Change
– replace educ=. if educ==99
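The effect of missing values on a comparison is easy to demonstrate by counting. This is a minimal sketch on invented data; the scalar names n_all and n_valid are hypothetical.

```stata
* Invented data: one missing age, and educ coded 99 for "unknown"
clear
input age educ
25 1
40 99
.  2
end

* age>30 is true for the missing value as well
count if age > 30
scalar n_all = r(N)

* Adding age<. leaves out the missing value
count if age > 30 & age < .
scalar n_valid = r(N)

display "with missing: " n_all ", without missing: " n_valid

* Change: recode 99 to missing. Remove: drop observations with missing age
replace educ = . if educ == 99
drop if age == .
```

In a full command the shorthand "age>30 if age<." from the slide is written as a single condition, as in the second count.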


Describe missing

• Summarize variables
– summarize id bullied sex
– misstable summarize bullied sex        new command

• Missing in tables
– tab bullied sex, missing
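These three commands can be compared on a tiny invented dataset with a few missing values. The sketch below reuses the slide's variable names, but the values are made up.

```stata
* Invented data: bullied is missing twice, sex is missing once
clear
input id bullied sex
1 0 1
2 1 2
3 . 1
4 0 .
5 . 2
end

* Obs column shows the number of non-missing values per variable
summarize id bullied sex

* Table of missing counts, only for variables that have missing values
misstable summarize bullied sex

* Crosstable with missing shown as its own row and column
tabulate bullied sex, missing

* Direct count of missing values in one variable
count if missing(bullied)
```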


Help

• General
– help command
– findit keyword        search Stata+net

• Examples
– help table
– findit aflogit


Summing up

• Use do-files
– Run: Mark, Ctrl-D

• Syntax
– command [varlist] [if exp] [in range] [, options]

• Missing
– age>30 if age<.
– generate old=(age>50) if age<.

• Help
– help describe

Descriptive


Descriptive

• Continuous
– summarize weight
– summarize weight, detail        percentiles and more

• Categorical
– tabulate bullied
– tabulate bullied, nolab        show coding
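The same commands run on any dataset. As a sketch, assuming the shipped example file auto.dta with the continuous variable weight and the categorical variable foreign (stand-ins for the course variables):

```stata
* Example dataset shipped with Stata (assumption: not the course data)
sysuse auto, clear

* Continuous: mean, sd, min, max
summarize weight

* Continuous: adds percentiles, skewness and kurtosis
summarize weight, detail
display "median = " r(p50)

* Categorical: frequencies with value labels
tabulate foreign

* Categorical: frequencies with the underlying numeric codes
tabulate foreign, nolabel
```

Results such as r(p50) are left behind by the most recent command and can be listed with return list.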


Other descriptives

tabstat mAge, stat( N min p50 mean max) by(parity)
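A comparable table can be produced from the shipped example data. This sketch assumes auto.dta, with mpg in place of mAge and foreign in place of parity:

```stata
* Example dataset shipped with Stata (assumption: not the course data)
sysuse auto, clear

* Chosen statistics of mpg, one row per group plus a total row
tabstat mpg, statistics(n min p50 mean max) by(foreign)
```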


Graphics


Twoway plots

• Syntax
– twoway (plot1, opts) (plot2, opts), opts

• One plot
– kdensity bw
– scatter bw gest

[Figure: kernel density estimate of birth weight (kernel = epanechnikov, bandwidth = 102.3251)]

[Figure: scatter plot of birth weight against gestational age]


Weight distribution by sex

twoway (kdensity bw if sex==1, lcolor(blue)) ///
       (kdensity bw if sex==2, lcolor(red))

[Figure: kernel densities of weight (gram) by sex]


twoway (scatter bw gest) (fpfitci bw gest) (lfit bw gest)

[Figure: "Weight by gestational age", weight (gram) against gestational age (days); legend: scatter, smooth with CI, line fit]


Titles

[Figure: scatter plot showing where title, subtitle, xtitle, ytitle and note are placed]

scatter bw gest, title("title") subtitle("subtitle") ///
    xtitle("xtitle") ytitle("ytitle") note("note")
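The overlay syntax and the title options can be combined in one command. The sketch below assumes the shipped example file auto.dta, with weight, mpg and foreign standing in for the birth data; the graph names given in name() are hypothetical.

```stata
* Example dataset shipped with Stata (assumption: not the course data)
sysuse auto, clear

* Two density curves in one plot, one per group
twoway (kdensity weight if foreign == 0, lcolor(blue)) ///
       (kdensity weight if foreign == 1, lcolor(red)), ///
       legend(order(1 "Domestic" 2 "Foreign")) ///
       title("Weight distribution by origin") ///
       name(dens_by_origin, replace)

* Scatter with a smooth curve (with CI) and a straight-line fit on top
twoway (scatter mpg weight) (fpfitci mpg weight) (lfit mpg weight), ///
       title("Mileage by weight") ///
       xtitle("Weight (lbs.)") ytitle("Mileage (mpg)") ///
       note("Data: auto.dta") ///
       name(mpg_by_weight, replace)
```

Options inside a parenthesis apply to that plot only; options after the last comma apply to the whole graph.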

Bivariate analysis


2 independent samples

twoway (kdensity weight if sex==1, lcolor(blue)) ///
       (kdensity weight if sex==2, lcolor(red))

[Figure: kernel densities of birth weight by sex]

Equal means?

Equal variance?

Do boys and girls have the same mean birth weight?


2 independent samples test

ttest weight, by(sex)        2-sample t-test

ttest weight, by(sex) unequal        2-sample t-test, unequal variances

ttest w1 == w2        paired t-test
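The two questions on the previous slide, equal variance and equal means, map onto two commands. This is a sketch on the shipped example file auto.dta, with mpg and foreign standing in for weight and sex; sdtest is not on the slides and is added here as one common way to compare variances.

```stata
* Example dataset shipped with Stata (assumption: not the course data)
sysuse auto, clear

* Equal variance? Variance-comparison test between the two groups
sdtest mpg, by(foreign)

* Equal means? Two-sample t-test assuming equal variances
ttest mpg, by(foreign)

* Equal means, without assuming equal variances
ttest mpg, by(foreign) unequal
display "two-sided p-value = " %6.4f r(p)
```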


Crosstables

equal proportions?

Are boys bullied as much as girls?

tabulate bullied sex, col chi2 nofreq
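A crosstable of this kind can be tried on the shipped example data. The sketch assumes auto.dta; the binary variable highmpg is hypothetical and created only to have two categorical variables to compare.

```stata
* Example dataset shipped with Stata (assumption: not the course data)
sysuse auto, clear

* Hypothetical binary outcome: 1 if mileage is above 25 mpg
generate highmpg = (mpg > 25) if mpg < .

* Column percentages and chi-square test, frequencies suppressed
tabulate highmpg foreign, column chi2 nofreq
display "chi2 = " %6.2f r(chi2) ", p = " %6.4f r(p)
```

Column percentages answer the question directly: they give the proportion with the outcome within each group.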


Summing up

• Descriptive
– summarize weight
– tabulate sex

• Graphs
– twoway (plot1, opts) (plot2, opts), opts

• Bivariate
– ttest weight, by(sex)
– tabulate bullied sex, chi2