Page 1: Harmonizing the World’s Census Microdata: The IPUMS Project Matt Sobek Minnesota Population Center sobek@pop.umn.edu

Harmonizing the World’s Census Microdata:

The IPUMS Project

Matt Sobek

Minnesota Population Center

sobek@pop.umn.edu

Page 2

What is IPUMS-International?

Census data – 1960 to present

Samples – 1 to 10%, nationally representative

Microdata – individual-level

Extract system – select variables – pooled data

Downloadable – anonymized

Integrated – consistent codes across time and place

Page 3

Map of IPUMS Partners

Dark green = disseminating data

Light green = partners, not yet disseminating

83 countries

Page 4

Current Countries in IPUMS

44 countries

130 samples

279 million persons

Africa: Egypt, Ghana, Guinea, Kenya, Rwanda, South Africa, Uganda

Asia: Armenia, Cambodia, China, India, Iraq, Israel, Jordan, Kyrgyz Rep., Malaysia, Mongolia, Palestine, Philippines, Vietnam

Americas: Argentina, Bolivia, Brazil, Canada, Chile, Colombia, Costa Rica, Ecuador, Mexico, Panama, United States, Venezuela

Europe: Austria, Belarus, France, Greece, Hungary, Italy, Netherlands, Portugal, Romania, Slovenia, Spain, United Kingdom

Page 5

IPUMS Microdata

Relation to head    Marital status    Literacy    Occupation

Page 6

Aggregate Data

Page 7

Data Standardization

Variable  Original label  N        Input code  Input label  Output code  Output label
Sex       Masculin        223,178  1           Male         1            Male
          Feminin         234,369  2           Female       2            Female
          Non Declare     290      3           Undeclared   9            Unknown
School    -               63,239   B           [no label]   0            NIU (not in universe)
          Oui             47,320   1           Yes          1            Attends school
          Non             346,460  2           No           2            Does not attend school
          Non Declare     800      3           Undeclared   9            Unknown
          -               2        5           [no label]   9            Unknown
          -               16       9           [no label]   9            Unknown
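The standardization step on this slide can be read as a lookup from each input code to an output code and label. The following is a minimal sketch of that idea, not the project's actual tooling: the dictionary names, the `standardize` function, and the fallback to "Unknown" for unlisted codes are assumptions made for illustration; the code and label values come from the Sex and School rows above.

```python
# Recode tables keyed by input code, with values taken from the slide.
# The names SEX_RECODE and SCHOOL_RECODE are hypothetical.
SEX_RECODE = {
    "1": (1, "Male"),
    "2": (2, "Female"),
    "3": (9, "Unknown"),
}

SCHOOL_RECODE = {
    "B": (0, "NIU (not in universe)"),
    "1": (1, "Attends school"),
    "2": (2, "Does not attend school"),
    "3": (9, "Unknown"),
    "5": (9, "Unknown"),  # unlabeled stray code in the input data
    "9": (9, "Unknown"),  # unlabeled stray code in the input data
}


def standardize(raw_code, table, fallback=(9, "Unknown")):
    """Return (output code, output label) for one input code.

    Sending unlisted codes to "Unknown" is an assumption of this sketch.
    """
    return table.get(raw_code, fallback)


if __name__ == "__main__":
    for code in ("1", "B", "5"):
        print(code, standardize(code, SCHOOL_RECODE))
```

Keeping the stray codes 5 and 9 as explicit entries, rather than relying on the fallback, mirrors the slide, where every observed input code has its own row and frequency.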

Page 8

Data Integration – Marital Status

MARST Marital Status

code  label                    China 1982       Colombia 1973       Kenya 1989     Mexico 1970            U.S.A. 1990
                               CN82A403         CO73A411            KN89A413       MX70A402               US90A425
100   SINGLE/NEVER MARRIED     1=never married  4=single            1=single       9=single               6=never married
200   MARRIED/IN UNION
210   Married (not specified)  2=married        2=married           3=monogamous   -                      1=married
211   Civil                    -                -                   -              3=only civil           -
212   Religious                -                -                   -              4=only religious       -
213   Civil and religious      -                -                   -              2=civil and religious  -
214   Polygamous               -                -                   3=polygamous   -                      -
220   Consensual union         -                1=free union        -              5=free union           -
300   SEPARATED/DIVORCED       -                3=sep. or divorced  -              -                      -
310   Separated                -                -                   6=separated    8=separated            3=separated
321   Legally separated
322   De facto separated
330   Divorced                 4=divorced       -                   5=divorced     7=divorced             4=divorced
400   WIDOWED                  3=widowed        5=widowed           4=widowed      6=widowed              5=widowed
999   UNKNOWN/MISSING          0=missing        6=unknown           B=blank        1=unknown              -
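The harmonized MARST codes appear to be hierarchical: the first digit gives a general category (100, 200, 300, 400) and the remaining digits add detail that only some samples can supply. A minimal sketch of collapsing a detailed code to its general category follows; the function name `general_code` and the special handling of 999 are assumptions for illustration, while the codes and labels are those listed on the slide.

```python
# General categories and labels as shown on the slide.
GENERAL_LABELS = {
    100: "SINGLE/NEVER MARRIED",
    200: "MARRIED/IN UNION",
    300: "SEPARATED/DIVORCED",
    400: "WIDOWED",
    999: "UNKNOWN/MISSING",
}


def general_code(detailed):
    """Collapse a detailed MARST code to its general category.

    Assumes the first digit carries the general category and that
    999 is a standalone missing-data code.
    """
    if detailed == 999:
        return 999
    return (detailed // 100) * 100


if __name__ == "__main__":
    # 213 (civil and religious) and 210 (married, not specified)
    # both collapse to 200, so samples with different levels of
    # detail can still be compared at the general level.
    for code in (210, 213, 220, 330, 999):
        print(code, GENERAL_LABELS[general_code(code)])
```

Under this reading, a researcher who needs comparability across all five samples would work with the general codes, and one who studies a single country could keep the full detail.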

Page 9

XML Harmonization Table

<sample>

<id>gh2000a</id>

<rectype>P</rectype>

<svar>GH00A401</svar>

<recode>

<orig>1</orig>

<targ>1000</targ>

<lab>Head</lab>

<freq>347162</freq>

</recode>

<recode>

<orig>3</orig>

<targ>2000</targ>

<lab>Spouse</lab>

<freq>178544</freq>

</recode>

<recode>

<orig>4</orig>

<targ>3000</targ>

<lab>Child</lab>

<freq>707986</freq>

</recode>
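One way to use a harmonization table like this is to read it into a mapping from original codes to target codes. The sketch below uses Python's standard `xml.etree.ElementTree`; the function name `load_recodes` is hypothetical, and the closing `</sample>` tag is added as an assumption so that the fragment parses, since the slide shows only the beginning of the element.

```python
import xml.etree.ElementTree as ET

# The fragment from the slide, compacted. The closing </sample> tag is
# an assumption: the slide cuts off before the element ends.
FRAGMENT = """
<sample>
  <id>gh2000a</id>
  <rectype>P</rectype>
  <svar>GH00A401</svar>
  <recode><orig>1</orig><targ>1000</targ><lab>Head</lab><freq>347162</freq></recode>
  <recode><orig>3</orig><targ>2000</targ><lab>Spouse</lab><freq>178544</freq></recode>
  <recode><orig>4</orig><targ>3000</targ><lab>Child</lab><freq>707986</freq></recode>
</sample>
"""


def load_recodes(xml_text):
    """Return (source variable name, {original code: target code})."""
    root = ET.fromstring(xml_text)
    svar = root.findtext("svar")
    mapping = {
        rec.findtext("orig"): int(rec.findtext("targ"))
        for rec in root.iter("recode")
    }
    return svar, mapping


if __name__ == "__main__":
    svar, mapping = load_recodes(FRAGMENT)
    print(svar, mapping)
```

Original codes are kept as strings because source data can contain non-numeric codes such as `B` for blank.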

Page 10

Census Questionnaire (Mexico 2000)

Water access

Page 11

Text of Census Questionnaire (Mexico 2000)

5. Number of Rooms

How many rooms are used for sleeping without counting hallways?
_____ Write the number

Without counting the hallways or bathrooms how many total rooms are in this dwelling? Count the kitchen
_____ Write the number

6. Access to water

Read all of the options until you get an affirmative answer. Circle only one answer

1 Running water inside the dwelling
2 Running water outside the dwelling but on the land
3 Running water from a public faucet or hydrant
4 Running water that is carried from another dwelling
5 Tanked in by truck
6 Water from a well, river, lake, stream or other

Answers 3, 4, 5, 6 continue with number 8

7. Water supply

How many days of the week is water available? Circle only one answer

1 Daily
2 Every third day
3 Twice a week
4 Once a week
5 Occasionally
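The skip instruction in question 6 defines the universe for question 7: only dwellings answering 1 or 2 are asked about water supply. A minimal sketch of how that rule could be expressed when coding the data follows. The function name, the use of 0 for "not in universe" (borrowed from the School example on the Data Standardization slide), and the use of 9 for a missing or invalid answer are all assumptions for illustration.

```python
NIU = 0      # assumption: "not in universe" code, as in the School example
UNKNOWN = 9  # assumption: code for a missing or invalid answer

# Question 6 answers that skip question 7 ("continue with number 8").
SKIP_ANSWERS = {3, 4, 5, 6}


def water_supply_code(access_answer, supply_answer):
    """Code question 7 (water supply) given the answer to question 6."""
    if access_answer in SKIP_ANSWERS:
        # The dwelling was never asked question 7.
        return NIU
    if supply_answer in (1, 2, 3, 4, 5):
        return supply_answer
    return UNKNOWN


if __name__ == "__main__":
    print(water_supply_code(5, None))  # tanked in by truck: not in universe
    print(water_supply_code(1, 2))     # running water inside, every third day
```

Separating "not in universe" from "unknown" keeps dwellings that were legitimately skipped distinct from those that should have answered but did not.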

Page 12

Water access

XML-Tagged Census Questionnaire (Mexico 2000)

Page 13

Variable Description (Literacy)

Page 14

Availability of Selected Person Variables

(Number of samples)

Variable                    Samples    Variable               Samples
Relationship to head        111        Religion               48
Age                         111        Language               26
Sex                         111        Ethnicity              36
Marital status              110        Race                   19
Age at first marriage       16         School attendance      87
Children ever born          81         Literacy               75
Children surviving          51         Education attainment   100
Mother's mortality status   15         Years of schooling     65
Country of birth            72         Employment status      102
Place of birth              78         Class of worker        103
Citizenship                 57         Occupation             100
Year of immigration         18         Industry               99
Migration, international    43         Hours worked weekly    37
Migration, internal         87         Total income           23
Disability                  29         Earned income          22

Page 15

Availability of Selected Household Variables

(Number of samples)

Variable               Samples    Variable       Samples
Urban-rural status     75         Electricity    69
Geography, 1st level   101        Water          80
Geography, 2nd level   73         Sewage         70
Home ownership         94         Toilet         76
Number of rooms        86         Cooking fuel   34
Floor material         42         Telephone      49
Wall material          34         Television     42
Roof material          23         Computer       14
Living Area            12         Automobiles    39

Page 16

Number of Geographic Units Identified

Mexico         2454    Malaysia     133    France        22
USA            2071    Ecuador      129    Mongolia      21
Brazil         1447    Ghana        110    Italy         19
Philippines    1173    Bolivia       84    Armenia       19
Colombia        532    India         78    Israel        18
Spain           366    Kenya         69    Palestine     16
China           347    Vietnam       61    UK            13
Argentina       315    Costa Rica    61    Slovenia      13
Egypt           278    Kyrgyz        55    Rwanda        12
Venezuela       235    Romania       47    Canada        11
South Africa    225    Jordan        44    Portugal       7
Chile           178    Iraq          44    Belarus        6
Uganda          163    Guinea        34    Netherlands    1
Greece          154    Panama        31    Hungary        1
Cambodia        149    Austria       31

Page 17

Size of Geographic Units by Country (median population in 000s)

Mexico         12    Cambodia        73    Iraq              365
Colombia       32    Mongolia        87    Romania           414
Brazil         41    South Africa    96    Rwanda            731
Philippines    41    Malaysia        98    Portugal          776
Ecuador        44    Slovenia       100    Canada            963
Spain          46    USA            130    Vietnam         1,045
Costa Rica     47    Ghana          134    Belarus         1,413
Bolivia        48    Uganda         137    France          2,087
Venezuela      48    Palestine      138    Italy           2,114
Panama         49    Armenia        142    China           2,985
Greece         50    Guinea         168    UK              5,143
Chile          55    Egypt          195    India           8,635
Argentina      61    Austria        225    Hungary        10,210 *
Jordan         65    Israel         332    Netherlands    15,986 *
Kyrgyz         70    Kenya          359

Page 18

Urban Definitions (N of countries)

Administrative divisions 11

Population threshold 10

Population threshold and . . .
    Administrative divisions 1
    Agglomeration/density 4
    Functional criteria 4

Agglomeration/density and . . .
    Administrative divisions 2
    Functional criteria 3

(Functional criteria include infrastructure, businesses, agriculture, etc.)

Page 19

[Flow diagram relating the National Stats Office and IPUMS. Labels: Questionnaire; Data collection; Data processing; Aggregate statistics; Tabulator; Public samples; Full microdata; Samples drawn; IPUMS samples; Harmonization; Sampling; Donation; Confidentiality]

Page 20

END

http://international.ipums.org

Matt Sobek

Minnesota Population Center

sobek@pop.umn.edu