DATA MINING (数据挖掘)

Shanghai China, July 5, 2015

Yawei Zhang, MD, PhD, MPH

Associate Professor

Yale University School of Public Health

Data Mining

Data mining is a process of finding anomalies, patterns, and correlations within large data sets to predict outcomes

More information does not mean more knowledge

Data mining allows us to sift through chaotic and repetitive noise, understand what is relevant, and then make good use of that information to assess likely outcomes

Databases

Registry databases
– Tumor registry
– Birth registry
– Mortality registry

Health insurance databases

Medical records

Research survey databases

Individual epidemiologic study databases

Data Mining

Knowledge discovery in databases

Foundation

Statistics: numeric study of data relationships

Artificial intelligence: human-like intelligence displayed by software and/or machines

Machine learning: algorithms that can learn from data and make predictions

Gansu Provincial Maternity and Child Care Hospital

Lanzhou Birth Cohort Study

Eligible study population (N=14,359)
– Came to the hospital for delivery in 2010-2012
– Aged 18 years or older
– Gestational age ≥20 weeks
– No mental illness

Participants (N=10,542)
– 3,712 refused to participate
– 105 did not complete the in-person interview

Questionnaire
– Demographic and lifestyle
– Residential history
– Medical and reproductive
– Diet and supplements

Biosamples
– Maternal blood
– Cord blood

Medical records
– Birth outcomes (birthweight, gestational age, birth length, head circumference, defects, preterm birth, SGA, LGA, low BW)
– Maternal complications (gestational hypertension, preeclampsia, gestational diabetes, thyroid diseases)

Air pollutants
– PM10
– SO2
– NO2

Temperature

Humidity
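The participant count in the cohort flow follows from the eligible population minus the two exclusion groups. A minimal sketch of that arithmetic, using only the counts shown on the slide (the variable names are illustrative):

```python
# Counts taken from the cohort flow on the slide.
eligible = 14_359
refused = 3_712
incomplete_interview = 105

# Participants are the eligible women who neither refused nor dropped out
# before completing the in-person interview.
participants = eligible - refused - incomplete_interview
participation_rate = participants / eligible

print(participants)                 # 10542
print(f"{participation_rate:.1%}")  # 73.4%
```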

Folic acid supplementation and dietary folate intake and risk of preeclampsia

Folic acid supplements reduce blood homocysteine levels, which are elevated among women with gestational hypertension and preeclampsia

Epidemiologic studies reached inconsistent results
– Three studies found reduced risk associated with folic acid-containing multivitamins
– Three studies reported no association with folic acid supplements alone
– One study reported reduced risk associated with dietary folate intake

(Wang et al. Eur J Clin Nutr 2015 PMID: 25626412)

Study Population

Sample size: 10,041
– Excluding women with chronic hypertension and gestational hypertension
– Excluding women who gave birth to infants with birth defects

Preeclampsia:

Exposure Assessment

Folic acid supplements
– Users: those who took folic acid supplements alone or folic acid-containing multivitamins before conception and/or during pregnancy
– Nonusers: those who never took folic acid supplements alone or folic acid-containing multivitamins before conception and/or during pregnancy

Dietary folate
– Estimated from the frequency of consumption and portion size of food items using the Chinese Standard Tables of Food Consumption
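The dietary folate estimate described above combines how often a food is eaten, how much is eaten each time, and the folate content of that food. The sketch below shows one common way such a calculation is set up; the food list, portion sizes, and folate values are hypothetical placeholders, not values from the study or from any food table.

```python
# Hypothetical food-frequency records: (food, servings per week, grams per serving).
ffq_records = [
    ("spinach", 3.0, 100.0),
    ("egg", 7.0, 50.0),
    ("rice", 14.0, 150.0),
]

# Hypothetical folate content in micrograms per 100 g of food
# (illustrative numbers only, not taken from a real food table).
folate_ug_per_100g = {"spinach": 100.0, "egg": 50.0, "rice": 5.0}


def daily_folate_ug(records, content_table):
    """Sum frequency x portion x folate density over foods, averaged per day."""
    weekly_total = 0.0
    for food, servings_per_week, grams_per_serving in records:
        grams_per_week = servings_per_week * grams_per_serving
        weekly_total += grams_per_week * content_table[food] / 100.0
    return weekly_total / 7.0


intake = daily_folate_ug(ffq_records, folate_ug_per_100g)
print(f"{intake:.1f} µg/day")
```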

Folic acid supplementation and dietary folate intake and risk of preterm birth in China

Folate plays an essential role in DNA synthesis, repair, and methylation

Seven randomized controlled trials linking maternal folic acid supplementation to PB have reported inconsistent findings.

Epidemiologic studies examining folic acid supplementation and dietary folate and PB have also reported mixed results: 1 positive, 10 negative, and 3 null findings.

(Liu et al. Eur J Nutr 2015 (in press))

Study Population

10,179 women with singleton live births

Preterm birth (PB; <37 completed gestational weeks, N=1,019)
– Moderate PB (32 to <37 completed weeks of gestation)
– Very PB (<32 completed weeks)
– Medically indicated PB: a placental, uterine, fetal, or maternal condition prompts the medical team to proceed with delivery after the risks and benefits of continuing pregnancy versus early delivery are weighed. Examples of such conditions include placental abruption, placenta accreta, placenta or vasa previa, prior classical cesarean, uterine rupture or dehiscence, fetal intrauterine growth restriction, select fetal anomalies, severe preeclampsia, uncontrolled gestational or chronic hypertension, complicated pregestational diabetes, and oligohydramnios.
– Spontaneous PB: with or without preterm premature rupture of membranes (PPROM).
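The gestational-age cut points above translate directly into a small classification rule. This is a minimal sketch; the function name and the "term" label for births at 37 weeks or later are illustrative choices, and the medically indicated versus spontaneous split is not modeled because it depends on clinical information rather than gestational age.

```python
def classify_by_gestational_age(completed_weeks: float) -> str:
    """Classify a birth using the cut points on the slide."""
    if completed_weeks < 32:
        return "very preterm"      # <32 completed weeks
    if completed_weeks < 37:
        return "moderate preterm"  # 32 to <37 completed weeks
    return "term"                  # 37 completed weeks or more


for weeks in (30, 32, 36, 37, 40):
    print(weeks, classify_by_gestational_age(weeks))
```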

Table 3. Associations between folic acid supplementation and risk of preterm birth

| Folic acid supplement use | Controls | Preterm (<37 weeks): cases | ORa (95% CI) | Moderate preterm (32 to <37 weeks): cases | ORa (95% CI) | Very preterm (<32 weeks): cases | ORa,c (95% CI) | Medically indicated preterm: cases | ORb,c (95% CI) | Spontaneous preterm: cases | ORb (95% CI) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Non-users | 1982 | 333 | 1.00 | 252 | 1.00 | 81 | 1.00 | 120 | 1.00 | 213 | 1.00 |
| Users | 7178 | 686 | 0.80 (0.68, 0.94) | 580 | 0.92 (0.77, 1.09) | 106 | 0.50 (0.36, 0.69) | 218 | 0.82 (0.63, 1.05) | 468 | 0.77 (0.64, 0.93) |
| ≤12 weeks | 4405 | 481 | 0.85 (0.72, 1.01) | 411 | 0.99 (0.82, 1.18) | 70 | 0.50 (0.35, 0.71) | 153 | 0.85 (0.65, 1.11) | 328 | 0.82 (0.68, 1.00) |
| >12 weeks | 2773 | 205 | 0.67 (0.55, 0.83) | 169 | 0.74 (0.59, 0.94) | 36 | 0.49 (0.31, 0.77) | 65 | 0.73 (0.52, 1.03) | 140 | 0.64 (0.51, 0.82) |
| P for trend | | | 0.01 | | 0.004 | | 0.91 | | 0.30 | | 0.03 |
| Preconception & during pregnancy | 2734 | 217 | 0.75 (0.61, 0.92) | 183 | 0.85 (0.68, 1.07) | 34 | 0.47 (0.30, 0.75) | 66 | 0.79 (0.56, 1.12) | 151 | 0.73 (0.57, 0.93) |
| ≤12 weeks | 569 | 59 | 0.88 (0.64, 1.21) | 52 | 1.06 (0.75, 1.49) | 7 | 0.40 (0.18, 0.91) | 16 | 0.78 (0.44, 1.36) | 43 | 0.93 (0.65, 1.34) |
| >12 weeks | 2165 | 158 | 0.71 (0.57, 0.89) | 131 | 0.79 (0.61, 1.01) | 27 | 0.50 (0.30, 0.81) | 50 | 0.80 (0.55, 1.16) | 108 | 0.67 (0.52, 0.87) |
| P for trend | | | 0.21 | | 0.098 | | 0.71 | | 1.00 | | 0.10 |
| Preconception only | 339 | 35 | 0.88 (0.60, 1.31) | 33 | 1.12 (0.75, 1.68) | 2 | 0.21 (0.59, 0.88) | 8 | 0.72 (0.38, 1.38) | 27 | 0.98 (0.63, 1.51) |
| ≤4 weeks | 89 | 12 | 1.01 (0.52, 1.95) | 10 | 1.17 (0.58, 2.37) | 2 | 0.61 (0.14, 2.69) | 3 | 0.88 (0.30, 2.60) | 9 | 1.18 (0.57, 2.45) |
| >4 weeks | 250 | 23 | 0.83 (0.52, 1.33) | 23 | 1.10 (0.69, 1.76) | 0 | - | 5 | 0.66 (0.30, 1.45) | 18 | 0.91 (0.54, 1.52) |
| P for trend | | | 0.80 | | 0.99 | | 0.73 | | 0.56 | | 0.79 |
| During pregnancy only | 4105 | 434 | 0.82 (0.69, 0.97) | 364 | 0.93 (0.77, 1.12) | 70 | 0.53 (0.37, 0.75) | 144 | 0.84 (0.64, 1.09) | 290 | 0.77 (0.63, 0.94) |
| ≤8 weeks | 1871 | 246 | 0.94 (0.77, 1.13) | 206 | 1.07 (0.87, 1.32) | 40 | 0.59 (0.39, 0.88) | 80 | 0.93 (0.69, 1.27) | 166 | 0.91 (0.72, 1.13) |
| >8 weeks | 2234 | 188 | 0.70 (0.57, 0.85) | 158 | 0.79 (0.63, 0.99) | 30 | 0.46 (0.29, 0.72) | 64 | 0.74 (0.53, 1.02) | 124 | 0.64 (0.50, 0.81) |
| P for trend | | | 0.005 | | 0.007 | | 0.42 | | 0.16 | | 0.003 |

a Adjusted for maternal age, education level, smoking, parity, preeclampsia, maternal diabetes, pre-pregnancy BMI, family monthly income per capita, maternal employment during pregnancy, history of preterm, and dietary folate intake.

b Adjusted for all variables above except preeclampsia and maternal diabetes.

c Estimated using Fisher's exact test when the number of cases in a category was <5.
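For orientation, an unadjusted (crude) odds ratio can be computed directly from the user and non-user counts in Table 3. The sketch below uses the standard cross-product formula with a Woolf (log-based) 95% confidence interval. This is not the authors' method: the ORs in the table come from models adjusted for the covariates in the footnotes, so the crude value is expected to differ from them.

```python
import math


def crude_odds_ratio(exp_cases, exp_controls, unexp_cases, unexp_controls, z=1.96):
    """Cross-product odds ratio with a Woolf (log-scale) confidence interval."""
    odds_ratio = (exp_cases * unexp_controls) / (exp_controls * unexp_cases)
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(
        1 / exp_cases + 1 / exp_controls + 1 / unexp_cases + 1 / unexp_controls
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# All preterm births: users (686 cases, 7178 controls) vs non-users (333 cases, 1982 controls).
crude_or, ci_low, ci_high = crude_odds_ratio(686, 7178, 333, 1982)
print(f"crude OR = {crude_or:.2f} ({ci_low:.2f}, {ci_high:.2f})")
```

The crude estimate comes out near 0.57, further from 1 than the adjusted 0.80 in the table, which illustrates how much the adjustment for covariates can move an estimate.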

Table 4. Associations between estimated dietary folate intake and risk of preterm birth

| Dietary folate period & intake level (µg/day) | Controls | Preterm (<37 weeks): cases | ORa (95% CI) | Moderate preterm (32 to <37 weeks): cases | ORa (95% CI) | Very preterm (<32 weeks): cases | ORa (95% CI) | Medically indicated preterm: cases | ORb (95% CI) | Spontaneous preterm: cases | ORb (95% CI) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Preconception | | | | | | | | | | | |
| Q1 <118.6 | 2248 | 313 | 1.00 | 252 | 1.00 | 61 | 1.00 | 109 | 1.00 | 204 | 1.00 |
| Q2 118.6-161.8 | 2236 | 246 | 0.90 (0.75, 1.08) | 197 | 0.91 (0.74, 1.11) | 49 | 0.91 (0.62, 1.35) | 73 | 0.99 (0.76, 1.29) | 173 | 0.95 (0.76, 1.17) |
| Q3 161.8-224.6 | 2241 | 242 | 0.84 (0.70, 1.01) | 196 | 0.85 (0.69, 1.04) | 46 | 0.82 (0.55, 1.21) | 79 | 0.76 (0.57, 1.01) | 163 | 0.87 (0.70, 1.08) |
| Q4 ≥224.6 | 2245 | 196 | 0.68 (0.56, 0.83) | 172 | 0.76 (0.61, 0.94) | 24 | 0.44 (0.27, 0.71) | 67 | 0.60 (0.44, 0.81) | 129 | 0.69 (0.54, 0.87) |
| P for trend | | | <.001 | | 0.009 | | 0.001 | | <.001 | | 0.002 |
| Per 10 µg increase | | | 0.996 (0.990, 1.001) | | 0.998 (0.993, 1.003) | | 0.975 (0.958, 0.992) | | 0.991 (0.981, 1.000) | | 0.993 (0.986, 1.000) |
| During pregnancy | | | | | | | | | | | |
| Q1 <155.8 | 2245 | 373 | 1.00 | 285 | 1.00 | 88 | 1.00 | 124 | 1.00 | 249 | 1.00 |
| Q2 155.8-202.8 | 2239 | 228 | 0.70 (0.59, 0.84) | 182 | 0.74 (0.60, 0.90) | 46 | 0.62 (0.43, 0.90) | 72 | 0.53 (0.40, 0.70) | 156 | 0.69 (0.56, 0.85) |
| Q3 202.8-272.1 | 2245 | 212 | 0.67 (0.55, 0.80) | 187 | 0.78 (0.64, 0.95) | 25 | 0.33 (0.21, 0.52) | 72 | 0.50 (0.38, 0.67) | 140 | 0.63 (0.51, 0.79) |
| Q4 ≥272.1 | 2241 | 184 | 0.57 (0.47, 0.70) | 163 | 0.67 (0.54, 0.83) | 21 | 0.28 (0.17, 0.47) | 60 | 0.47 (0.34, 0.63) | 124 | 0.57 (0.45, 0.71) |
| P for trend | | | <.001 | | <.001 | | <.001 | | <.001 | | <.001 |
| Per 10 µg increase | | | 0.998 (0.982, 0.995) | | 0.994 (0.988, 1.000) | | 0.949 (0.931, 0.968) | | 0.979 (0.969, 0.990) | | 0.985 (0.977, 0.993) |

a Adjusted for maternal age, education level, smoking, parity, preeclampsia, maternal diabetes, pre-pregnancy BMI, family monthly income per capita, maternal employment during pregnancy, history of preterm, and folic acid supplementation.

b Adjusted for all variables above except preeclampsia and maternal diabetes.
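The quartile rows in Table 4 come from binning each woman's estimated intake at fixed cut points. A minimal sketch of that binning, using the cut points printed in the table; the function name is illustrative, and treating each boundary value as belonging to the higher quartile is an assumption, since the table does not say how ties at 161.8 or 202.8 were handled.

```python
from bisect import bisect_right

# Cut points (µg/day) as printed in Table 4.
PRECONCEPTION_CUTS = [118.6, 161.8, 224.6]
PREGNANCY_CUTS = [155.8, 202.8, 272.1]


def folate_quartile(intake_ug_per_day, cut_points):
    """Return 'Q1'..'Q4'; a value equal to a cut point falls in the higher bin (assumption)."""
    return "Q" + str(bisect_right(cut_points, intake_ug_per_day) + 1)


print(folate_quartile(100.0, PRECONCEPTION_CUTS))  # Q1
print(folate_quartile(200.0, PRECONCEPTION_CUTS))  # Q3
print(folate_quartile(200.0, PREGNANCY_CUTS))      # Q2
```

The same intake of 200 µg/day lands in different quartiles for the two periods because the cut points differ between them.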

Passive Smoking and Preterm Birth in Urban China

Smoking is a risk factor for preterm birth

Role of passive smoking in preterm birth is unclear

Epidemiologic studies examining passive smoking and preterm birth reported mixed results: 7 positive and 7 with no association.

(Qiu et al. Am J Epidemiol 2014; 180(1): 94-102 PMID: 24838804)

Study Population and Exposure Assessment

10,094 non-smoking women with singleton live births

Preterm birth (<37 completed gestational weeks, N=1,009)
– Moderate PB (32 to <37 completed weeks of gestation)
– Very PB (<32 completed weeks)
– Medically indicated PB
– Spontaneous PB: with or without preterm premature rupture of membranes (PPROM)

Passive smokers
– Women who were exposed to cigarette smoke at home, at work, during social and recreational activities, and/or while commuting to and from work for at least 30 minutes per week during pregnancy

Maternal exposure to environmental tobacco smoke and risk of small for gestational age among non-smoking Chinese women

Smoking is a risk factor for SGA

Role of passive smoking in SGA is unclear

Epidemiologic studies examining passive smoking and SGA birth reported mixed results: 11 positive and 7 with no association.

(Huang et al. Paediatr Perinat Epidemiol 2015 (in press))

Study Population and Exposure Assessment

Small for gestational age (SGA): an infant born with a birth weight below the 10th percentile of the gestational age- and gender-specific birth weight standards for Chinese newborns (N=775)

Appropriate for gestational age (AGA): neonates who weighed between the 10th and 90th percentiles (N=7,863)

Large for gestational age (LGA): an infant born with a birth weight above the 90th percentile using the same standards (N=1,413)
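The three definitions above amount to comparing a birth weight with the 10th and 90th percentile values for the infant's gestational age and sex. A minimal sketch follows; the function name and the percentile values in the usage lines are hypothetical, since the Chinese newborn reference standards themselves are not reproduced on the slide.

```python
def classify_size_for_gestational_age(birth_weight_g, p10_g, p90_g):
    """Classify a newborn against the reference 10th/90th percentile weights.

    p10_g and p90_g must be looked up for the infant's gestational age and sex.
    """
    if birth_weight_g < p10_g:
        return "SGA"  # below the 10th percentile
    if birth_weight_g > p90_g:
        return "LGA"  # above the 90th percentile
    return "AGA"      # between the 10th and 90th percentiles


# Hypothetical percentile values for one gestational-age/sex stratum.
P10, P90 = 2800.0, 3900.0
for weight in (2500.0, 3300.0, 4100.0):
    print(weight, classify_size_for_gestational_age(weight, P10, P90))
```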

Table 3. Associations between ETS exposure and small for gestational age by exposure timing, duration, and location.

| Exposure | Category | Appropriate for gestational age, N | Small for gestational age, N (%) | OR* (95% CI) |
|---|---|---|---|---|
| No ETS exposure | | 6,392 | 586 (8.4) | 1.00 |
| Ever exposed to ETS during pregnancy | | 1,471 | 189 (11.4) | 1.29 (1.09-1.54) |
| Ever exposed to ETS during the 1st trimester | | 1,380 | 171 (7.8) | 1.24 (1.03-1.49) |
| Ever exposed to ETS during the 2nd trimester | | 1,254 | 167 (11.8) | 1.33 (1.11-1.61) |
| Ever exposed to ETS during the 3rd trimester | | 1,107 | 151 (12.0) | 1.36 (1.12-1.65) |
| Duration of ETS exposure (hours/day) | | | | |
| Ever exposed during pregnancy | <1 | 1,075 | 132 (10.9) | 1.23 (1.01-1.51) |
| | ≥1 | 396 | 57 (12.6) | 1.46 (1.09-1.96) |
| | P for trend** | | | 0.002 |
| Ever exposed during the 1st trimester | <1 | 920 | 107 (10.4) | 1.15 (0.92-1.43) |
| | ≥1 | 460 | 64 (12.2) | 1.43 (1.08-1.89) |
| | P for trend** | | | 0.008 |
| Ever exposed during the 2nd trimester | <1 | 834 | 102 (10.9) | 1.21 (0.97-1.52) |
| | ≥1 | 420 | 65 (13.4) | 1.57 (1.19-2.08) |
| | P for trend** | | | 0.001 |
| Ever exposed during the 3rd trimester | <1 | 740 | 94 (11.3) | 1.25 (0.99-1.58) |
| | ≥1 | 367 | 57 (13.4) | 1.57 (1.17-2.11) |
| | P for trend** | | | <0.001 |
| Location of ETS exposure | | | | |
| Ever exposed during pregnancy | Home | 1,098 | 156 (12.4) | 1.36 (1.12-1.65) |
| | Other locations | 354 | 32 (8.3) | 1.10 (0.75-1.60) |
| Ever exposed during the 1st trimester | Home | 1,022 | 139 (12.0) | 1.29 (1.06-1.58) |
| | Other locations | 336 | 32 (8.7) | 1.15 (0.79-1.67) |
| Ever exposed during the 2nd trimester | Home | 925 | 138 (13.0) | 1.42 (1.16-1.74) |
| | Other locations | 309 | 28 (8.3) | 1.09 (0.73-1.62) |
| Ever exposed during the 3rd trimester | Home | 841 | 124 (12.8) | 1.39 (1.12-1.72) |
| | Other locations | 256 | 26 (9.2) | 1.23 (0.81-1.87) |

*Adjusted for maternal age (continuous), education, employment, parity, maternal pre-pregnancy BMI, gestational hypertension, history of delivering a low birth weight infant, and total energy intake during pregnancy.

**P for trend was estimated with duration as a continuous variable.
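The percentage column in Table 3 appears to be the share of SGA infants among the SGA and AGA infants in each exposure group. The sketch below checks that reading against the first two rows; the denominator choice is an inference from the printed numbers, not something the table states.

```python
def sga_percent(sga_n, aga_n):
    """Percentage of SGA infants among SGA + AGA infants in one exposure group."""
    return 100 * sga_n / (sga_n + aga_n)


unexposed_pct = sga_percent(586, 6392)  # no ETS exposure
exposed_pct = sga_percent(189, 1471)    # ever exposed during pregnancy

print(round(unexposed_pct, 1))  # 8.4
print(round(exposed_pct, 1))    # 11.4
```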

Table 4. Associations between ETS exposure and small for gestational age by trimester.

| Exposure | Appropriate for gestational age, N | Small for gestational age, N (%) | OR* (95% CI) |
|---|---|---|---|
| No ETS exposure | 6,392 | 586 (8.4) | 1.00 |
| Exposed to ETS throughout entire pregnancy | 1,030 | 133 (11.4) | 1.27 (1.03-1.55) |
| Exposed to ETS in any two trimesters | | | |
| The 1st and 2nd trimesters | 157 | 18 (7.8) | 1.23 (0.74-2.02) |
| The 1st and 3rd trimesters | 10 | 2 (11.8) | 2.37 (0.51-11.07) |
| The 2nd and 3rd trimesters | 43 | 14 (24.6) | 3.79 (2.04-7.02) |
| Exposed to ETS exclusively in one trimester | | | |
| The 1st trimester | 183 | 18 (9.0) | 1.03 (0.63-1.70) |
| The 2nd trimester | 24 | 2 (7.7) | 0.86 (0.20-3.68) |
| The 3rd trimester | 24 | 2 (7.7) | 0.85 (0.20-3.65) |

*Adjusted for maternal age (continuous), education, employment, parity, maternal pre-pregnancy BMI, gestational hypertension, history of delivering a low birth weight infant, and total energy intake during pregnancy.

Ambient PM10 Exposure and Preterm Birth

Twelve earlier studies (two in China) provided inconsistent results.

The majority were based on registry databases, including the two in China

All studies (except the two in China) were conducted in areas with low air pollution levels (mean PM10 ranging from 13 µg/m3 to 90 µg/m3)

Very few studies examined the associations with preterm subtypes

Nan et al., Environ Int 2015; 76: 71-7 PMID: 25553395

Locations of monitors, distribution of residences of births and buffers of 6, 12, and 50km from monitors (n=8969).
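Assigning a residence to a 6, 12, or 50 km buffer requires the distance from the residence to a monitor. One common way to compute it is the haversine great-circle formula, sketched below. The slide does not say how distances were calculated, so this is an assumption, and the coordinates are hypothetical points rather than actual monitor or residence locations.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius
BUFFERS_KM = (6, 12, 50)     # buffer radii named in the figure caption


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def smallest_buffer(distance_km):
    """Return the smallest buffer radius containing the residence, or None beyond 50 km."""
    for radius in BUFFERS_KM:
        if distance_km <= radius:
            return radius
    return None


# Hypothetical coordinates (latitude, longitude) for illustration only.
monitor = (36.06, 103.83)
residence = (36.10, 103.90)
distance = haversine_km(*monitor, *residence)
buffer_km = smallest_buffer(distance)
print(f"{distance:.1f} km -> {buffer_km} km buffer")
```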

WHO guideline of PM10 : 20 μg/m3

Earlier studies: mean PM10 ranges from 13μg/m3 to 90μg/m3


Of 8969 singleton live births, 677 (7.5%) were preterm and 8292 were term births. Among preterm births, moderate and very preterm births numbered 571 (84.3%) and 103 (15.7%), respectively. Medically indicated preterm births (n=185) accounted for 27.3% of preterm births, while spontaneous preterm births (n=492) accounted for 72.7% of all cases.
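The overall preterm proportion and the split between medically indicated and spontaneous cases can be reproduced from the counts in the paragraph above. A short sketch using only those counts (variable names are illustrative):

```python
singleton_births = 8969
preterm = 677
medically_indicated = 185
spontaneous = 492

term = singleton_births - preterm
preterm_pct = 100 * preterm / singleton_births
indicated_pct = 100 * medically_indicated / preterm
spontaneous_pct = 100 * spontaneous / preterm

print(term)                       # 8292
print(round(preterm_pct, 1))      # 7.5
print(round(indicated_pct, 1))    # 27.3
print(round(spontaneous_pct, 1))  # 72.7
```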

U.S. National Ambient Air Quality Standard (NAAQS) (150µg/m3, equivalent to the China NAAQS Grade II level)

Ambient air pollution and congenital heart defects in Lanzhou, China

Jan et al., Environ Res Letter 2015 (in press)

| Outcome groups | Subtypes of outcome groups | Number of cases |
|---|---|---|
| Congenital malformations of great arteries (Q25) | Patent ductus arteriosus | 52 |
| | Both patent ductus arteriosus and stenosis of pulmonary artery | 2 |
| Congenital malformations of cardiac septa (Q21) | Isolated cases of ventricular septal defect | 8 |
| | Isolated cases of atrial septal defect | 10 |
| | Both Ventricular septal defect | 1 |
| Other congenital malformations of heart (Q24) | | 7 |
| Congenital malformations of cardiac chambers and connections (Q20) | | 1 |

Exposure to cooking fuels and birth weight in Lanzhou, China: a birth cohort study

Jiang et al., BMC Public Health 2015 (in press)

• Exposure to household air pollution resulting from cooking fuels has also been suggested as an important cause of low birth weight in developing countries

• Several studies reported that exposure to biomass smoke was associated with an increased risk of low birth weight

• However, none of these studies have controlled for gestational age

• It is unclear whether biomass smoke was associated with prematurity or intrauterine growth restriction.

Table 3. Multiple linear regression model for mean birth weight by cooking fuel type

| Fuel type | N | Mean±SD (g) | Difference from gas* (g) | 95% CI |
|---|---|---|---|---|
| Gas | 7907 | 3310.66±499.16 | 0.00 | |
| Coal | 358 | 2970.40±709.54 | -73.31 | -119.77 to -26.86 |
| Biomass | 120 | 2804.96±803.89 | -87.84 | -164.46 to -10.76 |
| Electromagnetic | 487 | 3150.22±613.10 | -30.20 | -69.02 to 8.63 |

*Adjusted for maternal age, education, family income, maternal weight gain, vitamin supplement during pregnancy, preeclampsia, caesarean section, parity, gestational week, smoking, and ventilation.
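In a linear model whose only predictors are fuel-type indicators with gas as the reference, each coefficient equals the difference between that group's mean and the gas mean. The sketch below computes those unadjusted differences from the means in Table 3 and sets them beside the adjusted differences printed in the table. It is a simple arithmetic comparison, not a re-fit of the authors' model.

```python
# Mean birth weights (g) and adjusted differences from gas (g), as printed in Table 3.
mean_birth_weight_g = {
    "Gas": 3310.66,
    "Coal": 2970.40,
    "Biomass": 2804.96,
    "Electromagnetic": 3150.22,
}
adjusted_difference_g = {"Coal": -73.31, "Biomass": -87.84, "Electromagnetic": -30.20}

reference = mean_birth_weight_g["Gas"]

# Unadjusted difference = group mean minus reference mean.
crude_difference_g = {
    fuel: mean - reference
    for fuel, mean in mean_birth_weight_g.items()
    if fuel != "Gas"
}

for fuel, crude in crude_difference_g.items():
    print(f"{fuel}: crude {crude:.2f} g, adjusted {adjusted_difference_g[fuel]:.2f} g")
```

The unadjusted gaps (roughly -340 g for coal and -506 g for biomass) are several times larger than the adjusted ones, consistent with much of the raw difference being explained by the adjustment covariates, gestational week among them.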

Table 4. Associations between type of fuel and risk of LBW

| Fuel type | NBW | LBW | ORa (95% CI) | ORb (95% CI) |
|---|---|---|---|---|
| Gas | 6965 | 371 | 1.00 | 1.00 |
| Coal | 270 | 70 | 1.92 (1.37-2.69) | 1.09 (0.67-1.78) |
| Biomass | 73 | 42 | 3.74 (2.35-5.94) | 2.51 (1.26-5.01) |
| Electromagnetic | 408 | 53 | 1.48 (1.05-2.06) | 1.14 (0.71-1.83) |

a Adjusted for maternal age, education, family income, maternal weight gain, vitamin supplement during pregnancy, preeclampsia, caesarean section, parity, smoking, and ventilation.

b Additional adjustment for gestational week.

Table 5. Associations between type of fuel and risk of LBW by preterm and term births

| Stratum | Fuel type | NBW | LBW | ORa (95% CI) | ORb (95% CI) |
|---|---|---|---|---|---|
| Term | Gas | 6668 | 102 | 1.00 | 1.00 |
| | Coal | 239 | 10 | 1.00 (0.48-2.09) | 0.96 (0.46-2.03) |
| | Biomass | 67 | 7 | 1.87 (0.76-4.62) | 1.85 (0.72-4.71) |
| | Electromagnetic | 382 | 9 | 0.91 (0.44-1.89) | 0.84 (0.40-1.76) |
| Preterm | Gas | 297 | 269 | 1.00 | 1.00 |
| | Coal | 31 | 60 | 1.53 (0.88-2.64) | 1.26 (0.67-2.37) |
| | Biomass | 6 | 35 | 5.24 (2.03-13.53) | 3.43 (1.21-9.74) |
| | Electromagnetic | 26 | 44 | 1.47 (0.84-2.58) | 1.38 (0.72-2.65) |
| Moderate preterm | Gas | 292 | 205 | 1.00 | 1.00 |
| | Coal | 30 | 41 | 1.34 (0.73-2.43) | 1.25 (0.64-2.41) |
| | Biomass | 6 | 24 | 4.32 (1.61-11.58) | 3.19 (1.09-9.39) |
| | Electromagnetic | 25 | 35 | 1.58 (0.87-2.87) | 1.48 (0.75-2.93) |
| Very preterm | Gas | 5 | 64 | 1.00 | 1.00 |
| | Coal | 1 | 19 | 0.86 (0.06-12.80) | 0.85 (0.06-12.69) |
| | Biomass | 0 | 11 | - | - |
| | Electromagnetic | 1 | 9 | 0.41 (0.03-5.52) | 0.42 (0.03-5.83) |

a Adjusted for maternal age, education, family income, maternal weight gain, vitamin supplement during pregnancy, preeclampsia, caesarean section, parity, smoking, and ventilation.

b Additional adjustment for gestational week.
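The term and preterm strata in Table 5 can illustrate why accounting for gestational age matters. The sketch below compares a crude biomass-versus-gas odds ratio for LBW with a Mantel-Haenszel estimate stratified by term and preterm birth. Mantel-Haenszel is a classical stratified estimator swapped in here for illustration; the ORs in the table come from the authors' adjusted regression models, so neither number below is expected to match them.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio.

    Each stratum is (exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases).
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Biomass (exposed) vs gas (unexposed); cases are LBW, non-cases are NBW (Table 5 counts).
term_stratum = (7, 67, 102, 6668)
preterm_stratum = (35, 6, 269, 297)

# Crude OR: collapse the two strata into a single 2x2 table.
a, b, c, d = (sum(cells) for cells in zip(term_stratum, preterm_stratum))
crude_or = (a * d) / (b * c)

stratified_or = mantel_haenszel_or([term_stratum, preterm_stratum])

print(f"crude OR = {crude_or:.2f}")
print(f"Mantel-Haenszel OR = {stratified_or:.2f}")
```

Under these counts the stratified estimate is noticeably smaller than the crude one, pointing in the same direction as the attenuation seen in the table when gestational week is added to the model.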