Estimating Sample Size Computer Laboratories Epidemiology and Biostatistics Department Faculty of Medicine Universitas Padjadjaran 2013


Estimating Sample Size

Computer Laboratories

Epidemiology and Biostatistics Department

Faculty of Medicine Universitas Padjadjaran

2013


Reference

• Dahlan, MS. Besar Sampel dan Cara Pengambilan Sampel dalam Penelitian Kedokteran dan Kesehatan. Edisi 3. Jakarta: Salemba Medika; 2008

• Hulley, SB et al. Designing Clinical Research. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2007


Introduction

Whom? What? Design?

How many subjects to sample?


Introduction

• If the sample size is too small, the study may fail to answer its research question

• If the sample size is too large, the study will be more difficult and costly than necessary


Introduction

• Goal: to estimate an appropriate number of subjects for a given study design

• Should be estimated early in the design phase, when major changes are still possible, in case:
– The research design is not feasible
– Different predictor or outcome variables are needed


Reasons for sampling

• Unable to perform total sampling

• Results from representative sample (appropriate number of subjects and sampling technique) can be generalized to population

• More efficient and ethical


Generalization

Study subjects → Intended sample (internal validity)

Intended sample → Accessible population (external validity I)

Accessible population → Study/Target population (external validity II)


Internal validity

• Representative actual sample/study subjects from intended sample
– Same characteristics as the intended sample
– Problems: non-response, drop-out, loss to follow-up


External validity I

• Representative intended sample from accessible population
– Appropriate sample size
– Probabilistic sampling method


External validity II

• Representative accessible population from target/study population


How to get appropriate sample size?

• Appropriate sample size formula
– Can be decided from our research questions/research problems/problem identification

• Correct sample size calculation


Type of research: Specific design

• Diagnostic
– Sensitivity, specificity, PPV, NPV, LR (+), LR (-)

• Prognostic
– Example: What are the prognostic factors of shock in DHF patients?

• Survival analysis
– Example: Is there a mortality rate difference between HIV patients treated with HAART starting at CD4 count 200 and 200 ?


Type of research: Non-specific design

• Descriptive
– To estimate population proportion
  • What is the prevalence of diarrhea in Kecamatan X?
– To estimate population mean
  • What is the mean of FBG level among adults in Kecamatan X?

• Analytic
– To find relationship/association between dependent and independent variable
– To find a (proportion, mean) difference between two or more groups
– To find correlation between variables


Notes

• In one study, it is possible to use more than one sample size formula, due to:
– More than one research question
– Different study designs
  • Cohort and nested case-control


Notes

• State the primary and secondary research questions/hypotheses in advance

• The sample size calculations are always focused on the primary research question/hypothesis


Power of the study (1 – β)

• Results may be different

• Need to be calculated again due to:

– Actual sample/study subjects ≠ intended sample

– r in correlation study is different

– Effect size (p1 – p2, x1 – x2) is different

– Sample size is predetermined


Zα and Zβ*

Value of α or β    Zα (descriptive or two-sided)    Zα (one-sided)    Zβ
1%                 2.58                             2.33              2.33
5%                 1.96                             1.64              1.64
10%                1.64                             1.28              1.28
15%                1.44                             1.04              1.04
20%                1.28                             0.84              0.84

For a two-tailed hypothesis: Z1–α/2

For a one-tailed hypothesis: Z1–α

*From Dahlan, MS, 2008
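The tabulated deviates can be checked against the standard normal distribution. The sketch below is one way to do so in Python; the helper name `z_value` is made up for this illustration. Note that the worked cases in these slides use 1.64 for the one-sided 5% value, which the exact quantile (about 1.645) rounds or truncates to.

```python
from statistics import NormalDist

def z_value(p, two_sided=False):
    """Standard normal deviate for an error probability p (alpha or beta)."""
    # A two-sided test splits p between the two tails of the distribution.
    tail = p / 2 if two_sided else p
    return NormalDist().inv_cdf(1 - tail)

# Print two-sided and one-sided deviates for the error levels in the table.
for p in (0.01, 0.05, 0.10, 0.15, 0.20):
    print(f"{p:.0%}  two-sided {z_value(p, True):.3f}  one-sided {z_value(p):.3f}")
```

Zβ is always read from the one-sided column, because β refers to a single tail.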


Strategies for minimizing sample size and maximizing power

• Use continuous variable (for outcome variable)
– Permits smaller sample size for a given power
– Permits greater power for a given sample size


Strategies for minimizing sample size and maximizing power

• Use paired measurements or matching
– By comparing each subject with herself, it removes the baseline between-subjects part of the variability of the outcome variable
– Example:
  • Change in weight on a diet has less variability than the final weight
  • Final weight is highly correlated with initial weight


Strategies for minimizing sample size and maximizing power

• Increase the precision
– Standardizing the measurement methods
– Training and certifying observer
– Refining the instrument
– Automating the instrument
– Repeating the measurement


Strategies for minimizing sample size and maximizing power

• Use unequal group sizes
– In general, the gain in power when the size of one group is increased to twice the size of the other is considerable
– Tripling or quadrupling one of the groups provides progressively smaller gains
– Example: in a case-control study, 1 case : 2 controls
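The diminishing return from adding controls can be illustrated with the approximation given in Hulley et al.: if n cases are needed with one control per case, then with c controls per case roughly n' = [(c + 1) / 2c] × n cases are needed. The sketch below applies it; the function name and the starting value of 100 cases are hypothetical.

```python
import math

def cases_needed(n_equal, c):
    """Approximate number of cases needed with c controls per case,
    given n_equal cases needed under a 1:1 design."""
    return math.ceil((c + 1) / (2 * c) * n_equal)

# Hypothetical study that would need 100 cases with one control per case.
for c in (1, 2, 3, 4):
    n_cases = cases_needed(100, c)
    print(f"{c} control(s) per case: {n_cases} cases, {n_cases * c} controls")
```

Going from one to two controls per case saves far more cases than going from three to four, while the total number of subjects keeps rising.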


Strategies for minimizing sample size and maximizing power

• Use more common outcome (with caution!)
– More frequent outcome
– Enroll subjects at greater risk of developing that outcome
– Extend the follow-up period
– Loosen the definition of what constitutes an outcome


Common Errors to Avoid

• Estimating sample size late during the design of the study (most common)

• Percentage or rate misinterpreted as numeric

• No planning for dropouts or subjects with missing data

• Equal vs unequal sample sizes

• Two-sided alternative hypothesis or statistical analysis (Z1–α/2), but one-sided (Z1–α) used during sample size determination


Literature vs Judgement*

Judgement

• Categorical variable
– Descriptive: probability of type I error = α; precision = d
– Analytic: probability of type I error = α (one/two-sided); probability of type II error = β; p1 – p2

• Numerical variable
– Descriptive: probability of type I error = α; precision = d
– Analytic: probability of type I error = α (one/two-sided); probability of type II error = β; x1 – x2

Literature or pilot study

• Categorical variable
– Descriptive: proportion
– Analytic: proportion in control/non-exposed/standard group = P2

• Numerical variable
– Descriptive: standard deviation
– Analytic: combined standard deviation = S; correlation coefficient = r

*From Dahlan, MS, 2008


Case I

• Students have a variety of reasons for doing research while in medical school. As part of the Jatinangor program you are interested in reproductive health. The aim of your study is to determine the prevalence of puberty (defined by menarche or wet dreams) among primary school children in Kecamatan Jatinangor. There is no previous study on the prevalence of puberty in that community.


Answer

a. The most appropriate study design: cross-sectional study
Outcome variable: prevalence of puberty (history of menarche or wet dreams, Yes-No, nominal)
Predictor variable: -

b. The most appropriate statistical analysis for the study: Descriptive statistics


Answer

c. The target population: All primary schools in Kecamatan Jatinangor
The accessible population: Primary schools in Kecamatan Jatinangor
Study unit of the study: Students aged 7 – 12 years

d. The appropriate sampling technique for the study: Stratified random sampling, cluster sampling


Answer

e. Using a 95% confidence interval (α = 0.05) and a precision of 10% (within 10% of the true value), the sample size needed is:

• For α = 0.05 then Z0.975 = 1.96

• Make sure npq ≥ 5: 97(0.5)(0.5) = 24.25 ≥ 5

• The researcher will need at least 97 students aged 7 – 12 years
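The figure of 97 is consistent with the usual formula for estimating a single proportion, n = Zα² × p × q / d², taking p = 0.5 because no previous estimate exists (the npq check above uses the same 0.5). The following is a minimal sketch under that assumption; the function name is invented for this example.

```python
import math

def n_for_proportion(z_alpha, p, d):
    """Sample size to estimate a proportion p with absolute precision d."""
    q = 1 - p
    return math.ceil(z_alpha ** 2 * p * q / d ** 2)

# Case I: alpha = 0.05 (Z = 1.96), p assumed to be 0.5, precision d = 0.10.
n = n_for_proportion(1.96, 0.5, 0.10)
print(n)

# Rule-of-thumb check from the slide: n * p * q should be at least 5.
print(n * 0.5 * 0.5 >= 5)
```

Choosing p = 0.5 maximizes p × q, so it gives the most conservative (largest) sample size when the true prevalence is unknown.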


Case II

• Suppose we wish to know the random blood glucose level (mg/dl) among medical students in Faculty of Medicine X


Answer

a. The most appropriate study design: Cross-sectional study
Outcome variable: random blood glucose level (numeric)
Predictor variable: -

b. The most appropriate statistical analysis for the study: Descriptive statistics


Answer

c. The target population: All medical students in Faculty of Medicine X
The accessible population: All medical students in Faculty of Medicine X
The study unit of the study: Medical student

d. The appropriate sampling technique for the study: Simple random sampling, stratified random sampling


Answer

The aspects that can be determined by the researcher from the beginning

• d (precision)

The aspects that must be searched by the researcher from literature or a pilot study

• s (standard deviation)

f. Based on a pilot study, ten students were selected and their random blood glucose levels were measured. Using α = 0.05 and a precision of 2.5 mg/dl, the estimated sample size needed for the study is:


Answer

• For α = 0.05 then Z0.975 = 1.96 ; d = 2.5 mg/dl ; s = 13.47 mg/dl

• The researcher will need at least 112 medical students
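The 112 above matches the standard formula for estimating a single mean, n = (Zα × s / d)². As a rough sketch, with a function name chosen only for this example:

```python
import math

def n_for_mean(z_alpha, s, d):
    """Sample size to estimate a mean with standard deviation s and precision d."""
    return math.ceil((z_alpha * s / d) ** 2)

# Case II: Z = 1.96, s = 13.47 mg/dl from the pilot study, d = 2.5 mg/dl.
n = n_for_mean(1.96, 13.47, 2.5)
print(n)
```

Because d is squared in the denominator, halving the precision to 1.25 mg/dl would roughly quadruple the required sample.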


Case III

• A medical student from the 2010 batch is preparing to conduct a study (for his minor thesis) on risk factors of diarrhea. Let's say the hypothesis is that exclusively breastfed babies (first six months of life) will be less dehydrated (mild to moderate vs severe) during diarrhea at age 7 to 11 months. The researcher wishes to conduct the study in Hasan Sadikin Hospital, Bandung, for the period January – December 2011.


Answer

a. The most appropriate study design? Case-control, cross-sectional study

Outcome variable : dehydration during diarrhea (mild to moderate or severe, nominal) Predictor variable : history of exclusive breastfeeding (yes or no, nominal)

b. The most appropriate statistical analysis for the study: Chi-square test (assuming there are no confounding variables)


Answer

c. The target population: Babies aged 7 to 11 months diagnosed with diarrhea and treated in the Pediatric Emergency Unit, Hasan Sadikin Hospital, Bandung, January – December 2011
The accessible population: Babies aged 7 to 11 months diagnosed with diarrhea and treated in the Pediatric Emergency Unit, Hasan Sadikin Hospital, Bandung, January – December 2011
The study unit of the study: Medical record

d. The appropriate sampling technique for the study: Simple random sampling


Answer

The aspects that can be determined by the researcher from the beginning

• α
• β
• p1 – p2

The aspects that must be searched by the researcher from literature or a pilot study

• p2 (depends on the study design)


Answer

• Using α = 0.05, β = 0.2, and a difference of proportion considered by the researcher to be clinically significant = 0.2, the estimated sample size needed for the study is

• For α = 0.05 then Z0.95 = 1.64 (one-sided) and β = 0.2 then Z0.8 = 0.84 ; p1 – p2 = 0.2

Cross-sectional:
p2 = 18/35 = 0.51
p1 = 0.2 + p2 = 0.2 + 0.51 = 0.71
q1 = 1 – p1 = 1 – 0.71 = 0.29
q2 = 1 – p2 = 1 – 0.51 = 0.49
p = (p1 + p2)/2 = (0.71 + 0.51)/2 = 0.61
q = 1 – p = 1 – 0.61 = 0.39

Case-control:
p2 = 17/32 = 0.53
p1 = 0.2 + p2 = 0.2 + 0.53 = 0.73
q1 = 1 – p1 = 1 – 0.73 = 0.27
q2 = 1 – p2 = 1 – 0.53 = 0.47
p = (p1 + p2)/2 = (0.73 + 0.53)/2 = 0.63
q = 1 – p = 1 – 0.63 = 0.37


Answer

Cross-sectional study

The researcher will need at least 73 exclusively breastfed babies and 73 non-exclusively breastfed babies diagnosed with diarrhea


Answer

Case-control study

• For the case group, the researcher will need at least 71 babies diagnosed with diarrhea plus severe dehydration

• For the control group, the researcher will need at least 71 babies diagnosed with diarrhea plus mild to moderate dehydration
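Both Case III results (73 per group for the cross-sectional design, 71 per group for the case-control design) are reproduced by the common formula for comparing two unpaired proportions, n = [Zα√(2pq) + Zβ√(p1q1 + p2q2)]² / (p1 – p2)², with p = (p1 + p2)/2. The sketch below assumes that formula; the function name is illustrative only.

```python
import math

def n_two_proportions(z_alpha, z_beta, p1, p2):
    """Per-group sample size for comparing two unpaired proportions."""
    p = (p1 + p2) / 2          # average of the two proportions
    q = 1 - p
    term_alpha = z_alpha * math.sqrt(2 * p * q)
    term_beta = z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return math.ceil((term_alpha + term_beta) ** 2 / (p1 - p2) ** 2)

# One-sided alpha = 0.05 (1.64) and beta = 0.2 (0.84), as in the slides.
n_cross_sectional = n_two_proportions(1.64, 0.84, 0.71, 0.51)
n_case_control = n_two_proportions(1.64, 0.84, 0.73, 0.53)
print(n_cross_sectional, n_case_control)
```

The two designs differ only in where p2 comes from: the proportion in the non-exposed group for the cross-sectional design, and the proportion among controls for the case-control design.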


Case IV

• The researcher wishes to compare fasting blood glucose level (mg/dl) between medical students of Faculty of Medicine X with and without family history of DM type II. The subjects were matched according to age and sex.


Answer

a. The most appropriate study design: cross-sectional study

Outcome variable: fasting blood glucose level (numeric)
Predictor variable: family history of DM type II (yes or no, nominal)

b. The most appropriate statistical analysis for the study: Paired t-test with Wilcoxon signed-rank test as an alternative


Answer

c. The target population: All medical students in Faculty of Medicine X The accessible population: All medical students in Faculty of Medicine X The study unit of the study: Medical student

d. The appropriate sampling technique for the study? Matching technique


Answer

The aspects that can be determined by the researcher from the beginning

• α
• β
• x1 – x2

The aspects that must be searched by the researcher from literature or a pilot study

• S (combined standard deviation from two observations)


Answer

Based on a pilot study, six pairs of students with and without family history of DM type II were selected

Using α = 0.05, β = 0.2, and a difference of mean considered by the researcher to be clinically significant = 2.5 mg/dl, the estimated sample size needed for the study is


Answer

• For α = 0.05 then Z0.975 = 1.96 (two-sided) and β = 0.2 then Z0.8 = 0.84

• x1 – x2 = 2.5 ; s1 = 4.88 mg/dl, n1 = 6 ; s2 = 3.74 mg/dl, n2 = 6

The researcher will need at least 24 medical students with family history of DM type II and 24 medical students without family history of DM type II (matched according to age and sex)
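The 24 pairs can be reproduced with the paired-means formula n = [(Zα + Zβ) × S / (x1 – x2)]², provided S is taken as the pooled standard deviation of the two pilot groups. That reading of S is an assumption made here because it matches the stated result; the function names are also invented for this sketch.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Combined (pooled) standard deviation of two groups."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)

def n_paired_means(z_alpha, z_beta, s, diff):
    """Number of pairs needed to detect a mean difference diff."""
    return math.ceil(((z_alpha + z_beta) * s / diff) ** 2)

# Case IV pilot data: s1 = 4.88, s2 = 3.74, six students in each group.
s = pooled_sd(4.88, 6, 3.74, 6)
n_pairs = n_paired_means(1.96, 0.84, s, 2.5)
print(round(s, 2), n_pairs)
```

For comparison, the unpaired two-group formula carries an extra factor of 2 and would roughly double the number of subjects per group.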


Case V

• The investigator wants to conduct a cross-sectional study to find out whether DM has a negative effect on the treatment outcome of TB. Data will be collected from a hospital. The register shows that 50 people meet the inclusion criteria for this study. In a previous study, after 6 months of therapy, 9.6% of cultured sputum specimens from non-diabetic patients were still positive for Mycobacterium tuberculosis (RR = 2.65).


Answer

a. Outcome variable: response for treatment (Yes-No, nominal)
Predictor variable: random blood glucose level (numeric)

b. The most appropriate statistical analysis for the study: Chi-square test


Answer

c. The target population: All TB patients with DM in Hospital X
The accessible population: Adult TB patients aged 20 to 65 years diagnosed with DM and treated in Hospital X
The study unit of the study: Medical record

d. The appropriate sampling technique for the study? Simple random sampling


• What is the power of the study when the sample comes from total sampling? (Using α = 0.05): take the sample size formula, substitute the available sample size, and solve for Zβ
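One possible way to carry out this step is sketched below. Rearranging the two-proportion formula from Case III gives Zβ = [(p1 – p2)√n – Zα√(2pq)] / √(p1q1 + p2q2), and the power is the standard normal probability below Zβ. Several inputs are assumptions made only for illustration and are not stated in the case: a one-sided Zα of 1.64, p1 obtained as RR × p2, and an equal split of the 50 eligible patients into two groups of 25.

```python
import math
from statistics import NormalDist

def power_two_proportions(n_per_group, p1, p2, z_alpha):
    """Approximate power for comparing two proportions with n subjects per group."""
    p = (p1 + p2) / 2
    q = 1 - p
    numerator = abs(p1 - p2) * math.sqrt(n_per_group) - z_alpha * math.sqrt(2 * p * q)
    z_beta = numerator / math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return NormalDist().cdf(z_beta)

p2 = 0.096        # culture-positive proportion in non-diabetic patients
p1 = 2.65 * p2    # assumption: RR = p1 / p2
power = power_two_proportions(25, p1, p2, 1.64)  # 25 per group is hypothetical
print(f"Estimated power: {power:.2f}")
```

If the actual group sizes are unequal, or a two-sided test is planned, the inputs should be changed accordingly before interpreting the result.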


Case VI

• Let's say the researcher has a hypothesis that serum 25(OH)-vitamin D level (ng/ml) is positively correlated with bone mineral density, estimated using the quantitative ultrasound index (QUI), among postmenopausal women in Kecamatan Jatinangor


Answer

a. The most appropriate study design: Case-control, cross-sectional study

Serum 25(OH)-vitamin D levels (numeric) Quantitative ultrasound index (numeric)

b. What is the most appropriate statistical analysis for the study? Correlation methods (Pearson or Spearman’s rho coefficient correlation)


Answer

c. The target population: Postmenopausal women in Kecamatan Jatinangor The accessible population: Women who come to Posbindu Lansia in all villages The study unit of the study: Postmenopausal woman

d. The appropriate sampling technique for the study: Consecutive sampling


Answer

The aspects that can be determined by the researcher from the beginning

• α
• β

The aspects that must be searched by the researcher from literature or a pilot study

• r (Pearson's correlation coefficient)

Based on pilot study, with 10 participants

For α = 0.05 then Z0.95 = 1.64 (one-sided) and β = 0.2 then Z0.8 = 0.84
r = 0.78 (using SPSS or Excel)


Answer

• The researcher will need at least 9 postmenopausal women
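The result of 9 agrees with the usual sample size formula for a correlation coefficient, n = [(Zα + Zβ) / (0.5 × ln((1 + r)/(1 – r)))]² + 3, using the Case VI values. A minimal sketch under that assumption, with an illustrative function name:

```python
import math

def n_for_correlation(z_alpha, z_beta, r):
    """Sample size to detect a correlation coefficient r."""
    # Fisher's z transformation of the expected correlation.
    fisher_z = 0.5 * math.log((1 + r) / (1 - r))
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)

# Case VI: one-sided alpha = 0.05 (1.64), beta = 0.2 (0.84), r = 0.78.
n = n_for_correlation(1.64, 0.84, 0.78)
print(n)
```

The required sample is small here only because the pilot correlation is strong; a weaker expected r, for example 0.3, would call for a much larger sample.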


Review

• Study Design
– Non-specific or specific?
– Observational (cross-sectional, case-control, cohort) or experimental?

• Variables
– Predictor/independent and outcome/dependent
– Scale of measurement
  • Categorical (nominal or ordinal)
  • Numerical

• Paired vs unpaired observation

• Hypothesis
– Type I and type II error (α, β)
– Power of the study (1 – β)
– One or two-sided alternative hypothesis

• Statistical analysis

• Sampling technique
– Probabilistic sampling technique
– Non-probabilistic sampling technique