Assessing the Risk of Bias: Critical Appraisal of Medical Literature. September 2010. Korea University College of Medicine, Ahn Hyeong Sik


What is bias?

• systematic error or deviation from the truth
• may overestimate or underestimate
• biased studies lead to misleading results
• can’t measure the presence of bias, only the risk
• rigorous methods minimise the risk of bias
• question: “should I believe the results?”

Biases in randomised controlled trials

Bias: a process that tends to produce results that depart systematically from the true values existing in the study population
• Selection bias: avoid with randomisation and concealment
• Performance bias: avoid with standardisation of care and blinding
• Attrition bias: avoid by analysing all subjects (intention-to-treat analysis)
• Measurement bias: avoid with blinding of outcome assessors and patients

[Figure: flow of a randomised trial, from study sample through allocation of subjects to the control and experimental interventions, then follow up and outcomes. Annotations on the diagram:]
• Selection bias (at allocation): randomisation; concealment
• Performance bias (during the interventions): standardisation of care protocol; blinding of care providers and patients
• Attrition bias (during follow up): drop-outs? cross-over? is everyone accounted for?
• Measurement bias (at outcome assessment): blinding of outcome assessors and patients

Should we use quality scales?

• more than 30 available
• reliability and validity of many scales not established
• different scales lead to conflicting conclusions
• may include criteria not related to bias
• no evidence for numerical weighting of different elements
• how do readers interpret the score?
• not recommended for Cochrane reviews

The Cochrane approach

describe the following for each study in detail:
• random sequence generation
• allocation concealment
• blinding
• incomplete outcome data
• selective outcome reporting
• any other risks

empirical research shows that these components can have a significant effect on results, often leading to exaggerated effects

For each study, in each domain

• is there enough information to understand what happened?
  • if not, rate unclear
• what is your judgement: are you satisfied that the study is at a low risk of bias?
  • yes indicates a low risk
  • no indicates a high risk
  • based on the context of your review, empirical evidence of bias effect, likely direction and magnitude of effect
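The two questions above amount to a small decision rule, which can be sketched as follows. This is only an illustration: the function name `judge_domain` and the reviewer answers are hypothetical, and the real judgement rests on the reviewer's reading of each study, not on a lookup.

```python
from typing import Optional

# The six domains listed for the Cochrane approach.
DOMAINS = [
    "random sequence generation",
    "allocation concealment",
    "blinding",
    "incomplete outcome data",
    "selective outcome reporting",
    "other risks",
]


def judge_domain(enough_information: bool, satisfied_low_risk: Optional[bool]) -> str:
    """Map the reviewer's two answers onto 'unclear', 'low' or 'high' risk."""
    # Not enough information to understand what happened: rate unclear.
    if not enough_information or satisfied_low_risk is None:
        return "unclear"
    # "Yes, I am satisfied" indicates low risk; "no" indicates high risk.
    return "low" if satisfied_low_risk else "high"


# Hypothetical answers for one study: (enough information?, satisfied?).
answers = {
    "random sequence generation": (True, True),
    "allocation concealment": (True, False),
    "blinding": (False, None),
}

# Domains without an answer default to "unclear".
assessment = {d: judge_domain(*answers.get(d, (False, None))) for d in DOMAINS}
for domain, risk in assessment.items():
    print(f"{domain}: {risk} risk of bias")
```

Keeping one judgement per domain, instead of summing them, matches the slide's advice against numerical quality scores.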

Random sequence generation

• occurs at the start of a trial before allocation of participants
• determines the order of allocation into intervention and control groups
• avoids systematic differences between groups
• accounts for known and unknown confounders
• minimises selection bias

Results of 32 comparative studies of anticoagulant therapy for MI patients (Chalmers et al, 1977)

Study design                      Studies   Subjects   Apparent risk reduction
Historical controls               18        9000       62% ± 4
Concurrent non-random controls    8         3000       34% ± 7
Randomised                        6         4000       22% ± 8

Identifying a random sequence

Adequate
• random number table
• computer random number generator
• coin toss
• shuffling cards or envelopes
• throwing dice
• drawing lots

Not adequate
• date of birth
• day of visit
• ID or record number
• alternate allocation
• choice of the clinician or participant
• test results
• intervention availability

Allocation concealment

• occurs at the start of the trial during allocation of participants
• when a person is recruited to the study, no-one knows which group they will be allocated to
• protects the random sequence: prevents changing the order of recruitment, or deciding not to recruit
• strongest empirical evidence showing this is important to results
• minimises selection bias

Selection bias in a trial with foreknowledge of treatment allocation: amniotomy or oxytocin for induction of labour (Bakose & Backstrom, re-analysed by M Keirse)

Bishop score at entry to trial   Allocated oxytocin (even date of birth)   Allocated amniotomy (odd date of birth)
3 or less*                       28                                        7
4 or 5                           56                                        58
6 or more                        29                                        45
Total                            113                                       110

χ² = 16.1, P < 0.00025

* indicates an unfavourable cervix: Keirse hypothesised that such patients would be less likely to be entered in the trial if it were known that they would be allocated amniotomy.

This trial was described as the first “prospective randomised study” between amniotomy and oxytocin for induction of labour in a “totally unselected population”!!!
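The χ² value reported for this table can be checked from the six cell counts alone. The sketch below computes the Pearson chi-square statistic in plain Python; the function name `chi_square` is an illustrative choice, and the code makes no claim about which test the original analysis used to obtain its P value.

```python
def chi_square(table):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if Bishop score were unrelated to allocation.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Rows: Bishop score 3 or less, 4 or 5, 6 or more.
# Columns: allocated oxytocin, allocated amniotomy.
counts = [
    [28, 7],
    [56, 58],
    [29, 45],
]

statistic = chi_square(counts)
print(round(statistic, 1))  # 16.1
```

Most of the statistic comes from the first row: far fewer women with an unfavourable cervix entered the amniotomy arm than chance allocation would predict.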

Identifying allocation concealment

Adequate
• central allocation (phone, web, pharmacy)
• sequentially numbered, identical drug containers
• serially numbered, sealed, opaque envelopes

Not adequate
• random sequence known to staff in advance
• envelopes without all three safeguards
• non-random, predictable sequence

Blinding

• occurs during the intervention and measurement of outcomes
• minimises performance bias
  • different treatment of the two groups
  • participant expectations
• minimises detection bias
  • different measurement of outcomes between the two groups
  • subjective outcomes particularly vulnerable
• can blind the participant, care provider, outcome assessor, other personnel: more than “double blinding”
• check for intention and success of blinding

Schulz KF & Grimes DA 2002 Lancet

Placebo effect: relief of arthritis pain

Identifying blinding

Adequate
• participants and key study personnel blinded
• blinding probably not broken
• outcomes not likely to be influenced

Not adequate
• any of the above not met

Allocation concealment vs blinding

[Figure: timeline centred on randomisation. Concealment of allocation is paired with selection bias; blinding is paired with performance bias.]

Incomplete outcome data

• when complete outcome data is not available for all participants
• can indicate attrition bias
• can have important impact when:
  • enough data is missing to affect the results: no. of participants missing (dichotomous) or effect size (continuous)
  • the no. of people missing is not balanced between groups
  • the reason for absence is related to the study outcomes (e.g. moved away vs adverse event)
• two causes
  • loss of participants to follow up
  • exclusion of participants by trialists

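Losses to follow-up: how much is acceptable?

The “5 and 20 rule”
• 5% or less: bias is likely to be small
• 20% or more: validity is seriously threatened

→ however, this is an oversimplification

→ what matters is how the loss-to-follow-up rate compares with the outcome event rate

• the loss-to-follow-up rate should not exceed the outcome event rate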
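The 5 and 20 rule of thumb, together with the comparison against the outcome event rate, can be sketched as a small check. The function name, the returned fields and the example numbers are all hypothetical, and the slide itself warns that the thresholds oversimplify.

```python
def appraise_loss_to_follow_up(lost, randomised, outcome_event_rate):
    """Apply the 5 and 20 rule of thumb and compare losses with the event rate."""
    loss_rate = lost / randomised
    if loss_rate <= 0.05:
        band = "little bias expected"
    elif loss_rate >= 0.20:
        band = "serious threat to validity"
    else:
        band = "intermediate: judge in context"
    return {
        "loss_rate": loss_rate,
        "band": band,
        # Losses larger than the event rate could hide enough events
        # to change the result, whatever band the loss rate falls in.
        "exceeds_event_rate": loss_rate > outcome_event_rate,
    }


# Hypothetical trial: 30 of 400 participants lost, outcome event rate 5%.
result = appraise_loss_to_follow_up(lost=30, randomised=400, outcome_event_rate=0.05)
print(result)
```

In this made-up example the loss rate of 7.5% sits below the 20% threshold yet still exceeds the 5% event rate, which is the situation the rule of thumb alone would miss.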

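The intention-to-treat principle: preserving randomisation

Principle: once participants have been randomised, analyse them according to the group to which they were originally assigned. The as-randomised analysis must be maintained even when a participant discontinues the trial, does not receive the treatment, or crosses over.

Exception: patients found to be ineligible on blinded reassessment, according to criteria written before randomisation.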

Identifying incomplete outcome data

Adequate
• no missing data
• reasons for missing data not related to outcome
• missing data balanced across groups with similar reasons
• number of participants missing or plausible effect size not enough to change observed effect

Not adequate
• any of the above criteria not met
• ‘as-treated’ analysis with substantial departure from allocated intervention
• missing data imputed using inappropriate methods

Selective outcome reporting

• when outcomes are not reported as planned
  • outcomes missing
  • new outcomes added (can be justified, e.g. adverse events)
  • unexpected statistics, subscales or subgroups
  • reporting that cannot be used in a review
• can indicate ‘within-study publication bias’ or ‘data mining’
• difficult to determine
  • compare methods to results
  • refer to study protocol or trial register
  • look for commonly used outcomes

Identifying selective outcome reporting

Adequate
• protocol is available and all pre-specified outcomes reported in the pre-specified way
• protocol not available but all expected outcomes are reported

Not adequate
• outcomes not reported as planned or expected
• limited information provided for some outcomes (e.g. only direction of effect and significance)

Most studies will be judged ‘unclear’ in this category.

Other potential problems

Adequate
• study appears to be free of other sources of risk

Not adequate
• issues specific to the study design
  • carryover in crossover trials
  • comparability of groups in cluster-randomized trials
• trial stopped early using data-dependent process (including a formal stopping rule)
• extreme baseline imbalance
• possible fraud
• other problem

Relative risk reduction (RRR)

Absolute risk reduction (ARR)

NNT (number needed to treat)

The follow-up period is implicit in the meaning of an NNT.

NNT(hypothetical) = NNT(observed) × (observed time / hypothetical time)
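The time adjustment for an NNT can be tried out numerically. In this sketch the function name `rescale_nnt` is illustrative, the NNT of 9 over 33 months is taken from the interferon example later in these slides, and the 12-month horizon is a hypothetical choice. The formula is only a rough conversion: it assumes the treatment benefit accrues evenly over time.

```python
def rescale_nnt(nnt_observed, observed_time, hypothetical_time):
    """Rescale an NNT to a different follow-up period.

    Implements NNT(hypothetical) = NNT(observed) * (observed time / hypothetical time).
    Both times must be in the same unit.
    """
    return nnt_observed * (observed_time / hypothetical_time)


# NNT of 9 observed over 33 months, re-expressed for 12 months of treatment.
nnt_12_months = rescale_nnt(nnt_observed=9, observed_time=33, hypothetical_time=12)
print(nnt_12_months)  # 24.75
```

A shorter treatment period gives a larger NNT, so NNTs from trials with different follow-up cannot be compared directly.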

How large is the treatment effect?

Event rate = progression of disability by 33 months

Measure                                              Actual trial (interferon study,     Hypothetical example
                                                     Lancet 1998; 352: 1491-7)           with a weak effect
Control event rate (CER), placebo group              50%                                 0.00050%
Experimental event rate (EER), interferon group      39%                                 0.00039%
Relative risk reduction, RRR = (CER − EER)/CER       (50% − 39%)/50% = 22%               (0.00050% − 0.00039%)/0.00050% = 22%
  = 1 − RR
Absolute risk reduction, ARR = CER − EER             50% − 39% = 11%                     0.00050% − 0.00039% = 0.00011%
Number needed to treat, NNT = 1/ARR                  1/11% = 9                           1/0.00011% = 909090
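The three effect measures can be reproduced from the two event rates. The sketch below uses the rates shown for the interferon trial and for the hypothetical weak-effect example; the function name `effect_measures` is an illustrative choice, and rates are passed as proportions, not percentages.

```python
def effect_measures(cer, eer):
    """Return (RRR, ARR, NNT) from control and experimental event rates."""
    arr = cer - eer   # absolute risk reduction
    rrr = arr / cer   # relative risk reduction, equal to 1 - RR
    nnt = 1 / arr     # number needed to treat
    return rrr, arr, nnt


examples = {
    "interferon trial": (0.50, 0.39),
    # 0.00050% and 0.00039% written as proportions.
    "hypothetical weak effect": (0.0000050, 0.0000039),
}

results = {name: effect_measures(cer, eer) for name, (cer, eer) in examples.items()}
for name, (rrr, arr, nnt) in results.items():
    print(f"{name}: RRR = {rrr:.0%}, ARR = {arr:.7f}, NNT = {nnt:,.1f}")
```

Both examples share the same relative risk reduction of 22%, yet the NNT differs by five orders of magnitude, which is why a relative measure on its own says little about clinical importance.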

Interferon for multiple sclerosis

Risk of bias assessment in Cochrane reviews

Risk of bias summary

Here ‘Blinding’ and ‘Incomplete outcome data’ have been assessed for two sets of outcomes

Risk of bias graph

The Newcastle-Ottawa Scale (NOS) for Assessing the Quality of Nonrandomized Studies in Meta-Analysis

• Development
• Applications
• Current Developments

Development: Item Selection

• Newcastle quality assessment form
• Ottawa comprehensive list
• Panel review
• Critical review by experts

Development: Grouping Items

Cohort studies
• Selection of cohorts
• Comparability of cohorts
• Assessment of outcome

Case-control studies
• Selection of cases and controls
• Comparability of cases and controls
• Ascertainment of exposure

Development: Identifying Items

Identify ‘high’ quality choices with a ‘star’

A maximum of one ‘star’ for each item within the ‘Selection’ and ‘Exposure/Outcome’ categories; maximum of two ‘stars’ for ‘Comparability’

NEWCASTLE-OTTAWA QUALITY ASSESSMENT SCALE: COHORT STUDIES

Note: A study can be awarded a maximum of one star for each numbered item within the Selection and Outcome categories. A maximum of two stars can be given for Comparability.

Selection
1) Representativeness of the exposed cohort
   a) truly representative of the average _______________ (describe) in the community
   b) somewhat representative of the average ______________ in the community
   c) selected group of users eg nurses, volunteers
   d) no description of the derivation of the cohort
2) Selection of the non exposed cohort
   a) drawn from the same community as the exposed cohort
   b) drawn from a different source
   c) no description of the derivation of the non exposed cohort
3) Ascertainment of exposure
   a) secure record (eg surgical records)
   b) structured interview
   c) written self report
   d) no description
4) Demonstration that outcome of interest was not present at start of study
   a) yes
   b) no

Comparability
1) Comparability of cohorts on the basis of the design or analysis
   a) study controls for _____________ (select the most important factor)
   b) study controls for any additional factor (This criterion could be modified to indicate specific control for a second important factor.)

Outcome
1) Assessment of outcome
   a) independent blind assessment
   b) record linkage
   c) self report
   d) no description
2) Was follow-up long enough for outcomes to occur
   a) yes (select an adequate follow up period for outcome of interest)
   b) no
3) Adequacy of follow up of cohorts
   a) complete follow up: all subjects accounted for
   b) subjects lost to follow up unlikely to introduce bias: small number lost (> ____ % follow up; select an adequate %), or description provided of those lost
   c) follow up rate < ____% (select an adequate %) and no description of those lost
   d) no statement

Newcastle-Ottawa Quality Assessment Scale: Cohort Studies

Selection (4)

Comparability (1)

Outcome (3)

A study can be awarded a maximum of one star for each numbered item within the Selection and Outcome categories. A maximum of two stars can be given for Comparability.

Selection
1. Representativeness of the exposed cohort
   a) truly representative of the average ___________ (describe) in the community
   b) somewhat representative of the average ___________ in the community
   c) selected group of users eg nurses, volunteers
   d) no description of the derivation of the cohort
2. Selection of the non exposed cohort
   a) drawn from the same community as the exposed cohort
   b) drawn from a different source
   c) no description of the derivation of the non exposed cohort
3. Ascertainment of exposure to implants
   a) secure record (eg surgical records)
   b) structured interview
   c) written self report
   d) no description
4. Demonstration that outcome of interest was not present at start of study
   a) yes
   b) no

In the case of mortality studies, the outcome of interest is still the presence of a disease/incident, rather than death; that is, a statement of no history of disease or incident earns a star.

Comparability

1. Comparability of cohorts on the basis of the design or analysis

   a) study controls for ___________ (select the most important factor)
   b) study controls for any additional factor (This criterion could be modified to indicate specific control for a second important factor.)

Outcome
1. Assessment of outcome
   a) independent blind assessment
   b) record linkage
   c) self report
   d) no description
2. Was follow up long enough for outcomes to occur
   a) yes (select an adequate follow up period for outcome of interest)
   b) no
3. Adequacy of follow up of cohorts
   a) complete follow up: all subjects accounted for
   b) subjects lost to follow up unlikely to introduce bias: small number lost (> ___ % follow up; select an adequate %), or description of those lost
   c) follow up rate < ___% (select an adequate %) and no description of those lost
   d) no statement
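The star-awarding rule for the cohort scale can be sketched as a tally with a cap per category. The caps follow from the rule stated above: four Selection items and three Outcome items at one star each, and one Comparability item worth up to two stars. The names `MAX_STARS` and `total_stars` and the example study are hypothetical; the slides do not define a total score or a threshold.

```python
# Maximum stars per category of the cohort scale.
MAX_STARS = {"Selection": 4, "Comparability": 2, "Outcome": 3}


def total_stars(stars):
    """Validate per-category star counts and return their sum."""
    for category, awarded in stars.items():
        if category not in MAX_STARS:
            raise ValueError(f"unknown category: {category}")
        if not 0 <= awarded <= MAX_STARS[category]:
            raise ValueError(
                f"{category}: {awarded} stars is outside 0..{MAX_STARS[category]}"
            )
    return sum(stars.values())


# Hypothetical cohort study.
example_study = {"Selection": 3, "Comparability": 1, "Outcome": 2}
print(total_stars(example_study), "of", sum(MAX_STARS.values()), "stars")
```

Reporting the per-category counts alongside any total keeps visible where a study lost its stars.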

NEWCASTLE-OTTAWA QUALITY ASSESSMENT SCALE: CASE CONTROL STUDIES

Note: A study can be awarded a maximum of one star for each numbered item within the Selection and Exposure categories. A maximum of two stars can be given for Comparability.

Selection
1) Is the case definition adequate?
   a) yes, with independent validation
   b) yes, eg record linkage or based on self reports
   c) no description
2) Representativeness of the cases
   a) consecutive or obviously representative series of cases
   b) potential for selection biases or not stated
3) Selection of Controls
   a) community controls
   b) hospital controls
   c) no description
4) Definition of Controls
   a) no history of disease (endpoint)
   b) no description of source

Comparability
1) Comparability of cases and controls on the basis of the design or analysis
   a) study controls for _______________ (select the most important factor)
   b) study controls for any additional factor (This criterion could be modified to indicate specific control for a second important factor.)

Exposure
1) Ascertainment of exposure
   a) secure record (eg surgical records)
   b) structured interview where blind to case/control status
   c) interview not blinded to case/control status
   d) written self report or medical record only
   e) no description
2) Same method of ascertainment for cases and controls
   a) yes
   b) no
3) Non-Response rate
   a) same rate for both groups
   b) non respondents described
   c) rate different and no designation

Newcastle-Ottawa Quality Assessment Scale: Case-Control Studies

Selection (4)

Comparability (1)

Exposure (3)

A study can be awarded a maximum of one star for each numbered item within the Selection and Exposure categories. A maximum of two stars can be given for Comparability.

Selection
1. Is the case definition adequate?
   a) yes, with independent validation
   b) yes, eg record linkage or based on self reports
   c) no description
   Note on a): >1 person/record/time/process to extract information, or reference to primary record source such as x-rays or medical/hospital records
   Note on b) and c): e.g. ICD codes in database or self-report with no reference to primary record, or no description
2. Representativeness of the cases
   a) consecutive or obviously representative series of cases
   b) potential for selection biases or not stated
3. Selection of Controls
   a) community controls
   b) hospital controls
   c) no description
4. Definition of Controls
   a) no history of disease (endpoint)
   b) no description of source

Comparability

1. Comparability of cases and controls on the basis of the design or analysis

   a) study controls for ___________ (select the most important factor)
   b) study controls for any additional factor (This criterion could be modified to indicate specific control for a second important factor.)

Exposure

1. Ascertainment of exposure
   a) secure record (eg surgical records)
   b) structured interview where blind to case/control status
   c) interview not blinded to case/control status
   d) written self report or medical record only
   e) no description
2. Same method of ascertainment for cases and controls
   a) yes
   b) no
3. Non-Response Rate
   a) same rate for both groups
   b) non respondents described
   c) rate different and no designation

Applications:

Assess quality of nonrandomized studies

Incorporate assessments in interpretation of meta-analytic results

Design, content and ease of use

Long Term Hormone Replacement Therapy and Coronary Heart Disease Events

Steps of a Cochrane Systematic Review

• Clearly formulated question
• Comprehensive data search
• Unbiased selection and abstraction process
• Critical appraisal of data
• Synthesis of data
• Perform sensitivity and subgroup analyses if appropriate and possible
• Prepare a structured report

Objective

Is there a relationship between hormone replacement therapy and the incidence of coronary heart disease in postmenopausal women?

Inclusion Criteria

Types of studies: case-control, cohort or cross-sectional studies

Population: postmenopausal women

Intervention: women exposed to hormone replacement therapy (estrogen or estrogen + progesterone); ever, current, past

Outcomes: coronary heart disease (events); fatal, non-fatal, both





Search Strategy

Electronic search of:
• MEDLINE (1966 to May 2000)
• Current Contents (to May 2000)

Other data sources:
• review of references cited in retrieved articles





Data Extraction

• 2 independent reviewers selected trials
• 2 independent reviewers extracted data using pre-determined forms
  • study design
  • population characteristics
  • exposure to implants
  • outcomes measures
  • results
• differences resolved by consensus

Results

• 16 case-control or cross-sectional
• 14 cohort

Quantification of Effects

• Exposure (ever, current, past)
• Outcome (fatal, non-fatal, both)
• Effect estimates (EE)
  • Relative Risk (RR)
  • Odds Ratio (OR)
• Adjusted effect estimates
• Effects vs population, follow-up periods, etc. (homogeneity)





Cohort Star Template

[Figure: stars awarded to each cohort study under Selection, Comparability and Outcome. Studies shown: Avila / 90, Cauley / 97, Grodstein / 96, Henderson / 91, Lafferty / 94, Wilson / 85, Wolf / 96, Lauritzen / 83, Ettinger / 96, Petitti / 87, Sourander / 98, Bush / 87, Criqui / 98, Folsom / 95.]

Case-Control Star Template

[Figure: stars awarded to each case-control study under Selection, Comparability and Exposure. Studies shown: Adam / 81, Beard / 89, Croft / 89, Grodstein / 97, Heckbert / 97, LaVecchia / 87, Mann / 94, Pfeffer / 78, Rosenberg / 76, Rosenberg / 80, Rosenberg / 93, Ross / 81, Sidney / 97, Szklo / 84, Talbott / 77, Thompson / 89.]

Adjusted Effect Estimates for Coronary Heart Disease

[Figures: forest plots of adjusted effect estimates for all events, drawn on a logarithmic axis. Panels and the studies shown in each:]
• Case-control studies, HRT: estrogen current use (axis 0.01 to 10): Rosenberg / 76, Talbott / 77, Pfeffer / 78, Rosenberg / 80, Heckbert / 87, LaVecchia / 87, Rosenberg / 93, Mann / 94, Grodstein / 97, Sidney / 97
• Case-control studies, HRT: estrogen past use (axis 0.1 to 10): Rosenberg / 80, Heckbert / 87, LaVecchia / 87, Grodstein / 97, Sidney / 97
• Case-control studies, HRT: estrogen ever use (axis 0.1 to 10): Pfeffer / 78, Rosenberg / 80, Ross / 81, Szklo / 84, Heckbert / 87, LaVecchia / 87, Beard / 89, Croft / 89, Thompson / 89, Rosenberg / 93
• Case-control studies, HRT: estrogen + progestin ever use (axis 0.1 to 10): Heckbert / 87, Thompson / 89, Rosenberg / 93
• Cohort studies, HRT: estrogen current use (axis 0.01 to 10): Bush / 87, Avila / 90, Folsom / 95, Grodstein / 96, Cauley / 97, Criqui / 98, Sourander / 98
• Cohort studies, HRT: estrogen ever use (axis 0.01 to 10): Lauritzen / 83, Wilson / 85, Petitti / 87, Henderson / 91, Lafferty / 94, Folsom / 95, Ettinger / 96, Wolf / 96

Current Development: Validity

• Face/content validity
• Criterion validity
  • compare to more comprehensive scales
  • compare to expert judgement
• Construct validity
  • external criteria: ‘convergent validity’, ‘divergent validity’
  • internal structure: ‘factorial validity’

Current Development: Reliability

• Inter-rater reliability
• Intra-rater reliability

Future Development: Scoring

Identify threshold score distinguishing between ‘good’ and ‘poor’ quality studies

The Newcastle-Ottawa Scale (NOS) for Assessing the Quality of Nonrandomized Studies in Meta-Analysis

www.lri.ca

NOS Quality Assessment Scales:
• Case-control studies
• Cohort studies

Manual for NOS Scales

Documents
