
Viewpoint 

Enhancing causal interpretations of quality

improvement interventions

G Cable

Abstract
In an era of chronic resource scarcity it is critical that quality improvement professionals have confidence that their project activities cause measured change. A commonly used research design, the single group pre-test/post-test design, provides little insight into whether quality improvement interventions cause measured outcomes. A re-evaluation of a quality improvement programme designed to reduce the percentage of bilateral cardiac catheterisations for the period from January 1991 to October 1996 in three catheterisation laboratories in a north eastern state in the USA was performed using an interrupted time series design with switching replications. The accuracy and causal interpretability of the findings were considerably improved compared with the original evaluation design. Moreover, the re-evaluation provided tangible evidence in support of the suggestion that more rigorous designs can and should be more widely employed to improve the causal interpretability of quality improvement efforts. Evaluation designs for quality improvement projects should be constructed to provide a reasonable opportunity, given available time and resources, for causal interpretation of the results. Evaluators of quality improvement initiatives may infrequently have access to randomised designs. Nonetheless, as shown here, other very rigorous research designs are available for improving causal interpretability. Unilateral methodological surrender need not be the only alternative to randomised experiments.
(Quality in Health Care 2001;10:179–186)

Keywords: causal interpretations; quality improvement; interrupted time series design; implementation fidelity

Actuating beneficial change is the raison d'être of quality improvement efforts. In an era when private entities and governments are increasingly less willing to pay for care, quality improvement initiatives must additionally be able to demonstrate superiority over competing strategies, or the option of doing nothing at all.1–4 In the current milieu, every major quality improvement project should include an evaluation research design that permits rigorous testing of the extent to which improvement efforts actually cause measured change. The most commonly employed research designs—single group pre-test/post-test designs and one shot case studies (that is, a single group design with a post-test only)—provide little evidence of the causal impact of improvement activities.5 Consequently, methodological rigour is often unnecessarily sacrificed when designs are available which can improve the causal interpretability of the results. The purpose of this paper is to make a case for improving the rigour of research designs in order to enhance the causal interpretability of quality improvement efforts.

Key messages
+ In an era of chronic resource scarcity it is critical that quality improvement professionals have confidence that their project activities cause measured change.
+ A commonly used research design, the single group pre-test/post-test design, provides little insight into whether quality improvement interventions cause measured outcomes.
+ Many other quasi-experimental designs can be employed in most contexts instead of single group pre-test/post-test designs, and provide much greater causal interpretability of the findings of the quality improvement project evaluation. One of the most powerful of these is interrupted time series designs.

Implications of these findings for quality improvement
The adoption of more rigorous research designs to evaluate quality improvement efforts will enable quality improvement professionals to know with greater confidence whether the efforts actually work. In turn, projects determined to be successful based on this more compelling evidence can more confidently be diffused as best practices. A latent benefit of improving the level of rigour in the evaluation research design is that the perception of quality improvement projects as “scientific” will be enhanced.

Quality in Health Care 2001;10:179–186

The Quality Institute, Atlantic Health System, 325 Columbia Turnpike, Florham Park, NJ 07932, USA
G Cable, director of research

Correspondence to: Dr G Cable, [email protected]

Accepted 30 March 2001

www.qualityhealthcare.com

The first section of the paper provides background regarding the comparative virtues of different research designs for making causal statements about intervention effects. This section also contains a description of a rigorous quasi-experimental design that can often be employed in quality improvement studies instead of less rigorous single group pre-test/post-test designs. The second part of the paper contains a comparison of the results of a re-evaluation of a quality improvement project conducted to reduce the rates of bilateral cardiac catheterisations with the results of the original evaluation. The original evaluation used a single group pre-test/post-test design while the quasi-experimental design described in the first part of the paper is employed in the re-evaluation. The comparison shows the value of improving the rigour of the evaluation research design for enhancing the causal interpretability of improvement efforts.

Comparing the internal validity of research designs: randomised trials, single group pre-test/post-test designs, and quasi-experiments
The primary purpose of research design is to provide investigators with evidence regarding the degree to which an intervention causes measured outcomes. A causal relationship is one in which four conditions are met: (1) a measurable cause precedes a measurable effect and the timing of the effect is consistent with the nature of the mechanism behind the cause; (2) the magnitude (including duration) of the effect is proportional to that of the cause; (3) the effect is not present in the absence of the action; and (4) all plausible competing explanations for the effect can be ruled out. The degree to which an investigator can infer cause is reflected in the level of internal validity of the research design. Designs with high internal validity provide the investigator with great confidence, ceteris paribus, that a manipulated variable—for example, in this context, the activities of a quality improvement effort—caused the observed change in the outcome variable(s).5 6

Randomised controlled trials are widely recognised as having high levels of internal validity primarily due to the creation by the investigator of two or more mathematically equivalent comparison groups.5–7 Group equivalence is achieved through the process of random assignment, assignment that assures that each case has the same probability of being assigned to the two (or more) arms of the trial. When sample sizes are sufficiently large, random assignment of cases to treatment arms provides great confidence before the implementation of the intervention that the arms are equivalent on all known and unknown factors that might affect the outcome measures employed in the trial. Hence, upon normal completion of the trial investigators can have great confidence, quantifiable by statistical intervals and other measures, that differences between the arms of the trial are due largely to variables they themselves have manipulated.5–7 In many contexts, however, it is not feasible to design and implement a randomised controlled trial because investigators have little control over how cases are assigned to treatment arms, or an appropriate comparison group simply cannot be assembled. Unfortunately, in quality improvement research this often leads investigators to use the single group pre-test/post-test design or one closely related as the primary fall back design, resulting in a large decrease in the level of internal validity.

In contrast to randomised controlled designs, single group pre-test/post-test designs have comparatively low internal validity in part because they provide the investigator with no counterfactual—that is, no evidence regarding what would have happened in the absence of the intervention. In the absence of a relevant counterfactual, changes in the outcome measure from pre-test to post-test can be attributed to many factors other than the intervention.8 These factors include, but are not limited to, such things as external events, cyclical variations in the outcome measure, and undiscovered changes in the instrumentation employed to collect outcome data. The fall off in the level of internal validity is wholly unnecessary in most instances, given the availability of several rigorous quasi-experimental designs.5 6 8

One family of quasi-experimental designs with generally high levels of internal validity is interrupted time series designs. Time series are data collected at equal intervals over time. These data may be collected for any unit of analysis. In medicine, time series data might be collected to capture weekly medication ordering errors in a hospital, to monitor variations in hourly temperature readings for a septic patient, or as monthly rates of bilateral cardiac catheterisations (as in the example presented below). Time series data can then be represented through graphical techniques and mathematical modelling in a way that permits the identification of systematic (and potentially causal) factors and non-systematic factors.5 9–11

Interrupted time series, as the name implies, are time series during which an event occurs that is thought to “interrupt” the existing numerical variation in the series in some systematic manner, as when administration of the antibiotics (the event) interrupts the hourly variation in the temperature of the septic patient. Following the event the series is expected to change in level, slope, and/or shape based on some a priori understanding of the mechanism through which the event causes change. An interrupted time series design can be depicted as follows with a commonly used notation of research design5 6:

O1 O2 O3 O4 O5 O6 X O7 O8 O9 O10 O11 O12 O13 O14 O15 O16 O17 O18

In this example the series is 18 equal interval periods long. The “O”s depict the collection of data for each period (O for observation). The position of the “X” indicates that the event occurred between periods 6 and 7.
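The design above can also be sketched numerically. The following fragment is purely illustrative — the 18-period series, its baseline, and the size of the event's effect are all invented, and the difference-in-means estimate deliberately ignores the autocorrelation and trend that a formal time series model would handle:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 18-period interrupted series: O1..O6, X, O7..O18.
n_pre, n_post = 6, 12
baseline = 80.0   # invented pre-event level of the outcome
effect = -15.0    # invented abrupt permanent effect of the event X

series = np.concatenate([
    baseline + rng.normal(0, 2, n_pre),            # O1..O6 (pre-event)
    baseline + effect + rng.normal(0, 2, n_post),  # O7..O18 (post-event)
])

# Step indicator: 0 before the event, 1 after it.
step = np.concatenate([np.zeros(n_pre), np.ones(n_post)])

# Naive estimate of the event's effect (difference in means); a formal
# analysis would also model autocorrelation and cyclical variation.
naive_effect = series[step == 1].mean() - series[step == 0].mean()
```

The step indicator mirrors the position of the “X” in the notation; the modelling techniques discussed next replace the naive difference in means with estimates that account for the systematic structure of the series.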

Interrupted time series designs can be used to determine whether the empirical post-event series is different in some systematic fashion from the series before the event. Powerful mathematical modelling techniques have been developed to determine the effect of the event on the series, independent of other systematic factors such as other discrete events and seasonal or cyclical variations in the series.10 11 The net effect is a design that provides much greater confidence than single group pre-test/post-test designs that the event “interrupting” the series actually caused post-event changes in the series.

An especially powerful interrupted time series design is one with “switching replications” in which time series of identical length are assembled for two or more non-equivalent (that is, not the product of random assignment) but nonetheless similar groups—for example, two hospitals in the same area. Each series in the design will have experienced the event of interest, but at different periods in the series. The switching replications design for two groups in which the event has occurred is depicted as follows:

group 1: O1 O2 O3 O4 O5 O6 X O7 O8 O9 O10 O11 O12 O13 O14 O15 O16 O17 O18
group 2: O1 O2 O3 O4 O5 O6 O7 O8 O9 O10 O11 O12 X O13 O14 O15 O16 O17 O18

In this example the event occurred between periods 6 and 7 in the first group under study and between periods 12 and 13 in the second group. Each series can serve as a comparison—a counterfactual—for the other, because the series are coeval. As a result, the internal validity of this design greatly exceeds that of single group pre-test/post-test designs as well as that of a single group interrupted time series.5 8
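The counterfactual logic of the two-group layout can be illustrated with simulated data (every number below is invented): between the two event points, the group that has not yet experienced the event supplies the comparison for the group that has.

```python
import numpy as np

rng = np.random.default_rng(1)
periods = np.arange(1, 19)  # 18 equal-interval periods

def simulate(event_after, baseline=70.0, effect=-10.0):
    """Series with an invented abrupt permanent effect after `event_after`."""
    step = (periods > event_after).astype(float)
    return baseline + effect * step + rng.normal(0, 1.5, periods.size)

group1 = simulate(event_after=6)   # X between periods 6 and 7
group2 = simulate(event_after=12)  # X between periods 12 and 13

# In periods 7-12 group 2 has not yet experienced the event, so it
# serves as the counterfactual for group 1 over that window.
window = (periods >= 7) & (periods <= 12)
contrast = group1[window].mean() - group2[window].mean()
```

A negative contrast over the window is consistent with the event, not some shared background trend, driving group 1's post-event change.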

When a study uses concomitant time series in which the events occur at different periods as in the switching replications design, researchers can more readily detect the potential presence of “history” threats to internal validity. History threats are events or processes other than the event of interest that can affect the variation of the outcome measure(s) under study. For this reason, history threats to internal validity are sometimes referred to as “external” events.8 History threats often go undetected and potentially confound the investigator's ability to imply cause from the effect of the interventions.5 6 8 However, when the design includes at least two series as in a switching replications design, a “common” history threat will probably register as a change that occurs at similar periods, and of similar magnitude and duration, in all series. Hence, the threat posed by the external event to the causal interpretability of the intervention effect is essentially neutralised because the investigator can detect whether and when it is present in more than one series.5 8
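The detection logic can be made concrete with a small simulation (all values hypothetical): a common external “history” event is added to two groups whose interventions occur at different periods, and it registers as a shift of similar timing and magnitude in both series.

```python
import numpy as np

rng = np.random.default_rng(2)
periods = np.arange(1, 19)

# Invented effects: interventions after period 6 (group 1) and period 12
# (group 2), plus a common external ("history") event after period 9.
history = -5.0 * (periods > 9)
group1 = 70.0 - 10.0 * (periods > 6) + history + rng.normal(0, 1.0, 18)
group2 = 70.0 - 10.0 * (periods > 12) + history + rng.normal(0, 1.0, 18)

def shift_at_period_9(series):
    """Mean change from periods 7-9 to periods 10-12."""
    post = series[(periods > 9) & (periods <= 12)].mean()
    pre = series[(periods > 6) & (periods <= 9)].mean()
    return post - pre

shift1 = shift_at_period_9(group1)  # group 1: already post-intervention
shift2 = shift_at_period_9(group2)  # group 2: still pre-intervention
```

Because the shift appears in the not-yet-treated group as well, it cannot be attributed to either intervention — which is exactly how a second series neutralises the history threat.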

In the next section the switching replications design is used to re-evaluate a quality improvement effort implemented in the mid 1990s in one state in the USA. The re-evaluation shows how the use of this design provides greater insight into the extent to which improvement efforts caused the measured outcomes than the single group pre-test/post-test design used in the original evaluation.

Comparison of switching replications interrupted time series design with a single group pre-test and post-test design: PRO Cardiac Catheterisation Project
BACKGROUND TO THE ORIGINAL PROJECT
Since the implementation of a programme developed by the Health Care Financing Administration in the mid 1990s to improve care provided to Medicare patients, medical peer review organisations (PROs) have completed hundreds of improvement projects.12 13 In 1994 a PRO in a north eastern state in the USA initiated a quality improvement project designed to reduce the use of bilateral cardiac catheterisations in the Medicare population (aged 65 and over) within a few participating catheterisation laboratories in the state.14 The project was initiated in response to new guidelines which held that right heart catheterisations are unnecessary for diagnostic purposes in the absence of specific clinical indications (box 1).15

The intervention for each laboratory included the following components: (1) a policy change requiring documentation to perform a bilateral catheterisation; (2) staff education regarding the policy change; (3) a multidisciplinary approach to the development, implementation, and assessment of the improvement plan; and (4) development of plans of action to address policy non-compliance (box 2).14

In the original analysis the PRO used a single group pre-test/post-test design to evaluate the success of the improvement efforts in each laboratory. The original evaluation measured change from a pre-intervention period from January to December 1993 (that is, one data point for the pre-intervention period) to each quarter of 1995 and the 1995 period as a whole (one data point for the post-intervention period).14 No data were analysed from 1994, the year in which interventions were implemented. This design can be depicted for each laboratory as:

O1993 X1994 O1995

where the “O”s represent measurement periods and “X” the timing of the intervention. Hence, it is clear from the notation depicting this design that temporal variation in bilateral catheterisations went unmeasured at intervals within the pre-intervention period, which included all of 1993 up to the point when interventions were implemented in 1994 in each laboratory. This omitted information could have provided insight into seasonal or cyclical variation in bilateral catheterisation rates, as well as trending evidence.

In 1991 the American College of Cardiology and the American Heart Association published guidelines regarding cardiac catheterisation and catheterisation laboratories in response to the increases in the numbers of catheterisations being performed, changes in the reason why they were being performed, and the setting in which they were being performed—that is, other than in hospital based laboratories. In addition to clinical practice issues, the guidelines addressed ethical concerns including patient safety, conflicts of interest related to ownership, operation, self-referrals, and advertising of services.

Box 1 Rationale behind new guidelines for right heart catheterisations.

The results of the original evaluation indicated that bilateral catheterisation rates declined in each laboratory in 1995 from baseline levels of 1993. In laboratory A the decline from 1993 to 1995 was from 87.9% to 41%, in laboratory B rates fell from 82% to 40.4%, and in laboratory C the decline was from 97.9% to 58.1%.14 The use of this single group pre-test and post-test design in which no data were collected during 1994, the year in which the projects were implemented in each laboratory, makes it difficult to assess the nature of the impact of the individual interventions. More importantly, this design is poorly equipped to assess whether the improvement efforts caused the measured changes. Moreover, the design ignores the potentially unique effect on rates in one laboratory in which it was decided to augment the project activities common to all laboratory interventions by changing the packaging of the catheterisation trays. This component of the intervention required the cardiologist to make a specific request in order to catheterise both sides of the heart, which compelled him or her to provide the clinical justification to catheterise the second side of the heart—for example, pulmonary hypertension, mitral or aortic valve disease.14

METHODS USED TO RE-EVALUATE THE PRO PROJECT
An interrupted time series design with switching replications was employed to assess the impact of the individual interventions in the three laboratories. In this re-evaluation, monthly bilateral catheterisation rates were analysed for the period from January 1991 to October 1996, 70 consecutive months of bilateral catheterisation rates for each laboratory. The data were assembled from Medicare discharge databases, the same source of data as was used in the original evaluation.14 This extended duration was chosen to provide sufficiently long pre-intervention and post-intervention periods to examine the impact of the interventions after controlling for potentially confounding systematic variation (discussed in greater detail below). To maintain confidentiality, the three laboratories are referred to hereafter as laboratories A, B, and C, respectively. Laboratory A implemented their intervention in July 1994, laboratory B in October, and laboratory C in December, respectively. Our re-evaluation design can be depicted as follows:

Laboratory A: O91,Jan O … O O94,Jul X O94,Aug O … O O96,Oct
Laboratory B: O91,Jan O … O O94,Oct X O94,Nov O … O O96,Oct
Laboratory C: O91,Jan O … O O94,Dec X O95,Jan O … O O96,Oct

It should be apparent that the switching replications design permits a more rigorous test than the original design of the possibility that the implementation of laboratory A's intervention in July independently affected the bilateral catheterisation rates for laboratories B and C before implementation of their interventions. By extension, this design also facilitates the evaluation of whether laboratory C's rates were independently affected by the implementation of laboratory A's and/or laboratory B's intervention.

STATISTICAL ANALYSIS
The quantitative impact of individual interventions was determined using Box and Tiao's methods. Based on Box and Jenkins' autoregressive integrated moving average (ARIMA) models, Box–Tiao methods allow the modelling of systematic structures in each laboratory's series, including potentially confounding seasonal and cyclical effects.5 10 11 In other words, this part of the modelling process permits the analyst to filter out the potentially confounding effects of systematic variation in time series in order to determine the magnitude and structure of the effect of the interventions. After first constructing ARIMA models for each laboratory, variables representing the timing of the impact of the intervention in each laboratory were added. Three possible forms of the impact were tested in each laboratory: (1) an abrupt permanent effect; (2) a gradual permanent effect; and (3) an abrupt temporary effect. Statistical hypotheses were tested at α=0.05. Ljung–Box's Q statistic was employed as a goodness of fit measure for the models of each laboratory. The statistic tests the null hypothesis that model residuals are not autocorrelated. Thus, failure to reject this hypothesis indicates that the empirical model fits the data well.16 Analyses were conducted using

+ A policy change requiring documentation to perform a cardiac catheterisation on both sides of the heart, including a checklist of the indications for the bilateral catheterisation to be part of the patient's medical record
+ Staff in-service education regarding the policy change
+ A multidisciplinary approach to the development, implementation, and assessment of the improvement plan—specifically, inclusion of a representative from all professions potentially affected by the plan
+ A mechanism for monitoring the implementation of the plan
+ Development of plans of action to address policy non-compliance (for example, what the administrative steps would be should a physician refuse to comply with policy changes)

Box 2 Elements common to all laboratory interventions.


Some evidence exists of a “local” history threat to the internal validity of the re-evaluation in which an external event or events has affected bilateral catheterisation rates in only one of the laboratories.5 8 Specifically, in laboratory B, but not A or C, something other than the intervention appears to have initiated a long monotonic decline in rates beginning in about November 1993 that continued throughout the remainder of the period of study. This could be in part the result of a change in the composition of patients in the service area of laboratory B (for example, a younger healthier population), practice changes within the laboratory, or a change in how bilateral catheterisations were measured. Since we do not have these data for laboratory B, or any data for laboratories near it, we are unable to determine whether this downward trend was something unique to laboratory B or the result of event(s) common to other laboratories in proximity to it. Regardless of this, it is evident that the identification of this change in bilateral catheterisation rates in laboratory B beginning in November 1993 was also not possible with the original PRO research design.

The results of Box and Tiao modelling indicate that the intervention in laboratory A was associated with a mean decrease in bilateral catheterisations of almost 2% each month (t=–0.40, p=0.04) in the post-intervention period as part of an ARIMA (0,1,1) model (Ljung–Box Q, p=0.95; that is, we fail to reject the hypothesis that model residuals are not autocorrelated, indicating that the model fits the data well). The data from laboratory B were first transformed into natural logarithmic form to achieve homogeneity of variance in the series (a necessary prerequisite in Box–Jenkins models).10 11 The intervention in laboratory B was associated with a not statistically significant and small percentage decrease of 0.0005 per month in the post-intervention period as part of an ARIMA (0,1,1) model (Ljung–Box Q, p=0.80). The data from laboratory C were also modelled using a natural logarithm transformation. The intervention at laboratory C was associated with a mean percentage decrease of 0.03 per month (t=–4.81, p=0.02) in an ARIMA (0,1,1) model without a constant (Ljung–Box Q, p=0.99).
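The laboratory data themselves are not reproduced here, but the logic of the Box–Tiao step — difference the series, estimate the intervention term, then check residual autocorrelation with Ljung–Box's Q — can be sketched on invented data. In this simplified sketch the series, effect size, and event month are all hypothetical, and the MA(1) term of the full ARIMA (0,1,1) model is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented 70-month series: random-walk noise plus an abrupt permanent
# intervention effect (omega) after month 42.
n, t_event, omega = 70, 42, -25.0
step = (np.arange(n) >= t_event).astype(float)
series = 85.0 + omega * step + np.cumsum(rng.normal(0, 1.0, n))

# First differencing (the "1" in ARIMA(0,1,1)) turns the permanent
# step into a single pulse, so omega can be estimated by least squares.
dy = np.diff(series)
pulse = np.diff(step)
omega_hat = np.linalg.lstsq(pulse[:, None], dy, rcond=None)[0][0]
resid = dy - omega_hat * pulse

def ljung_box_q(x, h=12):
    """Ljung-Box Q: large values indicate autocorrelated residuals."""
    x = x - x.mean()
    m = x.size
    lags = np.arange(1, h + 1)
    acf = np.array([np.dot(x[k:], x[:-k]) for k in lags]) / np.dot(x, x)
    return m * (m + 2) * np.sum(acf**2 / (m - lags))

q = ljung_box_q(resid)  # compare with a chi-squared(h) critical value
```

A Q statistic small relative to the chi-squared critical value corresponds to the large Ljung–Box p values reported above: the residuals show no evidence of autocorrelation, so the model fits the data well.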

Finally, laboratories A and C are in close geographical proximity to each other (a few miles apart within the same city), raising the possibility that “cross talk” regarding laboratory A's intervention occurred which began a de facto intervention in laboratory C before the implementation of the formal intervention in laboratory C. We therefore examined whether the implementation of laboratory A's intervention had a measurable effect on laboratory C's bilateral catheterisation rates, independent of laboratory C's formal intervention. Although fig 3 provides some graphical evidence in support of this hypothesis, the evidence from the Box–Tiao modelling process suggests that no statistically significant decrease occurred in the data in laboratory C as a result of the implementation of the intervention in laboratory A. Moreover, no statistical evidence existed that the October intervention in laboratory B had an independent effect on the rates in laboratory C. Similarly, no evidence was seen that the intervention in laboratory A had an independent effect on bilateral catheterisation rates in laboratory B.

DISCUSSION OF RE-EVALUATION RESULTS

Our results strongly suggest that the qualityimprovement interventions reduced bilateralcatheterisation rates in two of the three labora-tories, laboratories A and C. The evidenceindicates that the intervention in laboratory Aresulted in an immediate fall in bilateralcatheterisation rates that continued during theperiod of study. The eV ect of the interventionin laboratory C was somewhat delayed andsmaller in magnitude. Visual evidence suggests

  Figure 3 Percentage bilateral catheterisations at laboratory C monthly from January 1991 to October 1996, withdelineation of pre-intervention and post-intervention periods for laboratory C, and laboratories A and B.

   J  a  n   9   1

   M  a  r   9   1

   M  a  y   9   1

   J  u   l   9   1

   S  e  p   9   1

   N  o  v   9   1

   J  a  n   9   2

   M  a  r   9   2

   M  a  y   9   2

   J  u   l   9   2

   S  e  p   9   2

   N  o  v   9   2

   J  a  n   9   3

   M  a  r   9   3

   M  a  y   9   3

   J  u   l   9   3

   S  e  p   9   3

   N  o  v   9   3

   J  a  n   9   4

   M  a  r   9   4

   M  a  y   9   4

   J  u   l   9   4

   S  e  p   9   4

   N  o  v   9   4

   J  a  n   9   5

   M  a  r   9   5

   M  a  y   9   5

   J  u   l   9   5

   S  e  p   9   5

   N  o  v   9   5

   J  a  n   9   6

   M  a  r   9   6

   M  a  y   9   6

   J  u   l   9   6

   S  e  p   9   6

120

100

80

60

40

20

0

   %    b

   i   l  a   t  e  r  a   l  c  a   t   h  e   t  e  r   i  s  a   t   i  o  n  s

Lab A intervention (July)

Pre-intervention decline begins (June 1994)

Lab B intervention (Oct)

Lab C intervention (Dec)

485050

54

7070

60

67

48

4743

5555

56

64

7876

5344

38

43

66

94

84

83

94

88

8280

9392

97 97

979797 979797

97

9393939190

9797

96

9696 95

969696

100 100 100100 100

100 100100 100

100 100 100 100100

100100

184 Cable

www.qualityhealthcare.com


that near the end of the period of study the declines might have ceased in laboratory C. Laboratory A was the laboratory that augmented the activities common to all of the laboratory interventions by changing the packaging of the catheterisation trays, thus requiring that cardiologists make special requests to catheterise both sides of the heart. The immediate large magnitude of the effect resulting from the intervention in laboratory A suggests that this innovation may have increased the success of the intervention. Perhaps of greater significance for our purposes is the fact that the success of the change in packaging of catheterisation trays could not have been discovered using the original single group pre-test/post-test design. Hence, the importance of using an appropriately rigorous research design is brought into greater relief.

Our findings contrast clearly with those of the original evaluation, which described large numerical decreases “following” the quality improvement efforts (that is, between 1993 and 1995) in all of the catheterisation laboratories.14 The original analysis did not, however, tell the entire story of the quality improvement efforts. Rates did, indeed, fall in all the laboratories between 1993 and 1995, but more rigorous evidence from the re-evaluation suggests that the improvement efforts were efficacious in only two of the three laboratories.

The design employed in our re-evaluation therefore provides demonstrably greater confidence in the causal effect of the improvement efforts in laboratories A and C, as well as evidence regarding the magnitude and duration of the effects (through the Box and Tiao models). Moreover, in contrast to the findings of the original evaluation, the switching replications design provides greater evidence that the intervention in laboratory B was not effective.
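The Box and Tiao intervention models used in the re-evaluation require dedicated time series software, but the core logic of estimating a post-intervention level and trend change can be sketched with ordinary least squares segmented regression. The sketch below is illustrative only: the series, the intervention month, and the effect sizes are all invented, and segmented regression is a simpler substitute for, not a reproduction of, the Box and Tiao approach.

```python
import numpy as np

# Invented monthly series: 70 months of bilateral catheterisation rates (%)
# with an assumed intervention at month 36 that drops the level and
# steepens the downward trend.
rng = np.random.default_rng(0)
n, t0 = 70, 36
t = np.arange(n)
y = 80.0 - 0.05 * t + rng.normal(0, 2, n)   # pre-intervention level and drift
y[t0:] += -15.0 - 0.4 * (t[t0:] - t0)       # level drop plus trend change

# Segmented-regression design matrix: intercept, baseline trend,
# post-intervention level change, post-intervention trend change.
level = (t >= t0).astype(float)
trend = np.where(t >= t0, t - t0, 0.0)
X = np.column_stack([np.ones(n), t, level, trend])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2, b3 = beta
print(f"estimated level change: {b2:.1f}%")
print(f"estimated trend change: {b3:.2f}% per month")
```

An immediate, sustained drop (as in laboratory A) shows up as a large level-change coefficient; a delayed, smaller effect (as in laboratory C) appears instead as a modest change in trend.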

Potentially undetected external events threaten our ability to attribute cause with even greater confidence to the interventions in laboratories A and C. Specifically, there may have been events external to the quality improvement intervention that occurred ahead of, or at the same time as, the implementation of the improvement activities in laboratories A and C that can explain post-intervention changes in the series. For example, in March 1994 the US government’s Agency for Health Care Policy and Research and National Heart, Lung, and Blood Institute jointly released practice guidelines for the diagnosis and management of unstable angina.17 The guideline contains recommendations regarding the use of cardiac catheterisation. We cannot therefore rule out the possibility that some part of the decline in bilateral catheterisation rates in both laboratories can be attributed to the momentum for change in practice created by the release of these guidelines. However, this threat also would have affected the original evaluation. Moreover, the single group pre-test/post-test design used in the original evaluation is subject to a panoply of additional threats.5 6 8 In sum, we believe our evaluation research design has radically improved the accuracy and the causal interpretability of the findings. Moreover, the re-evaluation provides tangible evidence in support of our position that more rigorous designs can and should be more widely employed to improve the causal interpretability of quality improvement efforts.

Conclusions

This paper has attempted to make a case for improving the rigour of evaluation designs used in quality improvement projects in order to enhance the causal interpretability of the results. We have focused on a particular interrupted time series design, switching replications, as one time series design that can be used in many contexts in place of designs with lower internal validity. We recognise that in some contexts a switching replications design is not feasible because the intervention cannot be implemented in a second group. When working under this constraint, a single group interrupted time series will still provide greater ability to make causal inferences than the seemingly ubiquitous single group pre-test/post-test design.5 6

Two especially compelling interrupted time series alternatives to a single group pre-test/post-test design are the single group interrupted design with a removed treatment and the single group interrupted design with a non-equivalent dependent variable.5 The removed treatment design can be used when the investigators have the ability to institute the intervention and then subsequently remove it, preferably at a randomly selected post-intervention period.5 8 An example of the design is depicted below for a time series of 18 periods:

O1 O2 O3 O4 O5 O6 X O7 O8 O9 O10 O11 O12 R O13 O14 O15 O16 O17 O18

Again, “X” indicates the timing of the implementation of the quality improvement intervention. “R” depicts the timing of the removal of the intervention. The design increases causal interpretability over a simple interrupted time series by providing the investigator with a more powerful test of both the presence and absence of the intervention.5
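Under the same invented-numbers caveat, the removed-treatment schedule above can be checked by comparing the series before X, while the intervention is in place, and after R: a real effect should appear when the intervention is introduced and disappear when it is removed.

```python
import numpy as np

# Invented 18-period series following the removed-treatment schedule:
# the intervention (X) is in place for periods 7-12 and removed (R)
# thereafter, matching the O1..O18 notation above.
rng = np.random.default_rng(1)
periods = np.arange(1, 19)
active = ((periods >= 7) & (periods <= 12)).astype(float)  # 1 while X is in place

baseline = 50.0
effect = -10.0                               # assumed effect while X is active
y = baseline + effect * active + rng.normal(0, 1, periods.size)

# The design's two tests: the mean should fall when X is introduced
# and recover when X is removed.
pre, during, post = y[:6].mean(), y[6:12].mean(), y[12:].mean()
print(f"pre={pre:.1f} during={during:.1f} post={post:.1f}")
```

Observing both the drop and the recovery is far harder to explain by an external event than a single pre/post difference would be.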

The interrupted time series design with a non-equivalent dependent variable can be depicted as follows:

Outcome measure: O1 O2 O3 O4 O5 O6 X O7 O8 O9 O10 O11 O12 O13 O14 O15 O16 O17 O18

Related measure: O1 O2 O3 O4 O5 O6 X O7 O8 O9 O10 O11 O12 O13 O14 O15 O16 O17 O18

This is a single group interrupted time series design because both series are data collected from the same unit of analysis (for example, floor, patient, or hospital). The first series are data for the outcome measure expected to be affected by the quality improvement intervention. The second is a time series coeval with the first that normally varies in the same way as the outcome measure, but is not expected to change after the intervention. Thus, if the improvement effort works as designed, investigators would expect to see changes only in the post-intervention series of the outcome measure.
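A minimal numeric sketch of this logic, with an invented outcome series, related series, and intervention month: both series share the same background variation, but only the outcome carries a built-in intervention effect, so only its pre/post mean should shift.

```python
import numpy as np

# Invented 20-period series from the same unit: the outcome and the
# "non-equivalent dependent variable" share background variation, but only
# the outcome is expected to respond to the intervention at period 10.
rng = np.random.default_rng(2)
n, t0 = 20, 10
common = rng.normal(0, 3, n)                 # shared background variation
outcome = 60 + common + np.where(np.arange(n) >= t0, -12.0, 0.0)
related = 60 + common + rng.normal(0, 1, n)  # no intervention effect

def pre_post_change(series):
    """Post-intervention mean minus pre-intervention mean."""
    return series[t0:].mean() - series[:t0].mean()

print(f"outcome shift: {pre_post_change(outcome):.1f}")
print(f"related shift: {pre_post_change(related):.1f}")
```

A shift in the outcome with no corresponding shift in the related series is what the design counts as evidence that the intervention, rather than some shared external shock, produced the change.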

These variants of the interrupted time series design provide powerful alternatives to single group pre-test/post-test designs. Given the near chronic paucity of resources available to provide care and conduct research, it is imperative that quality improvement projects are able to demonstrate their effectiveness.1 2 It is not sufficient to show that measures changed for the better “following” interventions and then to assume that the change was caused by the intervention: post hoc, ergo propter hoc. The presence of favourable post-intervention changes in outcome measures may lead an organisation to believe that the quality improvement effort was successful, regardless of the internal validity of the evaluation design, and to continue to expend resources on the “successful” quality improvement programme. As long as outcomes are favourable, the inference that the improvement effort worked will have no real consequences. If outcomes deteriorate, however, no mechanism would exist to identify the causes of the deterioration since the original evaluation design was ill suited to determine whether the intervention initially worked.

Evaluation designs for quality improvement projects should be constructed to provide a reasonable opportunity, given available time and resources, for causal interpretation of the results. Evaluators of quality improvement initiatives may infrequently have access to randomised designs. Nonetheless, as we have shown here, other very rigorous research designs are available for improving causal interpretability. Unilateral methodological surrender need not be the only alternative to randomised experiments.

1 Smith S, Freeland M, Heffler S, et al. The next ten years of health spending: what does the future hold? The Health Expenditures Projection Team. Health Affairs 1998;17:128–40.

2 Iglehart J. The American health care system—expenditures. N Engl J Med 1999;340:70–6.

3 Kuttner R. The American health care system—employer-sponsored health coverage. N Engl J Med 1999;340:248–52.

4 Kuttner R. The American health care system—Wall Street and health care. N Engl J Med 1999;340:664–8.

5 Cook TD, Campbell DT. Quasi-experimentation: design and analysis issues for field settings. Boston: Houghton Mifflin, 1979.

6 Campbell DT, Stanley JC. Experimental and quasi-experimental designs for research. Boston: Houghton Mifflin, 1963.

7 Meinert C. Clinical trials: design, conduct and analysis. New York: Oxford University Press, 1986.

8 Mohr L. Impact analysis for program evaluation. Chicago: The Dorsey Press, 1988.

9 Tukey J. Exploratory data analysis. Reading, MA: Addison-Wesley, 1977.

10 Box GEP, Tiao GC. Intervention analysis with applications to economic and environmental problems. J Am Stat Assoc 1975;70:70–9.

11 Pankratz A. Forecasting with dynamic regression models. New York: John Wiley and Sons, 1983.

12 Bodenheimer T. The American health care system: the movement for improved quality in health care. N Engl J Med 1999;340:488–92.

13 Jencks SF, Wilensky GR. The health care quality improvement initiative: a new approach to quality assurance in Medicare. JAMA 1992;268:900–3.

14 New Jersey Peer Review Organization (PRO). Reducing the use of combined right/left heart catheterisation in Medicare patients with uncomplicated coronary heart disease. New Jersey: PRO of New Jersey, 1997.

15 Pepine CJ, Allen HD, Bashore TM, et al. American College of Cardiology/American Heart Association guidelines for cardiac catheterization and cardiac catheterization laboratories. Ad Hoc Task Force on Cardiac Catheterization. Circulation 1991;84:2213–47.

16 Ljung GM, Box GEP. On a measure of lack of fit in time series models. Biometrika 1978;65:297–303.

17 Agency for Health Care Policy and Research/National Heart, Lung and Blood Institute. Unstable angina: diagnosis and management. Clinical practice guideline. Washington: Public Health Service, 1994.
