Biostatistics Course - units.it/borelli/magnagraecia/150713dispensa.pdf


Biostatistics Course

biostatistics course: describing data

Mon 27 July
• introduction, what biostatistics is, examples drawn from published research.
• the technology: the R software, the language, the R Commander interface, the R Studio suite.
• descriptive statistics: centrality and dispersion, graphical representation of data, frequencies, measures of central tendency, measures of dispersion and entropy.
• probability: the random variables of biostatistics; the Gaussian, Bernoulli/binomial, and Poisson random variables: density, probability, expected value, and variance; introduction to simulating random events; from Čebišev's inequality to overdispersion of data.
• the most dangerous equation: the standard error, what it is for and what it is not for.
• estimates: point estimates and interval estimates, the confidence interval, the confidence level; exact methods and bootstrap methods.
• decision criteria, information and probability; the p-value: what it is, what it is not, what it is for and what it is not for; likelihood, deviance, and information criteria.

[email protected] • www.dmi.units.it/borelli


ORIGINAL CONTRIBUTION

Medicine Residents’ Understanding of the Biostatistics and Results in the Medical Literature

Donna M. Windish, MD, MPH; Stephen J. Huot, MD, PhD; Michael L. Green, MD, MSc

PHYSICIANS MUST KEEP CURRENT with clinical information to practice evidence-based medicine (EBM). In doing so, most prefer to seek evidence-based summaries, which give the clinical bottom line [1], or evidence-based practice guidelines [1-3]. Resources that maintain these information summaries, however, currently include a limited number of common conditions [4]. Thus, to answer many of their clinical questions, physicians need to access reports of original research. This requires the reader to critically appraise the design, conduct, and analysis of each study and subsequently interpret the results.

Several surveys in the 1980s demonstrated that practicing physicians, particularly those with no formal education in epidemiology and biostatistics, had a poor understanding of common statistical tests and limited ability to interpret study results [5-7]. Many physicians likely have increased difficulty today because more complicated statistical methods are being reported in the medical literature [8]. They may be able to understand the analysis and interpretation of results in only 21% of research articles [8].

Educators have responded by increasing training in critical appraisal and biostatistics throughout the continuum of medical education. Many medical schools currently provide some formal teaching of basic statistical concepts [9]. As part of the Accreditation Council for Graduate Medical Education’s practice-based learning and improvement competency, residents must demonstrate ability in “locating, appraising, and assimilating evidence from scientific studies related to their patients’ problems and apply knowledge of study designs and statistical methods to the appraisal of

Author Affiliations: Department of Internal Medicine, Yale University School of Medicine, New Haven, Connecticut.
Corresponding Author: Donna M. Windish, MD, MPH, Yale Primary Care Residency Program, 64 Robbins St, Waterbury, CT 06708 ([email protected]).

Context: Physicians depend on the medical literature to keep current with clinical information. Little is known about residents’ ability to understand statistical methods or how to appropriately interpret research outcomes.

Objective: To evaluate residents’ understanding of biostatistics and interpretation of research results.

Design, Setting, and Participants: Multiprogram cross-sectional survey of internal medicine residents.

Main Outcome Measure: Percentage of questions correct on a biostatistics/study design multiple-choice knowledge test.

Results: The survey was completed by 277 of 367 residents (75.5%) in 11 residency programs. The overall mean percentage correct on statistical knowledge and interpretation of results was 41.4% (95% confidence interval [CI], 39.7%-43.3%) vs 71.5% (95% CI, 57.5%-85.5%) for fellows and general medicine faculty with research training (P < .001). Higher scores in residents were associated with additional advanced degrees (50.0% [95% CI, 44.5%-55.5%] vs 40.1% [95% CI, 38.3%-42.0%]; P < .001); prior biostatistics training (45.2% [95% CI, 42.7%-47.8%] vs 37.9% [95% CI, 35.4%-40.3%]; P = .001); enrollment in a university-based training program (43.0% [95% CI, 41.0%-45.1%] vs 36.3% [95% CI, 32.6%-40.0%]; P = .002); and male sex (44.0% [95% CI, 41.4%-46.7%] vs 38.8% [95% CI, 36.4%-41.1%]; P = .004). On individual knowledge questions, 81.6% correctly interpreted a relative risk. Residents were less likely to know how to interpret an adjusted odds ratio from a multivariate regression analysis (37.4%) or the results of a Kaplan-Meier analysis (10.5%). Seventy-five percent indicated they did not understand all of the statistics they encountered in journal articles, but 95% felt it was important to understand these concepts to be an intelligent reader of the literature.

Conclusions: Most residents in this study lacked the knowledge in biostatistics needed to interpret many of the results in published clinical research. Residency programs should include more effective biostatistics training in their curricula to successfully prepare residents for this important lifelong learning skill.

JAMA. 2007;298(9):1010-1022 www.jama.com

1010 JAMA, September 5, 2007—Vol 298, No. 9 (Reprinted) ©2007 American Medical Association. All rights reserved.

Downloaded From: http://jama.jamanetwork.com/ on 10/16/2013


The tooth dataset

     gender  il1b   smoke  areainfl
1    F       etero  low    39.97
2    M       wt     low    24.01
3    F       etero  low    35.77
4    M       etero  high   58.65
5    M       etero  low    27.71
6    M       etero  high   48.36
7    M       mut    low    44.97
8    F       wt     high   56.09
9    M       wt     high   68.82
10   F       etero  low    22.08
..   ..      ..     ..     ..
65   M       mut    low    51.27
66   F       wt     low    27.71
67   F       wt     high   68.45
68   F       wt     low    39.40
69   F       wt     low    19.95

Basic plots

Which plots are the most appropriate for 'illustrating':

• gender

• il1b

• smoke

• areainfl
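One reasonable answer, sketched in R: bar charts for the categorical variables (gender, il1b, smoke) and a histogram for the continuous areainfl. The data frame contains only the 15 rows printed above (rows 1 to 10 and 65 to 69 of 69), so the plots are illustrative only; with the full dataset loaded the same calls apply unchanged. The object name `tooth` is an assumption.

```r
# Only the rows of the tooth dataset printed in the handout (1-10 and 65-69);
# replace this with the full 69-row data frame if it is available.
tooth <- data.frame(
  gender   = c("F", "M", "F", "M", "M", "M", "M", "F", "M", "F",
               "M", "F", "F", "F", "F"),
  il1b     = c("etero", "wt", "etero", "etero", "etero", "etero", "mut", "wt",
               "wt", "etero", "mut", "wt", "wt", "wt", "wt"),
  smoke    = c("low", "low", "low", "high", "low", "high", "low", "high",
               "high", "low", "low", "low", "high", "low", "low"),
  areainfl = c(39.97, 24.01, 35.77, 58.65, 27.71, 48.36, 44.97, 56.09,
               68.82, 22.08, 51.27, 27.71, 68.45, 39.40, 19.95),
  row.names = c(1:10, 65:69)
)

op <- par(mfrow = c(2, 2))                   # 2 x 2 grid of panels
# Categorical variables: bar charts of the absolute frequencies
barplot(table(tooth$gender), main = "gender")
barplot(table(tooth$il1b),   main = "il1b")
barplot(table(tooth$smoke),  main = "smoke")
# Continuous variable: histogram of its distribution
hist(tooth$areainfl, main = "areainfl", xlab = "areainfl")
par(op)                                      # restore the previous layout
```

A box plot or a dot plot would serve equally well for areainfl; a pie chart is usually a weaker choice than a bar chart for the categorical variables.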


'Bivariate' plots

Which plots are the most appropriate for 'illustrating':

• areainfl vs. gender

• il1b vs. smoke
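A possible R sketch for the two pairs: side-by-side box plots when a continuous variable is split by a categorical one (areainfl by gender), and a mosaic plot for two categorical variables (il1b by smoke). As before, the data frame holds only the 15 printed rows, and the names `tooth` and `il1b_smoke` are illustrative.

```r
# Rows of the tooth dataset printed in the handout (1-10 and 65-69 of 69)
tooth <- data.frame(
  gender   = c("F", "M", "F", "M", "M", "M", "M", "F", "M", "F",
               "M", "F", "F", "F", "F"),
  il1b     = c("etero", "wt", "etero", "etero", "etero", "etero", "mut", "wt",
               "wt", "etero", "mut", "wt", "wt", "wt", "wt"),
  smoke    = c("low", "low", "low", "high", "low", "high", "low", "high",
               "high", "low", "low", "low", "high", "low", "low"),
  areainfl = c(39.97, 24.01, 35.77, 58.65, 27.71, 48.36, 44.97, 56.09,
               68.82, 22.08, 51.27, 27.71, 68.45, 39.40, 19.95),
  row.names = c(1:10, 65:69)
)

il1b_smoke <- table(il1b = tooth$il1b, smoke = tooth$smoke)  # two-way counts

op <- par(mfrow = c(1, 2))
# Continuous vs. categorical: one box plot of areainfl per gender
boxplot(areainfl ~ gender, data = tooth, main = "areainfl vs. gender",
        xlab = "gender", ylab = "areainfl")
# Categorical vs. categorical: mosaic plot, tile areas proportional to counts
mosaicplot(il1b_smoke, main = "il1b vs. smoke")
par(op)
```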

Tables

Use a table to present the data on:

• gender

• il1b

• areainfl

• areainfl vs. gender

• il1b vs. smoke
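A minimal R sketch of these tables on the 15 printed rows. Categorical variables can be tabulated directly; areainfl is continuous, so it is first grouped into classes with `cut()`. The class limits (0, 30, 45, 60, 75) are an arbitrary illustrative choice, not taken from the course.

```r
# Rows of the tooth dataset printed in the handout (1-10 and 65-69 of 69)
tooth <- data.frame(
  gender   = c("F", "M", "F", "M", "M", "M", "M", "F", "M", "F",
               "M", "F", "F", "F", "F"),
  il1b     = c("etero", "wt", "etero", "etero", "etero", "etero", "mut", "wt",
               "wt", "etero", "mut", "wt", "wt", "wt", "wt"),
  smoke    = c("low", "low", "low", "high", "low", "high", "low", "high",
               "high", "low", "low", "low", "high", "low", "low"),
  areainfl = c(39.97, 24.01, 35.77, 58.65, 27.71, 48.36, 44.97, 56.09,
               68.82, 22.08, 51.27, 27.71, 68.45, 39.40, 19.95),
  row.names = c(1:10, 65:69)
)

# One-way tables: absolute and relative frequencies
table(tooth$gender)
prop.table(table(tooth$il1b))

# Continuous variable: group into classes (illustrative limits), then count
areainfl_class <- cut(tooth$areainfl, breaks = c(0, 30, 45, 60, 75))
table(areainfl_class)

# Two-way tables with row and column totals
addmargins(table(areainfl_class, gender = tooth$gender))
addmargins(table(il1b = tooth$il1b, smoke = tooth$smoke))
```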

Descriptive statistics

How can we summarize, in a 'univariate' way:

• gender

• il1b

• smoke

• areainfl

and in a 'bivariate' way:

• areainfl vs. gender

• il1b vs. smoke
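A sketch of common numerical summaries in R, again on the 15 printed rows only (so the numbers will differ from those of the full dataset): centrality and dispersion for areainfl, relative frequencies for the categorical variables, group means of areainfl by gender, and row proportions for il1b by smoke.

```r
# Rows of the tooth dataset printed in the handout (1-10 and 65-69 of 69)
tooth <- data.frame(
  gender   = c("F", "M", "F", "M", "M", "M", "M", "F", "M", "F",
               "M", "F", "F", "F", "F"),
  il1b     = c("etero", "wt", "etero", "etero", "etero", "etero", "mut", "wt",
               "wt", "etero", "mut", "wt", "wt", "wt", "wt"),
  smoke    = c("low", "low", "low", "high", "low", "high", "low", "high",
               "high", "low", "low", "low", "high", "low", "low"),
  areainfl = c(39.97, 24.01, 35.77, 58.65, 27.71, 48.36, 44.97, 56.09,
               68.82, 22.08, 51.27, 27.71, 68.45, 39.40, 19.95),
  row.names = c(1:10, 65:69)
)

# Continuous variable: centrality and dispersion
summary(tooth$areainfl)                  # min, quartiles, mean, max
c(mean = mean(tooth$areainfl), median = median(tooth$areainfl),
  sd = sd(tooth$areainfl), IQR = IQR(tooth$areainfl))

# Categorical variables: relative frequencies (the mode is the most frequent level)
lapply(tooth[c("gender", "il1b", "smoke")], function(x) prop.table(table(x)))

# Bivariate, continuous by categorical: mean areainfl within each gender
area_by_gender <- tapply(tooth$areainfl, tooth$gender, mean)
area_by_gender

# Bivariate, categorical by categorical: smoke distribution within each il1b level
prop.table(table(il1b = tooth$il1b, smoke = tooth$smoke), margin = 1)
```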

Normality

What can we say about the normality of areainfl?
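One way to approach the question in R, sketched on the 15 printed rows: compare the histogram with a fitted normal density, inspect a normal Q-Q plot, and run a Shapiro-Wilk test. With so few observations the test has little power, so a large p-value should be read as "no evidence against normality", not as proof of it.

```r
# areainfl values of the rows printed in the handout (1-10 and 65-69 of 69)
area <- c(39.97, 24.01, 35.77, 58.65, 27.71, 48.36, 44.97, 56.09,
          68.82, 22.08, 51.27, 27.71, 68.45, 39.40, 19.95)

op <- par(mfrow = c(1, 2))
# Histogram on the density scale with the fitted normal curve overlaid
hist(area, freq = FALSE, main = "areainfl", xlab = "areainfl")
curve(dnorm(x, mean = mean(area), sd = sd(area)), add = TRUE)
# Normal Q-Q plot: points near the line are compatible with normality
qqnorm(area)
qqline(area)
par(op)

# Shapiro-Wilk test, H0: the data come from a normal distribution
sw <- shapiro.test(area)
sw
```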


where the second equality follows from (B.21) and the o(h) notation is discussed in Appendix B.8. In the case of cellular molecules, this implies that, having lived to age at least x, the probability that a molecule degrades in the time interval (x, x + h), where h is small, is approximately proportional to the length h of the interval. This example will be pursued further in Sections 2.11.1 and 4.1.
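The excerpt begins mid-derivation, so the lifetime distribution it refers to is not shown here. Assuming an exponential lifetime T with rate λ (the memoryless case the argument suggests), a small R check that P(x < T ≤ x + h | T > x) is the same at every age x and close to λh for small h. The values of `lambda`, `h` and the ages are hypothetical.

```r
lambda <- 0.2   # hypothetical degradation rate (per unit time)
h      <- 0.01  # a small interval length

# P(x < T <= x + h | T > x) for an exponential lifetime T
cond_prob <- function(x) {
  surv_x  <- pexp(x, rate = lambda, lower.tail = FALSE)      # P(T > x)
  surv_xh <- pexp(x + h, rate = lambda, lower.tail = FALSE)  # P(T > x + h)
  (surv_x - surv_xh) / surv_x
}

ages  <- c(0, 1, 5, 20)
probs <- sapply(ages, cond_prob)
probs        # identical at every age: 1 - exp(-lambda * h)
lambda * h   # first-order approximation, proportional to h
```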

1.14 Entropy and Related Concepts

1.14.1 Entropy

Suppose that Y is a discrete random variable with probability distribution $P_Y(y)$. The entropy $H(P_Y)$ of this probability distribution is defined by

$$H(P_Y) = -\sum_{y} P_Y(y) \log P_Y(y), \qquad (1.117)$$

the sum (as with all sums in this section) being taken over all values of y in the range of Y. Since this quantity depends only on the probabilities of the various values of Y, and not on the actual values themselves, it can be thought of, as the notation indicates, as being a function of the probability distribution $P_Y = \{P_Y(y)\}$ rather than of the random variable Y.

In some areas of computational biology the base of the logarithm in this definition is taken to be 2. The reason for this, in terms of “bits” of information, is discussed in Appendix B.10. Although we will use the base 2 in the definition (1.117) later in this book, for the moment we use natural logarithms (and the notation “log” for these).

The entropy of a probability distribution is a measure of how close to uniform that distribution is, and thus, in a sense, of the unpredictability of any observed value of a random variable having that distribution. If there are s possible values for the random variable, the entropy is maximized when $P_Y(y) = s^{-1}$ for every y. In this case it takes the value $\log s$, and the value to be assumed by Y is in a sense maximally unpredictable. At the other extreme, if only one value of Y is possible, the entropy of the distribution is zero, and the value to be assumed by the random variable Y is then completely predictable.

Despite the fact that both the entropy and the variance of a probability distribution measure in some sense the uncertainty of the value of a random variable having that distribution, the entropy has an interpretation different from that of the variance. The entropy is defined only by the probabilities of the possible values of the random variable and not by the values themselves. On the other hand, the variance depends on these values. Thus if $Y_1$ takes the values 1 and 2 each with probability 0.5, and $Y_2$ takes the values 1 and 100 each with probability 0.5, the distributions of $Y_1$ and $Y_2$ have equal entropies but quite different variances.
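The properties described above can be checked numerically. A minimal R sketch, assuming the natural-log definition (1.117) with terms where $P_Y(y) = 0$ contributing zero; the helper names `entropy` and `variance` are illustrative.

```r
# Entropy H(P) = -sum_y P(y) log P(y); natural log by default, base = 2 for bits
entropy <- function(p, base = exp(1)) {
  p <- p[p > 0]                        # terms with P(y) = 0 contribute 0
  -sum(p * log(p, base = base))
}
# Variance of a discrete distribution with values y and probabilities p
variance <- function(y, p) sum(p * (y - sum(p * y))^2)

s <- 4
h_uniform <- entropy(rep(1 / s, s))    # uniform on s values: the maximum, log(s)
h_point   <- entropy(c(1, 0, 0, 0))    # only one possible value: 0
c(uniform = h_uniform, log_s = log(s), degenerate = h_point)

# Y1 takes 1 and 2, Y2 takes 1 and 100, each with probability 0.5
p  <- c(0.5, 0.5)
y1 <- c(1, 2)
y2 <- c(1, 100)
entropy(p)   # depends only on p, hence the same for Y1 and Y2: log(2)
c(Var_Y1 = variance(y1, p), Var_Y2 = variance(y2, p))   # 0.25 vs 2450.25
```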


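urna = c("bianco", "nero")                    # urn: "bianco" = white ball, "nero" = black ball

p = 0.5                                       # balanced urn
x = sample(urna, 1000, replace = TRUE, prob = c(p, 1 - p))
table(x)                                      # observed counts of each colour
E = -p * log2(p) - (1 - p) * log2(1 - p)      # entropy in bits
E                                             # 1 bit: the maximum for two outcomes

p = 0.05                                      # unbalanced urn
x = sample(urna, 1000, replace = TRUE, prob = c(p, 1 - p))
table(x)
E = -p * log2(p) - (1 - p) * log2(1 - p)
E                                             # about 0.286 bits: much more predictable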
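The code above computes the theoretical entropy from p. A possible complement, sketched under the assumption that we want to check it against the simulated draws: estimate the entropy from the observed relative frequencies (the plug-in estimator) and compare it with the theoretical value. The helper `entropy_bits` and the seed are illustrative choices.

```r
urna <- c("bianco", "nero")   # white ("bianco") and black ("nero") balls

# Plug-in entropy, in bits, of the relative frequencies observed in a sample
entropy_bits <- function(x) {
  f <- prop.table(table(x))
  f <- f[f > 0]
  -sum(f * log2(f))
}

set.seed(2015)
ps <- c(0.5, 0.05)
results <- data.frame(
  p           = ps,
  theoretical = -ps * log2(ps) - (1 - ps) * log2(1 - ps),
  estimated   = sapply(ps, function(p)
    entropy_bits(sample(urna, 1000, replace = TRUE, prob = c(p, 1 - p))))
)
results   # the estimates fluctuate around the theoretical values
```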


BioMed Central


BMC Cell Biology

Open Access
Research article
Silencing of directional migration in roundabout4 knockdown endothelial cells
Sukhbir Kaur†2, Ganesh V Samant†1, Kallal Pramanik1, Philip W Loscombe3, Michael L Pendrak4, David D Roberts4 and Ramani Ramchandran*1

Address: 1Department of Pediatrics, Children's Research Institute, Medical College of Wisconsin, Milwaukee, WI, USA, 2Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA, 3University of Scranton, Biology Department, Scranton, PN, USA and 4Laboratory of Pathology, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA

Email: Sukhbir Kaur - [email protected]; Ganesh V Samant - [email protected]; Kallal Pramanik - [email protected]; Philip W Loscombe - [email protected]; Michael L Pendrak - [email protected]; David D Roberts - [email protected]; Ramani Ramchandran* - [email protected]

* Corresponding author †Equal contributors

Abstract
Background: Roundabouts are axon guidance molecules that have recently been identified to play a role in vascular guidance as well. In this study, we have investigated gene knockdown analysis of endothelial Robos, in particular roundabout 4 (robo4), the predominant Robo in endothelial cells, using small interfering RNA technology in vitro.

Results: Robo1 and Robo4 knockdown cells display distinct activity in an endothelial cell migration assay. The knockdown of robo4 abrogated the chemotactic response of endothelial cells to serum but enhanced a chemokinetic response to Slit2, while robo1 knockdown cells do not display a chemotactic response to serum or VEGF. Robo4 knockdown endothelial cells unexpectedly show up-regulation of Rho GTPases. Zebrafish Robo4 rescues both Rho GTPase homeostasis and serum-reduced chemotaxis in robo4 knockdown cells. Robo1 and Robo4 interact and share molecules such as Slit2, Mena and Vilse, a Cdc42-GAP. In addition, this study mechanistically implicates IRSp53 in the signaling nexus between activated Cdc42 and Mena, both of which have previously been shown to be involved with Robo4 signaling in endothelial cells.

Conclusion: This study identifies specific components of the Robo signaling apparatus that work together to guide directional migration of endothelial cells.

Background
Major classes of axon guidance molecules include the Netrins, Semaphorins, Ephrins and Slit ligands, which interact with their cognate family of receptors to orchestrate stereotypical nerve patterns in a developing vertebrate embryo [1]. Each family has at least one member that plays a functional role in vascular development. Our study focuses on the Roundabout (Robo) family of axon guidance genes [2]. Robos are cell surface transmembrane receptors that have been identified in most species to mediate repulsion-guidance mechanisms in axons [3]. Four robo receptor genes (robo1-4) have been identified in mammals, and their functions vary widely depending on the tissue where they are expressed [4]. Robo4, the fourth member of the Robo family, is expressed in both the neural and vascular systems [5,6]. Robo4 knockdown

Published: 3 November 2008

BMC Cell Biology 2008, 9:61 doi:10.1186/1471-2121-9-61

Received: 22 February 2008
Accepted: 3 November 2008

This article is available from: http://www.biomedcentral.com/1471-2121/9/61

© 2008 Kaur et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


shows both chemotactic and chemokinetic effects on endothelial cells, and the chemotactic response is in part through Robo4, while serum mediates an exclusive chemotactic response on robo4 knockdown endothelial cells.

Mechanism of Slit2 inhibition of endothelial migration is independent of the Rho GTPase pathway
Previously, we had shown that Rho GTPases were activated by Robo4 in endothelial cells [22]. To investigate whether active Rho GTPases are modulated by Slit2 treatment of endothelial cells, we have checked by pulldown assay for Cdc42-GTP levels in control endothelial cells. Slit2 was incubated for 5, 10 (data not shown) and 15 min with control endothelial cells and pulldown analysis was performed on lysates from the treated cells (Fig. 3D). We did not convincingly notice an up-regulation of Cdc42-GTP levels in endothelial cells treated with Slit2 in either 5 or 15 min incubation times. (Fig. 3D, compare lanes 1

Figure 3. Slit2 mediates chemokinetic and chemotactic behaviour on endothelial cells while serum exclusively mediates chemotaxis. (A) Migration of control lacZ siRNA and robo4 siRNA transfected endothelial cells to AP and AP-Slit2N (25 ng/ml) fusion proteins in a Boyden chamber assay. The data here are consolidated from three independent experiments, each performed with samples in triplicate. (B) AP activity in lysates prepared from untransfected (UT), AP and AP-Slit2N treated control and robo4 siRNA transfected endothelial cells. (C) Migration assay for control lacZ and robo4 siRNA transfected cells to serum or AP-Slit2N in either the upper (U), lower (L) or both chambers, as indicated. Error bars in A (n = 3) and B (n = 3) represent SD, while in C they represent SEM (n = 4). (D) Pulldown analysis of Cdc42-GTP levels in AP and AP-Slit2N (25 ng/ml) treated endothelial cell lysates for 5 and 15 minutes, respectively. + indicates addition of the reagent on the left; pd: pulldown; total: total Cdc42 protein in lysates.
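The caption uses two kinds of error bars: SD in panels A and B, SEM in panel C. A minimal R sketch of how they relate, SEM = SD / sqrt(n), using three made-up triplicate values that are not data from the article:

```r
# Hypothetical migration counts for one condition measured in triplicate
counts <- c(10, 12, 14)
n   <- length(counts)
s   <- sd(counts)        # SD: spread of the individual measurements
sem <- s / sqrt(n)       # SEM: uncertainty of the mean of the n measurements
c(mean = mean(counts), SD = s, SEM = sem)

# An SEM bar can be turned back into an SD bar only if n is reported
sem * sqrt(n)
```

Because SEM shrinks as n grows while SD does not, bars of the two kinds are not comparable across panels without the sample sizes.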


BioMed Central


BMC Health Services Research

Open Access
Correspondence
The most dangerous hospital or the most dangerous equation?
Yu-Kang Tu*1,2 and Mark S Gilthorpe1

Address: 1Biostatistics Unit, Centre for Epidemiology and Biostatistics, University of Leeds, 30/32 Hyde Terrace, Leeds, LS2 9LN, UK and 2Leeds Dental Institute, University of Leeds, Clarendon Way, Leeds, LS2 9LU, UK

Email: Yu-Kang Tu* - [email protected]; Mark S Gilthorpe - [email protected]

* Corresponding author

Abstract
Background: Hospital mortality rates are one of the most frequently selected indicators for measuring the performance of NHS Trusts. A recent article in a national newspaper named the hospital with the highest or lowest mortality in the 2005/6 financial year; a report by the organization Dr Foster Intelligence provided information with regard to the performance of all NHS Trusts in England.

Methods: Basic statistical theory and computer simulations were used to explore the relationship between the variations in the performance of NHS Trusts and the sizes of the Trusts. Hospital standardised mortality ratio (HSMR) data of 152 English NHS Trusts for 2005/6 were re-analysed.

Results: A close examination of the information reveals a pattern which is consistent with a statistical phenomenon, discovered by the French mathematician de Moivre nearly 300 years ago and described in every introductory statistics textbook: namely, that variation in performance indicators is expected to be greater in small Trusts and smaller in large Trusts. From a statistical viewpoint, the variation in the number of deaths in a hospital is not in proportion to the size of the hospital, but is proportional to the square root of its size. Therefore, it is not surprising to note that small hospitals are more likely to occur at the top and the bottom of league tables, whilst mortality rates are independent of hospital sizes.

Conclusion: This statistical phenomenon needs to be taken into account in the comparison of hospital Trusts' performance, especially with regard to policy decisions.

Mortality in NHS hospitals
According to an article in the Daily Telegraph [1] (accessed online on 25/04/2007), the George Elliot Hospital (the only hospital run by the George Elliot Hospital NHS Trust) may have been the most dangerous hospital in England during the 2005/6 financial year. This is because its Hospital Standardised Mortality Ratio (HSMR) was 1.43, i.e. the number of patient deaths in this hospital was 43% higher than expected. In contrast, the hospital run by the Royal Free Hampstead Trust may have been the safest, since its HSMR was only 0.74, i.e. the number of patient deaths in this Trust was 26% lower than expected. The source of information in the Daily Telegraph was provided by an organization called Dr Foster Intelligence, which recently published a report entitled "How healthy is your hospital" [2], in which the performance of NHS Trusts was assessed against several indicators, such as post-operative mortality and emergency readmission.

Published: 15 November 2007

BMC Health Services Research 2007, 7:185 doi:10.1186/1472-6963-7-185

Received: 11 June 2007
Accepted: 15 November 2007

This article is available from: http://www.biomedcentral.com/1472-6963/7/185

© 2007 Tu and Gilthorpe; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


According to the Daily Telegraph, the George Elliot Hospital had problems in the areas of both finance and hospital infection, though the Royal Free Hospital seems to have its own problems too. Whilst we do not have any explanations for the higher than average mortality rate in the George Elliot Hospital NHS Trust, we know that it is a relatively small hospital with only 352 beds, and admissions totalled 42,577 during 2005/6, according to Hospital Episode Statistics [3]. Even the Royal Free Hampstead Trust is not very large, with around 900 beds in current use for patient care and total admissions of 62,062 during 2005/6 [3]. In contrast, the Leeds Teaching Hospitals NHS Trust has three hospitals and 2,370 beds (according to the Daily Telegraph website), with a total of 190,604 admissions during 2005/6 [3]. There is clearly huge variation in the sizes of Trusts and hence in the number of patients they treat, and the question we consider is: does size matter? We shall explain in this article why, from a statistical viewpoint, the size of a hospital may be a crucial factor as to whether or not that hospital appears at the top or the bottom of any league table.

Why size matters
First let us use a simple example to illustrate why the size of a hospital can matter. Suppose hospitals in England have only five different sizes (200, 400, 600, 800 and 1000 beds) and they undertake 100, 200, 300, 400, and 500 coronary artery bypass graft operations each year, respectively. Also suppose that the post-operative mortality rate is nominally 10%, irrespective of hospital size. The expected number of deaths for the different sized hospitals should then be 10, 20, 30, 40, and 50, respectively. Nevertheless, it is inevitable that across the years there will be some variation; for instance, in some hospitals with 200 beds only 8 patients may die, whilst in other hospitals of the same size 12 patients may die. The overall average mortality rate nevertheless remains 10%. Suppose the extent of variation (standard deviation) between hospitals of the same size is similar across all hospitals, e.g. the observed number of deaths plus or minus its standard deviation is 10 ± 3, 20 ± 3, 30 ± 3, 40 ± 3, and 50 ± 3, respectively. So what of the observed mortality rates? From the smallest to the largest hospital, the observed mortality rates have 95% confidence intervals (the expected count ± 2 standard deviations, divided by the number of operations) of 4.0%–16.0%, 7.0%–13.0%, 8.0%–12.0%, 8.5%–11.5%, and 8.8%–11.2%, respectively. If we were to make a league table for these hospitals, the smaller hospitals are more likely to be found at the bottom and the top of the league table. Nevertheless, if factors related to the success of coronary artery bypass surgery act in a similar way across different sized hospitals, then variations in the number of deaths for larger hospitals would be expected to be greater than for smaller hospitals.
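The intervals in this example can be reproduced in a few lines of R. The sketch assumes, as the numbers imply, that each interval is the expected count ± 2 SD divided by the number of operations, with the same SD of 3 deaths for every hospital size.

```r
ops       <- c(100, 200, 300, 400, 500)   # CABG operations per year, by size
deaths    <- 0.10 * ops                   # expected deaths at 10% mortality
sd_deaths <- rep(3, 5)                    # the same SD (3 deaths) for every size

# 95% limits of the mortality rate, in percent: (expected +/- 2 SD) / operations
ci_const <- round(100 * cbind(lower = (deaths - 2 * sd_deaths) / ops,
                              upper = (deaths + 2 * sd_deaths) / ops), 1)
ci_const   # widest interval for the smallest hospital
```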

Many factors affect the performance indicators of hospitals, such as the post-operative mortality rate. There has been a continuing debate regarding whether or not these indicators can really measure the quality of healthcare provided by a hospital Trust [4-9]. Any hospital that treats more patients with higher risks or greater complexity may show higher mortality rates. However, notwithstanding the controversy regarding the validity of performance indicators, it is important to note that the extent of variation in the number of deaths in hospitals of the same size is not in proportion to the size of the hospital, but is in proportion to the square root of its size [10,11]. Therefore, for our simple example, if all the factors related to post-operative mortality (e.g. case-mix, staff experience, and support from post-operative care units) were comparable for all hospitals and operated in similar ways across hospitals of different sizes, the variation of the observed number of deaths would be 10 ± 3.0, 20 ± 4.2, 30 ± 5.2, 40 ± 6.0, and 50 ± 6.7, rather than 10 ± 3.0, 20 ± 6.0, 30 ± 9.0, 40 ± 12.0, and 50 ± 15.0. The observed mortality rates would then have 95% confidence intervals of 4.0%–16.0%, 5.8%–14.2%, 6.5%–13.5%, 7.0%–13.0%, and 7.3%–12.7%, respectively. Hence, the smaller hospitals are still more likely to be found at the bottom and the top of the league table.
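The square-root rule in this paragraph matches the binomial SD $\sqrt{n\pi(1-\pi)}$: with $\pi = 0.10$ it gives exactly 3 deaths for 100 operations. A short R sketch that recomputes both sets of SDs and the resulting intervals, using ± 2 SD as in the previous example:

```r
ops    <- c(100, 200, 300, 400, 500)
deaths <- 0.10 * ops

sd_sqrt <- sqrt(ops * 0.10 * 0.90)   # binomial SD = 3 * sqrt(ops / 100)
sd_prop <- 3 * ops / 100             # the alternative: SD proportional to size
round(rbind(sqrt_rule = sd_sqrt, proportional = sd_prop), 1)

# 95% limits of the mortality rate, in percent, under the square-root rule
ci_sqrt <- round(100 * cbind(lower = (deaths - 2 * sd_sqrt) / ops,
                             upper = (deaths + 2 * sd_sqrt) / ops), 1)
ci_sqrt   # intervals still narrow as the hospital grows
```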

From a statistical viewpoint, this is because the standard deviation of the sampling distribution of the mean, i.e. the standard error of the mean, is inversely related to the square root of the sample size:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

This equation appears in every introductory statistics textbook and was first stated by the French mathematician de Moivre in 1730. This equation shows that the greater the sample size, the less likely is the sample mean to fluctuate, i.e. the variation is much greater for small hospitals and much less for large hospitals.

It has been noted in the literature that there is "over-dispersion" of performance indicators for smaller hospitals or Primary Care Trusts [7], and therefore the use of league tables for the ranking of hospital performance may be misleading [4,5,7,12]. Quality control charts [4,8] and funnel plots [5,7,12,13] have been proposed as alternative strategies to compare hospital performance, and to identify those for whom performance is below the national standard. To understand why quality control charts and funnel plots are more appropriate methods for comparing the performance of hospitals, it is crucial for health services researchers, doctors, and patients to appreciate fully the significance of de Moivre's equation.
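De Moivre's equation, standard error = σ / sqrt(n), can be checked by simulation. A minimal R sketch, assuming normal data with a hypothetical population SD of 10: draw many samples of each size, take their means, and compare the SD of those means with σ / sqrt(n).

```r
set.seed(1730)
sigma    <- 10                   # hypothetical population SD
n_values <- c(4, 16, 64, 256)    # sample sizes
reps     <- 10000                # simulated samples per size

simulated_se <- sapply(n_values, function(n) {
  samples <- matrix(rnorm(reps * n, mean = 0, sd = sigma), nrow = reps)
  sd(rowMeans(samples))          # SD of the sample means across replicates
})

se_table <- data.frame(n = n_values,
                       simulated = simulated_se,
                       de_moivre = sigma / sqrt(n_values))
se_table   # quadrupling n only halves the standard error
```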


Throwing the die of death
Suppose there is an imaginary fair die with 20 surfaces. One surface of the die is black and the other 19 are white. When the die is thrown, the probability of the black surface showing is 0.05, i.e. when the die is thrown 20 times, we expect on average to see the black surface only once. However, due to the nature of all random processes, in each round of 20 throws the black surface may or may not show, or might show more than once. Similarly, although the black surface is expected to show 50 times when the die is thrown 1,000 times, the black surface may actually show more or less than 50 times. From a statistical viewpoint, an experiment like this is known as a Bernoulli trial [14,15]. The results collected from performing multiple independent Bernoulli trials, such as throwing our twenty-sided die 1,000 times, follow a binomial distribution [14,15], which can be used to calculate the variation in the number of times the black surface is expected to show. By denoting the number of throws (trials) as $n = 1{,}000$ and the probability of obtaining black as $\pi = 0.05$, statistical theory tells us that the population mean, $n\pi$, is 50 and the standard error of this mean, $\sqrt{n\pi(1-\pi)}$, is 6.9 [10]. Consequently, the number of times the black surface shows has a 95% confidence interval of 36 to 64 (i.e. 50 ± 1.96 × 6.9).
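The numbers of the die example in R: the mean $n\pi$, the standard error $\sqrt{n\pi(1-\pi)}$, the normal-approximation interval, the exact binomial quantiles for comparison, and a quick simulation. The variable is called `pi_black` to avoid masking R's built-in constant `pi`; the seed and the number of simulated rounds are arbitrary.

```r
n        <- 1000   # throws of the 20-sided die
pi_black <- 0.05   # probability of the black face

expected <- n * pi_black                         # n * pi = 50
se       <- sqrt(n * pi_black * (1 - pi_black))  # sqrt(n * pi * (1 - pi)), about 6.9
normal_ci <- expected + c(-1, 1) * qnorm(0.975) * se
round(c(expected = expected, se = se), 2)
round(normal_ci)                                 # 36 to 64

# Exact binomial 2.5% and 97.5% quantiles, for comparison
qbinom(c(0.025, 0.975), size = n, prob = pi_black)

# Simulation: 10,000 rounds of 1,000 throws each
set.seed(1)
black_counts <- rbinom(10000, size = n, prob = pi_black)
c(mean = mean(black_counts), sd = sd(black_counts))
```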

In 2002–2003, the average mortality following selected surgical procedures in English NHS hospital Trusts was around 5% [16]. Now suppose the die represents the probability of death following these selected surgical procedures and n is the number of surgeries undertaken by the hospital. For a Trust undertaking 1,000 surgeries, the 95% confidence interval for the number of deaths is between 36 and 64, i.e. the mortality rate has a 95% confidence interval between 3.6% and 6.4%. For NHS hospital Trusts undertaking 4,000 surgeries, the expected number of deaths is 200 and the standard error of the mean is 13.8, which is twice as large as that for hospital Trusts undertaking only 1,000 surgeries. However, the 95% confidence interval for the mortality rate of this larger hospital Trust is 4.3% to 5.7%, which is narrower than that for the hospital Trust undertaking only 1,000 surgeries.
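A short R sketch of the comparison between the two Trust sizes, assuming the same binomial model with 5% mortality and a normal-approximation 95% interval: four times the surgeries doubles the SE of the death count, yet the interval for the rate narrows.

```r
pi_death <- 0.05                 # mortality after the selected procedures
n_ops    <- c(1000, 4000)        # surgeries per Trust
expected <- n_ops * pi_death     # 50 and 200 deaths
se       <- sqrt(n_ops * pi_death * (1 - pi_death))  # about 6.9 and 13.8
z        <- qnorm(0.975)

rate_ci <- cbind(lower = (expected - z * se) / n_ops,
                 upper = (expected + z * se) / n_ops)
data.frame(n_ops, expected, se = round(se, 1), round(100 * rate_ci, 1))
se[2] / se[1]   # 2: four times the surgeries, twice the SE of the count
```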

A funnel plot in Figure 1 shows a simulated dataset of 1,000 hospitals in which the number of surgical procedures (Y) in each year has a mean of 2,500 and a standard deviation (SD) of 700. The mortality rate is assumed to be 5% across all sizes of hospital, so the number of expected deaths (X) has a mean of 125 and SD of 35. A random variable with zero mean and SD proportional to the square root of X is simulated to represent the variation/fluctuation in the observed mean number of deaths, and this is added to X. The vertical axis in Figure 1 is the ratio of the observed number of deaths (Z) over the expected number of deaths (X), and the horizontal axis is X. If we fit a linear regression model to the data, the regression slope will be close to zero, indicating that the observed to expected ratio is independent of the expected number of deaths (i.e. hospital size), yet the width of the 95% confidence interval of these ratios (represented by the blue lines at both the top and the bottom of the figure) is inversely related to hospital size. Although this simulation assumes no relationship between mortality and the number of surgeries undertaken, a few hospitals are below the lower confidence limit or above the upper confidence limit, as would be expected due to chance alone 5% of the time, which would suggest that their performance is either alarmingly poor or extremely good. We would therefore still need to be cautious in identifying the poor or good performers using funnel plots or quality control charts, given that chance is involved. In the report "How healthy is your hospital?", readers can find that the report's graphs follow a very similar pattern [2].
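A sketch of the simulation described above, in R. Assumptions not fixed by the text: the constant of proportionality for the noise SD is taken as 1 (SD = sqrt(X)), hospital sizes are floored at 100 surgeries to avoid negative draws, and the 95% funnel limits are 1 ± 1.96 / sqrt(X), which follows from that noise model. The seed is arbitrary, so the plot will resemble Figure 1 without reproducing it.

```r
set.seed(2007)
n_hosp    <- 1000
surgeries <- pmax(rnorm(n_hosp, mean = 2500, sd = 700), 100)  # Y, floored at 100
expected  <- 0.05 * surgeries                                 # X: mean ~125, SD ~35
# Z = X + noise with mean 0 and SD sqrt(X) (proportionality constant assumed 1)
observed  <- expected + rnorm(n_hosp, mean = 0, sd = sqrt(expected))
ratio     <- observed / expected                              # Z / X

# Under this model Z / X has SD 1 / sqrt(X): 95% limits 1 +/- 1.96 / sqrt(X)
plot(expected, ratio, xlab = "expected deaths (X)",
     ylab = "observed / expected (Z / X)")
x_grid <- seq(min(expected), max(expected), length.out = 200)
lines(x_grid, 1 + qnorm(0.975) / sqrt(x_grid), col = "blue")
lines(x_grid, 1 - qnorm(0.975) / sqrt(x_grid), col = "blue")
abline(h = 1, lty = 2)

slope   <- unname(coef(lm(ratio ~ expected))[2])                   # close to 0
outside <- mean(abs(ratio - 1) > qnorm(0.975) / sqrt(expected))    # about 5%
c(slope = slope, share_outside = outside)
```

Every hospital here has the same true mortality, so the points beyond the blue lines are false alarms produced by chance alone.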

The most dangerous equation

In a recently published article [10], Howard Wainer nominated de Moivre's equation as "the most dangerous equation", since being ignorant of its consequences may cost

$$\sqrt{n\pi(1-\pi)}$$

Figure 1. A funnel plot of the relationship between the expected number of events and the ratio of observed to expected number of events in the simulated dataset of 1,000 hospitals. The blue lines (at the top and the bottom of the panel) represent, respectively, the upper and lower 95% confidence limits of the observed/expected ratio.


The Most Dangerous Equation

Ignorance of how sample size affects statistical variation has created havoc for nearly a millennium

Howard Wainer

What constitutes a dangerous equation? There are two obvious interpretations:

Some equations are dangerous if you know them, and others are dangerous if you do not. The first category may pose danger because the secrets within its bounds open doors behind which lies terrible peril. The obvious winner in this is Einstein's iconic equation E = mc², for it provides a measure of the enormous energy hidden within ordinary matter. Its destructive capability was recognized by Leo Szilard, who then instigated the sequence of events that culminated in the construction of atomic bombs.

Figure 1. Trial of the pyx has been performed since 1150 A.D. In the trial, a sample of minted coins, say 100 at a time, is compared with a standard. Limits are set on the amount that the sample can be over- or underweight. In 1150, that amount was set at 1/400. Nearly 600 years later, in 1730, a French mathematician, Abraham de Moivre, showed that the standard deviation does not increase in proportion to the sample. Instead, it is proportional to the square root of the sample size. Ignorance of de Moivre's equation has persisted to the present, as the author relates in five examples. This ignorance has proved costly enough that the author nominates de Moivre's formula as the most dangerous equation.

www.americanscientist.org 2007 May-June 249

Howard Wainer is Distinguished Research Scientist at the National Board of Medical Examiners and an adjunct professor of statistics at the Wharton School of the University of Pennsylvania. He has published 16 books, most recently Testlet Response Theory and Its Applications (Cambridge University Press). He is a Fellow of the American Statistical Association and was awarded the 2007 National Council on Measurement in Education Career Achievement Award for Contributions to Educational Measurement. Address: National Board of Medical Examiners, 3750 Market St., Philadelphia, PA 19105. Internet: hwainer@nbme.org

Supporting ignorance is not, however, the direction I wish to pursue; indeed, it is quite the antithesis of my message. Instead I am interested in equations that unleash their danger not when we know about them, but rather when we do not. Kept close at hand, these equations allow us to understand things clearly, but their absence leaves us dangerously ignorant.

There are many plausible candidates, and I have identified three prime examples: Kelley's equation, which indicates that the truth is estimated best when its observed value is regressed toward the mean of the group that it came from; the standard linear regression equation; and the equation that provides us with the standard deviation of the sampling distribution of the mean, what might be called de Moivre's equation:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

where $\sigma_{\bar{x}}$ is the standard error of the mean, $\sigma$ is the standard deviation of the sample and $n$ is the size of the sample. (Note the square root symbol, which will be a key to at least one of the misunderstandings of variation.) De Moivre's equation was derived by the French mathematician Abraham de Moivre, who described it in his 1730 exploration of the binomial distribution, Miscellanea Analytica.
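De Moivre's equation is easy to check by simulation. The R sketch below assumes normally distributed data with a hypothetical σ = 10 and compares the empirical standard deviation of many sample means with σ/√n. The function name `se_sim` is illustrative only.

```r
# Empirical SD of the sample mean versus de Moivre's sigma / sqrt(n)
se_sim <- function(n, sigma = 10, reps = 5000) {
  means <- replicate(reps, mean(rnorm(n, mean = 0, sd = sigma)))
  sd(means)
}

set.seed(42)                                  # arbitrary seed, for reproducibility
ns  <- c(25, 100, 400)
res <- data.frame(n         = ns,
                  simulated = sapply(ns, se_sim),
                  de_moivre = 10 / sqrt(ns))
print(res)   # each fourfold increase in n only halves the standard error
```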

Ignorance of Kelley's equation has proved to be very dangerous indeed, especially to economists who have interpreted regression toward the mean as having economic causes rather than merely reflecting the uncertainty of prediction. Horace Secrist's The Triumph of Mediocrity in Business is but one example listed in the bibliography. Other examples of failure to understand Kelley's equation exist in the sports world, where the expression "sophomore slump" merely describes the likelihood of an average season following an especially good one.
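The text names Kelley's equation without writing it out. In its usual form it estimates a true score as $\hat{\tau} = \rho x + (1-\rho)\mu$, where x is the observed value, μ is the group mean and ρ is the reliability of the measurement. The R sketch below uses hypothetical batting averages, modelled as true skill plus independent season-to-season luck. It shows a "sophomore slump" that has no cause other than regression toward the mean, and it shows Kelley's estimate predicting the size of that slump.

```r
set.seed(7)                                     # arbitrary seed
n_players <- 10000
skill   <- rnorm(n_players, mean = 0.260, sd = 0.020)  # hypothetical true batting averages
season1 <- skill + rnorm(n_players, sd = 0.020)        # observed = true skill + luck
season2 <- skill + rnorm(n_players, sd = 0.020)        # same skill, fresh luck

mu  <- mean(season1)                            # group mean
rho <- var(skill) / var(season1)                # reliability (known only because we simulated skill)

top    <- season1 >= quantile(season1, 0.9)     # the "especially good" first seasons
kelley <- rho * season1[top] + (1 - rho) * mu   # Kelley's estimate of their true skill

round(c(first_season  = mean(season1[top]),
        second_season = mean(season2[top]),
        kelley        = mean(kelley)), 3)
```

The top players' second season is worse on average, even though nobody's skill changed. The drop is roughly what Kelley's shrinkage toward μ predicts.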

The familiar linear regression equation contains many pitfalls to trap the unwary. The correlation coefficient that emerges from regression tells us about the strength of the linear relation between the dependent and independent variables. But alas it encourages fallacious attributions of cause and effect. It even encourages fallacious interpretation by those who think they are being careful. ("I may not be able to believe the exact value of the coefficient, but surely I can use its sign to tell whether increasing the variable will increase or decrease the answer.") The linear regression equation is also badly non-robust, but its weaknesses are rarely diagnosed appropriately, so many models are misleading. When regression is applied to observational data (as it almost always is), it is difficult to know whether an appropriate set of predictors has been selected, and if we have an inappropriate set, our interpretations are questionable. It is dangerous, ironically, because it can be the most useful model for the widest variety of data when wielded with caution, wisdom and much interaction between the analyst and the computer program.
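The belief that at least the sign of a coefficient can be trusted fails whenever a relevant predictor is left out. The minimal R sketch below uses made-up variables x1 and x2. The true effect of x1 is +1, and x1 is strongly correlated with x2, which has a large negative effect.

```r
set.seed(11)                          # arbitrary seed
n  <- 5000
x2 <- rnorm(n)
x1 <- x2 + rnorm(n, sd = 0.5)         # x1 is strongly correlated with x2
y  <- 1 * x1 - 3 * x2 + rnorm(n)      # true effect of x1 on y is +1

b_short <- coef(lm(y ~ x1))[["x1"]]       # x2 omitted: slope near -1.4
b_full  <- coef(lm(y ~ x1 + x2))[["x1"]]  # x2 included: slope near +1
c(omitting_x2 = b_short, including_x2 = b_full)
```

Leaving x2 out does not merely bias the size of the x1 coefficient; it reverses its sign.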

Yet, as dangerous as Kelley's equation and the common regression equations are, I find de Moivre's equation more perilous still. I arrived at this conclusion because of the extreme length of time over which ignorance of it has caused confusion, the variety of fields that have gone astray and the seriousness of the consequences that such ignorance has caused.

In the balance of this essay I will describe five very different situations in which ignorance of de Moivre's equation has led to billions of dollars of loss over centuries, yielding untold hardship. These are but a small sampling; there are many more.

The Trial of the Pyx

In 1150, a century after the Battle of Hastings, it was recognized that the King of England could not just mint money and assign it to have any value he chose. Instead the coinage's value needed to be intrinsic, based on the amount of precious materials in its make-up. And so standards were set for the weight of gold in coins: a guinea, for example, should weigh 128 grains (there are 360 grains in an ounce). In the trial of the pyx (the pyx is actually the wooden box that contains the standard coins), samples are measured and compared with the standard.

It was recognized, even then, that coinage methods were too imprecise to insist that all coins be exactly equal in weight. So instead the king and the barons who supplied the London Mint (an independent organization) with gold insisted that coins, when tested in the aggregate (say 100 at a time), conform to the regulated size plus or minus some allowance for variability. They chose 1/400th of the weight, which for one guinea would be 0.28 grains and so for the aggregate, 28 grains. Obviously, they assumed that variability increased proportionally to the number of coins and not to its square root, as de Moivre's equation would later indicate. This deeper understanding lay almost 600 years in the future.
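The gap between the two rules is easy to quantify. The R sketch below treats the per-coin allowance of 0.28 grains as if it were the standard deviation of a single coin's weight. This is a simplifying assumption for illustration, not how the trial was actually specified. The sketch compares the aggregate allowance for 100 coins with what de Moivre's equation implies, then checks the √n rule by simulating batches of coins.

```r
per_coin <- 0.28                      # grains; treated here as a per-coin SD (assumption)
n_coins  <- 100                       # coins tested together, as in the text

allowance_linear <- n_coins * per_coin        # the rule actually used: grows with n
allowance_sqrt   <- sqrt(n_coins) * per_coin  # de Moivre: grows with sqrt(n)
c(linear = allowance_linear, de_moivre = allowance_sqrt)

# Check by simulation: SD of the total weight of many 100-coin batches
set.seed(3)                           # arbitrary seed
totals <- replicate(10000, sum(rnorm(n_coins, mean = 128, sd = per_coin)))
sd(totals)                            # close to sqrt(100) * 0.28, a tenth of the linear allowance
```

Under this assumption, the tolerance was ten times wider than chance variation alone required for a batch of 100 coins.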

The costs of making errors are of two types. If the average of all the coins was too light, the barons were being cheated, for there would be extra gold left over after minting the agreed number of coins. This kind of error is easily detected, and, if found, the director of the mint would suffer grievous punishment. But if the allowable variability was larger than necessary, there would be an excessive number of too-heavy coins. The mint could thus stay within the bounds specified and still provide the opportunity for someone at the mint to collect these overweight coins, melt them down and recast them at the correct lower weight. This would leave the balance of gold as an excess payment to the mint. The fact that this error continued for almost 600 years provides strong support for de Moivre's equation to be considered a candidate for the title of most dangerous equation.

250 American Scientist, Volume 95

Figure 2. A cursory glance at the distribution of the U.S. counties with the lowest rates of kidney cancer (teal) might lead one to conclude that something about the rural lifestyle reduces the risk of that cancer. After all, the counties with the lowest 10 percent of risk are mainly Midwestern, Southern and Western counties. When one examines the distribution of counties with the highest rates of kidney cancer (red), however, it becomes clear that some other factor is at play. Knowledge of de Moivre's equation leads to the conclusion that what the counties with the lowest and highest kidney-cancer rates have in common is low population, and therefore high variation in kidney-cancer rates.

Life in the Country: Haven or Threat?

Figure 2 is a map of the locations of counties with unusual kidney-cancer rates. The counties colored teal are those that are in the lowest tenth of the cancer distribution. We note that these healthful counties tend to be very rural, Midwestern, Southern or Western. It is both easy and tempting to infer that this outcome is directly due to the clean living of the rural lifestyle: no air pollution, no water pollution, access to fresh food without additives and so on.

The counties colored in red, however, belie that inference. Although they have much the same distribution as the teal counties (in fact, they're often adjacent), they are those that are in the highest decile of the cancer distribution. We note that these unhealthful counties tend to be very rural, Midwestern, Southern or Western. It would be easy to infer that this outcome might be directly due to the poverty of the rural lifestyle: no access to good medical care, a high-fat diet, and too much alcohol and tobacco.

What is going on? We are seeing de Moivre's equation in action. The standard deviation of the mean is inversely proportional to the square root of the sample size, so small counties display much greater variation than large counties. A county with, say, 100 inhabitants that has no cancer deaths would be in the lowest category. But if it has 1 cancer death it would be among the highest. Counties like Los Angeles, Cook or Miami-Dade with millions of inhabitants do not bounce around like that.

When we plot the age-adjusted cancer rates against county population, this result becomes clearer still (see Figure 3). We see the typical triangle-shaped bivariate distribution. When the population is small (left side of the graph), there is wide variation in cancer rates, from 20 per 100,000 to 0. When county populations are large (right side of graph), there is very little variation, with all counties at about 5 cases per 100,000 of population.
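The triangle shape appears even when every county shares exactly the same underlying risk. The R sketch below makes three assumptions, none of which comes from the article's data: a common rate of 5 per 100,000, Poisson-distributed death counts, and hypothetical county populations spread log-uniformly between 100 and 10 million.

```r
set.seed(5)                                      # arbitrary seed
rate <- 5 / 100000                               # identical true rate in every county
pop  <- round(10^runif(3000, min = 2, max = 7))  # hypothetical populations, 100 to 10 million
deaths   <- rpois(length(pop), lambda = rate * pop)
obs_rate <- deaths / pop * 100000                # observed rate per 100,000

small <- pop < 1e4
large <- pop > 1e6
range(obs_rate[small])    # from 0 up to very high values: small counties fill both extremes
range(obs_rate[large])    # tightly clustered around 5

plot(pop, obs_rate, log = "x",
     xlab = "county population (log scale)", ylab = "deaths per 100,000")
```

Both the "healthiest" and the "unhealthiest" counties in this sketch are small ones, although the risk is identical everywhere.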

The Small-Schools Movement

The urbanization that characterized the 20th century led to the abandonment of the rural


The tooth dataset

   gender  il1b smoke areainfl
1       F etero   low    39.97
2       M    wt   low    24.01
3       F etero   low    35.77
..     ..    ..    ..       ..
65      M   mut   low    51.27
66      F    wt   low    27.71
67      F    wt  high    68.45
68      F    wt   low    39.40
69      F    wt   low    19.95

Point and interval estimates

Let us compute an estimate of the mean of areainfl:

• as a point estimate

> mean(areainfl)

• as an interval estimate with 95% confidence

> t.test(areainfl, mu = 41.86)

• as an interval estimate with 99% confidence

> t.test(areainfl, mu = 41.86, conf.level = 0.99)

• as a bootstrap point estimate and a bootstrap interval estimate with 99% confidence

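> library(boot)                                   # provides boot() and boot.ci()

> fmedia = function(valori, i) mean(valori[i])     # statistic: mean of the resampled values

> mb = boot(areainfl, fmedia, R = 10000)           # 10,000 bootstrap resamples

> mb

> boot.ci(mb, conf = 0.99)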
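For reference, the interval that t.test() reports is $\bar{x} \pm t_{1-\alpha/2,\,n-1}\, s/\sqrt{n}$. The R sketch below computes this interval by hand and checks it against t.test(). The tooth data are not reproduced here, so the sketch uses a simulated stand-in vector `x` of the same length (69); its numbers mean nothing in themselves. The function name `t_interval` is illustrative only.

```r
# Manual t interval for a mean: xbar +/- t(1 - alpha/2, n - 1) * s / sqrt(n)
t_interval <- function(x, conf.level = 0.95) {
  n  <- length(x)
  se <- sd(x) / sqrt(n)                               # standard error of the mean
  tq <- qt(1 - (1 - conf.level) / 2, df = n - 1)      # two-sided t quantile
  mean(x) + c(-1, 1) * tq * se
}

set.seed(123)                                  # arbitrary seed
x <- rnorm(69, mean = 42, sd = 15)             # simulated stand-in, NOT the tooth data
t_interval(x, 0.95)
t_interval(x, 0.99)                            # wider: more confidence, less precision
as.numeric(t.test(x, conf.level = 0.99)$conf.int)  # should agree with the line above
```

The mu argument used with t.test() on areainfl only sets the null value for the test. It does not change the confidence interval.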


A Dirty Dozen: Twelve P-Value Misconceptions

Steven Goodman

The P value is a measure of statistical evidence that appears in virtually all medical research papers. Its interpretation is made extraordinarily difficult because it is not part of any formal system of statistical inference. As a result, the P value’s inferential meaning is widely and often wildly misconstrued, a fact that has been pointed out in innumerable papers and books appearing since at least the 1940s. This commentary reviews a dozen of these common misinterpretations and explains why each is wrong. It also reviews the possible consequences of these improper understandings or representations of its meaning. Finally, it contrasts the P value with its Bayesian counterpart, the Bayes’ factor, which has virtually all of the desirable properties of an evidential measure that the P value lacks, most notably interpretability. The most serious consequence of this array of P-value misconceptions is the false belief that the probability of a conclusion being in error can be calculated from the data in a single experiment without reference to external evidence or the plausibility of the underlying mechanism.

Semin Hematol 45:135-140 © 2008 Elsevier Inc. All rights reserved.

The P value is probably the most ubiquitous and, at the same time, misunderstood, misinterpreted, and occasionally miscalculated index [1,2] in all of biomedical research. In a recent survey of medical residents published in JAMA, 88% expressed fair to complete confidence in interpreting P values, yet only 62% of these could answer an elementary P-value interpretation question correctly [3]. However, it is not just those statistics that testify to the difficulty in interpreting P values. In an exquisite irony, none of the answers offered for the P-value question was correct, as is explained later in this chapter.

Writing about P values seems barely to make a dent in the mountain of misconceptions. Articles have appeared in the biomedical literature for at least 70 years [4-15] warning researchers of the interpretive P-value minefield, yet these lessons appear to be either unread, ignored, not believed, or forgotten as each new wave of researchers is introduced to the brave new technical lexicon of medical research.

It is not the fault of researchers that the P value is difficult to interpret correctly. The man who introduced it as a formal research tool, the statistician and geneticist R.A. Fisher, could not explain exactly its inferential meaning. He proposed a rather informal system that could be used, but he never could describe straightforwardly what it meant from an inferential standpoint. In Fisher’s system, the P value was to be used as a rough numerical guide of the strength of evidence against the null hypothesis. There was no mention of “error rates” or hypothesis “rejection”; it was meant to be an evidential tool, to be used flexibly within the context of a given problem [16].

Fisher proposed the use of the term “significant” to be attached to small P values, and the choice of that particular word was quite deliberate. The meaning he intended was quite close to that word’s common language interpretation: something worthy of notice. In his enormously influential 1926 text, Statistical Methods for Research Workers, the first modern statistical handbook that guided generations of biomedical investigators, he said:

Personally, the writer prefers to set a low standard of significance at the 5 percent point . . . . A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance [17].

In other words, the operational meaning of a P value less than .05 was merely that one should repeat the experiment. If subsequent studies also yielded significant P values, one could conclude that the observed effects were unlikely to be the result of chance alone. So “significance” is merely that: worthy of attention in the form of meriting more experimentation, but not proof in itself.

The P value story, as nuanced as it was at its outset, got incomparably more complicated with the introduction of the machinery of “hypothesis testing,” the mainstay of current practice. Hypothesis testing involves a null and alternative hypothesis, “accepting and rejecting” hypotheses, type I and II “error rates,” “power,” and other related ideas. Even though we use P values in the context of this testing system today, it is not a comfortable marriage, and many of the misconceptions we will review flow from that unnatural union. In-depth explanation of the incoherence of this system, and the confusion that flows from its use, can be found in the literature [16,18-20]. Here we will focus on misconceptions about how the P value should be interpreted.

Departments of Oncology, Epidemiology, and Biostatistics, Johns Hopkins Schools of Medicine and Public Health, Baltimore, MD.

Address correspondence to Steven Goodman, MD, MHS, PhD, 550 N Broadway, Suite 1103, Baltimore, MD, 21205. E-mail: [email protected]

0037-1963/08/$-see front matter © 2008 Elsevier Inc. All rights reserved. doi:10.1053/j.seminhematol.2008.04.003

The definition of the P value is as follows. In words: the probability of the observed result, plus more extreme results, if the null hypothesis were true. In algebraic notation: $\Pr(X \ge x \mid H_0)$, where “X” is a random variable corresponding to some way of summarizing data (such as a mean or proportion), and “x” is the observed value of that summary in the current data. This is shown graphically in Figure 1.
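The following is a minimal numerical illustration of this definition. It assumes that the summary X is a z statistic that is standard normal under H0 and that x = 1.8 was observed; both are hypothetical choices. The one-sided P value is the upper tail area. R gives it directly, and a simulation of X under H0 reproduces it.

```r
x_obs <- 1.8                                   # hypothetical observed value of X
p_exact <- pnorm(x_obs, lower.tail = FALSE)    # Pr(X >= x | H0), with X ~ N(0, 1) under H0

set.seed(9)                                    # arbitrary seed
null_draws <- rnorm(1e6)                       # many values of X generated with H0 true
p_sim <- mean(null_draws >= x_obs)             # fraction at least as extreme: the tail area

round(c(exact = p_exact, simulated = p_sim), 4)
```

Both numbers are computed assuming H0 is true. Neither of them is a probability that H0 is true.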

We have now mathematically defined this thing we call a P value, but the scientific question is, what does it mean? This is not the same as asking what people do when they observe P < .05. That is a custom, best described sociologically. Actions should be motivated or justified by some conception of foundational meaning, which is what we will explore here.

Because the P value is not part of any formal calculus of inference, its meaning is elusive. Below are listed the most common misinterpretations of the P value, with a brief discussion of why they are incorrect. Some of the misconceptions listed are equivalent, although not often recognized as such. We will then look at the P value through a Bayesian lens to get a better understanding of what it means from an inferential standpoint.

For simplicity, we will assume that the P value arises from a two-group randomized experiment, in which the effect of an intervention is measured as a difference in some average characteristic, like a cure rate. We will not explore the many other reasons a study or statistical analysis can be misleading, from the presence of hidden bias to the use of improper models; we will focus exclusively on the P value itself, under ideal circumstances. The null hypothesis will be defined as the hypothesis that there is no effect of the intervention (Table 1).

Misconception #1: If P = .05, the null hypothesis has only a 5% chance of being true. This is, without a doubt, the most pervasive and pernicious of the many misconceptions about the P value. It perpetuates the false idea that the data alone can tell us how likely we are to be right or wrong in our conclusions. The simplest way to see that this is false is to note that the P value is calculated under the assumption that the null hypothesis is true. It therefore cannot simultaneously be a probability that the null hypothesis is false. Let us suppose we flip a penny four times and observe four heads: two-sided P = .125. This does not mean that the probability of the coin being fair is only 12.5%. The only way we can calculate that probability is by Bayes’ theorem, to be discussed later and in other chapters in this issue of Seminars in Hematology [21-24].
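Both halves of the coin example can be checked in R. The exact two-sided P value comes from binom.test(). The probability that the coin is fair additionally requires a prior and an alternative hypothesis. The sketch below assumes a single hypothetical alternative, P(heads) = 0.75, and shows that the same data give very different posterior probabilities under different priors. The function name `post_fair` is illustrative only.

```r
# Exact two-sided P value for 4 heads in 4 flips of a supposedly fair coin
p_value <- binom.test(4, n = 4, p = 0.5)$p.value
p_value                                   # 0.125

# Bayes' theorem needs more than the data: a prior and an alternative.
lik_fair   <- dbinom(4, size = 4, prob = 0.5)    # 0.0625
lik_biased <- dbinom(4, size = 4, prob = 0.75)   # hypothetical alternative: P(heads) = 0.75
post_fair <- function(prior_fair) {
  prior_fair * lik_fair /
    (prior_fair * lik_fair + (1 - prior_fair) * lik_biased)
}
round(post_fair(c(0.5, 0.9)), 3)          # 0.165 0.640: same P value, different answers
```

Neither posterior equals 12.5%, and changing the assumed alternative would change them again.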

Misconception #2: A nonsignificant difference (eg, P > .05) means there is no difference between groups. A nonsignificant difference merely means that a null effect is statistically consistent with the observed results, together with the range of effects included in the confidence interval. It does not make the null effect the most likely. The effect best supported by the data from a given experiment is always the observed effect, regardless of its significance.

Misconception #3: A statistically significant finding is clini-

Figure 1. Graphical depiction of the definition of a (one-sided) P value. The curve represents the probability of every observed outcome under the null hypothesis. The P value is the probability of the observed outcome (x) plus all “more extreme” outcomes, represented by the shaded “tail area.”

Table 1. Twelve P-Value Misconceptions

1. If P = .05, the null hypothesis has only a 5% chance of being true.
2. A nonsignificant difference (eg, P > .05) means there is no difference between groups.
3. A statistically significant finding is clinically important.
4. Studies with P values on opposite sides of .05 are conflicting.
5. Studies with the same P value provide the same evidence against the null hypothesis.
6. P = .05 means that we have observed data that would occur only 5% of the time under the null hypothesis.
7. P = .05 and P < .05 mean the same thing.
8. P values are properly written as inequalities (eg, “P < .02” when P = .015).
9. P = .05 means that if you reject the null hypothesis, the probability of a type I error is only 5%.
10. With a P = .05 threshold for significance, the chance of a type I error will be 5%.
11. You should use a one-sided P value when you don’t care about a result in one direction, or a difference in that direction is impossible.
12. A scientific conclusion or treatment policy should be based on whether or not the P value is significant.


Editorial

Guidelines for reporting statistics in journals published by the American Physiological Society

Concepts and procedures in statistics are inherent to publications in science. Based on the incidence of standard deviations, standard errors, and confidence intervals in articles published by the American Physiological Society (APS), however, many scientists appear to misunderstand fundamental concepts in statistics (9). In addition, statisticians have documented that statistical errors are common in the scientific literature: roughly 50% of published articles have at least one error (1, 2). This misunderstanding and misuse of statistics jeopardizes the process of scientific discovery and the accumulation of scientific knowledge.

In an effort to improve the caliber of statistical information in articles they publish, most journals have policies that govern the reporting of statistical procedures and results. These were the previous guidelines for reporting statistics in the Information for Authors (3) provided by the APS: 1) In the MATERIALS AND METHODS, authors were told to “describe the statistical methods that were used to evaluate the data.” 2) In the RESULTS, authors were told to “provide the experimental data and results as well as the particular statistical significance of the data.” 3) In the DISCUSSION, authors were told to “Explain your interpretation of the data. . . .” To an author unknowing about statistics, these guidelines gave almost no help.

In its 1988 revision of Uniform Requirements (see Ref. 13, p. 260), the International Committee of Medical Journal Editors issued these guidelines for reporting statistics:

Describe statistical methods with enough detail to enable a knowledgeable reader with access to the original data to verify the reported results. When possible, quantify findings and present them with appropriate indicators of measurement error or uncertainty (such as confidence intervals). Avoid sole reliance on statistical hypothesis testing, such as the use of P values, which fails to convey important quantitative information. . . . Give numbers of observations. . . . References for study design and statistical methods should be to standard works (with pages stated) when possible rather than to papers where designs or methods were originally reported. Specify any general-use computer programs used.

The current guidelines issued by the Committee (see Ref. 14, p. 39) are essentially identical. To an author unknowing about statistics, these Uniform Requirements guidelines give only slightly more help.

In this editorial, we present specific guidelines for reporting statistics.¹ These guidelines embody fundamental concepts in statistics; they are consistent with the Uniform Requirements (14) and with the upcoming 7th edition of Scientific Style and Format, the style manual written by the Council of Science Editors (6) and used by APS Publications. We have written this editorial to provide investigators with concrete steps that will help them design an experiment, analyze the data, and communicate the results. In so doing, we hope these guidelines will help improve and standardize the caliber of statistical information reported throughout journals published by the APS.

GUIDELINES

The guidelines address primarily the reporting of statistics in the MATERIALS AND METHODS, RESULTS, and DISCUSSION sections of a manuscript. Guidelines 1 and 2 address issues of experimental design.

MATERIALS AND METHODS

Guideline 1. If in doubt, consult a statistician when you plan your study. The design of an experiment, the analysis of its data, and the communication of the results are intertwined. In fact, design drives analysis and communication. The time to consult a statistician is when you have defined the experimental problem you want to address: a statistician can help you design an experiment that is appropriate and efficient. Once you have collected the data, a statistician can help you assess whether the assumptions underlying the analysis were satisfied. When you write the manuscript, a statistician can help you ensure your conclusions are justified.

Guideline 2. Define and justify a critical significance level α appropriate to the goals of your study. For any statistical test, if the achieved significance level P is less than the critical significance level α, defined before any data are collected, then the experimental effect is likely to be real (see Ref. 9, p. 782). By tradition, most researchers define α to be 0.05: that is, 5% of the time they are willing to declare an effect exists when it does not. These examples illustrate that α = 0.05 is sometimes inappropriate.

If you plan a study in the hopes of finding an effect that could lead to a promising scientific discovery, then α = 0.10 is appropriate. Why? When you define α to be 0.10, you increase the probability that you find the effect if it exists.

In contrast, if you want to be especially confident of a possible scientific discovery, then α = 0.01 is appropriate: only 1% of the time are you willing to declare an effect exists when it does not.

A statistician can help you satisfy this guideline (see Guideline 1).

Guideline 3. Identify your statistical methods, and cite them using textbooks or review papers. Cite separately commercial software you used to do your statistical analysis. This guideline sounds obvious, but some researchers fail to identify the statistical methods they used.² When you follow Guideline 1, you can be confident that your statistical methods were appropriate; when you follow this guideline, your reader can be confident also. It is important that you identify separately the commercial software you used to do your statistical analysis.

Guideline 4. Control for multiple comparisons. Many physiological studies examine the impact of an intervention on a set

Address for reprints and other correspondence: D. Curran-Everett, Division of Biostatistics, M222, National Jewish Medical and Research Center, 1400 Jackson St., Denver, CO 80206 (E-mail: [email protected]).

¹ Discussions of common statistical errors, underlying assumptions of common statistical techniques, and factors that impact the choice of a parametric or the equivalent nonparametric procedure fall outside the purview of this editorial.

² We include resources that may be useful for general statistics (15), regression analyses (10), and nonparametric procedures (5).

J Appl Physiol 97: 457–459, 2004;10.1152/japplphysiol.00513.2004.

8750-7587/04 $5.00 Copyright © 2004 the American Physiological Society. http://www.jap.org


Comparison of a Novel Multiple Marker Assay Versus the Risk of Malignancy Index for the Prediction of Epithelial Ovarian Cancer in Patients with a Pelvic Mass

Richard G. MOORE, M.D.¹, Moune JABRE-RAUGHLEY, M.D.¹, Amy K. BROWN, M.D.², Katina M. ROBISON, M.D.¹, M. Craig MILLER, B.S.³, W. Jeffery ALLARD, Ph.D.³, Robert J. KURMAN, M.D.⁴, Robert C. BAST, M.D.⁵, and Steven J. SKATES, Ph.D.⁶

¹Program in Women’s Oncology, Department of Obstetrics and Gynecology, Women and Infants’ Hospital, Alpert Medical School, Brown University, Providence, RI, USA.
²Department of Obstetrics and Gynecology, Hartford Hospital, Hartford, CT, USA.
³Medivice consulting, New Durham, NH, USA.
⁴Department of Pathology, Johns Hopkins Medical School, Baltimore, MD, USA.

© 2010 Mosby, Inc. All rights reserved.

Address all correspondence to: Richard G. Moore, M.D., Associate Professor, Program in Women’s Oncology, Department of Obstetrics and Gynecology, Women and Infants’ Hospital, Alpert Medical School, Brown University, Providence, RI, 02905. Telephone: (401) 453-7520. FAX: (401) 453-7529. [email protected].

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

These data were presented February 6, 2009, at the Society of Gynecologic Oncology Annual Meeting on Women’s Cancer at San Antonio, TX.

List of Participating Institutions:

Investigator | Institution
Richard Moore, MD | Women and Infants’ Hospital of Rhode Island, Providence, RI
Dan Schlitzer, MD | Healthcare for Women, Inc., New Bedford, MA
Steven DePasquale, MD | Chattanooga Gyn-Oncology, Chattanooga, TN
Walter Gajewski, MD | New Hanover Regional Medical Center, Wilmington, NC
Laura Havrilesky, MD | Duke University Medical Center, Durham, NC
Donald Chamberlain, MD | Chattanooga Gyn-Oncology, Chattanooga, TN
Amy Kirkpatrick Brown, MD, MPH | Hartford Hospital, Hartford, CT; New Britain General Hospital of Central Connecticut, New Britain, CT
Alan Gordon, MD | Arizona Gyn-Oncology, Phoenix, AZ
Scott McMeekin, MD | Oklahoma University Health Science Center, Oklahoma City, OK
Howard Homesley, MD | Brody School of Medicine, Leo Jenkins Cancer Center, Greenville, NC
Elizabeth Swisher, MD | University of Washington Medical Center, Seattle, WA
Audrey Garrett, MD | Northwest Gynecologic Oncology, Eugene, OR
Alexander Burnett, MD | University of Arkansas for Medical Sciences, Little Rock, AR

NIH Public Access Author Manuscript. Am J Obstet Gynecol. Author manuscript; available in PMC 2013 March 11.

Published in final edited form as: Am J Obstet Gynecol. 2010 September; 203(3): 228.e1–228.e6. doi:10.1016/j.ajog.2010.03.043.


instructions, and appropriate controls were within the ranges provided by the manufacturer for all runs.

Study sites were monitored for compliance with the protocol and for data accuracy. All data were captured onto case report forms and entered into a validated NetRegulus database. All patients underwent surgical removal of the ovarian masses or cysts, and if a patient was diagnosed with an epithelial ovarian cancer, surgical staging was required by protocol. Tissue specimens were obtained from all patients and centrally reviewed by three gynecologic pathologists to verify the diagnoses made by the site pathologists. Two gynecologic oncologists reviewed the histopathology results from the site pathologist and the central review pathologists to determine concordance and the final consensus for histopathological diagnosis. All histological evaluations were conducted blinded to laboratory values for the biomarker assays, and laboratory testing was conducted blinded to histological outcome. Serum levels for HE4 and CA125II, as well as the ROMA value determined for the protocol, were withheld from the physicians and patients participating in the study.

For the purpose of analysis, women were considered to be postmenopausal if they had not had a menstrual period for >1 year prior to their study blood draw, or if they were >55 years old and the date of the last menstrual period was unknown. Women were considered to be premenopausal if they had a period within one year of the study blood draw or if they were <48 years old and the date of their last menstrual period was unknown. Follicle stimulating hormone testing was utilized to determine menopausal status for women between the ages of 48 and 55 who had an unknown last menstrual period or who had a hysterectomy with ovarian preservation.
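The menopausal-status rule above is effectively a small decision tree. The R sketch below encodes it, assuming the time since the last menstrual period is given in months (NA when unknown) and that ">1 year" means more than 12 months. The function name `menopausal_status`, its arguments, and the choice to treat a hysterectomy with ovarian preservation like an unknown last period are our assumptions, not part of the study protocol; "FSH" marks the cases the study resolved by follicle stimulating hormone testing.

```r
# Hypothetical helper (not from the paper): classify menopausal status for ONE woman.
# months_since_lmp: months since the last menstrual period (NA if unknown)
# age: age in years
# hysterectomy_ovaries_kept: TRUE after hysterectomy with ovarian preservation
#   (assumption: treated like an unknown last menstrual period)
menopausal_status <- function(months_since_lmp, age, hysterectomy_ovaries_kept = FALSE) {
  lmp_unknown <- is.na(months_since_lmp) || hysterectomy_ovaries_kept
  if (!lmp_unknown) {
    if (months_since_lmp > 12) return("POST")   # no period for more than 1 year
    return("PRE")                               # period within one year
  }
  if (age > 55) return("POST")                  # older than 55, LMP unknown
  if (age < 48) return("PRE")                   # younger than 48, LMP unknown
  "FSH"                                         # 48 to 55: decided by FSH testing
}

menopausal_status(24, 50)   # "POST"
menopausal_status(3, 52)    # "PRE"
menopausal_status(NA, 60)   # "POST"
menopausal_status(NA, 51)   # "FSH"
```

The function is written for a single subject; for a whole data frame it could be applied row by row, e.g. with `mapply`.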

Predictive Probability Calculations

The primary endpoint of the clinical study was to classify patients with a pelvic mass into high- or low-risk groups for having EOC using the serum biomarkers CA125 and HE4 in the following predictive probability algorithm (ROMA):

The predictive probability algorithm (ROMA) was developed from two separate pilot studies as described in previous publications (20;21) and validated in this national trial.

Imaging Analysis

All patients were required to have either a pelvic ultrasound, CT scan, MRI or any combination of imaging modalities for documentation of an ovarian cyst or pelvic mass. Imaging reports were captured for all patients and results entered into the trial database. An RMI imaging score was calculated using the architectural features of the ovarian cyst or pelvic mass and assigned an imaging score as described by Jacobs et al (17). Briefly, one

MOORE et al. Page 4


the ovarian dataset


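       HE4  CA125  CA199  CEA  ETA  MENOPAUSA  OUTCOME
1     3576   7014   2785  124   34  PRE        BENIGNO
2     3046  23216  12656  127   21  PRE        BENIGNO
3    29376  11226   2462  250   64  POST       MALIGNO
4     6296   5224   3450  583   58  POST       MALIGNO
..      ..     ..     ..   ..   ..  ..         ..
208   4116   6156    842  260   36  PRE        BENIGNO
209   5816    903     98  146   55  POST       BENIGNO
210   5226   5616    530  203   63  POST       MALIGNO

Columns: the serum markers HE4, CA125, CA199 and CEA; ETA = age (years); MENOPAUSA = menopausal status (PRE/POST); OUTCOME = outcome (BENIGNO = benign, MALIGNO = malignant). Rows 5 to 207 are omitted.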

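www = "http://www.dmi.units.it/~borelli/dataset/ovarian.csv"   # course dataset
ovarian = read.csv(www, header = TRUE)   # first row holds the column names
attach(ovarian)                          # columns (HE4, OUTCOME, ...) become visible by name
logHE4 = log(HE4/100)                    # natural log of HE4 divided by 100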

the contingency table

• Are MENOPAUSA and OUTCOME, two qualitative variables, associated with each other? We examine this with a contingency table:

– dependent and independent events: Bayes' theorem

– relative risk and odds ratio

– Fisher's exact test and the chi-squared test

• Is logHE4 < 4.1 an 'ideal' cut-off for predicting OUTCOME?

– sensitivity and specificity

– predictive values

∗ note: predictive values depend on the prevalence of the disease

– the ROC curve
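The first group of topics (independence, relative risk, odds ratio, Fisher and chi-squared tests) can be computed in a few lines of base R. A minimal sketch, assuming a 2×2 table with the exposure (e.g. MENOPAUSA) on the rows and the outcome on the columns, event (MALIGNO) first; the helper name `risk_measures` and the counts in `tab` are hypothetical, chosen only to make the arithmetic easy to check, and are not the ovarian data.

```r
# Hypothetical helper: relative risk and odds ratio from a 2x2 table with
# rows = exposure (first row = exposed) and columns = outcome (first column = event).
risk_measures <- function(tab) {
  n11 <- tab[1, 1]; n12 <- tab[1, 2]   # exposed: with / without the event
  n21 <- tab[2, 1]; n22 <- tab[2, 2]   # unexposed: with / without the event
  rr <- (n11 / (n11 + n12)) / (n21 / (n21 + n22))  # ratio of the two risks
  or <- (n11 * n22) / (n12 * n21)                  # ratio of the two odds
  c(RR = unname(rr), OR = unname(or))
}

# HYPOTHETICAL counts, for illustration only (not the ovarian dataset)
tab <- matrix(c(40, 20, 30, 60), nrow = 2,
              dimnames = list(MENOPAUSA = c("POST", "PRE"),
                              OUTCOME = c("MALIGNO", "BENIGNO")))
tab
risk_measures(tab)          # RR = 16/7 (about 2.29), OR = 4

# Expected counts if the two characters were independent, P(A and B) = P(A) P(B):
# row total * column total / n
chisq.test(tab)$expected
fisher.test(tab)            # exact test; also reports a conditional MLE of the OR
chisq.test(tab)             # for 2x2 tables R applies Yates' correction by default
```

With the course data attached, the same calls apply to `table(MENOPAUSA, OUTCOME)`; since `table()` sorts levels alphabetically, BENIGNO comes first, so the columns should be reordered with `table(MENOPAUSA, OUTCOME)[, c("MALIGNO", "BENIGNO")]` before calling `risk_measures`.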

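The second group of topics (sensitivity, specificity, predictive values, ROC curve) can likewise be sketched in base R. A minimal sketch, assuming a subject is "positive" when the marker is at or above the cut-off (for logHE4, values ≥ 4.1 point towards MALIGNO); the function names are hypothetical, and the sensitivity/specificity values passed to `predictive_values` are illustrative numbers, not estimates from the ovarian data. The predictive values are obtained from Bayes' theorem, which makes their dependence on prevalence explicit.

```r
# Hypothetical helpers: a subject is "positive" when marker >= cutoff.
# marker: numeric vector (e.g. logHE4); disease: logical vector (TRUE = MALIGNO)
sens_spec <- function(marker, disease, cutoff) {
  positive <- marker >= cutoff
  c(sensitivity = mean(positive[disease]),     # P(test + | disease)
    specificity = mean(!positive[!disease]))   # P(test - | no disease)
}

# Predictive values from Bayes' theorem, given sensitivity, specificity, prevalence
predictive_values <- function(sens, spec, prev) {
  ppv <- sens * prev / (sens * prev + (1 - spec) * (1 - prev))         # P(disease | +)
  npv <- spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)   # P(no disease | -)
  c(PPV = ppv, NPV = npv)
}

# ROC curve: sensitivity and specificity for every observed cut-off
roc_points <- function(marker, disease) {
  cuts <- sort(unique(marker))
  t(sapply(cuts, function(k) c(cutoff = k, sens_spec(marker, disease, k))))
}

# Empirical AUC: probability that a random diseased subject has a higher
# marker value than a random healthy one (ties count 1/2)
auc <- function(marker, disease) {
  d <- marker[disease]
  h <- marker[!disease]
  mean(outer(d, h, ">") + 0.5 * outer(d, h, "=="))
}

# Same illustrative test (sens 0.9, spec 0.8) at two different prevalences
predictive_values(0.9, 0.8, prev = 0.40)   # PPV = 0.75, NPV about 0.923
predictive_values(0.9, 0.8, prev = 0.01)   # PPV about 0.043: rare disease, many false positives
```

On the course data one would call, for example, `sens_spec(logHE4, OUTCOME == "MALIGNO", 4.1)`, `auc(logHE4, OUTCOME == "MALIGNO")`, and plot the ROC curve as `1 - specificity` against `sensitivity` from `roc_points`.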


To 'really understand' the level α = 5%

Simulation

Suppose we know that the height of Italians is normally distributed with mean 170 cm and standard deviation 10 cm. Letting our imagination run, suppose we tour the bars at aperitif time and measure the height of 200 people. We randomly split ('randomize') the 200 people into two samples of 100 people each. The question is: is the height of the two groups the same or different, in a statistical sense? Since both samples come from the same population, the null hypothesis of equal means is true by construction; repeating the experiment in 10,000 bars (tantibar, 'many bars'), each time with a t-test, shows how often we would nonetheless reject it at α = 5%.

The code

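tantibar = 10000                 # number of simulated bars (replicates)
b = numeric(tantibar)            # storage for the p-values

for (j in 1:tantibar) {
  set.seed(j)                    # reproducible replicate j
  x = rnorm(100, 170, 10)        # group 1: 100 heights from N(170, 10^2)
  y = rnorm(100, 170, 10)        # group 2: same population, so H0 is true
  b[j] = t.test(x, y)$p.value    # p-value of the (Welch) two-sample t-test
}

boxplot(b)                       # under H0 the p-values are roughly uniform on (0, 1)

min(b)                           # smallest p-value among all the bars
which(b == min(b))               # the bar (seed) that produced it

set.seed(6966)                   # replay that bar (seed 6966 as in the original notes)
x = rnorm(100, 170, 10)
y = rnorm(100, 170, 10)
boxplot(x, y)
t.test(x, y)

b < 0.05                         # TRUE where H0 is (wrongly) rejected at alpha = 5%
sum(b < 0.05)                    # number of false positives
sum(b < 0.05) / tantibar         # proportion of false positives, expected near 0.05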
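How close to 5% should `sum(b < 0.05) / tantibar` be? Under H0 each bar rejects with probability (approximately) α, independently of the others, so the number of rejections is binomial and its proportion has standard error sqrt(α(1 − α)/tantibar). A minimal R sketch, assuming the same settings as above; the names `se` and `p` are ours, and the 2000-bar re-run is a smaller version of the simulation, kept short for speed.

```r
# Binomial check of the false-positive rate under H0
alpha <- 0.05
tantibar <- 10000
se <- sqrt(alpha * (1 - alpha) / tantibar)     # standard error of the proportion
c(se = se, lower = alpha - 1.96 * se, upper = alpha + 1.96 * se)
# se is about 0.0022: the observed proportion should usually fall in about 0.046-0.054

# Smaller, self-contained re-run (2000 bars instead of 10000)
set.seed(1)
p <- replicate(2000, t.test(rnorm(100, 170, 10), rnorm(100, 170, 10))$p.value)
mean(p < alpha)                                # close to 0.05
mean(p < 0.01)                                 # close to 0.01: p is roughly uniform under H0
```

The same reasoning explains the smallest p-value found above: among 10,000 tests of a true H0, a few very small p-values are expected purely by chance.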
