052176727X_Astronom

7/21/2019 052176727X_Astronom


Modern Statistical Methods for Astronomy

Modern astronomical research is beset with a vast range of statistical challenges, ranging from reducing data from megadatasets to characterizing an amazing variety of variable celestial objects or testing astrophysical theory. Linking astronomy to the world of modern statistics, this volume is a unique resource, introducing astronomers to advanced statistics through ready-to-use code in the public-domain R statistical software environment.

The book presents fundamental results of probability theory and statistical inference, before exploring several fields of applied statistics, such as data smoothing, regression, multivariate analysis and classification, treatment of nondetections, time series analysis, and spatial point processes. It applies the methods discussed to contemporary astronomical research datasets using the R statistical software, making it an invaluable resource for graduate students and researchers facing complex data analysis tasks.

A link to the authors' website for this book can be found at www.cambridge.org/msma. Material available on their website includes datasets, R code and errata.

Eric D. Feigelson is a Professor in the Department of Astronomy and Astrophysics at Pennsylvania State University. He is a leading observational astronomer and has worked with statisticians for 25 years to bring advanced methodology to problems in astronomical research.

G. Jogesh Babu is Professor of Statistics and Director of the Center for Astrostatistics at Pennsylvania State University. He has made extensive contributions to probabilistic number theory, resampling methods, nonparametric methods, asymptotic theory, and applications to biomedical research, genetics, astronomy, and astrophysics.


Modern Statistical Methods for Astronomy

With R Applications

ERIC D. FEIGELSON
Pennsylvania State University

G. JOGESH BABU
Pennsylvania State University


CAMBRIDGE UNIVERSITY PRESS

Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Mexico City

Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521767279

© E. D. Feigelson and G. J. Babu 2012

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2012

Printed in the United Kingdom at the University Press, Cambridge

A catalog record for this publication is available from the British Library

Library of Congress Catalog in Publication data
Feigelson, Eric D.
Modern statistical methods for astronomy : with R applications / Eric D. Feigelson, G. Jogesh Babu.
p. cm.
ISBN 978-0-521-76727-9 (hardback)
1. Statistical astronomy. I. Babu, Gutti Jogesh, 1949– II. Title.
QB149.F45 2012
520.72′7–dc23 2012009113

ISBN 978-0-521-76727-9 Hardback

Additional resources for this publication: www.cambridge.org/msma

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.


For Zoe, Clara and Micah

In memory of my parents, Nagarathnam and Mallayya


Contents

Preface

1 Introduction
  1.1 The role of statistics in astronomy
    1.1.1 Astronomy and astrophysics
    1.1.2 Probability and statistics
    1.1.3 Statistics and science
  1.2 History of statistics in astronomy
    1.2.1 Antiquity through the Renaissance
    1.2.2 Foundations of statistics in celestial mechanics
    1.2.3 Statistics in twentieth-century astronomy
  1.3 Recommended reading

2 Probability
  2.1 Uncertainty in observational science
  2.2 Outcome spaces and events
  2.3 Axioms of probability
  2.4 Conditional probabilities
    2.4.1 Bayes' theorem
    2.4.2 Independent events
  2.5 Random variables
    2.5.1 Density and distribution functions
    2.5.2 Independent and identically distributed r.v.s.
  2.6 Quantile function
  2.7 Discrete distributions
  2.8 Continuous distributions
  2.9 Distributions that are neither discrete nor continuous
  2.10 Limit theorems
  2.11 Recommended reading
  2.12 R applications

3 Statistical inference
  3.1 The astronomical context
  3.2 Concepts of statistical inference
  3.3 Principles of point estimation


  3.4 Techniques of point estimation
    3.4.1 Method of moments
    3.4.2 Method of least squares
    3.4.3 Maximum likelihood method
    3.4.4 Confidence intervals
    3.4.5 Calculating MLEs with the EM algorithm
  3.5 Hypothesis testing techniques
  3.6 Resampling methods
    3.6.1 Jackknife
    3.6.2 Bootstrap
  3.7 Model selection and goodness-of-fit
    3.7.1 Nonparametric methods for goodness-of-fit
    3.7.2 Likelihood-based methods for model selection
    3.7.3 Information criteria for model selection
    3.7.4 Comparing different model families
  3.8 Bayesian statistical inference
    3.8.1 Inference for the binomial proportion
    3.8.2 Prior distributions
    3.8.3 Inference for Gaussian distributions
    3.8.4 Hypotheses testing and the Bayes factor
    3.8.5 Model selection and averaging
    3.8.6 Bayesian computation
  3.9 Remarks
  3.10 Recommended reading
  3.11 R applications

4 Probability distribution functions
  4.1 Binomial and multinomial
    4.1.1 Ratio of binomial random variables
  4.2 Poisson
    4.2.1 Astronomical context
    4.2.2 Mathematical properties
    4.2.3 Poisson processes
  4.3 Normal and lognormal
  4.4 Pareto (power-law)
    4.4.1 Least-squares estimation
    4.4.2 Maximum likelihood estimation
    4.4.3 Extensions of the power-law
    4.4.4 Multivariate Pareto
    4.4.5 Origins of power-laws
  4.5 Gamma
  4.6 Recommended reading
  4.7 R applications
    4.7.1 Comparing Pareto distribution estimators


    4.7.2 Fitting distributions to data
    4.7.3 Scope of distributions in R and CRAN

5 Nonparametric statistics
  5.1 The astronomical context
  5.2 Concepts of nonparametric inference
  5.3 Univariate problems
    5.3.1 Kolmogorov–Smirnov and other e.d.f. tests
    5.3.2 Robust statistics of location
    5.3.3 Robust statistics of spread
  5.4 Hypothesis testing
    5.4.1 Sign test
    5.4.2 Two-sample and k-sample tests
  5.5 Contingency tables
  5.6 Bivariate and multivariate tests
  5.7 Remarks
  5.8 Recommended reading
  5.9 R applications
    5.9.1 Exploratory plots and summary statistics
    5.9.2 Empirical distribution and quantile functions
    5.9.3 Two-sample tests
    5.9.4 Contingency tables
    5.9.5 Scope of nonparametrics in R and CRAN

6 Data smoothing: density estimation
  6.1 The astronomical context
  6.2 Concepts of density estimation
  6.3 Histograms
  6.4 Kernel density estimators
    6.4.1 Basic properties
    6.4.2 Choosing bandwidths by cross-validation
    6.4.3 Multivariate kernel density estimation
    6.4.4 Smoothing with measurement errors
  6.5 Adaptive smoothing
    6.5.1 Adaptive kernel estimators
    6.5.2 Nearest-neighbor estimators
  6.6 Nonparametric regression
    6.6.1 Nadaraya–Watson estimator
    6.6.2 Local regression

    6.9.1 Histogram, quantile function and measurement errors
    6.9.2 Kernel smoothers


    6.9.3 Nonparametric regressions
    6.9.4 Scope of smoothing in R and CRAN

7 Regression
  7.1 Astronomical context
  7.2 Concepts of regression
  7.3 Least-squares linear regression
    7.3.1 Ordinary least squares
    7.3.2 Symmetric least-squares regression
    7.3.3 Bootstrap error analysis
    7.3.4 Robust regression
    7.3.5 Quantile regression
    7.3.6 Maximum likelihood estimation
  7.4 Weighted least squares
  7.5 Measurement error models
    7.5.1 Least-squares estimators
    7.5.2 SIMEX algorithm
    7.5.3 Likelihood-based estimators
  7.6 Nonlinear models
    7.6.1 Poisson regression
    7.6.2 Logistic regression
  7.7 Model validation, selection and misspecification
    7.7.1 Residual analysis
    7.7.2 Cross-validation and the bootstrap

    7.10.1 Linear modeling
    7.10.2 Generalized linear modeling
    7.10.3 Robust regression
    7.10.4 Quantile regression
    7.10.5 Nonlinear regression of galaxy surface brightness profiles
    7.10.6 Scope of regression in R and CRAN

8 Multivariate analysis
  8.1 The astronomical context
  8.2 Concepts of multivariate analysis
    8.2.1 Multivariate distances
    8.2.2 Multivariate normal distribution
  8.3 Hypothesis tests
  8.4 Relationships among the variables
    8.4.1 Multiple linear regression
    8.4.2 Principal components analysis
    8.4.3 Factor and canonical correlation analysis


    8.4.4 Outliers and robust methods
    8.4.5 Nonlinear methods
  8.5 Multivariate visualization
  8.6 Remarks
  8.7 Recommended reading
  8.8 R applications
    8.8.1 Univariate tests of normality
    8.8.2 Preparing the dataset
    8.8.3 Bivariate relationships
    8.8.4 Principal components analysis
    8.8.5 Multiple regression and MARS
    8.8.6 Multivariate visualization
    8.8.7 Interactive graphical displays
    8.8.8 Scope of multivariate analysis in R and CRAN

9 Clustering, classification and data mining
  9.1 The astronomical context
  9.2 Concepts of clustering and classification
    9.2.1 Definitions and scopes
    9.2.2 Metrics, group centers and misclassifications
  9.3 Clustering
    9.3.1 Agglomerative hierarchical clustering
    9.3.2 k-means and related nonhierarchical partitioning
  9.4 Clusters with substructure or noise
  9.5 Mixture models
  9.6 Supervised classification
    9.6.1 Multivariate normal clusters
    9.6.2 Linear discriminant analysis and its generalizations
    9.6.3 Classification trees
    9.6.4 Nearest-neighbor classifiers
    9.6.5 Automated neural networks
    9.6.6 Classifier validation, improvement and fusion

    9.9.1 Unsupervised clustering of COMBO-17 galaxies
    9.9.2 Mixture models
    9.9.3 Supervised classification of SDSS point sources
    9.9.4 LDA, k-nn and ANN classification
    9.9.5 CART and SVM classification
    9.9.6 Scope of R and CRAN

10 Nondetections: censored and truncated data
  10.1 The astronomical context


  10.2 Concepts of survival analysis
  10.3 Univariate datasets with censoring
    10.3.1 Parametric estimation
    10.3.2 Kaplan–Meier nonparametric estimator
    10.3.3 Two-sample tests
  10.4 Multivariate datasets with censoring
    10.4.1 Correlation coefficients
    10.4.2 Regression models
  10.5 Truncation
    10.5.1 Parametric estimation
    10.5.2 Nonparametric Lynden-Bell–Woodroofe estimator

    10.8.1 Kaplan–Meier estimator
    10.8.2 Two-sample tests with censoring
    10.8.3 Bivariate and multivariate problems with censoring
    10.8.4 Lynden-Bell–Woodroofe estimator for truncation
    10.8.5 Scope of censoring and truncation in R and CRAN

11 Time series analysis
  11.1 The astronomical context
  11.2 Concepts of time series analysis
  11.3 Time-domain analysis of evenly spaced data
    11.3.1 Smoothing
    11.3.2 Autocorrelation and cross-correlation
    11.3.3 Stochastic autoregressive models
    11.3.4 Regression for deterministic models
  11.4 Time-domain analysis of unevenly spaced data
    11.4.1 Discrete correlation function
    11.4.2 Structure function
  11.5 Spectral analysis of evenly spaced data
    11.5.1 Fourier power spectrum
    11.5.2 Improving the periodogram
  11.6 Spectral analysis of unevenly spaced data
    11.6.1 Lomb–Scargle periodogram
    11.6.2 Non-Fourier periodograms
    11.6.3 Statistical significance of periodogram peaks
    11.6.4 Spectral analysis of event data
    11.6.5 Computational issues
  11.7 State-space modeling and the Kalman filter
  11.8 Nonstationary time series
  11.9 1/f noise or long-memory processes
  11.10 Multivariate time series



    11.13.1 Exploratory time series analysis
    11.13.2 Spectral analysis
    11.13.3 Modeling as an autoregressive process
    11.13.4 Modeling as a long-memory process
    11.13.5 Wavelet analysis
    11.13.6 Scope of time series analysis in R and CRAN

12 Spatial point processes
  12.1 The astronomical context
  12.2 Concepts of spatial point processes
  12.3 Tests of uniformity
  12.4 Spatial autocorrelation
    12.4.1 Global measures of spatial autocorrelation
    12.4.2 Local measures of spatial autocorrelation
  12.5 Spatial interpolation
  12.6 Global functions of clustering
    12.6.1 Cumulative second-moment measures
    12.6.2 Two-point correlation function
  12.7 Model-based spatial analysis
    12.7.1 Models for galaxy clustering
    12.7.2 Models in geostatistics
  12.8 Graphical networks and tessellations
  12.9 Points on a circle or sphere
  12.10 Remarks
  12.11 Recommended reading
  12.12 R applications
    12.12.1 Characterization of autocorrelation
    12.12.2 Variogram analysis
    12.12.3 Characterization of clustering
    12.12.4 Tessellations
    12.12.5 Spatial interpolation
    12.12.6 Spatial regression and modeling
    12.12.7 Circular and spherical statistics
    12.12.8 Scope of spatial analysis in R and CRAN

Appendix A Notation and acronyms

Appendix B Getting started with R
  B.1 History and scope of R/CRAN
  B.2 Session environment
  B.3 R object classes


  B.4 Basic operations on classes
  B.5 Input/output
  B.6 A sample R session
  B.7 Interfaces to other programs and languages
  B.8 Computational efficiency
  B.9 Learning more about R
  B.10 Recommended reading

Appendix C Astronomical datasets
  C.1 Asteroids
  C.2 Protostar populations
  C.3 Globular cluster magnitudes
  C.4 Stellar abundances
  C.5 Galaxy clustering
  C.6 Hipparcos stars
  C.7 Globular cluster properties
  C.8 SDSS quasars
  C.9 SDSS point sources
  C.10 Galaxy photometry
  C.11 Elliptical galaxy profiles
  C.12 X-ray source variability
  C.13 Sunspot numbers
  C.14 Exoplanet orbits
  C.15 Kepler stellar light curves
  C.16 Sloan Digital Sky Survey
  C.17 Fermi gamma-ray light curves
  C.18 Swift gamma-ray bursts

References
Subject index
R and CRAN commands

The color plates are to be found between pages 398 and 399.


Preface

Motivation and goals

For many years, astronomers have struggled with the application of sophisticated statistical methodologies to analyze their rich datasets and address complex astrophysical problems. On one hand, at least in the United States, astronomers receive little or no formal training in statistics. The traditional method of education has been informal exposure to a few familiar methods during early research experiences. On the other hand, astronomers correctly perceive that a vast world of applied mathematical and statistical methodologies has emerged in recent decades. But systematic, broad training in modern statistical methods has not been available to most astronomers.

This volume seeks to address this problem at three levels. First, we present fundamental principles and results of broad fields of statistics applicable to astronomical research. The material is roughly at a level of advanced undergraduate courses in statistics. We also outline some recent advanced techniques that may be useful for astronomical research to give a flavor of the breadth of modern methodology. It is important to recognize that we give only incomplete introductions to the fields, and we guide the astronomer towards more complete and authoritative treatments.

Second, we present tutorials on the application of both simple and more advanced methods to contemporary astronomical research datasets using the R statistical software package. R has emerged in recent years as the most versatile public-domain statistical software environment for researchers in many fields. In addition to a coherent language for data analysis and common statistical tools, over 3000 packages have been added for advanced analyses in the CRAN archive. We have culled these packages for functionalities that may be useful to astronomers. R can also be linked to other analysis systems and languages such as C, FORTRAN and Python, so that legacy codes can be included in an R-based analysis and vice versa.

Third, we hope the book communicates to astronomers our enthusiasm for statistics as a substantial and fascinating intellectual enterprise. Just as astronomers use the latest engineering to build their telescopes and apply advanced physics to interpret cosmic phenomena, they can benefit from exploring the many roads of analyzing and interpreting data through modern statistical analysis.

Another important purpose of this volume is to give astronomers and other physical scientists a bridge to the vast library of specialized texts and monographs in statistics and allied fields. We strongly encourage researchers who are engaged in statistical data analysis to read more detailed treatments in the "Recommended reading" at the end of each chapter; they are carefully chosen from many available volumes. Most of the material in the book which is not specifically referenced in the text is presented in more detail in these recommended readings. To further this goal, the present book does not shy away from technical language that, though unfamiliar in the astronomical community, is critical for further learning from the statistical literature. For example, the astronomer's "upper limits" are "left-censored" data points, a "power-law distribution" is a "Pareto distribution", and "1/f noise" is a "long-memory process". The text makes these connections between the languages of astronomy and statistics, and the comprehensive index can assist the reader in finding material in both languages.

The reader may find the appendices useful. An introduction to R is given in Appendix B. It includes an overview of the programming language and an outline of its statistical functionalities, including the many CRAN packages. R applications to astronomical datasets are given at the end of each chapter which implement methods discussed in the text. Appendix C presents 18 astronomical datasets illustrating the range of statistical challenges that arise in contemporary research. The full datasets and R scripts are available online at http://astrostatistics.psu.edu/MSMA. Readers can thus easily reproduce the R results in the book.

In this volume, we do not present mathematical proofs underlying statistical results, and we give only brief outlines of a few computational algorithms. We do not review research at the frontiers of astrostatistics, except for a few topics where astronomers have contributed critically important methodology (such as the treatment of truncated data and irregularly spaced time series). Only a small fraction of the many methodological studies in the recent astronomical literature are mentioned. Some fields of applied statistics useful for astronomy (such as wavelet analysis and image processing) are covered only briefly. Finally, only 2500 CRAN packages were examined for possible inclusion in the book; roughly one new package is added every day and many others are extended.

Audience

The main audience envisioned for this volume is graduate students and researchers in observational astronomy. We hope it serves both as a textbook in a course on data analysis or astrostatistics, and as a reference book to be consulted as specific research problems are encountered. Researchers in allied fields of physical science, such as high-energy physics and Earth sciences, may also find portions of the volume helpful. Statisticians can see how existing methods relate to questions in astronomy, providing background for astrostatistical research initiatives.

Our presentation assumes that the reader has a background in basic linear algebra and calculus. Familiarity with elementary statistical methods commonly used in the physical sciences is also useful; this preparatory material is covered in volumes such as Bevington & Robinson (2002) and Cowan (1998).

Outline and classroom use

The introduction (Chapter 1) reviews the long historical relationship between astronomy and statistics and philosophical discussions of the relationship between statistical and scientific inference. We then start with probability theory and proceed to lay foundations of statistical inference: hypothesis testing, estimation, modeling, resampling and Bayesian inference (Chapters 2 and 3). Probability distributions are discussed in Chapter 4 and nonparametric statistics are covered in Chapter 5.

The volume proceeds to various fields of applied statistics that rest on these foundations. Data smoothing is covered in Chapters 5 and 6. Regression is discussed in Chapter 7, followed by analysis and classification of multivariate data (Chapters 8 and 9). Treatments of nondetections are covered in Chapter 10, followed by the analysis of time-variable astronomical phenomena in Chapter 11. Chapter 12 considers spatial point processes. The book ends with appendices introducing the R software environment and providing astronomical datasets illustrative of a variety of statistical problems.

We can make some recommendations regarding classroom use. The first part of a semester course in astrostatistics for astronomy students would be devoted to the principles of statistical inference in Chapters 1–4 and learning the basics of R in Appendix B. The second part of the semester would be topics of applied statistical methodology selected from Chapters 5–12. We do not provide predefined student exercises with definitive answers, but rather encourage both instructors and students to develop open-ended explorations of the contemporary astronomical datasets based on the R tutorials distributed throughout the volume. Suggestions for both simple and advanced problems are given in the dataset presentations (Appendix C).

Astronomical datasets and R scripts

The datasets and R scripts in the book can be downloaded from Penn State's Center for Astrostatistics at http://astrostatistics.psu.edu/MSMA. The R scripts are self-contained; simple cut-and-paste will ingest the datasets, perform the statistical operations, and produce tabular and graphical results.

Extensive resources to pursue issues discussed in the book are available on-line. The R system can be downloaded from http://www.r-project.org and CRAN packages are installed on-the-fly within an R session. The primary astronomy research literature, including full-text articles, is available through the NASA–Smithsonian Astrophysics Data System (http://adswww.harvard.edu). Thousands of astronomical datasets are available from the Vizier service at the Centre des Données Stellaires (http://vizier.u-strasbg.fr) and the emerging International Virtual Observatory Alliance (http://ivo.net). The primary statistical literature can be accessed through MathSciNet (http://www.ams.org/mathscinet/) provided by the American Mathematical Society. Considerable statistical information is available on Wikipedia (http://en.wikipedia.org/wiki/Index_of_statistics_articles). Astronomers should note, however, that the best way to learn statistics is often through textbooks and monographs written by statisticians, such as those in the recommended reading.

Acknowledgements

This book emerged from 25 years of discussion and collaboration between astronomers and statisticians at Penn State under the auspices of the Center for Astrostatistics. The volume particularly benefited from the lectures and tutorials developed for the Summer Schools in Statistics for Astronomers since 2005 and taught at Penn State and Bangalore's Indian Institute of Astrophysics. We are grateful to our dozens of statistician colleagues who have taught at the Summer Schools in Statistics for Astronomers for generously sharing their knowledge and perspectives. David Hunter and Arnab Chakraborty developed R tutorials for astronomers. Donald Percival generously gave detailed comments on the time series analysis chapter. We are grateful to Nancy Butkovich and her colleagues for providing excellent library services. Finally, we acknowledge the National Science Foundation, National Aeronautics and Space Administration, and the Eberly College of Science for supporting astrostatistics at Penn State over many years.

Eric D. Feigelson
G. Jogesh Babu

Center for Astrostatistics
Pennsylvania State University
University Park, PA, U.S.A.


1 Introduction

1.1 The role of statistics in astronomy

1.1.1 Astronomy and astrophysics

Today, the term "astronomy" is best understood as shorthand for "astronomy and astrophysics". Astronomy (astro = star and nomen = name in ancient Greek) is the observational study of matter beyond Earth: planets and bodies in the Solar System, stars in the Milky Way Galaxy, galaxies in the Universe, and diffuse matter between these concentrations of mass. The perspective is rooted in our viewpoint on or near Earth, typically using telescopes on mountaintops or robotic satellites to enhance the limited capabilities of our eyes. Astrophysics (astro = star and physis = nature) is the study of the intrinsic nature of astronomical bodies and the processes by which they interact and evolve. This is an indirect, inferential intellectual effort based on the (apparently valid) assumption that physical processes established to rule terrestrial phenomena (gravity, thermodynamics, electromagnetism, quantum mechanics, plasma physics, chemistry, and so forth) also apply to distant cosmic phenomena. Figure 1.1 gives a broad-stroke outline of the major fields and themes of modern astronomy.

The fields of astronomy are often distinguished by the structures under study. There are planetary astronomers (who study our Solar System and extra-solar planetary systems), solar physicists (who study our Sun), stellar astronomers (who study other stars), Galactic astronomers (who study our Milky Way Galaxy), extragalactic astronomers (who study other galaxies), and cosmologists (who study the Universe as a whole). Astronomers can also be distinguished by the type of telescope used: there are radio astronomers, infrared astronomers, visible-light astronomers, X-ray astronomers, gamma-ray astronomers, and physicists studying cosmic rays, neutrinos and the elusive gravitational waves. Astrophysicists are sometimes classified by the processes they study: astrochemists, atomic and nuclear astrophysicists, general relativists (studying gravity) and cosmologists.

The astronomer might proceed to investigate stellar processes by measuring an ordinary main-sequence star with spectrographs at different wavelengths of light, examining its spectral energy distribution with thousands of absorption lines. The astrophysicist interprets that the emission of a star is produced by a sphere of $10^{57}$ atoms with a specific mixture of elemental abundances, powered by hydrogen fusion to helium in the core, revealing itself to the Universe as a blackbody surface at several thousand degrees temperature. The development of the observations of normal stars started in the late-nineteenth century, and the successful astrophysical interpretation emerged gradually throughout the twentieth century. The vibrant interwoven progress of astronomy and astrophysics continues today as many other cosmic phenomena, from molecular clouds to black holes, are investigated. Cosmology, in particular, has emerged with the remarkable inference that the familiar atoms around us comprise only a small fraction of the stuff in the Universe which is dominated by mysterious dark matter and dark energy inaccessible to normal telescopes or laboratory instruments.

Fig. 1.1 Diagrams summarizing some important fields and themes of modern astronomy. Top: the history and growth of structures of the expanding Universe; bottom: the evolution of stars with generation of heavy elements and production of long-lived structures.

1.1.2 Probability and statistics

While there is little debate about the meaning and goals of astronomy and astrophysics as intellectual enterprises, the meaning and goals of probability and statistics have been widely debated. In his volume Statistics and Truth, C. R. Rao (1997) discusses how the term "statistics" has changed meaning over the centuries. It originally referred to the collection and compilation of data. In the nineteenth century, it accrued the goal of the mathematical interpretation of data, often to assist in making real-world decisions. Rao views contemporary statistics as an amalgam of a science (techniques derived from mathematics), a technology (techniques useful for decision-making in the presence of uncertainty), and an art (incompletely codified techniques based on inductive reasoning).

Barnett (1999) considers various viewpoints on the meaning of statistics. The first group of quotes see statistics as a very useful, but essentially mechanical, technology for treating data. In this sense, it plays a role similar to astronomy's role vis-à-vis astrophysics.

1. The first task of a statistician is cross-examination of data. (Sir R. A. Fisher, quoted by Rao 1997)

2. [S]tatistics refers to the methodology for the collection, presentation, and analysis of data, and for the uses of such data. (Neter et al. 1978)

3. Broadly defined, statistics encompasses the theory and methods of collecting, organizing, presenting, analyzing, and interpreting datasets so as to determine their essential characteristics. (Panik 2005)

The following interpretations of statistics emphasize its role in reducing random variations in observations to reveal important effects in the underlying phenomenon under study.

4. A statistical inference carries us from observations to conclusions about the populations sampled. (Cox 1958)

5. Uncertain knowledge + Knowledge of the amount of uncertainty in it = Usable knowledge. (Rao 1997)

6. My favourite definition [of statistics] is bipartite: statistics is both the science of uncertainty and the technology of extracting information from data. (Hand 2010)

7. In statistical inference experimental or observational data are modeled as the observed values of random variables, to provide a framework from which inductive conclusions may be drawn about the mechanism giving rise to the data. (Young & Smith 2005)


1.1.3 Statistics and science

Opinions differ widely when considering the relationship between statistical analysis of empirical data and the underlying real phenomena. A group of prominent twentieth-century statisticians express considerable pessimism that statistical models are anything but useful fictions, much as Renaissance Europe debated the meaning of Copernicus' heliocentric cosmological model. These scholars view statistical models as useful but often trivial or even misleading representations of a complex world. Sir D. R. Cox, towards the end of a long career, perceives a barrier between statistical findings and the development or validation of scientific theories.

8. There is no need for these hypotheses to be true, or even to be at all like the truth; rather one thing is sufficient for them: that they should yield calculations which agree with the observations. (Osiander's preface to Copernicus' De Revolutionibus, quoted by Rao 1997)

9. Essentially, all models are wrong, but some are useful. (Box & Draper 1987)

10. [Statistical] models can provide us with ideas which we test against data, and about which we build up experience. They can guide our thinking, lead us to propose courses of action, and so on, and if used sensibly, and with an open mind, and if checked frequently with reality, might help us learn something that is true. Some statistical models are helpful in a given context, and some are not. . . . What we do works (when it does) because it can be seen to work, not because it is based on true or even good models of reality. (Speed 1992, addressing a meeting of astronomers)

11. It is not always convenient to remember that the right model for a population can fit a sample of data worse than a wrong model, even a wrong model with fewer parameters. We cannot rely on statistical diagnostics to save us, especially with small samples. We must think about what our models mean, regardless of fit, or we will promulgate nonsense. (Wilkinson 2005)

12. The object [of statistical inference] is to provide ideas and methods for the critical analysis and, as far as feasible, the interpretation of empirical data . . . The extremely challenging issues of scientific inference may be regarded as those of synthesising very different kinds of conclusions if possible into a coherent whole or theory . . . The use, if any, in the process of simple quantitative notions of probability and their numerical assessment is unclear . . . (Cox 2006)

Other scholars quoted below are more optimistic. The older Sir R. A. Fisher bemoans a mechanistic view of statistics without meaning in the world. G. Young and R. Smith imply that statistical modeling can lead to an understanding of the causative mechanisms of variations in the underlying population. I. Hacking, a philosopher, believes statistics can improve our scientific inferences but not lead to new discovery. B. Efron, in an address as President of the American Statistical Association, feels that statistics can propel many sciences towards important results and insights.

13. To one brought up in the free intellectual atmosphere of an earlier time there is something rather horrifying in the ideological movement represented by the doctrine that reasoning, properly speaking, cannot be applied to empirical data to lead to inferences valid in the real world. (Fisher 1973)

14. The quiet statisticians have changed our world, not by discovering new facts or technical developments, but by changing the ways we reason, experiment, and form our opinions. (Hacking 1990)

15. Statistics has become the primary mode of quantitative thinking in literally dozens of fields, from economics to biomedical research. The statistical tide continues to roll in, now lapping at the previously unreachable shores of the hard sciences. . . . Yes, confidence intervals apply as well to neutrino masses as to disease rates, and raise the same interpretive questions, too. (Efron 2004)

16. The goal of science is to unlock nature's secrets. . . . Our understanding comes through the development of theoretical models which are capable of explaining the existing observations as well as making testable predictions. . . . Fortunately, a variety of sophisticated mathematical and computational approaches have been developed to help us through this interface, these go under the general heading of statistical inference. (Gregory 2005)

Leading statisticians are thus often more cautious, or at least less self-confident, about the value of their labors for understanding phenomena than are astronomers. Most astronomers believe implicitly that their observations provide a clear window into the physical Universe, and that simple quantitative statistical interpretations of their observations represent an improvement over qualitative examination of the data.

We generally share the optimistic view of statistical methodology in the service of astronomy and astrophysics, as expressed by P. C. Gregory (2005). In the language of the philosophy of science, we are positivists who believe that underlying causal relationships can be discovered through the detection and study of regular patterns of observable phenomena. While quantitative interpretation and models of complex biological and human affairs attempted by many statisticians may be more useful for prediction or decision-making than understanding the underlying behaviors, we feel that quantitative models of many astrophysical phenomena can be very valuable. A social scientist might interview a sample of voters to accurately predict the outcome of an election, yet never understand the beliefs underlying these votes. But an astrostatistician may largely succeed in understanding the orbits of binary stars, or the behavior of an accretion disk around a black hole or the growth of structure in an expanding Universe, that must obey deterministic mathematical laws of physics.

However, we wish to convey throughout this volume that the process of linking statistical analysis to reality is not simple and challenges must be faced at all stages. In setting up the calculation, there are often several related questions that might be asked in a given scientific enterprise, and their statistical evaluation may lead to apparently different conclusions. In performing the calculation, there are often several statistical approaches to a given question asked about a dataset, each mathematically valid under certain conditions, yet again leading to different scientific inferences. In interpreting the result, even a clear statistical finding may give an erroneous scientific interpretation if the mathematical model is mismatched to physical reality.


Astronomers should be flexible and sophisticated in their statistical treatments, and adopt a more cautious view of the results. A 3-sigma result does not necessarily represent astrophysical reality. Astronomers might first seek consensus about the exact question to be addressed, apply a suite of reasonable statistical approaches to the dataset with clearly stated assumptions, and recognize that the link between the statistical results and the underlying astrophysical truth may not be straightforward.

1.2 History of statistics in astronomy

1.2.1 Antiquity through the Renaissance

Astronomy is the oldest observational science. The effort to understand the mysterious luminous objects in the sky has been an important element of human culture for tens of thousands of years. Quantitative measurements of celestial phenomena were carried out by many ancient civilizations. The classical Greeks were not active observers but were unusually creative in the applications of mathematical principles to astronomy. The geometric models of the Platonists with crystalline spheres spinning around the static Earth were elaborated in detail, and this model endured in Europe for fifteen centuries.

The Greek natural philosopher Hipparchus made one of the first applications of mathematical principles in the realm of statistics, and started a millennium-long discussion on procedures for combining inconsistent measurements of a physical phenomenon (Sheynin 1973, Hald 2003). Finding scatter in Babylonian measurements of the length of a year, defined as the time between solstices, he took the middle of the range rather than the mean or median for the best value. Today, this is known as the midrange estimator of location, and is generally not favored due to its sensitivity to erroneous observations. Ptolemy and the eleventh-century Persian astronomer Abu Rayhan Biruni (al-Biruni) similarly recommended the average of extremes. Some medieval scholars advised against the acquisition of repeated measurements, fearing that errors would compound uncertainty rather than compensate for each other. The utility of the mean of discrepant observations to increase precision was promoted in the sixteenth century by Tycho Brahe and Galileo Galilei. Johannes Kepler appears to have inconsistently used arithmetic means, geometric means and middle values in his work. The supremacy of the mean was not settled in astronomy until the eighteenth century (Simpson 1756).
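The sensitivity of the midrange to a single erroneous observation can be illustrated with a small sketch. The year-length values below are hypothetical numbers invented for illustration, not historical measurements, and the helper name `midrange` is ours.

```python
from statistics import mean, median

def midrange(values):
    """Midpoint of the range: average of the smallest and largest value."""
    return (min(values) + max(values)) / 2

# Hypothetical year-length measurements in days (illustrative values only)
clean = [365.20, 365.22, 365.24, 365.25, 365.26, 365.28, 365.30]
# The same data with one erroneous observation appended
contaminated = clean + [367.00]

for name, estimator in [("midrange", midrange), ("mean", mean), ("median", median)]:
    shift = estimator(contaminated) - estimator(clean)
    print(f"{name:8s} shift caused by one bad value: {shift:+.3f} days")
```

In this toy dataset the single bad value moves the midrange far more than the mean, and the mean more than the median, which is the behavior the paragraph above describes.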

Ancient astronomers were concerned with observational errors, discussing dangers of propagating errors from inaccurate instruments and inattentive observers. In a study of the corrections to astronomical positions from observers in different cities, al-Biruni alludes to three types of errors: ". . . the use of sines engenders errors which become appreciable if they are added to errors caused by the use of small instruments, and errors made by human observers" (quoted by Sheynin 1973). In his 1632 Dialogue on the Two Great World Views, Ptolemaic and Copernican, Galileo also gave an early discussion of observational errors concerning the distance to the supernova of 1572. Here he outlined in nonmathematical language many of the properties of errors later incorporated by Gauss into his quantitative theory of errors.

1.2.2 Foundations of statistics in celestial mechanics

Celestial mechanics in the eighteenth century, in which Newton's law of gravity was found to explain even the subtlest motions of heavenly bodies, required the quantification of a few interesting physical quantities from numerous inaccurate observations. Isaac Newton himself had little interest in quantitative probabilistic arguments. In 1726, he wrote concerning discrepant observations of the Comet of 1680 that, "From all this it is plain that these observations agree with theory, in so far as they agree with one another" (quoted by Stigler 1986).

Others tackled the problem of combining observations and estimating physical quantities through celestial mechanics more earnestly. In 1750 while analyzing the libration of the Moon as head of the observatory at Göttingen, Tobias Mayer developed a method of averages for parameter estimation involving multiple linear equations. In 1767, British astronomer John Michell similarly used a significance test based on the uniform distribution (though with some technical errors) to show that the Pleiades is a physical, rather than chance, grouping of stars. Johann Lambert presented an elaborate theory of errors, often in astronomical contexts, during the 1760s. Bernoulli and Lambert laid the foundations of the concept of maximum likelihood later developed more thoroughly by Fisher in the early twentieth century.

The Marquis Pierre-Simon de Laplace (1749–1827), the most distinguished French scientist of his time, and his competitor Adrien-Marie Legendre, made seminal contributions both to celestial mechanics and to probability theory, often intertwined. Their generalizations of Mayer's methods for treating multiple parametric equations constrained by many discrepant observations had great impact. In astronomical and geodetical studies during the 1780s and in his huge 1799–1825 opus Mécanique Céleste, Laplace proposed parameter estimation for linear models by minimizing the largest absolute residual. In an 1805 appendix to a paper on cometary orbits, Legendre proposed minimizing the sum of the squares of residuals, or the method of least squares. He concluded that "the method of least squares reveals, in a manner of speaking, the center around which the results of observations arrange themselves, so that the deviations from that center are as small as possible" (quoted by Stigler 1986).

Both Carl Friedrich Gauss, also director of the observatory at Göttingen, and Laplace later placed the method of least squares onto a solid mathematical probabilistic foundation. While the method of least squares had been adopted as a practical convenience by Gauss and Legendre, Laplace first treated it as a problem in probabilities in his Théorie Analytique des Probabilités. He proved by an intricate and difficult course of reasoning that it was the most advantageous method for finding parameters in orbital models from astronomical observations, the mean of the probabilities of error in the determination of the elements being thereby reduced to a minimum. Least-squares computations rapidly became the principal interpretive tool for astronomical observations and their links to celestial mechanics. These and other approaches to statistical inference are discussed in Chapter 3.


In another portion of the Théorie, Laplace rescued from obscurity the postulation of the Central Limit Theorem by the mathematician Abraham De Moivre who, in a remarkable article published in 1733, used the normal distribution to approximate the distribution of the number of heads resulting from many tosses of a fair coin. Laplace expanded De Moivre's finding by approximating the binomial distribution with the normal distribution. Laplace's proof was flawed, and improvements were developed by Siméon-Denis Poisson, an astronomer at Paris' Bureau des Longitudes, and Friedrich Bessel, director of the observatory in Königsberg. Today, the Central Limit Theorem is considered to be one of the foundations of probability theory (Section 2.10).

Gauss established his famous error distribution and related it to Laplace's method of least squares in 1809. Astronomer Friedrich Bessel introduced the concept of probable error in an 1816 study of comets, and demonstrated the applicability of Gauss' distribution to empirical stellar astrometric errors in 1818. Gauss also introduced some treatments for observations with different (heteroscedastic) measurement errors and developed the theory for unbiased minimum variance estimation. Throughout the nineteenth century, Gauss' distribution was widely known as the astronomical error function.

Although the fundamental theory was developed by Laplace and Gauss, other astronomers published important contributions to the theory, accuracy and range of applicability of the normal distribution and least-squares estimation during the latter part of the nineteenth century (Hald 1998). They include Ernst Abbe at the Jena Observatory and the optics firm of Carl Zeiss, Auguste Bravais of the University of Lyons, Johann Encke of the Berlin Observatory, Britain's Sir John Herschel, Simon Newcomb of the U.S. Naval Observatory, Giovanni Schiaparelli of Brera Observatory, and Denmark's Thorvald Thiele. Sir George B. Airy, British Royal Astronomer, wrote an 1865 text on least-squares methods and observational error.

Adolphe Quetelet, founder of the Belgian Royal Observatory, and Francis Galton, director of Britain's Kew Observatory, did little to advance astronomy but were distinguished pioneers extending statistical analysis from astronomy into the human sciences. They particularly laid the groundwork for regression between correlated variables. The application of least-squares techniques to multivariate linear regression emerged in biometrical contexts by Karl Pearson and his colleagues in the early 1900s (Chapter 7).

The intertwined history of astronomy and statistics during the eighteenth and nineteenth centuries is detailed in the monographs by Stigler (1986), Porter (1986) and Hald (1998).

1.2.3 Statistics in twentieth-century astronomy

The connections between astronomy and statistics considerably weakened during the first decades of the twentieth century as statistics turned its attention principally to biological sciences, human attributes, social behavior and statistical methods for industries such as life insurance, agriculture and manufacturing. Advances in astronomy similarly moved away from the problem of evaluating errors in measurements of deterministic processes of celestial mechanics. Major efforts on the equilibrium structure of stars, the geometry of the Galaxy, the discovery of the interstellar medium, the composition of stellar atmospheres, the study of solar magnetic activity and the discovery of extragalactic nebulae generally did not involve statistical theory or application. Two distinguished statisticians wrote series of papers in the astronomical literature (Karl Pearson on correlations between stellar properties around 1907–11, and Jerzy Neyman with Elizabeth Scott on clustering of galaxies around 1952–64) but neither had a strong influence on further astronomical developments.

The least-squares method was used in many astronomical applications during the first half of the twentieth century, but not in all cases. Schlesinger (1916) admonished astronomers estimating elements of binary-star orbits to use least-squares rather than trial-and-error techniques. The stellar luminosity function derived by Jacobus Kapteyn, and thereby the inferred structure of the Galaxy, were based on subjective curve fitting (Kapteyn & van Rhijn 1920), although Kapteyn had made some controversial contributions to the mathematics of skewed distributions and correlation. An important study on dark matter in the Coma Cluster fits the radial distribution of galaxies by eye and does not quantify its similarity to an isothermal sphere (Zwicky 1937). In contrast, Edwin Hubble's seminal studies on galaxies were often based on least-squares fits (e.g. the redshift–magnitude relationship in Hubble & Humason 1931), although an early study reports a nonstandard symmetrical average of two regression lines (Hubble 1926, Section 7.3.2). Applications of statistical methods based on the normal error law were particularly strong in studies involving positional astronomy and star counts (Trumpler & Weaver 1953). Astronomical applications of least-squares estimation were strongly promoted by the advent of computers and Bevington's (1969) useful volume with FORTRAN code. Fourier analysis was also commonly used for time series analysis in the latter part of the twentieth century.

Despite its formulation by Fisher in the 1920s, maximum likelihood estimation emerged only slowly in astronomy. Early applications included studies of stellar cluster convergent points (Brown 1950), statistical parallaxes from the Hertzsprung–Russell diagram (Jung 1970), and some early work in radio and X-ray astronomy. Crawford et al. (1970) advocated use of maximum likelihood for estimating power-law slopes, a message we reiterate in this volume (Section 4.4). Maximum likelihood studies with truly broad impact did not emerge until the 1970s. Innovative and widely accepted methods include Lynden-Bell's (1971) luminosity function estimator for flux-limited samples, Lucy's (1974) algorithm for restoring blurry images, and Cash's (1979) algorithm for parameter estimation involving photon counting data. Maximum likelihood estimators became increasingly important in extragalactic astronomy; they were crucial for the discovery of galaxy streaming towards the Great Attractor (Lynden-Bell et al. 1988) and calculating the galaxy luminosity function from flux-limited surveys (Efstathiou et al. 1988). The 1970s also witnessed the first use and rapid acceptance of the nonparametric Kolmogorov–Smirnov statistic for two-sample and goodness-of-fit tests.

The development of inverse probability and Bayes' theorem by Thomas Bayes and Laplace in the late eighteenth century took place largely without applications to astronomy. Despite the prominence of the leading Bayesian proponent Sir Harold Jeffreys, who won the Gold Medal of the Royal Astronomical Society in 1937 and served as Society President in the 1950s, Bayesian methods did not emerge in astronomy until the latter part of the twentieth century. Bayesian classifiers for discriminating stars and galaxies (based on the 2001 text written for engineers by Duda et al.) were used to construct large automated sky survey catalogs (Valdes 1982), and maximum entropy image restoration gained some interest (Narayan & Nityananda 1986). But it was not until the 1990s that Bayesian methods became widespread in important studies, particularly in extragalactic astronomy and cosmology.

The modern field of astrostatistics grew suddenly and rapidly starting in the late 1990s. This was stimulated in part by monographs on statistical aspects of astronomical image processing (Starck et al. 1998, Starck & Murtagh 2006), galaxy clustering (Martínez & Saar 2001), Bayesian data analyses (Gregory 2005) and Bayesian cosmology (Hobson et al. 2010). Babu & Feigelson (1996) wrote a brief overview of astrostatistics. The continuing conference series Statistical Challenges in Modern Astronomy organized by us since 1991 brought together astronomers and statisticians interested in forefront methodological issues (Feigelson & Babu 2012). Collaborations between astronomers and statisticians emerged, such as the California–Harvard Astro-Statistical Collaboration (http://hea-www.harvard.edu/AstroStat), the International Computational Astrostatistics Group centered in Pittsburgh (http://www.incagroup.org), and the Center for Astrostatistics at Penn State (http://astrostatistics.psu.edu). However, the education of astronomers in statistical methodology remains weak. Penn State's Center and other institutes operate week-long summer schools in statistics for young astronomers to partially address this problem.

1.3 Recommended reading

We offer here a number of volumes with broad coverage in statistics. Stigler's monograph reviews the history of statistics and astronomy. Rice, Hogg & Tanis, and Hogg et al. are well-respected textbooks in statistical inference at undergraduate and graduate levels, and Wasserman gives a modern viewpoint. Lupton, James, and Wall & Jenkins are written by and for physical scientists. Ghosh et al. and Gregory introduce Bayesian inference.

Ghosh, J. K., Delampady, M. & Samanta, T. (2006) An Introduction to Bayesian Analysis: Theory and Methods, Springer, Berlin
A graduate-level textbook in Bayesian inference with coverage of the Bayesian approach, objective and reference priors, convergence and large-sample approximations, model selection and testing criteria, Markov chain Monte Carlo computations, hierarchical Bayesian models, empirical Bayesian models and applications to regression and high-dimensional problems.

Gregory, P. (2005) Bayesian Logical Data Analysis for the Physical Sciences, Cambridge University Press
This monograph treats probability theory and sciences, practical Bayesian inference, frequentist approaches, maximum entropy, linear and nonlinear model fitting, Markov chain Monte Carlo, harmonic time series analysis, and Poisson problems. Examples are illustrated using Mathematica.

Hogg, R., McKean, J. & Craig, A. (2005) Introduction to Mathematical Statistics, 6th ed., Prentice Hall, Englewood Cliffs
A slim text aimed at graduate students in statistics that includes Bayesian methods and decision theory, hypothesis testing, sufficiency, confidence sets, likelihood theory, prediction, bootstrap methods, computational techniques (e.g. bootstrap, MCMC), and other topics (e.g. pseudo-likelihoods, Edgeworth expansion, Bayesian asymptotics).

Hogg, R. V. & Tanis, E. (2009) Probability and Statistical Inference, 8th ed., Prentice-Hall, Englewood Cliffs
A widely used undergraduate text covering random variables, discrete and continuous distributions, estimation, hypothesis tests, linear models, multivariate distributions, nonparametric methods, Bayesian methods and inference theory.

James, F. (2006) Statistical Methods in Experimental Physics, 2nd ed., World Scientific, Singapore
This excellent volume covers concepts in probability, distributions, convergence theorems, likelihoods, decision theory, Bayesian inference, point and interval estimation, hypothesis tests and goodness-of-fit.

Lupton, R. (1993) Statistics in Theory and Practice, Princeton University Press
This slim monograph explains probability distributions, sampling statistics, confidence intervals, hypothesis tests, maximum likelihood estimation, goodness-of-fit and nonparametric rank tests.

Rice, J. A. (2007) Mathematical Statistics and Data Analysis, 3rd ed., Duxbury Press
An undergraduate-level text with broad coverage of modern statistics with both theory and applications. Topics covered include probability, statistical distributions, Central Limit Theorem, survey sampling, parameter estimation, hypothesis tests, goodness-of-fit, data visualization, two-sample comparisons, bootstrap, analysis of variance, categorical data, linear least squares, Bayesian inference and decision theory.

Stigler, S. M. (1986) The History of Statistics: The Measurement of Uncertainty before 1900, Harvard University Press
This readable monograph presents the intellectual history of the intertwined developments in astronomy and statistics during the eighteenth and nineteenth centuries. Topics include combining observations, least squares, inverse probability (Bayesian inference), correlation, regression and applications in astronomy and biology.

Wall, J. V. & Jenkins, C. R. (2003) Practical Statistics for Astronomers, Cambridge University Press
A useful volume on statistical methods for physical scientists. Coverage includes concepts of probability and inference, correlation, hypothesis tests, modeling by least squares and maximum likelihood estimation, bootstrap and jackknife, nondetections and survival analysis, time series analysis and spatial point processes.


Wasserman, L. (2004) All of Statistics: A Concise Course in Statistical Inference, Springer, Berlin
A short text intended for graduate students in allied fields presenting a wide range of topics with emphasis on mathematical foundations. Topics include random variables, expectations, empirical distribution functions, bootstrap, maximum likelihood, hypothesis testing, Bayesian inference, linear and loglinear models, multivariate models, graphs, density estimation, classification, stochastic processes and simulation methods. An associated Web site provides R code and datasets.


2 Probability

2.1 Uncertainty in observational science

Probability theory models uncertainty. Observational scientists often come across events whose outcome is uncertain. It may be physically impossible, too expensive or even counterproductive to observe all the inputs. The astronomer might want to measure the location and motions of all stars in a globular cluster to understand its dynamical state. But even with the best telescopes, only a fraction of the stars can be located in the two dimensions of sky coordinates with the third distance dimension unobtainable. Only one component (the radial velocity) of the three-dimensional velocity vector can be measured, and this may be accessible for only a few cluster members. Furthermore, limitations of the spectrograph and observing conditions lead to uncertainty in the measured radial velocities. Thus, our knowledge of the structure and dynamics of globular clusters is subject to considerable restrictions and uncertainty.

In developing the basic principles of uncertainty, we will consider both astronomical systems and simple familiar systems such as a tossed coin. The outcome of a toss, heads or tails, is completely determined by the forces on the coin and Newton's laws of motion. But we would need to measure too many parameters of the coin's trajectory and rotations to predict with acceptable reliability which face of the coin will be up. The outcomes of coin tosses are thus considered to be uncertain even though they are regulated by deterministic physical processes. Similarly, the observed properties of a quasar have considerable uncertainty, even though the physics of accretion disks and their radiation are based on deterministic physical processes.

The uncertainty in our knowledge could be due to the current level of understanding of the phenomenon, and might be reduced in the future. Consider, for example, the prediction of solar eclipses. In ancient societies, the motions of Solar System bodies were not understood and the occurrence of a solar eclipse would have been modeled as a random event (or attributed to divine intervention). However, an astronomer noticing that solar eclipses occur only on a new moon day could have revised the model with a monthly cycle of probabilities. Further quantitative prediction would follow from the Babylonian astronomers' discovery of the 18-year saros eclipse cycle. Finally, with Newtonian celestial mechanics, the phenomenon became essentially completely understood and the model changed from a random to a deterministic model subject to direct prediction with known accuracy.


The uncertainty of our knowledge could be due to future choices or events. We cannot predict with certainty the outcome of an election yet to be held, although polls of the voting public will constrain the prediction. We cannot accurately predict the radial velocity of a globular cluster star prior to its measurement, although our prior knowledge of the cluster's velocity dispersion will constrain the prediction. But when the election results are tabulated, or the astronomical spectrum is analyzed, our level of uncertainty is suddenly reduced.

When the outcome of a situation is uncertain, why do we think that it is possible to model it mathematically? In many physical situations, the events that are uncertain at the micro-level appear to be deterministic at the macro-level. While the outcome of a single toss of a coin is uncertain, the proportion of heads in a large number of tosses is stable. While the radial velocity of a single globular cluster star is uncertain, we can make predictions with some confidence based on a prior measurement of the global cluster velocity and our knowledge of cluster dynamics from previous studies. Probability theory attempts to capture and quantify this phenomenon; the Law of Large Numbers directly addresses the relationship between micro-level uncertainty and macro-level deterministic behavior.
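The stability of the proportion of heads can be seen in a small simulation. This is a minimal sketch using Python's standard pseudo-random generator; the function name `proportion_of_heads`, the seed and the sample sizes are arbitrary choices made for illustration.

```python
import random

def proportion_of_heads(n_tosses, rng):
    """Simulate n_tosses fair coin tosses and return the fraction of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

rng = random.Random(12345)  # fixed seed (arbitrary) so the run is repeatable
for n in (10, 100, 10_000, 1_000_000):
    print(f"n = {n:>9,d}  proportion of heads = {proportion_of_heads(n, rng):.4f}")
```

Individual tosses remain unpredictable, but the printed proportions should settle close to 0.5 as the number of tosses grows.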

2.2 Outcome spaces and events

An experiment is any action that can have a set of possible results where the actually occurring result cannot be predicted with certainty prior to the action. Experiments such as tossing a coin, rolling a die, or counting of photons registered at a telescope, all result in sets of outcomes. Tossing a coin results in a set of two outcomes $\Omega = \{H, T\}$; rolling a die results in a set of six outcomes $\Omega = \{1, 2, 3, 4, 5, 6\}$; while counting photons results in an infinite set of outcomes $\Omega = \{0, 1, 2, \ldots\}$. The number of neutron stars within 1 kpc of the Sun is a discrete and finite sample space. The set of all outcomes $\Omega$ of an experiment is known as the outcome space or sample space.

An event is a subset of a sample space. For example, consider now the sample space $\Omega$ of all exoplanets, where the event $E$ describes all exoplanets with eccentricity in the range 0.5–0.6, and the event $F$ describes that the host star is a binary system. There are essentially two aspects to probability theory: first, assigning probabilities to simple outcomes; and second, manipulating probabilities of simple events to derive probabilities of complicated events.

In the simplest cases, such as a well-balanced coin toss or die roll, the inherent symmetries of the experiment lead to equally likely outcomes. For the coin toss, $\Omega = \{H, T\}$ with probabilities $P(H) = 0.5$ and $P(T) = 0.5$. For the die roll, $\Omega = \{1, 2, 3, 4, 5, 6\}$ with $P(i) = 1/6$ for $i = 1, 2, \ldots, 6$. Now consider the more complicated case where a quarter, a dime and a nickel are tossed together. The outcome space is

$$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}, \tag{2.1}$$

where the first letter is the outcome of the quarter, the second of the dime and the third of the nickel. Again, it is reasonable to model all the outcomes as equally likely with probabilities $1/8$. Thus, when an experiment results in $m$ equally likely outcomes, $\{e_1, e_2, \ldots, e_m\}$, then the probability of any event $A$ is simply

$$P(A) = \frac{\#A}{m}, \tag{2.2}$$

where $\#$ is read "the number of". That is, $P(A)$ is the ratio of the number of outcomes favorable to $A$ and the total number of outcomes.
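The counting rule in Equation (2.2) can be sketched directly by enumerating the three-coin outcome space of Equation (2.1). The function `prob` and the two example events are hypothetical names chosen for this illustration.

```python
from fractions import Fraction
from itertools import product

# Outcome space (2.1): quarter, dime, nickel, each H or T
omega = ["".join(faces) for faces in product("HT", repeat=3)]

def prob(event, sample_space):
    """P(A) = #A / m for equally likely outcomes, Equation (2.2)."""
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))

# Hypothetical events chosen for illustration
quarter_is_heads = {o for o in omega if o[0] == "H"}
all_same_face = {"HHH", "TTT"}

print(omega)                          # the eight outcomes
print(prob(quarter_is_heads, omega))  # 1/2
print(prob(all_same_face, omega))     # 1/4
```

Exact fractions are used rather than floating-point numbers so that the ratios of counts are reproduced without rounding.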

Even when the outcomes are not equally likely, in some cases it is possible to identify the outcomes as combinations of equally likely outcomes of another experiment and thus obtain a model for the probabilities. Consider the three-coin toss where we only note the number of heads. The sample space is $\Omega = \{0, 1, 2, 3\}$. These outcomes cannot be modeled as equally likely. In fact, if we toss three coins 100 times, then we would observe that $\{1, 2\}$ occur far more frequently than $\{0, 3\}$. The following simple argument will lead to a logical assignment of probabilities. The outcome $\omega$ in this experiment is related to the outcomes in (2.1):

$\omega = 0$ when TTT occurs
$\omega = 1$ when HTT, THT or TTH occurs
$\omega = 2$ when HHT, HTH or THH occurs
$\omega = 3$ when HHH occurs.

Thus $P(0) = P(3) = 0.125$ and $P(1) = P(2) = 0.375$.

For finite (or countably infinite) sample spaces $\Omega = \{e_1, e_2, \ldots\}$, a probability model assigns a nonnegative weight $p_i$ to the outcome $e_i$ for every $i$ in such a way that the $p_i$'s add up to 1. A finite (or countably infinite) sample space is sometimes called a discrete sample space. For example, when exploring the number of exoplanets orbiting stars within 10 pc of the Sun, we consider a discrete sample space. In the case of countable sample spaces, we define the probability $P(A)$ of an event $A$ as

$$P(A) = \sum_{i:\, e_i \in A} p_i. \tag{2.3}$$

In words, this says that the probability of an event $A$ is equal to the sum of the individual probabilities of outcomes $e_i$ belonging to $A$.
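As a minimal sketch of Equation (2.3), the weights for the number of heads in the three-coin toss can be built by counting equally likely outcomes and then summed over an event. The names `weights` and `prob` are hypothetical and introduced only for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Each of the 8 three-coin outcomes is equally likely; map each to its number of heads
outcomes = ["".join(faces) for faces in product("HT", repeat=3)]
counts = Counter(outcome.count("H") for outcome in outcomes)

# Weights p_i for the discrete sample space {0, 1, 2, 3}
weights = {k: Fraction(c, len(outcomes)) for k, c in counts.items()}

def prob(event, weights):
    """Equation (2.3): sum the weights of the outcomes belonging to the event."""
    return sum((weights[e] for e in event if e in weights), Fraction(0))

assert sum(weights.values()) == 1          # the weights add up to 1
print({k: float(p) for k, p in sorted(weights.items())})
print(prob({1, 2}, weights))               # one or two heads: 3/4
```

The printed weights reproduce the values 0.125 and 0.375 quoted in the text, and the event "one or two heads" has probability 3/4.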

If the sample space $\Omega$ is uncountable, then not all subsets are allowed to be called events for mathematical and technical reasons. Astronomers deal with both countable spaces, such as the number of stars in the Galaxy or the set of photons from a quasar arriving at a detector, and uncountable spaces, such as the variability characteristics of a quasar or the background noise in an image constructed from interferometry observations.

2.3 Axioms of probability

A probability space consists of the triplet $(\Omega, \mathcal{F}, P)$, with sample space $\Omega$, a class $\mathcal{F}$ of events, and a function $P$ that assigns a probability to each event in $\mathcal{F}$ and that obeys three axioms of probability:


Fig. 2.1 Union and intersection of events (left panel: two events C and D; right panel: three events E, F and G).

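Axiom 1 $0 \le P(A) \le 1$, for all events $A$

Axiom 2 $P(\Omega) = 1$

Axiom 3 For mutually exclusive (pairwise disjoint) events $A_1, A_2, \ldots$,

$$P(A_1 \cup A_2 \cup A_3 \cup \cdots) = P(A_1) + P(A_2) + P(A_3) + \cdots,$$

that is, if for all $i \ne j$, $A_i \cap A_j = \emptyset$ ($\emptyset$ denotes the empty set or null event), then

$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i).$$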

Here, represents the union of sets while represents their intersection. Axiom 3states that theprobability that at leastoneof themutually exclusiveevents Ai occursis thesame as the sumof the probabilities of the events Ai , and this should hold for innitelymanyevents.Thisisknownasthe countableadditivity property. Thisaxiom, inparticular,implies that theniteadditivity propertyholds; that is, for mutually exclusive(or disjoint)events A, B (i.e. AB = ),

$$P(A \cup B) = P(A) + P(B). \tag{2.4}$$

This in particular implies that for any event $A$, the probability of its complement $A^c = \{\omega \in \Omega : \omega \notin A\}$, the set of points in the sample space that are not in $A$, is given by

$$P(A^c) = 1 - P(A). \tag{2.5}$$

(A technical comment can be made here: in the case of an uncountable sample space $\Omega$, it is impossible to define a probability function $P$ that assigns zero weight to singleton sets and satisfies these axioms for all subsets of $\Omega$.)

Using the above axioms, it is easy to establish that for any two events $C$, $D$

$$P(C \cup D) = P(C) + P(D) - P(C \cap D); \tag{2.6}$$

that is, the probability of the union of the two events is equal to the sum of the event probabilities minus the probability of the intersection of the two events. This is illustrated in the left-hand panel of Figure 2.1.

For three events $E$, $F$, $G$,

$$P(E \cup F \cup G) = P(E) + P(F) + P(G) - P(E \cap F) - P(F \cap G) - P(E \cap G) + P(E \cap F \cap G) \tag{2.7}$$


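as shown in the right-hand panel of Figure 2.1. The generalization to $n$ events $E_1, \ldots, E_n$ is called the inclusion-exclusion formula:

$$P(E_1 \cup E_2 \cup \cdots \cup E_n) = \sum_{i=1}^{n} P(E_i) - \sum_{i_1 < i_2} P(E_{i_1} \cap E_{i_2}) + \cdots + (-1)^{r+1} \sum_{i_1 < i_2 < \cdots < i_r} P(E_{i_1} \cap E_{i_2} \cap \cdots \cap E_{i_r}) + \cdots + (-1)^{n+1} P(E_1 \cap E_2 \cap \cdots \cap E_n), \tag{2.8}$$

where the summation

$$\sum_{i_1 < i_2 < \cdots < i_r} P(E_{i_1} \cap E_{i_2} \cap \cdots \cap E_{i_r}) \tag{2.9}$$

is taken over all of the $\binom{n}{r}$ possible subsets of size $r$ of the set $\{1, 2, \ldots, n\}$.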
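The inclusion-exclusion formula (2.8) can be checked numerically on a small finite sample space. The sketch below assumes 12 equally likely outcomes and three arbitrarily chosen events; both choices, and the function names, are illustrative assumptions rather than anything taken from the text.

```python
from fractions import Fraction
from itertools import combinations

# Illustrative sample space: 12 equally likely outcomes.
omega = set(range(1, 13))

def prob(event):
    """Probability of an event when all outcomes in omega are equally likely."""
    return Fraction(len(event), len(omega))

# Three arbitrary events chosen for the demonstration.
events = [
    {n for n in omega if n % 2 == 0},   # even outcomes
    {n for n in omega if n % 3 == 0},   # multiples of three
    {n for n in omega if n <= 4},       # outcomes 1..4
]

def inclusion_exclusion(events):
    """Right-hand side of (2.8): alternating sum over all subsets of size r."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r + 1)
        for subset in combinations(events, r):
            total += sign * prob(set.intersection(*subset))
    return total

direct = prob(set.union(*events))        # left-hand side of (2.8)
via_formula = inclusion_exclusion(events)
print(direct, via_formula)
```

The inner loop visits every subset of size $r$, which is exactly the summation described in (2.9).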

2.4 Conditional probabilities

Conditional probability is one of the most important concepts in probability theory and can be tricky to understand. It often helps in computing desired probabilities, particularly when only partial information regarding a result of an experiment is available. Bayes' theorem, at the foundation of Bayesian statistics, uses conditional probabilities.

Consider the following simple example. When a die is rolled, the probability that it turns up one of the numbers $\{1, 2, 3\}$ is $1/2$, as each of the six outcomes is equally likely. Now consider that someone took a brief glimpse at the die and found that it turned up an even number. How does this additional information influence the assignment of probability to $A = \{1, 2, 3\}$? In this case, the weights assigned to the points are reassessed by giving equal weights, $1/3$, to each of the three even integers in $B = \{2, 4, 6\}$ and zero weights to the odd integers. Since it is already known that $B$ occurred, it is intuitive to rescale the weights so that $B$ has probability 1 and the complementary event $B^c$ has probability 0. Now as all the points in $B$ are equally likely, it follows that the required probability is the ratio of the number of points of $A$ that are in $B$ to the total number of points in $B$. Since 2 is the only number from $A$ in $B$, the required probability is $\#(A \cap B)/\#B = 1/3$.

Generalizing this, let us consider an experiment with $m$ equally likely outcomes and let $A$ and $B$ be two events. If we are given the information that $B$ has happened, what is the probability that $A$ has happened in light of the new knowledge? Let $\#A = k$, $\#B = n$ and $\#(A \cap B) = i$. Then, as in the rolled die example above, given that $B$ has happened, the new probability allocation assigns probability $1/n$ to all the outcomes in $B$. Out of these $n$, $\#(A \cap B) = i$ outcomes belong to $A$. Noting that $P(A \cap B) = i/m$ and $P(B) = n/m$, this leads to the conditional probability, $P(A \mid B) = i/n$, of $A$ given $B$,

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}. \tag{2.10}$$


Equation (2.10) can be considered a formal definition of conditional probabilities provided $P(B) > 0$, even in the more general case where outcomes may not be equally likely. As a consequence, the multiplicative rule of probability for two events,

$$P(A \cap B) = P(A \mid B)\,P(B) \tag{2.11}$$

holds. The multiplication rule easily extends to $n$ events:

$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2 \mid A_1) \cdots P(A_{n-1} \mid A_1, \ldots, A_{n-2})\,P(A_n \mid A_1, \ldots, A_{n-1}). \tag{2.12}$$
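The rolled-die example and Equations (2.10) and (2.11) can be reproduced with a short sketch. It assumes a fair six-sided die, as in the text; the function names `prob` and `cond_prob` are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # fair die: six equally likely outcomes
A = {1, 2, 3}                # the die shows 1, 2 or 3
B = {2, 4, 6}                # the die shows an even number

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(omega))

def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b), defined only when P(b) > 0 (Equation 2.10)."""
    if prob(b) == 0:
        raise ValueError("conditional probability is undefined when P(b) = 0")
    return prob(a & b) / prob(b)

p_A = prob(A)
p_A_given_B = cond_prob(A, B)

# Multiplicative rule (2.11): P(A and B) = P(A | B) P(B).
multiplicative_rule_holds = prob(A & B) == p_A_given_B * prob(B)
print(p_A, p_A_given_B, multiplicative_rule_holds)
```

Learning that the outcome is even lowers the probability of $A$ from $1/2$ to $1/3$, matching the counting argument $\#(A \cap B)/\#B$.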

These concepts are very relevant to observational sciences such as astronomy. Except for the rare circumstance when an entirely new phenomenon is discovered, astronomers are measuring properties of celestial bodies or populations for which some distinctive properties are already available. Consider, for example, a subpopulation of galaxies found to exhibit Seyfert-like spectra in the optical band (property $A$) that have already been examined for nonthermal lobes in the radio band (property $B$). Then the conditional probability that a galaxy has a Seyfert nucleus given that it also has radio lobes is given by Equation (2.10), and this probability can be estimated from careful study of galaxy samples. The composition of a Solar System minor body can be predominately ices or rock. Icy bodies are more common at large orbital distances and show spectral signatures of water (or other) ice rather than the spectral signatures of silicates. The probability that a given asteroid, comet or Kuiper Belt Object is mostly icy is then conditioned on its semi-major axis and spectral characteristics.

2.4.1 Bayes' theorem

We are now ready to derive the famous Bayes' theorem, also known as Bayes' formula or Bayes' rule. It is named for the mid-eighteenth-century British mathematician and Presbyterian minister Thomas Bayes, although it was recognized earlier by James Bernoulli and Abraham de Moivre, and was later fully explicated by Pierre Simon Laplace. Let $B_1, \ldots, B_k$ be a partition of the sample space $\Omega$. A partition of $\Omega$ is a collection of mutually exclusive (pairwise disjoint) sets whose union is $\Omega$; that is, $B_i \cap B_j = \emptyset$ for $i \ne j$. If $A$ is any event in $\Omega$, then to compute $P(A)$, one can use probabilities of pieces of $A$ on each of the sets $B_i$ and add them together to obtain

$$P(A) = P(A \mid B_1)P(B_1) + \cdots + P(A \mid B_k)P(B_k). \tag{2.13}$$

This is called the law of total probability and follows from the observation

$$P(A) = P(A \cap B_1) + \cdots + P(A \cap B_k), \tag{2.14}$$

and the multiplicative rule of probability, $P(A \cap B_i) = P(A \mid B_i)P(B_i)$.

Now consider the following example, a bit more complicated than those treated above. Suppose a box contains five quarters, of which one is a trick coin that has heads on both sides. A coin is picked at random and tossed three times. It was observed that all three tosses turned up heads.


If the type of the coin chosen is known, then one can easily compute the probability of the event $H$ that all three tosses yield heads. If it is the two-headed coin, then the probability is 1, otherwise it is $1/8$. That is, $P(H \mid M) = 1$ and $P(H \mid M^c) = 1/8$, where $M$ denotes the event that the two-headed coin is chosen and $M^c$ is the complementary event that a regular quarter is chosen. After observing three heads, what is the probability that the chosen coin has heads on both sides? Bayes' theorem helps to answer this question. Here, by using the law of total probability and the multiplication rule, one obtains

$$P(M \mid H) = \frac{P(M \cap H)}{P(H)} = \frac{P(H \mid M)P(M)}{P(H \mid M)P(M) + P(H \mid M^c)P(M^c)} = \frac{1/5}{(1/5) + (1/8)(4/5)} = \frac{2}{3}. \tag{2.15}$$

For a partition $B_1, \ldots, B_k$ of $\Omega$, Bayes' theorem generalizes the above expression to obtain $P(B_i \mid A)$ in terms of $P(A \mid B_j)$ and $P(B_j)$, for $j = 1, \ldots, k$. The result is very easy to prove, and is the basis of Bayesian inference discussed in Section 3.8.

Theorem 2.1 (Bayes' theorem) If $B_1, B_2, \ldots, B_k$ is a partition of the sample space, then for $i = 1, \ldots, k$,

$$P(B_i \mid A) = \frac{P(A \mid B_i)P(B_i)}{P(A \mid B_1)P(B_1) + \cdots + P(A \mid B_k)P(B_k)}. \tag{2.16}$$
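As a minimal sketch of (2.16), the function below takes the prior probabilities $P(B_j)$ and the likelihoods $P(A \mid B_j)$ for a partition and returns every $P(B_i \mid A)$. It is applied to the trick-coin example with the partition $\{M, M^c\}$; the function name `posterior` and the list layout are assumptions of this illustration.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' theorem (2.16) for a partition B_1..B_k.

    priors[j]      is P(B_j)
    likelihoods[j] is P(A | B_j)
    Returns the list of P(B_j | A).
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(A | B_j) P(B_j)
    total = sum(joint)                                      # law of total probability (2.13)
    return [j / total for j in joint]

# Trick-coin example: M = two-headed coin chosen, M^c = regular quarter chosen.
priors = [Fraction(1, 5), Fraction(4, 5)]
# Probability of three heads in three tosses under each hypothesis.
likelihoods = [Fraction(1), Fraction(1, 2) ** 3]

post = posterior(priors, likelihoods)
print(post)
```

The first entry reproduces the value $2/3$ of Equation (2.15), and the posterior probabilities over the partition sum to 1.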

Bayes' theorem thus arises directly from logical inference based on the three axioms of probability. While it applies to any form of probabilities and events, modern Bayesian statistics adopts a particular interpretation of these probabilities, which we will present in Section 3.8.

2.4.2 Independent events

The examples above show that, for any two events $A$ and $B$, the conditional probability of $A$ given $B$, $P(A \mid B)$, is not necessarily equal to the unconditional probability of $A$, $P(A)$. Knowledge of $B$ generally changes the probability of $A$. In the special situation where $P(A \mid B) = P(A)$, where the knowledge that $B$ has occurred has not altered the probability of $A$, $A$ and $B$ are said to be independent events. As the conditional probability $P(A \mid B)$ is not defined when $P(B) = 0$, the multiplication rule $P(A \cap B) = P(A \mid B)P(B)$ will be used to formally define independence:

Definition 2.2 Two events $A$ and $B$ are defined to be independent if

$$P(A \cap B) = P(A)P(B).$$

This shows that if $A$ is independent of $B$, then $B$ is independent of $A$. It is not difficult to show that if $A$ and $B$ are independent, then $A$ and $B^c$ are independent, $A^c$ and $B$ are independent, and also $A^c$ and $B^c$ are independent.

Note that three events $E$, $F$, $G$ satisfying $P(E \cap F \cap G) = P(E)P(F)P(G)$ cannot be called independent on that basis alone, as this condition does not guarantee independence of $E$, $F$ or independence of $F$, $G$ or independence of $E$, $G$. This can be illustrated with a simple example of a sample space $\Omega = \{1, 2, 3, 4, 5, 6, 7, 8\}$, where all the points are equally likely. Consider the events $E = \{1, 2, 3, 4\}$, $F = G = \{4, 5, 6, 7\}$. Clearly $P(E \cap F \cap G) = P(\{4\}) = 1/8 = P(E)P(F)P(G)$. But neither $E$ and $F$, $F$ and $G$, nor $E$ and $G$ are independent.

Similarly, we note that independence of $A$ and $B$, $B$ and $C$, and $A$ and $C$ together does not imply $P(A \cap B \cap C) = P(A)P(B)P(C)$. If we consider the events $A = \{1, 2, 3, 4\}$, $B = \{1, 2, 5, 6\}$, $C = \{1, 2, 7, 8\}$, then clearly $A$ and $B$ are independent, $B$ and $C$ are independent, and also $A$ and $C$ are independent: $A$, $B$, $C$ each contain exactly four numbers, so

$$P(A) = P(B) = P(C) = \frac{4}{8} = \frac{1}{2}, \tag{2.17}$$

while $A \cap B = B \cap C = A \cap C = A \cap B \cap C = \{1, 2\}$ and

$$P(A \cap B) = P(B \cap C) = P(A \cap C) = P(\{1, 2\}) = \frac{2}{8} = \frac{1}{4}. \tag{2.18}$$

However,

$$P(A \cap B \cap C) = P(\{1, 2\}) = \frac{1}{4} \ne \frac{1}{8} = P(A)P(B)P(C). \tag{2.19}$$

Though $A$ and $B$ are independent, and $A$ and $C$ are independent, $P(A \mid B \cap C) = 1$. So $A$ is not independent of $B \cap C$. This leads to the following definition:

Definition 2.3 (Independent events) A set of events $A_1, \ldots, A_n$ is said to be independent if, for every subcollection $A_{i_1}, \ldots, A_{i_r}$, $r \le n$,

$$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_r}) = P(A_{i_1})P(A_{i_2}) \cdots P(A_{i_r}). \tag{2.20}$$

An infinite set of events is defined to be independent if every finite subcollection of these events is independent. It is worth noting that for the case of three events, $A$, $B$, $C$ are independent if all the following four conditions are satisfied:

$$\begin{aligned}
P(A \cap B \cap C) &= P(A)P(B)P(C),\\
P(A \cap B) &= P(A)P(B),\\
P(B \cap C) &= P(B)P(C),\\
P(A \cap C) &= P(A)P(C).
\end{aligned} \tag{2.21}$$
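The two counterexamples above, and the subcollection condition of Definition 2.3, can be verified by direct enumeration. The sketch below uses the eight-point sample space from the text; the function `mutually_independent` and the third set of events `U`, `V`, `W` are illustrative additions, not taken from the book.

```python
from fractions import Fraction
from itertools import combinations

omega = set(range(1, 9))   # {1, ..., 8}, all points equally likely

def prob(event):
    return Fraction(len(event), len(omega))

def mutually_independent(events):
    """Check (2.20) for every subcollection of size 2 or more."""
    for r in range(2, len(events) + 1):
        for sub in combinations(events, r):
            product = Fraction(1)
            for e in sub:
                product *= prob(e)
            if prob(set.intersection(*sub)) != product:
                return False
    return True

# First example: the triple product condition holds, pairwise independence fails.
E, F, G = {1, 2, 3, 4}, {4, 5, 6, 7}, {4, 5, 6, 7}
triple_ok_EFG = prob(E & F & G) == prob(E) * prob(F) * prob(G)
pairwise_EF = prob(E & F) == prob(E) * prob(F)

# Second example: pairwise independence holds, the triple product condition fails.
A, B, C = {1, 2, 3, 4}, {1, 2, 5, 6}, {1, 2, 7, 8}
pairwise_ABC = all(prob(x & y) == prob(x) * prob(y)
                   for x, y in combinations([A, B, C], 2))
triple_ok_ABC = prob(A & B & C) == prob(A) * prob(B) * prob(C)

# An illustrative triple that satisfies all four conditions of (2.21).
U, V, W = {1, 2, 3, 4}, {1, 2, 5, 6}, {1, 3, 5, 7}
print(mutually_independent([E, F, G]),
      mutually_independent([A, B, C]),
      mutually_independent([U, V, W]))
```

Neither of the book's two triples passes the full check, while the added triple does, which illustrates why Definition 2.3 requires the product rule for every subcollection.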

2.5 Random variables

Often, instead of focusing on the entire outcome space, it may be sufficient to concentrate on a summary of outcomes relevant to the problem at hand, say a function of the outcomes. In tossing a coin four times, it may be sufficient to look at the number of heads instead of the order in which they are obtained. In observing photons from an astronomical source, it may be sufficient to look at the mean number of photons in a spectral band over some time interval, or the ratio of photons in two spectral bands, rather than examining each photon individually.


These real-valued functions on the outcome space or sample space are called random variables. Data are realizations of random variables. Typically a random variable $X$ is a function on the sample space $\Omega$. In the case of countable sample spaces $\Omega$, this definition always works. But in the case of uncountable $\Omega$, one should be careful. As mentioned earlier, not all subsets of an uncountable space can be called an event, or a probability assigned to them. A random variable is a function such that $\{\omega \in \Omega : X(\omega) \le a\}$ is an event for all real numbers $a$. In practical situations, the collection of events can be defined to be inclusive enough that the set of events follows certain mathematical conditions (closure under complementation, countable unions and intersections). So in practice, the technical aspects can be ignored.

Note that in casual usage, some people label a phenomenon as "random" to mean that the possible outcomes have equal chances of occurring. This concept is correctly called uniformity. The concept of randomness does not require uniformity. Indeed, the following sections and Chapter 4 are largely devoted to phenomena that follow nonuniform distributions.

2.5.1 Density and distribution functions

A random variable is called a discrete random variable if it maps a sample space to a countable set (e.g. the integers) with each value in the range having probability greater than or equal to zero.

Definition 2.4 (Cumulative distribution function) The cumulative distribution function (c.d.f.) or simply the distribution function $F$ of a random variable $X$ is defined as

$$F(x) = P(X \le x) = P(\omega \in \Omega : X(\omega) \le x), \tag{2.22}$$

for all real numbers $x$. In the discrete case when $X$ takes values $x_1, x_2, \ldots$, then

$$F(x) = P(X \le x) = \sum_{x_i \le x} P(X = x_i). \tag{2.23}$$

The c.d.f. $F$ is a nondecreasing, right-continuous function satisfying

$$\lim_{x \to -\infty} F(x) = 0 \quad \text{and} \quad \lim_{x \to \infty} F(x) = 1. \tag{2.24}$$

The c.d.f. of a discrete random variable is called a discrete distribution. A random variable with a continuous distribution function is referred to as a continuous random variable. A continuous random variable maps the sample space to an uncountable set (e.g. the real numbers). While the probability that a continuous random variable takes any specific value is zero, the probability that it belongs to an infinite set of values such as an interval may be positive. It should be understood clearly that the requirement that $X$ is a continuous random variable does not mean that $X(\omega)$ is a continuous function; in fact, continuity does not make sense in the case of an arbitrary sample space $\Omega$.

Often some continuous distributions are described through the probability density function (p.d.f.). A nonnegative function $f$ is called the probability density function of a distribution function if for all $x$

$$F(x) = \int_{-\infty}^{x} f(y)\,dy. \tag{2.25}$$


The same definition of expectation as in (2.28) and (2.29) can be used for any discrete random variable $X$ taking infinitely many nonnegative values. However, difficulties may be encountered in defining the expectation of a random variable taking infinitely many positive and negative values. Consider the case where $W$ is a random variable satisfying

$$P(W = 2^j) = P(W = -2^j) = 2^{-j-1}, \quad \text{for } j = 1, 2, \ldots \tag{2.30}$$

In this case, the expectation $E[W]$ cannot be defined, as both the positive part $W^+ = \max(0, W)$ and the negative part $W^- = \max(0, -W)$ have infinite expectations. This would make $E[W]$ equal to $\infty - \infty$, which is meaningless. However, for a general discrete random variable, $E[X]$ can be defined as in (2.28) provided

$$\sum_i |x_i|\,P(X = x_i) < \infty. \tag{2.31}$$
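To see concretely why the expectation of $W$ fails to exist, the positive and negative parts can be summed separately. The short derivation below is an added worked step that uses only the probabilities given in (2.30):

$$E[W^+] = \sum_{j=1}^{\infty} 2^j\,P(W = 2^j) = \sum_{j=1}^{\infty} 2^j \cdot 2^{-j-1} = \sum_{j=1}^{\infty} \frac{1}{2} = \infty,$$

and by symmetry $E[W^-] = \infty$ as well. Condition (2.31) fails for $W$ for the same reason, since $\sum_{j} 2 \cdot 2^j \cdot 2^{-j-1} = \sum_{j} 1 = \infty$.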

In case the distribution $F$ of a random variable $X$ has density $f$ as in (2.25), then the expectation is defined as

$$E[X] = \int_{-\infty}^{\infty} y f(y)\,dy, \quad \text{provided} \quad \int_{-\infty}^{\infty} |y| f(y)\,dy < \infty. \tag{2.32}$$

The expectation of a function $h$ of a random variable $X$ can be defined similarly as in (2.29), provided $\sum_i |h(x_i)|\,P(X = x_i) < \infty$ in the discrete case, and

$$E[h(X)] = \int_{-\infty}^{\infty} h(y) f(y)\,dy \quad \text{provided} \quad \int_{-\infty}^{\infty} |h(y)| f(y)\,dy < \infty \tag{2.33}$$

in case the distribution of $X$ has density $f$.

Another important and commonly used function of a distribution function that quantifies the spread is the second moment centered on the mean, known as the variance and often denoted by $\sigma^2$. The variance is defined by

$$\sigma^2 = \mathrm{Var}(X) = E[(X - \mu)^2] = E[X^2] - \mu^2, \tag{2.34}$$

where $\mu = E[X]$.

The mean and variance need not be closely related, as seen in the following simple example. Let $X$ be a random variable taking values $1$ and $-1$ with probability $0.5$ each, and let $Y$ be a random variable taking values $1000$ and $-1000$ with probability $0.5$ each. Both $X$ and $Y$ have the same mean ($= 0$), but $\mathrm{Var}(X) = 1$ and $\mathrm{Var}(Y) = 10^6$.
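The example can be reproduced directly from the definition (2.34). In this sketch a discrete distribution is represented as a dictionary mapping each value to its probability; that representation and the function names are assumptions made for illustration.

```python
from fractions import Fraction

def mean(dist):
    """E[X] for a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E[(X - mu)^2], the first form in Equation (2.34)."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def variance_alt(dist):
    """Var(X) = E[X^2] - mu^2, the second form in Equation (2.34)."""
    second_moment = sum(x ** 2 * p for x, p in dist.items())
    return second_moment - mean(dist) ** 2

half = Fraction(1, 2)
X = {1: half, -1: half}
Y = {1000: half, -1000: half}

print(mean(X), variance(X))   # same mean as Y, small spread
print(mean(Y), variance(Y))   # same mean as X, very large spread
```

Both forms of (2.34) give the same value, and the two variables share a mean of zero while their variances differ by a factor of $10^6$.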

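It is helpful to derive the variance of the sum of random variables. If $X_1, X_2, \ldots, X_n$ are $n$ random variables, we find that

$$E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i] \tag{2.35}$$

and the variance of the sum $\sum_{i=1}^{n} X_i$ can be expressed as

$$\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \mathrm{Var}(X_i) + \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \ne i}}^{n} \mathrm{Cov}(X_i, X_j), \quad \text{where}$$

$$\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]. \tag{2.36}$$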


The $\mathrm{Cov}$ quantity is the covariance, measuring the relation between the scatter in two random variables. If $X$ and $Y$ are independent random variables, then $\mathrm{Cov}(X, Y) = 0$; consequently, for independent $X_1, \ldots, X_n$, $\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \mathrm{Var}(X_i)$. The converse is not true; some dependent variables may have zero covariance.
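A standard illustration of the last remark (not taken from the text) is $X$ uniform on $\{-1, 0, 1\}$ with $Y = X^2$: the covariance defined in (2.36) vanishes, yet $Y$ is completely determined by $X$. The sketch below checks this by enumeration; the joint distribution and helper name `expect` are assumptions of the example.

```python
from fractions import Fraction

third = Fraction(1, 3)
# Joint distribution of (X, Y) with X uniform on {-1, 0, 1} and Y = X**2.
joint = {(-1, 1): third, (0, 0): third, (1, 1): third}

def expect(h):
    """E[h(X, Y)] under the joint distribution above."""
    return sum(h(x, y) * p for (x, y), p in joint.items())

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
# Covariance as defined in Equation (2.36).
cov = expect(lambda x, y: (x - mean_x) * (y - mean_y))

# Independence would require P(X=0, Y=0) = P(X=0) P(Y=0).
p_x0 = sum(p for (x, y), p in joint.items() if x == 0)
p_y0 = sum(p for (x, y), p in joint.items() if y == 0)
p_both = joint[(0, 0)]
independent = p_both == p_x0 * p_y0
print(cov, independent)
```

The covariance is exactly zero while the product rule for independence fails, so zero covariance alone does not establish independence.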

If all the $X_i$ variables have the same variance $\sigma^2$, the situation is called homoscedastic. If $X_1, X_2, \ldots, X_n$ are independent, the variance of the sample mean $\bar{X} = (1/n)\sum_{i=1}^{n} X_i$ is given by

$$\mathrm{Var}(\bar{X}) = \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{\sigma^2}{n}. \tag{2.37}$$
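Equation (2.37) can be confirmed exactly, without simulation, by enumerating every outcome of a small experiment. The sketch assumes $n = 3$ independent rolls of a fair six-sided die as the common distribution; that choice is purely illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)   # fair six-sided die (illustrative common distribution)
n = 3                 # number of independent rolls

# Mean and variance of a single roll.
mu = Fraction(sum(faces), 6)
sigma2 = sum((Fraction(f) - mu) ** 2 for f in faces) / 6

# Enumerate all 6**n equally likely outcomes and compute the sample mean of each.
outcomes = list(product(faces, repeat=n))
p = Fraction(1, len(outcomes))
sample_means = [Fraction(sum(o), n) for o in outcomes]

mu_bar = sum(m * p for m in sample_means)
var_bar = sum((m - mu_bar) ** 2 * p for m in sample_means)
print(sigma2, var_bar, sigma2 / n)
```

The variance of the sample mean obtained by enumeration equals $\sigma^2/n$, and the mean of the sample mean equals the mean of a single roll, in line with (2.35).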

The variance essentially measures the mean square deviation from the mean of the distribution. The square root of the variance, $\sigma$, is called the standard deviation. The mean and the standard deviation of a random variable $X$ are often used to convert $X$ to a standardized form

$$X_{\mathrm{std}} = \frac{X - \mu}{\sigma} \tag{2.38}$$

with mean zero and variance unity. This important transformation also removes the units of the original variable. Other transformations also reduce scale and render a variable free from units, such as the logarithmic transformation often used by astronomers. It should be recognized that the logarithmic transformation is only one of many optional variable transformations. Standardization is often preferred by statisticians because it has mathematical properties useful in statistical inference.
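A minimal sketch of the standardization (2.38), assuming a small made-up discrete distribution stored as a value-to-probability dictionary. The numbers are invented for illustration and chosen so that $\mu = 5$ and $\sigma = 3$.

```python
import math

# Hypothetical discrete distribution {value: probability}; values are made up.
dist = {2.0: 0.25, 4.0: 0.5, 10.0: 0.25}

def mean(d):
    return sum(x * p for x, p in d.items())

def variance(d):
    mu = mean(d)
    return sum((x - mu) ** 2 * p for x, p in d.items())

mu = mean(dist)
sigma = math.sqrt(variance(dist))

# Apply Equation (2.38) to every value; the probabilities are unchanged.
standardized = {(x - mu) / sigma: p for x, p in dist.items()}

mean_std = mean(standardized)
var_std = variance(standardized)
print(mu, sigma, mean_std, var_std)
```

Whatever units the original values carry cancel in the ratio $(X - \mu)/\sigma$, so the standardized variable is dimensionless with mean zero and unit variance up to floating-point rounding.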

The third central moment $E[(X - E[X])^3]$ provides information about the skewness of the distribution of a random variable; that is, whether the distribution of $X$ leans more towards right or left. Higher order moments like the $k$-th order moment $E[X^k]$ also provide some additional information about the distribution of the random variable.

2.5.2 Independent and identically distributed random variables

When repeated observations are made, or when an experiment is repeated several times, the successive observations lead to independent random variables. If the data are generated from the same population, then the resultant values can be considered as random variables with a common distribution. These are a sequence of independent and identically distributed or i.i.d. random variables. In the i.i.d. case, the random variables have a common mean and variance (if these moments exist).

Some observational studies in astronomy produce i.i.d. random variables. The redshifts of galaxies in an Abell cluster, the equivalent widths of absorption lines in a quasar spectrum, the ultraviolet photometry of a cataclysmic variable accretion disk, and the proper motions of a sample of Kuiper Belt bodies will all be i.i.d. if the observational conditions are unchanged. But the i.i.d. conditions are often violated. The sample may be heterogeneous with objects drawn from different underlying distributions. The observations may have been taken under different conditions such that the measurement errors differ. This leads to a condition called heteroscedasticity that violates the i.i.d. assumption. Heteroscedasticity means that different data points have different variances.

Since a great many methods of statistics, both classical and modern, depend on the i.i.d. assumption, it is crucial that astronomers understand the concept and its relationship to the datasets under study. Incorrect use of statistics that require i.i.d. will lead to incorrect quantitative results, and thereby increase the risk of incorrect or unsupported scientific inferences.

2.6 Quantile function

The cumulative distribution function $F(x)$ estimates the value of the population distribution function at a chosen value of $x$. But the astronomer often asks the inverse question: "What value of $x$ corresponds to a specified value of $F(x)$?" This answers questions like "What fraction of galaxies have luminosities above $L$?" or "At what age have 95% of stars lost their protoplanetary disks?" This requires estimation of the quantile function of a random variable $X$, the inverse of $F$, defined as

$$Q(u) = F^{-1}(u) = \inf\{y : F(y) \ge u\} \tag{2.39}$$

where $0 < u < 1$. Here inf (infimum) refers to the smallest value of $y$ with the property specified in the brackets.
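Applied to a sample, definition (2.39) can be sketched by replacing $F$ with the empirical distribution function, i.e. the fraction of data points less than or equal to $y$. This substitution, the function names, and the eight data values below are all assumptions made for illustration; practical quantile estimators often interpolate between data points instead.

```python
def ecdf(sample, y):
    """Empirical c.d.f.: fraction of the sample that is <= y."""
    return sum(1 for v in sample if v <= y) / len(sample)

def quantile(sample, u):
    """Q(u) = inf{y : F(y) >= u} with F taken as the empirical c.d.f. (no interpolation)."""
    if not 0 < u < 1:
        raise ValueError("u must satisfy 0 < u < 1")
    ordered = sorted(sample)
    n = len(ordered)
    # The empirical c.d.f. reaches k/n at the k-th ordered value; return the
    # first ordered value at which it is at least u.
    for k, y in enumerate(ordered, start=1):
        if k / n >= u:
            return y
    return ordered[-1]

# Hypothetical sample of n = 8 measurements (values invented for illustration).
sample = [3.1, 0.4, 2.2, 5.9, 1.7, 4.8, 0.9, 3.6]

q25 = quantile(sample, 0.25)
q50 = quantile(sample, 0.50)
q75 = quantile(sample, 0.75)
print(q25, q50, q75)
```

For this sample of eight points, the 25% and 75% quantiles are the second and sixth ordered values, because the empirical c.d.f. first reaches $2/8$ and $6/8$ at those points.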

When large samples are considered, the quantile function is often convenient for scientific analysis, as the large number of data points is reduced to a smaller controlled number of interesting quantiles such as the 5%, 25%, 50%, 75% and 95% quantiles. A quantile function for an astronomical dataset is compared to the more familiar histogram in Figure 6.1 of Chapter 6. Quantile-quantile (Q-Q) plots are often used in visualization to compare two samples or one sample with a probability distribution. Q-Q plots are illustrated in Figures 5.4, 7.2, 8.2 and 8.6.

But when small samples are considered, the quantile function can be quite unstable. This is readily understood: for a sample of $n = 8$ points, the 25% and 75% quartiles are simply the values of the second and sixth data points, but for $n = 9$ interpolation is needed based on very little information about the underlying distribution of
