052176727X_Astronom


  • 7/21/2019 052176727X_Astronom

    1/503


    Modern Statistical Methods for Astronomy

    Modern astronomical research is beset with a vast range of statistical challenges, ranging from reducing data from megadatasets to characterizing an amazing variety of variable celestial objects or testing astrophysical theory. Linking astronomy to the world of modern statistics, this volume is a unique resource, introducing astronomers to advanced statistics through ready-to-use code in the public-domain R statistical software environment.

    The book presents fundamental results of probability theory and statistical inference, before exploring several fields of applied statistics, such as data smoothing, regression, multivariate analysis and classification, treatment of nondetections, time series analysis, and spatial point processes. It applies the methods discussed to contemporary astronomical research datasets using the R statistical software, making it an invaluable resource for graduate students and researchers facing complex data analysis tasks.

    A link to the authors' website for this book can be found at www.cambridge.org/msma. Material available on their website includes datasets, R code and errata.

    Eric D. Feigelson is a Professor in the Department of Astronomy and Astrophysics at Pennsylvania State University. He is a leading observational astronomer and has worked with statisticians for 25 years to bring advanced methodology to problems in astronomical research.

    G. Jogesh Babu is Professor of Statistics and Director of the Center for Astrostatistics at Pennsylvania State University. He has made extensive contributions to probabilistic number theory, resampling methods, nonparametric methods, asymptotic theory, and applications to biomedical research, genetics, astronomy, and astrophysics.


    Modern Statistical Methods for Astronomy

    With R Applications

    ERIC D. FEIGELSON
    Pennsylvania State University

    G. JOGESH BABU
    Pennsylvania State University


    CAMBRIDGE UNIVERSITY PRESS

    Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Mexico City

    Cambridge University Press
    The Edinburgh Building, Cambridge CB2 8RU, UK

    Published in the United States of America by Cambridge University Press, New York

    www.cambridge.org
    Information on this title: www.cambridge.org/9780521767279

    © E. D. Feigelson and G. J. Babu 2012

    This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

    First published 2012

    Printed in the United Kingdom at the University Press, Cambridge

    A catalog record for this publication is available from the British Library

    Library of Congress Catalog in Publication data
    Feigelson, Eric D.
    Modern statistical methods for astronomy : with R applications / Eric D. Feigelson, G. Jogesh Babu.
    p. cm.
    ISBN 978-0-521-76727-9 (hardback)
    1. Statistical astronomy. I. Babu, Gutti Jogesh, 1949– II. Title.
    QB149.F45 2012
    520.72′7–dc23 2012009113

    ISBN 978-0-521-76727-9 Hardback

    Additional resources for this publication: www.cambridge.org/msma

    Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.


    For Zoe, Clara and Micah

    In memory of my parents, Nagarathnam and Mallayya


    Contents

    Preface

    1 Introduction
      1.1 The role of statistics in astronomy
        1.1.1 Astronomy and astrophysics
        1.1.2 Probability and statistics
        1.1.3 Statistics and science
      1.2 History of statistics in astronomy
        1.2.1 Antiquity through the Renaissance
        1.2.2 Foundations of statistics in celestial mechanics
        1.2.3 Statistics in twentieth-century astronomy
      1.3 Recommended reading

    2 Probability
      2.1 Uncertainty in observational science
      2.2 Outcome spaces and events
      2.3 Axioms of probability
      2.4 Conditional probabilities
        2.4.1 Bayes' theorem
        2.4.2 Independent events
      2.5 Random variables
        2.5.1 Density and distribution functions
        2.5.2 Independent and identically distributed r.v.s.
      2.6 Quantile function
      2.7 Discrete distributions
      2.8 Continuous distributions
      2.9 Distributions that are neither discrete nor continuous
      2.10 Limit theorems
      2.11 Recommended reading
      2.12 R applications

    3 Statistical inference
      3.1 The astronomical context
      3.2 Concepts of statistical inference
      3.3 Principles of point estimation
      3.4 Techniques of point estimation
        3.4.1 Method of moments
        3.4.2 Method of least squares
        3.4.3 Maximum likelihood method
        3.4.4 Confidence intervals
        3.4.5 Calculating MLEs with the EM algorithm
      3.5 Hypothesis testing techniques
      3.6 Resampling methods
        3.6.1 Jackknife
        3.6.2 Bootstrap
      3.7 Model selection and goodness-of-fit
        3.7.1 Nonparametric methods for goodness-of-fit
        3.7.2 Likelihood-based methods for model selection
        3.7.3 Information criteria for model selection
        3.7.4 Comparing different model families
      3.8 Bayesian statistical inference
        3.8.1 Inference for the binomial proportion
        3.8.2 Prior distributions
        3.8.3 Inference for Gaussian distributions
        3.8.4 Hypotheses testing and the Bayes factor
        3.8.5 Model selection and averaging
        3.8.6 Bayesian computation
      3.9 Remarks
      3.10 Recommended reading
      3.11 R applications

    4 Probability distribution functions
      4.1 Binomial and multinomial
        4.1.1 Ratio of binomial random variables
      4.2 Poisson
        4.2.1 Astronomical context
        4.2.2 Mathematical properties
        4.2.3 Poisson processes
      4.3 Normal and lognormal
      4.4 Pareto (power-law)
        4.4.1 Least-squares estimation
        4.4.2 Maximum likelihood estimation
        4.4.3 Extensions of the power-law
        4.4.4 Multivariate Pareto
        4.4.5 Origins of power-laws
      4.5 Gamma
      4.6 Recommended reading
      4.7 R applications
        4.7.1 Comparing Pareto distribution estimators
        4.7.2 Fitting distributions to data
        4.7.3 Scope of distributions in R and CRAN

    5 Nonparametric statistics
      5.1 The astronomical context
      5.2 Concepts of nonparametric inference
      5.3 Univariate problems
        5.3.1 Kolmogorov–Smirnov and other e.d.f. tests
        5.3.2 Robust statistics of location
        5.3.3 Robust statistics of spread
      5.4 Hypothesis testing
        5.4.1 Sign test
        5.4.2 Two-sample and k-sample tests
      5.5 Contingency tables
      5.6 Bivariate and multivariate tests
      5.7 Remarks
      5.8 Recommended reading
      5.9 R applications
        5.9.1 Exploratory plots and summary statistics
        5.9.2 Empirical distribution and quantile functions
        5.9.3 Two-sample tests
        5.9.4 Contingency tables
        5.9.5 Scope of nonparametrics in R and CRAN

    6 Data smoothing: density estimation
      6.1 The astronomical context
      6.2 Concepts of density estimation
      6.3 Histograms
      6.4 Kernel density estimators
        6.4.1 Basic properties
        6.4.2 Choosing bandwidths by cross-validation
        6.4.3 Multivariate kernel density estimation
        6.4.4 Smoothing with measurement errors
      6.5 Adaptive smoothing
        6.5.1 Adaptive kernel estimators
        6.5.2 Nearest-neighbor estimators
      6.6 Nonparametric regression
        6.6.1 Nadaraya–Watson estimator
        6.6.2 Local regression
      6.7 Remarks
      6.8 Recommended reading
      6.9 R applications
        6.9.1 Histogram, quantile function and measurement errors
        6.9.2 Kernel smoothers
        6.9.3 Nonparametric regressions
        6.9.4 Scope of smoothing in R and CRAN

    7 Regression
      7.1 Astronomical context
      7.2 Concepts of regression
      7.3 Least-squares linear regression
        7.3.1 Ordinary least squares
        7.3.2 Symmetric least-squares regression
        7.3.3 Bootstrap error analysis
        7.3.4 Robust regression
        7.3.5 Quantile regression
        7.3.6 Maximum likelihood estimation
      7.4 Weighted least squares
      7.5 Measurement error models
        7.5.1 Least-squares estimators
        7.5.2 SIMEX algorithm
        7.5.3 Likelihood-based estimators
      7.6 Nonlinear models
        7.6.1 Poisson regression
        7.6.2 Logistic regression
      7.7 Model validation, selection and misspecification
        7.7.1 Residual analysis
        7.7.2 Cross-validation and the bootstrap
      7.8 Remarks
      7.9 Recommended reading
      7.10 R applications
        7.10.1 Linear modeling
        7.10.2 Generalized linear modeling
        7.10.3 Robust regression
        7.10.4 Quantile regression
        7.10.5 Nonlinear regression of galaxy surface brightness profiles
        7.10.6 Scope of regression in R and CRAN

    8 Multivariate analysis
      8.1 The astronomical context
      8.2 Concepts of multivariate analysis
        8.2.1 Multivariate distances
        8.2.2 Multivariate normal distribution
      8.3 Hypothesis tests
      8.4 Relationships among the variables
        8.4.1 Multiple linear regression
        8.4.2 Principal components analysis
        8.4.3 Factor and canonical correlation analysis
        8.4.4 Outliers and robust methods
        8.4.5 Nonlinear methods
      8.5 Multivariate visualization
      8.6 Remarks
      8.7 Recommended reading
      8.8 R applications
        8.8.1 Univariate tests of normality
        8.8.2 Preparing the dataset
        8.8.3 Bivariate relationships
        8.8.4 Principal components analysis
        8.8.5 Multiple regression and MARS
        8.8.6 Multivariate visualization
        8.8.7 Interactive graphical displays
        8.8.8 Scope of multivariate analysis in R and CRAN

    9 Clustering, classification and data mining
      9.1 The astronomical context
      9.2 Concepts of clustering and classification
        9.2.1 Definitions and scopes
        9.2.2 Metrics, group centers and misclassifications
      9.3 Clustering
        9.3.1 Agglomerative hierarchical clustering
        9.3.2 k-means and related nonhierarchical partitioning
      9.4 Clusters with substructure or noise
      9.5 Mixture models
      9.6 Supervised classification
        9.6.1 Multivariate normal clusters
        9.6.2 Linear discriminant analysis and its generalizations
        9.6.3 Classification trees
        9.6.4 Nearest-neighbor classifiers
        9.6.5 Automated neural networks
        9.6.6 Classifier validation, improvement and fusion
      9.7 Remarks
      9.8 Recommended reading
      9.9 R applications
        9.9.1 Unsupervised clustering of COMBO-17 galaxies
        9.9.2 Mixture models
        9.9.3 Supervised classification of SDSS point sources
        9.9.4 LDA, k-nn and ANN classification
        9.9.5 CART and SVM classification
        9.9.6 Scope of R and CRAN

    10 Nondetections: censored and truncated data
      10.1 The astronomical context
      10.2 Concepts of survival analysis
      10.3 Univariate datasets with censoring
        10.3.1 Parametric estimation
        10.3.2 Kaplan–Meier nonparametric estimator
        10.3.3 Two-sample tests
      10.4 Multivariate datasets with censoring
        10.4.1 Correlation coefficients
        10.4.2 Regression models
      10.5 Truncation
        10.5.1 Parametric estimation
        10.5.2 Nonparametric Lynden-Bell–Woodroofe estimator
      10.6 Remarks
      10.7 Recommended reading
      10.8 R applications
        10.8.1 Kaplan–Meier estimator
        10.8.2 Two-sample tests with censoring
        10.8.3 Bivariate and multivariate problems with censoring
        10.8.4 Lynden-Bell–Woodroofe estimator for truncation
        10.8.5 Scope of censoring and truncation in R and CRAN

    11 Time series analysis
      11.1 The astronomical context
      11.2 Concepts of time series analysis
      11.3 Time-domain analysis of evenly spaced data
        11.3.1 Smoothing
        11.3.2 Autocorrelation and cross-correlation
        11.3.3 Stochastic autoregressive models
        11.3.4 Regression for deterministic models
      11.4 Time-domain analysis of unevenly spaced data
        11.4.1 Discrete correlation function
        11.4.2 Structure function
      11.5 Spectral analysis of evenly spaced data
        11.5.1 Fourier power spectrum
        11.5.2 Improving the periodogram
      11.6 Spectral analysis of unevenly spaced data
        11.6.1 Lomb–Scargle periodogram
        11.6.2 Non-Fourier periodograms
        11.6.3 Statistical significance of periodogram peaks
        11.6.4 Spectral analysis of event data
        11.6.5 Computational issues
      11.7 State-space modeling and the Kalman filter
      11.8 Nonstationary time series
      11.9 1/f noise or long-memory processes
      11.10 Multivariate time series
      11.11 Remarks
      11.12 Recommended reading
      11.13 R applications
        11.13.1 Exploratory time series analysis
        11.13.2 Spectral analysis
        11.13.3 Modeling as an autoregressive process
        11.13.4 Modeling as a long-memory process
        11.13.5 Wavelet analysis
        11.13.6 Scope of time series analysis in R and CRAN

    12 Spatial point processes
      12.1 The astronomical context
      12.2 Concepts of spatial point processes
      12.3 Tests of uniformity
      12.4 Spatial autocorrelation
        12.4.1 Global measures of spatial autocorrelation
        12.4.2 Local measures of spatial autocorrelation
      12.5 Spatial interpolation
      12.6 Global functions of clustering
        12.6.1 Cumulative second-moment measures
        12.6.2 Two-point correlation function
      12.7 Model-based spatial analysis
        12.7.1 Models for galaxy clustering
        12.7.2 Models in geostatistics
      12.8 Graphical networks and tessellations
      12.9 Points on a circle or sphere
      12.10 Remarks
      12.11 Recommended reading
      12.12 R applications
        12.12.1 Characterization of autocorrelation
        12.12.2 Variogram analysis
        12.12.3 Characterization of clustering
        12.12.4 Tessellations
        12.12.5 Spatial interpolation
        12.12.6 Spatial regression and modeling
        12.12.7 Circular and spherical statistics
        12.12.8 Scope of spatial analysis in R and CRAN

    Appendix A Notation and acronyms

    Appendix B Getting started with R
      B.1 History and scope of R/CRAN
      B.2 Session environment
      B.3 R object classes
      B.4 Basic operations on classes
      B.5 Input/output
      B.6 A sample R session
      B.7 Interfaces to other programs and languages
      B.8 Computational efficiency
      B.9 Learning more about R
      B.10 Recommended reading

    Appendix C Astronomical datasets
      C.1 Asteroids
      C.2 Protostar populations
      C.3 Globular cluster magnitudes
      C.4 Stellar abundances
      C.5 Galaxy clustering
      C.6 Hipparcos stars
      C.7 Globular cluster properties
      C.8 SDSS quasars
      C.9 SDSS point sources
      C.10 Galaxy photometry
      C.11 Elliptical galaxy profiles
      C.12 X-ray source variability
      C.13 Sunspot numbers
      C.14 Exoplanet orbits
      C.15 Kepler stellar light curves
      C.16 Sloan Digital Sky Survey
      C.17 Fermi gamma-ray light curves
      C.18 Swift gamma-ray bursts

    References
    Subject index
    R and CRAN commands

    The color plates are to be found between pages 398 and 399.


    Preface

    Motivation and goals
    For many years, astronomers have struggled with the application of sophisticated statistical methodologies to analyze their rich datasets and address complex astrophysical problems. On one hand, at least in the United States, astronomers receive little or no formal training in statistics. The traditional method of education has been informal exposure to a few familiar methods during early research experiences. On the other hand, astronomers correctly perceive that a vast world of applied mathematical and statistical methodologies has emerged in recent decades. But systematic, broad training in modern statistical methods has not been available to most astronomers.

    This volume seeks to address this problem at three levels. First, we present fundamental principles and results of broad fields of statistics applicable to astronomical research. The material is roughly at a level of advanced undergraduate courses in statistics. We also outline some recent advanced techniques that may be useful for astronomical research to give a flavor of the breadth of modern methodology. It is important to recognize that we give only incomplete introductions to the fields, and we guide the astronomer towards more complete and authoritative treatments.

    Second, we present tutorials on the application of both simple and more advanced methods applied to contemporary astronomical research datasets using the R statistical software package. R has emerged in recent years as the most versatile public-domain statistical software environment for researchers in many fields. In addition to a coherent language for data analysis and common statistical tools, over 3000 packages have been added for advanced analyses in the CRAN archive. We have culled these packages for functionalities that may be useful to astronomers. R can also be linked to other analysis systems and languages such as C, FORTRAN and Python, so that legacy codes can be included in an R-based analysis and vice versa.

    Third, we hope the book communicates to astronomers our enthusiasm for statistics as a substantial and fascinating intellectual enterprise. Just as astronomers use the latest engineering to build their telescopes and apply advanced physics to interpret cosmic phenomena, they can benefit from exploring the many roads of analyzing and interpreting data through modern statistical analysis.

    Another important purpose of this volume is to give astronomers and other physical scientists a bridge to the vast library of specialized texts and monographs in statistics and allied fields. We strongly encourage researchers who are engaged in statistical data analysis to read more detailed treatments in the "Recommended reading" at the end of each chapter; they are carefully chosen from many available volumes. Most of the material in the book which is not specifically referenced in the text is presented in more detail in these recommended readings. To further this goal, the present book does not shy away from technical language that, though unfamiliar in the astronomical community, is critical for further learning from the statistical literature. For example, the astronomers' "upper limits" are "left-censored data points", a "power-law distribution" is a "Pareto distribution", and "1/f noise" is a "long-memory process". The text makes these connections between the languages of astronomy and statistics, and the comprehensive index can assist the reader in finding material in both languages.

    The reader may find the appendices useful. An introduction to R is given in Appendix B. It includes an overview of the programming language and an outline of its statistical functionalities, including the many CRAN packages. R applications to astronomical datasets are given at the end of each chapter which implement methods discussed in the text. Appendix C presents 18 astronomical datasets illustrating the range of statistical challenges that arise in contemporary research. The full datasets and R scripts are available online at http://astrostatistics.psu.edu/MSMA. Readers can thus easily reproduce the R results in the book.

    In this volume, we do not present mathematical proofs underlying statistical results, and we give only brief outlines of a few computational algorithms. We do not review research at the frontiers of astrostatistics, except for a few topics where astronomers have contributed critically important methodology (such as the treatment of truncated data and irregularly spaced time series). Only a small fraction of the many methodological studies in the recent astronomical literature are mentioned. Some fields of applied statistics useful for astronomy (such as wavelet analysis and image processing) are covered only briefly. Finally, only ∼2500 CRAN packages were examined for possible inclusion in the book; roughly one new package is added every day and many others are extended.

    Audience
    The main audience envisioned for this volume is graduate students and researchers in observational astronomy. We hope it serves both as a textbook in a course on data analysis or astrostatistics, and as a reference book to be consulted as specific research problems are encountered. Researchers in allied fields of physical science, such as high-energy physics and Earth sciences, may also find portions of the volume helpful. Statisticians can see how existing methods relate to questions in astronomy, providing background for astrostatistical research initiatives.

    Our presentation assumes that the reader has a background in basic linear algebra and calculus. Familiarity with elementary statistical methods commonly used in the physical sciences is also useful; this preparatory material is covered in volumes such as Bevington & Robinson (2002) and Cowan (1998).

    Outline and classroom use
    The introduction (Chapter 1) reviews the long historical relationship between astronomy and statistics and philosophical discussions of the relationship between statistical and scientific inference. We then start with probability theory and proceed to lay foundations of statistical inference: hypothesis testing, estimation, modeling, resampling and Bayesian inference (Chapters 2 and 3). Probability distributions are discussed in Chapter 4 and nonparametric statistics are covered in Chapter 5.

    The volume proceeds to various fields of applied statistics that rest on these foundations. Data smoothing is covered in Chapters 5 and 6. Regression is discussed in Chapter 7, followed by analysis and classification of multivariate data (Chapters 8 and 9). Treatments of nondetections are covered in Chapter 10, followed by the analysis of time-variable astronomical phenomena in Chapter 11. Chapter 12 considers spatial point processes. The book ends with appendices introducing the R software environment and providing astronomical datasets illustrative of a variety of statistical problems.

    We can make some recommendations regarding classroom use. The first part of a semester course in astrostatistics for astronomy students would be devoted to the principles of statistical inference in Chapters 1–4 and learning the basics of R in Appendix B. The second part of the semester would be topics of applied statistical methodology selected from Chapters 5–12. We do not provide predefined student exercises with definitive answers, but rather encourage both instructors and students to develop open-ended explorations of the contemporary astronomical datasets based on the R tutorials distributed throughout the volume. Suggestions for both simple and advanced problems are given in the dataset presentations (Appendix C).

    Astronomical datasets and R scripts
    The datasets and R scripts in the book can be downloaded from Penn State's Center for Astrostatistics at http://astrostatistics.psu.edu/MSMA. The R scripts are self-contained; simple cut-and-paste will ingest the datasets, perform the statistical operations, and produce tabular and graphical results.

    Extensive resources to pursue issues discussed in the book are available on-line. The R system can be downloaded from http://www.r-project.org and CRAN packages are installed on-the-fly within an R session. The primary astronomy research literature, including full-text articles, is available through the NASA–Smithsonian Astrophysics Data System (http://adswww.harvard.edu). Thousands of astronomical datasets are available from the Vizier service at the Centre des Données Stellaires (http://vizier.u-strasbg.fr) and the emerging International Virtual Observatory Alliance (http://ivo.net). The primary statistical literature can be accessed through MathSciNet (http://www.ams.org/mathscinet/) provided by the American Mathematical Society. Considerable statistical information is available on Wikipedia (http://en.wikipedia.org/wiki/Index_of_statistics_articles). Astronomers should note, however, that the best way to learn statistics is often through textbooks and monographs written by statisticians, such as those in the recommended reading.

    Acknowledgements

    This book emerged from 25 years of discussion and collaboration between astronomers and statisticians at Penn State under the auspices of the Center for Astrostatistics. The volume particularly benefited from the lectures and tutorials developed for the Summer Schools in Statistics for Astronomers since 2005 and taught at Penn State and Bangalore's Indian Institute of Astrophysics. We are grateful to our dozens of statistician colleagues who have taught at the Summer Schools in Statistics for Astronomers for generously sharing their knowledge and perspectives. David Hunter and Arnab Chakraborty developed R tutorials for astronomers. Donald Percival generously gave detailed comments on the time series analysis chapter. We are grateful to Nancy Butkovich and her colleagues for providing excellent library services. Finally, we acknowledge the National Science Foundation, National Aeronautics and Space Administration, and the Eberly College of Science for supporting astrostatistics at Penn State over many years.

    Eric D. Feigelson
    G. Jogesh Babu
    Center for Astrostatistics
    Pennsylvania State University
    University Park, PA, U.S.A.


    1 Introduction

    1.1 The role of statistics in astronomy

    1.1.1 Astronomy and astrophysics

    Today, the term "astronomy" is best understood as shorthand for "astronomy and astrophysics". Astronomy (astro = star and nomen = name in ancient Greek) is the observational study of matter beyond Earth: planets and bodies in the Solar System, stars in the Milky Way Galaxy, galaxies in the Universe, and diffuse matter between these concentrations of mass. The perspective is rooted in our viewpoint on or near Earth, typically using telescopes on mountaintops or robotic satellites to enhance the limited capabilities of our eyes. Astrophysics (astro = star and physis = nature) is the study of the intrinsic nature of astronomical bodies and the processes by which they interact and evolve. This is an indirect, inferential intellectual effort based on the (apparently valid) assumption that physical processes established to rule terrestrial phenomena (gravity, thermodynamics, electromagnetism, quantum mechanics, plasma physics, chemistry, and so forth) also apply to distant cosmic phenomena. Figure 1.1 gives a broad-stroke outline of the major fields and themes of modern astronomy.

    The fields of astronomy are often distinguished by the structures under study. There are planetary astronomers (who study our Solar System and extra-solar planetary systems), solar physicists (who study our Sun), stellar astronomers (who study other stars), Galactic astronomers (who study our Milky Way Galaxy), extragalactic astronomers (who study other galaxies), and cosmologists (who study the Universe as a whole). Astronomers can also be distinguished by the type of telescope used: there are radio astronomers, infrared astronomers, visible-light astronomers, X-ray astronomers, gamma-ray astronomers, and physicists studying cosmic rays, neutrinos and the elusive gravitational waves. Astrophysicists are sometimes classified by the processes they study: astrochemists, atomic and nuclear astrophysicists, general relativists (studying gravity) and cosmologists.

    The astronomer might proceed to investigate stellar processes by measuring an ordinary main-sequence star with spectrographs at different wavelengths of light, examining its spectral energy distribution with thousands of absorption lines. The astrophysicist interprets that the emission of a star is produced by a sphere of 10^57 atoms with a specific mixture of elemental abundances, powered by hydrogen fusion to helium in the core, revealing itself to the Universe as a blackbody surface at several thousand degrees temperature. The development of the observations of normal stars started in the late-nineteenth century, and the successful astrophysical interpretation emerged gradually throughout the twentieth century. The vibrant interwoven progress of astronomy and astrophysics continues today as many other cosmic phenomena, from molecular clouds to black holes, are investigated. Cosmology, in particular, has emerged with the remarkable inference that the familiar atoms around us comprise only a small fraction of the "stuff" in the Universe which is dominated by mysterious dark matter and dark energy inaccessible to normal telescopes or laboratory instruments.

    Fig. 1.1 Diagrams summarizing some important fields and themes of modern astronomy. Top: the history and growth of structures of the expanding Universe; bottom: the evolution of stars with generation of heavy elements and production of long-lived structures.

    1.1.2 Probability and statistics

    While there is little debate about the meaning and goals of astronomy and astrophysics as intellectual enterprises, the meaning and goals of probability and statistics have been widely debated. In his volume Statistics and Truth, C. R. Rao (1997) discusses how the term "statistics" has changed meaning over the centuries. It originally referred to the collection and compilation of data. In the nineteenth century, it accrued the goal of the mathematical interpretation of data, often to assist in making real-world decisions. Rao views contemporary statistics as an amalgam of a science (techniques derived from mathematics), a technology (techniques useful for decision-making in the presence of uncertainty), and an art (incompletely codified techniques based on inductive reasoning).

    Barnett (1999) considers various viewpoints on the meaning of statistics. The first group of quotes see statistics as a very useful, but essentially mechanical, technology for treating data. In this sense, it plays a role similar to astronomy's role vis-à-vis astrophysics.

    1. The first task of a statistician is cross-examination of data. (Sir R. A. Fisher, quoted by Rao 1997)

    2. [S]tatistics refers to the methodology for the collection, presentation, and analysis of data, and for the uses of such data. (Neter et al. 1978)

    3. Broadly defined, statistics encompasses the theory and methods of collecting, organizing, presenting, analyzing, and interpreting datasets so as to determine their essential characteristics. (Panik 2005)

    The following interpretations of statistics emphasize its role in reducing random variations in observations to reveal important effects in the underlying phenomenon under study.

    4. A statistical inference carries us from observations to conclusions about the populations sampled. (Cox 1958)

    5. Uncertain knowledge + Knowledge of the amount of uncertainty in it = Usable knowledge. (Rao 1997)

    6. My favourite definition [of statistics] is bipartite: statistics is both the science of uncertainty and the technology of extracting information from data. (Hand 2010)

    7. In statistical inference experimental or observational data are modeled as the observed values of random variables, to provide a framework from which inductive conclusions may be drawn about the mechanism giving rise to the data. (Young & Smith 2005)


    1.1.3 Statistics and science

    Opinions differ widely when considering the relationship between statistical analysis of empirical data and the underlying real phenomena. A group of prominent twentieth-century statisticians express considerable pessimism that statistical models are anything but useful fictions, much as Renaissance Europe debated the meaning of Copernicus' heliocentric cosmological model. These scholars view statistical models as useful but often trivial or even misleading representations of a complex world. Sir D. R. Cox, towards the end of a long career, perceives a barrier between statistical findings and the development or validation of scientific theories.

    8. There is no need for these hypotheses to be true, or even to be at all like the truth; rather one thing is sufficient for them that they should yield calculations which agree with the observations. (Osiander's preface to Copernicus' De Revolutionibus, quoted by Rao 1997)

    9. Essentially, all models are wrong, but some are useful. (Box & Draper 1987)

    10. [Statistical] models can provide us with ideas which we test against data, and about which we build up experience. They can guide our thinking, lead us to propose courses of action, and so on, and if used sensibly, and with an open mind, and if checked frequently with reality, might help us learn something that is true. Some statistical models are helpful in a given context, and some are not. . . . What we do works (when it does) because it can be seen to work, not because it is based on true or even good models of reality. (Speed 1992, addressing a meeting of astronomers)

    11. It is not always convenient to remember that the right model for a population can fit a sample of data worse than a wrong model, even a wrong model with fewer parameters. We cannot rely on statistical diagnostics to save us, especially with small samples. We must think about what our models mean, regardless of fit, or we will promulgate nonsense. (Wilkinson 2005)

    12. The object [of statistical inference] is to provide ideas and methods for the critical analysis and, as far as feasible, the interpretation of empirical data . . . The extremely challenging issues of scientific inference may be regarded as those of synthesising very different kinds of conclusions if possible into a coherent whole or theory . . . The use, if any, in the process of simple quantitative notions of probability and their numerical assessment is unclear . . . (Cox 2006)

    Other scholars quoted below are more optimistic. The older Sir R. A. Fisher bemoans a mechanistic view of statistics without meaning in the world. G. Young and R. Smith imply that statistical modeling can lead to an understanding of the causative mechanisms of variations in the underlying population. I. Hacking, a philosopher, believes statistics can improve our scientific inferences but not lead to new discovery. B. Efron, in an address as President of the American Statistical Association, feels that statistics can propel many sciences towards important results and insights.

    13. To one brought up in the free intellectual atmosphere of an earlier time there is something rather horrifying in the ideological movement represented by the doctrine that reasoning, properly speaking, cannot be applied to empirical data to lead to inferences valid in the real world. (Fisher 1973)

    14. The quiet statisticians have changed our world, not by discovering new facts or technical developments, but by changing the ways we reason, experiment, and form our opinions. (Hacking 1990)

    15. Statistics has become the primary mode of quantitative thinking in literally dozens of fields, from economics to biomedical research. The statistical tide continues to roll in, now lapping at the previously unreachable shores of the hard sciences. . . . Yes, confidence intervals apply as well to neutrino masses as to disease rates, and raise the same interpretive questions, too. (Efron 2004)

    16. The goal of science is to unlock nature's secrets. . . . Our understanding comes through the development of theoretical models which are capable of explaining the existing observations as well as making testable predictions. . . . Fortunately, a variety of sophisticated mathematical and computational approaches have been developed to help us through this interface, these go under the general heading of statistical inference. (Gregory 2005)

    Leading statisticians are thus often more cautious, or at least less self-confident, about the value of their labors for understanding phenomena than are astronomers. Most astronomers believe implicitly that their observations provide a clear window into the physical Universe, and that simple quantitative statistical interpretations of their observations represent an improvement over qualitative examination of the data.

    We generally share the optimistic view of statistical methodology in the service of astronomy and astrophysics, as expressed by P. C. Gregory (2005). In the language of the philosophy of science, we are positivists who believe that underlying causal relationships can be discovered through the detection and study of regular patterns of observable phenomena. While quantitative interpretation and models of complex biological and human affairs attempted by many statisticians may be more useful for prediction or decision-making than understanding the underlying behaviors, we feel that quantitative models of many astrophysical phenomena can be very valuable. A social scientist might interview a sample of voters to accurately predict the outcome of an election, yet never understand the beliefs underlying these votes. But an astrostatistician may largely succeed in understanding the orbits of binary stars, or the behavior of an accretion disk around a black hole or the growth of structure in an expanding Universe, that must obey deterministic mathematical laws of physics.

    However, we wish to convey throughout this volume that the process of linking statistical analysis to reality is not simple and challenges must be faced at all stages. In setting up the calculation, there are often several related questions that might be asked in a given scientific enterprise, and their statistical evaluation may lead to apparently different conclusions. In performing the calculation, there are often several statistical approaches to a given question asked about a dataset, each mathematically valid under certain conditions, yet again leading to different scientific inferences. In interpreting the result, even a clear statistical finding may give an erroneous scientific interpretation if the mathematical model is mismatched to physical reality.


    Astronomers should be flexible and sophisticated in their statistical treatments, and adopt a more cautious view of the results. A 3-sigma result does not necessarily represent astrophysical reality. Astronomers might first seek consensus about the exact question to be addressed, apply a suite of reasonable statistical approaches to the dataset with clearly stated assumptions, and recognize that the link between the statistical results and the underlying astrophysical truth may not be straightforward.

    1.2 History of statistics in astronomy

    1.2.1 Antiquity through the Renaissance

    Astronomy is the oldest observational science. The effort to understand the mysterious luminous objects in the sky has been an important element of human culture for tens of thousands of years. Quantitative measurements of celestial phenomena were carried out by many ancient civilizations. The classical Greeks were not active observers but were unusually creative in the applications of mathematical principles to astronomy. The geometric models of the Platonists with crystalline spheres spinning around the static Earth were elaborated in detail, and this model endured in Europe for fifteen centuries.

    The Greek natural philosopher Hipparchus made one of the first applications of mathematical principles in the realm of statistics, and started a millennium-long discussion on procedures for combining inconsistent measurements of a physical phenomenon (Sheynin 1973, Hald 2003). Finding scatter in Babylonian measurements of the length of a year, defined as the time between solstices, he took the middle of the range rather than the mean or median for the best value. Today, this is known as the midrange estimator of location, and is generally not favored due to its sensitivity to erroneous observations. Ptolemy and the eleventh-century Persian astronomer Abu Rayhan Biruni (al-Biruni) similarly recommended the average of extremes. Some medieval scholars advised against the acquisition of repeated measurements, fearing that errors would compound uncertainty rather than compensate for each other. The utility of the mean of discrepant observations to increase precision was promoted in the sixteenth century by Tycho Brahe and Galileo Galilei. Johannes Kepler appears to have inconsistently used arithmetic means, geometric means and middle values in his work. The supremacy of the mean was not settled in astronomy until the eighteenth century (Simpson 1756).
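
    The sensitivity of the midrange to a single erroneous observation can be illustrated with a short Python sketch. The measurement values below are invented for illustration and are not historical data; the helper name midrange is likewise our own.

```python
from statistics import mean, median

def midrange(values):
    """Midpoint of the range: (min + max) / 2."""
    return (min(values) + max(values)) / 2

# Hypothetical year-length measurements in days (illustrative only).
clean = [365.20, 365.22, 365.25, 365.26, 365.27]
# The same measurements with one erroneous observation appended.
contaminated = clean + [366.50]

# Shift of each location estimator caused by the single bad value.
shifts = {}
for name, estimator in [("midrange", midrange), ("mean", mean), ("median", median)]:
    shifts[name] = estimator(contaminated) - estimator(clean)
    print(f"{name:8s} shifts by {shifts[name]:+.3f} days")
```

    Under these assumed values the midrange moves the most, the mean less, and the median least, which is the behavior the text describes.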

    Ancient astronomers were concerned with observational errors, discussing dangers of propagating errors from inaccurate instruments and inattentive observers. In a study of the corrections to astronomical positions from observers in different cities, al-Biruni alludes to three types of errors: ". . . the use of sines engenders errors which become appreciable if they are added to errors caused by the use of small instruments, and errors made by human observers" (quoted by Sheynin 1973). In his 1632 Dialogue on the Two Great World Views, Ptolemaic and Copernican, Galileo also gave an early discussion of observational errors concerning the distance to the supernova of 1572. Here he outlined in nonmathematical language many of the properties of errors later incorporated by Gauss into his quantitative theory of errors.

    1.2.2 Foundations of statistics in celestial mechanics

    Celestial mechanics in the eighteenth century, in which Newton's law of gravity was found to explain even the subtlest motions of heavenly bodies, required the quantification of a few interesting physical quantities from numerous inaccurate observations. Isaac Newton himself had little interest in quantitative probabilistic arguments. In 1726, he wrote concerning discrepant observations of the Comet of 1680 that, "From all this it is plain that these observations agree with theory, in so far as they agree with one another" (quoted by Stigler 1986).

    Others tackled the problem of combining observations and estimating physical quantities through celestial mechanics more earnestly. In 1750, while analyzing the libration of the Moon as head of the observatory at Göttingen, Tobias Mayer developed a method of averages for parameter estimation involving multiple linear equations. In 1767, British astronomer John Michell similarly used a significance test based on the uniform distribution (though with some technical errors) to show that the Pleiades is a physical, rather than chance, grouping of stars. Johann Lambert presented an elaborate theory of errors, often in astronomical contexts, during the 1760s. Bernoulli and Lambert laid the foundations of the concept of maximum likelihood later developed more thoroughly by Fisher in the early twentieth century.

    The Marquis Pierre-Simon de Laplace (1749–1827), the most distinguished French scientist of his time, and his competitor Adrien-Marie Legendre, made seminal contributions both to celestial mechanics and to probability theory, often intertwined. Their generalizations of Mayer's methods for treating multiple parametric equations constrained by many discrepant observations had great impact. In astronomical and geodetical studies during the 1780s and in his huge 1799–1825 opus Mécanique Céleste, Laplace proposed parameter estimation for linear models by minimizing the largest absolute residual. In an 1805 appendix to a paper on cometary orbits, Legendre proposed minimizing the sum of the squares of residuals, or the method of least squares. He concluded that "the method of least squares reveals, in a manner of speaking, the center around which the results of observations arrange themselves, so that the deviations from that center are as small as possible" (quoted by Stigler 1986).

    Both Carl Friedrich Gauss, also director of the observatory at Göttingen, and Laplace later placed the method of least squares onto a solid mathematical probabilistic foundation. While the method of least squares had been adopted as a practical convenience by Gauss and Legendre, Laplace first treated it as a problem in probabilities in his Théorie Analytique des Probabilités. He proved by an intricate and difficult course of reasoning that it was the most advantageous method for finding parameters in orbital models from astronomical observations, the mean of the probabilities of error in the determination of the elements being thereby reduced to a minimum. Least-squares computations rapidly became the principal interpretive tool for astronomical observations and their links to celestial mechanics. These and other approaches to statistical inference are discussed in Chapter 3.


    In another portion of the Théorie, Laplace rescued from obscurity the postulation of the Central Limit Theorem by the mathematician Abraham De Moivre who, in a remarkable article published in 1733, used the normal distribution to approximate the distribution of the number of heads resulting from many tosses of a fair coin. Laplace expanded De Moivre's finding by approximating the binomial distribution with the normal distribution. Laplace's proof was flawed, and improvements were developed by Siméon-Denis Poisson, an astronomer at Paris' Bureau des Longitudes, and Friedrich Bessel, director of the observatory in Königsberg. Today, the Central Limit Theorem is considered to be one of the foundations of probability theory (Section 2.10).
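
    De Moivre's approximation can be checked numerically. The Python sketch below compares exact binomial probabilities for the number of heads in n tosses of a fair coin with a normal density of matching mean and standard deviation. The choice n = 100 and the function names are our own assumptions for illustration.

```python
import math

def binomial_pmf(k, n, p=0.5):
    """Exact probability of k heads in n independent tosses."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and s.d. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

n = 100                           # number of tosses (arbitrary choice)
mu = n * 0.5                      # binomial mean n*p
sigma = math.sqrt(n * 0.5 * 0.5)  # binomial s.d. sqrt(n*p*(1-p))

for k in (40, 45, 50, 55, 60):
    exact = binomial_pmf(k, n)
    approx = normal_pdf(k, mu, sigma)
    print(f"k={k:3d}  binomial={exact:.5f}  normal={approx:.5f}")
```

    For this n the two columns agree closely near the mean; the agreement degrades in the far tails and for small n.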

    Gauss established his famous error distribution and related it to Laplace's method of least squares in 1809. Astronomer Friedrich Bessel introduced the concept of probable error in an 1816 study of comets, and demonstrated the applicability of Gauss' distribution to empirical stellar astrometric errors in 1818. Gauss also introduced some treatments for observations with different (heteroscedastic) measurement errors and developed the theory for unbiased minimum variance estimation. Throughout the nineteenth century, Gauss' distribution was widely known as the astronomical error function.

    Although the fundamental theory was developed by Laplace and Gauss, other astronomers published important contributions to the theory, accuracy and range of applicability of the normal distribution and least-squares estimation during the latter part of the nineteenth century (Hald 1998). They include Ernst Abbe at the Jena Observatory and the optics firm of Carl Zeiss, Auguste Bravais of the University of Lyons, Johann Encke of the Berlin Observatory, Britain's Sir John Herschel, Simon Newcomb of the U.S. Naval Observatory, Giovanni Schiaparelli of Brera Observatory, and Denmark's Thorvald Thiele. Sir George B. Airy, British Royal Astronomer, wrote an 1865 text on least-squares methods and observational error.

    Adolphe Quetelet, founder of the Belgian Royal Observatory, and Francis Galton, director of Britain's Kew Observatory, did little to advance astronomy but were distinguished pioneers extending statistical analysis from astronomy into the human sciences. They particularly laid the groundwork for regression between correlated variables. The application of least-squares techniques to multivariate linear regression emerged in biometrical contexts by Karl Pearson and his colleagues in the early 1900s (Chapter 7).

    The intertwined history of astronomy and statistics during the eighteenth and nineteenth centuries is detailed in the monographs by Stigler (1986), Porter (1986) and Hald (1998).

    1.2.3 Statistics in twentieth-century astronomy

    The connections between astronomy and statistics considerably weakened during the first decades of the twentieth century as statistics turned its attention principally to biological sciences, human attributes, social behavior and statistical methods for industries such as life insurance, agriculture and manufacturing. Advances in astronomy similarly moved away from the problem of evaluating errors in measurements of deterministic processes of celestial mechanics. Major efforts on the equilibrium structure of stars, the geometry of the Galaxy, the discovery of the interstellar medium, the composition of stellar atmospheres, the study of solar magnetic activity and the discovery of extragalactic nebulae generally did not involve statistical theory or application. Two distinguished statisticians wrote series of papers in the astronomical literature (Karl Pearson on correlations between stellar properties around 1907–11, and Jerzy Neyman with Elizabeth Scott on clustering of galaxies around 1952–64) but neither had a strong influence on further astronomical developments.

    The least-squares method was used in many astronomical applications during the first half of the twentieth century, but not in all cases. Schlesinger (1916) admonished astronomers estimating elements of binary-star orbits to use least-squares rather than trial-and-error techniques. The stellar luminosity function derived by Jacobus Kapteyn, and thereby the inferred structure of the Galaxy, were based on subjective curve fitting (Kapteyn & van Rhijn 1920), although Kapteyn had made some controversial contributions to the mathematics of skewed distributions and correlation. An important study on dark matter in the Coma Cluster fits the radial distribution of galaxies by eye and does not quantify its similarity to an isothermal sphere (Zwicky 1937). In contrast, Edwin Hubble's seminal studies on galaxies were often based on least-squares fits (e.g. the redshift–magnitude relationship in Hubble & Humason 1931), although an early study reports a nonstandard symmetrical average of two regression lines (Hubble 1926, Section 7.3.2). Applications of statistical methods based on the normal error law were particularly strong in studies involving positional astronomy and star counts (Trumpler & Weaver 1953). Astronomical applications of least-squares estimation were strongly promoted by the advent of computers and Bevington's (1969) useful volume with FORTRAN code. Fourier analysis was also commonly used for time series analysis in the latter part of the twentieth century.

    Despite its formulation by Fisher in the 1920s, maximum likelihood estimation emerged only slowly in astronomy. Early applications included studies of stellar cluster convergent points (Brown 1950), statistical parallaxes from the Hertzsprung–Russell diagram (Jung 1970), and some early work in radio and X-ray astronomy. Crawford et al. (1970) advocated use of maximum likelihood for estimating power-law slopes, a message we reiterate in this volume (Section 4.4). Maximum likelihood studies with truly broad impact did not emerge until the 1970s. Innovative and widely accepted methods include Lynden-Bell's (1971) luminosity function estimator for flux-limited samples, Lucy's (1974) algorithm for restoring blurry images, and Cash's (1979) algorithm for parameter estimation involving photon counting data. Maximum likelihood estimators became increasingly important in extragalactic astronomy; they were crucial for the discovery of galaxy streaming towards the Great Attractor (Lynden-Bell et al. 1988) and calculating the galaxy luminosity function from flux-limited surveys (Efstathiou et al. 1988). The 1970s also witnessed the first use and rapid acceptance of the nonparametric Kolmogorov–Smirnov statistic for two-sample and goodness-of-fit tests.

    The development of inverse probability and Bayes' theorem by Thomas Bayes and Laplace in the late eighteenth century took place largely without applications to astronomy. Despite the prominence of the leading Bayesian proponent Sir Harold Jeffreys, who won the Gold Medal of the Royal Astronomical Society in 1937 and served as Society President in the 1950s, Bayesian methods did not emerge in astronomy until the latter part of the twentieth century. Bayesian classifiers for discriminating stars and galaxies (based on the 2001 text written for engineers by Duda et al.) were used to construct large automated sky survey catalogs (Valdes 1982), and maximum entropy image restoration gained some interest (Narayan & Nityananda 1986). But it was not until the 1990s that Bayesian methods became widespread in important studies, particularly in extragalactic astronomy and cosmology.

    The modern field of astrostatistics grew suddenly and rapidly starting in the late 1990s. This was stimulated in part by monographs on statistical aspects of astronomical image processing (Starck et al. 1998, Starck & Murtagh 2006), galaxy clustering (Martínez & Saar 2001), Bayesian data analyses (Gregory 2005) and Bayesian cosmology (Hobson et al. 2010). Babu & Feigelson (1996) wrote a brief overview of astrostatistics. The continuing conference series Statistical Challenges in Modern Astronomy organized by us since 1991 brought together astronomers and statisticians interested in forefront methodological issues (Feigelson & Babu 2012). Collaborations between astronomers and statisticians emerged, such as the California–Harvard Astro-Statistical Collaboration (http://hea-www.harvard.edu/AstroStat), the International Computational Astrostatistics Group centered in Pittsburgh (http://www.incagroup.org), and the Center for Astrostatistics at Penn State (http://astrostatistics.psu.edu). However, the education of astronomers in statistical methodology remains weak. Penn State's Center and other institutes operate week-long summer schools in statistics for young astronomers to partially address this problem.

    1.3 Recommended reading

    We offer here a number of volumes with broad coverage in statistics. Stigler's monograph reviews the history of statistics and astronomy. Rice, Hogg & Tanis, and Hogg et al. are well-respected textbooks in statistical inference at undergraduate and graduate levels, and Wasserman gives a modern viewpoint. Lupton, James, and Wall & Jenkins are written by and for physical scientists. Ghosh et al. and Gregory introduce Bayesian inference.

    Ghosh, J. K., Delampady, M. & Samanta, T. (2006) An Introduction to Bayesian Analysis: Theory and Methods, Springer, Berlin
    A graduate-level textbook in Bayesian inference with coverage of the Bayesian approach, objective and reference priors, convergence and large-sample approximations, model selection and testing criteria, Markov chain Monte Carlo computations, hierarchical Bayesian models, empirical Bayesian models and applications to regression and high-dimensional problems.

    Gregory, P. (2005) Bayesian Logical Data Analysis for the Physical Sciences, Cambridge University Press
    This monograph treats probability theory and sciences, practical Bayesian inference, frequentist approaches, maximum entropy, linear and nonlinear model fitting, Markov chain Monte Carlo, harmonic time series analysis, and Poisson problems. Examples are illustrated using Mathematica.

    Hogg, R., McKean, J. & Craig, A. (2005) Introduction to Mathematical Statistics, 6th ed., Prentice Hall, Englewood Cliffs
    A slim text aimed at graduate students in statistics that includes Bayesian methods and decision theory, hypothesis testing, sufficiency, confidence sets, likelihood theory, prediction, bootstrap methods, computational techniques (e.g. bootstrap, MCMC), and other topics (e.g. pseudo-likelihoods, Edgeworth expansion, Bayesian asymptotics).

    Hogg, R. V. & Tanis, E. (2009) Probability and Statistical Inference, 8th ed., Prentice-Hall, Englewood Cliffs
    A widely used undergraduate text covering random variables, discrete and continuous distributions, estimation, hypothesis tests, linear models, multivariate distributions, nonparametric methods, Bayesian methods and inference theory.

    James, F. (2006) Statistical Methods in Experimental Physics, 2nd ed., World Scientific, Singapore
    This excellent volume covers concepts in probability, distributions, convergence theorems, likelihoods, decision theory, Bayesian inference, point and interval estimation, hypothesis tests and goodness-of-fit.

    Lupton, R. (1993) Statistics in Theory and Practice, Princeton University Press
    This slim monograph explains probability distributions, sampling statistics, confidence intervals, hypothesis tests, maximum likelihood estimation, goodness-of-fit and nonparametric rank tests.

    Rice, J. A. (2007) Mathematical Statistics and Data Analysis, 3rd ed., Duxbury Press
    An undergraduate-level text with broad coverage of modern statistics with both theory and applications. Topics covered include probability, statistical distributions, Central Limit Theorem, survey sampling, parameter estimation, hypothesis tests, goodness-of-fit, data visualization, two-sample comparisons, bootstrap, analysis of variance, categorical data, linear least squares, Bayesian inference and decision theory.

    Stigler, S. M. (1986) The History of Statistics: The Measurement of Uncertainty before 1900, Harvard University Press
    This readable monograph presents the intellectual history of the intertwined developments in astronomy and statistics during the eighteenth and nineteenth centuries. Topics include combining observations, least squares, inverse probability (Bayesian inference), correlation, regression and applications in astronomy and biology.

    Wall, J. V. & Jenkins, C. R. (2003) Practical Statistics for Astronomers, Cambridge University Press
    A useful volume on statistical methods for physical scientists. Coverage includes concepts of probability and inference, correlation, hypothesis tests, modeling by least squares and maximum likelihood estimation, bootstrap and jackknife, nondetections and survival analysis, time series analysis and spatial point processes.


    Wasserman, L. (2004) All of Statistics: A Concise Course in Statistical Inference, Springer, Berlin
    A short text intended for graduate students in allied fields presenting a wide range of topics with emphasis on mathematical foundations. Topics include random variables, expectations, empirical distribution functions, bootstrap, maximum likelihood, hypothesis testing, Bayesian inference, linear and loglinear models, multivariate models, graphs, density estimation, classification, stochastic processes and simulation methods. An associated Web site provides R code and datasets.


    2 Probability

    2.1 Uncertainty in observational science

    Probability theory models uncertainty. Observational scientists often come across events whose outcome is uncertain. It may be physically impossible, too expensive or even counterproductive to observe all the inputs. The astronomer might want to measure the location and motions of all stars in a globular cluster to understand its dynamical state. But even with the best telescopes, only a fraction of the stars can be located in the two dimensions of sky coordinates with the third distance dimension unobtainable. Only one component (the radial velocity) of the three-dimensional velocity vector can be measured, and this may be accessible for only a few cluster members. Furthermore, limitations of the spectrograph and observing conditions lead to uncertainty in the measured radial velocities. Thus, our knowledge of the structure and dynamics of globular clusters is subject to considerable restrictions and uncertainty.

    In developing the basic principles of uncertainty, we will consider both astronomical systems and simple familiar systems such as a tossed coin. The outcome of a toss, heads or tails, is completely determined by the forces on the coin and Newton's laws of motion. But we would need to measure too many parameters of the coin's trajectory and rotations to predict with acceptable reliability which face of the coin will be up. The outcomes of coin tosses are thus considered to be uncertain even though they are regulated by deterministic physical processes. Similarly, the observed properties of a quasar have considerable uncertainty, even though the physics of accretion disks and their radiation are based on deterministic physical processes.

    The uncertainty in our knowledge could be due to the current level of understanding of the phenomenon, and might be reduced in the future. Consider, for example, the prediction of solar eclipses. In ancient societies, the motions of Solar System bodies were not understood and the occurrence of a solar eclipse would have been modeled as a random event (or attributed to divine intervention). However, an astronomer noticing that solar eclipses occur only on a new moon day could have revised the model with a monthly cycle of probabilities. Further quantitative prediction would follow from the Babylonian astronomers' discovery of the 18-year saros eclipse cycle. Finally, with Newtonian celestial mechanics, the phenomenon became essentially completely understood and the model changed from a random to a deterministic model subject to direct prediction with known accuracy.


    The uncertainty of our knowledge could be due to future choices or events. We cannot predict with certainty the outcome of an election yet to be held, although polls of the voting public will constrain the prediction. We cannot accurately predict the radial velocity of a globular cluster star prior to its measurement, although our prior knowledge of the cluster's velocity dispersion will constrain the prediction. But when the election results are tabulated, or the astronomical spectrum is analyzed, our level of uncertainty is suddenly reduced.

    When the outcome of a situation is uncertain, why do we think that it is possible to model it mathematically? In many physical situations, the events that are uncertain at the micro-level appear to be deterministic at the macro-level. While the outcome of a single toss of a coin is uncertain, the proportion of heads in a large number of tosses is stable. While the radial velocity of a single globular cluster star is uncertain, we can make predictions with some confidence based on a prior measurement of the global cluster velocity and our knowledge of cluster dynamics from previous studies. Probability theory attempts to capture and quantify this phenomenon; the Law of Large Numbers directly addresses the relationship between micro-level uncertainty and macro-level deterministic behavior.
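
    The stability of the proportion of heads can be seen in a small simulation. The Python sketch below is only an illustration under stated assumptions: it uses a pseudo-random generator as a stand-in for a fair coin, and the seed and sample sizes are arbitrary choices of ours.

```python
import random

def proportion_of_heads(n_tosses, rng):
    """Simulate n_tosses of a fair coin and return the fraction of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

rng = random.Random(12345)  # fixed, arbitrary seed so runs are repeatable

# Each single toss is unpredictable, but the proportion settles near 0.5
# as the number of tosses grows.
for n in (10, 100, 10_000, 1_000_000):
    print(f"n = {n:>9,d}   proportion of heads = {proportion_of_heads(n, rng):.4f}")
```

    The scatter of the proportion about 0.5 shrinks roughly as one over the square root of the number of tosses, which is the quantitative content behind the qualitative statement above.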

    2.2 Outcome spaces and events

    An experiment is any action that can have a set of possible results where the actually occurring result cannot be predicted with certainty prior to the action. Experiments such as tossing a coin, rolling a die, or counting of photons registered at a telescope, all result in sets of outcomes. Tossing a coin results in a set of two outcomes $\Omega = \{H, T\}$; rolling a die results in a set of six outcomes $\Omega = \{1, 2, 3, 4, 5, 6\}$; while counting photons results in an infinite set of outcomes $\Omega = \{0, 1, 2, \ldots\}$. The number of neutron stars within 1 kpc of the Sun is a discrete and finite sample space. The set of all outcomes $\Omega$ of an experiment is known as the outcome space or sample space.

    An event is a subset of a sample space. For example, consider now the sample space $\Omega$ of all exoplanets, where the event $E$ describes all exoplanets with eccentricity in the range 0.5–0.6, and the event $F$ describes that the host star is a binary system. There are essentially two aspects to probability theory: first, assigning probabilities to simple outcomes; and second, manipulating probabilities of simple events to derive probabilities of complicated events.

    In the simplest cases, such as a well-balanced coin toss or die roll, the inherent symmetries of the experiment lead to equally likely outcomes. For the coin toss, $\Omega = \{H, T\}$ with probabilities $P(H) = 0.5$ and $P(T) = 0.5$. For the die roll, $\Omega = \{1, 2, 3, 4, 5, 6\}$ with $P(i) = 1/6$ for $i = 1, 2, \ldots, 6$. Now consider the more complicated case where a quarter, a dime and a nickel are tossed together. The outcome space is

    $$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}, \tag{2.1}$$

    where the first letter is the outcome of the quarter, the second of the dime and the third of the nickel. Again, it is reasonable to model all the outcomes as equally likely with probabilities $1/8$. Thus, when an experiment results in $m$ equally likely outcomes, $\{e_1, e_2, \ldots, e_m\}$, then the probability of any event $A$ is simply

    $$P(A) = \frac{\#A}{m}, \tag{2.2}$$

    where # is read "the number of". That is, $P(A)$ is the ratio of the number of outcomes favorable to $A$ and the total number of outcomes.
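
    The counting rule for equally likely outcomes can be made concrete for the three-coin toss. In the Python sketch below, the two example events and the helper name prob are our own illustrative choices, not taken from the text.

```python
from fractions import Fraction
from itertools import product

# Outcome space of tossing a quarter, a dime and a nickel together:
# HHH, HHT, HTH, HTT, THH, THT, TTH, TTT (first letter is the quarter).
omega = ["".join(t) for t in product("HT", repeat=3)]

def prob(event, sample_space):
    """P(A) = #A / m for m equally likely outcomes.

    `event` is a predicate that is True for outcomes belonging to A.
    """
    favorable = [e for e in sample_space if event(e)]
    return Fraction(len(favorable), len(sample_space))

# Illustrative events (our own choices):
p_quarter_heads = prob(lambda e: e[0] == "H", omega)           # quarter shows heads
p_at_least_two_heads = prob(lambda e: e.count("H") >= 2, omega)

print(len(omega), p_quarter_heads, p_at_least_two_heads)
```

    Using exact fractions rather than floating-point numbers keeps the ratio of counts visible in the result.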

    Even when the outcomes are not equally likely, in some cases it is possible to identify the outcomes as combinations of equally likely outcomes of another experiment and thus obtain a model for the probabilities. Consider the three-coin toss where we only note the number of heads. The sample space is $\Omega = \{0, 1, 2, 3\}$. These outcomes cannot be modeled as equally likely. In fact, if we toss three coins 100 times, then we would observe that $\{1, 2\}$ occur far more frequently than $\{0, 3\}$. The following simple argument will lead to a logical assignment of probabilities. The outcome $\omega \in \Omega$ in this experiment is related to the outcomes in (2.1):

    $\omega = 0$ when TTT occurs
    $\omega = 1$ when HTT, THT or TTH occurs
    $\omega = 2$ when HHT, HTH or THH occurs
    $\omega = 3$ when HHH occurs.

    Thus $P(0) = P(3) = 0.125$ and $P(1) = P(2) = 0.375$.

    For finite (or countably infinite) sample spaces $\Omega = \{e_1, e_2, \ldots\}$, a probability model assigns a nonnegative weight $p_i$ to the outcome $e_i$ for every $i$ in such a way that the $p_i$'s add up to 1. A finite (or countably infinite) sample space is sometimes called a discrete sample space. For example, when exploring the number of exoplanets orbiting stars within 10 pc of the Sun, we consider a discrete sample space. In the case of countable sample spaces, we define the probability $P(A)$ of an event $A$ as

    P (A) =i : e i A

    p i . (2.3)

    In words, this says that theprobability of an event A is equal to thesumof the individualprobabilitiesof outcomes e i belongingto A.
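    The three-coin example can be checked by direct enumeration. The following is a minimal Python sketch, not part of the original text; the names `omega`, `weight`, `prob` and `heads` are chosen here for illustration. It assigns weight 1/8 to each outcome in (2.1) and applies equation (2.3) to the events "exactly $k$ heads".

```python
from fractions import Fraction
from itertools import product

# Sample space (2.1): the 8 equally likely outcomes of tossing three coins.
omega = ["".join(toss) for toss in product("HT", repeat=3)]
weight = {e: Fraction(1, len(omega)) for e in omega}  # p_i = 1/8 for every outcome

def prob(event):
    """Equation (2.3): add the weights of the outcomes belonging to the event."""
    return sum((weight[e] for e in event), Fraction(0))

# Event "exactly k heads" for k = 0, 1, 2, 3.
heads = {k: prob([e for e in omega if e.count("H") == k]) for k in range(4)}

for k, p in heads.items():
    print(k, p, float(p))
```

    Exact fractions are used so that the weights can be compared with 0.125 and 0.375 without rounding error.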

    If the sample space $\Omega$ is uncountable, then not all subsets are allowed to be called events for mathematical and technical reasons. Astronomers deal with both countable spaces, such as the number of stars in the Galaxy or the set of photons from a quasar arriving at a detector, and uncountable spaces, such as the variability characteristics of a quasar or the background noise in an image constructed from interferometry observations.

    2.3 Axioms of probability

    A probability space consists of the triplet $(\Omega, \mathcal{F}, P)$, with sample space $\Omega$, a class $\mathcal{F}$ of events, and a function $P$ that assigns a probability to each event in $\mathcal{F}$ and obeys three axioms of probability:


    Fig. 2.1 Union and intersection of events. Left-hand panel: two events $C$ and $D$; right-hand panel: three events $E$, $F$ and $G$.

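    Axiom 1 $0 \le P(A) \le 1$, for all events $A$
    Axiom 2 $P(\Omega) = 1$
    Axiom 3 For mutually exclusive (pairwise disjoint) events $A_1, A_2, \ldots$,

    $$P(A_1 \cup A_2 \cup A_3 \cup \cdots) = P(A_1) + P(A_2) + P(A_3) + \cdots,$$

    that is, if for all $i \neq j$, $A_i \cap A_j = \emptyset$ ($\emptyset$ denotes the empty set or null event), then

    $$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i).$$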

    Here, $\cup$ represents the union of sets while $\cap$ represents their intersection. Axiom 3 states that the probability that at least one of the mutually exclusive events $A_i$ occurs is the same as the sum of the probabilities of the events $A_i$, and this should hold for infinitely many events. This is known as the countable additivity property. This axiom, in particular, implies that the finite additivity property holds; that is, for mutually exclusive (or disjoint) events $A$, $B$ (i.e. $A \cap B = \emptyset$),

    $$P(A \cup B) = P(A) + P(B). \tag{2.4}$$

    This in particular implies that for any event $A$, the probability of its complement $A^c = \{\omega \in \Omega : \omega \notin A\}$, the set of points in the sample space that are not in $A$, is given by

    $$P(A^c) = 1 - P(A). \tag{2.5}$$

    (A technical comment can be made here: in the case of an uncountable sample space $\Omega$, it is impossible to define a probability function $P$ that assigns zero weight to singleton sets and satisfies these axioms for all subsets of $\Omega$.)

    Using the above axioms, it is easy to establish that for any two events $C$, $D$

    $$P(C \cup D) = P(C) + P(D) - P(C \cap D); \tag{2.6}$$

    that is, the probability of the union of the two events is equal to the sum of the event probabilities minus the probability of the intersection of the two events. This is illustrated in the left-hand panel of Figure 2.1.

    For three events $E$, $F$, $G$,

    $$P(E \cup F \cup G) = P(E) + P(F) + P(G) - P(E \cap F) - P(F \cap G) - P(E \cap G) + P(E \cap F \cap G) \tag{2.7}$$


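    as shown in the right-hand panel of Figure 2.1. The generalization to $n$ events, $E_1, \ldots, E_n$, is called the inclusion–exclusion formula:

    $$P(E_1 \cup E_2 \cup \cdots \cup E_n) = \sum_{i=1}^{n} P(E_i) - \sum_{i_1 < i_2} P(E_{i_1} \cap E_{i_2}) + \cdots + (-1)^{r+1} \sum_{i_1 < i_2 < \cdots < i_r} P(E_{i_1} \cap E_{i_2} \cap \cdots \cap E_{i_r}) + \cdots + (-1)^{n+1} P(E_1 \cap E_2 \cap \cdots \cap E_n), \tag{2.8}$$

    where the summation

    $$\sum_{i_1 < i_2 < \cdots < i_r} P(E_{i_1} \cap E_{i_2} \cap \cdots \cap E_{i_r}) \tag{2.9}$$

    is taken over all of the $\binom{n}{r}$ possible subsets of size $r$ of the set $\{1, 2, \ldots, n\}$.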
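    The inclusion–exclusion formula can be verified numerically on a small finite sample space. The sketch below is illustrative only: it assumes equally likely outcomes on $\Omega = \{1, \ldots, 8\}$, and the three events are hypothetical choices made here, not taken from the text.

```python
from fractions import Fraction
from itertools import combinations

def prob(event, n_outcomes):
    """Probability of an event under equally likely outcomes, as in equation (2.2)."""
    return Fraction(len(event), n_outcomes)

def union_by_inclusion_exclusion(events, n_outcomes):
    """Right-hand side of equation (2.8): alternating sums over subsets of size r."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r + 1)
        for subset in combinations(events, r):
            total += sign * prob(set.intersection(*subset), n_outcomes)
    return total

# Hypothetical events on the sample space {1, ..., 8}.
events = [{1, 2, 3, 4}, {3, 4, 5}, {1, 5, 6, 7}]
lhs = prob(set.union(*events), 8)            # P(E1 ∪ E2 ∪ E3) computed directly
rhs = union_by_inclusion_exclusion(events, 8)
print(lhs, rhs)
```

    The inner loop visits each of the $\binom{n}{r}$ subsets of size $r$ exactly once, mirroring the summation in (2.9).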

    2.4 Conditional probabilities

    Conditional probability is one of the most important concepts in probability theory and can be tricky to understand. It often helps in computing desired probabilities, particularly when only partial information regarding a result of an experiment is available. Bayes' theorem, at the foundation of Bayesian statistics, uses conditional probabilities.

    Consider the following simple example. When a die is rolled, the probability that it turns up one of the numbers $\{1, 2, 3\}$ is 1/2, as each of the six outcomes is equally likely. Now consider that someone took a brief glimpse at the die and found that it turned up an even number. How does this additional information influence the assignment of probability to $A = \{1, 2, 3\}$? In this case, the weights assigned to the points are reassessed by giving equal weights, 1/3, to each of the three even integers in $B = \{2, 4, 6\}$ and zero weights to the odd integers. Since it is already known that $B$ occurred, it is intuitive to rescale and assign probability 1 to $B$ and probability 0 to the complementary event $B^c$. Now as all the points in $B$ are equally likely, it follows that the required probability is the ratio of the number of points of $A$ that are in $B$ to the total number of points in $B$. Since 2 is the only number from $A$ in $B$, the required probability is $\#(A \cap B)/\#B = 1/3$.

    Generalizing this, let us consider an experiment with $m$ equally likely outcomes and let $A$ and $B$ be two events. If we are given the information that $B$ has happened, what is the probability that $A$ has happened in light of the new knowledge? Let $\#A = k$, $\#B = n$ and $\#(A \cap B) = i$. Then, as in the rolled die example above, given that $B$ has happened, the new probability allocation assigns probability $1/n$ to all the outcomes in $B$. Out of these $n$, $\#(A \cap B) = i$ outcomes belong to $A$. Noting that $P(A \cap B) = i/m$ and $P(B) = n/m$, it leads to the conditional probability, $P(A \mid B) = i/n$, of $A$ given $B$,

    $$P(A \mid B) = \frac{P(A \cap B)}{P(B)}. \tag{2.10}$$
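    The rolled-die calculation can be reproduced in a few lines. This is a minimal sketch assuming equally likely outcomes; the helper names `prob` and `cond_prob` are introduced here for illustration.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # sample space of a fair die

def prob(event):
    """P(event) for equally likely outcomes in omega."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b):
    """Equation (2.10): P(A | B) = P(A ∩ B) / P(B), which requires P(B) > 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b) / prob(b)

A = {1, 2, 3}  # the die shows 1, 2 or 3
B = {2, 4, 6}  # the die shows an even number
print(prob(A), cond_prob(A, B))
```

    Knowing that $B$ occurred lowers the probability of $A$ from 1/2 to 1/3, in agreement with the counting argument $\#(A \cap B)/\#B$.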


    Equation (2.10) can be considered a formal definition of conditional probabilities provided $P(B) > 0$, even in the more general case where outcomes may not be equally likely. As a consequence, the multiplicative rule of probability for two events,

    $$P(A \cap B) = P(A \mid B) P(B) \tag{2.11}$$

    holds. The multiplication rule easily extends to $n$ events:

    $$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1) P(A_2 \mid A_1) \cdots P(A_{n-1} \mid A_1, \ldots, A_{n-2}) P(A_n \mid A_1, \ldots, A_{n-1}). \tag{2.12}$$

    These concepts are very relevant to observational sciences such as astronomy. Except for the rare circumstance when an entirely new phenomenon is discovered, astronomers are measuring properties of celestial bodies or populations for which some distinctive properties are already available. Consider, for example, a subpopulation of galaxies found to exhibit Seyfert-like spectra in the optical band (property $A$) that have already been examined for nonthermal lobes in the radio band (property $B$). Then the conditional probability that a galaxy has a Seyfert nucleus given that it also has radio lobes is given by Equation (2.10), and this probability can be estimated from careful study of galaxy samples. The composition of a Solar System minor body can be predominately ices or rock. Icy bodies are more common at large orbital distances and show spectral signatures of water (or other) ice rather than the spectral signatures of silicates. The probability that a given asteroid, comet or Kuiper Belt Object is mostly icy is then conditioned on its semi-major axis and spectral characteristics.

    2.4.1 Bayes' theorem

    We are now ready to derive the famous Bayes' theorem, also known as Bayes' formula or Bayes' rule. It is named for the mid-eighteenth-century British mathematician and Presbyterian minister Thomas Bayes, although it was recognized earlier by James Bernoulli and Abraham de Moivre, and was later fully explicated by Pierre Simon Laplace. Let $B_1, \ldots, B_k$ be a partition of the sample space $\Omega$. A partition of $\Omega$ is a collection of mutually exclusive (pairwise disjoint) sets whose union is $\Omega$; that is, $B_i \cap B_j = \emptyset$ for $i \neq j$ and $B_1 \cup \cdots \cup B_k = \Omega$. If $A$ is any event in $\Omega$, then to compute $P(A)$, one can use probabilities of pieces of $A$ on each of the sets $B_i$ and add them together to obtain

    $$P(A) = P(A \mid B_1) P(B_1) + \cdots + P(A \mid B_k) P(B_k). \tag{2.13}$$

    This is called the law of total probability and follows from the observation

    $$P(A) = P(A \cap B_1) + \cdots + P(A \cap B_k), \tag{2.14}$$

    and the multiplicative rule of probability, $P(A \cap B_i) = P(A \mid B_i) P(B_i)$.

    Now consider the following example, a bit more complicated than those treated above. Suppose a box contains five quarters, of which one is a trick coin that has heads on both sides. A coin is picked at random and tossed three times. It was observed that all three tosses turned up heads.


    If the type of the coin chosen is known, then one can easily compute the probability of the event $H$ that all three tosses yield heads. If it is the two-headed coin, then the probability is 1, otherwise it is 1/8. That is, $P(H \mid M) = 1$ and $P(H \mid M^c) = 1/8$, where $M$ denotes the event that the two-headed coin is chosen and $M^c$ is the complementary event that a regular quarter is chosen. After observing three heads, what is the probability that the chosen coin has both sides heads? Bayes' theorem helps to answer this question. Here, by using the law of total probability and the multiplication rule, one obtains

    $$P(M \mid H) = \frac{P(M \cap H)}{P(H)} = \frac{P(H \mid M) P(M)}{P(H \mid M) P(M) + P(H \mid M^c) P(M^c)} = \frac{1/5}{(1/5) + (1/8)(4/5)} = \frac{2}{3}. \tag{2.15}$$

    For a partition $B_1, \ldots, B_k$ of $\Omega$, Bayes' theorem generalizes the above expression to obtain $P(B_i \mid A)$ in terms of $P(A \mid B_j)$ and $P(B_j)$, for $j = 1, \ldots, k$. The result is very easy to prove, and is the basis of Bayesian inference discussed in Section 3.8.

    Theorem 2.1 (Bayes' theorem) If $B_1, B_2, \ldots, B_k$ is a partition of the sample space, then for $i = 1, \ldots, k$,

    $$P(B_i \mid A) = \frac{P(A \mid B_i) P(B_i)}{P(A \mid B_1) P(B_1) + \cdots + P(A \mid B_k) P(B_k)}. \tag{2.16}$$
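    Equation (2.16) translates directly into code once the prior probabilities $P(B_i)$ and the conditional probabilities $P(A \mid B_i)$ are listed. The sketch below is illustrative; the function name `bayes_posterior` is introduced here and is not from the text. It is applied to the trick-coin example with the partition $\{M, M^c\}$.

```python
from fractions import Fraction

def bayes_posterior(priors, likelihoods):
    """Return [P(B_i | A)] from [P(B_i)] and [P(A | B_i)], following equation (2.16)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(A | B_i) P(B_i)
    total = sum(joint)                                    # P(A), equation (2.13)
    if total == 0:
        raise ValueError("P(A) must be positive")
    return [j / total for j in joint]

# Trick-coin example: B_1 = M (two-headed coin), B_2 = M^c (regular quarter).
priors = [Fraction(1, 5), Fraction(4, 5)]      # P(M), P(M^c)
likelihoods = [Fraction(1), Fraction(1, 8)]    # P(H | M), P(H | M^c)
posterior = bayes_posterior(priors, likelihoods)
print(posterior)
```

    The first entry of `posterior` is $P(M \mid H)$ from (2.15); the entries always sum to 1 because the denominator is the law of total probability.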

    Bayes' theorem thus arises directly from logical inference based on the three axioms of probability. While it applies to any form of probabilities and events, modern Bayesian statistics adopts a particular interpretation of these probabilities, which we will present in Section 3.8.

    2.4.2 Independent events

    The examples above show that, for any two events $A$ and $B$, the conditional probability of $A$ given $B$, $P(A \mid B)$, is not necessarily equal to the unconditional probability of $A$, $P(A)$. Knowledge of $B$ generally changes the probability of $A$. In the special situation where $P(A \mid B) = P(A)$, where the knowledge that $B$ has occurred has not altered the probability of $A$, $A$ and $B$ are said to be independent events. As the conditional probability $P(A \mid B)$ is not defined when $P(B) = 0$, the multiplication rule $P(A \cap B) = P(A \mid B) P(B)$ will be used to formally define independence:

    Definition 2.2 Two events $A$ and $B$ are defined to be independent if

    $$P(A \cap B) = P(A) P(B).$$

    This shows that if $A$ is independent of $B$, then $B$ is independent of $A$. It is not difficult to show that if $A$ and $B$ are independent, then $A$ and $B^c$ are independent, $A^c$ and $B$ are independent and also $A^c$ and $B^c$ are independent.

    Note that three events $E$, $F$, $G$ satisfying $P(E \cap F \cap G) = P(E) P(F) P(G)$ cannot be called independent, as it does not guarantee independence of $E$, $F$ or independence of $F$, $G$ or independence of $E$, $G$. This can be illustrated with a simple example of a sample space $\Omega = \{1, 2, 3, 4, 5, 6, 7, 8\}$, where all the points are equally likely. Consider the events $E = \{1, 2, 3, 4\}$, $F = G = \{4, 5, 6, 7\}$. Clearly $P(E \cap F \cap G) = P(E) P(F) P(G)$. But neither $E$ and $F$, $F$ and $G$, nor $E$ and $G$ are independent.

    Similarly, we note that independence of $A$ and $B$, $B$ and $C$, and $A$ and $C$ together does not imply $P(A \cap B \cap C) = P(A) P(B) P(C)$. If we consider the events $A = \{1, 2, 3, 4\}$, $B = \{1, 2, 5, 6\}$, $C = \{1, 2, 7, 8\}$ then clearly $A$ and $B$ are independent, $B$ and $C$ are independent, and also $A$ and $C$ are independent: as $A$, $B$, $C$ each contain exactly four numbers,

    $$P(A) = P(B) = P(C) = \frac{4}{8} = \frac{1}{2}, \tag{2.17}$$

    and since $A \cap B = B \cap C = A \cap C = A \cap B \cap C = \{1, 2\}$,

    $$P(A \cap B) = P(B \cap C) = P(A \cap C) = P(\{1, 2\}) = \frac{2}{8} = \frac{1}{4}. \tag{2.18}$$

    However,

    $$P(A \cap B \cap C) = P(\{1, 2\}) = \frac{1}{4} \neq \frac{1}{8} = P(A) P(B) P(C). \tag{2.19}$$

    Though $A$ and $B$ are independent, and $A$ and $C$ are independent, $P(A \mid B \cap C) = 1$, whereas $P(A) = 1/2$. So $A$ is not independent of $B \cap C$. This leads to the following definition:

    Definition 2.3 (Independent events) A set of events $A_1, \ldots, A_n$ is said to be independent if, for every subcollection $A_{i_1}, \ldots, A_{i_r}$, $r \le n$,

    $$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_r}) = P(A_{i_1}) P(A_{i_2}) \cdots P(A_{i_r}). \tag{2.20}$$

    An infinite set of events is defined to be independent if every finite subcollection of these events is independent. It is worth noting that for the case of three events, $A$, $B$, $C$ are independent if all the following four conditions are satisfied:

    $$\begin{aligned} P(A \cap B \cap C) &= P(A) P(B) P(C), \\ P(A \cap B) &= P(A) P(B), \\ P(B \cap C) &= P(B) P(C), \\ P(A \cap C) &= P(A) P(C). \end{aligned} \tag{2.21}$$
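    Both counterexamples above can be checked mechanically against Definition 2.3. The following sketch assumes the equally likely sample space $\Omega = \{1, \ldots, 8\}$ used in the text; the function names are introduced here for illustration.

```python
from fractions import Fraction
from itertools import combinations

N_OUTCOMES = 8  # equally likely points of the sample space {1, ..., 8}

def prob(event):
    return Fraction(len(event), N_OUTCOMES)

def pairwise_independent(events):
    """True if P(X ∩ Y) = P(X) P(Y) for every pair of events."""
    return all(prob(x & y) == prob(x) * prob(y) for x, y in combinations(events, 2))

def mutually_independent(events):
    """Definition 2.3: the product rule (2.20) holds for every subcollection of size >= 2."""
    for r in range(2, len(events) + 1):
        for sub in combinations(events, r):
            product_of_probs = Fraction(1)
            for e in sub:
                product_of_probs *= prob(e)
            if prob(set.intersection(*sub)) != product_of_probs:
                return False
    return True

E, F, G = {1, 2, 3, 4}, {4, 5, 6, 7}, {4, 5, 6, 7}
A, B, C = {1, 2, 3, 4}, {1, 2, 5, 6}, {1, 2, 7, 8}

print(prob(E & F & G) == prob(E) * prob(F) * prob(G), pairwise_independent([E, F, G]))
print(pairwise_independent([A, B, C]), mutually_independent([A, B, C]))
```

    For $E$, $F$, $G$ the triple product rule holds while pairwise independence fails; for $A$, $B$, $C$ the situation is reversed. Neither collection satisfies all four conditions in (2.21).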

    2.5 Random variables

    Often, instead of focusing on the entire outcome space, it may be sufficient to concentrate on a summary of outcomes relevant to the problem at hand, say a function of the outcomes. In tossing a coin four times, it may be sufficient to look at the number of heads instead of the order in which they are obtained. In observing photons from an astronomical source, it may be sufficient to look at the mean number of photons in a spectral band over some time interval, or the ratio of photons in two spectral bands, rather than examining each photon individually.


    These real-valued functions on the outcome space or sample space are called random variables. Data are realizations of random variables. Typically a random variable $X$ is a function on the sample space $\Omega$. In the case of countable sample spaces $\Omega$, this definition always works. But in the case of uncountable $\Omega$, one should be careful. As mentioned earlier, not all subsets of an uncountable space can be called an event, or a probability assigned to them. A random variable is a function such that $\{\omega \in \Omega : X(\omega) \le a\}$ is an event for all real numbers $a$. In practical situations, the collection of events can be defined to be inclusive enough that the set of events follows certain mathematical conditions (closure under complementation, countable unions and intersections). So in practice, the technical aspects can be ignored.

    Note that in casual usage, some people label a phenomenon as "random" to mean that the events have equal chances of possible outcomes. This concept is correctly called uniformity. The concept of randomness does not require uniformity. Indeed, the following sections and Chapter 4 are largely devoted to phenomena that follow nonuniform distributions.

    2.5.1 Density and distribution functions

    A random variable is called a discrete random variable if it maps a sample space to a countable set (e.g. the integers) with each value in the range having probability greater than or equal to zero.

    Definition 2.4 (Cumulative distribution function) The cumulative distribution function (c.d.f.) or simply the distribution function $F$ of a random variable $X$ is defined as

    $$F(x) = P(X \le x) = P(\omega \in \Omega : X(\omega) \le x), \tag{2.22}$$

    for all real numbers $x$. In the discrete case when $X$ takes values $x_1, x_2, \ldots$, then

    $$F(x) = P(X \le x) = \sum_{x_i \le x} P(X = x_i). \tag{2.23}$$

    The c.d.f. $F$ is a nondecreasing, right-continuous function satisfying

    $$\lim_{x \to -\infty} F(x) = 0 \quad \text{and} \quad \lim_{x \to \infty} F(x) = 1. \tag{2.24}$$

    The c.d.f. of a discrete random variable is called a discrete distribution. A random variable with a continuous distribution function is referred to as a continuous random variable. A continuous random variable maps the sample space to an uncountable set (e.g. the real numbers). While the probability that a continuous random variable takes any specific value is zero, the probability that it belongs to an infinite set of values such as an interval may be positive. It should be understood clearly that the requirement that $X$ is a continuous random variable does not mean that $X(\omega)$ is a continuous function; in fact, continuity does not make sense in the case of an arbitrary sample space, $\Omega$.
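    For the discrete case, equation (2.23) can be evaluated directly. The sketch below takes $X$ to be the number of heads in four tosses of a fair coin, the summary mentioned at the start of Section 2.5; the binomial form of the probabilities and the names `pmf` and `cdf` are assumptions made here for illustration.

```python
from fractions import Fraction
from math import comb

# P(X = k) for X = number of heads in four fair coin tosses (16 equally likely sequences).
pmf = {k: Fraction(comb(4, k), 16) for k in range(5)}

def cdf(x):
    """Equation (2.23): F(x) is the sum of P(X = x_i) over all values x_i <= x."""
    return sum((p for x_i, p in pmf.items() if x_i <= x), Fraction(0))

for x in (-1, 0, 1, 1.5, 2, 4, 10):
    print(x, cdf(x))
```

    The output is a step function: it is 0 below the smallest value, constant between the integers, jumps by $P(X = x_i)$ at each $x_i$, and equals 1 from $x = 4$ onward, consistent with (2.24).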

    Often some continuous distributions are described through the probability density function (p.d.f.). A nonnegative function $f$ is called the probability density function of a distribution function $F$ if for all $x$

    $$F(x) = \int_{-\infty}^{x} f(y)\,dy. \tag{2.25}$$


    The same definition of expectation as in (2.28) and (2.29) can be used for any discrete random variable $X$ taking infinitely many nonnegative values. However, difficulties may be encountered in defining the expectation of a random variable taking infinitely many positive and negative values. Consider the case where $W$ is a random variable satisfying

    $$P(W = 2^j) = P(W = -2^j) = 2^{-j-1}, \quad \text{for } j = 1, 2, \ldots \tag{2.30}$$

    In this case, the expectation $E[W]$ cannot be defined, as both the positive part $W^+ = \max(0, W)$ and the negative part $W^- = \max(0, -W)$ have infinite expectations. This would make $E[W]$ equal to $\infty - \infty$, which is meaningless. However, for a general discrete random variable, $E[X]$ can be defined as in (2.28) provided

    $$\sum_i |x_i| P(X = x_i) < \infty. \tag{2.31}$$
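    The divergence in the example (2.30) is easy to see from partial sums. The sketch below is illustrative; the names `p` and `partial_sums` are introduced here. Each term $2^j \cdot 2^{-j-1}$ of the positive part contributes exactly 1/2, so the partial sum over $j = 1, \ldots, J$ is $J/2$ and grows without bound, even though the probabilities themselves sum to 1.

```python
from fractions import Fraction

def p(j):
    """P(W = 2^j) = P(W = -2^j) = 2^(-j-1), equation (2.30)."""
    return Fraction(1, 2 ** (j + 1))

def partial_sums(J):
    """Return (total probability, contribution to E[W+]) using terms j = 1, ..., J."""
    total_prob = sum(2 * p(j) for j in range(1, J + 1))        # both signs of W
    positive_part = sum(2 ** j * p(j) for j in range(1, J + 1))  # each term equals 1/2
    return total_prob, positive_part

for J in (1, 10, 100):
    print(J, partial_sums(J))
```

    By symmetry the negative part $W^-$ has the same partial sums, which is why the condition (2.31) fails for $W$.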

    In case the distribution $F$ of a random variable $X$ has density $f$ as in (2.25), then the expectation is defined as

    $$E[X] = \int_{-\infty}^{\infty} y f(y)\,dy, \quad \text{provided } \int_{-\infty}^{\infty} |y| f(y)\,dy < \infty. \tag{2.32}$$

    The expectation of a function $h$ of a random variable $X$ can be defined similarly as in (2.29), provided $\sum_i |h(x_i)| P(X = x_i) < \infty$ in the discrete case, and

    $$E[h(X)] = \int_{-\infty}^{\infty} h(y) f(y)\,dy \quad \text{provided } \int_{-\infty}^{\infty} |h(y)| f(y)\,dy < \infty \tag{2.33}$$

    in case the distribution of $X$ has density $f$.

    Another important and commonly used function of a distribution function that quantifies the spread is the second moment centered on the mean, known as the variance and often denoted by $\sigma^2$. The variance is defined by

    $$\sigma^2 = \mathrm{Var}(X) = E[(X - \mu)^2] = E[X^2] - \mu^2, \tag{2.34}$$

    where $\mu = E[X]$.

    The mean and variance need not be closely related, as seen in the following simple example. Let $X$ be a random variable taking values 1 and $-1$ with probability 0.5 each, and let $Y$ be a random variable taking values 1000 and $-1000$ with probability 0.5 each. Both $X$ and $Y$ have the same mean ($= 0$), but $\mathrm{Var}(X) = 1$ and $\mathrm{Var}(Y) = 10^6$.

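    It is helpful to derive the variance of the sum of random variables. If $X_1, X_2, \ldots, X_n$ are $n$ random variables, we find that

    $$E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i] \tag{2.35}$$

    and the variance of the sum $\sum_{i=1}^{n} X_i$ can be expressed as

    $$\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \mathrm{Var}(X_i) + \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} \mathrm{Cov}(X_i, X_j), \quad \text{where}$$

    $$\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]. \tag{2.36}$$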


    The Cov quantity is the covariance measuring the relation between the scatter in two random variables. If $X$ and $Y$ are independent random variables, then $\mathrm{Cov}(X, Y) = 0$ and $\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \mathrm{Var}(X_i)$, while the converse is not true; some dependent variables may have zero covariance.
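    A small discrete example makes both points concrete. In the sketch below, which is an illustration chosen here rather than an example from the text, $X$ is uniform on $\{-1, 0, 1\}$ and $Y = X^2$, so $Y$ is completely determined by $X$ yet the covariance vanishes. The same joint distribution is used to check the two-variable case of (2.36), $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$.

```python
from fractions import Fraction

def expect(joint, h):
    """E[h(X, Y)] for a joint p.m.f. given as {(x, y): probability}."""
    return sum(p * h(x, y) for (x, y), p in joint.items())

def var_of(joint, h):
    """Variance of h(X, Y), computed as the mean square deviation from its mean."""
    m = expect(joint, h)
    return expect(joint, lambda x, y: (h(x, y) - m) ** 2)

def cov(joint):
    """Equation (2.36): Cov(X, Y) = E[(X - E[X])(Y - E[Y])]."""
    mx = expect(joint, lambda x, y: x)
    my = expect(joint, lambda x, y: y)
    return expect(joint, lambda x, y: (x - mx) * (y - my))

# Hypothetical example: X uniform on {-1, 0, 1} and Y = X^2 (dependent by construction).
third = Fraction(1, 3)
joint = {(-1, 1): third, (0, 0): third, (1, 1): third}

var_x = var_of(joint, lambda x, y: x)
var_y = var_of(joint, lambda x, y: y)
var_sum = var_of(joint, lambda x, y: x + y)
print(cov(joint), var_x, var_y, var_sum)
```

    Here $P(X = 1, Y = 1) = 1/3$ differs from $P(X = 1) P(Y = 1) = 2/9$, so the variables are dependent even though their covariance is zero.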

    If all the $X_i$ variables have the same variance $\sigma^2$, the situation is called homoscedastic. If, in addition, $X_1, X_2, \ldots, X_n$ are independent, the variance of the sample mean $\bar{X} = (1/n) \sum_{i=1}^{n} X_i$ is given by

    $$\mathrm{Var}(\bar{X}) = \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{\sigma^2}{n}. \tag{2.37}$$

    The variance essentially measures the mean square deviation from the mean of the distribution. The square root of the variance, $\sigma$, is called the standard deviation. The mean and the standard deviation of a random variable $X$ are often used to convert $X$ to a standardized form

    $$X_{std} = \frac{X - \mu}{\sigma} \tag{2.38}$$

    with mean zero and variance unity. This important transformation also removes the units of the original variable. Other transformations also reduce scale and render a variable free from units, such as the logarithmic transformation often used by astronomers. It should be recognized that the logarithmic transformation is only one of many optional variable transformations. Standardization is often preferred by statisticians because it has mathematical properties useful in statistical inference.
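    Applied to data, the standardization (2.38) is usually carried out with the sample mean and sample standard deviation in place of the unknown $\mu$ and $\sigma$. The following sketch makes that substitution explicit as an assumption, and uses a short list of hypothetical measurements; it also shows that rescaling the units of the input leaves the standardized values unchanged.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Apply (x - mu) / sigma with mu and sigma replaced by sample estimates (an assumption)."""
    mu = fmean(values)
    sigma = pstdev(values)  # population-style standard deviation of the sample
    if sigma == 0:
        raise ValueError("all values are identical; standard deviation is zero")
    return [(v - mu) / sigma for v in values]

# Hypothetical measurements in some unit, and the same data in a unit 1000 times smaller.
data = [12.1, 15.3, 9.8, 20.4, 17.0]
z = standardize(data)
z_rescaled = standardize([1000 * v for v in data])

print([round(v, 3) for v in z])
print(fmean(z), pstdev(z))
```

    The standardized values have mean zero and unit standard deviation up to floating-point rounding, and `z` and `z_rescaled` agree because the unit cancels in the ratio.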

    The third central moment $E[(X - E[X])^3]$ provides information about the skewness of the distribution of a random variable; that is, whether the distribution of $X$ leans more towards the right or left. Higher order moments like the $k$-th order moment $E[X^k]$ also provide some additional information about the distribution of the random variable.

    2.5.2 Independent and identically distributed random variables

    When repeated observations are made, or when an experiment is repeated several times, the successive observations lead to independent random variables. If the data are generated from the same population, then the resultant values can be considered as random variables with a common distribution. These are a sequence of independent and identically distributed or i.i.d. random variables. In the i.i.d. case, the random variables have a common mean and variance (if these moments exist).

    Some observational studies in astronomy produce i.i.d. random variables. The redshifts of galaxies in an Abell cluster, the equivalent widths of absorption lines in a quasar spectrum, the ultraviolet photometry of a cataclysmic variable accretion disk, and the proper motions of a sample of Kuiper Belt bodies will all be i.i.d. if the observational conditions are unchanged. But the i.i.d. conditions are often violated. The sample may be heterogeneous with objects drawn from different underlying distributions. The observations may have been taken under different conditions such that the measurement errors differ. This leads to a condition called heteroscedasticity that violates the i.i.d. assumption. Heteroscedasticity means that different data points have different variances.

    Since a great many methods of statistics, both classical and modern, depend on the i.i.d. assumption, it is crucial that astronomers understand the concept and its relationship to the datasets under study. Incorrect use of statistics that require i.i.d. will lead to incorrect quantitative results, and thereby increase the risk of incorrect or unsupported scientific inferences.

    2.6 Quantile function

    The cumulative distribution function $F(x)$ estimates the value of the population distribution function at a chosen value of $x$. But the astronomer often asks the inverse question: "What value of $x$ corresponds to a specified value of $F(x)$?" This answers questions like "What fraction of galaxies have luminosities above $L$?" or "At what age have 95% of stars lost their protoplanetary disks?" This requires estimation of the quantile function of a random variable $X$, the inverse of $F$, defined as

    $$Q(u) = F^{-1}(u) = \inf \{y : F(y) \ge u\} \tag{2.39}$$

    where $0 < u < 1$. Here inf (infimum) refers to the smallest value of $y$ with the property specified in the brackets.
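    For a finite sample, one natural reading of (2.39) replaces $F$ by the empirical distribution function $F_n(y)$, the fraction of data points not exceeding $y$. The sketch below adopts that substitution as an assumption; the function name and the data values are hypothetical. With $n$ points, the smallest $y$ satisfying $F_n(y) \ge u$ is the $k$-th ordered value with $k = \lceil n u \rceil$.

```python
import math
from fractions import Fraction

def empirical_quantile(sample, u):
    """Q(u) = inf{y : F_n(y) >= u}, with F_n the empirical c.d.f. of the sample."""
    u = Fraction(u)  # exact arithmetic; pass u as a Fraction to avoid float rounding in n * u
    if not 0 < u < 1:
        raise ValueError("u must satisfy 0 < u < 1")
    ordered = sorted(sample)
    k = math.ceil(len(ordered) * u)  # smallest k with k / n >= u
    return ordered[k - 1]

# Hypothetical sample of n = 8 measurements.
sample = [3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.0, 3.5]
q25 = empirical_quantile(sample, Fraction(1, 4))
q50 = empirical_quantile(sample, Fraction(1, 2))
q75 = empirical_quantile(sample, Fraction(3, 4))
print(q25, q50, q75)
```

    This version performs no interpolation, so it always returns one of the observed values; statistical packages typically offer several interpolating alternatives.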

    When large samples are considered, the quantile function is often convenient for scientific analysis as the large number of data points are reduced to a smaller controlled number of interesting quantiles such as the 5%, 25%, 50%, 75% and 95% quantiles. A quantile function for an astronomical dataset is compared to the more familiar histogram in Figure 6.1 of Chapter 6. Quantile-quantile (Q-Q) plots are often used in visualization to compare two samples or one sample with a probability distribution. Q-Q plots are illustrated in Figures 5.4, 7.2, 8.2 and 8.6.

    But when small samples are considered, the quantile function can be quite unstable. This is readily understood: for a sample of $n = 8$ points, the 25% and 75% quartiles are simply the values of the second and sixth data points, but for $n = 9$ interpolation is needed based on very little information about the underlying distribution of