
Boston Columbus Indianapolis New York San Francisco Upper Saddle River
Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montreal Toronto
Delhi Mexico City Sao Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo

Introduction to Mathematical Statistics

Seventh Edition

Robert V. Hogg, University of Iowa

Joseph W. McKean, Western Michigan University

Allen T. Craig, Late Professor of Statistics, University of Iowa

Editor in Chief: Deirdre Lynch
Acquisitions Editor: Christopher Cummings
Sponsoring Editor: Christina Lepre
Editorial Assistant: Sonia Ashraf
Senior Managing Editor: Karen Wernholm
Senior Production Project Manager: Beth Houston
Digital Assets Manager: Marianne Groth
Marketing Manager: Erin K. Lane
Marketing Coordinator: Kathleen DeChavez
Senior Author Support/Technology Specialist: Joe Vetere
Rights and Permissions Advisor: Michael Joyce
Manufacturing Buyer: Debbie Rossi
Cover Image: Fotolia: Yurok Aleksandrovich
Creative Director: Jayne Conte
Designer: Suzanne Behnke

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Pearson Education was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Library of Congress Cataloging-in-Publication Data

Hogg, Robert V.
Introduction to mathematical statistics / Robert V. Hogg, Joseph W. McKean, Allen T. Craig. 7th ed.
p. cm.
ISBN 978-0-321-79543-4
1. Mathematical statistics. I. McKean, Joseph W., 1944- II. Craig, Allen T. (Allen Thorton), 1905- III. Title.
QA276.H59 2013
519.5 dc23

2011034906

Copyright © 2013, 2005, 1995 Pearson Education, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America. For information on obtaining permission for use of material in this work, please submit a written request to Pearson Education, Inc., Rights and Contracts Department, 501 Boylston Street, Suite 900, Boston, MA 02116, fax your request to 617-671-3447, or e-mail at http://www.pearsoned.com/legal/permissions.htm.

1 2 3 4 5 6 7 8 9 10 CRS 15 14 13 12 11

www.pearsonhighered.com

ISBN-13: 978-0-321-79543-4
ISBN-10: 0-321-79543-1

To Ann and to Marge

Contents

Preface

1 Probability and Distributions
1.1 Introduction
1.2 Set Theory
1.3 The Probability Set Function
1.4 Conditional Probability and Independence
1.5 Random Variables
1.6 Discrete Random Variables
1.6.1 Transformations
1.7 Continuous Random Variables
1.7.1 Transformations
1.8 Expectation of a Random Variable
1.9 Some Special Expectations
1.10 Important Inequalities

2 Multivariate Distributions
2.1 Distributions of Two Random Variables
2.1.1 Expectation
2.2 Transformations: Bivariate Random Variables
2.3 Conditional Distributions and Expectations
2.4 The Correlation Coefficient
2.5 Independent Random Variables
2.6 Extension to Several Random Variables
2.6.1 Multivariate Variance-Covariance Matrix
2.7 Transformations for Several Random Variables
2.8 Linear Combinations of Random Variables

3 Some Special Distributions
3.1 The Binomial and Related Distributions
3.2 The Poisson Distribution
3.3 The Γ, χ^2, and β Distributions
3.4 The Normal Distribution
3.4.1 Contaminated Normals
3.5 The Multivariate Normal Distribution
3.5.1 Applications
3.6 t- and F-Distributions
3.6.1 The t-distribution
3.6.2 The F-distribution
3.6.3 Student's Theorem
3.7 Mixture Distributions

4 Some Elementary Statistical Inferences
4.1 Sampling and Statistics
4.1.1 Histogram Estimates of pmfs and pdfs
4.2 Confidence Intervals
4.2.1 Confidence Intervals for Difference in Means
4.2.2 Confidence Interval for Difference in Proportions
4.3 Confidence Intervals for Parameters of Discrete Distributions
4.4 Order Statistics
4.4.1 Quantiles
4.4.2 Confidence Intervals for Quantiles
4.5 Introduction to Hypothesis Testing
4.6 Additional Comments About Statistical Tests
4.7 Chi-Square Tests
4.8 The Method of Monte Carlo
4.8.1 Accept-Reject Generation Algorithm
4.9 Bootstrap Procedures
4.9.1 Percentile Bootstrap Confidence Intervals
4.9.2 Bootstrap Testing Procedures
4.10 Tolerance Limits for Distributions

5 Consistency and Limiting Distributions
5.1 Convergence in Probability
5.2 Convergence in Distribution
5.2.1 Bounded in Probability
5.2.2 Δ-Method
5.2.3 Moment Generating Function Technique
5.3 Central Limit Theorem
5.4 Extensions to Multivariate Distributions

6 Maximum Likelihood Methods
6.1 Maximum Likelihood Estimation
6.2 Rao-Cramér Lower Bound and Efficiency
6.3 Maximum Likelihood Tests
6.4 Multiparameter Case: Estimation
6.5 Multiparameter Case: Testing
6.6 The EM Algorithm

7 Sufficiency
7.1 Measures of Quality of Estimators
7.2 A Sufficient Statistic for a Parameter
7.3 Properties of a Sufficient Statistic
7.4 Completeness and Uniqueness
7.5 The Exponential Class of Distributions
7.6 Functions of a Parameter
7.7 The Case of Several Parameters
7.8 Minimal Sufficiency and Ancillary Statistics
7.9 Sufficiency, Completeness, and Independence

8 Optimal Tests of Hypotheses
8.1 Most Powerful Tests
8.2 Uniformly Most Powerful Tests
8.3 Likelihood Ratio Tests
8.4 The Sequential Probability Ratio Test
8.5 Minimax and Classification Procedures
8.5.1 Minimax Procedures
8.5.2 Classification

9 Inferences About Normal Models
9.1 Quadratic Forms
9.2 One-Way ANOVA
9.3 Noncentral χ^2 and F-Distributions
9.4 Multiple Comparisons
9.5 The Analysis of Variance
9.6 A Regression Problem
9.7 A Test of Independence
9.8 The Distributions of Certain Quadratic Forms
9.9 The Independence of Certain Quadratic Forms

10 Nonparametric and Robust Statistics
10.1 Location Models
10.2 Sample Median and the Sign Test
10.2.1 Asymptotic Relative Efficiency
10.2.2 Estimating Equations Based on the Sign Test
10.2.3 Confidence Interval for the Median
10.3 Signed-Rank Wilcoxon
10.3.1 Asymptotic Relative Efficiency
10.3.2 Estimating Equations Based on Signed-Rank Wilcoxon
10.3.3 Confidence Interval for the Median
10.4 Mann-Whitney-Wilcoxon Procedure
10.4.1 Asymptotic Relative Efficiency
10.4.2 Estimating Equations Based on the Mann-Whitney-Wilcoxon
10.4.3 Confidence Interval for the Shift Parameter Δ
10.5 General Rank Scores
10.5.1 Efficacy
10.5.2 Estimating Equations Based on General Scores
10.5.3 Optimization: Best Estimates
10.6 Adaptive Procedures
10.7 Simple Linear Model
10.8 Measures of Association
10.8.1 Kendall's τ
10.8.2 Spearman's Rho
10.9 Robust Concepts
10.9.1 Location Model
10.9.2 Linear Model

11 Bayesian Statistics
11.1 Subjective Probability
11.2 Bayesian Procedures
11.2.1 Prior and Posterior Distributions
11.2.2 Bayesian Point Estimation
11.2.3 Bayesian Interval Estimation
11.2.4 Bayesian Testing Procedures
11.2.5 Bayesian Sequential Procedures
11.3 More Bayesian Terminology and Ideas
11.4 Gibbs Sampler
11.5 Modern Bayesian Methods
11.5.1 Empirical Bayes

A Mathematical Comments
A.1 Regularity Conditions
A.2 Sequences

B R Functions

C Tables of Distributions

D Lists of Common Distributions

E References

F Answers to Selected Exercises

Index

Preface

    Changes for Seventh Edition

In the preparation of this seventh edition, our goal has remained steadfast: to produce an outstanding text in mathematical statistics. In this new edition, we have added examples and exercises to help clarify the exposition. For the same reason, we have moved some material forward. For example, we moved the discussion on some properties of linear combinations of random variables from Chapter 4 to Chapter 2. This helps in the discussion of statistical properties in Chapter 3 as well as in the new Chapter 4.

One of the major changes was moving the chapter on Some Elementary Statistical Inferences from Chapter 5 to Chapter 4. This chapter on inference covers confidence intervals and statistical tests of hypotheses, two of the most important concepts in statistical inference. We begin Chapter 4 with a discussion of a random sample and point estimation. We introduce point estimation via a brief discussion of maximum likelihood estimation (the theory of maximum likelihood inference is still fully discussed in Chapter 6). In Chapter 4, though, the discussion is illustrated with examples. After discussing point estimation in Chapter 4, we proceed on to confidence intervals and hypothesis testing. Inference for the basic one- and two-sample problems (large and small samples) is presented. We illustrate this discussion with plenty of examples, several of which are concerned with real data. We have also added exercises dealing with real data. The discussion has also been updated; for example, exact confidence intervals for the parameters of discrete distributions and bootstrap confidence intervals and tests of hypotheses are discussed, both of which are being used more and more in practice. These changes enable a one-semester course to cover basic statistical theory with applications. Such a course would cover Chapters 1-4 and, depending on time, parts of Chapter 5. For two-semester courses, this basic understanding of statistical inference will prove quite helpful to students in the later chapters (6-8) on the statistical theory of inference.

Another major change is moving the discussion of robustness concepts (influence function and breakdown) of Chapter 12 to the end of Chapter 10. To reflect this move, the title of Chapter 10 has been changed to Nonparametric and Robust Statistics. This additional material in the new Chapter 10 is essentially the important robustness concepts found in the old Chapter 12. Further, the simple linear model is discussed in Chapters 9 and 10. Hence, with this move we have eliminated Chapter 12.

Additional examples of R functions are in Appendix B to help readers who want to use R for statistical computation and simulation. We have also added a listing of discrete and continuous distributions in Appendix D. This will serve as a quick and helpful reference to the reader.

    Content and Course Planning

Chapters 1 and 2 give the reader the necessary background material on probability and distribution theory for the remainder of the book. Chapter 3 discusses the most widely used discrete and continuous probability distributions. Chapter 4 contains topics in basic inference as described above. Chapter 5 presents large sample theory on convergence in probability and distribution and ends with the Central Limit Theorem. Chapter 6 provides a complete inference (estimation and testing) based on maximum likelihood theory. This chapter also contains a discussion of the EM algorithm and its application to several maximum likelihood situations. Chapters 7-8 contain material on sufficient statistics and optimal tests of hypotheses. The final three chapters provide theory for three important topics in statistics. Chapter 9 contains inference for normal theory methods for basic analysis of variance, univariate regression, and correlation models. Chapter 10 presents nonparametric methods (estimation and testing) for location and univariate regression models. It also includes discussion on the robust concepts of efficiency, influence, and breakdown. Chapter 11 offers an introduction to Bayesian methods. This includes traditional Bayesian procedures as well as Markov Chain Monte Carlo techniques.

Our text can be used in several different courses in mathematical statistics. A one-semester course would include most of the sections in Chapters 1-4. The second semester would usually consist of Chapters 5-8, although some instructors might prefer to use topics from Chapters 9-11. For example, a Bayesian advocate might want to teach Chapter 11 after Chapter 5, a nonparametrician could insert Chapter 10 earlier, or a traditional statistician would include topics from Chapter 9.

    Acknowledgements

We have many readers to thank. Their suggestions and comments proved invaluable in the preparation of this edition. A special thanks goes to Jun Yan of the University of Iowa, who made his web page on the sixth edition available to all, and also to Thomas Hettmansperger of Penn State University, Ash Abebe of Auburn University, and Bradford Crain of Portland State University for their helpful comments. We thank our accuracy checkers Kimberly F. Sellers (Georgetown University), Brian Newquist, Bill Josephson, and Joan Saniuk for their careful review. We would also like to thank the following reviewers for their comments and suggestions: Ralph Russo (University of Iowa), Kanapathi Thiru (University of Alaska), Lifang Hsu (Le Moyne College), and Xiao Wang (University of Maryland-Baltimore). Last, but not least, we must thank our wives, Ann and Marge, who provided great support for our efforts.

    Bob Hogg & Joe McKean

Chapter 1

    Probability and Distributions

    1.1 Introduction

Many kinds of investigations may be characterized in part by the fact that repeated experimentation, under essentially the same conditions, is more or less standard procedure. For instance, in medical research, interest may center on the effect of a drug that is to be administered; or an economist may be concerned with the prices of three specified commodities at various time intervals; or the agronomist may wish to study the effect that a chemical fertilizer has on the yield of a cereal grain. The only way in which an investigator can elicit information about any such phenomenon is to perform the experiment. Each experiment terminates with an outcome. But it is characteristic of these experiments that the outcome cannot be predicted with certainty prior to the performance of the experiment.

Suppose that we have such an experiment, but the experiment is of such a nature that a collection of every possible outcome can be described prior to its performance. If this kind of experiment can be repeated under the same conditions, it is called a random experiment, and the collection of every possible outcome is called the experimental space or the sample space.

Example 1.1.1. In the toss of a coin, let the outcome tails be denoted by T and let the outcome heads be denoted by H. If we assume that the coin may be repeatedly tossed under the same conditions, then the toss of this coin is an example of a random experiment in which the outcome is one of the two symbols T and H; that is, the sample space is the collection of these two symbols.

Example 1.1.2. In the cast of one red die and one white die, let the outcome be the ordered pair (number of spots up on the red die, number of spots up on the white die). If we assume that these two dice may be repeatedly cast under the same conditions, then the cast of this pair of dice is a random experiment. The sample space consists of the 36 ordered pairs: (1, 1), . . . , (1, 6), (2, 1), . . . , (2, 6), . . . , (6, 6).

Let 𝒞 denote a sample space, let c denote an element of 𝒞, and let C represent a collection of elements of 𝒞. If, upon the performance of the experiment, the outcome is in C, we shall say that the event C has occurred. Now conceive of our having made N repeated performances of the random experiment. Then we can count the number f of times (the frequency) that the event C actually occurred throughout the N performances. The ratio f/N is called the relative frequency of the event C in these N experiments. A relative frequency is usually quite erratic for small values of N, as you can discover by tossing a coin. But as N increases, experience indicates that we associate with the event C a number, say p, that is equal or approximately equal to that number about which the relative frequency seems to stabilize. If we do this, then the number p can be interpreted as that number which, in future performances of the experiment, the relative frequency of the event C will either equal or approximate. Thus, although we cannot predict the outcome of a random experiment, we can, for a large value of N, predict approximately the relative frequency with which the outcome will be in C. The number p associated with the event C is given various names. Sometimes it is called the probability that the outcome of the random experiment is in C; sometimes it is called the probability of the event C; and sometimes it is called the probability measure of C. The context usually suggests an appropriate choice of terminology.

Example 1.1.3. Let 𝒞 denote the sample space of Example 1.1.2 and let C be the collection of every ordered pair of 𝒞 for which the sum of the pair is equal to seven. Thus C is the collection (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), and (6, 1). Suppose that the dice are cast N = 400 times and let f, the frequency of a sum of seven, be f = 60. Then the relative frequency with which the outcome was in C is f/N = 60/400 = 0.15. Thus we might associate with C a number p that is close to 0.15, and p would be called the probability of the event C.
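This stabilization is easy to watch by simulation. Below is a minimal sketch in base R (the language of the book's Appendix B); the seed and the values of N are arbitrary choices, and sample() plays the role of casting the two fair dice:

# Relative frequency f/N of the event "sum of the two dice is seven"
# for increasing numbers of casts N.
set.seed(2013)                      # arbitrary seed, for reproducibility
for (N in c(100, 400, 10000, 100000)) {
  red   <- sample(1:6, N, replace = TRUE)
  white <- sample(1:6, N, replace = TRUE)
  f <- sum(red + white == 7)        # frequency of the event
  cat("N =", N, "  f/N =", f / N, "\n")
}

For large N the printed ratios settle near 6/36 = 0.1667, the value assigned to this event by the fair-dice model of Section 1.3.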

Remark 1.1.1. The preceding interpretation of probability is sometimes referred to as the relative frequency approach, and it obviously depends upon the fact that an experiment can be repeated under essentially identical conditions. However, many persons extend probability to other situations by treating it as a rational measure of belief. For example, the statement p = 2/5 would mean to them that their personal or subjective probability of the event C is equal to 2/5. Hence, if they are not opposed to gambling, this could be interpreted as a willingness on their part to bet on the outcome of C so that the two possible payoffs are in the ratio p/(1 - p) = (2/5)/(3/5) = 2/3. Moreover, if they truly believe that p = 2/5 is correct, they would be willing to accept either side of the bet: (a) win 3 units if C occurs and lose 2 if it does not occur, or (b) win 2 units if C does not occur and lose 3 if it does. However, since the mathematical properties of probability given in Section 1.3 are consistent with either of these interpretations, the subsequent mathematical development does not depend upon which approach is used.

The primary purpose of having a mathematical theory of statistics is to provide mathematical models for random experiments. Once a model for such an experiment has been provided and the theory worked out in detail, the statistician may, within this framework, make inferences (that is, draw conclusions) about the random experiment. The construction of such a model requires a theory of probability. One of the more logically satisfying theories of probability is that based on the concepts of sets and functions of sets. These concepts are introduced in Section 1.2.


    1.2 Set Theory

The concept of a set or a collection of objects is usually left undefined. However, a particular set can be described so that there is no misunderstanding as to what collection of objects is under consideration. For example, the set of the first 10 positive integers is sufficiently well described to make clear that the numbers 3/4 and 14 are not in the set, while the number 3 is in the set. If an object belongs to a set, it is said to be an element of the set. For example, if C denotes the set of real numbers x for which 0 ≤ x ≤ 1, then 3/4 is an element of the set C. The fact that 3/4 is an element of the set C is indicated by writing 3/4 ∈ C. More generally, c ∈ C means that c is an element of the set C.

The sets that concern us are frequently sets of numbers. However, the language of sets of points proves somewhat more convenient than that of sets of numbers. Accordingly, we briefly indicate how we use this terminology. In analytic geometry considerable emphasis is placed on the fact that to each point on a line (on which an origin and a unit point have been selected) there corresponds one and only one number, say x; and that to each number x there corresponds one and only one point on the line. This one-to-one correspondence between the numbers and points on a line enables us to speak, without misunderstanding, of the point x instead of the number x. Furthermore, with a plane rectangular coordinate system and with x and y numbers, to each symbol (x, y) there corresponds one and only one point in the plane; and to each point in the plane there corresponds but one such symbol. Here again, we may speak of the point (x, y), meaning the ordered number pair x and y. This convenient language can be used when we have a rectangular coordinate system in a space of three or more dimensions. Thus the point (x1, x2, . . . , xn) means the numbers x1, x2, . . . , xn in the order stated. Accordingly, in describing our sets, we frequently speak of a set of points (a set whose elements are points), being careful, of course, to describe the set so as to avoid any ambiguity. The notation C = {x : 0 ≤ x ≤ 1} is read "C is the one-dimensional set of points x for which 0 ≤ x ≤ 1." Similarly, C = {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1} can be read "C is the two-dimensional set of points (x, y) that are interior to, or on the boundary of, a square with opposite vertices at (0, 0) and (1, 1)."

We say a set C is countable if C is finite or has as many elements as there are positive integers. For example, the sets C1 = {1, 2, . . . , 100} and C2 = {1, 3, 5, 7, . . .} are countable sets. The interval of real numbers (0, 1], though, is not countable.

We now give some definitions (together with illustrative examples) that lead to an elementary algebra of sets adequate for our purposes.

Definition 1.2.1. If each element of a set C1 is also an element of set C2, the set C1 is called a subset of the set C2. This is indicated by writing C1 ⊂ C2. If C1 ⊂ C2 and also C2 ⊂ C1, the two sets have the same elements, and this is indicated by writing C1 = C2.

Example 1.2.1. Let C1 = {x : 0 ≤ x ≤ 1} and C2 = {x : -1 ≤ x ≤ 2}. Here the one-dimensional set C1 is seen to be a subset of the one-dimensional set C2; that is, C1 ⊂ C2. Subsequently, when the dimensionality of the set is clear, we do not make specific reference to it.


Example 1.2.2. Define the two sets C1 = {(x, y) : 0 ≤ x = y ≤ 1} and C2 = {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1}. Because the elements of C1 are the points on one diagonal of the square, then C1 ⊂ C2.

Definition 1.2.2. If a set C has no elements, C is called the null set. This is indicated by writing C = ∅.

Definition 1.2.3. The set of all elements that belong to at least one of the sets C1 and C2 is called the union of C1 and C2. The union of C1 and C2 is indicated by writing C1 ∪ C2. The union of several sets C1, C2, C3, . . . is the set of all elements that belong to at least one of the several sets, denoted by C1 ∪ C2 ∪ C3 ∪ · · · = ∪_{j=1}^∞ Cj or by C1 ∪ C2 ∪ · · · ∪ Ck = ∪_{j=1}^k Cj if a finite number k of sets is involved.

We refer to a union of the form ∪_{j=1}^∞ Cj as a countable union.

Example 1.2.3. Define the sets C1 = {x : x = 8, 9, 10, 11, or 11 < x ≤ 12} and C2 = {x : x = 0, 1, . . . , 10}. Then

C1 ∪ C2 = {x : x = 0, 1, . . . , 8, 9, 10, 11, or 11 < x ≤ 12}
        = {x : x = 0, 1, . . . , 8, 9, 10, or 11 ≤ x ≤ 12}.

Example 1.2.4. Define C1 and C2 as in Example 1.2.1. Then C1 ∪ C2 = C2.

Example 1.2.5. Let C2 = ∅. Then C1 ∪ C2 = C1, for every set C1.

Example 1.2.6. For every set C, C ∪ C = C.

Example 1.2.7. Let

Ck = {x : 1/(k + 1) ≤ x ≤ 1}, k = 1, 2, 3, . . . .

Then ∪_{k=1}^∞ Ck = {x : 0 < x ≤ 1}. Note that the number zero is not in this set, since it is not in one of the sets C1, C2, C3, . . . .

Definition 1.2.4. The set of all elements that belong to each of the sets C1 and C2 is called the intersection of C1 and C2. The intersection of C1 and C2 is indicated by writing C1 ∩ C2. The intersection of several sets C1, C2, C3, . . . is the set of all elements that belong to each of the sets C1, C2, C3, . . . . This intersection is denoted by C1 ∩ C2 ∩ C3 ∩ · · · = ∩_{j=1}^∞ Cj or by C1 ∩ C2 ∩ · · · ∩ Ck = ∩_{j=1}^k Cj if a finite number k of sets is involved.

We refer to an intersection of the form ∩_{j=1}^∞ Cj as a countable intersection.

Example 1.2.8. Let C1 = {(0, 0), (0, 1), (1, 1)} and C2 = {(1, 1), (1, 2), (2, 1)}. Then C1 ∩ C2 = {(1, 1)}.

Example 1.2.9. Let C1 = {(x, y) : 0 ≤ x + y ≤ 1} and C2 = {(x, y) : 1 < x + y}. Then C1 and C2 have no points in common and C1 ∩ C2 = ∅.

Example 1.2.10. For every set C, C ∩ C = C and C ∩ ∅ = ∅.

Figure 1.2.1: Venn diagrams of (a) C1 ∪ C2 and (b) C1 ∩ C2 (shaded regions).

Example 1.2.11. Let

Ck = {x : 0 < x < 1/k}, k = 1, 2, 3, . . . .

Then ∩_{k=1}^∞ Ck = ∅, because there is no point that belongs to each of the sets C1, C2, C3, . . . .

Example 1.2.12. Let C1 and C2 represent the sets of points enclosed, respectively, by two intersecting ellipses. Then the sets C1 ∪ C2 and C1 ∩ C2 are represented, respectively, by the shaded regions in the Venn diagrams in Figure 1.2.1.
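For finite sets, these operations can be checked directly in base R, whose union(), intersect(), and setdiff() functions act on vectors; a small sketch (the particular sets are arbitrary choices made for illustration):

# Union, intersection, and complement for finite sets stored as vectors.
C1 <- c(0, 1, 2, 3)
C2 <- c(2, 3, 4)

union(C1, C2)          # 0 1 2 3 4
intersect(C1, C2)      # 2 3

# Complement relative to a finite space, here {0, 1, ..., 6}:
space <- 0:6
setdiff(space, C1)     # 4 5 6, i.e., the complement of C1 in this space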

Definition 1.2.5. In certain discussions or considerations, the totality of all elements that pertain to the discussion can be described. This set of all elements under consideration is given a special name. It is called the space. We often denote spaces by letters such as 𝒞 and 𝒟.

Example 1.2.13. Let the number of heads, in tossing a coin four times, be denoted by x. Of necessity, the number of heads is one of the numbers 0, 1, 2, 3, 4. Here, then, the space is the set 𝒞 = {0, 1, 2, 3, 4}.

Example 1.2.14. Consider all nondegenerate rectangles of base x and height y. To be meaningful, both x and y must be positive. Then the space is given by the set 𝒞 = {(x, y) : x > 0, y > 0}.

Definition 1.2.6. Let 𝒞 denote a space and let C be a subset of the set 𝒞. The set that consists of all elements of 𝒞 that are not elements of C is called the complement of C (actually, with respect to 𝒞). The complement of C is denoted by C^c. In particular, 𝒞^c = ∅.

Example 1.2.15. Let 𝒞 be defined as in Example 1.2.13, and let the set C = {0, 1}. The complement of C (with respect to 𝒞) is C^c = {2, 3, 4}.

Example 1.2.16. Given C ⊂ 𝒞. Then C ∪ C^c = 𝒞, C ∩ C^c = ∅, C ∪ 𝒞 = 𝒞, C ∩ 𝒞 = C, and (C^c)^c = C.

Example 1.2.17 (DeMorgan's Laws). A set of useful rules is known as DeMorgan's Laws. Let 𝒞 denote a space and let Ci ⊂ 𝒞, i = 1, 2. Then

(C1 ∩ C2)^c = C1^c ∪ C2^c     (1.2.1)
(C1 ∪ C2)^c = C1^c ∩ C2^c     (1.2.2)

The reader is asked to prove these in Exercise 1.2.4 and to extend them to countable unions and intersections.

Many of the functions used in calculus and in this book are functions which map real numbers into real numbers. We are often, however, concerned with functions that map sets into real numbers. Such functions are naturally called functions of a set or, more simply, set functions. Next we give some examples of set functions and evaluate them for certain simple sets.

Example 1.2.18. Let C be a set in one-dimensional space and let Q(C) be equal to the number of points in C which correspond to positive integers. Then Q(C) is a function of the set C. Thus, if C = {x : 0 < x < 5}, then Q(C) = 4; if C = {-2, -1}, then Q(C) = 0; if C = {x : -∞ < x < 6}, then Q(C) = 5.

Example 1.2.19. Let C be a set in two-dimensional space and let Q(C) be the area of C if C has a finite area; otherwise, let Q(C) be undefined. Thus, if C = {(x, y) : x^2 + y^2 ≤ 1}, then Q(C) = π; if C = {(0, 0), (1, 1), (0, 1)}, then Q(C) = 0; if C = {(x, y) : 0 ≤ x, 0 ≤ y, x + y ≤ 1}, then Q(C) = 1/2.

Example 1.2.20. Let C be a set in three-dimensional space and let Q(C) be the volume of C if C has a finite volume; otherwise, let Q(C) be undefined. Thus, if C = {(x, y, z) : 0 ≤ x ≤ 2, 0 ≤ y ≤ 1, 0 ≤ z ≤ 3}, then Q(C) = 6; if C = {(x, y, z) : x^2 + y^2 + z^2 ≥ 1}, then Q(C) is undefined.

At this point we introduce the following notations. The symbol

∫_C f(x) dx

means the ordinary (Riemann) integral of f(x) over a prescribed one-dimensional set C; the symbol

∫∫_C g(x, y) dx dy

means the Riemann integral of g(x, y) over a prescribed two-dimensional set C; and so on. To be sure, unless these sets C and these functions f(x) and g(x, y) are chosen with care, the integrals frequently fail to exist. Similarly, the symbol

Σ_C f(x)

means the sum extended over all x ∈ C; the symbol

Σ_C g(x, y)

means the sum extended over all (x, y) ∈ C; and so on.

Example 1.2.21. Let C be a set in one-dimensional space and let Q(C) = Σ_C f(x), where

f(x) = (1/2)^x, x = 1, 2, 3, . . . ,  and  f(x) = 0 elsewhere.

If C = {x : 0 ≤ x ≤ 3}, then

Q(C) = 1/2 + (1/2)^2 + (1/2)^3 = 7/8.

Example 1.2.22. Let Q(C) = Σ_C f(x), where

f(x) = p^x (1 - p)^(1-x), x = 0, 1,  and  f(x) = 0 elsewhere.

If C = {0}, then

Q(C) = Σ_{x=0}^0 p^x (1 - p)^(1-x) = 1 - p;

if C = {x : 1 ≤ x ≤ 2}, then Q(C) = f(1) = p.

Example 1.2.23. Let C be a one-dimensional set and let

Q(C) = ∫_C e^(-x) dx.

Thus, if C = {x : 0 ≤ x < ∞}, then

Q(C) = ∫_0^∞ e^(-x) dx = 1;

if C = {x : 1 ≤ x ≤ 2}, then

Q(C) = ∫_1^2 e^(-x) dx = e^(-1) - e^(-2);

if C1 = {x : 0 ≤ x ≤ 1} and C2 = {x : 1 < x ≤ 3}, then

Q(C1 ∪ C2) = ∫_0^3 e^(-x) dx
           = ∫_0^1 e^(-x) dx + ∫_1^3 e^(-x) dx
           = Q(C1) + Q(C2).
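Each of these Riemann integrals can be verified numerically with R's integrate() function; a minimal sketch:

f <- function(x) exp(-x)
integrate(f, 0, Inf)$value                           # 1
integrate(f, 1, 2)$value                             # exp(-1) - exp(-2) = 0.2325...
integrate(f, 0, 1)$value + integrate(f, 1, 3)$value  # Q(C1) + Q(C2)
integrate(f, 0, 3)$value                             # Q(C1 u C2): the same value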


Example 1.2.24. Let C be a set in n-dimensional space and let

Q(C) = ∫ · · · ∫_C dx1 dx2 · · · dxn.

If C = {(x1, x2, . . . , xn) : 0 ≤ x1 ≤ x2 ≤ · · · ≤ xn ≤ 1}, then

Q(C) = ∫_0^1 ∫_0^{xn} · · · ∫_0^{x3} ∫_0^{x2} dx1 dx2 · · · dx_{n-1} dxn = 1/n!,

where n! = n(n - 1) · · · 3 · 2 · 1.
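The value 1/n! also has a probabilistic reading that lends itself to a quick Monte Carlo check: a point drawn uniformly from the unit n-cube has its coordinates in increasing order with probability 1/n!, since all n! orderings are equally likely. A sketch in base R (n, the number of draws M, and the seed are arbitrary choices):

# Estimate Q(C) = P(x1 < x2 < ... < xn) for a uniform point in [0,1]^n.
set.seed(1)
n <- 4; M <- 1e6
x <- matrix(runif(n * M), nrow = M)            # M random points in the n-cube
inc <- rowSums(x[, -1] > x[, -n]) == n - 1     # all coordinates increasing?
mean(inc)                                      # approximately 0.0417
1 / factorial(n)                               # exact value 1/24 = 0.04166...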

EXERCISES

1.2.1. Find the union C1 ∪ C2 and the intersection C1 ∩ C2 of the two sets C1 and C2, where

(a) C1 = {0, 1, 2}, C2 = {2, 3, 4}.

(b) C1 = {x : 0 < x < 2}, C2 = {x : 1 ≤ x < 3}.

(c) C1 = {(x, y) : 0 < x < 2, 1 < y < 2}, C2 = {(x, y) : 1 < x < 3, 1 < y < 3}.

1.2.2. Find the complement C^c of the set C with respect to the space 𝒞 if

(a) 𝒞 = {x : 0 < x < 1}, C = {x : 5/8 < x < 1}.

(b) 𝒞 = {(x, y, z) : x^2 + y^2 + z^2 ≤ 1}, C = {(x, y, z) : x^2 + y^2 + z^2 = 1}.

(c) 𝒞 = {(x, y) : |x| + |y| ≤ 2}, C = {(x, y) : x^2 + y^2 < 2}.

1.2.3. List all possible arrangements of the four letters m, a, r, and y. Let C1 be the collection of the arrangements in which y is in the last position. Let C2 be the collection of the arrangements in which m is in the first position. Find the union and the intersection of C1 and C2.

1.2.4. Referring to Example 1.2.17, verify DeMorgan's Laws (1.2.1) and (1.2.2) by using Venn diagrams and then prove that the laws are true. Generalize the laws to countable unions and intersections.

1.2.5. By the use of Venn diagrams, in which the space 𝒞 is the set of points enclosed by a rectangle containing the circles C1, C2, and C3, compare the following sets. These laws are called the distributive laws.

(a) C1 ∩ (C2 ∪ C3) and (C1 ∩ C2) ∪ (C1 ∩ C3).

(b) C1 ∪ (C2 ∩ C3) and (C1 ∪ C2) ∩ (C1 ∪ C3).

1.2.6. If a sequence of sets C1, C2, C3, . . . is such that Ck ⊂ Ck+1, k = 1, 2, 3, . . . , the sequence is said to be a nondecreasing sequence. Give an example of this kind of sequence of sets.

1.2.7. If a sequence of sets C1, C2, C3, . . . is such that Ck ⊃ Ck+1, k = 1, 2, 3, . . . , the sequence is said to be a nonincreasing sequence. Give an example of this kind of sequence of sets.

1.2.8. Suppose C1, C2, C3, . . . is a nondecreasing sequence of sets, i.e., Ck ⊂ Ck+1, for k = 1, 2, 3, . . . . Then lim_{k→∞} Ck is defined as the union C1 ∪ C2 ∪ C3 ∪ · · · . Find lim_{k→∞} Ck if

(a) Ck = {x : 1/k ≤ x ≤ 3 - 1/k}, k = 1, 2, 3, . . . .

(b) Ck = {(x, y) : 1/k ≤ x^2 + y^2 ≤ 4 - 1/k}, k = 1, 2, 3, . . . .

1.2.9. If C1, C2, C3, . . . are sets such that Ck ⊃ Ck+1, k = 1, 2, 3, . . . , lim_{k→∞} Ck is defined as the intersection C1 ∩ C2 ∩ C3 ∩ · · · . Find lim_{k→∞} Ck if

(a) Ck = {x : 2 - 1/k < x ≤ 2}, k = 1, 2, 3, . . . .

(b) Ck = {x : 2 < x ≤ 2 + 1/k}, k = 1, 2, 3, . . . .

(c) Ck = {(x, y) : 0 ≤ x^2 + y^2 ≤ 1/k}, k = 1, 2, 3, . . . .

1.2.10. For every one-dimensional set C, define the function Q(C) = Σ_C f(x), where f(x) = (2/3)(1/3)^x, x = 0, 1, 2, . . . , zero elsewhere. If C1 = {x : x = 0, 1, 2, 3} and C2 = {x : x = 0, 1, 2, . . .}, find Q(C1) and Q(C2).
Hint: Recall that Sn = a + ar + · · · + ar^(n-1) = a(1 - r^n)/(1 - r) and, hence, it follows that lim_{n→∞} Sn = a/(1 - r) provided that |r| < 1.

1.2.11. For every one-dimensional set C for which the integral exists, let Q(C) = ∫_C f(x) dx, where f(x) = 6x(1 - x), 0 < x < 1, zero elsewhere; otherwise, let Q(C) be undefined. If C1 = {x : 1/4 < x < 3/4}, C2 = {1/2}, and C3 = {x : 0 < x < 10}, find Q(C1), Q(C2), and Q(C3).

1.2.12. For every two-dimensional set C contained in R^2 for which the integral exists, let Q(C) = ∫∫_C (x^2 + y^2) dx dy. If C1 = {(x, y) : -1 ≤ x ≤ 1, -1 ≤ y ≤ 1}, C2 = {(x, y) : -1 ≤ x = y ≤ 1}, and C3 = {(x, y) : x^2 + y^2 ≤ 1}, find Q(C1), Q(C2), and Q(C3).

1.2.13. Let 𝒞 denote the set of points that are interior to, or on the boundary of, a square with opposite vertices at the points (0, 0) and (1, 1). Let Q(C) = ∫∫_C dy dx.

(a) If C ⊂ 𝒞 is the set {(x, y) : 0 < x < y < 1}, compute Q(C).

(b) If C ⊂ 𝒞 is the set {(x, y) : 0 < x = y < 1}, compute Q(C).

(c) If C ⊂ 𝒞 is the set {(x, y) : 0 < x/2 ≤ y ≤ 3x/2 < 1}, compute Q(C).

1.2.14. Let 𝒞 be the set of points interior to or on the boundary of a cube with edge of length 1. Moreover, say that the cube is in the first octant with one vertex at the point (0, 0, 0) and an opposite vertex at the point (1, 1, 1). Let Q(C) = ∫∫∫_C dx dy dz.

(a) If C ⊂ 𝒞 is the set {(x, y, z) : 0 < x < y < z < 1}, compute Q(C).

(b) If C is the subset {(x, y, z) : 0 < x = y = z < 1}, compute Q(C).

1.2.15. Let C denote the set {(x, y, z) : x^2 + y^2 + z^2 ≤ 1}. Using spherical coordinates, evaluate

Q(C) = ∫∫∫_C √(x^2 + y^2 + z^2) dx dy dz.

1.2.16. To join a certain club, a person must be either a statistician or a mathematician or both. Of the 25 members in this club, 19 are statisticians and 16 are mathematicians. How many persons in the club are both a statistician and a mathematician?

1.2.17. After a hard-fought football game, it was reported that, of the 11 starting players, 8 hurt a hip, 6 hurt an arm, 5 hurt a knee, 3 hurt both a hip and an arm, 2 hurt both a hip and a knee, 1 hurt both an arm and a knee, and no one hurt all three. Comment on the accuracy of the report.

    1.3 The Probability Set Function

Given an experiment, let 𝒞 denote the sample space of all possible outcomes. As discussed in Section 1.1, we are interested in assigning probabilities to events, i.e., subsets of 𝒞. What should be our collection of events? If 𝒞 is a finite set, then we could take the set of all subsets as this collection. For infinite sample spaces, though, with assignment of probabilities in mind, this poses mathematical technicalities which are better left to a course in probability theory. We assume that in all cases, the collection of events is sufficiently rich to include all possible events of interest and is closed under complements and countable unions of these events. Using DeMorgan's Laws, Example 1.2.17, the collection is then also closed under countable intersections. We denote this collection of events by ℬ. Technically, such a collection of events is called a σ-field of subsets.

Now that we have a sample space, 𝒞, and our collection of events, ℬ, we can define the third component in our probability space, namely, a probability set function. In order to motivate its definition, we consider the relative frequency approach to probability.

Remark 1.3.1. The definition of probability consists of three axioms which we motivate by the following three intuitive properties of relative frequency. Let 𝒞 be a sample space and let C ⊂ 𝒞. Suppose we repeat the experiment N times. Then the relative frequency of C is f_C = #{C}/N, where #{C} denotes the number of times C occurred in the N repetitions. Note that f_C ≥ 0 and f_𝒞 = 1. These are the first two properties. For the third, suppose that C1 and C2 are disjoint events. Then f_{C1 ∪ C2} = f_{C1} + f_{C2}. These three properties of relative frequencies form the axioms of a probability, except that the third axiom is in terms of countable unions. As with the axioms of probability, the readers should check that the theorems we prove below about probabilities agree with their intuition of relative frequency.


Definition 1.3.1 (Probability). Let 𝒞 be a sample space and let ℬ be the set of events. Let P be a real-valued function defined on ℬ. Then P is a probability set function if P satisfies the following three conditions:

1. P(C) ≥ 0, for all C ∈ ℬ.

2. P(𝒞) = 1.

3. If {Cn} is a sequence of events in ℬ and Cm ∩ Cn = ∅ for all m ≠ n, then

P(∪_{n=1}^∞ Cn) = Σ_{n=1}^∞ P(Cn).

A collection of events whose members are pairwise disjoint, as in (3), is said to be a mutually exclusive collection. The collection is further said to be exhaustive if the union of its events is the sample space, in which case Σ_{n=1}^∞ P(Cn) = 1. We often say that a mutually exclusive and exhaustive collection of events forms a partition of 𝒞.

A probability set function tells us how the probability is distributed over the set of events, ℬ. In this sense we speak of a distribution of probability. We often drop the word "set" and refer to P as a probability function.

The following theorems give us some other properties of a probability set function. In the statement of each of these theorems, P(C) is taken, tacitly, to be a probability set function defined on the collection of events ℬ of a sample space 𝒞.

Theorem 1.3.1. For each event C ∈ ℬ, P(C) = 1 - P(C^c).

Proof: We have 𝒞 = C ∪ C^c and C ∩ C^c = ∅. Thus, from (2) and (3) of Definition 1.3.1, it follows that

1 = P(C) + P(C^c),

which is the desired result.

Theorem 1.3.2. The probability of the null set is zero; that is, P(∅) = 0.

Proof: In Theorem 1.3.1, take C = ∅ so that C^c = 𝒞. Accordingly, we have

P(∅) = 1 - P(𝒞) = 1 - 1 = 0

and the theorem is proved.

Theorem 1.3.3. If C1 and C2 are events such that C1 ⊂ C2, then P(C1) ≤ P(C2).

Proof: Now C2 = C1 ∪ (C1^c ∩ C2) and C1 ∩ (C1^c ∩ C2) = ∅. Hence, from (3) of Definition 1.3.1,

P(C2) = P(C1) + P(C1^c ∩ C2).

From (1) of Definition 1.3.1, P(C1^c ∩ C2) ≥ 0. Hence, P(C2) ≥ P(C1).

Theorem 1.3.4. For each C ∈ ℬ, 0 ≤ P(C) ≤ 1.

Proof: Since ∅ ⊂ C ⊂ 𝒞, we have by Theorem 1.3.3 that

P(∅) ≤ P(C) ≤ P(𝒞), or 0 ≤ P(C) ≤ 1,

the desired result.

Part (3) of the definition of probability says that P(C1 ∪ C2) = P(C1) + P(C2) if C1 and C2 are disjoint, i.e., C1 ∩ C2 = ∅. The next theorem gives the rule for any two events.

Theorem 1.3.5. If C1 and C2 are events in 𝒞, then

P(C1 ∪ C2) = P(C1) + P(C2) - P(C1 ∩ C2).

Proof: Each of the sets C1 ∪ C2 and C2 can be represented, respectively, as a union of nonintersecting sets as follows:

C1 ∪ C2 = C1 ∪ (C1^c ∩ C2) and C2 = (C1 ∩ C2) ∪ (C1^c ∩ C2).

Thus, from (3) of Definition 1.3.1,

P(C1 ∪ C2) = P(C1) + P(C1^c ∩ C2)

and

P(C2) = P(C1 ∩ C2) + P(C1^c ∩ C2).

If the second of these equations is solved for P(C1^c ∩ C2) and this result substituted in the first equation, we obtain

P(C1 ∪ C2) = P(C1) + P(C2) - P(C1 ∩ C2).

This completes the proof.

Remark 1.3.2 (Inclusion Exclusion Formula). It is easy to show (Exercise 1.3.9) that

P(C1 ∪ C2 ∪ C3) = p1 - p2 + p3,

where

p1 = P(C1) + P(C2) + P(C3)
p2 = P(C1 ∩ C2) + P(C1 ∩ C3) + P(C2 ∩ C3)
p3 = P(C1 ∩ C2 ∩ C3).     (1.3.1)

This can be generalized to the inclusion exclusion formula:

P(C1 ∪ C2 ∪ · · · ∪ Ck) = p1 - p2 + p3 - · · · + (-1)^(k+1) pk,     (1.3.2)

where pi equals the sum of the probabilities of all possible intersections involving i sets. It is clear in the case k = 3 that p1 ≥ p2 ≥ p3, but more generally p1 ≥ p2 ≥ · · · ≥ pk. As shown in Theorem 1.3.7,

p1 = P(C1) + P(C2) + · · · + P(Ck) ≥ P(C1 ∪ C2 ∪ · · · ∪ Ck).

This is known as Boole's inequality. For k = 2, we have

1 ≥ P(C1 ∪ C2) = P(C1) + P(C2) - P(C1 ∩ C2),

which gives Bonferroni's inequality,

P(C1 ∩ C2) ≥ P(C1) + P(C2) - 1,     (1.3.3)

that is only useful when P(C1) and P(C2) are large. The inclusion exclusion formula provides other inequalities that are useful, such as

p1 ≥ P(C1 ∪ C2 ∪ · · · ∪ Ck) ≥ p1 - p2

and

p1 - p2 + p3 ≥ P(C1 ∪ C2 ∪ · · · ∪ Ck) ≥ p1 - p2 + p3 - p4.
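These formulas are easy to check numerically on the dice space of Example 1.1.2, where all 36 outcomes are equally likely; the three events below are arbitrary choices made for illustration:

# Verify inclusion exclusion and Boole's inequality for k = 3 events.
g <- expand.grid(red = 1:6, white = 1:6)   # the 36 equally likely outcomes
A <- g$red + g$white == 7                  # C1: the sum is seven
B <- g$red == 1                            # C2: the red die shows one
D <- g$white <= 2                          # C3: the white die shows one or two

p1 <- mean(A) + mean(B) + mean(D)
p2 <- mean(A & B) + mean(A & D) + mean(B & D)
p3 <- mean(A & B & D)

mean(A | B | D)   # P(C1 u C2 u C3) counted directly: 19/36
p1 - p2 + p3      # the same value, by inclusion exclusion
p1                # Boole's bound, necessarily at least as large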

Example 1.3.1. Let 𝒞 denote the sample space of Example 1.1.2. Let the probability set function assign a probability of 1/36 to each of the 36 points in 𝒞; that is, the dice are fair. If C1 = {(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)} and C2 = {(1, 2), (2, 2), (3, 2)}, then P(C1) = 5/36, P(C2) = 3/36, P(C1 ∪ C2) = 8/36, and P(C1 ∩ C2) = 0.

Example 1.3.2. Two coins are to be tossed and the outcome is the ordered pair (face on the first coin, face on the second coin). Thus the sample space may be represented as 𝒞 = {(H, H), (H, T), (T, H), (T, T)}. Let the probability set function assign a probability of 1/4 to each element of 𝒞. Let C1 = {(H, H), (H, T)} and C2 = {(H, H), (T, H)}. Then P(C1) = P(C2) = 1/2, P(C1 ∩ C2) = 1/4, and, in accordance with Theorem 1.3.5, P(C1 ∪ C2) = 1/2 + 1/2 - 1/4 = 3/4.

Example 1.3.3 (Equilikely Case). Let 𝒞 be partitioned into k mutually disjoint subsets C1, C2, . . . , Ck in such a way that the union of these k mutually disjoint subsets is the sample space 𝒞. Thus the events C1, C2, . . . , Ck are mutually exclusive and exhaustive. Suppose that the random experiment is of such a character that it is reasonable to assume that each of the mutually exclusive and exhaustive events Ci, i = 1, 2, . . . , k, has the same probability. It is necessary then that P(Ci) = 1/k, i = 1, 2, . . . , k; and we often say that the events C1, C2, . . . , Ck are equally likely. Let the event E be the union of r of these mutually exclusive events, say

E = C1 ∪ C2 ∪ · · · ∪ Cr,  r ≤ k.

Then

P(E) = P(C1) + P(C2) + · · · + P(Cr) = r/k.

Frequently, the integer k is called the total number of ways (for this particular partition of 𝒞) in which the random experiment can terminate and the integer r is called the number of ways that are favorable to the event E. So, in this terminology, P(E) is equal to the number of ways favorable to the event E divided by the total number of ways in which the experiment can terminate. It should be emphasized that in order to assign, in this manner, the probability r/k to the event E, we must assume that each of the mutually exclusive and exhaustive events C1, C2, . . . , Ck has the same probability 1/k. This assumption of equally likely events then becomes a part of our probability model. Obviously, if this assumption is not realistic in an application, the probability of the event E cannot be computed in this way.

In order to illustrate the equilikely case, it is helpful to use some elementary counting rules. These are usually discussed in an elementary algebra course. In the next remark, we offer a brief review of these rules.

Remark 1.3.3 (Counting Rules). Suppose we have two experiments. The first experiment results in m outcomes, while the second experiment results in n outcomes. The composite experiment, first experiment followed by second experiment, has mn outcomes, which can be represented as mn ordered pairs. This is called the multiplication rule or the mn-rule. This is easily extended to more than two experiments.

Let A be a set with n elements. Suppose we are interested in k-tuples whose components are elements of A. Then by the extended multiplication rule, there are n · n · · · n = n^k such k-tuples whose components are elements of A. Next, suppose k ≤ n and we are interested in k-tuples whose components are distinct (no repeats) elements of A. There are n elements from which to choose for the first component, n - 1 for the second component, . . . , n - (k - 1) for the kth. Hence, by the multiplication rule, there are n(n - 1) · · · (n - (k - 1)) such k-tuples with distinct elements. We call each such k-tuple a permutation and use the symbol P^n_k to denote the number of k permutations taken from a set of n elements. Hence, we have the formula

P^n_k = n(n - 1) · · · (n - (k - 1)) = n! / (n - k)!.     (1.3.4)

Next, suppose order is not important, so instead of counting the number of permutations we want to count the number of subsets of k elements taken from A. We use the symbol (n k) to denote the total number of these subsets. Consider a subset of k elements from A. By the permutation rule it generates P^k_k = k(k - 1) · · · 1 permutations. Furthermore, all these permutations are distinct from permutations generated by other subsets of k elements from A. Finally, each permutation of k distinct elements drawn from A must be generated by one of these subsets. Hence, we have just shown that P^n_k = (n k) k!; that is,

(n k) = n! / (k!(n - k)!).     (1.3.5)

We often use the terminology combinations instead of subsets. So we say that there are (n k) combinations of k things taken from a set of n things. Another common symbol for (n k) is C^n_k.


It is interesting to note that if we expand the binomial

(a + b)^n = (a + b)(a + b) · · · (a + b),

we get

(a + b)^n = Σ_{k=0}^n (n k) a^k b^(n-k),     (1.3.6)

because we can select the k factors from which to take a in (n k) ways. So (n k) is also referred to as a binomial coefficient.
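Base R computes these counting quantities directly: factorial() gives n! and choose() gives the binomial coefficient, so no special packages are needed. A short sketch (the numeric values are arbitrary choices):

n <- 52; k <- 5
factorial(n) / factorial(n - k)   # P^n_k: ordered draws, 311875200
choose(n, k)                      # (n k): subsets, 2598960
choose(n, k) * factorial(k)       # the identity P^n_k = (n k) k!

# A check of the binomial theorem (1.3.6):
a <- 2; b <- 3; m <- 10
sum(choose(m, 0:m) * a^(0:m) * b^(m - 0:m))   # 9765625
(a + b)^m                                     # 5^10 = 9765625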

Example 1.3.4 (Poker Hands). Let a card be drawn at random from an ordinary deck of 52 playing cards which has been well shuffled. The sample space 𝒞 is the union of k = 52 outcomes, and it is reasonable to assume that each of these outcomes has the same probability 1/52. Accordingly, if E1 is the set of outcomes that are spades, P(E1) = 13/52 = 1/4 because there are r1 = 13 spades in the deck; that is, 1/4 is the probability of drawing a card that is a spade. If E2 is the set of outcomes that are kings, P(E2) = 4/52 = 1/13 because there are r2 = 4 kings in the deck; that is, 1/13 is the probability of drawing a card that is a king. These computations are very easy because there are no difficulties in the determination of the appropriate values of r and k.

However, instead of drawing only one card, suppose that five cards are taken, at random and without replacement, from this deck; i.e., a five-card poker hand. In this instance, order is not important. So a hand is a subset of five elements drawn from a set of 52 elements. Hence, by (1.3.5) there are (52 5) poker hands. If the deck is well shuffled, each hand should be equilikely; i.e., each hand has probability 1/(52 5). We can now compute the probabilities of some interesting poker hands. Let E1 be the event of a flush, all five cards of the same suit. There are (4 1) = 4 suits to choose for the flush, and in each suit there are (13 5) possible hands; hence, using the multiplication rule, the probability of getting a flush is

P(E1) = (4 1)(13 5) / (52 5) = (4 · 1287) / 2598960 = 0.00198.

Real poker players note that this includes the probability of obtaining a straight flush.

Next, consider the probability of the event E2 of getting exactly three of a kind (the other two cards are distinct and are of different kinds). Choose the kind for the three, in (13 1) ways; choose the three, in (4 3) ways; choose the other two kinds, in (12 2) ways; and choose one card from each of these last two kinds, in (4 1)(4 1) ways. Hence the probability of exactly three of a kind is

P(E2) = (13 1)(4 3)(12 2)(4 1)^2 / (52 5) = 0.0211.

Now suppose that E3 is the set of outcomes in which exactly three cards are kings and exactly two cards are queens. Select the kings, in (4 3) ways, and select the queens, in (4 2) ways. Hence, the probability of E3 is

P(E3) = (4 3)(4 2) / (52 5) = 0.0000093.

The event E3 is an example of a full house: three of one kind and two of another kind. Exercise 1.3.18 asks for the determination of the probability of a full house.
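All three of these poker probabilities are ratios of products of binomial coefficients, so they reduce to calls to R's choose():

hands <- choose(52, 5)                # 2598960 equally likely hands
choose(4, 1) * choose(13, 5) / hands  # flush: 0.00198...
choose(13, 1) * choose(4, 3) * choose(12, 2) * choose(4, 1)^2 / hands  # 0.0211...
choose(4, 3) * choose(4, 2) / hands   # three kings, two queens: about 9.2e-06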

Example 1.3.4 and the previous discussion allow us to see one way in which we can define a probability set function, that is, a set function that satisfies the requirements of Definition 1.3.1. Suppose that our space 𝒞 consists of k distinct points, which, for this discussion, we take to be in a one-dimensional space. If the random experiment that ends in one of those k points is such that it is reasonable to assume that these points are equally likely, we could assign 1/k to each point and let, for C ⊂ 𝒞,

P(C) = (number of points in C) / k = Σ_{x ∈ C} f(x), where f(x) = 1/k, x ∈ 𝒞.

For illustration, in the cast of a die, we could take 𝒞 = {1, 2, 3, 4, 5, 6} and f(x) = 1/6, x ∈ 𝒞, if we believe the die to be unbiased. Clearly, such a set function satisfies Definition 1.3.1.

The word unbiased in this illustration suggests the possibility that all six points might not, in all such cases, be equally likely. As a matter of fact, loaded dice do exist. In the case of a loaded die, some numbers occur more frequently than others in a sequence of casts of that die. For example, suppose that a die has been loaded so that the relative frequencies of the numbers in 𝒞 seem to stabilize proportional to the number of spots that are on the up side. Thus we might assign f(x) = x/21, x ∈ 𝒞, and the corresponding

P(C) = Σ_{x ∈ C} f(x)

would satisfy Definition 1.3.1. For illustration, this means that if C = {1, 2, 3}, then

P(C) = Σ_{x=1}^3 f(x) = 1/21 + 2/21 + 3/21 = 6/21 = 2/7.

Whether this probability set function is realistic can only be checked by performing the random experiment a large number of times.
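That check is straightforward in R: sample() can cast a loaded die with the assumed probabilities f(x) = x/21, and the relative frequency of C = {1, 2, 3} can be compared with the model value 2/7 (the seed and number of casts are arbitrary choices):

f <- (1:6) / 21                 # the loaded-die model
sum(f[1:3])                     # exact P(C) = 6/21 = 2/7 = 0.2857...

set.seed(21)
casts <- sample(1:6, 1e5, replace = TRUE, prob = f)
mean(casts <= 3)                # relative frequency, close to 2/7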

We end this section with an additional property of probability which proves useful in the sequel. Recall in Exercise 1.2.8 we said that a sequence of events {Cn} is a nondecreasing sequence if Cn ⊂ Cn+1, for all n, in which case we wrote lim_{n→∞} Cn = ∪_{n=1}^∞ Cn. Consider lim_{n→∞} P(Cn). The question is: can we interchange the limit and P? As the following theorem shows, the answer is yes. The result also holds for a decreasing sequence of events. Because of this interchange, this theorem is sometimes referred to as the continuity theorem of probability.

Theorem 1.3.6. Let {Cn} be a nondecreasing sequence of events. Then

\[
\lim_{n \to \infty} P(C_n) = P\Big(\lim_{n \to \infty} C_n\Big) = P\left(\bigcup_{n=1}^{\infty} C_n\right). \tag{1.3.7}
\]

Let {Cn} be a decreasing sequence of events. Then

\[
\lim_{n \to \infty} P(C_n) = P\Big(\lim_{n \to \infty} C_n\Big) = P\left(\bigcap_{n=1}^{\infty} C_n\right). \tag{1.3.8}
\]

Proof. We prove the result (1.3.7) and leave the second result as Exercise 1.3.19. Define the sets, called rings, as \(R_1 = C_1\) and, for \(n > 1\), \(R_n = C_n \cap C_{n-1}^c\). It follows that \(\bigcup_{n=1}^{\infty} C_n = \bigcup_{n=1}^{\infty} R_n\) and that \(R_m \cap R_n = \emptyset\), for \(m \neq n\). Also, \(P(R_n) = P(C_n) - P(C_{n-1})\). Applying the third axiom of probability yields the following string of equalities:

\[
P\Big[\lim_{n \to \infty} C_n\Big] = P\left(\bigcup_{n=1}^{\infty} C_n\right) = P\left(\bigcup_{n=1}^{\infty} R_n\right) = \sum_{n=1}^{\infty} P(R_n) = \lim_{n \to \infty} \sum_{j=1}^{n} P(R_j)
= \lim_{n \to \infty} \left\{ P(C_1) + \sum_{j=2}^{n} \left[ P(C_j) - P(C_{j-1}) \right] \right\} = \lim_{n \to \infty} P(C_n). \tag{1.3.9}
\]

This is the desired result.
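For a concrete instance of (1.3.7), consider the experiment of Exercise 1.3.20: a point chosen at random from (0, 1), with the probability of a subinterval equal to its length. The nondecreasing events Cn = (0, 1 − 1/n) have P(Cn) = 1 − 1/n, which increases to 1 = P(∪ Cn) = P((0, 1)). A small Python sketch under that assumed model:

    # Nondecreasing events C_n = (0, 1 - 1/n) under the length measure on (0, 1)
    def P_interval(a, b):
        return b - a

    for n in (2, 10, 100, 10000):
        print(n, P_interval(0, 1 - 1/n))
    # The values 0.5, 0.9, 0.99, 0.9999 increase toward P((0, 1)) = 1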

Another useful result for arbitrary unions is given by

Theorem 1.3.7 (Boole's Inequality). Let {Cn} be an arbitrary sequence of events. Then

\[
P\left(\bigcup_{n=1}^{\infty} C_n\right) \le \sum_{n=1}^{\infty} P(C_n). \tag{1.3.10}
\]

Proof: Let \(D_n = \bigcup_{i=1}^{n} C_i\). Then {Dn} is an increasing sequence of events that goes up to \(\bigcup_{n=1}^{\infty} C_n\). Also, for all j, \(D_j = D_{j-1} \cup C_j\). Hence, by Theorem 1.3.5,

\[
P(D_j) \le P(D_{j-1}) + P(C_j),
\]

that is,

\[
P(D_j) - P(D_{j-1}) \le P(C_j).
\]


In this case, the Ci's are replaced by the Di's in expression (1.3.9). Hence, using the above inequality in this expression and the fact that P(C1) = P(D1), we have

\[
P\left(\bigcup_{n=1}^{\infty} C_n\right) = P\left(\bigcup_{n=1}^{\infty} D_n\right) = \lim_{n \to \infty} \left\{ P(D_1) + \sum_{j=2}^{n} \left[ P(D_j) - P(D_{j-1}) \right] \right\}
\le \lim_{n \to \infty} \sum_{j=1}^{n} P(C_j) = \sum_{n=1}^{\infty} P(C_n).
\]
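Boole's inequality can be confirmed by brute force on a finite sample space. Here is an illustrative Python sketch for a fair die and three overlapping events:

    from fractions import Fraction

    # Fair die: P(C) = |C|/6
    def P(C):
        return Fraction(len(C), 6)

    events = [{1, 2, 3}, {2, 3, 4}, {5}]
    union = set().union(*events)

    print(P(union))                    # 5/6
    print(sum(P(C) for C in events))   # 7/6, and 5/6 <= 7/6 as (1.3.10) asserts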

    EXERCISES

1.3.1. A positive integer from one to six is to be chosen by casting a die. Thus the elements c of the sample space 𝒞 are 1, 2, 3, 4, 5, 6. Suppose C1 = {1, 2, 3, 4} and C2 = {3, 4, 5, 6}. If the probability set function P assigns a probability of 1/6 to each of the elements of 𝒞, compute P(C1), P(C2), P(C1 ∩ C2), and P(C1 ∪ C2).

1.3.2. A random experiment consists of drawing a card from an ordinary deck of 52 playing cards. Let the probability set function P assign a probability of 1/52 to each of the 52 possible outcomes. Let C1 denote the collection of the 13 hearts and let C2 denote the collection of the 4 kings. Compute P(C1), P(C2), P(C1 ∩ C2), and P(C1 ∪ C2).

1.3.3. A coin is to be tossed as many times as necessary to turn up one head. Thus the elements c of the sample space 𝒞 are H, TH, TTH, TTTH, and so forth. Let the probability set function P assign to these elements the respective probabilities 1/2, 1/4, 1/8, 1/16, and so forth. Show that P(𝒞) = 1. Let C1 = {c : c is H, TH, TTH, TTTH, or TTTTH}. Compute P(C1). Next, suppose that C2 = {c : c is TTTTH or TTTTTH}. Compute P(C2), P(C1 ∩ C2), and P(C1 ∪ C2).

1.3.4. If the sample space is 𝒞 = C1 ∪ C2 and if P(C1) = 0.8 and P(C2) = 0.5, find P(C1 ∩ C2).

1.3.5. Let the sample space be 𝒞 = {c : 0 < c < ∞}. Let C ⊂ 𝒞 be defined by C = {c : 4 < c < ∞} and take \(P(C) = \int_C e^{-x}\,dx\). Show that P(𝒞) = 1. Evaluate P(C), P(C^c), and P(C ∪ C^c).

1.3.6. If the sample space is 𝒞 = {c : −∞ < c < ∞} and if C ⊂ 𝒞 is a set for which the integral \(\int_C e^{-|x|}\,dx\) exists, show that this set function is not a probability set function. What constant do we multiply the integrand by to make it a probability set function?

1.3.7. If C1 and C2 are subsets of the sample space 𝒞, show that

\[
P(C_1 \cap C_2) \le P(C_1) \le P(C_1 \cup C_2) \le P(C_1) + P(C_2).
\]

1.3.8. Let C1, C2, and C3 be three mutually disjoint subsets of the sample space 𝒞. Find P[(C1 ∪ C2) ∩ C3] and P(C1^c ∪ C2^c).


    1.3.9. Consider Remark 1.3.2.

(a) If C1, C2, and C3 are subsets of 𝒞, show that

\[
P(C_1 \cup C_2 \cup C_3) = P(C_1) + P(C_2) + P(C_3) - P(C_1 \cap C_2) - P(C_1 \cap C_3) - P(C_2 \cap C_3) + P(C_1 \cap C_2 \cap C_3).
\]

(b) Now prove the general inclusion-exclusion formula given by expression (1.3.2).

Remark 1.3.4. In order to solve Exercises 1.3.10-1.3.18, certain reasonable assumptions must be made.

1.3.10. A bowl contains 16 chips, of which 6 are red, 7 are white, and 3 are blue. If four chips are taken at random and without replacement, find the probability that: (a) each of the four chips is red; (b) none of the four chips is red; (c) there is at least one chip of each color.

1.3.11. A person has purchased 10 of 1000 tickets sold in a certain raffle. To determine the five prize winners, five tickets are to be drawn at random and without replacement. Compute the probability that this person wins at least one prize. Hint: First compute the probability that the person does not win a prize.

1.3.12. Compute the probability of being dealt at random and without replacement a 13-card bridge hand consisting of: (a) 6 spades, 4 hearts, 2 diamonds, and 1 club; (b) 13 cards of the same suit.

1.3.13. Three distinct integers are chosen at random from the first 20 positive integers. Compute the probability that: (a) their sum is even; (b) their product is even.

1.3.14. There are five red chips and three blue chips in a bowl. The red chips are numbered 1, 2, 3, 4, 5, respectively, and the blue chips are numbered 1, 2, 3, respectively. If two chips are to be drawn at random and without replacement, find the probability that these chips have either the same number or the same color.

1.3.15. In a lot of 50 light bulbs, there are 2 bad bulbs. An inspector examines five bulbs, which are selected at random and without replacement.

(a) Find the probability of at least one defective bulb among the five.

(b) How many bulbs should be examined so that the probability of finding at least one bad bulb exceeds 1/2?

1.3.16. If C1, . . . , Ck are k events in the sample space 𝒞, show that the probability that at least one of the events occurs is one minus the probability that none of them occur; i.e.,

\[
P(C_1 \cup \cdots \cup C_k) = 1 - P(C_1^c \cap \cdots \cap C_k^c). \tag{1.3.11}
\]


1.3.17. A secretary types three letters and the three corresponding envelopes. In a hurry, he places at random one letter in each envelope. What is the probability that at least one letter is in the correct envelope? Hint: Let Ci be the event that the ith letter is in the correct envelope. Expand P(C1 ∪ C2 ∪ C3) to determine the probability.

1.3.18. Consider poker hands drawn from a well-shuffled deck as described in Example 1.3.4. Determine the probability of a full house, i.e., three of one kind and two of another.

1.3.19. Prove expression (1.3.8).

1.3.20. Suppose the experiment is to choose a real number at random in the interval (0, 1). For any subinterval (a, b) ⊂ (0, 1), it seems reasonable to assign the probability P[(a, b)] = b − a; i.e., the probability of selecting the point from a subinterval is directly proportional to the length of the subinterval. If this is the case, choose an appropriate sequence of subintervals and use expression (1.3.8) to show that P[{a}] = 0, for all a ∈ (0, 1).

1.3.21. Consider the events C1, C2, C3.

(a) Suppose C1, C2, C3 are mutually exclusive events. If P(Ci) = pi, i = 1, 2, 3, what is the restriction on the sum p1 + p2 + p3?

(b) In the notation of part (a), if p1 = 4/10, p2 = 3/10, and p3 = 5/10, are C1, C2, C3 mutually exclusive?

For the last two exercises it is assumed that the reader is familiar with σ-fields.

1.3.22. Suppose 𝒟 is a nonempty collection of subsets of 𝒞. Consider the collection of events

\[
\mathcal{B} = \bigcap \{ \mathcal{E} : \mathcal{D} \subset \mathcal{E} \text{ and } \mathcal{E} \text{ is a } \sigma\text{-field} \}.
\]

Note that ∅ ∈ ℬ because it is in each σ-field, and, hence, in particular, it is in each σ-field ℰ ⊃ 𝒟. Continue in this way to show that ℬ is a σ-field.

1.3.23. Let 𝒞 = R, where R is the set of all real numbers. Let ℐ be the set of all open intervals in R. The Borel σ-field on the real line is given by

\[
\mathcal{B}_0 = \bigcap \{ \mathcal{E} : \mathcal{I} \subset \mathcal{E} \text{ and } \mathcal{E} \text{ is a } \sigma\text{-field} \}.
\]

By definition, ℬ0 contains the open intervals. Because [a, ∞) = (−∞, a)^c and ℬ0 is closed under complements, it contains all intervals of the form [a, ∞), for a ∈ R. Continue in this way and show that ℬ0 contains all the closed and half-open intervals of real numbers.


    1.4 Conditional Probability and Independence

In some random experiments, we are interested only in those outcomes that are elements of a subset C1 of the sample space 𝒞. This means, for our purposes, that the sample space is effectively the subset C1. We are now confronted with the problem of defining a probability set function with C1 as the new sample space.

Let the probability set function P(C) be defined on the sample space 𝒞 and let C1 be a subset of 𝒞 such that P(C1) > 0. We agree to consider only those outcomes of the random experiment that are elements of C1; in essence, then, we take C1 to be a sample space. Let C2 be another subset of 𝒞. How, relative to the new sample space C1, do we want to define the probability of the event C2? Once defined, this probability is called the conditional probability of the event C2, relative to the hypothesis of the event C1, or, more briefly, the conditional probability of C2, given C1. Such a conditional probability is denoted by the symbol P(C2|C1). We now return to the question that was raised about the definition of this symbol. Since C1 is now the sample space, the only elements of C2 that concern us are those, if any, that are also elements of C1, that is, the elements of C1 ∩ C2. It seems desirable, then, to define the symbol P(C2|C1) in such a way that

P(C1|C1) = 1 and P(C2|C1) = P(C1 ∩ C2 | C1).

Moreover, from a relative frequency point of view, it would seem logically inconsistent if we did not require that the ratio of the probabilities of the events C1 ∩ C2 and C1, relative to the space C1, be the same as the ratio of the probabilities of these events relative to the space 𝒞; that is, we should have

\[
\frac{P(C_1 \cap C_2 \mid C_1)}{P(C_1 \mid C_1)} = \frac{P(C_1 \cap C_2)}{P(C_1)}.
\]

These three desirable conditions imply that the relation

\[
P(C_2 \mid C_1) = \frac{P(C_1 \cap C_2)}{P(C_1)}
\]

is a suitable definition of the conditional probability of the event C2, given the event C1, provided that P(C1) > 0. Moreover, we have

1. P(C2|C1) ≥ 0.

2. \(P\big(\bigcup_{j=2}^{\infty} C_j \mid C_1\big) = \sum_{j=2}^{\infty} P(C_j \mid C_1)\), provided that C2, C3, . . . are mutually exclusive events.

3. P(C1|C1) = 1.

Properties (1) and (3) are evident and the proof of property (2) is left as Exercise 1.4.1. But these are precisely the conditions that a probability set function must satisfy. Accordingly, P(C2|C1) is a probability set function, defined for subsets of C1. It may be called the conditional probability set function, relative to the hypothesis C1, or the conditional probability set function, given C1. It should be noted that this conditional probability set function, given C1, is defined at this time only when P(C1) > 0.
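Because P(· |C1) is itself a probability set function, it can be coded exactly like P. A minimal Python sketch under the equally likely (fair-die) model, with the sets of Exercise 1.3.1:

    from fractions import Fraction

    # Fair die: P(C) = |C|/6
    def P(C):
        return Fraction(len(C), 6)

    def P_given(C2, C1):
        # P(C2 | C1) = P(C1 and C2) / P(C1), assuming P(C1) > 0
        return P(C1 & C2) / P(C1)

    C1 = {1, 2, 3, 4}
    C2 = {3, 4, 5, 6}
    print(P_given(C2, C1))   # 1/2
    print(P_given(C1, C1))   # 1, as property (3) requires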


Example 1.4.1. A hand of five cards is to be dealt at random without replacement from an ordinary deck of 52 playing cards. The conditional probability of an all-spade hand (C2), relative to the hypothesis that there are at least four spades in the hand (C1), is, since C1 ∩ C2 = C2,

\[
P(C_2 \mid C_1) = \frac{P(C_2)}{P(C_1)} = \frac{\binom{13}{5}\big/\binom{52}{5}}{\left[\binom{13}{4}\binom{39}{1} + \binom{13}{5}\right]\big/\binom{52}{5}} = \frac{\binom{13}{5}}{\binom{13}{4}\binom{39}{1} + \binom{13}{5}} = 0.0441.
\]

Note that this is not the same as drawing for a spade to complete a flush in draw poker; see Exercise 1.4.3.

From the definition of the conditional probability set function, we observe that

\[
P(C_1 \cap C_2) = P(C_1) P(C_2 \mid C_1).
\]

This relation is frequently called the multiplication rule for probabilities. Sometimes, after considering the nature of the random experiment, it is possible to make reasonable assumptions so that both P(C1) and P(C2|C1) can be assigned. Then P(C1 ∩ C2) can be computed under these assumptions. This is illustrated in Examples 1.4.2 and 1.4.3.

Example 1.4.2. A bowl contains eight chips. Three of the chips are red and the remaining five are blue. Two chips are to be drawn successively, at random and without replacement. We want to compute the probability that the first draw results in a red chip (C1) and that the second draw results in a blue chip (C2). It is reasonable to assign the following probabilities:

\[
P(C_1) = \tfrac{3}{8} \quad \text{and} \quad P(C_2 \mid C_1) = \tfrac{5}{7}.
\]

Thus, under these assignments, we have P(C1 ∩ C2) = (3/8)(5/7) = 15/56 = 0.2679.

Example 1.4.3. From an ordinary deck of playing cards, cards are to be drawn successively, at random and without replacement. The probability that the third spade appears on the sixth draw is computed as follows. Let C1 be the event of two spades in the first five draws and let C2 be the event of a spade on the sixth draw. Thus the probability that we wish to compute is P(C1 ∩ C2). It is reasonable to take

\[
P(C_1) = \frac{\binom{13}{2}\binom{39}{3}}{\binom{52}{5}} = 0.2743 \quad \text{and} \quad P(C_2 \mid C_1) = \frac{11}{47} = 0.2340.
\]

The desired probability P(C1 ∩ C2) is then the product of these two numbers, which to four places is 0.0642.
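The following Python sketch reproduces this computation (an illustration; math.comb supplies the binomial coefficients):

    from math import comb

    # Example 1.4.3: probability the third spade appears on the sixth draw
    p_c1 = comb(13, 2) * comb(39, 3) / comb(52, 5)   # two spades in first five
    p_c2_given_c1 = 11 / 47                          # 11 spades left among 47

    print(round(p_c1 * p_c2_given_c1, 4))            # 0.0642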

The multiplication rule can be extended to three or more events. In the case of three events, we have, by using the multiplication rule for two events,

\[
P(C_1 \cap C_2 \cap C_3) = P[(C_1 \cap C_2) \cap C_3] = P(C_1 \cap C_2) P(C_3 \mid C_1 \cap C_2).
\]

But P(C1 ∩ C2) = P(C1)P(C2|C1). Hence, provided P(C1 ∩ C2) > 0,

\[
P(C_1 \cap C_2 \cap C_3) = P(C_1) P(C_2 \mid C_1) P(C_3 \mid C_1 \cap C_2).
\]

This procedure can be used to extend the multiplication rule to four or more events. The general formula for k events can be proved by mathematical induction.

Example 1.4.4. Four cards are to be dealt successively, at random and without replacement, from an ordinary deck of playing cards. The probability of receiving a spade, a heart, a diamond, and a club, in that order, is

\[
\left(\frac{13}{52}\right)\left(\frac{13}{51}\right)\left(\frac{13}{50}\right)\left(\frac{13}{49}\right) = 0.0044.
\]

    This follows from the extension of the multiplication rule.

Consider k mutually exclusive and exhaustive events C1, C2, . . . , Ck such that P(Ci) > 0, i = 1, 2, . . . , k; i.e., C1, C2, . . . , Ck form a partition of 𝒞. Here the events C1, C2, . . . , Ck do not need to be equally likely. Let C be another event such that P(C) > 0. Thus C occurs with one and only one of the events C1, C2, . . . , Ck; that is,

\[
C = C \cap (C_1 \cup C_2 \cup \cdots \cup C_k) = (C \cap C_1) \cup (C \cap C_2) \cup \cdots \cup (C \cap C_k).
\]

Since C ∩ Ci, i = 1, 2, . . . , k, are mutually exclusive, we have

\[
P(C) = P(C \cap C_1) + P(C \cap C_2) + \cdots + P(C \cap C_k).
\]

However, P(C ∩ Ci) = P(Ci)P(C|Ci), i = 1, 2, . . . , k; so

\[
P(C) = P(C_1)P(C \mid C_1) + P(C_2)P(C \mid C_2) + \cdots + P(C_k)P(C \mid C_k) = \sum_{i=1}^{k} P(C_i) P(C \mid C_i).
\]

This result is sometimes called the law of total probability.

From the definition of conditional probability, we have, using the law of total probability, that

\[
P(C_j \mid C) = \frac{P(C \cap C_j)}{P(C)} = \frac{P(C_j) P(C \mid C_j)}{\sum_{i=1}^{k} P(C_i) P(C \mid C_i)}, \tag{1.4.1}
\]

which is the well-known Bayes' Theorem. This permits us to calculate the conditional probability of Cj, given C, from the probabilities of C1, C2, . . . , Ck and the conditional probabilities of C, given Ci, i = 1, 2, . . . , k.
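Formula (1.4.1) translates directly into code. The Python sketch below is illustrative; the helper name bayes is hypothetical, and the numbers are those of Example 1.4.5, which follows.

    def bayes(priors, likelihoods):
        # priors: P(C_i); likelihoods: P(C | C_i)
        # Returns the posteriors P(C_i | C) of equation (1.4.1)
        joint = [p * l for p, l in zip(priors, likelihoods)]
        total = sum(joint)             # law of total probability, P(C)
        return [j / total for j in joint]

    # Example 1.4.5: P(C1) = 2/6, P(C2) = 4/6; P(C|C1) = 3/10, P(C|C2) = 8/10
    print(bayes([2/6, 4/6], [3/10, 8/10]))   # [0.157..., 0.842...] = [3/19, 16/19]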

Example 1.4.5. Say it is known that bowl C1 contains three red and seven blue chips and bowl C2 contains eight red and two blue chips. All chips are identical in size and shape. A die is cast and bowl C1 is selected if five or six spots show on the side that is up; otherwise, bowl C2 is selected. In a notation that is fairly obvious, it seems reasonable to assign P(C1) = 2/6 and P(C2) = 4/6. The selected bowl is handed to another person and one chip is taken at random. Say that this chip is red, an event which we denote by C. By considering the contents of the bowls, it is reasonable to assign the conditional probabilities P(C|C1) = 3/10 and P(C|C2) = 8/10. Thus the conditional probability of bowl C1, given that a red chip is drawn, is

\[
P(C_1 \mid C) = \frac{P(C_1) P(C \mid C_1)}{P(C_1) P(C \mid C_1) + P(C_2) P(C \mid C_2)} = \frac{(2/6)(3/10)}{(2/6)(3/10) + (4/6)(8/10)} = \frac{3}{19}.
\]

In a similar manner, we have P(C2|C) = 16/19.

In Example 1.4.5, the probabilities P(C1) = 2/6 and P(C2) = 4/6 are called prior probabilities of C1 and C2, respectively, because they are known to be due to the random mechanism used to select the bowls. After the chip is taken and observed to be red, the conditional probabilities P(C1|C) = 3/19 and P(C2|C) = 16/19 are called posterior probabilities. Since C2 has a larger proportion of red chips than does C1, it appeals to one's intuition that P(C2|C) should be larger than P(C2) and, of course, P(C1|C) should be smaller than P(C1). That is, intuitively the chances of having bowl C2 are better once a red chip is observed than before a chip is taken. Bayes' theorem provides a method of determining exactly what those probabilities are.

Example 1.4.6. Three plants, C1, C2, and C3, produce, respectively, 10%, 50%, and 40% of a company's output. Although plant C1 is a small plant, its manager believes in high quality and only 1% of its products are defective. The other two, C2 and C3, are worse and produce items that are 3% and 4% defective, respectively. All products are sent to a central warehouse. One item is selected at random and observed to be defective, say event C. The conditional probability that it comes from plant C1 is found as follows. It is natural to assign the respective prior probabilities of getting an item from the plants as P(C1) = 0.1, P(C2) = 0.5, and P(C3) = 0.4, while the conditional probabilities of defective items are P(C|C1) = 0.01, P(C|C2) = 0.03, and P(C|C3) = 0.04. Thus the posterior probability of C1, given a defective, is

\[
P(C_1 \mid C) = \frac{P(C_1 \cap C)}{P(C)} = \frac{(0.10)(0.01)}{(0.1)(0.01) + (0.5)(0.03) + (0.4)(0.04)},
\]

which equals 1/32; this is much smaller than the prior probability P(C1) = 1/10. This is as it should be because the fact that the item is defective decreases the chances that it comes from the high-quality plant C1.

Example 1.4.7. Suppose we want to investigate the percentage of abused children in a certain population. The events of interest are: a child is abused (A) and its complement, a child is not abused (N = A^c). For the purposes of this example, we assume that P(A) = 0.01 and, hence, P(N) = 0.99. The classification as to whether a child is abused or not is based upon a doctor's examination. Because doctors are not perfect, they sometimes classify an abused child (A) as one that is not abused (ND, where ND means classified as not abused by a doctor). On the other hand, doctors sometimes classify a nonabused child (N) as abused (AD). Suppose these error rates of misclassification are P(ND | A) = 0.04 and P(AD | N) = 0.05; thus the probabilities of correct decisions are P(AD | A) = 0.96 and P(ND | N) = 0.95. Let us compute the probability that a child taken at random is classified as abused by a doctor. Because this can happen in two ways, A ∩ AD or N ∩ AD, we have

\[
P(A_D) = P(A_D \mid A)P(A) + P(A_D \mid N)P(N) = (0.96)(0.01) + (0.05)(0.99) = 0.0591,
\]

which is quite high relative to the probability of an abused child, 0.01. Further, the probability that a child is abused when the doctor classified the child as abused is

\[
P(A \mid A_D) = \frac{P(A \cap A_D)}{P(A_D)} = \frac{(0.96)(0.01)}{0.0591} = 0.1624,
\]

which is quite low. In the same way, the probability that a child is not abused when the doctor classified the child as abused is 0.8376, which is quite high. The reason that these probabilities are so poor at recording the true situation is that the doctors' error rates are so high relative to the fraction 0.01 of the population that is abused. An investigation such as this would, hopefully, lead to better training of doctors for classifying abused children. See also Exercise 1.4.17.
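The same arithmetic, written as a short Python sketch with the rates assumed in this example:

    # Rates assumed in Example 1.4.7
    p_A = 0.01                   # P(A), a child is abused
    p_N = 1 - p_A                # P(N)
    p_AD_given_A = 0.96          # P(AD | A), correct classification
    p_AD_given_N = 0.05          # P(AD | N), misclassification

    p_AD = p_AD_given_A * p_A + p_AD_given_N * p_N   # total probability
    p_A_given_AD = p_AD_given_A * p_A / p_AD         # Bayes' theorem

    print(round(p_AD, 4))           # 0.0591
    print(round(p_A_given_AD, 4))   # 0.1624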

Sometimes it happens that the occurrence of event C1 does not change the probability of event C2; that is, when P(C1) > 0,

\[
P(C_2 \mid C_1) = P(C_2).
\]

In this case, we say that the events C1 and C2 are independent. Moreover, the multiplication rule becomes

\[
P(C_1 \cap C_2) = P(C_1) P(C_2 \mid C_1) = P(C_1) P(C_2). \tag{1.4.2}
\]

This, in turn, implies, when P(C2) > 0, that

\[
P(C_1 \mid C_2) = \frac{P(C_1 \cap C_2)}{P(C_2)} = \frac{P(C_1) P(C_2)}{P(C_2)} = P(C_1).
\]

Note that if P(C1) > 0 and P(C2) > 0, then by the above discussion, independence is equivalent to

\[
P(C_1 \cap C_2) = P(C_1) P(C_2). \tag{1.4.3}
\]

What if either P(C1) = 0 or P(C2) = 0? In either case, the right side of (1.4.3) is 0. However, the left side is 0 also because C1 ∩ C2 ⊂ C1 and C1 ∩ C2 ⊂ C2. Hence, we take Equation (1.4.3) as our formal definition of independence; that is,

Definition 1.4.1. Let C1 and C2 be two events. We say that C1 and C2 are independent if Equation (1.4.3) holds.

Suppose C1 and C2 are independent events. Then the following three pairs of events are independent: C1 and C2^c, C1^c and C2, and C1^c and C2^c (see Exercise 1.4.11).


Remark 1.4.1. Events that are independent are sometimes called statistically independent, stochastically independent, or independent in a probability sense. In most instances, we use independent without a modifier if there is no possibility of misunderstanding.

Example 1.4.8. A red die and a white die are cast in such a way that the numbers of spots on the two sides that are up are independent events. If C1 represents a four on the red die and C2 represents a three on the white die, with an equally likely assumption for each side, we assign P(C1) = 1/6 and P(C2) = 1/6. Thus, from independence, the probability of the ordered pair (red = 4, white = 3) is

\[
P[(4, 3)] = \left(\tfrac{1}{6}\right)\left(\tfrac{1}{6}\right) = \tfrac{1}{36}.
\]

The probability that the sum of the up spots of the two dice equals seven is

\[
P[(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)] = \left(\tfrac{1}{6}\right)\left(\tfrac{1}{6}\right) + \left(\tfrac{1}{6}\right)\left(\tfrac{1}{6}\right) + \cdots + \left(\tfrac{1}{6}\right)\left(\tfrac{1}{6}\right) = \tfrac{6}{36}.
\]

In a similar manner, it is easy to show that the probabilities of the sums of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 are, respectively,

\[
\tfrac{1}{36},\ \tfrac{2}{36},\ \tfrac{3}{36},\ \tfrac{4}{36},\ \tfrac{5}{36},\ \tfrac{6}{36},\ \tfrac{5}{36},\ \tfrac{4}{36},\ \tfrac{3}{36},\ \tfrac{2}{36},\ \tfrac{1}{36}.
\]
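These values are quick to confirm by enumerating the 36 equally likely ordered pairs; an illustrative Python sketch (Fraction reduces 2/36 to 1/18, and so on):

    from collections import Counter
    from fractions import Fraction

    # The 36 equally likely ordered pairs (red, white)
    counts = Counter(r + w for r in range(1, 7) for w in range(1, 7))

    for s in range(2, 13):
        print(s, Fraction(counts[s], 36))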

Suppose now that we have three events, C1, C2, and C3. We say that they are mutually independent if and only if they are pairwise independent:

\[
P(C_1 \cap C_2) = P(C_1)P(C_2), \quad P(C_1 \cap C_3) = P(C_1)P(C_3), \quad P(C_2 \cap C_3) = P(C_2)P(C_3),
\]

and

\[
P(C_1 \cap C_2 \cap C_3) = P(C_1)P(C_2)P(C_3).
\]

More generally, the n events C1, C2, . . . , Cn are mutually independent if and only if for every collection of k of these events, 2 ≤ k ≤ n, the following is true: Say that d1, d2, . . . , dk are k distinct integers from 1, 2, . . . , n; then

\[
P(C_{d_1} \cap C_{d_2} \cap \cdots \cap C_{d_k}) = P(C_{d_1}) P(C_{d_2}) \cdots P(C_{d_k}).
\]

In particular, if C1, C2, . . . , Cn are mutually independent, then

\[
P(C_1 \cap C_2 \cap \cdots \cap C_n) = P(C_1) P(C_2) \cdots P(C_n).
\]

Also, as with two sets, many combinations of these events and their complements are independent, such as

1. The events C1^c and C2 ∪ C3^c ∪ C4 are independent;

2. The events C1 ∪ C2^c, C3^c, and C4 ∩ C5^c are mutually independent.

If there is no possibility of misunderstanding, independent is often used without the modifier mutually when considering more than two events.


Example 1.4.9. Pairwise independence does not imply mutual independence. As an example, suppose we twice spin a fair spinner with the numbers 1, 2, 3, and 4. Let C1 be the event that the sum of the numbers spun is 5, let C2 be the event that the first number spun is a 1, and let C3 be the event that the second number spun is a 4. Then P(Ci) = 1/4, i = 1, 2, 3, and for i ≠ j, P(Ci ∩ Cj) = 1/16. So the three events are pairwise independent. But C1 ∩ C2 ∩ C3 is the event that (1, 4) is spun, which has probability 1/16 ≠ 1/64 = P(C1)P(C2)P(C3). Hence the events C1, C2, and C3 are not mutually independent.
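The claim can be checked by enumerating the 16 equally likely ordered pairs of spins. An illustrative Python sketch:

    from fractions import Fraction
    from itertools import product

    # The 16 equally likely ordered pairs of spins
    omega = list(product(range(1, 5), repeat=2))

    def P(event):
        return Fraction(sum(event(x, y) for x, y in omega), len(omega))

    C1 = lambda x, y: x + y == 5   # sum is 5
    C2 = lambda x, y: x == 1       # first spin is 1
    C3 = lambda x, y: y == 4       # second spin is 4

    # Pairwise: each joint probability equals the product 1/16
    print(P(lambda x, y: C1(x, y) and C2(x, y)), P(C1) * P(C2))   # 1/16 1/16
    # But the triple joint probability is 1/16, not 1/64
    print(P(lambda x, y: C1(x, y) and C2(x, y) and C3(x, y)))     # 1/16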

We often perform a sequence of random experiments in such a way that the events associated with one of them are independent of the events associated with the others. For convenience, we refer to these events as outcomes of independent experiments, meaning that the respective events are independent. Thus we often refer to independent flips of a coin or independent casts of a die or, more generally, independent trials of some given random experiment.

Example 1.4.10. A coin is flipped independently several times. Let the event Ci represent a head (H) on the ith toss; thus Ci^c represents a tail (T). Assume that Ci and Ci^c are equally likely; that is, P(Ci) = P(Ci^c) = 1/2. Thus the probability of an ordered sequence like HHTH is, from independence,

\[
P(C_1 \cap C_2 \cap C_3^c \cap C_4) = P(C_1)P(C_2)P(C_3^c)P(C_4) = \left(\tfrac{1}{2}\right)^4 = \tfrac{1}{16}.
\]

Similarly, the probability of observing the first head on the third flip is

\[
P(C_1^c \cap C_2^c \cap C_3) = P(C_1^c)P(C_2^c)P(C_3) = \left(\tfrac{1}{2}\right)^3 = \tfrac{1}{8}.
\]

Also, the probability of getting at least one head on four flips is

\[
P(C_1 \cup C_2 \cup C_3 \cup C_4) = 1 - P[(C_1 \cup C_2 \cup C_3 \cup C_4)^c] = 1 - P(C_1^c \cap C_2^c \cap C_3^c \cap C_4^c) = 1 - \left(\tfrac{1}{2}\right)^4 = \tfrac{15}{16}.
\]

    See Exercise 1.4.13 to justify this last probability.

Example 1.4.11. A computer system is built so that if component K1 fails, it is bypassed and K2 is used. If K2 fails, then K3 is used. Suppose that the probability that K1 fails is 0.01, that K2 fails is 0.03, and that K3 fails is 0.08. Moreover, we can assume that the failures are mutually independent events. Then the probability of failure of the system is

\[
(0.01)(0.03)(0.08) = 0.000024,
\]

as all three components would have to fail. Hence, the probability that the system does not fail is 1 − 0.000024 = 0.999976.

EXERCISES

1.4.1. If P(C1) > 0 and if C2, C3, C4, . . . are mutually disjoint sets, show that

\[
P(C_2 \cup C_3 \cup \cdots \mid C_1) = P(C_2 \mid C_1) + P(C_3 \mid C_1) + \cdots.
\]


1.4.2. Assume that P(C1 ∩ C2 ∩ C3) > 0. Prove that

\[
P(C_1 \cap C_2 \cap C_3 \cap C_4) = P(C_1) P(C_2 \mid C_1) P(C_3 \mid C_1 \cap C_2) P(C_4 \mid C_1 \cap C_2 \cap C_3).
\]

1.4.3. Suppose we are playing draw poker. We are dealt (from a well-shuffled deck) five cards, which contain four spades and another card of a different suit. We decide to discard the card of a different suit and draw one card from the remaining cards to complete a flush in spades (all five cards spades). Determine the probability of completing the flush.

1.4.4. From a well-shuffled deck of ordinary playing cards, four cards are turned over one at a time without replacement. What is the probability that the spades and red cards alternate?

1.4.5. A hand of 13 cards is to be dealt at random and without replacement from an ordinary deck of playing cards. Find the conditional probability that there are at least three kings in the hand given that the hand contains at least two kings.

1.4.6. A drawer contains eight different pairs of socks. If six socks are taken at random and without replacement, compute the probability that there is at least one matching pair among these six socks. Hint: Compute the probability that there is not a matching pair.

1.4.7. A pair of dice is cast until either the sum of seven or eight appears.

(a) Show that the probability of a seven before an eight is 6/11.

(b) Next, this pair of dice is cast until a seven appears twice or until each of a six and eight has appeared at least once. Show that the probability of the six and eight occurring before two sevens is 0.546.

1.4.8. In a certain factory, machines I, II, and III are all producing springs of the same length. Machines I, II, and III produce 1%, 4%, and 2% defective springs, respectively. Of the total production of springs in the factory, Machine I produces 30%, Machine II produces 25%, and Machine III produces 45%.

(a) If one spring is selected at random from the total springs produced in a given day, determine the probability that it is defective.

(b) Given that the selected spring is defective, find the conditional probability that it was produced by Machine II.

1.4.9. Bowl I contains six red chips and four blue chips. Five of these 10 chips are selected at random and without replacement and put in bowl II, which was originally empty. One chip is then drawn at random from bowl II. Given that this chip is blue, find the conditional probability that two red chips and three blue chips are transferred from bowl I to bowl II.

1.4.10. In an office there are two boxes of computer disks: Box C1 contains seven Verbatim disks and three Control Data disks, and box C2 contains two Verbatim disks and eight Control Data disks. A person is handed a box at random with prior probabilities P(C1) = 2/3 and P(C2) = 1/3, possibly due to the boxes' respective locations. A disk is then selected at random and the event C occurs if it is from Control Data. Using an equally likely assumption for each disk in the selected box, compute P(C1|C) and P(C2|C).

1.4.11. If C1 and C2 are independent events, show that the following pairs of events are also independent: (a) C1 and C2^c, (b) C1^c and C2, and (c) C1^c and C2^c. Hint: In (a), write P(C1 ∩ C2^c) = P(C1)P(C2^c | C1) = P(C1)[1 − P(C2|C1)]. From the independence of C1 and C2, P(C2|C1) = P(C2).

1.4.12. Let C1 and C2 be independent events with P(C1) = 0.6 and P(C2) = 0.3. Compute (a) P(C1 ∩ C2), (b) P(C1 ∪ C2), and (c) P(C1 ∪ C2^c).

1.4.13. Generalize Exercise 1.2.5 to obtain

\[
(C_1 \cup C_2 \cup \cdots \cup C_k)^c = C_1^c \cap C_2^c \cap \cdots \cap C_k^c.
\]

Say that C1, C2, . . . , Ck are independent events that have respective probabilities p1, p2, . . . , pk. Argue that the probability of at least one of C1, C2, . . . , Ck is equal to

\[
1 - (1 - p_1)(1 - p_2) \cdots (1 - p_k).
\]

1.4.14. Each of four persons fires one shot at a target. Let Ck denote the event that the target is hit by person k, k = 1, 2, 3, 4. If C1, C2, C3, C4 are independent and if P(C1) = P(C2) = 0.7, P(C3) = 0.9, and P(C4) = 0.4, compute the probability that (a) all of them hit the target; (b) exactly one hits the target; (c) no one hits the target; (d) at least one hits the target.

1.4.15. A bowl contains three red (R) balls and seven white (W) balls of exactly the same size and shape. Select balls successively at random and with replacement so that the events of white on the first trial, white on the second, and so on, can be assumed to be independent. In four trials, make certain assumptions and compute the probabilities of the following ordered sequences: (a) WWRW; (b) RWWW; (c) WWWR; and (d) WRWW. Compute the probability of exactly one red ball in the four trials.

1.4.16. A coin is tossed two independent times, each resulting in a tail (T) or a head (H). The sample space consists of four ordered pairs: TT, TH, HT, HH. Making certain assumptions, compute the probability of each of these ordered pairs. What is the probability of at least one head?

1.4.17. For Example 1.4.7, obtain the following probabilities. Explain what they mean in terms of the problem.

    (a) P (ND).

    (b) P (N |AD).

    (c) P (A |ND).


    (d) P (N |ND).

    1.4.18.