
Page 1: Dr. Jaffar S. Almousawi An Introduction to Statistics...E. Mall: darelbaraka@yahoo.com ISBN 978 - 9957 – 414 – 69 – 6 (كدر ) iii An Introduction to Statistics First Edition


Dr. Jaffar S. Almousawi

An Introduction

to Statistics First Edition

"This book was published with the support of the Deanship of Scientific Research at Philadelphia University"


The Hashemite Kingdom of Jordan

The Deposit Number at the National Library: (1340/4/2009)

425
Al-mousawi, Jaffar S.
An Introduction to Statistics / Jaffar S. Al-mousawi. - Amman: Dar Al-Baraka, 2009
( ) p
Deposit No.: (1340/4/2009)
Descriptors: /Statistics//Higher Education//Mathematics/

The National Library Department prepared the preliminary cataloging and classification data.

The author bears full legal responsibility for the content of his work, and this work does not express the opinion of the National Library Department or any other government body.

This book has been refereed and scientifically evaluated.

All rights reserved.

First edition, 2009

Dar Al-Baraka for Publishing and Distribution

Amman, Jordan. P.O. Box 1432, Amman 11947

Mobile: +962-79-5527822. Telefax: +962-6-5054540

E-mail: darelbaraka@yahoo.com

ISBN 978-9957-414-69-6


An Introduction

to Statistics First Edition

Dr. Jaffar Almousawi

Associate Professor of Statistics

Department of Basic Sciences and Mathematics

Faculty of Science

Philadelphia University

Amman, Jordan


To my Wife, Daughter, and Son


Preface

This is an introductory textbook for a first course in statistics and probability for undergraduate students. It is written mainly for students taking the course as a faculty requirement. Many of the methods presented in the book are fundamental to a variety of disciplines, such as engineering, computer science, IT, genetic engineering, nursing, pharmaceutics, business, and management. We have worked hard to ensure that the material is easily understood by students. The levels of both mathematics and English are kept to a minimum. We have also tried to use examples and exercises based on real data taken from Jordanian society. Toward this end, many of the examples and exercises were taken from the Statistical Yearbook, 2006, published by the Department of Statistics (DOS) in Jordan.

The book is a comprehensive 11-chapter text that provides sufficient coverage for a one-semester course. We focus on applications of statistics in the aforementioned disciplines. We strongly recommend the use of computers in class and hope to use computers to apply statistical methods to solve problems and exercises. The statistical packages that are widely used are SPSS, Minitab, and Excel. The exercises at the end of each chapter give the student the opportunity to strengthen what he/she has learned.

Among the many features for creating a better learning environment throughout the text are examples that illustrate the application of the statistical methods used, graphs, and tables. We highlight the definitions and main topics. We end each chapter with the chapter's learning outcomes and chapter's key terms. The book is divided into four parts.

Part I is an introduction. Chapter 1 familiarizes the reader with the term "statistics" and explains why we study it. We discuss types of statistics and introduce two important concepts: population and sample. Also in this chapter, we give reasons for taking a sample and introduce classes of sampling and sampling error.

Part II is devoted to various descriptive statistics. In Chapter 2, we introduce data and how data are organized. We discuss the importance of data and its types. We give some important methods that are used to organize data, as well as shapes of distributions. In Chapter 3, we summarize data numerically using measures of central tendency, measures of variability, and measures of position. We end Chapter 3 with the five-number summary and box plots. In Chapter 4, we introduce simple linear correlation and regression.

Part III covers the important subject of probability. In Chapter 5, we discuss concepts of probability where we first give some important definitions, and introduce some graphical displays and relationships between events. We state the axioms of probability, and give some of the important rules in probability such as the addition and multiplication rules. The chapter also introduces the student to conditional probability and independence. We end Chapter 5 with some important counting techniques. In Chapter 6, we introduce discrete probability distributions, where we discuss the uniform, Bernoulli, binomial, hypergeometric, and Poisson probability distributions. Chapter 7 is devoted to the normal distribution as part of the continuous distributions. We discuss the continuous uniform distribution as an easy example of continuous distributions. We then discuss the normal distribution, its probability density, the standard normal distribution, the 68.26-95.44-99.74 rule, and finally give the percentiles of the standard normal distribution.

Part IV covers statistical inference. The material has been extensively written and organized. In Chapter 8, we introduce sampling distributions, where we discuss sampling error and the need for sampling distributions. We also derive the sampling distribution of the mean and that of the proportion. In Chapter 9, we present point and interval estimation of a parameter. We develop confidence intervals about a population mean when the standard deviation of the population is known or unknown. We also give a confidence interval about the population proportion. In Chapter 10, we discuss hypotheses testing. We start with an introduction and general concepts in the one-sample case. We test hypotheses about the mean when the population standard deviation is known or unknown. We also test hypotheses about the population proportion. We introduce the p-value approach to testing hypotheses, and end the chapter by calculating the power of the test. In Chapter 11, we introduce statistical inference for the two-sample case. We develop inferences about two population means in the cases of independent and dependent samples, and end the chapter and the book with inferences about two population proportions.

Finally, we welcome any suggestions, new thoughts, and corrections of mistakes regarding what's written in this first edition of the book. Please do not hesitate to write me at the following address:

[email protected]

Acknowledgment

I would like to express my grateful appreciation to the referees, whose valuable remarks have improved the book. Appreciation and thanks are also given to Philadelphia University for its assistance in publishing this book, bringing it to the real world, and putting it in the hands of students.

Jaffar S. Almousawi

Amman, Jordan


Contents

♦ Part I Introduction

Chapter 1 Statistics: What is it?
Chapter Outline
1.1 Why Do We Study Statistics?
1.2 Types of Statistics
1.3 Two Important Concepts: Population and Sample
1.4 Reasons for Taking a Sample
1.5 Classes of Sampling and Sampling Error
Exercises
Chapter Learning Outcomes
Chapter Key Terms

♦ Part II Descriptive Statistics

Chapter 2 Data and Data Organizing
Chapter Outline
2.1 Importance of Data and Its Types
2.2 Some Methods to Organize Data
  2.2.1 Organizing Numerical Data
  2.2.2 Tabulating and Graphing Univariate Numerical Data
  2.2.3 Graphing Bivariate Numerical Data
  2.2.4 Tabulating and Graphing Univariate Categorical Data
  2.2.5 Tabulating and Graphing Univariate Categorical Data
2.3 Shapes of Distributions
Exercises
Chapter Learning Outcomes
Chapter Key Terms

Chapter 3 Summarizing Data Numerically
Chapter Outline
3.1 Measures of Central Tendency
3.2 Measures of Variability
3.3 Measures of Position
  3.3.1 The z-Score
  3.3.2 Percentiles
  3.3.3 Deciles and Quartiles
3.4 The Five-Number Summary and Box Plots
Exercises
Chapter Learning Outcomes
Chapter Key Terms

Chapter 4 Simple Linear Correlation and Regression
Chapter Outline
4.1 Scatter Plots
4.2 Simple Linear Correlation and Pearson's Correlation Coefficient
  4.2.1 Properties of r
  4.2.2 Calculating the Value of r
4.3 Simple Linear Regression and Prediction
  4.3.1 The Least-Square Criterion and Regression Equation
  4.3.2 Finding the Regression Equation
  4.3.3 Meaning of the Slope of Regression Line
  4.3.4 Using the Regression Equation to Make Predictions
4.4 The Coefficient of Determination
Exercises
Chapter Learning Outcomes
Chapter Key Terms

♦ Part III Probability Concepts and Distributions

Chapter 5 Probability Concepts
Chapter Outline
5.1 Some Definitions
5.2 Graphical Displays and Relationships Between Events
  5.2.1 Graphical Displays
  5.2.2 Relationships between Events
5.3 Axioms of Probability
5.4 The Addition Rule
5.5 The Conditional Probability
5.6 Independence and the Multiplication Rule
  5.6.1 The Multiplication Rule
  5.6.2 Independent Events
  5.6.3 The Rule of Total Probability
  5.6.4 Bayes's Rule
5.7 Some Counting Rules
Exercises
Chapter Learning Outcomes
Chapter Key Terms

Chapter 6 Discrete Probability Distribution
Chapter Outline
6.1 Discrete Random Variables
  6.1.1 Probability Distribution for Discrete Random Variable
  6.1.2 Mean and Variance for Discrete Random Variable
6.2 The Uniform Discrete Probability Distribution
6.3 The Bernoulli and Binomial Probability Distributions
6.4 The Hypergeometric Probability Distribution
6.5 The Poisson Probability Distribution
Exercises
Chapter Learning Outcomes
Chapter Key Terms

Chapter 7 The Normal Probability Distribution
Chapter Outline
7.1 Continuous Probability Distributions
7.2 The Continuous Uniform Distribution
7.3 Properties of the Normal Distribution
7.4 The Probability Density Function of the Normal Distribution and its Graph
7.5 Comparing Two or More Normal Distributions
7.6 The Standard Normal Distribution
7.7 Interpreting the Meaning of the Value of z
7.8 The 68.26-95.44-99.74 Rule
7.9 Percentiles of the Standard Normal Distribution
Exercises
Chapter Learning Outcomes
Chapter Key Terms

♦ Part IV Statistical Inference

Chapter 8 Sampling Distributions
Chapter Outline
8.1 Sampling Error and the Need for Sampling Distribution
8.2 Sampling Distribution of the Sample Mean
8.3 Sampling Distribution of the Sample Proportion
Exercises
Chapter Learning Outcomes
Chapter Key Terms

Chapter 9 Point and Interval Estimation
Chapter Outline
9.1 Point Estimation of a Parameter
9.2 Confidence Interval about the Population Mean when σ is Known
9.3 Confidence Interval about the Population Mean when σ is Unknown
9.4 Confidence Interval about Population Proportion
Exercises
Chapter Learning Outcomes
Chapter Key Terms

Chapter 10 Hypotheses Testing
Chapter Outline
10.1 Introduction and General Concepts of Hypotheses Testing in the One-Sample Case
10.2 Testing Hypotheses about the Mean when σ is Known
10.3 Testing Hypothesis about the Mean when σ is Unknown
10.4 Testing Hypothesis about Population Proportion
10.5 The p-Value
10.6 The Power of the Test
Exercises
Chapter Learning Outcomes
Chapter Key Terms

Chapter 11 Statistical Inference on Two Samples
Chapter Outline
11.1 Inference about Two Means: Independent Samples
11.2 Inference about Two Means: Dependent Samples
11.3 Inference about Two Population Proportions
Exercises
Chapter Learning Outcomes
Chapter Key Terms

Appendix
Standard Normal Tables
t-Table

Bibliography


Part I

Introduction


Chapter 1

Statistics: What is it?

What do we mean by the word statistics? What does it bring to our minds? Most people think of numerical facts, figures, and data, such as prices of oil, interest rates, unemployment, etc. However, the Cambridge Advanced Learner's Dictionary gives two definitions of the word statistics.

1. Information based on a study of the number of times something happens or is present, or other facts.

2. The science of using information discovered from studying numbers.

The American Heritage Dictionary of the English Language, Fourth Edition, 2000, gives a somewhat different definition of the word statistics: … The mathematics of collecting, organizing, and interpreting of numerical data, especially the analysis of population characteristics by inference from sampling…

Wikipedia, the free encyclopedia, defines the word as follows: statistics encompasses the collection, analysis, and interpretation of data.

Finally, Webster's New World Dictionary gives two definitions of the word statistics.

1. Numerical data assembled and classified so as to present significant information.

2. The science of compiling such data.

Statistics, however, means a lot more than what these definitions could possibly include. It is true that we statisticians assemble, classify, and tabulate data but we do a lot more than just that. We analyze data so that we can make generalizations and decisions. Moreover, an important aspect of statistics after analyzing the data is to interpret the results.


Chapter Outline

1.1 Why Do We Study Statistics?

1.2 Types of Statistics

1.3 Two Important Concepts: Population and Sample

1.4 Reasons for Taking a Sample

1.5 Classes of Sampling and Sampling Error


1.1 Why Do We Study Statistics?

Everyone (students or otherwise) should have some data literacy. Data literacy is one's ability to follow and understand numerical facts and figures. To achieve this goal, we must study statistics and become familiar with it. What do we mean by the word statistics? To answer this question we shall adopt the following definition of statistics.

Definition 1.1 Statistics

Statistics is the science that is concerned with collecting, organizing, analyzing, and interpreting data.

Why do we need statistics when we study a problem in any area or discipline? There are many reasons for the need for statistics. We mention only four important ones below.

• To present different types of data (numerical and non-numerical) in a proper way.

• To use a sample rather than the whole population.

• To make the correct decision at the right time.

• To predict what will happen.

1.2 Types of Statistics

Statistics, in general, is divided into two major types: descriptive statistics and inferential statistics.

Definition 1.2 Descriptive Statistics

Descriptive statistics consists of methods and tools for organizing and summarizing the data.

Descriptive statistics includes the presentation of data in tables and graphs; nowadays, it also includes summarizing data by different descriptive measures.

Definition 1.3 Inferential Statistics

Inferential statistics is the use of sample data to learn about population parameter(s) of interest.

In inferential statistics we take a sample and study it rather than taking the whole population of interest.


1.3 Two Important Concepts: Population and Sample

In the study of statistics we must realize and clearly understand two important basic concepts, that is, the definitions of population and sample.

Definition 1.4 Population

A population is the set of all people (or things) that we want to study or have numerical information about.

Definition 1.5 Sample

A sample is a part (or subset) of the population from which we actually collect data and draw some conclusions about the population.

Definition 1.6 Sampling

Sampling is the act, process, or technique of selecting a sample from the population for the purpose of determining parameter(s) or characteristics of the population.

In Figure 1.1 we show the relationship between a population and a sample from that population.

Figure 1.1 Relationship between population and sample

Example 1.1 The relationship between population and sample

Let us take the population to be all students currently enrolled at Philadelphia University. The morning class registered in the course Math 210231, Introduction to Probability and Statistics, from 8:10 to 9:00 S T Th is a sample from this population. The afternoon class from 12:45 to 2:00 in the same course is another sample from the population of all students of the university.


1.4 Reasons for Taking a Sample

There would be no need for statistical theory if the whole population, rather than a sample, were always used to obtain information about populations. Considering the whole population may not be practical and is almost never economical. Thus, the question "why do we take a sample instead of considering the whole population?" arises naturally. In general, there are six main reasons for sampling instead of considering the whole population under study. These reasons are given below.

1. The economic advantage of using a sample. A sample requires fewer resources than the whole population and therefore costs less.

2. Less time. A sample may provide us with the needed information quickly. For example, suppose you are a physician and a disease has broken out in a small village within the area of your work. Assume that the disease is contagious, that it is killing people, and that nobody knows what it is. You must do something to help people, so you think of conducting quick tests. If you test the population of all those affected, they may be long dead by the time you arrive with results. In this case, just a few of those already infected could be used to provide the required information.

3. The large size of many populations. Many populations about which inferences must be made are quite large. For example, consider the population of children 3 to 5 years old. This group could number in the millions. In such a case, selecting a sample may be the only way to get the required information about the children.

4. Inaccessibility of some populations. Some populations are so difficult to access that only a sample can be used. Examples include sick people in hospitals, criminals in prisons, and crashed airplanes in the sea.

5. The destructive nature of the observations we deal with. In some instances, observing a unit leads to destroying it. For example, to test the quality of a bullet, it must be fired and thus destroyed.

6. Accuracy. A sample may be more accurate than a study of the whole population. A carefully obtained sample can provide more reliable information than a sloppily conducted study of the whole population.


1.5 Classes of Sampling and Sampling Error

There are two classes of sampling, namely, non-probability sampling and probability sampling. As the name indicates, non-probability sampling is not based on probability theory, while probability sampling is. We will consider the types of samples that comprise each class. Figure 1.2 shows the classes of sampling.

Figure 1.2 Classes of sampling

1.5.1 Non-Probability Sampling

This class of sampling procedures has five kinds of samples. These samples are:

1. Convenient Sample. This sample includes whoever is available as its units. For example, students of the morning class of Math 210231.

2. Purposive Sample. In this sample, units are selected based upon judgment. For example, the most vocal people at a public meeting.

3. Snowball Sample. This sample is taken in such a way that new respondents are selected based upon the recommendation of existing respondents. Rapport with the selected units is therefore important. For example, members of an activist group.

4. Quota Sample. This sample is widely used in opinion polling and market research. The selection of respondents is left to the interviewer. Interviewers are each given a quota of subjects of a specified type to attempt to recruit. For example, an interviewer might be told to go out and select 10 adult men, 10 adult women, 5 teenage boys, and 5 teenage girls so that they could be interviewed about the number of hours they watch TV every week.

5. Key Informant Sample. This kind of sample selects insiders who know much about the phenomenon of interest. For example, mayors and councilors asked to speak about the residents of a small town.


1.5.2 Probability Sampling

This second class of sampling procedures has the properties of being representative, avoiding bias, and, most importantly, giving an equal chance of selection to all members of the population. This last property makes the selection of units purely random. For example, if you want to select 10 students randomly from a population of 200, you can write their names on 200 pieces of paper, one name per piece, fold the papers up, mix them thoroughly, and then choose 10. In this case, every student has an equal chance of being selected. We will consider four kinds of samples for this class that are widely used in applications. We assume that the population has N units and the sample has n units.

1. Simple Random Sample (SRS). A simple random sample is obtained by choosing units in such a way that each unit in the population has an equal chance of being selected. An SRS is free from sampling bias, and the method of selection could be with or without replacement. An example is the selection of 10 students out of 200 mentioned earlier.

2. Systematic Random Sample. Suppose that you have N units in the population. First, number the units of the population from 1 to N. A systematic random sample is obtained by selecting one unit at random and choosing additional units at evenly spaced intervals until the desired number of units is obtained. So we need to decide our sample size n and divide N by n to get the sampling interval k. We then take a unit at random from the first k units and every kth unit thereafter. For example, suppose that N = 64 and n = 8; then k = 8. If the unit drawn at random is number 6, then the subsequent units are 14, 22, 30, 38, 46, 54, and 62.

3. Stratified Random Sample. A stratified random sample is obtained by first dividing the population of N units into L subpopulations of N1, N2,…, NL units such that N = N1 + N2 + … + NL. These subpopulations are also called "strata" and must be nonoverlapping; that is, they have no common units. Then, we take a simple random sample of size ni from each stratum of size Ni, i = 1, 2,…, L, to get the stratified random sample whose size is n = n1 + n2 +…+ nL.

In general, the size of the sample in each stratum is taken proportional to the size of that stratum. This is called "Proportional Allocation". For example, suppose the Department of Mathematics at Philadelphia University has the following numbers of students by year of study: N1 = 90 students in year 1, N2 = 63 in year 2, N3 = 18 in year 3, and N4 = 9 in year 4. Suppose that you were asked to take a stratified random sample of 40 students. You calculate the total number of students in all four years (strata) of study to get N = 180. Then you would calculate the percentage of students in each year (stratum) to get the following results.


% of students in first year = (90/180)*100% = 50%,

% of students in second year = (63/180)*100% = 35%,

% of students in third year = (18/180)*100% = 10%,

% of students in fourth year = (9/180)*100% = 5%.

The above calculated percentages represent the weight of each year of study and the sample size you would take from each year should be proportional to this weight. Therefore

n1 = 50% of 40 = 20,

n2 = 35% of 40 = 14,

n3 = 10% of 40 = 4, and

n4 = 5% of 40 = 2.

4. Cluster Random Sample. A cluster sample is used when a natural grouping is evident in the population. We divide the population into groups or "clusters" and select a simple random sample of clusters. The cluster sample will consist of those units included in the selected clusters. No unit from any other cluster is added to the sample. This is how this method of sampling differs from stratified sampling, where units are selected from each group.

As an example, suppose we are interested in a cluster sample of 200 households in a large town. Suppose that the town contains 20,000 households, all listed in convenient records. Suppose that the town is divided into 400 areas (clusters) with 50 households in each cluster; then we could randomly select 4 areas and include all households in those areas.
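The four probability sampling schemes described above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard `random` module: the function names, the fixed seed, and the way households are represented as integers are assumptions made here, not part of the text. The numbers reuse the examples above (10 students out of 200, N = 64 with n = 8, the four years of study, and 4 areas out of 400).

```python
import random


def simple_random_sample(units, n, rng):
    """Draw n units without replacement; each unit is equally likely."""
    return rng.sample(units, n)


def systematic_sample(N, n, rng, start=None):
    """Take a random start among the first k units, then every k-th unit."""
    k = N // n  # sampling interval
    if start is None:
        start = rng.randint(1, k)  # random start in 1..k (inclusive)
    return [start + i * k for i in range(n)]


def proportional_allocation(strata_sizes, n):
    """Split a total sample size n across strata in proportion to their sizes."""
    N = sum(strata_sizes)
    return [round(n * Ni / N) for Ni in strata_sizes]


def cluster_sample(clusters, m, rng):
    """Select m whole clusters at random and keep every unit inside them."""
    chosen = rng.sample(clusters, m)
    return [unit for cluster in chosen for unit in cluster]


rng = random.Random(2009)  # arbitrary fixed seed so the run is repeatable

# Simple random sample: 10 students out of 200 (students numbered 1..200).
srs = simple_random_sample(list(range(1, 201)), 10, rng)

# Systematic sample: N = 64, n = 8, so k = 8; the start is forced to 6
# here only to reproduce the worked example in the text.
systematic = systematic_sample(64, 8, rng, start=6)

# Stratified sample with proportional allocation: sizes 90, 63, 18, 9; n = 40.
allocation = proportional_allocation([90, 63, 18, 9], 40)

# Cluster sample: 400 areas of 50 households each; select 4 whole areas.
areas = [list(range(a * 50, (a + 1) * 50)) for a in range(400)]
households = cluster_sample(areas, 4, rng)

print(srs)
print(systematic)
print(allocation)
print(len(households))
```

Note that rounding in `proportional_allocation` happens to give whole numbers that add up to 40 for this example; with other stratum sizes the rounded values may not sum exactly to n and would need a small adjustment.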

1.5.3 Sampling Error

A sample is expected to be like a mirror of the population from which it is taken. However, there is no guarantee that any sample will be precisely representative of the population from which it comes. Chance may play a role in including untypical observations in the sample. What makes a sample unrepresentative of its population? One of the frequent causes is sampling error.

Definition 1.7 Sampling Error

Sampling error is an error which arises because the data are collected from a part rather than the whole of the population. It is usually measurable from the sample data in the case of probability sampling.


Sampling error comprises the differences between the sample and the population that are due solely to the particular units that happen to be selected. Sampling error is, in effect, the price we have to pay when we use a sample rather than the whole population.

For example, suppose that a sample of 25 students of Philadelphia University is selected, each is asked about his or her place of residence, and all are found to reside in Amman. It is clear, even without any statistical proof, that this is a highly unrepresentative sample that would lead to invalid conclusions. Because of sampling error, sample survey results should be seen only as estimates of the true values.

As mentioned earlier, sampling errors can be calculated for probability samples but they cannot be determined for non-probability samples. Sampling errors can be estimated from the sample itself using the standard error (SE), a quantity of great importance which we discuss later on.
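The text defers the details of the standard error to a later chapter. As a preview only, the sketch below assumes a simple random sample and the common formula SE = s/√n for the standard error of the sample mean, where s is the sample standard deviation; the formula, the function name, and the weights are assumptions introduced here for illustration, not material from this chapter.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """Estimate the standard error of the sample mean as s / sqrt(n)."""
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return s / math.sqrt(len(sample))


# Hypothetical weights (kg) of 8 sampled children, made up for illustration.
weights = [22.5, 24.0, 23.1, 25.4, 21.8, 24.6, 23.9, 22.7]

print(statistics.mean(weights))
print(standard_error_of_mean(weights))
```

The smaller the standard error, the closer the sample mean is expected to be to the population mean; the quantity shrinks as the sample size n grows.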

Definition 1.8 Non-Sampling Error

Non-sampling error is the error arising during the course of all survey activities other than sampling. Unlike sampling error, non-sampling error can be present in both sample surveys and censuses.

Non-sampling errors can be classified into two groups: random errors and systematic errors.

• Random errors are the unpredictable errors resulting from estimation. They are generally cancelled out if a large enough sample is used. However, when these errors do take effect, they often lead to increased variability in the characteristic of interest.

• Systematic errors are those errors that tend to accumulate over the entire sample. For example, if there is an error in the questionnaire design, this could cause problems with the respondent's answers, which in turn, can create processing error, etc. These types of errors often lead to a bias in the final results.
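The difference between the two kinds of non-sampling error can be illustrated with a small simulation. This is a hedged sketch, not a method from the text: the true value, the noise level, the size of the bias, and the function name `measure` are all hypothetical choices. Random errors are modeled as zero-mean noise, and a systematic error as a constant shift added to every recorded value.

```python
import random
import statistics


def measure(true_value, n, rng, noise_sd=2.0, bias=0.0):
    """Simulate n recorded values: truth + constant bias + random noise."""
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


rng = random.Random(1)  # arbitrary fixed seed so the run is repeatable
true_value = 50.0       # hypothetical true value of the characteristic

# Random errors only: the noise tends to cancel out in a large sample.
random_only = measure(true_value, 10_000, rng)

# Systematic error: every value is shifted by the same amount (here 1.5).
with_bias = measure(true_value, 10_000, rng, bias=1.5)

print(round(statistics.mean(random_only), 2))
print(round(statistics.mean(with_bias), 2))
print(round(statistics.stdev(random_only), 2))
```

In a run like this, the mean of the first sample should land close to the true value while the individual values remain variable, and the mean of the second sample should stay shifted by roughly the bias no matter how large the sample is.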


Exercises

1.1 Give an example where statistics is used. Clearly define the population of interest and the sample you intend to take from this population.

1.2 Give an example of a descriptive study, and an inferential study.

1.3 Suppose that the Health Department is interested in knowing the average weight (in kg) of the children in the age group from 6 to 8 years. Can you identify the population of interest? Can you suggest a method of sampling to get a sample of, say, 50 children? Do you have any idea about the variable of interest?

1.4 Consider the library at your university. What do you call the group of all books in the library? What do you call the group of books that contain only the mathematics books?

1.5 Suppose that the Registrar Office in the University is interested in knowing the Grade Point Average (GPA) of the students in the Department of Mathematics. Can you identify the population of interest? Can you suggest a method of sampling to get a sample of, say, 50 students? Do you have any idea about the variable of interest?

1.6 Suppose that a warehouse contains 50 tires of different brands and you want to select a simple random sample of only 4 tires. Suggest a way to do that.

1.7 For simplicity, suppose that a population has size N = 100 and you want to take a systematic sample of size 4. How would you get the sample? What will you do? Explain.

1.8 The following statistics are taken from the Department of Statistics (DOS) statistical yearbook, 2006. Classify these statistics as descriptive or inferential.

Total population (in 1000)          5,600
Population density (person/km²)     63.1
Population growth rate (%)          2.3
Rate of natural increase (%)        2.1
Population doubling time (years)    30

1.9 A study published in 2007 attempted to estimate the proportion of Jordanians who read a newspaper every morning. 963 persons were interviewed. Define population and sample.

1.10 An instructor of an introductory course in statistics thinks that 5% of the students taking the course fail. To check this assumption, he takes a random sample of 37 students who have taken the course before. Define population and sample and compute the proportion of failures in the sample.


Chapter Learning Outcomes

When you complete your careful study of this chapter you should be able to:

1. define statistics and understand the exact meaning of the word.

2. realize the relationship between statistics and data.

3. understand the importance of statistics in real world problems.

4. distinguish between two types of statistics: descriptive and inferential.

5. know the definition of two important concepts: population and sample.

6. learn the reasons behind taking a sample instead of considering the whole population.

7. get an idea about classes of sampling.

8. distinguish between probability and non-probability sampling.

9. describe convenient sample, purposive sample, snowball sample, quota sample, and key informant sample.

10. describe simple random sample, systematic sample, stratified random sample, and cluster sample.

11. learn about sampling and non-sampling errors.

Chapter Key Terms

Cluster sample
Descriptive statistics
Inferential statistics
Non-probability sampling
Non-sampling error
Population
Probability sampling
Random sampling
Representative sample
Sample
Sampling
Sampling error


Part II

Descriptive Statistics


Chapter 2

Data and Data Organizing

Recall that statistics is the science concerned with collecting, organizing, analyzing, and interpreting data. This definition makes the strong relationship between statistics and data clear. What does data mean? Webster's New World Dictionary gives the following definition of the word data.

Data are facts or figures from which conclusions can be drawn.

In everyday language, data is a synonym for information. In the sciences, however, there is a clear distinction between the two: data are measurements that may be disorganized, and once the data are organized they become information. Data, data collection, and data organizing are important in a variety of areas and disciplines, such as industry, agriculture, the economy, markets and businesses, health sciences, social sciences, politics, and sports.

Chapter Outline

2.1 Importance of Data and Its Types

2.2 Some Methods to Organize Data

2.3 Shapes of Distributions


2.1 Importance of Data and Its Types

Data are as important as words. No matter what we study or what our area of specialization is, it is important to know how to deal with data and how to follow and understand numerical facts and figures. We must know how to handle data in order to make useful and successful decisions to:

• plan for short or long term strategies,

• offer better services,

• learn about new crop varieties or medicines, or

• get better quality and have more reliability.

Data are classified as either quantitative or qualitative. Quantitative data could be either discrete or continuous, whereas qualitative data could be either nominal or ordinal.

Data are collected to get an idea about something we are interested in. For example, suppose that we are interested in the grades of students in a given test. These grades vary from one student to another; they are not constant. This is why we call the grades a variable. Examples of variables are names, weights, speeds, days of the week, and number of children in a family.

Definition 2.1 Variable

A variable is a characteristic or quantity that varies from one thing to another.

Variables can be classified into:

1. Numerical variables. Numerical variables, which are also called quantitative variables, are variables that take numerical values. They are of two types: discrete variables and continuous variables.

2. Non-numerical variables. Non-numerical variables, which are also called qualitative variables (or categorical variables), are variables that place the individuals (or observations) in categories. They are also of two types: nominal variables and ordinal variables.

Numerical (or quantitative) variables, in general, are classified according to the number of values which they can assume.

Definition 2.2 Discrete Variable

Discrete variables are numerical variables that can assume either a finite or a countably infinite number of values.


Definition 2.3 Continuous Variable

Continuous variables are numerical variables that assume values in intervals of the real line.

Example 2.1 Some Discrete and Continuous variables

As examples of finite discrete variables, consider the number of children in a family, the number of students who get 95 or above in the final test, and the number of olive trees that have a yield greater than 60 kg.

For countably infinite discrete variables, consider the number of cars passing over a bridge in the period 9:00 am-1:00 am, the number of telephone calls at a switchboard, and the number of people ahead of you waiting for service from a bank teller.

For examples of continuous variables, consider time, weight, height, area, volume, speed, and probability.

Non-numerical (qualitative or categorical) variables are those variables that place individuals (or observations) in categories or classes, as mentioned above.

Definition 2.4 Nominal Variable

A qualitative variable is called nominal if the order of its categories is not important.

Definition 2.5 Ordinal Variable

A qualitative variable is called ordinal if the order of its categories is important.

Example 2.2 Some Nominal and Ordinal Variables

Blood type (A, B, AB, and O), gender (male and female), country of birth (Jordan, Palestine, and Egypt), and race (black and white) are all examples of nominal variables. However, year in college (year 1, year 2, year 3, and year 4), days of the week (Sunday, Monday, … , Friday), and degrees of agreement (do not agree, agree weakly, agree, agree strongly) are all examples of ordinal variables.

Figure 2.1 illustrates kinds of variables.


Figure 2.1: Kinds of variables

2.2 Some Methods to Organize Data

After we collect the raw data about the problem or study at hand, we need to organize it, that is, to present it in a simple, easy, and understandable way. By raw data we mean data before any organizing or treatment. Raw data are a treasure that holds a large amount of useful information if treated in a proper manner. Just like the gold and diamonds found inside mountains and hills, data can hold a lot of valuable information.

We can organize data using methods that depend on tables and graphs. These methods help us to better understand the phenomenon or problem under study. For graphing we use two coordinate axes: the x-axis to represent classes or intervals and the y-axis to represent the frequencies of the classes. There are many graphs that one could use; we shall discuss those that we think are the most important and most frequently used.

2.2.1 Organizing numerical data.

2.2.2 Tabulating and graphing univariate numerical data.

2.2.3 Graphing bivariate numerical data.

2.2.4 Tabulating and graphing univariate categorical data.

2.2.5 Tabulating and graphing bivariate categorical data.

2.2.1 Organizing Numerical Data

We can use a dot diagram, ordered data (an ordered array), or a stem-and-leaf diagram. We discuss each method with an example.


Dot Diagram

A dot diagram is a graph of the values at hand along the x-axis, where each value is represented by a dot. If a value is repeated, its dots are stacked on top of each other. The dot diagram visually summarizes the information about the problem under study and is used when the number of observations is small. A dot diagram is good at detecting outliers, and when multiple samples are plotted in the same dot diagram it shows the differences between the samples.

Example 2.3 Dot Diagram for Grades of Students

Suppose that the final grades of a sample of n = 17 students in one of the courses are given below.

82 79 79 82 80 80 82 85 60 70 78 80 88 83 79 87 78

Figure 2.2 shows the dot diagram for this data.

Figure 2.2 Dot diagram of grades
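As a rough illustration, the dot diagram of Example 2.3 can be approximated in plain text. The sketch below is only one possible rendering: it is rotated, so the values run down the page and the dots extend to the right, and the function name `dot_rows` is a hypothetical helper, not part of the book.

```python
from collections import Counter

# Final grades of the n = 17 students from Example 2.3.
grades = [82, 79, 79, 82, 80, 80, 82, 85, 60, 70, 78, 80, 88, 83, 79, 87, 78]


def dot_rows(values):
    """Return one text row per distinct value, with one dot per occurrence."""
    counts = Counter(values)
    return [f"{value:>3} | " + "." * counts[value] for value in sorted(counts)]


rows = dot_rows(grades)
for row in rows:
    print(row)
```

The isolated rows for 60 and 70, far from the cluster between 78 and 88, are the kind of possible outliers that a dot diagram makes easy to spot.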

Ordered Data

Data could be ordered (or ranked) in two ways. Ordering (or ranking) could either be in an ascending order (from smallest to largest) or in a descending order (from largest to smallest). When data are ordered, it becomes easier to pick out extremes, typical values, and concentration of values.

Example 2.4 Ordered Data for Grades of Students

Refer to the grades of the n = 17 students in Example 2.3. Order the data in ascending order.

The raw data are

82 79 79 82 80 80 82 85 60 70 78 80 88 83 79 87 78

These are the data as collected before any treatment. The data ordered in ascending order would look like the following.

60 70 78 78 79 79 79 80 80 80 82 82 82 83 85 87 88
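The same ordering can be checked with a few lines of code. This is a minimal sketch using Python's built-in `sorted`; the variable names are illustrative only.

```python
# Raw grades of Example 2.4, in the order they were collected.
grades = [82, 79, 79, 82, 80, 80, 82, 85, 60, 70, 78, 80, 88, 83, 79, 87, 78]

ascending = sorted(grades)                 # smallest to largest
descending = sorted(grades, reverse=True)  # largest to smallest

print("Ascending: ", ascending)
print("Descending:", descending)
print("Extremes:  ", ascending[0], "and", ascending[-1])
```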


Stem-and-Leaf Diagram

The stem-and-leaf diagram is a valuable tool for organizing a set of data and understanding how the values are distributed and clustered over the range of the observations. A stem-and-leaf diagram can be constructed as follows:

1. Regard each data point as having two parts, a stem part and a leaf part, respectively. The stem part consists of the number formed by all but the rightmost digit of the number, and the leaf part consists of the rightmost digit. Thus the stem of the number 76 is 7, the leaf is 6 and the stem of the number 762 is 76 and the leaf is 2.

2. Write the smallest stem in your data set in the upper left-hand corner of the diagram.

3. Write the second stem, which equals the first stem + 1, below the first stem.

4. Repeat step 3 until you get the largest stem in your data set.

5. Draw a vertical bar to the right of the column of stems.

6. For each number in your data set, find the appropriate stem and write its leaf to the right of the vertical bar.

7. To get an ordered stem-and-leaf diagram, put the leaves of each stem in ascending order.

8. To get two lines for each stem, take the first line with leaf digits 0-4 and the second line with leaf digits 5-9.

Example 2.5 Stem-and-Leaf Diagram for Weights of Students

The weights (in kg) for a sample of 53 high school students are listed below. Construct a stem-and-leaf diagram, an ordered stem-and-leaf diagram, and a two-line ordered stem-and-leaf diagram.

88 88 66 83 51 82 81 81 81 79

61 58 91 55 82 50 49 48 46 38

61 61 62 66 83 66 65 65 68 68

69 70 71 71 76 96 75 75 98 71

88 89 58 92 95 75 97 98 74 105

40 42 44


Solution

Figure 2.3 shows a stem-and-leaf diagram for the weights data set, Figure 2.4 shows an ordered stem-and-leaf diagram, and Figure 2.5 shows a two-line ordered stem-and-leaf diagram.
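The construction steps can also be sketched in code. The following is a minimal illustration, assuming non-negative integer data so that the leaf is the last digit and the stem is everything before it; the helper names `stem_and_leaf`, `two_lines`, and `show` are hypothetical and not taken from the book.

```python
from collections import defaultdict

# Weights (in kg) of the 53 students in Example 2.5.
weights = [88, 88, 66, 83, 51, 82, 81, 81, 81, 79,
           61, 58, 91, 55, 82, 50, 49, 48, 46, 38,
           61, 61, 62, 66, 83, 66, 65, 65, 68, 68,
           69, 70, 71, 71, 76, 96, 75, 75, 98, 71,
           88, 89, 58, 92, 95, 75, 97, 98, 74, 105,
           40, 42, 44]


def stem_and_leaf(values, ordered=False):
    """Map every stem from the smallest to the largest to its list of leaves."""
    leaves_by_stem = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)  # e.g. 762 -> stem 76, leaf 2
        leaves_by_stem[stem].append(leaf)
    display = {}
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = leaves_by_stem.get(stem, [])
        display[stem] = sorted(leaves) if ordered else list(leaves)
    return display


def two_lines(display):
    """Split each stem into a line for leaves 0-4 and a line for leaves 5-9."""
    rows = []
    for stem, leaves in display.items():
        rows.append((stem, [leaf for leaf in leaves if leaf <= 4]))
        rows.append((stem, [leaf for leaf in leaves if leaf >= 5]))
    return rows


def show(rows):
    """Print (stem, leaves) pairs with a vertical bar between stem and leaves."""
    for stem, leaves in rows:
        print(f"{stem:>2} | " + " ".join(str(leaf) for leaf in leaves))


unordered = stem_and_leaf(weights)
ordered = stem_and_leaf(weights, ordered=True)
show(unordered.items())
print()
show(ordered.items())
print()
show(two_lines(ordered))
```

The three printed displays correspond to the plain, ordered, and two-line ordered diagrams requested in the example.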

2.2.2 Tabulating and Graphing Univariate Numerical Data

Dot diagrams, ordered data, and stem-and-leaf diagrams are good methods to display a data set when the number of observations is small. As this number gets large, it becomes necessary to further condense the data into an appropriate summary table in order to properly present, analyze, and interpret the results. We wish to arrange the data into classes or intervals according to convenient divisions of the range of observations. Such an arrangement of data in a table is called a frequency distribution.

Definition 2.6 Frequency

The frequency of a particular observation is the number of times the observation occurs in the data set.

Frequency is usually denoted by f.

Definition 2.7 Distribution

The distribution of a variable is the pattern of frequencies of the observations.

Definition 2.8 Frequency Distribution

The frequency distribution is a table used to arrange data. It contains two columns, the first column is used for numerically ordered classes (or intervals) and the second column is used to denote the frequency that corresponds to each class; that is, the number of observations in that class.


The idea behind the frequency distribution is to put "similar" observations in the same class. Experts of the subject under study can determine what is meant by "similar" observations. Examples are assigning letter grade for numerical grades, the problem of blood pressure, etc.

Frequency distributions can show either the actual number of observations falling in each class (the frequency of each class) or the percentage of observations in each class. In the latter case, the distribution is called a relative frequency distribution.

Frequency distributions can be used for both types of variables, quantitative and qualitative. Continuous variables should only be used with class intervals, which will be explained later.

Unlike stem-and-leaf diagrams, frequency distributions sacrifice some of the information. Instead of knowing the exact value of each observation, we only know that it belongs to a certain class or category. However, this kind of grouping often brings out important features of the data and is very efficient if the size of the data set is large. In this case, the dot diagram and stem-and-leaf procedures are not appropriate.

How to Construct a Frequency Distribution: In this book we will give a procedure that can be used for constructing a frequency distribution. Using this procedure there are some steps that we must follow. These steps are summarized in the following:

• Selecting the Number of Classes. The number of classes to be selected depends on the number of observations: a larger number of observations requires a larger number of classes. In general, we use 5 to 15 classes. The number of classes affects the amount of information gained from the frequency distribution; with too few or too many classes, little new information will be learned. Some authors use a rule, which we do not consider mandatory: if we let n be the number of observations in a data set and k be the number of classes, then

$k^2 \geq n.$

• Obtaining the Class Width. A desirable feature to have in the frequency distribution is to have the same width for all classes. To determine the width of each class interval, we must first compute the range of the data. The range = largest value - smallest value. Then we divide the range by the number of classes desired and take the first integer greater than this result (or the ceiling). For example, suppose that the result of dividing the range by the number of classes turns out to be 5.2, 5.5, or 5.9 then we take the width of the class to be 6.


• Setting Up Class Boundaries. The boundaries of each class should be clearly defined so that each observation falls in exactly one class. Overlapping of classes must never happen.

Example 2.6 Constructing Frequency Distribution Table

The grades (out of 100) in the final examination of a group of 30 students are given as follows:

75 96 40 42 85 50 88 52 48 54

72 61 31 50 28 64 36 66 53 55

85 50 22 62 34 76 46 76 69 43

To construct a frequency distribution, we must first decide on the number of classes or intervals. Let us take 8 intervals for this example. Next, we find the class width. To do so, we first find the range of the data: range = 96 – 22 = 74. Now, we divide the range (74) by the number of classes (8), and the result is 74/8 = 9.25. Therefore, we take the class width equal to 10. Finally, we set the class boundaries as 20 to less than 30, 30 to less than 40, … , 90 to less than 100. Note that the class limits are given to as many decimal places as the original data. Since the original data in our example have no decimal places, neither do the class limits.
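The arithmetic of this example can be sketched as follows. This is only an illustration under stated assumptions: the width is taken as the ceiling of range divided by the number of classes, and the starting boundary of 20 is chosen by hand, as in the example. The function names are hypothetical.

```python
import math

# Final examination grades of the 30 students in Example 2.6.
grades = [75, 96, 40, 42, 85, 50, 88, 52, 48, 54,
          72, 61, 31, 50, 28, 64, 36, 66, 53, 55,
          85, 50, 22, 62, 34, 76, 46, 76, 69, 43]


def class_width(values, num_classes):
    """Range divided by the number of classes, rounded up to an integer."""
    data_range = max(values) - min(values)
    return math.ceil(data_range / num_classes)


def class_boundaries(start, width, num_classes):
    """Non-overlapping classes [lower, upper), each of the same width."""
    return [(start + i * width, start + (i + 1) * width)
            for i in range(num_classes)]


width = class_width(grades, num_classes=8)      # range 74, 74/8 = 9.25
boundaries = class_boundaries(start=20, width=width, num_classes=8)

print("Range:", max(grades) - min(grades))
print("Class width:", width)
for lower, upper in boundaries:
    print(f"{lower} to less than {upper}")
```

Treating each class as closed on the left and open on the right is what guarantees that every grade falls in exactly one class.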

Remember, there should always be enough classes so that the smallest and largest values have been included. Also remember that the classes must be non-overlapping.

Note that there are many ways to write classes or intervals. An equivalent way to write the classes of the grades example is to write the classes as $20 \le x \le 29$, $30 \le x \le 39$, $40 \le x \le 49$, ... , $90 \le x \le 100$. Yet another way to write the classes, used especially when the variable is of the continuous type, is to write the classes as 20 -, 30 -, …, 90 -.

The frequency distribution should temporarily have three columns: one for the classes, one for tallying, and one for the frequency of each class. We drop the tally column after we finish constructing the frequency distribution, since the frequency column carries the same information. Table 2.1 is a frequency distribution for the grades of the 30 students in the final examination.


Table 2.1 Grades for 30 students in the final examination

Table 2.2 gives the frequency distribution table after the second column for tallying has been omitted.

Table 2.2 Frequency distribution for the grades of 30 students in the final examination

Grade Interval         Frequency (f)
20 to less than 30     2
30 to less than 40     3
40 to less than 50     5
50 to less than 60     7
60 to less than 70     5
70 to less than 80     4
80 to less than 90     3
90 to less than 100    1
Total                  30

The class midpoint is the point midway between the boundaries of a class; it is also called the class mark. So, the class midpoints for the above frequency distribution are 25, 35, … , 95.

Relative Frequency and Percentage Frequency Distributions

In addition to the frequency of a class, we are often interested in its relative frequency. The relative frequency of the ith class is found by dividing the frequency (fi) of class i by the total number of observations (n). For example, in the final examination grades example, the relative frequency of the first class (20 to less than 30) is 2/30 ≈ 0.07, the relative frequency of the second class (30 to less than 40) is 3/30 = 0.10, and so on for the rest of the classes.
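The frequencies of Table 2.2, the class midpoints, and the relative frequencies described above can be reproduced with a short sketch. It assumes classes of the form "lower to less than upper" with a width of 10 starting at 20, as in Example 2.6; the function name `frequency_table` and the dictionary keys are illustrative only.

```python
# Final examination grades of the 30 students in Example 2.6.
grades = [75, 96, 40, 42, 85, 50, 88, 52, 48, 54,
          72, 61, 31, 50, 28, 64, 36, 66, 53, 55,
          85, 50, 22, 62, 34, 76, 46, 76, 69, 43]

# Classes 20 to less than 30, 30 to less than 40, ..., 90 to less than 100.
classes = [(lower, lower + 10) for lower in range(20, 100, 10)]


def frequency_table(values, classes):
    """One row per class: midpoint, frequency f, and relative frequency f/n."""
    n = len(values)
    rows = []
    for lower, upper in classes:
        f = sum(1 for value in values if lower <= value < upper)
        rows.append({"class": (lower, upper),
                     "midpoint": (lower + upper) / 2,
                     "f": f,
                     "relative": round(f / n, 2)})
    return rows


table = frequency_table(grades, classes)
for row in table:
    lower, upper = row["class"]
    print(f"{lower} to less than {upper}: midpoint {row['midpoint']:.0f}, "
          f"f = {row['f']}, relative frequency = {row['relative']:.2f}, "
          f"percentage = {row['relative'] * 100:.0f}%")
```

Multiplying each relative frequency by 100 gives the corresponding percentage frequency.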


When the relative frequency of a class is multiplied by 100, it becomes a percentage. For example, multiplying the relative frequency of the first class by 100 gives 0.07 × 100 = 7%, for the second class 0.10 × 100 = 10%, and so on. Table 2.3 displays the frequency, relative frequency, and percentage frequency distributions for the student grades example.

Table 2.3 Relative frequency and percentage frequency for the grade of 30 students in the final exam

Classes                Frequency (f)   Relative Frequency   Percentage Frequency
20 to less than 30     2               0.07                 7
30 to less than 40     3               0.10                 10
40 to less than 50     5               0.17                 17
50 to less than 60     7               0.23                 23
60 to less than 70     5               0.17                 17
70 to less than 80     4               0.13                 13
80 to less than 90     3               0.10                 10
90 to less than 100    1               0.03                 3
Total                  30              1.00                 100%

Frequency distributions, along with relative and percentage frequencies, can provide important information about the problem under study. From Table 2.3, we can say that

• 7% of the students obtained grades of 20 to less than 30 in the final examination, and

• the probability (an important concept of this book, which will be discussed in detail later on) that a randomly selected student has a grade in the range from 20 to less than 30 is 0.07.

The Cumulative Distribution

The cumulative distribution table is another useful method to present data. It can be constructed from the frequency distribution, the relative frequency distribution, or the percentage distribution. As the name indicates, the cumulative frequency of a class is the sum of its frequency and the frequencies of all preceding classes. Therefore, the cumulative frequency of the first class is always equal to its frequency; the cumulative frequency of the second class is the sum of the frequencies of the first and second classes; and so on. The last class always has a cumulative frequency equal to the total number of observations, n.

For example, referring to the student grades example, the cumulative frequency of the first class (20 to less than 30) is 2; the cumulative frequency of the second class (30 to less than 40) is 2 + 3 = 5, and so on. The last class (90 to less than 100) has a cumulative frequency equal to 30. Table 2.4 displays the cumulative frequencies for the student grades example.

Table 2.4 Cumulative frequency for the grade of 30 students in the final exam

Classes                Frequency (f)   Cumulative Frequency
20 to less than 30     2               2
30 to less than 40     3               5
40 to less than 50     5               10
50 to less than 60     7               17
60 to less than 70     5               22
70 to less than 80     4               26
80 to less than 90     3               29
90 to less than 100    1               30
Total                  30
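The running sums in Table 2.4 can be checked with a minimal sketch. It uses `itertools.accumulate` from the Python standard library; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies from Table 2.2, in class order (20-30, 30-40, ..., 90-100).
lower_limits = [20, 30, 40, 50, 60, 70, 80, 90]
frequencies = [2, 3, 5, 7, 5, 4, 3, 1]

# Each cumulative frequency is the class frequency plus all preceding ones.
cumulative = list(accumulate(frequencies))

for lower, f, cf in zip(lower_limits, frequencies, cumulative):
    print(f"{lower} to less than {lower + 10}: f = {f}, cumulative = {cf}")
```

The last cumulative frequency equals n = 30, which is a quick check that no class was missed.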

There are several forms of cumulative distributions: the "less than", "or less", "more than", and "or more" cumulative distributions. We illustrate the idea of the "or less" cumulative frequency distribution in the example that follows.

Example 2.7 An Example to Illustrate the Idea of "OR Less" Cumulative Distribution

A production engineer collected data on the number of defective TV sets in each of 30 batches, where each batch contains 100 TVs. The data are as follows:

0 3 0 0 3 0 2 2 0 1 2 1 0 0 1 2 4 0 4 2 1 0 1 0 0 2 0 1 3 2

1. Construct a frequency distribution using classes based on a single value, then

2. Construct an "or less" cumulative distribution.

To construct a frequency distribution using classes based on a single value, we proceed in the usual manner we discussed earlier in constructing frequency distribution. The case for a frequency distribution using classes based on a single value is even easier, because we do not have to worry about number of classes and class width. Table 2.5 displays the classes, in this case the single numerical values 0, 1, 2, 3, and 4.


Table 2.5 Frequency distribution for defective TVs

Number of defective TVs   Frequency
0                         12
1                         6
2                         7
3                         3
4                         2
Total                     30
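Both parts of Example 2.7 can be sketched in a few lines. This is an illustration only: it builds classes from the distinct values that actually occur in the data, so a value with zero occurrences would not get a row of its own.

```python
from collections import Counter
from itertools import accumulate

# Number of defective TVs in each of the 30 batches of Example 2.7.
defects = [0, 3, 0, 0, 3, 0, 2, 2, 0, 1, 2, 1, 0, 0, 1,
           2, 4, 0, 4, 2, 1, 0, 1, 0, 0, 2, 0, 1, 3, 2]

counts = Counter(defects)
values = sorted(counts)                      # single-value classes 0, 1, 2, 3, 4
frequencies = [counts[value] for value in values]

# "Or less" cumulative distribution: batches with at most `value` defective TVs.
or_less = dict(zip(values, accumulate(frequencies)))

for value, f in zip(values, frequencies):
    print(f"{value} defective: f = {f}, {value} or less: {or_less[value]}")
```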

Table 2.6 displays an "or less" cumulative distribution for the data in Table 2.5.

Table 2.6 An "or less" cumulative distribution for defective TVs

Number of defective TVs   Cumulative Frequency
0 or less                 12
1 or less                 18
2 or less                 25
3 or less                 28
4 or less                 30

The Histograms

The properties of frequency distributions, with regard to their shapes, are best exhibited by means of a graph. Indeed, we often employ graphic techniques to add more meaning to the description of a set of data. In particular, histograms are used to describe numerical data that have been grouped into frequency, relative frequency, or percentage frequency distributions.

A histogram is a graph of the frequency distribution. It is a vertical bar chart in which the rectangular bars are constructed at the limits of each class.

Some observations about histograms that must be noted are:

• There are no gaps between bars.

• The height of each bar is equal to the frequency of the class the bar represents.

• The width of each bar extends from the lower limit to the upper limit of the class the bar represents.

• Each axis of the histogram must be labeled (given a name), and the whole graph must be given a title.

Example 2.8 Construct a Histogram for the Data of the Grades of Students in Table 2.2

We draw the x-axis and label it grade, and the y-axis and label it frequency. The first interval (20 to less than 30) has a frequency of 2, so we draw a bar (or rectangle) that extends from 20 to 30 with a height of 2. The second interval (30 to less than 40) has a frequency of 3, so we draw another bar that extends from 30 to 40 with a height of 3, and so on until we have covered all the intervals of Table 2.2. When we finish, the histogram will look like Figure 2.6 below.

[Histogram: x-axis Grade, from 20 to 100; y-axis Frequency, from 0 to 7; Mean = 56.97, Std. Dev. = 18.993, N = 30]

Figure 2.6 A histogram for the data of the grades of students
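A rough text version of this histogram can be produced from the class frequencies of Table 2.2. The sketch below is rotated relative to the figure, so the bars run horizontally, with one `#` per observation; the function name `text_histogram` is a hypothetical helper.

```python
# Class lower limits and frequencies from Table 2.2 (class width 10).
lower_limits = [20, 30, 40, 50, 60, 70, 80, 90]
frequencies = [2, 3, 5, 7, 5, 4, 3, 1]


def text_histogram(lower_limits, frequencies, width=10):
    """Return one bar per class; the bar length equals the class frequency."""
    return [f"{lower:>2}-{lower + width:<3} | " + "#" * f
            for lower, f in zip(lower_limits, frequencies)]


bars = text_histogram(lower_limits, frequencies)
print("Grade  | Frequency")
for bar in bars:
    print(bar)
```

Even in this crude form, the longest bar sits in the 50 to 60 class and the bars shorten on both sides of it.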

How Can Frequency Histograms Be Useful?

Just like stem-and-leaf diagrams, histograms show the shape of the distribution of the observations in a data set. If properly constructed, with neither too few nor too many intervals, a histogram allows us to determine from the overall heights of the bars whether the distribution of our data is bell-shaped, right-skewed, left-skewed, or none of these. The histogram in Figure 2.6 looks fairly symmetric, or bell-shaped. Histograms are also useful in identifying outliers. Finally, if a histogram is symmetric around a value, that value equals the average (which will be discussed later). In this case, half the area under the histogram lies to the left of that value and half to the right. The average of the data in Figure 2.6 is approximately 57.

2.2.3 Graphing Bivariate Numerical Data

Bivariate data are data measured on two variables; that is, the measurements are taken on two variables such as weight and height, age of car and price, blood pressure and age of patients, etc. In this case we have a sample of n pairs of values (x1, y1), (x2, y2), … , (xn, yn). To graph bivariate numerical data we use a graph called a scatter plot.


A scatter plot is a graph of the points (x1, y1), (x2, y2), … , (xn, yn) on the xy-plane. It shows how the data are scattered so that we can visualize any apparent relationship between the two variables x and y.

Example 2.9 An Example that Illustrates Graphing Numerical Bivariate Data Using Scatter Plot

The following table gives the area (in km²) and population density (persons per km²) by governorate in Jordan*.

Governorate   Area (x)   Population density (y)
Amman         7,579      286.7
Balqa         1,119      335.3
Zarqa         4,761      175.3
Madaba        940        148.9
Irbid         1,572      634.1
Mafraq        26,541     9.9
Jarash        410        409.7
Ajlun         420        306.7
Karak         3,495      62.5
Tafiela       2,209      35.5
Ma'an         32,832     3.2
Aqaba         6,900      17.0

*Source: Statistical Yearbook, DOS, 2006.

To graph this set of bivariate data using a scatter plot, we put the area on the x-axis and the population density on the y-axis. The resulting graph is shown in Figure 2.7 below. Scatter plots are widely used in regression analysis, where the predictor variable is put on the x-axis and the response variable on the y-axis.

Figure 2.7 Scatter plot for area and population density in Jordan

The scatter plot in Figure 2.7 shows the relationship between the two variables area (in km²) and population density (in persons per km²): as area increases, the population density tends to decrease.


2.2.4 Tabulating and Graphing Univariate Categorical Data

By univariate categorical data we mean data taken on one categorical variable. To tabulate data of this type we use a summary table, see Figure 2.8, in which the data are presented by a tally (or count). A summary table for categorical data is similar to the frequency distribution table we used earlier for numerical data.

Figure 2.8 Methods used in tabulating and graphing univariate categorical data

To illustrate the idea of a summary table, let us consider the following example.

Example 2.10 Tabulating Univariate Categorical Data

Suppose that the morning class 8:10-9:00 of the course Introduction to Statistics has the following students classified according to class level (that is, the year of study).

Junior Senior Senior Sophomore Freshman

Freshman Junior Freshman Senior Senior

Sophomore Junior Sophomore Freshman Sophomore

Junior Sophomore Sophomore Freshman Sophomore

The variable class level is a qualitative (categorical) variable of the ordinal type. Therefore, to tabulate the data given in the above table, we construct a summary table (see Table 2.7) that consists of two columns. Column one is labeled class level and column two gives the counts (frequencies).


Table 2.7 Summary table for students in statistics class

Class Level    Frequency
Freshman       5
Sophomore      7
Junior         4
Senior         4
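The tally behind a summary table like Table 2.7 can also be done by machine. The following Python lines are a minimal sketch, not part of the original example; the variable names (levels, order, summary) are chosen here for illustration only.

```python
from collections import Counter

# Class levels of the 20 students listed in Example 2.10.
levels = (
    "Junior Senior Senior Sophomore Freshman "
    "Freshman Junior Freshman Senior Senior "
    "Sophomore Junior Sophomore Freshman Sophomore "
    "Junior Sophomore Sophomore Freshman Sophomore"
).split()

# Natural order of the ordinal categories (used only for display).
order = ["Freshman", "Sophomore", "Junior", "Senior"]

counts = Counter(levels)  # tally of each category
summary = [(level, counts[level]) for level in order]

for level, frequency in summary:
    print(f"{level:<10} {frequency}")
```

The printed rows should agree with the frequencies in Table 2.7.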

Graphing Univariate Categorical Data

To graph univariate categorical data, we use bar charts, pie charts, or Pareto diagrams. We discuss each of these methods of graphing next.

Bar Graphs

One of the most widely used methods for displaying categorical data is the bar graph. It is often used in exploratory data analysis to illustrate the major features of the distribution of the data in a convenient way. A bar graph looks like a histogram except that its bars, which are of the same width, do not touch each other. In a bar graph, each category is depicted by a bar whose length is the frequency (or percentage) of observations in that category. The length, and hence the area, of each rectangle is proportional to the frequency of the category it represents. Bar graphs can be drawn horizontally or vertically.

(a) Vertical Bar Chart (b) Horizontal Bar Chart

Figure 2.9 Bar graph for data in Table 2.8

From Figure 2.9 we observe that the bar graph allows us to directly compare the class levels. The tallest bar is for sophomores, followed by the bar for freshmen, followed by two equal bars for juniors and seniors.

Pie Charts

Categorical distributions are often presented by means of pie charts, in which a circle is divided into sectors proportional in size to the frequencies, relative frequencies, or percentages with which the data are distributed among the categories. Thus, the categories are shown as different slices of a pie.


As an example of constructing a pie chart, we use the class level data and the relative frequencies of the categories, as given in Table 2.8.
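The arithmetic behind such a pie chart can be sketched in a few lines of Python. This is only an illustration under the assumption that each sector angle is the relative frequency times 360°; the names frequencies, relative, and angles are chosen here and do not come from the text.

```python
# Frequencies of the four class levels (class-level data of Example 2.10).
frequencies = {"Freshman": 5, "Sophomore": 7, "Junior": 4, "Senior": 4}

total = sum(frequencies.values())

# Relative frequency of each category and the matching sector angle in degrees.
relative = {level: f / total for level, f in frequencies.items()}
angles = {level: 360 * f / total for level, f in frequencies.items()}

for level in frequencies:
    print(f"{level:<10} {relative[level]:.2f} {angles[level]:.0f} degrees")
```

The angles always add up to 360°, which is a quick check that no category was left out.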

Table 2.8 Frequency and Relative Frequency Distributions for Class Level Data

Class Level    Frequency    Relative Frequency
Freshman       5            0.25
Sophomore      7            0.35
Junior         4            0.20
Senior         4            0.20

In this case we need to divide the circle into four pie-shaped pieces comprising 25%, 35%, 20%, and 20% of the circle. To do this we can use the fact that there are 360° in a circle and the help of a protractor. The first piece of the circle is obtained by marking off 90° (25% of 360°), the second piece by marking off 126° (35% of 360°), and the third and fourth pieces by marking off 72° (20% of 360°) each. The pie chart for the relative frequency distribution in Table 2.8 is shown in Figure 2.10.

Figure 2.10 Class level pie chart

Pareto Diagrams

The Pareto diagram is named after Vilfredo Pareto, a 19th century Italian economist who postulated that a large share of wealth is owned by a small percentage of the population. This basic principle translates well into quality problems: most quality problems result from a small number of causes. Quality experts often refer to this principle as the 80-20 rule; that is, 80% of problems are caused by 20% of potential causes.

The Pareto diagram is a special type of bar graph where the values being plotted are arranged in descending order. Typically, the left vertical axis shows the frequency of occurrence, but it can alternatively represent cost or another important unit of measure. The right vertical axis shows the cumulative percentage of the total number of occurrences, total cost, or total of the particular unit of measure. Again, the purpose of the Pareto diagram is to highlight the most important "vital few" among a "typically large" set of factors.

We consider the following example to demonstrate the idea of the Pareto diagram. A manufacturing company produces plastic bottles for the dairy industry. Some of the bottles are rejected for poor quality. The causes of poor quality are given below for 500 plastic bottles that were rejected.

Problem             Frequency
Discoloration       30
Thickness           120
Broken Handle       90
Fault in Plastic    220
Labeling            40
Total               500

Now we write the categories of the problem in plastic bottles in a descending order with respect to their frequencies. In this way we get the following table:

Problem             Frequency
Fault in Plastic    220
Thickness           120
Broken Handle       90
Labeling            40
Discoloration       30
Total               500

From the above frequency distribution we construct both relative frequency distribution and cumulative relative frequency distribution. In doing so, we get the following table:

Problem             Frequency    Relative Frequency    Cumulative Relative Frequency
Fault in Plastic    220          0.44                  0.44
Thickness           120          0.24                  0.68
Broken Handle       90           0.18                  0.86
Labeling            40           0.08                  0.94
Discoloration       30           0.06                  1.00
Total               500          1.00 (100%)
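The two steps just described, sorting the categories in descending order and accumulating the relative frequencies, can be sketched in Python as follows. This is an illustrative sketch only; the names problems, ordered, and table are assumptions made here.

```python
from itertools import accumulate

# Rejection causes and their frequencies for the 500 rejected bottles.
problems = {
    "Discoloration": 30,
    "Thickness": 120,
    "Broken Handle": 90,
    "Fault in Plastic": 220,
    "Labeling": 40,
}

total = sum(problems.values())

# Step 1: sort the categories in descending order of frequency.
ordered = sorted(problems.items(), key=lambda item: item[1], reverse=True)

# Step 2: running totals of the counts, then divide by the grand total.
cumulative = list(accumulate(f for _, f in ordered))
table = [
    (name, f, f / total, c / total)
    for (name, f), c in zip(ordered, cumulative)
]

for name, f, rel, cum_rel in table:
    print(f"{name:<17} {f:>4} {rel:.2f} {cum_rel:.2f}")
```

Accumulating the integer counts first and dividing afterwards keeps the last cumulative relative frequency at exactly 1.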


Then, we draw two vertical axes, one to the left where we put the frequency, and the other to the right where we put the cumulative percentage. We also draw a horizontal line where we put a bar for each category in descending order of frequency. Finally we draw what is called a cumulative polygon on the same scale. A cumulative polygon is drawn by placing a dot at the relative cumulative frequency of each category and then connecting each two consecutive dots with a straight line.

Figure 2.11 below depicts the Pareto diagram for the plastic bottles problem example.

Figure 2.11 Pareto diagram for the plastic bottles problem

How Can Pareto Diagrams Be Useful?

Look for a break point on the relative cumulative frequency polygon. It can be identified by a marked change in the slope of the graph. This point separates the vital few from the trivial many. In our example, we see from the lengths of the vertical bars that 90 out of the 500 bottles are classified as having broken handles. From the cumulative polygon we see that 86% of the rejected bottles are classified as having a fault in plastic, a thickness problem, or a broken handle.

2.2.5 Tabulating and Graphing Bivariate Categorical Data

By bivariate categorical data we mean data taken on two categorical variables. To tabulate data of this type we use a two-way summary table called the r x c contingency table. Contingency tables are also used for numerical quantitative variables.


The Contingency Table or the Two-Way Table is a table used to classify population (or sample) observations according to two characteristics.

It is composed of r rows cross-classified by c columns. This is why it has r by c cells.

The rows represent the classes of one variable, and the columns represent the classes of the other.

One variable is randomly assigned to the rows and the other variable to the columns.

Each of the r by c cells represents the number of observations with a specific value for each of the two variables.

These cells are referred to by numbers. The (i, j)-th cell is the cell in the ith row and the jth column, where i = 1, 2, … , r and j = 1, 2, … , c.

It is customary to get the following totals in a contingency table:

1. Total number of observations in each row. These totals are displayed in the right margins and are called row marginal totals.

2. Total number of observations in each column. These totals are displayed in the bottom margins and are called column marginal totals.

3. Total number of observations in the r by c cells. This total is displayed in the lower right-hand corner of the table and is called the grand total.

To illustrate the idea of a contingency table, let us consider the following example.

Example 2.11 The r x c Contingency Table

Suppose that the morning class 8:10-9:00 of the course Introduction to Statistics has the following students classified according to class level and major.

Class Level    Major      Class Level    Major
Junior         Genetic    Freshman       Genetic
Freshman       IT         Junior         IT
Sophomore      IT         Sophomore      Genetic
Senior         IT         Junior         IT
Junior         IT         Junior         Genetic
Junior         IT         Freshman       IT
Junior         Genetic    Junior         IT
Freshman       IT         Sophomore      Genetic
Sophomore      IT         Senior         IT
Senior         Genetic    Junior         Genetic
Junior         Genetic    Freshman       IT
Freshman       IT         Senior         IT
Sophomore      Genetic    Sophomore      Genetic
Junior         IT         Junior         IT
Freshman       Genetic    Junior         IT
Junior         IT         Freshman       IT
Sophomore      Genetic    Senior         Genetic
Senior         IT         Freshman       Genetic
Senior         Genetic    Senior         IT
Freshman       Genetic    Sophomore      Genetic

The first categorical variable in the above data is Class Level, which has 4 categories; namely, Freshman, Sophomore, Junior, and Senior. The second categorical variable is Major, which has 2 categories: IT and Genetic. We arbitrarily choose one of the two variables, say Major, for the rows and the other variable, Class Level, for the columns. This way we have a 2 x 4 contingency table, shown in Table 2.9 below.

Table 2.9 A 2 x 4 contingency table for data in Example 2.11

                       Class Level
Major      Freshman    Sophomore    Junior    Senior    Total
IT         6           2            9         5         22
Genetic    4           6            5         3         18
Total      10          8            14        8         40

The number in the (i, j)-th cell represents the frequency resulting from the intersection of the ith row with the jth column. For example, the 6 in cell (1, 1) means that there are 6 students whose Major is IT and whose Class Level is Freshman. The 5 in cell (2, 3) means that there are 5 students whose Major is Genetic and whose Class Level is Junior. There are three kinds of totals in this contingency table: row totals, column totals, and the grand total. The row totals are the totals of the two rows: IT (22 = 6 + 2 + 9 + 5) and Genetic (18 = 4 + 6 + 5 + 3). The column totals are the totals of the four columns: Freshman (10 = 6 + 4), Sophomore (8 = 2 + 6), Junior (14 = 9 + 5), and Senior (8 = 5 + 3). Finally, the grand total is the sum of the two row totals or, equivalently, the sum of the four column totals. The grand total can also be found by summing the frequencies of all eight cells of the table.
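The cross-classification and the three kinds of totals can be reproduced with a short Python sketch. The abbreviations (F, So, J, Se for the class levels; I and G for IT and Genetic) and all variable names are introduced here purely for illustration.

```python
from collections import Counter

# (class level : major) for the 40 students of Example 2.11.
# F/So/J/Se = Freshman/Sophomore/Junior/Senior, I/G = IT/Genetic.
raw = """
J:G F:G F:I J:I So:I So:G Se:I J:I J:I J:G
J:I F:I J:G J:I F:I So:G So:I Se:I Se:G J:G
J:G F:I F:I Se:I So:G So:G J:I J:I F:G J:I
J:I F:I So:G Se:G Se:I F:G Se:G Se:I F:G So:G
"""
pairs = [tuple(token.split(":")) for token in raw.split()]

levels = ["F", "So", "J", "Se"]  # columns
majors = ["I", "G"]              # rows

cell = Counter(pairs)  # cell[(level, major)] is the count in one cell

# One list of cell counts per row, followed by the marginal and grand totals.
table = {m: [cell[(lv, m)] for lv in levels] for m in majors}
row_totals = {m: sum(table[m]) for m in majors}
col_totals = [sum(table[m][j] for m in majors) for j in range(len(levels))]
grand_total = sum(row_totals.values())

for m in majors:
    print(m, table[m], row_totals[m])
print("Total", col_totals, grand_total)
```

Summing the row totals and summing the column totals give the same grand total, as stated in the text.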

Side-by-Side Bar Chart

The side-by-side bar chart is a graphic display used to visualize bivariate categorical data in an r x c contingency table. It is best used when our primary interest is showing differences in magnitude rather than differences in percentages. Figure 2.12 is the side-by-side bar chart for the data given in Table 2.9.

Figure 2.12 Side-by-side bar chart for data of class level and major

2.3 Shapes of Distributions

An important property of any set of data is its shape, meaning how the data look. Distributions have different shapes; they don't all look alike. Describing the shape of the data is a first step in understanding the distribution of the data. Shapes of distributions can be described graphically by the methods we learned in section 2.2, such as dot diagrams, stem-and-leaf diagrams, or histograms. We will classify distributions as symmetric or asymmetric.

• A distribution is said to be symmetric if its right-hand side is a mirror image of its left-hand side. If you fold the distribution in the middle, the two sides will match perfectly (Figure 2.13 (a)).

• A distribution is said to be asymmetric if it is not symmetric. One class of asymmetric distributions of interest is the class of skewed distributions. These have a long tail either to the right or to the left. If the long tail is to the right, the distribution is called skewed to the right (or positively skewed) (Figure 2.13 (b)). On the other hand, if the long tail is to the left, the distribution is called skewed to the left (or negatively skewed) (Figure 2.13 (c)).

Figure 2.13 Comparing three different shapes


Some Common Shapes of Distributions

1. Bimodal (see Figure 2.14 (a))

2. Rectangular (see Figure 2.14 (b)).

3. S-Shaped (see Figure 2.14 (c)).

4. Reversed J (see Figure 2.14 (d)).

Figure 2.14 Some common distributions


Exercises

2.1 Classify each of the following variables as numerical (discrete or continuous) or non-numerical (nominal or ordinal).

Color, time, grade, political affiliation, height, opinion, weight, shifts of work, marital status, highest degree father earned, months of the year, area, volume, race, Jordan governorates.

2.2 A survey was taken in a city. In each of 20 households, people were asked how many TV sets they have. The results were recorded as follows:

1 2 1 0 3 2 2 3

4 0 1 1 1 3 2 2

1 4 0 0

Order the data and graph a dot diagram.

2.3 The ages, in years, for a random sample of 30 chess players are given below.

80 66 76 63 57 65 55 65 66 72 80

66 92 54 48 36 45 60 50 62 78 61

66 67 55 69 59 59 55 43

Draw stem-and-leaf and ordered stem-and-leaf diagrams for this data.

2.4 Refer to the data given in Exercise 2.3. Construct frequency distribution and relative frequency distributions for the ages of chess players. Use 6 classes 35-44, 45-54, … , 85-94. Draw histograms.

2.5 Refer again to the data in Exercise 2.3. This time construct cumulative frequency of the type "or less". Graph your findings.

2.6 Population (in 1,000) by sex for selected years in Jordan is given in the following table*:

Year    Male       Female     Total
1952    301.7      284.5      586.2
1961    469.4      431.4      900.8
1979    1,115.8    1,017.2    2,133.0
1994    2,160.7    1,978.7    4,139.4
1996    2,270.7    2,079.3    4,350.0
1997    2,328.1    2,131.9    4,460.0
1998    2,385.5    2,184.5    4,570.0
1999    2,448.2    2,241.8    4,690.0
2000    2,482.3    2,337.7    4,820.0
2001    2,544.1    2,359.9    4,940.0
2002    2,611.0    2,459.0    5,070.0
2003    2,678.0    2,522.0    5,200.0
2004    2,275.7    2,592.3    5,350.0
2005    2,281.1    2,651.9    5,473.0
2006    2,886.6    2,713.4    5,600.0

*Source: Jordan in Figures, DOS, 2006.

Draw three scatter plots for this data: year and male, year and female, and year and total. Do you see any relationship between the two variables you graph?

2.7 Obtain stem-and-leaf and dot diagrams for the following set of data. Comment on the shape.

66 9 62 21 11 59 25 39

24 21 19 67 71 124 67 21

4 82 32 91 152 20 23 40

108 5 63 1 10 125


2.8 The following table represents the percent distribution of employed Jordanians age 15+ years by employment status*:

Employment status       M        F        Total
Paid employee           82.2     94.3     83.8
Employer                7.3      1.9      6.6
Self employed           9.9      2.5      8.9
Unpaid family worker    0.5      0.8      0.5
Unpaid worker           0.2      0.4      0.2
Total                   100.0    100.0    100.0

*Source: Jordan in Figures, DOS, 2006.

a. Draw a pie chart for each of the males, females, and total.

b. Construct a bar graph for the total.

c. Construct a side-by-side bar graph for males and females by employment status.

2.9 The following table represents transit goods (1000 tons) passing through Aqaba port by country of destination in 2006*. What is the variable of interest?

Country of destination    Tons
Iraq                      460.5
Syria                     32.8
S. Arabia                 107.7
Lebanon                   3.4
Kuwait                    19.9
Others                    30.4

*Source: Statistical Yearbook, DOS, 2006.

2.10 The following table represents the percent distribution of Jordanians age 15+ years by educational level and sex*:

Educational level       M %      F %      Total %
Illiterate              5.1      13.7     9.3
Less than secondary     58.3     49.1     53.7
Secondary               17.5     18.8     18.2
Intermediate diploma    6.3      9.8      8.0
Bachelor and above      12.8     8.5      10.7
Total                   100.0    100.0    100.0

*Source: Jordan in Figures, DOS, 2006.

a. Draw a pie chart for each of the males, females, and total.

b. Construct a bar graph for the total.

c. Construct a side-by-side bar graph for males and females by educational level.

2.11 The following table represents the electricity peak load (in G.W.H.), 1998 – 2006*. What is the variable?

Year Peak load

1998 1060

1999 1137

2000 1238

2001 1255

2002 1410

2003 1428

2004 1555

2005 1751

2006 1901

*Source: Statistical Yearbook, DOS, 2006.


2.12 Hospitals keep records of injury incidents arriving at the emergency room. The following is a data set that contains 66 injury incidents in a given day.

Sp Co Co Sp Fr Co

Fr Sp Fr Co St Sp

Co St Sp Fr Sp Fr

St Sp St Sp Co Co

Co Co Sp Fr Fr Fr

Fr Sp Co Sp Sp Sp

St Fr Sp St Co Fr

Co Sp Fr Sp Sp St

Fr Co Sp Co Fr Co

St St St Sp Co St

Co Fr Sp Fr Fr Co

where,

Sp = sprain, Co = contusion, Fr = fracture, and St = strain.

a. Construct a summary table for the type of injury incident.

b. Draw a bar chart.

c. Draw a pie chart. Remember, for a pie chart you need the relative frequency distribution.

d. Draw a Pareto diagram. Remember, for a Pareto diagram you need the cumulative relative frequency.

2.13 Hospitals keep records of injury incidents arriving at the emergency room. The following is a data set that contains 66 injury incidents in a given day, classified according to gender: male and female.

Inj. Sex Inj. Sex Inj. Sex Inj. Sex

Sp M Sp M Sp M Sp M

Fr F Co M Fr F Co F

Co M Fr F Sp M Sp M

St F Sp M Co F Fr M

Co M Fr F St M Co F

Fr F Sp F Fr M Fr M

St F St M Co F Co F

Co M Sp F Fr M Sp M

Fr F Co M Sp F Fr M

St M Sp F St M Co M

Co F Fr M Sp M Fr F

Co M Fr F Co F Sp M

Sp M St F Sp M Fr F

St M Sp M Fr F St M

Sp F Co M Sp M Co M

Co F Fr M St M St M

Sp M Sp M

where,

Sp = sprain, Co = contusion, Fr = fracture, and St = strain, M = male, and F = female.

a. Construct a contingency table for this set of data.

b. Draw a side-by-side bar chart.


2.14 The bar graph below shows the percentage distribution of Jordanians aged 15+ years by education level*:

*Source: Jordan in Figures, DOS, 2006.

Approximately, what percentage of Jordanians has less than secondary school education?

2.15 The following pie chart shows area (in km²) by region in Jordan*:

*Source: Jordan in Figures, DOS, 2006.

If the total area of the Kingdom is 88,778 km², determine the area of the north region.

2.16 Identify the shape of the distribution:

a.

b.

Exercise 2.16 continued

c.

d.

10 5 4

20 4 1

30 6 2 3 8

40 0 4 7 9 2 7

50 5 1 8 3 4 0 8 8

60 6 2 2 1 7 8 7 4 5 3

70 5 0 2 4 1


Chapter Learning Outcomes

When you complete your careful study of this chapter you should be able to:

1. know why data are important.
2. classify data and variables as either quantitative (numerical) or qualitative (non-numerical).
3. see the difference between discrete and continuous data and variables, and the difference between nominal and ordinal data and variables.
4. graph univariate quantitative data using a dot diagram and a stem-and-leaf diagram when the set of data is small.
5. tabulate and graph quantitative data using frequency distributions and histograms when the set of data is large.
6. construct relative frequency distributions and histograms and appreciate their usefulness.
7. learn about cumulative distributions and see how useful they are.
8. graph bivariate numerical data using a scatter plot.
9. tabulate and graph univariate qualitative (categorical) data using summary tables, bar graphs, pie charts, and Pareto diagrams.
10. tabulate and graph bivariate qualitative (categorical) data using contingency tables and side-by-side bar charts.
11. identify the shape of the distribution of a data set and classify distributions as being symmetric or asymmetric.
12. have an idea about the graphical representation of some common shapes of distributions.

Chapter Key Terms

Asymmetric, Bar graph, Bivariate data, Contingency table, Continuous variable, Cumulative distribution, Data, Data organizing, Descriptive statistics, Discrete variable, Distribution, Dot diagram, Frequency, Frequency distribution, Frequency histogram, Nominal variable, Normal, Numerical data, Ordered data, Ordinal variable, Pareto diagram, Pie chart, Scatter plot, Shape, Skewed to the left distribution, Skewed to the right distribution, Stem-and-leaf, Symmetric, Univariate data, Variable


Chapter 3

Summarizing Data Numerically

In the preceding chapter we learned how to present numerical and non-numerical (categorical) data in tables and graphs. The question now is "How do we make sense of these data and the information they contain?" It is true that presenting the data in a proper manner is essential to data analysis, but it is not enough; it should be accompanied by computing and summarizing the important features and analyzing the findings.

In this chapter we discuss some important descriptive measures that are found from any data set. These measures include measures of central tendency, measures of variation, measures of position, and finally we discuss the five-number summary and boxplots.

Chapter Outline

3.1 Measures of Central Tendency

3.2 Measures of Variation

3.3 Measures of Position

3.4 The Five-Number Summary and Box Plots


3.1 Measures of Central Tendency

Figure 3.1 Measures of central tendency

Measures of central tendency are statistical terms used for describing the typical middle or centre of a distribution. To analyze a set of data, we often try to find a number or data item that can represent the whole set. These numbers or pieces of data are called measures of central tendency. In other words, measures of central tendency summarize a sample or a population by a single value. The three most commonly used measures of central tendency are the mean, the median, and the mode (see Figure 3.1).

Before we discuss these measures, let us introduce an important notation used very often in statistics, the Σ notation. The Greek letter Σ (read: sigma) means 'add up'.

• Σx means add up all of the values for the variable x.

• Σy means add up all the values for the variable y.

• Σx² means add up all the values of x after squaring them.

• (Σx)² means add up all the values of x first and then square the result.

• Σ(x - y)² means subtract each value of y from the corresponding value of x, square the difference, and then add up.

Example 3.1 Getting to Be Familiar with the Σ Notation

Suppose that a set of data contains the following values of the two variables x and y.

x    3    2    7    9
y    1    4    5    2

Then

Σx = 3 + 2 + 7 + 9 = 21,

Σy = 1 + 4 + 5 + 2 = 12,

Σx² = 3² + 2² + 7² + 9² = 9 + 4 + 49 + 81 = 143,

Σy² = 1² + 4² + 5² + 2² = 1 + 16 + 25 + 4 = 46,

Σ(x – y) = (3 – 1) + (2 – 4) + (7 – 5) + (9 – 2) = 2 + (-2) + 2 + 7 = 9,

Σ(x – y)² = (3 – 1)² + (2 – 4)² + (7 – 5)² + (9 – 2)² = 2² + (-2)² + 2² + 7² = 4 + 4 + 4 + 49 = 61.
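The sums in this example map directly onto code. The short Python sketch below, with variable names chosen here only for illustration, also shows (Σx)², which the worked example does not evaluate, to stress that squaring before adding and adding before squaring give different results.

```python
x = [3, 2, 7, 9]
y = [1, 4, 5, 2]

sum_x = sum(x)                                         # Σx
sum_y = sum(y)                                         # Σy
sum_x_sq = sum(v ** 2 for v in x)                      # Σx²: square first, then add
sq_sum_x = sum(x) ** 2                                 # (Σx)²: add first, then square
sum_diff = sum(a - b for a, b in zip(x, y))            # Σ(x − y)
sum_diff_sq = sum((a - b) ** 2 for a, b in zip(x, y))  # Σ(x − y)²

print(sum_x, sum_y, sum_x_sq, sq_sum_x, sum_diff, sum_diff_sq)
```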

The Mean

The mean, also called the arithmetic mean, is the most commonly used measure of central tendency. It is calculated by adding up all the observations in a data set and then dividing the result by the number of observations. Thus, for a sample that contains n observations x1, x2, …, xn, the mean, denoted by the symbol x̄ (read: x bar), is calculated as

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Example 3.2 Calculating the Arithmetic Mean

Calculate the mean for the following data:

11 17 18 10 22 23 15 17

14 13 10 12 18 18 11 14

Solution

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{11 + 17 + \cdots + 14}{16} = \frac{243}{16} = 15.1875$$

Note that the calculation of the mean is based on all the observations of the data set and that the computed value of the mean is not necessarily one of the values in the sample. In addition, 9 of the 16 values in the data set are smaller than the value of the arithmetic mean; this is also obvious from the dot diagram we present in Figure 3.2.
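As a quick check of this calculation, the following Python sketch computes the mean from its definition and counts the observations that fall below it. The variable names are illustrative; statistics.mean from the standard library is used only as a cross-check.

```python
from statistics import mean

# Data of Example 3.2.
data = [11, 17, 18, 10, 22, 23, 15, 17,
        14, 13, 10, 12, 18, 18, 11, 14]

x_bar = sum(data) / len(data)              # add up the observations, divide by n
below = sum(1 for v in data if v < x_bar)  # observations smaller than the mean

print(x_bar, below, mean(data))
```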

Figure 3.2 A Dot Diagram for the Data Given in Example 3.2

From Figure 3.2 it appears that the mean acts as a balancing point so that smaller observations balance out larger ones. To better understand this point, let us introduce what are called deviations. The deviation is the difference between an observation xi and the mean x̄, that is, xi - x̄. If we calculate the deviations of all the points, we see that these deviations have negative signs for observations smaller than the mean and positive signs for observations larger than the mean. The sum of the negative deviations is always equal in magnitude to the sum of the positive deviations, so the deviations add up to zero. Symbolically, this important result is written as

$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0.$$

Further explanation of this property is given in Example 3.3.

Example 3.3 Calculating the Deviations

Refer to the data in Example 3.2; calculate the deviations and check to see that

$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0.$$

Solution

After calculating the value of the arithmetic mean, we usually make a table that contains two columns, namely, xi and xi - x̄.

xi       xi - x̄
11       -4.1875
17       1.8125
18       2.8125
10       -5.1875
22       6.8125
23       7.8125
15       -0.1875
17       1.8125
14       -1.1875
13       -2.1875
10       -5.1875
12       -3.1875
18       2.8125
18       2.8125
11       -4.1875
14       -1.1875
Total    0

The sum of negative deviations is:
-4.1875 - 5.1875 - 0.1875 - 1.1875 - 2.1875 - 5.1875 - 3.1875 - 4.1875 - 1.1875 = -26.6875

The sum of positive deviations is:
1.8125 + 2.8125 + 6.8125 + 7.8125 + 1.8125 + 2.8125 + 2.8125 = 26.6875

Total: -26.6875 + 26.6875 = 0
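The balancing property can be verified numerically. The Python sketch below, with illustrative variable names, splits the deviations into negative and positive parts. For this data set the mean and every deviation are exact multiples of 1/16, so floating-point arithmetic gives an exact zero; for other data one should expect only a value very close to zero.

```python
# Data of Example 3.2.
data = [11, 17, 18, 10, 22, 23, 15, 17,
        14, 13, 10, 12, 18, 18, 11, 14]

x_bar = sum(data) / len(data)
deviations = [v - x_bar for v in data]

negative = sum(d for d in deviations if d < 0)  # observations below the mean
positive = sum(d for d in deviations if d > 0)  # observations above the mean

print(negative, positive, sum(deviations))
```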

The Population Mean

Suppose that the population has N observations x1, x2, … , xN. Then the population mean, which is denoted by the Greek letter µ (read: mu), is calculated by adding up all the observations and then dividing the result by N. That is

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}.$$

Note that we calculate the population mean in the same way as the sample mean: we sum up the observations and then divide by the number of observations. Other statistics differ in the way they are calculated for the sample versus the population.

Major Characteristics of the Mean

1. It is the arithmetic average of the observations in a data set.

2. There is only one mean for a data set.

3. Its value is influenced by extreme observations (large or small).

4. It is applicable to quantitative data only.

5. The sum of deviations about the mean is always zero; $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.

6. For symmetric distributions, it is located in the middle; see Figure 3.3(a).

7. For right-skewed distributions, it is further to the right; see Figure 3.3(b).

8. For left-skewed distributions, it is further to the left; see Figure 3.3(c).

Figure 3.3 Arrow points to the location of the mean

Definition 3.1 Resistant Measures

A resistant measure is a measure that can resist the influence of extreme values.

From this definition and major characteristic 3 of the mean, we conclude that the mean is not resistant; that is, it is affected by extreme values.

The Weighted Means

Sometimes we work with what is called a weighted mean. Assume that $w_i$ is the weight associated with case $i$. The formula for the weighted mean is given by:

$$\bar{x} = \frac{\sum_{i=1}^{I} w_i x_i}{\sum_{i=1}^{I} w_i}$$

where i = 1, 2, … , I and I is the number of cases considered in the problem.

The weighted mean naturally arises in the context of a frequency distribution, where the frequencies $f_i$ are now the weights. In this case, the formula for the weighted mean is written as:

$$\bar{x} = \frac{\sum_{i=1}^{I} f_i x_i}{\sum_{i=1}^{I} f_i}$$

where i = 1, 2, … , I and I is this time the number of classes or categories. In this case the weighted mean is the arithmetic mean, or simply the mean.

Let us consider the following example.

Example 3.4 The Weighted Mean

The table below shows the number of academic staff at Philadelphia University by educational level, 2005/2006*:

Educational level Value** Frequency

B.Sc. 1 78

High Diploma 2 0

M.A. and M.Sc. 3 80

Ph.D. 4 210

*Source: Statistical Yearbook, DOS, 2006.

**Values have been added to illustrate the example.

Calculate the weighted mean for the data given in the above table.

Solution

Since the weights are the frequencies, then the weighted mean is the arithmetic mean using the frequency formula above. The mean is given by

$$\bar{x} = \frac{\sum_{i=1}^{I} w_i x_i}{\sum_{i=1}^{I} w_i} = \frac{\sum_{i=1}^{4} f_i x_i}{\sum_{i=1}^{4} f_i} = \frac{(1 \times 78) + (2 \times 0) + (3 \times 80) + (4 \times 210)}{78 + 0 + 80 + 210} = \frac{1158}{368} = 3.1467.$$
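As a rough check of Example 3.4, the weighted mean can be sketched in Python. The function name `weighted_mean` and the guard against a zero total weight are assumptions made for this illustration.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Example 3.4: educational level values, with frequencies used as weights.
level_values = [1, 2, 3, 4]
frequencies = [78, 0, 80, 210]

staff_mean = weighted_mean(level_values, frequencies)
print(round(staff_mean, 4))  # 3.1467
```

When every weight equals 1, the same function returns the ordinary arithmetic mean.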


The Median

Another measure of central tendency is the median of a group of data. It is the number such that at least half of the data is less than or equal to it and at least half the data is greater than or equal to it. The median is not influenced by extreme values. The computation of the median depends on whether there are an odd or even number of observations in the data set.

Rule 1: If the number of observations n is odd, then the median is the value of the observation numbered $\frac{n+1}{2}$ in the ordered array.

Rule 2: If the number of observations n is even, then the median is the mean (average) of the observations numbered $\frac{n}{2}$ and $\frac{n}{2} + 1$ in the ordered array.

Example 3.5 illustrates the computation of the median when the number of observations n is odd, followed by Example 3.6 for n even.

Example 3.5 Computation of median for odd n

Compute the median for the following set of data:

11 17 18 10 22 23 15 17

14 13 10 12 18 18 11 14 21

Solution

The number of observations in this set of data is odd; n = 17. First, we order the data to get the ordered array, and second we find the location of the median by substituting n = 17 in the formula $\frac{n+1}{2}$ to get $\frac{17+1}{2} = 9$. This means that the median is the value of the ninth ordered observation, namely median = 15.

Ordered array

10 10 11 11 12 13 14 14 15 17 17 18 18 18 21 22 23

For more details on how to compute the median for the data in Example 3.5, look at Figure 3.4.

Figure 3.4 Median for odd number of observations, Example 3.5


Example 3.6 Computation of Median for Even n

If we delete the last observation in the data set used in Example 3.5 above, we get a set with an even number of observations; namely n = 16. The new set becomes

11 17 18 10 22 23 15 17 14 13 10 12 18 18 11 14

To find the median, we first order the data to get the ordered array and then find the location of the median, which is determined this time by the two middle observations, numbered $\frac{16}{2} = 8$ and $\frac{16}{2} + 1 = 9$; that is, the 8th and 9th ordered observations. The value of the median is then equal to the average of these two observations.

Ordered array

10 10 11 11 12 13 14 14 15 17 17 18 18 18 22 23

Therefore,

$$\text{median} = \frac{14 + 15}{2} = 14.5.$$

For more details on how to compute the median for the data in Example 3.6, look at Figure 3.5.

Figure 3.5 Median for even number of observations, Example 3.6
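Rules 1 and 2 can be sketched as a single Python function. The name `median_by_rule` is a hypothetical choice for this illustration; positions in the rules are 1-based, so the code subtracts 1 when indexing the ordered array.

```python
def median_by_rule(values):
    """Median via Rule 1 (odd n) and Rule 2 (even n) on the ordered array."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        # Rule 1: observation numbered (n + 1) / 2.
        return ordered[(n + 1) // 2 - 1]
    # Rule 2: average of the observations numbered n/2 and n/2 + 1.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


# Example 3.5 (n = 17) and Example 3.6 (n = 16, last observation deleted).
odd_data = [11, 17, 18, 10, 22, 23, 15, 17, 14, 13, 10, 12, 18, 18, 11, 14, 21]
even_data = odd_data[:-1]

print(median_by_rule(odd_data))   # 15
print(median_by_rule(even_data))  # 14.5
```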

Major Characteristics of the Median

1. It is the central value; 50% of the observations lie above it and 50% fall below it.

2. There is only one median for a data set.

3. It is a resistant measure; it is not affected by extreme observations.

4. It is applicable for quantitative data only.

5. For symmetric distributions, the median is equal to the mean; see Figure 3.6(a).

6. For right-skewed distributions, the median is smaller than the mean; see Figure 3.6(b).

7. For left-skewed distributions, the median is larger than the mean; see Figure 3.6(c).

Figure 3.6 Relationship between mean and median for symmetric and skewed distributions


The Mode

The mode of a data set is the most frequently occurring value. If data are qualitative, the mode is the category that has the largest frequency.

It is the only measure of central tendency that is suitable for qualitative (categorical) data. As we said before, means and medians cannot be used if data are not numerical. In a histogram we can immediately see what the mode is because it corresponds to the tallest bar; the class with the largest frequency. Example 3.7 demonstrates the computation of the mode for quantitative data, followed by Example 3.8, which determines the mode for categorical data.

Example 3.7 Calculating the Mode for Quantitative Data

To calculate the mode, we use the same set of data in Example 3.6; namely

11 17 18 10 22 23 15 17

14 13 10 12 18 18 11 14

To obtain the mode it is easier to work with the ordered array of the data set, although this is not a must. The ordered array of the data is

10 10 11 11 12 13 14 14 15 17 17 18 18 18 22 23

We see that the observation 18 is the most frequent value, and thus mode = 18. For more details on how to compute the mode for the data in Example 3.7, look at Figure 3.7. Also, Figure 3.8 illustrates the case when the data set has no mode.

Figure 3.7 Mode for data in Example 3.7

Figure 3.8 The no mode case for a data set


Example 3.8 Calculating the mode for Qualitative Data

The following table represents medical and related professional employees at the Ministry of Health*. Determine the mode.

Category Frequency

Physicians 3,590

Dentists 549

Pharmacists 263

Nurses 3,633

Midwives 1,074

*Source: Jordan in Figures, DOS, 2006.

Solution

The mode of this data is nurses since this category has the highest frequency.
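Both mode examples can be reproduced with a short Python sketch. The function name `modes` is illustrative, and the "no mode" convention used here (several distinct values that all occur equally often) is an assumption, since the text only shows that case in a figure.

```python
from collections import Counter


def modes(values):
    """Return the list of most frequent values; an empty list means no mode."""
    counts = Counter(values)
    if not counts:
        raise ValueError("mode of an empty data set is undefined")
    highest = max(counts.values())
    # Assumed convention: if several distinct values all tie, report no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return [value for value, c in counts.items() if c == highest]


# Example 3.7: quantitative data.
data = [11, 17, 18, 10, 22, 23, 15, 17, 14, 13, 10, 12, 18, 18, 11, 14]
print(modes(data))  # [18]

# Example 3.8: categorical data given as a frequency table.
staff = {"Physicians": 3590, "Dentists": 549, "Pharmacists": 263,
         "Nurses": 3633, "Midwives": 1074}
modal_category = max(staff, key=staff.get)
print(modal_category)  # Nurses
```

Returning a list lets the same function report no mode, one mode, or several modes.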

Major Characteristics of the Mode

1. It is the most frequent or probable observation in a data set.

2. There can be no mode, one mode, or more than one mode for a data set.

3. Modes are resistant measures; they are not influenced by extreme observations.

4. It is applicable for both quantitative and qualitative data.

5. For symmetric distributions, the mode, median, and mean are all equal; see Figure 3.9(a).

6. For right-skewed distributions, the mode < the median < the mean; see Figure 3.9(b).

7. For left-skewed distributions, the mode > the median > the mean; see Figure 3.9(c).

Figure 3.9 Relationship between mean, median, and mode for symmetric and skewed distributions


Example 3.9 Calculating All Three Measures of Central Tendency and Interpreting the Results

A high school teacher at a private school assigns calculus problems to be worked online. Students must use a password to access the problems, and the log-in and log-off times are automatically recorded for the teacher. At the end of the week, the teacher examines the amount of time each student spent working the assigned problems. The data, in minutes, are shown below.

15 28 25 48 22

43 49 32 22 33

27 25 22 20 39

a. Find the mean, median, and mode for the above data.

b. What does this information tell the teacher about students' length of time on the computer solving calculus problems?

c. Is this data skewed?

Solution (a)

Mean

The mean is the sum of all times in minutes divided by 15.

$$\text{Mean} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{15 + 28 + \dots + 39}{15} = \frac{450}{15} = 30 \text{ minutes}.$$

Median

Since the number of observations is odd, n = 15, the median is the 8th ordered observation.

Ordered array

15 20 22 22 22 25 25 27 28 32 33 39 43 48 49

Therefore, the median = 27 minutes. Observe that, half of the times of the students lie above this number and half fall below it.


The Mode

The mode is equal to 22 minutes. This value occurs three times; all other observations occur only two times or less.

Solution (b)

The mean number of minutes spent solving the problems on the computer was 30 minutes. Half of the students spent more than 27 minutes solving the calculus problems, and half spent less than 27 minutes. More students spent 22 minutes solving the problems than any other amount of time.

Solution (c)

The data indicate a slight positive skewness. This is most likely due to the students who spent over 40 minutes working on the calculus problems. Note that the skewness is very slight; there is only a 3-minute difference between the mean and the median.
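The three measures in Example 3.9 can be cross-checked with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative, and the difference between mean and median is used here as an informal hint of skewness, as in the solution to part (c).

```python
import statistics

# Minutes spent by the 15 students in Example 3.9.
times = [15, 28, 25, 48, 22, 43, 49, 32, 22, 33, 27, 25, 22, 20, 39]

mean_time = statistics.mean(times)        # 30 minutes
median_time = statistics.median(times)    # 27 minutes
mode_times = statistics.multimode(times)  # list of most frequent values: [22]

# A mean larger than the median hints at positive (right) skewness.
skew_hint = mean_time - median_time       # 3 minutes

print(mean_time, median_time, mode_times, skew_hint)
```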

3.2 Measures of Variation

Figure 3.10 Measures of variation

The measures of central tendency were discussed in Section 3.1. These measures give us numbers that locate the middle or centre of a set of data. However, this is not enough to characterize the data. Sometimes two or more different sets of data have the same mean. Consider the following two sets of data:

Set (1): 50 60 70 80 90 and

Set (2): 69 69 70 71 71

Both sets of data have a mean of 70. Yet the first set of data is more widely spread than the second. In general, two sets of data may differ in both their measures of central tendency and variation; or two sets of data may have the same variation but differ in central tendencies; or two sets of data may have the same measures of central tendency but differ greatly in variation.


The measures of variation allow us to assess how much the observations in a data set differ from each other. In this book we will consider four measures of variation, namely, the range, the variance, the standard deviation, and the coefficient of variation.

3.2.1 The Range

The range is the difference between the largest and smallest observations in a set of data.

It measures the total variation in a data set. To compute the range, we denote the largest observation by maximum x and the smallest observation by minimum x; hence

Range = Maximum x – Minimum x

We compute the range in Example 3.10.

Example 3.10 Computing the Range

The following table represents the number of contractors registered in the Jordanian Contractors Association (JCA) by governorate*:

Governorate            Am   Ba  Za  Md  Ir   Mf  Ja  Aj  Ka   Ta  Ma  Aq

Number of contractors  952  60  93  43  158  20  22  18  127  35  28  33

*Source: Jordan in Figures, DOS, 2006.

Compute the range.

Solution

The largest observation is 952 and the smallest is 18, thus

Range = Maximum x – Minimum x

= 952 – 18

= 934.
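The range computation in Example 3.10 can be sketched in Python as follows. The function name `data_range` is a hypothetical choice; the dictionary simply stores the table of contractors by governorate code.

```python
def data_range(values):
    """Range = maximum x - minimum x."""
    values = list(values)
    if not values:
        raise ValueError("range of an empty data set is undefined")
    return max(values) - min(values)


# Example 3.10: number of contractors by governorate.
contractors = {"Am": 952, "Ba": 60, "Za": 93, "Md": 43, "Ir": 158, "Mf": 20,
               "Ja": 22, "Aj": 18, "Ka": 127, "Ta": 35, "Ma": 28, "Aq": 33}

contractor_range = data_range(contractors.values())
print(contractor_range)  # 934
```

Only the largest and smallest values enter the result, which is why the range ignores how the remaining observations are distributed.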

As a measure of variation, the range suffers from a distinct weakness in that it accounts for only two observations, the largest and smallest, and discards all other observations in the data set. That is why the range is not a good measure for the total variation of the distribution of the data; see Figure 3.11 to understand this point more.


Figure 3.11 Range accounts only for the maximum and minimum observations in a set of data (no matter what the shape of the distribution is, range = 90 – 20 = 70)

3.2.2 The Population Variance

In the discussion above, we saw that the range is a measure of the total variation in a set of data; however, it does not account for how the observations are distributed. This is why we need another measure of variation.

The variance, in contrast to the range, takes into account all the observations in a set of data and thus its calculation is not as easy as that of the range. However, this is not a problem, and we will learn how to calculate the variance in an unforgettable manner.

Roughly speaking, the variance measures variation by measuring how far, on the average, the observations are from the mean. The variance will be large for a set of data with large amount of variation because the observations will, on the average, be far from the mean. On the other hand, the variance will be small for a set of data with small amount of variation, this time because the observations will, on the average, be close to the mean.

We have two kinds of variances, as shown in Figure 3.10: the population variance and the sample variance. The population variance is denoted by the Greek letter $\sigma^2$ (read: sigma squared). It is calculated by summing up the squared deviations of each observation from the population mean and dividing the result by the size of the population. The formula used to calculate the population variance is

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

This formula is called the Defining Formula because it comes from the way we define the population variance.

Steps to Calculate The Population Variance

1. Calculate the population mean µ.

2. Calculate the deviation $(x_i - \mu)$ for each $x_i$ by subtracting µ from $x_i$.

3. Square each deviation to get the squared deviation $(x_i - \mu)^2$.

4. Repeat steps 2 and 3 for all the observations in your population; namely from 1 to N.

5. Sum up the values of the squared deviations you get in step 3 above. This way you get the sum $\sum_{i=1}^{N} (x_i - \mu)^2$.

6. Divide the result of step 5 by the population size N.

Note that we can use an alternative formula to calculate the population variance. This formula is called the Shortcut Formula and is derived from the defining formula given above. The population variance this time becomes

$$\sigma^2 = \frac{\sum_{i=1}^{N} x_i^2}{N} - \mu^2$$

3.2.3 The Sample Variance

The sample variance, on the other hand, is denoted by $s^2$, the lower case letter s squared. It is calculated by summing up the squared deviations of each observation from the sample mean and dividing the result by the sample size minus one. Again, we have two formulas to calculate the sample variance. The Defining Formula used to calculate the sample variance is given by:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$


Steps to Calculate The Sample Variance

1. Calculate the sample mean.

2. Calculate the deviation $(x_i - \bar{x})$ for each $x_i$ by subtracting $\bar{x}$ from $x_i$.

3. Square each deviation to get the squared deviation $(x_i - \bar{x})^2$.

4. Repeat steps 2 and 3 for all the observations in your sample; namely from 1 to n.

5. Sum up the values of the squared deviations you get in step 3 above. This way you get the sum $\sum_{i=1}^{n} (x_i - \bar{x})^2$.

6. Divide the result of step 5 by the sample size n minus 1.

Note that we can use an alternative formula to calculate the sample variance. Again, this formula is called the Shortcut Formula and is derived from the defining formula given above. The sample variance this time becomes

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$$

As you can see now, the calculations of population variance and sample variance are slightly different. We will consider each case separately. In Example 3.11, we start the computation of the population variance first and then in Example 3.12 we consider the sample variance. In both examples we will use the defining formula first, and then recalculate the variance using the shortcut formula.

Example 3.11 Computing Population Variance Using the Defining and Shortcut Formulas

We start with the computation of the population variance using the defining formula. For simplicity, we assume that the population has only 6 observations; that is N = 6. Note that in reality populations have a very large number of observations.

The data are 10 60 50 30 40 20.

The defining formula for calculating the population variance is given by

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}.$$

We first calculate µ and get

$$\mu = \frac{10 + 60 + 50 + 30 + 40 + 20}{6} = \frac{210}{6} = 35.$$

Now make a table which looks like the one below.

$x$ $x - \mu$ $(x - \mu)^2$

10 -25 625

60 25 625

50 15 225

30 -5 25

40 5 25

20 -15 225

Total 0 1750

To find $\sigma^2$, all we need to do is divide the sum of the third column in the above table by N = 6. That is

$$\sigma^2 = \frac{1750}{6} \approx 291.6666.$$

We now find the population variance $\sigma^2$, but this time using the shortcut formula

$$\sigma^2 = \frac{\sum_{i=1}^{N} x_i^2}{N} - \mu^2.$$

Notice that in this formula we also have to find the value of the population mean µ first. Then we make a table that has only two columns, namely $x$ and $x^2$, as shown below.

$x$ $x^2$

10 100

60 3600

50 2500

30 900

40 1600

20 400

Total 210 9100

We know that µ = 35. Therefore,

$$\sigma^2 = \frac{9100}{6} - (35)^2 \approx 1516.6666 - 1225 = 291.6666.$$

It does not really matter which formula one uses because, as we said earlier, the shortcut formula is derived from the defining one. Both formulas give the same answer, except perhaps for a negligible difference due to rounding.
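The two routes to the population variance in Example 3.11 can be compared with a short Python sketch. The function names are illustrative assumptions; each follows the corresponding formula term by term.

```python
def population_variance_defining(values):
    """Defining formula: sum((x_i - mu)^2) / N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def population_variance_shortcut(values):
    """Shortcut formula: sum(x_i^2) / N - mu^2."""
    n = len(values)
    mu = sum(values) / n
    return sum(x * x for x in values) / n - mu ** 2


population = [10, 60, 50, 30, 40, 20]

var_defining = population_variance_defining(population)
var_shortcut = population_variance_shortcut(population)

# Rounded to four places this is 291.6667; the text truncates to 291.6666.
print(round(var_defining, 4))  # 291.6667
print(round(var_shortcut, 4))  # 291.6667
```

In floating-point arithmetic the two results may differ in the last few digits, which mirrors the remark about negligible differences due to rounding.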


Example 3.12 Computing Sample Variance Using the Defining and Shortcut Formulas

We assume that we have a sample of size n = 6 measurements. We will use the same set of data used in Example 3.11, namely, the observations

10 60 50 30 40 20

Recall the defining formula for computing the sample variance; it is

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}.$$

Note that we must first calculate the sample mean $\bar{x}$, where

$$\bar{x} = \frac{10 + 60 + 50 + 30 + 40 + 20}{6} = 35.$$

Then we make a table that has three columns, namely $x$, $(x - \bar{x})$, and $(x - \bar{x})^2$. For our data, the table looks like:

$x$ $x - \bar{x}$ $(x - \bar{x})^2$

10 -25 625

60 25 625

50 15 225

30 -5 25

40 5 25

20 -15 225

Total 0 1750

To find $s^2$, all we need to do is divide the sum of the third column in the table above by n - 1 = 5. That is

$$s^2 = \frac{1750}{6 - 1} = 350.$$

We now find the sample variance $s^2$, but this time using the shortcut formula

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}.$$

Notice that all we need to apply this formula are the two sums $\sum x$ and $\sum x^2$. Thus, we make a table that has only two columns, namely $x$ and $x^2$.

$x$ $x^2$

10 100

60 3600

50 2500

30 900

40 1600

20 400

Total 210 9100

Plugging these results into the shortcut formula gives the following value for $s^2$:

$$s^2 = \frac{9100 - \frac{(210)^2}{6}}{6 - 1} = \frac{9100 - 7350}{5} = 350.$$
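Example 3.12 can be checked in the same spirit. The sketch below assumes the two illustrative function names shown and compares both formulas with `statistics.variance` from the Python standard library, which also divides by n - 1.

```python
import statistics


def sample_variance_defining(values):
    """Defining formula: sum((x_i - mean)^2) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """Shortcut formula: (sum(x_i^2) - (sum(x_i))^2 / n) / (n - 1)."""
    n = len(values)
    total = sum(values)
    total_of_squares = sum(x * x for x in values)
    return (total_of_squares - total ** 2 / n) / (n - 1)


sample = [10, 60, 50, 30, 40, 20]

s2_defining = sample_variance_defining(sample)
s2_shortcut = sample_variance_shortcut(sample)

print(s2_defining)  # 350.0
print(s2_shortcut)  # 350.0
```

Dividing by n - 1 instead of N is the only difference from the population case, which is why the same data give 350 here rather than about 291.67.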

Properties of the Variance

1. The population variance $\sigma^2$ measures the variation about the population mean µ, whereas the sample variance $s^2$ measures the variation about the sample mean $\bar{x}$.

2. $\sigma^2 = 0$ or $s^2 = 0$ only when the data have no variation. This occurs when the observations are all equal.

3. The unit of measurement of $\sigma^2$ and $s^2$ is the square of the original unit used for x.

4. $\sigma^2$ and $s^2$ are not resistant measures, so extreme values will affect them.

Mean and Variance for Grouped Data

Before we proceed to the next section, let's take an example to learn the calculation of the variance in the case of grouped data. Useful formulas to calculate the sample mean and variance from a frequency table are given by:

$$\bar{x} = \frac{\sum x f}{n} \quad \text{and} \quad s^2 = \frac{\sum (x - \bar{x})^2 f}{n - 1}$$

where x denotes the class midpoint, f the class frequency, and n (= Σf) the sample size.

Notice that the above formulas give approximate values for the true mean and variance since for these formulas the observations in a class are assumed to be centered at the midpoint.

Example 3.13 Calculating Sample Mean and Variance for Grouped Data

The following frequency table represents the number of male children who died in road accidents, by age group*:

Age group Deaths

0 – 2 32

3 – 5 44

6 – 8 29

9 – 11 14

12 – 14 23

Total 142

*Source: Statistical Yearbook, DOS, 2006.

Calculate the mean and variance.

Solution

To find the mean, we make the following table:

Age group f x xf

0 – 2 32 1 32

3 – 5 44 4 176

6 – 8 29 7 203

9 – 11 14 10 140

12 – 14 23 13 299

Total 142 850

Therefore,

$$\bar{x} = \frac{\sum x f}{\sum f} = \frac{850}{142} = 5.99 \simeq 6 \text{ years}.$$

To determine the variance, we construct the following table:

Age group $f$ $x$ $x - \bar{x}$ $(x - \bar{x})^2$ $(x - \bar{x})^2 f$

0 – 2 32 1 -5 25 800

3 – 5 44 4 -2 4 176

6 – 8 29 7 1 1 29

9 – 11 14 10 4 16 224

12 – 14 23 13 7 49 1127

Total 142 2356

Therefore,

$$s^2 = \frac{\sum (x - \bar{x})^2 f}{n - 1} = \frac{2356}{142 - 1} = 16.71.$$
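The grouped-data formulas can be sketched in Python as below. The function name `grouped_mean_and_variance` is an assumption for illustration. Note one difference from the worked solution: the code uses the unrounded mean (about 5.99) rather than the rounded value 6, yet the variance still rounds to 16.71.

```python
def grouped_mean_and_variance(midpoints, frequencies):
    """Approximate sample mean and variance from class midpoints and frequencies."""
    n = sum(frequencies)
    mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n
    squared_dev_sum = sum(f * (x - mean) ** 2
                          for x, f in zip(midpoints, frequencies))
    return mean, squared_dev_sum / (n - 1)


# Example 3.13: age groups 0-2, 3-5, 6-8, 9-11, 12-14 and their frequencies.
midpoints = [1, 4, 7, 10, 13]
deaths = [32, 44, 29, 14, 23]

grouped_mean, grouped_var = grouped_mean_and_variance(midpoints, deaths)
print(round(grouped_mean, 2))  # 5.99
print(round(grouped_var, 2))   # 16.71
```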

3.2.4 The Population and Sample Standard Deviation

In subsection 3.2.3, we mentioned in property 3 of the variance that the unit of measurement for the population variance $\sigma^2$ and the sample variance $s^2$ is the square of that used with the original data. For example, if x is measured in years, the variance is in (years)$^2$, and this squared unit is difficult to interpret. Therefore, in order to get the original unit of measurement back, we take the square root of the variance. Taking the square root introduces a new measure of variation called the standard deviation. The population standard deviation is denoted by the Greek letter σ (read: sigma), and the sample standard deviation by s.

The population standard deviation $\sigma = \sqrt{\sigma^2}$ is calculated as follows:

1. Using the defining formula,

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}.$$

2. Using the shortcut formula,

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} x_i^2}{N} - \mu^2}.$$

On the other hand, the sample standard deviation $s = \sqrt{s^2}$ is calculated as follows:

1. Using the defining formula,

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}.$$

2. Using the shortcut formula,

$$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}}.$$

Except for property 3, the standard deviation has the same properties as the variance. The unit of measurement for the standard deviation is the same as that of x. We illustrate in Example 3.14 the calculations of the standard deviation.

Example 3.14 Calculations of The Standard Deviation

Refer to Example 3.11 where we calculated the value of the population variance using either the defining formula or the shortcut formula. That value was $\sigma^2 = 291.6666$. Now, to get the value of the population standard deviation, all we need to do is take the square root of 291.6666; that is

$$\sigma = \sqrt{\sigma^2} = \sqrt{291.6666} = 17.08.$$

Refer now to Example 3.12 where we calculated the value of the sample variance using either the defining formula or the shortcut formula. That value was $s^2 = 350$. Now, to get the value of the sample standard deviation, all we need to do is take the square root of 350; that is

$$s = \sqrt{s^2} = \sqrt{350} = 18.71.$$
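A minimal Python sketch of Example 3.14 follows. It takes the square root of the variances returned by `statistics.pvariance` (division by N) and `statistics.variance` (division by n - 1); the variable names are illustrative.

```python
import math
import statistics

data = [10, 60, 50, 30, 40, 20]

# Population standard deviation: square root of the population variance.
sigma = math.sqrt(statistics.pvariance(data))

# Sample standard deviation: square root of the sample variance.
s = math.sqrt(statistics.variance(data))

print(round(sigma, 2))  # 17.08
print(round(s, 2))      # 18.71
```

The standard library also offers `statistics.pstdev` and `statistics.stdev`, which return these two values directly.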

3.2.5 The Coefficient of Variation

The range and variance are, as we said earlier, measures of total variation, while the coefficient of variation is a measure of relative variation. It measures the variation in the data relative to the mean. It is always expressed as a percentage rather than in terms of the units of the data. The sample coefficient of variation, denoted by the symbol CV, is defined by

$$CV = \frac{s}{\bar{x}} \times 100\%$$

where

s = standard deviation in a set of data

$\bar{x}$ = sample mean in the same set of data

Example 3.15 demonstrates the calculations of the coefficient of variation.

Example 3.15 Calculating the Coefficient of Variation

Refer to the data given in Example 3.13 for the number of male children who died in road accidents in the Kingdom. We have $\bar{x} = 6$ years and $s^2 = 16.71$ (years)$^2$. Thus $s = \sqrt{16.71} = 4.09$ years.

Using the formula to calculate the coefficient of variation, we get

$$CV = \frac{s}{\bar{x}} \times 100\% = \frac{4.09}{6} \times 100\% = 68.17\%.$$

To interpret this value of CV we say that for this sample, the relative size of the "average spread around the mean" is 68.17%.

The coefficient of variation is most useful when we want to compare the variation of different samples, each with a different mean. This is because higher variability is expected when the mean increases, and the CV is the measure that accounts for this. The CV is also used to compare the variability of two or more sets of data that have different units of measurement. We illustrate this point in Example 3.16.

Example 3.16 Comparing Coefficient of Variation for Two Sets of Data

Measurements of the diameter of a ball bearing made with one micrometer have a mean of 3.92 mm and a standard deviation of 0.015 mm, whereas measurements made with another micrometer have a mean of 1.54 inches and a standard deviation of 0.008 inch. Which of these two measuring instruments is relatively more precise*?

*Source: Johnson, Richard A. (2000), Miller & Freund's Probability and Statistics for Engineers, 6th edition, Prentice Hall.

Solution

Using the first micrometer, the coefficient of variation is

$$CV = \frac{0.015}{3.92} \times 100\% = 0.38\%$$

and using the second micrometer, the coefficient of variation is

$$CV = \frac{0.008}{1.54} \times 100\% = 0.52\%.$$

Therefore, the first micrometer is relatively more precise.
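The coefficient of variation calculations in Examples 3.15 and 3.16 can be sketched as follows. The function name `coefficient_of_variation` and the zero-mean guard are assumptions added for this illustration.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (s / mean) * 100, expressed as a percentage."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean * 100


# Example 3.15: s = 4.09 years, mean = 6 years.
cv_accidents = coefficient_of_variation(4.09, 6)
print(round(cv_accidents, 2))  # 68.17

# Example 3.16: two micrometers measured in different units (mm vs inches).
cv_first = coefficient_of_variation(0.015, 3.92)
cv_second = coefficient_of_variation(0.008, 1.54)
print(round(cv_first, 2), round(cv_second, 2))  # 0.38 0.52
```

Because the units cancel in the ratio, the two micrometers can be compared directly even though one is reported in millimetres and the other in inches.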


3.3 Measures of Position

Figure 3.12 Measures of position

A measure of position (or location) is a number that gives relative position of a data value in the data set. In Section 3.1 we discussed measures of central tendency where our main interest was to find the middle value or the centre of the distribution. The only measure of central tendency that is related to the position of the data is the median, where 50% of the ordered data points lie above it and 50% fall below. In this section we want to elaborate our search to get more insight about the positions of data points. We will consider four commonly used measures of position, namely, the z-score, percentiles, deciles, and quartiles.

3.3.1 The z-Score

The z-score for a data value is obtained by subtracting the mean of the data set from the value and dividing the result by the standard deviation of the data set. We can use the z-score for both a population and a sample; hence we can compute either a population z-score or a sample z-score.

The sample z-score for a value x is given by the following formula:

$$z = \frac{x - \bar{x}}{s}$$

where $\bar{x}$ is the sample mean and s the sample standard deviation.

The population z-score for a value x is given by the following formula:

$$z = \frac{x - \mu}{\sigma}$$

where µ is the population mean and σ the population standard deviation.

The z-score is the number of standard deviations the data value falls above (positive z-score) or below (negative z-score) the mean of the data set. It is affected by extreme values; that is, it is not resistant, since an extreme value directly affects the value of the mean and the standard deviation. Example 3.17 illustrates the computation of the z-score.
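The sample z-score formula can be sketched in Python as below. The function name `sample_z_score` is illustrative, and the data are the ten sample values used in Example 3.17; `statistics.stdev` divides by n - 1, matching the sample standard deviation.

```python
import statistics


def sample_z_score(x, sample):
    """Sample z-score: (x - sample mean) / sample standard deviation."""
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)
    if s == 0:
        raise ValueError("z-score is undefined when all values are equal")
    return (x - mean) / s


sample = [6, 9, 7, 14, 10, 5, 12, 4, 8, 15]

print(statistics.mean(sample))               # sample mean, 9
print(round(statistics.stdev(sample), 4))    # 3.7417
print(round(sample_z_score(14, sample), 2))  # 1.34
```

A positive result means the value lies above the mean; a negative one means it lies below.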

Example 3.17 Computation of the z-Score

What is the z-score for the value 14 in the following sample values?

6 9 7 14 10 5 12 4 8 15

Solution

We compute the sample mean and sample standard deviation. The sample mean is $\bar{x} = 9$, and the sample standard deviation is s = 3.7417. Verify that these values are correct. Thus the z-score is

$$z = \frac{14 - 9}{3.7417} = 1.3363 \approx 1.34$$

approximated to two decimal places. Thus, the data value of 14 is located 1.34 standard deviations above the mean of 9, since the z-score is positive.

One might ask the following question: "Why do we use the z-score as a measure of relative position?" The answer becomes clear if we observe that the distance between the mean of 9 and the value of 14 is 1.34s = 5.01 ≈ 5. If we add this value of 5 to the mean of 9, we get 14, the data value. This shows that the value of 14 is 1.34 standard deviations above the mean of 9 (see Figure 3.13).

Figure 3.13 Dot diagram of the data points with the location of the mean and data value of 14

From the above figure, we see that the z-score tells us how many standard deviations the observation lies from the mean.

3.3.2 Percentiles

Percentiles are numerical values that divide an ordered data set into 100 groups of values with at most 1% of the data values in each group. When we discuss percentiles, we generally present the discussion through the kth percentile. Let the kth percentile be denoted by $P_k$.

Pk is a number such that the percentage of observations less than or equal to Pk is greater than or equal to k% and the percentage of observations greater than or equal to Pk is greater than or equal to (100 – k)%. The idea of the kth percentile is illustrated in Figure 3.14.

Figure 3.14 Illustration of the kth percentile, Pk

From Figure 3.14, notice that there are 99 percentiles in a data set. In order to determine a percentile, the data set must first be ordered from the smallest to the largest value. Figure 3.15 depicts the 99th percentile in a data set.

Figure 3.15 Illustration of the 99th percentile, P99

It is often useful to find the percentile that corresponds to a given data value, say x. To do so, we may use the following formula:

\[ \text{Percentile} = \frac{\text{Number of values below } x}{\text{Number of values in data set}} \times 100\% \]

Example 3.18 shows how the computation of the percentiles is carried out.

Example 3.18 Computation of Percentiles

Suppose a pediatrician tested the cholesterol levels (in mg per 100 mL) of patients. He collected data for 14 patients whose readings are given below.

204 198 201 205 199 198 203

200 204 199 201 199 204 200

What is the percentile rank that corresponds to:


a. the reading of 203?

b. the reading of 200?

Solution

First, we need to arrange the values from smallest to largest. The ordered array is given below:

198 198 199 199 199 200 200 201 201 203 204 204 204 205

a. Observe that the number of values below the value of 203 is 9 and the total number of values in the data set is 14. Thus, using the formula, the corresponding percentile is:

\[ \frac{9}{14} \times 100\% = 64.29\% \]

We say the value of 203 corresponds to, approximately, the 64th percentile.

b. Observe that the number of values below the value of 200 is 5, and recall that the number of values in the data set is 14. Thus, using the above formula, the corresponding percentile is:

\[ \frac{5}{14} \times 100\% = 35.71\% \]

We say the value of 200 corresponds to, approximately, the 36th percentile.

Now what if we are interested in finding a data value that corresponds to a given percentile? The following steps will enable us to find a general percentile Pk for a data set.

Step 1: Order the data set from smallest to largest.

Step 2: Compute the position c of the percentile. To compute the value of c, use the following formula, where n is the number of values in the data set:

\[ c = \frac{n \cdot k}{100} \]

Step 3.1: If c is not a whole number, round it up to the next whole number.

• Locate this position in the ordered set.

• The value in this location is the required percentile.

Step 3.2: If c is a whole number, find the average of the values in positions c and c + 1 of the ordered array. This average is the required percentile.

Example 3.19 illustrates the computations of Pk.


Example 3.19 Computation of Pk when c is not a Whole Number

Suppose that the highway mileage of 19 cars is as shown below. Find the 60th percentile.

24 16 13 16 13 26 13 27 26 27 27 21 23 13 13 28 23 21 19

Solution

First, we need to arrange the values from smallest to largest. The ordered array is given below:

13 13 13 13 13 16 16 19 21 21 23 23 24

26 26 27 27 27 28

Next, we compute the position of the percentile. Here n = 19, and k = 60.

Thus, c = (19 × 60)/100 = 11.4, and we round up to 12.

This means that the 12th value in the ordered array corresponds to the 60th percentile; that is, P60 = 23.

Finally, we pose the following question: "Why does a percentile measure relative position?" To answer this question we refer to Figure 3.16 below.

Figure 3.16 Display of the 60th percentile along with the data points

From Figure 3.16, observe that the value of 23 is such that at most 60% of the data values are smaller than 23 and at most 40% of the values are larger than 23. This shows that the percentile value of 23 is a measure of relative position.

Example 3.20 Computation of Pk when c is a Whole Number

Suppose that the city mileage of 20 cars is as shown below. Find the 25th percentile for this data.

17 22 8 45 13 19 21 19 11 22 38 20 20 17 13 11 10 25 16 21


Solution

First, we must arrange the data set in order. The ordered array is

8 10 11 11 13 13 16 17 17 19

19 20 20 21 21 22 22 25 38 45

Next, we compute the position of the percentile, where n = 20, and k = 25.

Thus, c = (20 × 25)/100 = 5. Hence, the 25th percentile is the average of the values located at the 5th and 6th positions in the ordered set. Therefore, P25 = (13 + 13)/2 = 13.
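The percentile calculations of Examples 3.18 to 3.20 can be sketched in Python as follows. The function names `percentile_rank` and `kth_percentile` are our own, and the sketch assumes that k is a whole number between 1 and 99, in line with the 99 percentiles described in this section. Other textbooks and software packages use different percentile conventions, so results from built-in routines may differ slightly.

```python
def percentile_rank(x, data):
    """Percentage of the values in `data` that are strictly below x."""
    below = sum(1 for value in data if value < x)
    return below / len(data) * 100


def kth_percentile(data, k):
    """Find P_k by the three steps above; k is a whole number from 1 to 99."""
    if not data or not 1 <= k <= 99:
        raise ValueError("need a non-empty data set and 1 <= k <= 99")
    ordered = sorted(data)                             # Step 1
    whole, remainder = divmod(len(ordered) * k, 100)   # Step 2: c = n*k/100
    if remainder:
        # Step 3.1: c is not whole, so round up to position whole + 1
        # (zero-based index `whole`).
        return ordered[whole]
    # Step 3.2: c is whole, so average the values in positions c and c + 1.
    return (ordered[whole - 1] + ordered[whole]) / 2


cholesterol = [204, 198, 201, 205, 199, 198, 203,
               200, 204, 199, 201, 199, 204, 200]        # Example 3.18
highway = [24, 16, 13, 16, 13, 26, 13, 27, 26, 27,
           27, 21, 23, 13, 13, 28, 23, 21, 19]           # Example 3.19
city = [17, 22, 8, 45, 13, 19, 21, 19, 11, 22,
        38, 20, 20, 17, 13, 11, 10, 25, 16, 21]          # Example 3.20

print(round(percentile_rank(203, cholesterol), 2))
print(round(percentile_rank(200, cholesterol), 2))
print(kth_percentile(highway, 60))
print(kth_percentile(city, 25))
```

Working with the whole-number quotient and remainder of n·k divided by 100 avoids floating-point rounding when deciding whether c is a whole number.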

3.3.3 Deciles and Quartiles

Deciles and quartiles are special cases of percentiles. Deciles divide an ordered data set into 10 equal parts, while quartiles divide an ordered data set into 4 equal parts. We usually denote the deciles by D1, D2, D3, … , D9 and the quartiles by Q1, Q2, and Q3. Figures 3.17 and 3.18 depict the deciles and quartiles, respectively, in a data set.

Figure 3.17 Deciles of a data Set

Figure 3.18 Quartiles of a data Set

Figure 3.17 shows that the nine deciles divide a data set into ten parts, each containing at most 10% of the values, while Figure 3.18 shows that the three quartiles divide it into four parts, each containing at most 25% of the values.

For any data set, there are 99 percentiles, 9 deciles, and 3 quartiles. All are measures of position or location, as we previously said. The relationships listed below confirm the fact mentioned earlier: deciles and quartiles are special cases of percentiles.

• Q1 = first quartile = P25.

• Q2 = second quartile = P50.



• Q3 = third quartile = P75.

• D1 = first decile = P10.

• D2 = second decile = P20.

• …

• D9 = ninth decile = P90.

It is worth mentioning that median = Q2 = P50 = D5; that is, the median, the second quartile, the 50th percentile, and the 5th decile are all equal to one another.

Finally, we introduce the interquartile range, denoted by IQR, which is calculated as the difference between Q3 and Q1; that is,

\[ \mathrm{IQR} = Q_3 - Q_1 \]

The IQR represents the range of the middle half of the data. It is a resistant measure of variation and can be used to identify outliers. A data value, x, is considered an outlier if it is smaller than the lower fence (Q1 – 1.5 × IQR) or larger than the upper fence (Q3 + 1.5 × IQR). Outliers are usually denoted by an asterisk (*).

3.4 The Five-Number Summary and Box Plots

The five-number summary is an abbreviated way to describe a data set. It is used to get a quick summary about the centre, variation, and shape of a distribution. For any data set, the five-number summary is constituted from the following measures:

1. the minimum value,

2. the first quartile; Q1,

3. the median; Q2,

4. the third quartile; Q3, and

5. the maximum value.

These values have been specifically selected to give a summary of a data set because each value describes a specific part of the data set: the median locates the centre of the data set; the first and third quartiles span the middle half of the data set; and the minimum and maximum values provide additional information about the actual variation of the data.

A five-number summary can be represented in a diagram known as a box plot. A horizontal box plot is constructed by drawing a box between the quartiles Q1 and Q3. Horizontal lines are then drawn from the middle of the sides of the box to the minimum and maximum values. These horizontal lines are called whiskers. A vertical line inside the box marks the median. Example 3.21 illustrates the idea of the five-number summary and box plot.

Example 3.21 The Five-Number Summary and Box Plot

Let's use the same data as in Example 3.20, where the city mileage of 20 cars is given below:

17 22 8 45 13 19 21 19 11 22 38 20 20 17 13 11 10 25 16 21

(a) Find the five-number summary.

(b) Draw a box plot for the data.

(c) Identify outliers.

Solution

We must arrange the data set in order. The ordered array is

8 10 11 11 13 13 16 17 17 19

19 20 20 21 21 22 22 25 38 45

(a) Thus, minimum = 8, maximum = 45, and from the solution of Example 3.20, we know that Q1 = P25 = 13. We compute the median = Q2 = P50 and Q3 = P75. For P50, c = (20 × 50)/100 = 10, so Q2 is the average of the 10th and 11th values: (19 + 19)/2 = 19. For P75, c = (20 × 75)/100 = 15, so Q3 is the average of the 15th and 16th values: (21 + 22)/2 = 21.5. Figure 3.19 represents the five-number summary for the data at hand.

Figure 3.19 The five-number summary for data in Example 3.21

(b) A box plot is drawn in Figure 3.20.

Figure 3.20 Box plot for data in Example 3.21


(c) In order to identify outliers, we find the two values that represent the lower and upper fences. Here IQR = Q3 – Q1 = 21.5 – 13 = 8.5, so the lower fence = Q1 – 1.5 IQR = 13 – 1.5(8.5) = 0.25 and the upper fence = Q3 + 1.5 IQR = 21.5 + 1.5(8.5) = 34.25. Any data value that is smaller than the lower fence or larger than the upper fence is considered an outlier. Thus, in this data set there are two outliers: the data values 38 and 45. Figure 3.21 shows the box plot and the outliers (denoted by *) in the data of Example 3.21.

Figure 3.21 Box plot showing outliers for data in Example 3.21
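The results of Example 3.21 can be reproduced with the short Python sketch below. The helper names (`percentile`, `five_number_summary`, `fences`) are our own, and the quartiles are located with the percentile steps of Subsection 3.3.2; statistical software often uses other quartile conventions and may therefore report slightly different fences.

```python
def percentile(ordered, k):
    """P_k of an already ordered list, using the position c = n*k/100."""
    whole, remainder = divmod(len(ordered) * k, 100)
    if remainder:                       # c not whole: round up
        return ordered[whole]
    return (ordered[whole - 1] + ordered[whole]) / 2   # c whole: average


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(data)
    return (ordered[0], percentile(ordered, 25), percentile(ordered, 50),
            percentile(ordered, 75), ordered[-1])


def fences(q1, q3):
    """Lower and upper fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


city = [17, 22, 8, 45, 13, 19, 21, 19, 11, 22,
        38, 20, 20, 17, 13, 11, 10, 25, 16, 21]   # data of Example 3.21

minimum, q1, median, q3, maximum = five_number_summary(city)
lower, upper = fences(q1, q3)
outliers = [value for value in sorted(city) if value < lower or value > upper]

print("five-number summary:", (minimum, q1, median, q3, maximum))
print("fences:", (lower, upper))
print("outliers:", outliers)
```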

Finally, the information that can be obtained from a box plot is best described by looking at the median or at the length of the whiskers. We can say the following:

• If the median is close to the center of the box, or if the whiskers are approximately the same length, the distribution of the data values will be approximately symmetric.

• If the median is to the left of the center of the box, or if the right whisker is longer than the left whisker, the distribution of the data values will be positively skewed.

• If the median is to the right of the center of the box, or if the left whisker is longer than the right whisker, the distribution of the data values will be negatively skewed.


Exercises

3.1 For the following set of data, find the value of each expression: $\sum x$, $\sum y$, $\sum x^2$, $\sum y^2$, $(\sum x)^2$, $\sum xy$, and $\sum x \sum y$.

x y

3 4

0 5

7 3

3 6

2 1

0 4

4 2

3.2 The following data set represents the final score of 20 students in a statistics course. Determine the mean, median, and mode.

69 78 76 73 78
83 77 77 77 77
81 80 80 84 75
75 77 79 76 77

3.3 Find the mean, median, and mode for each of the following sets of data.

a. 18 23 7 33 25
26 23 42 18 23 11

b. 25 26 27 28 25 28
29 30 31 30 26 27
28

c. 103 99 114 22 99 119
117 105 100 119 108 96

d. 2.3 5.6 6.3 4.7 8.5 2.3
6.5 2.9 7.7 8.1 7.3 4.6

3.4 The following table shows the number of dead people in road accidents, 2000-2006*:

Year Dead
2000 686
2001 783
2002 758
2003 832
2004 818
2005 790
2006 899

*Source: Statistical Yearbook, DOS, 2006.

What is the mode of the number of dead people?

3.5 The following table shows number of dead people by governorate, 2006*:

Governorate Dead

Amman 231

Balqa 76

Zarqa 88

Madaba 20

Irbid 132

Mafraq 46

Jarash 29

Ajlun 22

Karak 54

Tafiela 9

Ma'an 41

Aqaba 38

Albadea 113

*Source: Statistical Yearbook, DOS, 2006.

a. What is the mode?

b. Does it make sense to calculate the mean of the data? Why or why not?


3.6 Find the mean and median for each of the following sets of data.

a. 1 2 3 4 5

6 7 8 9 10

b. 1 2 3 4 5

6 7 8 9 100

c. 1 2 3 4 5

6 7 8 90 100

What happens to the values of the mean and median?

3.7 Calculate the weighted mean for the following frequency distribution.

Word length Frequency
1 – 2 50
3 – 4 105
5 – 6 45
7 – 8 30
9 – 10 15
11 – 12 5
Total 250

3.8 Exam 2 scores for a small class are provided below.

87 99 75 87 94
35 88 87 93 75

Find the mean, median, and mode for the above data.

a. What does this information tell you about the students' performance on Exam 2?

b. Are these data skewed? Explain.

3.9 Suppose you are given the following frequency distribution:

Class Intervals Frequency
16 to under 20.99 4
21 to under 25.99 2
26 to under 30.99 3
31 to under 35.99 3
36 to under 40.99 5
41 to under 45.99 3
Total 20

Calculate the mean, standard deviation, and CV.

3.10 The following frequency table represents the number of female children who died in road accidents*:

Age Group Deaths
0 – 2 18
3 – 5 35
6 – 8 18
9 – 11 9
12 – 14 7
Total 87

*Source: Statistical Yearbook, DOS, 2006.

Calculate the mean, variance, and coefficient of variation.

3.11 Refer to Exercise 3.7. Calculate the sample standard deviation.

3.12 Refer to Exercise 3.2. Calculate each of the following measures of variation:

Range, Sample variance, Sample standard deviation, CV.


3.13 The monthly utility bills in a city have a mean of $70 and a standard deviation of $8. Find the z-scores that correspond to utility bills of $60, $71, and $92.

3.14 The mean score for the statistics test is 63 and the standard deviation is 7. The mean score for the biology test is 23 and the standard deviation is 3.9. A student gets 73 on the statistics test and 26 on the biology test.

a. Transform each test score to a z-score.

b. Determine on which test the student had a better score.

3.15 The following data represent the test scores for a mathematics class.

52 70 90 94 70 49

83 93 89 56 99 77

86 78 74 80 62 96

87 95 84 87 85 68

89 90 95 82 67 99

a. Arrange the data from smallest to largest value.

b. Find Q1, the median, and Q3.

c. Determine the five-number summary for these values.

d. Find IQR and calculate the lower and upper fences.

e. Are there any outliers?

f. Draw a box plot for these values. Show outliers, if any.

g. What is the percentile rank that corresponds to the reading of 85?

h. Find the 30th percentile.

3.16 The following table represents the average wage (in J.D.) per male employee in private sector establishments during the reference month of October 2005*:

1018 496 282 258

173 176 172 146

*Source: Statistical Yearbook, DOS, 2006.

a. Calculate P25, P50, and P75.

b. Calculate D10, D20, D80, and D90.

c. Determine the five-number summary.

d. Determine IQR and calculate the lower and upper fences.

e. Are there any outliers?

f. Draw a box plot for these data. Show outliers, if any.

g. What is the percentile rank that corresponds to the reading of 496?

h. Find the 20th percentile.

3.17 The following table represents the average work hours per male employee in private sector establishments during the reference month of October 2005*:

249 245 255 280

281 268 257 267

*Source: Statistical Yearbook, DOS, 2006.

a. Calculate the five-number summary.

b. Calculate P20, P55, and P83.

c. Calculate D10, D40, and D90.

d. What is the percentile rank that corresponds to the reading of 280?

e. Find the 45th percentile.


Chapter Learning Outcomes

When you complete your careful study of this chapter you should be able to:

1. use and understand the formulas presented in this chapter.

2. use and understand summation notation.

3. know what is meant by a measure of central tendency.

4. define, compute, and interpret the mean, median, and mode of a data set.

5. understand the major characteristics of mean, median, and mode.

6. explain the purpose of a measure of variation.

7. be familiar with properties of the variance and standard deviation.

8. define, compute, and interpret the range, variance, and standard deviation of a data set.

9. obtain the z-score, percentiles, deciles, and quartiles of a data set.

10. understand the relationship between percentiles and both deciles and quartiles.

11. get and interpret the inter quartile range and the five-number summary of a data set.

12. learn the computations of the lower and upper fences of a data set.

13. learn how to decide if a data value is an outlier.

14. construct a box plot and appreciate its usefulness in identifying distribution shape, center, and spread.


Chapter Key Terms

Arithmetic mean

Box plot

Coefficient of variation

Data values

Deciles

Descriptive measures

Deviations from the mean

First quartile

Five-number summary

Interquartile range

Left-skewed distribution

Location

Lower fence

Mean of a random variable

Measure of centre

Measure of position

Measure of variation

Median

Mode

Outliers

Population mean

Population standard deviation

Population variance

Quartiles

Range

Resistant measure

Right-skewed distribution

Sample mean

Sample standard deviation

Sample variance

Second quartile

Standard deviation

Sum of squared deviations

Summation notation

Symmetric distribution

Third quartile

Upper fence

Weighted mean

Whiskers

Z-score


Chapter 4

Simple Linear Correlation and Regression

In Chapters 2 and 3, our main interest was in finding different numerical measures for a single variable. We studied various methods to describe a data set and found many measures that help achieve that goal. In this chapter, we are interested in the case of bivariate data, where two variables are studied and the relationship that may exist between them is determined. To do this, we will discuss two important and well-known techniques in statistics called correlation and regression. Correlation and regression analyses are related to each other in the sense that both deal with relationships between two (or more) variables.

Examples of pairs of variables to which we might apply correlation and regression are: (family income, child's IQ), (high school G.P.A., college G.P.A.), (college G.P.A., income), (amount of time watching T.V., fear of crime), and (yearly income, area of store).

Correlation analysis is used widely to measure the direction and strength of association between numerical variables, while regression analysis is used mainly for the purpose of prediction. Our main goal in regression analysis is to develop a statistical model that is used to predict the values of a dependent variable based on the value of one (or more) independent variables.

In this chapter we study simple linear correlation and regression, where one independent variable is considered together with the dependent variable. We consider scatter plots, simple linear correlation and Pearson's correlation coefficient, simple linear regression and prediction, and the coefficient of determination.

Chapter Outline

4.1 Scatter Plots

4.2 Simple Linear Correlation and Pearson's Correlation Coefficient

4.3 Simple Linear Regression and Prediction

4.4 The Coefficient of Determination


4.1 Scatter Plots

Scatter plots were first introduced in subsection 2.2.3 on graphing bivariate numerical data. There, we learned how to graph bivariate data and the purpose of doing so. We learned that a scatter plot is a graph that displays the relationship between two quantitative variables. The independent variable, x, is plotted on the horizontal axis and the dependent variable, y, is plotted on the vertical axis. In correlation and regression we usually refer to the independent variable x as the predictor (or explanatory) variable and the dependent variable y as the response variable.

Our first look at the data should always be some type of exploratory graphical analysis. A scatter plot is used to find out whether a relationship exists between the predictor and response variables. For problems where correlation and regression are involved, scatter plots are vital and we must never skip them. They provide a visual description of the relationship between the two variables, the explanatory and the response. Scatter plots also show outliers, if any exist.

The reader should realize that the main purpose of a scatter plot is to see whether a relationship exists between the two variables under consideration. If the plot reveals no relationship, we discard the predictor variable at hand and search for another one. If the plot reveals a relationship, we then ask the question: "How strong or weak is the relationship?" The answer to this question will be discussed in some detail when we introduce simple linear correlation.

Example 2.9 in subsection 2.2.3 showed that as the area (in km2) increases, the population density (per km2) decreases, so there is a negative linear relationship between the predictor variable, area, and the response, population density. Example 4.1 will show a relationship in which the response increases as the predictor variable increases, so a positive linear relationship exists between the two variables.

Example 4.1 Scatter Plot that Shows a Positive Relationship

Table 4.1 shows the average marketing costs and selling prices of crops in central markets, in fils/kg*:


Table 4.1 Average marketing costs and selling prices in fils/kg for crops

Crop Cost (fils/kg) Price (fils/kg)

Tomatoes 30.70 121.1

Eggplants 45.60 165.7

Potatoes 42.50 156.3

Squash 52.40 206.0

Cucumber 50.60 229.1

Cabbage 27.40 129.9

Cauliflower 38.60 158.1

Hot pepper 75.20 313.8

Sweet pepper 69.60 248.9

Broad beans 72.40 264.3

String beans 92.60 514.6

Radish 32.00 190.4

Okra 101.70 636.7

Lemons 58.10 252.8

Orange 63.70 331.8

Clementines 56.50 162.5

Apple 68.50 289.0

Corn 50.70 154.7

*Source: Statistical Yearbook, DOS, 2006.

The scatter plot of the above data is given in Figure 4.1, with cost (fils/kg) on the horizontal axis and price (fils/kg) on the vertical axis.

Figure 4.1 Scatter plot for the data of Example 4.1 that shows positive relationship


From the scatter plot we can visualize the relationship that exists between the two variables cost and price. As cost increases, price also increases. This is why we define the relationship between the two variables as being positive.

The question that now arises is: how weak or strong is this relationship?

This question is answered by studying the simple linear correlation.

4.2 Simple Linear Correlation and Pearson's Correlation Coefficient

The purpose of correlation analysis is to measure and interpret the direction and strength of linear relationships between variables. In this book we will consider only the simple linear correlation. "Simple" indicates that there is only one predictor variable and "linear" refers to the model type.

Simple linear correlation is a measure of association between two variables. It measures the strength of the linear relationship between them. A frequently asked question is: "How does the response variable y change as the predictor variable x changes?" For example, how do sales change as advertising expenditures change? The answer might be that there is a positive correlation between advertising expenditures and sales. Another question is: "How does heart pulse rate change as age changes?" The answer might be that there is a negative correlation between age and heart pulse rate.

Several statistics can be used to measure the correlation between two variables, but the one most commonly used is Pearson's correlation coefficient, which is denoted by r. It is a measure of the strength of the linear relationship between two quantitative variables, x and y. The properties of r are summarized in Subsection 4.2.1 below.

4.2.1 Properties of r

1. r is unitless (or unit free).

2. |r| ≤ 1, or −1 ≤ r ≤ 1; that is, the values of r range from −1 to 1.

3. A value of r = 0 means no linear relationship exists between the two variables x and y.

4. A value of r near zero means a little (or weak) linear relationship exists between x and y.

5. A value of r close to -1 or 1 means a strong linear relationship exists between x and y.

6. The sign of r provides important information about the direction of association.

i. If r is positive, then as x increases, y increases linearly.

ii. If r is negative, then as x increases, y decreases linearly.


If we desire to get a better feeling of the weakness or strength of the correlation coefficient r, we may consider the following classification for r:

• -0.25 < r < 0 or 0 < r < 0.25 as no or little linear relationship

• -0.50 < r < -0.25 or 0.25 < r < 0.50 as poor linear relationship

• -0.75 < r < -0.50 or 0.50 < r < 0.75 as good linear relationship

• -1 < r < -0.75 or 0.75 < r < 1 as strong linear relationship

Figure 4.2 is a graph showing different degrees of association between two variables x and y.

Figure 4.2 Graph reflecting different degrees of linear correlation

4.2.2 Calculating the Value of r

Now we illustrate how to calculate the value of the simple linear correlation coefficient, r, for any data set. Suppose that we have a sample of n paired observations (x1, y1), (x2, y2), … , (xn, yn). Then r is computed as follows:

\[ r = \frac{S_{xy}}{\sqrt{S_{xx} \cdot S_{yy}}} \]

where

\[ S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad \text{and} \qquad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2. \]

The formulas given above for Sxy, Sxx, and Syy are called the "defining formulas". For hand calculations, these quantities are more easily obtained from the following "shortcut formulas":

\[ S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} \]

\[ S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} \]

\[ S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} \]

The shortcut formulas are easily derived from the defining formulas, and we should get exactly the same answers using either set. What is left to do is to calculate the value of r. Example 4.2 illustrates the calculation of the value of r.

Example 4.2 Calculating the Value of Linear Correlation Coefficient r

Now, we illustrate how to calculate and interpret the linear correlation coefficient, r, for any data set. In order to do that, let's go back to the data on cost and price of crops in fils/kg in Example 4.1. We will apply the shortcut formulas to calculate Sxy, Sxx, and Syy. Besides n, notice that these formulas involve five important sums, namely

\[ \sum x, \quad \sum y, \quad \sum xy, \quad \sum x^2, \quad \text{and} \quad \sum y^2. \]

Thus we need to make a table that shows values of n, x, y, xy, x2, y2 and their sums. Table 4.2 shows these values and also the sum of each column.

Table 4.2 Values of x, y, xy, x2, and y2 and their sums for the data of Example 4.1

Crop   Cost (fils/kg), x   Price (fils/kg), y   xy   x2   y2

Tomatoes 30.70 121.1 3717.77 942.49 14665.21

Eggplants 45.60 165.7 7555.92 2079.36 27456.49

Potatoes 42.50 156.3 6642.75 1806.25 24429.69

Squash 52.40 206.0 10794.40 2745.76 42436.00

Cucumber 50.60 229.1 11592.46 2560.36 52486.81

Cabbage 27.40 129.9 3559.26 750.76 16874.01

Cauliflower 38.60 158.1 6102.66 1489.96 24995.61

Hot pepper 75.20 313.8 23597.76 5655.04 98470.44

Sweet pepper 69.60 248.9 17323.44 4844.16 61951.21

Broad beans 72.40 264.3 19135.32 5241.76 69854.49

String beans 92.60 514.6 47651.96 8574.76 264813.16

Radish 32.00 190.4 6092.80 1024.00 36252.16

Okra 101.70 636.7 64752.39 10342.89 405386.89

Lemons 58.10 252.8 14687.68 3375.61 63907.84

Orange 63.70 331.8 21135.66 4057.69 110091.24

Clementines 56.50 162.5 9181.25 3192.25 26406.25

Apple 68.50 289.0 19796.50 4692.25 83521.00

Corn 50.70 154.7 7843.29 2570.49 23932.09

Total 1028.80 4525.70 301163.27 65945.84 1447930.59



From Table 4.2 we get the required sums; namely:

\[ \sum x = 1028.80, \quad \sum y = 4525.70, \quad \sum xy = 301163.27, \quad \sum x^2 = 65945.84, \quad \sum y^2 = 1447930.59, \quad \text{and} \quad n = 18. \]

Using these sums, we calculate the quantities Sxy, Sxx, and Syy as follows:

\[ S_{xy} = 301163.27 - \frac{(1028.80)(4525.70)}{18} = 42494.37 \]

\[ S_{xx} = 65945.84 - \frac{(1028.80)^2}{18} = 7144.20 \]

\[ S_{yy} = 1447930.59 - \frac{(4525.70)^2}{18} = 310043.90 \]

Now we are ready to calculate the value of Pearson's correlation coefficient, r:

\[ r = \frac{42494.37}{\sqrt{(7144.20)(310043.90)}} \approx 0.90 \]

The value r ≈ 0.90 indicates a strong positive linear correlation between the two variables cost and price. This means that as cost increases, price tends to increase too. Notice that this result agrees with the scatter plot.
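The hand calculation of Example 4.2 can be checked with the Python sketch below. The function names are our own; the data pairs are copied from Table 4.1. The sketch evaluates Sxy, Sxx, and Syy with both the shortcut and the defining formulas, so the two sets of results can be compared, and then computes r.

```python
from math import sqrt

# (cost, price) pairs in fils/kg, copied from Table 4.1
crops = [
    (30.70, 121.1), (45.60, 165.7), (42.50, 156.3), (52.40, 206.0),
    (50.60, 229.1), (27.40, 129.9), (38.60, 158.1), (75.20, 313.8),
    (69.60, 248.9), (72.40, 264.3), (92.60, 514.6), (32.00, 190.4),
    (101.70, 636.7), (58.10, 252.8), (63.70, 331.8), (56.50, 162.5),
    (68.50, 289.0), (50.70, 154.7),
]


def shortcut_sums(pairs):
    """Sxy, Sxx, Syy from the five sums used in the shortcut formulas."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_y2 = sum(y * y for _, y in pairs)
    return (sum_xy - sum_x * sum_y / n,
            sum_x2 - sum_x ** 2 / n,
            sum_y2 - sum_y ** 2 / n)


def defining_sums(pairs):
    """Sxy, Sxx, Syy from deviations about the means (defining formulas)."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    return (sum((x - x_bar) * (y - y_bar) for x, y in pairs),
            sum((x - x_bar) ** 2 for x, _ in pairs),
            sum((y - y_bar) ** 2 for _, y in pairs))


def pearson_r(s_xy, s_xx, s_yy):
    """Pearson's correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    return s_xy / sqrt(s_xx * s_yy)


s_xy, s_xx, s_yy = shortcut_sums(crops)
r = pearson_r(s_xy, s_xx, s_yy)
print(f"Sxy = {s_xy:.2f}, Sxx = {s_xx:.2f}, Syy = {s_yy:.2f}, r = {r:.4f}")
```

Because r is unitless, expressing cost and price in another currency unit would change Sxy, Sxx, and Syy but leave r unchanged.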

An important question now arises: "What is the mathematical model that relates x to y?"

The answer to this question leads us to the study of regression.

4.3 Simple Linear Regression and Prediction

Suppose that the simple linear correlation coefficient, r, reveals an association between the two variables x and y. What, then, is the type of relationship that exists between the two variables? Put another way, what is the mathematical model that relates the two variables x and y? To answer this question, we study regression.

Regression means examining the relationship between the dependent variable and one (or more) independent variables. We will consider only the "simple linear regression" where there is only one independent variable and there is a linear relationship between x and y.

We shall develop an equation; that is, a mathematical formula that relates the dependent variable to the independent variable. This equation is then used to

( ) ( )

( )

( )

1028 80 4525 70301163 27 42494 37

1821028 80

65945 84 7144 2018

24525 701447930 59 310043 90

18

. .S . .xy

.S . .xx

.S . .yy

= − =

= − =

= − =

( ) ( )42494 37

0 917144 20 310043 90

.r .

. .= =

Page 107: Dr. Jaffar S. Almousawi An Introduction to Statistics...E. Mall: darelbaraka@yahoo.com ISBN 978 - 9957 – 414 – 69 – 6 (كدر ) iii An Introduction to Statistics First Edition

Part II: Descriptive Statistics

93

predict the value of the dependent (or response) variable for a given value of the independent (or predictor) variable.

Now we start the process of finding the equation that relates the response to the predictor variable. We will continue with the two variables cost and price given in Examples 4.1 and 4.2.

Recall that the scatter plot of the data of the two variables showed scattered points that look like a "linear cloud". Although data points do not lie on a straight line, they are clustered about a straight line.

We could draw many different straight lines through the points of data, but how good are these lines? Among all the many different lines, we look for the one that best fits the data.

The best line is the one that is closest to all the points. The regression line, also called the least-squares line, is the line that best fits the data.

4.3.1 The Least-Squares Criterion and Regression Equation

In this subsection, we introduce the least-squares criterion and the equation of the regression line. We will define the quantities that occur in each, and then work through an example to illustrate the calculations.

Definition 4.1 The Least Squares Criterion

The straight line that best fits a set of data points is the one having the smallest sum of squared errors.

Definition 4.2 Regression Line and Regression Equation

The regression line is the straight line that best fits a set of data points according to the least-squares criterion.

The regression equation is the equation of the regression line.

According to the least-squares criterion, the straight line that fits the data best is that line which has the smallest (or the least) sum of squared errors.

The response in the regression equation is denoted by $\hat{y}$ to indicate that these values are the predicted y values, not the true observed y values. The "hat" denotes prediction. The error, denoted by e, is given by $e = y - \hat{y}$.

Figure 4.3 shows the error at an x value.

Figure 4.3 Error at an x value and the regression line ($e = \text{observed } y - \text{predicted } y = y - \hat{y}$; $\sum e^2$ is minimum, or the least)

Notice that the least-squares criterion tells us what property the regression line for a set of data points must have; namely, that $\sum e^2$ is the least or smallest. The equation of the regression line is given by the following formula:

Regression Equation

The regression equation for a set of n data points is

$$\hat{y} = b_0 + b_1 x, \quad \text{where} \quad b_1 = \frac{S_{xy}}{S_{xx}} \quad \text{and} \quad b_0 = \bar{y} - b_1 \bar{x}.$$

The least-squares line, or the regression line, is the line $\hat{y} = b_0 + b_1 x$ that minimizes $\sum e^2$, where

b0 is called the intercept (the value of the regression line at x = 0).

b1 is called the slope of the regression line. This slope may be positive, negative, or zero.

• If b1 > 0, the regression line slopes upward.

• If b1 < 0, the regression line slopes downward.

• If b1 = 0, the regression line is horizontal.

See Figure 4.4.


Figure 4.4 Different values of the slope for regression line
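As a minimal sketch of the least-squares criterion, the Python snippet below computes the sum of squared errors for a candidate line and checks that the line obtained from b1 = Sxy/Sxx and b0 = ȳ − b1·x̄ gives a smaller sum than a few nearby lines. The five data points and the function names `sse` and `least_squares` are hypothetical, chosen only for illustration; they do not come from the examples in this chapter.

```python
def sse(xs, ys, b0, b1):
    """Sum of squared errors e = y - y_hat for the line y_hat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def least_squares(xs, ys):
    """Return (b0, b1) using b1 = Sxy / Sxx and b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    sxx = sum(x * x for x in xs) - sum_x ** 2 / n
    b1 = sxy / sxx
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]

b0, b1 = least_squares(xs, ys)
best = sse(xs, ys, b0, b1)

# Shifting the intercept or the slope slightly should increase the sum.
for db0, db1 in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert sse(xs, ys, b0 + db0, b1 + db1) > best

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, SSE = {best:.2f}")
```

Trying a handful of alternative lines does not prove minimality, but it shows concretely what "smallest sum of squared errors" means for a given data set.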

4.3.2 Finding the Regression Equation

We said earlier that the equation of the regression line is $\hat{y} = b_0 + b_1 x$, where b0 is the intercept of the line and b1 is the slope. Now, what are the values of b0 and b1? How do we calculate them? The answer lies in some quantities that we used earlier in calculating the simple linear correlation coefficient, r. Recall Sxy and Sxx; we will use these two quantities to find the slope, b1:

$$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} \quad \text{and} \quad S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$$

Note that we always find b1 first (before b0), where

$$b_1 = \frac{S_{xy}}{S_{xx}} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$$

After finding b1, we find b0, where

$$b_0 = \bar{y} - b_1 \bar{x} = \frac{\sum y}{n} - b_1 \frac{\sum x}{n}$$

As an illustration, consider Example 4.3, where we go back to the data of Example 4.1. Observe first that in finding the value of the simple correlation coefficient, r, we needed six quantities, namely:

$$n, \ \sum x, \ \sum y, \ \sum xy, \ \sum x^2, \ \text{and} \ \sum y^2.$$

However, in order to find the equation of the regression line, we need only the first five quantities: $n$, $\sum x$, $\sum y$, $\sum xy$, and $\sum x^2$. There is no need for $\sum y^2$.


Example 4.3 Calculations of the Regression Line Equation

Table 4.3 below shows the quantities needed in the calculation of the regression line equation; namely, n and the values of x, y, xy, and $x^2$ together with their sums for the data given in Example 4.1.

Table 4.3 Values of x, y, xy, and $x^2$ and their sums for the data of Example 4.1

| Crop | Cost (fils/kg), x | Price (fils/kg), y | xy | $x^2$ |
|---|---|---|---|---|
| Tomatoes | 30.70 | 121.1 | 3717.77 | 942.49 |
| Eggplants | 45.60 | 165.7 | 7555.92 | 2079.36 |
| Potatoes | 42.50 | 156.3 | 6642.75 | 1806.25 |
| Squash | 52.40 | 206.0 | 10794.40 | 2745.76 |
| Cucumber | 50.60 | 229.1 | 11592.46 | 2560.36 |
| Cabbage | 27.40 | 129.9 | 3559.26 | 750.76 |
| Cauliflower | 38.60 | 158.1 | 6102.66 | 1489.96 |
| Hot pepper | 75.20 | 313.8 | 23597.76 | 5655.04 |
| Sweet pepper | 69.60 | 248.9 | 17323.44 | 4844.16 |
| Broad beans | 72.40 | 264.3 | 19135.32 | 5241.76 |
| String beans | 92.60 | 514.6 | 47651.96 | 8574.76 |
| Radish | 32.00 | 190.4 | 6092.80 | 1024.00 |
| Okra | 101.70 | 636.7 | 64752.39 | 10342.89 |
| Lemons | 58.10 | 252.8 | 14687.68 | 3375.61 |
| Orange | 63.70 | 331.8 | 21135.66 | 4057.69 |
| Clementines | 56.50 | 162.5 | 9181.25 | 3192.25 |
| Apple | 68.50 | 289.0 | 19796.50 | 4692.25 |
| Corn | 50.70 | 154.7 | 7843.29 | 2570.49 |
| Total | 1028.80 | 4525.70 | 301163.27 | 65945.84 |

From the table, we get the following sums:

$$\sum x = 1028.80, \quad \sum y = 4525.70, \quad \sum xy = 301163.27, \quad \text{and} \quad \sum x^2 = 65945.84.$$

Remember, we first calculate b1 and then b0, simply because the calculation of b0 depends on b1. For b1 we have:

$$b_1 = \frac{S_{xy}}{S_{xx}} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{301163.27 - \frac{(1028.80)(4525.70)}{18}}{65945.84 - \frac{(1028.80)^2}{18}} = \frac{42494.37}{7144.20} = 5.95$$

For b0 we have:

$$b_0 = \bar{y} - b_1 \bar{x} = \frac{\sum y}{n} - b_1 \frac{\sum x}{n} = \frac{4525.70}{18} - 5.95 \left( \frac{1028.80}{18} \right) = -88.65$$

After finding both the slope and the intercept of the regression line, we can write the regression equation as $\hat{y} = -88.65 + 5.95x$.

To graph the regression line, we take any two different values of x and plug them into the regression equation $\hat{y} = -88.65 + 5.95x$.

• Let us take the two values x = 50 and x = 75.

• The corresponding values are:

$$\hat{y} = -88.65 + 5.95(50) = 208.85 \quad \text{and} \quad \hat{y} = -88.65 + 5.95(75) = 357.60.$$

Therefore, the regression line passes through the two points (50, 208.85) and (75, 357.60). In Figure 4.5, we have plotted these two points using the symbol "■" on the regression line.


[Scatter plot of the data with the regression line; horizontal axis: Cost, vertical axis: Price]

Figure 4.5 The regression line
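The calculations of Example 4.3 can be checked with a short Python sketch. The (cost, price) pairs are copied from Table 4.3; the variable names and the helper `predict` are illustrative only. As an assumption made to mirror the hand calculation, the slope is rounded to two decimals before the intercept is computed.

```python
# (cost x, price y) pairs in fils/kg, copied from Table 4.3.
data = [
    (30.70, 121.1), (45.60, 165.7), (42.50, 156.3), (52.40, 206.0),
    (50.60, 229.1), (27.40, 129.9), (38.60, 158.1), (75.20, 313.8),
    (69.60, 248.9), (72.40, 264.3), (92.60, 514.6), (32.00, 190.4),
    (101.70, 636.7), (58.10, 252.8), (63.70, 331.8), (56.50, 162.5),
    (68.50, 289.0), (50.70, 154.7),
]

n = len(data)
sum_x = sum(x for x, _ in data)
sum_y = sum(y for _, y in data)
sum_xy = sum(x * y for x, y in data)
sum_x2 = sum(x * x for x, _ in data)

sxy = sum_xy - sum_x * sum_y / n
sxx = sum_x2 - sum_x ** 2 / n

# Slope first (rounded as in the hand calculation), then the intercept,
# which depends on the slope.
b1 = round(sxy / sxx, 2)
b0 = round(sum_y / n - b1 * sum_x / n, 2)


def predict(x):
    """Predicted price for a given cost using y_hat = b0 + b1 * x."""
    return b0 + b1 * x


# Two points that determine the line on the graph.
points = [(x, round(predict(x), 2)) for x in (50, 75)]
print(n, round(sxy, 2), round(sxx, 2), b1, b0, points)
```

Working from the raw pairs rather than from the printed totals also serves as a check on the column sums of Table 4.3.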

It should be clear now that this regression line, $\hat{y} = -88.65 + 5.95x$, is the straight line that fits the data best according to the least-squares criterion; that is, it is the straight line that has the smallest (least) possible sum of squared errors. Note that the line need not pass through the observed data points. In fact, it often will not pass through any of them.

Some Facts about Least Squares Regression

1. The distinction between predictor and response variables is essential.

2. Because the criterion looks at vertical deviations, interchanging the axes (the roles of x and y) would change the regression line.

3. Along the regression line, a change of 1 standard deviation in x corresponds to a change of r standard deviations in the predicted y.

4. The least-squares regression line always passes through the point $(\bar{x}, \bar{y})$.
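The statements about the point $(\bar{x}, \bar{y})$ and about standard deviations follow directly from the formulas for b0 and b1. The short derivation below is a sketch; the symbols $s_x$ and $s_y$ for the sample standard deviations of x and y are notation assumed here, with $s_x = \sqrt{S_{xx}/(n-1)}$ and $s_y = \sqrt{S_{yy}/(n-1)}$.

$$b_0 + b_1 \bar{x} = (\bar{y} - b_1 \bar{x}) + b_1 \bar{x} = \bar{y}, \qquad b_1 = \frac{S_{xy}}{S_{xx}} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} \sqrt{\frac{S_{yy}}{S_{xx}}} = r \, \frac{s_y}{s_x}.$$

The first identity says that the predicted value at $x = \bar{x}$ is $\bar{y}$. The second says that increasing x by one standard deviation $s_x$ changes the predicted value by $b_1 s_x = r s_y$, that is, by r standard deviations of y.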

4.3.3 Meaning of the Slope of Regression Line

What does a negative or a positive value of the slope b1 mean? Let's look at the following two figures.


Figure 4.6 Graph that explains the meaning of negative slope

Figure 4.7 Graph that explains the meaning of positive slope

From Figure 4.6 we conclude that if the slope b1 of the regression line is negative, then a one-unit increase in x results in a decrease of |b1| units in the predicted y. Figure 4.7, on the other hand, says that if the slope is positive, then a one-unit increase in x results in an increase of b1 units in the predicted y.

Therefore, we conclude for the data in Example 4.3 that as the cost of a crop increases by 1 fils/kg, the predicted price increases by 5.95 fils/kg. In general, the slope b1 of the regression equation $\hat{y} = b_0 + b_1 x$ is the change in $\hat{y}$ for a unit increase in x.

4.3.4 Using the Regression Equation to Make Predictions

In the introduction of Chapter 4 we mentioned that regression analysis is used mainly for the purpose of prediction. Our main goal in regression analysis is to develop a statistical model that is used to predict the values of a dependent variable based on the value of one independent variable. The regression equation $\hat{y} = -88.65 + 5.95x$ is thus used to make predictions. To make a prediction for an unobserved x, just plug it in and calculate $\hat{y}$. For example, if the cost of a crop is 100 fils/kg, then its predicted price would be

$$\hat{y} = -88.65 + 5.95(100) = 506.35 \text{ fils/kg}.$$
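A prediction of this kind is a one-line calculation. The sketch below wraps it in a hypothetical helper, `predict_price`, which also reports whether the requested cost lies inside the range of costs observed in Table 4.3 (27.40 to 101.70 fils/kg). The function name and the range check are illustrative additions, not part of the worked example.

```python
B0, B1 = -88.65, 5.95          # intercept and slope from Example 4.3
X_MIN, X_MAX = 27.40, 101.70   # smallest and largest observed cost (Table 4.3)


def predict_price(cost):
    """Return (predicted price, True if cost is within the observed range)."""
    y_hat = B0 + B1 * cost
    in_range = X_MIN <= cost <= X_MAX
    return round(y_hat, 2), in_range


print(predict_price(100))   # cost inside the observed range
print(predict_price(200))   # far outside the range: treat with caution
```

The second call still returns a number, but the flag signals that the cost lies well beyond the data used to fit the line, so the prediction should be treated with caution.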

4.4 The Coefficient of Determination

In Example 4.3 we determined the regression equation, $\hat{y} = -88.65 + 5.95x$, for data on cost (in fils/kg) and price (in fils/kg) of a sample of 18 crops. We can apply the regression equation to predict the price for a particular cost. For example, we predicted that a cost of 100 fils/kg will result in a price of 506.35 fils/kg. A question that arises now is: "How reliable are such predictions?" Is the regression equation we found really useful for predicting the price, or could we do just as well by ignoring cost? What we need now is a way to evaluate the utility of the regression equation for making good predictions.

One way to evaluate the utility of a regression equation for making good predictions is to use $r^2$, the square of the correlation coefficient, r.

Definition 4.3 The Coefficient of Determination

The coefficient of determination is the fraction of the variation in the values of y that is explained by the least squares regression on x.

In Example 4.2, we calculated the value of Pearson's correlation coefficient and found that r ≈ 0.90 (more precisely, r ≈ 0.9029). Therefore, $r^2 = (0.9029)^2 \approx 0.8152 \approx 82\%$. This result means that using cost as a predictor variable explains about 82% of the variation in the response variable, price. That is, our choice of cost as the predictor variable, rather than some other variable, was a successful one.
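The value of $r^2$ can be checked directly from the sums of Table 4.2. The sketch below uses only Python's standard `math` module and illustrative variable names; it recomputes Sxy, Sxx, Syy, r, and $r^2$ without rounding r first.

```python
import math

# Sums from Table 4.2 (n = 18 crops).
n = 18
sum_x, sum_y = 1028.80, 4525.70
sum_xy, sum_x2, sum_y2 = 301163.27, 65945.84, 1447930.59

sxy = sum_xy - sum_x * sum_y / n
sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n

r = sxy / math.sqrt(sxx * syy)   # Pearson's correlation coefficient
r_squared = r ** 2               # coefficient of determination

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
```

Squaring the unrounded r avoids the small discrepancy that appears when a two-decimal value of r is squared by hand.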

Cautions about Correlation and Regression

1. Correlation and regression describe only linear relationships.

2. They are not resistant to outliers.

3. Extrapolation is the use of the regression line far outside the range of the x-values used to obtain the line. Such predictions are not reliable and should not be trusted. A good rule is not to go far below the minimum value or far above the maximum value of x.

4. Correlation does not imply causation. Just because two variables are highly correlated does not mean that one causes the other; the correlation could very well be the effect of a third variable. For example, the number of cavities in elementary school children and their vocabulary size have a strong positive correlation because both variables are related to age.
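The caution that correlation is not resistant to outliers can be illustrated with a small sketch. The data below are hypothetical: five points lying exactly on the line y = 2x, followed by the same points with a single outlying observation (6, 0) added. The helper `pearson_r` is an illustrative name that applies the formula r = Sxy / √(Sxx·Syy).

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed as Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    sxx = sum(x * x for x in xs) - sum_x ** 2 / n
    syy = sum(y * y for y in ys) - sum_y ** 2 / n
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data lying exactly on the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r_clean = pearson_r(xs, ys)

# The same data with one outlying point (6, 0) added.
r_outlier = pearson_r(xs + [6], ys + [0])

print(f"without outlier: r = {r_clean:.3f}")
print(f"with outlier:    r = {r_outlier:.3f}")
```

A single added point moves r from a perfect 1 to about 0.14, which is why a scatter plot should always be inspected before r is interpreted.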


Exercises

4.1 The following data are observations of classroom temperatures and mean test scores for five sections of an introductory statistics course.

| Temp. (°C) | 35 | 36 | 38 | 38 | 39 |
|---|---|---|---|---|---|
| Scores (%) | 82 | 78 | 76 | 74 | 71 |

a. Find the value of Pearson's correlation coefficient.

b. Find the least-squares regression line.

c. Use the regression line to predict the mean test score when the temperature is 70 °F.

d. Calculate the value of $r^2$ and interpret it.

4.2 The following are ages (in years) and prices (in $100) of twelve cars of the same model.

| Age | 6 | 6 | 2 | 2 | 5 | 4 | 6 | 4 | 1 | 5 | 2 | 2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Price | 75 | 60 | 202 | 166 | 90 | 132 | 65 | 135 | 206 | 162 | 185 | 192 |

a. Find the value of Pearson's correlation coefficient.

b. Find the least-squares regression line.

c. Use the regression line to predict the price of a car when its age is 3 years.

d. Calculate the value of $r^2$ and interpret it.

4.3 A college professor claims that the scores on the first exam provide an excellent indication of how students will perform throughout the semester. To test this claim, first-exam scores, second-exam scores, and final exam scores were recorded for a sample of n = 12 students in an introductory class. The scores are as given below.

| Exam1 | 62 | 73 | 88 | 82 | 85 | 77 | 94 | 65 | 91 | 74 | 85 | 98 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Exam2 | 68 | 83 | 86 | 79 | 91 | 72 | 96 | 70 | 92 | 82 | 90 | 96 |
| Final | 74 | 93 | 90 | 79 | 95 | 72 | 96 | 75 | 95 | 85 | 93 | 100 |

a. Is there a correlation between the scores on the first exam and the final?

b. Is there a correlation between the scores on the second exam and the final?

c. How accurately do the first exam scores predict the final?

d. Find the regression equation that relates the first exam to the final.

4.4 Suppose you have values of x and y given by

| x | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| y | 1 | 2 | 6 | 8 |

You want to compare two prediction rules: $\hat{y}_1 = 2x$ and $\hat{y}_2 = x^2$.

According to the least-squares criterion, which of these prediction rules is better?

4.5 The following table represents the area (1000 dunum) and production (1000 tons) of fruit trees in Jordan*:

| Crop | Area, x | Production, y |
|---|---|---|
| Citrus fruit | 67.5 | 139.2 |
| Grapes | 36.5 | 32.2 |
| Fig | 5.4 | 3.5 |
| Almonds | 4.8 | 3.1 |
| Peach | 15.8 | 12.6 |
| Plums | 6.5 | 4.9 |
| Apricot | 7.8 | 6.8 |
| Apple | 38.6 | 46.4 |
| Pomegranate | 3.6 | 3.5 |
| Pears | 2.7 | 2.6 |
| Guava | 1.2 | 1.0 |
| Date palm | 6.6 | 4.0 |
| Banana | 14.5 | 42.1 |
| Nectarine | 1.3 | 1.4 |
| Cherry | 1.9 | 1.3 |
| Others | 3.5 | 4.3 |

*Source: Statistical Yearbook, DOS, 2006.

a. Is there a correlation between area and production?

b. How accurately does the area predict production?

c. Find the regression equation that relates production to area.

d. Predict production when area is 10.

4.6 The following table represents transit commodities passing through Jordanian territories, 1995-2006. Values are in million J.D. and quantities are in 1000 tons:

| Year | Quantity, x | Value, y |
|---|---|---|
| 1995 | 1505.6 | 1506.8 |
| 1996 | 1386.9 | 1708.4 |
| 1997 | 1391.3 | 1164.8 |
| 1998 | 1967.4 | 1445.4 |
| 1999 | 2256.6 | 1730.9 |
| 2000 | 3726.4 | 1514.1 |
| 2001 | 3346.0 | 3539.4 |
| 2002 | 3905.1 | 3309.1 |
| 2003 | 5322.8 | 6299.1 |
| 2004 | 6863.5 | 8219.3 |
| 2005 | 6668.1 | 7545.7 |
| 2006 | 5934.4 | 7679.4 |

*Source: Statistical Yearbook, DOS, 2006.

a. Is there a correlation between quantity and value?

b. How accurately does the quantity predict value?

c. Find the regression equation that relates value to quantity.

d. Predict value when quantity is 4500.

4.7 The regression line for a particular data set is found to be $\hat{y} = 75 + 2(x - 40)$.

a. A particular point has value x = 50; find the predicted value for this point.

b. Refer to part a. If the actual response for the point is 100, compute the value of the error (residual).


4.8 The following table represents unemployment rates and refined economic activity rates for male Jordanians in 2006*:

| Governorate | Unemployment rate | Refined economic activity rate |
|---|---|---|
| Amman | 9.90 | 65.30 |
| Balqa | 12.10 | 61.50 |
| Zarqa | 10.70 | 65.80 |
| Madaba | 13.00 | 60.40 |
| Irbid | 13.90 | 59.00 |
| Mafraq | 15.50 | 60.30 |
| Jarash | 12.80 | 59.80 |
| Ajlun | 15.50 | 57.70 |
| Karak | 19.10 | 60.00 |
| Tafiela | 12.10 | 61.70 |
| Ma'an | 16.40 | 61.60 |
| Aqaba | 12.80 | 73.30 |

*Source: Statistical Yearbook, DOS, 2006.

a. Is there a correlation between unemployment rate and refined economic activity rate?

b. How accurately does the unemployment rate predict refined economic activity rate?

c. Find the regression equation that relates unemployment rate to refined economic activity rate.

d. Predict refined economic activity rate when unemployment rate = 20.

e. Comment on your findings. Would you recommend such results? Why or why not?

4.9 The following table represents length of road networks (in km) by type of road and governorate*:

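| Governorate | Highways | Secondary roads |
|---|---|---|
| Amman | 355 | 222 |
| Balqa | 206 | 167 |
| Zarqa | 303 | 149 |
| Madaba | 63 | 104 |
| Irbid | 272 | 299 |
| Mafraq | 444 | 288 |
| Jarash | 85 | 117 |
| Ajlun | 75 | 163 |
| Karak | 289 | 176 |
| Tafiela | 185 | 89 |
| Ma'an | 563 | 246 |
| Aqaba | 347 | 92 |
| Total | 3187 | 2112 |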

*Source: Statistical Yearbook, DOS, 2006.

a. Is there a correlation between highways and secondary roads length?

b. How accurately does highway length predict secondary road length?

c. Find the regression equation that relates highway length to secondary road length.

d. Predict the length of a secondary road if the highway length is 400 km.

e. Comment on your findings. Would you recommend such results? Why or why not?


4.10 Let x be length (in inches) of foot and y length (in inches) of hand. The following measurements were made on 15 women.

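| x | y |
|---|---|
| 8.75 | 6.25 |
| 8.50 | 6.25 |
| 9.50 | 7.75 |
| 9.75 | 7.00 |
| 9.00 | 6.75 |
| 10.00 | 7.00 |
| 9.50 | 6.50 |
| 9.00 | 7.00 |
| 9.25 | 7.00 |
| 9.50 | 7.00 |
| 9.25 | 7.00 |
| 10.00 | 7.50 |
| 10.00 | 7.25 |
| 9.75 | 7.25 |
| 9.50 | 7.25 |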

a. Graph a scatter plot of the data.

b. From part (a) above, do you see any relationship between x and y?

c. Calculate Pearson's correlation coefficient, r.

d. Find the regression equation that relates y to x.

e. If the length of the foot were 9.60 inches, what would your prediction of the length of the hand be?

g. Calculate $\sum_{i=1}^{15} e_i^2 = \sum_{i=1}^{15} (y_i - \hat{y}_i)^2$. Comment on this sum.

4.11 The following data represent the height in centimeters (x) and weight in grams (y) of a type of plant. A sample of ten plants was taken.

| x | y |
|---|---|
| 4.8 | 2.3 |
| 6.2 | 4.6 |
| 6.4 | 5.0 |
| 6.9 | 6.8 |
| 7.8 | 9.4 |
| 7.8 | 9.2 |
| 8.1 | 10.9 |
| 8.7 | 13.6 |
| 9.2 | 15.9 |
| 10.4 | 22.1 |

a. Graph a scatter plot of the data.

b. From part (a) above, do you see any relationship between x and y?

c. Calculate Pearson's correlation coefficient, r.

d. Find the regression equation that relates y to x.

e. Graph the regression line on the scatter plot.

f. If the height of the plant were 10.6 cm, what would your prediction of the weight be?

g. Calculate $\sum_{i=1}^{10} e_i^2 = \sum_{i=1}^{10} (y_i - \hat{y}_i)^2$. Comment on this sum.

h. Calculate the value of the coefficient of determination, $r^2$. What does it mean? Comment.


Chapter Learning Outcomes

When you complete your careful study of this chapter you should be able to:

1. feel the importance of scatter plots.

2. understand the association between two quantitative variables.

3. determine the relationship between two variables.

4. learn how to quantify this relationship.

5. understand and use the formulas presented in this chapter.

6. determine and interpret the linear correlation coefficient.

7. learn what regression means.

8. talk about simple linear regression.

9. see what "simple" and "linear" mean.

10. know about the least-squares criterion.

11. obtain the graph of the regression equation for a set of data points.

12. know how to get the best-fitted line.

13. look at the role of the least-squares equation in getting the simple linear regression equation.

14. learn the computations of slope and intercept.

15. interpret the meaning of slope and intercept.

16. use the regression equation for predictions.

Chapter Key Terms

Coefficient of determination

Correlation

Explanatory variable

Extrapolation

Least-squares criterion

Linear correlation coefficient

Negative relationship

Pearson's correlation coefficient

Positive relationship

Prediction

Predictor variable

Regression

Regression equation

Regression line

Response variable

Scatter plot

Simple linear correlation

Simple linear regression

Slope

Straight line


Part III

Probability Concepts and Distributions


Chapter 5

Probability Concepts

In the preceding chapters, we paid attention to descriptive statistics and related concepts. In this chapter we concentrate on probability concepts. Probability is the science that deals with uncertainty. It provides a mathematical description of randomness and serves as the link between descriptive statistics and inferential statistics, where sample data are used to make inferences about a population. Inferential statistics plays an important role in this book and will be discussed in detail later on.

Because samples are used to make inferences about populations, we cannot be completely sure that our results are 100% correct. Uncertainty exists every time we use inferential statistics. This is the reason why we must study probability concepts.

In this chapter we give some basic definitions used in probability. We mention the three basic axioms of probability, and study some important multiplication and addition rules of probability. We also discuss independence; an important concept in probability. Finally, we introduce some counting rules necessary to use in solving problems of probability.

Chapter Outline

5.1 Some Definitions

5.2 Graphical Displays and Relationships between Events

5.3 Axioms of Probability

5.4 The Addition Rule

5.5 The Conditional Probability

5.6 Independence and Multiplication Rule

5.7 Some Counting Rules


5.1 Some Definitions

In this section we mention some basic definitions that are usually used in probability. We start with the definition of random experiment given below.

Definition 5.1 Random Experiment

An experiment in probability is defined to be random if the outcomes of that experiment cannot be predicted with certainty.

Definition 5.2 Outcome

An outcome is a particular result of the random experiment. Outcomes of a particular experiment are usually denoted by small (or lower case) letters such as a, b, c, or x, y, z.

Definition 5.3 Sample Space

Sample space is the set of all possible outcomes of a given experiment.

The letter S is used to denote the sample space of an experiment. The sample space is also called the "sure" or "certain" set.

Sample spaces of experiments are of two kinds, finite and infinite, depending on the number of outcomes in the experiment. If the number of outcomes is finite (or countably infinite), the sample space is discrete. If, however, the outcomes form an uncountably infinite set, such as an interval of real numbers, the sample space is continuous.

Definition 5.4 Event

An event is any subset of the sample space.

Events are usually denoted by upper case letters such as A, B, C, or X, Y, Z. Notice that an event is a collection of outcomes of interest. It is a set that contains one or more outcomes of an experiment. We say that an event, say A, occurs when any outcome of A occurs.

Example 5.1 clarifies Definitions 5.1-5.4 given above.


Example 5.1 Four Examples to Clarify Definitions 5.1-5.4

Flipping a coin is an action that will result in either a head (H) or a tail (T). Therefore, flipping a coin is a random experiment because we cannot predict in advance the result that will occur. Only after the coin lands will we be sure whether the face is H or T. The sample space is S = {H, T}. It is finite because the number of outcomes is 2.

Rolling a six-sided die and observing the number of points on the upper face of the die is also a random experiment because we cannot predict in advance the side that will turn up. Each of the faces of the die is an outcome, and the sample space of this experiment is the set

S = {1, 2, 3, 4, 5, 6}.

Notice that S is finite since the number of outcomes is 6.

Let

A = the event of having an odd face when you toss a die and

B = the event of having an even face when you toss a die;

then A and B are events where

A = {1, 3, 5} and B = {2, 4, 6}.

Note that the null or empty set φ is called the Impossible Event. It cannot occur since φ has no elements or outcomes in it. Also, the sample space, S, is called the Sure Event because it contains all possible outcomes of the experiment.

As a third example look at a basketball player who throws one ball towards the basket and observes whether he hits (H) or misses (M). This is a random experiment, where the outcomes are H or M. The sample space is the set S = {H, M}. {H} is an event and so is {M} and S is finite.

Now assume that the basketball player throws two balls, the finite sample space of the results will be

S = {(H, H), (H, M), (M, H), (M, M)}.

Some events that we can define on S are:

A = {(H, H)} is the event of having hits on both throws,

B = {(H, H), (H, M)} is the event of having a hit on the first throw, and

C = {(H, H), (H, M), (M, H)} is the event of having at least one hit.
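The sample space and events of the two-throw experiment can be written down as Python sets. The sketch below is only an illustration: it uses `itertools.product` from the standard library to list all ordered pairs of results, and the set names follow the text.

```python
from itertools import product

# Each throw is a hit ("H") or a miss ("M"); two throws give ordered pairs.
S = set(product("HM", repeat=2))

A = {s for s in S if s == ("H", "H")}   # hits on both throws
B = {s for s in S if s[0] == "H"}       # a hit on the first throw
C = {s for s in S if "H" in s}          # at least one hit

# Every event is a subset of the sample space.
assert A <= S and B <= S and C <= S

print(sorted(S))
print(sorted(A), sorted(B), sorted(C))
```

Describing each event by a condition on the outcomes, rather than by listing its elements, carries over unchanged to experiments with more throws.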


Finally, look at the following example. Suppose the General Electric Company is testing the life of a new light bulb by putting the light bulb in a lamp and leaving the lamp turned on until the light bulb burns out. The company is interested in the time (in hours) that it takes for the light to burn out.

The sample space in this case is of the infinite kind and is defined by

S = {x: x ≥ 0}, where x denotes time in hours.

Then we may describe the events

A = event the light bulb burns out in less than 1000 hours and

B = event the light bulb burns out between 1300 and 1600 hours

as follows:

A = {x: x < 1000} and

B = {x: 1300 ≤ x ≤ 1600}.

Definition 5.5 Mutually Exclusive (or Disjoint) Events

The two events A and B are said to be mutually exclusive (or disjoint) if they cannot occur at the same time.

If A and B are disjoint, then A∩B = φ, the null, or impossible event.

Definition 5.6 Exhaustive Events

Events A1, A2, A3, … , Ak are exhaustive if at least one must occur when an experiment is run.

Definition 5.7 Independent Events

Two events A and B are independent if the occurrence or nonoccurrence of one does not affect the occurrence or nonoccurrence of the other.

Example 5.2 clarifies Definitions 5.5-5.7 given above.

Example 5.2 An Example to Clarify Definitions 5.5-5.7

Consider the experiment of choosing at random a digit from the digits 0, 1, 2, … , 9. The sample space is S = {0, 1, 2, … , 9}. Define the following events:

A = event the number chosen is less than or equal to 3,

B = event the number chosen is between 4 and 6, inclusive,

C = event the number chosen is greater than or equal to 7,


D = event the number chosen is less than 4 or larger than 7, and

E = event the number chosen is an even number.

Then,

A = {0, 1, 2, 3},

B = {4, 5, 6},

C = {7, 8, 9},

D = {0, 1, 2, 3, 8, 9}, and

E = {0, 2, 4, 6, 8}.

The pairs of events (A, B), (A, C), and (B, C) are pairs of disjoint events since, for example, A∩B = φ. The events A, B, and C, on the other hand, are exhaustive since when we choose a number at random from the sample space S, one of the three events must occur. Finally, the two events D and E are independent since the occurrence of one event is not affected by the occurrence of the other. At this point, the reader may not agree with this result; later on, in Section 5.6, we will show how to prove mathematically that two events are independent.
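The three properties claimed in Example 5.2 can be checked with Python sets. The sketch below assumes that the ten digits are equally likely, as "choosing at random" suggests. For independence it uses the standard product rule P(D∩E) = P(D)·P(E), which is the kind of check the text defers to Section 5.6. The helper `prob` is an illustrative name.

```python
from fractions import Fraction

S = set(range(10))          # digits 0, 1, ..., 9, assumed equally likely
A = {0, 1, 2, 3}
B = {4, 5, 6}
C = {7, 8, 9}
D = {0, 1, 2, 3, 8, 9}
E = {0, 2, 4, 6, 8}


def prob(event):
    """Classical probability: favourable outcomes over all outcomes."""
    return Fraction(len(event), len(S))


# Disjoint: no outcomes in common.
disjoint = all(not (X & Y) for X, Y in [(A, B), (A, C), (B, C)])

# Exhaustive: together the events cover the whole sample space.
exhaustive = (A | B | C) == S

# Product rule for independence: P(D and E) = P(D) * P(E).
independent = prob(D & E) == prob(D) * prob(E)

print(disjoint, exhaustive, independent)
print(prob(D), prob(E), prob(D & E))
```

Here P(D) = 3/5, P(E) = 1/2, and P(D∩E) = 3/10, so the product rule holds exactly; using `Fraction` avoids any floating-point rounding in the comparison.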

We now consider the term probability. Probability is a measure defined on events of a given random experiment. It is a number between 0 and 1, inclusive, which reflects the likelihood that an event will occur. If A is an event in the sample space of an experiment, probability of A will be denoted by P(A). There are three definitions of probability; each will be given below followed by an example to explain that definition. Historically, the oldest way of measuring uncertainty is the classical probability concept. It applies when all possible outcomes of an experiment are equally likely; meaning each outcome has the same chance to occur.

Definition 5.8 The Classical Definition of Probability

If there are n equally likely outcomes in a random experiment, where n is finite, of which one must occur and m are regarded as a success, then the probability of a success is given by m/n.

Definition 5.9 The Empirical or Relative Frequency Definition of Probability

If in n trials of an experiment an event E occurs m times, we say that the relative frequency, or probability, of E is m/n.

The reader must see the difference between the above two definitions. According to Definition 5.9, the probability of an event E is the proportion of times E would occur in a long run of repeated experiments, whereas, Definition 5.8 considers the case of equally likely outcomes.
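The difference between the two definitions can be seen numerically. The sketch below is an illustration only and makes its own assumptions: a fair six-sided die, the event "an odd number", 10,000 simulated throws, and an arbitrary fixed seed so that the run is repeatable. The classical value is computed by counting outcomes; the empirical value is the relative frequency m/n observed in the simulated trials.

```python
import random
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]

# Classical definition (5.8): m favourable outcomes out of n equally likely ones
odd_faces = [f for f in faces if f % 2 == 1]
classical = Fraction(len(odd_faces), len(faces))

# Empirical definition (5.9): relative frequency m/n over repeated trials
rng = random.Random(2009)   # arbitrary fixed seed, an assumption for repeatability
n_trials = 10_000
m_occurrences = sum(1 for _ in range(n_trials) if rng.choice(faces) % 2 == 1)
empirical = m_occurrences / n_trials

print(classical, empirical)
```

With a long run of trials the relative frequency should settle close to the classical value 1/2, although it will rarely equal it exactly.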


Definition 5.10 The Subjective Definition of Probability

The probability of an event is based on personal judgment, past experience, and whatever information is available.

Example 5.3 Example on Definition 5.8 of Classical Probability

It should be clear to the reader that Definition 5.8 considers sample spaces where outcomes are equally likely. Examples include the sample space of tossing a fair coin once, of tossing two fair coins once, of throwing a balanced die one time, of throwing two balanced dice one time, and of choosing at random a number from the numbers 0, 1, 2, … , 9. In each of these cases, the outcomes are equally likely.

Suppose we throw a fair die once and are interested in the two events A = an odd number and B = a number greater than 2. Then

A = {1, 3, 5}, and

$$P(A) = \frac{\text{number of outcomes in } A}{\text{total number of outcomes in } S} = \frac{3}{6} = \frac{1}{2} \quad \text{(see Definition 5.8)}.$$

Also,

B = {3, 4, 5, 6}, and

$$P(B) = \frac{\text{number of outcomes in } B}{\text{total number of outcomes in } S} = \frac{4}{6} = \frac{2}{3} \quad \text{(see Definition 5.8 again)}.$$

Example 5.4 Example on Definition 5.9 of Empirical Probability

As stated in Definition 5.9, the probability of an event E according to the empirical or relative frequency interpretation is calculated from a long run of repeated experiments.

A car manufacturer has made 820 red cars during its last 20 years in business. If the manufacturer made 8000 cars in total during that period, what is the probability of making a red car?

Among the cars manufactured, 820/8000 = 0.1025 were red, and we use this proportion as the probability.


Example 5.5 Example on Definition 5.10 of Subjective Probability

Subjective probability is based upon personal judgment, accumulated knowledge, and experience. For example, physicians sometimes assign subjective probabilities to the life expectancy of people who have cancer. Weather forecasting is another example of subjective probability.

5.2 Graphical Displays and Relationships between Events

5.2.1 Graphical Displays

Graphical displays of events are useful to explain and understand the concepts of probability and the relationships between events. There are three ways to display events graphically:

• Venn Diagrams.

• Contingency Tables.

• Tree Diagrams.

We explain each of these three displays in turn. To make them easy to compare, we use the same experiment and sample space for all three. Example 5.6 illustrates the idea.

Example 5.6 Venn Diagram, Contingency Table, and Tree for the Experiment of Tossing Two Coins

The experiment is to toss two fair coins once. The sample space is given by the set S = {(H, H), (H, T), (T, H), (T, T)}. Figure 5.1 is a Venn diagram for this experiment.

Figure 5.1 Venn diagram for tossing two coins

Figure 5.2 is a contingency table that has two rows and two columns. Recall that an r × c contingency table has r rows and c columns. The rows represent the first coin and the columns represent the second coin.

Figure 5.2 Contingency table for tossing two coins


Figure 5.3 is a tree for the experiment of tossing two coins.

Figure 5.3 Tree for tossing two coins

5.2.2 Relationships between Events

We will consider three basic but important relationships that could exist between two or more events. These relationships are summarized as the complement of an event, the union of two or more events, and the intersection of two or more events.

The Complement Event

If A is an event defined on a sample space S, then the complement of A, denoted by $A^c$, is the event that contains all elements in S that are outside A. $A^c$ is also written as (Not A). Figure 5.4 displays the events A and $A^c$ in a sample space.

Figure 5.4 The complement of A

The Union of Two Events

If A and B are two events, their union, A ∪ B, is the event that either A or B occurs, or they both occur. It is also written as the event (A or B).

When we consider the union of two events A and B, we must pay attention to whether the two events are mutually exclusive (disjoint) or not. If A and B are disjoint, then they are represented by two separated squares in the Venn diagram; see Figure 5.5, and if they are not disjoint, then they are represented by two intersected squares in the Venn diagram; see Figure 5.6.

Figure 5.5 Union of disjoint events


Figure 5.6 Union of not disjoint events

The Intersection of Two Events

If A and B are two events, their intersection, denoted by A ∩ B, is the event that both A and B occur simultaneously; that is, A ∩ B contains all outcomes common to A and B. It is also written as the event (A and B). Figure 5.7 displays the event of intersection.

Figure 5.7 Intersection of two events

5.3 Axioms of Probability

The science of probability is built on three basic properties, which are also called the axioms of probability. These axioms are:

1. The probability of any event E lies between zero and one, inclusive. In other words, 0 ≤ P(E) ≤ 1.

2. P(S) = 1.

3. If E1 and E2 are mutually exclusive (or disjoint) events, i.e., E1 ∩ E2 = ∅, then P(E1 ∪ E2) = P(E1) + P(E2).

Axiom 1 states that the probability of any event E is a real number that lies in the interval between 0 and 1, inclusive. Inclusive means that P(E) could be 0 and P(E) could be 1; otherwise, P(E) is a positive fraction between 0 and 1. It cannot be negative. This should be obvious from the definitions of probability given earlier.

Axiom 2 states that the probability of the sample space is equal to 1. Recall that the sample space is defined as the sure or certain event, meaning that when we run an experiment, one of its outcomes must occur. For example, when we toss a fair six-sided die, the sample space contains the outcomes 1, 2, … , 6. Because the die is fair, the outcomes are equally likely and therefore

$$P(S) = \frac{\text{number of outcomes in } S}{\text{number of outcomes in } S} = \frac{6}{6} = 1 \quad \text{(see Definition 5.8)}.$$

Axiom 3 states that if E1 and E2 are two disjoint events, that is, E1 ∩ E2 = ∅ (see Figure 5.5), then the probability of their union is the sum of their probabilities. For example, consider the sample space of choosing a digit at random from the digits 0, 1, 2, … , 9, and define the two events

E1 = event the number chosen is less than or equal to 3 = {0, 1, 2, 3}, and

E2 = event the number chosen is between 4 and 6 inclusive = {4, 5, 6}.

Clearly the two events are disjoint and

$$E_1 \cup E_2 = \{0, 1, 2, 3, 4, 5, 6\},$$

$$P(E_1 \cup E_2) = P(\{0, 1, 2, 3, 4, 5, 6\}) = P(\{0, 1, 2, 3\}) + P(\{4, 5, 6\}) = P(E_1) + P(E_2) = \frac{4}{10} + \frac{3}{10} = \frac{7}{10}.$$

Observe that from the basic properties above we can deduce some other important properties. For example:

• $P(\emptyset) = 0$.

• $P(A^c) = 1 - P(A)$.

• $P(A^c \cap B) = P(B) - P(A \cap B)$.

• $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

• $P(A^c \cap B^c) = P((A \cup B)^c)$.

• $P(A^c \cup B^c) = P((A \cap B)^c)$.

5.4 The Addition Rule

An important rule in probability is the General Addition Rule. This rule states that for any two events A and B in the same sample space of an experiment,

P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

In the above rule, we subtract P(A ∩ B) because it has been counted twice, once in P(A) and once in P(B).

If A and B are mutually exclusive (disjoint) events, then P(A ∩ B) = 0 and the general addition rule reduces to the Special Addition Rule:

P(A ∪ B) = P(A) + P(B).

In what follows we give some examples to show how to calculate probabilities of events. In these examples we use the basic axioms (properties) of probability and the addition rule.

Example 5.7 An Illustrative Example

Suppose you write the digits from 0 to 9 on separate pieces of paper, fold up the pieces, and put them in a hat. Then you ask a friend of yours to draw, while blindfolded, one piece from the hat. What is the probability your friend will draw

a. an even number?

b. a number divisible by 3?

c. an odd number or a number divisible by 3?

d. a number both even and divisible by 3?

e. a number larger than 6?

f. a number larger than 6 or less than 3?

g. a number larger than 6 and less than 3?

h. a number smaller than or equal to 6?

i. a number less than 6 or larger than 3?

Solution

Recall that the sample space of this experiment is the set of integers from 0 to 9; namely, S = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. The number of outcomes in S is 10. These 10 outcomes are equally likely and therefore

$$P(0) = P(1) = P(2) = \dots = P(9) = \frac{1}{10}.$$

a. Let the event A = an even number = {0, 2, 4, 6, 8}. The probability of an even number is then

$$P(A) = P(\{0, 2, 4, 6, 8\}) = \frac{\text{number of outcomes in } A}{\text{total number of outcomes in } S} = \frac{5}{10} \quad \text{(see Definition 5.8)}.$$


b. Let the event B = a number divisible by 3. Then B = {0, 3, 6, 9}. The probability of a number divisible by 3 is then P(B) = 4/10.

c. Let the events C = an odd number or a number divisible by 3, O = an odd number = {1, 3, 5, 7, 9}, and B = a number divisible by 3 = {0, 3, 6, 9}. The intersection of O and B is the event O ∩ B = {3, 9}. So C = O ∪ B, and

$$P(C) = P(O \cup B) = P(O) + P(B) - P(O \cap B) = \frac{5}{10} + \frac{4}{10} - \frac{2}{10} = \frac{7}{10}.$$

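d. Let the event D = a number that is even and divisible by 3. The digit 0 is even and is counted as divisible by 3 (it belongs to both {0, 2, 4, 6, 8} and {0, 3, 6, 9}), so the even numbers divisible by 3 in the sample space are 0 and 6, and D = {0, 6}. Therefore P(D) = 2/10 = 1/5.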

e. Let the event E = a number larger than 6; then E = {7, 8, 9} and

$$P(E) = P(\{7, 8, 9\}) = \frac{\text{number of outcomes in } E}{\text{total number of outcomes in } S} = \frac{3}{10} \quad \text{(see Definition 5.8)}.$$

f. Let the events F = a number larger than 6 or a number less than 3, E = a number larger than 6, and L = a number less than 3. So F = E ∪ L = {7, 8, 9} ∪ {0, 1, 2} and, by the special addition rule (since E and L are disjoint),

$$P(F) = P(E \cup L) = P(E) + P(L) = \frac{3}{10} + \frac{3}{10} = \frac{6}{10} \quad \text{(see Definition 5.8)}.$$

g. Let the event G = a number larger than 6 and less than 3. No digit satisfies both conditions, so G = ∅ and P(G) = 0.

h. Let the event H = a number smaller than or equal to 6; then H = {0, 1, 2, 3, 4, 5, 6} and

$$P(H) = P(\{0, 1, 2, 3, 4, 5, 6\}) = \frac{7}{10} \quad \text{(see Definition 5.8)}.$$

i. Let the events I = a number less than 6 or larger than 3, L1 = a number less than 6 = {0, 1, 2, 3, 4, 5}, and L2 = a number larger than 3 = {4, 5, 6, 7, 8, 9}, so that L1 ∩ L2 = {4, 5}. Then I = L1 ∪ L2 and, by the addition rule,

$$P(I) = P(L_1 \cup L_2) = P(L_1) + P(L_2) - P(L_1 \cap L_2) = \frac{6}{10} + \frac{6}{10} - \frac{2}{10} = 1.$$
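The nine parts of Example 5.7 can be cross-checked by listing the digits that satisfy each condition and applying Definition 5.8. This is only a sketch under the example's own assumption of ten equally likely digits; the helper `prob` and the dictionary `answers` are hypothetical names. Note that the enumeration treats 0 as even and as divisible by 3, as the sets {0, 2, 4, 6, 8} and {0, 3, 6, 9} in parts a and b do, so part d counts both 0 and 6.

```python
from fractions import Fraction

S = range(10)  # the digits 0..9, assumed equally likely

def prob(condition):
    """Classical probability of the event {x in S : condition(x)}."""
    favourable = [x for x in S if condition(x)]
    return Fraction(len(favourable), len(S))

answers = {
    "a": prob(lambda x: x % 2 == 0),                  # even
    "b": prob(lambda x: x % 3 == 0),                  # divisible by 3
    "c": prob(lambda x: x % 2 == 1 or x % 3 == 0),    # odd or divisible by 3
    "d": prob(lambda x: x % 2 == 0 and x % 3 == 0),   # even and divisible by 3
    "e": prob(lambda x: x > 6),
    "f": prob(lambda x: x > 6 or x < 3),
    "g": prob(lambda x: x > 6 and x < 3),             # impossible event
    "h": prob(lambda x: x <= 6),
    "i": prob(lambda x: x < 6 or x > 3),              # certain event
}

for part, p in answers.items():
    print(part, p)
```

Counting outcomes directly, rather than applying the addition rule, is a useful independent check: both routes must give the same fractions.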


Example 5.8 Another Illustrative Example

Suppose the weatherman estimates the probability of rain tomorrow as 50%, the probability of lightning as 40%, and the probability of both rain and lightning as 20%. Determine the probability that tomorrow there will be

a. no rain,

b. no lightning,

c. rain or lightning,

d. rain but no lightning,

e. lightning but no rain,

f. neither rain nor lightning.

Solution

Let the events R = it rains tomorrow and L = there is lightning tomorrow. Then

a. the probability there will be no rain tomorrow is

$$P(R^c) = 1 - P(R) = 1 - 0.50 = 0.50.$$

b. the probability there will be no lightning tomorrow is

$$P(L^c) = 1 - P(L) = 1 - 0.40 = 0.60.$$

c. the probability of rain or lightning tomorrow is, by the general addition rule

$$P(R \cup L) = P(R) + P(L) - P(R \cap L) = 0.50 + 0.40 - 0.20 = 0.70.$$

d. the probability of rain but no lightning is

$$P(R \cap L^c) = P(R) - P(R \cap L) = 0.50 - 0.20 = 0.30.$$

e. the probability of lightning but no rain is

$$P(L \cap R^c) = P(L) - P(R \cap L) = 0.40 - 0.20 = 0.20.$$

f. the probability of neither rain nor lightning is

$$P(R^c \cap L^c) = P((R \cup L)^c) = 1 - P(R \cup L) = 1 - 0.70 = 0.30.$$
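The six answers of Example 5.8 follow from only three given numbers. The sketch below recomputes them with exact fractions to avoid rounding noise; the variable names and the dictionary `results` are hypothetical and chosen for illustration only.

```python
from fractions import Fraction

# Given in Example 5.8
p_rain = Fraction("0.50")
p_lightning = Fraction("0.40")
p_both = Fraction("0.20")

results = {
    "a. no rain": 1 - p_rain,                               # complement rule
    "b. no lightning": 1 - p_lightning,                     # complement rule
    "c. rain or lightning": p_rain + p_lightning - p_both,  # general addition rule
    "d. rain but no lightning": p_rain - p_both,
    "e. lightning but no rain": p_lightning - p_both,
}
# f. neither event occurs: complement of the union in part c
results["f. neither rain nor lightning"] = 1 - results["c. rain or lightning"]

for label, p in results.items():
    print(f"{label}: {float(p):.2f}")
```

As a consistency check, parts d, e, f and the joint probability 0.20 describe four mutually exclusive cases that cover every possibility, so they must add up to 1.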

5.5 The Conditional Probability

Conditional probability is the probability that a particular event occurs given that another event has occurred. The probability of event A occurring given that event B has already occurred is denoted by P(A|B); the vertical line "|" is read "given".

The quantity P(A ∩ B)/P(B), for P(B) ≠ 0, is defined as the Conditional Probability of A given B, which is written as P(A|B):

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) \neq 0.$$

Example 5.9 Illustrating Conditional Probability

Let us suppose the TV weathergirl states that the probability of rain tomorrow is 90% and that, if there is lightning tomorrow, the probability of rain is 95%. These are two entirely different statements. If, again, we let the events

R = there is rain tomorrow,

L = there is lightning tomorrow,

then the first piece of information we are given is the "unconditional probability" P(R) = 90% = 0.90.

The second piece of information is not the probability of rain tomorrow, but rather the probability of rain tomorrow under the condition that there is also lightning tomorrow. This is a "conditional probability":

P(R|L) = 95% = 0.95

Conditional Probability using Contingency Tables

Conditional probabilities can be determined from contingency tables. Example 5.10 shows how conditional probabilities can be calculated from a contingency table.

Example 5.10 Conditional Probability and Contingency Tables

Table 5.1 represents the cumulative AIDS/HIV by mode of transmission as of 31/12/2000 in Jordan*.


Table 5.1 Cumulative AIDS/HIV by mode of transmission

| Mode of Transmission | Case | Carrier | Total |
|---|---|---|---|
| Blood | 33 | 46 | 79 |
| Sexual | 36 | 93 | 129 |
| IVDU's | 0 | 6 | 6 |
| Vertical Tran. | 3 | 0 | 3 |
| Unknown | 12 | 29 | 41 |
| Total | 84 | 174 | 258 |

* The Hashemite Kingdom of Jordan, Ministry of Health, Annual Statistical Book, 2000.

A patient is selected at random. Determine the probability the patient is:

a. a carrier.

b. a carrier given unknown mode of transmission.

c. a carrier given known mode of transmission.

Solution

Let us define the events

C = the patient is a carrier, and

U = the mode of transmission is unknown.

a. The probability the patient is a carrier equals

$$P(C) = \frac{174}{258} = 0.674 \quad \text{(this is the unconditional probability)}.$$

b. The probability the patient is a carrier given unknown mode of transmission is

$$P(C \mid U) = \frac{P(C \cap U)}{P(U)} = \frac{29/258}{41/258} = 0.707.$$

c. The probability the patient is a carrier given known mode of transmission is

$$P(C \mid U^c) = \frac{P(C \cap U^c)}{P(U^c)} = \frac{P(C) - P(C \cap U)}{1 - P(U)} = \frac{\frac{174}{258} - \frac{29}{258}}{1 - \frac{41}{258}} = \frac{145/258}{217/258} = 0.668.$$
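The three probabilities of Example 5.10 can be read straight off the counts in Table 5.1. The following is a rough sketch rather than a prescribed method: the dictionary layout and variable names are assumptions made for illustration, and each conditional probability is computed by restricting attention to the rows that satisfy the condition.

```python
from fractions import Fraction

# Table 5.1: mode of transmission -> (number of cases, number of carriers)
table = {
    "Blood": (33, 46),
    "Sexual": (36, 93),
    "IVDU's": (0, 6),
    "Vertical Tran.": (3, 0),
    "Unknown": (12, 29),
}

total = sum(cases + carriers for cases, carriers in table.values())
carriers_total = sum(carriers for _, carriers in table.values())
unknown_cases, unknown_carriers = table["Unknown"]
unknown_total = unknown_cases + unknown_carriers

# a. unconditional probability of a carrier
p_carrier = Fraction(carriers_total, total)
# b. condition on the "Unknown" row only
p_carrier_given_unknown = Fraction(unknown_carriers, unknown_total)
# c. condition on all rows except "Unknown"
p_carrier_given_known = Fraction(carriers_total - unknown_carriers,
                                 total - unknown_total)

print(round(float(p_carrier), 3),
      round(float(p_carrier_given_unknown), 3),
      round(float(p_carrier_given_known), 3))
```

Dividing counts within the restricted rows gives the same values as dividing the corresponding probabilities, because the common factor 1/258 cancels.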


Example 5.11 Another Example on Conditional Probability

In a cold winter, a man interested in deer hunting followed 100 deer and found that 70 of the deer survived the winter, 45 were fawns, and 20 of the survivors were fawns. Find the probability that a fawn survived the winter.

Solution

We define the following events:

A = deer survives the winter,

B = deer is a fawn.

Then

$$P(A) = \frac{70}{100} = 0.70, \quad P(B) = \frac{45}{100} = 0.45, \quad \text{and} \quad P(A \cap B) = \frac{20}{100} = 0.20.$$

The required probability is P(A|B), and by the definition of conditional probability we have

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.20}{0.45} = 0.44.$$

Can you determine $P(A^c \mid B)$?

5.6 Independence and the Multiplication Rule

5.6.1 The Multiplication Rule

The Multiplication Rule is another important rule in probability. It is used to find the joint probability of two events; that is, the probability of their intersection.

The Multiplication Rule

Since P(A|B) = P(A ∩ B)/P(B), multiplying both sides by P(B) gives

P(A ∩ B) = P(B)·P(A|B).

Also, P(B|A) = P(A ∩ B)/P(A), and multiplying both sides by P(A) gives

P(A ∩ B) = P(A)·P(B|A).

This result is called the General Multiplication Rule. It is used to calculate the joint probability of two events.


Example 5.12 An Example on the General Multiplication Rule

Table 5.2 represents the cumulative AIDS/HIV by nationality as of 31/12/2000*

Table 5.2 Cumulative AIDS/HIV by nationality

| Nationality | Frequency |
|---|---|
| Jordanian | 118 |
| Others | 140 |
| Total | 258 |

*The Hashemite Kingdom of Jordan, Ministry of Health, Annual Statistical Book, 2000.

Two patients are selected at random, without replacement. What is the probability that the first patient selected is Jordanian and the second is non-Jordanian?

Solution

We define the two events

J = event the first patient is Jordanian, and

N = event the second patient is non-Jordanian

The required probability is then P(J ∩ N), and we have to use the general multiplication rule:

$$P(J \cap N) = P(J) \cdot P(N \mid J) = \frac{118}{258} \cdot \frac{140}{257} = 0.249.$$

It is helpful to draw a tree diagram to explain joint and conditional probabilities when applying the general multiplication rule. Figure 5.8 is a tree diagram for the data given in Example 5.12.

Figure 5.8 Tree diagram for Example 5.12

The branches of the tree, with the first selection followed by the second, are:

• J (118/258), then J (117/257): P(J and J) = (118/258)(117/257) = 0.208

• J (118/258), then N (140/257): P(J and N) = (118/258)(140/257) = 0.249

• N (140/258), then J (118/257): P(N and J) = (140/258)(118/257) = 0.249

• N (140/258), then N (139/257): P(N and N) = (140/258)(139/257) = 0.293
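The four branch probabilities of the tree for Example 5.12 can be generated mechanically from the counts in Table 5.2. The sketch below is illustrative only; the names `counts` and `paths` are hypothetical. It applies the general multiplication rule P(first ∩ second) = P(first)·P(second | first), where the conditional probability reflects that one patient has already been removed.

```python
from fractions import Fraction

counts = {"J": 118, "N": 140}   # Table 5.2: Jordanian, non-Jordanian
total = sum(counts.values())    # 258 patients

paths = {}
for first, n_first in counts.items():
    p_first = Fraction(n_first, total)
    for second, n_second in counts.items():
        # Sampling without replacement: one patient of type `first` is gone
        remaining = n_second - (1 if second == first else 0)
        p_second_given_first = Fraction(remaining, total - 1)
        paths[(first, second)] = p_first * p_second_given_first

for path, p in paths.items():
    print(path, round(float(p), 3))
```

Because the four paths are mutually exclusive and exhaustive, their probabilities sum to exactly 1, which is a quick way to catch arithmetic slips in a tree diagram.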


5.6.2 Independent Events

The concept of independence is of great importance in probability. Consider the following pairs of events:

a. Rainfall and rate of inflation.

b. Rainfall and the price of corn.

c. Woman's shoe size and her income.

d. Woman's shoe size and her height.

If the occurrence of an event does not affect the occurrence or nonoccurrence of another, as is the case in parts a and c above, then we say that the two events are independent. Otherwise, we say that the two events are not independent (or dependent), as is the case with parts b and d. The next formula shows how to define independence mathematically.

Independent Events

Events A and B are defined to be independent if P(A|B) = P(A) or P(B|A) = P(B).

Otherwise the events are said to be not independent (or dependent).

CAUTION: Do not confuse independent events with disjoint events.

Example 5.13 Independent Events

Reem will draw one card at random from a well-shuffled deck of 52 cards. What is the probability the card will be a

a. king?

b. king, given that it is a heart?

c. king, given that it is a face card?

Solution

Let's define the events

K = card drawn is a king,

H = card drawn is a heart, and

F = card drawn is a face card.


a. As the full sample space of 52 cards contains 4 kings, P(K) = 4/52 = 1/13.

b. The 13 hearts contain only 1 king; hence the probability the card is a king, when we know it is a heart, is P(K|H) = 1/13. Since P(K) = P(K|H) = 1/13, the two events K and H are independent.

c. If we restrict the sample space to only the 12 face cards, then the probability of a king becomes P(K|F) = 4/12 = 1/3. Since P(K) ≠ P(K|F), the two events K and F are not independent.
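The comparison of P(K) with P(K|H) and P(K|F) in Example 5.13 can be reproduced by enumerating a standard 52-card deck. This is a sketch under the usual assumptions (13 ranks in each of 4 suits, with jack, queen, and king as the face cards); the function `prob` and the predicates are hypothetical names introduced for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))   # 52 (rank, suit) pairs

def prob(event, given=None):
    """P(event), or P(event | given) when a conditioning predicate is supplied."""
    space = deck if given is None else [card for card in deck if given(card)]
    return Fraction(sum(1 for card in space if event(card)), len(space))

def is_king(card):
    return card[0] == "K"

def is_heart(card):
    return card[1] == "hearts"

def is_face(card):
    return card[0] in {"J", "Q", "K"}

p_king = prob(is_king)
p_king_given_heart = prob(is_king, given=is_heart)
p_king_given_face = prob(is_king, given=is_face)

print(p_king, p_king_given_heart, p_king_given_face)
print("K and H independent:", p_king == p_king_given_heart)
print("K and F independent:", p_king == p_king_given_face)
```

Conditioning is implemented here exactly as the solution describes it: the sample space is restricted to the cards that satisfy the condition before counting.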

An important question arises in the case of independent events: how does independence affect the general multiplication rule and the general addition rule? First, we look at the general multiplication rule.

The general multiplication rule, as we know, is stated in two equivalent forms:

(i) P(A ∩ B) = P(A)·P(B|A), or

(ii) P(A ∩ B) = P(B)·P(A|B).

The two events A and B are independent if

a. P(A) = P(A|B), or

b. P(B) = P(B|A).

Substituting (b) into (i) yields

P(A ∩ B) = P(A)·P(B)

Also, substituting (a) into (ii) yields

P(A ∩ B) = P(B)·P(A)

Thus, in both cases we have P(A ∩ B) = P(A)·P(B).

With regard to the addition rule, recall that this rule is stated as follows:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

If A and B are independent, then

P(A ∪ B) = P(A) + P(B) − P(A)·P(B)

We summarize these findings in the following two rules, called the Special Multiplication Rule and the Special Addition Rule for independent events.

The Special Multiplication Rule for Independent Events

If the two events A and B are independent,

P(A ∩ B) = P(A)·P(B)

The Special Addition Rule for Independent Events

If the two events A and B are independent,

P(A ∪ B) = P(A) + P(B) − P(A)·P(B)

Example 5.14

Reem, a student at the university, plans to take two courses next summer: a department requirement course (D) and a university requirement course (U). The probability she will succeed in D is 0.80 and in U is 0.60. Assume the two courses are independent. What is the probability Reem will succeed in

a. both courses?

b. at least one of the two courses? (This means that Reem may succeed in either one or in both courses.)

Solution

Since the two events D and U are independent,

a. the probability Reem will succeed in both courses is P(D ∩ U), and by the special multiplication rule

P(D ∩ U) = P(D)·P(U) = (0.80)(0.60) = 0.48

b. the probability Reem will succeed in at least one of the two courses is

P(at least one) = P(D ∪ U) = P(D) + P(U) − P(D ∩ U)

= 0.80 + 0.60 − (0.80)(0.60)

= 0.92

5.6.3 The Rule of Total Probability

To understand the Rule of Total Probability, we must first go back to exhaustive events. Recall that exhaustive events (Definition 5.6) are events of which at least one must occur. Figure 5.9 depicts a sample space S that is composed of six disjoint events A1, A2, … , A6.

Figure 5.9 A sample space that has six disjoint events


The Rule of Total Probability

In a random experiment, suppose the sample space S is composed of n mutually exclusive and collectively exhaustive events A1, A2, … , An. Then for any event B defined on S,

$$P(B) = P(A_1)P(B \mid A_1) + P(A_2)P(B \mid A_2) + \dots + P(A_n)P(B \mid A_n).$$

Let's define an event B on sample space S. The graph of S would look like Figure 5.10 below.

Figure 5.10 A sample space that has six disjoint events and event B

From Figure 5.10, note that $B = (A_1 \cap B) \cup (A_2 \cap B) \cup \dots \cup (A_6 \cap B)$ and, for i = 1, 2, … , 6, the events $(A_i \cap B)$ are mutually exclusive. Therefore,

$$P(B) = P(A_1 \cap B) + P(A_2 \cap B) + \dots + P(A_6 \cap B) = P(A_1)P(B \mid A_1) + P(A_2)P(B \mid A_2) + \dots + P(A_6)P(B \mid A_6).$$

The last formula is called the "Rule of Total Probability".

Example 5.15 Illustration of the Rule of Total Probability

Table 5.3 shows the Kingdom's Hospital Principal Services in the year 2000*.

Table 5.3 Hospital principal services in Jordan

| Sector | Patients: Admission % | Patients: Dead % |
|---|---|---|
| MOH** | 46.0% | 1.2% |
| RMC | 19.0% | 2.4% |
| JUH | 3.8% | 2.4% |
| Private | 31.2% | 0.9% |

* The Hashemite Kingdom of Jordan, Ministry of Health, Annual Statistical Book, 2000.

**MOH = Ministry of Health, RMC = Royal Medical Services, JUH = Jordan University Hospital

A patient admitted to one of the four sectors is selected at random. Determine the probability he/she died in the year 2000.


Solution

Let us define the following events:

D = event the patient selected is dead,

S1 = event patient admitted to MOH hospital,

S2 = event patient admitted to RMC hospital,

S3 = event patient admitted to JUH hospital,

S4 = event patient admitted to a private hospital.

The sample space in this experiment is represented by the four mutually exclusive events S1, S2, S3, and S4. The event D is defined on S.

From the above table we can derive the following probabilities:

P(S1)=0.460 P(D|S1)=0.012

P(S2)=0.190 P(D|S2)=0.024

P(S3)=0.038 P(D|S3)=0.024

P(S4)=0.312 P(D|S4)=0.009

From the rule of total probability, we get

$$P(D) = \sum_{i=1}^{4} P(S_i)P(D \mid S_i) = (0.460)(0.012) + \dots + (0.312)(0.009) = 0.0138 = 1.38\%.$$

5.6.4 Bayes's Rule

Bayes's Theorem is an important and widely used theorem in probability. Its main use is to revise probabilities in light of newly attained knowledge. These revised probabilities are conditional probabilities.

Bayes's Theorem

In any experiment, suppose the sample space S is composed of n mutually exclusive and collectively exhaustive events A1, A2, … , An. For any event B defined on S with P(B) > 0,

$$P(A_i \mid B) = \frac{P(A_i)P(B \mid A_i)}{\sum_{j=1}^{n} P(A_j)P(B \mid A_j)} \quad \text{for } i = 1, 2, \dots, n.$$
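The rule of total probability and Bayes's Theorem can be applied together in a few lines. The sketch below uses the admission shares and death rates from Table 5.3 of Example 5.15; the dictionary names `prior`, `p_dead_given`, and `posterior` are hypothetical labels chosen for this illustration. The denominator of Bayes's Theorem is exactly the total probability P(D).

```python
# P(S_i): share of admissions per sector, from Table 5.3
prior = {"MOH": 0.460, "RMC": 0.190, "JUH": 0.038, "Private": 0.312}
# P(D | S_i): proportion of admitted patients who died, per sector
p_dead_given = {"MOH": 0.012, "RMC": 0.024, "JUH": 0.024, "Private": 0.009}

# Rule of total probability: P(D) = sum over sectors of P(S_i) * P(D | S_i)
p_dead = sum(prior[s] * p_dead_given[s] for s in prior)

# Bayes's Theorem: P(S_i | D) = P(S_i) * P(D | S_i) / P(D)
posterior = {s: prior[s] * p_dead_given[s] / p_dead for s in prior}

print("P(D) =", round(p_dead, 4))
for sector, p in posterior.items():
    print(f"P({sector} | D) = {p:.3f}")
```

The revised (posterior) probabilities always sum to 1, since they redistribute the total probability P(D) across the sectors.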


In the above example, we have four mutually exclusive events; S1, S2, S3, S4 and their corresponding probabilities P(S1), P(S2), P(S3), and P(S4). We also have P(D|S1), P(D|S2), P(D|S3), and P(D|S4) as known probabilities. In total, we have eight known probabilities.

In Bayes's Rule, the problem is to use these eight probabilities to determine the conditional probabilities P(S1|D), P(S2|D), P(S3|D), and P(S4|D). We will show how to find P(S1|D) in terms of those eight probabilities. Since

$$P(S_1 \mid D) = \frac{P(S_1 \cap D)}{P(D)},$$

then by the general multiplication rule applied to the numerator, and the rule of total probability applied to the denominator, we get

$$P(S_1 \mid D) = \frac{P(S_1 \cap D)}{P(D)} = \frac{P(S_1)P(D \mid S_1)}{\sum_{i=1}^{4} P(S_i)P(D \mid S_i)}.$$

The other conditional probabilities are found in a similar manner.

Example 5.16 Example on Bayes's Theorem

From Example 5.15 for the rule of total probability, we know that 0.9% of the patients admitted to private hospitals died; that is, P(D|S4) = 0.009. With Bayes's Theorem we ask the following question: what percentage of the patients who died were admitted to private hospitals? That is, P(S4|D) = ? To answer this question we apply Bayes's Theorem as stated above and find

$$P(S_4 \mid D) = \frac{P(S_4)P(D \mid S_4)}{\sum_{i=1}^{4} P(S_i)P(D \mid S_i)} = \frac{(0.312)(0.009)}{0.0138} = 0.203 = 20.3\%.$$

5.7 Some Counting Rules

Counting Rules are methods that give the total number of ways (or choices) for certain selections. Usually, we use these rules in probability to get the number of outcomes in a sample space and in the events of a given experiment. In many situations, we do not need to know all the outcomes of a sample space (or an event), or listing them may be too difficult. It suffices to get only the number of outcomes in the sample space and in the required event so that we can calculate the probability.

There are plenty of these rules. We will consider only four rules that are used frequently in solving probability problems. We will give each rule followed by an example.


Counting Rule 1 The Product Rule

Suppose we have k events A1, A2, … , Ak, where event Ai contains ni outcomes for i = 1, 2, … , k. Then the number of possible ways to choose one outcome from each event is the product n1 · n2 ⋯ nk.

Example 5.17 To Illustrate The Product Rule

Suppose there are 3 roads from city A to city B and 5 roads from B to C. Then the total number of ways (possibilities) you can get from city A to C is 3 × 5 = 15.

Also suppose you have 6 shirts of different colors and 4 pants of different colors. How many shirt-and-pants combinations (ways) are there? For each shirt there are 4 possibilities of pants, so together there are 6 × 4 = 24 possibilities.

Finally, how many license plates of 2 letters followed by 4 digits are possible?

Assuming that repetition of letters and digits is allowed, the answer is $26^2 \times 10^4 = 6{,}760{,}000$, because there are 26 possibilities for the first place, 26 for the second, 10 for the third, 10 for the fourth, 10 for the fifth, and 10 for the sixth.

Note that in each of the above examples we multiply the number of possible ways.
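The three counts of Example 5.17 can be reproduced either by multiplying the numbers of choices or, for small cases, by listing every combination. The sketch below is illustrative; the helper `count_choices` is a hypothetical name, and `math.prod` requires Python 3.8 or later.

```python
from itertools import product
from math import prod

def count_choices(*sizes):
    """Product rule: number of ways to pick one outcome from each stage."""
    return prod(sizes)

# Roads A -> B -> C, listed explicitly as (road_AB, road_BC) pairs
routes = list(product(range(3), range(5)))
n_routes = len(routes)

# Shirts and pants
n_outfits = count_choices(6, 4)

# License plates: 2 letters followed by 4 digits, repetition allowed
n_plates = count_choices(26, 26, 10, 10, 10, 10)

print(n_routes, n_outfits, n_plates)
```

Listing the combinations with `itertools.product` is practical only for small cases; for the license plates the product rule gives the count without generating millions of plates.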

Counting Rule 2 n-Factorial

Suppose that n distinct objects are to be drawn sequentially, or ordered from left to right in a row. (Order is important; objects are drawn without replacement.) Then the number of ways to arrange n distinct objects in a row is n(n − 1)(n − 2) ⋯ 3 · 2 · 1 = n!.

Example 5.18 To Illustrate the n-Factorial Rule

How many ways can one arrange the letters A, B, C? One can have

ABC, ACB, BAC, BCA, CAB, CBA

There are 3 possibilities for the first position. Once we have chosen the first position, there are 2 possibilities for the second position, and once we have chosen the first two positions, there is only 1 choice left for the third. So there are 3 × 2 × 1 = 3! = 6 arrangements. In general, if there are n distinct letters, there are n! possibilities. Later, we consider the case in which one or more letters are the same.

How many ways can 3 boys and 2 girls form a line? There are 5 possibilities for the first position. Once we have chosen the first position, there remain 4 possibilities for the second position, and once we have chosen the first two positions, there remain 3 possibilities for the third position, and so on. Therefore, the number of ways is 5! = 5 × 4 × 3 × 2 × 1 = 120.

What would the answer be if the two girls must be together? We treat the two girls as a single unit, so there are 4 units to arrange, which can be done in 4! = 24 ways. However, the two girls may change places in 2! = 2 ways. Thus the final answer is 2 × 24 = 48.

What would the answer be if the boys must be together and the girls must be together? The boys can be arranged among themselves in 3! ways, the girls in 2! ways, and the two groups can be placed in 2 ways. The answer is 2 × (3! × 2!) = 24.

How many ways can we arrange 4 statistics books, 3 mathematics books, 2 biology books, and 1 computer science book on a bookshelf so that all the statistics books are together, all the mathematics books are together, and all the biology books are together?

We can arrange the statistics books in 4! ways, the mathematics books in 3! ways, the biology books in 2! ways, and the computer science book in 1! way. The 4 subject groups themselves can be arranged in 4! ways, just as the four letters S, M, B, C can be arranged in 4! ways. Thus the answer is 4!(4! 3! 2! 1!) = 6912.
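The line-up counts above can be verified by brute force. The sketch below is illustrative only: the labels `B1`, `B2`, `B3`, `G1`, `G2` and the helper names `girls_adjacent` and `boys_adjacent` are assumptions introduced for this check, not notation from the text.

```python
from itertools import permutations
from math import factorial

# Hypothetical labels for the 3 boys and 2 girls.
children = ["B1", "B2", "B3", "G1", "G2"]
lines = list(permutations(children))  # all 5! orderings of the line

def girls_adjacent(line):
    """True when the two girls stand next to each other."""
    return abs(line.index("G1") - line.index("G2")) == 1

def boys_adjacent(line):
    """True when the three boys occupy three consecutive positions."""
    positions = sorted(line.index(b) for b in ("B1", "B2", "B3"))
    return positions[2] - positions[0] == 2

total = len(lines)
girls_together = sum(girls_adjacent(l) for l in lines)
both_together = sum(girls_adjacent(l) and boys_adjacent(l) for l in lines)

# Bookshelf: 4! orders of the subject groups times the orders within groups.
bookshelf = factorial(4) * (factorial(4) * factorial(3) * factorial(2) * factorial(1))

print(total, girls_together, both_together, bookshelf)  # 120 48 24 6912
```

Listing all 120 orderings is feasible here; the bookshelf count is computed from the factorials directly because 10! orderings would be wasteful to enumerate.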

Now we consider the case in which one or more letters are repeated. How many ways can we arrange the letters A, A, B, C? Let us temporarily label these letters as a, A, B, C, so that we have 4! = 24 ways to arrange them. But swapping a and A (for example, aABC and AaBC) gives the same arrangement, so each arrangement has been counted 2! = 2 times, and the answer should be 4!/2! = 12. If there were 4 A's, 3 B's, and 2 C's, we would have

$$\frac{9!}{4!\,3!\,2!} = 1260.$$

How many ways can we arrange the letters of the word MISSISSIPPI? There are 11 letters altogether in the word MISSISSIPPI: 4 I's, 4 S's, 2 P's, and 1 M. So the total number of ways is

$$\frac{11!}{4!\,4!\,2!\,1!} = 34650.$$

Counting Rule 3 Permutations Rule

When we select a subset of k objects from a set of n distinct objects such that k ≤ n and

i) selection is without replacement,

ii) order is important,

then we count the number of ways using the permutations rule:

$${}_nP_k = \frac{n!}{(n-k)!}.$$
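Both the repeated-letter counts and the permutations rule can be checked numerically. This is a minimal sketch, assuming Python 3.8 or later (for `math.perm`); the function name `arrangements_with_repeats` is a hypothetical helper introduced here.

```python
from collections import Counter
from itertools import permutations
from math import factorial, perm

def arrangements_with_repeats(word):
    """Distinct orderings of the letters in `word`: n! / (n1! n2! ...)."""
    count = factorial(len(word))
    for multiplicity in Counter(word).values():
        count //= factorial(multiplicity)  # divide out swaps of equal letters
    return count

# Brute-force check for the small case A, A, B, C.
distinct_aabc = len(set(permutations("AABC")))

print(arrangements_with_repeats("AABC"))         # 12
print(arrangements_with_repeats("AAAABBBCC"))    # 1260
print(arrangements_with_repeats("MISSISSIPPI"))  # 34650

# Permutations rule: nPk = n! / (n - k)!
print(perm(5, 3), factorial(5) // factorial(5 - 3))  # 60 60
```

The brute-force set of orderings agrees with the formula for A, A, B, C, which supports the argument that each arrangement is counted once per swap of identical letters.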

Example 5.19 To Illustrate the Permutations Rule

How many ways can we choose 3 letters out of 5? Let the letters be A, B, C, D, E. If order is important and selection is without replacement, then there are 5 possibilities for the first position, 4 for the second, and 3 for the third, for a total of 5 × 4 × 3 = 60. Note that this product can be written as:

$$5 \times 4 \times 3 = \frac{5 \times 4 \times 3 \times 2!}{2!} = \frac{5!}{2!} = \frac{5!}{(5-3)!} = {}_5P_3.$$

How many ways can you choose four courses from a set of ten offered next semester if order is important?

$${}_{10}P_4 = \frac{10!}{(10-4)!} = 5040.$$

Remark

If order is important but selection is with replacement, then the number of ways is $n^k$.

How many ways can you choose two letters out of the four letters A, B, C, D if order is important and selection is with replacement?

According to the above remark, there are $4^2 = 16$ possibilities, namely:

AA AB AC AD

BA BB BC BD

CA CB CC CD

DA DB DC DD

Counting Rule 4 Combinations Rule

When we select a subset of k objects from a set of n distinct objects such that k ≤ n and

i) selection is without replacement,

ii) order is not important,

then we count the number of ways using the combinations rule:

$${}_nC_k = \frac{n!}{k!\,(n-k)!}.$$

Example 5.20 To Illustrate the Combinations Rule

How many ways can we choose 3 letters out of 5? Let the letters be A, B, C, D, E, with selection without replacement. If we first keep track of order, there are 5 possibilities for the first position, 4 for the second, and 3 for the third, for a total of 5 × 4 × 3. But suppose the letters selected were A, B, C. If order is not important, we will have counted the letters A, B, C 6 times, because there are 3! = 6 ways of arranging 3 letters. The same is true for any other choice of 3 letters. So we should have

$$\frac{5 \times 4 \times 3}{3!}.$$

This last result can be rewritten as

$$\frac{5 \times 4 \times 3}{3!} = \frac{5 \times 4 \times 3 \times 2!}{3!\,2!} = \frac{5!}{3!\,2!} = {}_5C_3 = 10.$$

If we want to choose three books from a set of seven distinct books, without replacement and with order not important, we calculate:

$${}_7C_3 = \frac{7!}{3!\,4!} = 35.$$

Suppose there are 6 men and 7 women. How many ways can we choose a committee that has 3 men and 2 women? We can choose 3 men in ${}_6C_3 = 20$ ways and 2 women in ${}_7C_2 = 21$ ways. The number of committees is then the product ${}_6C_3 \cdot {}_7C_2 = 420$.

Suppose there are 6 men and 7 women. How many ways can we choose a committee of 5 so that at most 1 woman is on it? Either the committee has 5 men and no women, or it has 4 men and 1 woman, so the answer is

$${}_6C_5 \cdot {}_7C_0 + {}_6C_4 \cdot {}_7C_1 = 6 + 105 = 111.$$

Remark

If order is not important and selection is with replacement, then the number of ways is ${}_{n-1+k}C_k$.

In how many ways can you select two letters out of the four letters A, B, C, D if order is not important and selection is with replacement?

According to the above remark, with n = 4 and k = 2, there are ${}_{4-1+2}C_2 = {}_5C_2 = 10$ possibilities, namely:

AA AB AC AD

BB BC BD

CC CD

DD
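The four selection schemes discussed in this section (ordered or unordered, with or without replacement) can be compared side by side. The sketch below is one possible check, assuming Python 3.8 or later and the standard `itertools` and `math` modules; the variable names are illustrative.

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)
from math import comb, perm

letters = "ABCD"
n, k = len(letters), 2

# Ordered, with replacement: n**k selections.
ordered_with = list(product(letters, repeat=k))
# Ordered, without replacement: nPk selections.
ordered_without = list(permutations(letters, k))
# Unordered, without replacement: nCk selections.
unordered_without = list(combinations(letters, k))
# Unordered, with replacement: (n-1+k)Ck selections.
unordered_with = list(combinations_with_replacement(letters, k))

print(len(ordered_with), n ** k)                 # 16 16
print(len(ordered_without), perm(n, k))          # 12 12
print(len(unordered_without), comb(n, k))        # 6 6
print(len(unordered_with), comb(n - 1 + k, k))   # 10 10

# Committee examples: 6 men and 7 women.
committee_3m_2w = comb(6, 3) * comb(7, 2)
at_most_one_woman = comb(6, 5) * comb(7, 0) + comb(6, 4) * comb(7, 1)
print(committee_3m_2w, at_most_one_woman)        # 420 111
```

Printing `unordered_with` reproduces the ten pairs AA, AB, ..., DD listed in the remark.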

Exercises

5.1 An experiment consists of choosing at random two out of 5 persons, call them Ali (A), Basem (B), Cindy (C), Dawood (D), and Eman (E).

Determine the sample space of this experiment. Is it finite or infinite? Why?

a. Define the event A = Ali is chosen. Find P(A).

b. Determine the probability of the event that Ali is not chosen.

c. Determine the probability of Ali or Basem being chosen.

d. Determine the probability of Ali and Basem being chosen.

e. Determine the probability of Ali or Basem or Cindy being chosen.

5.2 Two fair dice are tossed once.

a. Determine the sample space of this experiment.

b. Define the event A = the sum of points on the two dice is 7.

c. Define the event B = the sum of points on the two dice is 11.

d. Determine P(A) and P(B).

e. Determine P(A or B).

5.3 Two fair dice are tossed once. Determine the probability that

a. the sum is 5,

b. the sum is 10,

c. the sum is less than 5,

d. the sum is 10 or larger,

e. the two numbers are the same,

f. the sum is an odd number,

g. the sum is 5 or 10,

h. one die shows a larger number than the other.

5.4 The mailman is approaching the mailboxes of Mohammad (M), Ahmad (A), and Fadi (F) with 2 letters in his hand. If we assume that all three men are equally likely to receive letters, what is the probability that

a. Mohammad gets both letters?

b. Ahmad gets at least one letter?

c. Two different people get letters?

d. Mohammad gets more letters than Fadi?

e. Ahmad and Fadi each get a letter?

5.5 Ali, Basem, Cindy, and Dina enter a room where there is a chair, a couch that seats two people, and another chair next to the couch. If people take seats randomly, what is the probability that

a. two females sit on the couch?

b. Ali sits on the couch?

c. Ali sits on the couch with a female?

d. at least one female sits on the couch?

5.6 Two fair coins will be tossed once. Determine the probability that

a. two heads appear,

b. two heads appear, given that the first coin is a head,

c. two heads appear, given that at least one coin is a head,

d. two heads appear, given that the second coin is a tail.

5.7 Three fair coins are tossed once. Compute the probability of getting two heads and one tail, given that there are more heads than tails.

5.8 One fair die is thrown. Determine the probability that the number showing on the die is

a. 4,

b. 4, given that the number is even,

c. 4, given that the number is odd.

5.9 Reem will draw one card at random from a full deck of 52 cards. What is the probability the card will be a

a. king?

b. king, given that it is a face card?

c. king, given that it is a diamond?

d. king, given that it is not an ace?

5.10 The weathergirl asserts that

i) the probability of rain tomorrow is 30%,

ii) if it rains tomorrow, then the probability of thunder is 40%,

iii) if tomorrow there is rain and thunder, then the probability of a tornado is 10%.

What is the probability that tomorrow there is

a. rain and thunder?

b. rain and thunder and a tornado?

5.11 If you toss three fair coins in succession, what is the probability you get three heads, given that

a. at least one coin is a head?

b. at least two coins are heads?

c. the first coin is a head?

d. the first and second coins are heads?

e. the first or the second coin is a head?

f. all three coins come up the same?

g. not all three coins come up the same?

5.12 In a throw of one die, what's the probability of getting an odd number, given that the number 1 is not thrown?

5.13 A college dormitory is home to 24 junior mathematics majors, 16 senior mathematics majors, 18 junior engineering majors, and 12 senior engineering majors. If a student is to be chosen at random from the dormitory, what is the probability he will be

a. an engineering major?

b. a senior?

c. a junior, given that he is an engineering major?

d. a mathematics major, given that he is a senior?

5.14 Reem will fly from Amman to Dubai, spend a day and fly to Abu Dhabi, then spend two days and fly to Kuwait. She has a choice of 3 airlines from Amman to Dubai, 4 airlines from Dubai to Abu Dhabi, and 5 airlines from Abu Dhabi to Kuwait. How many ways can she choose the airlines for her three flights?

5.15 Ahmad, Hamed, and Dina are buying ice cream cones in a store where there are 10 choices for flavor. How many different ways can the three decide on flavors, if

a. each orders one scoop of ice cream?

b. each orders two scoops of ice cream, and the top scoop may be a different flavor from the bottom scoop?

c. Ahmad orders one scoop, Hamed two scoops, and Dina three scoops?

d. each orders one scoop, but the cones come in two flavors?

5.16 How many different 4-digit numbers are there

a. that are odd?

b. that are odd and larger than 3000?

c. that have only odd digits?

d. that have only even digits?

5.17 A father has 7 sons. He wants to clean up his yard. His wife suggests he ask his sons for help. How many ways can the father decide which sons will help him clean the yard? (He may ask none of his sons, all of them, or only some of them.)

5.18 Next semester there are 3 physics courses Marwa can take, 4 biology courses, 5 history courses, and 6 English courses. How many ways can Marwa choose

a. a physics course and a biology course?

b. a physics course or a biology course?

c. a biology course and a history course and an English course?

d. a biology course or a history course or an English course?

e. a physics course or a biology course, and a history course or an English course?

f. a physics course and a biology course, or a history course and an English course?

5.19 A college class has 10 junior women, 9 junior men, 6 senior women, and 5 senior men. How many ways can the professor choose

a. a male?

b. a woman or a senior man?

c. a junior or a woman?

d. a senior couple?

e. a senior couple or a junior couple?

f. a senior or a woman, and a junior?

5.20 How many ways can 10 students line up at the cafeteria to order a hamburger?

5.21 How many ways can 4 mathematics professors be assigned to teach 5 mathematics classes, if each is to be assigned but one class?

5.22 How many ways can 3 soldiers line up for a picture, if

a. each is given the choice of wearing or not wearing his hat?

b. each is given the choice of wearing or not wearing his hat, and also the choice of carrying or not carrying his gun?

5.23 How many ways can a club with 50 members choose a President, Vice-President, Secretary, and Treasurer? (No one person can hold two positions.)

5.24 The mailman is bringing 4 letters to an apartment house with 7 mailboxes. How many ways can the mailman distribute the 4 letters among the 7 mailboxes if

a. no mailbox receives more than one letter?

b. each mailbox may receive any number of letters?

5.25 Three cars have entered a parking lot with 11 empty parking spaces. Each car has an option of backing into its parking space or driving in forwards. How many ways can the 3 drivers select

a. parking spaces?

b. parking spaces and parking positions?

5.26 A set has 20 elements. How many 2-element subsets does the set have? How many 18-element subsets does it have?

5.27 A club has 20 members. How many ways can the club form a committee of

a. 3 members?

b. 17 members?

c. 20 members?

d. 1 member?

5.28 A farmer has 7 horses. How many ways can he choose 3 to march in a parade?

5.29 There are 5 cats living in the neighborhood. If every cat gets into a fight with every other cat once during the night, how many catfights are there?

5.30 A university team of runners has 4 freshmen, 2 sophomores, 4 juniors and 3 seniors. How many ways can the coach choose

a. 6 runners to race in a championship?

b. 3 freshmen?

c. 4 runners and line them up for a publicity picture?

d. 1 runner from each class to visit a high school?

e. 2 senior co-captains?

f. 1 senior co-captain and 1 junior co-captain?

5.31 How many ways can Reem select 3 out of her 8 business suits to pack for a trip?

5.32 In how many ways can you arrange the letters in the word

a. EAT

b. EGG

c. BUSINESS

d. MEMBER

e. KUWAIT

5.33 From a standard deck of 52 cards, how many ways can you get

a. 3 kings and 2 queens?

b. 4 aces and one other card?

c. 2 kings, 1 queen, and 2 numerical cards?

d. 5 cards of the same type?

5.34 A farmer has 6 horses, 3 sons, and 2 daughters. He intends to choose 2 sons and 1 daughter, seat each upon a horse, and line them up for a picture. How many possible pictures are there?

5.35 From a basketball team of 9 players, how many ways can the coach choose two guards, two forwards, and one center to start the game?

5.36 A game consists of choosing 5 cards from a deck of 52 playing cards. How many different hands are there containing

a. 2 aces and 3 face cards?

b. 2 clubs, 2 hearts, and 1 diamond?

c. 3 black cards and 2 red cards?

d. 1 king, 1 queen, 1 jack, and 2 aces?

e. 2 even numbers and 3 red face cards?

5.37 How many ways are there to choose a committee of 2 women and 2 men out of 5 women and 4 men?

5.38 How many ways can 9 young women be lined up, if Reem insists on standing in the middle and Marwa insists on standing on an end?

5.39 Nine women will travel in a Ford, a Honda, and a Toyota. If 3 women ride in each car, how many ways can the 9 women be distributed in the 3 cars?

5.40 A family of 2 parents and 3 children will go for a ride in a car with 2 seats in front and 3 in the back. How many ways can the family be seated in the car if

a. the parents sit in the front and the children in the back?

b. the parents sit in the front, the children in the back, and the mother drives?

c. the mother drives and the father sits in the back?

d. one parent drives and the other sits in the back?

5.41 A town has 8 single young men and 5 single young women.

a. If next Friday each of the young women marries an eligible young man from the town, in how many ways can that be accomplished?

b. If only 4 of the young women marry an eligible young man next Friday, in how many ways can that be accomplished?

c. If 2 young men marry an eligible young woman next Friday, in how many ways can that be accomplished?

d. If only one couple gets married next Friday, in how many ways can that be accomplished?

5.42 How many ways can 4 cookbooks and 2 sewing books be lined up on a shelf, if

a. there are no restrictions?

b. the sewing books are to be in the middle?

c. the cookbooks are to be together and the sewing books together?

d. the sewing books are together?

5.43 Ali, Basem, Cindy, Dawood, Elham, and Farook will sit in a row of six chairs. How many ways can they choose a chair if

a. a female is to sit at the extreme left?

b. the two males are to sit on the ends?

5.44 Consider the experiment of choosing one governorate from Jordan.

a. Write the sample space of this experiment.

b. Let event C = central region, N = northern region, and S = southern region. Define each of these regions by its governorates.

5.45 The following table gives the estimated female population of the Kingdom by age group*:

| Age group | Number of females |
|---|---|
| 0 – 4 | 347590 |
| 5 – 9 | 343790 |
| 10 – 14 | 325610 |
| 15 – 19 | 298200 |
| 20 – 24 | 285720 |
| Total | 1600910 |

*Source: Statistical Yearbook, DOS, 2006.

For a female selected at random, let

A = event the female is 0 – 4.

B = event the female is 5 – 9.

C = event the female is 10 – 14.

D = event the female is 15 – 19.

E = event the female is under 20.

a. Find P(E).

b. Express event E in terms of events A, B, C, and D.

c. Determine P(A), P(B), P(C), and P(D).

d. Compute P(E) using the special addition rule and your answers from parts (b) and (c). Compare your answer with the one you found in part (a).

5.46 The following table shows the number of employees in public and private sector establishments by region and nationality, 2005*:

| Region | Jordanian | Non-Jordanian | Total |
|---|---|---|---|
| North region | 131204 | 5607 | 136811 |
| Middle region | 594007 | 70746 | 664753 |
| South region | 53613 | 10942 | 64555 |
| Total | 778824 | 87295 | 866119 |

*Source: Statistical Yearbook, DOS, 2006.

a. Determine the (unconditional) probability that a Jordanian employee from the middle region is selected.

b. Determine the (conditional) probability that a Jordanian employee is selected given that he is from the middle region.

c. Determine the (conditional) probability that a non-Jordanian employee is selected given that he is from the south region.

5.47 The following table shows the number of employees in public sector establishments by educational level, 2005*:

| Education level | Total |
|---|---|
| Illiterate/Literate | 7386 |
| Less than secondary | 63562 |
| Vocational apprenticeship | 2212 |
| General secondary | 33918 |
| Intermediate diploma | 43300 |
| B.A./B.Sc. | 82522 |
| Higher than B.A./B.Sc. | 22512 |
| Total | 255412 |

*Source: Statistical Yearbook, DOS, 2006.

Two persons are selected at random and without replacement from this table. Determine the probability that:

a. the first's educational level is intermediate diploma and the second's is B.A./B.Sc.

b. the first's educational level is less than secondary and the second's is also less than secondary.

5.48 The following table represents Jordanian employed females aged 15+ years (percentage distribution), 2006*:

| Education level | Jordanian population % | Female % |
|---|---|---|
| Illiterate | 2.2 | 1.5 |
| Less than secondary | 50.6 | 15.7 |
| Secondary | 14.5 | 11.6 |
| Intermediate diploma | 11.4 | 26.2 |
| Bachelor and above | 21.3 | 45.0 |

*Source: Statistical Yearbook, DOS, 2006.

a. One person is selected at random; determine the probability that the person is female.

b. Determine the probability of a female that has a secondary school level of education.

5.49 The following table represents a percentage distribution for teachers by authority of supervision and the percentage of female teachers*:

| Authority of supervision | % of teachers | % of females |
|---|---|---|
| Ministry of Education | 0.707 | 0.416 |
| Other Governmental | 0.016 | 0.003 |
| UNRWA | 0.052 | 0.020 |
| Private education | 0.225 | 0.191 |

*Source: Statistical Yearbook, DOS, 2006.

a. A teacher is selected at random. What is the probability that the teacher is a female?

b. If the selected teacher is a female, what is the probability that she works at a private school?

Chapter Learning Outcomes

When you complete your careful study of this chapter you should be able to:

1. learn the three definitions of probability.

2. compute probabilities for experiments having equally likely outcomes.

3. know the axioms of probability and understand them.

4. determine whether two events are mutually exclusive.

5. find probabilities of various kinds of events such as complements, intersections, and unions.

6. show graphical displays of events such as Venn diagrams, contingency tables, and trees.

7. state and apply the general addition rule.

8. state and apply the general multiplication rule.

9. learn how to construct a contingency table.

10. know how to construct joint probabilities and how to use them.

11. define and compute the conditional probability.

12. state and apply the special addition rule.

13. state and apply the special multiplication rule.

14. discuss the concept of independence of two events.

15. state the rule of total probability.

16. state and apply Bayes's theorem.

17. learn how to use some of the counting rules.

Chapter Key Terms

(A and B)

(A or B)

Bayes's theorem

Certain event

Combinations rule

Complement event

Conditional probability

Contingency table

Counting rules

Dependent events

Disjoint events

Event

Exhaustive events

Experiment

Factorial

General addition rule

General multiplication rule

Given event

Impossible event

Independent events

Joint probability

Marginal probability

Mutually exclusive events

(Not A)

Permutations rule

Rule of total probability

Sample space

Special addition rule

Special multiplication rule

Sure event

Total probability

Tree diagram

Two-way table

Venn diagrams

Chapter 6

Discrete Probability Distributions

In Chapter 5 we defined probability and introduced some basic tools used in working with probabilities. We now look at problems that can be put in a probabilistic framework. In this chapter, we will consider the fundamentals of discrete random variables and probability distributions and look into the concepts of the mean and standard deviation of a discrete random variable. In addition, we will study five of the most important discrete probability distributions: the uniform, Bernoulli, binomial, hypergeometric, and Poisson.

Chapter Outline

6.1 Discrete Random Variables

6.2 The Uniform Discrete Probability Distribution

6.3 The Bernoulli and Binomial Probability Distributions

6.4 The Hypergeometric Probability Distribution

6.5 The Poisson Probability Distribution

6.1 Discrete Random Variables

Before we discuss discrete random variables, we must know what is meant by a random variable. To keep the discussion simple, we give a non-mathematical definition of a random variable.

Definition 6.1 Random Variable

A random variable is a numerical outcome that results from an experiment.

Some examples of random variables are the number of heads in two coin tosses, the number of children in a family, the weight of a randomly selected student, and the distance from home to university.

We usually denote random variables by capital (uppercase) letters such as X, Y, Z and the values that a random variable takes by small (lowercase) letters such as x, y, z. So, each of the above examples represents a random variable and we write:

X = number of heads in two coin tosses and its values are x = 0, 1, 2.

X = number of children in a family and x = 0, 1, 2, …, n.

X = weight of a randomly selected student and x ≤ 80 kg.

X = distance from home to university and 12 ≤ x ≤ 30 km.
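One way to picture the first of these random variables is as a rule that attaches a number to each outcome of the experiment. The following is a small illustrative sketch in Python; the function name `X` mirrors the notation above, and the representation of outcomes as tuples of `"H"` and `"T"` is an assumption made for the example.

```python
from itertools import product

# Sample space of two coin tosses: ('H','H'), ('H','T'), ('T','H'), ('T','T').
sample_space = list(product("HT", repeat=2))

def X(outcome):
    """Random variable: the number of heads in the outcome."""
    return outcome.count("H")

# Each outcome receives exactly one value of X.
for outcome in sample_space:
    print("".join(outcome), X(outcome))

values = sorted({X(outcome) for outcome in sample_space})
print(values)  # [0, 1, 2]
```

The set of values produced, 0, 1, and 2, matches the values x listed for this random variable.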

Types of Random variables

There are basically two types of random variables: discrete random variables and continuous random variables. Discrete random variables are usually counts, whereas continuous random variables are usually measurements. For each outcome in an experiment's sample space, the random variable takes on exactly one value. Figure 6.1 shows these two types.

Figure 6.1 Types of random variables (diagram: types of random variables, divided into discrete random variables and continuous random variables)

Definition 6.2 Discrete Random Variables

A discrete random variable is a random variable that can take on a finite or countably infinite set of values.

The values of a discrete random variable are usually whole numbers (0, 1, 2, 3, etc.) that are obtained by counting. The set of values can be either finite or countably infinite, depending on the sample space of the experiment on which the random variable is defined.

Example 6.1 Discrete Random Variables

| Experiment | Random Variable | Possible Values |
|---|---|---|
| Answer 25 multiple choice questions | Number of correct answers | 0, 1, 2, …, 25 |
| Inspect 30 radios | Number of defective radios | 0, 1, 2, …, 30 |
| Make 5 telephone calls | Number of times the number is busy | 0, 1, 2, 3, 4, 5 |
| Cars passing a bridge between 9:00 am - 11:00 am | Number of cars | 0, 1, 2, … |

Definition 6.3 Continuous Random Variable

A continuous random variable is a random variable that can take any value in an interval on the real line or in a collection of intervals.

Values of continuous random variables are measurements such as time, speed, volume, probability, etc. When a continuous random variable is considered, only intervals of values of that random variable are taken into consideration; it makes no sense to talk about the probability of a single value of the random variable. This point will be clarified later on when we consider continuous random variables in more detail.

Example 6.2 Continuous Random Variables

| Experiment | Random Variable | Possible Values |
|---|---|---|
| Choosing a student at random from the class | Weight of student (in kg) | 50 ≤ x ≤ 100 |
| Choosing a light bulb | Time until light burns out (in minutes) | x ≥ 0 |
| Watching cars passing through on the highway | Car speed (in km/hr) | 40 ≤ x ≤ 100 |
| Recording blood pressure | Blood pressure for men over 50 years (in mmHg) | 60 ≤ x ≤ 180 |

6.1.1 Probability Distribution for Discrete Random Variable

The probability distribution for a discrete random variable is a table, graph, or formula that describes the values the random variable can take on and the corresponding probabilities. A discrete probability distribution assigns probabilities (masses) to the individual outcomes. Example 6.3 demonstrates the concept of a probability distribution for a discrete random variable.

Example 6.3 Probability Distribution for the Number of Children in Family

Suppose we asked 20 randomly selected families about X = the number of children they have. The results are given in the table below. Construct a probability distribution.

| Number of children | Frequency |
|---|---|
| 0 | 2 |
| 1 | 5 |
| 2 | 8 |
| 3 | 4 |
| 4 | 1 |
| Total | 20 |

Solution

From the frequency column we find the relative frequencies and use them as probabilities. The result is the following table.

| Number of children x | P(X = x) |
|---|---|
| 0 | 0.10 |
| 1 | 0.25 |
| 2 | 0.40 |
| 3 | 0.20 |
| 4 | 0.05 |
| Total | 1.00 |

The above table is a probability distribution of the discrete random variable X. It has two columns: the first gives the values of X and the second gives the probability associated with each value. Notice that each probability is found by dividing the frequency of a given value of X by the total frequency, 20. For example,

$$P(X = 0) = \frac{2}{20} = 0.10.$$

The probabilities here are the masses of the random variable.

We can use the information given in the table to graph what we call the probability mass function (p.m.f.) of the random variable X. The graph is drawn by putting the values of X on the horizontal axis and probability on the vertical axis. For each value of X, we draw a pin of length equal to the value of the probability, centered at the value of X. Figure 6.2 depicts the probability mass function of X in Example 6.3.

Figure 6.2 Probability mass function for X in Example 6.3
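The step from frequencies to probabilities in Example 6.3 can be reproduced with a few lines of code. This is a minimal sketch, assuming Python; the dictionary name `frequencies` is illustrative and simply holds the counts from the example.

```python
from fractions import Fraction

# Frequencies from Example 6.3: number of children -> number of families.
frequencies = {0: 2, 1: 5, 2: 8, 3: 4, 4: 1}
total = sum(frequencies.values())

# Relative frequency = frequency / total, kept exact with Fraction.
pmf = {x: Fraction(f, total) for x, f in frequencies.items()}

for x, p in pmf.items():
    print(x, float(p))

print(sum(pmf.values()))  # 1
```

Using exact fractions avoids rounding noise when checking that the probabilities sum to 1.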

Properties

A probability distribution for a discrete random variable must satisfy two conditions at the same time:

1. 0 ≤ P(X = x) ≤ 1, and

2. Σ P(X = x) = 1.

Property 1 states that all probabilities must be numbers between 0 and 1, inclusive, whereas Property 2 states that the probabilities must sum to 1. Again, these two properties must be satisfied together.
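The two properties translate directly into a check that can be run on any candidate table. The sketch below is one possible way to do this in Python; the function name `is_valid_pmf` and the tolerance `tol` are assumptions introduced to cope with floating-point rounding.

```python
from math import isclose

def is_valid_pmf(pmf, tol=1e-9):
    """Check Property 1 (each probability in [0, 1]) and Property 2 (sum is 1)."""
    in_range = all(0 <= p <= 1 for p in pmf.values())
    sums_to_one = isclose(sum(pmf.values()), 1.0, abs_tol=tol)
    return in_range and sums_to_one

example_6_3 = {0: 0.10, 1: 0.25, 2: 0.40, 3: 0.20, 4: 0.05}
print(is_valid_pmf(example_6_3))          # True
print(is_valid_pmf({0: 0.5, 1: 0.6}))     # False: sum is 1.1
print(is_valid_pmf({0: -0.1, 1: 1.1}))    # False: values outside [0, 1]
```

The last case sums to 1 but still fails, which illustrates why both properties must hold together.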

Example 6.4 Finding The Probability Distribution

Find the probability distribution of the random variable X = number of heads that turn up when a fair coin is tossed four times.

Solution

The sample space of this experiment consists of $2^4 = 16$ outcomes; namely

S = {HHHH, HHHT, HHTH, HTHH, THHH, HHTT, HTHT, THHT, THTH, TTHH, HTTH, HTTT, THTT, TTHT, TTTH, TTTT}.

Now, the random variable X is defined as the number of heads that appear in each outcome. For example, if the outcome is TTTT then X = 0, and if HHHH, then X = 4. To summarize we can say that

| If the outcome is | then X = |
|---|---|
| TTTT | 0 |
| TTTH, TTHT, THTT, or HTTT | 1 |
| HHTT, HTHT, THHT, THTH, TTHH, or HTTH | 2 |
| HHHT, HHTH, HTHH, or THHH | 3 |
| HHHH | 4 |

In order to get the probability distribution, we must assign a probability to each of the values of X. To do so, notice that

$$P(X = 0) = P(TTTT) = P(T) \cdot P(T) \cdot P(T) \cdot P(T) = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{16},$$

because the four T's (outcomes) are independent.

Also notice that

$$P(X = 1) = P(\{HTTT, THTT, TTHT, TTTH\}) = P(HTTT) + P(THTT) + P(TTHT) + P(TTTH)$$

(by Axiom (3) of probability, because the four events are disjoint)

$$= \frac{1}{16} + \frac{1}{16} + \frac{1}{16} + \frac{1}{16} = \frac{4}{16}.$$

If we continue in this way we will end up with the following table that represents the probability distribution of the random variable X.

| x | P(X = x) |
|---|---|
| 0 | 1/16 |
| 1 | 4/16 |
| 2 | 6/16 |
| 3 | 4/16 |
| 4 | 1/16 |

Observe that each probability is a nonnegative number between 0 and 1 and the sum of probabilities is 1. Figure 6.3 is the probability mass function for the number of heads in Example 6.4.

Figure 6.3 Probability mass function for number of heads
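The enumeration used in Example 6.4 can also be carried out mechanically. The following Python sketch is an illustration only and not part of the original text; the function name `heads_distribution` is a hypothetical helper. It lists all equally likely outcomes, counts the heads in each, and divides by the number of outcomes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(n_tosses):
    """Return {x: P(X = x)} for X = number of heads in n_tosses fair coin tosses."""
    # All 2**n_tosses outcomes are equally likely for a fair coin.
    outcomes = list(product("HT", repeat=n_tosses))
    # Count how many outcomes give each number of heads.
    counts = Counter(outcome.count("H") for outcome in outcomes)
    return {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}


pmf = heads_distribution(4)
for x, p in pmf.items():
    print(x, p)

# The two properties of a discrete probability distribution.
assert all(0 <= p <= 1 for p in pmf.values())
assert sum(pmf.values()) == 1
```

Note that `Fraction` prints values in lowest terms, so 4/16 appears as 1/4 and 6/16 as 3/8; the probabilities are the same as in the table.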

6.1.2 Mean and Variance for Discrete Random Variables

The mean (or expected value) of a random variable is defined as the long-run average value of the random variable. It represents the central location of the distribution. It is denoted by the Greek letter µ (read: mu).


If X is a discrete random variable with values x and corresponding probabilities P(X = x), then the mean (or expected value) of X is defined as

$$\mu = E(X) = \sum_{\text{all } x} x \cdot P(X = x).$$

The variance of a random variable measures the amount of spread (variation) of its distribution about the mean. It is usually denoted by σ² (read: sigma squared).

If X is a discrete random variable with values x and corresponding probabilities P(X = x), then the variance is defined as

$$\sigma^2 = E(X - \mu)^2 = \sum_{\text{all } x} (x - \mu)^2 \cdot P(X = x).$$

The standard deviation of X is defined to be the positive square root of the variance of X, that is,

$$\sigma = \sqrt{\sigma^2}.$$

Example 6.5 illustrates the calculation of the mean and variance for a discrete random variable.

Example 6.5 Mean and Variance for the Number of Children in a Family

Go back to the information given in Example 6.3. Determine the mean and variance of X = number of children in a family.

Solution

To determine the mean and variance of X, all we need is the probability distribution of X. This distribution is given below.

| x | P(X = x) |
|---|---|
| 0 | 0.10 |
| 1 | 0.25 |
| 2 | 0.40 |
| 3 | 0.20 |
| 4 | 0.05 |

Thus, the mean of the number of children is

$$\mu = \sum_{\text{all } x} x \cdot P(X = x) = (0)(0.10) + (1)(0.25) + (2)(0.40) + (3)(0.20) + (4)(0.05) = 1.85.$$


Remark

It is important to know that the unit of measurement of µ is the same as that of X. If X is measured in kg, so is µ.

The variance of the number of children is

$$\sigma^2 = E(X - \mu)^2 = \sum_{\text{all } x} (x - \mu)^2 \cdot P(X = x)$$

$$= (0 - 1.85)^2 (0.10) + (1 - 1.85)^2 (0.25) + (2 - 1.85)^2 (0.40) + (3 - 1.85)^2 (0.20) + (4 - 1.85)^2 (0.05) = 1.0275.$$

Remark

It is also important to know that the unit of measurement of σ² is the square of that of X. So if X is measured in kg, the variance is measured in (kg)². This is why we consider the standard deviation σ: it is measured in the same unit as X.

For the above example, the standard deviation for the number of children is equal to

$$\sigma = \sqrt{\sigma^2} = \sqrt{1.0275} = 1.0137.$$

Remark

The unit of measurement of σ is the same as that of X.

Sections 6.2 through 6.5 introduce the most important discrete distributions. These distributions include the uniform, Bernoulli, binomial, hypergeometric, and Poisson distributions. Figure 6.4 shows a list of discrete distributions.

Figure 6.4 Discrete probability distributions

6.2 The Uniform Probability Distribution

If a and b are integers and the possible values of the random variable X are the consecutive integers a, a + 1, a + 2, … , b, where a and b are called the parameters of the distribution, then the probability mass function of the uniform distribution is given by

$$P(X = k) = \frac{1}{b - a + 1}, \quad \text{for } a \le k \le b.$$

The graph of the uniform probability distribution is given in Figure 6.5 below.

Figure 6.5 The graph of the uniform distribution

The mean, variance, and standard deviation of the uniform distribution are given by the following formulas:

$$\mu = \frac{a + b}{2}, \qquad \sigma^2 = \frac{(b - a + 1)^2 - 1}{12}, \qquad \sigma = \sqrt{\frac{(b - a + 1)^2 - 1}{12}}.$$

Example 6.6 The Uniform Discrete Distribution

Toss a fair die. The number that turns up is a uniform random variable which has the following probability mass function:

$$P(X = k) = \frac{1}{6}, \quad \text{for } 1 \le k \le 6.$$

The graph of the mass function consists of six pins of equal height 1/6, one at each value k = 1, 2, … , 6.

The mean, variance, and standard deviation are given by

$$\mu = \frac{6 + 1}{2} = 3.5, \qquad \sigma^2 = \frac{(6 - 1 + 1)^2 - 1}{12} = \frac{35}{12} = 2.9167, \qquad \sigma = \sqrt{2.9167} = 1.7078.$$
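The mean and variance calculations of Examples 6.5 and 6.6 follow the same recipe: multiply each value by its probability and add. A minimal Python sketch of that recipe is given here as an illustration; the helper name `mean_var_sd` and the dictionary layout are assumptions made for this example and are not part of the text.

```python
import math


def mean_var_sd(pmf):
    """Return (mean, variance, standard deviation) of a discrete distribution.

    pmf maps each value x to P(X = x).
    """
    if any(p < 0 or p > 1 for p in pmf.values()):
        raise ValueError("each probability must lie between 0 and 1")
    if abs(sum(pmf.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in pmf.items())                # mu = sum of x * P(X = x)
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())   # sigma^2 = sum of (x - mu)^2 * P(X = x)
    return mu, var, math.sqrt(var)


# Example 6.5: number of children in a family.
children = {0: 0.10, 1: 0.25, 2: 0.40, 3: 0.20, 4: 0.05}
mu_c, var_c, sd_c = mean_var_sd(children)
print(round(mu_c, 4), round(var_c, 4), round(sd_c, 4))

# Example 6.6: a fair die is uniform on the integers a = 1, ..., b = 6.
a, b = 1, 6
die = {k: 1 / (b - a + 1) for k in range(a, b + 1)}
mu_d, var_d, sd_d = mean_var_sd(die)
print(round(mu_d, 4), round(var_d, 4), round(sd_d, 4))
```

For the die, the computed values can be compared with the closed-form uniform formulas (a + b)/2 and ((b − a + 1)² − 1)/12.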


6.3 The Bernoulli and Binomial Probability Distributions

We begin by defining what we call a trial. When we perform a random experiment such as tossing a coin three times, we refer to each toss as a trial, and we refer to the experiment of tossing the coin three times as a sequence of three trials. Make up your own example of a random experiment that is a sequence of trials.

Bernoulli Trials are a special kind of trials having the following properties:

1. Each trial has one of two possible outcomes which we refer to as success (s) and failure (f); (like heads and tails, pass and fail, married and single).

2. The probability p of success is the same for each trial.

3. The trials are independent.

A Bernoulli experiment consists of a single trial. It can result in one of two outcomes, success or failure, where the probability of success, p, is fixed and 0 < p < 1. The Bernoulli random variable X is defined as the number of successes; that is, X takes the value 1 if the outcome is a success (meaning the characteristic of interest is present) and the value 0 if not. The probability mass function of the Bernoulli random variable is given by

$$P(X = x) = \begin{cases} p, & x = 1 \\ 1 - p, & x = 0. \end{cases}$$

The probability mass function can also be written as the following formula:

$$P(X = x) = \begin{cases} p^x (1 - p)^{1 - x}, & x = 0 \text{ or } 1 \\ 0, & \text{otherwise.} \end{cases}$$

Note that the two formulas are the same.

The graph of this distribution is given in Figure 6.6.


Figure 6.6 The probability mass function of the Bernoulli distribution

The mean, variance, and standard deviation of the Bernoulli distribution are given by the following formulas.

$$\mu = p, \qquad \sigma^2 = p(1 - p), \qquad \sigma = \sqrt{p(1 - p)}.$$

Example 6.7 Bernoulli Distribution

Consider the experiment of rolling a single fair die once. Suppose we are interested in getting the number 6; this is the success, and every other number that turns up is a failure. In this case the random variable X = number of successes is a Bernoulli random variable. It has the following probability mass function:

$$P(X = x) = \begin{cases} \frac{1}{6}, & x = 1 \\ \frac{5}{6}, & x = 0. \end{cases}$$

The mean is $\mu = \frac{1}{6}$, the variance is $\sigma^2 = \frac{5}{36}$, and the standard deviation is $\sigma = \frac{\sqrt{5}}{6}$.

Now we turn our attention to another distribution, the binomial, which is closely related to the Bernoulli distribution. The binomial distribution is widely used in real-life applications.

A random experiment consisting of n independent repeated Bernoulli trials is called a binomial experiment. The random variable X, defined as the number of successes, has a binomial distribution with two parameters n and p. The variable has n + 1 possible values ranging from 0 to n. The probability mass function for the binomial distribution is

$$P(X = x) = C_x^n \, p^x (1 - p)^{n - x}, \quad x = 0, 1, 2, \ldots, n,$$

where

$$C_x^n = \frac{n!}{x!\,(n - x)!},$$

and p = probability of success in each trial.

Note that we sometimes write q instead of 1 − p.

The graph of the binomial distribution for n = 10 and p = 0.5 is depicted in Figure 6.7.

Figure 6.7 Graph of binomial distribution with parameters n = 10 and p = 0.5

The mean, variance and standard deviation are given by the following formulas.

$$\mu = np, \qquad \sigma^2 = np(1 - p), \qquad \sigma = \sqrt{np(1 - p)}.$$

Example 6.8 Binomial Distribution

Roll a fair die five times (this is the same as rolling five dice once). If we are interested in the number 6 turning up, then the random variable X = number of times a 6 turns up is a binomial random variable with parameters n = 5 and p = 1/6. The probability mass function of X is given by:

$$P(X = x) = C_x^5 \left(\frac{1}{6}\right)^x \left(\frac{5}{6}\right)^{5 - x}, \quad x = 0, 1, 2, 3, 4, 5.$$


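Table 6.1 gives the probability distribution, that is, the values and their probabilities, for X.

Table 6.1 Probability Distribution for a Binomial Random Variable with n = 5 and p = 1/6

| x | P(X = x) |
|---|---|
| 0 | $C_0^5 \left(\frac{1}{6}\right)^0 \left(\frac{5}{6}\right)^5 = 0.4019$ |
| 1 | $C_1^5 \left(\frac{1}{6}\right)^1 \left(\frac{5}{6}\right)^4 = 0.4019$ |
| 2 | $C_2^5 \left(\frac{1}{6}\right)^2 \left(\frac{5}{6}\right)^3 = 0.1608$ |
| 3 | $C_3^5 \left(\frac{1}{6}\right)^3 \left(\frac{5}{6}\right)^2 = 0.0322$ |
| 4 | $C_4^5 \left(\frac{1}{6}\right)^4 \left(\frac{5}{6}\right)^1 = 0.0032$ |
| 5 | $C_5^5 \left(\frac{1}{6}\right)^5 \left(\frac{5}{6}\right)^0 = 0.0001$ |
| Total | 1.0000 |

The probability mass function of X is given in Figure 6.8.

Figure 6.8 The probability distribution of a binomial distribution with n = 5 and p = 1/6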

Now let's determine the following probabilities:

a. No 6.

b. At least one 6.

c. At most one 6.

d. The number of 6's is between 2 and 4, inclusive.



These probabilities are easily calculated from the probability distribution of X.

(a) The probability of no 6 is

$$P(X = 0) = C_0^5 \left(\frac{1}{6}\right)^0 \left(\frac{5}{6}\right)^5 = 0.4019.$$

(b) The probability of at least one 6 is

P(X ≥ 1) = P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5)

= 1 – P(X = 0)

= 1 – 0.4019

= 0.5981

(c) The probability of at most one 6 is

P(X ≤ 1) = P(X = 0) + P(X = 1)

= 0.4019 + 0.4019

= 0.8038

(d) The probability that the number of 6's is between 2 and 4, inclusive, is

P(2 ≤ X ≤ 4) = P(X = 2) + P(X = 3) + P(X = 4)

= 0.1608 + 0.0322 + 0.0032

= 0.1962

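The expected number (or the mean) of 6's that appear is

$$\mu = np = 5\left(\frac{1}{6}\right) = \frac{5}{6},$$

whereas the standard deviation is

$$\sigma = \sqrt{5 \left(\frac{1}{6}\right) \left(\frac{5}{6}\right)} = \frac{5}{6}.$$

What a coincidence.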
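The probabilities in Example 6.8 can be checked numerically. The sketch below is an illustration rather than part of the original text; `binomial_pmf` is a hypothetical helper that evaluates the binomial probability mass function using Python's standard `math.comb`.

```python
from math import comb, sqrt


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with parameters n and p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 5, 1 / 6
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

p_none = pmf[0]                   # (a) no 6
p_at_least_one = 1 - pmf[0]       # (b) at least one 6, via the complement
p_at_most_one = pmf[0] + pmf[1]   # (c) at most one 6
p_two_to_four = sum(pmf[2:5])     # (d) between 2 and 4 sixes, inclusive

mean = n * p
sd = sqrt(n * p * (1 - p))

for label, value in [("a", p_none), ("b", p_at_least_one),
                     ("c", p_at_most_one), ("d", p_two_to_four)]:
    print(label, round(value, 4))
print("mean", round(mean, 4), "sd", round(sd, 4))
```

Because the code adds unrounded probabilities, the last digit of a result may differ slightly from a sum of the rounded table entries; part (d), for instance, comes out near 0.1961 rather than 0.1962.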

6.4 The Hypergeometric Probability Distribution

Both the binomial distribution and the hypergeometric distribution are concerned with the same thing: the number of successes in a sample containing n observations. The difference between these two distributions is the manner in which the data are obtained. In the binomial distribution, the sample is drawn with replacement from a finite population or without replacement from an infinite population. In the hypergeometric distribution, on the other hand, the sample is drawn without replacement from a finite population.

This hypergeometric distribution is the finite population generalization of the binomial distribution, where we have a population of N items, k of the items are assigned the label 1 (those items that represent success) and N – k are assigned the label 0 (those items that represent failure). Select at random n items from the N available without replacement. The number of items labeled 1 is the random variable of interest.


For example, suppose you are dealt a hand of 5 cards from a well-shuffled standard deck of 52 cards. Before looking at the hand you wonder about the number of aces among the five cards. Here the population from which we draw is fixed (N = 52 cards), the number of aces in the deck is k = 4, and the sample consists of n = 5 cards. Observe that this random variable has three parameters N, k, and n, all integers such that 0 ≤ k ≤ N and 1 ≤ n ≤ N. Note that a combination is zero if the bottom number is greater than the top number.

The probability mass function of the hypergeometric distribution is given by:

$$P(X = x) = \frac{C_x^k \cdot C_{n - x}^{N - k}}{C_n^N}, \quad \text{for } x = 0, 1, 2, \ldots, n.$$

The mean, variance, and standard deviation of the hypergeometric distribution are given by the following formulas.

$$\mu = \frac{nk}{N}, \qquad \sigma^2 = n \left(\frac{k}{N}\right) \left(\frac{N - k}{N}\right) \left(\frac{N - n}{N - 1}\right), \qquad \sigma = \sqrt{n \left(\frac{k}{N}\right) \left(\frac{N - k}{N}\right) \left(\frac{N - n}{N - 1}\right)}.$$

Example 6.9 Hypergeometric Distribution

A box contains 20 light bulbs, 4 of which are defective. A random sample of 3 bulbs is to be selected from the box. What is the probability that the sample contains 2 defectives?

Solution

Here the population of N = 20 light bulbs is finite. Also, k = 4 are defectives (successes) and the sample size is n = 3. Therefore, the probability mass function of X = number of defectives is

$$P(X = x) = \frac{C_x^4 \cdot C_{3 - x}^{16}}{C_3^{20}}, \quad x = 0, 1, 2, 3,$$

and

$$P(X = 2) = \frac{C_2^4 \cdot C_1^{16}}{C_3^{20}} = \frac{\dfrac{4!}{2!\,2!} \times \dfrac{16!}{1!\,15!}}{\dfrac{20!}{3!\,17!}} = 0.0842.$$

The mean is $\mu = \frac{3(4)}{20} = 0.6$ and the standard deviation is

$$\sigma = \sqrt{3 \left(\frac{4}{20}\right) \left(\frac{20 - 4}{20}\right) \left(\frac{20 - 3}{20 - 1}\right)} = 0.6553.$$

6.5 The Poisson Probability Distribution

The Poisson probability distribution is used in situations where arrivals occur randomly and independently in time. Let the average arrival rate be λ per unit of time and let the length of the time interval be t. Then one would expect the number of arrivals during the interval to be θ = λt. The random variable X = actual number of arrivals occurring in the interval has the Poisson distribution with parameter θ.

The probability mass function of the Poisson probability distribution is given by:

$$P(X = k) = \frac{e^{-\theta} \theta^k}{k!}, \quad \theta > 0 \text{ and } k = 0, 1, 2, \ldots$$

Observe that all the random variables we have considered so far take only finitely many values. The Poisson random variable, however, takes countably infinitely many values: 0, 1, 2, and so on.

It is interesting to know that the mean and variance for the Poisson probability distribution are equal and they both equal θ, the parameter of the distribution:

$$\mu = \theta, \qquad \sigma^2 = \theta.$$

Instances where the Poisson Distribution Can Be Used

The number of telephone calls received per day at a switchboard.

The number of patients arriving at an emergency room in a hospital between 4:00 p.m. and 5:00 p.m.

The number of typographical errors on a page.

The number of bacteria on a plate.


The number of white blood cells in a blood suspension.

The number of imperfections in a surface of wood or metal.

Example 6.10 Poisson Distribution

A traffic engineer is interested in the traffic intensity at a particular street corner during the 7:00 a.m. – 8:00 a.m. time period. He used a mechanical device to count the number of vehicles passing the corner during the one hour interval for several days of the week. Although the numbers observed are highly variable, the average number is 100 vehicles.

Using the Poisson distribution, we can model the probabilities for any interval of time: 1 minute, 15 minutes, 30 minutes, etc. Consider a 1 minute period, and let the random variable X = the number of vehicles passing during the 1 minute period. Then the parameter of the distribution is

θ = (100/hour)(1/60 hour) = 10/6 ≈ 1.666.

The probability mass function is then given by:

$$P(X = k) = \frac{e^{-1.666} (1.666)^k}{k!}, \quad k = 0, 1, 2, \ldots$$

From the above distribution, various probability statements can be made. For example:

$$P(\text{no car passes the corner}) = P(X = 0) = \frac{e^{-1.666} (1.666)^0}{0!} = 0.1890,$$

$$P(\text{at least two cars pass}) = P(X = 2) + P(X = 3) + \cdots = 1 - P(X = 0) - P(X = 1)$$

$$= 1 - 0.1890 - \frac{e^{-1.666} (1.666)^1}{1!} = 1 - 0.1890 - 0.3149 = 0.4961.$$

The mean is equal to the variance and both are equal to 1.666.
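The calculations in Examples 6.9 and 6.10 can be reproduced with a few lines of Python. This is a sketch for illustration only; the helper names `hypergeometric_pmf` and `poisson_pmf` are assumptions of this example, not notation from the text. The code uses θ = 10/6 exactly, so its results may differ in the third or fourth decimal place from values computed with the truncated θ = 1.666.

```python
from math import comb, exp, factorial, sqrt


def hypergeometric_pmf(x, N, k, n):
    """P(X = x): x successes in a sample of n drawn without replacement
    from N items, k of which are successes."""
    # math.comb returns 0 when the bottom number exceeds the top number.
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


def poisson_pmf(k, theta):
    """P(X = k) for a Poisson random variable with parameter theta."""
    return exp(-theta) * theta**k / factorial(k)


# Example 6.9: 20 bulbs, 4 defective, sample of 3.
N, k, n = 20, 4, 3
p_two_defectives = hypergeometric_pmf(2, N, k, n)
hyper_mean = n * k / N
hyper_sd = sqrt(n * (k / N) * ((N - k) / N) * ((N - n) / (N - 1)))
print(round(p_two_defectives, 4), round(hyper_mean, 4), round(hyper_sd, 4))

# Example 6.10: 100 vehicles per hour, observed for 1 minute.
theta = 100 * (1 / 60)
p_no_car = poisson_pmf(0, theta)
p_at_least_two = 1 - poisson_pmf(0, theta) - poisson_pmf(1, theta)
print(round(p_no_car, 4), round(p_at_least_two, 4))
```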


Exercises

6.1 Name each of the following variables as being discrete or continuous.

• Number of times you dial your telephone to get an answer at the other end.

• Number of ships in a seaport.

• Pressure.

• Height of mountains.

• Number of absent students in class.

• Number of siblings.

• Cholesterol level.

6.2 X is a discrete random variable with the following probability mass function:

x 0 1 2 3 4 5

P(X = x) 0.3 0.4 0.1 c 0.05 0.05

a. Determine P(X = 3) = c.

b. Determine P(X > 2).

c. Determine P(X < 3).

d. Determine P(1 ≤ X ≤ 4)

e. Calculate µ.

f. Calculate σ2 and σ.

g. Find the interval µ ± 2σ.

h. What is the probability that X is in this interval?

6.3 Choose a digit at random from 0, 1, 2, … , 9. Define X = the digit chosen.

a. Find the probability mass function of the random variable X.

b. Draw the probability mass function. The possible values of X are listed horizontally and above each put a pin whose height is P(X = x).

c. Can you name this random variable?

d. Find µ, σ2, and σ.

e. Find the interval µ ± 2σ.

f. What proportion of the measurements fall into the interval µ ± 2σ?

6.4 A bent coin is tossed 4 times. Let the random variable X = number of heads. Find the probability distribution of X if the probability of heads is 0.60. Find µ and σ.

6.5 Let the random variable X have a discrete uniform distribution on the integers 0, 1, 2, … , 100. Determine the mean and variance of X.

6.6 Suppose X has a discrete uniform distribution on the integers 0 through 9. Determine the mean, variance, and standard deviation of X.

6.7 An exam has 50 multiple choice questions, each with 4 choices. Suppose you must answer 30 questions correctly to pass the exam.

a. What would be your expected score?

b. What is the standard deviation?

c. What is the probability you will pass?

6.8 Find the mean and standard deviation for the given binomial distributions.

a. n = 1000, p = 0.3.

b. n = 400, p = 0.02.

c. n = 600, p = 0.40.


6.9 Roll two fair dice one red and one green. Let X = sum of the faces of the two dice. Construct the probability distribution for X. Graph it. Find the mean and variance of X.

6.10 Lightning strikes a tall building 5 times a year on average. Calculate the standard deviation and the probabilities that the building is struck 3 times, 4 times, and 5 times.

6.11 Let X be a Poisson random variable with parameter θ = 3. Calculate the following probabilities:

a. P(X = 0).

b. P(X = 1).

c. P(X = 2).

d. P(X > 1).

e. P(X = 3).

6.12 In a shipment of 60 apples, 5 are rotten. A sample of 4 apples is selected and X = the number of rotten apples in the sample. Determine

a. the average number of rotten apples in the sample.

b. the standard deviation.

c. the probability that all the apples in the sample are rotten.

d. the probability that none are.

6.13 The table below gives the number of days of sick leave for 100 employees in a year.

| Days | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| Number of Employees | 10 | 20 | 20 | 15 | 15 | 10 | 5 | 5 |

One employee is selected at random and X = number of days of sick leave.

a. Construct the probability distribution for X. Graph it.

b. Determine P(X ≤ 6).

c. Determine P(X ≥ 2).

d. Determine P(2 ≤ X ≤ 6).

6.14 Show that for a discrete random variable X, if each of its values is multiplied by the constant c, then the effect is to multiply the mean of X by c and the variance of X by c². That is, show that E(cX) = cE(X) and V(cX) = c²V(X), where V stands for the variance.


Chapter Learning Outcomes

When you complete your careful study of this chapter you should be able to:

1. use and understand the formulas given in this chapter.

2. determine the probability distribution of a discrete random variable.

3. construct a probability mass function.

4. graph a probability mass function.

5. understand the probability distribution of a random variable using the empirical or relative frequency definition of probability.

6. calculate the mean, variance, and standard deviation of a discrete random variable.

7. define and apply the uniform discrete probability distribution.

8. learn how to calculate probabilities for a uniform distribution.

9. compute the mean, variance, and standard deviation for a uniform distribution.

10. define and apply the concept of Bernoulli trials.

11. assign probabilities to the outcomes in a Bernoulli trial.

12. compute the mean, variance, and standard deviation for the Bernoulli distribution.

13. define the binomial distribution and obtain probabilities.

14. compute the mean, variance, and standard deviation for the binomial distribution.

15. define the hypergeometric distribution and obtain probabilities.

16. compute the mean, variance, and standard deviation for the hypergeometric distribution.

17. define the Poisson distribution and obtain probabilities.

18. compute the mean, variance, and standard deviation for the Poisson distribution.


Chapter Key Terms

Bernoulli trials

Bernoulli random variable

Bernoulli probability mass function

Binomial distribution

Binomial probability mass function

Binomial random variable

Continuous random variable

Discrete random variable

Expected value

Failure

Hypergeometric distribution

Hypergeometric probability mass function

Mass

Mean of a discrete random variable

Number of successes

Poisson distribution

Poisson probability mass function

Poisson random variable

Probability distribution

Properties of a discrete distribution

Random variable

Standard deviation of a discrete random variable

Success

Trial

Uniform distribution

Uniform probability mass function

Uniform random variable

Variance of a discrete random variable


Chapter 7 The Normal Probability Distribution

In Chapter 6 we introduced some discrete probability distributions. In this chapter we concentrate on the most important probability distribution in statistics, namely, the normal distribution. The normal distribution is one of many continuous distributions. These distributions arise from measurement processes on various phenomena of interest. Continuous random variables have important applications in a wide variety of disciplines, ranging from the medical and engineering sciences to business and economics. Some examples of continuous random variables are weight, time, speed, and blood pressure, as well as the time between arrivals of airplanes, the amount of rainfall in a city, and customer servicing time.

Obtaining probabilities, expected values, and variances for continuous random variables requires knowledge of integral calculus, a subject we touch on briefly by introducing the uniform continuous distribution and some properties of a continuous random variable. However, the normal distribution is considered so important in applications that special probability tables were prepared to avoid the tedious and laborious mathematical computations. As we will see later on, many random variables are either normally distributed or can be approximated by a normal distribution.

We begin this chapter by introducing continuous probability distributions, their properties, the uniform continuous distribution, and the normal distribution. For the normal distribution, we turn our attention to its properties, the probability density function and its graph, comparing two or more normal distributions, the standard normal distribution, interpreting the value of z, the 68.26-95.44-99.74 rule, and finally the percentiles of the standard normal distribution.


Chapter Outline

7.1 Continuous Probability Distributions

7.2 The Uniform Continuous Distribution

7.3 Properties of the Normal Distribution

7.4 The Probability Density Function of the Normal Distribution and its Graph

7.5 Comparing Two or More Normal Distributions

7.6 The Standard Normal Distribution

7.7 Interpreting the Meaning of the Value of z

7.8 The 68.26-95.44-99.74 Rule

7.9 The Percentiles of the Standard Normal Distribution


7.1 Continuous Probability Distributions

As we said earlier in Chapter 6, a continuous random variable can assume any value in an interval on the real line or in a collection of intervals. It is usually the result of a measurement. The probability distribution of a continuous random variable is described by a density curve called the probability density function (p.d.f.) denoted by f(x).

Unlike the discrete case, we cannot simply plug values of the random variable into the probability density function and get probabilities directly. For continuous random variables, P(X = c) = 0 for any constant value c, so we talk instead about the probability of the random variable assuming a value within a given interval. In order to determine the probability that a continuous random variable assumes a value in an interval, we must first know the function f(x).

Properties of a Continuous Probability Distribution

The probability density function, f(x), of the continuous random variable X must satisfy the following two conditions at the same time:

1. $f(x) \ge 0$, and

2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

This means that the probability density function, f(x), is non-negative and the total area under the graph of f(x) equals 1.

From the above two properties we can easily see that

1. The probability of a continuous random variable assuming a value within some interval from a to b (see Figure 7.1) is defined to be the area under the curve of the probability density function between a and b. That is,

$$P(a \le X \le b) = \int_a^b f(x)\,dx.$$

2. The probability of a continuous random variable assuming a specific value, say c, is zero; that is, P(X = c) = 0 (there is no area under any graph at an exact point).


Figure 7.1 Area under the curve from a to b

Finding the Probability that a Continuous Random Variable Falls within an Interval

In order to find the probability that a continuous random variable, X, falls in an interval between a and b do the following:

1. Graph the probability density function.

2. Identify the interval of interest on the x-axis.

3. Shade the area under f(x) in this interval.

4. Compute the area of the shaded region.

The area of the shaded region is the probability that X will fall between a and b (look again at Figure 7.1).

Example 7.1

Suppose we have a continuous random variable X with the following probability density function:

$$f(x) = \begin{cases} \frac{3}{2} x^2, & -1 \le x \le 1 \\ 0, & \text{otherwise.} \end{cases}$$

Determine the probability that X falls between $-\frac{1}{2}$ and $\frac{1}{2}$.


Solution

Step 1. Draw the function f(x).

Step 2. Identify the interval of interest.

Step 3. Shade the area under f(x) in this interval.

Step 4. Compute the area of the shaded region.

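The area of the shaded region is

$$\int_{-1/2}^{1/2} \frac{3}{2} x^2 \, dx = \frac{3}{6} \left[ \left(\frac{1}{2}\right)^3 - \left(-\frac{1}{2}\right)^3 \right] = \frac{1}{8}.$$

Thus the probability that X falls between $-\frac{1}{2}$ and $\frac{1}{2}$ is $\frac{1}{8}$.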
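The area found in Example 7.1 can also be approximated numerically, which is a useful check when an integral is awkward to do by hand. The sketch below is an illustration only; it uses a simple midpoint rule, and the names `f` and `integrate` are assumptions of this example rather than anything defined in the text.

```python
def f(x):
    """Probability density function of Example 7.1."""
    return 1.5 * x**2 if -1 <= x <= 1 else 0.0


def integrate(g, a, b, steps=100_000):
    """Approximate the area under g between a and b with the midpoint rule."""
    width = (b - a) / steps
    return width * sum(g(a + (i + 0.5) * width) for i in range(steps))


total_area = integrate(f, -1, 1)     # should be close to 1 for a valid density
prob = integrate(f, -0.5, 0.5)       # P(-1/2 <= X <= 1/2)
print(round(total_area, 6), round(prob, 6))
```

The first number checks the second property of a density (total area 1), and the second approximates the probability computed in Step 4.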


The Mean and Variance of Continuous Random Variable

If the random variable X is continuous and has probability density function f(x), the mean and variance are given by:

$$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx,$$

$$\sigma^2 = E(X - \mu)^2 = E(X^2) - [E(X)]^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2.$$

The standard deviation of the random variable X is then given by $\sigma = \sqrt{\sigma^2}$.

Example 7.2 Finding Mean and Variance: The Continuous Case

The mean and variance of the random variable X of Example 7.1, whose probability density function f(x) is given by

$$f(x) = \begin{cases} \frac{3}{2} x^2, & -1 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$

are, respectively, given by the following:

$$\mu = E(X) = \int_{-\infty}^{-1} x (0)\,dx + \int_{-1}^{1} x \left(\frac{3}{2} x^2\right) dx + \int_{1}^{\infty} x (0)\,dx = 0 + \frac{3}{8} x^4 \Big|_{x = -1}^{x = 1} + 0 = \frac{3}{8} (1 - 1) = 0,$$

and

$$\sigma^2 = E(X^2) - \mu^2 = \int_{-1}^{1} x^2 \left(\frac{3}{2} x^2\right) dx - 0 = \frac{3}{10} x^5 \Big|_{x = -1}^{x = 1} = \frac{6}{10}.$$

Page 187: Dr. Jaffar S. Almousawi An Introduction to Statistics...E. Mall: darelbaraka@yahoo.com ISBN 978 - 9957 – 414 – 69 – 6 (كدر ) iii An Introduction to Statistics First Edition

Part III: Probability Concepts and Distributions

173

Therefore, 6

.10

σ =σ =σ =σ =
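As a rough numerical check of Example 7.2, the mean and variance integrals can be approximated directly. This is a sketch under stated assumptions: `pdf` and `integrate` are hypothetical helper names, and the midpoint rule replaces the exact antiderivatives used in the text.

```python
import math


def pdf(x):
    """Density from Example 7.1: f(x) = (3/2) x^2 on [-1, 1], 0 elsewhere."""
    return 1.5 * x * x if -1.0 <= x <= 1.0 else 0.0


def integrate(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# The density is zero outside [-1, 1], so integrating over [-1, 1] is enough.
mean = integrate(lambda x: x * pdf(x), -1.0, 1.0)               # E(X)
second_moment = integrate(lambda x: x * x * pdf(x), -1.0, 1.0)  # E(X^2)
variance = second_moment - mean ** 2
std_dev = math.sqrt(variance)

print(round(mean, 4), round(variance, 4), round(std_dev, 4))
```

The printed variance should be close to 6/10 and the standard deviation close to the square root of 6/10, about 0.7746.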

7.2 The Continuous Uniform Distribution

A random variable is uniformly distributed over the interval from a to b whenever it is equally likely to take on any value in the interval from a to b. The uniform random variable has the following probability density function:

$$
f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise.} \end{cases}
$$

The graph of the uniform probability density function is given in Figure 7.2.

Figure 7.2 Uniform probability density function

The mean, variance, and standard deviation of the uniform continuous distribution are given by the following formulas:

$$
\mu = \frac{b+a}{2}, \qquad \sigma^{2} = \frac{(b-a)^{2}}{12}, \qquad \text{and} \qquad \sigma = \sqrt{\sigma^{2}}.
$$

Example 7.3 Working with the Uniform Continuous Distribution

Suppose the service time at a restaurant is uniformly distributed between 10 and 25 minutes. The probability density function is

$$
f(x) = \begin{cases} \dfrac{1}{15}, & 10 \le x \le 25 \\ 0, & \text{otherwise} \end{cases}
$$

where x = the service time in minutes for a customer.


a. What is the probability that the time it takes to service a customer is between 15 and 20 minutes?

b. Find the mean and variance of X.

Solution

a. $P(15 \le X \le 20) = \int_{15}^{20} \frac{1}{15}\,dx = \frac{1}{3}.$

Thus the probability it takes between 15 and 20 minutes for a customer to get serviced is 33.33%. Moreover, we can conclude that 33.33% of the restaurant customers will wait between 15 and 20 minutes for service.

b. The mean of X is $\frac{10 + 25}{2} = 17.5$ minutes,

and the variance is $\frac{(25 - 10)^{2}}{12} = 18.75$ minutes².

So, the standard deviation is $\sqrt{18.75} = 4.330$ minutes.
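The calculations of Example 7.3 can be sketched in a few lines. The function name `uniform_prob` is an illustrative assumption; it simply multiplies the length of the overlap between the interval of interest and [a, b] by the constant density 1/(b − a).

```python
import math


def uniform_prob(lo, hi, a, b):
    """P(lo <= X <= hi) for X uniform on [a, b]."""
    left, right = max(lo, a), min(hi, b)   # clip the interval to [a, b]
    return max(0.0, right - left) / (b - a)


a, b = 10.0, 25.0                      # service time limits in minutes
prob = uniform_prob(15.0, 20.0, a, b)  # part (a)
mean = (a + b) / 2                     # part (b)
variance = (b - a) ** 2 / 12
std_dev = math.sqrt(variance)

print(round(prob, 4), mean, variance, round(std_dev, 3))
```

Clipping the interval means the same function also handles requests that extend beyond the range of the distribution, for example `uniform_prob(0, 12, 10, 25)`.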

7.3 Properties of the Normal Distribution

The normal distribution is the most important distribution in statistics. It is widely used in a variety of applications to everyday real-life problems, and it underlies many statistical inference techniques.

Properties

• Bell-shaped.

• Symmetrical.

• Mean, median, and mode are equal.

• Normal variable has an infinite range.

• It has two parameters; the mean µ and the standard deviation σ.

Figure 7.3 shows the graph of a normal distribution with mean µ and standard deviation σ. The mean measures the location of the distribution and the variance measures the spread.


Figure 7.3 The Normal distribution with mean µ and standard deviation σ

7.4 The Probability Density Function of Normal and its Graph

The probability density function of the normal random variable is given by:

$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}},
$$

where

f(x) : density function of X
π ≈ 3.14159, e ≈ 2.71828
µ : population mean
σ : population standard deviation
x : a value of the random variable X

Remark

If the random variable X is normally distributed with mean µ and standard deviation σ, we then write X ~ N(µ, σ). Figure 7.4 represents the graph of a generic normal distribution with mean µ and standard deviation σ.
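The normal density formula can be evaluated directly. The sketch below writes the formula out by hand in a function called `normal_pdf` (an illustrative name) and compares it with `statistics.NormalDist.pdf` from the Python standard library. The values µ = 70 and σ = 5 are assumed purely for illustration.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal density: exp(-0.5 * ((x - mu) / sigma)^2) / (sigma * sqrt(2 * pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


mu, sigma = 70.0, 5.0              # assumed example parameters
library = NormalDist(mu, sigma)

for x in (60.0, 65.0, 70.0, 75.0, 80.0):
    # The two columns should agree, and the density is symmetric about mu.
    print(x, round(normal_pdf(x, mu, sigma), 6), round(library.pdf(x), 6))
```

The largest value occurs at x = µ, and values at equal distances on either side of µ are the same, which reflects the bell shape and symmetry listed among the properties.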


Figure 7.4 Graph of generic normal distribution

7.5 Comparing Two or More Normal Distributions

There are infinitely many normal density functions, one for each combination of µ and σ. Every time we change µ and/or σ, we get a different normal distribution; see Figure 7.5 below. The mean can take any real value (negative, zero, or positive), whereas the standard deviation can only take positive values.

[Figure 7.5 panels: Graph (a) N(µ1, σ1); Graph (b) N(µ2, σ); Graph (c) N(µ1, σ2); Graph (d) N(µ3, σ)]

Figure 7.5 Different normal distributions

From Figure 7.5 observe that two (or more) normal distributions can have the same mean but different standard deviations (graphs (a) and (c)), different means but same standard deviation (graphs (b) and (d)), or different means and different standard deviations (graphs (a) and (b) or (a) and (d)).


7.6 The Standard Normal Distribution

Definition 7.1 The standard normal distribution

The standard normal distribution is a normal distribution with mean 0 and standard deviation 1. It is also called the z-distribution, and a value of z is called a standard normal score. A value of z is calculated using the following formula:

$$
z = \frac{x - \mu}{\sigma}
$$

A z-value represents the distance between an observed value of x and the mean µ, measured in units of the standard deviation σ. It can be positive, negative, or zero. If x is larger than µ then z is positive, if x is smaller than µ then z is negative, and if x = µ then z is zero. If the random variable Z is normally distributed with mean 0 and standard deviation 1, we then write Z ~ N(0, 1). Figure 7.6 depicts the standard normal distribution.

Figure 7.6 The standard normal distribution

Observe what the transformation z = (x - µ) / σ does. It transforms the normal random variable X that has mean µ and standard deviation σ into a standard normal random variable Z that has mean zero and standard deviation 1. This means that if X ~ N(µ, σ) and a and b are numbers, then the area under this normal curve that lies between a and b is the same as the area under the standard normal curve that lies between (a - µ) / σ and (b - µ) / σ. Figure 7.7 summarizes this fact graphically.

Figure 7.7 Finding area for normally distributed random variable from area under the standard normal curve
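The equality of areas described above can be illustrated numerically. This sketch uses `statistics.NormalDist` from the Python standard library; the values µ = 70, σ = 5, a = 65 and b = 80 are assumptions chosen only for the illustration.

```python
from statistics import NormalDist

mu, sigma = 70.0, 5.0   # assumed parameters of X
a, b = 65.0, 80.0       # assumed interval of interest

x_dist = NormalDist(mu, sigma)   # X ~ N(mu, sigma)
z_dist = NormalDist()            # Z ~ N(0, 1)

# Area under the N(mu, sigma) curve between a and b
area_x = x_dist.cdf(b) - x_dist.cdf(a)

# Area under the standard normal curve between the standardized endpoints
z_a = (a - mu) / sigma
z_b = (b - mu) / sigma
area_z = z_dist.cdf(z_b) - z_dist.cdf(z_a)

print(z_a, z_b, round(area_x, 4), round(area_z, 4))
```

Both areas come out the same, which is why a single table for the standard normal curve is enough for every normal distribution.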


Basic Properties of the Standard Normal Curve

1. The total area under the standard normal curve is equal to 1.

2. The standard normal curve is symmetric about 0; that is, the part of the curve to the left of the dashed line in Figure 7.6 is the mirror image of the part to the right of it.

3. The standard normal curve extends indefinitely in both directions, approaching but never touching the horizontal axis.

4. Most of the area under the standard normal curve lies between -3 and 3.

How Do We Find Probabilities?

Recall from Section 7.1 that probabilities for a continuous random variable are areas under the density curve of that random variable. Because the standard normal curve plays an important role in statistics, tables of the areas under it have been constructed. Such a table, called the Cumulative Standard Normal Table, can be found at the end of the book; it gives P(Z ≤ z). Table 7.1 below is a portion of that table.

Table 7.1 Portion of the cumulative standard normal table

z      0.00     0.01     0.02     0.03
1.0    0.8413   0.8438   0.8461   0.8485
1.1    0.8643   0.8665   0.8686   0.8708
1.2    0.8849   0.8869   0.8888   0.8907
1.3    0.9032   0.9049   0.9066   0.9082

Example 7.4 Finding the Area to the Left of a Specified z-value

Determine the area under the standard normal curve to the left of 1.12.

Solution

We use the standard normal table. First we go down the left-hand column, labeled z, to 1.1. Then we go across that row until we are under the 0.02 in the top row. The value in the body of the table there is 0.8686. This is the area under the standard normal curve that lies to the left of 1.12, as shown in Figure 7.8.


Figure 7.8 Area under the standard normal curve to the left of 1.12

Observe that the area under the standard normal curve to the left of 1.12 is equivalent to the probability P(Z < 1.12).

Example 7.5 Finding the Area to the Right of a Specified z-value

Determine the area under the standard normal curve to the right of 1.31.

Solution

Since the total area under the standard normal curve is equal to 1 (Property 1 above), the area to the right of 1.31 equals 1 minus the area to the left of 1.31. This latter area can be found from the cumulative standard normal table in the same way as in Example 7.4.

First we go down the left-hand column, labeled z, to 1.3. Then we go across that row until we are under the 0.01 in the top row. The value in the body of the table there is 0.9049. This is the area under the standard normal curve that lies to the left of 1.31. Therefore, the area under the standard normal curve that lies to the right of 1.31 is 1 – 0.9049 = 0.0951, as shown in Figure 7.9.

Figure 7.9 Area under the standard normal curve to the right of 1.31

Observe that the area under the standard normal curve to the right of 1.31 is equivalent to the probability P(Z > 1.31).


Example 7.6 Finding the Area between Two Specified z-values

Determine the area under the standard normal curve that lies between 1.02 and 1.33.

Solution

The area under the standard normal curve that lies between 1.02 and 1.33 equals the area to the left of 1.33 minus the area to the left of 1.02. The cumulative standard normal table shows that the area to the left of 1.33 is 0.9082 and that the area to the left of 1.02 is 0.8461. Therefore, the area under the standard normal curve that lies between 1.02 and 1.33 is the difference 0.9082 – 0.8461 = 0.0621, as shown in Figure 7.10.

Figure 7.10 Finding the area under the standard normal curve that lies between 1.02 and 1.33

Observe that the area under the standard normal curve that lies between 1.02 and 1.33 is equivalent to the probability P(1.02 < Z < 1.33).
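The three kinds of area in Examples 7.4 to 7.6 can be reproduced in software instead of a printed table. The sketch below uses the `cdf` method of `statistics.NormalDist` from the Python standard library, which returns P(Z ≤ z); the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

left_of_112 = std_normal.cdf(1.12)                    # Example 7.4: P(Z < 1.12)
right_of_131 = 1.0 - std_normal.cdf(1.31)             # Example 7.5: P(Z > 1.31)
between = std_normal.cdf(1.33) - std_normal.cdf(1.02) # Example 7.6

print(round(left_of_112, 4))   # 0.8686
print(round(right_of_131, 4))  # 0.0951
print(round(between, 4))       # 0.0621
```

Rounded to four decimals, the results match the table-based answers in the three examples.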

Finding the z-Value that Corresponds to a Specified Area

In the discussion above, we have learned how to use the cumulative standard normal table to find the area under the standard normal curve that lies to the left of a specified z-value, to the right of a specified z-value, and between two specified z-values. Now we want to learn how to use the table to find the z-value corresponding to a specified area under the standard normal curve.

Example 7.7 Finding the z-value for a Specified Area

Determine the z-value having area 0.8460 to its left under the standard normal curve. This is the same as saying find c such that P(Z < c) = 0.8460.

Solution

Again, we use the cumulative standard normal table to obtain the z-value corresponding to the area 0.8460. We search the body of the table for the area 0.8460. We find no such area in the table, so we take the area that is the closest to 0.8460, which is 0.8461. Observe that the z-value corresponding to that area is 1.02, as seen in Figure 7.11.

Figure 7.11 Finding the z-value having area 0.8460 to its left

Example 7.7 illustrates what we do when there is no area-entry in the cumulative standard normal table exactly equal to the one desired, but there is one area-entry closest to the one desired. In this case we take the z-value corresponding to the closest area-entry as an approximation to the required z-value.

There are two more cases to be considered. One case is when there is an area-entry in the cumulative standard normal table exactly equal to the one desired, in which case we take the z-value that corresponds to that area-entry exactly as it is. The other case is when there is no area-entry in the cumulative standard normal table exactly equal to the one desired, but there are two area-entries equally close to the one desired; in this case, we take the average of the two corresponding z-values as an approximation of the required z-value. We illustrate this case in the following example.

Example 7.8 Finding the z-value When Two Table Areas Are Equally Close to the One Desired

Determine the z-value having area 0.8859 to its left under the standard normal curve.

Solution

We use the cumulative standard normal table to obtain the z-value corresponding to the area 0.8859. We search the body of the table for the area 0.8859. We find no such area in the table, so we take the two areas that are the closest to 0.8859, which are 0.8849 and 0.8869. Observe that the z-value corresponding to the first area is 1.20 and the z-value that corresponds to the second is 1.21. So, we take as our approximation the z-value halfway between 1.20 and 1.21; that is, $z = \frac{1.20 + 1.21}{2} = 1.205$, as seen in Figure 7.12.
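The lookup rules of Examples 7.7 and 7.8 (take the closest area-entry, and average the z-values when entries are equally close) can be sketched as a small routine. This is an illustration only: the function name `z_from_table` is hypothetical, and the "table" is generated on the fly from `statistics.NormalDist.cdf`, rounded to four decimals for z from -3.49 to 3.49.

```python
from statistics import NormalDist


def z_from_table(area):
    """Mimic reading z from a four-decimal cumulative table.

    The closest area-entry wins; if several entries are equally close,
    the corresponding z-values are averaged.
    """
    std_normal = NormalDist()
    target = round(area * 10_000)     # compare in integer ten-thousandths
    best_dist, best_z = None, []
    for k in range(-349, 350):        # k is z expressed in hundredths
        entry = round(std_normal.cdf(k / 100) * 10_000)
        dist = abs(entry - target)
        if best_dist is None or dist < best_dist:
            best_dist, best_z = dist, [k]
        elif dist == best_dist:
            best_z.append(k)
    return sum(best_z) / len(best_z) / 100


z_a = z_from_table(0.8460)   # Example 7.7: closest entry is 0.8461
z_b = z_from_table(0.8859)   # Example 7.8: 0.8849 and 0.8869 are equally close
print(z_a, z_b)
```

Working in integer ten-thousandths avoids floating-point noise when deciding whether two entries are equally close to the target area.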


Figure 7.12 Finding the z-value having area 0.8859 to its left

In the next paragraph, we introduce an important notation named the zα notation. This notation will be used frequently in the chapters to come. As we will see, it is often necessary to determine the z-value having a specified area to its right.

Definition 7.2 The zα Notation

The zα symbol is used to denote the z-value having area α to its right under the standard normal curve, as shown in Figure 7.13.

Figure 7.13 The zα Notation

Example 7.9 Finding zα

Use the cumulative standard normal table to find

a. z0.20 b. z0.10 c. z0.05 d. z0.025

Solution

a. z0.20 is the z-value having area 0.20 to its right under the standard normal curve, see Figure 7.14.

Figure 7.14 Finding z0.20


Since the area under the standard normal curve to the right of z0.20 is 0.20, the area to its left is 1 – 0.20 = 0.80. We search the body of the table for the area 0.8000 and find that such an area-entry does not exist. So we take the two areas that are the closest to 0.8000, which are 0.7995 (corresponding to z = 0.84) and 0.8023 (corresponding to z = 0.85). Since the area 0.8000 is closer to 0.7995 than 0.8023, we approximate the desired z by z0.20 = 0.84.

b. z0.10 is the z-value having area 0.10 to its right under the standard normal curve, see Figure 7.15.

Figure 7.15 Finding z0.10

Since the area under the standard normal curve to the right of z0.10 is 0.10, the area to its left is 1 – 0.10 = 0.90. We search the body of the table for the area 0.9000 and find that such an area-entry does not exist. So we take the two areas that are the closest to 0.9000, which are 0.8997 (corresponding to z = 1.28) and 0.9015 (corresponding to z = 1.29). Since the area 0.9000 is closer to 0.8997 than 0.9015, we approximate the desired z by z0.10 = 1.28.

c. z0.05 is the z-value having area 0.05 to its right under the standard normal curve, see Figure 7.16.

Figure 7.16 Finding z0.05

Since the area under the standard normal curve to the right of z0.05 is 0.05, the area to its left is 1 – 0.05 = 0.95. We search the body of the table for the area 0.9500 and find that such an area-entry does not exist. So we take the two areas that are the closest to 0.9500, which are 0.9495 (corresponding to z = 1.64) and 0.9505 (corresponding to z = 1.65). Since the area 0.9500 is equally close to 0.9495 and 0.9505, we approximate the desired z by the average of the two z-values; that is, z0.05 = 1.645.

d. z0.025 is the z-value having area 0.025 to its right under the standard normal curve, see Figure 7.17.

Figure 7.17 Finding z0.025

Since the area under the standard normal curve to the right of z0.025 is 0.025, the area to its left is 1 – 0.025 = 0.975. We search the body of the table for the area 0.9750 and find that such an area-entry does exist. It corresponds to z-value of 1.96. So z0.025 = 1.96.
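The zα values of Example 7.9 can also be obtained from the inverse of the cumulative distribution function. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist.inv_cdf` is available; the helper name `z_alpha` is illustrative.

```python
from statistics import NormalDist


def z_alpha(alpha):
    """z-value with area alpha to its right, i.e. area 1 - alpha to its left."""
    return NormalDist().inv_cdf(1.0 - alpha)


for alpha in (0.20, 0.10, 0.05, 0.025):
    print(alpha, round(z_alpha(alpha), 4))
```

Because the printed table has only four-decimal areas at steps of 0.01 in z, the table answers 0.84 and 1.28 are slightly coarser than the software values (about 0.8416 and 1.2816), while 1.645 and 1.96 agree to the precision shown in the text.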

7.7 Interpreting the Meaning of the Value of z

Suppose that the scores of an introductory course in statistics are known to be normally distributed with mean µ = 70 and standard deviation σ = 5. How do we interpret the meaning of the z-values that correspond to the scores 75 and 60?

Using the standardized normal score, $z = \frac{x - \mu}{\sigma}$, the score 75 is transformed to $z = \frac{75 - 70}{5} = +1$ (see Figure 7.18). It is clear that the score 75 is equivalent to 1 standardized unit (i.e., 1 standard deviation) above the mean. On the other hand, the score of 60 is transformed to $z = \frac{60 - 70}{5} = -2$ (see Figure 7.18 again). This is equivalent to 2 standardized units (i.e., 2 standard deviations) below the mean.

Thus the standard deviation has become the unit of measurement. In other words, a score of 75 is 5 points (i.e., 1 standard deviation) higher than the mean of 70, and a score of 60 is 10 points (i.e., 2 standard deviations) lower than the mean of 70.
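The same interpretation can be expressed as a short computation. This is a minimal sketch; the function name `z_score` is an illustrative choice.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations by which x lies above (+) or below (-) mu."""
    return (x - mu) / sigma


mu, sigma = 70.0, 5.0   # course scores from this section

z_75 = z_score(75.0, mu, sigma)
z_60 = z_score(60.0, mu, sigma)

# Multiplying z by sigma recovers the distance from the mean in score points.
print(z_75, z_75 * sigma)   # 1.0 5.0
print(z_60, z_60 * sigma)   # -2.0 -10.0
```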


Figure 7.18 Transformation of scales from normal to standard normal

7.8 The 68.26-95.44-99.74 Rule of the Standard Normal Distribution

The 68.26-95.44-99.74 rule is also known as the Empirical Rule. This important rule of the standard normal distribution, see Figure 7.19, states that approximately

a. 68.26% of the area under the standard normal curve lies between -1 and +1,

b. 95.44% of the area under the standard normal curve lies between -2 and +2,

c. 99.74% of the area under the standard normal curve lies between -3 and +3.

Figure 7.19 The empirical rule of the standard normal curve

The following example illustrates the above rule.


Example 7.10 The 68.26-95.44-99.74 Rule

A pharmaceutical company manufactures vitamin pills (X) that contain an average of 509 grams of vitamin C with standard deviation of 5 grams. If the distribution of vitamin C amounts is known to be approximately normal, then according to the 68.26-95.44-99.74 rule, approximately

a. 68.26% of vitamin pills have vitamin C content in the interval (509 - 5, 509 + 5) = (504, 514).

b. 95.44% of vitamin pills have vitamin C content in the interval (509 - 10, 509 + 10) = (499, 519).

c. 99.74% of vitamin pills have vitamin C content in the interval (509 - 15, 509 + 15) = (494, 524).
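The percentages in the rule and the intervals of Example 7.10 can be recomputed as follows. This sketch uses `statistics.NormalDist` from the Python standard library; the dictionary names `areas` and `intervals` are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()
mu, sigma = 509.0, 5.0   # vitamin C content from Example 7.10

areas = {}
intervals = {}
for k in (1, 2, 3):
    # Area under the standard normal curve between -k and +k
    areas[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    # Matching interval on the original measurement scale
    intervals[k] = (mu - k * sigma, mu + k * sigma)
    print(k, round(100 * areas[k], 2), intervals[k])
```

The software percentages come out as roughly 68.27, 95.45 and 99.73. The figures 68.26, 95.44 and 99.74 quoted in the rule arise from differences of four-decimal table entries, so the two sets agree up to rounding in the last digit.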

7.9 The Percentiles of the Standard Normal Distribution

Definition 7.3 The kth Percentile

The kth percentile, denoted by Pk, of the standard normal curve is the number that divides the bottom k% of the population from the top (100 – k)% .

For example, the 95th percentile, P95, is the number that divides the bottom 95% of the population from the top 5%. Equivalently, P95 is the z-value having area 0.95 to its left; that is, area 0.05 to its right, under the standard normal curve. (See Figure 7.20.)

Figure 7.20 Finding the 95th percentile

Example 7.11 Finding the 90th Percentile of the Grades

Suppose that the scores of an introductory course in statistics are known to be normally distributed with mean µ = 70 and standard deviation σ = 5. Determine the 90th percentile, P90, of this distribution.


Solution

The 90th percentile, P90, is the number that divides the bottom 90% of the normal curve from the top 10%. P90 is the x-value having area 0.90 to its left or, equivalently, area 0.10 to its right under that normal curve, as shown in Figure 7.21.

Figure 7.21 The 90th percentile of grades

Using the cumulative standard normal table, we find the z-value having area 0.90 to its left under the standard normal curve. This z-value is approximately z = 1.28. Thus, x = µ + zσ = 70 + 1.28(5) = 76.4 is the value of the 90th percentile. This means that 90% of the grades are below 76.4 and 10% are above 76.4.
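The percentile calculation of Example 7.11 can be sketched in two ways: by the table route x = µ + zσ with z = 1.28, and directly with `statistics.NormalDist.inv_cdf` (Python 3.8 or later). The variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 70.0, 5.0
grades = NormalDist(mu, sigma)

# Table route: z-value with area 0.90 to its left, read as 1.28 from the table
p90_table = mu + 1.28 * sigma

# Direct route: invert the cumulative distribution of the grades themselves
p90_direct = grades.inv_cdf(0.90)

print(round(p90_table, 1), round(p90_direct, 2))
```

The two results differ only in the second decimal place, because the table z-value 1.28 is itself a rounded approximation.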

Remark:

In the exercises that follow, always remember that the probability for continuous random variable is an area under the specified density curve.


Exercises

7.1 Suppose the continuous random variable X has the pdf

$$
f(x) = \begin{cases} cx, & 1 < x < 2 \\ 0, & \text{otherwise} \end{cases}
$$

where c is a constant. Determine

a. the value of c.

b. P(-2 < X < 1.5).

c. the mean and variance of X.

7.2 Suppose the continuous random variable X has the pdf

$$
f(x) = \begin{cases} e^{-x}, & x > 0 \\ 0, & \text{otherwise.} \end{cases}
$$

Determine the following probabilities:

a. P(X > 1).

b. P(1 < X < 2.5).

c. P(X = 1).

d. P(X ≥ 4).

7.3 Suppose X has a uniform continuous density over the interval [2.5, 6.5].

a. Determine the mean, variance, and standard deviation of X.

b. What is P(X < 3.5)?

7.4 Suppose X has a uniform continuous density over the interval [-1, 1].

a. Determine the mean, variance, and standard deviation of X.

b. Determine the value for x such that P(-x < X < x) = 0.80.

7.5 Use the cumulative standard normal table to determine the following probabilities for the standard normal random variable Z.

a. P(Z < 1.43).

b. P(Z < 3).

c. P(Z > 1.54).

d. P(Z > -3.10).

e. P(-1.56 < Z < 2.56).

7.6 Use the cumulative standard normal table to determine the following probabilities for the standard normal random variable Z.

a. P(-1 < Z < 1).

b. P(-2 < Z < 2).

c. P(-3 < Z < 3).

d. P(Z > 3).

e. P(0 < Z < 1).

7.7 Assume the random variable Z has a standard normal distribution. Use the cumulative standard normal table to determine the value of the constant c that solves each of the following:

a. P(Z < c) = 0.95.

b. P(Z < c) = 0.60.

c. P(Z < c) = 0.10.

d. P(Z > c) = 0.90.

e. P(Z > c) = 0.20.

f. P(-1.56 < Z < c) = 0.60.

7.8 Assume the random variable Z has a standard normal distribution. Use the cumulative standard normal table to determine the value of the constant c that solves each of the following:

a. P(-c < Z < c) = 0.68.


b. P(-c < Z < c) = 0.95.

c. P(-c < Z < c) = 0.99.

d. P(-c < Z < c) = 0.9973.

7.9 Assume the random variable X is normally distributed with mean µ = 12 and standard deviation σ = 3. Determine the following:

a. P(X < 9).

b. P(X > 15).

c. P(8 < X < 16).

d. P(6 < X < 9).

e. P(3 < X < 6).

7.10 Assume the random variable X is normally distributed with mean µ = 12 and standard deviation σ = 3. Determine the value of x that solves each of the following:

a. P(X > x) = 0.60.

b. P(X > x) = 0.90.

c. P(x < X < 12) = 0.95.

d. P(-x < X – 12 < x) = 0.90.

e. P(-x < X – 12 < x) = 0.95.

7.11 Use the cumulative standard normal table to obtain the following shaded areas under the standard normal curve.

a.

b.

c.

d.

e.

f.


7.12 Obtain the z-value having area 0.90 to its right; that is, find z0.90.

7.13 Obtain the z-value having area 0.60 to its right; that is, find z0.60.

7.14 Determine z0.22; that is find the z-value having area 0.22 to its right under the standard normal curve.

7.15 Determine z0.30; that is find the z-value having area 0.30 to its right under the standard normal curve.

7.16 Complete the following table.

z0.10 z0.05 z0.025 z0.01 z0.005

1.28

7.17 Refer to Exercise 7.9. Find the 90th percentile, P90, of the random variable X.

7.18 A special kind of plastic bag used for packaging is manufactured so that the breaking strength of the bag is normally distributed with a mean of µ = 10 pounds per square inch and a standard deviation of σ = 3 pounds per square inch.

a. What proportion of the bags produced have a breaking strength of

1. between 10 and 10.5 pounds per square inch?

2. between 7.5 and 9.2 pounds per square inch?

3. at least 6.2 pounds per square inch?

4. less than 5.28 pounds per square inch?

b. Between what two values symmetrically distributed around the mean will 95% of the breaking strengths fall?

c. What will your answers to (a) and (b) be if the standard deviation is 2.0 pounds per square inch?

d. Determine the 95th percentile, P95, for the breaking strength.

7.19 A set of final examination grades in an introductory statistics course was found to be normally distributed with a mean of µ = 73 and a standard deviation of σ = 8. Use the 68.26-95.44-99.74 rule to answer the following.

a. What percentage of the grades fall in the interval (65, 81)?

b. What percentage of the grades fall in the interval (57, 89)?

c. What percentage of the grades fall in the interval (49, 97)?

7.20 For a normally distributed random variable, fill in the blanks.

a. ______% of all possible observations lie within 1.28 standard deviations to either side of the mean.

b. ______% of all possible observations lie within 2.33 standard deviations to either side of the mean.


7.21 In the following exercises, check whether the given function is a p.d.f. If a function fails to be a p.d.f., say why.

a. $f(x) = x/2$, 0 < x < 1.

b. $f(x) = \frac{3}{2}(x^{2} - 1)$, 0 < x < 2.

c. $f(x) = 1/x$, 0 < x < e.

d. $f(x) = e^{x}$, 0 < x < ln 2.

e. $f(x) = 2x e^{-x^{2}}$, 0 < x < ∞.

7.22 In the following exercises, find the value of the constant c so that the given function is a p.d.f.

a. $f(x) = 2c$, -1 < x < 1.

b. $f(x) = c$, -2 < x < 0.

c. $f(x) = c e^{cx}$, 0 < x < 1.

d. $f(x) = c x e^{x^{2}}$, 0 < x < 1.

e. $f(x) = c / x^{2}$, 1 < x < ∞.

7.23 In the following exercises, find the expected value, the variance, and the standard deviation for each of the density functions.

a. $f(x) = 1/3$, 0 < x < 3.

b. $f(x) = x/50$, 0 < x < 10.

c. $f(x) = 5x$, 0 < x < $\sqrt{2/5}$.

d. $f(x) = e^{x}$, 0 < x < ln 2.

e. $f(x) = 1/x$, 0 < x < e.

f. $f(x) = 0.1 e^{-0.1x}$, 0 ≤ x < ∞.

7.24 Verify the formula for the mean of a uniform distribution by computing the integral.

7.25 Verify the formula for the mean of a uniform distribution by computing the integral.

7.26 Assume that workers' salaries in your company are uniformly distributed between 200 J.D. and 500 J.D. per month; calculate the average salary in your company.

7.27 The GPAs of a group of students are uniformly distributed between 2.5 and 3.5. Find the average GPA for the group.

7.28 Assume that workers' salaries in your company are uniformly distributed between 200 J.D. and 500 J.D. per month; find the probability that a randomly selected worker earns a monthly salary between 200 J.D. and 250 J.D.

7.29 The GPAs of a group of students are uniformly distributed between 2.5 and 3.5. Find the probability that a randomly selected student from the group has a GPA between 3.0 and 3.5.

7.30 Repeated measurements of a student's IQ yield a mean of 140 and a standard deviation of 6. What is the probability that the student has an IQ between 130 and 150? Assume IQ scores are normally distributed.

7.31 Go back to Exercise 7.27. Find the median GPA.


Chapter Learning Outcomes

When you complete your careful study of this chapter you should be able to:

1. use and understand the techniques and formulas given in this chapter.

2. determine the probability density function of a continuous random variable.

3. calculate the mean, variance, and standard deviation of a continuous random variable.

4. present the uniform and normal distributions.

5. calculate the area under the curves for continuous random variables.

6. identify the basic properties of the normal distribution.

7. identify the standard normal distribution.

8. transform a random variable to a standardized random variable.

9. find probabilities using the cumulative standard normal table.

10. appreciate the usefulness of the standard normal distribution.

11. state the 68.26-95.44-99.74 rule and be able to use it.

12. know the meaning of percentiles of the standard normal distribution and how to calculate them.

Chapter Key Terms

Continuous probability density functions

Empirical rule

Mean of continuous random variable

Normal curve

Normally distributed variable

Percentiles of standard normal curve

Probability density function

Probability density function of normal distribution

Properties of normal distribution

Properties of standard normal curve

Standard normal curve

Standard normal distribution

Uniform continuous distribution

Variance of continuous random variable

z distribution


Part IV

Statistical Inference


Chapter 8

Sampling Distributions

In the previous chapters we have studied important subjects such as types of samples, descriptive statistics, probability, random variables, and the normal distribution. In this chapter we will learn how to put all these subjects together to get the necessary background for inferential statistics.

Recall that inferential statistics is the use of a sample to make inference about the whole population. Thus, we need to develop a theory which relates the sample statistics to the corresponding population parameters.

Chapter Outline

8.1 Sampling Error and the Need for Sampling Distributions

8.2 Sampling Distribution of the Sample Mean

8.3 Sampling Distribution of the Sample Proportion


8.1 Sampling Error and the Need for Sampling Distributions

Statistical inference is concerned with making decisions about a population based on the information contained in a random sample from that population. Sampling is often preferable to conducting a census, in which data for the entire population are collected. In general, a sample taken from a population costs less, takes less time and effort, and can be collected more quickly than a census. In most applications, samples are the only practical way to gather data. Thus, we need to develop a theory which relates the sample statistics to the corresponding population parameters.

Definition 8.1 Parameter

A parameter is a descriptive measure for a population.

Examples of parameters are population mean µ, population standard deviation σ, and population proportion p.

Definition 8.2 Statistic

A statistic is a descriptive measure for a sample.

Examples of statistics are the sample mean $\bar{X}$, the sample standard deviation $s$, and the sample proportion $\hat{p}$.

A sample from a population provides information about only part of the population, so we must expect that the results it gives about the population are not perfectly accurate. Thus, we must expect that a certain amount of error will occur. This kind of error is called sampling error. The student is referred to Definition 1.7 in Chapter 1. In general, sampling error is taken as the absolute value of the statistic minus the parameter it estimates, for example, $|\bar{X} - \mu|$, $|s - \sigma|$, or $|\hat{p} - p|$.

Example 8.1 Sampling Error

Suppose we are interested in estimating the mean height of men aged 18-24 years in the city of Amman, Jordan. This mean would represent the population mean µ for all men in the given age group. To do this, we take a random sample of, say, 1000 men aged 18-24 and calculate the sample mean, $\bar{X}$. Assume the value of the calculated sample mean turns out to be $\bar{x} = 180$ cm.

Since different samples give different means, we definitely do not expect the sample mean height $\bar{x} = 180$ cm, based on only 1000 men, to be exactly the same as the mean height µ of all men in the age group 18-24 years. Some sampling error is to be anticipated.


The important question that must be asked now is: "How much sampling error should we expect?" Rephrasing this question, we ask: "How accurate is such an estimate likely to be?" Is it likely, for instance, that the sample mean height is within 10 cm of the true population mean height? 5 cm? 2 cm?

To answer the question of how much sampling error we should expect, we need to know the distribution of all possible sample means that we can calculate by drawing different random samples of size 1000 from the population. This distribution is called the sampling distribution of the mean.

Because the sample is only a subset of the whole population, any statistic will differ somewhat from the corresponding population parameter. If we get an idea about the sampling distribution of the statistic, i.e., how it behaves when a large number of samples is taken from the population, then we could make confidence statements about how well the statistic estimated the parameter.
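The idea of sampling error can be illustrated with a small simulation. The sketch below is only an illustration: the population of heights is made up (a hypothetical population of 100,000 values with mean about 175 cm and standard deviation about 7 cm), and the function name `sampling_error` is an assumption introduced here. It draws many random samples of size 1000 and records the sampling error $|\bar{x} - \mu|$ of each.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population of 100,000 heights (cm); the values are made up.
population = [random.gauss(175, 7) for _ in range(100_000)]
mu = statistics.fmean(population)  # population mean (a parameter)


def sampling_error(sample_size):
    """Draw one simple random sample and return |x-bar - mu|."""
    sample = random.sample(population, sample_size)
    return abs(statistics.fmean(sample) - mu)


# Sampling errors of 500 different samples of size 1000
errors = [sampling_error(1000) for _ in range(500)]
share_within_1cm = sum(e <= 1 for e in errors) / len(errors)
print(f"largest error: {max(errors):.2f} cm, within 1 cm: {share_within_1cm:.0%}")
```

Each sample gives a different error, which is exactly why the distribution of the statistic over many samples is needed before the accuracy of a single estimate can be judged.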

8.2 Sampling Distribution of the Sample Mean

In this section, we introduce the sampling distribution of the sample mean. We start by giving the following definition.

Definition 8.2 Sampling Distribution

The probability distribution of a statistic is called a sampling distribution.

For example, the probability distribution of the sample mean $\bar{X}$ is called the sampling distribution of the mean. Also, the probability distribution of the sample proportion $\hat{p}$ is called the sampling distribution of the proportion.

Keep in mind that the sampling distribution of a statistic is the distribution of all possible observations of the statistic for samples of a given size.

Example 8.2 Introducing the Sampling Distribution of the Sample Mean

Assume we have a population of five pool balls, each with a number on it; the numbers are 0, 1, 2, 3, and 4. Here, we take a population of a small size, N = 5, to keep the calculations easy and reach our target conclusion quickly. Keep in mind, however, that populations are much larger in real-life applications.

Let us define the random variable X to be the number on a selected ball. Thus, the values of X are 0, 1, 2, 3, and 4. Notice that the probability of getting a ball is equal for all five balls; that is, $P(X = k) = \frac{1}{5}$ for k = 0, 1, 2, 3, 4.

This is an example of a uniform discrete population of five observations where the mean, variance, and standard deviation are given by

$$\mu = \frac{a + b}{2} = \frac{0 + 4}{2} = 2,$$

$$\sigma^2 = \frac{(b - a + 1)^2 - 1}{12} = \frac{(4 - 0 + 1)^2 - 1}{12} = 2, \text{ and}$$

$$\sigma = \sqrt{2} = 1.414.$$

The probability mass function of this distribution is given in Figure 8.1.

Figure 8.1 Probability mass function of discrete uniform distribution
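As a cross-check on the population parameters of Example 8.2, the mean and variance can also be computed directly from the definition of a discrete random variable rather than from the uniform-distribution formulas. The following is a minimal sketch; the variable names are chosen here for illustration.

```python
from fractions import Fraction

values = [0, 1, 2, 3, 4]          # numbers on the five balls
prob = Fraction(1, len(values))   # each ball is equally likely

mu = sum(x * prob for x in values)                 # population mean
var = sum((x - mu) ** 2 * prob for x in values)    # population variance
sigma = float(var) ** 0.5                          # population standard deviation

print(mu, var, round(sigma, 3))  # 2 2 1.414
```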

The graph in Figure 8.1 will change remarkably by the end of our discussion, as we will see.

Let us now obtain the sampling distribution of the mean for samples of size two. So, we take samples of size n = 2, with replacement, from a population of size N = 5. We have $5^2 = 25$ different samples. Table 8.1 represents all samples of size 2 balls taken with replacement from a population of size 5 balls.

Table 8.1 All samples of size 2 balls taken with replacement from a population of size 5 balls

| First ball \ Second ball | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 0, 0 | 0, 1 | 0, 2 | 0, 3 | 0, 4 |
| 1 | 1, 0 | 1, 1 | 1, 2 | 1, 3 | 1, 4 |
| 2 | 2, 0 | 2, 1 | 2, 2 | 2, 3 | 2, 4 |
| 3 | 3, 0 | 3, 1 | 3, 2 | 3, 3 | 3, 4 |
| 4 | 4, 0 | 4, 1 | 4, 2 | 4, 3 | 4, 4 |

The sample means for each sample are given in Table 8.2.


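Table 8.2 The 25 sample means for the 25 samples of Table 8.1

| First ball \ Second ball | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 0.0 | 0.5 | 1.0 | 1.5 | 2.0 |
| 1 | 0.5 | 1.0 | 1.5 | 2.0 | 2.5 |
| 2 | 1.0 | 1.5 | 2.0 | 2.5 | 3.0 |
| 3 | 1.5 | 2.0 | 2.5 | 3.0 | 3.5 |
| 4 | 2.0 | 2.5 | 3.0 | 3.5 | 4.0 |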

As said earlier, the sampling distribution of the sample mean is a probability distribution consisting of all possible sample means of a given sample size, in this case n = 2, selected from a population. Table 8.3 represents the sampling distribution of the sample mean.

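Table 8.3 The sampling distribution of the mean for samples of size n = 2

| Value of $\bar{X}$ | Frequency | $P(\bar{X} = \bar{x})$ |
|---|---|---|
| 0.0 | 1 | 0.04 |
| 0.5 | 2 | 0.08 |
| 1.0 | 3 | 0.12 |
| 1.5 | 4 | 0.16 |
| 2.0 | 5 | 0.20 |
| 2.5 | 4 | 0.16 |
| 3.0 | 3 | 0.12 |
| 3.5 | 2 | 0.08 |
| 4.0 | 1 | 0.04 |
| Total | 25 | 1.00 |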

In future work, when we write the sampling distribution of a statistic, we give only the first and third columns of Table 8.3, namely, the values of the statistic and the corresponding probabilities.

The graph of the probability mass function of the sample mean for samples of size n = 2 is shown in Figure 8.2.

Figure 8.2 The sampling distribution of the mean for samples of size n = 2
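The construction of Tables 8.1 to 8.3 can be reproduced by brute-force enumeration. The sketch below is one possible way to do it, using only the Python standard library; it lists every ordered sample of size 2 drawn with replacement, computes each sample mean, and tallies the frequencies and probabilities.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

balls = [0, 1, 2, 3, 4]

# All 5**2 = 25 ordered samples of size 2 drawn with replacement
samples = list(product(balls, repeat=2))

# Frequency of each sample mean (the middle column of Table 8.3)
frequency = Counter(Fraction(sum(s), len(s)) for s in samples)

for mean in sorted(frequency):
    probability = Fraction(frequency[mean], len(samples))
    print(f"{float(mean):.1f}  {frequency[mean]}  {float(probability):.2f}")
```

Exact fractions are used for the sample means so that equal means are always grouped together, which would not be guaranteed with floating-point keys for other sample sizes.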


Compare Figures 8.1 and 8.2. What do you see?

We now compute the mean, variance, and standard deviation for the sampling distribution of the mean for samples of size n = 2.

$$\mu_{\bar{x}} = \frac{0 + 0.5 + 1 + 1.5 + \cdots + 4}{25} = \frac{50}{25} = 2,$$

$$\sigma_{\bar{x}}^2 = \frac{\sum_{i=1}^{25} (\bar{x}_i - \mu_{\bar{x}})^2}{25} = \frac{(0 - 2)^2 + (0.5 - 2)^2 + (1 - 2)^2 + \cdots + (4 - 2)^2}{25} = \frac{25}{25} = 1, \text{ and}$$

$$\sigma_{\bar{x}} = \sqrt{1} = 1.$$

Comparing the Sampling Distribution with the Original Population

In Example 8.2, if we compare the original population with the sampling distribution of the means, we can make some interesting observations:

1. The shape of the original population is quite different from that of the sampling distribution of the means,

2. the means of the two distributions are equal, and

3. the variance of the sampling distribution of the means is less than the variance of the original distribution.

Later on, we will find out that:

1. As the sample size n gets large, the sampling distribution of the mean will tend toward a normal distribution,

2. The mean of the sampling distribution of the mean and the mean of the population are equal, and

3. The variance of the sampling distribution of the mean is less than the variance of the original population. Moreover, there is a relationship between these two variances.

The Effect of Increasing the Sample Size on the Sampling Distribution of the Mean

What happens to the sampling distribution of the means if we increase the sample size from n = 2 to n = 3?


If, again, we take all samples of size 3 with replacement, we get $5^3 = 125$ different samples. The sampling distribution is now given in Table 8.4, and Figure 8.3 depicts the sampling distribution of the mean for samples of size n = 3.

Table 8.4 The sampling distribution of the mean for samples of size n = 3

| Value of $\bar{X}$ | $P(\bar{X} = \bar{x})$ |
|---|---|
| 0.000 | 0.008 |
| 0.333 | 0.024 |
| 0.666 | 0.048 |
| 1.000 | 0.080 |
| 1.333 | 0.120 |
| 1.666 | 0.144 |
| 2.000 | 0.152 |
| 2.333 | 0.144 |
| 2.666 | 0.120 |
| 3.000 | 0.080 |
| 3.333 | 0.048 |
| 3.666 | 0.024 |
| 4.000 | 0.008 |
| Total | 1.000 |

Figure 8.3 The sampling distribution of the mean for samples of size n = 3

Now compare Figures 8.1, 8.2, and 8.3. What do you see? What is happening?


It is now obvious that when n increases, the sampling distribution of the mean tends toward a normal distribution.

We now compute the mean, variance, and standard deviation for the sampling distribution of the mean for samples of size n = 3.

$$\mu_{\bar{x}} = \sum_{\text{all } \bar{x}} \bar{x}\, P(\bar{X} = \bar{x}) = (0)(0.008) + (0.333)(0.024) + \cdots + (4)(0.008) = 2,$$

$$\sigma_{\bar{x}}^2 = \sum_{\text{all } \bar{x}} (\bar{x} - \mu_{\bar{x}})^2\, P(\bar{X} = \bar{x}) = (0 - 2)^2 (0.008) + (0.333 - 2)^2 (0.024) + \cdots + (4 - 2)^2 (0.008) = 0.666,$$

$$\sigma_{\bar{x}} = \sqrt{0.666} = 0.816.$$

If we investigate the relationship between the sampling distribution of the mean and the distribution of the original population, we find two interesting and important facts. These two facts relate the mean and variance of the sampling distribution to the corresponding mean and variance of the original population. The two facts are:

1. $\mu_{\bar{x}} = \mu$; that is, the mean of the sampling distribution of the mean equals the population mean. Recall that the original population mean is $\mu = 2$. We saw that when n = 2, $\mu_{\bar{x}} = 2$, and when n = 3, $\mu_{\bar{x}} = 2$. This is true for any sample size.

2. $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$; that is, the standard deviation of the sampling distribution of the mean is equal to the standard deviation of the original distribution divided by $\sqrt{n}$. Recall that the original population standard deviation is $\sigma = 1.414$. Now, when n = 2, $\sigma_{\bar{x}} = 1$ and $\dfrac{\sigma}{\sqrt{n}} = \dfrac{1.414}{\sqrt{2}} = 1$; and when n = 3, $\sigma_{\bar{x}} = 0.816$ and $\dfrac{\sigma}{\sqrt{n}} = \dfrac{1.414}{\sqrt{3}} = 0.816$.

The quantity $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$ is called the standard error of the sampling distribution of the mean. It is interesting to notice that as n increases, the standard error decreases. See Figure 8.4.


Figure 8.4 As sample size increases, variance decreases
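The relationships $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ can be checked numerically for the five-ball population of Example 8.2. The following sketch enumerates every sample of size n drawn with replacement; the function name `sampling_distribution_summary` is introduced here purely for illustration, and n = 4 is an extra case not worked in the text.

```python
from itertools import product
from math import isclose, sqrt


def sampling_distribution_summary(values, n):
    """Return (mean, standard deviation) of the sample mean over all
    equally likely samples of size n drawn with replacement."""
    means = [sum(s) / n for s in product(values, repeat=n)]
    mu_xbar = sum(means) / len(means)
    var_xbar = sum((m - mu_xbar) ** 2 for m in means) / len(means)
    return mu_xbar, sqrt(var_xbar)


balls = [0, 1, 2, 3, 4]
mu, sigma = 2, sqrt(2)  # population mean and standard deviation from Example 8.2

for n in (2, 3, 4):
    mu_xbar, se = sampling_distribution_summary(balls, n)
    # Columns: n, mean of x-bar, standard error by enumeration, sigma / sqrt(n)
    print(n, round(mu_xbar, 3), round(se, 3), round(sigma / sqrt(n), 3))
```

In each row the last two columns should agree, and the standard error should shrink as n grows.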

Note that when sampling is done without replacement from a finite population, the appropriate formula for the standard error is

$$\sigma_{\bar{x}} = \sqrt{\frac{N - n}{N - 1}} \cdot \frac{\sigma}{\sqrt{n}},$$

where n denotes the sample size and N the population size. When sampling is done with replacement from a finite population, or when it is done from an infinite population, the appropriate formula is

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}.$$

Sampling Distribution of the Mean for Normally Distributed Variables

Now suppose that a variable X is normally distributed with mean µ and standard deviation σ. Then, for samples of size n, the sample mean $\bar{x}$ is also normally distributed and has mean µ and standard deviation $\sigma/\sqrt{n}$.

Example 8.3 An Example to Illustrate the Sampling Distribution of the Mean for Normally Distributed Variables

Consider a variable which is normally distributed with mean µ = 70 and standard deviation σ = 5. Determine the sampling distribution of the mean for samples of size n = 5 and n = 25.

Solution

The normal distribution for a random variable with mean 70 and standard deviation 5 is shown in Figure 8.5(a). Now, for any particular sample size n, the variable $\bar{x}$ is also normally distributed and has mean µ = 70 and standard deviation $\sigma/\sqrt{n} = 5/\sqrt{n}$.

a. For samples of size 5, the standard deviation is $5/\sqrt{n} = 5/\sqrt{5} = 2.236$ and, therefore, the sampling distribution of the mean is a normal distribution with mean 70 and standard deviation (or, equivalently, standard error) 2.236. See Figure 8.5(b).

b. For samples of size 25, the standard deviation is $5/\sqrt{n} = 5/\sqrt{25} = 1$ and, therefore, the sampling distribution of the mean is a normal distribution with mean 70 and standard deviation 1. See Figure 8.5(c).

Figure 8.5 (a) Normally distributed variable (b) Sampling distribution of the mean for n = 5 (c) Sampling distribution of the mean for n = 25
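The effect of the sample size in Example 8.3 can be explored with Python's standard-library `statistics.NormalDist`. This is a minimal sketch: the helper `sample_mean_distribution` and the choice to report the probability of landing within 2 units of µ are assumptions made here for illustration and are not part of the example.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 70, 5  # population mean and standard deviation from Example 8.3


def sample_mean_distribution(n):
    """Sampling distribution of the mean for samples of size n."""
    return NormalDist(mu, sigma / sqrt(n))


for n in (5, 25):
    dist = sample_mean_distribution(n)
    # Probability that the sample mean lands within 2 units of mu
    within_2 = dist.cdf(mu + 2) - dist.cdf(mu - 2)
    print(n, round(dist.stdev, 3), round(within_2, 4))
```

The larger sample size gives the smaller standard error and therefore the larger probability of a sample mean close to µ, which is what the narrowing curves in Figure 8.5 show.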

The Central Limit Theorem

The central limit theorem (C.L.T.) is one of the most important theorems in statistics. It is critical to understanding inferential statistics and tests of hypotheses. The central limit theorem can be stated as follows:

"If a variable X has a distribution with mean µ and standard deviation σ, then the sampling distribution of the sample mean $\bar{X}$ calculated from random samples of size n will have a mean equal to µ and a standard deviation equal to $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, and will tend to be close to the normal as the sample size increases".

The central limit theorem is important for several reasons:

1. We do not have to construct the sampling distribution for the means, as we did. The theorem tells us that, for a large sample size, the sampling distribution of the means is approximately normally distributed with mean µ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

2. The central limit theorem holds regardless of the distribution of the original population.


3. Although the standard deviation σ of X does not change with increasing sample size, the standard deviation of $\bar{x}$ decreases as the sample size increases.

How Large Must n be?

What sample size must be taken so that the central limit theorem is applicable?

• For most distributions, a sample size of n ≥ 30 is sufficient as a rule of thumb.

• For a normally distributed population, the sampling distribution of the mean is always normal regardless of the sample size.
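The central limit theorem can be watched in action with a simulation. The sketch below is an illustration under assumptions made here, not part of the text: the population is taken to be exponential with mean 1 and standard deviation 1 (a strongly skewed distribution), and the function name `simulate_sample_means` is hypothetical. For each sample size it reports the mean and standard deviation of the simulated sample means, together with the share of sample means within one standard deviation of their centre, which is about 68% for a normal curve.

```python
import random
import statistics

random.seed(8)  # fixed seed so the sketch is reproducible


def simulate_sample_means(n, repetitions=10_000):
    """Means of `repetitions` samples of size n from an exponential
    population with mean 1 and standard deviation 1 (strongly skewed)."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(repetitions)
    ]


for n in (2, 30):
    means = simulate_sample_means(n)
    center = statistics.fmean(means)
    spread = statistics.stdev(means)
    # Share of sample means within one standard deviation of the centre
    share = sum(abs(m - center) <= spread for m in means) / len(means)
    print(n, round(center, 3), round(spread, 3), round(share, 3))
```

The centre should stay near the population mean of 1, the spread should be near $1/\sqrt{n}$, and the share should move toward the normal value as n grows.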

What is the Usefulness of Applying the Central Limit Theorem?

As said earlier, the central limit theorem plays an important role in statistical inference. One advantage of applying the central limit theorem is that it enables us to answer questions that involve probability statements about the sample mean $\bar{X}$. Observe that up to this point, we have dealt with probability statements about the random variable X only. Now we want to deal with probability statements that involve the sample mean $\bar{X}$. How do we accomplish this? The answer is easy: we standardize $\bar{X}$. When n is large enough,

$$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right), \quad \text{and therefore} \quad z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$

Example 8.4 illustrates the basic idea of the central limit theorem in practical applications.

Example 8.4 Application on the Central Limit Theorem

A light bulb manufacturer claims that the lifespan of its brand of light bulbs is normally distributed with a mean of 50 months and a standard deviation of 8 months. A random sample of 49 bulbs is taken for testing. Determine the probability that the sample mean lifespan is greater than 52 months.

Solution

Since the original distribution is normal, the sample mean will also be normal. From the information given, we have µ = 50, σ = 8, and n = 49 (> 30), so that the sample mean will have a normal distribution with a mean of 50 and a standard deviation equal to

$$\sigma_{\bar{x}} = \frac{8}{\sqrt{49}} = 1.143. \quad \text{Therefore } \bar{X} \sim N(50, 1.143), \text{ and } z = \frac{\bar{X} - 50}{1.143}.$$

So,

$$P(\bar{X} > 52) = P\left(Z > \frac{52 - 50}{1.143}\right) = P(Z > 1.75) = 1 - 0.9599 = 0.0401.$$

Figure 8.6 depicts the sampling distribution of the mean of Example 8.4 compared to the standard normal distribution.

Figure 8.6 The sampling distribution of the mean for Example 8.4 compared to the standard normal distribution
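The table lookup in Example 8.4 can be verified numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the cumulative standard normal table; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 8, 49           # values given in Example 8.4
standard_error = sigma / sqrt(n)   # 8 / 7, about 1.143

sample_mean = NormalDist(mu, standard_error)
p_greater_52 = 1 - sample_mean.cdf(52)   # P(X-bar > 52)

print(round(standard_error, 3), round(p_greater_52, 4))
```

Working with the distribution of the sample mean directly avoids the intermediate rounding of the z value that a printed table requires.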

Example 8.5 A Second Example on the Central Limit Theorem

Suppose that a random variable X has a continuous uniform distribution given by:

$$f(x) = \begin{cases} \dfrac{1}{2}, & 4 \le x \le 6 \\ 0, & \text{otherwise.} \end{cases}$$

Then the sampling distribution of the sample mean of a random sample of size n = 40 is found as follows.

We start by graphing the given probability density function. The graph is given in Figure 8.7.

Figure 8.7 The probability density function of the continuous uniform distribution over the interval [4, 6]

Next we find the mean and variance of X. Recall that for a continuous uniform random variable over the interval [a, b], the mean and variance are:

$$\mu = \frac{a + b}{2} = \frac{4 + 6}{2} = 5,$$

$$\sigma^2 = \frac{(b - a)^2}{12} = \frac{(6 - 4)^2}{12} = \frac{1}{3}.$$

Thus, $\sigma = \sqrt{1/3} = 0.577$.

By the central limit theorem, the distribution of the sample mean $\bar{x}$ is approximately normal with mean $\mu_{\bar{x}} = \mu = 5$ and standard deviation

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.577}{\sqrt{40}} = 0.091.$$

The distribution of $\bar{x}$ is graphed in Figure 8.8 below.

Figure 8.8 The sampling distribution of the mean
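The mean and standard error obtained for this example can be compared with a simulation. The following sketch, which is only an illustration and uses an arbitrary seed and 10,000 repetitions chosen here, draws repeated samples of size 40 from the uniform distribution on [4, 6] and summarizes the resulting sample means.

```python
import random
import statistics
from math import sqrt

random.seed(85)  # fixed seed so the sketch is reproducible

a, b, n = 4, 6, 40                     # interval and sample size from the example
sigma = (b - a) / sqrt(12)             # population standard deviation, about 0.577
theoretical_se = sigma / sqrt(n)       # about 0.091

# Simulate 10,000 sample means, each from 40 uniform draws on [4, 6]
means = [
    statistics.fmean(random.uniform(a, b) for _ in range(n))
    for _ in range(10_000)
]

print(round(theoretical_se, 3))             # 0.091
print(round(statistics.fmean(means), 2))    # close to 5
print(round(statistics.stdev(means), 3))    # close to 0.091
```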

Now, it will be easy for us to find probability statements concerning the sample mean, for example:

$$\begin{aligned}
P(5.1 \le \bar{X} \le 5.2) &= P\left(\frac{5.1 - 5}{0.091} \le \frac{\bar{X} - 5}{0.091} \le \frac{5.2 - 5}{0.091}\right) \\
&= P(1.10 \le Z \le 2.20) \\
&= P(Z \le 2.20) - P(Z \le 1.10) \\
&= 0.9861 - 0.8643 \\
&= 0.1218.
\end{aligned}$$

Example 8.5 A Third Example on the Central Limit Theorem

Suppose the mean of the final exam in statistics is 76 with standard deviation 8. What is the probability of selecting a sample of 36 students and finding that the sample mean is within 2 points of the final exam mean?

Solution

Notice that nothing is mentioned about the distribution of the original population. However, since the sample size n = 36 is large (n ≥ 30), the central limit theorem is applicable and the distribution of the sample mean is approximately normal with mean $\mu = 76$ and standard deviation $\sigma/\sqrt{n} = 8/\sqrt{36} = 1.333$.

Now, the probability that the sample mean is within 2 points of the population mean is written as follows:

$$\begin{aligned}
P(|\bar{x} - 76| \le 2) &= P(-2 \le \bar{x} - 76 \le 2) \\
&= P(74 \le \bar{x} \le 78) \\
&= P\left(\frac{74 - 76}{1.333} \le \frac{\bar{x} - 76}{1.333} \le \frac{78 - 76}{1.333}\right) \\
&= P(-1.50 \le Z \le 1.50) \\
&= P(Z \le 1.50) - P(Z \le -1.50) \\
&= 0.9332 - 0.0668 \\
&= 0.8664.
\end{aligned}$$

The above result is interpreted as follows: "We would expect about 87% of the sample means to be within 2 points of the population mean".

8.3 Sampling Distribution of the Sample Proportion

Proportion (or percentage) is another important parameter that needs to be considered. As a matter of fact, we may say that proportions are as important as means and they must be given our attention. In many statistical studies we are concerned with the proportion of a population. For example, we might be interested in

• the proportion of women aged 55 years or older who have been cured of breast cancer.

• the proportion of university students who smoke.

• the proportion of women in the labor force in Jordan.

• the proportion of private sector companies specialized in importing maids.

• the proportion of students who get a higher degree (M.Sc. and Ph.D.) every year.

• the proportion of Philadelphia University students who live in Irbid.

From the above examples, it should be clear that we now deal with a categorical variable where each individual or item in the population can be classified as either possessing or not possessing a particular characteristic, such as smoke and do not smoke, or live in Irbid and do not live in Irbid. Thus, the variable could be assigned two possible values, 1 or 0, to represent the presence (success) or absence (failure) of the characteristic. Observe that the mean of a random sample of such a variable is the sum of 1's and 0's divided by the sample size n.

We denote the population proportion by the lowercase letter p, and its estimate, the sample proportion, by the statistic $\hat{p}$, where

$$\hat{p} = \frac{x}{n} = \frac{\text{number of successes}}{\text{sample size}}.$$


Notice in the above formula that if x is the number of successes in a sample of size n, then the number of failures is n – x. Also notice that the sample proportion $\hat{p}$ has the special property that it must be between 0 and 1.

The Sampling Distribution of Proportion

In Section 8.2 we discussed the sampling distribution of the sample mean, $\bar{x}$, so that we will be able to make inferences about the population mean µ. The same thing holds for proportion: we must discuss the sampling distribution of the sample proportion, $\hat{p}$, so that we will be able to make inferences about the population proportion p.

The sampling distribution of the proportion can be derived from the sampling distribution of the mean, since a proportion can be regarded as a mean; it is the sum of 1's divided by n. Keeping this fact in mind, we can state the following results about the sampling distribution of proportion for samples of size n.
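The step from the sample mean to the sample proportion can be written out explicitly. The short derivation below is a sketch that assumes each sampled individual is coded as an indicator variable $X_i$ (1 for success, 0 for failure, a notation introduced here) and then applies the two facts about the sampling distribution of the mean from Section 8.2.

```latex
% X_i = 1 if the i-th sampled individual has the characteristic, 0 otherwise
\begin{align*}
\mu_X &= (1)(p) + (0)(1-p) = p, \\
\sigma_X^2 &= (1-p)^2\, p + (0-p)^2 (1-p) = p(1-p), \\
\hat{p} &= \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}, \\
\mu_{\hat{p}} &= \mu_X = p, \qquad
\sigma_{\hat{p}} = \frac{\sigma_X}{\sqrt{n}} = \sqrt{\frac{p(1-p)}{n}}.
\end{align*}
```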

1. The mean of $\hat{p}$ equals the population proportion; that is, $\mu_{\hat{p}} = p$.

2. The standard deviation of $\hat{p}$ equals $\sigma_{\hat{p}} = \sqrt{p(1 - p)/n}$.

3. $\hat{p} \mathrel{\dot\sim} N\left(p, \sqrt{p(1 - p)/n}\right)$; that is, $\hat{p}$ is approximately normal with mean p and standard deviation $\sqrt{p(1 - p)/n}$.

Observe that in result 3 above we used the approximation notation to normality. The accuracy of this approximation depends on two factors: n and p.

• If p is close to 0.5, then the approximation is quite accurate even for moderate sample size n.

• The further p is from 0.5, the larger n needs to be for the approximation to be accurate.

• As a rule of thumb, we use the normal approximation only when np and n(1 – p) are both greater than or equal to 5. So, in this section, when we say that n is large we mean that both np ≥ 5 and n(1 – p) ≥ 5.

Example 8.6 The Sampling Distribution of Proportion for Students

Assume that 30% of the students in Philadelphia University live in Amman. Determine the approximate sampling distribution of proportion for samples of size n = 36 and n = 64.

Solution

For large sample size n, the statistic $\hat{p}$ is approximately normal with mean p = 0.30 and standard deviation $\sigma_{\hat{p}} = \sqrt{p(1 - p)/n} = \sqrt{(0.30)(0.70)/n}$.

a. For samples of size n = 36, $\sigma_{\hat{p}} = \sqrt{(0.30)(0.70)/36} = 0.076$, and therefore the sampling distribution of the sample proportion is approximately normal with mean 0.30 and standard deviation 0.076.

b. For samples of size n = 64, $\sigma_{\hat{p}} = \sqrt{(0.30)(0.70)/64} = 0.057$, and therefore the sampling distribution of the sample proportion is approximately normal with mean 0.30 and standard deviation 0.057.
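The two calculations in Example 8.6, together with the rule of thumb for the normal approximation, can be sketched in a few lines of Python. The function names `proportion_standard_error` and `normal_approximation_ok` are hypothetical helpers introduced here for illustration.

```python
from math import sqrt


def proportion_standard_error(p, n):
    """Standard deviation of the sample proportion, sqrt(p(1 - p)/n)."""
    return sqrt(p * (1 - p) / n)


def normal_approximation_ok(p, n, threshold=5):
    """Rule of thumb from this section: np and n(1 - p) both at least 5."""
    return n * p >= threshold and n * (1 - p) >= threshold


p = 0.30  # population proportion from Example 8.6
for n in (36, 64):
    se = proportion_standard_error(p, n)
    print(n, round(se, 3), normal_approximation_ok(p, n))
```

Checking the rule of thumb before using the normal approximation guards against applying it to samples that are too small for the given p.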

Example 8.7 Using the Sampling Distribution of Students Living in Amman

Assume that 30% of the students in Philadelphia University live in Amman. If a random sample of 100 students is selected,

a. what proportion of samples are likely to have between 20% and 30% of the students living in Amman?

b. within what symmetrical limits of the population percentage will 95% of the sample percentages fall?

Solution

For a large sample of size n = 100, the statistic $\hat{p}$ is approximately normal with mean p = 0.30 and standard deviation $\sigma_{\hat{p}} = \sqrt{(0.30)(0.70)/100} = 0.046$. That is, $\hat{p} \mathrel{\dot\sim} N(0.30, 0.046)$. Standardizing this random variable, we get

$$Z = \frac{\hat{p} - 0.30}{0.046}.$$

a.

$$\begin{aligned}
P(0.20 \le \hat{p} \le 0.30) &= P\left(\frac{0.20 - 0.30}{0.046} \le \frac{\hat{p} - 0.30}{0.046} \le \frac{0.30 - 0.30}{0.046}\right) \\
&= P(-2.17 \le Z \le 0) \\
&= P(Z \le 0) - P(Z \le -2.17) \\
&= 0.5000 - 0.0150 \\
&= 0.4850.
\end{aligned}$$

See Figure 8.9.

Figure 8.9 Sampling distribution of the proportion of students living in Amman to find area between sample proportions of 20% and 30%


b. We must find two limits, lower and upper. We are given that

$$P\left(-z \le \frac{\hat{p} - p}{\sigma_{\hat{p}}} \le z\right) = 0.95, \quad \text{or} \quad P\left(-1.96 \le \frac{\hat{p} - 0.30}{0.046} \le 1.96\right) = 0.95.$$

Thus, the lower limit is found by solving the equation $\dfrac{\hat{p} - 0.30}{0.046} = -1.96$, which gives a value of $\hat{p} = 0.21$ or 21%. On the other hand, the upper limit is found by solving the equation $\dfrac{\hat{p} - 0.30}{0.046} = 1.96$, which gives a value of $\hat{p} = 0.39$ or 39%.
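Both parts of Example 8.7 can be reproduced without a printed table. The sketch below again relies on `statistics.NormalDist` from the Python standard library. Because it does not round the standard error to 0.046 or the z value to 2.17, the result for part (a) comes out at about 0.485, slightly different in the fourth decimal place from the table-based answer.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.30, 100                      # values given in Example 8.7
se = sqrt(p * (1 - p) / n)            # about 0.046
p_hat = NormalDist(p, se)             # approximate distribution of the sample proportion

# Part (a): P(0.20 <= p-hat <= 0.30)
part_a = p_hat.cdf(0.30) - p_hat.cdf(0.20)

# Part (b): symmetric limits containing 95% of the sample proportions
z = NormalDist().inv_cdf(0.975)       # about 1.96
lower, upper = p - z * se, p + z * se

print(round(part_a, 3), round(lower, 2), round(upper, 2))
```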


Exercises

8.1 What are the mean and standard deviation of the sampling distribution of the mean?

8.2 What are the mean and standard deviation of the sampling distribution of the proportion?

8.3 Suppose test scores are normally distributed with a mean of 75 and a standard deviation of 8.

a. What is the probability that a single score selected at random will be greater than 78?

b. What is the probability that a sample of 25 scores will have a mean greater than 78?

c. What is the probability that a sample of 25 scores will have a mean between 72 and 78?

8.4 What is the standard error of the mean and why is it important?

8.5 What is the relationship between the standard error of the mean and the sample size?

8.6 The following data represent the number of defective T.V. sets in batches of 100:

2, 3, 4, 6, 9, 10

a. Assume that you sample without replacement.

1. Select all samples of size n = 2 and determine the sampling distribution of the mean.

2. Compute the mean of the sampling distribution of the mean for samples of size n = 2 and also compute the population mean. Are they equal?

3. Compute the standard deviation of the sampling distribution of the mean for samples of size n = 2 and also compute the standard deviation of the population. Which one is smaller?

4. Repeat parts (1), (2), and (3) for all possible samples of size n = 3.

5. Compare the shape of the sampling distribution of the mean obtained in parts (1) and (4) and compare the results. Which sampling distribution seems to have less variability? Why?

b. Assuming that you sample with replacement, repeat parts (1) – (5) of (a) and compare the results. Which sampling distributions seem to have the least variability, those in part (a) or (b)? Why?

Remark: When sampling is done without replacement from a finite population, the appropriate formula for the standard error is

$$\sigma_{\bar{x}} = \sqrt{\frac{N - n}{N - 1}} \cdot \frac{\sigma}{\sqrt{n}}.$$

8.7 Given a normal distribution with µ = 100 and σ = 10, if a sample of size n = 25 is selected, what is the probability that the sample mean $\bar{X}$ is

a. less than 90?

b. greater than 103?

c. between 90 and 110?

d. between 99 and 101?

e. What would be your answers to (a) – (d) if n = 36?


8.8 Given a normal distribution with µ = 60 and σ = 6, if a sample of size n = 81 is selected, what is the probability that the sample mean $\bar{X}$ is

a. less than 56?

b. greater than 63?

c. between 58 and 62?

d. between 56 and 64?

e. What would be your answers to (a) – (d) if n = 36?

8.9 This exercise uses an unrealistically small population to provide a concrete illustration for the exact distribution of a sample proportion. A population consists of three men and two women. The names of the men are Omar, Ali, and Hassan; the names of the women are Layla and Fatema. Suppose the success is "female".

a. Determine the population proportion, p.

b. The first column of the following table provides the possible samples of size two, without replacement, where each person is represented by the first letter of his or her name; the second column gives the number of successes, that is, the number of females obtained for each sample; and the third column shows the sample proportion. Complete the table.

| Samples | Number of females, x | Sample proportion, $\hat{p}$ |
|---|---|---|
| O, A | 0 | 0 |
| O, H | 0 | 0 |
| O, L | 1 | 0.5 |
| O, F | 1 | 0.5 |
| A, H | | |
| A, L | | |
| A, F | | |
| H, L | | |
| H, F | | |
| L, F | | |

c. Construct a dot diagram for the sampling distribution of the proportion for samples of size 2. Mark the position of the population proportion on the dot diagram.

d. Use the third column of the table to obtain the mean of the variable $\hat{p}$.

e. Compare your answers from parts (a) and (d). Why are they the same?

8.10 If 25% of MIS students prefer to work for private companies after they graduate, determine the sampling distribution of the proportion for samples of size n = 25 and n = 49.

8.11 If 25% of MIS students prefer to work for private companies after they graduate and a random sample of size n = 81 students is selected,

a. what proportion of samples are likely to have between 10% and 15% of the students who will work for private companies?

b. within what symmetrical limits of the population percentage will 90% of the sample percentages fall?


Chapter Learning Outcomes

When you complete your careful study of this chapter you should be able to:

1. use and understand the formulas given in this chapter.

2. define sampling error and feel the need for sampling distributions.

3. develop the sampling distribution of the mean.

4. find the mean and standard deviation of the sample mean from the mean and standard deviation of the original population and the sample size.

5. see the effect of increasing the sample size on the sampling distribution of the means.

6. develop the sampling distribution of the mean when the original population is normally distributed.

7. state, apply, and feel the importance of the central limit theorem.

8. develop the sampling distribution of the sample proportion.

9. find the mean and standard deviation of the sample proportion.

10. take an excellent look at different applications.

Chapter Key Terms

Central limit theorem

Parameter

Sampling distribution

Sampling distribution for normal

Sampling distribution of the mean

Sampling distribution of the proportion

Sampling error

Standard error

Standard error of the mean

Standard error of the proportion

Statistic


Chapter 9

Point and Interval Estimation

The major problem that we will address in the remainder of the book is this: we have a data set at hand and we want to infer the properties of the underlying distribution from the data. In other words, we will be concerned with what is usually referred to as statistical inference (or inferential statistics): the use of sample statistics to get information about population parameters.

Statistical inference is divided into two main areas, namely estimation and hypothesis testing. In estimation, we are usually interested in estimating the values of specific population parameters. We will see that there are two methods of estimation: point estimation and interval estimation. Hypothesis testing, on the other hand, is concerned with testing whether the value of a population parameter is equal to a specific number or belongs to a certain interval. Problems of estimation are covered in this chapter, while problems of hypothesis testing are discussed in Chapter 10.

This chapter contains sections on point estimation of a parameter, the confidence interval about the population mean, and the confidence interval about the population proportion.

Chapter Outline

9.1 Point Estimation of a Parameter

9.2 Confidence Interval about the Population Mean when σ is Known

9.3 Confidence Interval about the Population Mean when σ is Unknown

9.4 Confidence Interval about the Population Proportion


9.1 Point Estimation of a Parameter

As we have seen in previous chapters, we are often interested in estimating the population mean µ from information contained in a particular sample selected from that population. For example, we might be interested in knowing

• the mean monthly expenditure on rent,

• the mean grade of a final exam,

• the mean weight of a newborn baby,

• the mean salary of a newly graduated engineer, and

• the mean arrival time of a train.

Because populations are usually large and contain many units, it is often impractical to study every unit in the population: such a study costs too much time and money. However, we can obtain reasonably accurate information about the population mean µ from a sample taken from the population under study.

Definition 9.1 Point Estimate

A point estimate of a parameter is a single sample statistic that is used to estimate the true value of a population parameter.

For example, the sample mean $\bar{x}$ is a point estimate of the population mean µ, and the sample variance $s^2$ is a point estimate of the population variance $\sigma^2$. Remember, in Chapter 8 we referred to quantities such as $\bar{x}$ and $s^2$ as statistics.

Example 9.1 Using Sample Mean to Estimate Population Mean

The data in Table 9.1 were obtained by taking a random sample of n = 40 males and recording the life expectancy (in years) of each of them. Use these data to estimate the population mean life expectancy, µ, for all males in the population.

Table 9.1 Life expectancy (in years) for 40 males

67 54 62 73 68 68 60 50 74 73 57 59 64 75 76 59 67 67 68 63 53 56 62 62 69 69 64 61 75 52 72 50 66 68 66 73 72 71 63 52

Solution

We estimate the population mean life expectancy, µ, of all males by the sample mean life expectancy, $\bar{x}$, of the 40 males sampled from the population. Notice that

$$\bar{x} = \frac{\sum x}{n} = \frac{2580}{40} = 64.5 \text{ years.}$$


Therefore, based on the sample data, we estimate the mean life expectancy, µ, of all males to be 64.5 years. Keep in mind that an estimate of this kind is called a point estimate for µ because it consists of a single number, or point.
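The point estimate in Example 9.1 can be checked with a few lines of Python. This is only a minimal sketch: the list below is copied from Table 9.1, and the function name `point_estimate_of_mean` is an illustrative choice, not part of the text.

```python
# Life expectancies (in years) of the 40 sampled males, copied from Table 9.1.
life_expectancy = [
    67, 54, 62, 73, 68, 68, 60, 50, 74, 73,
    57, 59, 64, 75, 76, 59, 67, 67, 68, 63,
    53, 56, 62, 62, 69, 69, 64, 61, 75, 52,
    72, 50, 66, 68, 66, 73, 72, 71, 63, 52,
]


def point_estimate_of_mean(sample):
    """Return the sample mean, used as the point estimate of the population mean."""
    return sum(sample) / len(sample)


x_bar = point_estimate_of_mean(life_expectancy)
print(f"n = {len(life_expectancy)}, sum = {sum(life_expectancy)}, x-bar = {x_bar}")
```

The printed sum and sample size are the two quantities that enter the formula for the sample mean.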

9.2 Confidence Interval about the Population Mean when σ is Known

A sample statistic (or point estimate), such as $\bar{x}$, varies from sample to sample because it depends on the units selected in the sample, so we must take this into consideration when providing an estimate of the population parameter. To accomplish this, we give a confidence interval estimate of the true population mean µ by taking into account the sampling distribution of the mean.

Definition 9.2 Confidence Interval Estimate

A confidence interval (or interval estimate) of a population parameter is an interval that is used to estimate the true value of a population parameter. It is obtained from a point estimate of the parameter together with a percentage that specifies how confident we are that the parameter lies in the interval.

Definition 9.3 Confidence Level

Confidence level is the proportion of times that the confidence interval actually does contain the unknown population parameter, assuming the estimation process is repeated a large number of times.

The most frequently used values of the confidence level are 90%, 95%, and 99%.

The confidence level is also referred to in some books as the confidence coefficient.

Remark

We usually abbreviate confidence interval by C.I.

It should always be noted that a confidence interval

• is stated in terms of the confidence level (confidence coefficient),

• is based on observations collected from one sample, and

• gives information about how close an estimate is to the unknown parameter.

Now, recall from the Central Limit Theorem that for sufficiently large n (n ≥ 30), the sampling distribution of the sample mean $\bar{X}$ is approximately normal with mean µ and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$. Suppose we take a random sample of size n = 35 from a population of interest and construct an interval that extends 1.96 standard deviations of $\bar{X}$ on either side of the sample mean. "How likely is it that this interval will contain the true unknown population mean µ?" See Figure 9.1.

The above question is equivalent to asking for a 95% C.I. for the unknown population mean µ, since

$$P(\bar{X} - 1.96\,\sigma_{\bar{X}} \le \mu \le \bar{X} + 1.96\,\sigma_{\bar{X}}) = 0.95.$$

Figure 9.1 Normal curve for determining the z-value needed for 95% confidence

The interval $\bar{X} \pm 1.96\,\sigma_{\bar{X}}$ is called a large-sample 95% C.I. for the population mean µ. The term large-sample refers to the fact that the sample is large enough for us to apply the central limit theorem to determine the form of the sampling distribution.

How Do We Interpret Confidence Intervals?

To interpret a C.I., we give two examples, the first when we choose the confidence level to be 0.95, and the second 0.90.

Example 9.2 The Meaning of a 95% C.I. and a 90% C.I. for the Mean

• A 95% C.I. for the mean is interpreted as follows:

If we repeat the experiment a large number of times, each time taking a sample of the same fixed size and constructing a C.I., then the mean µ lies in about 95% of the intervals constructed. Stated equivalently, 95% of the sample means for a given sample size will lie within 1.96 standard errors of the unknown parameter µ. Look at Figure 9.2.


Figure 9.2 Twenty confidence intervals for the population mean µ, each based on the same sample size n, one C.I. out

• A 90% C.I. for the mean is interpreted as follows:

If we repeat the experiment a large number of times, each time taking a sample of the same fixed size and constructing a C.I., then the mean µ lies in about 90% of the intervals constructed. Stated equivalently, 90% of the sample means for a given sample size will lie within 1.645 standard errors of the unknown parameter µ. Look at Figure 9.3.

Figure 9.3 Twenty confidence intervals for the population mean µ, each based on the same sample size n, two C.I.s out
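The repeated-sampling interpretation in Example 9.2 can be illustrated with a small simulation. The sketch below assumes a hypothetical normal population with µ = 50 and σ = 10 (these values, the seed, and the function name `coverage_rate` are illustrative and do not come from the text). It draws many samples of size n = 35, builds the interval $\bar{x} \pm z\,\sigma/\sqrt{n}$ for each one, and reports the fraction of intervals that contain µ.

```python
import random
from math import sqrt


def coverage_rate(mu, sigma, n, z, repetitions, seed=0):
    """Fraction of repeated samples whose interval x-bar +/- z*sigma/sqrt(n) contains mu."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / repetitions


# Hypothetical population (mu = 50, sigma = 10); 2000 repeated samples of size 35.
rate_95 = coverage_rate(mu=50, sigma=10, n=35, z=1.96, repetitions=2000)
rate_90 = coverage_rate(mu=50, sigma=10, n=35, z=1.645, repetitions=2000)
print(f"z = 1.96 : proportion of intervals containing mu = {rate_95:.3f}")
print(f"z = 1.645: proportion of intervals containing mu = {rate_90:.3f}")
```

The two proportions should come out close to 0.95 and 0.90, though not exactly equal to them, because only a finite number of samples is drawn.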

Example 9.3 Introducing Confidence Interval for the Population Mean when σ is Known

Go back to Example 9.1 and assume that life expectancy is normally distributed and that the population standard deviation of all such life expectancies is known to be σ = 7 years.

a. Determine the distribution of the variable $\bar{x}$, that is, the sampling distribution of the mean for samples of size n = 40.

b. Use part (a) to show that 95.44% of all samples of 40 male life expectancies have the property that the interval from $\bar{x}$ − 2.214 to $\bar{x}$ + 2.214 contains µ.

c. Use part (b) and the data in Table 9.1 to obtain a 95.44% confidence interval for the mean life expectancy.

Solution

a. Because n = 40, σ = 7, and life expectancy is normally distributed, we know from Chapter 8 that

• $\mu_{\bar{x}} = \mu$ (which we do not know),

• $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 7/\sqrt{40} = 1.107$, and

• $\bar{x}$ is normally distributed.


So we now know that for samples of size 40, the statistic $\bar{X}$ is normally distributed with mean µ (unknown) and standard deviation 1.107.

b. Recall from Chapter 7 that the "95.44 part" of the empirical rule states that for a normally distributed variable, 95.44% of all possible observations lie within two standard deviations to either side of the mean. Applying this to the variable $\bar{X}$ and referring to part (a), we see that 95.44% of all samples of 40 male life expectancies have a sample mean within 2(1.107) = 2.214 of µ. This is the same as saying that 95.44% of all such samples have the property that the interval from $\bar{X}$ − 2.214 to $\bar{X}$ + 2.214 contains µ.

c. From part (b) we know that 95.44% of all samples of 40 male life expectancies have the property that the interval from $\bar{X}$ − 2.214 to $\bar{X}$ + 2.214 contains µ. Therefore, we can be 95.44% confident that the sample of 40 males whose life expectancies are in Table 9.1 has that property. For that sample, $\bar{X}$ = 64.5, so

$\bar{X}$ − 2.214 = 64.5 − 2.214 = 62.286 and $\bar{X}$ + 2.214 = 64.5 + 2.214 = 66.714.

Consequently, our 95.44% confidence interval is from 62.286 to 66.714; we can be 95.44% confident that the mean life expectancy, µ, of all males is somewhere between 62.286 years and 66.714 years.

Always remember that this or any other 95.44% confidence interval may or may not contain µ, but we can be 95.44% confident that it does.

After we find the C.I., we may write it in any of the following forms:

• 62.286 ≤ µ ≤ 66.714,

• (62.286, 66.714), or

• the C.I. is from 62.286 years to 66.714 years.
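The arithmetic of Example 9.3 can be sketched in Python as follows. The helper name `z_interval` is an illustrative assumption; the inputs ($\bar{x}$ = 64.5, σ = 7, n = 40, and the multiplier 2 for 95.44% confidence) are the ones used in the example.

```python
from math import sqrt


def z_interval(x_bar, sigma, n, z):
    """Return (lower, upper) for the interval x-bar +/- z * sigma / sqrt(n)."""
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Example 9.3: x-bar = 64.5, sigma = 7, n = 40, and z = 2 for 95.44% confidence.
lower, upper = z_interval(x_bar=64.5, sigma=7, n=40, z=2)
print(f"{lower:.3f} <= mu <= {upper:.3f}")
```

The code keeps full precision for $\sigma/\sqrt{n}$ instead of rounding it to 1.107 first, so the end points can differ from a hand calculation in the later decimal places.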

From the above discussion and Example 9.3, we can conclude that

A (1 – α)100% C.I. for the Population Mean (σ Known) is Given By:

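$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

or

$$\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

where $z_{\alpha/2}$ = the value of z such that α/2 of the area under the standard normal curve is to its right.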


What if the standard deviation of the population σ is unknown?

In this situation there are two cases to consider, depending on the sample size n.

Case 1

If the standard deviation of the population σ is unknown, but n is large (n ≥ 30), then we replace σ with the sample standard deviation s. In this case, the sample standard error becomes

$$s_{\bar{x}} = \frac{s}{\sqrt{n}},$$

and

A (1 – α)100% C.I. for the Population Mean (σ Unknown But n ≥ 30) is Given By:

$$\bar{x} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}$$

or

$$\bar{x} - z_{\alpha/2}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\,\frac{s}{\sqrt{n}}$$

Case 2

If the standard deviation of the population σ is unknown, but n is small (n < 30), then we use another distribution called the t-distribution. This case is discussed in detail in Section 9.3.

Example 9.4 Introducing Confidence Interval for the Population Mean when σ is Unknown, but n ≥ 30

The university administration wishes to know how many kilometers students travel to campus. Suppose that a random sample of size n = 49 taken from the population of university students yields the following statistics: the mean distance is 30.52 km with a standard deviation of 7.23 km. Construct a 95% C.I. for µ, the true mean distance for all students.

Solution

We are given $\bar{x}$ = 30.52, s = 7.23, and n = 49. Thus, we must find the value of z. For a confidence level of 0.95 we have 1 – α = 0.95, and α/2 = 0.025. The value of z0.025 is found by consulting the cumulative standard normal table. It is the value of z that has an area of 0.025 to its right under the standard normal curve. Therefore, z0.025 = 1.96 and a 95% C.I. for the true mean distance for all students is given by


$$\bar{x} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}} = 30.52 \pm 1.96 \cdot \frac{7.23}{\sqrt{49}} = 30.52 \pm 2.02$$

or

$$28.50 \le \mu \le 32.54$$

Thus, we estimate with 95% confidence that the true population mean distance is between 28.50 km and 32.54 km. That is, we are 95% confident that the sample selected is one for which the true population mean lies somewhere between these two numbers.
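As a check on Example 9.4, the following sketch obtains z0.025 from Python's standard library instead of the normal table and then forms the large-sample interval. The function name `large_sample_interval` is hypothetical; `statistics.NormalDist().inv_cdf` is the standard-library inverse of the cumulative standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_interval(x_bar, s, n, confidence):
    """Large-sample C.I. for the mean: x-bar +/- z_{alpha/2} * s / sqrt(n)."""
    alpha = 1 - confidence
    # z_{alpha/2} has area alpha/2 to its right, i.e. area 1 - alpha/2 to its left.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * s / sqrt(n)
    return z, x_bar - margin, x_bar + margin


# Example 9.4: x-bar = 30.52 km, s = 7.23 km, n = 49, confidence level 0.95.
z, lower, upper = large_sample_interval(x_bar=30.52, s=7.23, n=49, confidence=0.95)
print(f"z = {z:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

Changing the `confidence` argument to 0.90 or 0.99 gives the other commonly used intervals for the same sample.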

Factors that Affect the Length of a Confidence Interval

It is obvious that the shorter the confidence interval is, the better the estimate, since a long confidence interval is a sign of poor precision, whereas a short confidence interval is a sign of good precision. Notice that the length of the confidence interval gets smaller as

1. the sample size n increases,

2. the confidence level, 1 – α, decreases, and

3. the population's variability decreases.
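The three factors listed above can be seen numerically. The sketch below computes the length of the interval, $2\,z_{\alpha/2}\,\sigma/\sqrt{n}$, for a hypothetical baseline (σ = 10, n = 36, 95% confidence) and then changes one factor at a time. All numerical inputs and the name `interval_length` are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def interval_length(sigma, n, confidence):
    """Length of the z-interval for the mean: 2 * z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / sqrt(n)


baseline = interval_length(sigma=10, n=36, confidence=0.95)          # hypothetical baseline
larger_n = interval_length(sigma=10, n=144, confidence=0.95)         # 1. larger sample size
lower_confidence = interval_length(sigma=10, n=36, confidence=0.90)  # 2. lower confidence level
less_variability = interval_length(sigma=5, n=36, confidence=0.95)   # 3. smaller sigma

for label, value in [
    ("baseline", baseline),
    ("n = 144", larger_n),
    ("90% confidence", lower_confidence),
    ("sigma = 5", less_variability),
]:
    print(f"{label:>15}: length = {value:.3f}")
```

Because n enters through $\sqrt{n}$, the sample size must be multiplied by four to cut the length of the interval in half.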

Maximum Error of Estimate

It should now be clear that when we use a sample mean to estimate a population mean, the chances are zero that $\bar{X} = \mu$. Hence, we accompany such a point estimate of µ with some statement as to how close we might reasonably expect the estimate to be. The error, $|\bar{X} - \mu|$, as we know, is the absolute difference between the point estimate and the quantity it is supposed to estimate.

Our objective now is to examine this error. In order to do so, we make use of the fact that for large n, the quantity

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

is a random variable having approximately the standard normal distribution, N(0, 1). See Figure 9.4.


Figure 9.4 The large sample distribution of $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

As shown in Figure 9.4, we can assert with probability 1 – α that the inequality

$$-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}$$

will be satisfied. We can rewrite this inequality in the following form:

$$\frac{|\bar{X} - \mu|}{\sigma/\sqrt{n}} \le z_{\alpha/2}$$

Now let E be the maximum allowable error of estimate. Then the error, $|\bar{X} - \mu|$, will be at most

$$E = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

with probability 1 – α. In other words, if we want to estimate µ with the mean, $\bar{X}$, of a large sample (n ≥ 30), we can assert with probability 1 – α that the error, $|\bar{X} - \mu|$, will be at most $z_{\alpha/2}(\sigma/\sqrt{n})$.

The commonly used values for 1 – α are 0.90, 0.95, and 0.99, and the corresponding values of zα/2 are z0.05 = 1.645, z0.025 = 1.96, and z0.005 = 2.575. It is worthwhile to observe that the error is equal to half the length of the confidence interval, as seen in Figure 9.5.

Figure 9.5 Maximum error of estimate (the interval from $\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n}$ to $\bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}$, centered at $\bar{x}$, with a distance E on each side)

Example 9.5 Finding the Maximum Error for Estimating the Population Mean

An engineer intends to use the mean of a random sample of size n = 60 computers to estimate the mean repair cost. The population standard deviation was 36 J.D. What can he assert with probability 99% about the maximum size of his error?


Solution

We substitute n = 60, σ = 36, and z0.005 = 2.575 into the formula for E and get

$$E = 2.575 \cdot \frac{36}{\sqrt{60}} \approx 12$$

Thus, the engineer can assert with probability 0.99 that his error will be at most 12 J.D.

Determination of Sample Size

The formula for E can be solved for n to determine the sample size that is needed to attain a desired degree of confidence. Since $E = z_{\alpha/2}\,\sigma/\sqrt{n}$, solving for n we get

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2,$$

rounded up to the next integer.

Note that to be able to use this formula to find n, we must know 1 – α, E, and σ.

Example 9.6 Finding the Sample Size Needed for Estimating the Population Mean

It is known that weights of women in one age group are normally distributed with a standard deviation σ of 9 kg. A researcher wishes to estimate the mean weight of all women in this age group. Find the sample size needed in order to be 90% confident that the sample mean will not differ from the population mean by more than 1.50 kg.

Solution

Substituting E = 1.50, σ = 9, and z0.05 = 1.645 into the formula for n, we get

$$n = \left(\frac{1.645 \times 9}{1.50}\right)^2 = 97.4,$$

which, rounded up to the next integer, gives n = 98.

Factors Affecting the Sample Size

In general, the required sample size, n, increases as

1. σ increases,

2. the confidence coefficient increases, and

3. the maximum error of estimate decreases.
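The two formulas of this section, $E = z_{\alpha/2}\,\sigma/\sqrt{n}$ and $n = (z_{\alpha/2}\,\sigma/E)^2$, can be sketched in Python as below, using the numbers from Examples 9.5 and 9.6. The function names `max_error` and `required_sample_size` are illustrative; the z-values are the table values quoted in the text.

```python
from math import ceil, sqrt


def max_error(z, sigma, n):
    """Maximum error of estimate: E = z * sigma / sqrt(n)."""
    return z * sigma / sqrt(n)


def required_sample_size(z, sigma, E):
    """Sample size n = (z * sigma / E)^2, rounded up to the next integer."""
    return ceil((z * sigma / E) ** 2)


# Example 9.5: n = 60, sigma = 36 J.D., z_{0.005} = 2.575.
E_example_9_5 = max_error(z=2.575, sigma=36, n=60)

# Example 9.6: sigma = 9 kg, E = 1.50 kg, z_{0.05} = 1.645.
n_example_9_6 = required_sample_size(z=1.645, sigma=9, E=1.50)

print(f"Example 9.5: E = {E_example_9_5:.2f} J.D.")
print(f"Example 9.6: n = {n_example_9_6}")
```

Rounding up with `ceil`, instead of rounding to the nearest integer, guarantees that the error does not exceed the requested E.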


9.3 Confidence Interval About Population Mean When σ is Unknown

In the previous section, we learned an important fact: if x is a normal random variable with mean µ and standard deviation σ, then, for samples of size n, the sample mean $\bar{x}$ is also normal and has mean µ and standard deviation $\sigma/\sqrt{n}$. This is the same as saying that

$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

has the standard normal distribution.

However, in most practical applications, the population standard deviation σ is unknown, and we cannot base our confidence interval estimate on the above standard normal distribution. What do we do now? Since the population standard deviation, σ, is unknown, the best we can do is to estimate it by the sample standard deviation, s. Now, we replace σ by s and base our confidence interval estimate on a new variable,

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

This new variable has what is called the t-distribution with degrees of freedom, df, equal to n – 1.

Properties of the t-Curve

1. Bell-shaped and symmetric about zero.

2. Total area under the curve equals 1.

3. As the number of degrees of freedom increases, the t-curve looks more like the z-curve.

4. A t-curve extends indefinitely in both directions, approaching but never touching the x-axis as it does so.

As a matter of fact, there is a family of t-distributions, each identified by its degrees of freedom. This means that we have different t-curves, one for each number of degrees of freedom. However, all t-curves are similar and look like the standard normal curve. Figure 9.6 shows the standard normal and three t-curves.


Figure 9.6 Standard normal curve and three t-curves

From Figure 9.6 notice that as n increases, the t-curve gets closer to the z-curve.
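The way the t-curve approaches the z-curve can also be checked numerically. The sketch below evaluates the standard density formula of the t-distribution (a known formula, not one given in this chapter) at a few points and compares it with the standard normal density. The degrees of freedom 1, 5, and 30 are illustrative choices and are not necessarily the three curves drawn in Figure 9.6.

```python
from math import exp, gamma, pi, sqrt


def t_density(t, df):
    """Height of the t-curve with df degrees of freedom at the point t."""
    coefficient = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coefficient * (1 + t * t / df) ** (-(df + 1) / 2)


def z_density(z):
    """Height of the standard normal curve at the point z."""
    return exp(-z * z / 2) / sqrt(2 * pi)


for point in (0, 3):
    heights = ", ".join(f"df={df}: {t_density(point, df):.4f}" for df in (1, 5, 30))
    print(f"at {point}: {heights}, z-curve: {z_density(point):.4f}")
```

At the center the t-curves lie below the z-curve, while in the tails they lie above it; both gaps shrink as the degrees of freedom grow.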

Using the t-Table

Because the t-variable is continuous, probabilities for a variable having the t-distribution are equal to areas under its associated t-curve. We show only a portion of the t-table and learn how to use it; refer to Table 9.2 below.

The two outside columns of the t-table, labeled df, display the number of degrees of freedom. Once again, we use the notation tα to denote the t-value having area α to its right under the t-curve. Thus the column headed t0.10 contains t-values having area 0.10 to their right; the column headed t0.05 contains t-values having area 0.05 to their right; and so on.

Table 9.2 A Portion of the t-table

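Values of tα

| df | t0.10 | t0.05 | t0.025 | t0.01 | t0.005 | df |
|---|---|---|---|---|---|---|
| 15 | 1.341 | 1.753 | 2.131 | 2.602 | 2.947 | 15 |
| 16 | 1.337 | 1.746 | 2.120 | 2.583 | 2.921 | 16 |
| 17 | 1.333 | 1.740 | 2.110 | 2.567 | 2.898 | 17 |
| 18 | 1.330 | 1.734 | 2.101 | 2.552 | 2.878 | 18 |
| 19 | 1.328 | 1.729 | 2.093 | 2.539 | 2.861 | 19 |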

Example 9.7 Finding the t-Value Having a Specified Area to Its Right

a. For a t-curve with 16 degrees of freedom, determine t0.05; that is, find the t-value having area 0.05 to its right under the t-curve, as shown in Figure 9.7(a).

b. For a t-curve with 19 degrees of freedom, determine t0.01; that is, find the t-value having area 0.01 to its right under the t-curve, as shown in Figure 9.7(b).

Solution


a. To find the required t-value, we use the t-table. We go to the row where df = 16 then we go across that row until we are under the column headed t0.05. The number in the body of the table there, 1.746, is the required t-value.

Figure 9.7(a) Finding the t-value having area 0.05 to its right

b. To find the required t-value, we use the t-table again. We go to the row where df = 19 then we go across that row until we are under the column headed t0.01. The number in the body of the table there, 2.539, is the required t-value.

Figure 9.7(b) Finding the t-value having area 0.01 to its right
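The table lookup in Example 9.7 amounts to choosing a row by df and a column by α. A minimal sketch, with the portion of the t-table from Table 9.2 stored as a dictionary (the names `T_TABLE` and `t_value` are illustrative):

```python
# Rows of Table 9.2, keyed by degrees of freedom; each row maps the
# right-tail area alpha to the tabulated t-value.
T_TABLE = {
    15: {0.10: 1.341, 0.05: 1.753, 0.025: 2.131, 0.01: 2.602, 0.005: 2.947},
    16: {0.10: 1.337, 0.05: 1.746, 0.025: 2.120, 0.01: 2.583, 0.005: 2.921},
    17: {0.10: 1.333, 0.05: 1.740, 0.025: 2.110, 0.01: 2.567, 0.005: 2.898},
    18: {0.10: 1.330, 0.05: 1.734, 0.025: 2.101, 0.01: 2.552, 0.005: 2.878},
    19: {0.10: 1.328, 0.05: 1.729, 0.025: 2.093, 0.01: 2.539, 0.005: 2.861},
}


def t_value(df, alpha):
    """Look up t_alpha for the given df: go to the row for df, then across to the column for alpha."""
    return T_TABLE[df][alpha]


print("t_0.05 with df = 16:", t_value(16, 0.05))  # part (a) of Example 9.7
print("t_0.01 with df = 19:", t_value(19, 0.01))  # part (b) of Example 9.7
```

Only the rows and columns shown in Table 9.2 are available here; any other df or α raises a `KeyError`.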

Confidence Intervals about Population Mean (σ Unknown)

Now, we can develop a procedure to obtain a confidence interval for population mean when the population standard deviation is unknown. The confidence interval in this case is the same as the one when the population standard deviation is known except now we use a t-distribution instead of the z-distribution, and therefore in the new confidence interval, we use tα/2 instead of zα/2.


A (1 – α)100% C.I. for the Population Mean (σ Unknown) is Given By:

$$\bar{x} \pm t_{(n-1,\,\alpha/2)}\,\frac{s}{\sqrt{n}}$$

or

$$\bar{x} - t_{(n-1,\,\alpha/2)}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{(n-1,\,\alpha/2)}\,\frac{s}{\sqrt{n}}$$

where $t_{(n-1,\,\alpha/2)}$ = the value of t with df = n – 1 such that α/2 of the area under the t-curve is to its right.

Example 9.8 Introducing Confidence Interval for the Population Mean when σ is Unknown

Suppose the GPAs of 18 students selected at random in the faculty of science yield an average of 2.9 and a standard deviation of 0.8. Construct a 95% C.I. for the true mean GPA for all students. Assume the original population is approximately normal.

Solution

We substitute the values n = 18, $\bar{x}$ = 2.9, s = 0.8, and t(17, 0.025) = 2.110 (see Figure 9.8) into the confidence interval formula for the population mean when σ is unknown, and get a 95% C.I. for µ given by:

$$\bar{x} \pm t_{(17,\,0.025)}\,\frac{s}{\sqrt{n}} = 2.9 \pm 2.110 \cdot \frac{0.8}{\sqrt{18}} = 2.9 \pm 0.40$$

or

$$2.50 \le \mu \le 3.30$$

Figure 9.8 Finding the t-value with df = 17 for a 95% C.I.


To interpret this confidence interval, we say that if we repeat the experiment a large number of times, each time taking a sample of size n = 18, calculating its mean and standard deviation, and constructing a 95% C.I., then about 95% of the constructed C.I.s will contain the true mean GPA, µ. The interval (2.50, 3.30) is one such interval.
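The computation in Example 9.8 can be sketched as follows. The t-value 2.110 is taken from Table 9.2 (df = 17, area 0.025 to the right) and is passed in by hand, since Python's standard library does not provide t-distribution quantiles. The function name `t_interval` is an illustrative assumption.

```python
from math import sqrt


def t_interval(x_bar, s, n, t_critical):
    """C.I. for the mean when sigma is unknown: x-bar +/- t * s / sqrt(n)."""
    margin = t_critical * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Example 9.8: n = 18, x-bar = 2.9, s = 0.8, t_(17, 0.025) = 2.110 from Table 9.2.
lower, upper = t_interval(x_bar=2.9, s=0.8, n=18, t_critical=2.110)
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

With the z-value 1.96 in place of 2.110, the interval would be slightly shorter, which is the price paid for estimating σ by s from a small sample.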

Maximum Error of Estimate and Determination of Sample Size when σ is Unknown

The formula for the maximum error of estimate required to estimate the population mean when σ is unknown is the same as the formula when the population standard deviation is known, except that now we use tα/2 instead of zα/2, and s instead of σ. Therefore the new formula for E becomes

$$E = t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$

However, the formula for the sample size needed to estimate the population mean is somewhat different and needs some further explanation. Because t depends on the degrees of freedom, and these, in turn, depend on n, we cannot simply put tα/2 in place of zα/2. What is usually done in this case is to approximate tα/2 by zα/2, which gives the following result for the sample size needed to estimate the population mean when σ is unknown:

$$n = \left(\frac{z_{\alpha/2}\,s}{E}\right)^2$$

Again, n should be rounded up to the next integer.

Example 9.9 Determining the Maximum Error of Estimate when σ is Unknown

The number of man-hours (in thousands) required to build each of n = 13 ships has a sample standard deviation of s = 240. Find the maximum error of the estimate that corresponds to a 95% confidence interval.

Solution

We substitute the values n = 13, s = 240, and t(12, 0.025) = 2.179 (see Figure 9.9) into the formula for E when σ is unknown, and get the following:

$$E = 2.179 \cdot \frac{240}{\sqrt{13}} = 145.04$$

This result means that we can be 95% confident that the sample mean differs from the true population mean by at most 145.04.


Figure 9.9 Finding the t-value with df = 12 for a 95% C.I.
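The two formulas for the case of unknown σ, $E = t_{\alpha/2}\,s/\sqrt{n}$ and the approximation $n = (z_{\alpha/2}\,s/E)^2$, can be sketched as below. The first call uses the numbers of Example 9.9; the second call uses a hypothetical target error E = 100, chosen only to illustrate the sample-size formula. Function names are illustrative.

```python
from math import ceil, sqrt


def max_error_t(t_critical, s, n):
    """Maximum error of estimate when sigma is unknown: E = t * s / sqrt(n)."""
    return t_critical * s / sqrt(n)


def approx_sample_size(z, s, E):
    """Approximate sample size n = (z * s / E)^2, rounded up; z replaces t."""
    return ceil((z * s / E) ** 2)


# Example 9.9: n = 13, s = 240, t_(12, 0.025) = 2.179.
E = max_error_t(t_critical=2.179, s=240, n=13)
print(f"E = {E:.2f}")

# Hypothetical follow-up: how many ships would be needed for E = 100 at 95% confidence?
n_needed = approx_sample_size(z=1.96, s=240, E=100)
print(f"n = {n_needed}")
```

Because z replaces t in the second formula, the resulting n is only an approximation and tends to be slightly too small when the answer is itself a small sample size.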

Example 9.10 Determining the Sample Size Needed to Estimate the Mean when σ is Unknown

Find the sample size needed to estimate the mean heart rate if we require that the 95% confidence interval has an error no more than 2 beats per minute and the sample standard deviation for heart rate equals 9 beats per minute.

Solution

We substitute the values E = 2, s = 9, and z0.025 = 1.96 into the formula for n when σ is unknown, and get the following result:

$$n = \left(\frac{1.96 \times 9}{2}\right)^2 = 77.79$$

Thus, rounding up, 78 patients need to be studied.

As a final remark before we end this section, we note that students sometimes confuse the use of the z- and the t-distributions. To clarify this point, notice that we use the z-distribution when

1. σ is known, or

2. n is large. In this case s is a good estimate of σ.

On the other hand, we use the t-distribution when σ is unknown and n is small.

9.4 Confidence Interval about the Population Proportion

Recall that in Chapter 8, Section 8.3, we defined the proportion and its sampling distribution. In this section, we present a procedure to obtain a confidence interval for a population proportion. We then discuss the two important issues related to the confidence interval of a proportion, namely the maximum error of estimate and the sample size needed.


As we now know, the point estimate of the proportion p is defined by $\hat{p} = x/n$, where x = number of successes in a sample of size n. We also know that the standard error of the estimate is $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$. For large n, this standard error is very well approximated by $\sigma_{\hat{p}} \approx \sqrt{\hat{p}(1-\hat{p})/n}$. Therefore, a large-sample confidence interval for the population proportion is now ready to be given.

A (1 – α)100% Large-Sample Confidence Interval for Population Proportion is given by:

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

or

$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

where the number of successes, x, and the number of failures, n – x, are both ≥ 5.

Observe that the condition for using the above confidence interval is that the number of successes, x, and the number of failures, n – x, must both be ≥ 5. This is equivalent to saying that $n\hat{p}$ and $n(1-\hat{p})$ are both ≥ 5.

Example 9.11 Constructing Confidence Interval for Proportion of Students Living in Amman

When 384 college students are randomly selected and surveyed, it is found that 149 live in Amman. Construct a 99% confidence interval for the percentage of all college students who live in Amman.

Solution

The condition for constructing a large-sample confidence interval for a proportion is satisfied, since there are 149 successes and 384 – 149 = 235 failures. All we need now are the values of $\hat{p}$, n, and z0.005. Since $\hat{p} = 149/384 = 0.39$, n = 384, and z0.005 = 2.575, a 99% confidence interval for the proportion of students living in Amman is given by

$$0.39 \pm 2.575\sqrt{\frac{(0.39)(0.61)}{384}} = 0.39 \pm 0.064$$

or

$$0.326 \le p \le 0.454.$$

To interpret this confidence interval we say that if we repeat the experiment a large number of times, each time taking a random sample of 384 students and constructing a confidence interval for the proportion living in Amman, we will find that about 99% of the intervals constructed contain the true proportion. Briefly stated, we can be 99% confident that the percentage of all students living in Amman is somewhere between 32.6% and 45.4%.
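The arithmetic of Example 9.11 can be checked with a short Python sketch. The function name `proportion_ci` and its parameters are illustrative assumptions and do not come from the text. The critical value is computed from the standard normal distribution instead of being read from a table, and the sample proportion is not rounded to 0.39 before use, so the endpoints may differ from the worked example in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence):
    """Large-sample confidence interval for a population proportion.

    x: number of successes, n: sample size, confidence: e.g. 0.99.
    Returns (p_hat, margin, lower, upper).
    """
    if x < 5 or n - x < 5:
        raise ValueError("need at least 5 successes and 5 failures")
    p_hat = x / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)   # maximum error of estimate E
    return p_hat, margin, p_hat - margin, p_hat + margin


p_hat, margin, lower, upper = proportion_ci(149, 384, 0.99)
print(f"p_hat = {p_hat:.3f}, E = {margin:.3f}")
print(f"{lower:.3f} <= p <= {upper:.3f}")
```

The returned margin is the maximum error of estimate, that is, half the length of the interval.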

Maximum Error of Estimate for Proportion

Finding the maximum error for proportion is similar to that of the mean. If we take another look at the confidence interval of proportion, we see that the maximum error in estimating a population proportion by a sample proportion is simply

$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$

Once again, the maximum error in this case is one half the length of the confidence interval, and represents the precision with which a sample proportion, $\hat{p}$, estimates the true population proportion, p, with a specified level of confidence.

Example 9.12 Maximum Error for Proportion of Students Living in Amman

Go back to Example 9.11. Find the maximum error of estimate.

Solution

The maximum error for the proportion of students living in Amman is

$$E = 2.575\sqrt{\frac{(0.39)(0.61)}{384}} = 0.064.$$

Determining the Sample Size Needed to Estimate a Proportion

In practice, we usually specify both the maximum error and the level of confidence. Then we must determine the sample size needed to satisfy these specifications. In order to find the sample size, n, we solve the equation $E = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ for n and get

$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 \hat{p}(1-\hat{p}).$$

However, this formula cannot be used directly, since $\hat{p}$ depends on the sample, which has not yet been taken.

So, what do we do now to solve this problem? We can use either one of the following two solutions.

Solution (1)

In many practical applications, similar past experiments or relevant experience may be available that enable us to provide an educated guess of the estimate $\hat{p}$.

Solution (2)

If solution (1) is not available, we can try to provide a value for $\hat{p}$ that would never underestimate the sample size needed. If we look again at the equation

$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 \hat{p}(1-\hat{p}),$$

we see that the quantity $\hat{p}(1-\hat{p})$ appears in the numerator. Thus we must determine the value of $\hat{p}$ that will make the quantity $\hat{p}(1-\hat{p})$ as large as possible. It can be shown that the product $\hat{p}(1-\hat{p})$ achieves its maximum value when $\hat{p}$ = 0.5. For example

when $\hat{p}$ = 0.9, then $\hat{p}(1-\hat{p})$ = (0.9)(0.1) = 0.09

when $\hat{p}$ = 0.8, then $\hat{p}(1-\hat{p})$ = (0.8)(0.2) = 0.16

when $\hat{p}$ = 0.6, then $\hat{p}(1-\hat{p})$ = (0.6)(0.4) = 0.24

when $\hat{p}$ = 0.5, then $\hat{p}(1-\hat{p})$ = (0.5)(0.5) = 0.25

when $\hat{p}$ = 0.4, then $\hat{p}(1-\hat{p})$ = (0.4)(0.6) = 0.24

when $\hat{p}$ = 0.2, then $\hat{p}(1-\hat{p})$ = (0.2)(0.8) = 0.16

when $\hat{p}$ = 0.1, then $\hat{p}(1-\hat{p})$ = (0.1)(0.9) = 0.09

Therefore, if solution (1) is not available, we should use the value $\hat{p}$ = 0.5.

Example 9.13 Sample Size Needed to Estimate Proportion of Students Living in Amman

Suppose we want to estimate the proportion of college students who live in Amman. What sample size is needed to construct a 95% confidence interval so that the margin of error will be at most 20%?

Solution

Because nothing is said about the value of the estimate $\hat{p}$, we assume that it is equal to 0.5. From the information given, we know that E = 20% = 0.20 and z0.025 = 1.96. If we plug these values into the formula for n, we get

$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 \hat{p}(1-\hat{p}) = \left(\frac{1.96}{0.20}\right)^2 (0.5)(0.5) = 24.01.$$

Thus we round up to n = 25. In order to be 95% confident that the maximum error in estimating the proportion of college students living in Amman is at most 20%, a sample of size n = 25 is needed.
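The two sample-size calculations of this chapter (Example 9.10 for the mean and Example 9.13 for a proportion) can be sketched in Python as follows. The function names and parameters are hypothetical, chosen only for illustration. The critical value is computed from the standard normal distribution, which gives about 1.96 at the 95% level, and the result is always rounded up to the next whole number.

```python
from math import ceil
from statistics import NormalDist


def z_critical(confidence):
    """Return z_{alpha/2} for a two-sided confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_proportion(error, confidence, p_guess=0.5):
    """n = (z/E)^2 * p(1 - p), rounded up.

    p_guess = 0.5 is the conservative choice used when no prior estimate exists.
    """
    z = z_critical(confidence)
    return ceil((z / error) ** 2 * p_guess * (1 - p_guess))


def sample_size_mean(error, confidence, s):
    """n = (z * s / E)^2, rounded up, as in Example 9.10."""
    z = z_critical(confidence)
    return ceil((z * s / error) ** 2)


print(sample_size_proportion(0.20, 0.95))   # setting of Example 9.13
print(sample_size_mean(2, 0.95, 9))         # setting of Example 9.10
```

With these inputs the two calls give 25 and 78, in agreement with the worked examples. Passing a prior estimate through `p_guess` corresponds to Solution (1), and leaving it at 0.5 corresponds to Solution (2).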

Exercises

9.1 The temperature (in °C) of a city during one week is recorded in the following table.

Saturday  Sunday  Monday  Tuesday  Wednesday  Thursday  Friday
24        27      25      24       23         26        25

What is the best point estimate for the population mean? Find the standard error of the estimate.

9.2 A physician wishes to estimate the mean mortality index for neoplasm of female breast. A random sample of 12 women yields the following mortality indexes.

95.0 88.6 89.2 78.9 84.6 81.7 87.0 72.2 65.1 68.1 67.3 52.5

Use the data to obtain a point estimate of the mean mortality index for all women with breast cancer. Find the standard error of the estimate.

9.3 A long-distance phone company wishes to estimate the mean duration of long-distance calls originating in Amman, Jordan. A random sample of 15 long-distance calls originating in Amman yields the following call durations, in minutes.

1 3 1 7 8 35 24 12 2 3 19 7 7 8 10

Use the data to obtain a point estimate of the mean call duration for all long-distance calls originating in Amman. Find the standard error of the estimate.

9.4 In a sample of 104 students, 68 passed the course. What is the best point estimate for the population proportion of students who passed the course? Find the standard error of the estimate.

9.5 Using the t-tables, determine the t-value for the given confidence interval and degrees of freedom.

a. 95% confidence interval with df = 5.

b. 90% confidence interval with df = 15.

c. 99% confidence interval with df = 26.

d. 95% confidence interval with df = 29.

e. 99% confidence interval from a sample of size 100.

f. 90% confidence interval from a sample of size 22.

9.6 Determine the error in estimating the population mean.

Based on a sample of size 36, a 90% confidence interval for the mean score of all students on some test is from 65.4 to 69.5.

9.7 Determine the error in estimating the population mean.

Assume a 95% t confidence interval for the population mean yields the following result: 10.5 < µ < 14.3, based on a sample of size n = 16.

9.8 Construct the requested confidence interval from the supplied information.

a. Thirty randomly selected students took the statistics final. If the sample mean was 78 and the standard deviation was 10.2, construct a 99% confidence interval for the mean score of all students.

b. A local bank needs information concerning the checking account balances of its customers. A random sample of 25 accounts was investigated and yielded a mean balance of 470.90 J.D. and a standard deviation of 201.06 J.D. Find a 95% confidence interval for the true mean checking account balance for local customers.

c. A laboratory tested fifteen skulls and found that the mean length of skull was 97.73 mm with s = 5.59 mm. Construct a 98% confidence interval for the true mean length of all such skulls.

9.9 A sample of 64 calculus students at a large college had a mean score of 72 with a standard deviation of 6. Find a 98% confidence interval for the mean score for all calculus students at this college.

9.10 A 99% confidence interval for the mean salary of secretaries was found to be from 250 J.D. to 450 J.D. Obtain the error by

a. taking half the length of the confidence interval.

b. using the formula assuming n = 14, σ = 145.02 J.D.

9.11 A sample of forty students in the Biotechnology Department is randomly selected. The high school average of each student, rounded to the nearest whole number, is recorded in the following table.

69 92 80 72 63

62 94 65 97 76

73 65 64 95 56

63 65 63 64 67

56 74 79 70 62

60 85 96 67 65

88 64 74 60 74

62 65 77 64 58

Assuming the population standard deviation of all such averages is 12.0, determine a 90% confidence interval for the mean, µ, of all high school averages. (Note: $\bar{x}$ = 71.1250.)

9.12 Construct a 90% confidence interval for the population mean, µ. Assume the population has a normal distribution. A sample of 15 randomly selected students has a grade point average of 2.86 with a standard deviation of 0.78.

9.13 Construct a 95% confidence interval for the population mean, µ. Assume the population has a normal distribution. A sample of 20 college students had mean annual earnings of 2100 J.D. with a standard deviation of 350 J.D.

9.14 Go back to Exercise 9.11. A 90% confidence interval for the mean average, µ, of all students is from 68.0258 to 74.2242.

a. Determine the error.

b. Find the sample size required to have a maximum error of 2.0 and a 99% confidence level, assuming that σ = 12.0.

9.15 Construct a 95% confidence interval for the population mean, µ. Assume the population has a normal distribution. A random sample of 16 cars has a mean speed of 82 mph with a standard deviation of 9 mph.

9.16 A survey of 200 fatal accidents showed that 25 were alcohol related. Find a point estimate for p, the population proportion of accidents that were alcohol related. Also find the standard error for the estimate.

9.17 Go back to Exercise 9.16. Construct a 95% confidence interval for the proportion of accidents that were alcohol related.

9.18 A survey of 100 non-fatal accidents showed that 3 involved a sleepy driver. Find a point estimate for p, the population proportion of accidents that involved a sleepy driver.

9.19 Go back to Exercise 9.18. Construct a 99% confidence interval for the proportion of non-fatal accidents that involved a sleepy driver.

9.20 Of 125 books in mathematics, 25 are found to be in linear algebra. Construct a 98% confidence interval for the proportion of all books in linear algebra.

9.21 Of 369 randomly selected IT students, 23 said that they planned to work for public companies. Construct a 95% confidence interval for the percentage of all IT students who plan to work for public companies.

9.22 A researcher at a major hospital wishes to estimate the proportion of the adult population that has high blood pressure. How large a sample is needed in order to be 99% confident that the sample proportion will not differ from the true proportion by more than 4%?

9.23 A researcher wishes to estimate the number of households with two cars. How large a sample is needed in order to be 98% confident that the sample proportion will not differ from the true proportion by more than 3%? A previous study indicates that the proportion of households with two cars is 18%.

9.24 A survey of 1020 college seniors working towards an undergraduate degree was conducted. Each student was asked, "Are you planning or not planning to pursue a graduate degree?" Of the 1020 surveyed, 656 stated that they were planning to pursue a graduate degree. Construct and interpret a 98% confidence interval for the proportion of college seniors who are planning to pursue a graduate degree.

Chapter Learning Outcomes

When you complete your careful study of this chapter you should be able to:

1. use and understand formulas given in this chapter.

2. define and obtain point estimate for population mean.

3. define the confidence level.

4. construct and interpret confidence interval for population mean when population standard deviation is known.

5. construct and interpret confidence interval for population mean when population standard deviation is unknown but n ≥ 30.

6. construct and interpret confidence interval for population mean when population standard deviation is unknown.

7. determine the sample size needed to estimate the mean.

8. calculate the maximum error needed to estimate the mean.

9. know the factors that affect the width of the confidence interval.

10. know the factors that affect the sample size.

11. know when and how to use the t-distribution.

12. state the basic properties of the t-curve and use the t-table.

13. define and obtain point estimate for population proportion.

14. construct and interpret confidence interval for population proportion.

15. determine the sample size needed to estimate population proportion.

16. calculate the maximum error needed in estimating population proportion.

Chapter Key Terms

Confidence interval estimate

Confidence coefficient

Confidence level

Estimation

Interval estimate

Large-sample confidence interval

Maximum error of estimate

Point estimate

Sample size

Statistical inference

Student's t-distribution

t-curve

t-distribution

t-table

Test of hypothesis

Chapter 10

Hypotheses Testing

In Chapter 8 we started the study of statistical inference by introducing the concept of a sampling distribution of the mean. In Chapter 9 we considered studies where a statistic, such as the sample mean or sample proportion, calculated from a random sample is used to estimate the corresponding population parameter using either a point or an interval estimate. In this chapter we pay attention to the second part of statistical inference, namely, hypotheses testing. In hypotheses testing, we have in advance an opinion about the value of the parameter and we wish to test whether the data collected in a sample confirm this value.

We give the general concept of hypotheses testing, test of hypothesis about the mean of a normal distribution, test of hypothesis about population proportion, the p-value approach, and the power of the test.

Chapter Outline

10.1 Introduction and General Concepts of Hypothesis Testing in the One-Sample Case

10.2 Testing Hypothesis about the Mean When σ is Known

10.3 Testing Hypothesis about the Mean When σ is Unknown

10.4 Testing Hypothesis about Population Proportion

10.5 The p-Value

10.6 The Power of the Test

Chapter Ten: Hypotheses Testing

10.1 Introduction and General Concepts of Hypothesis Testing in the One-Sample Case

In Chapter 9, we used the sample mean $\bar{X}$ to construct a confidence interval about the population mean, µ. Now we learn how this statistic can be used to make decisions about a hypothesized value of a population mean. We start with the methodology.

The Methodology of Hypothesis Testing

Definition 10.1 Hypothesis

A hypothesis is a claim, or assumption about a population parameter.

For example, the parameter may be the mean µ and we may claim that µ = 100. Or, the parameter may be some proportion p and we claim that p = 20%.

A test of hypothesis is a series of steps used to decide whether the sample data support the hypothesis. Each testing problem involves two hypotheses: the null hypothesis and the alternative hypothesis.

Definition 10.2 Null Hypothesis

The null hypothesis is the hypothesis to be tested. It is a statement about the value of a population parameter and is denoted by H0.

The null hypothesis should always specify a single value for the parameter of interest and will be in the form µ = µ0, where µ0 is a given (or fixed) number. Therefore, we will write the null hypothesis as

H0: µ = µ0.

Definition 10.3 Alternative Hypothesis

The alternative hypothesis is a hypothesis that is opposite to the null hypothesis and is denoted by H1 (or Ha).

The alternative hypothesis is the hypothesis that must be true if the null hypothesis is false. In this book we will study three choices for the alternative hypothesis; each depends on the purpose of the test.

1. If the purpose of the test is to decide that a population mean, µ, is different from a specific value µ0, then the alternative hypothesis should be

H1: µ ≠ µ0.

In this case, the test of hypothesis is called a two-tailed test.

2. If the purpose of the test is to decide that a population mean, µ, is less than a specific value µ0, then the alternative hypothesis should be

H1: µ < µ0.

In this case, the test of hypothesis is called a left-tailed test.

3. If the purpose of the test is to decide that a population mean, µ, is greater than a specific value µ0, then the alternative hypothesis should be

H1: µ > µ0.

In this case, the test of hypothesis is called a right-tailed test.

It is also customary to say that a test of hypothesis is a one-tailed test if it is either left-tailed or right-tailed.

Example 10.1 Learning How to Write the Null and Alternative Hypotheses

Suppose a well-known car company claims that its new model car will average

a. 20 kilometers per liter.

b. More than 20 kilometers per liter.

If you plan to test the company's claim, what are the null and alternative hypotheses?

Solution

Keeping in mind that the null hypothesis is the hypothesis to be tested and the alternative hypothesis is opposite to the null hypothesis, you must choose the following null and alternative hypotheses:

a. H0: µ = 20 km per liter

H1: µ ≠ 20 km per liter.

This is a two-tailed hypothesis.

b. H0: µ = 20 km per liter

H1: µ > 20 km per liter.

This is a right-tailed hypothesis.

Errors in Hypothesis Testing

Each time we test a hypothesis we could possibly reach an incorrect decision. The reason is that we use only part of the information (the sample) to make a decision about the whole population. As a matter of fact, there are two kinds of error that could be committed in a test of hypothesis: Type I error and Type II error. Look at Table 10.1.

Table 10.1 Correct and incorrect decisions for a hypothesis test

Decision            H0 is true          H0 is false
Do not reject H0    Correct decision    Type II error
Reject H0           Type I error        Correct decision

Definition 10.4 Type I Error

Type I Error is the error committed when H0 is rejected and it is in fact true. The probability of committing a Type I Error is denoted by α, that is, P(Type I Error) = α.

Definition 10.5 Type II Error

Type II Error is the error committed when H0 is accepted and it is in fact false. The probability of committing a Type II Error is denoted by β, that is, P(Type II Error) = β.

Definition 10.6 Level of Significance

The probability of committing a Type I error is referred to as the level of significance of the statistical test.

Definition 10.7 The Power of a Test

The power of a statistical test is the probability of rejecting the null hypothesis when it is in fact false and should be rejected. The power is denoted by 1 – β.

The power of a test is equal to 1 – β; that is one minus the probability of a Type II error.

Example 10.2 Types I and II Errors

Consider again the hypothesis test given in part (a) of Example 10.1. The null and alternative hypotheses are

H0: µ = 20 km per liter

H1: µ ≠ 20 km per liter

where µ is the mean distance traveled per liter for all of the company's cars. Explain in words the following.

a. Type I error.

b. Type II error.

c. Correct decision.

Solution

a. A Type I error is committed when a true null hypothesis is rejected. So, a Type I error will occur when in fact µ = 20 but we conclude that µ ≠ 20.

b. A Type II error is committed when a false null hypothesis is accepted. So, a Type II error will occur when in fact µ ≠ 20 but we conclude that µ = 20.

c. A correct decision can occur in one of two ways:

• We do not reject a true null hypothesis. This would occur if in fact µ = 20 and the results of the statistical test do not lead to the rejection of H0.

• We reject a false null hypothesis. This would occur if in fact µ ≠ 20 and the results of the statistical test lead to the rejection of H0.
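The meaning of α as a long-run error rate can be illustrated by simulation. The sketch below assumes that H0: µ = 20 is true and borrows σ = 3.4 and n = 25 from Example 10.3. The decision rule (reject when the sample mean lies more than 1.96 standard errors from 20, a rule developed later in this chapter), the function name, the seed, and the number of trials are all assumptions made for illustration.

```python
import random
from math import sqrt


def type_one_error_rate(mu0=20.0, sigma=3.4, n=25, z_crit=1.96,
                        trials=20_000, seed=1):
    """Estimate P(reject H0 | H0 true) for a two-tailed z-test by simulation."""
    rng = random.Random(seed)
    se = sigma / sqrt(n)              # standard error of the sample mean
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population in which H0 really is true.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        z = (xbar - mu0) / se
        if abs(z) > z_crit:           # rejecting here is a Type I error
            rejections += 1
    return rejections / trials


rate = type_one_error_rate()
print(f"estimated Type I error rate: {rate:.3f}")
```

Because every simulated sample comes from a population with µ = 20, each rejection is a Type I error, and the estimated rate should come out close to 0.05.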

Before we give the necessary steps to perform a statistical test of hypothesis we must give the following definitions. These definitions plus the ones given earlier will ease our coming work in testing a hypothesis.

Definition 10.8 Test Statistic

It is the statistic used as a basis to decide whether H0 should be rejected.

Definition 10.9 Rejection Region

The rejection region is the set of all values of the test statistic that lead to the rejection of H0.

Definition 10.10 Nonrejection Region (or Acceptance Region)

The nonrejection region is the set of all values of the test statistic that lead to the nonrejection (or acceptance) of H0.

Definition 10.11 Critical Values

Critical values are those values of the test statistic that separate the rejection and nonrejection regions.

The terminology we have used thus far in this section applies to any hypothesis test, for example a hypothesis test for a population mean, a hypothesis test for a proportion, or any other hypothesis test for a single parameter such as the variance or standard deviation of a population.

The Logic Behind Hypothesis Testing

From the preceding discussion, in particular Example 10.1, we have learned how to choose appropriate null and alternative hypotheses for a given test. A big question that remains to be answered is "Which of these two hypotheses is true and must be selected, and how do we decide that?"

Roughly speaking, to answer this question we take a random sample from the population. If the sample data are in agreement with H0, then we do not reject H0; if the sample data are not in agreement with H0, then we reject H0.

Example 10.3 Illustrating the Idea of Hypothesis Testing

Suppose a well-known car company claims that its new model car will make, on the average, 20 km per liter. We will assume that the distance traveled per liter is normally distributed with mean µ (unknown) and standard deviation of 3.4 km per liter. A random sample of 25 cars is taken and the distance traveled in kilometers per liter is recorded for each car. The sample mean is $\bar{X} = 18$ km per liter.

Do the data support the claim that the mean distance traveled is different from 20 km per liter? To answer the question, we will use the following steps.

a. Write down both H0 and H1.

b. Discuss the logic behind the hypothesis test.

c. Get the sampling distribution of the mean for samples of size 25.

d. Get a solid criterion to decide whether or not to reject H0.

e. Apply the criterion in part (d) to the sample data you collected and state the conclusion.

Solution

Let µ be the mean distance traveled per liter for all of the company's cars.

a. From Example 10.1, H0 and H1 are as follows:

H0: µ = 20 km per liter

H1: µ ≠ 20 km per liter.

b. If H0 is true and µ = 20 km per liter, then the sample mean of the 25 cars should be approximately equal to 20 km per liter. However, if the sample mean differs "too much" from 20 km per liter, then we tend to reject H0 and conclude that H1 is true. How much difference is "too much"? We answer this question in part (d) when we use our knowledge of the sampling distribution of the mean.

c. Since n = 25, σ = 3.4, and distances traveled are normally distributed, we know that

• $\mu_{\bar{X}} = \mu$ (unknown),

• $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 3.4/\sqrt{25} = 0.68$, and

• $\bar{X}$ is normal.

That is, $\bar{X} \sim N(\mu, 0.68)$.

d. From the empirical rule, and in particular from the "95.44 part", we know that for a normally distributed variable, 95.44% of all observations are within 2 standard deviations of the mean. Applying this to the variable $\bar{X}$, we see that 95.44% of all samples of 25 cars have a mean distance traveled within 2 × 0.68 = 1.36 km per liter of µ.

Stated in an equivalent way, only 4.56% of all samples of 25 cars have mean distance traveled that are not within 1.36 km per liter of µ. See Figure 10.1.

Figure 10.1 95.44% of all samples of 25 cars have mean distance traveled within 2 standard deviations (1.36 km per liter) of µ

According to Figure 10.1, if the mean distance traveled in km per liter of the 25 cars sampled is not within 2 standard deviations of the population mean µ, then we have evidence against H0. Why? Because getting a value of such a sample mean would occur by chance only 4.56% of the time, if H0 were true.

To summarize our findings, we have obtained the following solid criterion to decide whether or not to reject H0:

If the mean distance traveled in km per liter of the 25 cars sampled from the population of company's cars is more than 2 standard deviations away from 20 km per liter, then reject H0: µ = 20 km per liter and accept H1: µ ≠ 20 km per liter.

The above criterion is depicted in Figure 10.2(a). If H0 is true, then $\bar{X} \sim N(20, 0.68)$; this normal curve is superimposed on Figure 10.2(a) to give Figure 10.2(b).

Figure 10.2: (a) Criterion for deciding whether or not to reject H0 (b) Normal curve associated with $\bar{X}$ if H0 is true, superimposed on the decision criterion

e. Now we apply the criterion we obtained in part (d) to the sample data and give the conclusion. Since the mean distance traveled for the sample of 25 cars is $\bar{X} = 18$ km per liter, then

$$z = \frac{\bar{X} - 20}{0.68} = \frac{18 - 20}{0.68} = -2.94.$$

This result says that the sample mean of 18 km per liter is 2.94 standard deviations below the null hypothesis population mean of 20 km per liter, as shown in Figure 10.3.

Figure 10.3 Number of standard deviations that the sample mean of 18 km per liter is away from the null hypothesis population mean of 20 km per liter

Because the mean distance traveled of the 25 cars sampled from the company's cars is more than 2 standard deviations away from 20 km per liter, we reject H0: µ = 20 km per liter, and conclude that the alternative hypothesis H1: µ ≠ 20 is true. In other words, the data support the claim that the mean distance traveled is different from 20 km per liter.
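The numbers in Example 10.3 can be reproduced with a few lines of Python. This is only a sketch: the variable names are illustrative, and the tail probability is computed from the standard normal distribution instead of the rounded empirical rule, so it comes out as roughly 4.55% where the text uses 4.56%.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar = 20.0, 3.4, 25, 18.0   # values from Example 10.3

se = sigma / sqrt(n)            # standard deviation of the sample mean
z = (xbar - mu0) / se           # distance from mu0 in standard deviations of the mean

# Chance that a sample mean falls more than 2 standard deviations from mu0
# when H0 is true.
tail_prob = 2 * (1 - NormalDist().cdf(2))

reject = abs(z) > 2             # the criterion from part (d)
print(f"se = {se:.2f}, z = {z:.2f}, tail probability = {tail_prob:.4f}")
print("reject H0" if reject else "do not reject H0")
```

The sample mean lies about 2.94 standard deviations below 20, beyond the cutoff of 2, so the sketch reaches the same decision as the example.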

Before we end this section and move to the next one, observe Figure 10.3 again. The figure shows the rejection region, the nonrejection region, and the critical values. The rejection region consists of two parts: the first contains all values of the test statistic that are less than -2, and the second contains all values that are greater than 2. The nonrejection region contains all the values from -2 to 2. Finally, -2 and 2 are the two critical values.

Also, observe Table 10.2 and Figure 10.4. Table 10.2 gives rejection regions for two-tailed, left-tailed, and right-tailed tests. In Figure 10.4, we graph the rejection regions for two-tailed, left-tailed, and right-tailed tests.

Table 10.2 Sides of rejection regions for two-tailed, left-tailed, and right-tailed tests

                    Two-tailed test    Left-tailed test    Right-tailed test
Sign in H1          ≠                  <                   >
Rejection region    Both sides         Left side           Right side

Figure 10.4 Graphical representation of rejection and nonrejection regions for two-tailed, left-tailed, and right-tailed tests

Steps to Follow In Hypothesis Testing

In order to work with problems that deal with hypothesis testing, it is helpful to follow the five steps given below. These five steps must be followed in sequence as they are given.

Step 1

Formulate a null hypothesis and an appropriate alternative hypothesis.

Step 2

Specify the significance level, α.

Step 3

Based on the sampling distribution of an appropriate statistic, construct a criterion for testing the null hypothesis against the given alternative.

Step 4

From the sample data, calculate the value of the test statistic on which the decision is to be based.

Step 5

Decide whether to reject the null hypothesis or whether to fail to reject it.

10.2 Testing Hypothesis about the Mean When σ is Known

In this section, we present a procedure to conduct a hypothesis test for a population mean µ when the population standard deviation σ is known. We assume that the variable considered is normally distributed with unknown mean µ and known standard deviation σ. However, because of the central limit theorem, the procedure we develop here also works very well when the sample size is large (n ≥ 30), regardless of the distribution of the variable.

The procedure is called the One-Sample z-Test for a Population Mean.

Assumptions

1. Normal population or large sample, n ≥ 30.

2. σ known.

Step 1

The null hypothesis is H0: µ = µ0 and the alternative hypothesis is one of the following three:

H1: µ ≠ µ0 (two-tailed) or H1: µ < µ0 (left-tailed) or H1: µ > µ0 (right-tailed)

Step 2

Specify the significance level, α.

Step 3

Based on the sampling distribution of the sample mean $\bar{X}$, we construct the following rejection regions:

$$|z| > z_{\alpha/2} \ \text{(two-tailed)} \quad \text{or} \quad z < -z_{\alpha} \ \text{(left-tailed)} \quad \text{or} \quad z > z_{\alpha} \ \text{(right-tailed)}$$

We use the cumulative standard normal table to find the values of $z_{\alpha/2}$ and $z_{\alpha}$; see the figure below.


Step 4

Calculate the value of the test statistic

$$z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}.$$

Step 5

Decide whether to reject the null hypothesis or whether to fail to reject it.
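The five steps of the one-sample z-test can be sketched in Python as follows. This is only an illustrative sketch: the function name `one_sample_z_test` and the tail labels "two", "left", and "right" are assumptions made here, and the critical value is computed with the standard library's `NormalDist` rather than read from the normal table, so it may differ from the tabulated value in the third decimal place. The numbers in the usage lines are those of Example 10.4.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(x_bar, mu0, sigma, n, alpha, tail):
    """Return (z, critical value, reject?) for a one-sample z-test.

    tail is "two", "left", or "right" (hypothetical labels for the
    three possible forms of the alternative hypothesis H1).
    """
    # Step 4: test statistic z = (x_bar - mu0) / (sigma / sqrt(n))
    z = (x_bar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()  # Z ~ N(0, 1)

    # Step 3: rejection region, depending on the sign in H1
    if tail == "two":
        crit = std_normal.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
        reject = abs(z) > crit
    elif tail == "left":
        crit = std_normal.inv_cdf(1 - alpha)      # z_{alpha}
        reject = z < -crit
    elif tail == "right":
        crit = std_normal.inv_cdf(1 - alpha)      # z_{alpha}
        reject = z > crit
    else:
        raise ValueError("tail must be 'two', 'left', or 'right'")

    # Step 5: the decision is the boolean `reject`
    return z, crit, reject


# Example 10.4: n = 36, sample mean 2.8, mu0 = 3.2, sigma = 0.9, alpha = 0.05
z, crit, reject = one_sample_z_test(2.8, 3.2, 0.9, 36, 0.05, "two")
print(round(z, 2), round(crit, 2), reject)  # -2.67 1.96 True
```

Returning the statistic and the critical value together with the decision makes it easy to see how far into the rejection region the statistic falls.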

Example 10.4 Illustrating the One-Sample z-Test for a Population Mean (Two-Tailed Test)

A statistics professor claims that the mean grade point average (GPA) of his class is 3.2. To test this claim, a random sample of 36 students' GPAs was taken and yielded a mean of 2.8. At the 5% significance level, do the data provide sufficient evidence to conclude that the mean GPA of all students differs from 3.2? Assume that the standard deviation of GPA is known to be 0.9.

Solution

Since the sample size is large; n = 36, and the population standard deviation is known, we can apply the one-sample z-test for a population mean procedure.

Step 1

The null hypothesis: H0: µ = 3.2

Alternative Hypothesis: H1: µ ≠ 3.2

Step 2

Level of significance, α = 0.05.

Step 3

Criterion: Reject the null hypothesis if |z| > z0.025, that is, z > 1.96 or z < -1.96, where

$$z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}.$$


Step 4

Calculations:

$$z = \frac{2.8 - 3.2}{0.9/\sqrt{36}} = -2.67$$

Step 5

Decision: Since z = -2.67 falls in the rejection region (see Figure 10.5), we reject the null hypothesis.

Figure 10.5 Criterion for deciding whether or not to reject the null hypothesis in Example 10.4

Example 10.5 Illustrating the One-Sample z-Test for a Population Mean (Right-Tailed Test)

The mean salary of employees at a service company is 160 J.D. with a standard deviation of 30 J.D. Suppose a random sample of 32 employees is examined and the sample mean salary is found to be 170 J.D. At the 1% level of significance, is there evidence that the true average is greater than 160 J.D.?

Solution

Since the sample size, n = 32, is large and the population standard deviation is known, we can apply the one-sample z-test for a population mean.

Step 1

The null hypothesis: H0: µ = 160 J.D.

Alternative Hypothesis: H1: µ > 160 J.D.

Step 2

Level of significance, α = 0.01.


Step 3

Criterion: Reject the null hypothesis if z > z0.01, that is, z > 2.33, where

$$z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}.$$

Step 4

Calculations:

$$z = \frac{170 - 160}{30/\sqrt{32}} = 1.89$$

Step 5

Decision: Since z = 1.89 falls in the nonrejection region (see Figure 10.6), the null hypothesis cannot be rejected. To put this result another way, the difference between $\bar{X}$ = 170 J.D. and µ = 160 J.D. can be attributed to chance.

Figure 10.6 Criterion for deciding whether or not to reject the null hypothesis in Example 10.5

Example 10.6 Illustrating the One-Sample z-Test for a Population Mean (Left-Tailed Test)

A potato chip manufacturer produces bags of chips. If the production process is working properly, it turns out bags with a mean weight of 28 grams and a standard deviation of 2 grams. Bags weighing more than 28 grams can be kept or reduced, but bags that contain less than 28 grams would harm the company's reputation. A sample of 25 bags is selected from the production line, and the sample has an average weight of 27.3 grams. The company wishes to determine whether the production equipment needs an immediate adjustment. If the company tests the hypothesis at the 0.10 level of significance, what decision would it make? Assume the weights of bags of chips are normally distributed.

Solution


Since the variable is assumed normal and the population standard deviation is known, we can apply the one-sample z-test for a population mean procedure.

Step 1

The null hypothesis: H0: µ = 28 grams

Alternative Hypothesis: H1: µ < 28 grams

Step 2

Level of significance, α = 0.10.

Step 3

Criterion: Reject the null hypothesis if z < -z0.10, that is, z < -1.28, where

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}.$$

Step 4

Calculations:

$$z = \frac{27.3 - 28}{2/\sqrt{25}} = -1.75$$

Step 5

Decision: Since z = -1.75 falls in the rejection region (see Figure 10.7), the null hypothesis is rejected. To put this result another way, the difference between $\bar{x}$ = 27.3 grams and µ = 28 grams cannot be attributed to chance. Therefore, the production equipment must be adjusted immediately.

Figure 10.7 Criterion for deciding whether or not to reject the null hypothesis in Example 10.6

10.3 Testing Hypothesis About the Mean when σ is Unknown

In Section 10.2 we introduced a hypothesis test for a population mean when the population standard deviation is known. We called the test "the One-Sample z-Test for a Population Mean". In that test we assumed that the population is normal or n is large, and that σ is known. What if the sample size is small and σ is unknown? In this case, we cannot use the test given in Section 10.2. However, if the sample comes from a normal population, we may recall from Section 9.3 that the sample mean, standardized with the sample standard deviation s in place of σ, follows a t-distribution with n – 1 degrees of freedom. Accordingly, we can base our test of the null hypothesis µ = µ0 on the statistic

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}},$$

which is a random variable having the t-distribution with n – 1 degrees of freedom.

We will use this variable as our test statistic and consequently use the t-table to find critical values, rejection, and nonrejection regions.

The procedure we will use to test a hypothesis about the population mean when σ is unknown is called The One-Sample t-Test For a Population Mean.

Assumptions

1. Normal population

2. σ unknown

Step 1

The null hypothesis is H0: µ = µ0 and the alternative hypothesis is one of the following three:

H1: µ ≠ µ0 or H1: µ < µ0 or H1: µ > µ0

(two-tailed) (left-tailed) (right-tailed)

Step 2

Specify the significance level, α.

Step 3

Based on the sampling distribution of the sample mean $\bar{X}$, we construct the following rejection regions:

$$|t| > t_{\alpha/2} \ \text{(two-tailed)} \quad \text{or} \quad t < -t_{\alpha} \ \text{(left-tailed)} \quad \text{or} \quad t > t_{\alpha} \ \text{(right-tailed)}$$

with df = n - 1. We use the t-table to find the values of $t_{\alpha/2}$ and $t_{\alpha}$; see the figure below.

Step 4

Calculate the value of the test statistic

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}.$$

Step 5

Decide whether to reject the null hypothesis or whether to fail to reject it.
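The one-sample t-test can be sketched in the same spirit. Python's standard library has no t-distribution, so this sketch assumes the critical value is read from the t-table and passed in by the reader. The helper names `one_sample_t_statistic` and `t_decision` are hypothetical, and the numbers in the usage lines are those used in the solution of Example 10.7.

```python
from math import sqrt


def one_sample_t_statistic(x_bar, mu0, s, n):
    """Test statistic t = (x_bar - mu0) / (s / sqrt(n)), with df = n - 1."""
    return (x_bar - mu0) / (s / sqrt(n))


def t_decision(t, t_crit, tail):
    """Return True if H0 is rejected.

    t_crit is the positive critical value read from the t-table:
    t(df, alpha/2) for a two-tailed test, t(df, alpha) otherwise.
    """
    if tail == "two":
        return abs(t) > t_crit
    if tail == "left":
        return t < -t_crit
    if tail == "right":
        return t > t_crit
    raise ValueError("tail must be 'two', 'left', or 'right'")


# Example 10.7: n = 25, sample mean 32.38, mu0 = 32.5, s = 1.17,
# and t(24, 0.005) = 2.797 from the t-table
t = one_sample_t_statistic(32.38, 32.5, 1.17, 25)
print(round(t, 2), t_decision(t, 2.797, "two"))  # -0.51 False
```

Keeping the table lookup outside the code mirrors the way the procedure is carried out by hand in this section.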

Example 10.7 Illustrating the One-Sample t-Test for a Population Mean (Two-Tailed Test)

A manufacturer of marble blocks used in home construction claims that the mean weight of a particular type of marble block is 32.5 kilograms. A random sample of 25 blocks reveals a sample average of 32.38 kilograms and a sample standard deviation of 1.17 kilograms. Using the 1% level of significance, is there evidence that the average weight of the blocks is different from 32.5 kilograms? Assume that the weights of the blocks are normally distributed.

Solution

Since the variable of interest is assumed normal and the population standard deviation is unknown, we can apply the procedure of one-sample t-test for a population mean.

Step 1

The null hypothesis: H0: µ = 32.5 kilograms

Alternative Hypothesis: H1: µ ≠ 32.5 kilograms

Step 2

Level of significance, α = 0.01.

Step 3

Criterion: Reject the null hypothesis if |t| > t(24, 0.005), that is, t > 2.797 or t < -2.797, where

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}.$$

Step 4

Calculations:

$$t = \frac{32.38 - 32.5}{1.17/\sqrt{25}} = -0.51$$

Step 5

Decision: Since t = -0.51 falls in the nonrejection region (see Figure 10.8), the null hypothesis cannot be rejected. To put this result another way, the difference between $\bar{X}$ = 32.38 kilograms and µ = 32.5 kilograms can be attributed to chance.

Figure 10.8 Criterion for deciding whether or not to reject the null hypothesis in Example 10.7

Example 10.8 Illustrating the One-Sample t-Test for a Population Mean (Right-Tailed Test)

Suppose that the cost of textbooks during a typical semester is a normally distributed random variable. A sample of 16 students enrolled in the university indicates an average cost of books of 31.54 J.D. with a standard deviation of 4.32 J.D. Using the 10% level of significance, is there evidence that the population mean is above 30 J.D.?

Solution

Since the cost of textbooks is assumed normal and the population standard deviation is unknown, we can apply the one-sample t-test for a population mean.


Step 1

The null hypothesis: H0: µ = 30 J.D.

Alternative Hypothesis: H1: µ > 30 J.D.

Step 2

Level of significance, α = 0.10.

Step 3

Criterion: Reject the null hypothesis if t > t(15, 0.10), that is, t > 1.341, where

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}.$$

Step 4

Calculations:

$$t = \frac{31.54 - 30}{4.32/\sqrt{16}} = 1.43$$

Step 5

Decision: Since t = 1.43 falls in the rejection region (see Figure 10.9), the null hypothesis is rejected. There is evidence that the population mean is above 30 J.D.

Figure 10.9 Criterion for deciding whether or not to reject the null hypothesis in Example 10.8

10.4 Testing Hypothesis about Population Proportion

In some applications, we want to test a hypothesis about a population proportion p of values that belong to a particular category. In Section 9.4, we learned how to construct a confidence interval for a population proportion. Now we present a procedure that is very similar to the one-sample z-test for a population mean. In this procedure we, as usual, refer to the population proportion as p, to the value of the proportion to be tested in the null and alternative hypotheses as p0, and to the point estimate as $\hat{p}$. This procedure is referred to as the One-Sample z-Test for a Population Proportion.


Assumption

$$np_0 \ge 5 \quad \text{and} \quad n(1 - p_0) \ge 5.$$

Step 1

The null hypothesis is H0: p = p0 and the alternative hypothesis is one of the following three:

H1: p ≠ p0 or H1: p < p0 or H1: p > p0

(two-tailed) (left-tailed) (right-tailed)

Step 2

Specify the significance level, α.

Step 3

Based on the sampling distribution of the sample proportion $\hat{p}$, we construct the following rejection regions:

$$|z| > z_{\alpha/2} \ \text{(two-tailed)} \quad \text{or} \quad z < -z_{\alpha} \ \text{(left-tailed)} \quad \text{or} \quad z > z_{\alpha} \ \text{(right-tailed)}$$

We use the cumulative standard normal table to find the values of $z_{\alpha/2}$ and $z_{\alpha}$; see the figure below.

Step 4

Calculate the value of the test statistic

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}.$$

Step 5

Decide whether to reject the null hypothesis or whether to fail to reject it.

Example 10.9 Illustrating the One-Sample z-Test for a Population Proportion (Left-Tailed Test)

A university official claims that in the past no more than 10% of the civil engineering department graduates continued on to higher education. To test the validity of this claim, a sample of 100 civil engineers was interviewed, and it was found that 14 had continued their higher education. Using the 1% level of significance, is the official's claim valid, or is there evidence that the claim is invalid?

Solution

Notice that p0 = 10%, $\hat{p}$ = 14%, and p is the parameter to be tested. Also notice that 100 × 10% = 10 > 5 and 100 × 90% = 90 > 5, so the assumption of the test is satisfied.

Step 1

The null hypothesis: H0: p = 10%

Alternative Hypothesis: H1: p < 10%

Step 2

Level of significance, α = 0.01.

Step 3

Criterion: Reject the null hypothesis if z < -z0.01, that is, z < -2.33, where

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}.$$

Step 4

Calculations:

$$z = \frac{0.14 - 0.10}{\sqrt{(0.10)(0.90)/100}} = 1.33$$

Step 5

Decision: Since z = 1.33 falls in the nonrejection region (see Figure 10.10), the null hypothesis is not rejected. To put this result another way, the difference between $\hat{p}$ = 14% and p0 = 10% can be attributed to chance. Therefore, there is no evidence that the university official's claim is invalid.

Figure 10.10 Criterion for deciding whether or not to reject the null hypothesis in Example 10.9
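The test statistic for a population proportion, together with the check of the assumption np0 ≥ 5 and n(1 − p0) ≥ 5, can be sketched as below. The function name `one_sample_proportion_z` is an assumption made for this illustration. The usage lines take the numbers of Example 10.9 (14 out of 100, p0 = 0.10) and apply the criterion stated there.

```python
from math import sqrt


def one_sample_proportion_z(successes, n, p0):
    """Test statistic z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    # Assumption of the test: n*p0 >= 5 and n*(1 - p0) >= 5
    if n * p0 < 5 or n * (1 - p0) < 5:
        raise ValueError("assumption n*p0 >= 5 and n*(1 - p0) >= 5 is not met")
    p_hat = successes / n  # point estimate of the population proportion
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


# Example 10.9: 14 of 100 sampled graduates, p0 = 0.10
z = one_sample_proportion_z(14, 100, 0.10)
print(round(z, 2))   # 1.33
print(z < -2.33)     # criterion of the example: reject H0 if z < -2.33 -> False
```

Note that the standard error uses p0, the value under the null hypothesis, and not the sample proportion.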


10.5 The p-Value

Because of Step 3, the testing procedure we have used thus far is called the critical-value approach. It is one way to report the results of a hypothesis test: we say that the null hypothesis was or was not rejected at a specified significance level α. For instance, in Example 10.4, we can say that H0: µ = 3.2 is rejected at the 5% level of significance. However, some people, especially researchers, may see such a conclusion as inadequate, because it gives the decision maker no clear idea of whether the computed value of the test statistic was barely inside the rejection region or very far into it. Furthermore, specifying the significance level in a study effectively forces others to use the same level. For these reasons, the critical-value approach may not be satisfactory.

To avoid these difficulties, many research workers and decision makers accompany the calculated value of z with a corresponding tail probability, or p-value, which is the probability of getting a difference between $\bar{X}$ and µ0 greater than or equal to that actually observed.

Definition 10.12 p-Value

The p-value is the probability of obtaining a test statistic equal to or more extreme than the one observed from the sample data, given that the null hypothesis H0 is true.

The p-value is the smallest level of significance that would lead to the rejection of the null hypothesis H0. Roughly speaking, the p-value indicates how likely it would be to observe the value obtained for the test statistic if the null hypothesis were true. A p-value close to 0 means it is improbable to observe the value obtained for the test statistic if H0 is true. In general, small p-values indicate the rejection of H0. We will use the letter p to denote the p-value. The p-value is also referred to as the observed significance level, abbreviated as OSL, or as the probability value.

When the null hypothesis H0 is rejected, we usually describe the test statistic (and the data) as significant; therefore, we may think of the p-value as the smallest level α at which the data are significant. This way, researchers and decision makers can determine how significant the data are without formally imposing a predetermined level of significance, α.

For the foregoing normal distribution tests it is easy to compute the p-values. If z is the computed value of the test statistic, then the p-value is

$$p = \begin{cases} 2\,[1 - \Phi(|z|)], & \text{for a two-tailed test} \\ \Phi(z), & \text{for a left-tailed test} \\ 1 - \Phi(z), & \text{for a right-tailed test} \end{cases}$$


Here, Φ(z) is the standard normal cumulative distribution function, or simply, Φ(z) = P(Z ≤ z), that is, the area to the left of z under the standard normal curve, where, of course, Z~N(0, 1).

In Example 10.10, we compute the p-values for all three cases of the alternative hypothesis; namely, the two-tailed, left-tailed, and right-tailed hypotheses.

Example 10.10 Computation of the p-Value

a. Go back to Example 10.4. The computed value of the test statistic is z = -2.67 and since the alternative hypothesis is two-tailed, the p-value is

p = 2[1 - Φ(|-2.67|)] = 2[1 - Φ(2.67)] = 2[1 – 0.9962] = 0.0076

This result may be interpreted as follows:

p = 0.0076 is the probability, when H0 is true, of obtaining a value of the test statistic equal to or more extreme than the one observed. Because this is smaller than α = 0.05, the null hypothesis is rejected.

b. Go back to Example 10.6. The computed value of the test statistic is z = -1.75 and since the alternative hypothesis is left-tailed, the p-value is

p = Φ(-1.75) = 0.0401

This result may be interpreted as follows:

p = 0.0401 is the probability, when H0 is true, of obtaining a value of the test statistic equal to or less than the one observed. Because this is smaller than α = 0.10, the null hypothesis is rejected.

c. Go back to Example 10.5. The computed value of the test statistic is z = 1.89 and since the alternative hypothesis is right-tailed, the p-value is

p = 1 - Φ(1.89) = 1 – 0.9706 = 0.0294

This result may be interpreted as follows:

p = 0.0294 is the probability, when H0 is true, of obtaining a value of the test statistic equal to or greater than the one observed. Because this is larger than α = 0.01, the null hypothesis is not rejected.
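The three p-value formulas can be checked numerically with a short Python sketch. The function name `z_p_value` and the tail labels are assumptions made for this illustration. Φ is evaluated with the standard library's `NormalDist`, so the results may differ from table-based values in the last decimal place.

```python
from statistics import NormalDist


def z_p_value(z, tail):
    """p-value of a z-test; tail is "two", "left", or "right"."""
    phi = NormalDist().cdf  # Phi(z) = P(Z <= z) for Z ~ N(0, 1)
    if tail == "two":
        return 2 * (1 - phi(abs(z)))
    if tail == "left":
        return phi(z)
    if tail == "right":
        return 1 - phi(z)
    raise ValueError("tail must be 'two', 'left', or 'right'")


# The three cases of Example 10.10
cases = [("Example 10.4", -2.67, "two"),
         ("Example 10.6", -1.75, "left"),
         ("Example 10.5", 1.89, "right")]
for label, z, tail in cases:
    print(label, round(z_p_value(z, tail), 4))
```

Comparing each printed p-value with the significance level α of the corresponding example reproduces the decisions reached with the critical-value approach.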

10.6 The Power of the Test

Recall from Definition 10.7 that the power of a statistical test is the probability of rejecting the null hypothesis when in fact it is false and should be rejected. It is denoted by 1 – β; that is, the power of a test is 1 minus the probability of a Type II error. The power can be interpreted as the probability of correctly rejecting a false null hypothesis. Its value is between 0 and 1. If the power is near 0, the hypothesis test is not good at detecting a false null hypothesis; if the power is near 1, the hypothesis test is very good at detecting a false null hypothesis. We compare statistical tests by comparing their power properties.

Since power = 1 – β is the complement of the probability of a Type II error, to calculate the power we must start by calculating the Type II error probability β. In this section we learn how to compute β; the power then follows easily. Although our discussion is limited to the one-sample z-test, the ideas can be applied to any test of hypothesis.

To fully understand the ideas of Type II error and power, it may be a good idea to rewrite Table 10.1 in a slightly different form. Table 10.3 shows Type II error and power, in addition to Type I error.

Table 10.3 Type I error, Type II error, and power

                      H0 is true                  H0 is false
Decision:
Do not reject H0      Correct decision            Type II error
                      P(confidence) = 1 - α       P(Type II error) = β
Reject H0             Type I error                Correct decision
                      P(Type I error) = α         Power = 1 - β

Type II Error Probability β

Example 10.11 explains how to compute the probability of making a Type II error for a one-sample z-test for a population mean. Notice that to be able to compute the value of β you need the value of µ. That is, for each µ there is a value of β.

Example 10.11 Computing Type II Error Probability

Omar wants to buy a mini-market located about fifty meters away from his home. The present owner claims that over the past 5 years the mean daily revenue has been 67.5 J.D. with a standard deviation of 7.5 J.D. Omar thinks that the mean daily revenue is less than 67.5 J.D. To test the owner's claim, Omar takes a sample of 30 selected days and plans to perform the hypothesis test

H0: µ = 67.5 J.D. (owner's claim)

H1: µ < 67.5 J.D. (Omar's belief)

at the 1% level of significance. Assuming that daily revenues are normally distributed, find the probability of making a Type II error, β, if the true mean daily revenue is

a) 65 J.D. b) 62 J.D.


Solution

Notice that the test is left-tailed with α = 0.01 and the test statistic is

$$z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{\bar{X} - 67.5}{7.5/\sqrt{30}},$$

and the critical value is -z0.01 = -2.33. Therefore, the criterion for the hypothesis test is as follows:

We reject H0 if z ≤ -2.33; we do not reject H0 if z > -2.33.

If the decision rule is expressed in terms of $\bar{X}$, the calculation of the Type II error probability becomes simpler. To this end, we observe that

$$\frac{\bar{X} - 67.5}{7.5/\sqrt{30}} \le -2.33 \quad \text{implies that} \quad \bar{X} \le 64.31.$$

So the decision rule becomes:

We reject H0 if $\bar{X}$ ≤ 64.31; we do not reject H0 if $\bar{X}$ > 64.31.

This decision rule is depicted in Figure 10.11.

Figure 10.11 Graphical representation of decision criterion for Omar's mini-market illustration

a. Now we want to determine β if the true mean daily revenue of all days is 65 J.D. We first note that if µ = 65 J.D., then

• $\mu_{\bar{X}} = \mu = 65$.

• $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 7.5/\sqrt{30} = 1.369$.

• $\bar{X}$ is normally distributed with mean 65 and standard deviation 1.369.

This normal curve is shown in Figure 10.12. Also shown in the figure is the calculation of β.


Figure 10.12 Determining the probability of a Type II error if µ = 65 J.D.

Recall that a Type II error occurs if we do not reject H0, that is, if $\bar{x}$ > 64.31 J.D. The probability that this happens is equal to 0.6915.

Therefore, if the true mean daily revenue of all days is 65 J.D., then the probability of committing a Type II error is 0.6915; that is, β = 0.6915.

This means that there is approximately a 70% chance that Omar will fail to reject the owner's claim that the mean daily revenue of all days is 67.5 J.D. when, in fact, the true mean is 65 J.D.
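The value 0.6915 can be traced step by step. The following worked equation is a sketch that standardizes the cutoff 64.31 with the mean 65 and standard deviation 1.369 given above. Rounding the z-value to two decimal places before using the normal table is an assumption made here to match the tabulated result.

```latex
\beta = P(\bar{X} > 64.31 \mid \mu = 65)
      = P\!\left(Z > \frac{64.31 - 65}{1.369}\right)
      = P(Z > -0.50)
      = 1 - \Phi(-0.50)
      = 1 - 0.3085
      = 0.6915
```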

b. Here, we want to determine β if the true mean daily revenue of all days is 62 J.D. Proceeding as in part (a), but this time with µ = 62 J.D., we get the results shown in Figure 10.13.

Figure 10.13 Determining the probability of a Type II error if µ = 62 J.D.

Therefore, if the true mean daily revenue of all days is 62 J.D., then the probability of committing a Type II error is 0.0455; that is, β = 0.0455. This means that there is approximately a 5% chance that Omar will fail to reject the owner's claim that the mean daily revenue of all days is 67.5 J.D. when, in fact, the true mean is 62 J.D.


It should be clear now that the Type II error probability, β, depends on three factors, namely, the sample size, the significance level, and the true value of the parameter.

To better understand Type II error probabilities, we combine Figures 10.12 and 10.13, where µ = 65 and 62, with three others where we assume the true mean is µ = 64, 63, and 61. We put all the results in Figure 10.14.

Figure 10.14 Type II error probabilities for µ = 65, 64, 63, 62, and 61 J.D. (α = 0.01, n = 30)

From Figure 10.14, it is clear that β decreases as the true mean moves farther below the mean in the null hypothesis, µ = 67.5 J.D. This is as expected: a false null hypothesis is more likely to be detected when the true mean is far from the mean in the null hypothesis than when it is close to it.

Having computed the probability of a Type II error for a given statistical test, it is now easy to compute the power of that test. Since for any test, power = 1 – β, all we need to do is subtract the Type II error probability from 1 to get the corresponding power of that test. Table 10.4 shows the power that corresponds to each value of µ considered in Figure 10.14.

Table 10.4 Selected Type II error probabilities and powers for Omar's mini-market illustration, (α = 0.01, n = 30)

True mean µ    P(Type II error) β    Power 1 - β
65             0.6915                0.3085
64             0.4090                0.5910
63             0.1685                0.8315
62             0.0455                0.9545
61             0.0078                0.9922

We can plot points of power against µ and then connect the points with a smooth curve. The resulting curve is called the power curve and is shown in Figure 10.15. A power curve that is closer to 1 indicates that the test of hypothesis is good in detecting a false hypothesis.

Figure 10.15 Power curve for Omar's mini-market illustration, (α = 0.01, n = 30)
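The Type II error probabilities and powers for Omar's mini-market illustration can be recomputed with the short Python sketch below. The names `MU0`, `SIGMA`, `N`, `Z_CRIT`, and `type_ii_error` are assumptions made for this illustration. Because the sketch does not round the cutoff or the z-values to two decimal places, its results may differ slightly, in about the third decimal place, from values obtained with the normal table.

```python
from math import sqrt
from statistics import NormalDist

MU0, SIGMA, N = 67.5, 7.5, 30  # null mean, sigma, and sample size of Example 10.11
Z_CRIT = 2.33                  # z_{0.01} from the standard normal table


def type_ii_error(true_mu):
    """beta = P(do not reject H0) when the true mean is true_mu (left-tailed test)."""
    se = SIGMA / sqrt(N)              # standard deviation of the sample mean
    x_crit = MU0 - Z_CRIT * se        # reject H0 if x_bar <= x_crit (about 64.31)
    # Not rejecting H0 means x_bar > x_crit, with X_bar ~ Normal(true_mu, se)
    return 1 - NormalDist(true_mu, se).cdf(x_crit)


for mu in (65, 64, 63, 62, 61):
    beta = type_ii_error(mu)
    print(mu, round(beta, 4), round(1 - beta, 4))  # true mean, beta, power
```

Plotting the last column against the first gives the points of the power curve. Evaluating `type_ii_error` on a finer grid of true means would trace the curve more smoothly.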


Exercises

10.1

a. The symbol H0 is used to denote which hypothesis?

b. The symbol H1 is used to denote which hypothesis?

c. The level of significance or probability of committing a Type I error is denoted by which symbol?

d. The probability of committing a Type II error is denoted by which symbol?

e. What does 1 – β represent?

f. What is the relationship of α to the Type I error?

g. What is the relation of β to the Type II error?

h. How is power related to the probability of making a Type II error?

10.2 Due to complaints from both students and faculty about lateness, the registrar at a large university is willing to adjust the scheduled class times to allow for adequate travel times between classes and is ready to undertake a study. Up until now, the registrar believed 10 minutes between scheduled classes should be sufficient. State the null hypothesis H0 and the alternative hypothesis H1 that you would use to perform such a test.

10.3 The manager of a local bank believes that over the past few years the mean amount of saving accounts is less than 2000 J.D. State the null hypothesis H0 and the alternative hypothesis H1 that you would use to perform such a test.

10.4 In the past, the mean running time for a certain type of fluorescents has been 1000 hours. The manufacturer has introduced a change in the production method and wants to perform a hypothesis test to determine whether the mean running time has increased as a result. State the null hypothesis H0 and the alternative hypothesis H1 that you would use to perform such a test.

10.5 At one university, the average amount of time that students spend studying each week is 20 hours. The registrar introduces a campaign to encourage the students to study more. One year later, the registrar performed a hypothesis test to determine whether the average amount of time studying per week has increased. State the null hypothesis H0 and the alternative hypothesis H1 that you would use to perform such a test.

10.6 A health insurance company has determined that the fee for a certain medical operation is 500 J.D. They suspect that the average fee charged by one particular clinic for this procedure is higher than 500 J.D. The company wants to perform a hypothesis test to determine whether their suspicion is correct. State the null hypothesis H0 and the alternative hypothesis H1 that you would use to perform such a test.

10.7 The professor who teaches statistics claims that students who are well prepared should average 82.3 with a standard deviation of 8.6. A sample of 45 randomly selected students averaged 85.7. Using the 5% level of significance, do the data provide sufficient evidence that the true average is higher than 82.3?

10.8 The diameter of cell phone chips is known to have a standard deviation of 0.01 inch. A random sample of size 10 chips yields an average diameter of 1.5045 inches. Using the 1% level of significance, do the data provide sufficient evidence that the true average is equal to 1.5 inches?

10.9 You wish to test the claim that µ > 23 at a level of significance of α = 0.05 and are given sample statistics n = 51, $\bar{x}$ = 22.3, and σ = 1.2. Round your answer to two decimal places.

10.10 You wish to test the claim that µ ≠ 21 at a level of significance of α = 0.05 and are given sample statistics n = 38, $\bar{x}$ = 19.1, and σ = 2.7. Round your answer to two decimal places.

10.11 The owner of a company claims that the average salary of a mechanical engineer working for him is 750 J.D. A random sample of 36 mechanical engineers yields mean salary of 730 J.D. with a standard deviation of 50 J.D. Test the owner's claim. Use α = 0.05.

10.12 If in a sample of size n = 16 selected from a normal population, the sample mean is $\bar{x}$ = 56 and the sample standard deviation is s = 13, what is the value of the t-statistic if we are testing the null hypothesis H0: µ = 50?

10.13 If in a sample of size n = 20 selected from a left-skewed population, the sample mean is $\bar{x}$ = 70 and the sample standard deviation is s = 18, would you use the t-statistic to test the hypothesis H0: µ = 70? Discuss.

10.14 If in a sample of size n = 100 selected from a left-skewed population, the sample mean is $\bar{x}$ = 70 and the sample standard deviation is s = 18, would you use the t-statistic to test the hypothesis H0: µ = 70? Discuss.

10.15 Test the claim that the mean lifetime of car engines of a particular type is greater than 222,000 miles. Sample data are summarized as n = 26, $\bar{x}$ = 226,450 miles, and s = 11,000 miles. Use a significance level of α = 0.01. Find the test statistic t. Assume the random sample has been selected from a normally distributed population.

10.16 Test the claim that for the population of all statistics exams, the mean score is 85. Sample data are summarized as n = 16, $\bar{x}$ = 83, and s = 11.1. Use a significance level of α = 0.05. Find the test statistic t. Assume the random sample has been selected from a normally distributed population.

10.17 A DVD company wants to determine the proportion of DVDs bought by university students. If the proportion exceeds 50%, then the company will make a sale on its products. Suppose 350 university students were randomly sampled and 180 have DVDs at home. Find the rejection region for this test using α = 0.05.

10.18 A DVD company wants to determine the proportion of DVD's bought by university students. If the proportion differs from 50%, then the company will make a sale on its products. Suppose a hypothesis test is conducted and the test statistic is 2.5. Find the p-value for a two-tailed test of hypothesis.

10.19 A survey claims that 8 out of 10 doctors (i.e., 80%) recommend medicine brand W for their patients who have children. To test this claim against the alternative that the actual proportion of doctors who recommend brand W is less than 80%, a random sample of 100 doctors results in 74 who indicate that they recommend brand W. Determine the approximate value of the test statistic in this problem.

10.20 A survey claims that 8 out of 10 doctors (i.e., 80%) recommend medicine brand W for their patients who have children. To test this claim against the alternative that the actual proportion of doctors who recommend brand W is less than 80%, a random sample of doctors was taken. Suppose the test statistic is z = -1.75. Can we conclude that H0 should be rejected at the

a. α = 0.10,

b. α = 0.05, and

c. α = 0.01 level?

10.21 The director of manufacturing at a clothing factory needs to determine whether a new machine is producing a particular type of cloth according to the manufacturer's specifications, which indicate that the cloth should have a mean breaking strength of 70 pounds and a standard deviation of 3.5 pounds. A sample of 40 pieces reveals a mean of 68.9 pounds. Compute the p-value and interpret its meaning.

10.22 In each part below, you are given the significance level and p-value for a hypothesis test. For each case decide whether the null hypothesis should be rejected.

a. α = 0.05, p = 0.07

b. α = 0.05, p = 0.01

c. α = 0.05, p = 0.05

d. α = 0.01, p = 0.02

e. α = 0.10, p = 0.07

10.23 In each part below, you are given the value obtained for the test statistic z in a one-sample z-test for the population mean. You are also given whether the test is two-tailed, left-tailed, or right-tailed. Determine the p-value in each case.

a. Right-tailed test:

(i) z = 1.56 (ii) z = -0.72

b. Left-tailed test:

(i) z = -1.76 (ii) z = 1.42

c. Two-tailed test:

(i) z = 3.02 (ii) z = -2.24

10.24 A business school claims that more than 25% of its students plan to get the CPA certificate. It is found that among a random sample of 130 of the school's students, 40% of them plan to get the CPA certificate. Find the p-value for a test of the school's claim.

10.25 In a sample of 85 children selected randomly from one town, it is found that 8 of them suffer from asthma. Find the p-value for a test of the claim that the proportion of all children in the town who suffer from asthma is equal to 11%.

10.26 It is known that the recommended daily allowance (RDA) of iron for adult females under the age of 51 is 18 mg. A hypothesis test is to be performed to decide whether adult females under the age of 51 are, on the average, getting less than the RDA of 18 mg of iron. The null and alternative hypotheses for the test are

H0: µ = 18 mg

H1: µ < 18 mg

where µ is the mean iron intake (per day) of all adult females under the age of 51. Suppose that σ = 4.2, α = 0.01, n = 45, and µ = 15.50, 15.75, 16.00, 16.25, 16.50, 16.75, 17.00, 17.25, 17.50, 17.75.

a. Express the decision criterion for the hypothesis test in terms of x̄.

b. Determine the probability of a Type II error.

c. Construct a table similar to Table 10.4 giving the probability of a Type II error and the power for each of the given values of µ.

d. Use the table obtained in part (c) to draw the power curve.

10.27 Suppose that the mean telephone expenditure per consumer unit was 480 J.D. in the year 2005. We want to perform a hypothesis test to decide whether last year's mean expenditure has increased over the 2005 mean of 480 J.D. The null and alternative hypotheses are

H0: µ = 480 J.D.

H1: µ > 480 J.D.

where µ is the last year's mean telephone expenditure per consumer unit. Suppose that σ = 245, α = 0.05, n = 40, and µ = 500, 520, 540, 560, 580, 600, 620, 640.

a. Express the decision criterion for the hypothesis test in terms of x̄.

b. Determine the probability of a Type II error.

c. Construct a table similar to Table 10.4 giving the probability of a Type II error and the power for each of the given values of µ.

d. Use the table obtained in part (c) to draw the power curve.

10.28 A manufacturer claims that the mean amount of juice in its 16-ounce bottles is 16.1 ounces. A consumer advocacy group wants to perform a significance test to determine whether the mean amount is actually less than this. The hypotheses are:

H0: µ = 16.1 ounces

H1: µ < 16.1 ounces

Suppose that the results of the sample lead to rejection of the null hypothesis. Classify that conclusion as a Type I error, a Type II error, or a correct decision, if in fact the mean amount of juice, µ, is less than 16.1 ounces.

10.29 In the past, the mean running time for a certain type of flashlight battery has been 9.6 hours. The manufacturer has introduced a change in the production method and wants to perform a significance test to determine whether the mean running time has increased as a result. The hypotheses are:

H0: µ = 9.6 hours

H1: µ > 9.6 hours

Suppose that the results of the sample lead to nonrejection of the null hypothesis. Classify that conclusion as a Type I error, a Type II error, or a correct decision, if in fact the mean running time has increased.

10.30 A health insurance company has determined that the fee for a certain medical procedure is 250 J.D. They suspect that the average fee charged by one particular physician for this procedure is higher than 250 J.D. The insurance company wants to perform a significance test to determine whether their suspicion is correct. The hypotheses are:

H0: µ = 250 J.D.

H1: µ > 250 J.D.

Suppose that the results of the sample lead to rejection of the null hypothesis. Classify that conclusion as a Type I error, a Type II error, or a correct decision, if in fact the average fee charged by the physician is 250 J.D.

Chapter Learning Outcomes

When you complete your careful study of this chapter you should be able to:

1. use and understand formulas given in this chapter.

2. define a hypothesis and hypothesis testing.

3. define the terms associated with hypothesis testing.

4. describe the five steps to use in hypothesis testing.

5. distinguish between one-tailed and two-tailed hypothesis tests.

6. identify the test statistic and the rejection and nonrejection regions.

7. conduct a test of hypothesis about a population mean when σ is known.

8. conduct a test of hypothesis about the population mean when σ is unknown.

9. conduct a test of hypothesis about population proportion.

10. calculate and interpret the p-value.

11. define and learn how to compute Type I error, Type II error, and power of test.

Chapter Key Terms

Acceptance region

Alternative hypothesis

Critical-value approach

Critical values

Errors

Hypothesis

Hypothesis test

Left-tailed test

Nonrejection region

Null hypothesis

Observed significance level (OSL)

One-sample t-test

One-sample z-test

One-tailed test

Power

Power curve

Probability value

Rejection region

Right-tailed test

Significance level, α

Steps in hypothesis testing

t-test

Test statistic

Two-tailed test

Type I error

Type I error probability, α

Type II error

Type II error probability, β

z-test

Chapter 11

Statistical Inference on Two Samples

In the previous chapter we presented hypothesis tests for one population parameter (the mean µ or the proportion p). However, there are many statistical problems in which we have to deal with the means of two populations. For example, the operations manager at a light bulb factory may want to determine whether there is any difference in the mean life expectancy of bulbs manufactured on two different types of machines. Similarly, if a licensing examination is given to engineers who graduated from two different colleges, we may want to decide whether any observed difference between the mean scores of the graduates of the two colleges is significant or whether it may be attributed to chance.

In this chapter we extend the results of one population to the case of two independent populations. We will be able to compare statistics computed from two samples. Such procedures are called two-sample tests and are used to make inferences about the parameters of two populations. We shall consider inferences about two means and inferences about two proportions.

Chapter Outline

11.1 Inference About Two Means: Independent Samples

11.2 Inference About Two Means: Dependent Samples

11.3 Inference About Two Population Proportions

11.1 Inference About Two Means: Independent Samples

The general situation of the case of two population means is depicted in Figure 11.1. We have two populations; population 1 has mean µ1 and standard deviation σ1, while population 2 has mean µ2 and standard deviation σ2. Our inferences will be based on two random samples of sizes n and m, taken from populations 1 and 2, respectively. That is, x1, x2, ... , xn is a random sample of n observations from population 1, and y1, y2, ... , ym is a random sample of m observations from population 2. We also assume that these two samples are independent, meaning that the sample selected from one population has no effect on the sample selected from the other population.

Figure 11.1 Two independent populations

Now suppose we want to test the null hypothesis

H0: µ1 = µ2,

which can equivalently be written as

H0: µ1 - µ2 = 0.

Similar to the tests concerning one mean, we shall consider tests of this null hypothesis against one of three alternatives, namely

(1) H1: µ1 - µ2 ≠ 0 , (2) H1: µ1 - µ2 < 0, or (3) H1: µ1 - µ2 > 0.

Roughly speaking the hypothesis test can be carried out as follows:

1. Independently and randomly take a sample of size n from population 1 and a sample of size m from population 2.

2. Compute the mean of sample 1, x̄, and the mean of sample 2, ȳ.

3. Reject the null hypothesis if the sample means x̄ and ȳ differ by "too much"; otherwise do not reject the null hypothesis.

The question of whether the difference x̄ - ȳ can reasonably be attributed to sampling error, or whether it is large enough to indicate that the two populations have different means, is answered by the distribution of the difference between two sample means, that is, the sampling distribution of the difference between two means.
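To get a feel for how much x̄ - ȳ varies by chance alone, the rough procedure above can be imitated on a computer. The following Python sketch repeats steps 1 and 2 many times for two normal populations that have the same mean, so that the null hypothesis is true. The population parameters, the sample sizes, the number of repetitions, and the function name are illustrative assumptions and are not taken from the text.

```python
import random
import statistics


def simulate_mean_differences(mu1, sigma1, n, mu2, sigma2, m, reps=5000, seed=1):
    """Draw `reps` pairs of independent normal samples and return the
    observed differences x_bar - y_bar (one per pair of samples)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(reps):
        x = [rng.gauss(mu1, sigma1) for _ in range(n)]  # sample from population 1
        y = [rng.gauss(mu2, sigma2) for _ in range(m)]  # sample from population 2
        diffs.append(statistics.fmean(x) - statistics.fmean(y))
    return diffs


# Hypothetical populations with equal means, so H0: mu1 = mu2 holds.
diffs = simulate_mean_differences(mu1=50, sigma1=4, n=30, mu2=50, sigma2=6, m=40)
print("mean of the differences:", round(statistics.fmean(diffs), 3))
print("standard deviation of the differences:", round(statistics.stdev(diffs), 3))
```

Even though the two population means are equal, the simulated differences are not exactly 0; they scatter around 0, and the size of that scatter is what tells us how large a difference counts as "too much".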

The Sampling Distribution of the Difference between Two Means

Suppose that x is a normally distributed variable on population 1 with mean µ1 (unknown) and standard deviation σ1 (known), and suppose that y is a normally distributed variable on population 2 with mean µ2 (unknown) and standard deviation σ2 (known). Then, for independent samples of sizes n and m from the two populations, respectively,

• $\mu_{\bar{x}-\bar{y}} = \mu_1 - \mu_2$,

• $\sigma_{\bar{x}-\bar{y}} = \sqrt{\sigma_1^2/n + \sigma_2^2/m}$, and

• $\bar{x}-\bar{y}$ is normally distributed with mean $\mu_1 - \mu_2$ and standard deviation $\sqrt{\sigma_1^2/n + \sigma_2^2/m}$.

Under the above conditions, the variable

$$
z = \frac{(\bar{x}-\bar{y}) - (\mu_1-\mu_2)}{\sqrt{\sigma_1^2/n + \sigma_2^2/m}}
$$

has a standard normal distribution. Recall that, under H0, µ1 - µ2 = 0.

Knowing this fact, we can now easily develop hypothesis testing and confidence interval procedures for comparing two population means when the population standard deviations are known. These procedures are referred to as the two-sample z-test and the two-sample z-interval, respectively.

Notice that the above statistic, z, can also be used when our samples are large, n ≥ 30 and m ≥ 30, so that we can apply the central limit theorem and approximate σ1 and σ2 with s1 and s2, respectively. The z-statistic, then, takes the form

$$
z = \frac{(\bar{x}-\bar{y}) - (\mu_1-\mu_2)}{\sqrt{s_1^2/n + s_2^2/m}}
$$

We can now develop the hypothesis testing procedure, the two-sample z-test for two population means, when population standard deviations are assumed known, as follows.

The Two-Sample z-Test for Two-Population Means

Assumptions:

1. Independent samples.

2. Normal populations or large samples.

3. Known population standard deviations.

Step 1

The null hypothesis is H0: µ1 = µ2 and the alternative hypothesis is one of the following three:

H1: µ1 - µ2 ≠ 0 (two-tailed), or H1: µ1 - µ2 < 0 (left-tailed), or H1: µ1 - µ2 > 0 (right-tailed)

Step 2

Specify the significance level, α.

Step 3

Based on the sampling distribution of the difference between the two sample means x̄ and ȳ, we construct the following rejection regions:

|z| > zα/2 (two-tailed), or z < -zα (left-tailed), or z > zα (right-tailed)

We use the cumulative standard normal table to find the values of zα/2 and zα; see the figure below.

Step 4

Calculate the value of the test statistic

$$
z = \frac{(\bar{x}-\bar{y}) - (\mu_1-\mu_2)}{\sqrt{\sigma_1^2/n + \sigma_2^2/m}}
$$

Step 5

Decide whether to reject the null hypothesis or whether to fail to reject it.

We can also develop a confidence interval procedure, the two-sample z-interval for two population means, when population standard deviations are assumed known, as follows.

The Two-Sample z-Interval for Two-Population Means

Assumptions:

1. Independent samples.

2. Normal populations or large samples.

3. Known population standard deviations.

The endpoints of the confidence interval for µ1 - µ2 are

$$
(\bar{x}-\bar{y}) \pm z_{\alpha/2}\sqrt{\sigma_1^2/n + \sigma_2^2/m}
$$

or

$$
(\bar{x}-\bar{y}) - z_{\alpha/2}\sqrt{\sigma_1^2/n + \sigma_2^2/m} \;\le\; \mu_1-\mu_2 \;\le\; (\bar{x}-\bar{y}) + z_{\alpha/2}\sqrt{\sigma_1^2/n + \sigma_2^2/m}
$$

Use the cumulative standard normal table to find zα/2.

Example 11.1, given next, illustrates inferences about two population means in the case where population standard deviations are known.

Example 11.1 Inferences about Two Population Means, Known Standard Deviations

Two production lines produce ball bearings used in industry. It is known that both lines produce units with the same standard deviation of diameter; that is, σ1 = σ2 = 0.04 cm. Two independent random samples of sizes n = 19 and m = 25 are tested, and the sample mean diameters are x̄ = 0.19 cm and ȳ = 0.25 cm. Assume that the diameters of ball bearings are normally distributed.

a. Test the hypothesis that both production lines have the same mean diameter. Use α = 0.05.

b. Construct a 95% confidence interval on the difference in means µ1 - µ2. What is the practical meaning of this interval?

Solution

a. Since the variables are assumed normal and the population standard deviations are known, we can apply the two-sample z-test for two population means. Let µ1 = mean diameter of ball bearings produced by line 1 and µ2 = mean diameter of ball bearings produced by line 2.

Step 1

The null hypothesis: H0: µ1 = µ2

The alternative hypothesis: H1: µ1 ≠ µ2

Step 2

Level of significance: α = 0.05.

Step 3

Criterion: Reject the null hypothesis if z < -z0.025 or z > z0.025, that is, if z < -1.96 or z > 1.96, where

$$
z = \frac{(\bar{x}-\bar{y}) - (\mu_1-\mu_2)}{\sqrt{\sigma_1^2/n + \sigma_2^2/m}}
$$

Step 4

Calculations:

$$
z = \frac{(0.19 - 0.25) - 0}{\sqrt{(0.04)^2/19 + (0.04)^2/25}} = -4.93
$$

Step 5

Decision: Since z = -4.93 falls in the rejection region (see Figure 11.2), the null hypothesis is rejected. To put this result another way, the difference between x̄ - ȳ = 0.19 - 0.25 = -0.06 and µ1 - µ2 = 0 cannot be attributed to chance.

Figure 11.2 Criterion for deciding whether or not to reject the null hypothesis in Example 11.1

To interpret this conclusion, we say the test results are statistically significant at the 5% level; that is, at the 5% significance level, we have sufficient evidence to conclude that the mean diameter for line 1 is different from the mean diameter for line 2.

b. Since the variables are assumed normal and the population standard deviations are known, we can apply the two-sample z-interval for two population means. Let µ1 = mean diameter of line 1 and µ2 = mean diameter of line 2.

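A 95% confidence interval for (µ1 - µ2) is given by

$$
(\bar{x}-\bar{y}) \pm z_{\alpha/2}\sqrt{\sigma_1^2/n + \sigma_2^2/m}
= (0.19 - 0.25) \pm 1.96\sqrt{(0.04)^2/19 + (0.04)^2/25}
= -0.06 \pm 0.0239
$$

or

$$
-0.0839 \le \mu_1 - \mu_2 \le -0.0361
$$

Thus the 95% confidence interval is from -0.0839 to -0.0361. We can be 95% confident that the difference, µ1 - µ2, between the mean diameter of line 1 and line 2 is somewhere between -0.0839 and -0.0361. Because this interval does not contain 0, it agrees with the decision in part (a) to reject the hypothesis of equal mean diameters.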
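The two-sample z-test and the two-sample z-interval can be checked with a few lines of Python. The sketch below is only an illustration: the function name and the layout of its return value are assumptions made here, and the inputs are the numbers that appear in the worked computation of Example 11.1 (standard deviations of 0.04 and sample sizes of 19 and 25). The critical value comes from `statistics.NormalDist` in the Python standard library rather than from the printed table.

```python
import math
from statistics import NormalDist


def two_sample_z(x_bar, y_bar, sigma1, sigma2, n, m, alpha=0.05):
    """Two-tailed two-sample z-test of H0: mu1 - mu2 = 0 and the matching
    (1 - alpha) z-interval, for independent samples with known sigmas."""
    diff = x_bar - y_bar
    se = math.sqrt(sigma1**2 / n + sigma2**2 / m)   # std. dev. of x_bar - y_bar
    z = (diff - 0.0) / se                           # mu1 - mu2 = 0 under H0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # z_{alpha/2}
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-tailed p-value
    margin = z_crit * se
    return {
        "z": z,
        "z_crit": z_crit,
        "reject_H0": abs(z) > z_crit,
        "p_value": p_value,
        "interval": (diff - margin, diff + margin),
    }


# Values as used in the worked computation of Example 11.1.
result = two_sample_z(x_bar=0.19, y_bar=0.25, sigma1=0.04, sigma2=0.04, n=19, m=25)
for name, value in result.items():
    print(name, value)
```

The same function covers the large-sample case described earlier if s1 and s2 are passed in place of σ1 and σ2.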

Hypothesis Testing for the Means of Two Populations Using Independent Samples when σ1 and σ2 are Unknown but Equal

In this case, we assume that σ1 = σ2 = σ, where σ is the common standard deviation of both populations. Replacing both σ1 and σ2 with σ in the formula for z, we get

$$
z = \frac{(\bar{x}-\bar{y}) - (\mu_1-\mu_2)}{\sigma\sqrt{1/n + 1/m}}
$$

Unfortunately, we cannot use this variable as a basis for our inferences because σ is unknown. Instead, we estimate σ from the samples we have. We first obtain an estimate of the unknown population variance, σ². To do this, we consider the two sample variances, s1² and s2², as two estimates of σ² and then pool those estimates by weighting them according to their sample sizes (more precisely, by n - 1 and m - 1). So, our estimate of the common variance, σ², is given by

$$
s_p^2 = \frac{(n-1)s_1^2 + (m-1)s_2^2}{n+m-2}
$$

Therefore, our estimate of σ is

$$
s_p = \sqrt{\frac{(n-1)s_1^2 + (m-1)s_2^2}{n+m-2}}
$$

The quantity sp is called the pooled sample standard deviation, where the subscript "p" stands for "pooled".

We now replace σ, in z, with its estimate sp and get the following variable:

$$
\frac{(\bar{x}-\bar{y}) - (\mu_1-\mu_2)}{s_p\sqrt{1/n + 1/m}}
$$

This variable can be used as a basis for our inferences in hypothesis testing and confidence intervals. However, this random variable is no longer a standard normal variable; it is a t variable, which has a t-distribution with n + m – 2 degrees of freedom. So, we can write

$$
t = \frac{(\bar{x}-\bar{y}) - (\mu_1-\mu_2)}{s_p\sqrt{1/n + 1/m}} \quad \text{with } df = n + m - 2
$$

We can use this statistic as our test statistic for a hypothesis test and obtain critical values from the t-table. Thus, we now have the following procedure, which is referred to as the pooled t-test.

The Pooled t-Test for Two Population Means

Assumptions

1. Independent samples.

2. Normal populations or large samples.

3. Unknown but equal standard deviations.

Step 1

The null hypothesis is H0: µ1 = µ2 and the alternative hypothesis is one of the following three:

H1: µ1 - µ2 ≠ 0 (two-tailed), or H1: µ1 - µ2 < 0 (left-tailed), or H1: µ1 - µ2 > 0 (right-tailed)

Step 2

Specify the significance level, α.

Step 3

Based on the sampling distribution of the difference between the two sample means, x̄ - ȳ, we construct the following rejection regions:

|t| > t(df, α/2) (two-tailed), or t < -t(df, α) (left-tailed), or t > t(df, α) (right-tailed)

with df = n + m - 2.

We use the t-table to find the values of t(df, α/2) and t(df, α); see the figure below.

Step 4

Calculate the value of the test statistic

$$
t = \frac{\bar{x}-\bar{y}}{s_p\sqrt{1/n + 1/m}}
$$

Step 5

Decide whether to reject the null hypothesis or whether to fail to reject it.

We can also develop a confidence interval procedure, the pooled t-interval for two population means, when population standard deviations are assumed unknown but equal. This confidence interval is given as follows.

The Pooled t-Interval for Two-Population Means

Assumptions:

1. Independent samples.

2. Normal populations or large samples.

3. Unknown but equal standard deviations.

The endpoints of the confidence interval for µ1 - µ2 are

$$
(\bar{x}-\bar{y}) \pm t_{(n+m-2,\ \alpha/2)}\, s_p\sqrt{1/n + 1/m}
$$

or

$$
(\bar{x}-\bar{y}) - t_{(n+m-2,\ \alpha/2)}\, s_p\sqrt{1/n + 1/m} \;\le\; \mu_1-\mu_2 \;\le\; (\bar{x}-\bar{y}) + t_{(n+m-2,\ \alpha/2)}\, s_p\sqrt{1/n + 1/m}
$$

For a confidence level of 1 - α, we use the t-table to find tα/2 with df = n + m - 2.

Example 11.2, given next, illustrates inferences about two population means in the case where population standard deviations are unknown but equal.

Example 11.2 Inferences about Two Population Means, Unknown but Equal Standard Deviations

A laptop computer production company wishes to study the differences between two of its major distributing stores. The company is particularly interested in the time needed before customers receive laptops they have ordered from the stores. The following table shows data on delivery times for the most popular type of laptop.

Store A              Store B

x̄1 = 15.4 days      x̄2 = 15.2 days

s1 = 0.2 days        s2 = 0.3 days

n = 40               m = 30

Assume that the population standard deviations for both stores are equal.

a. At the 0.05 level of significance, is there evidence of a difference between the mean delivery times of the two distributing stores?

b. Construct a 95% confidence interval for the difference between the means of the two populations.

Solution

Because the three assumptions for the pooled t-test and the pooled t-interval are all satisfied, we can use the pooled t-test for two population means and the pooled t-interval for two population means to solve this problem. Let µ1 = mean delivery time, in days, for store A and µ2 = mean delivery time, in days, for store B.

a. A statistical test of hypothesis consists of the following five steps.

Step 1

The null hypothesis: H0: µ1 = µ2

The alternative hypothesis: H1: µ1 ≠ µ2

Step 2

The significance level, α = 0.05.

Step 3

Criterion: Since df = 40 + 30 – 2 = 68, we will reject the null hypothesis if |t| > t(68, 0.025), that is, if t > 1.9955 or t < -1.9955.

Step 4

Calculations: For our data we have

$$
t = \frac{\bar{x}_1-\bar{x}_2}{s_p\sqrt{1/n + 1/m}},
$$

where

$$
s_p^2 = \frac{(n-1)s_1^2 + (m-1)s_2^2}{n+m-2}
= \frac{39(0.2)^2 + 29(0.3)^2}{40+30-2}
= \frac{4.17}{68} = 0.0613,
$$

so $s_p = \sqrt{0.0613} = 0.2476$. Therefore,

$$
t = \frac{15.4 - 15.2}{0.2476\sqrt{1/40 + 1/30}} = \frac{0.2}{0.0598} = 3.34
$$

Step 5

Decision: Since t = 3.34 falls in the rejection region (see Figure 11.3), the null hypothesis is rejected. To put this result another way, the difference between x̄1 - x̄2 = 0.2 and µ1 - µ2 = 0 cannot be attributed to chance.

Figure 11.3 Criterion for deciding whether or not to reject the null hypothesis in Example 11.2(a)

To interpret this conclusion, we say the test results are statistically significant at the 5% level; that is, at the 5% significance level, we have sufficient evidence to conclude that the mean delivery time of store A is different from the mean delivery time of store B.

b. A 95% confidence interval for the difference between the means of the two populations is obtained as follows. For a 95% confidence interval, α = 0.05. Since n = 40 and m = 30, df = 40 + 30 – 2 = 68, and from the t-table we find tα/2 = t0.05/2 = t0.025 = 1.9955. Also, we are given x̄1 = 15.4 and x̄2 = 15.2, and from part (a) we calculated sp = 0.2476. Therefore,

$$
(\bar{x}_1-\bar{x}_2) \pm t_{(n+m-2,\ \alpha/2)}\, s_p\sqrt{1/n + 1/m}
= (15.4 - 15.2) \pm 1.9955\,(0.2476)\sqrt{1/40 + 1/30}
= 0.2 \pm 0.119
$$

or

$$
0.081 \le \mu_1 - \mu_2 \le 0.319
$$

We can be 95% confident that the difference, µ1 - µ2, between the mean delivery times of store A and store B is somewhere between 0.081 days and 0.319 days.
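The pooled t-test and pooled t-interval can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name and the keys of the returned dictionary are made up for this sketch, the inputs are the summary statistics given in the table of Example 11.2, and the critical value 1.9955 is the t-table value quoted in the example, passed in by hand because the Python standard library does not provide the t-distribution.

```python
import math


def pooled_t(x_bar1, s1, n, x_bar2, s2, m, t_crit):
    """Two-tailed pooled t-test of H0: mu1 = mu2 and the matching pooled
    t-interval. `t_crit` is the table value t(df, alpha/2), df = n + m - 2."""
    df = n + m - 2
    # Pooled sample standard deviation: weights are n - 1 and m - 1.
    sp = math.sqrt(((n - 1) * s1**2 + (m - 1) * s2**2) / df)
    se = sp * math.sqrt(1 / n + 1 / m)   # estimated std. dev. of the difference
    diff = x_bar1 - x_bar2
    t = diff / se                        # mu1 - mu2 = 0 under H0
    margin = t_crit * se
    return {
        "df": df,
        "sp": sp,
        "t": t,
        "reject_H0": abs(t) > t_crit,
        "interval": (diff - margin, diff + margin),
    }


# Summary statistics from the table in Example 11.2; t_crit from the t-table.
summary = pooled_t(x_bar1=15.4, s1=0.2, n=40, x_bar2=15.2, s2=0.3, m=30, t_crit=1.9955)
for name, value in summary.items():
    print(name, value)
```

Passing the critical value in as an argument keeps the sketch free of third-party packages; a statistics library with a t-distribution could supply it instead.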

11.2 Inference About Two Means: Dependent Samples

In Section 11.1 we discussed hypothesis testing and confidence intervals to examine differences between two independent populations. In this section, we develop procedures for hypothesis testing and confidence intervals for the difference between two dependent (or paired) populations. The "dependency" of the two populations arises either because the items or individuals are paired or matched (e.g., mother and son, brand 1 and brand 2) or because repeated measurements are obtained from the same set of items or individuals (e.g., before and after). In either case, we will be interested in the difference between the values of the observations rather than the values of the observations themselves.

The objective of dependent samples is to study the difference between two measurements by reducing the effect of the variability that is due to the items or individuals themselves. In this section we develop important inferences to accomplish this.

In order to determine whether a difference exists between two dependent samples, we must first calculate the differences in the individual values, as shown in Table 11.1. Let x1, x2, … , xn represent the n observations from a sample, and let y1, y2, … , yn represent either the corresponding n paired observations from a second sample or the corresponding n repeated measurements from the original sample. Then the corresponding n differences are denoted by d1, d2, … , dn, where

d1 = x1 – y1, d2 = x2 – y2, … , and dn = xn – yn.

We call d the paired-difference variable.

Table 11.1 Calculating the difference between two dependent samples

Observation    Sample 1    Sample 2    Difference

1              x1          y1          d1 = x1 – y1

2              x2          y2          d2 = x2 – y2

.              .           .           .

i              xi          yi          di = xi – yi

.              .           .           .

n              xn          yn          dn = xn – yn

From the last column of Table 11.1, we find that the sample mean of the paired differences is

$$
\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n}
$$
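The construction in Table 11.1 amounts to subtracting the two columns row by row and averaging the result. The short Python sketch below shows this for a small paired data set; the "before" and "after" measurements and the variable names are hypothetical and serve only to illustrate the arithmetic.

```python
# Hypothetical repeated measurements on the same five individuals.
before = [72.0, 80.5, 65.0, 90.0, 77.5]   # x1, ..., xn
after = [70.0, 78.0, 66.0, 85.5, 75.0]    # y1, ..., yn

# Paired differences d_i = x_i - y_i (last column of Table 11.1).
differences = [x - y for x, y in zip(before, after)]

# Sample mean of the paired differences.
d_bar = sum(differences) / len(differences)

print("differences:", differences)
print("mean difference:", d_bar)
```

From this point on, only the single column of differences is analyzed, which is why the procedures for dependent samples resemble the one-sample procedures of the previous chapter.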

If µ1 denotes the mean of population 1 and µ2 denotes the mean of population 2, it can be shown that

$$
\mu_d = \mu_1 - \mu_2
$$

that is, the mean of the paired differences equals the difference between the two population means. Furthermore, if d is normally distributed and the population standard deviation of the differences, σd, is known, or the sample size is large, then the random variable

$$
z = \frac{\bar{d} - \mu_d}{\sigma_d/\sqrt{n}}
$$

has the standard normal distribution.

Hypothesis Tests for the Mean of Two Populations Using a Dependent Sample, σ Known

We can develop a hypothesis testing procedure similar to the ones presented earlier, but this time for dependent samples. The procedure is called the paired z-test for two population means and is given as follows.

The Paired z-Test for Two Population Means

Assumptions

1. Paired sample.

2. Normal differences or large sample.

Step 1

The null hypothesis is H0: µd = 0, and the alternative hypothesis is one of the following:

H1: µd ≠ 0 (two-tailed), or H1: µd < 0 (left-tailed), or H1: µd > 0 (right-tailed)

Step 2

Specify the significance level, α.

Step 3

Based on the sampling distribution of the mean of paired differences, we construct the following rejection regions:

|z| > zα/2 (two-tailed), or z < -zα (left-tailed), or z > zα (right-tailed)

We use the cumulative standard normal table to find the values of zα/2 and zα; see the figure below.

Step 4

Calculate the value of the test statistic

$$z = \frac{\bar{d} - \mu_d}{\sigma_d / \sqrt{n}}$$

Step 5

Decide whether to reject the null hypothesis or whether to fail to reject it.

Confidence Intervals for the Mean of Two Populations

Using a Dependent Sample, σ Known

We can develop a confidence interval procedure similar to the ones presented earlier, but this time for dependent samples. The procedure is called the paired z-interval for two population means and is given as follows.

The Paired z-Interval for Two Population Means

Assumptions

1. Paired sample.

2. Normal differences or large sample.

The confidence interval for µd is

$$\bar{d} \pm z_{\alpha/2} \cdot \frac{\sigma_d}{\sqrt{n}}$$

or

$$\bar{d} - z_{\alpha/2} \cdot \frac{\sigma_d}{\sqrt{n}} \le \mu_d \le \bar{d} + z_{\alpha/2} \cdot \frac{\sigma_d}{\sqrt{n}}$$
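The paired z-test and the paired z-interval can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `paired_z_test`, the four data pairs, and the value σd = 2 are hypothetical and do not come from the text, and the differences are assumed to be normally distributed because the sample is small.

```python
from math import sqrt
from statistics import NormalDist, mean


def paired_z_test(before, after, sigma_d, alpha=0.05):
    """Two-tailed paired z-test of H0: mu_d = 0, with sigma_d assumed known."""
    d = [b - a for b, a in zip(before, after)]     # paired differences
    n = len(d)
    d_bar = mean(d)                                # mean of the differences
    se = sigma_d / sqrt(n)                         # standard error of d-bar
    z = (d_bar - 0) / se                           # test statistic under H0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_{alpha/2}
    interval = (d_bar - z_crit * se, d_bar + z_crit * se)
    return z, z_crit, abs(z) > z_crit, interval


# Hypothetical data: four pairs, with sigma_d = 2 taken as known.
z, z_crit, reject, ci = paired_z_test([12, 15, 11, 14], [10, 14, 12, 11], sigma_d=2.0)
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
print(f"95% interval for mu_d: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The same standard error is used for the test and for the interval, so a two-tailed test at level α rejects H0 exactly when the (1 - α)100% interval does not contain 0.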


Hypothesis Tests for the Mean of Two Populations

Using a Dependent Sample, σ Unknown

As mentioned previously, in most applications we do not know the actual standard deviation of the population and the only information we have at hand is the sample mean and the sample standard deviation.

If we assume that the sample of paired differences is randomly and independently drawn from a population that is normally distributed, then a t test can be used to determine whether there is a significant population mean difference. Accordingly, we may develop a procedure similar to the one-sample t-test for a population mean. The procedure is called the paired t-test for two population means and is given as follows.

The Paired t-Test for Two Population Means

Assumptions

1. Paired sample.

2. Normal differences or large sample.

Step 1

The null hypothesis is H0: µd = 0, and the alternative hypothesis is one of the following:

H1: µd ≠ 0 or H1: µd < 0 or H1: µd > 0

(two-tailed) (left-tailed) (right-tailed)

Step 2

Specify the significance level, α.

Step 3

Based on the sampling distribution of the mean of paired differences, we construct the following rejection regions:

|t| > t(df, α/2) or t < -t(df, α) or t > t(df,α)

(two-tailed) (left-tailed) (right-tailed)

with df = n – 1. Use the t-table to find the values of t(df, α/2) and t(df, α); see the figure below.


Step 4

Calculate the value of the test statistic

$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$$

Step 5

Decide whether to reject the null hypothesis or whether to fail to reject it.

Confidence Intervals for the Mean of Two Populations

Using a Dependent Sample, σ Unknown

We can develop a confidence interval procedure similar to the ones presented earlier, but this time for dependent samples when the population standard deviation, σ, is unknown. The procedure is called the paired t-interval for two population means and is given as follows.

The Paired t-Interval for Two Population Means

Assumptions

1. Paired sample.

2. Normal differences or large sample.

The confidence interval for µd is

$$\bar{d} \pm t_{(n-1,\,\alpha/2)} \cdot \frac{s_d}{\sqrt{n}}$$

or

$$\bar{d} - t_{(n-1,\,\alpha/2)} \cdot \frac{s_d}{\sqrt{n}} \le \mu_d \le \bar{d} + t_{(n-1,\,\alpha/2)} \cdot \frac{s_d}{\sqrt{n}}$$

Example 11.3 Illustrating Inference About Two Means: Dependent Samples

The scores of 10 students in a test before and after using a new teaching method are listed in the following table.

Student 1 2 3 4 5 6 7 8 9 10

Before 61 76 70 70 66 50 80 73 77 43

After 93 77 67 72 52 83 66 84 59 63

a. Test the hypothesis of no difference in mean scores before and after the new teaching method. Use α = 0.05.

b. Construct a 95% confidence interval for the mean difference of the "before" minus "after" scores.


Solution

We begin the solution by checking the two conditions required to use the paired t-test. In this problem we are dealing with a paired-sample; each pair consists of the score of a student one time before and another time after the new teaching method. So assumption 1 is satisfied.

Since the sample size, n = 10, is small, we must assume that the differences are normally distributed. Under that assumption, the paired t-test can be applied to perform the required hypothesis test.

Let µ1 = mean score before and µ2 = mean score after. Then µd = µ1 - µ2 is the mean of the paired differences.

a. To see if there is any evidence of a difference in the mean score between before and after, we perform the paired t-test for two population means as follows.

Step 1

The null hypothesis: H0: µd = 0 (no difference between the two scores)

The alternative hypothesis: H1: µd ≠ 0 (a difference exists)

Step 2

Level of significance, α = 0.05.

Step 3

Criterion: Since the test is two-tailed, we reject H0 if |t| > t(df, α/2) with df = n - 1. From Step 2, α = 0.05. Also, since there are 10 pairs in the sample, we have df = n – 1 = 10 – 1 = 9. From the t-table, we find that t(9, 0.025) = 2.262, and thus we reject H0 if t > 2.262 or t < -2.262; see Figure 11.4.

Step 4

Calculations: In order to calculate the value of t, we need to find $\bar{d}$ and $s_d$. Therefore, we construct the following table.

Student Before After d = Before - After d²

1 61 93 -32 1024

2 76 77 -1 1

3 70 67 3 9

4 70 72 -2 4

5 66 52 14 196

6 50 83 -33 1089

7 80 66 14 196

8 73 84 -11 121

9 77 59 18 324

10 43 63 -20 400

Total 666 716 -50 3364

From the table,

$$\bar{d} = \frac{\sum d}{n} = \frac{-50}{10} = -5$$

and

$$s_d = \sqrt{\frac{\sum d^2 - \left(\sum d\right)^2 / n}{n - 1}} = \sqrt{\frac{3364 - (-50)^2 / 10}{10 - 1}} = 18.60$$

Therefore, the value of the test statistic is

$$t = \frac{-5 - 0}{18.60 / \sqrt{10}} = -0.85$$

Step 5

Decision: Since t = -0.85 falls in the nonrejection region (see Figure 11.4 again), we do not reject H0; that is, the test fails to reject the null hypothesis.

Figure 11.4 Criterion for deciding whether or not to reject the null hypothesis in Example 11.3

b. To construct a 95% confidence interval for the mean of the difference of two populations, we apply the paired t-interval for two population means as follows.

$$\bar{d} \pm t_{\alpha/2} \cdot \frac{s_d}{\sqrt{n}} = -5 \pm 2.262 \cdot \frac{18.60}{\sqrt{10}} = -5 \pm 13.30$$

or

$$-18.30 \le \mu_d \le 8.30$$

We can be 95% confident that the difference between the mean score of all students before the new teaching method and the mean score of all students after it is somewhere between -18.30 and 8.30.
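The arithmetic of Example 11.3 can be checked with a short Python sketch. It uses only the data of the example and the table value t(9, 0.025) = 2.262; the variable names are illustrative choices, not notation from the text.

```python
from math import sqrt

before = [61, 76, 70, 70, 66, 50, 80, 73, 77, 43]
after = [93, 77, 67, 72, 52, 83, 66, 84, 59, 63]

d = [b - a for b, a in zip(before, after)]   # d = Before - After
n = len(d)
d_bar = sum(d) / n                           # mean of the differences
# Shortcut formula for the sample standard deviation of the differences
s_d = sqrt((sum(x * x for x in d) - sum(d) ** 2 / n) / (n - 1))

t_stat = (d_bar - 0) / (s_d / sqrt(n))       # test statistic under H0: mu_d = 0
t_crit = 2.262                               # t(9, 0.025), read from the t-table
reject_h0 = abs(t_stat) > t_crit

margin = t_crit * s_d / sqrt(n)              # half-width of the 95% interval
ci = (d_bar - margin, d_bar + margin)

print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.2f}, t = {t_stat:.2f}")
print(f"reject H0: {reject_h0}, 95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the sketch does not round intermediate results, the interval endpoints may differ from the hand calculation in the second decimal place.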


11.3 Inference About Two Population Proportions

In Section 9.4 we studied confidence intervals for one population proportion and in Section 10.4 we studied hypothesis tests for one population proportion. In this section we study confidence intervals and hypothesis tests for two population proportions. We assume two populations and one specified characteristic of interest; the problem is to compare the proportion of one population having the specified characteristic to the proportion of the other population having the same characteristic. We start with hypothesis testing.

Example 11.4 Introducing Hypothesis Tests for Two Population Proportions

Students in the faculty of science and the faculty of IT are asked the following question: Do you plan to continue your higher education after you graduate? A student who answers "yes" is considered a "success". Independent random samples of sizes 260 and 266 are obtained in order to compare the percentages of students who plan to continue their higher education in the two faculties. Of the sample taken from the faculty of science, 13 said yes, and of the sample taken from the faculty of IT, 8 said yes. Is it reasonable to conclude that both faculties have the same fraction of students planning for graduate study? Use α = 0.05.

Solution

First note that the characteristic we are interested in is "number of students who plan to continue higher studies" and the two populations are:

Population 1: All students in the faculty of science.

Population 2: All students in the faculty of IT.

Let p1 and p2 denote the population proportions for students who plan to continue their higher education for the two populations:

p1 = proportion of all students in the faculty of science who plan

for higher education.

p2 = proportion of all students in the faculty of IT who plan

for higher education.

We wish to test the hypothesis

H0: p1 = p2 (proportion of students is equal)

H1: p1 ≠ p2 (proportion of students differs)

As before and roughly speaking, we can test the hypothesis as follows:


1. Compute the proportion of students who plan for higher education in the faculty of science, $\hat{p}_1$, and the proportion of students who plan for higher education in the faculty of IT, $\hat{p}_2$.

2. If the difference $\hat{p}_1 - \hat{p}_2$ is "too much", reject H0; otherwise, do not reject H0.

It is always easy to do the first step. Since 13 of the 260 said "yes" in the faculty of science and 8 of the 266 said "yes" in the faculty of IT, then

$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{13}{260} = 0.05 \text{ (or 5\%)}$$

and

$$\hat{p}_2 = \frac{x_2}{n_2} = \frac{8}{266} = 0.03 \text{ (or 3\%)}.$$

To do the second step, we must know the sampling distribution of the difference, $\hat{p}_1 - \hat{p}_2$, to be able to decide whether the sample proportion $\hat{p}_1 = 0.05$ differs so much from the sample proportion $\hat{p}_2 = 0.03$ that we should reject the null hypothesis in favor of the alternative. In other words, we must decide whether the difference between the two sample proportions is due to sampling error or whether the proportion of students who said "yes" differs in the two faculties.

To make that decision, we must know the sampling distribution of the difference between two proportions. We discuss the sampling distribution for large and independent samples and then return to the test of hypothesis.

The Sampling Distribution of the Difference between Two Population Proportions for Large and Independent Samples

If $\hat{p}_1 = x_1 / n_1$ and $\hat{p}_2 = x_2 / n_2$, then for independent samples of sizes n1 and n2 taken from two populations, we have:

• $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$,

• $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}$,

• $\hat{p}_1 - \hat{p}_2$ is approximately normally distributed for large n1 and n2.

The above result states that, for large samples, the difference between two sample proportions has approximately a normal distribution with mean equal to p1 – p2 and standard deviation $\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}$.
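The mean and standard deviation of the sampling distribution can be illustrated by simulation. The sketch below is not part of the text's method: the population proportions 0.05 and 0.03 are assumed values chosen only for illustration, the sample sizes are those of Example 11.4, and the helper `sample_proportion` is a hypothetical name.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

p1, p2 = 0.05, 0.03   # assumed population proportions (illustrative only)
n1, n2 = 260, 266     # sample sizes, as in Example 11.4
reps = 2000           # number of simulated pairs of samples


def sample_proportion(p, n):
    """Proportion of successes in n independent trials with success probability p."""
    return sum(random.random() < p for _ in range(n)) / n


# One simulated difference of sample proportions per pair of samples
diffs = [sample_proportion(p1, n1) - sample_proportion(p2, n2) for _ in range(reps)]

theory_mean = p1 - p2
theory_sd = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

print(f"simulated mean = {mean(diffs):.4f}, theoretical mean = {theory_mean:.4f}")
print(f"simulated sd   = {stdev(diffs):.4f}, theoretical sd   = {theory_sd:.4f}")
```

With a few thousand repetitions, the simulated mean and standard deviation should land close to the theoretical values given by the formulas above.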


Large-Sample Hypothesis Tests for Two Population Proportions Using Independent Samples

From the preceding result, we know that the variable we can use as a test statistic for the hypothesis test is

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}},$$

which has approximately the standard normal distribution.

Since the null hypothesis for a hypothesis test to compare two population proportions is

H0: p1 = p2 (population proportions are equal),

then, if H0 is true, we have p1 - p2 = 0 and the variable z can be written as

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{p(1 - p)/n_1 + p(1 - p)/n_2}},$$

where p denotes the common value of p1 and p2. If we factor the quantity p(1 – p) out of the denominator, we get

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{p(1 - p)\,(1/n_1 + 1/n_2)}}.$$

Unfortunately, we cannot use this variable as the test statistic because p is unknown.

For this reason, we have to estimate p from the samples. The best estimate of p is obtained by pooling the data to get the proportion of successes in both samples combined; that is, we estimate p by

$$\hat{p}_p = \frac{x_1 + x_2}{n_1 + n_2}.$$

We call $\hat{p}_p$ the pooled sample proportion.

Now we replace p in z by its estimate $\hat{p}_p$ to get the following variable

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_p(1 - \hat{p}_p)\,(1/n_1 + 1/n_2)}},$$

which has approximately the standard normal distribution for large samples if the null hypothesis is true and can, therefore, be used as the test statistic. Consequently, we have the following hypothesis-testing procedure, which is called the two-sample z-test for two population proportions.

The Two-Sample z-Test for Two Population Proportions

Assumptions

1. Independent samples.

2. x1, n1 - x1, x2, and n2 - x2 are all 5 or greater.

Step 1

The null hypothesis is H0: p1 = p2 and the alternative hypothesis is one of the following three:

H1: p1 ≠ p2 or H1: p1 < p2 or H1: p1 > p2

(two-tailed) (left-tailed) (right-tailed)

Step 2

Specify the significance level, α.

Step 3

Based on the sampling distribution of the difference between two population proportions we construct the following rejection regions:

|z| > zα/2 or z < - zα or z > zα

(two-tailed) (left-tailed) (right-tailed)

We use the cumulative standard normal table to find the values of zα/2 and zα; see the figure below.

Step 4

Calculate the value of the test statistic

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_p(1 - \hat{p}_p)\,(1/n_1 + 1/n_2)}},$$

where $\hat{p}_p = (x_1 + x_2)/(n_1 + n_2)$.


Step 5

Decide whether to reject the null hypothesis or whether to fail to reject it.

Large-Sample Confidence Intervals for the Difference between Two Population Proportions

The fact that, for large samples, the difference between the two sample proportions has approximately a normal distribution with mean p1 – p2 and standard deviation

$$\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}$$

can be used to derive the following confidence interval procedure for the difference between two population proportions. This procedure is called the two-sample z-interval for two population proportions.

The Two-Sample z-Interval for Two Population Proportions

Assumptions

1. Independent samples.

2. x1, n1 - x1, x2, and n2 - x2 are all 5 or greater.

The confidence interval for p1 – p2 is

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \cdot \sqrt{\hat{p}_1(1 - \hat{p}_1)/n_1 + \hat{p}_2(1 - \hat{p}_2)/n_2}$$

Example 11.5 Illustrating the Two-Sample z-Test for Two Population Proportions

Go back to Example 11.4, where independent samples of sizes 260 and 266 are obtained in order to compare the percentages of students who plan to continue their higher education. In the faculty of science, 13 said "yes", while in the faculty of IT, 8 said "yes".

a. Is it reasonable to conclude that both faculties have the same fraction of students who said yes? Use α = 0.05.

b. Construct a 95% confidence interval for the difference between the two population proportions.

Solution

Note that the two assumptions are satisfied.

a. We apply the two-sample z-test for two population proportions as follows.

Step 1

The null hypothesis: H0: p1 = p2

The alternative hypothesis: H1: p1 ≠ p2


Step 2

Level of significance, α = 0.05.

Step 3

Criterion: Since the test is two-tailed, we reject H0 if |z| > 1.96; that is, if z > 1.96 or z < -1.96. See Figure 11.5.

Step 4

Calculations: We calculate the value of the test statistic

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_p(1 - \hat{p}_p)\,(1/n_1 + 1/n_2)}},$$

where $\hat{p}_p = (x_1 + x_2)/(n_1 + n_2)$.

We first obtain

$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{13}{260} = 0.05, \qquad \hat{p}_2 = \frac{x_2}{n_2} = \frac{8}{266} = 0.03$$

and

$$\hat{p}_p = \frac{x_1 + x_2}{n_1 + n_2} = \frac{13 + 8}{260 + 266} = \frac{21}{526} = 0.04$$

Consequently, the value of the test statistic is

$$z = \frac{0.05 - 0.03}{\sqrt{(0.04)(1 - 0.04)\,(1/260 + 1/266)}} = 1.17$$

Step 5

Decision: Since z = 1.17 falls in the nonrejection region (see Figure 11.5 again), we do not reject H0; that is, the test fails to reject the null hypothesis.

Figure 11.5 Criterion for deciding whether or not to reject the null hypothesis in Example 11.5
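The test in part a can be reproduced with a short Python sketch using the standard library. The function name `two_proportion_z_test` is a hypothetical choice, and the p-value it returns is an extra that the text does not compute; the decision rule itself is the critical-value rule of Step 3.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alpha=0.05):
    """Two-tailed two-sample z-test of H0: p1 = p2 using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)                   # pooled sample proportion
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se                         # test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)       # critical value z_{alpha/2}
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))       # two-tailed p-value
    return z, z_crit, p_value, abs(z) > z_crit


# Data of Example 11.5: 13 of 260 (science) and 8 of 266 (IT) said "yes".
z, z_crit, p_value, reject = two_proportion_z_test(13, 260, 8, 266)
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```

The sketch keeps the unrounded sample proportions, so its z value agrees with the hand calculation only after rounding to two decimals.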


b. To construct a 95% confidence interval for the difference between the two population proportions, we apply the two-sample z-interval for two population proportions as follows.

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \cdot \sqrt{\hat{p}_1(1 - \hat{p}_1)/n_1 + \hat{p}_2(1 - \hat{p}_2)/n_2}$$

$$= (0.05 - 0.03) \pm 1.96 \cdot \sqrt{(0.05)(0.95)/260 + (0.03)(0.97)/266}$$

$$= 0.02 \pm 0.033, \text{ or } -0.013 \text{ to } 0.053.$$

We can be 95% confident that the difference between the proportions of all students who plan to continue their higher education in the two faculties is somewhere between -0.013 and 0.053.

The Maximum Error and Sample Size

We present formulas for the maximum error and sample size used in making inferences about two population proportions. From our knowledge of the one-population case, we know that we must first find the maximum error and then use it to determine the sample size.

The maximum error in estimating the difference between two population proportions can be easily found from the confidence interval, and from the formula for the maximum error we can determine the sample sizes needed to construct a confidence interval with a specified level of confidence and margin of error.

The maximum error for the estimate of p1 – p2 is given by

$$E = z_{\alpha/2} \cdot \sqrt{\hat{p}_1(1 - \hat{p}_1)/n_1 + \hat{p}_2(1 - \hat{p}_2)/n_2}.$$

Observe that the maximum error equals half the width of the confidence interval. Also observe that it represents the precision with which the difference between the two sample proportions, $\hat{p}_1 - \hat{p}_2$, estimates the difference between the population proportions, p1 – p2, at the specified level of confidence.

A (1 – α)100% confidence interval for the difference between two population proportions having a maximum error E can be obtained by choosing

$$n_1 = n_2 = 0.5 \left( \frac{z_{\alpha/2}}{E} \right)^2,$$

rounded up to the next integer.
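The interval, the maximum error, and the sample-size formula can be sketched together in Python. The function names are hypothetical, and the target maximum error E = 0.05 in the last line is an illustrative value that does not appear in the text. The sample-size formula follows from setting both sample proportions to 0.5, which gives the largest possible E for a given n1 = n2.

```python
from math import ceil, sqrt
from statistics import NormalDist


def two_proportion_interval(x1, n1, x2, n2, confidence=0.95):
    """Return (point estimate, maximum error E) for p1 - p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    e = z * sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return p1_hat - p2_hat, e


def equal_sample_size(max_error, confidence=0.95):
    """n1 = n2 = 0.5 * (z_{alpha/2} / E)^2, rounded up to the next integer."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(0.5 * (z / max_error) ** 2)


diff, e = two_proportion_interval(13, 260, 8, 266)   # data of Example 11.5
print(f"{diff:.3f} +/- {e:.3f}")
print(equal_sample_size(0.05))                       # hypothetical target E = 0.05
```

Because the sketch keeps unrounded proportions, the printed maximum error may differ from the hand calculation in the third decimal place.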


Exercises

11.1 Two models of a car are tested for their mileage (miles per gallon) and the sample results are given below.

Model X Model Y

$n_X = 36$ $n_Y = 39$

$\bar{x} = 19.3$ $\bar{y} = 15.2$

$s_X = 1.3$ $s_Y = 0.9$

i. Test the hypothesis that Model X cars have a higher mileage than Model Y cars. Use α = 0.05.

ii. Construct a 95% confidence interval for the difference µX - µY based on the sample data. Assume that the two samples are independent and the population standard deviations are known.

11.2 A researcher wishes to determine whether tires produced by company 1 have a shorter life than tires produced by company 2. Use the sample data below (where the sample means and standard deviations are in thousands of kilometers) to

a. test the hypothesis that company 1 tires have a shorter life than company 2 tires. Use α = 0.01.

b. construct a 99% confidence interval for µ1 - µ2, where µ1 and µ2 represent the means for company 1 and company 2, respectively. Assume that the two samples are independent and the population standard deviations are known.

Company 1 Company 2

n1 = 80 tires n2 = 75 tires

$\bar{x}_1 = 188.2$ $\bar{x}_2 = 202.5$

$s_1 = 37.8$ $s_2 = 38.9$

11.3 A car battery production company is interested in determining whether or not a difference exists between the lives of two different brands. A random sample of 81 batteries of each brand was selected and the life in years was determined for each box. The sample results are given below.

a. Test the hypothesis that no difference exists between the lives in years of the two brands. Use α = 0.10.

b. Construct a 90% confidence interval for µA - µB, the difference in mean life between brand A and brand B.

Brand A Brand B

$n_1 = 81$ $n_2 = 81$

$\bar{x}_1 = 4.2$ $\bar{x}_2 = 4.8$

$s_1 = 0.8$ $s_2 = 1.2$

Assume that the two samples are independent and the population standard deviations are known.

11.4 A teacher is interested in performing a significance test to compare the mean math score of the girls and the mean math score of the boys. She randomly selects 12 girls from the class and then randomly selects 12 boys. She arranges the girls' names alphabetically and uses this list to assign each girl a number between 1 and 12. She does the same thing for the boys. What is the correct test procedure? A 1-sample t-test, 2-sample t-test, or paired t-test?


11.5 A researcher wishes to determine whether listening to music affects students' performance on a memory test. He randomly selects 35 students and has each student perform a memory test once while listening to music and once without listening to music. He obtains the mean and standard deviation of the 35 "with music" scores and the mean and standard deviation of the 35 "without music" scores. What is the correct test procedure? A 1-sample t-test, 2-sample t-test, or paired t-test?

11.6 Each of a random sample of ten college freshmen takes a mathematics test both before and after taking an intensive training course designed to improve such test scores. Then, the scores for each student are paired, as shown in the table below:

Student 1 2 3 4 5

Before (B) 65 70 44 86 69

After (A) 70 83 42 91 73

d = A – B 5 13 -2 5 4

Student 6 7 8 9 10

Before (B) 77 92 64 52 96

After (A) 86 95 73 70 97

d = A – B 9 3 9 15 1

Assume that the assumptions necessary to perform the required inferences are satisfied.

a. Compare the mean scores after and before the training course by (i) finding the difference of the sample means and (ii) finding the mean of the difference scores.

b. Compare the mean scores after and before the training course by constructing and interpreting a 90% confidence interval for the population mean difference.

c. Test the hypothesis that the mean score after the training course is higher than the mean score before. Use α = 0.10.

11.7 The table below shows the weights in kilograms of 12 subjects before and after they followed a particular diet for two months.

a. Test the hypothesis of no difference in mean weights before and after the diet. Use α = 0.05.

b. Construct a 95% confidence interval for the mean difference of the "before" minus "after" weights.

Subject 1 2 3 4 5 6

Before 76 82 72 56 94 57

After 73 81 68 54 83 58

Subject 7 8 9 10 11 12

Before 85 97 79 83 65 84

After 82 90 75 79 63 85

Assume that the assumptions necessary to perform the required inferences are satisfied.

11.8 A researcher is interested in comparing the mean supermarket prices of two leading toothpastes. His sample was taken by randomly going to each of eleven supermarkets and recording the price of a toothpaste of each brand. The data are shown in the following table:


Price (in J.D.)

Supermarket Brand 1 Brand 2

1 1.48 1.32

2 1.25 1.42

3 1.26 1.26

4 1.36 1.41

5 1.27 1.47

6 1.40 1.30

7 1.20 1.25

8 1.36 1.40

9 1.15 1.25

10 1.40 1.50

11 1.31 1.45

a. Is there any evidence that the mean supermarket price of brand 1 is less than that of brand 2? Use α = 0.10.

b. Construct a 90% confidence interval for the difference in mean price of brand 1 and brand 2.

Assume that the paired differences come from a population that is normally distributed.

11.9 From the sample statistics, find the value of $\hat{p}_1 - \hat{p}_2$, the point estimate of the difference of two proportions.

(i) n1 = 200, n2 = 200, x1 = 65, and x2 = 58.

(ii) n1 = 220, n2 = 190, x1 = 67, and x2 = 85.

(iii) n1 = 416, n2 = 40, x1 = 132, and x2 = 8.

11.10 In a random sample of 200 people aged 20-24, 19% were smokers. In a random sample of 200 people aged 25-29, 12% were smokers. Assume that the samples are independent and they have been randomly selected.

a. Is it reasonable to conclude that both age groups have the same percentage of smokers? Use α = 0.05.

b. Construct a 95% confidence interval for the difference in smoking proportions for the two groups.

11.11 A university found it retained 21 students out of 300 in 2007 and 28 students out of 310 in 2008. Assume that the samples are independent and they have been randomly selected.

a. Is it reasonable to conclude that the year 2008 has a larger percentage of retained students than the year 2007? Use α = 0.10.

b. Construct a 90% confidence interval for the difference in the proportions of students retained in 2007 and 2008.

11.12 The health department in a city has concerns about the chlorine level at a local water park. The health department tests the water Sunday with readings of 4 mg out of 100 mL. On Friday when the water is retested, the health department gets readings of 5 mg out of 120 mL. Assume that the samples are independent and they have been randomly selected.

a. Is it reasonable to assume that the Sunday readings have a lower level of chlorine than the Friday readings? Use α = 0.02.

b. Construct a 98% confidence interval for the difference in the proportions of chlorine levels on Sunday and Friday.


Chapter Learning Outcomes

When you complete your careful study of this chapter you should be able to:

1. use and understand the formulas presented in this chapter.

2. perform tests of hypotheses and confidence intervals to compare the means of two populations using independent samples when the population standard deviations are known.

3. perform tests of hypotheses and confidence intervals to compare the means of two populations using independent samples when the population standard deviations are unknown, but assumed equal.

4. perform tests of hypotheses and confidence intervals to compare the means of two populations using a paired sample.

5. perform large-sample tests of hypotheses and confidence intervals for two population proportions using independent samples.

Chapter Key Terms

Independent samples

Paired differences

Paired difference variable

Paired samples

Paired t-interval procedure

Paired t-test

Pooled sample standard deviation

Pooled t-interval procedure

Pooled t-test

Sampling distribution of the difference between two means

Sampling distribution of the difference between two proportions


Area under the Standard Normal Curve Negative Values of z

Second decimal place in z

z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09

-3.4 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0002

-3.3 .0005 .0005 .0005 .0004 .0004 .0004 .0004 .0004 .0004 .0003

-3.2 .0007 .0007 .0006 .0006 .0006 .0006 .0006 .0005 .0005 .0005

-3.1 .0010 .0009 .0009 .0009 .0008 .0008 .0008 .0008 .0007 .0007

-3.0 .0013 .0013 .0013 .0012 .0012 .0011 .0011 .0011 .0010 .0010

-2.9 .0019 .0018 .0018 .0017 .0016 .0016 .0015 .0015 .0014 .0014

-2.8 .0026 .0025 .0024 .0023 .0023 .0022 .0021 .0021 .0020 .0019

-2.7 .0035 .0034 .0033 .0032 .0031 .0030 .0029 .0028 .0027 .0026

-2.6 .0047 .0045 .0044 .0043 .0041 .0040 .0039 .0038 .0037 .0036

-2.5 .0062 .0060 .0059 .0057 .0055 .0054 .0052 .0051 .0049 .0048

-2.4 .0082 .0080 .0078 .0075 .0073 .0071 .0069 .0068 .0066 .0064

-2.3 .0107 .0104 .0102 .0099 .0096 .0094 .0091 .0089 .0087 .0084

-2.2 .0139 .0136 .0132 .0129 .0125 .0122 .0119 .0116 .0113 .0110

-2.1 .0179 .0174 .0170 .0166 .0162 .0158 .0154 .0150 .0146 .0143

-2.0 .0228 .0222 .0217 .0212 .0207 .0202 .0197 .0192 .0188 .0183

-1.9 .0287 .0281 .0274 .0268 .0262 .0256 .0250 .0244 .0239 .0233

-1.8 .0359 .0351 .0344 .0336 .0329 .0322 .0314 .0307 .0301 .0294

-1.7 .0446 .0436 .0427 .0418 .0409 .0401 .0392 .0384 .0375 .0367

-1.6 .0548 .0537 .0526 .0516 .0505 .0495 .0485 .0475 .0465 .0455

-1.5 .0668 .0655 .0643 .0630 .0618 .0606 .0594 .0582 .0571 .0559

-1.4 .0808 .0793 .0778 .0764 .0749 .0735 .0721 .0708 .0694 .0681

-1.3 .0968 .0951 .0934 .0918 .0901 .0885 .0869 .0853 .0838 .0823

-1.2 .1151 .1131 .1112 .1093 .1075 .1056 .1038 .1020 .1003 .0985

-1.1 .1357 .1335 .1314 .1292 .1271 .1251 .1230 .1210 .1190 .1170

-1.0 .1587 .1562 .1539 .1515 .1492 .1469 .1446 .1423 .1401 .1379

-0.9 .1841 .1814 .1788 .1762 .1736 .1711 .1685 .1660 .1635 .1611

-0.8 .2119 .2090 .2061 .2033 .2005 .1977 .1949 .1922 .1894 .1867

-0.7 .2420 .2389 .2358 .2327 .2296 .2266 .2236 .2206 .2177 .2148

-0.6 .2743 .2709 .2676 .2643 .2611 .2578 .2546 .2514 .2483 .2451

-0.5 .3085 .3050 .3015 .2981 .2946 .2912 .2877 .2843 .2810 .2776

-0.4 .3446 .3409 .3372 .3336 .3300 .3264 .3228 .3192 .3156 .3121

-0.3 .3821 .3783 .3745 .3707 .3669 .3632 .3594 .3557 .3520 .3483

-0.2 .4207 .4168 .4129 .4090 .4052 .4013 .3974 .3936 .3897 .3859

-0.1 .4602 .4562 .4522 .4483 .4443 .4404 .4364 .4325 .4286 .4247

0.0 .5000 .4960 .4920 .4880 .4840 .4801 .4761 .4721 .4681 .4641


Area under the Standard Normal Curve Positive Values of z

Second decimal place in z

z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09

0.0 .5000 .5040 .5080 .5120 .5160 .5199 .5239 .5279 .5319 .5359

0.1 .5398 .5438 .5478 .5517 .5557 .5596 .5636 .5675 .5714 .5753

0.2 .5793 .5832 .5871 .5910 .5948 .5987 .6026 .6064 .6103 .6141

0.3 .6179 .6217 .6255 .6293 .6331 .6368 .6406 .6443 .6480 .6517

0.4 .6554 .6591 .6628 .6664 .6700 .6736 .6772 .6808 .6844 .6879

0.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224

0.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549

0.7 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852

0.8 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133

0.9 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389

1.0 .8413 .8438 .8461 .8485 .8508 .8531 .8554 .8577 .8599 .8621

1.1 .8643 .8665 .8686 .8708 .8729 .8749 .8770 .8790 .8810 .8830

1.2 .8849 .8869 .8888 .8907 .8925 .8944 .8962 .8980 .8997 .9015

1.3 .9032 .9049 .9066 .9082 .9099 .9115 .9131 .9147 .9162 .9177

1.4 .9192 .9207 .9222 .9236 .9251 .9265 .9279 .9292 .9306 .9319

1.5 .9332 .9345 .9357 .9370 .9382 .9394 .9406 .9418 .9429 .9441

1.6 .9452 .9463 .9474 .9484 .9495 .9505 .9515 .9525 .9535 .9545

1.7 .9554 .9564 .9573 .9582 .9591 .9599 .9608 .9616 .9625 .9633

1.8 .9641 .9649 .9656 .9664 .9671 .9678 .9686 .9693 .9699 .9706

1.9 .9713 .9719 .9726 .9732 .9738 .9744 .9750 .9756 .9761 .9767

2.0 .9772 .9778 .9783 .9788 .9793 .9798 .9803 .9808 .9812 .9817

2.1 .9821 .9826 .9830 .9834 .9838 .9842 .9846 .9850 .9854 .9857

2.2 .9861 .9864 .9868 .9871 .9875 .9878 .9881 .9884 .9887 .9890

2.3 .9893 .9896 .9898 .9901 .9904 .9906 .9909 .9911 .9913 .9916

2.4 .9918 .9920 .9922 .9925 .9927 .9929 .9931 .9932 .9934 .9936

2.5 .9938 .9940 .9941 .9943 .9945 .9946 .9948 .9949 .9951 .9952

2.6 .9953 .9955 .9956 .9957 .9959 .9960 .9961 .9962 .9963 .9964

2.7 .9965 .9966 .9967 .9968 .9969 .9970 .9971 .9972 .9973 .9974

2.8 .9974 .9975 .9976 .9977 .9977 .9978 .9979 .9979 .9980 .9981

2.9 .9981 .9982 .9982 .9983 .9984 .9984 .9985 .9985 .9986 .9986

3.0 .9987 .9987 .9987 .9988 .9988 .9989 .9989 .9989 .9990 .9990

3.1 .9990 .9991 .9991 .9991 .9992 .9992 .9992 .9992 .9993 .9993

3.2 .9993 .9993 .9994 .9994 .9994 .9994 .9994 .9995 .9995 .9995

3.3 .9995 .9995 .9995 .9996 .9996 .9996 .9996 .9996 .9996 .9997

3.4 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9998
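
The entries of the standard normal table can be checked numerically. The sketch below is not from the book: it assumes the tabulated value is the cumulative area to the left of z, and it uses the identity Φ(z) = ½[1 + erf(z/√2)] with Python's standard library. The function names `phi` and `table_row` are illustrative only.

```python
import math


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def table_row(z_first_decimal):
    """Rebuild one table row: areas for z_first_decimal + 0.00, ..., + 0.09."""
    return [round(phi(z_first_decimal + k / 100.0), 4) for k in range(10)]


if __name__ == "__main__":
    # Look up z = 1.96: row 1.9, column .06 of the table.
    print(round(phi(1.96), 4))
    # Negative z values follow from symmetry: phi(-z) = 1 - phi(z).
    print(round(phi(-0.1), 4))
    # First entries of the row for z = 1.0.
    print(table_row(1.0)[:3])
```

Because the normal curve is symmetric about zero, the same function covers the table of negative z values.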

The t-table: Values of t

α

df 0.10 0.05 0.025 0.02 0.01 0.005 df

1 3.078 6.314 12.706 15.894 31.821 63.656 1

2 1.886 2.920 4.303 4.849 6.965 9.925 2

3 1.638 2.353 3.182 3.482 4.541 5.841 3

4 1.533 2.132 2.776 2.999 3.747 4.604 4

5 1.476 2.015 2.571 2.757 3.365 4.032 5

6 1.440 1.943 2.447 2.612 3.143 3.707 6

7 1.415 1.895 2.365 2.517 2.998 3.499 7

8 1.397 1.860 2.306 2.449 2.896 3.355 8

9 1.383 1.833 2.262 2.398 2.821 3.250 9

10 1.372 1.812 2.228 2.359 2.764 3.169 10

11 1.363 1.796 2.201 2.328 2.718 3.106 11

12 1.356 1.782 2.179 2.303 2.681 3.055 12

13 1.350 1.771 2.160 2.282 2.650 3.012 13

14 1.345 1.761 2.145 2.264 2.624 2.977 14

15 1.341 1.753 2.131 2.249 2.602 2.947 15

16 1.337 1.746 2.120 2.235 2.583 2.921 16

17 1.333 1.740 2.110 2.224 2.567 2.898 17

18 1.330 1.734 2.101 2.214 2.552 2.878 18

19 1.328 1.729 2.093 2.205 2.539 2.861 19

20 1.325 1.725 2.086 2.197 2.528 2.845 20

21 1.323 1.721 2.080 2.189 2.518 2.831 21

22 1.321 1.717 2.074 2.183 2.508 2.819 22

23 1.319 1.714 2.069 2.177 2.500 2.807 23

24 1.318 1.711 2.064 2.172 2.492 2.797 24

25 1.316 1.708 2.060 2.167 2.485 2.787 25

26 1.315 1.706 2.056 2.162 2.479 2.779 26

27 1.314 1.703 2.052 2.158 2.473 2.771 27

28 1.313 1.701 2.048 2.154 2.467 2.763 28

29 1.311 1.699 2.045 2.150 2.462 2.756 29

30 1.310 1.697 2.042 2.147 2.457 2.750 30

40 1.303 1.684 2.021 2.123 2.423 2.704 40

50 1.299 1.676 2.009 2.109 2.403 2.678 50

60 1.296 1.671 2.000 2.099 2.390 2.660 60

70 1.294 1.667 1.994 2.093 2.381 2.648 70

80 1.292 1.664 1.990 2.088 2.374 2.639 80

90 1.291 1.662 1.987 2.084 2.368 2.632 90

100 1.290 1.660 1.984 2.081 2.364 2.626 100
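
The t-table values can also be approximated without a statistics library. The sketch below is an illustration, not the method used to build the table. It assumes that α is the area in the upper tail, so the tabulated t satisfies P(T > t) = α. It integrates the t density with Simpson's rule and finds the critical value by bisection. The names `t_pdf`, `t_upper_tail` and `t_critical`, the search bound `hi=100` and the step density are all choices made for this example.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(x, df, steps_per_unit=200):
    """P(T > x) for x >= 0, using Simpson's rule on the interval [0, x]."""
    if x == 0:
        return 0.5
    n = max(2, int(x * steps_per_unit))
    if n % 2:  # Simpson's rule needs an even number of subintervals
        n += 1
    h = x / n
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # By symmetry, half of the area lies to the right of zero.
    return 0.5 - total * h / 3


def t_critical(alpha, df, hi=100.0, tol=1e-6):
    """Value t with upper-tail area alpha (0 < alpha < 0.5), by bisection."""
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid  # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # Compare with the row df = 10, column alpha = 0.025 of the table.
    print(round(t_critical(0.025, 10), 3))
```

The search bound of 100 covers every entry in the table above. Smaller α at very low df would need a larger bound.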
