Probability & Statistics: Infinite Statistics Robert Leishman Mark Colton ME 363 Spring 2011




Large Data Sets

What happens to a histogram as N becomes large (N → ∞)?
– Number of bins becomes large (K → ∞)
– Width of bins becomes small (δx → 0)
– Histogram becomes smoother and approaches a continuous function


Example: ASTM-A242 Steel

[Figure: two histograms of relative frequency n (%) versus strength S (MPa)]

– N = 100, K = 12, S_ave = 479.18 MPa
– N = 10,000,000, K = 1,180, S_ave = 480.00 MPa
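The trend in the steel example can be reproduced with a short simulation. This is a minimal sketch, not the course's code: it assumes normally distributed strengths with a mean of 480 MPa and an invented standard deviation of 10 MPa (the slides do not give one), and it uses a smaller N than the slides so that it runs quickly in plain Python. The helper names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sigma):
    """Normal p.d.f. with true mean `mean` and standard deviation `sigma`."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def histogram_density(samples, k):
    """Return bin centers and density-normalized bin heights for k equal-width bins."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / k
    counts = [0] * k
    for s in samples:
        # The largest sample lands exactly on the upper edge; keep it in the last bin.
        idx = min(int((s - lo) / width), k - 1)
        counts[idx] += 1
    n = len(samples)
    centers = [lo + (i + 0.5) * width for i in range(k)]
    density = [c / (n * width) for c in counts]
    return centers, density


def max_deviation(n, k, mean=480.0, sigma=10.0, seed=0):
    """Largest gap between the histogram heights and the assumed normal p.d.f."""
    rng = random.Random(seed)
    samples = [rng.gauss(mean, sigma) for _ in range(n)]
    centers, density = histogram_density(samples, k)
    return max(abs(d - normal_pdf(c, mean, sigma)) for c, d in zip(centers, density))


small = max_deviation(n=100, k=12)
large = max_deviation(n=200_000, k=60)
print(f"N = 100,     K = 12: max deviation from p.d.f. = {small:.5f}")
print(f"N = 200,000, K = 60: max deviation from p.d.f. = {large:.5f}")
```

The bin heights are divided by N times the bin width so that the histogram has unit area and can be compared directly with a p.d.f.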


Probability Density Function

For an infinite data set, the frequency distribution is smooth and continuous

This function is called the probability density function (p.d.f.)

The p.d.f. relates the value of a measured variable to its likelihood of occurring


Infinite vs. Finite Statistics

In theory, we can have infinite data sets
– An infinite number of data points
– We can examine the entire population, or all possible values

In reality, data sets are finite
– A limited number of data points
– Based on a sample of the entire population

We will use our limited sample size to extract information about the entire population
– Example: Extract information about ALL ASTM-A242 steel from 1000 specimens


Standard p.d.f.s

There are many shapes of p.d.f.s that can occur
– See Table 4.2

The shape of the p.d.f. can be determined experimentally by looking at the shape of the histogram and finding which p.d.f. best matches it



Normal Distribution

The normal or Gaussian distribution is one of the most common

It can be used to represent variables that are random about a mean
– Electrical noise
– Variation in strength of steel
– Precision error

“Bell Curve”


Normal Distribution

The p.d.f. of the normal distribution is given by:

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - x')^2}{2\sigma^2}}$$

– x is the value of a particular measurement
– x’ is the true mean of the entire population
  • Describes the central tendency of the population
  • Determines the center point of the normal curve
  • Often written μ; the mean of a finite sample only estimates it
– σ is the standard deviation of the entire population
  • Describes the population’s spread
  • Determines the width of the normal curve


Normal Distribution

If we have the p.d.f. of some data, then we can make predictions about the probability P of future measurements

Specifically, we can predict the probability of a data point falling within a certain interval

This probability is given by the area under the p.d.f. over the appropriate interval

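$$P(x' - \delta x \le x \le x' + \delta x) = \int_{x' - \delta x}^{x' + \delta x} p(x)\, dx$$

$$P(x' - \delta x \le x \le x' + \delta x) = \int_{x' - \delta x}^{x' + \delta x} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - x')^2}{2\sigma^2}}\, dx$$

[Figure: normal p.d.f. with x’ - δx, x’, and x’ + δx marked on the x axis]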
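The area interpretation can be checked by integrating the p.d.f. numerically. The sketch below uses composite Simpson's rule, which is my choice of method rather than anything prescribed by the slides, and the population values (mean 480 MPa, standard deviation 10 MPa) are assumptions chosen for illustration.

```python
import math


def normal_pdf(x, mean, sigma):
    """p(x) for a normal distribution with true mean `mean` and std dev `sigma`."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def interval_probability(mean, sigma, dx, steps=1000):
    """Approximate P(mean - dx <= x <= mean + dx) with composite Simpson's rule."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals
    a, b = mean - dx, mean + dx
    h = (b - a) / steps
    total = normal_pdf(a, mean, sigma) + normal_pdf(b, mean, sigma)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # interior points alternate 4, 2, 4, ...
        total += weight * normal_pdf(a + i * h, mean, sigma)
    return total * h / 3.0


# Assumed population: mean 480 MPa, sigma 10 MPa; interval is the mean +/- 10 MPa.
p = interval_probability(mean=480.0, sigma=10.0, dx=10.0)
print(f"P(470 <= S <= 490) is approximately {p:.4f}")
```

Here δx equals one standard deviation, so the result is the fraction of the population within ±1σ of the mean.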


Normal Distribution

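$$P(x' - \delta x \le x \le x' + \delta x) = \int_{x' - \delta x}^{x' + \delta x} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - x')^2}{2\sigma^2}}\, dx$$

Make a change of variables:

$$\beta \equiv \frac{x - x'}{\sigma}, \qquad z_1 \equiv \frac{x_1 - x'}{\sigma}$$

The integral then becomes

$$P(-z_1 \le \beta \le z_1) = \frac{1}{\sqrt{2\pi}} \int_{-z_1}^{z_1} e^{-\beta^2/2}\, d\beta$$

Since the normal distribution is symmetric about x’:

$$P(-z_1 \le \beta \le z_1) = 2\left[\frac{1}{\sqrt{2\pi}} \int_{0}^{z_1} e^{-\beta^2/2}\, d\beta\right]$$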
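The substitution can be spelled out step by step. The following is a sketch of the intermediate algebra, assuming that x1 denotes the upper limit x’ + δx; the slides do not define x1 explicitly, but the relation x1 = x’ + z1σ used in the examples is consistent with that reading.

```latex
\begin{align*}
\beta = \frac{x - x'}{\sigma}
  \quad &\Rightarrow \quad x = x' + \sigma\beta, \qquad dx = \sigma\, d\beta \\
x = x' \pm \delta x
  \quad &\Rightarrow \quad \beta = \pm\frac{\delta x}{\sigma} = \pm z_1 \\
\int_{x' - \delta x}^{x' + \delta x}
  \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - x')^2}{2\sigma^2}}\, dx
  &= \int_{-z_1}^{z_1} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\beta^2/2}\, \sigma\, d\beta
   = \frac{1}{\sqrt{2\pi}} \int_{-z_1}^{z_1} e^{-\beta^2/2}\, d\beta
\end{align*}
```

The factor σ from dx cancels the 1/σ in the p.d.f., so the result depends only on z1 and not on x’ or σ separately.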


Normal Distribution

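$$\beta \equiv \frac{x - x'}{\sigma}, \qquad z_1 \equiv \frac{x_1 - x'}{\sigma}$$

$$P(-z_1 \le \beta \le z_1) = 2\left[\frac{1}{\sqrt{2\pi}} \int_{0}^{z_1} e^{-\beta^2/2}\, d\beta\right]$$

The bracketed expression is called the “normal error function”

Solutions to this integral are tabulated in Table 4.3 (p. 118)

The integral gives us a method for calculating the probability that x lies within x’ ± z1σ (that is, between x’ - δx and x’ + δx with δx = z1σ) for a given distribution defined by x’ and σ

Best understood by doing some examples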
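Table 4.3 is not reproduced in these notes, but its entries can be regenerated numerically. The sketch below relies on the identity that the one-sided integral from 0 to z1 equals one half of erf(z1/√2), and it assumes the table lists one-sided areas, as the worked examples suggest. The function name is hypothetical.

```python
import math


def normal_error_function(z1):
    """(1/sqrt(2*pi)) * integral from 0 to z1 of exp(-beta**2 / 2) d(beta)."""
    return 0.5 * math.erf(z1 / math.sqrt(2.0))


# Print a few one-sided areas and the matching two-sided probabilities.
for z1 in (0.5, 1.0, 1.43, 1.645, 2.0, 3.0):
    half = normal_error_function(z1)
    print(f"z1 = {z1:5.3f}   area 0..z1 = {half:.4f}   area -z1..z1 = {2.0 * half:.4f}")
```

Doubling the one-sided value gives the two-sided probability because of the symmetry about the mean.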


Example 1

What is the area under the normal distribution curve from z1 = -1.43 to z1 = 1.43?

What is the significance of this area?



Example 1

From the table:
– The area under the normal distribution curve from z1 = 0 to z1 = 1.43 is 0.4236
– This represents ½ of the integral between z1 = -1.43 and z1 = 1.43
– The total area is therefore 2(0.4236) = 0.8472

What does this mean?
– For data following a normal distribution, 84.72% of the population lies within the range -1.43 ≤ z1 ≤ 1.43
– But x1 = x’ + z1σ
– So this means that 84.72% of the population lies within ±1.43 standard deviations of the mean
– This is true for any normally distributed data
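The claim that the result holds for any normally distributed data can be checked directly. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the second population (mean 480 MPa, standard deviation 10 MPa) is an assumption made up for the comparison.

```python
from statistics import NormalDist

z1 = 1.43

# Standard normal: mean 0, standard deviation 1.
standard = NormalDist()
p_standard = standard.cdf(z1) - standard.cdf(-z1)

# An arbitrary normal population (assumed values), same z1.
mean, sigma = 480.0, 10.0
steel = NormalDist(mean, sigma)
p_steel = steel.cdf(mean + z1 * sigma) - steel.cdf(mean - z1 * sigma)

print(f"standard normal, +/-{z1} sigma: {p_standard:.4f}")
print(f"mean {mean}, sigma {sigma}, +/-{z1} sigma: {p_steel:.4f}")
```

Both probabilities agree with each other, and with the table-based 0.8472 to within table rounding.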


Example 2

What range of a random variable x will contain 90% of the population?

Solution:
– Find z1 such that 45% of the data lie between 0 and +z1 and the other 45% lie between -z1 and 0
– Use the table


Example 2

By interpolation, z0.45 = 1.645


Example 2

Again, x1 = x’ + z1σ

So 90% of the population will fall within the range
– (x’ - z0.45σ) < x < (x’ + z0.45σ)
– (x’ - 1.645σ) < x < (x’ + 1.645σ)

So 90% of the population will lie within ±1.645 standard deviations of the mean
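The table lookup with interpolation can be replaced by an inverse-CDF call. The sketch below assumes Python's standard-library `statistics.NormalDist`; the helper name and the example population (mean 480 MPa, standard deviation 10 MPa) are hypothetical.

```python
from statistics import NormalDist


def z_for_coverage(coverage):
    """z1 such that a fraction `coverage` of a normal population lies within +/- z1 sigma."""
    # Half of the coverage lies on each side of the mean, so the area from
    # minus infinity up to +z1 is 0.5 + coverage / 2.
    return NormalDist().inv_cdf(0.5 + coverage / 2.0)


z1 = z_for_coverage(0.90)
print(f"z1 for 90% coverage: {z1:.3f}")

# Convert to a range of x for an assumed population.
mean, sigma = 480.0, 10.0
print(f"90% range: {mean - z1 * sigma:.2f} < x < {mean + z1 * sigma:.2f}")
```

The same helper gives z1 for any other two-sided coverage, such as 95% or 99%.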


Comments

The probability of a measurement occurring can be expressed in terms of the standard deviation of the population

The probability of a measurement being within:
– ±1σ of the mean is 68.27%
– ±2σ of the mean is 95.45%
– ±3σ of the mean is 99.73%

Data outside ±3σ are often considered “outliers”
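These percentages, and what they imply for outliers, can be reproduced in a few lines. This is a sketch only: the sample size of 1000 is an assumption for illustration, and the function name is hypothetical.

```python
import math


def fraction_within(k):
    """Fraction of a normal population within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


n_samples = 1000  # assumed sample size, for illustration only
for k in (1, 2, 3):
    inside = fraction_within(k)
    expected_outside = n_samples * (1.0 - inside)
    print(f"+/-{k} sigma: {100.0 * inside:.2f}% inside, "
          f"about {expected_outside:.1f} of {n_samples} points expected outside")
```

Even for perfectly normal data, a few points per thousand are expected outside ±3σ, which is worth keeping in mind before discarding a point as an outlier.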


Example 3

You are assigned to measure the maximum no-load speed of a new type of DC motor

You apply a constant voltage to “many” of these motors and measure the maximum no-load speed for each motor

You calculate that x’ = 4315.25 rpm and σ = 427.5 rpm

Assuming that the variations in no-load speed are random (and normally distributed), what is the probability that a motor will have a no-load speed between 5000 and 5200 rpm?


Example 3

P(5000 ≤ x ≤ 5200) = P(4315.25 ≤ x ≤ 5200) - P(4315.25 ≤ x ≤ 5000)

From upper limit (first term):
z1 = (5200 - 4315.25)/427.5 = 2.0696
P(2.0696) = 0.4808

From lower limit (second term):
z1 = (5000 - 4315.25)/427.5 = 1.6018
P(1.6018) = 0.4454

P(5000 ≤ x ≤ 5200) = 0.4808 - 0.4454 = 0.0354 = 3.54%
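The arithmetic can be cross-checked without a table. This is a minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

mean_rpm = 4315.25   # x' from the problem statement
sigma_rpm = 427.5    # sigma from the problem statement
motors = NormalDist(mean_rpm, sigma_rpm)

# z values for the two limits
z_upper = (5200.0 - mean_rpm) / sigma_rpm
z_lower = (5000.0 - mean_rpm) / sigma_rpm

# One-sided areas measured from the mean, matching the table convention:
# P(mean <= x <= limit) = cdf(limit) - 0.5
p_upper = motors.cdf(5200.0) - 0.5
p_lower = motors.cdf(5000.0) - 0.5

p = p_upper - p_lower
print(f"z_upper = {z_upper:.4f}, area from mean = {p_upper:.4f}")
print(f"z_lower = {z_lower:.4f}, area from mean = {p_lower:.4f}")
print(f"P(5000 <= x <= 5200) = {p:.4f}")
```

Because the CDF is evaluated directly, the result is not affected by table lookup or interpolation error.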