mba15 -1.ppt

7/25/2019 mba15 -1.ppt

1/44

Business Statistics

Why statistics?

Decision making is often based on analysis of data.

Statistics helps you make sense of the data by using tools that summarize, present, and analyze it.

The decision maker can also ascertain the confidence in the decisions.

Examples

How many newspapers should the vendor stock to maximize revenue?

- Depends on the probability distribution of demand and expected profit

Are two or more market segments significantly different?

- Hypothesis testing

What proportion of people are happy with the Sixth Pay Commission report?

- Parameter estimation

Sample vs. Population

Population is the entire group/collection of individuals/objects/things that we want information about.

Sample is the part of the population that we actually examine to gather information.

Example - We wish to find the average dividend percentage of all companies traded at NSE.

All stocks traded at NSE comprise the population.

A fixed percentage of the stocks, selected for gathering information, is the sample.

Subdivision within Statistics

Descriptive Statistics

- Collect
- Organize
- Summarize
- Display
- Analyze

Inferential Statistics

- Predict and forecast values of population parameters
- Test hypotheses about values of population parameters
- Make decisions

Descriptive statistics - data and frequency distribution

The following are the departure delays in minutes of 42 flights selected at random from a particular airport.

10 12 45
13 8 40
13 0 0
20 45 0
95 38 67
4 47 55
56 50 0
45 50 27
5 15 26
34 12 25
48 40 25
50 42 48
53 44 23
56 46 22

Frequency Distribution

Table with two columns listing:

- Each and every group or class or interval of values
- Associated frequency of each group

Frequency is the number of observations assigned to each group.

Sum of frequencies is the number of observations.

Class midpoint is the middle value of a group or class or interval.

Relative frequency is the percentage/proportion of total observations in each class.

Sum of relative frequencies = 1
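The definitions above can be sketched in a few lines of Python. This is only an illustration: the class width of 10 minutes and the half-open classes (lower <= value < upper) are assumptions, since the slides do not state which classes were used, and the function name `frequency_table` is made up for this sketch.

```python
from collections import Counter

# The 42 departure delays (minutes) from the airline example.
delays = [
    0, 0, 0, 0, 4, 5, 8, 10, 12, 12, 13, 13, 15, 20,
    22, 23, 25, 25, 26, 27, 34, 38, 40, 40, 42, 44, 45, 45,
    45, 46, 47, 48, 48, 50, 50, 50, 53, 55, 56, 56, 67, 95,
]

CLASS_WIDTH = 10  # assumed class width; not taken from the slides


def frequency_table(values, width):
    """Return rows of (lower, upper, midpoint, frequency, relative frequency)."""
    # Map every value to the lower limit of its class and count the classes.
    counts = Counter((v // width) * width for v in values)
    n = len(values)
    rows = []
    for lower in range(0, max(values) + 1, width):
        upper = lower + width  # class covers lower <= value < upper
        freq = counts.get(lower, 0)
        rows.append((lower, upper, (lower + upper) / 2, freq, freq / n))
    return rows


table = frequency_table(delays, CLASS_WIDTH)
for lower, upper, midpoint, freq, rel in table:
    print(f"[{lower:>2}, {upper:>3})  midpoint={midpoint:5.1f}  "
          f"frequency={freq:2d}  relative={rel:.3f}")
```

The frequencies add up to the number of observations and the relative frequencies add up to 1, as the slide states.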

Frequency distribution

Table columns: Delay in minutes, Frequency

Frequency distribution - histogram

Two variable frequency distribution - cross tabulation

A joint frequency distribution of two variables (e.g. ownership of airline, delay in minutes)

Descriptive statistics - measures

Measures of Location

Measures of Variability

Skewness and Kurtosis

Association between two variables

Measures of Location

Arithmetic Mean

Median

Mode

Percentiles

Quartiles

Arithmetic mean

The mean of a data set is the average of all the data values.

Sample mean: $\bar{x} = \frac{\sum x_i}{n}$

Population mean: $\mu = \frac{\sum x_i}{N}$

Mean - example

Average delay in flight departure

$\bar{x}$ = 1354/42 = 32.2381 minutes
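As a quick check of the arithmetic, the sample mean can be recomputed from the 42 delays. This is a minimal sketch; the variable names are illustrative.

```python
# The 42 departure delays (minutes) from the airline example.
delays = [
    0, 0, 0, 0, 4, 5, 8, 10, 12, 12, 13, 13, 15, 20,
    22, 23, 25, 25, 26, 27, 34, 38, 40, 40, 42, 44, 45, 45,
    45, 46, 47, 48, 48, 50, 50, 50, 53, 55, 56, 56, 67, 95,
]

total = sum(delays)        # sum of all x_i
n = len(delays)            # number of observations
sample_mean = total / n    # x-bar = (sum of x_i) / n

print(total, n, round(sample_mean, 4))  # 1354 42 32.2381
```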

Median

It is the middle item in a data set that is arranged in ascending/descending order.

If there are n observations, then the Median = (n+1)/2 th observation.

Computation rule

- if n is odd, then (n+1)/2 is an integer
- if n is even, then use the average of the (n/2)th and (n/2 + 1)th observations

Example

Sorted 42 observations (read down each column)

0 22 45
0 23 46
0 25 47
0 25 48
4 26 48
5 27 50
8 34 50
10 38 50
12 40 53
12 40 55
13 42 56
13 44 56
15 45 67
20 45 95

The median is the average of the 21st and 22nd observations

= (34+38)/2

= 36
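The computation rule for the median can be written out directly. This is a small sketch of the odd/even rule; the function name `median` is chosen here for illustration (Python's standard library offers `statistics.median` with the same behaviour).

```python
# The 42 departure delays (minutes) from the airline example.
delays = [
    0, 0, 0, 0, 4, 5, 8, 10, 12, 12, 13, 13, 15, 20,
    22, 23, 25, 25, 26, 27, 34, 38, 40, 40, 42, 44, 45, 45,
    45, 46, 47, 48, 48, 50, 50, 50, 53, 55, 56, 56, 67, 95,
]


def median(values):
    """Middle item of the sorted data; average of the two middle items if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the (n+1)/2 th observation (index mid in 0-based terms)
        return ordered[mid]
    # n even: average of the (n/2)th and (n/2 + 1)th observations
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median(delays))  # 36.0
```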

Mode

Mode is the most frequently occurring observation

- mode in the example is 0

The greatest frequency can occur at two or more different values.

If the data have exactly two modes, the data are bimodal.

If the data have more than two modes, the data are multimodal.
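Because the greatest frequency can occur at more than one value, a mode routine is safest when it returns every tied value. The sketch below does that; the function name `modes` is illustrative.

```python
from collections import Counter

# The 42 departure delays (minutes) from the airline example.
delays = [
    0, 0, 0, 0, 4, 5, 8, 10, 12, 12, 13, 13, 15, 20,
    22, 23, 25, 25, 26, 27, 34, 38, 40, 40, 42, 44, 45, 45,
    45, 46, 47, 48, 48, 50, 50, 50, 53, 55, 56, 56, 67, 95,
]


def modes(values):
    """Return all values that share the highest frequency, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


print(modes(delays))           # [0]  (a delay of 0 occurs four times)
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (bimodal)
```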

Percentiles and Quartiles

Given any set of ordered numerical observations, the Pth percentile in the ordered set is that value below which lie P% (P percent) of the observations in the set.

The position of the Pth percentile is given by (n+1)P/100, where n is the number of observations in the set.

Example

Calculate the 45th percentile of the airline delay data

The position of the 45th percentile is

45*(42+1)/100 = 19.35th

Value of the 45th percentile

= 19th observation + .35 of (20th observation - 19th observation)

= 26.35 (26 + .35(27-26))
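The position rule (n+1)P/100 with linear interpolation between neighbouring observations can be sketched as follows. The function name `percentile` and the clamping at the two ends of the data are choices made for this sketch, not something the slides specify.

```python
# The 42 departure delays (minutes) from the airline example.
delays = [
    0, 0, 0, 0, 4, 5, 8, 10, 12, 12, 13, 13, 15, 20,
    22, 23, 25, 25, 26, 27, 34, 38, 40, 40, 42, 44, 45, 45,
    45, 46, 47, 48, 48, 50, 50, 50, 53, 55, 56, 56, 67, 95,
]


def percentile(values, p):
    """Pth percentile using position (n+1)P/100 and linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) * p / 100   # 1-based position, may be fractional
    k = int(position)              # whole part: the kth observation
    fraction = position - k        # fractional part used for interpolation
    if k < 1:                      # assumption: clamp below the first observation
        return ordered[0]
    if k >= n:                     # assumption: clamp above the last observation
        return ordered[-1]
    return ordered[k - 1] + fraction * (ordered[k] - ordered[k - 1])


print(round(percentile(delays, 45), 2))  # 26.35
```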

Quartiles

Quartiles are special names for percentiles

Q1 = 25th percentile

Q2 = 50th percentile = median

Q3 = 75th percentile

Measures of Variability

Interquartile range

The interquartile range of a data set is the difference between the third quartile and the first quartile.

It is the range for the middle 50% of the data.

It overcomes the sensitivity to extreme data values.
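For the airline delays, the quartiles and the interquartile range can be obtained with Python's `statistics.quantiles`. Its "exclusive" method places the quartiles at positions (n+1)P/100, which is assumed here to match the percentile rule used in these slides.

```python
from statistics import quantiles

# The 42 departure delays (minutes) from the airline example.
delays = [
    0, 0, 0, 0, 4, 5, 8, 10, 12, 12, 13, 13, 15, 20,
    22, 23, 25, 25, 26, 27, 34, 38, 40, 40, 42, 44, 45, 45,
    45, 46, 47, 48, 48, 50, 50, 50, 53, 55, 56, 56, 67, 95,
]

# n=4 splits the data into quarters and returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(delays, n=4, method="exclusive")
iqr = q3 - q1  # range of the middle 50% of the data

print(q1, q2, q3, iqr)  # 12.75 36.0 48.0 35.25
```

The single 95-minute delay stretches the full range to 95 minutes but leaves the interquartile range unchanged, which illustrates the insensitivity to extreme values.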

Variance

The variance is a measure of variability that utilizes all the data.

It is based on the difference between the value of each observation ($x_i$) and the mean ($\bar{x}$ for a sample, $\mu$ for a population).

Population variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$

Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
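The only difference between the two formulas is the divisor (N for a population, n - 1 for a sample). A minimal sketch on the airline delays, with illustrative function names:

```python
# The 42 departure delays (minutes) from the airline example.
delays = [
    0, 0, 0, 0, 4, 5, 8, 10, 12, 12, 13, 13, 15, 20,
    22, 23, 25, 25, 26, 27, 34, 38, 40, 40, 42, 44, 45, 45,
    45, 46, 47, 48, 48, 50, 50, 50, 53, 55, 56, 56, 67, 95,
]


def sum_of_squared_deviations(values):
    """Sum of (x_i - mean)^2 over all observations."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


def population_variance(values):
    return sum_of_squared_deviations(values) / len(values)        # divide by N


def sample_variance(values):
    return sum_of_squared_deviations(values) / (len(values) - 1)  # divide by n - 1


print(round(sample_variance(delays), 2))      # 465.89
print(round(population_variance(delays), 2))  # 454.8
```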

Standard deviation

The standard deviation of a data set is the positive square root of the variance.

It is measured in the same units as the data, making it more easily comparable to the mean than the variance is.

If the data set is a sample, the standard deviation is denoted s.

If the data set is a population, the standard deviation is denoted $\sigma$ (sigma).

Coefficient of Variation

The coefficient of variation indicates how large the standard deviation is in relation to the mean.

If the data set is a sample, the coefficient of variation is computed as follows:

$\frac{s}{\bar{x}} (100)$

If the data set is a population, the coefficient of variation is computed as follows:

$\frac{\sigma}{\mu} (100)$

Example

Variance

= 465.89 minutes squared

Standard Deviation

= 21.585 minutes

Coefficient of Variation

= (21.584/32.2381)(100) = 66.95%
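These three figures can be reproduced with Python's standard `statistics` module, treating the 42 delays as a sample (n - 1 divisor). The variable names are illustrative.

```python
from statistics import mean, stdev, variance

# The 42 departure delays (minutes) from the airline example.
delays = [
    0, 0, 0, 0, 4, 5, 8, 10, 12, 12, 13, 13, 15, 20,
    22, 23, 25, 25, 26, 27, 34, 38, 40, 40, 42, 44, 45, 45,
    45, 46, 47, 48, 48, 50, 50, 50, 53, 55, 56, 56, 67, 95,
]

s2 = variance(delays)          # sample variance, in minutes squared
s = stdev(delays)              # sample standard deviation, in minutes
cv = s / mean(delays) * 100    # coefficient of variation, in percent

print(f"variance = {s2:.2f}, std dev = {s:.3f}, CV = {cv:.2f}%")
```

A coefficient of variation of about 67% says the standard deviation is roughly two thirds of the mean delay.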

Skewness

- Skewness characterizes the degree of asymmetry of a distribution around its mean

Positively skewed

Symmetric or unskewed

Negatively skewed

Skewness

Negatively skewed

Skewness

Symmetric

Skewness

Positively skewed

Skewness - measure

Skewness of a distribution is measured by

$\gamma_1 = \frac{\sum (X_i - \mu)^3}{N \sigma^3}$

For a given data set you may use
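As one possible way to compute skewness for a data set, the sketch below applies the population form of the measure (third moment about the mean divided by the cube of the standard deviation). Statistical packages often apply a small-sample correction, so their output may differ slightly; the function name `population_skewness` is illustrative.

```python
from math import sqrt

# The 42 departure delays (minutes) from the airline example.
delays = [
    0, 0, 0, 0, 4, 5, 8, 10, 12, 12, 13, 13, 15, 20,
    22, 23, 25, 25, 26, 27, 34, 38, 40, 40, 42, 44, 45, 45,
    45, 46, 47, 48, 48, 50, 50, 50, 53, 55, 56, 56, 67, 95,
]


def population_skewness(values):
    """Sum of (x - mu)^3 divided by N * sigma^3 (no small-sample correction)."""
    n = len(values)
    mu = sum(values) / n
    sigma = sqrt(sum((x - mu) ** 2 for x in values) / n)
    return sum((x - mu) ** 3 for x in values) / (n * sigma ** 3)


print(population_skewness([1, 2, 3, 4, 5]))  # 0.0 for a symmetric data set
print(population_skewness(delays))           # skewness of the airline delays
```

A positive result indicates a longer right tail (positively skewed), a negative result a longer left tail.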

Kurtosis

Kurtosis characterizes the relative peakedness or flatness of a symmetric distribution compared to the normal distribution

Platykurtic (relatively flat)

Mesokurtic (normal)

Leptokurtic (relatively peaked)

Kurtosis

Platykurtic - flat distribution

Kurtosis

Mesokurtic - not too flat and not too peaked

Kurtosis

Leptokurtic - peaked distribution

Kurtosis - measure

Kurtosis for a distribution is measured by

$\gamma_2 = \beta_2 - 3$

where

$\beta_2 = \frac{\sum (X_i - \mu)^4}{N \sigma^4}$

For a given data set you may use
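One way to compute kurtosis for a data set is sketched below. It assumes the measure is the fourth standardized moment minus 3 (excess kurtosis), so that a normal, mesokurtic distribution scores about 0, a platykurtic one below 0, and a leptokurtic one above 0. No small-sample correction is applied, and the function name `excess_kurtosis` is illustrative.

```python
# The 42 departure delays (minutes) from the airline example.
delays = [
    0, 0, 0, 0, 4, 5, 8, 10, 12, 12, 13, 13, 15, 20,
    22, 23, 25, 25, 26, 27, 34, 38, 40, 40, 42, 44, 45, 45,
    45, 46, 47, 48, 48, 50, 50, 50, 53, 55, 56, 56, 67, 95,
]


def excess_kurtosis(values):
    """Fourth moment about the mean over sigma^4, minus 3 (assumed convention)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n  # population variance, sigma^2
    m4 = sum((x - mu) ** 4 for x in values) / n  # fourth moment about the mean
    return m4 / m2 ** 2 - 3                      # sigma^4 equals m2 squared


print(round(excess_kurtosis([1, 2, 3, 4, 5]), 2))  # -1.3 (flatter than normal)
print(excess_kurtosis(delays))                     # kurtosis of the airline delays
```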

Association between two variables

Table: Delay (minutes) and Passengers for each of the 42 flights, laid out as three side-by-side Delay / Passengers column pairs.

Association between two variables

Scatter plot

Covariance

Correlation Coefficient

Scatter Plot

Scatter plots are used to identify any underlying relationships among pairs of data sets.

The plot consists of a scatter of points, each point representing an observation.

Scatter Plot

Covariance

The covariance is a measure of the linear association between two variables.

Positive values indicate a positive relationship.

Negative values indicate a negative relationship.

Covariance

If the data sets are samples, the covariance is denoted by $s_{xy}$:

$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

If the data sets are populations, the covariance is denoted by $\sigma_{xy}$:

$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$

Covariance = 20.42 in the airline example
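The two covariance formulas differ only in the divisor, as the sketch below shows. The five (delay, passengers) pairs are hypothetical values made up for illustration; they are not the airline data, and the function names are likewise illustrative.

```python
# Hypothetical paired observations, NOT the airline data set.
hypothetical_delays = [5, 12, 20, 34, 45]
hypothetical_passengers = [50, 61, 57, 70, 66]


def _sum_of_cross_deviations(xs, ys):
    """Sum of (x_i - mean_x) * (y_i - mean_y) over all pairs."""
    if len(xs) != len(ys):
        raise ValueError("both data sets need the same number of observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def sample_covariance(xs, ys):
    return _sum_of_cross_deviations(xs, ys) / (len(xs) - 1)  # divide by n - 1


def population_covariance(xs, ys):
    return _sum_of_cross_deviations(xs, ys) / len(xs)        # divide by N


print(sample_covariance(hypothetical_delays, hypothetical_passengers))
print(population_covariance(hypothetical_delays, hypothetical_passengers))
```

A positive result means the two variables tend to move together. The size of the number depends on the units of both variables, which is why the correlation coefficient is usually reported alongside it.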

Correlation Coefficient

The coefficient can take on values between -1 and +1.

Values near -1 indicate a strong negative linear relationship.

Values near +1 indicate a strong positive linear relationship.

If the data sets are samples, the coefficient is

$r_{xy} = \frac{s_{xy}}{s_x s_y}$

If the data sets are populations, the coefficient is

$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$

Correlation coefficient = 0.121 in the airline example
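The sample coefficient is the covariance divided by the product of the two standard deviations. In the sketch below the n - 1 divisors cancel, so the same value results from the population formulas. The paired values are hypothetical, made up for illustration, and are not the airline data.

```python
from math import sqrt

# Hypothetical paired observations, NOT the airline data set.
hypothetical_delays = [5, 12, 20, 34, 45]
hypothetical_passengers = [50, 61, 57, 70, 66]


def correlation(xs, ys):
    """Correlation coefficient: covariance over the product of standard deviations."""
    if len(xs) != len(ys):
        raise ValueError("both data sets need the same number of observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # The common divisor (n - 1 or N) cancels between numerator and denominator.
    return sxy / sqrt(sxx * syy)


r = correlation(hypothetical_delays, hypothetical_passengers)
print(round(r, 3))
```

Unlike the covariance, the result has no units and always lies between -1 and +1, so it can be compared across data sets.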
