rohit-sharma
7/25/2019 mba15 -1.ppt
Business Statistics
Why statistics?
Decision making is often based on
analysis of data.
Statistics helps you make sense of the data by using tools that summarize,
present and analyze the data.
A decision maker can also ascertain the level of
confidence in the decisions.
Examples
How many newspapers should the vendor stock to maximize revenue?
- Depends on the probability distribution of demand and expected profit
Are two or more market segments significantly different?
- Hypothesis testing
What proportion of people are happy with the Sixth Pay Commission report?
- Parameter estimation
Sample vs. Population
Population is the entire group/collection of individuals/objects/things that we want information about.
Sample is the part of the population that we actually examine to gather information.
Example - We wish to find the average dividend percentage of all companies traded at NSE.
All stocks traded at NSE comprise the population.
The fraction of those stocks selected for gathering information is the sample.
Subdivision within Statistics
Descriptive Statistics
- Collect
- Organize
- Summarize
- Display
- Analyze
Inferential Statistics
- Predict and forecast values of population parameters
- Test hypotheses about values of population parameters
- Make decisions
Descriptive statistics - data and frequency distribution
The following are the departure delays in minutes of 42 flights selected
at random from a particular airport.
+ +1 02
+3 4 0
+3
1 02
52 34 67
0 07 22
26 2
02 2 17
2 +2 16
30 +1 12
04 0 12
2 01 04
23 00 13
26 06 11
Frequency Distribution
Table with two columns listing:
- Each and every group or class or interval of values
- Associated frequency of each group
Number of observations assigned to each group
Sum of frequencies is the number of observations
Class midpoint is the middle value of a group or class or interval
Relative frequency is the percentage/proportion of total observations in each class
Sum of relative frequencies = 1
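The definitions above can be sketched in a few lines of Python. This is only an illustration: the class width of 20 minutes is an assumption (the class limits used on the original slide are not preserved), and the values are the 42 delays as they appear in the sorted table of the median example.

```python
# The 42 departure delays (minutes), taken from the sorted table in the median example.
delays = [0, 0, 0, 0, 4, 5, 8, 10, 12, 12, 13, 13, 15, 20,
          22, 23, 25, 25, 26, 27, 34, 38, 40, 40, 42, 44, 45, 45,
          45, 46, 47, 48, 48, 50, 50, 50, 53, 55, 56, 56, 67, 95]

CLASS_WIDTH = 20  # assumption: the slide's own class width is not preserved

n = len(delays)
n_classes = max(delays) // CLASS_WIDTH + 1
frequencies = [0] * n_classes
for d in delays:
    # class k covers the interval [k * width, (k + 1) * width)
    frequencies[d // CLASS_WIDTH] += 1

relative = [f / n for f in frequencies]                       # proportions
midpoints = [(k + 0.5) * CLASS_WIDTH for k in range(n_classes)]

for k in range(n_classes):
    lower, upper = k * CLASS_WIDTH, (k + 1) * CLASS_WIDTH
    print(f"{lower:>3} to under {upper:<3}  midpoint={midpoints[k]:5.1f}  "
          f"frequency={frequencies[k]:2d}  relative={relative[k]:.3f}")
```

A different class width would give a different table; the two sum checks (frequencies add up to n, relative frequencies add up to 1) hold for any choice.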
Frequency distribution
[Table: Delay in minutes | Frequency]
Frequency distribution - histogram
Two variable frequency distribution - cross tabulation
A joint frequency distribution of two variables (e.g. ownership of airline, delay
in minutes)
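A cross tabulation can be sketched by counting pairs of categories. The flight records, the ownership labels and the 30-minute cut-off below are all hypothetical and serve only to show the mechanics.

```python
from collections import Counter

# Hypothetical records: (airline ownership, delay in minutes).
flights = [("public", 5), ("private", 45), ("private", 12),
           ("public", 67), ("public", 30), ("private", 50)]

def delay_class(minutes):
    """Assign a delay to one of two illustrative classes."""
    return "under 30 min" if minutes < 30 else "30 min or more"

# Joint frequency of (ownership, delay class).
table = Counter((owner, delay_class(minutes)) for owner, minutes in flights)

for owner in ("public", "private"):
    for cls in ("under 30 min", "30 min or more"):
        print(f"{owner:<8} {cls:<15} {table[(owner, cls)]}")
```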
Descriptive statistics - measures
Measures of Location
Measures of Variability
Skewness and Kurtosis
Association between two variables
Measures of Location
Arithmetic Mean
Median
Mode
Percentiles
Quartiles
Arithmetic mean
The mean of a data set is the average
of all the data values.
Sample mean: $\bar{x} = \frac{\sum x_i}{n}$
Population mean: $\mu = \frac{\sum x_i}{N}$
Mean - example
Average delay in flight departure
$\bar{x} = 1354/42 = 32.2381$ minutes
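The arithmetic can be checked with a short Python sketch. The list holds the 42 delays as they appear in the sorted table of the median example; the variable names are illustrative.

```python
# The 42 departure delays (minutes), taken from the sorted table in the median example.
delays = [0, 0, 0, 0, 4, 5, 8, 10, 12, 12, 13, 13, 15, 20,
          22, 23, 25, 25, 26, 27, 34, 38, 40, 40, 42, 44, 45, 45,
          45, 46, 47, 48, 48, 50, 50, 50, 53, 55, 56, 56, 67, 95]

total = sum(delays)          # sum of all observations
n = len(delays)              # number of observations
mean = total / n             # sample mean

print(f"{total}/{n} = {mean:.4f} minutes")
```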
Median
It is the middle item in a data set that is arranged in ascending/descending order.
If there are n observations then the
median = (n+1)/2 th observation.
Computation rule
- if n is odd then (n+1)/2 is an integer
- if n is even then use the average of the n/2 th and (n/2 + 1) th observations
Example
Sorted 42 observations (read down each column)
Median is the average of the 21st and 22nd observations
= (34+38)/2
= 36
 0   22   45
 0   23   46
 0   25   47
 0   25   48
 4   26   48
 5   27   50
 8   34   50
10   38   50
12   40   53
12   40   55
13   42   56
13   44   56
15   45   67
20   45   95
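The computation rule for odd and even n can be written as a small function. This is a sketch of the rule stated on the median slide, applied to the 42 sorted delays; the function name is illustrative.

```python
def median(values):
    """Median by the (n+1)/2 rule: middle item if n is odd,
    average of the n/2 th and (n/2 + 1) th items if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]           # 1-based position (n+1)/2
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

delays = [0, 0, 0, 0, 4, 5, 8, 10, 12, 12, 13, 13, 15, 20,
          22, 23, 25, 25, 26, 27, 34, 38, 40, 40, 42, 44, 45, 45,
          45, 46, 47, 48, 48, 50, 50, 50, 53, 55, 56, 56, 67, 95]

print(median(delays))  # average of the 21st and 22nd observations
```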
Mode
Mode is the most frequently occurring observation
- mode in the example is 0
The greatest frequency can occur at two or more different values.
If the data have exactly two modes, the
data are bimodal.
If the data have more than two modes, the
data are multimodal.
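A minimal sketch of finding the mode, or all modes when the greatest frequency is tied, using a frequency count. The helper name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    """Return every value that attains the greatest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

delays = [0, 0, 0, 0, 4, 5, 8, 10, 12, 12, 13, 13, 15, 20,
          22, 23, 25, 25, 26, 27, 34, 38, 40, 40, 42, 44, 45, 45,
          45, 46, 47, 48, 48, 50, 50, 50, 53, 55, 56, 56, 67, 95]

print(modes(delays))           # a single mode for the delay data
print(modes([1, 1, 2, 2, 3]))  # two modes: a bimodal data set
```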
Percentiles and Quartiles
Given any set of ordered numerical observations, the Pth percentile in the ordered set is that
value below which lie P% (P percent) of the
observations in the set.
The position of the Pth percentile is given by
(n+1)P/100, where n is the number of observations in the set.
Example
Calculate the 45th percentile of the airline
delay data
The position of the 45th percentile is
45*(42+1)/100 = 19.35th
Value of the 45th percentile
= 19th observation + 0.35 of (20th - 19th observation)
= 26 + 0.35(27-26) = 26.35
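The position rule (n+1)P/100 with linear interpolation between neighbouring observations can be sketched as follows. Clamping to the smallest or largest observation when the position falls outside 1..n is an assumption; the slides do not cover that case.

```python
def percentile(values, p):
    """Pth percentile using position (n+1)P/100 and linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) * p / 100       # 1-based, possibly fractional
    k = int(position)                  # whole part: the kth observation
    fraction = position - k
    if k < 1:                          # assumption: clamp below the first observation
        return ordered[0]
    if k >= n:                         # assumption: clamp above the last observation
        return ordered[-1]
    return ordered[k - 1] + fraction * (ordered[k] - ordered[k - 1])

delays = [0, 0, 0, 0, 4, 5, 8, 10, 12, 12, 13, 13, 15, 20,
          22, 23, 25, 25, 26, 27, 34, 38, 40, 40, 42, 44, 45, 45,
          45, 46, 47, 48, 48, 50, 50, 50, 53, 55, 56, 56, 67, 95]

print(round(percentile(delays, 45), 2))  # position 19.35 in the sorted data
```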
Quartiles
Quartiles are special names for percentiles
Q1 = 25th percentile
Q2 = 50th percentile = median
Q3 = 75th percentile
Measures of Variability
Interquartile range
The interquartile range of a data set is the
difference between the third quartile and the first
quartile.
It is the range for the middle 50% of the data.
It overcomes the sensitivity to extreme data
values.
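As a sketch, the quartiles and the interquartile range of the delay data can be obtained with Python's standard library. `statistics.quantiles` with `method="exclusive"` places the quartiles at positions (n+1)P/100, which matches the percentile rule given earlier; choosing that method is an assumption about how the deck intends quartiles to be computed.

```python
import statistics

delays = [0, 0, 0, 0, 4, 5, 8, 10, 12, 12, 13, 13, 15, 20,
          22, 23, 25, 25, 26, 27, 34, 38, 40, 40, 42, 44, 45, 45,
          45, 46, 47, 48, 48, 50, 50, 50, 53, 55, 56, 56, 67, 95]

# Three cut points that split the data into four groups: Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(delays, n=4, method="exclusive")
iqr = q3 - q1  # range covered by the middle 50% of the data

print(f"Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}")
```

The single 95-minute delay stretches the full range to 95 minutes but has no effect on the interquartile range.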
Variance
The variance is a measure of variability
that utilizes all the data.
It is based on the difference between the
value of each observation ($x_i$) and the
mean ($\bar{x}$ for a sample, $\mu$ for a population).
Population variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Standard deviation
The standard deviation of a data set is the positive square root of the variance.
It is measured in the same units as the
data, making it more easily comparable to the mean than the variance is.
If the data set is a sample, the standard
deviation is denoted $s$.
If the data set is a population, the standard
deviation is denoted $\sigma$ (sigma).
Coefficient of Variation
The coefficient of variation indicates how large the
standard deviation is in relation to the mean.
If the data set is a sample, the coefficient of variation
is computed as follows:
$\frac{s}{\bar{x}} (100)$
If the data set is a population, the coefficient of
variation is computed as follows:
$\frac{\sigma}{\mu} (100)$
Example
Variance
= 465.89 minutes squared
Standard Deviation
= 21.585 minutes
Coefficient of Variation
= 21.584/32.2381 (100) = 66.95%
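These three figures can be reproduced with the sample formulas (divisor n - 1). The sketch below uses Python's `statistics` module on the 42 sorted delays; the variable names are illustrative.

```python
import statistics

delays = [0, 0, 0, 0, 4, 5, 8, 10, 12, 12, 13, 13, 15, 20,
          22, 23, 25, 25, 26, 27, 34, 38, 40, 40, 42, 44, 45, 45,
          45, 46, 47, 48, 48, 50, 50, 50, 53, 55, 56, 56, 67, 95]

mean = statistics.mean(delays)
variance = statistics.variance(delays)  # sample variance, divisor n - 1
std_dev = statistics.stdev(delays)      # positive square root of the sample variance
cv = std_dev / mean * 100               # coefficient of variation, in percent

print(f"variance = {variance:.2f} minutes squared")
print(f"standard deviation = {std_dev:.3f} minutes")
print(f"coefficient of variation = {cv:.2f}%")
```

For a population rather than a sample, `statistics.pvariance` and `statistics.pstdev` apply the divisor N instead.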
Skewness
- Skewness characterizes the degree of
asymmetry of a distribution around its
mean
Positively skewed
Symmetric or unskewed
Negatively skewed
Skewness
Negatively skewed
Skewness
Symmetric
Skewness
Positively skewed
Skewness - measure
Skewness of a distribution is measured by
$\gamma_1 = \frac{\sum (X - \mu)^3}{N \sigma^3}$
For a given data set you may use [formula missing in source]
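As an illustration, the third standardized moment (the mean of the cubed standardized deviations, with divisor N) can be computed directly. This is a sketch under the assumption that the population-style formula is intended; spreadsheet and statistics packages often apply a small-sample correction and will report slightly different numbers. The function name is illustrative.

```python
def skewness(values):
    """Third standardized moment with divisor N (no small-sample correction)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n   # population variance
    m3 = sum((x - mu) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5                         # undefined if all values are equal

print(skewness([1, 2, 3, 4, 5]))    # symmetric data
print(skewness([1, 1, 1, 10]))      # long right tail: positively skewed
print(skewness([1, 10, 10, 10]))    # long left tail: negatively skewed
```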
Kurtosis
Kurtosis characterizes the relative
peakedness or flatness of a symmetric
distribution compared to the normal
distribution
Platykurtic (relatively flat)
Mesokurtic (normal)
Leptokurtic (relatively peaked)
Kurtosis
Platykurtic - flat distribution
Kurtosis
Mesokurtic - not too flat and not too peaked
Kurtosis
Leptokurtic - peaked distribution
Kurtosis - measure
Kurtosis for a distribution is measured by
$\gamma_2 = \beta_2 - 3$
where
$\beta_2 = \frac{\sum (X - \mu)^4}{N \sigma^4}$
For a given data set you may use [formula missing in source]
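A sketch of the fourth standardized moment minus 3 (excess kurtosis, divisor N), so that a normal distribution scores 0, flatter distributions score below 0 and more peaked ones above 0. The population-style formula is an assumption, and packages that apply a small-sample correction will give somewhat different values. The function name is illustrative.

```python
def excess_kurtosis(values):
    """Fourth standardized moment minus 3, with divisor N (no small-sample correction)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n   # population variance
    m4 = sum((x - mu) ** 4 for x in values) / n   # fourth central moment
    return m4 / m2 ** 2 - 3                       # undefined if all values are equal

flat = [1, 2, 3, 4, 5]                  # evenly spread values: platykurtic
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 10]   # mass concentrated at the centre: leptokurtic

print(excess_kurtosis(flat))
print(excess_kurtosis(peaked))
```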
Association between two variables
Delay  Passengers  Delay  Passengers  Delay  Passengers
23 62 26 2+ 2 64
0 6+ 01 2 71
06 23 12 27 34 70
62 +3 27 22 64
11 02 0 20 02 73
2 24 4 20 +2 63
00 64 17 62 04 64
+1 62 67 27 22
+1 26 04 61 + 02
12 2 0 2 2 7+
+3 7 02 6+ 26 60
2 73 25 16 6
02 63 30 63 07 6+
13 26 52 05 1 04
Association between two variables
Scatter plot
Covariance
Correlation Coefficient
Scatter Plot
Scatter plots are used to identify any
underlying relationships among pairs of
data sets.
The plot consists of a scatter of points,
each point representing an observation.
Scatter Plot
Covariance
The covariance is a measure of the linear
association between two variables.
Positive values indicate a positive
relationship.
Negative values indicate a negative
relationship.
Covariance
If the data sets are samples, the covariance
is denoted by $s_{xy}$:
$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
If the data sets are populations, the covariance is denoted by $\sigma_{xy}$:
$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$
= [value illegible in source] in the airline example
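The sample and population covariance formulas differ only in the divisor. The sketch below uses five made-up (delay, passengers) pairs, not the airline data, purely to show the arithmetic; the function name is illustrative.

```python
def covariance(xs, ys, sample=True):
    """Covariance of paired observations; divisor n - 1 for a sample, N for a population."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1 if sample else n)

# Hypothetical paired data (not the airline example).
delay = [10, 20, 30, 40, 50]
passengers = [50, 55, 65, 60, 70]

print(covariance(delay, passengers))                # sample covariance
print(covariance(delay, passengers, sample=False))  # population covariance
```

A positive result here reflects that larger delays go with larger passenger counts in the made-up data. The size of the number depends on the units of both variables, which is why the correlation coefficient is used to judge strength.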
Correlation Coefficient
The coefficient can take on values between -1 and +1.
Values near -1 indicate a strong negative linear relationship.
Values near +1 indicate a strong positive linear relationship.
If the data sets are samples, the coefficient is
$r_{xy} = \frac{s_{xy}}{s_x s_y}$
If the data sets are populations, the coefficient is
$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
= [value illegible in source] in the airline example
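A sketch of the sample correlation coefficient as the sample covariance divided by the product of the two sample standard deviations. The data are five made-up (delay, passengers) pairs, not the airline data, and the function name is illustrative.

```python
import math

def correlation(xs, ys):
    """Sample correlation: s_xy / (s_x * s_y), all with divisor n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)

# Hypothetical paired data (not the airline example).
delay = [10, 20, 30, 40, 50]
passengers = [50, 55, 65, 60, 70]

print(round(correlation(delay, passengers), 3))     # strong positive linear relationship
print(round(correlation([1, 2, 3], [6, 4, 2]), 3))  # perfectly negative linear relationship
```

Because the divisors cancel, using N throughout (the population formulas) gives the same coefficient.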