brett-kittridge
The Vision Thing
Power Thirteen
Bivariate Normal Distribution
Outline
• Circles around the origin
• Circles translated from the origin
• Horizontal ellipses around the (translated) origin
• Vertical ellipses around the (translated) origin
• Sloping ellipses
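[Figure: isodensity circles around the origin in the (x, y) plane, with $\mu_x = 0$, $\sigma_x^2 = 1$, $\mu_y = 0$, $\sigma_y^2 = 1$, $\rho_{x,y} = 0$]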
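[Figure: isodensity circles centered at (a, b) in the (x, y) plane, with $\mu_x = a$, $\sigma_x^2 = 1$, $\mu_y = b$, $\sigma_y^2 = 1$, $\rho_{x,y} = 0$]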
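[Figure: horizontal isodensity ellipses around the origin in the (x, y) plane, with $\mu_x = 0$, $\mu_y = 0$, $\sigma_x^2 > \sigma_y^2$, $\rho_{x,y} = 0$]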
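[Figure: vertical isodensity ellipses around the origin in the (x, y) plane, with $\mu_x = 0$, $\mu_y = 0$, $\sigma_x^2 < \sigma_y^2$, $\rho_{x,y} = 0$]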
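[Figure: sloping isodensity ellipses centered at (a, b) in the (x, y) plane, with $\mu_x = a$, $\mu_y = b$, $\sigma_x^2 > \sigma_y^2$, $\rho_{x,y} > 0$]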
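[Figure: sloping isodensity ellipses centered at (a, b) in the (x, y) plane, with $\mu_x = a$, $\mu_y = b$, $\sigma_x^2 > \sigma_y^2$, $\rho_{x,y} < 0$]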
Why? The Bivariate Normal Density and Circles
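• $f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{x-\mu_x}{\sigma_x}\right)^2 - 2\rho \left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right) + \left(\frac{y-\mu_y}{\sigma_y}\right)^2 \right] \right\}$
• If the means are zero, the variances are one, and there is no correlation, then
• $f(x, y) = \frac{1}{2\pi} \exp\left\{ -\frac{1}{2}(x^2 + y^2) \right\}$, where $f(x, y) = k$, a constant, along an isodensity line
• $\ln(2\pi k) = -\frac{1}{2}(x^2 + y^2)$, and $x^2 + y^2 = -2\ln(2\pi k) = r^2$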
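The circle result can be checked numerically. The following is a minimal sketch, assuming an illustrative density level k = 0.05 (not a value from the slides); the function names are hypothetical. It evaluates the standard bivariate normal density at several points on the circle of radius r, where r² = −2 ln(2πk), and each value should equal k.

```python
import math

def standard_bivariate_density(x, y):
    """Bivariate normal density with zero means, unit variances, zero correlation."""
    return math.exp(-0.5 * (x * x + y * y)) / (2.0 * math.pi)

def isodensity_radius(k):
    """Radius r with r^2 = -2 ln(2 pi k); requires 0 < k < 1/(2 pi)."""
    return math.sqrt(-2.0 * math.log(2.0 * math.pi * k))

k = 0.05  # illustrative density level (assumption, not from the slides)
r = isodensity_radius(k)

# Sample a few angles around the circle of radius r and evaluate the density there.
points = [(r * math.cos(t), r * math.sin(t)) for t in (0.0, 1.0, 2.5, 4.0)]
densities = [standard_bivariate_density(x, y) for x, y in points]
```

Every entry of `densities` matches k up to floating-point error, which is what "isodensity line = circle" means here.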
Ellipses
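• If $\sigma_x^2 > \sigma_y^2$ and there is no correlation, $f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left\{ -\frac{1}{2} \left( \left[\frac{x-\mu_x}{\sigma_x}\right]^2 + \left[\frac{y-\mu_y}{\sigma_y}\right]^2 \right) \right\}$; write $x^* = x - \mu_x$ and $y^* = y - \mu_y$
• $f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left\{ -\frac{1}{2} \left( [x^*/\sigma_x]^2 + [y^*/\sigma_y]^2 \right) \right\}$, where $f(x, y) = k$, a constant, so $\ln\{k[2\pi\sigma_x\sigma_y]\} = -\frac{1}{2} \left( [x^*/\sigma_x]^2 + [y^*/\sigma_y]^2 \right)$, which has the form $x^{*2}/c^2 + y^{*2}/d^2 = 1$, an ellipse, with $c^2 = -2\sigma_x^2 \ln(2\pi\sigma_x\sigma_y k)$ and $d^2 = -2\sigma_y^2 \ln(2\pi\sigma_x\sigma_y k)$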
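As a rough check of the ellipse argument, the sketch below computes the semi-axes of an isodensity contour for an uncorrelated bivariate normal and confirms that points on that ellipse share the same density. The parameter values and function names are illustrative assumptions, not taken from the slides.

```python
import math

def uncorrelated_density(x, y, mu_x, mu_y, sigma_x, sigma_y):
    """Bivariate normal density with zero correlation."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    return math.exp(-0.5 * (zx * zx + zy * zy)) / (2.0 * math.pi * sigma_x * sigma_y)

def ellipse_semi_axes(k, sigma_x, sigma_y):
    """Semi-axes (c, d) of the contour f = k; requires k < 1/(2 pi sigma_x sigma_y)."""
    scale = -2.0 * math.log(2.0 * math.pi * sigma_x * sigma_y * k)
    return sigma_x * math.sqrt(scale), sigma_y * math.sqrt(scale)

# Illustrative parameters (assumptions): larger variance in x gives a horizontal ellipse.
mu_x, mu_y, sigma_x, sigma_y = 1.0, -2.0, 2.0, 1.0
k = 0.02
c, d = ellipse_semi_axes(k, sigma_x, sigma_y)

# Points on the ellipse centered at (mu_x, mu_y) with semi-axes c and d.
angles = (0.0, 0.7, 2.0, 3.5, 5.0)
densities = [
    uncorrelated_density(mu_x + c * math.cos(t), mu_y + d * math.sin(t),
                         mu_x, mu_y, sigma_x, sigma_y)
    for t in angles
]
```

With a larger standard deviation in x than in y, c exceeds d, so the contour is stretched horizontally; swapping the two standard deviations gives the vertical case.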
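Correlation and Rotation of the Axes
[Figure: sloping isodensity ellipse in the (x, y) plane with rotated axes X' and Y', with $\mu_x = 0$, $\mu_y = 0$, $\sigma_x^2 < \sigma_y^2$, $\rho_{x,y} < 0$]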
Bivariate Normal: marginal & conditional
• If x and y are independent, then f(x,y) = f(x) f(y), i.e. the product of the marginal distributions, f(x) and f(y)
• The conditional density function, i.e. the density of y conditional on x, $f(y|x)$, is the joint density function divided by the marginal density function of x: $f(y|x) = f(x, y)/f(x)$
Conditional Distribution
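• $f(y|x) = \frac{1}{\sigma_y \sqrt{2\pi(1-\rho^2)}} \exp\left\{ -\frac{1}{2\sigma_y^2(1-\rho^2)} \left[ y - \mu_y - \rho (x - \mu_x)(\sigma_y/\sigma_x) \right]^2 \right\}$
• The mean of the conditional distribution is $\mu_y + \rho (x - \mu_x)(\sigma_y/\sigma_x)$, i.e. the expected value of y for a given value of x, $x = x^*$:
• $E(y|x = x^*) = \mu_y + \rho (x^* - \mu_x)(\sigma_y/\sigma_x)$
• The variance of the conditional distribution is $\mathrm{VAR}(y|x = x^*) = \sigma_y^2 (1 - \rho^2)$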
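The conditional mean and variance can be verified against the definition f(y|x) = f(x, y)/f(x). The sketch below is an illustrative check with made-up parameter values (assumptions, not from the slides) and hypothetical function names: it divides the joint density by the marginal density of x and compares the result with a normal density whose mean is μ_y + ρ(x* − μ_x)(σ_y/σ_x) and whose variance is σ_y²(1 − ρ²).

```python
import math

def normal_pdf(z, mean, var):
    """Univariate normal density with the given mean and variance."""
    return math.exp(-0.5 * (z - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def bivariate_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Bivariate normal density with correlation rho."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / norm

def conditional_moments(x_star, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Mean and variance of y given x = x_star."""
    mean = mu_y + rho * (x_star - mu_x) * sigma_y / sigma_x
    var = sigma_y ** 2 * (1.0 - rho ** 2)
    return mean, var

# Illustrative parameters (assumptions).
mu_x, mu_y, sigma_x, sigma_y, rho = 1.0, 2.0, 1.5, 0.8, 0.6
x_star = 2.2
cond_mean, cond_var = conditional_moments(x_star, mu_x, mu_y, sigma_x, sigma_y, rho)

# Compare joint / marginal with the closed-form conditional density at a few y values.
max_gap = 0.0
for y in (0.5, 1.5, 2.0, 2.7, 3.5):
    ratio = bivariate_pdf(x_star, y, mu_x, mu_y, sigma_x, sigma_y, rho) / normal_pdf(x_star, mu_x, sigma_x ** 2)
    closed_form = normal_pdf(y, cond_mean, cond_var)
    max_gap = max(max_gap, abs(ratio - closed_form))
```

`max_gap` stays at floating-point noise, so the two expressions for the conditional density agree at the sampled points.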
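[Figure: sloping isodensity ellipse centered at (a, b) in the (x, y) plane with the regression line drawn through it; $\mu_x = a$, $\mu_y = b$, $\sigma_x^2 > \sigma_y^2$, $\rho_{x,y} > 0$]
Regression line
intercept: $\mu_y - \mu_x \rho (\sigma_y/\sigma_x)$
slope: $\rho (\sigma_y/\sigma_x)$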
Bivariate Regression: Another Perspective
• The regression line is the $E(y|x)$ line if y and x are bivariate normal
– intercept: $\mu_y - \mu_x \rho (\sigma_y/\sigma_x)$
– slope: $\rho (\sigma_y/\sigma_x)$
Example: Lab Six
[Figure: histogram of the rate of return to GE stock]
Series: GE
Sample: 1993:01 1996:12
Observations: 48
Mean 0.022218
Median 0.019524
Maximum 0.117833
Minimum -0.058824
Std. Dev. 0.043669
Skewness 0.064629
Kurtosis 2.231861
Jarque-Bera 1.213490
Probability 0.545122
Example: Lab Six
[Figure: histogram of the rate of return to the S&P 500 Index]
Series: INDEX
Sample: 1993:01 1996:12
Observations: 48
Mean 0.014361
Median 0.017553
Maximum 0.076412
Minimum -0.044581
Std. Dev. 0.025430
Skewness -0.453474
Kurtosis 3.222043
Jarque-Bera 1.743715
Probability 0.418174
Correlation Matrix
         GE         INDEX
GE       1.000000   0.636290
INDEX    0.636290   1.000000
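Bivariate Regression: Another Perspective
• The regression line is the $E(y|x)$ line if y and x are bivariate normal
– intercept: $\mu_y - \mu_x \rho (\sigma_y/\sigma_x)$
– slope: $\rho (\sigma_y/\sigma_x)$
$\mu_y = 0.022218$ (GE)
$\mu_x = 0.014361$ (INDEX)
$\rho (\sigma_y/\sigma_x) = 0.636290 \times (0.043669/0.025430) \approx 1.09$
– intercept = 0.0064
– slope = 1.094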
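The slope and intercept can be recomputed from the summary statistics reported for GE (y) and INDEX (x). This is a sketch using only the means, standard deviations, and correlation shown in the Lab Six output; the variable names are hypothetical. It gives roughly 1.09 for the slope and 0.0065 for the intercept, close to the 1.094 and 0.0064 quoted on the slide. The slides do not explain the small difference.

```python
# Summary statistics copied from the Lab Six output.
mean_ge, sd_ge = 0.022218, 0.043669        # y: rate of return to GE stock
mean_index, sd_index = 0.014361, 0.025430  # x: rate of return to the S&P 500 index
rho = 0.636290                             # correlation between GE and INDEX

# Regression line E(y|x) under bivariate normality.
slope = rho * sd_ge / sd_index
intercept = mean_ge - slope * mean_index

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

The standard deviation of y sits in the numerator of the slope, so regressing the more volatile GE return on the index return scales the correlation up rather than down.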
[Figure: Returns Generating Process for GE Stock and S&P 500 Index; scatter plot of GE returns against INDEX returns]
[Figure: annotated for comparison with the values above: intercept vs. 0.0064, slope vs. 1.094]
Bivariate Normal Distribution and the Linear Probability Model
[Figure: isodensity ellipses for players and non-players in the income-education plane, with $\mu_x = a$, $\mu_y = b$, $\sigma_x^2 > \sigma_y^2$, $\rho_{x,y} > 0$; mean income and mean education are marked for each group]
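[Figure: isodensity ellipses for players and non-players in the income-education plane, with the discriminating line drawn between the two groups; $\mu_x = a$, $\mu_y = b$, $\sigma_x^2 > \sigma_y^2$, $\rho_{x,y} > 0$; mean income and mean education are marked for each group]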
Discriminant Function, Linear Probability Function, and Decision Theory, Lab 6
• Expected costs of misclassification
– $E(C) = C(P|N)\,P(P|N)\,P(N) + C(N|P)\,P(N|P)\,P(P)$
• Assume $C(P|N) = C(N|P)$
• Relative frequencies: $P(N) = 23/100 \approx 1/4$, $P(P) = 77/100 \approx 3/4$
• Equalize the two costs of misclassification by setting the fitted value of $P(P|N)$, i.e. Bern, to 3/4
– $E(C) = C(P|N)(3/4)(1/4) + C(N|P)(1/4)(3/4)$
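The equalization can be checked with exact fractions. The sketch below assumes a common unit cost for both kinds of misclassification (the slide only assumes the two costs are equal, not their size) and uses the rounded frequencies 1/4 and 3/4; the function name is hypothetical.

```python
from fractions import Fraction

def expected_cost(c_pn, c_np, p_pn, p_np, p_n, p_p):
    """E(C) = C(P|N) P(P|N) P(N) + C(N|P) P(N|P) P(P)."""
    return c_pn * p_pn * p_n + c_np * p_np * p_p

# Rounded relative frequencies from the slide.
p_n, p_p = Fraction(1, 4), Fraction(3, 4)
# Misclassification probabilities implied by setting the cutoff at 3/4.
p_pn, p_np = Fraction(3, 4), Fraction(1, 4)
# Equal unit costs (assumption for illustration).
c_pn = c_np = 1

term_non_players = c_pn * p_pn * p_n   # cost from misclassified non-players
term_players = c_np * p_np * p_p       # cost from misclassified players
total = expected_cost(c_pn, c_np, p_pn, p_np, p_n, p_p)
```

Both terms come out to 3/16, so the two groups contribute equally to the expected cost of 3/8 under these assumptions.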
[Figure: isodensity ellipses for players and non-players in the income-education plane, with the discriminating line; $\mu_x = a$, $\mu_y = b$, $\sigma_x^2 > \sigma_y^2$, $\rho_{x,y} > 0$; mean income and mean education are marked for each group]
Note: $P(P|N)$ is the area of the non-players distribution below (southwest of) the line
Set Bern = 3/4 = 1.39 - 0.0216*education - 0.0105*income, solve for education as a function of income, and plot
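Solving the fitted equation for education gives the discriminating line. The sketch below uses the coefficients quoted above; the function name and the sample income values are hypothetical, and the units of income and education are whatever Lab 6 used (not stated here).

```python
def education_at_cutoff(income, cutoff=0.75, const=1.39, b_educ=-0.0216, b_income=-0.0105):
    """Education level at which the fitted value equals `cutoff` for a given income.

    Rearranges cutoff = const + b_educ * education + b_income * income.
    """
    return (cutoff - const - b_income * income) / b_educ

# Evaluate the line at a few illustrative income values (assumed, for plotting).
incomes = [0.0, 10.0, 20.0, 40.0]
line = [(inc, education_at_cutoff(inc)) for inc in incomes]
```

The line works out to approximately education = 29.63 - 0.486*income, so it slopes downward in the income-education plane.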
7 non-players misclassified, as well as 14 players misclassified
Decision Theory
• Moving the discriminant line, i.e. changing the cutoff value from 0.75 to 0.5, changes the numbers of those misclassified, favoring one population at the expense of the other
• You need an implicit or explicit notion of the costs of misclassification, such as $C(P|N)$ and $C(N|P)$, to make the necessary judgement about where to draw the line
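To see what moving the line means geometrically, the sketch below reuses the Lab 6 fitted equation Bern = 1.39 - 0.0216*education - 0.0105*income and compares the discriminating line at the two cutoffs. The function name is hypothetical. It only shows how the line shifts; how many observations change sides depends on the Lab 6 data, which is not reproduced here.

```python
# Coefficients of the fitted linear probability model quoted in the Lab 6 slide.
CONST, B_EDUC, B_INCOME = 1.39, -0.0216, -0.0105

def discriminant_line(cutoff):
    """Return (intercept, slope) of education as a linear function of income
    along the line where the fitted value equals `cutoff`."""
    intercept = (cutoff - CONST) / B_EDUC
    slope = -B_INCOME / B_EDUC
    return intercept, slope

line_075 = discriminant_line(0.75)
line_050 = discriminant_line(0.50)

# Vertical distance between the two parallel lines, in units of education.
shift = line_050[0] - line_075[0]
```

The slope is the same at both cutoffs, so lowering the cutoff from 0.75 to 0.5 moves the line parallel to itself, raising its education intercept by about 11.6. Observations lying between the two lines are the ones whose classification changes.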