508-B (Statistics Camp, Wash U, Summer 2016)
Families of Distributions
Author: Andrés Hincapié and Linyi Cao
This Version: July 21, 2016
Suppose our data X follows a distribution X ∼ P
We could assume, for instance, $P \equiv N(\mu, \sigma^2)$
Hence the pdf is
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}(x-\mu)^2\right\}$$
The distribution is indexed by a fixed number of parameters (here µ and σ²)
Such a distribution is called parametric
We may relax our functional assumptions to
a. E[X] = µ
b. The distribution is symmetric around µ
c. The cdf of X is smooth
This more flexible distribution cannot be characterized by a fixed number of parameters.
Such a distribution is called nonparametric
We will be dealing with parametric distributions
Families are parametric distributions f(x; θ) indexed by a set of parameters θ with a fixed number of elements.
The notation f(x; θ) helps keep track of the vector of parameters θ
Varying θ changes certain characteristics of the distribution while keeping the same functional form
1 Discrete Distributions
R.V. X has a discrete distribution if its support is countable
1.1 Discrete Uniform (1, N)
$$P(X = x) = \frac{1}{N}, \quad x = 1, 2, \ldots, N$$
Mean: $E[X] = \frac{N+1}{2}$
Variance: $Var[X] = \frac{(N+1)(N-1)}{12}$
Needless to say, the mean does not necessarily belong to the support of X
Example: Throwing a die, where each number has p = 1/6
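The die example can be checked directly from the pmf. The following is a minimal sketch, not part of the original slides; the function name `discrete_uniform_moments` is an illustrative assumption. It uses exact fractions so the result can be compared with the closed forms (N+1)/2 and (N+1)(N−1)/12.

```python
from fractions import Fraction


def discrete_uniform_moments(N):
    """Return (mean, variance) of the uniform distribution on {1, ..., N},
    computed directly from the pmf P(X = x) = 1/N."""
    p = Fraction(1, N)
    mean = sum(x * p for x in range(1, N + 1))
    var = sum((x - mean) ** 2 * p for x in range(1, N + 1))
    return mean, var


# Fair die: N = 6
mean, var = discrete_uniform_moments(6)
print(mean, var)  # 7/2 35/12
```

Note that the mean 7/2 is not a value the die can take, which illustrates the remark that the mean need not belong to the support.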
1.2 Binomial (n, p)
Based on the idea of a Bernoulli trial: an experiment with only two possible outcomes, happening with prob. p and 1 − p
If X is Bernoulli, EX = p and VarX = p(1 − p)
A binomial r.v. Y is the number of successes in a sequence of n independent Bernoulli trials
Let n = 3. The sequence A1 = 0, A2 = 0, A3 = 1 (which implies Y = 1) has prob. (1 − p)(1 − p)p
The sequence A1 = 0, A2 = 1, A3 = 0 (which also implies Y = 1) has prob. (1 − p)p(1 − p)
Any particular sequence with y successes has probability $p^y (1-p)^{n-y}$
In how many different sequences can Y = y happen? Using our counting knowledge, how many unordered subsets of y elements without replacement can we form from n labels?
$$\binom{n}{y}$$
Therefore
$$P(Y = y) = \binom{n}{y} p^y (1-p)^{n-y}, \quad y = 0, 1, \ldots, n$$
Example: Probability of obtaining at least one 6 in a sequence of 4 rolls of a fair die.
$EY = np$, $VarY = np(1-p)$, $M_Y(t) = [pe^t + (1-p)]^n$
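The die example above can be worked out as one minus the probability of zero successes. This is a minimal sketch, assuming "success" means rolling a 6 (so p = 1/6 and n = 4); the helper name `binomial_pmf` is illustrative.

```python
from math import comb


def binomial_pmf(y, n, p):
    """P(Y = y) for Y ~ Binomial(n, p)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


n, p = 4, 1 / 6

# P(at least one 6) = 1 - P(no 6 in four rolls) = 1 - (5/6)^4
p_at_least_one_six = 1 - binomial_pmf(0, n, p)
print(round(p_at_least_one_six, 4))  # 0.5177

# The mean computed from the pmf should agree with n * p
mean = sum(y * binomial_pmf(y, n, p) for y in range(n + 1))
```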
1.3 Poisson (λ)
Used, for instance, to model the number of occurrences in a given period of time, or when waiting for occurrences
$$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$
EXAMPLE: An operator receives 5 calls/min on average. X = number of calls in a minute. X is Poisson. What is the prob. of receiving at least two calls in the next minute?
It can be shown that
$$P(X = x) = \frac{\lambda}{x} P(X = x-1), \quad x = 1, 2, \ldots$$
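The operator example and the recursion can both be sketched in a few lines. This assumes λ = 5 (the average number of calls per minute); the function names are illustrative and not from the slides.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam), from the closed form."""
    return exp(-lam) * lam**x / factorial(x)


def poisson_pmf_recursive(x, lam):
    """Same pmf, built from P(X = 0) = e^{-lam} via P(X = k) = (lam / k) P(X = k - 1)."""
    prob = exp(-lam)
    for k in range(1, x + 1):
        prob *= lam / k
    return prob


lam = 5.0
# P(X >= 2) = 1 - P(X = 0) - P(X = 1)
p_at_least_two = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)
print(round(p_at_least_two, 4))  # 0.9596
```

The recursive form avoids computing large factorials, which is one practical reason the recursion is useful.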
2 Continuous Distributions
R.V. X has an uncountable support
2.1 Uniform[a, b]
$$f_X(x) = \begin{cases} \frac{1}{b-a} & \text{if } x \in [a, b] \\ 0 & \text{otherwise} \end{cases}$$
$EX = \frac{b+a}{2}$, $VarX = \frac{(b-a)^2}{12}$
2.2 Gamma (α, β)
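First define the gamma function. For any α > 0
$$\Gamma(\alpha) = \int_0^\infty t^{\alpha-1} e^{-t}\,dt$$
Satisfies
$$\Gamma(\alpha+1) = \alpha\,\Gamma(\alpha), \quad \text{for } \alpha > 0$$
And, for any positive integer n,
$$\Gamma(n) = (n-1)!$$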
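Both properties of the gamma function can be checked numerically. The sketch below relies on Python's standard `math.gamma`; the particular test values of α are arbitrary choices, not from the slides.

```python
from math import factorial, gamma, isclose

# Recursion: Gamma(alpha + 1) = alpha * Gamma(alpha), for a few non-integer and integer alpha
recursion_ok = all(
    isclose(gamma(alpha + 1), alpha * gamma(alpha), rel_tol=1e-12)
    for alpha in (0.5, 1.7, 3.0, 6.25)
)

# Factorial link: Gamma(n) = (n - 1)! for positive integers n
factorial_ok = all(isclose(gamma(n), factorial(n - 1)) for n in range(1, 11))

print(recursion_ok, factorial_ok)  # True True
```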
A random variable has a gamma (α, β) distribution if its pdf is of the form:
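$$f_X(x) = \frac{1}{\Gamma(\alpha)\beta^\alpha} x^{\alpha-1} e^{-x/\beta}, \quad 0 < x < \infty,\ \alpha > 0,\ \beta > 0$$
α is known as the shape parameter and β as the scale parameter (affects spread)
$EX = \alpha\beta$
$VarX = \alpha\beta^2$
$$M_X(t) = \left(\frac{1}{1-\beta t}\right)^\alpha, \quad t < 1/\beta$$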
Important special cases of the Gamma are
Exponential(β) = Gamma(α = 1, β)
Chi-squared with n degrees of freedom = Gamma(α = n/2, β = 2)
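The two special cases can be illustrated by comparing pdfs pointwise. This is a sketch under the scale parameterization used above (β is the scale, so the exponential has mean β); the function names are illustrative, and the chi-squared density is the standard textbook formula.

```python
from math import exp, gamma, isclose


def gamma_pdf(x, alpha, beta):
    """Gamma(alpha, beta) pdf with shape alpha and scale beta."""
    if x <= 0:
        return 0.0
    return x ** (alpha - 1) * exp(-x / beta) / (gamma(alpha) * beta**alpha)


def exponential_pdf(x, beta):
    """Exponential pdf with scale (mean) beta."""
    return exp(-x / beta) / beta if x > 0 else 0.0


def chi_squared_pdf(x, n):
    """Chi-squared pdf with n degrees of freedom."""
    if x <= 0:
        return 0.0
    return x ** (n / 2 - 1) * exp(-x / 2) / (2 ** (n / 2) * gamma(n / 2))


grid = [0.1, 0.5, 1.0, 2.5, 7.0]
exponential_matches = all(
    isclose(gamma_pdf(x, 1.0, 2.0), exponential_pdf(x, 2.0)) for x in grid
)
chi_squared_matches = all(
    isclose(gamma_pdf(x, 3 / 2, 2.0), chi_squared_pdf(x, 3)) for x in grid
)
print(exponential_matches, chi_squared_matches)  # True True
```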
2.3 Normal(µ, σ²) (also called Gaussian)
Very tractable analytically
Familiar bell shape
A large variety of distributions “approach” a Normal in large samples
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}(x-\mu)^2\right\}$$
The pdf does not have a closed-form antiderivative
Φ(z) is notation for the cdf of the standard normal N(0, 1)
If $X \sim N(\mu, \sigma^2)$, then $Z = \frac{X-\mu}{\sigma} \sim N(0, 1)$
If $X \sim N(\mu, \sigma^2)$ and $Y = aX + b$, how is Y distributed?
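For a ≠ 0, a linear transformation of a normal r.v. is again normal: Y ∼ N(aµ + b, a²σ²). A small sketch using Python's standard `statistics.NormalDist`, which supports multiplication and addition by constants, illustrates this; the numbers µ = 1, σ = 2, a = −3, b = 4 are arbitrary assumptions.

```python
from statistics import NormalDist

# NormalDist is parameterized by the standard deviation, so sigma = 2 means sigma^2 = 4
X = NormalDist(mu=1.0, sigma=2.0)
a, b = -3.0, 4.0

# Y = aX + b has mean a*mu + b and standard deviation |a|*sigma
Y = X * a + b
print(Y.mean, Y.stdev)  # 1.0 6.0

# Standardizing X gives the standard normal N(0, 1)
Z = (X - X.mean) / X.stdev
print(Z == NormalDist(0.0, 1.0))  # True
```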
2.4 Lognormal
X is a r.v. whose log is normally distributed: $\log X \sim N(\mu, \sigma^2)$
$EX = \exp\{\mu + \sigma^2/2\}$
$VarX = \exp\{2(\mu + \sigma^2)\} - \exp\{2\mu + \sigma^2\}$
Applications: when the variable of interest is skewed to the right. Modeling income as lognormal (it cannot be negative) allows us to exploit Normal tractability on log(income)
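The lognormal moment formulas can be sanity-checked by simulation. The following is a rough sketch, assuming µ = 0 and σ = 0.5 (arbitrary values) and a fixed seed; sample moments only approximate the theoretical ones, so the comparison is up to sampling error.

```python
import random
from math import exp
from statistics import fmean

mu, sigma = 0.0, 0.5
rng = random.Random(0)  # fixed seed so the sketch is reproducible

# lognormvariate draws exp(N) where N ~ Normal(mu, sigma)
draws = [rng.lognormvariate(mu, sigma) for _ in range(200_000)]

sample_mean = fmean(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)

theory_mean = exp(mu + sigma**2 / 2)
theory_var = exp(2 * (mu + sigma**2)) - exp(2 * mu + sigma**2)

print(sample_mean, theory_mean)
print(sample_var, theory_var)
```

All draws are strictly positive, which is what makes the lognormal a natural candidate for variables such as income.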
2.5 Beta (α, β)
Continuous family on (0, 1)
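$$f(x) = \frac{1}{B(\alpha,\beta)} x^{\alpha-1}(1-x)^{\beta-1}, \quad 0 < x < 1,\ \alpha, \beta > 0$$
B(α, β) is the beta function
$$B(\alpha,\beta) = \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx$$
which satisfies
$$B(\alpha,\beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$
$EX = \frac{\alpha}{\alpha+\beta}$, $VarX = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$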
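The link between the beta function and the gamma function, and the mean formula, can be checked with a simple numerical integral. This is a sketch with α = 2 and β = 3 chosen arbitrarily; `midpoint_integral` is a hypothetical helper implementing a plain midpoint rule, not a library routine.

```python
from math import gamma


def beta_function(alpha, beta):
    """B(alpha, beta) via the gamma-function identity."""
    return gamma(alpha) * gamma(beta) / gamma(alpha + beta)


def midpoint_integral(fn, lo, hi, steps=100_000):
    """Approximate the integral of fn over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return width * sum(fn(lo + (i + 0.5) * width) for i in range(steps))


alpha, beta = 2.0, 3.0
B = beta_function(alpha, beta)  # Gamma(2)Gamma(3)/Gamma(5) = 1/12


def beta_pdf(x):
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / B


# Integral definition of B(alpha, beta) and the mean of the Beta(alpha, beta) distribution
numeric_B = midpoint_integral(lambda x: x ** (alpha - 1) * (1 - x) ** (beta - 1), 0.0, 1.0)
numeric_mean = midpoint_integral(lambda x: x * beta_pdf(x), 0.0, 1.0)

print(numeric_B, B)
print(numeric_mean, alpha / (alpha + beta))
```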
3 Exponential Families
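A family of pdfs or pmfs belongs to an exponential family if it can be written as
$$f(x;\theta) = h(x)\,c(\theta)\exp\left\{\sum_{i=1}^{k} w_i(\theta)\,t_i(x)\right\}$$
where h(x) ≥ 0 and $t_1(x), \ldots, t_k(x)$ are real-valued functions of x (not depending on θ), and c(θ) ≥ 0 and $w_1(\theta), \ldots, w_k(\theta)$ are real-valued functions of θ (not depending on x)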
θ can be a vector or a scalar
Exponential families:
Discrete: binomial, Poisson
Continuous: normal, gamma, and beta
EXAMPLE: Binomial exponential family
$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$$
EXAMPLE: Gamma exponential family
$$f_X(x) = \frac{1}{\Gamma(\alpha)\beta^\alpha} x^{\alpha-1} e^{-x/\beta}, \quad 0 < x < \infty,\ \alpha > 0,\ \beta > 0$$
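One possible way to write these two examples in the form h(x) c(θ) exp{Σ wᵢ(θ) tᵢ(x)} is sketched below; the slides do not spell out the factorization, so the choices of h, c, w and t here are an assumption (other equivalent choices exist). For the binomial, n is treated as known and 0 < p < 1: h(x) = C(n, x), c(p) = (1 − p)ⁿ, w(p) = log(p/(1 − p)), t(x) = x. For the gamma: h(x) = 1 on x > 0, c(α, β) = 1/(Γ(α)βᵅ), w₁ = α − 1, t₁(x) = log x, w₂ = −1/β, t₂(x) = x.

```python
from math import comb, exp, gamma, isclose, log


def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_exp_family(x, n, p):
    h = comb(n, x)            # h(x)
    c = (1 - p) ** n          # c(p)
    w = log(p / (1 - p))      # w(p): the log-odds
    t = x                     # t(x)
    return h * c * exp(w * t)


def gamma_pdf(x, alpha, beta):
    return x ** (alpha - 1) * exp(-x / beta) / (gamma(alpha) * beta**alpha)


def gamma_exp_family(x, alpha, beta):
    h = 1.0                                   # h(x) on x > 0
    c = 1 / (gamma(alpha) * beta**alpha)      # c(alpha, beta)
    w1, t1 = alpha - 1, log(x)                # w_1(theta) t_1(x)
    w2, t2 = -1 / beta, x                     # w_2(theta) t_2(x)
    return h * c * exp(w1 * t1 + w2 * t2)


binomial_matches = all(
    isclose(binomial_pmf(x, 10, 0.3), binomial_exp_family(x, 10, 0.3)) for x in range(11)
)
gamma_matches = all(
    isclose(gamma_pdf(x, 2.5, 1.5), gamma_exp_family(x, 2.5, 1.5)) for x in (0.2, 1.0, 4.0)
)
print(binomial_matches, gamma_matches)  # True True
```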
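If X belongs to an exponential family, then for each j
$$E\left(\sum_{i=1}^{k} \frac{\partial w_i(\theta)}{\partial\theta_j}\, t_i(X)\right) = -\frac{\partial \log c(\theta)}{\partial\theta_j}$$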
EXAMPLE: Binomial
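For the binomial, one way to apply the identity is to take c(p) = (1 − p)ⁿ, w(p) = log(p/(1 − p)) and t(x) = x (an assumed factorization, with n known). Then ∂w/∂p = 1/(p(1 − p)) and −∂ log c/∂p = n/(1 − p), so the identity reads E[X]/(p(1 − p)) = n/(1 − p), which gives EX = np. The sketch below checks this numerically for the arbitrary values n = 10 and p = 0.3.

```python
from math import comb, isclose


def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.3

# Left-hand side: E[(dw/dp) * t(X)] with dw/dp = 1 / (p (1 - p)) and t(x) = x
dw_dp = 1 / (p * (1 - p))
lhs = sum(dw_dp * x * binomial_pmf(x, n, p) for x in range(n + 1))

# Right-hand side: -d/dp log c(p) with log c(p) = n log(1 - p)
rhs = n / (1 - p)

print(isclose(lhs, rhs, rel_tol=1e-9))  # True
```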
4 Location and Scale Families
Location: horizontal shifts of the distribution; scale: stretching or contracting the distribution
First, notice that for any pdf f(x) and arbitrary constants µ and σ > 0,
$$g(x;\mu,\sigma) = \frac{1}{\sigma} f\left(\frac{x-\mu}{\sigma}\right)$$
is a pdf
Proof...
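The key step is the change of variables y = (x − µ)/σ, under which the integral of g over the real line equals the integral of f, which is 1; nonnegativity of g follows from that of f and σ > 0. As a numerical illustration only (not a proof), the sketch below takes f to be the standard normal pdf with the arbitrary values µ = 2 and σ = 3; `midpoint_integral` is a hypothetical helper.

```python
from math import exp, pi, sqrt


def standard_normal_pdf(x):
    return exp(-x * x / 2) / sqrt(2 * pi)


def location_scale_pdf(f, mu, sigma):
    """Build g(x; mu, sigma) = (1 / sigma) f((x - mu) / sigma) from a base pdf f."""
    def g(x):
        return f((x - mu) / sigma) / sigma
    return g


def midpoint_integral(fn, lo, hi, steps=100_000):
    width = (hi - lo) / steps
    return width * sum(fn(lo + (i + 0.5) * width) for i in range(steps))


mu, sigma = 2.0, 3.0
g = location_scale_pdf(standard_normal_pdf, mu, sigma)

# Integrate over mu +/- 12 sigma; the mass outside this range is negligible for the normal
total_mass = midpoint_integral(g, mu - 12 * sigma, mu + 12 * sigma)
print(total_mass)
```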
4.1 Location Families
Let f(x) be any pdf. Then for any constant µ, the family of pdfs
f(x − µ)
is called a location family with location parameter µ
At any x = µ + a, f (x− µ) = f (a)
EXAMPLE: Normal with σ = 1
4.2 Scale Families
Let f(x) be any pdf. Then for any σ > 0, the family of pdfs
(1/σ) f(x/σ)
indexed by the parameter σ is called a scale family with scale parameter σ
EXAMPLE: Normal with µ = 0
4.3 Scale-Location Families
Let f(x) be any pdf. Then for any constant µ and any constant σ > 0, the family of pdfs
$$f(x;\mu,\sigma) = \frac{1}{\sigma} f\left(\frac{x-\mu}{\sigma}\right)$$
indexed by the parameters µ and σ is called a scale-location family
Result: if X follows a location-scale distribution such that
$$f_X(x) = \frac{1}{\sigma} f\left(\frac{x-\mu}{\sigma}\right)$$
then the r.v. $Z = \frac{X-\mu}{\sigma}$ follows a location-scale distribution with location parameter 0 and scale parameter 1
$$f_Z(z) = f(z)$$
5 Expectations and Probabilities
We can think of any probability as an expectation of some indicator function, as we do with the Bernoulli distribution, where
E[X] = Pr[X = 1]
There are some useful inequalities that provide information regarding the relation between expectations and probabilities
5.1 Markov’s Inequality
Let X be a r.v. and let g (x) be a nonnegative function. Then, for any r > 0
$$P(g(X) \ge r) \le \frac{E[g(X)]}{r}$$
Proof...
It is an upper bound for the probability that a non-negative function of a r.v. is greater than or equal to some positive constant.
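As an illustration of Markov's inequality, consider a fair die with g(x) = x and r = 5 (an assumed example, not from the slides). The sketch uses exact fractions; the helper names are illustrative.

```python
from fractions import Fraction

# pmf of a fair die
die = {x: Fraction(1, 6) for x in range(1, 7)}


def expectation(g, pmf):
    return sum(g(x) * p for x, p in pmf.items())


def probability(event, pmf):
    return sum(p for x, p in pmf.items() if event(x))


def g(x):
    return x  # nonnegative on the support of the die


r = 5
lhs = probability(lambda x: g(x) >= r, die)  # P(X >= 5)
rhs = expectation(g, die) / r                # E[X] / 5
print(lhs, rhs)  # 1/3 7/10
```

The bound 7/10 holds but is far from the true probability 1/3, which is typical: Markov's inequality uses only the mean, so it is usually loose.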
5.2 Chebyshev Inequality
Let X be a r.v., c any constant, and d > 0 any constant. Then
$$\Pr(|X - c| \ge d) \le \frac{E[(X-c)^2]}{d^2}$$
EXAMPLE:
$$\Pr\left((X - EX)^2 \ge d^2\right) \le \frac{VarX}{d^2}$$
It really follows from Markov’s Inequality by considering g(X) = (X − EX)² and r = d²
Gives a universal bound on the deviation |X − µ| in terms of σ
$$\Pr(|X-\mu| \ge t\sigma) \le \frac{E[(X-\mu)^2]}{(t\sigma)^2} = \frac{VarX}{(t\sigma)^2} = \frac{1}{t^2}$$
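A quick check of Chebyshev's inequality on a fair die (an assumed example) is sketched below. To stay in exact arithmetic, the t-form is checked in its squared version, P((X − µ)² ≥ t²σ²) ≤ 1/t², which is the same event.

```python
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in die.items())               # 7/2
var = sum((x - mean) ** 2 * p for x, p in die.items())  # 35/12

# Form with c = EX and d = 2: P(|X - EX| >= 2) <= Var X / 4
d = 2
lhs = sum(p for x, p in die.items() if abs(x - mean) >= d)
rhs = var / d**2
print(lhs, rhs)  # 1/3 35/48

# Form with t standard deviations, checked for a few values of t
t_form_holds = all(
    sum(p for x, p in die.items() if (x - mean) ** 2 >= t**2 * var) <= 1 / t**2
    for t in (Fraction(1, 2), Fraction(1), Fraction(3, 2), Fraction(2))
)
print(t_form_holds)  # True
```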
5.3 Stein’s Lemma
Let $X \sim N(\mu, \sigma^2)$ and let g be a differentiable function satisfying $E|g'(X)| < \infty$. Then
$$E[g(X)(X-\mu)] = \sigma^2 E[g'(X)]$$
EXAMPLE: Compute $EX^3$ as
$$E[X^2(X - \mu + \mu)] = E[X^2(X-\mu)] + \mu EX^2$$
where $E[X^2] = VarX + (EX)^2$
It is not used as often as Markov’s inequality but can be very helpful sometimes
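To finish the example: with g(x) = x², Stein's Lemma gives E[X²(X − µ)] = σ²E[2X] = 2µσ², so EX³ = 2µσ² + µ(σ² + µ²) = 3µσ² + µ³. The sketch below checks both the lemma and this closed form by numerical integration, assuming µ = 1 and σ = 2 (arbitrary values); `normal_expectation` is a hypothetical helper based on a midpoint rule.

```python
from math import exp, pi, sqrt


def normal_expectation(fn, mu, sigma, steps=50_000, width=12.0):
    """Approximate E[fn(X)] for X ~ N(mu, sigma^2) with a midpoint rule
    on the interval mu +/- width * sigma."""
    lo = mu - width * sigma
    h = 2 * width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        density = exp(-((x - mu) ** 2) / (2 * sigma**2)) / sqrt(2 * pi * sigma**2)
        total += fn(x) * density
    return total * h


mu, sigma = 1.0, 2.0

# Stein's Lemma with g(x) = x^2 and g'(x) = 2x
stein_lhs = normal_expectation(lambda x: x**2 * (x - mu), mu, sigma)
stein_rhs = sigma**2 * normal_expectation(lambda x: 2 * x, mu, sigma)

# Third moment versus the closed form 3 mu sigma^2 + mu^3
third_moment = normal_expectation(lambda x: x**3, mu, sigma)
closed_form = 3 * mu * sigma**2 + mu**3

print(stein_lhs, stein_rhs)
print(third_moment, closed_form)
```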
5.4 Jensen’s Inequality
Recall a function h(x) is concave (convex) if for all x, y and 0 < λ < 1
$$h(\lambda x + (1-\lambda)y) \ge (\le)\ \lambda h(x) + (1-\lambda)h(y)$$
Let X be a r.v. and h be a concave (convex) function. Then,
$$E[h(X)] \le (\ge)\ h(E[X])$$
Related to preference over risk.
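A small numerical illustration of Jensen's inequality on a fair die (an assumed example) uses the concave function log and the convex function x². Reading log as a utility function is one common way to see the link to risk: an agent with concave utility weakly prefers receiving E[X] for sure to facing the gamble X.

```python
from math import log

outcomes = range(1, 7)
prob = 1 / 6  # fair die

mean = sum(x * prob for x in outcomes)

# Concave h(x) = log(x): E[h(X)] <= h(E[X])
e_log = sum(log(x) * prob for x in outcomes)
log_of_mean = log(mean)

# Convex h(x) = x^2: E[h(X)] >= h(E[X])
e_square = sum(x**2 * prob for x in outcomes)
square_of_mean = mean**2

print(e_log <= log_of_mean, e_square >= square_of_mean)  # True True
```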