Families of Distributions - Washington University in St. Louis

508-B (Statistics Camp, Wash U, Summer 2016)

Families of Distributions

Author: Andrés Hincapié and Linyi Cao

This Version: July 21, 2016


Suppose our data X follows a distribution X ∼ P

We could assume, for instance, P ≡ N(µ, σ²)

Hence the pdf is

\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2} (x - \mu)^2 \right\} \]

The distribution is indexed by a fixed number of parameters (here µ and σ²)

Such a distribution is called parametric



We may relax our functional assumptions to

a. E [X ] = µ

b. Distribution is symmetric around µ

c. cdf of X is smooth

This more flexible distribution cannot be characterized by a fixed number of parameters.

Such a distribution is called nonparametric



We will be dealing with parametric distributions

Families are parametric distributions f(x; θ) indexed by a set of parameters θ with a fixed number of elements.

The notation f(x; θ) helps keep track of the vector of parameters θ

Varying θ changes certain characteristics of the distribution while remaining within the same functional form



1 Discrete Distributions

R.V. X has a discrete distribution if its support is countable

1.1 Discrete Uniform (1, N)

\[ P(X = x) = \frac{1}{N}, \quad x = 1, 2, \ldots, N \]

Mean: $E[X] = \frac{N+1}{2}$

Variance: $\operatorname{Var}[X] = \frac{(N+1)(N-1)}{12}$

Note that the mean does not necessarily belong to the support of X

Example: throwing a fair die. Each number has p = 1/6, and E[X] = 3.5 is not in the support
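The mean and variance formulas for the discrete uniform can be checked directly from the pmf. The following is a minimal sketch in Python using exact fractions; the helper name `discrete_uniform_moments` is hypothetical and introduced only for illustration.

```python
from fractions import Fraction

def discrete_uniform_moments(N):
    """Return (mean, variance) of the discrete uniform on 1..N, computed from the pmf."""
    p = Fraction(1, N)                      # P(X = x) = 1/N for every x in the support
    support = range(1, N + 1)
    mean = sum(p * x for x in support)
    var = sum(p * (x - mean) ** 2 for x in support)
    return mean, var

# Fair die: N = 6
mean, var = discrete_uniform_moments(6)
print(mean, var)                            # 7/2 35/12
```

For N = 6 this agrees with (N+1)/2 = 7/2 and (N+1)(N−1)/12 = 35/12.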



1.2 Binomial (n, p)

Based on the idea of a Bernoulli trial: an experiment with only two possible outcomes, happening with prob. p and 1 − p

If X is Bernoulli, EX = p and VarX = p(1 − p)

A binomial r.v. Y is the number of successes in a sequence of n independent Bernoulli trials $A_1, \ldots, A_n$

Let n = 3. The sequence $A_1 = 0, A_2 = 0, A_3 = 1$ (which implies Y = 1) has prob. $(1-p)(1-p)p$

The sequence $A_1 = 0, A_2 = 1, A_3 = 0$ (which also implies Y = 1) has prob. $(1-p)p(1-p)$



Any particular sequence with y successes has probability $p^y (1-p)^{n-y}$

In how many different sequences can Y = y happen? Using our counting knowledge: how many unordered subsets of y elements, without replacement, can we form from n labels?

\[ \binom{n}{y} \]

Therefore

\[ P(Y = y) = \binom{n}{y} p^y (1-p)^{n-y}, \quad y = 0, 1, \ldots, n \]

Example: Probability of obtaining at least one 6 in a sequence of 4 rolls of a fair die.

$EY = np$, $\operatorname{Var} Y = np(1-p)$, $M_Y(t) = [p e^t + (1-p)]^n$
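The die example can be worked out from the binomial pmf: with n = 4 rolls and success probability p = 1/6, the probability of at least one 6 is 1 − P(Y = 0). A minimal sketch, assuming a hypothetical helper `binomial_pmf` and exact fractions:

```python
from fractions import Fraction
from math import comb

def binomial_pmf(y, n, p):
    """P(Y = y) for Y ~ Binomial(n, p)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

n, p = 4, Fraction(1, 6)                    # 4 rolls; "success" = rolling a 6
p_at_least_one = 1 - binomial_pmf(0, n, p)  # 1 - (5/6)^4
print(p_at_least_one)                       # 671/1296

# Moments recovered from the pmf, to compare with np and np(1 - p)
mean = sum(y * binomial_pmf(y, n, p) for y in range(n + 1))
var = sum((y - mean) ** 2 * binomial_pmf(y, n, p) for y in range(n + 1))
print(mean, var)                            # 2/3 5/9
```

So the probability is 671/1296, roughly 0.518.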



1.3 Poisson (λ)

Used, for instance, to model the number of occurrences in a given period of time, or waiting for occurrences

\[ P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots \]

EXAMPLE: an operator receives 5 calls/min on average. X = number of calls in a minute. X is Poisson with λ = 5. What is the prob. of receiving at least two calls in the next minute?

It can be shown that

\[ P(X = x) = \frac{\lambda}{x} P(X = x - 1), \quad x = 1, 2, \ldots \]
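The operator example reduces to 1 − P(X = 0) − P(X = 1) with λ = 5. The sketch below computes it and also builds the pmf through the recursion; the function names are hypothetical.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam), from the closed-form pmf."""
    return exp(-lam) * lam**x / factorial(x)

def poisson_pmf_recursive(x, lam):
    """Same pmf, built up with P(X = x) = (lam / x) * P(X = x - 1)."""
    prob = exp(-lam)                 # P(X = 0)
    for k in range(1, x + 1):
        prob *= lam / k
    return prob

lam = 5.0
p_at_least_two = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)   # 1 - 6 e^{-5}
print(round(p_at_least_two, 4))      # 0.9596
```

The recursion avoids computing large factorials and powers separately, which is convenient when many consecutive probabilities are needed.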



2 Continuous Distributions

R.V. X has an uncountable support

2.1 Uniform[a, b]

\[ f_X(x) = \begin{cases} 1/(b-a) & \text{if } x \in [a, b] \\ 0 & \text{otherwise} \end{cases} \]

$EX = \frac{b+a}{2}$, $\operatorname{Var} X = \frac{(b-a)^2}{12}$



2.2 Gamma (α, β)

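First define the gamma function. For any α > 0

\[ \Gamma(\alpha) = \int_0^\infty t^{\alpha-1} e^{-t} \, dt \]

It satisfies

\[ \Gamma(\alpha + 1) = \alpha \Gamma(\alpha), \quad \text{for } \alpha > 0 \]

and, for any positive integer n,

\[ \Gamma(n) = (n-1)! \]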
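These properties can be checked numerically with `math.gamma` from the Python standard library. The sketch below also approximates the defining integral with a midpoint rule; `gamma_integral`, its truncation point and its step count are illustrative assumptions, and it is only meant for α ≥ 1, where the integrand is bounded.

```python
from math import exp, factorial, gamma, isclose

def gamma_integral(alpha, upper=60.0, steps=200_000):
    """Midpoint-rule approximation of the integral defining Gamma(alpha), truncated at `upper`."""
    h = upper / steps
    return h * sum(((i + 0.5) * h) ** (alpha - 1) * exp(-(i + 0.5) * h) for i in range(steps))

# Recursion: Gamma(alpha + 1) = alpha * Gamma(alpha)
recursion_ok = all(isclose(gamma(a + 1), a * gamma(a)) for a in (0.5, 1.7, 3.0))

# Factorial property: Gamma(n) = (n - 1)!
factorial_ok = all(isclose(gamma(n), factorial(n - 1)) for n in range(1, 10))

# The integral definition agrees with math.gamma for a non-integer alpha
integral_ok = isclose(gamma_integral(2.5), gamma(2.5), rel_tol=1e-5)

print(recursion_ok, factorial_ok, integral_ok)
```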



A random variable has a gamma (α, β) distribution if its pdf is of the form:

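\[ f_X(x) = \frac{1}{\Gamma(\alpha) \beta^\alpha} x^{\alpha-1} e^{-x/\beta}, \quad 0 < x < \infty, \ \alpha > 0, \ \beta > 0 \]

α is known as the shape parameter and β as the scale parameter (affects spread)

$EX = \alpha\beta$

$\operatorname{Var} X = \alpha\beta^2$

\[ M_X(t) = \left( \frac{1}{1 - \beta t} \right)^\alpha, \quad t < 1/\beta \]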



Important special cases of the Gamma are

Exponential (β)= Gamma (α = 1, β)

Chi-squared with n degrees of freedom = Gamma(α = n/2, β = 2)
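As a rough numerical check of the gamma moments and of the exponential special case, the sketch below integrates the gamma pdf with a simple midpoint rule. The helper names, the parameter values α = 3, β = 2, and the truncation of the integral at 200 are assumptions made for illustration.

```python
from math import exp, gamma, isclose

def gamma_pdf(x, alpha, beta):
    """Gamma(alpha, beta) pdf with shape alpha and scale beta, for x > 0."""
    return x ** (alpha - 1) * exp(-x / beta) / (gamma(alpha) * beta**alpha)

def integrate(fn, lower, upper, steps=200_000):
    """Midpoint-rule approximation of the integral of fn over [lower, upper]."""
    h = (upper - lower) / steps
    return h * sum(fn(lower + (i + 0.5) * h) for i in range(steps))

alpha, beta = 3.0, 2.0
mean = integrate(lambda x: x * gamma_pdf(x, alpha, beta), 0.0, 200.0)
second = integrate(lambda x: x * x * gamma_pdf(x, alpha, beta), 0.0, 200.0)
var = second - mean**2
print(round(mean, 4), round(var, 4))        # close to alpha*beta = 6 and alpha*beta^2 = 12

# Exponential(beta) is Gamma(alpha = 1, beta): the pdf reduces to (1/beta) exp(-x/beta)
exponential_ok = all(
    isclose(gamma_pdf(x, 1.0, beta), exp(-x / beta) / beta) for x in (0.1, 1.0, 5.0)
)
```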



2.3 Normal (µ, σ²) (also called Gaussian)

Very tractable analytically

Familiar bell shape

A large variety of distributions “approach” a Normal in large samples

\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2} (x - \mu)^2 \right\} \]

The pdf does not have a closed-form antiderivative

Φ(z) is notation for the cdf of the standard normal N(0, 1)

If X ∼ N(µ, σ²), then $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$

If X ∼ N(µ, σ²) and Y = aX + b, how is Y distributed?
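For a ≠ 0, an affine transformation of a normal is again normal: Y = aX + b ∼ N(aµ + b, a²σ²). A small sketch using `statistics.NormalDist` from the Python standard library, which supports multiplication and addition by constants; note that it is parameterized by the standard deviation σ rather than the variance. The numeric values are arbitrary.

```python
from math import isclose
from statistics import NormalDist

mu, sigma = 2.0, 3.0
a, b = -0.5, 4.0

X = NormalDist(mu, sigma)         # second argument is sigma, not sigma^2
Y = X * a + b                     # affine transformation of a normal r.v.
print(Y.mean, Y.stdev)            # 3.0 1.5, i.e. a*mu + b and |a|*sigma

# Standardization: P(X <= x) = Phi((x - mu) / sigma)
x = 5.0
standardized_ok = isclose(X.cdf(x), NormalDist().cdf((x - mu) / sigma))
```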



2.4 Lognormal

X is a r.v. whose log is normally distributed: log X ∼ N(µ, σ²)

\[ EX = \exp\left\{ \mu + \sigma^2/2 \right\} \]

\[ \operatorname{Var} X = \exp\left\{ 2(\mu + \sigma^2) \right\} - \exp\left\{ 2\mu + \sigma^2 \right\} \]

Applications: when the variable of interest is skewed to the right. Modeling income as lognormal (it cannot be negative) allows us to exploit the tractability of the Normal on log(income)
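The lognormal moments can be verified numerically by writing X = exp(µ + σZ) with Z ∼ N(0, 1) and integrating against the standard normal pdf. A minimal sketch; `lognormal_moment`, the integration range and the parameter values are illustrative assumptions.

```python
from math import exp, isclose
from statistics import NormalDist

def lognormal_moment(k, mu, sigma, half_width=12.0, steps=40_000):
    """E[X^k] for log X ~ N(mu, sigma^2), by midpoint integration over Z = (log X - mu) / sigma."""
    phi = NormalDist().pdf                   # standard normal pdf
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps):
        z = -half_width + (i + 0.5) * h
        total += exp(k * (mu + sigma * z)) * phi(z)
    return h * total

mu, sigma = 0.5, 0.8
mean = lognormal_moment(1, mu, sigma)
var = lognormal_moment(2, mu, sigma) - mean**2

mean_ok = isclose(mean, exp(mu + sigma**2 / 2), rel_tol=1e-8)
var_ok = isclose(var, exp(2 * (mu + sigma**2)) - exp(2 * mu + sigma**2), rel_tol=1e-6)
print(mean_ok, var_ok)
```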



2.5 Beta (α, β)

Continuous family on (0, 1)

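\[ f(x) = \frac{1}{B(\alpha, \beta)} x^{\alpha-1} (1-x)^{\beta-1}, \quad 0 < x < 1, \ \alpha, \beta > 0 \]

B(α, β) is the beta function

\[ B(\alpha, \beta) = \int_0^1 x^{\alpha-1} (1-x)^{\beta-1} \, dx \]

which satisfies

\[ B(\alpha, \beta) = \frac{\Gamma(\alpha) \Gamma(\beta)}{\Gamma(\alpha + \beta)} \]

$EX = \frac{\alpha}{\alpha+\beta}$, $\operatorname{Var} X = \frac{\alpha\beta}{(\alpha+\beta)^2 (\alpha+\beta+1)}$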
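The identity between the beta and gamma functions, and the moment formulas, can be checked numerically. The sketch below uses α = 2 and β = 3 as arbitrary illustrative values and a midpoint rule on (0, 1); the helper names are hypothetical.

```python
from math import gamma, isclose

def beta_function(alpha, beta):
    """B(alpha, beta) via the gamma-function identity."""
    return gamma(alpha) * gamma(beta) / gamma(alpha + beta)

def integrate01(fn, steps=100_000):
    """Midpoint-rule approximation of the integral of fn over (0, 1)."""
    h = 1.0 / steps
    return h * sum(fn((i + 0.5) * h) for i in range(steps))

alpha, beta = 2.0, 3.0
kernel = lambda x: x ** (alpha - 1) * (1 - x) ** (beta - 1)

B_numeric = integrate01(kernel)                       # integral definition
B_gamma = beta_function(alpha, beta)                  # Gamma(2)Gamma(3)/Gamma(5) = 1/12

pdf = lambda x: kernel(x) / B_gamma
mean = integrate01(lambda x: x * pdf(x))
var = integrate01(lambda x: x * x * pdf(x)) - mean**2
print(round(B_numeric, 6), round(mean, 6), round(var, 6))
```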



3 Exponential Families

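A family of pdfs or pmfs belongs to an exponential family if it can be written as

\[ f(x; \theta) = h(x) \, c(\theta) \exp\left\{ \sum_{i=1}^{k} w_i(\theta) \, t_i(x) \right\} \]

where h(x) ≥ 0 and $t_1(x), \ldots, t_k(x)$ are real-valued functions of x (not depending on θ), and c(θ) ≥ 0 and $w_1(\theta), \ldots, w_k(\theta)$ are real-valued functions of θ (not depending on x)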

θ can be a vector or a scalar

Exponential families:

Discrete: binomial, Poisson

Continuous: normal, gamma, and beta



EXAMPLE: Binomial exponential family

\[ P(X = x) = \binom{n}{x} p^x (1-p)^{n-x} \]

EXAMPLE: Gamma exponential family

\[ f_X(x) = \frac{1}{\Gamma(\alpha) \beta^\alpha} x^{\alpha-1} e^{-x/\beta}, \quad 0 < x < \infty, \ \alpha > 0, \ \beta > 0 \]
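One way to see that these two families fit the definition is to name the pieces explicitly. For the binomial with n known and 0 < p < 1, one possible choice is h(x) = C(n, x), c(p) = (1 − p)^n, w(p) = log(p/(1 − p)), t(x) = x. For the gamma, one may take h(x) = 1, c(α, β) = 1/(Γ(α)β^α), w₁ = α − 1, t₁(x) = log x, w₂ = −1/β, t₂(x) = x. The sketch below evaluates both densities through these pieces; the function names are hypothetical.

```python
from math import comb, exp, gamma, log

def binomial_expfam(x, n, p):
    """Binomial(n, p) pmf written as h(x) c(p) exp{w(p) t(x)}, with n treated as known."""
    h = comb(n, x)                 # h(x) = C(n, x)
    c = (1 - p) ** n               # c(p) = (1 - p)^n
    w = log(p / (1 - p))           # w(p) = log(p / (1 - p))
    t = x                          # t(x) = x
    return h * c * exp(w * t)

def gamma_expfam(x, alpha, beta):
    """Gamma(alpha, beta) pdf written as h(x) c(theta) exp{w1 t1(x) + w2 t2(x)}, for x > 0."""
    h = 1.0
    c = 1.0 / (gamma(alpha) * beta**alpha)
    w1, t1 = alpha - 1, log(x)
    w2, t2 = -1.0 / beta, x
    return h * c * exp(w1 * t1 + w2 * t2)

print(binomial_expfam(2, 5, 0.3), gamma_expfam(1.7, 3.0, 2.0))
```

Both functions should return the same values as the direct formulas for the pmf and pdf.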



If X belongs to an exponential family

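\[ E\left( \sum_{i=1}^{k} \frac{\partial w_i(\theta)}{\partial \theta_j} \, t_i(X) \right) = -\frac{\partial \log c(\theta)}{\partial \theta_j} \]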

EXAMPLE: Binomial
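A sketch of how the binomial example can be worked out, assuming n is known and using the decomposition w(p) = log(p/(1 − p)), t(x) = x, c(p) = (1 − p)^n (one possible choice of the pieces):

```latex
\begin{align*}
\frac{\partial w(p)}{\partial p} &= \frac{\partial}{\partial p} \log\frac{p}{1-p} = \frac{1}{p(1-p)}, \\
\frac{\partial \log c(p)}{\partial p} &= \frac{\partial}{\partial p}\, n \log(1-p) = -\frac{n}{1-p}, \\
E\left[ \frac{X}{p(1-p)} \right] &= \frac{n}{1-p}
\quad \Longrightarrow \quad EX = np.
\end{align*}
```

The identity therefore recovers the binomial mean without summing over the pmf.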



4 Location and Scale Families

Horizontal shifts of the distribution; stretch or contract the distribution

First, notice that for any pdf f (x) and arbitrary constants µ and σ > 0,

\[ g(x; \mu, \sigma) = \frac{1}{\sigma} f\left( \frac{x - \mu}{\sigma} \right) \]

is a pdf

Proof...
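A sketch of one standard argument (not necessarily the proof given in class): g is nonnegative and integrates to one by a change of variables.

```latex
\begin{align*}
g(x; \mu, \sigma) &= \frac{1}{\sigma} f\left( \frac{x - \mu}{\sigma} \right) \ge 0
  && \text{since } f \ge 0 \text{ and } \sigma > 0, \\
\int_{-\infty}^{\infty} \frac{1}{\sigma} f\left( \frac{x - \mu}{\sigma} \right) dx
  &= \int_{-\infty}^{\infty} f(y) \, dy = 1
  && \text{substituting } y = \frac{x - \mu}{\sigma}, \ dx = \sigma \, dy.
\end{align*}
```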



4.1 Location Families

Let f(x) be any pdf. Then for any constant µ, the family of pdfs

f (x− µ)

is called a location family with location parameter µ

At any x = µ + a, f (x− µ) = f (a)

EXAMPLE: Normal with σ = 1

4.2 Scale Families

Let f(x) be any pdf. Then for any σ > 0, the family of pdfs

(1/σ) f (x/σ)

indexed by the parameter σ is called a scale family with scale parameter σ

EXAMPLE: Normal with µ = 0



4.3 Scale-Location Families

Let f(x) be any pdf. Then for any constant µ and any constant σ > 0, the family of pdfs

\[ f(x; \mu, \sigma) = \frac{1}{\sigma} f\left( \frac{x - \mu}{\sigma} \right) \]

indexed by the parameters µ and σ is called a scale-location (or location-scale) family

Result: if X follows a location-scale distribution such that

\[ f_X(x) = \frac{1}{\sigma} f\left( \frac{x - \mu}{\sigma} \right) \]

then the r.v. $Z = \frac{X - \mu}{\sigma}$ follows a location-scale distribution with location parameter 0 and scale parameter 1

\[ f_Z(z) = f(z) \]
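The result can be illustrated numerically with any base pdf. The sketch below assumes a standard exponential base pdf f(z) = e^{−z} for z ≥ 0 (an arbitrary choice made for illustration), checks that the shifted and scaled density integrates to one, and checks that the density of Z = (X − µ)/σ, obtained by change of variables as σ·f_X(µ + σz), equals f(z).

```python
from math import exp, isclose

def f(z):
    """Base pdf: standard exponential (an assumed choice for illustration)."""
    return exp(-z) if z >= 0 else 0.0

def g(x, mu, sigma):
    """Location-scale member (1/sigma) f((x - mu)/sigma) generated by the base pdf f."""
    return f((x - mu) / sigma) / sigma

def integrate(fn, lower, upper, steps=100_000):
    """Midpoint-rule approximation of the integral of fn over [lower, upper]."""
    h = (upper - lower) / steps
    return h * sum(fn(lower + (i + 0.5) * h) for i in range(steps))

mu, sigma = 1.5, 2.0

# g is zero below mu; the upper limit truncates a negligible tail
total = integrate(lambda x: g(x, mu, sigma), mu, mu + 40 * sigma)
print(round(total, 6))

# Density of Z = (X - mu)/sigma via change of variables: f_Z(z) = sigma * f_X(mu + sigma z)
standardized_ok = all(
    isclose(sigma * g(mu + sigma * z, mu, sigma), f(z)) for z in (-1.0, 0.0, 0.3, 2.0)
)
```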



5 Expectations and Probabilities

We can think of any probability as the expectation of an indicator function, as we do with the Bernoulli distribution, where

E[X] = Pr[X = 1]

There are some useful inequalities that provide information regarding the relation between expectations and probabilities

5.1 Markov’s Inequality

Let X be a r.v. and let g (x) be a nonnegative function. Then, for any r > 0

\[ P(g(X) \ge r) \le \frac{E g(X)}{r} \]

Proof...

It is an upper bound for the probability that a nonnegative function of a r.v. is greater than or equal to some positive constant.
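As a concrete check of Markov's Inequality, take X to be the outcome of a fair die and g(x) = x, which is nonnegative on the support. The sketch below compares the exact tail probability with the bound E[X]/r; the setup is an illustrative assumption.

```python
from fractions import Fraction

support = range(1, 7)                       # fair die
p = Fraction(1, 6)
expected = sum(p * x for x in support)      # E[X] = 7/2

def tail_prob(r):
    """Exact P(X >= r) for the fair die."""
    return sum(p for x in support if x >= r)

for r in support:
    bound = expected / r                    # Markov bound E[X] / r
    print(r, tail_prob(r), bound, tail_prob(r) <= bound)
```

The bound holds for every r but can be loose: for r = 1 it equals 7/2, which is larger than any probability.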


5.2 Chebyshev Inequality

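Let X be a r.v., c any constant, and d > 0 any constant. Then

\[ \Pr(|X - c| \ge d) \le E[(X - c)^2] / d^2 \]

EXAMPLE (with c = EX):

\[ \Pr[(X - EX)^2 \ge d^2] \le \operatorname{Var} X / d^2 \]

It follows from Markov's Inequality by considering $g(X) = (X - EX)^2$ and $r = d^2$

Gives a universal bound on the deviation |X − µ| in terms of σ: for any t > 0

\begin{align*}
\Pr[|X - \mu| \ge t\sigma] &\le E[(X - \mu)^2] / (t\sigma)^2 \\
&= \operatorname{Var} X / (t\sigma)^2 \\
&= 1/t^2
\end{align*}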
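The 1/t² bound can be compared with exact probabilities for a specific distribution. The sketch below assumes Y ∼ Binomial(10, 1/2), chosen only for illustration, and evaluates Pr(|Y − µ| ≥ tσ) as Pr((Y − µ)² ≥ t²σ²) so that the arithmetic stays exact.

```python
from fractions import Fraction
from math import comb

n, p = 10, Fraction(1, 2)
pmf = {y: comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(n + 1)}
mu = sum(y * q for y, q in pmf.items())                 # np = 5
var = sum((y - mu) ** 2 * q for y, q in pmf.items())    # np(1 - p) = 5/2

def deviation_prob(t):
    """Pr(|Y - mu| >= t*sigma), computed as Pr((Y - mu)^2 >= t^2 * sigma^2)."""
    return sum(q for y, q in pmf.items() if (y - mu) ** 2 >= t * t * var)

for t in (Fraction(3, 2), 2, 3):
    print(t, deviation_prob(t), Fraction(1) / (t * t))  # exact probability vs. bound 1/t^2
```

For t = 2 the exact probability is 11/512, well below the bound of 1/4, which shows how conservative a distribution-free bound can be.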



5.3 Stein’s Lemma

Let X ∼ N(µ, σ²) and let g be a differentiable function satisfying E|g′(X)| < ∞. Then

\[ E[g(X)(X - \mu)] = \sigma^2 E[g'(X)] \]

EXAMPLE: Compute $EX^3$ as

\[ E\left[ X^2 (X + \mu - \mu) \right] = E\left[ X^2 (X - \mu) \right] + \mu E X^2 \]

where $E[X^2] = \operatorname{Var} X + (EX)^2$

It is not used as often as Markov’s inequality but can be very helpful sometimes
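The EX³ example above can be completed by applying Stein's Lemma with g(x) = x². A sketch of the remaining steps:

```latex
\begin{align*}
E\left[ X^2 (X - \mu) \right] &= \sigma^2 E[2X] = 2\mu\sigma^2
  && \text{(Stein's Lemma with } g(x) = x^2, \ g'(x) = 2x \text{)} \\
\mu E\left[ X^2 \right] &= \mu \left( \sigma^2 + \mu^2 \right) \\
E\left[ X^3 \right] &= 2\mu\sigma^2 + \mu\sigma^2 + \mu^3 = 3\mu\sigma^2 + \mu^3
\end{align*}
```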



5.4 Jensen’s Inequality

Recall a function h(x) is concave (convex) if for all x, y and 0 < λ < 1

\[ h(\lambda x + (1 - \lambda) y) \ge (\le) \ \lambda h(x) + (1 - \lambda) h(y) \]

Let X be a r.v. and h be a concave (convex) function. Then,

\[ E[h(X)] \le (\ge) \ h(E[X]) \]

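Related to preferences over risk: with a concave utility h, E[h(X)] ≤ h(E[X]), so a risk-averse agent weakly prefers the sure amount E[X] to the gamble X.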
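A quick numerical illustration of Jensen's Inequality, assuming X is the outcome of a fair die, with h(x) = log x as a concave function and h(x) = x² as a convex one:

```python
from math import log

support = range(1, 7)                        # fair die
p = 1 / 6
mean = sum(p * x for x in support)           # E[X] = 3.5

e_log = sum(p * log(x) for x in support)     # E[log X], with log concave
e_sq = sum(p * x * x for x in support)       # E[X^2], with x^2 convex

print(e_log <= log(mean))                    # True: E[h(X)] <= h(E[X]) for concave h
print(e_sq >= mean**2)                       # True: E[h(X)] >= h(E[X]) for convex h
```

The gap in the convex case, E[X²] − (E[X])², is exactly the variance of X.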

