  • 1 One parameter exponential families

    The world of exponential families bridges the gap between the Gaussian family and general distributions. Many properties of Gaussians carry through to exponential families in a fairly precise sense.

    In the Gaussian world, there are exact small-sample distributional results (e.g. $t$, $F$, $\chi^2$).

    In the exponential family world, there are approximate distributional results (e.g. deviance tests).

    In the general setting, we can only appeal to asymptotics.

    A one-parameter exponential family, $\mathcal{F}$, is a one-parameter family of distributions of the form

    $$P_\eta(dx) = \exp(\eta\, t(x) - \Lambda(\eta))\, P_0(dx)$$

    for some probability measure $P_0$. The parameter $\eta$ is called the natural or canonical parameter and the function $\Lambda$ is called the cumulant generating function, and is simply the normalization needed to make

    $$f_\eta(x) = \frac{dP_\eta}{dP_0}(x) = \exp(\eta\, t(x) - \Lambda(\eta))$$

    a proper probability density. The random variable $t(X)$ is the sufficient statistic of the exponential family.

    Note that $P_0$ does not have to be a distribution on $\mathbb{R}$, but these are of course the simplest examples.

    1.0.1 A first example: Gaussian with linear sufficient statistic

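    Consider the standard normal distribution

    $$P_0(A) = \int_A \frac{e^{-z^2/2}}{\sqrt{2\pi}}\, dz$$

    and let $t(x) = x$. Then, the exponential family is

    $$P_\eta(dx) \propto \frac{e^{\eta x - x^2/2}}{\sqrt{2\pi}}\, dx$$

    and we see that $\Lambda(\eta) = \eta^2/2$.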

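    import numpy as np
    import matplotlib.pyplot as plt

    # CGF of the Gaussian family with linear sufficient statistic
    eta = np.linspace(-2, 2, 101)
    CGF = eta**2 / 2.
    plt.plot(eta, CGF)
    A = plt.gca()
    A.set_xlabel(r'$\eta$', size=20)
    A.set_ylabel(r'$\Lambda(\eta)$', size=20)
    f = plt.gcf()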


  • Thus, the exponential family in this setting is the collection

    $$\mathcal{F} = \{N(\eta, 1) : \eta \in \mathbb{R}\}.$$
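    The claim that tilting the standard normal by $e^{\eta x}$ gives $N(\eta, 1)$ with $\Lambda(\eta) = \eta^2/2$ can be checked numerically. The following is a minimal sketch, assuming a plain Riemann sum on a wide grid is accurate enough; the helper name `gaussian_tilt` and the grid limits are illustrative choices, not part of the notes.

```python
import numpy as np

# Grid wide enough that the N(eta, 1) mass outside it is negligible
# for the values of eta used below (an assumption of this sketch).
x = np.linspace(-12.0, 12.0, 24001)
dx = x[1] - x[0]
p0 = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # standard normal density

def gaussian_tilt(eta):
    """Return (CGF, mean, variance) of the tilt exp(eta * x) * p0 via Riemann sums."""
    w = np.exp(eta * x) * p0
    normalizer = w.sum() * dx          # approximates exp(Lambda(eta))
    dens = w / normalizer              # tilted density on the grid
    mean = (x * dens).sum() * dx
    var = ((x - mean) ** 2 * dens).sum() * dx
    return np.log(normalizer), mean, var

for eta in [-2.0, 0.5, 2.0]:
    cgf, mean, var = gaussian_tilt(eta)
    # Columns: eta, numerical CGF, eta^2 / 2, tilted mean, tilted variance
    print(eta, cgf, eta**2 / 2, mean, var)
```

    The numerical CGF should track $\eta^2/2$, while the tilted mean and variance should stay close to $\eta$ and $1$.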

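    1.0.2 Normal with quadratic sufficient statistic on $\mathbb{R}^d$

    As a second example, take $P_0 = N(0, I_{d \times d})$, i.e. the standard normal distribution on $\mathbb{R}^d$. As sufficient statistic, we take $t(x) = \|x\|_2^2/2$. Then, the exponential family is

    $$P_\eta(dx) \propto e^{\eta \|x\|_2^2/2 - \|x\|_2^2/2}\, dx$$

    and we see that the family is only defined for $\eta < 1$. For $\eta < 1$,

    $$\Lambda(\eta) = -\frac{d}{2} \log(1 - \eta).$$

    We see that not all exponential families have all of $\mathbb{R}$ as their parameter space. We might as well define $\Lambda$ over all of $\mathbb{R}$:

    $$\Lambda(\eta) = \begin{cases} -\frac{d}{2} \log(1 - \eta) & \eta < 1 \\ \infty & \eta \ge 1. \end{cases}$$

    The exponential family here is

    $$\mathcal{F} = \left\{N\left(0_{d \times 1}, (1 - \eta)^{-1} I_{d \times d}\right),\ \eta < 1\right\}.$$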
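    Because the coordinates of a standard normal vector are independent, the $d$-dimensional integral defining $\Lambda$ factorizes into $d$ copies of a one-dimensional integral. The sketch below uses that factorization to check the closed form numerically; the function name `cgf_quadratic` and the grid are assumptions of this example, and the grid is only adequate for $\eta$ not too close to 1.

```python
import numpy as np

z = np.linspace(-40.0, 40.0, 80001)
dz = z[1] - z[0]

def cgf_quadratic(eta, d):
    """CGF of t(x) = ||x||^2 / 2 under N(0, I_d), via a 1-d Riemann sum."""
    if eta >= 1:
        return np.inf  # exp(eta * z^2 / 2) is not integrable against the normal density
    # Combine the exponents before exponentiating to avoid overflow.
    one_dim = (np.exp((eta - 1) * z**2 / 2) / np.sqrt(2 * np.pi)).sum() * dz
    return d * np.log(one_dim)

d = 3
for eta in [-3.0, 0.0, 0.5, 0.9]:
    # Columns: eta, numerical CGF, closed form -(d/2) log(1 - eta)
    print(eta, cgf_quadratic(eta, d), -d * np.log(1 - eta) / 2)
```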

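    import numpy as np
    import matplotlib.pyplot as plt

    # CGF of the quadratic family; it is finite only for eta < 1
    eta = np.linspace(-3, 0.99, 101)
    d = 3
    CGF = -d * np.log(1 - eta) / 2.
    plt.plot(eta, CGF)
    A = plt.gca()
    A.set_xlabel(r'$\eta$', size=20)
    A.set_ylabel(r'$\Lambda(\eta)$', size=20)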


  • 1.0.3 Tilts of triangular distribution

    In the previous two examples, we could express $\Lambda$ explicitly by simple integration. This is not always possible, though we can use the computer to do some calculations for us. Set $P_0$ to be the triangular distribution on $(-1, 1)$, with density $1 - |x|$, and take sufficient statistic $t(x) = x$ so that

    $$P_\eta(dx) = \exp(\eta x - \Lambda(\eta))\, P_0(dx)$$

    with

    $$\Lambda(\eta) = \log\left(\int_{-1}^{1} e^{\eta x} (1 - |x|)\, dx\right).$$

    import numpy as np
    import matplotlib.pyplot as plt

    X = np.linspace(-1, 1, 501)
    dX = X[1] - X[0]

    def tilted_density(eta):
        # unnormalized tilt of the triangular density 1 - |x| on (-1, 1)
        D = np.exp(eta * X) * np.minimum(1 + X, 1 - X)
        # Riemann-sum approximation of the CGF
        CGF = np.log((D * dX).sum())
        return D / np.exp(CGF)

    for eta in [0, 1, 2, 3]:
        plt.plot(X, tilted_density(eta), label=r'$\eta=%d$' % eta)
    plt.gca().set_title('Tilts of the triangular distribution.')
    plt.legend(loc='upper left')
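    For this particular carrier the integral can also be done by hand: integrating by parts gives $\Lambda(\eta) = \log\left(2(\cosh\eta - 1)/\eta^2\right)$ for $\eta \neq 0$ and $\Lambda(0) = 0$. The sketch below compares that expression with the same kind of Riemann sum used above; the function names are illustrative, and the finer grid is an arbitrary choice.

```python
import numpy as np

X = np.linspace(-1.0, 1.0, 20001)
dX = X[1] - X[0]
p0 = 1 - np.abs(X)  # triangular density on (-1, 1)

def cgf_numeric(eta):
    """Riemann-sum approximation of log of the integral of exp(eta * x) * p0(x)."""
    return np.log((np.exp(eta * X) * p0).sum() * dX)

def cgf_closed_form(eta):
    """log(2 (cosh(eta) - 1) / eta^2), with the limiting value 0 at eta = 0."""
    if eta == 0:
        return 0.0
    return np.log(2 * (np.cosh(eta) - 1) / eta**2)

for eta in [0.0, 1.0, 2.0, 3.0]:
    # Columns: eta, numerical CGF, closed-form CGF
    print(eta, cgf_numeric(eta), cgf_closed_form(eta))
```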


  • 1.0.4 Carrier measure

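    More generally, $P_0$ could be replaced by some measure $m_0$ that is not a probability measure. For example, take $m_0$ to be Lebesgue measure on $\mathbb{R}$ and $t(x) = x^2/2$. Then, for all $\eta < 0$,

    $$\frac{dP_\eta}{dm_0}(x) = \frac{e^{\eta x^2/2}}{\sqrt{-2\pi/\eta}}$$

    corresponds to a $N(0, -\eta^{-1})$ density. To find $\Lambda(\eta)$, note that

    $$e^{\Lambda(\eta)} = \int_{\mathbb{R}} e^{\eta x^2/2}\, dm_0(x) = \begin{cases} \sqrt{-2\pi/\eta} & \eta < 0 \\ \infty & \text{otherwise.} \end{cases}$$

    Therefore, for $\eta < 0$,

    $$\Lambda(\eta) = -\frac{1}{2} \log(-\eta) + \frac{1}{2} \log(2\pi).$$

    The exponential family is therefore

    $$\left\{N(0, -\eta^{-1}),\ \eta < 0\right\}.$$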
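    As a quick sanity check of this example, the normalizing integral against Lebesgue measure and the variance of the resulting distribution can be approximated on a grid. This is only a sketch: the names `lebesgue_cgf` and `tilted_variance` and the grid limits are assumptions made for the illustration, and the grid is adequate only when $-1/\eta$ is moderate.

```python
import numpy as np

x = np.linspace(-60.0, 60.0, 120001)
dx = x[1] - x[0]

def lebesgue_cgf(eta):
    """log of the integral of exp(eta * x^2 / 2) dx; infinite unless eta < 0."""
    if eta >= 0:
        return np.inf
    return np.log(np.exp(eta * x**2 / 2).sum() * dx)

def tilted_variance(eta):
    """Variance of the density proportional to exp(eta * x^2 / 2), for eta < 0."""
    w = np.exp(eta * x**2 / 2)
    return (x**2 * w).sum() / w.sum()

for eta in [-4.0, -1.0, -0.5]:
    formula = -0.5 * np.log(-eta) + 0.5 * np.log(2 * np.pi)
    # Columns: eta, numerical CGF, formula, numerical variance, -1 / eta
    print(eta, lebesgue_cgf(eta), formula, tilted_variance(eta), -1 / eta)
```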

    1.1 Reparametrizing the family

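    Note that the exponential family is determined by the pair $(t(X), m_0)$. The choice of $m_0$ is somewhat arbitrary. We could fix some $\eta_0$ and consider a new family with carrier measure $P_{\eta_0} \in \mathcal{F}$:

    $$\tilde{\mathcal{F}} = \left\{\tilde{P}_\eta = \exp\left(\eta\, t(x) - \tilde{\Lambda}(\eta)\right) P_{\eta_0}(dx)\right\}$$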


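  • But, a simple manipulation shows that

    $$P_\eta(dx) = \exp(\eta\, t(x) - \Lambda(\eta))\, m_0(dx) = \exp\left((\eta - \eta_0)\, t(x) - (\Lambda(\eta) - \Lambda(\eta_0))\right) P_{\eta_0}(dx).$$

    This shows that there is a 1:1 correspondence between $\tilde{\mathcal{F}}$ and $\mathcal{F}$. Namely

    $$\tilde{\mathcal{F}} \ni \tilde{P}_\eta \mapsto P_{\eta + \eta_0} \in \mathcal{F}$$

    $$\tilde{\Lambda}(\eta) = \Lambda(\eta + \eta_0) - \Lambda(\eta_0).$$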
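    As a worked illustration (a sketch using the Gaussian family with linear sufficient statistic from 1.0.1, where $\Lambda(\eta) = \eta^2/2$ and $P_{\eta_0} = N(\eta_0, 1)$), recentering the carrier at $P_{\eta_0}$ gives:

```latex
\begin{align*}
\tilde{\Lambda}(\eta)
  &= \Lambda(\eta + \eta_0) - \Lambda(\eta_0)
   = \frac{(\eta + \eta_0)^2}{2} - \frac{\eta_0^2}{2}
   = \frac{\eta^2}{2} + \eta\,\eta_0, \\
\tilde{P}_\eta(dx)
  &= \exp\left(\eta x - \frac{\eta^2}{2} - \eta\,\eta_0\right)
     \frac{e^{-(x - \eta_0)^2/2}}{\sqrt{2\pi}}\,dx
   = \frac{e^{-(x - \eta - \eta_0)^2/2}}{\sqrt{2\pi}}\,dx .
\end{align*}
```

    So $\tilde{P}_\eta = N(\eta + \eta_0, 1) = P_{\eta + \eta_0}$, in agreement with the correspondence above.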

    1.2 Domain of an exponential family

    In the examples above, we saw that not all values of $\eta$ lead to a probability distribution, because $e^{\eta t(x)}$ may fail to be integrable with respect to $m_0$. The domain $\mathcal{D}(\mathcal{F})$ can be thought of as the set of all natural parameters which lead to a probability distribution.

    Formally, we define the domain as

    $$\mathcal{D}(\mathcal{F}) = \mathcal{D}((t(X), m_0)) = \{\eta : \Lambda(\eta) < \infty\}.$$

  • 1.3 Example: the Poisson family

    An important example of a one-parameter family that we will revisit often is the Poisson family on the non-negative integers $\mathbb{Z}_{\ge 0} = \{0, 1, 2, \dots\}$. The carrier measure is

    $$m_0(dx) = \frac{1}{x!}\, m(dx)$$

    with $m$ the counting measure on $\mathbb{Z}_{\ge 0}$. Poisson random variables are usually parametrized by their expectation $\lambda$. This is different from the canonical parametrization. Let's write this parameterization as

    $$Q_\lambda(dx) = \frac{e^{-\lambda} \lambda^x}{x!}\, m(dx) = e^{-\lambda} \lambda^x\, m_0(dx).$$

    We see, then, that

    $$P_\eta(dx) = \exp(\eta x - \Lambda(\eta))\, m_0(dx).$$

    The two parametrizations are related by

    $$\eta = \log(\lambda)$$

    $$\Lambda(\eta) = e^\eta = \lambda.$$
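    The relation $\Lambda(\eta) = e^\eta$ can be checked by summing the carrier measure directly. The sketch below truncates the infinite sum at `n_terms` terms, which is an assumption that is only reasonable when the mean is far below the truncation point; the function names are illustrative.

```python
import math

def poisson_cgf(eta, n_terms=200):
    """log of sum over x of exp(eta * x) / x!, truncated and computed with log-sum-exp."""
    logs = [eta * x - math.lgamma(x + 1) for x in range(n_terms)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

def poisson_pmf(x, eta):
    """Mass that P_eta puts on the integer x, with respect to counting measure."""
    return math.exp(eta * x - poisson_cgf(eta) - math.lgamma(x + 1))

lam = 4.0
eta = math.log(lam)
# Numerical CGF next to exp(eta) = lambda
print(poisson_cgf(eta), lam)
# Canonical-form mass at x = 3 next to the usual Poisson formula
print(poisson_pmf(3, eta), math.exp(-lam) * lam**3 / math.factorial(3))
```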

    1.3.1 Exercise: reparametrizing the Poisson family

    Let $\mathcal{F} = (x, m_0)$ denote the Poisson family.

    1. What is $\mathcal{D}(\mathcal{F})$?

    2. Rewrite the Poisson family $P_\eta$ so that the carrier measure is a Poisson distribution with mean 2. Call the exponential family with this carrier measure $\mathcal{F}_2$. What is $\mathcal{D}(\mathcal{F}_2)$?

    3. Write the Poisson distribution with mean 6 as a point in $\mathcal{D}(\mathcal{F})$ and as a point in $\mathcal{D}(\mathcal{F}_2)$. That is, in each case, find the canonical parameter such that the corresponding distribution is a Poisson with mean 6.

    1.4 Expectation and variances

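    The function $\Lambda$ is the cumulant generating function of the family and differentiating it yields the cumulants of the random variable $t(X)$. Specifically, if the carrier measure is a probability measure, it is the logarithm of the moment generating function of $t(X)$ under $P_0$. More generally, if the carrier measure is not a probability measure but just a measure on some sample space $\Omega$, then for any $\eta \in \mathcal{D}(\mathcal{F})$ and any $\theta$ with $\theta + \eta \in \mathcal{D}(\mathcal{F})$

    $$E_\eta(e^{\theta t(X)}) = \int_\Omega e^{(\theta + \eta) t(x) - \Lambda(\eta)}\, m_0(dx) = e^{\Lambda(\theta + \eta) - \Lambda(\eta)}.$$

    Note that

    $$e^{\Lambda(\eta)} = \int_\Omega e^{\eta t(x)}\, m_0(dx).$$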


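  • Differentiating with respect to $\eta$ yields

    $$\Lambda'(\eta)\, e^{\Lambda(\eta)} = \int_\Omega t(x)\, e^{\eta t(x)}\, m_0(dx) = e^{\Lambda(\eta)} \int_\Omega t(x)\, P_\eta(dx) = e^{\Lambda(\eta)}\, E_\eta(t(X)).$$

    Differentiating a second time yields

    $$\left(\Lambda''(\eta) + \Lambda'(\eta)^2\right) e^{\Lambda(\eta)} = \int_\Omega t(x)^2\, e^{\eta t(x)}\, m_0(dx) = e^{\Lambda(\eta)}\, E_\eta(t(X)^2).$$

    Summarizing,

    $$\Lambda'(\eta) = E_\eta(t(X))$$

    $$\Lambda''(\eta) = E_\eta\left[(t(X) - E_\eta(t(X)))^2\right] = \mathrm{Var}_\eta(t(X)).$$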
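    These two identities are easy to test numerically. The sketch below uses the tilted triangular family from 1.0.3 (carrier density $1 - |x|$ on $(-1, 1)$, $t(x) = x$) and compares central finite differences of $\Lambda$ with the mean and variance computed directly from the tilted density. The step size `h`, the grid and the function names are assumptions of the example.

```python
import numpy as np

X = np.linspace(-1.0, 1.0, 20001)
dX = X[1] - X[0]
p0 = 1 - np.abs(X)  # triangular carrier density

def cgf(eta):
    """Riemann-sum approximation of Lambda(eta)."""
    return np.log((np.exp(eta * X) * p0).sum() * dX)

def moments(eta):
    """Mean and variance of t(X) = X under the tilted density."""
    dens = np.exp(eta * X - cgf(eta)) * p0
    mean = (X * dens).sum() * dX
    var = ((X - mean) ** 2 * dens).sum() * dX
    return mean, var

eta, h = 1.0, 1e-3
d1 = (cgf(eta + h) - cgf(eta - h)) / (2 * h)               # approximates Lambda'(eta)
d2 = (cgf(eta + h) - 2 * cgf(eta) + cgf(eta - h)) / h**2   # approximates Lambda''(eta)
mean, var = moments(eta)
print(d1, mean)  # first derivative next to the mean
print(d2, var)   # second derivative next to the variance
```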

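    The above also motivates the definition of another space related to $\mathcal{F}$, the set of realizable expected values

    $$\mathcal{M}(\mathcal{F}) = \left\{\Lambda'(\eta) : \eta \in \mathcal{D}(\mathcal{F})\right\} = \Lambda'(\mathcal{D}(\mathcal{F})).$$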

    1.4.1 Parametrization by the mean

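    The above calculation yields a parameterization

    $$\mu(\eta) = E_\eta(t(X)) = \Lambda'(\eta).$$

    As

    $$\frac{d\mu}{d\eta} = \Lambda''(\eta) = \mathrm{Var}_\eta(t(X)) \ge 0$$

    we see that the mapping is non-decreasing, and it is 1:1 (hence invertible) as long as the random variable $t(X)$ is not constant under $P_\eta$.

    Further, as the moment generating function of $t(X)$ under $P_\eta$ is defined for all $\eta \in \mathcal{D}(\mathcal{F})$, this map is infinitely differentiable on $\mathcal{D}(\mathcal{F})$ (more precisely, on its interior).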
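    Since the mean map is increasing, it can be inverted numerically even when no closed form is available. The sketch below does this by bisection for the tilted triangular family of 1.0.3; the bracket `[-50, 50]`, the tolerance and the function names are assumptions chosen for the illustration.

```python
import numpy as np

X = np.linspace(-1.0, 1.0, 20001)
p0 = 1 - np.abs(X)  # triangular carrier density, t(x) = x

def mean_of_tilt(eta):
    """mu(eta): mean of X under the density proportional to exp(eta * x) * p0(x)."""
    w = np.exp(eta * X) * p0
    return (X * w).sum() / w.sum()

def eta_of_mean(mu, lo=-50.0, hi=50.0, tol=1e-10):
    """Invert the increasing map eta -> mu(eta) by bisection on [lo, hi]."""
    if not mean_of_tilt(lo) < mu < mean_of_tilt(hi):
        raise ValueError("mu is not attained inside the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_of_tilt(mid) < mu:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

eta_hat = eta_of_mean(0.3)
# Canonical parameter found, and the mean it maps back to
print(eta_hat, mean_of_tilt(eta_hat))
```

    Bisection relies only on monotonicity; a Newton iteration using $\Lambda''(\eta)$ as the derivative would be a natural alternative.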

    1.5 Skewness and kurtosis

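    We saw above that the moment generating function of $t(X)$ under $P_\eta$ can be expressed as

    $$E_\eta(e^{\theta t(X)}) = e^{\Lambda(\theta + \eta) - \Lambda(\eta)}.$$

    Taking logs and expanding in $\theta$ yields the cumulants of $t(X)$ under $P_\eta$. That is,

    $$\Lambda(\theta + \eta) - \Lambda(\eta) = k_1 \theta + k_2 \frac{\theta^2}{2} + k_3 \frac{\theta^3}{6} + \dots$$

    where

    $$k_i = \Lambda^{(i)}(\eta)$$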


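  • are the cumulants of $t(X)$ under $P_\eta$. The skewness of a random variable is defined as

    $$\mathrm{Skew}(Y) = \mathrm{Skew}(Y, \mathcal{L}_Y) = \frac{E[(Y - E(Y))^3]}{\mathrm{Var}(Y)^{3/2}} = \frac{k_3}{k_2^{3/2}}$$

    and kurtosis is defined as

    $$\mathrm{Kurtosis}(Y) = \mathrm{Kurtosis}(Y, \mathcal{L}_Y) = \frac{E[(Y - E(Y))^4]}{\mathrm{Var}(Y)^2} - 3 = \frac{k_4}{k_2^2}.$$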

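    For a one-parameter exponential family, we see that

    $$\gamma(\eta) = \mathrm{Skew}(t(X), P_\eta) = \frac{\Lambda^{(3)}(\eta)}{\Lambda''(\eta)^{3/2}}$$

    $$\kappa(\eta) = \mathrm{Kurtosis}(t(X), P_\eta) = \frac{\Lambda^{(4)}(\eta)}{\Lambda''(\eta)^2}$$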
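    As a worked illustration of these formulas (a sketch using the quadratic Gaussian family of 1.0.2, where $\Lambda(\eta) = -\frac{d}{2}\log(1 - \eta)$ for $\eta < 1$), the derivatives of $\Lambda$ and the resulting skewness and kurtosis are:

```latex
\begin{align*}
\Lambda''(\eta) &= \frac{d}{2(1 - \eta)^2}, \qquad
\Lambda^{(3)}(\eta) = \frac{d}{(1 - \eta)^3}, \qquad
\Lambda^{(4)}(\eta) = \frac{3d}{(1 - \eta)^4}, \\
\mathrm{Skew}(t(X), P_\eta)
  &= \frac{d\,(1 - \eta)^{-3}}{(d/2)^{3/2}\,(1 - \eta)^{-3}}
   = \sqrt{\frac{8}{d}}, \\
\mathrm{Kurtosis}(t(X), P_\eta)
  &= \frac{3d\,(1 - \eta)^{-4}}{(d/2)^{2}\,(1 - \eta)^{-4}}
   = \frac{12}{d}.
\end{align*}
```

    Neither quantity depends on $\eta$, which is consistent with $t(X)$ being a rescaled $\chi^2_d$ variable under every $P_\eta$ in this family, and both tend to 0 as $d$ grows.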

    1.5.1 Exercise: skewness and kurtosis of the Poisson family

    1. Plot the skewness and kurtosis of the Poisson family as a function of $\eta$.

    2. What happens as $\eta \to \infty$? Is this expected?

    1.6 Cumulants in the mean parametrization

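    Above, we see that the most natural parameterization of cumulants (and skewness, kurtosis) is in terms of the canonical parameter $\eta$. However, we now have this alternative parametrization in terms of the mean $\mu$. Given a quantity like the skewness $\gamma(\eta) = \mathrm{Skew}(t(X), P_\eta)$ we can reparametrize this as

    $$\tilde{\gamma}(\mu) = \mathrm{Skew}(t(X), P_{\eta(\mu)}) = \gamma(\eta(\mu)) = \frac{\Lambda^{(3)}(\eta(\mu))}{\mathrm{Var}_{\eta(\mu)}(t(X))^{3/2}}.$$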

    This new parametrization yields some interesting relations.

    1.6.1 Exercise: reparametrization of skewness

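    Show that the skewness in the mean parametrization, $\tilde{\gamma}(\mu) = \mathrm{Skew}(t(X), P_{\eta(\mu)})$, satisfies

    $$\tilde{\gamma}(\mu) = 2 \frac{d}{d\mu} \left(\mathrm{Var}_{\eta(\mu)}(t(X))\right)^{1/2}.$$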

    1.6.2 Exercise: estimation of $\mu$ and $\eta$

    Suppose $X \sim P_\eta$, then, from the calculations we have seen above

    $$E_\eta(t(X)) = \Lambda'(\eta) = \mu(\eta).$$


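  • Can we find an unbiased estimate of $\eta$? For this exercise, assume $\Omega = \mathbb{R}$, $t(x) = x$ and the carrier measure has a density $g_0$ with respect to Lebesgue measure, and write $[m, M]$ for its support (with one or both of $m, M$ possibly infinite). That is,

    $$P_\eta(dx) = \begin{cases} e^{\eta x - \Lambda(\eta)} g_0(x)\, dx & m \le x \le M \\ 0 & \text{otherwise} \end{cases}$$

    with $\Lambda(0) = 0$. Define

    $$\ell_0(x) = \log(g_0(x)).$$

    Show that

    $$E_\eta\left[-\frac{d}{dx} \ell_0(X)\right] = \eta - \left[g_\eta(M) - g_\eta(m)\right],$$

    where $g_\eta(x) = e^{\eta x - \Lambda(\eta)} g_0(x)$ is the density of $P_\eta$.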

    Try this estimator out numerically for the half-Gaussian family.

    1.7 Repeated sampling

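    One of the very nice properties of exponential families is the behaviour under IID sampling. Specifically, let

    $$X_1, \dots, X_n \overset{IID}{\sim} P_\eta$$

    with

    $$\frac{dP_\eta}{dm_0}(x) = \exp(\eta\, t(x) - \Lambda(\eta)).$$

    The joint density has a very simple expression:

    $$\prod_{i=1}^n \left[\exp(\eta\, t(x_i) - \Lambda(\eta))\, m_0(dx_i)\right] = \exp\left(n \left[\eta\, \bar{t}(X) - \Lambda(\eta)\right]\right) \prod_{i=1}^n m_0(dx_i)$$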

    with

    t(X) =1