35
Logarithmic Sobolev inequalities in discrete product spaces: proof by a transportation cost distance Katalin Marton Alfréd Rényi Institute of Mathematics of the Hungarian Academy of Sciences

Logarithmic Sobolev inequalities in discrete product

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Logarithmic Sobolev inequalities in discrete product spaces:proof by a transportation cost distance

Katalin Marton

Alfréd Rényi Institute of Mathematicsof the Hungarian Academy of Sciences

Relative entropy

DefinitionZ: measurable space, µ, ν: probability measures on Z.Z, U : random variables, L(Z) = µ, L(U) = ν.

Relative entropy:

D(µ||ν) = D(Z||U) =

∫Z

logdµ

dνdµ (1)

If Z is finite:

D(µ||ν) = D(Z||U) =∑z∈Z

µ(z) logµ(z)

ν(z). (2)

Entropy contraction of Markov kernels

Definition(Z, ν) : probability space,P(Z): measures on Z,Γ: Markov kernel on Z with invariant measure ν.

Entropy contraction for (Z, ν,Γ)with rate 1− c, 0 < c ≤ 1:

D(µΓ||ν) ≤ (1− c) ·D(µ||ν). (3)

Equivalently:

c ·D(µ||ν) ≤(D(µ||ν)−D(µΓ||ν)

), for all µ ∈ P(Z) (4)

Gibbs sampler governed by local specifications of qn

DefinitionΓi: Markov kernel X n 7→ Xn

Γi(zn|yn) = δ(yi, zi) · qi(zi|yi),

Γ: Markov kernel X n 7→ Xn:

Γ =1

n·n∑i=1

Γi.

Definitionqn has the entropy contraction property if:

its Gibbs sampler Γ has.

Entropy contraction in product space

(X n, qn): probability space

QuestionWhich measures qn have the entropy contraction property witha reasonable constant c?

c cannot be larger than O(1/n).Changing notation: WHEN

c

n·D(pn||qn) ≤

(D(pn||qn)−D(pnΓ||qn)

)for all pn ∈ P(X n)

?(5)

Equivalently: WHEN

D(pn||qn) ≤ (1− c

n) ·D(pnΓ||qn) ? (6)

Conditional relative entropy

DefinitionZ: measurable space,

V: another measurable space, π: probability measure on V,V : random variable, L(V ) = π

For v ∈ Vprobability measures on Z:µ(·|v) = L(Z|V = v) ν(·|v) = L(U |V = v)

Conditional relative entropy:

D(µ(·|V )||ν(·|V )

)= D

(Z|V )||U |V )

),∫VD(µ(·|v)||ν(·|v)

)dπ

(7)

Cont’dIf V is finite:

D(µ(·|V )||ν(·|V

)= D

(Z|V )||U |V )

)=∑v∈V

π(v)D(µ(·|v)||ν(·|v)

). (8)

Sufficient condition for entropy contraction in productspace

(X n, qn,Γ): probability space with Gibbs sampler,qn = L(Xn)pn = L(Y n): another distribution on X n

Recall: Γ = 1n

∑ni=1 Γi Yi = (Y1, . . . , Yi−1, Yi+1, . . . , Yn),

Proposition

c ·D(pn||qn) ≤n∑i=1

D(pi(·|Yi)||qi(·|Yi

)(∗)

=⇒

D(pnΓ||qn) ≤(1− c

n

)D(pn||qn).

(9)

If qn is a product measure then: c = 1.

NotationFor xn = (x1, x2, . . . , xn) ∈ X n:

xi , (x1, . . . , xi−1, xi+1, . . . , xn)

pi , L(Yi), qi , L(Xi)

Expansion formula

D(pn||qn) = D(pi||qi) +D(pi(·|Yi)||qi(·|Yi)

)=⇒

D(pn||qn)

=1

n·n∑i=1

D(pi||qi) +1

n·n∑i=1

D(pi(·|Yi)||qi(·|Yi)

)

Proof of Proposition

D(pn||qn)−D(pnΓ||qn)

≥ (by convexity of entropy)

D(pn||qn)− 1

n·n∑i=1

D(pi||qi)

=1

n

n∑i=1

D(pi(·|Yi||qi(·|Yi)

)(by assumption) ≥ c

nD(pn||qn)

Entropy contraction in DISCRETE product spaces

X finite

(Xn, qn,Γ)

Wanted: inequality

D(pn||qn) ≤ C·n∑i=1

D(pi(·|Yi||qi(·|Yi)

)for all pn ∈ P(Xn) (∗)

To get (*):use a Wassersteine-like distance.

A reverse Pinsker’s inequality

µ, ν: probability measures on X X finite

Notation: Variational distance

|µ− ν| = 1

2·∑x∈X|µ(x)− ν(x)|

LemmaSet

X+ , {x ∈ X : ν(x) > 0}, α , min{ν(x) : x ∈ X+

}Then

D(µ||ν) ≤ 4

α· |µ− ν|2

Follows from the inequality D(µ||ν) ≤∑

x∈X|µ(x)−ν(x)|2

ν(x)

A distance between measures on product spaces

Xn: product space

µn = L(Zn), νn = L(Un): probability measures on Xn

Definition (P. Massart)The square of the W2-distance of µn and νn:

W 22 (µn, νn) , min

n∑i=1

Pr2{Zi 6= Ui

}inf: on couplings of µn and νn.

NotationFor I ⊂ [1, n], pn = L(Y n) and yn ∈ X n:

yI , (yi : i ∈ I), yI , (yi : i /∈ I)

pI , L(YI), pI(·|yI) , L(YI |YI = yI)

For Theorem 1 we need the inequality

W 22

(pn||qn) ≤ C · E

n∑i=1

∣∣pi(·|Yi)− qi(·|Yi)∣∣2.in a MORE GENERAL FORM:

We require a bound for

W 22

(pI(·|yI), qI(·|yI)

)for ALL subsets I ⊂ [1, n] (not just I = [1, n]), and all yI .

Main Theorem

(X n, qn), X finite !

Theorem 1Set

α = min{qi(xi|xi) : qn(xn) > 0, 1 ≤ i ≤ n

}(10)

Fix a pn = L(Y n) ∈ P(X n); assume

qn(xn) = 0 =⇒ pn(xn) = 0. (11)

Main assumption:

W 22

(pI(·|yI), qI(·|yI))

≤ C · E{∑i∈I

∣∣pi(·|Yi)− qi(·|Yi)∣∣2 ∣∣∣∣ YI = yI

},

(12)

for all I ⊂ [1, n], yI ∈ X [1,n]\I .

Main Theorem continued

Assume all the inequalities

W 22

(pI(·|yI), qI(·|yI))

≤ C · E{∑i∈I

∣∣pi(·|Yi)− qi(·|Yi)∣∣2 ∣∣∣∣ YI = yI

},

(13)

where I ⊂ [1, n] and yI ∈ X [1,n]\I is fixed.Then

D(pn||qn) ≤

4C

α·n∑i=1

E∣∣pi(·|Yi)− qi(·|Yi)∣∣2

≤ 2C

α·n∑i=1

D(pi(·|Yi)||qi(·|Yi)

).

(14)

An analogous result for densities in Rn

Theoremf(xn) = exp(−V (xn) : density on Rn,qn : probability measure with density f .

Assume: conditional densities a f(xi|xi) satisfy alogarithmic Sobolev inequality with constant ρ, for all i, xi.Under some conditions on

1

ρ·Hess V (xn)

(expressing that V is not too far from being uniformly convex):

D(pn||qn) ≤ C ·n∑i=1

D(pi(·|Yi)||qi(·|Yi)

)for all pn

(C = C(qn))

Proof of Theorem 1By induction on n. Assume Theorem 1 for n− 1

Notation

pi(·|yi) , L(Yi|Yi = yi)

For every i ∈ [1, n]

D(pn||qn) = D(Y n||Xn) = D(Yi||Xi) +D(pi(·|Yi)||qi(·|Yi)

)=⇒

D(pn||qn) =1

n

n∑i=1

D(Yi||Xi) +1

n

n∑i=1

D(pi(·|Yi)|qi(·|Yi)

)(15)

By the induction hypothesis the second term is

≤(1− 1

n

)· 4C

α

n∑i=1

E∣∣pi(·|Yi)− qi(·|Yi)∣∣2

Proof of Theorem 1 Cont’dSecond term:

≤(1− 1

n

)· 4C

α

n∑i=1

E∣∣pi(·|Yi)− qi(·|Yi)∣∣2

First term:

1

n

n∑i=1

D(Yi||Xi) ≤ (by the Lemma)1

n· 4

α·∣∣L(Yi)− L(Xi)

∣∣2≤

n∑i=1

Pr2{Yi 6= Xi

}in any coupling of pn, qn

=1

n· 4

α·W 2

2 (pn, qn) for the best coupling

≤ (by the assumption of Theorem 1)

1

n· 4C

α·n∑i=1

∣∣pi(·|Yi)− qi(·|Yi)∣∣2(16)

Entropy contraction

(X n, qn,Γ), X finiteΓ: Gibbs sampler

Corollary 1If qn satisfies the conditions of Theorem 1 then

D(pnΓ||qn) ≤(

1− α

2nC

)·D(pn||qn). (17)

Logarithmic Sobolev inequality

NotationE : Dirichlet form associated with Γ isthe quadratic form

E(f, g) =⟨(Id− Γ)f, g

⟩qn

Definitionqn satisfies a logarithmic Sobolev inequalitywith constant c > 0 if:

c ·D(pn||qn) ≤ E(√

pn

qn,

√pn

qn

)for all pn ∈ P(X n)

Logarithmic Sobolev inequality for Gibbs sampler

(X n, qn,Γ), X finite

Corollary 2Under conditions of Theorem 1the logarithmic Sobolev inequality holds true:

1

n·D(pn||qn) ≤ 4C

α· EΓ

(√pn

qn,

√pn

qn

)=

4C

αn·n∑i=1

E(

1−(∑yi∈X

√pi(yi|Yi

)· qi(yi|Yi

))2).

(18)

=⇒ hypercontractivity

***********************************?????????????????

Application: Gibbs measures with Dobrushin’suniqueness condition

(X n, qn), X finite

Definitionqn satisfies (an L2-version of)Dobrushin’s uniqueness condition with coupling matrix

A =(ak,i)nk,i=1

, ai,i = 0,

if:

(i) max |qi(·|zi)− qi(·|si)| ≤ ak,i, k 6= i,

max : for all zi, si differing only in the k-th coordinate,(19)

and (ii)||A||2 < 1.

(X n, qn), X finite

Theorem 2Assume Dobrushin’s uniqueness condition with coupling matrix

A, ||A||2 < 1.

Then conditions of Theorem 1 hold with

C = 1/(1− ||A||

)2.

Thus

D(pn||qn) ≤ 4

α· 1(

1− ||A||)2 · n∑

i=1

E∣∣pi(·|Yi)− qi(·|Yi)∣∣2

≤ 2

α· 1(

1− ||A||)2 · n∑

i=1

D(pi(·|Yi)||qi(·|Yi)

),

(20)

Cont’d

ProofDobrushin’s uniqueness condition implies that Γ contractsW2-distance with rate

1− 1

n·(

1− ||A||2).

Application: Gibbs measures on Zd

Notation

Zd : d-dimensional lattice, i ∈ Zd: site

ρ(k, i) = max1≤ν≤d |kν − iν |: distance on Zd,

Λ ⊂⊂ Zd: finite set of sites

X finite: spin space

xZd

= (xi : i ∈ Zd) ∈ X Zd: spin configuration

X Zd: configuration space,

For xZd

and Λ ⊂ Zd

xΛ = (xi : i ∈ Λ), xΛ = (xi : i /∈ Λ),

xΛ is called an outside configuration for Λ.

Local specificatons

Definition

qΛ(·|xΛ), Λ ⊂⊂ Zd : conditional distributions on XΛ.Assume compatibility conditions.

There exists at least one probability measure q on the space ofconfigurations:

q = L(X) ∈ P(X Zd)

satisfying

L(XΛ|XΛ = xΛ) = qΛ(·|xΛ), all Λ ⊂⊂ Zd.

qΛ(·|xΛ): local specifications of q.

Finite range interactions

DefinitionThe local specifications have finite range of interactions if:there is an R > 0:

qΛ(·|xΛ) only depends on coordinates k /∈ Λ with

ρ(k,Λ) ≤ R.

q may not be uniquely defined by the local specifications, evenfor finite range interactions.

Dobrushin-Shlosman’s strong mixing condition

Given local specifications qΛ(·|xΛ).

Assumption

There exists a function ϕ(ρ) of the distance such that:(i) ∑

i∈Zd

ϕ(ρ(k, i)

)<∞,

and:(ii) for every

Λ ⊂⊂ Zd, M ⊂ Λ, k /∈ Λ

and everyyΛ, zΛ, differing only at k :∣∣qM (·|yΛ)− qM (·|zΛ)

∣∣ ≤ ϕ(ρ(k,M)).

In case of finite range interactions:

If Dobrushin-Shlosman’s strong mixing condition holds then

ϕ(ρ) = C · exp(−γ · ρ)

can be taken

ρ(k,M) = max length of red segments

|qM (·|yΛ)− qM (·|zΛ)| ≤ ϕ(ρ(k,M))

yΛ, zΛ differ only at k ∈ Λ

Dobrushin-Shlosman’s strong mixing condition

Cont’dMeaning: The influence of the spin at k /∈ Λon the spins in M ⊂ Λ that are far away from kis small.

Essential:

The spins over Λ are fixed in two different ways.

The spins over Λ \M are not fixed.

Logarithmic Sobolev inequality for strongly mixingmeasures

Earlier results for the case of finite range interactions:D. Stroock, B. Zegarlinski 1992, F. Cesi 2001F. Martinelli, E. Olivieri

Theorem 3

(XΛ, qΛ(·|yΛ)) for fixed Λ and yΛ

α = min{qi(xi|xi) : qi(xi|xi) > 0

}.

(Finite range is not assumed.){Dobrushin-Shlosman’s strong mixing condition + {α > 0}

}=⇒conditions of Theorem 1 for qΛ(·|yΛ), with uniform constant=⇒logarithmic Sobolev inequality for qΛ(·|yΛ) with uniform constant

Logarithmic Sobolev inequality for strongly mixingmeasures

Cont’dThe proof uses a Gibbs sampler

updating cubes of size mdepending on the dimension and on the function ϕ(ρ).

We get

W 22

(pΛ, qΛ(·|yΛ)

)≤ Cm ·

∑I:m-sided cube

E∣∣pI∩Λ(·|YI∩Λ)− qI∩Λ(·|YI∩Λ)

∣∣2≤ Cm,α ·

∑i∈Λ

E∣∣pi(·|YΛ\i

)− qi

(·|YΛ\i, yΛ

)∣∣2for an appropriate m that is good enough for any Λ and yΛ.