Information Geometric Structure on Positive Definite Matrices...

Preview:

Citation preview

Information Geometric Structure on Positive Definite Matrices

and its Applications

Atsumi Ohara University of Fukui2013 March 2-3 at Hokkaido University

ミニワークショップ「統計多様体の幾何学とその周辺(4)」1

Outline

1. Introduction2. Standard information geometry on positive

definite matrices3. Extension via the other potentials (Bregman

divergence) Joint work with S. Eguchi (ISM)

4. Conclusions

2

1. Introduction

: the set of positive definite real symmetric matrices

related to branches in math. Riemannian symmetric space Symmetric cone (Jordan algebra) Symplectic geom. (Siegel-Poincare) Information, Hessian geom. affine, Kahler, …,C-H, B-T,…?

3

1. Introduction

: the set of positive definite real symmetric matrices

related to branches in applications matrix (in)eq. (Lyapunov,Riccati,…) mathematical programming (SDP) Statistics, signal processing, time series

analysis (Gaussian, Covariance matrix) …

4

Our interests stable matrices and IG [O,Amari Kybernetika93]

standard IG [O,Suda,Amari LAA96]

dual conections and Jordan alg. [O,Uohashi Positivity04]

means on sym. cones [O IEOT04]

complexity analysis of IPM [O 統計数理98], [Kakihara,O,Tsuchiya JOTA?]

deformed IG [O,Eguchi ISM_RM05]

update formula for Q-Newton [Kanamori,O OMS13]

5

Information geometry on

Dualistic geometric structure

: arbitrary vector fields on

:Riemannian metric

, :a pair of dual affine connections6

A simple way to introduce a dualistic structure (1)

: open domain in:strongly convex on (i.e., positive

definite Hessian mtx.) Cf. Hessian geometry Riemannian metric

Dual affine connections

7

A simple way to introduce a dualistic structure (2)

divergence

projection MLE, MaxEnt and so on

Pythagorean relations

8

2. Standard IG on

: the set of positive definite real symmetric matrices

● logarithmic characteristic func. on

- The standard case -

R);(,detlog)( nPDPPP Î-=j

99

-log det P appears as

Semidefinite Programming (SDP)self-concordant barrier function

Multivariate Analysis (Gaussian dist.)log-likelihood function

(structured covariance matrix estimation) Symmetric cone: log characteristic function Information geometry on

potential function10

11

Standard dualistic geometric structure on (1) [O,Suda,Amari LAA96]

the set of n by n real symmetric matrix

:arbitrary set of basis matrices (primal) affine coordinate system

Identification

Standard dualistic geometric structure

on (2)

: Riemannian metric (Fisher for Gaussian)

, :dual affine connections

Jordan product (mutation)

plays a role of potential function)(Pj

1212

Properties symmetric cones

-invariant : isometric involution dual affine coordinate system (Legendre tfm.)

divergence

self-dual13

Invariance of the structure

Automorphism group, i.e., congruent transformation:the differential:

Ex) Riemannian metric

TG

TG

GXGX

nGLGGPGP

=

Î=

*)(

),,(,

t

t R

1414

Connections represented by Jordan product [Uohashi O Positivity04]

Recall the dual affine connections:

Hence, By the invariance, it follows that

Rem. the Levi-Civita is

15

Doubly autoparallel submanifold

Def. Submanifold is doubly autoparallel when it is both -and -autoparallel,

equivalently,

is both linearly and inverse-linearly constrained.

16

Linearly constrained -autoparallelInverse-linearly -autoparallel

17

Set .

18

Doubly autoparallelism (special case)

Jordan product for Sym(n)

19

Doubly autoparallelism - Examples – (1)

20

Doubly autoparallelism - Examples - (2)

21

Applications of DA Nearness, matrix approximation, GL(n)-invariance, convex optimization

Semidefinite Programming If a feasible region is DA, an explicit formula for

the optimal solution exists. Maximum likelihood estimation of structured

covariance matrix GGM, Factor analysis, signal processing (AR

model)

22

MLE of str. cov. matrix (1)

n samples of random variable z

23

MLE of str. cov. matrix (2)

If is also inverse-linearly constrained, i.e., is DA, then MLE is a convex optimization problem with a solution formula:

24

MLE of str. cov. matrix (3)

25

MLE of str. cov. matrix (4)

26

MLE of str. cov. matrix (5)

E-step: Explicit formula for simple imbedding (e.g., upper-left corner etc)

M-step: reduces to solving a linear equation if the structure of C is DA.

27

3. Extension via the other potentials (Bregman divergence)

[O,Eguchi ISM_RM05]

The other convex potentialsV-potential functions

Study their different and common geometric natures

Application to multivariate statistics?

28

Contents

V-potential function Dualistic geometry on Foliated Structure Decomposition of divergence Application to statistics

geometry of a family of multivariateelliptic distributions

29

Def. V-potential function

-The standard case:

PPssV detlog)(log)( -=Þ-= j

, RR ®+:)(sV

Characteristic function on

(strongly convex)30

Def.

Rem. The standard case: 2,0)(,1)(1 ³=-= kss knn

Prop. (Strong convexity condition)

The Hessian matrix of the V-potential is positive definite on if and only if

31

Prop.When two conditions in Prop.1 hold, Riemannian metric derived from the V-potential is

=Here,X,Y : vector field ~ symmetric matrix-valued func.

Rem. The standard case:=

32

Prop. (affine connections)

Let be the canonical flat connection on . Then the V-potential defines the following dualconnection with respect to :

Ñ

33

Rem. the standard case:

“mutation” of the Jordan product of Ei and Ej

34

divergence function

Divergence function derived from

- a variant of relative entropy,- Pythagorean type decomposition

3535

Prop.

The largest group that preserves the dualistic structure invariant is

except in the standard case.

),( RnSLGG Ît

),( RnGLGG ÎtRem. the standard case:

Rem. The power potential of the form:

has a special property.36

with

with

Special properties for the power potentials

Orthogonality is GL(n)-invariant. The dual affine connections derived from the

power potentials are GL(n)-invariant.Hence, Both - and -projection are GL(n) -

invariant.

37

Foliated Structures

The following foliated structure features thedualistic geometry derived by the V-potential.

38

Prop.Each leaf and are orthogonal each other with respect to .

Prop.Every is simultaneously a - and -geodesic for an arbitrary V-potential.

39

aProp.Each leaf is a homogeneous space with the constant negative curvature

40

Application to multivariate statistics

Non Gaussian distribution(generalized exponential family) Robust statistics beta-divergence, Machine learning, and so on

Nonextensive statistical physics Power distribution, generalized (Tsallis) entropy, and so on

41

Application to multivariate statistics

Geometry of U-modelDef. Given a convex function U and set u=U’,U-model is a family of elliptic (probability)distributions specified by P:

:normalizing const.42

Rem. When U=exp, the U-model is the family of Gaussian distributions.

U-divergence:

Natural closeness measure on the U-model

Rem. When U=exp, the U-divergence is the Kullback-Leibler divergence (relative entropy).

43

Prop.

Geometry of the U-model equipped with the

U-divergence coincides with

derived from the following V-potential function:

44

Conclusions

Sec. 2 DA submanifold: needs a tractable characterization or

the classification Sec. 3 Derived dualistic geometry is invariant under the

-group actions Each leaf is a homogeneous manifold with a negative

constant curvature Decomposition of the divergence function (skipped) Relation with the U-model with the U-divergence

45

),( RnSL

Main ReferencesA. Ohara, N. Suda and S. Amari, Dualistic Differential Geometry of Positive Definite

Matrices and Its Applications to Related Problems, Linear Algebra and its Applications, Vol.247, 31-53 (1996).

A. Ohara, Information geometric analysis of semidefinite programming problems, Proceedings of the institute of statistical mathematics (統計数理), Vol.46, No.2, 317-334 (1998) in Japanese.

A. Ohara and S. Eguchi, Geometry on positive definite matrices and V-potential function, Research Memorandum No. 950, The Institute of Statistical Mathematics, Tokyo, July (2005).

46

Recommended