Theory of Optimization on Riemannian Manifolds (Tokyo University of Science, March 10, 2016, Suurijin Seminar)

Hiroyuki Sato

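  • Unconstrained optimization in $\mathbb{R}^n$

    Problem 1.1 (unconstrained optimization problem in $\mathbb{R}^n$)

    minimize $f(x)$, subject to $x \in \mathbb{R}^n$.

    Algorithm 1.1 (optimization algorithm in $\mathbb{R}^n$)
      Choose an initial point $x_0 \in \mathbb{R}^n$.
      for $k = 0, 1, 2, \ldots$ do
        Compute a search direction $\eta_k \in \mathbb{R}^n$ and a step size $t_k > 0$.
        Compute the next point $x_{k+1} := x_k + t_k \eta_k$.
      end for

    The method is determined by how the search direction $\eta_k$ and the step size $t_k$ are chosen.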

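  • Search directions $\eta_k$ in $\mathbb{R}^n$

    Let $\nabla f$ and $\nabla^2 f$ denote the gradient and the Hessian of $f$.

    Steepest descent method: $\eta_k := -\nabla f(x_k)$.

    Newton's method: $\eta_k \in \mathbb{R}^n$ is the solution $\eta$ of $\nabla^2 f(x_k)[\eta] = -\nabla f(x_k)$.

    Conjugate gradient method: $\eta_0 := -\nabla f(x_0)$, $\eta_{k+1} := -\nabla f(x_{k+1}) + \beta_{k+1}\eta_k$, $k \ge 0$.

    The conjugate gradient method is determined by the choice of $\beta_k$.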

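  • Example: Rayleigh quotient. Let $A$ be an $n \times n$ symmetric matrix.

    Problem 1.2

    minimize $f(x) = \dfrac{x^T A x}{x^T x}$,

    subject to $x \in \mathbb{R}^n - \{0\}$.

    The minimum value of $f$ is the smallest eigenvalue of $A$.

    $x$ is a critical point of $f$ if and only if $Ax = \dfrac{x^T A x}{\|x\|^2}\, x$, that is, $x$ is an eigenvector of $A$.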

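  • Problem 1.2 as a constrained problem in $\mathbb{R}^n$

    Problem 1.3
    minimize $f(x) = x^T A x$, subject to $x \in \mathbb{R}^n$, $x^T x = 1$.

    Using the $(n-1)$-dimensional unit sphere $S^{n-1}$, this becomes an unconstrained problem on $S^{n-1}$:

    Problem 1.4
    minimize $f(x) = x^T A x$, subject to $x \in S^{n-1}$.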

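  • Manifolds

    Definition 1.1. $M$ is a ($C^\infty$) manifold if there are open sets $U_i \subset M$ and homeomorphisms $\varphi_i : U_i \to \varphi_i(U_i) \subset \mathbb{R}^n$ such that

    $\bigcup_i U_i = M$,

    and, whenever $U_i \cap U_j \neq \emptyset$, the transition map

    $\varphi_i \circ \left(\varphi_j^{-1}|_{\varphi_j(U_i \cap U_j)}\right) : \varphi_j(U_i \cap U_j) \to \varphi_i(U_i \cap U_j)$

    is of class $C^\infty$.

    Locally, $M$ can be identified with $\mathbb{R}^n$.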

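  • Examples of manifolds ($p \le n$)

    $(n-1)$-dimensional sphere: $S^{n-1} = \{x \in \mathbb{R}^n \mid x^T x = 1\} \subset \mathbb{R}^n$

    Orthogonal group of degree $n$: $O(n) = \{X \in \mathbb{R}^{n \times n} \mid X^T X = I_n\} \subset \mathbb{R}^{n \times n}$

    Stiefel manifold: $\mathrm{St}(p, n) = \{Y \in \mathbb{R}^{n \times p} \mid Y^T Y = I_p\} \subset \mathbb{R}^{n \times p}$

    $(n-1)$-dimensional real projective space: $\mathbb{RP}^{n-1} = \{l : \text{line through the origin of } \mathbb{R}^n\}$

    Grassmann manifold: $\mathrm{Grass}(p, n) = \{W : p\text{-dimensional subspace of } \mathbb{R}^n\}$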

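  • From $\mathbb{R}^n$ to $M$: the search direction $\eta_k$ is a tangent vector to $M$ at $x_k$.

    In $\mathbb{R}^n$:

    $x_{k+1} := x_k + t_k \eta_k$

    On $M$: take a curve $\gamma$ on $M$ with $\gamma(0) = x_k$, $\dot\gamma(0) = \eta_k$, and choose $x_{k+1}$ on it.

    A retraction $R : TM \to M$, with $R_x := R|_{T_x M}$, provides such curves:

    $x_{k+1} := R_{x_k}(t_k \eta_k)$, $R_{x_k} : T_{x_k} M \to M$.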

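  • Optimization on $M$ with a retraction $R$

    Algorithm 1.2
      Choose an initial point $x_0 \in M$.
      for $k = 0, 1, 2, \ldots$ do
        Compute a search direction $\eta_k \in T_{x_k} M$ and a step size $t_k > 0$.
        Compute the next point $x_{k+1} := R_{x_k}(t_k \eta_k)$.
      end for

    As in $\mathbb{R}^n$, the method is determined by the choice of $\eta_k$ and $t_k$.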

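  • Search directions on $M$

    Steepest descent method: $\eta_k := -\operatorname{grad} f(x_k)$, where $\operatorname{grad}$ has to be defined on $M$.

    Conjugate gradient method: $\eta_0 := -\operatorname{grad} f(x_0)$, (?) $\eta_{k+1} := -\operatorname{grad} f(x_{k+1}) + \beta_{k+1}\eta_k$, $k \ge 0$.

    Here $\operatorname{grad} f$ is the gradient of $f$.

    Since $\operatorname{grad} f(x_{k+1}) \in T_{x_{k+1}} M$ and $\eta_k \in T_{x_k} M$ belong to different tangent spaces, the sum marked (?) is not defined.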

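  • Tangent space $T_x M$ at $x \in M$

    Let $\gamma$ be a curve on $M$ with $\gamma(0) = x$. The tangent vector $\dot\gamma(0)$ acts on a function $f : M \to \mathbb{R}$ by

    $\dot\gamma(0) f = \dfrac{d}{dt} f(\gamma(t))\Big|_{t=0}$.

    When $M$ is a submanifold of a Euclidean space, $\dot\gamma(0)$ can be identified with $\dfrac{d}{dt}\gamma(t)\Big|_{t=0}$.

    Example: for $S^{n-1} := \{x \in \mathbb{R}^n \mid x^T x = 1\}$,

    $T_x S^{n-1} = \{\xi \in \mathbb{R}^n \mid \xi^T x = 0\}$.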

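  • Riemannian metric $g$

    A Riemannian metric assigns to each $x \in M$ an inner product $g_x$ on $T_x M$, depending smoothly on $x$.

    Example: $S^{n-1} \subset \mathbb{R}^n$. With the standard inner product on $\mathbb{R}^n$,

    $\langle a, b \rangle = a^T b$, $a, b \in \mathbb{R}^n$,

    the induced metric is

    $g_x(\xi, \eta) = \xi^T \eta$, $\xi, \eta \in T_x S^{n-1}$.

    Once $g$ is fixed, $g_x(\xi, \eta)$ for $\xi, \eta \in T_x M$ is also written $\langle \xi, \eta \rangle_x$.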

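  • Gradient $\operatorname{grad} f(x)$ of $f$

    The gradient of a function $f$ on $M$ at $x$ is the tangent vector $\operatorname{grad} f(x) \in T_x M$ satisfying

    $\mathrm{D} f(x)[\xi] = g_x(\operatorname{grad} f(x), \xi)$, $\xi \in T_x M$.

    Example: on $S^{n-1}$, let $f(x) = x^T A x$ with $A$ symmetric, and let $\bar f$ be the extension of $f$ to $\mathbb{R}^n$:

    $\bar f(x) = x^T A x$, $x \in \mathbb{R}^n$. The gradient of $\bar f$ in $\mathbb{R}^n$ is $\nabla \bar f(x) = 2Ax$.

    For $\xi \in T_x S^{n-1}$,

    $\mathrm{D} f(x)[\xi] = 2x^T A \xi = 2x^T A (I_n - xx^T)\xi = g_x(2(I_n - xx^T)Ax, \xi)$,

    so that

    $\operatorname{grad} f(x) = 2(I_n - xx^T)Ax$.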
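
The gradient formula on the sphere can be checked numerically. The following is a minimal sketch, assuming NumPy and the induced metric $g_x(\xi, \eta) = \xi^T \eta$; the function name `sphere_grad` and the random test data are illustrative and not from the slides.

```python
import numpy as np

def sphere_grad(A, x):
    """Riemannian gradient of f(x) = x^T A x on the unit sphere (induced metric)."""
    egrad = 2.0 * A @ x                 # Euclidean gradient of the extension
    return egrad - (x @ egrad) * x      # apply the projector I - x x^T

rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                       # symmetric test matrix (illustrative)
x = rng.standard_normal(n)
x /= np.linalg.norm(x)                  # point on S^{n-1}
xi = rng.standard_normal(n)
xi -= (x @ xi) * x                      # tangent vector at x

g = sphere_grad(A, x)
directional = 2.0 * x @ A @ xi          # D f(x)[xi] = 2 x^T A xi
eigvals, eigvecs = np.linalg.eigh(A)
g_at_eigvec = sphere_grad(A, eigvecs[:, 0])
```

Here `g` is tangent at `x`, `g @ xi` reproduces the directional derivative, and the gradient vanishes at an eigenvector of `A`, which matches the characterization of critical points of the Rayleigh quotient.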

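  • Retraction $R : TM \to M$

    Definition of a retraction $R$ [Absil et al., 2008]

    Definition 2.1. A map $R : TM \to M$ is a retraction if, with $R_x := R|_{T_x M}$ the restriction of $R$ to $T_x M$,

    $R_x(0_x) = x$, $\forall x \in M$, where $0_x$ is the zero vector of $T_x M$, and
    $\mathrm{D}R_x(0_x)[\xi] = \xi$, $\forall x \in M$, $\xi \in T_x M$.

    For $x \in M$ and $\xi \in T_x M$, define the curve $\gamma(t) = R_x(t\xi)$. Then $\gamma(0) = R_x(0) = x$, so $\gamma$ passes through $x$, and $\dot\gamma(0) = \mathrm{D}R_x(0)[\xi] = \xi$, so $\gamma$ has initial velocity $\xi$.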

  • Example: a retraction on $S^{n-1}$

    $R_x(\xi) = \dfrac{x + \xi}{\|x + \xi\|}$, $x \in S^{n-1}$, $\xi \in T_x S^{n-1}$.

    This $R$ is a retraction.
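
The two defining properties of a retraction can be verified numerically for the normalization map on the sphere. This is a small sketch assuming NumPy; the name `retract`, the sample point, and the finite-difference step `h` are illustrative choices.

```python
import numpy as np

def retract(x, xi):
    """Normalization retraction on the unit sphere: R_x(xi) = (x + xi) / ||x + xi||."""
    v = x + xi
    return v / np.linalg.norm(v)

x = np.array([0.6, 0.8, 0.0])           # point on S^2
xi = np.array([0.0, 0.0, 0.5])          # tangent vector: xi^T x = 0

at_zero = retract(x, np.zeros(3))       # property R_x(0_x) = x
h = 1e-6
# central finite difference approximating DR_x(0_x)[xi]
fd = (retract(x, h * xi) - retract(x, -h * xi)) / (2 * h)
```

`at_zero` equals `x`, and `fd` agrees with `xi` up to finite-difference error, illustrating $R_x(0_x) = x$ and $\mathrm{D}R_x(0_x)[\xi] = \xi$.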

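  • Conjugate gradient method in $\mathbb{R}^n$

    Algorithm 3.1 (conjugate gradient method in $\mathbb{R}^n$)
      Choose an initial point $x_0 \in \mathbb{R}^n$.
      Set $\eta_0 := -\nabla f(x_0)$.
      while $\nabla f(x_k) \neq 0$ do
        Compute a step size $\alpha_k$ and set $x_{k+1} := x_k + \alpha_k \eta_k$.
        Compute $\beta_{k+1}$ and set

          $\eta_{k+1} := -\nabla f(x_{k+1}) + \beta_{k+1}\eta_k$   (1)

        $k := k + 1$.
      end while

    On $M$, the sum in (1) is not defined, because $\operatorname{grad} f(x_{k+1}) \in T_{x_{k+1}} M$ and $\eta_k \in T_{x_k} M$.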

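  • Vector transport

    A vector transport $\mathcal{T}$ on $M$ is a smooth map $TM \oplus TM \to TM$, $(\eta_x, \xi_x) \mapsto \mathcal{T}_{\eta_x}(\xi_x)$, satisfying the following for all $x \in M$ [Absil et al., 2008]:

    1. There is a retraction $R$ such that $\pi(\mathcal{T}_{\eta_x}(\xi_x)) = R(\eta_x)$, where $\pi(\mathcal{T}_{\eta_x}(\xi_x))$ denotes the foot of the tangent vector $\mathcal{T}_{\eta_x}(\xi_x)$.
    2. $\mathcal{T}_{0_x}(\xi_x) = \xi_x$, $\xi_x \in T_x M$.
    3. $\mathcal{T}_{\eta_x}(a\xi_x + b\zeta_x) = a\mathcal{T}_{\eta_x}(\xi_x) + b\mathcal{T}_{\eta_x}(\zeta_x)$, $a, b \in \mathbb{R}$.

    A vector transport moves tangent vectors from one tangent space to another.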

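  • Vector transport: an example

    Let $R$ be a retraction on $M$ and define

    $\mathcal{T}^R_{\eta_x}(\xi_x) := \mathrm{D}R_x(\eta_x)[\xi_x]$.

    Then $\mathcal{T}^R$ is a vector transport. In what follows, $\mathcal{T}^R$ is used as the vector transport $\mathcal{T}$.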
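
For the normalization retraction on the sphere, differentiating $R_x(\eta) = (x+\eta)/\|x+\eta\|$ in the direction $\xi$ gives a closed form for $\mathcal{T}^R$. The sketch below, assuming NumPy, compares that closed form with a finite-difference derivative; the function names and sample vectors are illustrative.

```python
import numpy as np

def retract(x, xi):
    v = x + xi
    return v / np.linalg.norm(v)

def transport(x, eta, xi):
    """T^R_eta(xi) = DR_x(eta)[xi] for the normalization retraction."""
    v = x + eta
    nv = np.linalg.norm(v)
    return (xi - v * (v @ xi) / nv**2) / nv

x = np.array([0.6, 0.8, 0.0])
eta = np.array([0.0, 0.0, 0.5])         # tangent at x
xi = np.array([0.8, -0.6, 0.3])         # tangent at x

t = transport(x, eta, xi)
y = retract(x, eta)                     # foot of the transported vector
h = 1e-6
fd = (retract(x, eta + h * xi) - retract(x, eta - h * xi)) / (2 * h)
```

The transported vector `t` is tangent at `y = R_x(eta)`, matches the finite-difference derivative `fd`, and reduces to `xi` when `eta` is the zero vector, as required of a vector transport.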

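  • Conjugate gradient method with a vector transport

    Algorithm 3.1 (conjugate gradient method on $M$)
      Choose an initial point $x_0 \in M$.
      Set $\eta_0 := -\operatorname{grad} f(x_0)$.
      while $\operatorname{grad} f(x_k) \neq 0$ do
        Compute a step size $\alpha_k$ and set $x_{k+1} := R_{x_k}(\alpha_k \eta_k)$.
        Compute $\beta_{k+1}$ and set $\eta_{k+1} := -\operatorname{grad} f(x_{k+1}) + \beta_{k+1}\mathcal{T}_{\alpha_k \eta_k}(\eta_k)$.
        $k := k + 1$.
      end while

    It remains to decide how to choose $\alpha_k$ and $\beta_k$.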
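
The loop above can be sketched for $f(x) = x^T A x$ on the sphere. This is an illustrative implementation under stated assumptions, not the method analyzed on the slides: it uses the Fletcher–Reeves $\beta$, a plain backtracking Armijo search instead of a (strong) Wolfe search, and two safeguards added here (a periodic restart and a restart when $\eta_k$ is not a descent direction). All function names are hypothetical.

```python
import numpy as np

def f(A, x):
    return x @ A @ x

def grad(A, x):
    eg = 2.0 * A @ x
    return eg - (x @ eg) * x            # (I - x x^T) applied to 2 A x

def retract(x, xi):
    v = x + xi
    return v / np.linalg.norm(v)

def transport(x, eta, xi):
    # T^R_eta(xi) = DR_x(eta)[xi] for the normalization retraction
    v = x + eta
    nv = np.linalg.norm(v)
    return (xi - v * (v @ xi) / nv**2) / nv

def riemannian_cg(A, x0, tol=1e-6, max_iter=2000, c1=1e-4):
    n = x0.size
    x = x0 / np.linalg.norm(x0)
    g = grad(A, x)
    eta = -g
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        # Safeguards (assumptions, not on the slides): restart every n
        # iterations and whenever eta is not a descent direction.
        if k % n == 0 or g @ eta >= 0:
            eta = -g
        slope = g @ eta
        alpha, fx = 1.0, f(A, x)
        for _ in range(40):             # backtracking Armijo search
            if f(A, retract(x, alpha * eta)) <= fx + c1 * alpha * slope:
                break
            alpha *= 0.5
        x_new = retract(x, alpha * eta)
        g_new = grad(A, x_new)
        beta = (g_new @ g_new) / (g @ g)                 # Fletcher-Reeves
        eta = -g_new + beta * transport(x, alpha * eta, eta)
        x, g = x_new, g_new
    return x

A = np.diag(np.arange(1.0, 21.0))       # eigenvalues 1, ..., 20
x_min = riemannian_cg(A, np.ones(20))
```

With this `A`, the minimum of $f$ on the sphere is the smallest eigenvalue, so `f(A, x_min)` should be close to 1.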

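  • Step-size conditions in $\mathbb{R}^n$

    Let $0 < c_1 < c_2 < 1$. In $\mathbb{R}^n$, let $x_k \in \mathbb{R}^n$ and let $\eta_k$ be a descent direction, $\nabla f(x_k)^T \eta_k < 0$.

    $f(x_k + \alpha_k \eta_k) \le f(x_k) + c_1 \alpha_k \nabla f(x_k)^T \eta_k$,   (2)
    $\nabla f(x_k + \alpha_k \eta_k)^T \eta_k \ge c_2 \nabla f(x_k)^T \eta_k$,   (3)
    $|\nabla f(x_k + \alpha_k \eta_k)^T \eta_k| \le c_2 |\nabla f(x_k)^T \eta_k|$.   (4)

    Armijo condition: (2)
    Wolfe conditions: (2) and (3)
    Strong Wolfe conditions: (2) and (4)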

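  • With $\phi(\alpha) := f(x_k + \alpha \eta_k)$, conditions (2), (3), (4) read

    $\phi(\alpha_k) \le \phi(0) + c_1 \alpha_k \phi'(0)$,   (5)
    $\phi'(\alpha_k) \ge c_2 \phi'(0)$,   (6)
    $|\phi'(\alpha_k)| \le c_2 |\phi'(0)|$.   (7)

    Armijo condition: (5)
    Wolfe conditions: (5) and (6)
    Strong Wolfe conditions: (5) and (7)

    On $M$, define $\phi(\alpha) := f(R_{x_k}(\alpha \eta_k))$ and use (5), (6), (7).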

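  • Step-size conditions on $M$

    Let $0 < c_1 < c_2 < 1$. On $M$, let $x_k \in M$ and let $\eta_k$ be a descent direction,

    $\langle \operatorname{grad} f(x_k), \eta_k \rangle_{x_k} < 0$.

    $f(R_{x_k}(\alpha_k \eta_k)) \le f(x_k) + c_1 \alpha_k \langle \operatorname{grad} f(x_k), \eta_k \rangle_{x_k}$,   (8)
    $\langle \operatorname{grad} f(R_{x_k}(\alpha_k \eta_k)), \mathrm{D}R_{x_k}(\alpha_k \eta_k)[\eta_k] \rangle_{x_{k+1}} \ge c_2 \langle \operatorname{grad} f(x_k), \eta_k \rangle_{x_k}$,   (9)
    $|\langle \operatorname{grad} f(R_{x_k}(\alpha_k \eta_k)), \mathrm{D}R_{x_k}(\alpha_k \eta_k)[\eta_k] \rangle_{x_{k+1}}| \le c_2 |\langle \operatorname{grad} f(x_k), \eta_k \rangle_{x_k}|$.   (10)

    Armijo condition [Absil et al., 2008]: (8)
    Wolfe conditions [Sato, 2015]: (8) and (9)
    Strong Wolfe conditions [Ring & Wirth, 2012]: (8) and (10)

    Here $x_{k+1} = R_{x_k}(\alpha_k \eta_k)$, and $\mathrm{D}R_{x_k}(\alpha_k \eta_k)[\eta_k] = \mathcal{T}^R_{\alpha_k \eta_k}(\eta_k)$.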
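
Conditions (8), (9), (10) can be evaluated directly once $\phi'(\alpha) = \langle \operatorname{grad} f(R_{x_k}(\alpha \eta_k)), \mathcal{T}^R_{\alpha \eta_k}(\eta_k) \rangle$ is available. The following sketch does this for $f(x) = x^T A x$ on the sphere with the normalization retraction; the helper name `wolfe_flags`, the constants $c_1 = 10^{-4}$, $c_2 = 0.9$, and the small test problem are assumptions made for illustration.

```python
import numpy as np

def f(A, x):
    return x @ A @ x

def grad(A, x):
    eg = 2.0 * A @ x
    return eg - (x @ eg) * x

def retract(x, xi):
    v = x + xi
    return v / np.linalg.norm(v)

def transport(x, eta, xi):
    v = x + eta
    nv = np.linalg.norm(v)
    return (xi - v * (v @ xi) / nv**2) / nv

def wolfe_flags(A, x, eta, alpha, c1=1e-4, c2=0.9):
    """Return (Armijo, Wolfe, strong Wolfe) as booleans for step size alpha."""
    d0 = grad(A, x) @ eta                                   # phi'(0)
    x_new = retract(x, alpha * eta)
    d1 = grad(A, x_new) @ transport(x, alpha * eta, eta)    # phi'(alpha)
    armijo = bool(f(A, x_new) <= f(A, x) + c1 * alpha * d0)       # (8)
    wolfe = armijo and bool(d1 >= c2 * d0)                        # (8) and (9)
    strong = armijo and bool(abs(d1) <= c2 * abs(d0))             # (8) and (10)
    return armijo, wolfe, strong

A = np.diag([1.0, 2.0, 3.0])
x = np.ones(3) / np.sqrt(3.0)
eta = -grad(A, x)                                           # descent direction
tiny = wolfe_flags(A, x, eta, 1e-6)
grid = [a for a in np.linspace(0.01, 2.0, 200) if wolfe_flags(A, x, eta, a)[2]]
```

A very small step satisfies the Armijo condition but violates the curvature conditions, which is why (9) or (10) is imposed in addition to (8); `grid` collects the step sizes on a coarse grid that satisfy the strong Wolfe conditions.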

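  • Choices of $\beta_k$

    Choices of $\beta_k$ in $\mathbb{R}^n$, with $g_k := \nabla f(x_k)$, $y_k := g_{k+1} - g_k$:

    $\beta^{\mathrm{HS}}_{k+1} = \dfrac{g_{k+1}^T y_k}{\eta_k^T y_k}$. [Hestenes & Stiefel, 1952]

    $\beta^{\mathrm{FR}}_{k+1} = \dfrac{\|g_{k+1}\|^2}{\|g_k\|^2}$. [Fletcher & Reeves, 1964]

    $\beta^{\mathrm{PRP}}_{k+1} = \dfrac{g_{k+1}^T y_k}{\|g_k\|^2}$. [Polak, Ribière, Polyak, 1969]

    $\beta^{\mathrm{CD}}_{k+1} = \dfrac{\|g_{k+1}\|^2}{-\eta_k^T g_k}$. [Fletcher, 1987]

    $\beta^{\mathrm{LS}}_{k+1} = \dfrac{g_{k+1}^T y_k}{-\eta_k^T g_k}$. [Liu & Storey, 1991]

    $\beta^{\mathrm{DY}}_{k+1} = \dfrac{\|g_{k+1}\|^2}{\eta_k^T y_k}$. [Dai & Yuan, 1999]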
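
The six Euclidean formulas share two numerators and three denominators, which a short function makes explicit. This sketch assumes NumPy and the standard sign conventions (a minus sign in the CD and LS denominators); `cg_betas` and the quadratic test problem are illustrative. For a first step $\eta_0 = -g_0$ with exact line search on a quadratic, all six values coincide.

```python
import numpy as np

def cg_betas(g_new, g_old, eta):
    """Six classical choices of beta_{k+1} in R^n (eta is the previous direction)."""
    y = g_new - g_old
    return {
        "HS": (g_new @ y) / (eta @ y),
        "FR": (g_new @ g_new) / (g_old @ g_old),
        "PRP": (g_new @ y) / (g_old @ g_old),
        "CD": (g_new @ g_new) / (-(eta @ g_old)),
        "LS": (g_new @ y) / (-(eta @ g_old)),
        "DY": (g_new @ g_new) / (eta @ y),
    }

# Quadratic f(x) = x^T Q x / 2 - b^T x with an exact line search from x0 = 0.
Q = np.diag([1.0, 2.0, 3.0])
b = np.ones(3)
x0 = np.zeros(3)
g0 = Q @ x0 - b
eta0 = -g0
alpha0 = (g0 @ g0) / (eta0 @ Q @ eta0)  # exact minimizer along eta0
g1 = Q @ (x0 + alpha0 * eta0) - b
betas = cg_betas(g1, g0, eta0)
```

In this example every entry of `betas` equals $1/6$; the formulas differ only when the line search is inexact or the objective is not quadratic.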

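  • Choices of $\beta_k$ on $M$

    Let $g_k := \nabla f(x_k)$, $y_k := g_{k+1} - g_k$.

    Fletcher–Reeves in $\mathbb{R}^n$: $\beta^{\mathrm{FR}}_{k+1} = \dfrac{\|g_{k+1}\|^2}{\|g_k\|^2}$.

    On $M$:

    $\beta_{k+1} = \dfrac{\langle \operatorname{grad} f(x_{k+1}), \operatorname{grad} f(x_{k+1}) \rangle_{x_{k+1}}}{\langle \operatorname{grad} f(x_k), \operatorname{grad} f(x_k) \rangle_{x_k}}$.

    Dai–Yuan in $\mathbb{R}^n$: $\beta^{\mathrm{DY}}_{k+1} = \dfrac{\|g_{k+1}\|^2}{\eta_k^T y_k}$.

    On $M$:

    (?) $\beta_{k+1} := \dfrac{\langle \operatorname{grad} f(x_{k+1}), \operatorname{grad} f(x_{k+1}) \rangle_{x_{k+1}}}{\langle \eta_k, y_k \rangle_{x_k}}$, with $y_k = \operatorname{grad} f(x_{k+1}) - \mathcal{T}_{\alpha_k \eta_k}(\operatorname{grad} f(x_k))$?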

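  • Fletcher–Reeves-type method: scaled vector transport

    The convergence analysis in $\mathbb{R}^n$ carries over when the vector transport $\mathcal{T}$ satisfies

    $\|\mathcal{T}_{\alpha_{k-1}\eta_{k-1}}(\eta_{k-1})\|_{x_k} \le \|\eta_{k-1}\|_{x_{k-1}}$.

    A vector transport does not satisfy this inequality in general.

    For the vector transport $\mathcal{T}^R$, define the scaled vector transport $\mathcal{T}^0$ [Sato & Iwai, 2015] by

    $\mathcal{T}^0_\eta(\xi) = \dfrac{\|\xi\|_x}{\|\mathcal{T}^R_\eta(\xi)\|_{R_x(\eta)}}\, \mathcal{T}^R_\eta(\xi)$, $\eta, \xi \in T_x M$.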

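  • Fletcher–Reeves-type method with the scaled vector transport

    Algorithm 3.2 (Fletcher–Reeves-type conjugate gradient method with the scaled vector transport)
      Choose an initial point $x_0 \in M$.
      Set $\eta_0 := -\operatorname{grad} f(x_0)$.
      while $\operatorname{grad} f(x_k) \neq 0$ do
        Compute a step size $\alpha_k$ and set $x_{k+1} := R_{x_k}(\alpha_k \eta_k)$.
        Set
          $\beta_{k+1} := \dfrac{\langle \operatorname{grad} f(x_{k+1}), \operatorname{grad} f(x_{k+1}) \rangle_{x_{k+1}}}{\langle \operatorname{grad} f(x_k), \operatorname{grad} f(x_k) \rangle_{x_k}}$,
          $\eta_{k+1} := -\operatorname{grad} f(x_{k+1}) + \beta_{k+1}\mathcal{T}^{(k)}_{\alpha_k \eta_k}(\eta_k)$.
        $k := k + 1$.
      end while

    Here
    $\mathcal{T}^{(k)}_{\alpha_k \eta_k}(\eta_k) := \mathcal{T}^R_{\alpha_k \eta_k}(\eta_k)$ if $\|\mathcal{T}^R_{\alpha_k \eta_k}(\eta_k)\|_{x_{k+1}} \le \|\eta_k\|_{x_k}$, and $\mathcal{T}^{(k)}_{\alpha_k \eta_k}(\eta_k) := \mathcal{T}^0_{\alpha_k \eta_k}(\eta_k)$ otherwise.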
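
The switch between $\mathcal{T}^R$ and $\mathcal{T}^0$ can be sketched on the sphere with the retraction $R_x(\xi) = \sqrt{1 - \xi^T \xi}\, x + \xi$ (defined for $\|\xi\| < 1$), for which $\mathcal{T}^R_\eta(\xi) = \xi - \frac{\eta^T \xi}{\sqrt{1 - \eta^T \eta}}\, x$ can enlarge norms. The code assumes NumPy and the standard metric; the function names and the sample point are illustrative.

```python
import numpy as np

def retract(x, xi):
    # R_x(xi) = sqrt(1 - xi^T xi) x + xi, defined for ||xi|| < 1
    return np.sqrt(1.0 - xi @ xi) * x + xi

def transport_R(x, eta, xi):
    # T^R_eta(xi) = DR_x(eta)[xi]
    return xi - (eta @ xi) / np.sqrt(1.0 - eta @ eta) * x

def transport_scaled(x, eta, xi):
    # T^0_eta(xi): T^R_eta(xi) rescaled to have the norm of xi
    t = transport_R(x, eta, xi)
    return (np.linalg.norm(xi) / np.linalg.norm(t)) * t

def transport_k(x, eta, xi):
    # T^(k): use T^R when it does not increase the norm, otherwise T^0
    t = transport_R(x, eta, xi)
    if np.linalg.norm(t) <= np.linalg.norm(xi):
        return t
    return transport_scaled(x, eta, xi)

x = np.array([1.0, 0.0, 0.0])
eta = np.array([0.0, 0.6, 0.0])         # tangent at x, norm 0.6
t_R = transport_R(x, eta, eta)          # norm 0.75 > 0.6
t_k = transport_k(x, eta, eta)          # rescaled back to norm 0.6
y = retract(x, eta)
```

In this example `t_R` has norm 0.75, larger than the norm 0.6 of `eta`, so `transport_k` falls back to the scaled transport; both results remain tangent at `y`.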

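  • Fletcher–Reeves-type method: convergence

    Theorem 3.1 (Sato & Iwai, 2015). Suppose that $f$ is of class $C^1$ and that there exists $L > 0$ such that

    $|\mathrm{D}(f \circ R_x)(t\eta)[\eta] - \mathrm{D}(f \circ R_x)(0)[\eta]| \le Lt$,

    $\eta \in T_x M$ with $\|\eta\|_x = 1$, $x \in M$, $t \ge 0$. Then the sequence $\{x_k\}$ generated by Algorithm 3.2 satisfies

    $\liminf_{k \to \infty} \|\operatorname{grad} f(x_k)\|_{x_k} = 0$.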

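  • Fletcher–Reeves-type method: comparison with existing work

    [Ring & Wirth, 2012]: convergence is shown under the assumption that, for all $k$,

    $\|\mathcal{T}^R_{\alpha_{k-1}\eta_{k-1}}(\eta_{k-1})\|_{x_k} \le \|\eta_{k-1}\|_{x_{k-1}}$   (11)

    holds for the vector transport $\mathcal{T}^R$.

    [Sato & Iwai, 2015]: convergence is shown without assuming (11). When (11) fails, the vector transport is replaced by the scaled vector transport.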

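  • Fletcher–Reeves-type method: an example in which (11) fails

    Let $n = 20$, $A = \operatorname{diag}(1, \ldots, 20)$, $S^{n-1} := \{x \in \mathbb{R}^n \mid x^T x = 1\}$.

    Problem 3.1
    minimize $f(x) = x^T A x$, subject to $x \in S^{n-1}$,

    where $S^{n-1}$ is endowed with the Riemannian metric

    $g_x(\xi_x, \eta_x) := \xi_x^T G_x \eta_x$, $\xi_x, \eta_x \in T_x S^{n-1}$,

    $G_x := \operatorname{diag}(10^4 (x^{(1)})^2 + 1, 1, 1, \ldots, 1)$, and $x^{(1)}$ denotes the first component of $x$.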

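  • Fletcher–Reeves-type method: geometry for Problem 3.1

    Gradient:

    $\operatorname{grad} f(x) = 2\left(I_n - \dfrac{G_x^{-1} x x^T}{x^T G_x^{-1} x}\right) G_x^{-1} A x$.

    Retraction:

    $R_x(\xi) = \dfrac{x + \xi}{\sqrt{(x + \xi)^T (x + \xi)}}$, $\xi \in T_x S^{n-1}$, $x \in S^{n-1}$.

    Vector transport:

    $\mathcal{T}^R_\eta(\xi) = \dfrac{1}{\sqrt{(x + \eta)^T (x + \eta)}} \left(I_n - \dfrac{(x + \eta)(x + \eta)^T}{(x + \eta)^T (x + \eta)}\right)\xi$,

    $\eta, \xi \in T_x S^{n-1}$, $x \in S^{n-1}$. At the optimal solution $x_*$, $f(x_*) = 1$.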

  • Fletcher–Reeves-type method

    [Figure: $f(x_k)$ versus iteration (horizontal axis up to $10 \times 10^4$ iterations).]

  • Fletcher–Reeves-type method

    [Figure: $x_k^{(1)}$ versus iteration (horizontal axis up to $10 \times 10^4$ iterations).]

  • Fletcher–Reeves-type method

    [Figure: the ratio $\|\mathcal{T}^R_{\alpha_k \eta_k}(\eta_k)\|_{x_{k+1}} / \|\eta_k\|_{x_k}$ versus iteration (horizontal axis up to $10 \times 10^4$ iterations).]

  • Fletcher–Reeves-type method

    [Figure: $x_k^{(1)}$ and the ratios versus iteration (horizontal axis up to $2 \times 10^4$ iterations).]

  • Fletcher–Reeves-type method

    [Figure: $x_k^{(1)}$ versus iteration (0 to 200 iterations).]

  • Fletcher–Reeves-type method

    [Figure: distance to the solution (logarithmic scale) versus iteration (0 to 200 iterations).]

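  • Fletcher–Reeves-type method: a second example

    Let $n = 100$, $A = \operatorname{diag}(1, \ldots, 100)/100$, and consider $S^{n-1}$.

    Problem 3.2
    minimize $f(x) = x^T A x$, subject to $x \in S^{n-1}$,

    where $S^{n-1}$ is endowed with the Riemannian metric

    $g_x(\xi_x, \eta_x) := \xi_x^T \eta_x$, $\xi_x, \eta_x \in T_x S^{n-1}$.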

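  • Fletcher–Reeves-type method: geometry for Problem 3.2

    Gradient:

    $\operatorname{grad} f(x) = 2(I - xx^T)Ax$.

    Retraction:

    $R_x(\xi) = \sqrt{1 - \xi^T \xi}\, x + \xi$, $\xi \in T_x S^{n-1}$, $x \in S^{n-1}$.

    Vector transport:

    $\mathcal{T}^R_\eta(\xi) = \xi - \dfrac{\eta^T \xi}{\sqrt{1 - \eta^T \eta}}\, x$,

    $\eta, \xi \in T_x S^{n-1}$ with $\|\eta\|_x, \|\xi\|_x < 1$, $x \in S^{n-1}$. In this case $\|\mathcal{T}^R_\xi(\xi)\|_{R_x(\xi)} > \|\xi\|_x$ for $\xi \neq 0$.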

  • Fletcher–Reeves-type method

    [Figure: distance to the solution (logarithmic scale) versus iteration (0 to 350 iterations).]

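  • Dai–Yuan-type method: the Dai–Yuan method in $\mathbb{R}^n$

    Algorithm 3.3 (Dai–Yuan conjugate gradient method in $\mathbb{R}^n$ [Dai & Yuan, 1999])
      Choose an initial point $x_0 \in \mathbb{R}^n$.
      Set $\eta_0 := -\operatorname{grad} f(x_0)$.
      while $\operatorname{grad} f(x_k) \neq 0$ do
        Compute a step size $\alpha_k$ and set $x_{k+1} := x_k + \alpha_k \eta_k$.
        Set
          $\beta_{k+1} = \dfrac{\|g_{k+1}\|^2}{\eta_k^T y_k}$, $\eta_{k+1} := -\operatorname{grad} f(x_{k+1}) + \beta_{k+1}\eta_k$,
        where $g_k = \operatorname{grad} f(x_k)$, $y_k = g_{k+1} - g_k$.
        $k := k + 1$.
      end while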

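  • Dai–Yuan-type method: convergence of the Dai–Yuan method in $\mathbb{R}^n$

    Theorem 3.2. Suppose that $f$ is of class $C^1$ on a neighborhood $N$ of the level set $\mathcal{L} = \{x \in \mathbb{R}^n \mid f(x) \le f(x_1)\}$ and that there exists $L > 0$ such that

    $\|\nabla f(x) - \nabla f(y)\| \le L\|x - y\|$, $x, y \in N$.

    Then the sequence $\{x_k\}$ generated by Algorithm 3.3 satisfies

    $\liminf_{k \to \infty} \|\operatorname{grad} f(x_k)\| = 0$.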

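  • Dai–Yuan-type method: another expression of the Dai–Yuan $\beta$

    In $\mathbb{R}^n$, with $g_k = \nabla f(x_k)$, $y_k = g_{k+1} - g_k$,

    $\beta_{k+1} = \dfrac{\|g_{k+1}\|^2}{\eta_k^T y_k} = \dfrac{g_{k+1}^T \eta_{k+1}}{g_k^T \eta_k}$.

    On $M$, with $g_k = \operatorname{grad} f(x_k)$, set

    $\beta_{k+1} = \dfrac{\langle g_{k+1}, \eta_{k+1} \rangle_{x_{k+1}}}{\langle g_k, \eta_k \rangle_{x_k}}$.

    The right-hand side contains $\eta_{k+1}$, which itself depends on $\beta_{k+1}$, so this relation has to be solved for $\beta_{k+1}$.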

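  • Dai–Yuan-type method: solving for $\beta_{k+1}$

    $\beta_{k+1} = \dfrac{\langle g_{k+1}, \eta_{k+1} \rangle_{x_{k+1}}}{\langle g_k, \eta_k \rangle_{x_k}}$

    $= \dfrac{\langle g_{k+1}, -g_{k+1} + \beta_{k+1}\mathcal{T}^{(k)}_{\alpha_k \eta_k}(\eta_k) \rangle_{x_{k+1}}}{\langle g_k, \eta_k \rangle_{x_k}}$

    $= \dfrac{-\|g_{k+1}\|^2_{x_{k+1}} + \beta_{k+1}\langle g_{k+1}, \mathcal{T}^{(k)}_{\alpha_k \eta_k}(\eta_k) \rangle_{x_{k+1}}}{\langle g_k, \eta_k \rangle_{x_k}}$.

    Solving for $\beta_{k+1}$ gives

    $\beta_{k+1} = \dfrac{\|g_{k+1}\|^2_{x_{k+1}}}{\langle g_{k+1}, \mathcal{T}^{(k)}_{\alpha_k \eta_k}(\eta_k) \rangle_{x_{k+1}} - \langle g_k, \eta_k \rangle_{x_k}}$.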

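  • Dai–Yuan-type method: comparison of the two settings

    In $\mathbb{R}^n$:

    $\beta_{k+1} = \dfrac{g_{k+1}^T \eta_{k+1}}{g_k^T \eta_k} = \dfrac{\|g_{k+1}\|^2}{\eta_k^T y_k}$, $y_k = g_{k+1} - g_k$.

    On $M$:

    $\beta_{k+1} = \dfrac{\langle g_{k+1}, \eta_{k+1} \rangle_{x_{k+1}}}{\langle g_k, \eta_k \rangle_{x_k}} = \dfrac{\|g_{k+1}\|^2_{x_{k+1}}}{\langle \mathcal{T}^{(k)}_{\alpha_k \eta_k}(\eta_k), y_k \rangle_{x_{k+1}}}$,

    where

    $y_k = g_{k+1} - \dfrac{\langle g_k, \eta_k \rangle_{x_k}}{\langle \mathcal{T}^{(k)}_{\alpha_k \eta_k}(g_k), \mathcal{T}^{(k)}_{\alpha_k \eta_k}(\eta_k) \rangle_{x_{k+1}}}\, \mathcal{T}^{(k)}_{\alpha_k \eta_k}(g_k)$.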
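
The two expressions for the Riemannian Dai–Yuan $\beta_{k+1}$ are algebraically the same, which a few lines of code can confirm. The sketch below assumes NumPy and a metric given by the Euclidean inner product of the vector representations; the function names are hypothetical, and the check uses the identity map as transport so that the result reduces to the classical formula in $\mathbb{R}^n$.

```python
import numpy as np

def beta_dy(g_new, g_old, eta, t_eta):
    # ||g_{k+1}||^2 / (<g_{k+1}, T(eta_k)> - <g_k, eta_k>)
    return (g_new @ g_new) / (g_new @ t_eta - g_old @ eta)

def beta_dy_via_y(g_new, g_old, eta, t_eta, t_g_old):
    # Same quantity written with the vector y_k
    y = g_new - (g_old @ eta) / (t_g_old @ t_eta) * t_g_old
    return (g_new @ g_new) / (t_eta @ y)

# Euclidean check: with the identity as transport, y_k = g_{k+1} - g_k.
g_old = np.array([-1.0, -1.0, -1.0])
g_new = np.array([-0.5, 0.0, 0.5])
eta = -g_old
b1 = beta_dy(g_new, g_old, eta, eta)
b2 = beta_dy_via_y(g_new, g_old, eta, eta, g_old)
classic = (g_new @ g_new) / (eta @ (g_new - g_old))
```

All three values agree (here they equal $1/6$). On a manifold, `t_eta` and `t_g_old` would be the transported vectors $\mathcal{T}^{(k)}_{\alpha_k \eta_k}(\eta_k)$ and $\mathcal{T}^{(k)}_{\alpha_k \eta_k}(g_k)$.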

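  • Dai–Yuan-type method: convergence

    Theorem 3.3 (Sato, 2015). Suppose that $f$ is of class $C^1$ and that there exists $L > 0$ such that

    $|\mathrm{D}(f \circ R_x)(t\eta)[\eta] - \mathrm{D}(f \circ R_x)(0)[\eta]| \le Lt$,

    $\eta \in T_x M$ with $\|\eta\|_x = 1$, $x \in M$, $t \ge 0$. Then the sequence $\{x_k\}$ generated by the Dai–Yuan-type method satisfies

    $\liminf_{k \to \infty} \|\operatorname{grad} f(x_k)\|_{x_k} = 0$.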

  • Dai–Yuan-type method: numerical experiments

    $f(x) = x^T A x$, $x \in S^{n-1}$.

    [Figure: norm of the gradient (logarithmic scale) versus iteration (0 to 350) for DY + wWolfe, DY + sWolfe, FR + wWolfe, FR + sWolfe.]

    Figure 3.1: $n = 100$, $A = \operatorname{diag}(1, 2, \ldots, n)$, $x_0 = 1_n/\sqrt{n}$.

  • Dai–Yuan-type method: numerical experiments

    $f(x) = x^T A x$, $x \in S^{n-1}$.

    [Figure: norm of the gradient (logarithmic scale) versus iteration (0 to 1000) for DY + wWolfe, DY + sWolfe, FR + wWolfe, FR + sWolfe.]

    Figure 3.2: $n = 500$, $A = \operatorname{diag}(1, 2, \ldots, n)$, $x_0 = 1_n/\sqrt{n}$.

  • Dai–Yuan-type method: numerical experiments

    $f(x) = x^T A x$, $x \in S^{n-1}$.

    Table 3.1: $n = 100$, $A = \operatorname{diag}(1, 2, \ldots, n)$, $x_0 = 1_n/\sqrt{n}$.

    Method        Iterations  Function Evals.  Gradient Evals.  Computational time
    DY + wWolfe   149         210              206              0.0175
    DY + sWolfe   90          288              244              0.0187
    FR + wWolfe   318         619              577              0.0429
    FR + sWolfe   91          293              258              0.0191

    Table 3.2: $n = 500$, $A = \operatorname{diag}(1, 2, \ldots, n)$, $x_0 = 1_n/\sqrt{n}$.

    Method        Iterations  Function Evals.  Gradient Evals.  Computational time
    DY + wWolfe   340         373              367              0.0522
    DY + sWolfe   232         657              467              0.0658
    FR + wWolfe   960         1902             1757             0.1988
    FR + sWolfe   300         723              529              0.0730

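  • Other choices of $\beta_k$ in $\mathbb{R}^n$

    With $\eta_k$ the search direction, $g_k$ the gradient, and $y_k = g_{k+1} - g_k$:

    $\beta^{\mathrm{PRP}}_{k+1} = \dfrac{g_{k+1}^T y_k}{\|g_k\|^2}$, $\beta^{\mathrm{HS}}_{k+1} = \dfrac{g_{k+1}^T y_k}{\eta_k^T y_k}$, $\beta^{\mathrm{LS}}_{k+1} = \dfrac{g_{k+1}^T y_k}{-\eta_k^T g_k}$,

    $\beta^{\mathrm{FR}}_{k+1} = \dfrac{\|g_{k+1}\|^2}{\|g_k\|^2}$, $\beta^{\mathrm{DY}}_{k+1} = \dfrac{\|g_{k+1}\|^2}{\eta_k^T y_k}$, $\beta^{\mathrm{CD}}_{k+1} = \dfrac{\|g_{k+1}\|^2}{-\eta_k^T g_k}$.

    Three-term conjugate gradient method in $\mathbb{R}^n$ [Narushima et al., 2011]:

    $\eta_0 := -g_0$ and, for $k \ge 0$,

    $\eta_{k+1} := -g_{k+1}$ if $g_{k+1}^T p_{k+1} = 0$,
    $\eta_{k+1} := -g_{k+1} + \beta_{k+1}\eta_k - \beta_{k+1}\dfrac{g_{k+1}^T \eta_k}{g_{k+1}^T p_{k+1}}\, p_{k+1}$ otherwise,

    where $p_k \in \mathbb{R}^n$ is an arbitrary vector.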

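  • Singular value decomposition [Sato & Iwai, 2013]

    Let $A \in \mathbb{R}^{m \times n}$, $m \ge n$, $p \le n$, and $N = \operatorname{diag}(\mu_1, \ldots, \mu_p)$ with $\mu_1 > \cdots > \mu_p > 0$.

    Problem 4.1
    minimize $-\operatorname{tr}(U^T A V N)$, subject to $(U, V) \in \mathrm{St}(p, m) \times \mathrm{St}(p, n)$.

    At an optimal solution $(U_*, V_*)$, the columns of $U_*$ and $V_*$ are left and right singular vectors of $A$ associated with its $p$ largest singular values.

    This is an optimization problem on the product of two Stiefel manifolds.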
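
The claim about the optimal solution can be illustrated numerically. This sketch assumes NumPy and the sign convention of minimizing $-\operatorname{tr}(U^T A V N)$; the function name `svd_cost`, the problem sizes, and the random data are illustrative. By von Neumann's trace inequality, no feasible point gives a smaller value than the one built from the leading singular vectors.

```python
import numpy as np

def svd_cost(A, U, V, mu):
    """Objective -tr(U^T A V N) with N = diag(mu)."""
    return -np.trace(U.T @ A @ V @ np.diag(mu))

rng = np.random.default_rng(0)
m, n, p = 6, 4, 2
A = rng.standard_normal((m, n))
mu = np.array([2.0, 1.0])               # mu_1 > mu_2 > 0

W, s, Zt = np.linalg.svd(A, full_matrices=False)
U_star, V_star = W[:, :p], Zt[:p, :].T  # leading singular vectors
best = svd_cost(A, U_star, V_star, mu)  # equals -(mu_1 s_1 + mu_2 s_2)

# A random feasible point on St(p, m) x St(p, n) via QR factorization
U_rand, _ = np.linalg.qr(rng.standard_normal((m, p)))
V_rand, _ = np.linalg.qr(rng.standard_normal((n, p)))
other = svd_cost(A, U_rand, V_rand, mu)
```

Here `best` equals $-\sum_i \mu_i \sigma_i$ over the $p$ largest singular values $\sigma_i$, and `other` is never smaller than `best`.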

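  • Canonical correlation analysis [Yger et al., 2012]

    Let $X \in \mathbb{R}^{T \times m}$ and $Y \in \mathbb{R}^{T \times n}$ be two data matrices with mean 0, and let

    $C_X = X^T X$, $C_Y = Y^T Y$, $C_{XY} = X^T Y$.

    For $u \in \mathbb{R}^m$, $v \in \mathbb{R}^n$, set $f = Xu$, $g = Yv$. The correlation coefficient of $f$ and $g$ is

    $\rho = \dfrac{\operatorname{Cov}(f, g)}{\sqrt{\operatorname{Var}(f)}\sqrt{\operatorname{Var}(g)}} = \dfrac{u^T C_{XY} v}{\sqrt{u^T C_X u}\sqrt{v^T C_Y v}}$.

    Problem 4.2
    maximize $u^T C_{XY} v$, subject to $u^T C_X u = v^T C_Y v = 1$.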

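  • Canonical correlation analysis [Yger et al., 2012]

    Considering several pairs $u, v$ at once leads to

    Problem 4.3
    maximize $\operatorname{tr}(U^T C_{XY} V)$, subject to $(U, V) \in \mathrm{St}_{C_X}(p, m) \times \mathrm{St}_{C_Y}(p, n)$.

    For an $n \times n$ symmetric positive definite matrix $G$, the generalized Stiefel manifold $\mathrm{St}_G(p, n)$ is

    $\mathrm{St}_G(p, n) = \{Y \in \mathbb{R}^{n \times p} \mid Y^T G Y = I_p\}$.

    This is an optimization problem on the product of two generalized Stiefel manifolds.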

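  • Model reduction [Sato & Sato, 2015]

    Original system:

    $\dot x = Ax + Bu$, $y = Cx$,

    with input $u \in \mathbb{R}^p$, output $y \in \mathbb{R}^q$, and state $x \in \mathbb{R}^n$.

    Reduced system:

    $\dot x_m = A_m x_m + B_m u$, $y_m = C_m x_m$,

    with $A_m = U^T A U$, $B_m = U^T B$, $C_m = CU$, where $U \in \mathbb{R}^{n \times m}$ satisfies $U^T U = I_m$.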

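  • Model reduction [Sato & Sato, 2015]

    Problem 4.4
    minimize $J(U)$, subject to $U \in \mathrm{St}(m, n)$.

    The objective function $J$ is

    $J(U) := \|G_e\|^2_{H^2} = \operatorname{tr}(C_e E_c C_e^T) = \operatorname{tr}(B_e^T E_o B_e)$,

    where $G_e$ is the transfer function of the error system with

    $A_e = \begin{pmatrix} A & 0 \\ 0 & U^T A U \end{pmatrix}$, $B_e = \begin{pmatrix} B \\ U^T B \end{pmatrix}$, $C_e = \begin{pmatrix} C & -CU \end{pmatrix}$,

    and $E_c$, $E_o$ are the controllability and observability Gramians, the solutions of

    $A_e E_c + E_c A_e^T + B_e B_e^T = 0$, $A_e^T E_o + E_o A_e + C_e^T C_e = 0$.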

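  • Tensor completion [Kasai & Mishra, 2015]

    Let $\mathcal{X}^\star \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ be a third-order tensor, and let $\Omega \subset \{(i_1, i_2, i_3) \mid i_d \in \{1, 2, \ldots, n_d\}, d \in \{1, 2, 3\}\}$ be the set of indices for which the entry $\mathcal{X}^\star_{i_1 i_2 i_3}$ is known.

    $P_\Omega(\mathcal{X})_{(i_1, i_2, i_3)} = \mathcal{X}_{i_1 i_2 i_3}$ if $(i_1, i_2, i_3) \in \Omega$, and $0$ otherwise.

    Let $r = (r_1, r_2, r_3)$.

    Problem 4.5
    minimize $\dfrac{1}{|\Omega|}\|P_\Omega(\mathcal{X}) - P_\Omega(\mathcal{X}^\star)\|_F^2$,

    subject to $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, $\operatorname{rank}(\mathcal{X}) = r$.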

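  • Tensor completion [Kasai & Mishra, 2015]

    A tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ of rank $r$ has a Tucker decomposition

    $\mathcal{X} = \mathcal{G} \times_1 U_1 \times_2 U_2 \times_3 U_3$, $\mathcal{G} \in \mathbb{R}^{r_1 \times r_2 \times r_3}$, $U_d \in \mathrm{St}(r_d, n_d)$, $d = 1, 2, 3$.

    Let $M := \mathrm{St}(r_1, n_1) \times \mathrm{St}(r_2, n_2) \times \mathrm{St}(r_3, n_3) \times \mathbb{R}^{r_1 \times r_2 \times r_3}$. For $O_d \in O(r_d)$, $d = 1, 2, 3$, the transformation

    $(U_1, U_2, U_3, \mathcal{G}) \mapsto (U_1 O_1, U_2 O_2, U_3 O_3, \mathcal{G} \times_1 O_1^T \times_2 O_2^T \times_3 O_3^T)$

    leaves $\mathcal{X}$ unchanged, so the problem is an optimization problem on the quotient manifold $M/(O(r_1) \times O(r_2) \times O(r_3))$.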

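  • Inverse eigenvalue problem for doubly stochastic matrices [Yao et al., 2016]

    A doubly stochastic matrix is a nonnegative matrix whose row sums and column sums are all equal to 1.

    DSIEP (Doubly Stochastic Inverse Eigenvalue Problem): given a self-conjugate set of complex numbers $\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$, find an $n \times n$ doubly stochastic matrix $C$ whose eigenvalues are $\lambda_1, \lambda_2, \ldots, \lambda_n$.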

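  • Inverse eigenvalue problem for doubly stochastic matrices [Yao et al., 2016]

    Oblique manifold: $\mathcal{OB} := \{Z \in \mathbb{R}^{n \times n} \mid \operatorname{diag}(ZZ^T) = I_n\}$. Let $\Lambda := \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$, and let $\mathcal{U}$ be the set of matrices $U$ specified in [Yao et al., 2016].

    A nonnegative matrix with all row sums equal to 1 can be written as $Z \odot Z$ with $Z \in \mathcal{OB}$. Its column sums are all 1 if and only if $(Z \odot Z)^T 1_n - 1_n = 0$.

    $Z \odot Z$ has eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ if and only if $Z \odot Z = Q(\Lambda + U)Q^T$ with $Q \in O(n)$, $U \in \mathcal{U}$.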

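  • Inverse eigenvalue problem for doubly stochastic matrices [Yao et al., 2016]

    $H_1(Z, Q, U) := Z \odot Z - Q(\Lambda + U)Q^T$, $H_2(Z) := (Z \odot Z)^T 1_n - 1_n$,
    $H(Z, Q, U) := (H_1(Z, Q, U), H_2(Z))$.

    Problem 4.6
    minimize $h(Z, Q, U) := \dfrac{1}{2}\|H(Z, Q, U)\|_F^2$,

    subject to $(Z, Q, U) \in \mathcal{OB} \times O(n) \times \mathcal{U}$.

    This is an optimization problem on the product manifold $\mathcal{OB} \times O(n) \times \mathcal{U}$.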

  • References

    [1] Absil, P.-A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ (2008)

    [2] Dai, Y.H., Yuan, Y.: A nonlinear conjugate gradient method with a strong global convergence property. SIAM Journal on Optimization 10(1), 177–182 (1999)

    [3] Edelman, A., Arias, T.A., Smith, S.T.: The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications 20(2), 303–353 (1998)

    [4] Fletcher, R., Reeves, C.M.: Function minimization by conjugate gradients. The Computer Journal 7(2), 149–154 (1964)

    [5] Kasai, H., Mishra, B.: Riemannian preconditioning for tensor completion. arXiv preprint arXiv:1506.02159v1 (2015)

    [6] Narushima, Y., Yabe, H., Ford, J.A.: A three-term conjugate gradient method with sufficient descent property for unconstrained optimization. SIAM Journal on Optimization 21(1), 212–230 (2011)

    [7] Ring, W., Wirth, B.: Optimization methods on Riemannian manifolds and their application to shape space. SIAM Journal on Optimization 22(2), 596–627 (2012)

    [8] Sato, H.: A Dai–Yuan-type Riemannian conjugate gradient method with the weak Wolfe conditions. Computational Optimization and Applications (2015)

    [9] Sato, H., Iwai, T.: A Riemannian optimization approach to the matrix singular value decomposition. SIAM Journal on Optimization 23(1), 188–212 (2013)

    [10] Sato, H., Iwai, T.: A new, globally convergent Riemannian conjugate gradient method. Optimization 64(4), 1011–1031 (2015)

    [11] Sato, H., Sato, K.: Riemannian trust-region methods for H2 optimal model reduction. In: Proceedings of the 54th IEEE Conference on Decision and Control, pp. 4648–4655 (2015)

    [12] Tan, M., Tsang, I.W., Wang, L., Vandereycken, B., Pan, S.J.: Riemannian pursuit for big matrix recovery. In: Proceedings of the 31st International Conference on Machine Learning, pp. 1539–1547 (2014)

    [13] Yao, T.T., Bai, Z.J., Zhao, Z., Ching, W.K.: A Riemannian Fletcher–Reeves conjugate gradient method for doubly stochastic inverse eigenvalue problems. SIAM Journal on Matrix Analysis and Applications 37(1), 215–234 (2016)

    [14] Yger, F., Berar, M., Gasso, G., Rakotomamonjy, A.: Adaptive canonical correlation analysis based on matrix manifolds. In: Proceedings of the 29th International Conference on Machine Learning (ICML-12), pp. 1071–1078 (2012)