Quasi-Newton methods:
Symmetric rank 1 (SR1)
Broyden–Fletcher–Goldfarb–Shanno (BFGS)
Limited-memory BFGS (L-BFGS)

February 6, 2014
Roadmap
Method     Line search                        Trust region
CG         p_k = −∇f_k + β_k p_{k−1}          —
Newton     p_k = −[∇²f_k]⁻¹ ∇f_k              m_k(p) = f_k + ∇f_kᵀ p + 0.5 pᵀ ∇²f_k p
q-Newton   p_k = −B_k⁻¹ ∇f_k                  m_k(p) = f_k + ∇f_kᵀ p + 0.5 pᵀ B_k p
Today:
How to select B_k ≈ ∇²f_k for fast convergence of algorithms?
Tomorrow:
How to cheaply and approximately solve Newton's subproblem while preserving fast convergence?
Idea
Taylor:
∇f(x_{k+1}) − ∇f(x_k) = ∇²f(x_k + t(x_{k+1} − x_k)) [x_{k+1} − x_k], for some t ∈ (0, 1);
approximate the Hessian factor by B_{k+1}.
Secant equation
s_k := x_{k+1} − x_k
y_k := ∇f(x_{k+1}) − ∇f(x_k)

B_{k+1} s_k = y_k
n equations in (n + 1)n/2 unknowns!
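A quick numerical sanity check (a NumPy sketch, not part of the original slides; the quadratic test problem is made up): on a quadratic f(x) = 0.5 xᵀAx − bᵀx the secant pair satisfies y_k = A s_k exactly, so B_{k+1} = A solves the secant equation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)      # symmetric positive-definite Hessian
b = rng.standard_normal(n)

def grad(x):
    return A @ x - b             # ∇f(x) for f(x) = 0.5 x^T A x - b^T x

x0 = rng.standard_normal(n)
x1 = rng.standard_normal(n)
s = x1 - x0                      # s_k = x_{k+1} - x_k
y = grad(x1) - grad(x0)          # y_k = ∇f(x_{k+1}) - ∇f(x_k)

# The secant equation B_{k+1} s_k = y_k is solved by B_{k+1} = A here.
assert np.allclose(A @ s, y)
```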
SR1 method
Ansatz (symmetric rank-1 correction):

B_{k+1} = B_k + σ v vᵀ,   σ = ±1.

Substituting into the secant equation B_{k+1} s_k = y_k determines σ and v (up to scale).
SR1 method
B_{k+1} = B_k + (y_k − B_k s_k)(y_k − B_k s_k)ᵀ / ((y_k − B_k s_k)ᵀ s_k),   when (y_k − B_k s_k)ᵀ s_k ≠ 0
Sherman–Morrison
Ā = A + a bᵀ  ⟹  Ā⁻¹ = A⁻¹ − (A⁻¹ a bᵀ A⁻¹) / (1 + bᵀ A⁻¹ a),

provided A and Ā are non-singular.
H_{k+1} = B_{k+1}⁻¹ = H_k + (s_k − H_k y_k)(s_k − H_k y_k)ᵀ / ((s_k − H_k y_k)ᵀ y_k)
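The SR1 update and its inverse form can be checked numerically (a NumPy sketch, not in the original slides; the test data and skipping threshold r are made up):

```python
import numpy as np

def sr1_update(B, s, y, r=1e-8):
    """One SR1 update of B; skip when the denominator is (nearly) zero."""
    v = y - B @ s
    denom = v @ s
    if abs(denom) < r * np.linalg.norm(v) * np.linalg.norm(s):
        return B                              # no safe update: keep B
    return B + np.outer(v, v) / denom

rng = np.random.default_rng(1)
n = 5
B = np.eye(n)
s = rng.standard_normal(n)
y = rng.standard_normal(n)

B1 = sr1_update(B, s, y)
assert np.allclose(B1 @ s, y)                 # secant equation after the update

# Inverse form: same rank-1 structure with (B, s, y) -> (H, y, s).
H = np.linalg.inv(B)
w = s - H @ y
H1 = H + np.outer(w, w) / (w @ y)
assert np.allclose(H1, np.linalg.inv(B1))     # agrees with Sherman-Morrison
```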
SR1: cases
1 If (y_k − B_k s_k)ᵀ s_k ≠ 0: use the SR1 update formula.
2 If y_k = B_k s_k: set B_{k+1} = B_k.
3 If y_k ≠ B_k s_k and (y_k − B_k s_k)ᵀ s_k = 0: no SR1 update exists! Skip the update in this case (B_{k+1} = B_k).
Note:
B_k does not have to be positive definite ⟹ good Hessian approximation for non-convex problems.
Better suited for the trust-region (TR) than the line-search (LS) approach.
SR1: convergence
Theorem 6.1, N&W
SR1 is exact in n steps on convex quadratic functions (if updates are never skipped).
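A numerical illustration of this theorem (a NumPy sketch, not in the original slides; random steps are used, which are linearly independent with probability one, and updates are assumed never skipped):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)           # true Hessian of the convex quadratic

B = np.eye(n)
for k in range(n):
    s = rng.standard_normal(n)        # generic steps: independent w.p. 1
    y = A @ s                         # exact secant pairs on a quadratic
    v = y - B @ s
    B = B + np.outer(v, v) / (v @ s)  # SR1 update (assumed never skipped)

assert np.allclose(B, A)              # B_n equals the exact Hessian
```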
SR1: convergence
Theorem 6.2, N&W
Assume:
- x_k → x∗,
- ∇²f is bounded and Lipschitz continuous,
- |(y_k − B_k s_k)ᵀ s_k| ≥ r ‖y_k − B_k s_k‖ ‖s_k‖ (updates never skipped),
- {s_k} is uniformly linearly independent.

Then lim_{k→∞} ‖B_k − ∇²f(x∗)‖ = 0.
SR1: convergence
Theorem 6.7, N&W
Under restrictive assumptions (unique stationary point in the vicinity of the algorithmic iterates; updates never skipped; B_k bounded; etc.) ⟹ convergence at an n-step superlinear rate.
Good convergence in practice.
Rank-two update algorithms
Idea:
minimize over B ∈ ℝⁿˣⁿ:  ‖B − B_k‖
subject to  B s_k = y_k,  B = Bᵀ

or

minimize over H ∈ ℝⁿˣⁿ:  ‖H − H_k‖
subject to  H y_k = s_k,  H = Hᵀ
Rank-two update algorithms
Idea:
minimize over B ∈ ℝⁿˣⁿ:  ‖B − B_k‖
subject to  B s_k = y_k,  B = Bᵀ

Take a scaled Frobenius matrix norm:

‖B − B_k‖² = Tr[W(B − B_k) W (B − B_k)ᵀ],   W = Wᵀ, W ≻ 0, W y_k = s_k.

Result: the Davidon–Fletcher–Powell (DFP) algorithm

B_{k+1} = (I − ρ_k y_k s_kᵀ) B_k (I − ρ_k s_k y_kᵀ) + ρ_k y_k y_kᵀ,   ρ_k = (y_kᵀ s_k)⁻¹.
Davidon–Fletcher–Powell algorithm:

B_{k+1} = (I − ρ_k y_k s_kᵀ) B_k (I − ρ_k s_k y_kᵀ) + ρ_k y_k y_kᵀ,

H_{k+1} = H_k − (H_k y_k y_kᵀ H_k) / (y_kᵀ H_k y_k) + (s_k s_kᵀ) / (y_kᵀ s_k),

ρ_k = (y_kᵀ s_k)⁻¹.

Historically the first quasi-Newton algorithm. Sometimes gets stuck.
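The two DFP forms can be cross-checked numerically (a NumPy sketch, not in the original slides; the SPD test matrices are made up, and y is built so that y_kᵀ s_k > 0):

```python
import numpy as np

def dfp_B(B, s, y):
    """DFP update of the Hessian approximation B."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    return (I - rho * np.outer(y, s)) @ B @ (I - rho * np.outer(s, y)) \
        + rho * np.outer(y, y)

def dfp_H(H, s, y):
    """DFP update of the inverse approximation H = B^{-1}."""
    Hy = H @ y
    return H - np.outer(Hy, Hy) / (y @ Hy) + np.outer(s, s) / (y @ s)

rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)            # current SPD approximation
s = rng.standard_normal(n)
C = rng.standard_normal((n, n))
y = (C @ C.T + np.eye(n)) @ s          # guarantees curvature y^T s > 0

B1 = dfp_B(B, s, y)
H1 = dfp_H(np.linalg.inv(B), s, y)
assert np.allclose(B1 @ s, y)               # secant equation
assert np.allclose(H1, np.linalg.inv(B1))   # the two forms are inverses
```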
Rank-two update algorithms
Idea:
minimize over H ∈ ℝⁿˣⁿ:  ‖H − H_k‖
subject to  H y_k = s_k,  H = Hᵀ

Take a scaled Frobenius matrix norm:

‖H − H_k‖² = Tr[W(H − H_k) W (H − H_k)ᵀ],   W = Wᵀ, W ≻ 0, W s_k = y_k.

Result: the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm

H_{k+1} = (I − ρ_k s_k y_kᵀ) H_k (I − ρ_k y_k s_kᵀ) + ρ_k s_k s_kᵀ,   ρ_k = (y_kᵀ s_k)⁻¹.
Broyden–Fletcher–Goldfarb–Shanno algorithm:
H_{k+1} = (I − ρ_k s_k y_kᵀ) H_k (I − ρ_k y_k s_kᵀ) + ρ_k s_k s_kᵀ,

B_{k+1} = B_k − (B_k s_k s_kᵀ B_k) / (s_kᵀ B_k s_k) + (y_k y_kᵀ) / (y_kᵀ s_k),

ρ_k = (y_kᵀ s_k)⁻¹.
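The two BFGS forms are inverses of each other, which is easy to verify numerically (a NumPy sketch, not in the original slides; made-up SPD data, with y built so that y_kᵀ s_k > 0):

```python
import numpy as np

def bfgs_H(H, s, y):
    """BFGS update of the inverse Hessian approximation H."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    return (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
        + rho * np.outer(s, s)

def bfgs_B(B, s, y):
    """Equivalent direct update of B = H^{-1}."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

rng = np.random.default_rng(4)
n = 4
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)
s = rng.standard_normal(n)
C = rng.standard_normal((n, n))
y = (C @ C.T + np.eye(n)) @ s          # guarantees curvature y^T s > 0

H1 = bfgs_H(np.linalg.inv(B), s, y)
assert np.allclose(H1 @ y, s)          # secant equation, inverse form
assert np.allclose(np.linalg.inv(H1), bfgs_B(B, s, y))
```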
Line-search framework:

Wolfe curvature condition:

∇f_{k+1}ᵀ p_k ≥ c₂ ∇f_kᵀ p_k

⟹

[∇f_{k+1} − ∇f_k]ᵀ p_k ≥ (c₂ − 1) ∇f_kᵀ p_k > 0

(the last inequality holds because p_k is a descent direction and c₂ < 1). With y_k = ∇f_{k+1} − ∇f_k and p_k = α_k⁻¹ s_k, this gives y_kᵀ s_k > 0.
BFGS/DFP + Wolfe line search:

Suppose H_k ≻ 0.

Wolfe curvature condition ⟹ y_kᵀ s_k > 0.

For all z ∈ ℝⁿ, z ≠ 0:

zᵀ H_{k+1} z = [(I − ρ_k y_k s_kᵀ) z]ᵀ H_k [(I − ρ_k y_k s_kᵀ) z] + (s_kᵀ z)² / (y_kᵀ s_k) ≥ 0.

If s_kᵀ z ≠ 0 ⟹ zᵀ H_{k+1} z ≥ (s_kᵀ z)² / (y_kᵀ s_k) > 0.
If s_kᵀ z = 0 ⟹ (I − ρ_k y_k s_kᵀ) z = z, so zᵀ H_{k+1} z = zᵀ H_k z > 0.

Therefore H_{k+1} ≻ 0! (Same for DFP.)
BFGS/DFP + Wolfe line search:

B₀ ≻ 0 (H₀ ≻ 0) + Wolfe line search
⟹ B_k ≻ 0 (H_k ≻ 0) for all k
⟹ p_k = −B_k⁻¹ ∇f_k = −H_k ∇f_k is a descent direction (whenever ∇f_k ≠ 0).
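A small numerical check of positive-definiteness preservation (a NumPy sketch, not in the original slides; here the curvature condition y_kᵀ s_k > 0 is enforced directly on synthetic pairs rather than by an actual Wolfe line search):

```python
import numpy as np

def bfgs_H(H, s, y):
    """BFGS update of the inverse Hessian approximation H."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    return (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
        + rho * np.outer(s, s)

rng = np.random.default_rng(5)
n = 5
H = np.eye(n)                          # H_0 positive definite
for _ in range(8):
    s = rng.standard_normal(n)
    y = rng.standard_normal(n)
    if y @ s <= 0:
        y = -y                         # enforce y^T s > 0 (stand-in for Wolfe)
    H = bfgs_H(H, s, y)
    # Every iterate stays symmetric positive definite.
    assert np.min(np.linalg.eigvalsh(H)) > 0
```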
BFGS: global convergence
Theorem 6.4, N&W
BFGS is exact in n steps on convex quadratic functions; if B₀ = I, it generates the same iterates as CG.
BFGS: global convergence
Theorem 6.5, N&W
1 B₀ ≻ 0, B₀ = B₀ᵀ
2 f ∈ C²
3 ∃ m, M > 0: m‖z‖² ≤ zᵀ ∇²f(x) z ≤ M‖z‖²

Then x_k → x∗ with ∇f(x∗) = 0, and ∑_{k=1}^∞ ‖x_k − x∗‖ < ∞.
BFGS: local convergence
Theorem 6.6, N&W
1 f ∈ C²
2 ∇²f is Lipschitz continuous at x∗
3 ∑_{k=1}^∞ ‖x_k − x∗‖ < ∞

Then lim_{k→∞} ‖x_{k+1} − x∗‖ / ‖x_k − x∗‖ = 0 (superlinear convergence).
Presented quasi-Newton algorithms:
Good for small- to medium-size optimization problems with dense (or costly-to-compute) Hessian matrices.

Factorization/system-solve cost for [∇²f_k]⁻¹ ∇f_k: O(n³)
Update + multiplication with H_k: O(n²)
Limited-memory quasi-Newton algorithms:
Idea
Storage of B_k (or H_k): O(n²)
Instead: store m ≪ n pairs (s_k, y_k): O(2mn)
Example: L-BFGS
H_{k+1} q = [I − ρ_k y_k s_kᵀ]ᵀ H_k [I − ρ_k y_k s_kᵀ] q + ρ_k s_k s_kᵀ q
          = [I − ρ_k s_k y_kᵀ] H_k [q − (ρ_k s_kᵀ q) y_k] + (ρ_k s_kᵀ q) s_k.

Algorithm (only one pair is stored):

1: α_k := ρ_k s_kᵀ q
2: q := q − α_k y_k
3: r := H_k q
4: β_k := ρ_k y_kᵀ r
5: r := r + (α_k − β_k) s_k
Example: L-BFGS
1: q := ∇f_{k+1}
2: for i = k, k − 1, …, k − m + 1 do
3:   α_i := ρ_i s_iᵀ q
4:   q := q − α_i y_i
5: end for
6: r := H_k⁰ q
7: for i = k − m + 1, …, k − 1, k do
8:   β_i := ρ_i y_iᵀ r
9:   r := r + (α_i − β_i) s_i
10: end for
11: Result: r ≈ H_{k+1} ∇f_{k+1}
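The two-loop recursion above can be sketched and validated against the dense BFGS recursion when the memory covers all stored pairs (a NumPy sketch, not in the original slides; synthetic pairs with y_iᵀ s_i > 0 enforced directly):

```python
import numpy as np

def two_loop(q, pairs, h0=1.0):
    """L-BFGS two-loop recursion: r ≈ H_{k+1} q from stored (s_i, y_i) pairs
    (oldest first), without forming the dense inverse Hessian."""
    q = q.copy()
    alphas = []
    for s, y in reversed(pairs):               # first loop: newest to oldest
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        q -= a * y
        alphas.append((a, rho))
    r = h0 * q                                 # apply initial H^0 (scalar here)
    for (s, y), (a, rho) in zip(pairs, reversed(alphas)):  # oldest to newest
        b = rho * (y @ r)
        r = r + (a - b) * s
    return r

def bfgs_H(H, s, y):                           # dense reference recursion
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    return (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
        + rho * np.outer(s, s)

rng = np.random.default_rng(6)
n, m = 6, 4
pairs, H = [], np.eye(n)
for _ in range(m):
    s = rng.standard_normal(n)
    y = rng.standard_normal(n)
    if y @ s <= 0:
        y = -y                                 # enforce curvature y^T s > 0
    pairs.append((s, y))
    H = bfgs_H(H, s, y)

q = rng.standard_normal(n)
assert np.allclose(two_loop(q, pairs), H @ q)  # matches dense BFGS with H_0 = I
```

With full memory (m = k) the recursion reproduces BFGS exactly; in practice only the m most recent pairs are kept, which trades exactness for O(mn) work and storage.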
Storage and step complexity revisited:
Method                  Storage       Step complexity   Convergence rate
CG                      O(n)          O(n)              linear
BFGS/DFP/SR1            O(n²)         O(n²)             superlinear
L-BFGS                  O(mn)         O(mn)             ???
Newton (full Hess.)     O(n²)         O(n³)             quadratic
Newton (sparse Hess.)   O(n)–O(n²)    O(n)–O(n³)        quadratic

Note: step complexity does not account for the cost of evaluating the function and its derivatives!