68
pn- ª00136720 (,00100863 , f Epfv-ˆ http://bicmr.pku.edu.cn/~wenzw/bigdata2019.html [email protected] 1/68

f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

大数据分析中的算法

文再文

课程代码:00136720 (本科生),00100863(本研合)

北京大学北京国际数学研究中心

http://bicmr.pku.edu.cn/~wenzw/[email protected]

1/68

Page 2: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

2/68

课程信息

大数据分析中的算法侧重数值代数和最优化算法

课程代码:00136720 (本科生),00100863(本研合)

教师信息:文再文,[email protected],微信:wendoublewen

助教信息:杨明瀚,柳伊扬

上课地点:二教301

上课时间:1-16周,每周周二1-2节,双周周四1-2节,8:00am-9:50am

课程主页:http://bicmr.pku.edu.cn/~wenzw/bigdata2020.html

Page 3: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

3/68

参考资料

class notes, and reference books or papers“Mining of Massive Datasets”, Jure Leskovec, Anand Rajaraman,Jeff Ullman, Cambridge University Press

“Convex optimization”, Stephen Boyd and Lieven Vandenberghe

“Numerical Optimization”, Jorge Nocedal and Stephen Wright,Springer

“Optimization Theory and Methods”, Wenyu Sun, Ya-Xiang Yuan

“Matrix Computations”, Gene H. Golub and Charles F. Van Loan.The Johns Hopkins University Press

“Deep Learning", Ian Goodfellow and Yoshua Bengio and AaronCourville

Page 4: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

4/68

大数据分析相关的网站

Mining Massive Data Sets, Stanford University

Jure Leskovec @ Stanford - Stanford Computer Science

Mining Massive Data Sets: Hadoop Labs, Stanford University

Big Data Analytics, Columbia University

Theoretical Foundations of Big Data Analysis, The SimonsInstitute for the Theory of Computing, UC Berkeley

Core Methods in Data Science, University of Washington

“Theories of Deep Learning”, STATS 385, Stanford University,Fall 2017

courses from Berkeley Artificial Intelligence Research (BAIR)Lab

Page 5: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

5/68

课程计划

侧重数值代数和最优化的模型与算法

线性规划,半定规划

压缩感知和稀疏优化基本理论和算法

低秩矩阵恢复的基本理论和算法

随机特征值算法,随机优化算法

图和网络流问题: shorted path, maximum flow等等

机器学习算法:聚类分析,高维数据降维,链接分析,推荐系统,SVM

现代医学成像与高维图像分析:相位恢复以及低温电子显微镜

Optimal transport

深度学习(deep learning)

强化学习(reinforcement learning)

Page 6: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

6/68

课程信息

教学方式:课堂讲授

成绩评定办法:

迟交一天(24小时)打折10%,不接受晚交4天的作业和项目(任何时候理由都不接受)

大作业,包括习题和程序:40%

课程项目一:30%

课程项目二:30%

作业要求:i)计算题要求写出必要的推算步骤,证明题要写出关键推理和论证。数值试验题应该同时提交书面报告和程序,其中书面报告有详细的推导和数值结果及分析。ii)可以同学间讨论或者找助教答疑,但不允许在讨论中直接抄袭,应该过后自己独立完成。iii)严禁从其他学生,从互联网,从往年的答案,其它课程等等任何途径直接抄袭。iv)如果有讨论或从其它任何途径取得帮助,请列出来源。

请谨慎选课

Page 7: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

7/68

Why Optimization in Machine Learning?

Many problems in ML can be written as

minw∈W

N∑i=1

12‖w>xi − yi‖2

2 + λ‖w‖1 linear regression

minw∈W

1N

N∑i=1

log(1 + exp(−yiw>xi)) + λ‖w‖1 logistic regression

minw∈W

N∑i=1

`(f (w, xi), yi) + λr(w) general formulation

The pairs (xi, yi) are given data, yi is the label of the data point xi

`i(·): measures model fit for data point i (avoids under-fitting)r(w): measures model “complexity" (avoids over-fitting)f (w, x): linear function or models constructed from deep neuralnetworks

Page 8: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

8/68

Convolutional neural network (CNN)

A pioneering digit classification neural network by LeCun et. al. Itwas applied by several banks to recognise hand-written numbers onchecks.https://stats385.github.io/networks

Page 9: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

9/68

AlexNet

Designed by Hinton and etc in 2012 and achieved a top-5 error of15.3%. The network consisted of five convolutional layers, some ofwhich were followed by max-pooling layers, and three fully-connectedlayers with a final 1000-way softmax. All the convolutional and fullyconnected layers were followed by the ReLU nonlinearity.https://stats385.github.io/networks

Page 10: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

10/68

VGGNet

A 19 layer convolutional neural network from VGG group, Oxford, thatwas simpler and deeper than AlexNet. All large-sized filters inAlexNet were replaced by cascades of 3x3 filters (with nonlinearity inbetween). Max pooling was placed after two or three convolutionsand after each pooling the number of filters was always doubled.https://stats385.github.io/networks

Page 11: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

11/68

Deconvolutional networks

A generative network that is a special kind of convolutional networkthat uses transpose convolutions, also known as a deconvolutionallayers. https://stats385.github.io/architectures

Page 12: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

12/68

Recurrent neural networks (RNN)

RNNs are built on the same computational unit as the feed forwardneural network. RNNs do not have to be organized in layers anddirected cycles are allowed. This allows them to have internalmemory and as a result to process sequential data. One can convertan RNN into a regular feed forward neural network by "unfolding" it intime, as depicted in the figures.https://stats385.github.io/architectures

Page 13: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

13/68

Optimization algorithms in Deep learning

随机梯度类算法

pytorch/caffe2里实现的算法有adadelta, adagrad, adam,nesterov, rmsprop, YellowFinhttps://github.com/pytorch/pytorch/tree/master/caffe2/sgd

pytorch/torch里有:sgd, asgd, adagrad, rmsprop, adadelta,adam, adamaxhttps://github.com/pytorch/pytorch/tree/master/torch/optim

tensorflow实现的算法有:Adadelta, AdagradDA, Adagrad,ProximalAdagrad, Ftrl, Momentum, adam, Momentum,CenteredRMSProp具体实现:https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/training_ops.cc

Page 14: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

14/68

Stochastic Algorithms in Deep learning

Consider problem minx∈Rd1n

∑ni=1 fi(x)

References: chapter 8 inhttp://www.deeplearningbook.org/

Gradient descent

xt+1 = xt − αt

n

n∑i=1

∇fi(xt)

Stochastic gradient descent

xk+1 = xt − αt∇fi(xt)

SGD with momentum

vt+1 = µtvt − αt∇fi(xt)

xt+1 = xt + vt+1

Page 15: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

15/68

Stochastic Algorithms in Deep learning

Adaptive Subgradient Methods(Adagrad): let gt = ∇fi(xt),g2

t = diag[gtgTt ] ∈ Rd, and initial G1 = g2

1. At step t

xt+1 = xt − αt√

Gt + ε1d∇fi(xt)

Gt+1 = Gt + g2t+1

in the upper and the following iterations we use element-wisevector-vector multiplication.

Page 16: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

16/68

Reinforcement Learning

AlphaGo: supervised learning + policy gradients + valuefunctions + Monte-Carlo tree search

Page 17: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

17/68

What is RL

Reinforcement learning

Branch of machine learning concerned with taking sequences ofactions Usually described in terms: Policies (select next action), Valuefunctions (measure goodness of states or state-action pairs),Dynamics Models (predict next states and rewards)

is learning what to do–how to map situations to actions–so as tomaximize a numerical reward signal. The decision-maker is called theagent, the thing it interacts with, is called the environment.

A reinforcement learning task that satisfies the Markov property iscalled a Markov Decision process, or MDP

We assume that all RL tasks can be approximated with Markovproperty. So this talk is based on MDP.

Page 18: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

18/68

Agent and Environment

The agent selects actions based on the observations and rewardsreceived at each time-step t

The environment selects observations and rewards based on theactions received at each time-step t

Page 19: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

19/68

Some applications

Page 20: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

20/68

Definition

A Markov Decision Process is a tuple (S,A,P, r, γ):S is a finite set of states, s ∈ SA is a finite set of actions, a ∈ AP is the transition probability distribution.probability from state s with action a to state s′: P(s′|s, a)also called the model or the dynamicsr is a reward function, r(s, a, s′)sometimes just r(s) or ra

sor rt after time step tγ ∈ [0, 1] is a discount factor

Page 21: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

21/68

Tetris

State: the current board, the current falling tileAction: Rotate or shift the falling shapeOne-step reward: if a level is cleared by the current action,reward 1, o.w. reward 0;Transition Probability: future tile is uniformly distributed

Page 22: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

22/68

Optimization

Maximize the expected total discounted return of an episode

maxπ

E[

∞∑t=0

γtrt|π]

or,maxπ

E[R(τ)|τ = s0, a0, s1, a1, ... ∼ π]

Policy π(a|s) is a probability

Page 23: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

23/68

Linear Programming Formulation

Solving Markov Decision Processes (MDP) can be formulated as LP:Primal

max∑

a

raxa

s.t. ∀s ∈ S,∑a∈A

xa = 1 + γPa,sxa

x ≥ 0

Dualmin

∑s

vs

s.t. ∀s ∈ S, a ∈ A, vs ≥ ra + γ∑

s′Pa,s′vs′

Page 24: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

24/68

线性规划,半定规划,锥规划

Primal

min c>x

s.t. Ax = b

x ∈ K

Dual

max b>y

s.t. A>y + s = c

s ∈ K∗

“大”问题:LP: K is the nonnegative orthantvector x ∈ Rn, millions of variables

SOCP: K is the circular or Lorentz conevector x ∈ Rn, millions of variables

SDP: K is the cone of positive semidefinite matrices matrixx ∈ Sn×n, n up to 5000

Page 25: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

25/68

Compressive Sensing

Find the sparest solutionGiven n=256, m=128.A = randn(m,n); u = sprandn(n, 1, 0.1); b = A*u;

50 100 150 200 250−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

min

x‖x‖0

s.t. Ax = b(a) `0-minimization

50 100 150 200 250−0.6

−0.4

−0.2

0

0.2

0.4

min

x‖x‖2

s.t. Ax = b(b) `2-minimization

50 100 150 200 250−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

min

x‖x‖1

s.t. Ax = b(c) `1-minimization

Page 26: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

26/68

Wavelets and Images (Thanks: Richard Baraniuk)

Page 27: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

27/68

Wavelet Approximation (Thanks: Richard Baraniuk)

Page 28: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

28/68

Compressive sensing

Given (A, b,Ψ), find the sparsest point:

x∗ = arg min‖Ψx‖0 : Ax = b

From combinatorial to convex optimization:

x = arg min‖Ψx‖1 : Ax = b

1-norm is sparsity promotingBasis pursuit (Donoho et al 98)Many variants: ‖Ax− b‖2 ≤ σ for noisy b

Theoretical question: when is ‖ · ‖0 ↔ ‖ · ‖1 ?

Page 29: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

29/68

Restricted Isometry Property (RIP)

Definition (Candes and Tao [2005])Matrix A obeys the restricted isometry property (RIP) with constant δs

if(1− δs)‖c‖2

2 ≤ ‖Ac‖22 ≤ (1 + δs)‖c‖2

2

for all s-sparse vectors c.

Theorem (Candes and Tao [2006])If x is a k-sparse and A satisfies δ2k + δ3k < 1, then x is the unique `1minimizer.

RIP essentially requires that every set of columns with cardinalityless than or equal to s behaves like an orthonormal system.

Page 30: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

30/68

MRI: Magnetic Resonance Imaging

(a) MRI Scan (b) Fourier Coefficients (c) Image

Is it possible to cut the scan time into half?

Page 31: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

31/68

MRI (Thanks: Wotao Yin)

MR images often have sparse sparse representations undersome wavelet transform Φ

Solvemin

u‖Φu‖1 +

µ

2‖Ru− b‖2

R: partial discrete Fourier transformThe higher the SNR (signal-noise ratio) is, the better the imagequality is.

(a) full sampling (b) 39% sampling,SNR=32.2

Page 32: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

32/68

MRI: Magnetic Resonance Imaging

(a) full sampling (b) 39% sampling,SNR=32.2

(c) 22% sampling,SNR=21.4

(d) 14% sampling,SNR=15.8

Page 33: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

33/68

The Netflix Prize

Training data100 million ratings, 480,000 users, 17,770 movies6 years of data: 2000-2005

Test dataLast few ratings of each user (2.8 million)Evaluation criterion: root mean squared error (RMSE)Netflix Cinematch RMSE: 0.9514

Competition2700+ teams$1 million prize for 10% improvement on Cinematch

Page 34: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

34/68

Netflix Problem: 1 million dollar award

Given m movies x ∈ X and n customers y ∈ Ypredict the “rating” W(x, y) of customer y for movie xtraining data: known ratings of some customers for some moviesGoal: complete the matrixother applications: collaborative filtering, system identification,etc.

Page 35: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

35/68

Matrix Rank Minimization

Given X ∈ Rm×n, A : Rm×n → Rp, b ∈ Rp, we considermatrix completion problem:

min rank(X), s.t. Xij = Mij, (i, j) ∈ Ω

nuclear norm minimization:

min ‖X‖∗ s.t. A(X) = b

where ‖X‖∗ =∑

i σi and σi = ith singular value of matrix X.SDP reformulation

min Tr(W1) + Tr(W2)

s.t.(

W1 XX> W2

) 0

A(X) = b

Page 36: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

36/68

Video separation

Partition the video into moving and static parts

Page 37: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

37/68

Sparse and low-rank matrix separation

Given a matrix M, we want to find a low rank matrix W and asparse matrix E, so that W + E = M.Convex approximation:

minW,E‖W‖∗ + µ‖E‖1, s.t. W + E = M

Robust PCA

Page 38: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

38/68

图与网络流问题

Page 39: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

39/68

图与网络流问题

2014 ACM SIGMOD Programming ContestShortest Distance Over Frequent Communication Paths定义社交网络的边: 相互直接至少有x条回复并且相互认识。给定网络里两个人p1和p2以及另外一个整数x,寻找图中p1和p2之间数量最少节点的路径

Interests with Large Communities

Socialization Suggestion

Most Central People (All pairs shorted path)定义网络:论坛中有标签t的成员,相互直接认识。给定整数k和标签t,寻找k个有highest closeness centrality values的人

Page 40: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

40/68

DIMACS Implementation Challenges

http://dimacs.rutgers.edu/Challenges/

2014, 11th: Steiner Tree Problems2012, 10th: Graph Partitioning and Graph Clustering2005, 9th: The Shortest Path Problem2001, 8th: The Traveling Salesman Problem2000, 7th: Semidefinite and Related Optimization Problems1998, 6th: Near Neighbor Searches1995, 5th: Priority Queues, Dictionaries, and Multi-Dim. PointSets1994, 4th: Two Problems in Computational Biology: FragmentAssembly and Genome Rearrangements1993, 3rd: Effective Parallel Algorithms for CombinatorialProblems1992, 2nd: Maximum Clique, Graph Coloring, and Satisfiability1991, 1st: Network Flows and Matching

Page 41: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

41/68

Combinatorial optimization problems

Maximum cut problemgiven weights W for all edges in a graph, find the partition of thenodes of G into two sets V1 and V2 so that the sum of the weightsof the edges between V1 and V2 is maximal

Maximum k-cut problem (Frequency assignment problem)given a graph G, assign colors to the vertices in such a way thatthe number of non-defect edges (with endpoints of differentcolors) is maximal

Maximum stable set problemsX (G): clique number of a graph G = largest clique in Gω(G): coloring number of a graph G = smallest number of colorsneeded to color a graph, so that vertices connected by an edgehave different colorsθ(G) (θ+(G)): Lovasz theta function of G: defined by a SDP

ω(G) ≤ θ(G) ≤ θ+(G) ≤ X (G).

Page 42: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

42/68

Semidefinite programming (SDP) relaxations

Maximum cut problem

minX∈Sn〈C,X〉 s.t. Xi,i = 1, X 0

Frequency assignment problem (Maximum k-cut problem)

min⟨ 1

2k diag(We) + k−12k W,X

⟩s.t. Xi,j ≥ −1

k−1 , ∀(i, j) ∈ E\T, Xi,j = −1k−1 , ∀(i, j) ∈ T,

diag(X) = e, X 0.

Maximum stable set problems

θ(G) := max⟨ee>,X

⟩s.t. Xij = 0, (i, j) ∈ E, 〈I,X〉 = 1,X 0,

θ+(G) := max⟨ee>,X

⟩s.t. Xij = 0, (i, j) ∈ E, 〈I,X〉 = 1,X 0,X ≥ 0

Page 43: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

43/68

Communities in the Networks

Many networks have community structures. Nodes in the samecluster have high connection intensity.

Figure: https://www.slideshare.net/NicolaBarbieri/community-detection

Page 44: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

44/68

Communities in the Networks

Figure: Simmons College Facebook Network, the four clusters are labeledby different graduation year: 2006 in green, 2007 in light blue, 2008 inpurple and 2009 in red. Figure from Chen, Li and Xu, 2016.

Page 45: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

45/68

Partition Matrix and Assignment Matrix

For any partition ∪ka=1Ca = [n], define the partition matrix X

Xij =

1, if i, j ∈ Ca, for some a,0, else .

Low rank solution

X =

11 1 11 1 11 1 1

1 11 1

=

1111

11

×1

1 1 11 1

Page 46: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

46/68

Modularity Maximization

The modularity (MEJ Newman, M Girvan, 2004) is defined by

Q = 〈A− 12λ

ddT ,X〉

where λ = |E|.The Integral modularity maximization problem:

max 〈A− 12λddT ,X〉

s.t. X ∈ 0, 1n×n is a partiton matrix.

Probably hard to solve.

Page 47: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

47/68

Convex Relaxation

doubly nonnegative relaxation of modularity maximization:

min 〈−A +1

2λddT ,X〉

s.t. Xii = 1, i = 1, . . . , n,

X 0,

0 ≤ Xij ≤ 1

This is an SDP problemwith theoretical guarantee under certain conditionsBut it is a huge-scale problem to solve

Page 48: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

48/68

Dimensionality Reduction

Given n data points Xi ∈ Rp and assume that this dataset has intrinsicdimensionality d where d p. How transform the data set X into anew dataset Y with dimensionality d?

max∑

ij

‖yi − yj‖2

s.t. ‖yi − yj‖2 = ‖xi − xj‖2.

Maximal variance unfolding (MVU):

max Tr(K)

s.t. Kii + Kjj − 2Kij = ‖xi − xj‖2∑ij

Kij = 0

K 0.

which is SDP problem.

Page 49: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

49/68

Supporting Vector Machine (SVM)

Suppose we have two-class discrimination data. We assign the firstclass with 1 and the second with 1. A powerful discrimination methodis the Supporting Vector Machine (SVM). With yi = 1 (in the firstclass), let the data point i be given by ai ∈ Rd, i = 1, . . . , n1, and withyi = −1 (in the second class), let the data point j be given by bj ∈ Rd,j = 1, . . . , n2. We wish to find a hyper-plane in Rd to separate ais frombjs. Mathematically, we wish to find ω ∈ Rd and β ∈ R such that

a>i ω + β > 1,∀i

andb>j ω + β < −1, ∀j,

where x : ω>x + β = 0 is the desired separating hyper-plane. This is alinear program!

Page 50: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

50/68

Supporting Vector Machine (SVM)

When the data is noisy. Then we solve

min∑

i

|πi|+∑

j

|σj|

s.t. a>i ω + β ≥ 1− πi,∀ib>j ω + β ≤ −1 + σj,∀jπ, σ ≥ 0

which is a linear program, or

min ω>ω + µ

∑i

π2i +

∑j

σ2j

s.t. a>i ω + β ≥ 1− πi,∀i

b>j ω + β ≤ −1 + σj,∀jπ, σ ≥ 0

which is a quadratic program.

Page 51: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

51/68

Supply chain

Page 52: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

52/68

Supply chain

Page 53: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

53/68

facility localization

Page 54: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

54/68

Medical imaging

Page 55: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

55/68

Medical imaging

Page 56: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

56/68

Medical imaging

Page 57: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

57/68

Phase Retrieval

Phase carries more information than magnitude

Y |F(Y)| phase(F(Y)) iF(|F(Y)|.*phase(F(S)))

S |F(S)| phase(F(S)) iF(|F(S)|.*phase(F(Y)))

Question: recover signal without knowing phase?

Page 58: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

58/68

Classical Phase Retrieval

Feasibility problem

find x ∈ S ∩M or find x ∈ S+ ∩M

given Fourier magnitudes:

M := x(r) | |x(ω)| = b(ω)

where x(ω) = F(x(r)), F : Fourier transformgiven support estimate:

S := x(r) | x(r) = 0 for r /∈ D

orS+ := x(r) | x(r) ≥ 0 and x(r) = 0 if r /∈ D

Page 59: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

59/68

Ptychographic Phase Retrieval (Thanks: Chao Yang)

Given bi = |F(Qiψ)| for i = 1, . . . , k, can we recover ψ?

Ptychographic imaging along with advances in detectors andcomputing have resulted in X-ray microscopes with increased spatialresolution without the need for lenses

Page 60: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

60/68

Recent Phase Retrieval Model Problems

Given A ∈ Cm×n and b ∈ Rm

find x, s.t. |Ax| = b.

(Candes et al. 2011b, Alexandre d’Aspremont 2013)

SDP Relaxation: |Ax|2 is a linear function of X = xx∗

minX∈Sn

Tr(X)

s.t. Tr(aia∗i X) = b2i , i = 1, . . . ,m,

X 0

Exact recovery conditions

Page 61: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

61/68

Single Particle Cryo-Electron Microscopy

Projection images Pi(x, y) =∫∞−∞ φ(xR1

i + yR2i + zR3

i )dz+ “noise”.

φ : R3 → R is the electric potential of the molecule.Cryo-EM problem: Find φ and R1, . . . ,Rn given P1, . . . ,Pn.

Page 62: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

62/68

A Toy Example

Page 63: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

63/68

Single Particle Cryo-Electron Microscopy

Least Squares Approach

minR1,...,RK∈SO(3)

∑i 6=j

wij

∥∥∥Ri (~cij, 0)T − Rj (~cji, 0)T∥∥∥2

Since∥∥∥Ri (~cij, 0)T

∥∥∥ =∥∥∥Rj (~cji, 0)T

∥∥∥ = 1, we obtain

maxR1,...,RK∈SO(3)

∑i 6=j

wij〈Ri (~cij, 0)T ,Rj (~cji, 0)T〉

SDP Relaxation:

maxG∈R2K×2K trace ((W S) G)

s.t. Gii = I2, i = 1, 2, · · · ,K,G < 0

Page 64: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

64/68

Randomized Linear Algebra Algorithms

(Thanks: Petros Drineas and Michael W. Mahoney)Goal: To develop and analyze (fast) Monte Carlo algorithms forperforming useful computations on large (and later not so large!)matrices and tensors.

Matrix MultiplicationComputation of the Singular Value DecompositionComputation of the CUR DecompositionTesting Feasibility of Linear ProgramsLeast Squares ApproximationTensor computations: SVD generalizationsTensor computations: CUR generalization

Such computations generally require time which is superlinear in thenumber of nonzero elements of the matrix/tensor, e.g., O(n3) for n× nmatrices.

Page 65: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

65/68

CPU vs. GPU

March 1, 2015:Intel Xeon Processor E7-8880 v2 15 cores, $5729.00Intel Xeon Phi Coprocessor 7120A, 61 cores, $4235.00Tesla K80, 4992 ( 2496 per GPU) cores, $4959.99

Page 66: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

66/68

Super Computer

http://www.top500.org/lists/2014/11/

Top 1: Tianhe-2 (MilkyWay-2): TH-IVB-FEP Cluster, Intel XeonE5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P,NUDT, 3,120,000 cores#24: Edison is NERSC’s newest supercomputer, a Cray XC30,with a peak performance of 2.57 petaflops/sec, 133,824 computecores, 357 terabytes of memory, and 7.56 petabytes of disk.

Page 67: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

67/68

Optimization Formulation

Mathmatical optimization problem

minf (x)

s.t. ci(x) = 0, i ∈ Eci(x) ≥ 0, i ∈ I

x = (x1, . . . , xn)>: variablef (x) : Rn → R: objective functionci(x) : Rn → R: constraintsoptimal solution x∗: a feasible point with the smallest value of f

Page 68: f ýEpf v ˆ wenzw/bigdata2019.html ...bicmr.pku.edu.cn/~wenzw/bigdata/lect1.pdf · 4/68 ’pn ’łs—QÙ Mining Massive Data Sets, Stanford University Jure Leskovec @ Stanford

68/68

Classification

Continuous versus discrete optimizationUnconstrained versus constrained optimizationGlobal and local optimizationStochastic and deterministic optimizationLinear/nonlinear/quadratic programming, Convex/nonconvexoptimizationLeast square problem, equation solvingsparse optimization, PDE-constrained optimization, robustoptimization