
Page 1

The Data Stream Space Complexity of Cascaded Norms

T.S. Jayram
David Woodruff

IBM Almaden

Page 2

Data streams

Algorithms access data in a sequential fashion: one pass, small space. They need to be randomized and approximate [FM, MP, AMS].

[Figure: an algorithm with a small main memory reading the stream 2 3 4 16 0 100 5 4 501 200 401 2 3 6 0 one item at a time.]

Page 3

Frequency Moments and Norms

Stream defines updates to a set of items {1, 2, …, d}; f_i = weight of item i (positive-only vs. turnstile model).

k-th frequency moment: F_k = Σ_i |f_i|^k
p-th norm: L_p = ‖f‖_p = (Σ_i |f_i|^p)^{1/p}

Special cases: maximum frequency (p = ∞), distinct elements (p = 0), heavy hitters.

Assume the length of the stream and the magnitude of updates are ≤ poly(d).
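As a concrete reference point, here is a minimal offline Python sketch of these definitions (not a streaming algorithm; the vector f reuses the numbers from the stream figure):

    # Minimal offline sketch: frequency moments and norms of a
    # frequency vector f, matching the definitions above.

    def F(f, k):
        """k-th frequency moment: F_k = sum_i |f_i|^k."""
        return sum(abs(x) ** k for x in f)

    def L(f, p):
        """p-th norm: L_p = F_p^(1/p); p = 0 counts distinct elements."""
        if p == 0:
            return sum(1 for x in f if x != 0)
        return F(f, p) ** (1.0 / p)

    f = [2, 3, 4, 16, 0, 100]
    print(F(f, 2))                  # F_2 = 4 + 9 + 16 + 256 + 10000 = 10285
    print(L(f, 0))                  # distinct elements: 5
    print(max(abs(x) for x in f))   # maximum frequency (L_infinity): 100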

Page 4

Classical Results

Approximating L_p and F_p is the same problem.

For 0 ≤ p ≤ 2, F_p is approximable in Õ(1) space (AMS, FM, Indyk, …).

For p > 2, F_p is approximable in Õ(d^{1-2/p}) space (IW); this is best possible (BJKS, CKS).

Page 5

Cascaded Aggregates

Stream defines updates to pairs of items in {1, 2, …, n} × {1, 2, …, d}; f_ij = weight of item (i, j). The updates define an n × d matrix

M = ( f_11 f_12 … f_1d
      f_21 f_22 … f_2d
       ⋮    ⋮        ⋮
      f_n1 f_n2 … f_nd )

Given two aggregates P and Q, first apply Q to each row, then apply P to the resulting vector:

P ∘ Q (M) = P( Q(Row 1), Q(Row 2), …, Q(Row n) )

P ∘ Q is called a cascaded aggregate.
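A minimal offline sketch of this definition (illustrative helper names; this is just the semantics of P ∘ Q, not the streaming algorithm):

    # Offline semantics of a cascaded aggregate P ∘ Q: apply Q to every
    # row of M, then apply P to the vector of per-row values.

    def cascade(P, Q, M):
        return P([Q(row) for row in M])

    # Example: F_1 ∘ F_0, i.e. the sum over rows of each row's
    # distinct-element count.
    F0 = lambda row: sum(1 for x in row if x != 0)
    F1 = lambda v: sum(v)

    M = [[1, 0, 2],
         [0, 0, 5],
         [3, 3, 3]]
    print(cascade(F1, F0, M))  # F_1 ∘ F_0 (M) = 2 + 1 + 3 = 6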

Page 6

Motivation

Multigraph streams for analyzing IP traffic [Cormode-Muthukrishnan]: corresponds to P ∘ F_0 for different P's, where F_0 returns the number of destinations accessed by each source. They also introduced the more general problem of estimating P ∘ Q.

Computing complex join estimates.

Product metrics [Andoni-Indyk-Krauthgamer]: stock volatility, computational geometry, operator norms.

Page 7

[Figure: "The Picture" — the space complexity of estimating L_k ∘ L_p as a function of k and p, with each axis running over 0, 1, 2, …, ∞. Recoverable region labels: Θ(1), n^{1-1/k}, d^{1-2/p}, d, n^{1-2/k} d^{1-2/p}, n^{1-2/k} d, and a region marked "?", with the line k = p drawn in.]

Estimating L_k ∘ L_p

We give a 1-pass Õ(n^{1-2/k} d^{1-2/p})-space algorithm when k ≥ p (our upper bound).

We also provide a matching lower bound based on multiparty disjointness.

We give the Ω(n^{1-1/k}) bound for L_k ∘ L_0 and L_k ∘ L_1.

Prior work: Õ(n^{1/2}) for L_2 ∘ L_0 without deletions [CM]; Õ(n^{1-1/k}) for L_k ∘ L_p for any p in {0, 1}, in the turnstile model [MW] and without deletions [Ganguly]; follows from techniques of [ADIW].

Page 8

Our Problem: F_k ∘ F_p

For the n × d matrix M = (f_ij):

F_k ∘ F_p (M) = Σ_i ( Σ_j |f_ij|^p )^k = Σ_i F_p(Row i)^k

Page 9

High-Level Ideas: F_k ∘ F_p

1. We want the F_k-value of the vector (F_p(Row 1), …, F_p(Row n)).
2. We try to sample a row i with probability ∝ F_p(Row i).
3. Spend an extra pass to compute F_p(Row i).
4. Could then output F_p(M) · F_p(Row i)^{k-1}.

(This can be seen as a generalization of [AMS]; a sketch of why step 4 is unbiased follows this slide.)

How do we do the sampling efficiently?
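Why step 4 works: if row i is drawn with probability F_p(Row i)/F_p(M), then E[F_p(M) · F_p(Row i)^{k-1}] = Σ_i (F_p(Row i)/F_p(M)) · F_p(M) · F_p(Row i)^{k-1} = F_k ∘ F_p(M). A small Monte Carlo sketch of this, assuming exact row sampling (which is precisely what the algorithm must make cheap):

    # Unbiasedness check: sample rows proportionally to F_p(Row i) and
    # average the estimator F_p(M) * F_p(Row i)^(k-1).
    import random

    def Fp(row, p):
        return sum(abs(x) ** p for x in row)

    M = [[1, 2], [3, 0], [0, 4]]
    p, k = 1, 2
    weights = [Fp(row, p) for row in M]
    FpM = sum(weights)

    truth = sum(w ** k for w in weights)           # F_k ∘ F_p(M) = 34
    estimates = []
    for _ in range(200_000):
        i = random.choices(range(len(M)), weights=weights)[0]
        estimates.append(FpM * weights[i] ** (k - 1))
    print(truth, sum(estimates) / len(estimates))  # mean is close to 34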

Page 10

Review: Estimating F_p [IW]

Level sets: S_t = { i : (1+ε)^t ≤ |f_i| ≤ (1+ε)^{t+1} }

Level t is good if |S_t| (1+ε)^{2t} ≥ F_2/B. Items from such level sets are also good.
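A minimal sketch of this decomposition on a concrete vector (B and ε are illustrative; boundary values are assigned to the lower level here):

    # Level-set decomposition: item i is placed in level t when
    # (1+eps)^t <= |f_i| < (1+eps)^(t+1); level t is "good" when
    # |S_t| * (1+eps)^(2t) >= F_2 / B.
    import math
    from collections import defaultdict

    def level_sets(f, eps):
        S = defaultdict(set)
        for i, x in enumerate(f):
            if x != 0:
                S[int(math.log(abs(x), 1 + eps))].add(i)
        return S

    f = [2, 3, 4, 16, 0, 100]
    eps, B = 0.1, 4
    F2 = sum(x * x for x in f)
    S = level_sets(f, eps)
    good = {t for t, s in S.items() if len(s) * (1 + eps) ** (2 * t) >= F2 / B}
    print({t: sorted(s) for t, s in S.items()})
    print("good levels:", good)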

Page 11

ε-Histogram [IW]

Finds approximate sizes s'_t of the level sets: for all S_t, s'_t ≤ (1+ε)|S_t|; for good S_t, s'_t ≥ (1-ε)|S_t|.

Also provides Õ(1) random samples from each good S_t.

Space: Õ(B).

Page 12

Sampling Rows According to F_p Value

Treat the n × d matrix M as a vector: run the ε-Histogram on M for a certain B, obtaining a (1±ε)-approximation s'_t to |S_t| for each good t. Then F_k ∘ F_p(M') ≥ (1-ε) F_k ∘ F_p(M), where M' is M restricted to good items (Hölder's inequality).

To sample:
1. Choose a good t with probability s'_t (1+ε)^{pt} / F'_p(M), where F'_p(M) = Σ_{good t} s'_t (1+ε)^{pt}.
2. Choose a random sample (i, j) from S_t, and let row i be the current sample.

Pr[row i] = Σ_t [s'_t (1+ε)^{pt} / F'_p(M)] · [|S_t ∩ row i| / |S_t|] ≈ F_p(row i) / F_p(M)

Problems (a numeric sketch of the Pr[row i] calculation follows this list):
1. The high-level algorithm requires many samples (up to n^{1-1/k}) from the S_t, but [IW] gives just Õ(1); we can't afford to repeat in low space.
2. The algorithm may misclassify a pair (i, j) into S_t when it is in S_{t-1}.
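A toy verification of the Pr[row i] calculation above, with exact level-set sizes standing in for the histogram's estimates s'_t (the matrix, ε, and p are illustrative):

    # Two-stage sampling: pick level t with probability proportional to
    # |S_t| * (1+eps)^(pt), then a uniform entry (i, j) in S_t, and keep
    # row i. The induced row distribution is ~ F_p(row i) / F_p(M).
    import math
    from collections import defaultdict

    M = [[1, 2, 0], [9, 0, 0], [4, 4, 4]]
    p, eps = 1, 0.1

    S = defaultdict(list)                  # level sets over matrix entries
    for i, row in enumerate(M):
        for j, x in enumerate(row):
            if x != 0:
                S[int(math.log(abs(x), 1 + eps))].append((i, j))

    Fp_prime = sum(len(s) * (1 + eps) ** (p * t) for t, s in S.items())

    pr_row = defaultdict(float)            # exact induced row distribution
    for t, s in S.items():
        level_pr = len(s) * (1 + eps) ** (p * t) / Fp_prime
        for (i, j) in s:
            pr_row[i] += level_pr / len(s)

    FpM = sum(abs(x) ** p for row in M for x in row)
    for i, row in enumerate(M):
        target = sum(abs(x) ** p for x in row) / FpM
        print(i, round(pr_row[i], 3), round(target, 3))  # agree up to ~(1+eps)^p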

Page 13

High-Level Ideas: F_k ∘ F_p

1. We want the F_k-value of the vector (F_p(Row 1), …, F_p(Row n)).
2. We try to sample a row i with probability ∝ F_p(Row i).
3. Spend an extra pass to compute F_p(Row i).
4. Could then output F_p(M) · F_p(Row i)^{k-1}.

(This can be seen as a generalization of [AMS].)

How do we avoid an extra pass?

Page 14

Avoiding an Extra Pass

Now we can sample a row i with probability ∝ F_p(Row i).

We design a new F_k-algorithm to run on (F_p(Row 1), …, F_p(Row n)) which only receives IDs i with probability ∝ F_p(Row i). For each j ∈ [log n], the algorithm:
1. Chooses a random subset of n/2^j rows.
2. Samples a row i from this set with Pr[Row i] ∝ F_p(Row i).

We show that Õ(n^{1-1/k}) oracle samples are enough to estimate F_k up to 1±ε. (The sampling structure is sketched below.)
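A structural sketch of the subsampling scheme only (the toy weight oracle and all names are illustrative; how the samples are combined into a (1±ε)-estimate of F_k follows the paper's analysis and is not reproduced here):

    # At each scale j, restrict to a random set of ~n/2^j rows and draw
    # one row from it with probability proportional to F_p(Row i).
    import math, random

    def proportional_sample(rows, weight):
        rows = list(rows)
        return random.choices(rows, weights=[weight(i) for i in rows])[0]

    def scaled_samples(n, weight):
        for j in range(int(math.log2(n)) + 1):
            subset = random.sample(range(n), max(1, n >> j))
            yield j, proportional_sample(subset, weight)

    M = [[1, 2], [3, 0], [0, 4], [5, 5]]
    Fp_row = lambda i: sum(abs(x) for x in M[i])   # toy oracle, p = 1
    for j, i in scaled_samples(len(M), Fp_row):
        print(f"scale {j}: sampled row {i}")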

Page 15

New Lower Bounds

Alice holds an n × d matrix A; Bob holds an n × d matrix B (Δ denotes Hamming distance between rows).

NO instance: for all rows i, Δ(A_i, B_i) ≤ 1.

YES instance: there is a unique row j for which Δ(A_j, B_j) = d, and for all i ≠ j, Δ(A_i, B_i) ≤ 1.

We show distinguishing these cases requires Ω(n/d) randomized communication.

This implies that estimating L_k ∘ L_0 or L_k ∘ L_1 needs Ω(n^{1-1/k}) space.
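An illustrative constructor for these hard instances (hypothetical sizes; only meant to make the YES/NO gap concrete):

    # Alice's and Bob's rows agree up to Hamming distance <= 1, except
    # that a YES instance plants one row pair at distance exactly d.
    import random

    def hamming(u, v):
        return sum(a != b for a, b in zip(u, v))

    def make_instance(n, d, yes):
        A = [[random.randint(0, 1) for _ in range(d)] for _ in range(n)]
        B = [row[:] for row in A]
        for i in range(n):                  # per-row distance 0 or 1
            if random.random() < 0.5:
                B[i][random.randrange(d)] ^= 1
        if yes:                             # plant one row at distance d
            j = random.randrange(n)
            B[j] = [1 - x for x in A[j]]
        return A, B

    A, B = make_instance(n=6, d=4, yes=True)
    print([hamming(a, b) for a, b in zip(A, B)])  # exactly one entry equals 4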

Page 16

Information Complexity Paradigm

[CSWY, BJKS]: the information cost IC is the amount of information the transcript reveals about the inputs.

For any function f, CC(f) ≥ IC(f).

Using their direct sum theorem, it suffices to show an Ω(1/d) information cost for any protocol deciding whether Δ(x, y) = d or Δ(x, y) ≤ 1.

Caveat: the distribution is only on instances where Δ(x, y) ≤ 1.

Page 17

Working with Hellinger Distance

Given the probability distribution vector π(x, y) over transcripts of an input (x, y), let ψ(x, y)_τ = π(x, y)_τ^{1/2} for all transcripts τ.

The information cost can be lower bounded by Σ_{Δ(u,v)=1} ‖ψ(u, u) − ψ(u, v)‖².

Unlike previous work, we exploit the geometry of the squared Euclidean norm (useful in later work [AJP]).

Short diagonals property:

Σ_{Δ(u,v)=1} ‖ψ(u, u) − ψ(u, v)‖² ≥ (1/d) Σ_{Δ(u,v)=d} ‖ψ(u, u) − ψ(u, v)‖²

[Figure: a quadrilateral with side lengths a, b, c, d and diagonals e, f, illustrating a² + b² + c² + d² ≥ e² + f².]
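A quick numeric check of the short-diagonals inequality itself: for any four points, the sum of squared side lengths is at least the sum of squared diagonal lengths, since the gap equals |P1 − P2 + P3 − P4|² ≥ 0 (random points here are a sanity test, not the proof):

    # Sum of squared quadrilateral sides dominates the squared diagonals.
    import random

    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    for _ in range(10_000):
        P = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
        sides = (sq_dist(P[0], P[1]) + sq_dist(P[1], P[2]) +
                 sq_dist(P[2], P[3]) + sq_dist(P[3], P[0]))
        diags = sq_dist(P[0], P[2]) + sq_dist(P[1], P[3])
        assert sides >= diags - 1e-12
    print("short diagonals property held on 10,000 random quadrilaterals")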

Page 18

Open Problems

L_k ∘ L_p estimation for k < p.

Other cascaded aggregates, e.g. entropy.

Cascaded aggregates with three or more stages.