Evaluating Window Joins over Unbounded Streams. Authors: Jaewoo Kang, Jeffrey F. Naughton, Stratis D. Viglas, University of Wisconsin-Madison CS Dept. Presenter: Yang Ying-Chia 楊應甲 (R01922018), CSIE, National Taiwan University


Page 1: Evaluating Window Joins over Unbounded Streams

Evaluating Window Joins over Unbounded Streams

Authors: Jaewoo Kang, Jeffrey F. Naughton, Stratis D. Viglas

University of Wisconsin-Madison CS Dept.

Presenter: Yang Ying-Chia 楊應甲 (R01922018), CSIE, National Taiwan University

Page 2: Evaluating Window Joins over Unbounded Streams

Outline
• Abstract
• Background
• Introduction
• Related Work
• Estimating the Cost of Moving Window Joins
• On Maximizing the Efficiency of Processing Joins
• Conclusion

Page 3: Evaluating Window Joins over Unbounded Streams

Abstract – Problem and Solution
• Problem: Process joins over unbounded streams.
• Solution: Moving Window Join
  • Queries have “window predicates”

Page 4: Evaluating Window Joins over Unbounded Streams

Abstract – Central Point of the Thesis
• The paper proposes a unit-time-basis cost model for evaluating moving window joins.
• Using this cost model, it proposes strategies for maximizing the efficiency of processing joins in different scenarios.


Page 6: Evaluating Window Joins over Unbounded Streams

Background
• Join
• Nested Loops Join (NLJ)
• Hash Join (HJ)
• Moving Window Join

Page 8: Evaluating Window Joins over Unbounded Streams


Background – Nested Loops Join (NLJ)

Page 9: Evaluating Window Joins over Unbounded Streams


Background – Hash Join (HJ)

Page 10: Evaluating Window Joins over Unbounded Streams


Background – Moving Window Join

Page 11: Evaluating Window Joins over Unbounded Streams

Background – Moving Window Join
• Instead of joining all tuples of A with all tuples of B, we join the tuples that have arrived on A in the last t1 seconds with the tuples that have arrived on B in the last t2 seconds.
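The window semantics above can be sketched in a few lines of Python. This is only a minimal illustration using a nested-loops probe. The event format `(timestamp, stream, value)`, the function name `window_join`, and the `match` predicate are all assumptions made for the example, not part of the paper.

```python
from collections import deque

def window_join(events, t1, t2, match=lambda a, b: a == b):
    """Join tuples of streams A and B that lie inside their time windows.

    events: iterable of (timestamp, stream, value) sorted by timestamp,
            where stream is "A" or "B" (hypothetical input format).
    t1, t2: window lengths for A and B, in the same unit as the timestamps.
    """
    win_a, win_b = deque(), deque()   # each holds (timestamp, value)
    results = []
    for ts, stream, value in events:
        # Invalidate tuples that have fallen out of their windows.
        while win_a and win_a[0][0] <= ts - t1:
            win_a.popleft()
        while win_b and win_b[0][0] <= ts - t2:
            win_b.popleft()
        if stream == "A":
            # Probe window B with the new A tuple, then insert it into window A.
            results += [(value, b) for _, b in win_b if match(value, b)]
            win_a.append((ts, value))
        else:
            # Probe window A with the new B tuple, then insert it into window B.
            results += [(a, value) for _, a in win_a if match(a, value)]
            win_b.append((ts, value))
    return results
```

Each arrival triggers the three operations that the cost model later charges for: invalidation of expired tuples, a probe of the opposite window, and an insertion into its own window.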


Page 13: Evaluating Window Joins over Unbounded Streams

Introduction – Questions
1. How can we measure the efficiency of a moving window join evaluation strategy, since the traditional metric of execution time to completion does not apply?
2. Can an algorithm for a moving window join take advantage of asymmetries in the rates of the input streams?
3. How can we deal with cases in which an input stream is so fast that the system cannot keep up?
4. If memory is the bottleneck, how should we allocate memory between the two windows for the two inputs?

Page 14: Evaluating Window Joins over Unbounded Streams

Introduction – The Three Scenarios
• One stream is much faster than the other.
• System resources are insufficient to keep up with the input streams.
• Memory is limited.


Page 16: Evaluating Window Joins over Unbounded Streams

Related Work
• Predicate grouping and group optimization techniques
• Adaptive query processing and query scrambling
• Symmetric hash join and symmetric nested loops join
• Diag-Join for data warehouse environments
• Rate-based streaming query optimization framework


Page 18: Evaluating Window Joins over Unbounded Streams

Estimating the Cost of Moving Window Joins
• Cost model
• Cost of a single join operation

Page 19: Evaluating Window Joins over Unbounded Streams

Cost of Nested Loops Join A to B
• Number of tuples accessed in a time unit
• Cost of accessing a single tuple
• Number of tuples accessed to search for matches in window B
• Number of tuples accessed for insertion and invalidation
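The annotations above suggest a cost of the form (tuples accessed per time unit) × (cost of accessing one tuple). A minimal sketch follows, assuming the probe term is λa·|B| with |B| = λb·Tb and the maintenance term is 2·λb (one insert and one invalidation per B tuple). These exact terms and every parameter name are assumptions for illustration, not formulas taken from the slide.

```python
def nlj_cost(rate_a, rate_b, window_b, c_tuple=1.0):
    """Unit-time cost of the one-way nested loops join from A to B (assumed form)."""
    size_b = rate_b * window_b      # tuples held in window B
    probe = rate_a * size_b         # each arriving A tuple scans all of window B
    maintain = 2 * rate_b           # one insert plus one invalidation per B tuple
    return (probe + maintain) * c_tuple
```

For example, `nlj_cost(10, 5, 4)` charges 200 accesses for probing and 10 for window maintenance per time unit.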

Page 20: Evaluating Window Joins over Unbounded Streams

Cost of Hash Join A to B
• The costs of probe(b) and invalidate(b) are functions of the hash bucket size in window B
• Cost of accessing a single tuple in a specific hash table implementation
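A hash-based counterpart can be sketched the same way. This assumes tuples spread evenly over `n_buckets` buckets, that a probe and an invalidation each scan one bucket, and that an insert touches a single slot. The bucket model and all names are hypothetical, chosen only to show how the cost depends on bucket size.

```python
def hj_cost(rate_a, rate_b, window_b, n_buckets, c_hash=1.0):
    """Unit-time cost of the one-way hash join from A to B (assumed form)."""
    bucket = rate_b * window_b / n_buckets   # average bucket size in window B
    probe = rate_a * bucket                  # each arriving A tuple scans one bucket
    insert = rate_b                          # each arriving B tuple is hashed into place
    invalidate = rate_b * bucket             # each expiring B tuple is located in its bucket
    return (probe + insert + invalidate) * c_hash
```

Under these assumptions the invalidation term grows with the square of stream B's rate, so a fast stream makes its own hash window expensive to maintain.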

Page 21: Evaluating Window Joins over Unbounded Streams

Cost of Full Join
• Symmetric Join
  • HHJ, NNJ

Page 22: Evaluating Window Joins over Unbounded Streams

Cost of Full Join
• Asymmetric Join
  • HNJ

Page 23: Evaluating Window Joins over Unbounded Streams

Cost Curves for Full Joins
σa = 1/|A| = 1/Nkey(A)
σb = 1/|B| = 1/Nkey(B)

Page 24: Evaluating Window Joins over Unbounded Streams

Observation from the Previous Graphs
• When the difference between the input streams’ speeds is minimal, HJ outperforms every other join combination.
• As the speed gap increases, the cost of HJ increases considerably and exceeds that of HNJ at around 70 tuples/sec and 140 tuples/sec.
• These are the performance crossover points.

Page 25: Evaluating Window Joins over Unbounded Streams

Estimating the Weight Factors
• The crossover points can be calculated by equating the two cost formulas.
• For two given streams, we can determine when NLJ will outperform HJ, depending on the ratio of the input streams’ arrival rates.
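Equating two cost formulas amounts to finding the rate at which their difference changes sign. A minimal sketch with bisection follows. The helper `crossover` and the two toy cost curves in the usage note are illustrative assumptions, not the paper's formulas or measurements.

```python
def crossover(cost_x, cost_y, lo, hi, tol=1e-9):
    """Find the rate in [lo, hi] where cost_x and cost_y are equal, by bisection."""
    diff = lambda r: cost_x(r) - cost_y(r)
    if diff(lo) * diff(hi) > 0:
        raise ValueError("costs do not cross in [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if diff(lo) * diff(mid) <= 0:
            hi = mid      # the sign change lies in the lower half
        else:
            lo = mid      # the sign change lies in the upper half
    return (lo + hi) / 2
```

With a hypothetical linear cost `lambda r: 3 * r` and a hypothetical quadratic cost `lambda r: 0.5 * r + 0.05 * r * r`, calling `crossover` on the interval [1, 200] locates the point where the quadratic curve overtakes the linear one.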


Page 27: Evaluating Window Joins over Unbounded Streams

Recall the Three Scenarios
• One stream is much faster than the other.
• System resources are insufficient to keep up with the input streams.
• Memory is limited.

Page 28: Evaluating Window Joins over Unbounded Streams

Exploiting Asymmetry in Input Stream Speeds
• Assumptions:
  • The two time windows are fixed.
  • The aggregate speed of the two streams is less than the system’s service rate μ (i.e., λa + λb < μ).
• The following inequality determines the likely winner between NLJ and HJ:
• If the inequality holds, NLJ will outperform HJ; otherwise, HJ outperforms NLJ.

Page 29: Evaluating Window Joins over Unbounded Streams


Graphs to Prove the Previous Hypothesis

Page 30: Evaluating Window Joins over Unbounded Streams

Observation from the Previous Graphs
• HHJ costs the least until the input rate reaches about 70 tuples/sec; then HNJ takes over. Hence, either HHJ or HNJ is the winner.
• Both hash join output rates decrease drastically after passing their thrashing point.

Page 31: Evaluating Window Joins over Unbounded Streams

Maximizing the Number of Result Tuples with Limited Computing Resources
• This scenario arises under the following conditions:
  • The system evaluates very expensive predicates.
  • The input streams’ aggregate speed exceeds the join operator’s service rate, i.e., λa + λb > μ.
• Hence, not all answer tuples can be generated and the input streams need to be “regulated”.
• But with what policy?

Page 32: Evaluating Window Joins over Unbounded Streams


Performance Comparison between Policies

• The winner is the equal distribution strategy!

• Regardless of time window sizes and window selectivity factors.
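One way to see why an equal split can win regardless of window sizes and selectivity is a simplified output model. It assumes the expected output rate is σ·ra·rb·(Ta + Tb) for admitted rates ra and rb, that the budget constrains only their sum (ra + rb = μ), and that each stream can supply its share. These are assumptions made for this sketch and may differ from the paper's exact model.

```python
def output_rate(r_a, r_b, t_a, t_b, sel):
    """Expected result tuples per time unit for admitted rates r_a and r_b."""
    # Each admitted A tuple meets r_b * t_b buffered B tuples, and vice versa.
    return sel * r_a * r_b * (t_a + t_b)

def best_split(mu, t_a, t_b, sel, steps=100):
    """Grid-search the share of the service rate mu that is given to stream A."""
    shares = [i / steps for i in range(steps + 1)]
    return max(shares, key=lambda s: output_rate(s * mu, (1 - s) * mu, t_a, t_b, sel))
```

In this model the window sizes and the selectivity enter only as a constant factor, so the best share depends on the product ra·rb alone.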

Page 33: Evaluating Window Joins over Unbounded Streams

Maximizing the Number of Result Tuples with Limited Memory
• Assumptions:
  • The two time window sizes can be adjusted to fully utilize available memory.
  • The two arrival rates are constant.
• Hence, memory allocation strategies are necessary. But with what policy? Will equal distribution win again?

Page 34: Evaluating Window Joins over Unbounded Streams


Performance Comparison between Policies

• The winner is the Max A strategy, which allocates all memory to the slower stream.

• Keep the slower stream in memory and let the faster one probe against it and pass by.
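The preference for the slower stream can be illustrated with a small search over memory shares. The sketch assumes memory is measured in tuples, that a window of length T on a stream with rate λ occupies λ·T tuples, and that the output rate is σ·λa·λb·(Ta + Tb). The function names and this output model are assumptions for illustration.

```python
def output_rate(rate_a, rate_b, t_a, t_b, sel=1.0):
    """Expected result tuples per time unit for windows of length t_a and t_b."""
    return sel * rate_a * rate_b * (t_a + t_b)

def best_memory_share(rate_a, rate_b, memory, steps=100):
    """Return the fraction of memory for window A that maximizes the output rate."""
    def rate_for(share):
        t_a = share * memory / rate_a          # window A holds rate_a * t_a tuples
        t_b = (1 - share) * memory / rate_b    # window B holds the remaining tuples
        return output_rate(rate_a, rate_b, t_a, t_b)
    return max((i / steps for i in range(steps + 1)), key=rate_for)
```

A tuple slot given to the slower stream buys a longer time window than the same slot given to the faster stream, which is why the search ends at one extreme rather than at an equal split.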

Page 35: Evaluating Window Joins over Unbounded Streams

Maximizing the Number of Result Tuples with Limited Memory
• Another assumption:
  • Variable time windows
  • Variable arrival rates

Page 36: Evaluating Window Joins over Unbounded Streams

Performance Comparison between Policies

• The best policy is either to maximize stream A’s time window together with stream B’s arrival rate, or to maximize stream B’s time window together with stream A’s arrival rate.


Page 38: Evaluating Window Joins over Unbounded Streams

Conclusion
• A unit-time-basis model for analyzing the expected performance of moving window joins is introduced.
• The proposed cost model divides the join cost into two independent terms, each corresponding to one of the two join directions.
• This work can be extended to a cost model that goes beyond single joins and covers full query plans.
• Other algorithms apart from NLJ and HJ can be modeled and evaluated.

Page 39: Evaluating Window Joins over Unbounded Streams

The End
Thanks for your attention