40
1 Shooting Stars in the Sk y:An Online Algorithm fo r Skyline Queries 作作Donald Kossmann Frank Ramsak Steffen Rost 作作 作作作

Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

  • Upload
    roger

  • View
    35

  • Download
    0

Embed Size (px)

DESCRIPTION

Shooting Stars in the Sky:An Online Algorithm for Skyline Queries. 作者: Donald Kossmann Frank Ramsak Steffen Rost 報告:黃士維. Outline. Introduction An Online Algorithm for two-dimention Skylines The NN Algorithm for d-dimention Skylines Performance Experiments Conclusion. - PowerPoint PPT Presentation

Citation preview

Page 1: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

1

Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

作者: Donald Kossmann Frank Ramsak

Steffen Rost報告:黃士維

Page 2: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

2

Outline

Introduction

An Online Algorithm for two-dimention Skylines

The NN Algorithm for d-dimention Skylines

Performance Experiments

Conclusion

Page 3: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

3

Introduction

Skyline

Bold points

Skyline Queries

Online Skyline Computation

Related work

Page 4: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

4

Introduction-Skyline

Skyline queries ask for a set of interesting points from a potentially large set of data points.-ex:a cheaper,nearer,better food hotel.

Skyline point-also called “bold point”-a point dominate another point in at least one dimension.

Page 5: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

5

Introduction- Bold points

The bold points respresent this hotels which part of Skyline.

Page 6: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

6

Skyline Queries

Skyline queries can involve more than 2 dimensions and they could depend on current position of a user.

-to find a near,good food,cheap restaurant,but the user moves in the same time.

If the user moves on,the Skyline should be re-computed for this user.

Page 7: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

7

Online Skyline Computation-1

An online algorithm would probably take much longer than a batch-oriented algorithm to produce the full Skyline,but an online algorithm would produce a subset of the Skyline very quickly.

Page 8: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

8

Online Skyline Computation-2

Six properties of an online algorithm:1. The first result should be return instantaneously.

2. The algorithm should produce the full Skyline.

3. The algorithm should only return pionts which are part of the Skyline.

4. The algorithm should be fair.

5. The algorithm should be well-controled.

6. The algorithm should be universal in Skyline queries and DB.

Page 9: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

9

Related work

All these algorithms work in a batch-oriented way. Tan et al. proposed two progressive Skyline

algorithms.- Bitmap algorithm, the order in which Skyline points are returned depends on the clustering of the data.-B-tree algorithm, the order in which points are returned depends on the value distribution of the data.

Both algorithms require that all interesting properties are materialized in the database.

Page 10: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

10

An Online Algorithm for two-dimention Skylines

Three asumptions

Basic Observations

An example

Discussion

Page 11: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

11

Three asumptions

All values are positive real numbers.

There are no duplicates in the data set.

We try to fine minimal points.

Page 12: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

12

Basic Observations

D is 2-dimentional data set.

n∈ D ,is a nearest neighbor of O = (0, 0) according to f . - n is in the Skyline

Observation 1:

n must be part of the Skyline.

Observation 2:

n is in the Skyline of D.

Page 13: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

13

Basic Observations

If we partition the data set, then it is sufficient to look for Skyline points using nearest neighbor search in each region separately.

This observation gives rise to a divide & conquer algorithm using nearest neighbor search.

Page 14: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

14

An example-first step

Region 1:corner sign,smaller y.

Region 2:circle sign,smaller x,

Region 3:triangle sign,greater x,y.

Page 15: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

15

An example-second stepDivide & conquer algorithm.

In this way, the algorithm continues until all Skyline points are retrieved.

Page 16: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

16

Algorithm Description

Page 17: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

17

Discussion-1

Six requirements:We believe that Skyline queries that involve two dimensions are the most common case.

The algorithm will find all points of the Skyline.

All nearest neighbors found by the NN algorithm are part of the Skyline.

Page 18: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

18

Discussion-2

The NN algorithm produces results from the whole range of results very quickly.

The NN algorithm can continue to produce Skyline points using the new distance function without any adaptions.

The NN algorithm is universal.

Page 19: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

19

The NN Algorithm for d-dimention Skylines

Introduction

Laisser-faire

Propagate

Merge

Fine-grained Partitioning

Hybrid Approaches

Page 20: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

20

Introduction-1

Find NN and compute

Page 21: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

21

Introduction-2

Two observations from Section 2.1 can be generalized to three and more dimensions.

Region 1 (xn, ∞, ∞)

Region 2 (∞, yn, ∞)

Region 3 (∞, ∞, zn)

Etc…

p will be produced by the NN algorithm twice.

Page 22: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

22

Laisser-faire

If the nearest neighbor p is found in the hash table, it is a duplicate and it is not output.

The big disadvantage of this algorithm is that it results in a great deal of wasted work.

Page 23: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

23

Propagate

We can do this by “propagation”.Whenever the NN algorithm finds a Skyline point, it scans the whole to-do list in order to find those regions in the to-do list that contain that point. The big advantage of this technique is that it completely avoids wasted work to find duplicates

Page 24: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

24

Merge

Assume that two regions:a = (a1, a2, . . ., ad ) b = (b1, b2, . . ., bd) are in the to-do list.

We can merge these two regions into a single region:a b = (max(a1, b1), max(a2, b2), . . . , max⊕(ad, bd)) A particular situation arises, if a supersedes b , a

= a b and thus b can be simply discarded from th⊕e to-do list.

Page 25: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

25

Fine-grained Partitioning

In Figure 5, for instance, we could partition into 8 nonoverlapping regions of which 6 regions would be relevant for further processing.

Implementing this approach, however, results in a sharp growth of the number of regions in the to-do list.

Page 26: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

26

Hybrid Approaches

As mentioned above, merge can be combined with both the laisser-faire and the propagate approach.

Another option would be to start and propagate duplicates until the to-do list has reached a certain size.

Page 27: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

27

Performance Experiments

Experimental Environment

Points in database

Variant algorithms

Two-dimensional Skyline Queries

Quality of Results

Comparing Algorithm Variants

Page 28: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

28

Experimental Environment

167MHz processor

128MB of main memory

OS:Solaris 8

17.1G HD

Program language: C++

Page 29: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

29

Points in database

100,000 points VS 1million pointsPoints are generated using one of the following three value distributions:

Corr:points which are good in one dimension tend to be good in otherdimensions, too. Anti:points which are good in one dimension a

re bad in at least one other dimension. Indep:points are generated using a uniform dis

tribution.

Page 30: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

30

Variant algorithms

NN : R-tree [BKSS90] in order to carry out nearest neighbor search. D&C : Kung’s divide & conquer algorithm extended by

m-way partitioning and “Early Sky-line”. BNL : The block-nested-loops algorithm with a self-

organizing list. Bitmap : scans the database and uses bitmaps in order to

detect whether a point is part of the Skyline. B-tree : used a light-weight implementation of this

algorithm that does not require the use of extended B-trees.

Page 31: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

31

Two-dimensional Skyline Queries

Page 32: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

32

Two-dimensional Skyline Queries

It can be seen that the NN algorithm scales best.

The general trends were the same:(1) The NN algorithm was the clear overall winner.

(2) The Bitmap algorithm was in the same ball park as the BNL and D&C algorithms.

(3) The B-tree algorithm showed good performance for the correlated and independent databases, but terrible performance for the anti-correlated database.

Page 33: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

33

Quality of ResultsFigure 6a shows the full Skyline for the anti-

correlated database for d = 2.

Figure 6b shows the first 10 points returned by the NN algorithm (without user interaction).

Figure 6c shows the first 10 points returned by the B-tree algorithm. It can be seen that the NN algorithm gives a good big picture of the Skyline.

Page 34: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

34

Quality of ResultsComputing the full Skyline takes a long time,rega

rdless what algorithm is used.

an online algorithm is particularly important for interactive applications in this scenario:

(1) toselect relevant points from the Skyline (rather than flooding the user with results)

(2) to give the firstanswers quickly.

Page 35: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

35

Quality of Results

B-tree algorithm produces the results very quickly.If the user gives preferences, the B-tree algorithm

cannot adapt well.The Bitmap algorithm is not competitive in either

aspect: (1) It produces Skyline points at approximately the

same rate as the NN algorithm.(2) It is no match to the NN algorithm in terms of

interactivity because it produces Skyline points in a somewhat random order (based on the clustering of the data).

Page 36: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

36

Comparing Algorithm Variants

Figure 7 shows the performance of four variants for a 5-dimensional Skyline with an anti-correlated database and 1 million points.

measured the following variants:• Laisser-faire

• Propagate

• Merge:laisser-faire and merge regions whenever a duplicate is found.

• Hybrid: propagate to the first 20 % entries of the to-do list.

Page 37: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

37

Comparing Algorithm Variants

Page 38: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

38

Conclusion

Skyline queries are important for several database application.All algorithms have their particular virtues.The NN algorithm is the only algorithm that gives the us

er control over the process and allows the user to give preferences.The B-tree algorithm gives “extreme” points preference

(i.e., points good in one dimension) and returns points which are good in many dimensions very late.The Bitmap algorithm scans the database and uses Bitm

aps in order to detect whether a point is part of the Skyline.

Page 39: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

39

Conclusion(cont.)

future work:

First, improve the performance of the NN algorithm by exploiting new techniques for nearest neighbor search.

Second, we would like to investigate special-purpose main-memory index structures

Third, we would like to investigate how the NN algorithm can be combined with other algorithms.

Page 40: Shooting Stars in the Sky:An Online Algorithm for Skyline Queries

40

Thanks for your listening!!