Efficient Computation of Combinatorial Skyline Queries
Author: Yu-Chi Chung, I-Fang Su, and Chiang Lee
Source: Information Systems, 38 (2013), pp. 369-387
Reporter: Yueh-Lin Lin


Outline
- Introduction
- Related Work
- Combinatorial Skyline Query Processing
  - The Brute-Force Method
  - The Decomposition Algorithm (DA)
  - The Improved Decomposition Algorithm (IDA)
- Performance Evaluation
- Conclusions

Introduction
- The skyline operator has received considerable attention from the database community
- Importance in numerous disciplines
  - Data mining, multi-criteria decision making, and market analysis

Skyline Example
- Mercedes-Benz plans to increase automobile sales
- Considering advertising
  - TV is the most effective mass-market advertising format
  - Advertising cost and audience number
- Tries to find the best advertising slot
  - Lower cost and a higher number of customers
- The slots that meet Benz's needs form a skyline
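The advertising example can be made concrete with a small sketch. It assumes each slot is described by two attributes, cost (lower is better) and audience (higher is better). The slot names and numbers are invented for illustration and do not come from the slides.

```python
from typing import List, Tuple

# (name, cost, audience); hypothetical layout and values
Slot = Tuple[str, float, float]


def dominates(a: Slot, b: Slot) -> bool:
    """a dominates b if it costs no more, reaches no fewer people,
    and is strictly better in at least one of the two attributes."""
    no_worse = a[1] <= b[1] and a[2] >= b[2]
    strictly_better = a[1] < b[1] or a[2] > b[2]
    return no_worse and strictly_better


def skyline(slots: List[Slot]) -> List[Slot]:
    """Keep every slot that no other slot dominates (quadratic scan)."""
    return [s for s in slots if not any(dominates(o, s) for o in slots)]


slots = [
    ("s1", 10, 50),
    ("s2", 20, 80),
    ("s3", 25, 70),  # dominated by s2: more expensive, smaller audience
    ("s4", 40, 90),
    ("s5", 15, 40),  # dominated by s1
]
result = skyline(slots)
print([name for name, _, _ in result])  # ['s1', 's2', 's4']
```

The pairwise scan is only meant to show the dominance test. Algorithms such as BBS avoid comparing every pair.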

Skyline Example

Motivation

Combinations of Two Advertising Slots

Combinatorial Skyline Query (CSQ)

Observation

Challenge

Related Work
- After the skyline operator was introduced
  - Many algorithms were proposed for skyline query processing
  - BBS, bitmap, etc.
- Variations of the skyline
  - Subspace skyline, k-dominant skyline, dynamic skyline, etc.
- The concept of combination is not mentioned in previous work
  - Top-k combinatorial skyline queries (DASFAA 2010)

Problem
- Strategy desperately needed

Combinatorial Skyline Query Processing
- The Brute-Force Method

The Brute-Force Method Example

The Brute-Force Method Example
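A minimal sketch of the brute-force idea: enumerate every combination of k slots, aggregate each combination, and keep the combinations that no other combination dominates. It assumes a combination is scored by summing cost and audience over its members. That scoring rule, the function name `brute_force_csq`, and the data are assumptions for illustration, not the paper's exact definition.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Vector = Tuple[int, int]  # (total cost, total audience)


def dominates(a: Vector, b: Vector) -> bool:
    """Lower total cost and higher total audience are preferred."""
    return a[0] <= b[0] and a[1] >= b[1] and a != b


def brute_force_csq(slots: Dict[str, Vector], k: int) -> List[Tuple[Tuple[str, ...], Vector]]:
    """Score all C(n, k) combinations, then filter by dominance."""
    scored = []
    for combo in combinations(sorted(slots), k):
        cost = sum(slots[name][0] for name in combo)
        audience = sum(slots[name][1] for name in combo)
        scored.append((combo, (cost, audience)))
    return [(c, v) for c, v in scored
            if not any(dominates(w, v) for _, w in scored)]


slots = {"s1": (10, 50), "s2": (20, 80), "s3": (25, 70), "s4": (40, 90)}
result = brute_force_csq(slots, 2)
for combo, vector in result:
    print(combo, vector)
# ('s1', 's2') (30, 130)
# ('s2', 's3') (45, 150)
# ('s2', 's4') (60, 170)
```

With n slots the loop scores C(n, k) combinations and then compares them pairwise, which is the overhead the decomposition algorithms are meant to avoid.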

The Decomposition Algorithm (DA)
- The brute-force method incurs high computation overhead since it enumerates all combinations.
- The Decomposition Algorithm
  - Finds the combinatorial skyline tuples without enumerating all combinations

DA Example

The Improved Decomposition Algorithm (IDA)

Enhanced Pruning Example

The Improved Decomposition Algorithm Example

Performance Evaluation

Scalability with respect to Data Size
- Query Processing Time

Scalability with respect to Data Size (continued)
- Query Processing Time

Comparison on Real Dataset

The Real Dataset Processing Time
- Dimensionality

The Real Dataset Processing Time
- Cardinality

Conclusions
- Proposed a new type of query
  - The combinatorial skyline query
- Proposed two algorithms
  - DA
  - IDA
- The experimental results show that IDA is better than DA in all performance metrics

On Skyline Groups
Author: Nan Zhang, Chengkai Li, Naeemul Hassan, Sundaresan Rajasekaran, and Gautam Das

Source: IEEE Transactions on Knowledge and Data Engineering, Vol. 26, No. 4, April 2014, pp. 942-956

Reporter: Yueh-Lin Lin

Outline
- Introduction
- Skyline Group Problem
- Finding Skyline Groups
- Techniques
- Algorithms
- Experiments
- Conclusions
- Comments

Motivation

Challenge

Techniques

Skyline Group Problem

Aggregate Functions

Finding Skyline Groups

Finding Skyline Groups (continued)
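To illustrate aggregate vectors and the skyline-group search, the sketch below aggregates each group of k tuples attribute by attribute with SUM, MIN, or MAX and keeps the groups whose aggregate vector is not dominated. It assumes larger attribute values are preferred and uses plain enumeration. The helper names and the four sample tuples are hypothetical.

```python
from itertools import combinations

AGGREGATES = {"SUM": sum, "MIN": min, "MAX": max}


def aggregate_vector(group, agg):
    """Apply the aggregate function to each attribute column of the group."""
    fn = AGGREGATES[agg]
    return tuple(fn(column) for column in zip(*group))


def dominates(u, v):
    """u dominates v: no smaller in any attribute and not identical."""
    return all(x >= y for x, y in zip(u, v)) and u != v


def skyline_groups(tuples, k, agg):
    """Enumerate all k-tuple groups; keep those with a non-dominated vector."""
    candidates = [(g, aggregate_vector(g, agg)) for g in combinations(tuples, k)]
    return [(g, v) for g, v in candidates
            if not any(dominates(w, v) for _, w in candidates)]


players = [(3, 1), (1, 3), (2, 2), (1, 1)]  # invented two-attribute tuples

print(aggregate_vector([(3, 1), (1, 3)], "SUM"))        # (4, 4)
print([v for _, v in skyline_groups(players, 2, "SUM")])  # [(4, 4), (5, 3), (3, 5)]
print([v for _, v in skyline_groups(players, 2, "MAX")])  # [(3, 3)]
print([v for _, v in skyline_groups(players, 2, "MIN")])  # [(2, 1), (1, 2)]
```

The same input gives different skyline groups under different aggregate functions, which is why the properties used for pruning are stated per function.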

Techniques

Output Compression
- The number of skyline groups may be large, and many of them share the same aggregate vector
- Main idea
  - Store not all skyline groups, but
    - The distinct skyline aggregate vectors
    - One skyline group for each skyline vector

Input Pruning
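Output compression and input pruning can be sketched together for SUM. Output compression follows the slide: keep each distinct skyline vector with one representative group. The pruning rule is an assumption, since the slide text for it did not survive: a tuple dominated by at least k other tuples is dropped, because any group containing it can swap it for a dominator outside the group and obtain a better SUM vector. Larger values are assumed to be preferred, and all names and data are hypothetical.

```python
from itertools import combinations


def dominates(u, v):
    """u dominates v: no smaller in any attribute and not identical."""
    return all(x >= y for x, y in zip(u, v)) and u != v


def prune_input(tuples, k):
    """Assumed rule: drop every tuple dominated by at least k other tuples."""
    return [t for t in tuples
            if sum(dominates(o, t) for o in tuples) < k]


def compressed_skyline(tuples, k):
    """Map each distinct skyline SUM vector to one representative group."""
    candidates = [(g, tuple(map(sum, zip(*g)))) for g in combinations(tuples, k)]
    result = {}
    for group, vector in candidates:
        if not any(dominates(w, vector) for _, w in candidates):
            result.setdefault(vector, group)  # keep the first group seen
    return result


tuples = [(4, 0), (0, 4), (2, 2), (3, 1), (1, 3), (1, 1)]
k = 2

full = compressed_skyline(tuples, k)
pruned_input = prune_input(tuples, k)  # (1, 1) is dominated by three tuples
after_pruning = compressed_skyline(pruned_input, k)

print(len(tuples), "->", len(pruned_input), "tuples after pruning")
print(len(full), "distinct skyline vectors")
print(set(full) == set(after_pruning))  # True
```

In this toy input ten skyline groups collapse to seven distinct vectors, and pruning removes one tuple without changing those vectors.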

Search Space Pruning: Anti-Monotonicity
- Find and leverage two anti-monotonic properties for skyline search, in analogy to the Apriori algorithm
  - Order-Specific Anti-Monotonic Property (OSM)
    - SUM, MIN and MAX
  - Weak Candidate-Generation Property (WCM)
    - MIN and MAX
- The challenge is to find anti-monotonic properties that hold for skyline search
- The main contribution is not about proving, but rather about finding the right ones that can effectively prune the search space.
- Anti-monotonicity in Apriori: if constraint c is violated, its further mining can be terminated

Algorithm
- Dynamic Programming Algorithm Based on Order-Specific Property
- Iterative Algorithm Based on Weak Candidate-Generation Property

Dynamic Programming Algorithm Based on Order-Specific Property

Dynamic Programming Algorithm Based on Order-Specific Property (continued)
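A rough sketch of how an order-specific property can drive dynamic programming, for SUM only. It assumes the property has this form: with the tuples in a fixed order, a skyline k-tuple group among the first n tuples either excludes tuple n and is a skyline k-tuple group among the first n-1 tuples, or includes tuple n and its remaining k-1 tuples form a skyline group among the first n-1 tuples. The paper's exact statement and algorithm may differ, and the function names are hypothetical.

```python
from functools import lru_cache
from itertools import combinations


def dominates(u, v):
    """u dominates v: no smaller in any attribute and not identical."""
    return all(x >= y for x, y in zip(u, v)) and u != v


def sum_vector(group, data):
    """SUM aggregate of a group given as a tuple of row indices."""
    return tuple(sum(data[i][d] for i in group) for d in range(len(data[0])))


def skyline(groups, data):
    scored = [(g, sum_vector(g, data)) for g in groups]
    return [g for g, v in scored if not any(dominates(w, v) for _, w in scored)]


def dp_skyline_groups(data, k):
    @lru_cache(maxsize=None)
    def sky(n, j):
        """Skyline j-tuple groups among the first n tuples."""
        if j == 0:
            return ((),)  # the empty group
        if n < j:
            return ()     # not enough tuples to form a group
        without_last = sky(n - 1, j)
        with_last = tuple(g + (n - 1,) for g in sky(n - 1, j - 1))
        # Only these candidates can be skyline groups under the assumed property.
        return tuple(skyline(without_last + with_last, data))

    return list(sky(len(data), k))


def brute_force(data, k):
    return skyline(list(combinations(range(len(data)), k)), data)


data = [(4, 0), (0, 4), (2, 2), (3, 1), (1, 3), (1, 1)]
dp2 = dp_skyline_groups(data, 2)
dp3 = dp_skyline_groups(data, 3)
print(sorted(dp2) == sorted(brute_force(data, 2)))  # True
print(sorted(dp3) == sorted(brute_force(data, 3)))  # True
```

This mirrors the Apriori analogy: each step extends only groups that survived the previous step instead of enumerating every group.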

Experiments
- The algorithms were implemented in C++
- Environment
  - Dell PowerEdge 2900 III server
  - Linux kernel 2.6.27-7
  - Dual Quad-Core Xeon 2.0 GHz
  - 8 GB RAM
  - 250 GB HDD in RAID5

Datasets
- NBA players (2009 season)
  - 512 tuples (players)
  - 5 attributes
- Stocks (2009/12/31)
  - 35000 tuples (stocks)
  - 4 attributes
- Synthetic data
  - 1-10 million tuples
  - 5 attributes

Aggregate Functions & Methods Compared
- Aggregate functions
  - SUM, MIN, and MAX
- Two algorithms compared with a baseline method
  - Order-Specific Property (OSM)
  - Weak Candidate-Generation Property (WCM)

Comparison of Various Methods: SUM

Effect of Input Pruning

Conclusions
- The novel problem of computing skyline groups
- The novel algorithmic techniques
  - Output compression
  - Input pruning
  - Search space pruning
- Experiments on real and synthetic data sets evaluate the proposed algorithms

Comments
- Group skyline with constraint
  - NBA teams have salary limits
- Parallel computing
  - MapReduce
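The first comment, a group skyline under a constraint, could look like the sketch below. It assumes a group is feasible when its total salary stays within a cap, and that only feasible groups compete on summed performance, where larger is better. The player records, attribute layout, and cap are invented for illustration.

```python
from itertools import combinations


def dominates(u, v):
    """u dominates v: no smaller in any attribute and not identical."""
    return all(x >= y for x, y in zip(u, v)) and u != v


def constrained_skyline_groups(players, k, salary_cap):
    """players maps name -> (salary, points, rebounds); hypothetical layout."""
    feasible = []
    for names in combinations(sorted(players), k):
        total_salary = sum(players[n][0] for n in names)
        if total_salary <= salary_cap:  # the constraint filters groups first
            performance = tuple(sum(players[n][d] for n in names) for d in (1, 2))
            feasible.append((names, performance))
    # Dominance is checked only among groups that satisfy the constraint.
    return [(g, v) for g, v in feasible
            if not any(dominates(w, v) for _, w in feasible)]


players = {
    "A": (30, 25, 5),
    "B": (20, 18, 9),
    "C": (10, 12, 7),
    "D": (8, 9, 10),
}
result = constrained_skyline_groups(players, 2, salary_cap=40)
for group, performance in result:
    print(group, performance)
# ('A', 'C') (37, 12)
# ('A', 'D') (34, 15)
# ('B', 'C') (30, 16)
# ('B', 'D') (27, 19)
```

Group (A, B) would dominate (A, C) on performance, but it exceeds the cap, so (A, C) stays in the constrained result.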

Skyline provides all interesting tuples for the user, but sometimes the results do not match the user's preferences, and the user wants results that are closer to what they need. For this reason there is a line of skyline research called the constrained skyline [2, 12]. The constrained skyline query returns all tuples that satisfy the constraint and are not dominated by any other tuple that satisfies the constraint. Although the group skyline has received some study, most of that work did not consider constraints. In practice, we sometimes need to take our own constraints into account to decide which result suits us best. For example, NBA teams have to consider not only players' performance but also their salaries, because when a team's total salary exceeds a certain amount, the NBA charges a luxury tax.

The second comment is parallel computing. Computing the group skyline is a heavy task. To my knowledge, no previous work computes skyline groups with parallel systems. My idea is to use MapReduce to compute the candidate groups in parallel and so improve computing efficiency. MapReduce is a programming model for parallel computing. It has four features: flexibility, scalability, efficiency, and fault tolerance. These features make MapReduce an effective method for processing the data in parallel.

Q&A