35
1 CSE 480: Database Systems Lecture 22: Query Optimization Reference: Read Chapter 15.6 – 15.8 of the textbook

CSE 480: Database Systems

  • Upload
    coye

  • View
    37

  • Download
    0

Embed Size (px)

DESCRIPTION

CSE 480: Database Systems. Lecture 22: Query Optimization. Reference: Read Chapter 15.6 – 15.8 of the textbook. Query Processing. A query is mapped into a sequence of operations SELECT E.Fname, E.Lname, W.Pno FROMEmployee E, Works_on W WHERE E.Salary > 50000 AND W.Hours > 40 - PowerPoint PPT Presentation

Citation preview

Page 1: CSE 480: Database Systems

1

CSE 480: Database Systems

Lecture 22: Query Optimization

Reference:

Read Chapter 15.6 – 15.8 of the textbook

Page 2: CSE 480: Database Systems

2

Query Processing

A query is mapped into a sequence of operations

SELECT E.Fname, E.Lname, W.PnoFROM Employee E, Works_on WWHERE E.Salary > 50000 AND W.Hours > 40

AND E.SSN = W.ESSN

E.Fname,E.Lname,W.Pno (E.Salary>50000 (Employee) E.SSN=W.ESSN W.Hours>40 (Works_on))

– Each execution of an operation produces a temporary result– Generating and saving temporary files on disk is time consuming

and expensive

Page 3: CSE 480: Database Systems

3

Combining Operations using Pipelining

Pipelining: – Avoid constructing temporary tables as much as possible.– Pass the result of a previous operator to the next without waiting

to complete the previous operation.

Without pipelining:1. Temp1 E.Salary>50000 (Employee)

2. Temp2 W.Hours>40 (Works_on)

3. Temp3 Temp1 E.SSN=W.ESSN Temp2

4. Result E.Fname,E.Lname,W.Pno (Temp3)

Pipelining: interleave the operations in steps 3 and 4

Page 4: CSE 480: Database Systems

4

Query Optimization

General query:SELECT TargetListFROM R1, R2, …, RN

WHERE Condition

– Naïve conversion: TargetList (Condition (R1 R2 ... RN))

Processing this query can be very expensive

Query optimizer does not actually “optimize” – It only tries to find a “reasonably efficient” evaluation strategy

Page 5: CSE 480: Database Systems

5

How Does Query Optimization Work?

Uses a combination of two approaches:– Heuristic rules (on query trees)– Cost-based estimation

Page 6: CSE 480: Database Systems

6

Query Tree

A relational algebra expression can be represented by a query tree

– Leaf nodes are input relations of the query– Internal nodes represent relational algebra operations

A query tree can be “executed”– Execute an internal node operation whenever its operands are

available and then replace the internal node by the relation that results from executing the operation

Page 7: CSE 480: Database Systems

7

Example

For every project located in ‘Stafford’, retrieve the project number, the controlling department number, and the department manager’s last name, address, and birth date

SQL:SELECT P.Pnumber, P.Dnum, E.Lname, E.Address, E.BdateFROM PROJECT P, DEPARTMENT D, EMPLOYEE EWHERE P.Dnum = D.Dnumber AND D.Mgr_ssn = E.SSN

AND P.Plocation = ‘Stafford’;

Relational algebra (naïve conversion):PNUMBER, DNUM, LNAME, ADDRESS, BDATE (

PLOCATION=‘STAFFORD’ AND DNUM=DNUMBER AND MGRSSN=SSN

( PROJECT DEPARTMENT EMPLOYEE))

Page 8: CSE 480: Database Systems

8

Example

PNUMBER, DNUM, LNAME, ADDRESS, BDATE (

PLOCATION=‘STAFFORD’ AND DNUM=DNUMBER AND MGRSSN=SSN

( PROJECT DEPARTMENT EMPLOYEE))

Leaf nodes are relations

Internal nodes are relational algebra operations

Expensive to process!

Page 9: CSE 480: Database Systems

9

Using Heuristics in Query Optimization

Parser generates an initial query tree representation (naïve conversion)

The task of heuristic optimization to find a final query tree that is efficient to execute by applying a set of heuristics rules

Page 10: CSE 480: Database Systems

10

Heuristics Rules for Query Optimization

Query:SELECT DISTINCT A1, A2, …, Ak FROM R1, R2, …, RN

WHERE Cond

STEP 0: Naïve conversion by the parser: A1,A2,…,Ak (Cond (R1 R2 ... RN))

R1

A1,A2,…,Ak

cond

R2

R3

RN

Page 11: CSE 480: Database Systems

11

Using Heuristics in Query Optimization

STEP 1: Break up any select operations with conjunctive conditions into a cascade of select operations

c1 AND c2 AND ... AND cn (R1 R2 ... RN)

= c1( c2(...( cn(R1 R2 ... RN) ) ) Note: Disjuncts cannot be split like the conjuncts. We can separate disjuncts by union but it may NOT be useful.

c1 OR c2 = c1 U c2 and the two selections can be done in any order cond1 AND cond2

R

cond2

R

cond1

Page 12: CSE 480: Database Systems

12

Using Heuristics in Query Optimization

STEP 2: Push SELECT operation as far down the query tree as permitted

– This is possible due to the commutativity of select operator with other operations

– Algebraic operations are independent of column positions in the table because they work on column names.

– AND conditions in selection can be separated and reordered. c1 ( c2(R)) = c2 ( c1(R)) A1, A2, ..., An ( c (R)) = c ( A1, A2, ..., An (R)) (assuming c is in Ai) c1 AND c2 ( R S ) = (c1 (R)) (c2 (S)) c1 AND c2 ( R S ) = (c1 (R)) (c2 (S)) c ( R S ) = (c (R)) (c (S)) c ( R S ) = (c (R)) (c (S)) c ( R – S ) = (c (R)) – (c (S))

Page 13: CSE 480: Database Systems

13

Using Heuristics in Query Optimization

STEP 2: Push SELECT operation as far down the query tree as permitted.

R S

R.A > 10

S.B < 50

R.A = S.B

R S

R.A > 10 S.B < 50

R.A = S.B

Page 14: CSE 480: Database Systems

14

Using Heuristics in Query Optimization

STEP 3: Rearrange binary operations– Position the leaf node relations with most restrictive SELECT

operations to the left of the query tree Most restrictive: produce relation with fewest tuples or smallest

selectivity

– Selectivity is the ratio of the number of records that satisfy the select () condition

– Based on commutativity and associativity of binary operations R C S = S C R; R x S = S x R ( R op S ) op T = R op ( S op T )

– where op is either , , , or

Page 15: CSE 480: Database Systems

15

STEP 3: Rearrange the binary operations

Using Heuristics in Query Optimization

T.C = 3

R S

R.A > 10 S.B < 50

T

T S

R.A > 10

S.B < 50

R T.C = 3

Page 16: CSE 480: Database Systems

16

Using Heuristics in Query Optimization

STEP 4: Combine CARTESIAN PRODUCT with SELECT operation into a JOIN operation

– ( C (R x S)) = (R C S)

R S

R.A=S.B

R S

R.A=S.B

Page 17: CSE 480: Database Systems

17

Using Heuristics in Query Optimization

STEP 5: Move PROJECT operation down the tree as far as possible by creating new PROJECT operations as needed

– Using cascading and commutativity of PROJECT operations List1 ( List2 (...( Listn(R))...) ) = List1(R) (cascade) A1, A2, ..., An ( c (R)) = c ( A1, A2, ..., An (R)) (commute with as long as

c is part of the projected attributes) L ( R C S ) = (A1, ..., An (R)) C ( B1, ..., Bm (S)) (commute with ) L ( R S ) = (A1, ..., An (R)) ( B1, ..., Bm (S)) (commute with ) L ( R S ) = ( L (R)) ( L (S)) (commute with )

Page 18: CSE 480: Database Systems

18

Using Heuristics in Query Optimization

STEP 5: Move PROJECT operation down the tree as far as possible by creating new PROJECT operations as needed

R.A=S.B

R S

R.C, S.D

R S

R.A=S.B

R.A, R.C S.B, S.D

R.C, S.D

Page 19: CSE 480: Database Systems

19

Example

Find the last names of employees born after 1957 who work on a project named ‘Aquarius’

SELECT LNAMEFROM EMPLOYEE, WORKS_ON, PROJECTWHERE PNAME = ‘AQUARIUS’ AND PNUMBER=PNO

AND ESSN=SSN AND BDATE > ‘1957-12-31’;

Page 20: CSE 480: Database Systems

20

Example

Moving SELECT operations down the query tree

Page 21: CSE 480: Database Systems

21

Example

Apply most restrictive SELECT operation first

Page 22: CSE 480: Database Systems

22

Example

Replace Cartesian Product and Select with Join operations

Page 23: CSE 480: Database Systems

23

Example

Moving PROJECT operations down the query tree

Page 24: CSE 480: Database Systems

24

Cost-based Query Optimization

Estimate and compare the costs of executing a query using different execution strategies and choose the strategy with the lowest cost estimate.

Example query:SELECT Pnumber, Dnum, Lname, Address,

BdateFROM PROJECT, DEPARTMENT, EMPLOYEEWHERE Dnum = Dnumber AND Mgr_ssn = Ssn

AND Plocation = ‘Stafford’

Page 25: CSE 480: Database Systems

25

Example

Need to estimate the cost for performing each relational algebra operation using different access paths and query processing methods

Page 26: CSE 480: Database Systems

26

Cost-based Query Optimization

System Catalog Information– Information about the size of a file

number of records (tuples) (r), record size (R), number of blocks (b) blocking factor (bfr) number of records per block

– Information about indexes and indexing attributes of a file Number of levels (x) of each multilevel index Number of first-level index blocks (bI1) Number of distinct values (d) of an attribute Selectivity (sl) of an attribute

Page 27: CSE 480: Database Systems

27

Example (System Catalog)

Page 28: CSE 480: Database Systems

28

Example

Plocation = ‘Stafford’ (PROJECT)– Table scan (Plocation is not primary key)

Cost = 100– PROJ_PLOC Index (number of levels, x = 2)

Selectivity = 1/200 (assuming uniformly distributed) Selection cardinality = Selectivity * Num_rows = 10 blocks Cost = 2 + 10 = 12

Page 29: CSE 480: Database Systems

29

Example

Cost for Plocation = ‘Stafford’ (PROJECT) DEPARTMENT– No index available to process the join– We use the nested loop join

Page 30: CSE 480: Database Systems

30

Example

Nested loop join TEMP1 Dnum=Dnumber DEPARTMENT – TEMP1: result of Plocation = ‘Stafford’ (PROJECT)

Estimated number of rows = 2000/200 = 10 Blocking factor = 2000/100 = 20 tuples/block So, number of blocks needed = 1

– DEPARTMENT number of blocks needed= 5

Page 31: CSE 480: Database Systems

31

Example

Nested loop join TEMP1 Dnum=Dnumber DEPARTMENT – Use TEMP1 in outer loop for nested-loop join

Cost = 1 + 5 + cost to write join output into TEMP2 = 6 + cost to write join output into TEMP2

– What is the cost for writing join output? Each row in TEMP1 joins exactly 1 row in DEPARTMENT Estimated number of rows in TEMP2 = 10 (join attribute dno is

the key of department. So we assume there are 10 joined records)

Estimated blocking factor = 5 (from estimated record size) Number of blocks needed = 2

Page 32: CSE 480: Database Systems

32

Example

Cost for TEMP2 Mgr_ssn=Ssn EMPLOYEE

TEMP2

Page 33: CSE 480: Database Systems

33

Example

Nested loop join TEMP2 Mgr_ssn=Ssn EMPLOYEE– Primary index (EMP_SSN) available for Ssn in EMPLOYEE– Can use single-loop join on TEMP2 (see lecture 21)

For each row in TEMP2, use primary index to retrieve corresponding rows in EMPLOYEE

Cost = 2 + 10 (1 + 1 + 1) + cost of output = 32 + cost of output

Page 34: CSE 480: Database Systems

34

Example

Use pipelining to produce the final result – So, no additional cost for projection– Total cost = 12 + 1 + 6 + 2 + 32 + cost of writing final output

Page 35: CSE 480: Database Systems

35

Query Execution Plan

An execution plan consists of a combination of – The relational algebra query tree and– Information about how to compute the relational operators in the

tree Based on the access paths and algorithms available

Generated by the query optimizer module in DBMS– A code generator then generates the code to execute the plan– Finally, the runtime database processor will execute the code (either in

compiled or interpreted mode) to produce the query result