52
Cost Model Cost Model and and Estimating Result Sizes Estimating Result Sizes

Cost Model and Estimating Result Sizes

  • Upload
    javen

  • View
    32

  • Download
    1

Embed Size (px)

DESCRIPTION

Cost Model and Estimating Result Sizes. מודל המחיר Cost Model. בהרצאה הראנו איך לחשב את המחיר של כל שיטה (join) כדי לעשות זאת צריך לדעת את גודל היחסים, שחלקם מתקבלים כתוצאות ביניים לפיכך, יש צורך לחשב את הגודל של תוצאות ביניים עכשיו נסביר איך מעריכים את גודל התוצאה. - PowerPoint PPT Presentation

Citation preview

Page 1: Cost Model and  Estimating Result Sizes

Cost ModelCost Modelandand

Estimating Result Sizes Estimating Result Sizes

Page 2: Cost Model and  Estimating Result Sizes

המחיר המחיר מודל מודלCost ModelCost Model

כל • של המחיר את לחשב איך הראנו בהרצאה

( join)שיטה

• , היחסים גודל את לדעת צריך זאת לעשות כדי

ביניים כתוצאות מתקבלים שחלקם

תוצאות, • של הגודל את לחשב צורך יש לפיכך

ביניים

התוצאה • גודל את מעריכים איך נסביר עכשיו

Page 3: Cost Model and  Estimating Result Sizes

בחירת תוכנית לחישובבחירת תוכנית לחישוב צירוף של שלושה יחסים צירוף של שלושה יחסים

• : היחסים שלושת של צירוף לחשב - Reserves , Sailorsרוצים ו Boats

הצירוף ) • בפעולת היחסים מסדר התעלמות תוך האפשרויות שתי

: הנן( הראשונה

(Sailors Reserves) Boats

Sailors (Reserves Boats)

איזה • בשאלה היתר בין תלויה יותר הזולה התוכנית מהי ההחלטה

יותר קטנה הנה ביניים תוצאת

Page 4: Cost Model and  Estimating Result Sizes

התוצאות גודל של התוצאות אנליזה גודל של אנליזה

הצירוף • של התוצאה גודל את להעריך צריך

( Sailors Reserves) גודל לעומת

הצירוף של Reserves )התוצאה

Boats)

היחסים DBMSה- • לגבי סטטיסטיקות שומר

והאינדקסים

Page 5: Cost Model and  Estimating Result Sizes

Statistics Maintained by Statistics Maintained by DBMSDBMS

• Cardinality: Number of tuples NTuples(R) in each relation R

• Size: Number of pages NPages(R) in each relation R

• Index Cardinality: Number of distinct key values NKeys(I) for each index I

• Index Size: Number of pages INPages(I) in each index I

• Index Height: Number of non-leaf levels IHeight(I) in each B+ Tree index I

• Index Range: The minimum value ILow(I) and maximum value IHigh(I) for each index I

Page 6: Cost Model and  Estimating Result Sizes

NNoteote

• The statistics are updated periodically

(not every time the underlying

relations are modified).

• We cannot use the cardinality for

computing

select count(*)

from R

Page 7: Cost Model and  Estimating Result Sizes

Estimating Result SizesEstimating Result Sizes

• Consider

• The maximum number of tuples is the product of the cardinalities of the relations in the FROM clause

• The WHERE clause is associating a reduction factor with each term. It reflects the impact of the term in reducing result size.

SELECT attribute-list

FROM relation-list

WHERE term1 and ... and termn

Page 8: Cost Model and  Estimating Result Sizes

Result SizeResult Size

• Estimated result size:

maximum size

X

the product of the reduction factors

Page 9: Cost Model and  Estimating Result Sizes

AssumptionsAssumptions

• There is an index I1 on R.Y and index

I2 on S.Y

• Containment of value sets: if

NKeys(I1)<NKeys(I2) for attribute Y,

then every Y-value of R will be a Y-

value of S

Page 10: Cost Model and  Estimating Result Sizes

Estimating Reduction Estimating Reduction FactorsFactors

• column = value: 1/NKeys(I) – There is an index I on column.

– This assumes a uniform distribution.

– Otherwise, use 1/10.

• column1 = column2: 1/Max(NKeys(I1),NKeys(I2)) – There is an index I1 on column1 and an index I2 on column2.

– Containment of value sets assumption

– If only one column has an index, we use it to estimate the value.

– Otherwise, use 1/10.

Page 11: Cost Model and  Estimating Result Sizes

Estimating Reduction Estimating Reduction FactorsFactors

• column > value:

(High(I)-value)/(High(I)-Low(I)) if there

is an index I on column.

Page 12: Cost Model and  Estimating Result Sizes

ExampleExample

• Cardinality(R) = 100,000

• Cardinality(S) = 40,000

• NKeys(Index on R.agent) = 100

• High(Index on Rating) = 10, Low = 0

Reserves (sid, agent), Sailors(sid, rating)

SELECT *

FROM Reserves R, Sailors S

WHERE R.sid = S.sid and S.rating > 3 and

R.agent = ‘Joe’

Page 13: Cost Model and  Estimating Result Sizes

Example (cont.)Example (cont.)

• Maximum cardinality: 100,000 * 40,000

• Reduction factor of R.sid = S.sid: 1/40,000 – sid is a primary key of S

• Reduction factor of S.rating > 3: (10–3)/(10-0) = 7/10

• Reduction factor of R.agent = ‘Joe’: 1/100

• Total Estimated size: 700

Page 14: Cost Model and  Estimating Result Sizes

Database TuningDatabase Tuning

Page 15: Cost Model and  Estimating Result Sizes

Database TuningDatabase Tuning

• Problem: Make database run efficiently

• 80/20 Rule: 80% of the time, the

database is running 20% of the queries

– find what is taking all the time, and tune

these queries

Page 16: Cost Model and  Estimating Result Sizes

SolutionsSolutions

• Indexing

– this can sometimes degrade performance.

why?

• Tuning queries

• Reorganization of tables; perhaps

"denormalization"

• Changes in physical data storage

Page 17: Cost Model and  Estimating Result Sizes

DenormalizationDenormalization

• Suppose you have tables:– emp(eid, ename, salary, did)

– dept(did, budget, address, manager)

• Suppose you often ask queries which require finding the manager of an employee. You might consider changing the tables to:– emp(eid, ename, salary, did, manager)

– dept(did, budget, address, manager)

- in emp, there is an fd did -> manager. It is not 3NF!

Page 18: Cost Model and  Estimating Result Sizes

Denormalization (cont’d)Denormalization (cont’d)

• How will you ensure that the

redundancy does not introduce errors

into the database?

Page 19: Cost Model and  Estimating Result Sizes

Creating Indexes Using Creating Indexes Using OracleOracle

Page 20: Cost Model and  Estimating Result Sizes

IndexIndex

• Map between

– a key of a row

– the location of the data on the row

• Oracle has two kinds of indexes

– B+ tree

– Bitmap

• Sorted

Page 21: Cost Model and  Estimating Result Sizes

B+ treeB+ tree

Root

17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

13

Page 22: Cost Model and  Estimating Result Sizes

Creating an IndexCreating an Index

• Syntax:

create [bitmap] [unique] index index on

table(column [,column] . . .)

Page 23: Cost Model and  Estimating Result Sizes

Unique IndexesUnique Indexes

• Create an index that will guarantee the

uniqueness of the key. Fail if any duplicate

already exists.

• When you create a table with a

– primary key constraint or

– unique constraint

a "unique" index is created automatically

create unique index rating_bit on Sailors(rating);

Page 24: Cost Model and  Estimating Result Sizes

Bitmap IndexesBitmap Indexes

• Appropriate for columns that may have very

few possible values

• For each value c that appears in the column,

a vector v of bits is created, with a 1 in v[i]

if the i-th row has the value c

– Vector length = number of rows

• Oracle can automatically convert bitmap

entries to RowIDs during query processing

Page 25: Cost Model and  Estimating Result Sizes

Bitmap Indexes: ExampleBitmap Indexes: Example

create bitmap index rating_bit on Sailors(rating);

• Corresponding bitmaps:– 3: <1 0 0 1>

– 7: <0 1 0 0>

– 10: <0 0 1 0>

SidSnameagerating

12Jim553

13John467

14Jane4610

15Sam373

Page 26: Cost Model and  Estimating Result Sizes

When to Create an IndexWhen to Create an Index

• Large tables, on columns that are likely

to appear in where clauses as a

simple equality

• where s.sname = ‘John’ and s.age = 50

• where s.age = r.age

Page 27: Cost Model and  Estimating Result Sizes

Function-Based IndexesFunction-Based Indexes

• You can't use an index on sname for the

following query:

select *

from Sailors

where UPPER(sname) = 'SAM';

• You can create a function-based index to

speed up the query:

create index upp_sname on Sailors(UPPER(sname));

Page 28: Cost Model and  Estimating Result Sizes

Index-Organized TablesIndex-Organized Tables• An index organized table keeps its data sorted by the

primary key

• Rows do not have RowIDs

• They store their data as if they were an index

create table Sailors(

sid number primary key,

sname varchar2(30),

age number,

rating number)

organization index;

Page 29: Cost Model and  Estimating Result Sizes

Index-Organized Tables (2)Index-Organized Tables (2)

• What advantages does this have?

– Enforce uniqueness: primary key

– Improve performance

• What disadvantages?

– expensive to add column, dynamic data

• When to use?

– where clause on the primary key

– static data

Page 30: Cost Model and  Estimating Result Sizes

Clustering Tables TogetherClustering Tables Together

• You can ask Oracle to store several tables

close together on the disk

• This is useful if you usually join these tables

together

• Cluster: area in the disk where the rows of

the tables are stored

• Cluster key: the columns by which the

tables are usually joined in a query

Page 31: Cost Model and  Estimating Result Sizes

Clustering Tables Clustering Tables Together: SyntaxTogether: Syntax

• create cluster sailor_reserves (X number);– Create a cluster with nothing in it

• create table Sailors(

sid number primary key,

sname varchar2(30),

age number,

rating number)

cluster sailor_reserves(sid);

– create the table in the cluster

Page 32: Cost Model and  Estimating Result Sizes

Clustering Tables Clustering Tables Together: Syntax (cont.)Together: Syntax (cont.)

• create index sailor_reserves_index on cluster sailor_reserves– Create an index on the cluster

• create table Reserves(

sid number,

bid number,

day date,

primary key(sid, bid, day) )

cluster sailor_reserves(sid);

– A second table is added to the cluster

Page 33: Cost Model and  Estimating Result Sizes

Reserves

sidbidday

221027/7/97

2210110/10/96

5810311/12/96

Sailors

sidsnameratingage

22Dustin745.0

31Lubber855.5

58Rusty1035.0

Stored

sidsnameratingagebidday

22Dustin745.010

27/7/97

10

110/10/96

31Lubber855.5

58Rusty1035.010

311/12/96

Page 34: Cost Model and  Estimating Result Sizes

The Oracle OptimizerThe Oracle Optimizer

Page 35: Cost Model and  Estimating Result Sizes

Types of OptimizersTypes of Optimizers

• There are different modes for the optimizer

• RULE: Rule-based optimizer (RBO)– deprecated

• CHOOSE: Cost-based optimizer (CBO); picks a plan based on statistics (e.g. number of rows in a table, number of distinct keys in an index) – Need to analyze the data in the database using

analyze command

ALTER SESSION SET optimizer_mode = {choose|rule|first_rows(_n)|all_rows}

Page 36: Cost Model and  Estimating Result Sizes

Types of OptimizersTypes of Optimizers

• ALL_ROWS: execute the query so that all of

the rows are returned as quickly as possible

– Merge join

• FIRST_ROWS(n): execute the query so that

all of the first n rows are returned as quickly

as possible

– Block nested loop join

Page 37: Cost Model and  Estimating Result Sizes

analyze table | index

<table_name> | <index_name>

compute statistics |

estimate statistics [sample <integer>

rows | percent] |

delete statistics;

analyze table Sailors estimate statistics sample 25 percent;

Analyzing the DataAnalyzing the Data

Page 38: Cost Model and  Estimating Result Sizes

Viewing the Execution PlanViewing the Execution Plan(Option 1)(Option 1)

• You need a PLAN_TABLE table. So, the first

time that you want to see execution plans,

run the command:

• Set autotrace on to see all plans

– Display the execution path for each query, after

being executed

@$ORACLE_HOME/rdbms/admin/utlxplan.sql

Page 39: Cost Model and  Estimating Result Sizes

Viewing the Execution Plan Viewing the Execution Plan (Option 2)(Option 2)

• Another option:

explain planset statement_id='test'for SELECT *FROM Sailors SWHERE sname='Joe';

explain plan set statement_id=‘<name>’ for <statement>

Select Plan_Table

Page 40: Cost Model and  Estimating Result Sizes

Operations that Access Operations that Access TablesTables

• TABLE ACCESS FULL: sequential table scan

– Oracle optimizes by reading multiple blocks

– Used whenever there is no where clause on a

query

select * from Sailors

• TABLE ACCESS BY ROWID: access rows by

their RowID values.

– How do you get the rowid? From an index!

select * from Sailors where sid > 10

Page 41: Cost Model and  Estimating Result Sizes

Types of IndexesTypes of Indexes

• Unique: each row of the indexed table

contains a unique value for the

indexed column

• Nonunique: the row’s indexed values

can repeat

Page 42: Cost Model and  Estimating Result Sizes

Operations that Use Operations that Use IndexesIndexes

• INDEX UNIQUE SCAN: Access of an

index that is defined to be unique

• INDEX RANGE SCAN: Access of an

index that is not unique or access of a

unique index for a range of values

Page 43: Cost Model and  Estimating Result Sizes
Page 44: Cost Model and  Estimating Result Sizes
Page 45: Cost Model and  Estimating Result Sizes
Page 46: Cost Model and  Estimating Result Sizes

When are Indexes Used/Not When are Indexes Used/Not Used?Used?

• If you set an indexed column equal to a value, e.g., sname = 'Jim'

• If you specify a range of values for an indexed column, e.g., sname like 'J%'– sname like '%m': will not use an index

– UPPER(sname) like 'J%' : will not use an index

– sname is null: will not use an index, since null values are not stored in the index

– sname is not null: will not use an index, since every value in the index would have to be accessed

Page 47: Cost Model and  Estimating Result Sizes

When are Indexes Used? When are Indexes Used? (cont)(cont)

• 2*age = 20: Index on age will not be used. Index on 2*age will be used.

• sname != 'Jim': Index will not be used.

• MIN and MAX functions: Index will be used

• Equality of a column in a leading column of a multicolumn index. For example, suppose we have a multicolumn index on (sid, bid, day)– sid = 12: Can use the index

– bid = 101: Cannot use the index

Page 48: Cost Model and  Estimating Result Sizes

When are Indexes Used? When are Indexes Used? (cont)(cont)

• If the index is selective

– A small number of records are associated

with each distinct column value

Page 49: Cost Model and  Estimating Result Sizes

HintsHints

Page 50: Cost Model and  Estimating Result Sizes

HintsHints

• You can give the optimizer hints about

how to perform query evaluation

• Hints are written in /*+ */ right after

the select

• Note: These are only hints. The oracle

optimizer can choose to ignore your

hints

Page 51: Cost Model and  Estimating Result Sizes

HintsHints

• FULL hint: tell the optimizer to perform a

TABLE ACCESS FULL operation on the

specified table

• ROWID hint: tell the optimizer to perform a

TABLE ACCESS BY ROWID operation on the

specified table

• INDEX hint: tells the optimizer to use an

index-based scan on the specified table

Page 52: Cost Model and  Estimating Result Sizes

ExamplesExamples

Select /*+ FULL (sailors) */ sidFrom sailorsWhere sname=‘Joe’;

Select /*+ INDEX (sailors) */ sidFrom sailorsWhere sname=‘Joe’;

Select /*INDEX (sailors s_ind) */ sidFrom sailors S, reserves RWhere S.sid=R.sid AND sname=‘Joe’;