Further understand column group statistics in DB2
Leverage the extended use of multi-column statistics in DB2 9.5 to improve cardinality estimates
Skill Level: Intermediate
Samir Kapoor, DB2 Advanced Support Analyst, IBM Canada Ltd.
Vincent Corvinelli, DB2 Optimizer Developer, IBM Canada Ltd.
04 Sep 2008
With multi-column statistics in IBM® DB2® for Linux®, UNIX®, and Windows® (DB2), the optimizer can determine a better query access plan and improve query performance when there is correlation between multiple predicates. In this article, learn how to use multi-column statistics to take advantage of the enhancements to the optimizer in DB2 9.5 that extend their use to a broader range of predicates.
Introduction
The article "Understand column group statistics in DB2" (developerWorks, December 2006) describes the importance of collecting column group statistics and how the DB2 SQL Optimizer (referred to as optimizer hereafter) makes use of these multi-column statistics to detect a statistical correlation between two or more local or join equality predicates. In DB2 9.5, the optimizer further extended the use of multi-column statistics to a broader range of predicates.
The optimizer depends on accurate cardinality estimates to properly compute the cost of each query access plan considered. Cardinality estimation is a process by which the optimizer uses statistics to determine the size of partial query results after predicates are applied or aggregation is performed. At each operator in the access plan, the optimizer estimates the cardinality output from the operator. The application of one or more predicates may reduce the output stream cardinality.
© Copyright IBM Corporation 1994, 2008. All rights reserved.
It is common practice to assume the predicates are independent of each other when computing their combined filtering effect on the cardinality estimate. However, the predicates can be statistically correlated. Treating multiple predicates independently typically results in the optimizer underestimating the cardinality. Underestimating the cardinality could lead the optimizer to choose a sub-optimal access plan.
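The effect of the independence assumption can be illustrated with a minimal Python sketch. The table, column names, and values below are hypothetical and are not taken from the SAMPLE database; the function only mimics the idea of multiplying per-column selectivities and is not the optimizer's actual algorithm.

```python
def independent_estimate(rows, predicates):
    """Estimate cardinality by multiplying per-column selectivities,
    as if the equality predicates were statistically independent."""
    total = len(rows)
    estimate = float(total)
    for column, value in predicates.items():
        matches = sum(1 for row in rows if row[column] == value)
        estimate *= matches / total
    return estimate


def actual_count(rows, predicates):
    """Count the rows that really satisfy every predicate."""
    return sum(
        1 for row in rows
        if all(row[column] == value for column, value in predicates.items())
    )


# Hypothetical data: CITY fully determines COUNTRY, so the columns are correlated.
rows = (
    [{"CITY": "TORONTO", "COUNTRY": "CA"}] * 10
    + [{"CITY": "OTTAWA", "COUNTRY": "CA"}] * 10
    + [{"CITY": "BOSTON", "COUNTRY": "US"}] * 20
)
predicates = {"CITY": "TORONTO", "COUNTRY": "CA"}

estimate = independent_estimate(rows, predicates)  # 40 * (10/40) * (20/40)
actual = actual_count(rows, predicates)
print(estimate, actual)
```

In this made-up case the independence assumption yields half the true row count, because the COUNTRY predicate filters nothing once the CITY predicate has been applied.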
The optimizer considers using multi-column statistics to detect a statistical correlation and estimate more accurately the combined filtering effect of multiple predicates. This article describes how the optimizer does so for SQL statements that apply at least two local IN, OR, and equality predicates, and how it estimates the filtering effect of predicates for SQL statements that apply some classes of OR predicates. "Understand column group statistics in DB2" describes how the optimizer makes use of multi-column statistics to detect a correlation between two or more local equality predicates, and for the join of two or more tables that apply at least two equality join predicates between the pair of tables. The RUNSTATS command options described in that article are used in the same manner here, so they are not described again in this article.
Statistical correlation of multiple local equality and local IN predicates
If the WHERE clause of an SQL statement applies multiple predicates, as follows:
C1=? AND C2 IN ( ?, ?, ? )
and multi-column statistics on (C1, C2) are collected, then the optimizer attempts to detect a statistical correlation between the predicates in order to improve the cardinality estimates. This does not apply to:
• Join predicates with IN or OR operators
• Local predicates with inequality, LIKE, or IS NULL operators
• Predicates with subqueries
The C1=? predicate is an example of a local equality predicate, which is an equality predicate applied to a single table and is described as follows:
COLUMN = literal
where the literal can be any one of these:
• A constant value
• A parameter marker or host variable
• A special register (for example, CURRENT DATE)
The C2 IN ( ?, ?, ? ) predicate is an example of a local IN predicate, which is a predicate applied to the same single table that the equality predicate is applied to, and is described as follows:
COLUMN IN ( <VALUE LIST> )
where the <VALUE LIST> is a comma-separated list of one or more literals, as described for the local equality predicate.
An OR predicate that is equivalent to an IN predicate can be specified in the SQL statement instead of the IN predicate, and the optimizer will treat it in the same manner when accounting for statistical correlation; that is,
COL IN ( literal_1, literal_2, ..., literal_n )
is equivalent to
COL=literal_1 OR COL=literal_2 OR ... OR COL=literal_n
The following are some examples for which the optimizer tries to detect a correlation between local IN, OR, and equality predicates:
a) COL_1 IN ( <VALUE LIST> ) AND COL_2=literal AND COL_3=literal
b) (COL_1=literal_1 OR COL_1=literal_2 OR ... OR COL_1=literal_n) AND COL_2=literal AND ... AND COL_m=literal
c) COL_1 IN ( <VALUE LIST> ) AND COL_2 IN ( <VALUE LIST> ) AND ... AND COL_m IN ( <VALUE LIST> )
d) (COL_1=literal_1 OR COL_1=literal_2) AND (COL_2=literal_1 OR COL_2=literal_2) AND ... AND (COL_m=literal_1 OR COL_m=literal_2)
e) COL_1 IN ( <VALUE LIST> ) AND ... AND COL_m IN ( <VALUE LIST> ) AND COL_1_2=literal AND ... AND COL_1_k=literal
f) (COL_1=literal_1 OR COL_1=literal_2) AND COL_2=literal AND COL_3=literal
g) (COL_1=literal_1 OR COL_1=literal_2) AND (COL_2=literal_1 OR COL_2=literal_2) AND COL_3=literal
The following are some examples of predicates that are not considered for statistical correlation detection by the optimizer:
a) (COL_1=literal AND COL_2=literal) OR (COL_1=literal AND COL_2=literal AND COL_3=literal)
b) ((COL_1=literal AND COL_2=literal) OR (COL_1=literal AND COL_2=literal)) AND COL_3=literal
c) (COL_1 IN ( <VALUE LIST> ) OR COL_2 IN ( <VALUE LIST> )) AND COL_3=literal
Example 1: C1 IN ( <VALUE LIST> ) AND C2 = literal
Note: Please replace SKAPOOR with your own schema in all the examples described in this article.
These examples were tested in the following environment, using the SAMPLE database, which can be created by executing db2sampl:
Listing 1. Testing environment for samples
DB21085I Instance "skapoor" uses "64" bits and DB2 code release "SQL09051"
with level identifier "03020107".
Informational tokens are "DB2 v9.5.0.1", "s080328", "U814639", and Fix Pack
"1".
Product is installed at "/home2/skapoor/sqllib".
Configuration: (as displayed by the db2exfmt tool)
Database Context:
----------------
Parallelism: None
CPU Speed: 4.000000e-05
Comm Speed: 100
Buffer Pool size: 1000
Sort Heap size: 256
Database Heap size: 1200
Lock List size: 100
Maximum Lock List: 10
Average Applications: 1
Locks Available: 640

Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 5
Blocking: Block All Cursors
Isolation Level: Cursor Stability

STMTHEAP: (Statement heap size)
6402
Consider the following query on the EMPLOYEE table in the SAMPLE database:
Listing 2. Query on the EMPLOYEE table in the SAMPLE database
SELECT FIRSTNME, LASTNAME, JOB, WORKDEPT, SALARY
FROM EMPLOYEE
WHERE JOB IN ('CLERK', 'SALESREP') AND
      WORKDEPT = 'A00'
ORDER BY JOB, SALARY
It returns four records from the EMPLOYEE table:
Listing 3. Records returned from the EMPLOYEE table
FIRSTNME     LASTNAME        JOB      WORKDEPT SALARY
------------ --------------- -------- -------- -----------
GREG         ORLANDO         CLERK    A00         39250.00
SEAN         O'CONNELL       CLERK    A00         49250.00
DIAN         HEMMINGER       SALESREP A00         46500.00
VINCENZO     LUCCHESSI       SALESREP A00         66500.00
4 record(s) selected.
The EXPLAIN tool, which requires the existence of the EXPLAIN tables, can be used to view the query access plan chosen by the optimizer. To create the EXPLAIN tables, execute:
db2 -tvf $DB2PATH/misc/EXPLAIN.DDL
When the SAMPLE database is initially created, statistics are not collected on the tables. To collect statistics on the EMPLOYEE table, the RUNSTATS tool can be used. The following RUNSTATS command collects statistics on each column, including distribution statistics, and detailed statistics on all indexes defined in the EMPLOYEE table, if any:
RUNSTATS ON TABLE SKAPOOR.EMPLOYEE
  WITH DISTRIBUTION AND DETAILED INDEXES ALL
Once the EXPLAIN tables are created and the statistics are collected, the SET CURRENT EXPLAIN MODE statement can be used to insert the query access plan details for one or more statements into the EXPLAIN tables, as follows:
Listing 4. Insert the query access plan details into the EXPLAIN tables
SET CURRENT EXPLAIN MODE EXPLAIN;
SELECT FIRSTNME, LASTNAME, JOB, WORKDEPT, SALARY
FROM EMPLOYEE
WHERE JOB IN ('CLERK', 'SALESREP') AND
      WORKDEPT = 'A00'
ORDER BY JOB, SALARY;
SET CURRENT EXPLAIN MODE NO;
The db2exfmt tool reads the data in the EXPLAIN tables, and formats the query access plan in a text file:
db2exfmt -d SAMPLE -1 -g -o exfmt_example1.out
The file exfmt_example1.out contains a query access plan similar to the following, with an estimated cardinality of 1:
Listing 5. Query access plan
              Rows
             RETURN
             (   1)
              Cost
               I/O
                |
             1.19048
             TBSCAN
             (   2)
             10.7902
                1
                |
             1.19048
              SORT
             (   3)
             10.7387
                1
                |
             1.19048
              FETCH
             (   4)
             10.6299
                1
            /---+---\
           5          42
        IXSCAN   TABLE: SKAPOOR
        (   5)      EMPLOYEE
        2.27828
           0
           |
          42
     INDEX: SKAPOOR
         XEMP2
The cardinality estimate of 1 does not match the actual result of 4. The optimizer assumes the two predicates are independent because relevant index or column group statistics do not exist. The RUNSTATS tool can be used to collect column group statistics on the group (JOB,WORKDEPT) to provide the optimizer with the appropriate information to detect a statistical correlation, if any, between the two columns:
RUNSTATS ON TABLE SKAPOOR.EMPLOYEE ON ALL COLUMNS
  AND COLUMNS ((JOB,WORKDEPT)) WITH DISTRIBUTION
  AND DETAILED INDEXES ALL
After repeating the above steps to explain the query again to generate the query access plan, the optimizer computes a better cardinality estimate as a result of collecting column group statistics on the two columns:
Listing 6. Query access plan, with better cardinality estimate
              Rows
             RETURN
             (   1)
              Cost
               I/O
                |
                5
             TBSCAN
             (   2)
             10.8458
                1
                |
                5
              SORT
             (   3)
             10.7944
                1
                |
                5
              FETCH
             (   4)
             10.6299
                1
            /---+---\
           5          42
        IXSCAN   TABLE: SKAPOOR
        (   5)      EMPLOYEE
        2.27828
           0
           |
          42
     INDEX: SKAPOOR
         XEMP2
The cardinality estimate is slightly higher than the actual value of 4 since the column group statistic is a uniform distribution statistic. You may have noticed that the query access plan itself did not change with the increase in cardinality estimate. The examples described in this article are simple in order to illustrate how to improve the cardinality estimate. Statements involving larger tables and joins of two or more tables are more likely to exhibit a change in query access plan as a result of the improved cardinality estimate.
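The estimate of 1.19048 in Listing 5 is consistent with a plain product of single-predicate selectivities. The following Python sketch reproduces it under stated assumptions: the row counts for the two predicates (10 and 5) are not given in this article and are assumed here because they match the plan output, and the helper function is illustrative only, not the optimizer's implementation.

```python
TABLE_CARD = 42        # EMPLOYEE cardinality, as shown for the table in Listing 5
ROWS_JOB_IN_LIST = 10  # assumed: rows with JOB 'CLERK' or 'SALESREP'
ROWS_DEPT_A00 = 5      # assumed: rows with WORKDEPT = 'A00'


def independent_cardinality(table_card, *matching_rows):
    """Apply each predicate's selectivity as if it were independent of the others."""
    cardinality = float(table_card)
    for rows in matching_rows:
        cardinality *= rows / table_card
    return cardinality


estimate = independent_cardinality(TABLE_CARD, ROWS_JOB_IN_LIST, ROWS_DEPT_A00)
print(round(estimate, 5))
```

Under these assumptions the product is 10 x 5 / 42, which is well below the four rows the query actually returns.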
Example 2: C1 IN ( <VALUE LIST> ) AND C2 IN ( <VALUE LIST> )
This example illustrates the effect of column group statistics on two IN predicates. Consider the following query that retrieves the bonus and salaries for managers and designers in certain departments:
Listing 7. Bonus and salaries query
SELECT FIRSTNME, LASTNAME, WORKDEPT, JOB, BONUS, SALARY
FROM EMPLOYEE
WHERE WORKDEPT IN ('D11','D21') AND
      JOB IN ('MANAGER','DESIGNER')
ORDER BY WORKDEPT, SALARY
This query returns 12 records from the EMPLOYEE table:
Listing 8. Records returned from the EMPLOYEE table
FIRSTNME     LASTNAME        WORKDEPT JOB      BONUS       SALARY
------------ --------------- -------- -------- ----------- -----------
MASATOSHI    YOSHIMURA       D11      DESIGNER      500.00    44680.00
JENNIFER     LUTZ            D11      DESIGNER      600.00    49840.00
JAMES        WALKER          D11      DESIGNER      400.00    50450.00
MARILYN      SCOUTTEN        D11      DESIGNER      500.00    51340.00
BRUCE        ADAMSON         D11      DESIGNER      500.00    55280.00
DAVID        BROWN           D11      DESIGNER      600.00    57740.00
ELIZABETH    PIANKA          D11      DESIGNER      400.00    62250.00
KIYOSHI      YAMAMOTO        D11      DESIGNER      500.00    64680.00
WILLIAM      JONES           D11      DESIGNER      400.00    68270.00
REBA         JOHN            D11      DESIGNER      600.00    69840.00
IRVING       STERN           D11      MANAGER       500.00    72250.00
EVA          PULASKI         D21      MANAGER       700.00    96170.00
12 record(s) selected.
First, examine the query access plan and cardinality estimates without the column group statistic on (JOB,WORKDEPT). This is accomplished by executing another RUNSTATS command on the EMPLOYEE table as follows:
RUNSTATS ON TABLE SKAPOOR.EMPLOYEE
  WITH DISTRIBUTION AND DETAILED INDEXES ALL
The previous statistics collected are cleared by the latest RUNSTATS command, so the column group statistics collected earlier are no longer kept. Generating the query access plan using EXPLAIN and the db2exfmt tool, as in Example 1, you can examine the estimated cardinality by the optimizer:
Listing 9. Insert the query access plan details into the EXPLAIN tables
SET CURRENT EXPLAIN MODE EXPLAIN;
SELECT FIRSTNME, LASTNAME, WORKDEPT, JOB, BONUS, SALARY
FROM EMPLOYEE
WHERE WORKDEPT IN ('D11','D21') AND
      JOB IN ('MANAGER','DESIGNER')
ORDER BY WORKDEPT, SALARY;
SET CURRENT EXPLAIN MODE NO;
db2exfmt -d SAMPLE -1 -g -o exfmt_example2.out
The file exfmt_example2.out should contain a query access plan similar to the following, with an estimated cardinality of 7:
Listing 10. Query access plan
                   Rows
                  RETURN
                  (   1)
                   Cost
                    I/O
                     |
                  7.28572
                  TBSCAN
                  (   2)
                  13.7066
                     1
                     |
                  7.28572
                   SORT
                  (   3)
                  13.5723
                     1
                     |
                  7.28572
                  NLJOIN
                  (   4)
                  13.1318
                     1
              /------+------\
             2              3.64286
          TBSCAN             FETCH
          (   5)            (   6)
           0.006            11.0934
             0                 1
             |             /---+---\
             2            9          42
      TABFNC: SYSIBM   IXSCAN   TABLE: SKAPOOR
          GENROW       (   7)      EMPLOYEE
                       2.55364
                          0
                          |
                         42
                    INDEX: SKAPOOR
                        XEMP2
In the query access plan shown in Listing 10, notice a join between the table EMPLOYEE and a table function, GENROW. When an IN predicate (or an equivalent OR predicate) is used, the optimizer considers an IN-to-JOIN transformation, converting the IN predicate to a join predicate. The GENROW table function produces the values listed in the <VALUE LIST> of the IN predicate. When the IN predicate is used in its join form, the optimizer still considers it for statistical correlation detection.
The cardinality estimate of 7 does not match the actual result of 12. As in Example 1, collecting column group statistics on the columns (JOB,WORKDEPT) provides the necessary information for the optimizer to account for a statistical correlation when computing the combined filtering effect of the two IN predicates:
RUNSTATS ON TABLE SKAPOOR.EMPLOYEE
  ON ALL COLUMNS AND COLUMNS ((JOB,WORKDEPT))
  WITH DISTRIBUTION AND DETAILED INDEXES ALL
After repeating the above steps to explain the query again to generate the query access plan, the optimizer computes a better cardinality estimate that is very close to the actual result:
Listing 11. Query access plan with more accurate cardinality estimate
                   Rows
                  RETURN
                  (   1)
                   Cost
                    I/O
                     |
                   11.2
                  TBSCAN
                  (   2)
                  13.9768
                     1
                     |
                   11.2
                   SORT
                  (   3)
                  13.8033
                     1
                     |
                   11.2
                  NLJOIN
                  (   4)
                  13.1318
                     1
              /------+------\
             2                5.6
          TBSCAN             FETCH
          (   5)            (   6)
           0.006            11.0934
             0                 1
             |             /---+---\
             2            9          42
      TABFNC: SYSIBM   IXSCAN   TABLE: SKAPOOR
          GENROW       (   7)      EMPLOYEE
                       2.55364
                          0
                          |
                         42
                    INDEX: SKAPOOR
                        XEMP2
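The NLJOIN estimates in Listing 10 and Listing 11 can be read as the number of rows from the outer input (the two values produced by GENROW) multiplied by the rows estimated from the inner FETCH for each outer row. The following Python sketch checks that reading against the numbers in the two plans; the function name is hypothetical and the interpretation of the plan output is an assumption, not a statement of how the optimizer is implemented.

```python
ACTUAL_ROWS = 12  # rows returned by the query in Listing 8


def nljoin_cardinality(outer_rows, inner_rows_per_outer_row):
    """Nested-loop join output: one inner probe per outer row."""
    return outer_rows * inner_rows_per_outer_row


def relative_error(estimate, actual):
    return abs(actual - estimate) / actual


before = nljoin_cardinality(2, 3.64286)  # Listing 10, no column group statistic
after = nljoin_cardinality(2, 5.6)       # Listing 11, with (JOB,WORKDEPT) statistic

print(before, relative_error(before, ACTUAL_ROWS))
print(after, relative_error(after, ACTUAL_ROWS))
```

Both products agree with the cardinalities shown at the NLJOIN operator, and the relative error against the 12 rows actually returned falls from roughly 39% to under 7% once the column group statistic is available.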
Example 3: C1 IN ( <VALUE LIST> ) AND C2 IN ( <VALUE LIST> ) AND C3=literal
In this example, you add a third predicate to the query in Example 2 to determine which employees received a bonus of $500:
Listing 12. Add a third predicate to find $500 bonus
SELECT FIRSTNME, LASTNAME, WORKDEPT, JOB, BONUS, SALARY
FROM EMPLOYEE
WHERE WORKDEPT IN ('D11','D21') AND
      JOB IN ('MANAGER','DESIGNER') AND
      BONUS = 500
ORDER BY WORKDEPT, SALARY
This query returns five records from the EMPLOYEE table:
Listing 13. Records returned from EMPLOYEE table
FIRSTNME     LASTNAME        WORKDEPT JOB      BONUS       SALARY
------------ --------------- -------- -------- ----------- -----------
MASATOSHI    YOSHIMURA       D11      DESIGNER      500.00    44680.00
MARILYN      SCOUTTEN        D11      DESIGNER      500.00    51340.00
BRUCE        ADAMSON         D11      DESIGNER      500.00    55280.00
KIYOSHI      YAMAMOTO        D11      DESIGNER      500.00    64680.00
IRVING       STERN           D11      MANAGER       500.00    72250.00
5 record(s) selected.
If you re-collect the statistics without the column group statistics using:
RUNSTATS ON TABLE SKAPOOR.EMPLOYEE
  WITH DISTRIBUTION AND DETAILED INDEXES ALL
a query access plan similar to the following is chosen by the optimizer, with a cardinality estimate of 2:
Listing 14. Query access plan
                   Rows
                  RETURN
                  (   1)
                   Cost
                    I/O
                     |
                  2.42857
                  TBSCAN
                  (   2)
                  13.8494
                     1
                     |
                  2.42857
                   SORT
                  (   3)
                  13.7636
                     1
                     |
                  2.42857
                  NLJOIN
                  (   4)
                  13.5765
                     1
              /------+------\
             2              1.21429
          TBSCAN             FETCH
          (   5)            (   6)
           0.006            11.3158
             0                 1
             |             /---+---\
             2            9          42
      TABFNC: SYSIBM   IXSCAN   TABLE: SKAPOOR
          GENROW       (   7)      EMPLOYEE
                       2.55364
                          0
                          |
                         42
                    INDEX: SKAPOOR
                        XEMP2
With three predicates applied in the WHERE clause, assuming they are independent results in the optimizer underestimating the cardinality. To illustrate how the optimizer can use index statistics, as well as column group statistics, to detect a statistical correlation, create an index covering the three columns (JOB,WORKDEPT,BONUS) that are referenced in the predicates, and collect statistics:
Listing 15. Create index and collect statistics
CREATE INDEX JOB_DEPT_BONUS ON EMPLOYEE (JOB,WORKDEPT,BONUS)

-- The RUNSTATS command provides the option to collect statistics on a set of
-- indexes only, without affecting the statistics previously collected.
RUNSTATS ON TABLE SKAPOOR.EMPLOYEE FOR DETAILED INDEXES SKAPOOR.JOB_DEPT_BONUS
With the new index created, and statistics collected on it, the optimizer corrects the cardinality estimate of the query access plan:
Listing 16. A corrected cardinality estimate from the query access plan
                   Rows
                  RETURN
                  (   1)
                   Cost
                    I/O
                     |
                   5.25
                  TBSCAN
                  (   2)
                  13.5227
                     1
                     |
                   5.25
                   SORT
                  (   3)
                  13.4087
                     1
                     |
                   5.25
                  NLJOIN
                  (   4)
                  13.0875
                     1
              /------+------\
             2               2.625
          TBSCAN             FETCH
          (   5)            (   6)
           0.006            11.0713
             0                 1
             |             /---+---\
             2         2.625         42
      TABFNC: SYSIBM   IXSCAN   TABLE: SKAPOOR
          GENROW       (   7)      EMPLOYEE
                       2.85933
                          0
                          |
                         42
                    INDEX: SKAPOOR
                    JOB_DEPT_BONUS
Example 4: (C1=literal OR C1=literal2) AND (C2=literal OR C2=literal2) AND C3=literal
This example is equivalent to Example 3, using equivalent OR predicates to replace the IN predicates:
Listing 17. Equivalent OR predicates to replace the IN predicates
SELECT FIRSTNME, LASTNAME, WORKDEPT, JOB, BONUS, SALARY
FROM EMPLOYEE
WHERE (WORKDEPT = 'D11' OR WORKDEPT = 'D21') AND
      (JOB = 'MANAGER' OR JOB = 'DESIGNER') AND
      BONUS = 500
ORDER BY WORKDEPT, SALARY
This query returns the same result set as in Example 3. This example illustrates the effect that partial statistics have on the ability of the optimizer to estimate the cardinality. Drop the index created in Example 3 and re-collect the statistics with column group statistics on the group (JOB,WORKDEPT) only:
DROP INDEX JOB_DEPT_BONUS
RUNSTATS ON TABLE SKAPOOR.EMPLOYEE
  ON ALL COLUMNS AND COLUMNS ((JOB,WORKDEPT))
  WITH DISTRIBUTION AND DETAILED INDEXES ALL
With column group statistics collected on a subset of the columns referenced by the eligible IN, OR, and equality predicates, the optimizer estimates a cardinality that is close to the actual result, but not as accurate as in Example 3, where the index statistics covered all three columns:
Listing 18. Query access plan
                   Rows
                  RETURN
                  (   1)
                   Cost
                    I/O
                     |
                  3.73333
                  TBSCAN
                  (   2)
                  13.9174
                     1
                     |
                  3.73333
                   SORT
                  (   3)
                  13.8186
                     1
                     |
                  3.73333
                  NLJOIN
                  (   4)
                  13.5765
                     1
              /------+------\
             2              1.86667
          TBSCAN             FETCH
          (   5)            (   6)
           0.006            11.3158
             0                 1
             |             /---+---\
             2            9          42
      TABFNC: SYSIBM   IXSCAN   TABLE: SKAPOOR
          GENROW       (   7)      EMPLOYEE
                       2.55364
                          0
                          |
                         42
                    INDEX: SKAPOOR
                        XEMP2
The optimizer used the column group statistic on (JOB,WORKDEPT) to account for a statistical correlation between the two OR predicates, but without including BONUS in the column group, it considered the BONUS=500 predicate as independent of the two OR predicates, resulting in the slightly underestimated final cardinality.
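The estimate of 3.73333 can be cross-checked from numbers that already appear in the earlier plans. In the Python sketch below, the selectivity of BONUS=500 is not read from the catalog; it is inferred (an assumption of this sketch) as the ratio between the independence-based estimates of Example 3 and Example 2, and then applied to the correlated estimate of Example 2.

```python
# Estimates copied from the plans in this article.
EX2_INDEPENDENT = 7.28572  # Listing 10: two IN predicates, no column group statistic
EX3_INDEPENDENT = 2.42857  # Listing 14: same predicates plus BONUS = 500
EX2_CORRELATED = 11.2      # Listing 11: with the (JOB,WORKDEPT) column group statistic


def implied_selectivity(card_with_predicate, card_without_predicate):
    """Selectivity an extra, independently applied predicate must have had."""
    return card_with_predicate / card_without_predicate


bonus_selectivity = implied_selectivity(EX3_INDEPENDENT, EX2_INDEPENDENT)
partial_estimate = EX2_CORRELATED * bonus_selectivity

print(round(bonus_selectivity, 4), round(partial_estimate, 4))
```

The inferred selectivity is about one third, and applying it to 11.2 gives approximately 3.7333, which agrees with the plan above: the correlation between JOB and WORKDEPT is accounted for, while BONUS is still treated as independent.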
Note: If you analyze the Optimized Statement section of the db2exfmt output for the above query, you may notice that the OR predicates were converted to their equivalent IN predicates:
Listing 19. OR predicates converted to their equivalent IN predicates
Optimized Statement:
-------------------
SELECT Q5.FIRSTNME AS "FIRSTNME", Q5.LASTNAME AS "LASTNAME", Q5.WORKDEPT AS
        "WORKDEPT", Q5.JOB AS "JOB", +0000500.00 AS "BONUS", Q5.SALARY AS
        "SALARY"
FROM SKAPOOR.EMPLOYEE AS Q5
WHERE (Q5.BONUS = +0000500.00) AND Q5.JOB IN ('MANAGER ', 'DESIGNER') AND
        Q5.WORKDEPT IN ('D11', 'D21')
ORDER BY Q5.WORKDEPT, Q5.SALARY
Collecting column group statistics on all three columns results in the same cardinality estimate as in Example 3. In this case, you still collect the column group statistics on the previous two columns (JOB,WORKDEPT) and include the full set of three columns (JOB,WORKDEPT,BONUS):
RUNSTATS ON TABLE SKAPOOR.EMPLOYEE
  ON ALL COLUMNS AND COLUMNS ((JOB,WORKDEPT), (JOB,WORKDEPT,BONUS))
  WITH DISTRIBUTION AND DETAILED INDEXES ALL
As described in "Understand column group statistics in DB2", you can gather one or more column group statistics between the same sets of columns. The query access plan produced after collecting these statistics is the same as the final plan in Example 3. It is left as an exercise for you to verify this is the case.
Example 5: Index oring
This example illustrates how collecting column group statistics can also improve the cardinality estimate of index oring access plans. Consider the following query on the EMPLOYEE table that retrieves all clerks and sales representatives that belong to department A00:
Listing 20. Query on the EMPLOYEE table that retrieves all clerks and sales representatives that belong to department A00
SELECT FIRSTNME, LASTNAME, JOB, WORKDEPT, SALARY
FROM EMPLOYEE
WHERE JOB IN ('CLERK', 'SALESREP') AND
      WORKDEPT='A00'
ORDER BY JOB, SALARY
This query returns four records from the EMPLOYEE table:
Listing 21. Query returns four records
FIRSTNME     LASTNAME        JOB      WORKDEPT SALARY
------------ --------------- -------- -------- -----------
GREG         ORLANDO         CLERK    A00         39250.00
SEAN         O'CONNELL       CLERK    A00         49250.00
DIAN         HEMMINGER       SALESREP A00         46500.00
VINCENZO     LUCCHESSI       SALESREP A00         66500.00
4 record(s) selected.
To better illustrate the improvement in cardinality estimation, drop all the existing indexes on the EMPLOYEE table except the primary key index:
DROP INDEX XEMP2
and create the following index that includes both columns referenced by predicates in the WHERE clause of the above query, separated by the SALARY column:
CREATE INDEX IND2 ON EMPLOYEE (JOB,SALARY,WORKDEPT)
Statistics are re-collected on the EMPLOYEE table and its new and remaining indexes:
RUNSTATS ON TABLE SKAPOOR.EMPLOYEE
  WITH DISTRIBUTION AND DETAILED INDEXES ALL
In this example, the optimizer is forced to choose an index oring table access operation by using the optimization profile feature. To do so, the optimization profile includes two optimizer guidelines:
1. A guideline to disable the transformation of the IN predicate to a join
2. A guideline to force the optimizer to choose the index oring operation to access the EMPLOYEE table
The first step in creating the optimization profile is to create an XML file, called example5.xml, that contains the following contents:
Listing 22. XML file contents
<?xml version="1.0" encoding="UTF-8"?>

<OPTPROFILE VERSION="9.5.1">
  <STMTPROFILE ID="Example 5 Index oring test">
    <STMTKEY><![CDATA[SELECT FIRSTNME, LASTNAME, JOB, WORKDEPT, SALARY
FROM EMPLOYEE
WHERE JOB IN ('CLERK', 'SALESREP') AND
      WORKDEPT='A00'
ORDER BY JOB, SALARY]]>
    </STMTKEY>

    <OPTGUIDELINES>
      <INLIST2JOIN OPTION="DISABLE" TABLE="EMPLOYEE" COLUMN="JOB"/>
      <IXOR TABLE="EMPLOYEE" INDEX="IND2"/>
    </OPTGUIDELINES>
  </STMTPROFILE>
</OPTPROFILE>
The second step involves creating a del file, called example5.del, that contains the following contents:
"SKAPOOR","IXORPLAN","example5.xml"
where SKAPOOR is the schema for the profile, IXORPLAN is the name you associated to the profile, and example5.xml is the XML file created in the first step, which contains the contents describing the profile.
The third step requires placing both the example5.xml and the example5.del files in the same location and issuing the following commands:
Listing 23. Commands to use with example5.xml and example5.del
-- Create the OPT_PROFILE table, if it does not already exist
CREATE TABLE SYSTOOLS.OPT_PROFILE (
  SCHEMA VARCHAR(128) NOT NULL,
  NAME VARCHAR(128) NOT NULL,
  PROFILE BLOB (2M) NOT NULL,
  PRIMARY KEY ( SCHEMA, NAME )
)

-- Add an entry to OPT_PROFILE table for our index-oring guideline
IMPORT FROM example5.del OF DEL
  MODIFIED BY LOBSINFILE
  INSERT INTO SYSTOOLS.OPT_PROFILE
To view the query access plan using the optimization profile created, the SET CURRENT OPTIMIZATION PROFILE statement can be used in combination with the SET CURRENT EXPLAIN MODE statement, as follows:
Listing 24. View the query access plan using the optimization profile
-- use the IXORPLAN profile
SET CURRENT OPTIMIZATION PROFILE="IXORPLAN"

SET CURRENT EXPLAIN MODE EXPLAIN

SELECT FIRSTNME, LASTNAME, JOB, WORKDEPT, SALARY
FROM EMPLOYEE
WHERE JOB IN ('CLERK', 'SALESREP') AND
      WORKDEPT='A00'
ORDER BY JOB, SALARY

SET CURRENT EXPLAIN MODE NO
A query access plan similar to the following is chosen by the optimizer:
Listing 25. Query access plan
                      Rows
                     RETURN
                     (   1)
                      Cost
                       I/O
                        |
                     1.19048
                     TBSCAN
                     (   2)
                     13.0404
                    0.963719
                        |
                     1.19048
                      SORT
                     (   3)
                     12.967
                    0.963719
                        |
                     1.19048
                      FETCH
                     (   4)
                     12.8278
                    0.963719
                    /---+---\
              1.19048         42
              RIDSCN     TABLE: SKAPOOR
              (   5)        EMPLOYEE
              4.89404
                 0
           /-----+-----\
      0.952381        0.238095
        SORT            SORT
       (   6)          (   8)
       2.7352          2.21536
          0               0
          |               |
      0.952381        0.238095
       IXSCAN          IXSCAN
       (   7)          (   9)
       2.6272          2.10736
          0               0
          |               |
         42              42
   INDEX: SKAPOOR  INDEX: SKAPOOR
        IND2            IND2
If the generated query access plan is not the index oring plan shown above, then there is a problem with your optimization profile setup. In the db2exfmt output, the following is seen if the optimizer used the optimization profile:
Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
        SKAPOOR.IXORPLAN
STMTPROF: (Statement Profile Name)
        Example 5 Index oring test
It is left as an exercise to the reader to determine the appropriate method to collect a column group statistic on the columns (JOB,WORKDEPT). Once the column group statistic is collected, the query access plan displays improved cardinality estimates:
Listing 26. Improved cardinality estimates in query access plan
                      Rows
                     RETURN
                     (   1)
                      Cost
                       I/O
                        |
                        5
                     TBSCAN
                     (   2)
                     13.878
                        1
                        |
                        5
                      SORT
                     (   3)
                     13.7665
                        1
                        |
                        5
                      FETCH
                     (   4)
                     13.4746
                        1
                    /---+---\
                 5            42
              RIDSCN     TABLE: SKAPOOR
              (   5)        EMPLOYEE
              4.89404
                 0
           /-----+-----\
          4               1
        SORT            SORT
       (   6)          (   8)
       2.7352          2.21536
          0               0
          |               |
          4               1
       IXSCAN          IXSCAN
       (   7)          (   9)
       2.6272          2.10736
          0               0
          |               |
         42              42
   INDEX: SKAPOOR  INDEX: SKAPOOR
        IND2            IND2
At each IXSCAN operator, the cardinality is corrected to account for a correlation between the predicates:
• JOB='CLERK' AND WORKDEPT='A00'
• JOB='SALESREP' AND WORKDEPT='A00'
and the cardinality is corrected at the RIDSCN and FETCH operators, which accounts for the statistical correlation between the IN and equality predicates.
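In both index oring plans, the RIDSCN estimate equals the sum of the two IXSCAN estimates beneath it. The Python sketch below checks this against Listing 25 and Listing 26. Treating the RIDSCN output as a simple sum is an assumption of this sketch; it is reasonable here because the two branches (JOB='CLERK' and JOB='SALESREP') cannot match the same row.

```python
def ridscn_cardinality(branch_estimates):
    """Combine index scan estimates for branches that cannot overlap."""
    return sum(branch_estimates)


before = ridscn_cardinality([0.952381, 0.238095])  # IXSCAN estimates, Listing 25
after = ridscn_cardinality([4, 1])                 # IXSCAN estimates, Listing 26

print(round(before, 5), after)
```

Because the final estimate is built from the per-branch estimates, correcting each IXSCAN for the correlation between JOB and WORKDEPT is what lifts the RIDSCN and FETCH estimates from about 1.19 to 5.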
Statistical correlation of multiple local equality predicates within subterms of OR operators
If the WHERE clause of an SQL statement applies OR operators with multiple local predicates within each subterm, as follows:
(C1=literal_1 AND C2=literal_2) OR
(C1=literal_3 AND C2=literal_4) OR
(C1=literal_5 AND C2=literal_6)
and multi-column statistics on (C1,C2) are collected, then the optimizer will attempt to detect a statistical correlation between the predicates in order to improve the filtering effect of the OR predicate. In this article, the above OR operators are described as a single OR predicate with three subterms:
1. (C1=literal_1 AND C2=literal_2)
2. (C1=literal_3 AND C2=literal_4)
3. (C1=literal_5 AND C2=literal_6)
This does not apply if the OR predicate contains any of the following:
• Non-local equality predicates in any of the subterms
• Different sets of columns referenced in two or more subterms
The following is an example for which the optimizer tries to detect a correlation between the local equality predicates within the subterms of an OR predicate:
a) (COL_1=literal_1 AND COL_2=literal_2) OR
   (COL_1=literal_3 AND COL_2=literal_4) OR
   ... OR
   (COL_1=literal_n AND COL_2=literal_m)
The following are some examples of predicates that are not considered for statistical correlation detection by the optimizer:
a) (COL_1=literal_1 AND COL_2=literal_2) OR
   (COL_1=literal_3 AND COL_2=literal_4 AND COL_3=literal_5)
b) (COL_1=literal_1 AND COL_2=literal_2) OR
   (COL_1=literal_3 AND COL_2=literal_4) OR
   (COL_1=literal_5 AND COL_2=literal_6 AND COL_3=literal_7)
Example 6: (C1=LITERAL1 AND C2=LITERAL2) OR (C1=LITERAL3 AND C2=LITERAL4)
This example illustrates the effect of column group statistics on a qualifying OR predicate. Consider the following query on the EMPLOYEE table:
Listing 27. Query on the EMPLOYEE table
SELECT FIRSTNME, LASTNAME, WORKDEPT, JOB, BONUS, SALARY
FROM EMPLOYEE
WHERE ( WORKDEPT='E21' AND JOB='FIELDREP' ) OR
      ( WORKDEPT='D21' AND JOB='MANAGER' )
ORDER BY WORKDEPT, SALARY
This query returns six records from the EMPLOYEE table:
Listing 28. Query results from the EMPLOYEE table
FIRSTNME     LASTNAME        WORKDEPT JOB      BONUS       SALARY
------------ --------------- -------- -------- ----------- -----------
EVA          PULASKI         D21      MANAGER       700.00    96170.00
ROY          ALONZO          E21      FIELDREP      500.00    31840.00
HELENA       WONG            E21      FIELDREP      500.00    35370.00
RAMLAL       MEHTA           E21      FIELDREP      400.00    39950.00
JASON        GOUNOT          E21      FIELDREP      500.00    43840.00
WING         LEE             E21      FIELDREP      500.00    45370.00
6 record(s) selected.
If you re-collect the statistics without the column group statistics using:
RUNSTATS ON TABLE SKAPOOR.EMPLOYEE
WITH DISTRIBUTION AND DETAILED INDEXES ALL
a query access plan similar to the following is chosen by the optimizer, with a cardinality estimate under 2:
Listing 29. Query access plan similar to the one chosen by the optimizer
        Rows
       RETURN
       (   1)
        Cost
         I/O
         |
       1.88095
       TBSCAN
       (   2)
       16.1786
          1
         |
       1.88095
        SORT
       (   3)
       16.1272
          1
         |
       1.88095
       TBSCAN
       (   4)
       16.0113
          1
         |
         42
   TABLE: SKAPOOR
      EMPLOYEE
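One way to arrive at an estimate like 1.88095 is to treat the predicates in each subterm as independent, multiply their selectivities, and add the results for the two subterms. The sketch below illustrates that arithmetic only; it is not a description of the optimizer's internal logic. The function name is hypothetical, and the per-value row counts (6 rows with WORKDEPT='E21', 5 with JOB='FIELDREP', 7 with WORKDEPT='D21', 7 with JOB='MANAGER') are assumed values, chosen because they reproduce the figure in the plan for a 42-row table. They are not taken from this article.

```python
def independent_or_estimate(table_rows, subterms):
    """Estimate rows for an OR of AND-ed equality predicates, assuming the
    predicates inside each subterm are statistically independent.

    subterms is a list of lists; each inner list holds the number of rows
    matching each individual predicate of that subterm.
    """
    total_selectivity = 0.0
    for matching_counts in subterms:
        selectivity = 1.0
        for matching_rows in matching_counts:
            # Independence assumption: multiply per-predicate selectivities.
            selectivity *= matching_rows / table_rows
        # Subterms with different literal values cannot overlap, so add them.
        total_selectivity += selectivity
    return table_rows * total_selectivity


# Assumed (hypothetical) frequencies, not taken from the article.
estimate = independent_or_estimate(42, [[6, 5], [7, 7]])
print(round(estimate, 5))  # 1.88095
```

Under these assumed counts, the independence assumption yields fewer than 2 rows, while the query actually returns 6.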
Collecting a column group statistic on the columns (JOB,WORKDEPT) allows the optimizer to better estimate the filtering effect of the OR predicate, since each subterm of the OR predicate applies a set of local equality predicates on the columns JOB and WORKDEPT. It is left as an exercise for you to determine the appropriate RUNSTATS statement to collect a column group statistic. Once collected, a query access plan similar to the following is chosen by the optimizer, with an improved cardinality estimate that is very close to the actual result of six rows:
Listing 30. Query access plan with more accurate cardinality estimate
        Rows
       RETURN
       (   1)
        Cost
         I/O
         |
         5.6
       TBSCAN
       (   2)
       16.2651
          1
         |
         5.6
        SORT
       (   3)
       16.2136
          1
         |
         5.6
       TBSCAN
       (   4)
       16.0113
          1
         |
         42
   TABLE: SKAPOOR
      EMPLOYEE
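An estimate of 5.6 is consistent with spreading the table's rows evenly over the distinct (JOB, WORKDEPT) combinations and counting one combination per subterm. The sketch below shows that arithmetic as an illustration, not as the optimizer's actual algorithm. The function name is hypothetical, and the column group cardinality of 15 is an assumed value, chosen because it reproduces the figure in the plan; it is not stated in this article.

```python
def column_group_or_estimate(table_rows, group_cardinality, num_subterms):
    """Estimate rows for an OR predicate whose subterms each pin every column
    of a column group to one literal value.

    Assumes rows are spread uniformly over the distinct value combinations.
    """
    # Each subterm selects exactly one combination of the column group.
    rows_per_combination = table_rows / group_cardinality
    # Subterms name different combinations, so their row counts add up.
    return num_subterms * rows_per_combination


# Assumed (hypothetical) column group cardinality, not taken from the article.
estimate = column_group_or_estimate(42, 15, 2)
print(round(estimate, 1))  # 5.6
```

Because this calculation uses the number of combinations that actually occur in the table, it does not multiply two individually small selectivities, so it avoids the under-estimate caused by the independence assumption.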
Conclusion
The optimizer depends on accurate cardinality estimates to properly compute the cost of each query access plan considered. You can leverage the extended use of multi-column statistics in DB2 9.5 to give the optimizer more information, so that it can better estimate the cardinality and choose an optimal query access plan.
Resources
Learn
• "Understand column group statistics in DB2" (developerWorks, December 2006): Learn all about how to use column group statistics.
• "Comparing real-time cardinality to the optimizer cardinality estimates" (developerWorks, December 2005): Get all the details to create count queries to evaluate real-time cardinalities at certain operators in an access plan.
• "Influence query optimization with optimization profiles and statistical views in DB2 9" (developerWorks, December 2006): Learn about enhancements in DB2 9 that enable you to influence the default query optimization behaviour.
• Anatomy of an optimization profile section of the "IBM DB2 Database for Linux, UNIX, and Windows Information Center": Get an introduction to the contents of an optimization profile.
• developerWorks DB2 for Linux, UNIX, and Windows page: Read articles and tutorials and connect to other resources to expand your DB2 skills.
• Learn about DB2 Express-C, the no-charge version of DB2 Express Edition for the community.
About the authors
Samir Kapoor
Samir Kapoor is an IBM Certified Advanced Technical Expert for DB2. Samir currently works with the DB2 Advanced Support -- Down System Division (DSD) team and has in-depth knowledge in the engine area.
Vincent Corvinelli
Vincent Corvinelli is an advisory software developer in the DB2 Query Optimizer Development team at the IBM Toronto Lab.