Upload
amal
View
48
Download
0
Embed Size (px)
DESCRIPTION
월간 마이크로소프트웨어 창간 20 주년 기념 세미나. 데이터베이스 ( 가제 ). 데이터베이스의 끝없는 도전 - Towards Self-Tuning Databases -. 이 상 원 [email protected] http://vldb.skku.ac.kr 성균관대 & ㈜ 엑셈. Contents. A Quick Overview of Database History Why Self-Tuning Databases? Database Tuning in the 21 st century - PowerPoint PPT Presentation
Citation preview
데이터베이스의 끝없는 도전데이터베이스의 끝없는 도전- Towards Self-Tuning Databases - Towards Self-Tuning Databases
--이 상 원[email protected]://vldb.skku.ac.kr성균관대 & ㈜엑셈
월간 마이크로소프트웨어 창간 월간 마이크로소프트웨어 창간 20 20 주년 기념 세미나주년 기념 세미나
데이터베이스데이터베이스 (( 가제가제 ))데이터베이스데이터베이스 (( 가제가제 ))
Contents
• A Quick Overview of Database History– Why Self-Tuning Databases?
• Database Tuning in the 21st century
• Towards Self-Tuning Databases
A Quick Overview of Database History
• File systems(50s)
• Hierarchy/Network databases(60s)
• Relational databases(70s)
• Object databases(80s)
• OR databases(90s)
• XOR databases(00s)
A Quick Overview of Database History(2)
• The state-of-the-arts functionalities of RDBMSs– A giantic black-hole of application intelligence
– Developer
DB2, Oracle,SQL Server
Active DB(Trigger)
Spatial DB
Data Mining
EXCEL
MOLAP
ROLAP(Cube) Deductive DB
(Recursion)
Procedure-C/C++, Java, ..
Temporal DB
Queue e-mail
Multimedia
File System
XML/Stream/Bioinformatics/Appl. Server/Web Service ..
A Quick Overview of Database History(3)
• The state-of-the-arts functionalities of RDBMSs– Rapidly growing internal optimization/tuning knobs
– DBA
DB2, Oracle,SQL Server
MVs
PartitioningBuffer Mgmt
Bitmap Index
……….
Extensibility
Cost-BasedOptimizationAutomatic
Disk Mgmt
A Quick Overview of Database History(4)
• Achievements of RDBMSs1. Relational model and the “Esperanto” SQL “productivity”
2. Query optimization “can beat all but the best programmers”
3. Transaction management “ACID”
• E. F. Codd gone! but Relational forever!!
• A vision for database community
"My prediction is that eventually all storage systems will evolve to be database systems."(Jim Gray, 2002)
A Quick Overview of Database History(5)
DB = IT Core Platform in the 21st Century???
• ERP/CRM/SEM/BSC
• DW/OLAP/Data Mining
• Web Log Analysis
• GIS/XML/Mobile
• Bio-informatics
• ....
• Ubiquitous Databases
A Quick Overview of Database History(6)
• One big challenge for RDBMS– Simplicity is a big issue. (Jim Gray, 2003)
• Commercial DBMSs are exponentially complicated– Even 10-year veteran Oracle DBAs do not know 50% of all the n
ew features of Oracle’s latest release– Worsely, it would take more than 1 week for them to understand
the performance behavior of a single new feature– The gap between human DB learning curve and new features be
comes larger
“Crossing this chasm” seems to be impossible!!
• It’s time for self-tuning intelligence!
What is DB Tuning?
• Car tuning– Goals?
– What?
– How?
Why DB Tuning in the 21st Century?
• DB performance influences all aspects of your business– DB performance management is one of the most important
success factors in almost every IT projects– “ 미래 웹경쟁력은 데이터처리능력에” (Scott McNealy, Sun CEO,
전자신문 )
• DB tuning is not a technical issue, but a business issue!
– Improve your productivity
– Increase your revenue/profit
– Increase your ROI(Return on Investment)
Why DB Tuning in the 21st Century?(2)
1. More data - doubling every 9 month
2. More users - from CEO to sales representatives
3. More complex query - e.g. OLAP/data mining query
4. Faster responses– e.g. web response time impacts revenue!– e.g. real time personalization
0
500
1000
1500
2000
2500
3000
3500
1988
1989
1990
1991
1992
1993
1994
1995
1996
1997
1998
1999
2000
Year
Pe
tab
yte
s
DiskSalesMoore'sLaw
• Data growth vs. computer speedup
* Moore’s Law -- # of transistors/chip doubles every 18 months (1965)
Why DB Tuning in the 21st Century?(4)
• Disk sales doubling every 9 months – Greg Papadopoulos, Sun CTO
– Also from Winter VLDB survey (http://www.wintercorp.com)
• Time to process all your data doubles every 18 months!– Without DB tuning, you need double-up investment on DB serve
r and software every 6 month!
• Maximize your DB performance!! (Let’s DBMax)– Maximize your PRODUCTIVITY
– Maximize your REVENUE/PROFIT
– Maximize your ROI
Why Self-Tuning Databases?
• War of TCO(Total Cost of Ownership) in 21st century– Major costs of IT infra.: H/W, S/W, Human Cost
• Moore’s law: H/W, S/W cost ↓
• BUT! Human cost ↑
– cf. 1950s: telephone industry
• manual switching by operators automatic switching system
Why Self-Tuning Databases?(2)
Blackbox Database Tuning
Self-Tuning: Targets
Server Server ProcessesProcesses
Shared PoolShared Pool Database Buffer CacheDatabase Buffer Cache
SGASGA Redo LogRedo LogBufferBuffer
Oracle InstanceOracle Instance
Parameter FileParameter File DatafilesDatafilesControl FilesControl Files Redo Log FilesRedo Log Files
Oracle DatabaseOracle Database
UserUserProcessesProcesses
SQLSQL
OptimizerOptimizerSQL Exec. SQL Exec.
EngineEngine
SQL Exec. SQL Exec. MemoryMemory
Big Picture of a Simplied DBMS
Self-Tuning: Targets(2)
• SQLs
• Memory
• Index/Materialized Views
• Emerging fields
SQL Self-Tuning: Optimizer
• SQL best execution plan– access method + join order + join method
– Rule-based optimizer vs. Cost-based optimizer
• The Selinger-style optmizer– P. Selinger et al, “Access path selection in a relational databas
e management systems,” SIGMOD 79
– SYSTEM/R
– A “MUST” reading for every SQL guys
• cost estimation
• dynamic programming
SQL Self-Tuning(2): Cost-Based Optimzation
Data Dictionary
Query Optimizer
SingleSelectivity
CombinedSelectivitySelectivity &
CardinalityEstimation
Formula
EstimatedCardinality
Cost Model EstimatedCost
RDBMS InstanceRuntime Memory Environment:
Buffer Cache Size,Sort/Hash Memory etc.
Real Data
2. Plan GeneratorP1, P2, .. , Pn
1. SQL4. (Estimated)Best Execution Plan for SQL:
3. Cost Estimator
Pi
for each plan estimated cost
SQL Self-Tuning(3): Cost-Based Optimzation
• Cost-based optimization technique itself is a self-tuning process, but it lacks 5%– 80% of SQL: Best plan
– 15%: top 3 plan
– 5%: not so good
• Fallacies of CBO1. Uniform distribution
2. Attribute Independence
3. Join Independence
SQL Self-Tuning(4): Cost-Based Optimzation
• How to fix the fallacies of CBO?– Various approaches: who? when? based on what?
1. Manual tuning
2. LEO
3. SOS [13]
4. Dynamic reoptimization
5. Parameteric query optimization
6. ….
SQL Self-Tuning(5): IBM DB2 LEO
• LEO( LEarning Optimizer ) – DB part of IBM SMART: Self-Managing And Resource Tuning
(See IBM’s IBM Autonomic Computing )
SQL Self-Tuning(6): IBM DB2 LEO
• eeed
SELECT * FROM X,Y,Z
WHERE X.PRICE>=100 AND Z.CITY=‘Denver’
AND Y.MONTH=‘Dec’ AND X.ID=Y.ID
AND Y.NR=Z.NR
GROUP BY A
Negative feed back!!!
SQL Self-Tuning(7): Oracle’s v$sql_plan_statistics
• Runtime statistics monitor(Since Oracle9iR2)– You can compare optimizer’s estimation with real statistics
– Then, you can find out where is the source of errors?
** Someday, Oracle and MS will come with LEO-like features
• SKKU and XM are developing SOS tool– “Save Our SQL”
– How to tune SQL (semi-)automatically?
1. Find the negative feedback
2. Drill down to the causes
3. Analyze the effect
4. Hint the SQL
Self-Tuning: Targets
• SQLs
• Memory
• Index/Materialized Views
• Emerging fields
Memory Self-Tuning
• Complex queries use memory intensive operators– Sorting, hashing, bitmap index manipulation etc.
• The amount of available memory greatly affects their performance
• However, there is only a finited amount of memory, shared by all concurrent users
Response Time
Memory
..
1-pass optimalM-pass
Sort
M1-pass = sqrt(I x S x 2)
Moptimal = I
Memory Self-Tuning(2): Oracle SQL Memory Mgmt.
Oracle Memory Model
Sort Area, Hash Area, Bitmap Index
• Pre-Oracle9i– Fixed PGA size
– Sort(Hash)_Area_Size
• Oracle9i [8]– PGA_Aggregate_Target
– Dynamically allocated according to the needs
Memory Self-Tuning(3): Oracle SQL Memory Mgmt.
0
100
200
300
400
500
600
1 2 4 6 8 10 12 16 20
Number of Users
Time (min)
Manual - 5MB
Manual - 15MB
Auto - 1.5G
• Manual vs. Automatic: Response time
0
1,000
2,000
3,000
4,000
5,000
6,000
1 2 4 6 8 10 12 16 20
Manual - 5MManual - 15MAuto
Number of Users
• Manual vs. Automatic: Total Memory Usage
Memory Self-Tuning(4): Oracle SQL Memory Mgmt.
• DBA specifies PGA_AGGREGATE_TARGET
• Oracle divides PGA memory for max. performance and optimal memory usage
Memory Self-Tuning(5): Oracle SQL Memory Mgmt.
Local Memory Manager
WP1
WP2
WP3
SnSnSnSn
Memory Bound Computation
Memory Target Computation
Memory Usage Statistics
SQL Statements pga_aggregate_target
Active work areas profiles
SQLMemoryTargetMemory
Bound
SMM Daemon
WP4
…
WPn
Self-Tuning: Targets
• SQLs
• Memory
• Index/Materialized View(MV)s
• Emerging fields
Index/MV Self-Tuning
• Selection of Index/MVs:
“SQL Workloads + Available Disk Space Best Index/MV sets”
– Workload: queries + updates
– Constraints: upper bound on storage space for indexes
– Search: pick a configuration that is of “lowest” cost for the given database and workload
• Industry cases– MS Autoadmin
– IBM Design Advisor
– Oracle OEM Index/Summary Advisor
Index/MV Self-Tuning(2): MS SQL Server
• AutoAdmin: Self-Tuning and -Administering Databases[12] “We achieve this by enabling databases to track the usage of their syste
ms and to gracefully adapt to application requirements. Thus, instead of applications having to track and tune databases, databases actively auto-tunes itself to be responsive to application needs.”
– Workload driven physical database design
– To reduce TCO(Total Cost of Ownership)
Index/MV Self-Tuning(3): MS SQL Server
Architecture of Index/MV Selection Tools
Issue - search space
• Large search space for indexes– Many columns to choose from
– Kinds of indexes
• Explosive search space for MVs
• Query optimizers use physical design in novel ways
• Physical design choices interact
Workload
Syntactic structure selection
CandidateIndex
Selection
ConfigurationEnumeration
FinalRecommendation
MicrosoftSQLServer
ConfigurationSimulationand CostEstimationModule
CandidateMaterializedView Selection
Self-Tuning: Targets
• SQLs
• Memory
• Index/Materialized View(MV)s
• Emerging areas for self-tuning
Emerging Areas for Self-Tuning
Server Server ProcessesProcesses
Shared PoolShared Pool Database Buffer CacheDatabase Buffer Cache
SGASGA Redo LogRedo LogBufferBuffer
Oracle InstanceOracle Instance
Parameter FileParameter File DatafilesDatafilesControl FilesControl Files Redo Log FilesRedo Log Files
Oracle DatabaseOracle Database
UserUserProcessesProcesses
SQLSQL
OptimizerOptimizerSQL Exec. SQL Exec.
EngineEngine
SQL Exec. SQL Exec. MemoryMemory
Big Picture of a Simplied DBMS
Emerging Areas for Self-Tuning(2)
• Automatic configuration of init. parameter[5]– e.g. IBM DB2 V7.2 “Configuration Advisor”
– For 36 parameters, 98.4% of DB2 performance experts
• Automating physical database layout[9]– “How database objects such as tables and indexes are assigned
to disk drives can impact the I/O performance of the system” (MS SQL Server)
– Automatic partitioning of tables under parallel databases (DB2)
Emerging Areas for Self-Tuning(3)
• Accurate storage device access cost estimation[11]– CPU speed, disk I/O time, network, RAID rebuild
– Inaccurate estimation cause (seriously) suboptimal query plan
– Why wrong estimation?
• storage or network may undergo heavy load or failure
• older/newer than the h/w the optimizer assume
– “We can achieve noticeable performance improvements by providing query optimizer with accurate and timely information about the current status of their storage devices”
Emerging Areas for Self-Tuning(4)
• System tuning[1]– Most urgent, but only ad-hoc or rule-of-thumb approaches with some mo
nitoring tools: Bizmax/MaxGuage, Orange/Chakra, Spotlight, Patrol, OEM
– Only the black box consulting
– Highly dependent on consultant or DBA experience
– Not scientic/engineering realm!!
• We need to develop mathematical model for the interactions among various system tuning knobs; Or, we need to construct and share the rule/knowledge base of the system tuning experts
MonitoringDrill-down
to BottleneckRoot Cause
AnalysisTuning
Conclusion
• Next decade will be self-tuning intelligence era– Better than average DBAs
• BUT, You may be still the MVP for IT ROI only if– “Crossing the chasm between you and your DBMS technology”
• Know your data
• Know your application workloads
• Know your DBMS intelligence
Let’s DBMax!
References
1. Gerhard Weikum et al., “Self-Tuning Database Technology and Information Services: from Wishful Thinking to Viable Engineering” VLDB 2002, Ten-Year Award
2. Gerhard Weikum et al, “The COMFORT Automatic Tuning Project”, Information Systems 19(5), 1994
3. Gerhard Weikum et al, “Special Issues on Self-Tuning Database and Applications,” Data Engineering Bulletine, Jun., 1999
4. IBM’s Autonomic Computing Perspectives: http://www.resarch.ibm.com/autonomic/manifesto/autonomic_computing.pdf
5. Guy Lohman et al, “Toward Automomic computing with DB2 Universal Database,” SIGMOD Record, Sep., 2002
6. MS AutoAdmin Homepage: http://www.research.microsoft.com/
7. Oracle Manageability: http://technet.oracle.com/products/manageability/content.html
References(2)
8. Benoit Dageville and Mohamed Zait, “SQL Memory Management in Oracle9i,” VLDB 2002
9. Jun Rao, Chun Zhang, Guy Lohman et al., “Automating Physical Database Design in a Parallel Database,” SIGMOD 2002
10. S. Agrawal and Surajit Chaudhuri et al., “Automating Layout of Relational Databases,” ICDE 2003
11. F. R. Reiss and T. Kanungo, “A Characterization of the Sensitivity of Query Optimization to Storage Access Cost Parameters,” SIGMOD 2003
12. S. Agrawal and Surajit et al., “Automated Selection of Materialized Views and Indexes for SQL Databases,” VLDB 2000
13. Michael Stillger, Guy M. Lohman et al., “LEO - DB2's LEarning Optimizer,” VLDB 2001