Software Tuning Fundamentals
陈健 (Chen Jian), March 2003
Why Tune? Same Code, Different Performance
Version    Time (s)
SELF       16.676
RELEASE     5.445
OPT:4       5.457
IMSL       10.996
CXML        3.328
ATLAS       0.762
MKL50       0.848
MKL51       0.738
Loop order i-j-k:

    for (i = 0; i < NUM; i++) {
        for (j = 0; j < NUM; j++) {
            for (k = 0; k < NUM; k++) {
                c[i][j] = c[i][j] + a[i][k] * b[k][j];
            }
        }
    }

Loop order i-k-j:

    for (i = 0; i < NUM; i++) {
        for (k = 0; k < NUM; k++) {
            for (j = 0; j < NUM; j++) {
                c[i][j] = c[i][j] + a[i][k] * b[k][j];
            }
        }
    }
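The two loops compute the same product and differ only in loop order. A minimal runnable sketch of both, assuming row-major `double` matrices stored in flat arrays of `n*n` elements (the names `matmul_ijk` and `matmul_ikj` are hypothetical): in the i-k-j version the innermost loop walks `b` and `c` with unit stride, which in C generally uses the cache better than the stride-`n` column walk over `b` in the i-j-k version.

```c
#include <assert.h>
#include <stddef.h>

/* Matrices are row-major flat arrays: element (r, col) is m[r * n + col].
   Both functions accumulate c += a * b, so zero c first for a plain product. */

/* i-j-k order: the innermost loop steps through b by n elements (a column). */
void matmul_ijk(size_t n, const double *a, const double *b, double *c)
{
    assert(c != a && c != b);  /* output must not alias an input */
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

/* i-k-j order: the innermost loop steps through b and c by 1 (a row). */
void matmul_ikj(size_t n, const double *a, const double *b, double *c)
{
    assert(c != a && c != b);
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            double aik = a[i * n + k];  /* does not change in the j loop */
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
}
```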
Goals
– Identify the main tasks of performance tuning
– Define some important performance-tuning terms
– Use Intel tools to help
Agenda
Performance Cycle Overview
– The Performance Cycle
– When to Start
– Performance Gains
– When to Stop
– Putting it into Perspective
Performance Cycle Details
Summary
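The Tuning Cycle (start at the first step, then repeat)
– Gather performance data
– Analyze the data and draw conclusions
– Determine changes that resolve the issues
– Modify the code to implement the optimizations
– Test the results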
When (Why) to Start
User requirement?
Software vendor requirement?
Put performance requirements into the requirements document
Performance should be considered at every stage of the product life cycle (requirements gathering, design, and testing)
Exception: do "code tuning" only after a simple, readable, non-optimized version of the application exists.
Effort vs. Results
[Figure: performance plotted against effort, showing Theoretical Performance, Required Performance, Performance Attained with Tools, and Performance Attained without Tools]
When to Stop
Architecture is at maximum efficiency?
– Be sure you know what this is: calculate the theoretical maximum
Performance requirement is satisfied
Incrementally do Wide Mesh Optimizations until done
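Calculating the theoretical maximum can be sketched as follows, assuming peak throughput = clock rate × floating-point operations per cycle per core × number of cores, and counting 2n³ floating-point operations for an n × n matrix multiply. The machine parameters must be looked up for the actual processor, and the function names are hypothetical.

```c
#include <assert.h>

/* Theoretical peak in GFLOP/s: clock (GHz) x floating-point operations
   per cycle per core x number of cores. */
double peak_gflops(double clock_ghz, double flops_per_cycle, int cores)
{
    assert(clock_ghz > 0 && flops_per_cycle > 0 && cores > 0);
    return clock_ghz * flops_per_cycle * cores;
}

/* Achieved GFLOP/s for an n x n matrix multiply, which does n^3 multiplies
   and n^3 adds, i.e. 2 * n^3 floating-point operations. */
double matmul_gflops(long n, double seconds)
{
    assert(n > 0 && seconds > 0);
    double flops = 2.0 * (double)n * (double)n * (double)n;
    return flops / seconds / 1e9;
}

/* Fraction of the theoretical maximum actually reached (1.0 = peak). */
double efficiency(double achieved_gflops, double peak)
{
    assert(peak > 0);
    return achieved_gflops / peak;
}
```

For example, a hypothetical single-core 2 GHz machine that can complete 4 floating-point operations per cycle peaks at 8 GFLOP/s; a 1000 × 1000 multiply that takes 1 s achieves 2 GFLOP/s, or 25% of that peak.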
Tuning Principles
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."
– Donald Knuth
Quality Code is:
– Portable
– Readable
– Maintainable
– Reliable
Intelligently Sacrifice Quality for Performance
Agenda
Performance Cycle Overview
Performance Cycle Details
– Gather Performance Data
– Analyze Data and Identify Issues
– Generate Alternatives to Resolve Issues
– Implement Enhancements
Summary
Gather Performance Data
Timer
– Use to get wall-clock time
– Must be accurate and have low overhead
Use Intel® VTune™ Performance Analyzer
– Profiler: gathers information about code usage
– Performance Monitor: gathers information about system resource usage
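One way to build such a timer on a POSIX system is `clock_gettime` with `CLOCK_MONOTONIC`. This sketch assumes POSIX (the slide does not prescribe an API; Windows would need a different call such as `QueryPerformanceCounter`), and `wall_seconds` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime in strict C modes */
#include <assert.h>
#include <time.h>

/* Current wall-clock time in seconds from the monotonic clock, which is
   not affected by changes to the system date/time. Only the difference
   between two calls is meaningful. The nanosecond field does not imply
   nanosecond resolution; clock_getres() reports the actual resolution. */
double wall_seconds(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;  /* avoid an unused-variable warning when NDEBUG is set */
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}
```

Typical use: `double t0 = wall_seconds();`, then the code under test, then `double elapsed = wall_seconds() - t0;`.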
Workload
A good workload should have these characteristics:
– Measurable
– Reproducible
– Static
– Representative
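A reproducible, static workload means the input data is identical from run to run. A minimal sketch, assuming the workload is an array of doubles generated from a fixed seed; `fill_workload` is a hypothetical helper, and the 64-bit linear congruential generator uses Knuth's MMIX constants instead of `rand()`, whose sequence differs between C libraries. This covers reproducibility only; whether the data is representative of real inputs has to be checked separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill buf[0..n-1] with pseudo-random doubles in [0, 1) derived only from
   `seed`, so every run (and every platform) sees exactly the same input. */
void fill_workload(double *buf, size_t n, uint64_t seed)
{
    assert(buf != NULL || n == 0);
    uint64_t state = seed;
    for (size_t i = 0; i < n; i++) {
        /* 64-bit LCG step; unsigned overflow wraps modulo 2^64 by definition */
        state = state * 6364136223846793005ULL + 1442695040888963407ULL;
        /* keep the top 53 bits and scale by 2^-53 to land in [0, 1) */
        buf[i] = (double)(state >> 11) * (1.0 / 9007199254740992.0);
    }
}
```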
Analyze Data and Draw Conclusions
Baseline Current Performance
Examine Hot Spots
Identify Bottlenecks
Calculate Potential Maximum Performance
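Baselining means measuring current performance before changing anything, so later results have a reference point. A sketch of one common approach (an assumption here, not something the slide specifies): run the workload several times and record the median time, which is less sensitive to one-off disturbances than a single run. `median_seconds` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: ascending order of doubles. */
static int cmp_double(const void *p, const void *q)
{
    double x = *(const double *)p;
    double y = *(const double *)q;
    return (x > y) - (x < y);
}

/* Median of n timing samples in seconds. The input array is copied, not
   modified. Returns -1.0 if the temporary copy cannot be allocated. */
double median_seconds(const double *samples, size_t n)
{
    assert(samples != NULL && n > 0);
    double *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1.0;
    memcpy(tmp, samples, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_double);
    double m = (n % 2 == 1) ? tmp[n / 2]
                            : 0.5 * (tmp[n / 2 - 1] + tmp[n / 2]);
    free(tmp);
    return m;
}
```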
Examine Hot Spots
The Pareto Principle, a.k.a. the 80/20 Rule
– Concentrate on the vital few vs. the trivial many
Hot Spot: the part of an application or system that accounts for most of the computation; generally consists of a loop
For applications that don't have hot spots, examine:
– Memory Layout
– Exceptions
– Effective Compiler Usage
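The 80/20 rule can be made concrete by asking how few functions account for a given share of the total time. A sketch, assuming per-function times have already been collected (for example from a profiler) and sorted in descending order; `hot_spot_count` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Given per-function times t[0..n-1] sorted in descending order, return
   how many of the top entries are needed to cover at least `fraction`
   of the total time (e.g. 0.8 for the 80/20 rule). */
size_t hot_spot_count(const double *t, size_t n, double fraction)
{
    assert(fraction > 0.0 && fraction <= 1.0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(i == 0 || t[i - 1] >= t[i]);  /* input must be sorted */
        total += t[i];
    }
    double covered = 0.0;
    for (size_t i = 0; i < n; i++) {
        covered += t[i];
        if (covered >= fraction * total)
            return i + 1;
    }
    return n;
}
```

For a hypothetical profile of 60, 25, 10 and 5 seconds, the top 2 entries already cover 80% of the total, so those two functions are where tuning effort should go first.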
Additional Topics
Big O
Utilization, Efficiency, Throughput, Latency
Bottlenecks
– I/O, Memory, CPU
MIPS/FLOPS/CPI
Concurrency, Parallelism
Scalability
Loads/Stores per Calculation
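Several of these terms are simple ratios. A sketch of three of them, assuming the cycle, instruction and floating-point operation counts come from hardware performance counters (for example through a performance monitor) and the times from a wall-clock timer; the function names are hypothetical.

```c
#include <assert.h>

/* Cycles per instruction: average clock cycles per retired instruction
   (lower is better). */
double cpi(double cycles, double instructions)
{
    assert(instructions > 0);
    return cycles / instructions;
}

/* Millions of instructions per second. */
double mips(double instructions, double seconds)
{
    assert(seconds > 0);
    return instructions / seconds / 1e6;
}

/* Millions of floating-point operations per second. */
double mflops(double flops, double seconds)
{
    assert(seconds > 0);
    return flops / seconds / 1e6;
}
```

Loads/stores per calculation is counted from the inner loop. In the i-k-j matrix multiply, `c[i][j] = c[i][j] + a[i][k] * b[k][j]`, each iteration does 2 floating-point operations, loads `c[i][j]` and `b[k][j]`, and stores `c[i][j]`, with `a[i][k]` kept in a register because it does not change in the j loop.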
Levels of Optimization Design
Problem Definition
System Architecture
Algorithms and Data Structures
Code Tuning
System Software
System Hardware

Code Tuning (from hardest to develop and maintain, to easiest to develop, port and maintain)
Assembly Instructions
Intrinsic Functions
C++ Vector Class Library
Multithreading
Loop Transformations
Compiler and Compiler Switches
Performance Libraries
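Loop transformations appear in this list without a specific example. Loop blocking (tiling) is one common transformation, shown here as a swapped-in illustration rather than the slide's own: the matrix multiply works on `TILE × TILE` sub-blocks so the data reused by the inner loops stays in cache. The tile size 32 and the name `matmul_tiled` are hypothetical; the best tile size depends on the target machine's cache sizes.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 32  /* hypothetical tile size; tune for the target cache */

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* Blocked (tiled) i-k-j multiply, c += a * b, for row-major n x n arrays.
   Each (ii, kk, jj) step works on TILE x TILE sub-blocks, so the parts of
   b and c reused by the inner loops stay small enough to remain in cache.
   Zero c first if a plain product is wanted. */
void matmul_tiled(size_t n, const double *a, const double *b, double *c)
{
    assert(c != a && c != b);  /* output must not alias an input */
    for (size_t ii = 0; ii < n; ii += TILE) {
        size_t i_end = min_sz(ii + TILE, n);
        for (size_t kk = 0; kk < n; kk += TILE) {
            size_t k_end = min_sz(kk + TILE, n);
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t j_end = min_sz(jj + TILE, n);
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
        }
    }
}
```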
Code Tuning
If Parallel Processing
– Break Algorithm up across Clusters (Distributed Memory)
– Single Node Optimization
– Break Algorithm up across Processors (SMP)
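For the SMP case, one way to split the work across processors is an OpenMP `parallel for` on the outer loop. OpenMP is one of the source modifications this deck lists; using it here is an illustrative choice. The sketch assumes row-major flat arrays and a compiler with OpenMP enabled (for example `-fopenmp` with GCC); `matmul_smp` is a hypothetical name.

```c
#include <assert.h>

/* c += a * b with the outer i loop split across threads by OpenMP.
   Each thread owns whole rows of c, so no two threads write the same
   element and no locking is needed. If OpenMP is not enabled, the pragma
   is ignored and the loop simply runs serially with the same result. */
void matmul_smp(long n, const double *a, const double *b, double *c)
{
    assert(c != a && c != b);  /* output must not alias an input */
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        for (long k = 0; k < n; k++) {
            double aik = a[i * n + k];  /* private: declared inside the loop */
            for (long j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
}
```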
Modify Code to Implement Optimizations
Use Intel® Libraries
Use Various Compiler Switches
Find out whether the compiler or hardware makes the enhancement automatically before implementing it yourself
Modify Source (e.g. Loop Transformations, SWP (software pipelining), SIMD, OpenMP, Intrinsics, Assembly)
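Before hand-writing SIMD code, it is worth checking whether the compiler vectorizes a loop automatically. A sketch of a loop written to make that easy, assuming a C99 compiler: the `restrict` qualifiers tell the compiler that `x` and `y` do not overlap. `saxpy` here is a hypothetical local function, not a call into a BLAS library. With GCC, `-fopt-info-vec` reports which loops were vectorized.

```c
#include <assert.h>
#include <stddef.h>

/* y = a*x + y, written so the compiler can vectorize it on its own:
   unit-stride access, no calls inside the loop, and `restrict` promising
   that x and y do not overlap. Whether SIMD code is actually generated
   depends on the compiler and its optimization switches. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(n == 0 || x != y);  /* catches only exact aliasing */
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```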
Test!
Make sure the application still runs correctly (regression testing)
Make sure the enhancement actually increases performance
Calculate speed-up
Decide whether you're done optimizing
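A regression check for an optimized routine can compare its output with that of the original version. A sketch, assuming both versions write their results to `double` arrays; `results_match` is a hypothetical helper. It uses a relative tolerance instead of `==` because optimizations that reorder floating-point operations (unrolling, vectorization, threading) can change the last bits of a result.

```c
#include <assert.h>
#include <stddef.h>

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Returns 1 if every opt[i] matches ref[i] within rel_tol relative to the
   larger magnitude of the two, 0 otherwise. Values very close to zero may
   need an additional absolute tolerance. */
int results_match(const double *ref, const double *opt, size_t n,
                  double rel_tol)
{
    assert(rel_tol >= 0.0);
    for (size_t i = 0; i < n; i++) {
        double diff = abs_d(ref[i] - opt[i]);
        double scale = abs_d(ref[i]) > abs_d(opt[i]) ? abs_d(ref[i])
                                                     : abs_d(opt[i]);
        /* written as !(x <= y) so a NaN in either array counts as a mismatch */
        if (!(diff <= rel_tol * scale))
            return 0;
    }
    return 1;
}
```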
Speed-Up
The two basic formulas:
$$\text{Speed-Up} = \frac{\text{Baseline Time}}{\text{Optimized Time}}$$
$$\text{Speed-Up} = \frac{\text{Optimized Throughput}}{\text{Baseline Throughput}}$$
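The two speed-up formulas translate directly into code; a sketch with hypothetical function names. In both forms a value above 1 means the optimized version is faster.

```c
#include <assert.h>

/* Speed-up from run times: baseline time / optimized time. */
double speedup_from_time(double baseline_s, double optimized_s)
{
    assert(baseline_s > 0 && optimized_s > 0);
    return baseline_s / optimized_s;
}

/* Speed-up from throughput (work per unit time): optimized / baseline. */
double speedup_from_throughput(double optimized_tp, double baseline_tp)
{
    assert(optimized_tp > 0 && baseline_tp > 0);
    return optimized_tp / baseline_tp;
}
```

For example, hypothetical run times of 10 s before and 2.5 s after give a speed-up of 4; a throughput that rises from 100 to 300 operations per second gives a speed-up of 3.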
Summary
Optimization Tasks
– Gather Performance Data
– Analyze Data & Identify Issues
– Generate Alternatives to Resolve Issues
– Implement Enhancements
– Test Results
Use Intel® Software Development Tools for every step in the process