Seoul National University
Performance
Performance Example

                  Sonata      Boeing 727    Concorde
Speed             100 km/h    1000 km/h     2000 km/h
Seoul to Pusan    10 hours    1 hour        0.5 hour
Passengers        5           100           20
Time – the first type of performance

Wall-clock time, response time, or elapsed time
  - Actual time from start to completion
  - Includes everything: CPU time for other programs as well as for the program itself, I/O, operating system overheads, etc.

CPU (execution) time
  - CPU time spent on a given program = user CPU time + system CPU time
  - e.g., the result of the UNIX time command: 90.7u 12.9s 2:39 65%
Decomposition of CPU (Execution) Time

CPU time = Seconds / Program
         = (Cycles / Program) × (Seconds / Cycle)
         = (Instructions / Program) × (Cycles / Instruction) × (Seconds / Cycle)
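As a quick sketch of the decomposition above (all numbers here are made-up illustrative values, not measurements of any real machine):

```python
# CPU time = (Instructions/Program) x (Cycles/Instruction) x (Seconds/Cycle)
instruction_count = 2_000_000    # hypothetical dynamic instruction count
cpi = 1.5                        # hypothetical average cycles per instruction
clock_rate_hz = 1_000_000_000    # hypothetical 1 GHz clock

seconds_per_cycle = 1 / clock_rate_hz
cpu_time = instruction_count * cpi * seconds_per_cycle
print(cpu_time)  # 0.003 seconds
```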
More on CPI (Clocks or Cycles Per Instruction)

CPI = ( ∑ CPIi × Ii ) / Instruction Count        (sum over instruction classes i = 1 … n)

CPI Example

Instruction Class   Frequency   CPIi
ALU operations      43%         1
Loads               21%         2
Stores              12%         2
Branches            24%         2

CPI = 0.43 × 1 + 0.21 × 2 + 0.12 × 2 + 0.24 × 2 = 1.57
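The weighted-average CPI from the instruction mix can be checked directly; a minimal sketch:

```python
# Instruction mix from the slide: class -> (frequency, CPI_i)
mix = {
    "ALU operations": (0.43, 1),
    "Loads":          (0.21, 2),
    "Stores":         (0.12, 2),
    "Branches":       (0.24, 2),
}
cpi = sum(freq * cycles for freq, cycles in mix.values())
print(round(cpi, 2))  # 1.57
```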
Factors involved in the CPU Time

CPU time = Seconds / Program = (Instructions / Program) × (Cycles / Instruction) × (Seconds / Cycle)

                Instructions/Program    Cycles/Instruction    Seconds/Cycle
Program                 ✓
Compiler                ✓
ISA                     ✓                       ✓
Organization                                    ✓                    ✓
Technology                                                           ✓
RISC vs. CISC Arguments

MIPS (typical RISC) vs. VAX 8700 (typical CISC)

[Figure: bar chart over the SPEC 89 benchmarks (spice, matrix, nasa7, tpppp, tomcatv, doduc, espresso, eqntott, li) showing the MIPS/VAX ratios of CPI, performance, and instructions executed, on a 0–4 scale]

Source: Hennessy & Patterson, Computer Architecture: A Quantitative Approach, 3rd Ed. (p. 108), Morgan Kaufmann, 2002
Rate – the second type of performance

MIPS (million instructions per second)

MIPS = Instruction count / (Execution time × 10^6)

  - Specifies performance (roughly) inversely to execution time
  - Easy to understand; a faster machine means a bigger MIPS
  - Problems:
      - It does not take into account the capabilities of the instructions.
      - It varies between programs on the same computer.
      - It can even vary inversely with performance!

MFLOPS (million floating-point operations per second)
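The last problem, MIPS varying inversely with performance, can be illustrated with a small sketch (the machine names and counts are hypothetical):

```python
def mips(instruction_count, execution_time_s):
    # MIPS = instruction count / (execution time x 10^6)
    return instruction_count / (execution_time_s * 1e6)

# The same program on two hypothetical machines:
# A executes many simple instructions, B fewer complex ones.
mips_a = mips(100_000_000, 10.0)   # A finishes in 10 s
mips_b = mips(20_000_000, 5.0)     # B finishes in 5 s -- twice as fast
print(mips_a, mips_b)  # 10.0 4.0 -- the faster machine has the LOWER MIPS
```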
Ratio

"X is n times faster than Y" means:
    Execution Time_Y / Execution Time_X = n

"X is n% faster than Y" means:
    Execution Time_Y / Execution Time_X = 1 + n/100

"X is n orders of magnitude faster than Y" means:
    Execution Time_Y / Execution Time_X = 10^n
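The three ratio conventions can be written as small helper functions (a sketch; the example times are made up):

```python
import math

def n_times_faster(time_y, time_x):
    # "X is n times faster than Y": n = Time_Y / Time_X
    return time_y / time_x

def n_percent_faster(time_y, time_x):
    # "X is n% faster than Y": Time_Y / Time_X = 1 + n/100
    return (time_y / time_x - 1) * 100

def orders_of_magnitude_faster(time_y, time_x):
    # "X is n orders of magnitude faster than Y": Time_Y / Time_X = 10^n
    return math.log10(time_y / time_x)

print(n_times_faster(15.0, 10.0))               # 1.5
print(n_percent_faster(15.0, 10.0))             # 50.0
print(orders_of_magnitude_faster(1000.0, 1.0))  # 3.0
```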
How to Summarize Performance

Arithmetic mean (for times):    (1/n) × ∑_{i=1}^{n} Ti
Harmonic mean (for rates):      n / ∑_{i=1}^{n} (1/Ri)
Geometric mean (for ratios):    ( ∏_{i=1}^{n} Ratioi )^(1/n)
Arithmetic Mean

Used to summarize performance given in times
  - Average Execution Time = ( ∑_{i=1}^{n} Execution Time_i ) / n
  - Assumes each benchmark is run an equal number of times

Weighted Arithmetic Mean
  - Weighted Average Execution Time = ∑_{i=1}^{n} ( Wi × Execution Time_i ) / ∑_{i=1}^{n} Wi
  - One possible weight assignment: equal execution time on some machine
Harmonic Mean

Used to summarize performance given in rates (e.g., MIPS, MFLOPS)
  - Harmonic Mean = n / ∑_{i=1}^{n} ( 1 / Ri )

Example
  - Four programs execute at 10, 100, 50, and 20 MFLOPS, respectively
  - Harmonic mean = 4 / (1/10 + 1/100 + 1/50 + 1/20) = 22.2 MFLOPS

Weighted Harmonic Mean
  - Weighted Harmonic Mean = ∑_{i=1}^{n} Wi / ∑_{i=1}^{n} ( Wi / Ri )
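A sketch of the three means; running it reproduces the 22.2 MFLOPS harmonic-mean example from the slide:

```python
def arithmetic_mean(times):
    # For quantities measured in time
    return sum(times) / len(times)

def harmonic_mean(rates):
    # For quantities measured as rates (MIPS, MFLOPS, ...)
    return len(rates) / sum(1 / r for r in rates)

def geometric_mean(ratios):
    # For dimensionless performance ratios
    product = 1.0
    for r in ratios:
        product *= r
    return product ** (1 / len(ratios))

rates = [10, 100, 50, 20]  # MFLOPS of the four example programs
print(round(harmonic_mean(rates), 1))  # 22.2
```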
Benchmarks

A benchmark is a distillation of the attributes of a workload
  - Domain specific

Desirable attributes
  - Relevant: meaningful within the target domain
  - Coverage: does not oversimplify important factors in the target domain
  - Understandable
  - Good metric(s)
  - Scalable
  - Acceptance: vendors and users embrace it

Two de facto industry standard benchmarks
  - SPEC: CPU performance
  - TPC: OLTP (On-Line Transaction Processing) performance
Benchmarks

Benefits of good benchmarks
  - Define the playing field
  - Accelerate progress
      - Engineers do a great job once the objective is measurable and repeatable
  - Set the performance agenda
      - Measure release-to-release progress
      - Set goals

Lifetime of benchmarks
  - Good benchmarks drive industry and technology forward
  - At some point, all reasonable advances have been made
  - Benchmarks can become counterproductive by encouraging artificial optimizations
  - So, even good benchmarks become obsolete over time
Grand Example 1 – SPEC Benchmark

SPEC (Standard Performance Evaluation Corporation)
  - Formed in 1988 to establish, maintain, and endorse a standardized set of relevant benchmarks and metrics for performance evaluation of modern computer systems
  - Founded by EE Times, Sun, MIPS, HP, Apollo, and DEC
  - For more info, see http://www.spec.org

Creates a standard list of programs, inputs, and reporting rules
  - Based on real programs and includes real system effects
  - Specifies rules for running and reporting
  - Measures observed execution time
[Figure slides: SPEC benchmark results, not reproduced]
SPEC Benchmark Update (2006)

[Figure slides: 2006 SPEC benchmark update tables, not reproduced]
Grand Example 2 – TPC Benchmark

TPC (Transaction Processing Performance Council)
  - OLTP benchmark
  - Founded in August 1988 by Omri Serlin and 8 vendors
  - Currently four different benchmarks available
      - TPC-C: Order-entry benchmark
      - TPC-H, TPC-R: Decision support benchmarks
      - TPC-W: Transactional web benchmark
  - For more info, see http://www.tpc.org
Grand Example 2 – TPC Benchmark

TPC-C Benchmark
  - Databases consisting of a wide variety of tables
  - Concurrent transactions of five different types over the database
      - New-order: enter a new order from a customer
      - Payment: update a customer balance to reflect a payment
      - Delivery: deliver orders (done as a batch transaction)
      - Order-status: retrieve the status of a customer's most recent order
      - Stock-level: monitor warehouse inventory
  - Transaction integrity (ACID properties)
  - Users and database scale linearly with throughput
  - Performance metrics
      - Transaction rate: tpmC
      - Price per transaction: $/tpmC
Amdahl's Law

Concept: The performance enhancement possible with a given improvement is limited by the amount that the improved feature is used.

Execution time after improvement =
    (Execution time affected by improvement / Amount of improvement) + Execution time unaffected
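The law above translates directly into a one-line function; a minimal sketch with illustrative numbers (80 s of affected work, 20 s unaffected, a 4× improvement):

```python
def time_after_improvement(t_affected, t_unaffected, n):
    # Execution time after improvement =
    #   (time affected by improvement) / (amount of improvement) + time unaffected
    return t_affected / n + t_unaffected

print(time_after_improvement(80.0, 20.0, 4))  # 40.0 seconds
```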
Amdahl's Law Example

Suppose a program runs in 100 seconds on a machine, with multiply operations responsible for 80 seconds of this time. How much do I have to improve the speed of multiplication if I want my program to run five times faster?

Execution time after improvement
    = Execution time affected by improvement / Amount of improvement + Execution time unaffected
    = 80 seconds / n + (100 − 80) seconds

20 seconds = 80 seconds / n + 20 seconds
0 = 80 seconds / n

No finite n satisfies this equation: improving multiplication alone can never make the program run five times faster.
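A quick numeric check of why the example has no solution: as the multiply speedup n grows, the total time approaches but never reaches the 20-second target (100 s / 5):

```python
def total_time(n):
    # 80 s of multiply work sped up by a factor n, plus 20 s unaffected
    return 80.0 / n + 20.0

for n in (2, 10, 100, 1_000_000):
    print(n, total_time(n))
# The total stays strictly above 20 s for every finite n,
# so a 5x overall speedup is impossible.
```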
Amdahl's Law Example

[Figure: execution-time breakdown of a program into integer instructions, memory, FP instructions, and others, before and after adding a pipelined integer instruction execution unit and cache memory (with FP emulation)]
Summary

"Execution time is the only and unimpeachable measure of performance"
  - The CPU time equation can predict performance by estimating the effects of changing features.

Measuring performance requires good care
  - Good workloads (benchmarks)
  - Good ways to summarize performance

Amdahl's Law
  - Speedup is limited by the unimproved part of the program

CPU time = Seconds / Program = (Instructions / Program) × (Cycles / Instruction) × (Seconds / Cycle)