Dynamically Specialized Datapaths for Energy Efficient Computing
Venkatraman Govindaraju, Chen-Han Ho, Karu Sankaralingam
Department of Computer Sciences, UW-Madison
http://www.cs.wisc.edu/vertical

Page 1: Dynamically Specialized  Datapaths  for Energy Efficient Computing

Dynamically Specialized Datapaths for Energy Efficient Computing

Venkatraman Govindaraju, Chen-Han Ho, Karu Sankaralingam

Department of Computer Sciences, UW-Madison

http://www.cs.wisc.edu/vertical

Page 2: Dynamically Specialized  Datapaths  for Energy Efficient Computing

Hardware Improvement

Pancake!

Wedding Cake!

Cupcake!

Not exactly!

1971 1991 2011

Page 3: Dynamically Specialized  Datapaths  for Energy Efficient Computing

Technology Scaling

Okay, but how is a wedding cake made?

Honey, I shrunk the cooks!

Page 4: Dynamically Specialized  Datapaths  for Energy Efficient Computing

The CPU Approach: in-order processor

Cupcake!

C!

Page 5: Dynamically Specialized  Datapaths  for Energy Efficient Computing

The Advanced CPU Approach: Out-of-order, Superscalar

Wedding Cake!

WC!

Do as scheduled!

You mis-predicted!

Two ways at once!

Partial cake from refrigerator!

Partial cake to refrigerator!

Load strawberry!

Better performance, but not efficient!

Too many things to do!

Page 6: Dynamically Specialized  Datapaths  for Energy Efficient Computing

Hardware Specialization

• We can build a specialized hardware datapath for a certain application
• Will be efficient
• Example: GPU for graphics processing
• But...

“The Wedding Cake Team”

Page 7: Dynamically Specialized  Datapaths  for Energy Efficient Computing

Can I get a strawberry pancake?

What are you talking about?

Performance, Efficiency, and Flexibility?

Page 8: Dynamically Specialized  Datapaths  for Energy Efficient Computing

Dynamically Specialized Execution Resources : DySER

Dynamically Specialized Execution!

Page 9: Dynamically Specialized  Datapaths  for Energy Efficient Computing

Overview

• Dynamically Specialized Execution
• Hardware resource: DySER
  – How to specialize and be dynamic?
• The compile-time support: Slicer
• HW/SW interface: ISA extensions
• Integration, performance, and conclusion

Page 10: Dynamically Specialized  Datapaths  for Energy Efficient Computing

A Little Peek

[Figure: processor pipeline (Fetch, Decode, Execute, Memory, WriteBack) showing I$, Decode, Register File, Exec Units, D$, and DySER]

Page 11: Dynamically Specialized  Datapaths  for Energy Efficient Computing

DySER: Summary

[Figure labels: Pipe, Shared Cache, DySER]

• Heterogeneous array
• ≈ 64 KB SRAM area
• Up to 10X speedup
• An average of 40% energy reduction

Page 12: Dynamically Specialized  Datapaths  for Energy Efficient Computing

Dynamically Specialized Execution Resources

• An array of functional units and switches

• A stateless execution unit in processor pipeline
  – Pipelined
  – Simple flow control

[Figure: inputs A, B, and C mapped onto the array to compute A*B+C]
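The A*B+C example can be read as a small dataflow graph laid over the array. Below is a minimal Python sketch of that idea, not the DySER hardware or its configuration format: the `CONFIG` dictionary, the unit names `fu0`/`fu1`, and `run_datapath` are all hypothetical names chosen for illustration.

```python
import operator

# Hypothetical configuration: each functional unit (FU) gets an operation
# and the names of the two values routed to its inputs by the switches.
CONFIG = {
    "fu0": (operator.mul, "A", "B"),    # fu0 computes A*B
    "fu1": (operator.add, "fu0", "C"),  # fu1 computes (A*B)+C
}
OUTPUT = "fu1"


def run_datapath(config, output, inputs):
    """Push one set of inputs through the configured units.

    Nothing is kept between calls, which mirrors the "stateless" bullet.
    """
    values = dict(inputs)
    # Entries are listed in dependency order, so one pass is enough.
    for name, (op, src_a, src_b) in config.items():
        values[name] = op(values[src_a], values[src_b])
    return values[output]


print(run_datapath(CONFIG, OUTPUT, {"A": 2, "B": 3, "C": 4}))  # prints 10
```

Swapping in a different `CONFIG` changes the computed expression without changing `run_datapath`, which is the sense in which the same block can serve different patterns.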

Page 13: Dynamically Specialized  Datapaths  for Energy Efficient Computing

Dynamic Specialization

• Capture the pattern between different applications

• The specialized datapath is constructed at the granularity of functional units
  – Switches for programmability

Page 14: Dynamically Specialized  Datapaths  for Energy Efficient Computing

How DySER Works

• Same DySER block, different pattern

• Simple switch is sufficient
  – Routers are energy inefficient
• Remove per-instruction overhead

Specialization ⇒ Efficiency

[Figure labels: Circuit Switch, Packet Switch]

Page 15: Dynamically Specialized  Datapaths  for Energy Efficient Computing

Slice and Dice

• Dynamically Specialized Execution
• Hardware resource: DySER
  – How to specialize and be dynamic?
• The compile-time support: Slicer
• HW/SW interface: ISA extensions
• Integration, performance, and conclusion

Page 16: Dynamically Specialized  Datapaths  for Energy Efficient Computing

Identifying The Specialization Target

• Applications are executed in phases
  – Capture the most frequent phase
• Identify the phases
  – Path profiling
• Construct path-trees

Find computation? Use DySER!
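One way to picture the path-profiling step is to count how often each path executes and keep the most frequent ones. The sketch below is an assumption-laden illustration, not the authors' profiler: it takes a trace of path identifiers as input, uses execution counts as a stand-in for execution time, and borrows the 90% threshold from the backup slide on path-trees. The function name `hot_paths` and the trace format are hypothetical.

```python
from collections import Counter


def hot_paths(path_trace, coverage=0.9):
    """Return the most frequent paths that together reach `coverage`
    of all executed paths (counts stand in for execution time here)."""
    counts = Counter(path_trace)
    total = sum(counts.values())
    selected, covered = [], 0
    for path, n in counts.most_common():
        if covered >= coverage * total:
            break
        selected.append(path)
        covered += n
    return selected


# Hypothetical trace: path "p0" dominates execution.
trace = ["p0"] * 17 + ["p1"] * 2 + ["p2"] * 1
print(hot_paths(trace))  # prints ['p0', 'p1']
```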

Page 17: Dynamically Specialized  Datapaths  for Energy Efficient Computing

Slicer: A Compiler for the DySER

• The instructions in path-trees are not all computations
  – Slice the path-tree into a computation slice and a load slice
• Execute computation slice in DySER
• Execute load-slice in conventional processor pipeline

[Figure labels: Application, Slicer, Core, DySER, Communication]
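The split into a computation slice and a load slice can be sketched as a backward walk from memory operations. The Python below is a simplified illustration under stated assumptions, not the Slicer's actual algorithm: the `Instr` record, the `MEMORY_OPS` set, and `slice_path` are hypothetical, each register is assumed to be written once, and the load slice is taken to be the memory operations plus whatever computes their addresses.

```python
from collections import namedtuple

# Hypothetical instruction record: `srcs` are data operands,
# `addr` are the registers used to form a memory address.
Instr = namedtuple("Instr", "dest op srcs addr")
MEMORY_OPS = ("LD", "ST")


def slice_path(instrs):
    """Split a straight-line path into (computation slice, load slice)."""
    producer = {ins.dest: i for i, ins in enumerate(instrs) if ins.dest}
    in_load_slice = set()
    worklist = [i for i, ins in enumerate(instrs) if ins.op in MEMORY_OPS]
    while worklist:
        i = worklist.pop()
        if i in in_load_slice:
            continue
        in_load_slice.add(i)
        ins = instrs[i]
        # From a memory op follow only its address operands; from address
        # arithmetic follow every source operand.
        deps = ins.addr if ins.op in MEMORY_OPS else ins.srcs
        worklist.extend(producer[r] for r in deps if r in producer)
    load = [ins for i, ins in enumerate(instrs) if i in in_load_slice]
    comp = [ins for i, ins in enumerate(instrs) if i not in in_load_slice]
    return comp, load


PATH = [
    Instr("r1", "ADD", ("base", "off"), ()),  # address computation
    Instr("r2", "LD", (), ("r1",)),
    Instr("r3", "LD", (), ("base",)),
    Instr("r4", "MUL", ("r2", "r3"), ()),
    Instr("r5", "ADD", ("r4", "r2"), ()),
    Instr(None, "ST", ("r5",), ("r1",)),
]
comp, load = slice_path(PATH)
print([i.op for i in comp])  # prints ['MUL', 'ADD']
```

In this toy path the multiply and the final add would go to DySER, while the two loads, the store, and the address add stay on the processor pipeline.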

Page 18: Dynamically Specialized  Datapaths  for Energy Efficient Computing

Working Together

• Dynamically Specialized Execution
• Hardware resource: DySER
  – How to specialize and be dynamic?
• The compile-time support: Slicer
• HW/SW interface: ISA extensions
• Integration, performance, and conclusion

Page 19: Dynamically Specialized  Datapaths  for Energy Efficient Computing

Communication Between The DySER and Processor Core

• DySER interface: ISA extension

bb1: MOV control1 => R2
     MOV control2 => R3
     MOV 1 => R4
     SLL R4, target => R4
     LD reg->node => R5
     DYSER_INIT [COMPSLICE]
     DYSER_SEND R2 => DI1
     DYSER_SEND R3 => DI2
     DYSER_SEND R4 => DI3

bb2: DYSER_LOAD [R5+offset(state)] => DM0
     DYSER_STORE:DO2 DO1, [R5+offset(state)]
     DYSER_COMMIT
     ADD R5, sizeof(node), R5
     ADDCC R1, -1, R1
     BNE bb2

Annotations on the listing:

• Initialize DySER
• Send input from register file to DySER
• Send input from memory to DySER
• Store output from DySER to memory
• Commit DySER output to register file

Page 20: Dynamically Specialized  Datapaths  for Energy Efficient Computing

Energy Efficient Bakery Is About to Open!

DySER to the rescue!

Integration!

Page 21: Dynamically Specialized  Datapaths  for Energy Efficient Computing

Back To Hardware

• Dynamically Specialized Execution
• Hardware resource: DySER
  – How to specialize and be dynamic?
• The compile-time support: Slicer
• HW/SW interface: ISA extensions
• Integration, performance, and conclusion

Page 22: Dynamically Specialized  Datapaths  for Energy Efficient Computing

It Is Simple -- Integration

• DySER interface: FIFO

[Figure: processor pipeline (Fetch, Decode, Execute, Memory, WriteBack) showing I$, Decode, Register File, Exec Units, D$, and DySER]
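A FIFO interface with simple flow control can be pictured as one queue per input port, with the datapath firing only when every port has a value. The Python model below is a rough sketch under that assumption; the class `FifoDatapath` and its `send`/`recv` methods are hypothetical names, not the DySER instructions, and timing and speculation are ignored.

```python
from collections import deque


class FifoDatapath:
    """Toy model: one FIFO per input port, one output FIFO."""

    def __init__(self, compute, num_inputs):
        if num_inputs < 1:
            raise ValueError("need at least one input port")
        self.compute = compute
        self.inputs = [deque() for _ in range(num_inputs)]
        self.output = deque()

    def send(self, port, value):
        """Enqueue a value on an input port, then fire if possible."""
        self.inputs[port].append(value)
        # Flow control: fire only while every input FIFO holds a value.
        while all(self.inputs):
            args = [q.popleft() for q in self.inputs]
            self.output.append(self.compute(*args))

    def recv(self):
        """Dequeue one result, or return None if none is ready yet."""
        return self.output.popleft() if self.output else None


dp = FifoDatapath(lambda a, b, c: a * b + c, 3)
dp.send(0, 2)
dp.send(1, 3)
print(dp.recv())  # prints None: port 2 has no value yet
dp.send(2, 4)
print(dp.recv())  # prints 10
```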

Page 23: Dynamically Specialized  Datapaths  for Energy Efficient Computing

Out-of-Order Integration

• Out-of-order core integration

• DySER itself maintains no architectural state

• Use buffers to keep the state for speculative execution

Page 24: Dynamically Specialized  Datapaths  for Energy Efficient Computing

It Is Good – Evaluation Method

• Simulator: Wisconsin Multifacet GEMS
  – Benchmarks: SPEC CPU2006, Parboil, and PARSEC
  – Modified GCC compiler
  – DySER with 64 functional units
• Speedup & energy reduction
  – Quantify the low overhead execution on computation slice
  – Wattch-based model in GEMS

Page 25: Dynamically Specialized  Datapaths  for Energy Efficient Computing

Result - Performance

[Figure: bar chart of Speedup (y-axis, 1 to 11) for cp, pns, sad, blackscholes, bodytrack, canneal, namd, soplex, lbm, and Geomean; series: 1-issue inorder, 2-issue out-of-order]
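The slide does not spell out how the "Geomean" bar is computed. Assuming it is the usual geometric mean of the per-benchmark speedups, with s_i the speedup of benchmark i and n the number of benchmarks (both symbols introduced here for illustration), it would be:

```latex
\mathrm{Geomean} = \left( \prod_{i=1}^{n} s_i \right)^{1/n}
```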

Page 26: Dynamically Specialized  Datapaths  for Energy Efficient Computing

Result – Energy Reduction

[Figure: bar chart of Energy Reduction (%) (y-axis, 0 to 100) for cp, pns, sad, blackscholes, bodytrack, canneal, namd, soplex, lbm, and Geomean; series: 1-issue inorder, 2-issue out-of-order]

Page 27: Dynamically Specialized  Datapaths  for Energy Efficient Computing

It is flexible – comparison

• DySER can be SIMD, can do operation-fusion, can accelerate loops
  – Not enough resources?
  – The Slicer can help to partition the computation slice and offload from DySER to processor core
• DySER looks like dataflow, but...
  – No entirely new ISA, no routers or packets, no burden to programmers

Page 28: Dynamically Specialized  Datapaths  for Energy Efficient Computing

Conclusion

• Hardware specialization is efficient
  – Dynamic approach with moderate integration complexity and few ISA extensions
  – Up to 10X speedup, ~40% average energy reduction
• Future work:
  – FPGA implementation
  – Comparison with other specialization approaches
    • FPGA
    • GPGPU
    • SSE, AVX

Page 29: Dynamically Specialized  Datapaths  for Energy Efficient Computing

Questions?

Page 30: Dynamically Specialized  Datapaths  for Energy Efficient Computing

Backup Slides

Page 31: Dynamically Specialized  Datapaths  for Energy Efficient Computing

Can This Work?

Benchmark        Number of path-trees   Path-trees contributing 90% of execution time
blackscholes       9                     3
bodytrack        322                     9
canneal           89                    12
facesim          906                    22
fluidanimate      33                     2
freqmine         151                    31
streamcluster     61                     1
swaptions         36                     6

• We also find that applications re-execute a path-tree several times before moving to the next one
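To make the table's point concrete, the short Python sketch below computes what share of each benchmark's path-trees accounts for 90% of execution time. The numbers are copied from the table; the names `PATH_TREES` and `hot_fraction` are introduced here for illustration only.

```python
# (total path-trees, path-trees contributing 90% of execution time),
# copied from the table above.
PATH_TREES = {
    "blackscholes": (9, 3),
    "bodytrack": (322, 9),
    "canneal": (89, 12),
    "facesim": (906, 22),
    "fluidanimate": (33, 2),
    "freqmine": (151, 31),
    "streamcluster": (61, 1),
    "swaptions": (36, 6),
}


def hot_fraction(total, hot):
    """Share of a benchmark's path-trees that covers 90% of the time."""
    return hot / total


for name, (total, hot) in PATH_TREES.items():
    share = 100 * hot_fraction(total, hot)
    print(f"{name:14s} {hot:3d} of {total:3d} path-trees ({share:.1f}%)")
```

In every row, at most 31 path-trees cover 90% of execution time, even when the benchmark has hundreds of path-trees in total.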

Page 32: Dynamically Specialized  Datapaths  for Energy Efficient Computing

Related work

• Industrial effort

• Generality

• RAW
• TRIPS
• WaveScalar

• VEAL (ISCA '08)

• scalability

• DySER

• Ambric
• MathStar

Page 33: Dynamically Specialized  Datapaths  for Energy Efficient Computing

DySER Configuration

• Special configure phase
  – Encode configuration information in data, passing through the existing datapath

[Figure: a configuration word "S1 : L->R" passes through the switches. Switch 0: "Not mine". Switch 1: "This is it!", so Switch 1 is set to route Left -> Right]
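The configure phase in the figure can be sketched as configuration words that travel past each switch until one recognizes its own identifier. The Python below is a minimal illustration of that "Not mine" / "This is it!" behavior under simplifying assumptions: the `Switch` class, the `(target_id, route)` word format, and `configure` are hypothetical, and the real encoding of configuration bits is not given on the slide.

```python
class Switch:
    """Toy switch that latches a route when addressed by a config word."""

    def __init__(self, switch_id):
        self.switch_id = switch_id
        self.route = None  # unconfigured until a matching word arrives

    def accept(self, target_id, route):
        if target_id == self.switch_id:
            self.route = route  # "This is it!"
            return True
        return False            # "Not mine": pass the word along


def configure(switches, config_words):
    """Send each config word down the chain until a switch accepts it."""
    for target_id, route in config_words:
        for switch in switches:
            if switch.accept(target_id, route):
                break


switches = [Switch(0), Switch(1)]
configure(switches, [(1, ("L", "R"))])  # corresponds to "S1 : L->R"
print(switches[0].route, switches[1].route)  # prints None ('L', 'R')
```

Once the routes are latched, data can follow the fixed paths without any per-value routing decision, which fits the circuit-switched design argued for earlier in the talk.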