
Using Partial Tag Comparison in Low-Power Snoop-based Chip Multiprocessors


Page 1: Using Partial Tag Comparison in Low-Power Snoop-based Chip Multiprocessors

Ali Shafiee, Narges Shahidi, Amirali Baniasadi

Sharif University of Technology, University of Victoria

Page 2: This Work: Improving Snoop Coherency

Goal: Improving energy efficiency in snoop-based CMPs.

Motivation: Broadcasting and processing the entire tag is inefficient.

Our Solution: Using Partial Tag Comparison (PTC) prior to snoop.

Key Results: Performance (2.9%), Tag array power (52%), Bandwidth utilization (78.5%)

Page 3: Our Solution (PTC) vs. Conventional

[Figure: two block diagrams, each showing D$ caches connected through an interconnect to an upper level cache. Left panel: Conventional. Right panel: Our solution.]

Conventional: Fast (+); Power & Bandwidth (−)

Our solution: Fast (++, early miss detection); Power & Bandwidth Efficient (+)

Page 4: Conventional Snooping

[Figure: CPUs with D$ caches and a controller connected by an address bus, a snoop bus, and a command bus. The steps of a conventional snoop are numbered 1 to 5.]

Redundant (miss): ~70%

Page 5: Snoop Filters

Goal: Eliminate redundant snoop requests.

Examples: RegionScout (ISCA'05), CGCT (ISCA'05), SSP (ASPLOS'08)

PTC:
(1) Early miss detection using a subset of tag bits.
(2) Once a miss is detected, the snoop is avoided.

How often is that possible?
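The two PTC steps above can be sketched in a few lines. This is a minimal illustration, not the authors' hardware: the function names `partial_tag` and `early_miss` are hypothetical, and the 8-bit default width is taken from the later slides.

```python
def partial_tag(tag, n_bits=8):
    """Return the n_bits least significant bits of a tag."""
    return tag & ((1 << n_bits) - 1)


def early_miss(requested_tag, stored_tags, n_bits=8):
    """Return True when no stored tag shares the requested tag's low bits.

    A mismatch on a subset of bits proves a mismatch on the full tag, so
    True is a definite miss and the snoop can be skipped. False only means
    a possible hit; a full tag comparison is still required.
    """
    wanted = partial_tag(requested_tag, n_bits)
    return all(partial_tag(t, n_bits) != wanted for t in stored_tags)
```

Note the asymmetry: the check never hides a real hit, but it can report a possible hit for tags that differ only in their upper bits.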

Page 6: How often is using n bits enough to detect a miss?

95+% of misses can be detected using 8 bits.
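One way such a figure could be measured on a trace is sketched below. The trace format (pairs of a requested tag and the tags stored in the probed set) and the name `detected_miss_fraction` are assumptions for illustration; the sketch does not reproduce the slide's data.

```python
def detected_miss_fraction(lookups, n_bits):
    """Fraction of true misses that the low n_bits already reveal.

    lookups is an iterable of (requested_tag, stored_tags) pairs.
    Returns None when the trace contains no misses.
    """
    mask = (1 << n_bits) - 1
    misses = 0
    detected = 0
    for requested, stored in lookups:
        if requested in stored:
            continue  # a real hit, not counted
        misses += 1
        # Detected early if no stored tag matches on the low bits.
        if all((t & mask) != (requested & mask) for t in stored):
            detected += 1
    return detected / misses if misses else None
```

Sweeping `n_bits` over a trace would give the kind of curve the slide title asks about.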

Page 7: PTC-Filter

[Figure: the least significant tag bits (LSB) of the address on the address bus are checked against the PTC-Filter next to the D$. Filter miss: avoid the snoop and access the upper level. Filter hit: snoop the potential targets.]
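The miss/hit branch in the figure can be written as a small decision function. This is a sketch under assumptions: the per-core match results are passed in as a dictionary, and the names `handle_local_miss`, `"access_upper_level"`, and `"snoop"` are hypothetical labels, not terms from the slides.

```python
def handle_local_miss(filter_matches):
    """Decide what to do after the requester's own cache misses.

    filter_matches maps a remote core id to True when that core's
    partial tag matched in the filter.
    """
    targets = sorted(core for core, hit in filter_matches.items() if hit)
    if not targets:
        # Filter miss: no remote cache can hold the line.
        return ("access_upper_level", [])
    # Filter hit: snoop only the caches that might hold the line.
    return ("snoop", targets)
```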

Page 8: PTC-Filter

[Figure: four 4-way D$ blocks, each paired with a PTC-Filter; columns are labeled 0, 1, 2, 3. A filter is shown holding Core1's LSB, Core2's LSB, and Core3's LSB. Entry fields: V, D, LSB (8 bits).]
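A possible software model of the filter in the figure is sketched below. The layout (one entry per remote core, set, and way) is an assumption; `FilterEntry`, `PTCFilter`, `record`, and `potential_targets` are hypothetical names. "V" is assumed to mean valid, and "D" is carried as an opaque flag because the slide does not define it.

```python
from dataclasses import dataclass

LSB_BITS = 8  # entry width shown on the slide
LSB_MASK = (1 << LSB_BITS) - 1


@dataclass
class FilterEntry:
    valid: bool = False  # "V" (assumed: entry holds a live partial tag)
    d: bool = False      # "D" (meaning not given on the slide)
    lsb: int = 0         # low 8 bits of the remote line's tag


class PTCFilter:
    """One core's copy of the other cores' partial tags (assumed layout)."""

    def __init__(self, remote_cores, num_sets, num_ways=4):
        self.entries = {
            core: [[FilterEntry() for _ in range(num_ways)]
                   for _ in range(num_sets)]
            for core in remote_cores
        }

    def record(self, core, set_index, way, tag, d=False):
        """Store the partial tag of a line held by a remote core."""
        self.entries[core][set_index][way] = FilterEntry(True, d, tag & LSB_MASK)

    def potential_targets(self, set_index, tag):
        """Remote cores whose filter entries match the tag's low bits."""
        wanted = tag & LSB_MASK
        return [core for core, sets in self.entries.items()
                if any(e.valid and e.lsb == wanted for e in sets[set_index])]
```

An empty result from `potential_targets` corresponds to a filter miss; a non-empty result lists the caches worth snooping.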

Page 9: PTC: Filter Miss

[Figure: the same CPUs, D$ caches, controller, address bus, snoop bus, and command bus as in conventional snooping. A filter miss takes steps numbered 1 to 3.]

Page 10: PTC: Filter Hit

[Figure: the same CPUs, D$ caches, controller, address bus, snoop bus, and command bus as in conventional snooping. A filter hit takes steps numbered 1 to 6; caches are marked ✓ or ✗.]

Page 11: Filter Maintenance

[Figure: a CPU issues Request = A to a set holding tags B, F, D, E. The address bus, the snoop controller, and the command bus connect the PTC-Filters of Core 0 through Core i. Steps are numbered 1 to 6.]

Request = A: miss on A; place it in the position of tag F.

Pending Request Table (fields: Addr., C, W, D): {Address=A, C=0, W=1, D=1}

Filter update: place A, insert in Way 1 of core 0.
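The maintenance step above can be sketched as follows. This is an illustration under stated assumptions: each filter is modeled as a dictionary keyed by (remote core, set index, way); the bit widths are derived from the DL1 configuration on the methodology slide (64 B lines, 64 KB, 4-way, so 256 sets); `PendingRequest`, `split_address`, and `apply_fill` are hypothetical names; and the "D" field is carried through unchanged because its meaning is not stated.

```python
from dataclasses import dataclass

OFFSET_BITS = 6  # 64 B line
INDEX_BITS = 8   # assumed: 64 KB / (4 ways * 64 B) = 256 sets
LSB_BITS = 8     # partial tag width kept in the filter


@dataclass(frozen=True)
class PendingRequest:
    address: int
    core: int  # "C": core that will hold the line
    way: int   # "W": way chosen for the fill
    d: int     # "D": not defined on the slide, kept as-is


def split_address(address):
    """Return (set_index, tag) under the assumed geometry."""
    set_index = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return set_index, tag


def apply_fill(filters, request):
    """Mirror a fill into every other core's filter.

    filters maps an owner core id to a dict keyed by
    (remote_core, set_index, way) that holds the line's partial tag.
    """
    set_index, tag = split_address(request.address)
    lsb = tag & ((1 << LSB_BITS) - 1)
    for owner, table in filters.items():
        if owner != request.core:
            table[(request.core, set_index, request.way)] = lsb
```

With the slide's example, a request {Address=A, C=0, W=1, D=1} overwrites the entry that previously mirrored tag F in Way 1 of core 0, in every filter except core 0's own.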

Page 12: Methodology

• SESC simulator, 4-way CMP
• SPLASH-2 benchmarks
• CACTI 6.0

System configuration:
• DL1/IL1: 64KB/32KB, 4-way/2-way, 3-cycle latency
• L2: 4 MB, 4-banked, 16-way, 10-cycle latency
• 64 B cache line
• 500-cycle memory access
• 6-cycle arbitration + 2-cycle core-to-controller latency
• Crossbar data network
• MESI protocol
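For a sense of scale, the L1 parameters above fix the number of sets and, given an address width, the tag width. The arithmetic is sketched below; the 32-bit address width is an assumption (the slides do not state it), the sizes are assumed to be powers of two, and `cache_geometry` is a hypothetical helper.

```python
def cache_geometry(size_bytes, ways, line_bytes, address_bits=32):
    """Return (sets, offset_bits, index_bits, tag_bits).

    Assumes power-of-two sizes; address_bits is an assumed parameter.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2 of the line size
    index_bits = sets.bit_length() - 1         # log2 of the set count
    tag_bits = address_bits - index_bits - offset_bits
    return sets, offset_bits, index_bits, tag_bits


# DL1 from the slide: 64 KB, 4-way, 64 B lines
dl1 = cache_geometry(64 * 1024, 4, 64)
```

Under the 32-bit assumption the DL1 has 256 sets and an 18-bit tag, so an 8-bit partial tag is a little under half of the full tag.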

Page 13: Performance

Average: 2.9%

Page 14: Bandwidth

Average: 78.5%

Page 15: Tag Power

Average: 52%

Page 16: Discussion

Why do benchmarks show different performance improvements?
• Different cache miss frequency
• Different early miss detection frequency
• Not all cache misses are on the critical path

Filter overhead:
• Timing: 1 cycle
• Power: 78.5% of single tag array access

Page 17: Summary

PTC: Using a subset of tag bits to improve bandwidth/power efficiency.

Results: Performance: 2.9%, Tag Power: 52%, Bandwidth: 78.5%


Page 19: Global vs. Local Miss

[Figure: two panels, each showing D$ caches connected through an interconnect to an upper level cache, with the question "Have B?". Global Miss panel: the answers are NO, NO. Local Miss panel: the answers are NO, YES, NO.]

Local miss detection: better power/bandwidth profile.

Remote miss detection (source-based approach) vs. (destination-based filter)

Page 20: Partial tag lookup: global miss

Page 21: Partial tag lookup: local miss