Performance Evaluation of Packet Classification on FPGA-based TCAM Emulation Architectures

GLOBECOM (Global Communications Conference), 2012

Presenter: NTHU 101062607 李若萍

Outline
• Introduction
• Related Work
• TCAM Emulation
• RAM-based TCAM Architecture
• Performance Evaluation
• Conclusion

Introduction
• Packet fields are used as keys to determine the best matching rule and apply the corresponding action.
▫ Exact matching
▫ Prefix matching
▫ Range matching
• How to find the best matching rule?
▫ Each rule is assigned a cost.
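The rule selection described above can be sketched as a linear search in software. This is only an illustration, assuming that a lower cost means a higher priority; the `Rule` class, the field layout (destination address, destination port, protocol), and the sample rules are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical matchers for the three matching types listed on the slide.
def exact(value):
    return lambda field: field == value

def prefix(value, length, width=32):
    shift = width - length  # number of low-order bits that are ignored
    return lambda field: (field >> shift) == (value >> shift)

def in_range(lo, hi):
    return lambda field: lo <= field <= hi

@dataclass
class Rule:
    matchers: Sequence[Callable[[int], bool]]  # one matcher per packet field
    cost: int                                  # assumption: lower cost wins
    action: str

def classify(rules, packet):
    """Return the lowest-cost rule whose matchers all accept the packet fields."""
    matching = [r for r in rules
                if all(m(f) for m, f in zip(r.matchers, packet))]
    return min(matching, key=lambda r: r.cost, default=None)

# Fields: (destination address, destination port, protocol number).
rules = [
    Rule((prefix(0x0A000000, 8), in_range(0, 1023), exact(6)), cost=1, action="deny"),
    Rule((prefix(0x0A000000, 8), in_range(0, 65535), exact(6)), cost=2, action="permit"),
]

best = classify(rules, (0x0A010203, 80, 6))
print(best.action if best else "no match")
```

Both sample rules match the packet, and the cost decides between them. A hardware classifier has to reach the same answer without scanning the rules one by one.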

Introduction (cont.)
CAM (Content Addressable Memory) vs. TCAM (Ternary Content Addressable Memory)

CAM cell
[Figure: CAM cell. Labels: SRAM cell (Data), Key, ≠ comparator, Match, Match line, VCC]
Match = (Key ≠ Data)
Match line = !Match

Data  Key  Match line
0     0    1
1     0    0
0     1    0
1     1    1

TCAM cell
[Figure: TCAM cell. Labels: SRAM cell (Data), Mask SRAM, Key, ≠ comparator, & Mask, Match, Match line, VCC]
Match = (Key ≠ Data) & Mask
Match line = !Match

Data  Key  Mask  Actual Data      Match line
0     0    0     X (don’t care)   1
1     0    0     X                1
0     0    1     0                1
1     0    1     1                0
0     1    0     X                1
1     1    0     X                1
0     1    1     0                0
1     1    1     1                1
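The two cell equations above can be checked with a few lines of Python. This is a minimal sketch of the bit-level logic only; the function names are illustrative, and the variable `mismatch` plays the role of the signal the slide labels "Match".

```python
def cam_match_line(data, key):
    """Binary CAM cell: the match line is 1 only when key equals data."""
    mismatch = int(key != data)        # the slide's "Match" signal
    return int(not mismatch)           # Match line = !Match

def tcam_match_line(data, key, mask):
    """Ternary cell: mask = 0 turns the stored bit into a don't care (X)."""
    mismatch = int(key != data) & mask
    return int(not mismatch)

print("Data Key Match line")
for key in (0, 1):
    for data in (0, 1):
        print(data, key, cam_match_line(data, key))

print("Data Key Mask Match line")
for key in (0, 1):
    for mask in (0, 1):
        for data in (0, 1):
            print(data, key, mask, tcam_match_line(data, key, mask))
```

The loops print the rows in the same order as the truth tables above, so the output can be compared against them line by line.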

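Introduction (cont.)
• TCAMs (Ternary Content Addressable Memories)
[Figure: TCAM lookup. The TCAM stores the rules at memory addresses 1, 2, 3, ..., N (example pattern shown: 0 1 X 1). The compared key is checked against every entry, a priority encoder reduces the compared result to a single memory address, and that address is used as an index into RAM to find the corresponding action.]
• Limitations
▫ Capacity constraints
▫ Storage inefficiency
▫ High power consumption
▫ Limited scalability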
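The lookup path in the figure (compare against all entries, priority-encode, index the action RAM) can be modeled in software as follows. This is a sketch under stated assumptions: the lowest matching address is taken to have the highest priority, and the stored patterns and actions are made-up examples.

```python
def parse_ternary(pattern):
    """Turn a string such as '01X1' into (data, mask); mask bit 0 means don't care."""
    data = int(pattern.replace("X", "0"), 2)
    mask = int("".join("0" if c == "X" else "1" for c in pattern), 2)
    return data, mask

def tcam_lookup(entries, key):
    """Compare the key with every entry, then priority-encode the match lines."""
    match_lines = [((key ^ data) & mask) == 0 for data, mask in entries]
    for address, hit in enumerate(match_lines):
        if hit:
            return address             # assumption: lowest address wins
    return None

# Hypothetical rule table (TCAM) and action table (RAM), indexed by address.
tcam = [parse_ternary(p) for p in ("0101", "01X1", "XXXX")]
action_ram = ["drop", "forward", "default"]

address = tcam_lookup(tcam, 0b0111)
print(address, action_ram[address])
```

In hardware all comparisons happen in parallel in one cycle; the list comprehension only models the result of that parallel compare.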

Introduction (cont.)
• Purpose: we investigated the performance and trade-offs of TCAM emulation in FPGAs (Field-Programmable Gate Arrays), not in ASICs (Application-Specific Integrated Circuits).
• We considered the impact of encoding different key ranges on rules, for different configurations of search key length and number of rules.

Related Work
• Hardware-assisted packet classification
▫ Decision tree: hierarchically split rule patterns restrict incremental updates.
▫ Decomposition: the cross-producting stage is an issue.
▫ Exhaustive search: predictable memory requirements.

TCAM Emulation
[Figure: native TCAM vs. emulated TCAM]

RAM-based TCAM Architecture
An m-bit key (here m = 10) is divided into w-bit subkeys. Each subkey addresses one RAM block of 2^w entries, so the design needs m/w RAM blocks (rounded up).

w             RAM blocks (m/w)         Block size (2^w)                Note
w = m = 10    10/10 = 1                2^10 (addresses 0 to 2^10-1)    full address expansion
w = m-1 = 9   10/9, rounded up to 2    2^9 (addresses 0 to 2^9-1)
w = m-2 = 8   10/8, rounded up to 2    2^8 (addresses 0 to 2^8-1)
w = 2         10/2 = 5                 2^2 = 4 (addresses 0 to 3)
w = 1         10/1 = 10                2^1 = 2 (addresses 0 to 1)      native TCAM

• BRAM demand: (m/w) * 2^w bits
• BRAM modes: depth * width
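The block counts and the (m/w) * 2^w memory demand for the m = 10 example can be tabulated with a short script. This sketch assumes that m/w is rounded up, since a partial subkey still needs its own block, and that every block is counted at the full 2^w size; the function names are illustrative.

```python
def ram_blocks(m, w):
    """Number of w-bit subkeys needed to cover an m-bit key (rounded up)."""
    return -(-m // w)                  # integer ceiling division

def bits_per_rule(m, w):
    """Memory demand (m/w) * 2^w, counting every block at full size."""
    return ram_blocks(m, w) * 2 ** w

m = 10
for w in (10, 9, 8, 2, 1):
    print(f"w={w:2d}  blocks={ram_blocks(m, w):2d}  "
          f"block size={2 ** w:4d}  bits={bits_per_rule(m, w)}")
```

The numbers show the trade-off behind the choice of w: a wide subkey means few blocks but exponentially larger ones, while a narrow subkey means many small blocks.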

RAM-based TCAM Architecture (cont.)
• n = 64, m = 16
• w = 6: m/w = 16/6, rounded up to 3 RAM blocks; block size = 2^w = 2^6 = 64
[Figure: 16-bit key divided into subkeys. Labels: 2^16*64, 2^8*32*4]
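One common way to realize such a RAM-based emulation is to let each RAM word hold one bit per rule and AND the words read from all blocks. The sketch below follows that idea for m = 16 and w = 6; it is not taken from the paper. The assumptions are that the last block is narrower when w does not divide m, that the lowest rule index has priority, and the three sample rules are hypothetical (n = 3 rather than 64, to keep the output readable).

```python
def build_ram_blocks(rules, m, w):
    """Precompute, per subkey, a table mapping each address to a rule bit-vector."""
    blocks = []
    for start in range(0, m, w):
        width = min(w, m - start)          # assumption: last block may be narrower
        table = [0] * (2 ** width)
        for rule_idx, rule in enumerate(rules):
            chunk = rule[start:start + width]
            for addr in range(2 ** width):
                bits = format(addr, f"0{width}b")
                # Bit rule_idx is set when this address satisfies the ternary chunk.
                if all(c in ("X", b) for c, b in zip(chunk, bits)):
                    table[addr] |= 1 << rule_idx
        blocks.append((start, width, table))
    return blocks

def lookup(blocks, key, m, n):
    """Read one word per block, AND them, and return the lowest matching rule."""
    result = (1 << n) - 1
    for start, width, table in blocks:
        shift = m - start - width          # subkeys are taken from the MSB side
        subkey = (key >> shift) & ((1 << width) - 1)
        result &= table[subkey]
    if result == 0:
        return None
    return (result & -result).bit_length() - 1   # index of the lowest set bit

rules = ["0000101000000001", "00001010XXXXXXXX", "XXXXXXXXXXXXXXXX"]
blocks = build_ram_blocks(rules, m=16, w=6)
print([len(table) for _, _, table in blocks])
print(lookup(blocks, 0x0A7F, m=16, n=len(rules)))
```

Each lookup costs one read per block regardless of the number of rules, which is why the block count and block size, rather than a per-rule comparison, drive the resource figures.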

RAM-based TCAM Architecture (cont.)


Performance Evaluation
• Resource Utilization
▫ A TCAM bit typically demands 16 transistors, while a RAM bit demands only 6.
▫ TCAM => w*m*16
▫ TCAM emulation => (m/w)*(2^w)*6
[Figure: resource utilization, emulated TCAM ((m/w)*(2^w)*6) vs. TCAM (w*m*16)]

Performance Evaluation (cont.)
[Figure: (m/w)*(2^w) bits]

Performance Evaluation (cont.)
• Classification Throughput
▫ A crucial factor for evaluating emulated TCAM performance on FPGA is the actual classification throughput, in packets per second (pps).
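To put a pps figure in context, it helps to relate it to the clock frequency and to the packet rate of a link. The helper below is a back-of-the-envelope sketch, not a result from the paper: it assumes a fully pipelined classifier that completes one lookup per clock cycle, and minimum-size 64-byte Ethernet frames with 20 bytes of preamble and inter-frame gap. The 300 MHz clock is a hypothetical value.

```python
def lookup_rate_pps(clock_hz, lookups_per_cycle=1):
    """Classification rate of a pipelined design (assumption: 1 lookup per cycle)."""
    return clock_hz * lookups_per_cycle

def line_rate_pps(link_bps, frame_bytes=64, overhead_bytes=20):
    """Worst-case packet rate of an Ethernet link carrying minimum-size frames."""
    return link_bps / ((frame_bytes + overhead_bytes) * 8)

print(f"{lookup_rate_pps(300e6) / 1e6:.0f} Mpps at a 300 MHz clock")
print(f"{line_rate_pps(10e9) / 1e6:.2f} Mpps needed for 10 Gb/s")
print(f"{line_rate_pps(100e9) / 1e6:.2f} Mpps needed for 100 Gb/s")
```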

Performance Evaluation (cont.)
• Range Impact
▫ We assess the impact of supporting different ranges in terms of memory requirements and classification rate.
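One reason ranges affect memory is that a ternary entry can only express a prefix-like pattern, so a range may have to be split into several entries. The sketch below shows the classic range-to-prefix expansion used with native TCAMs. It is offered as background and is not necessarily the range encoding evaluated in the paper; the function names are illustrative.

```python
def range_to_prefixes(lo, hi, bits):
    """Split [lo, hi] into the minimal list of (value, prefix_length) blocks."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that starts at lo with correct alignment...
        size = (lo & -lo) if lo else (1 << bits)
        # ...shrunk until it no longer overshoots hi.
        while size > hi - lo + 1:
            size >>= 1
        prefixes.append((lo, bits - (size.bit_length() - 1)))
        lo += size
    return prefixes

def to_ternary(value, plen, bits):
    """Render a prefix as a ternary string, e.g. (2, 3, 4) -> '001X'."""
    head = format(value >> (bits - plen), "b").zfill(plen) if plen else ""
    return head + "X" * (bits - plen)

for value, plen in range_to_prefixes(1, 14, 4):
    print(to_ternary(value, plen, 4))
```

For a 4-bit field, the range 1 to 14 needs six entries, while the full range 0 to 15 needs a single all-X entry.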

Conclusion
• Classification rates above 300 Mpps, for both large keys and large rule sets, can be achieved with only a few megabits of RAM when range intervals are at most medium-sized (512-2048).
• Supporting both large ranges and large rule sets tends to demand substantial memory resources, which also penalizes the resulting classification rate.

Thank you!

The End.
