Getting 20x Performance Improvement in Data Routing

Page 1: Getting 20x Performance Improvement in Data Routing

SignalFx

Page 2: Getting 20x Performance Improvement in Data Routing

Getting to 20x Performance Improvement on our Data Routing Layer

Rajiv Kurian, Software Engineer

Page 3: Getting 20x Performance Improvement in Data Routing

Agenda

1. Introduction
2. Properties of modern memory systems
3. Evolution of our data router
4. Results
5. Q&A (hopefully)

Page 4: Getting 20x Performance Improvement in Data Routing

What does SignalFx do?

Page 5: Getting 20x Performance Improvement in Data Routing

SignalFx is an advanced monitoring platform for modern applications

• High resolution:
  • Any mix of resolutions up to 1 sec
• Streaming analytics:
  • Custom analytics pipelines at any scale
  • Streaming dashboards update within seconds
• Multidimensional metrics:
  • Add dimensions to model metrics however you like
  • Use them to aggregate & filter (e.g. 99th-percentile-of-latency-by-service-by-customer) interactively on streaming data

Page 6: Getting 20x Performance Improvement in Data Routing

What is the data routing layer

Page 7: Getting 20x Performance Improvement in Data Routing

SignalFx data router

Raw data in, processed data out

[Diagram: PUBLISHER 0, PUBLISHER 1 and PUBLISHER 2 connected to SUBSCRIBER 0, SUBSCRIBER 1 and SUBSCRIBER 2]

Time Series ID: 1212450

Payload: 0b1000100010

Page 8: Getting 20x Performance Improvement in Data Routing

SignalFx data router - subscribers

Subscriptions

[Diagram: PUBLISHER 0, PUBLISHER 1 and PUBLISHER 2 connected to SUBSCRIBER 0, SUBSCRIBER 1 and SUBSCRIBER 2]

Subscriber ID: 1224525566

Time Series ID: 1212450

Page 9: Getting 20x Performance Improvement in Data Routing

Routing table

Key: 128759 Set<Subscriber>

Key Subscribers

Routing data

Page 10: Getting 20x Performance Improvement in Data Routing

Properties of modern memory systems

Page 11: Getting 20x Performance Improvement in Data Routing

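[Diagram: memory hierarchy. CORE 1 and CORE 2 each have an L1 data cache (L1 D), an L1 instruction cache (L1 I) and an L2 cache; both cores share the L3 cache and main memory.]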

Page 12: Getting 20x Performance Improvement in Data Routing

Cache Lines

• The memory subsystem makes a few bets to help us:
  • Temporal locality
  • Spatial locality
  • Prefetching

Page 13: Getting 20x Performance Improvement in Data Routing

[Diagram: CORE 1 and CORE 2, each with its own L1 and L2 cache, a shared L3 cache and main memory. Markers 1 and 2 label data items held at each level.]

Page 14: Getting 20x Performance Improvement in Data Routing

[Diagram: CORE 1 and CORE 2, each with its own L1 and L2 cache, a shared L3 cache and main memory. Main memory holds items 1 to 8, and the run 1 2 3 4 5 6 7 8 is repeated at the cache levels.]

Page 15: Getting 20x Performance Improvement in Data Routing

L1 CORE

Page 16: Getting 20x Performance Improvement in Data Routing

L2 CORE

Page 17: Getting 20x Performance Improvement in Data Routing

Main memory CORE

Page 18: Getting 20x Performance Improvement in Data Routing

The evolution of our data routing layer

Page 19: Getting 20x Performance Improvement in Data Routing

Routing table

Key: 128759 Set<Subscriber>

Key Subscribers

Page 20: Getting 20x Performance Improvement in Data Routing

Routing table v1

HashMap<Long, HashSet<Subscriber>>

Data Key    Set<Subscriber>
1212450     {1228, 4412}
3989        {12244}
8921224     {3244}
245819      {3244, 12244, 1228}

Subscriber Objects

Subscriber ID    Host    Port
1228             ….      ….
12244            ….      ….
4412             ….      ….
3244             ….      ….
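The v1 table can be sketched in a few lines of Java. This is only an illustration under assumptions: the slides give the container type and the Subscriber column names, while the class name `RoutingTableV1`, the method names and the host/port values are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of routing table v1. All names are hypothetical; the slides show
// only the container type HashMap<Long, HashSet<Subscriber>>.
final class RoutingTableV1 {

    // Fields follow the slide's column headers: Subscriber ID, Host, Port.
    static final class Subscriber {
        final long id;
        final String host;
        final int port;

        Subscriber(long id, String host, int port) {
            this.id = id;
            this.host = host;
            this.port = port;
        }
    }

    private final Map<Long, HashSet<Subscriber>> table = new HashMap<>();

    // The primitive key is boxed into a Long on every call.
    void subscribe(long timeSeriesId, Subscriber subscriber) {
        table.computeIfAbsent(timeSeriesId, k -> new HashSet<>()).add(subscriber);
    }

    // A lookup follows several pointers: bucket array, entry, boxed key,
    // the HashSet, its internal table, and finally the Subscriber objects.
    Set<Subscriber> getSubscribers(long timeSeriesId) {
        Set<Subscriber> subscribers = table.get(timeSeriesId);
        return subscribers == null ? Collections.<Subscriber>emptySet() : subscribers;
    }

    public static void main(String[] args) {
        RoutingTableV1 router = new RoutingTableV1();
        router.subscribe(1212450L, new Subscriber(1228, "host-a", 9000));
        router.subscribe(1212450L, new Subscriber(4412, "host-b", 9000));
        System.out.println(router.getSubscribers(1212450L).size());
    }
}
```

Every object in this structure is a separate heap allocation, which is where the pointer chasing comes from.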

Page 21: Getting 20x Performance Improvement in Data Routing

But …

We want to support millions of subscriptions per publisher while serving more than 2 million queries per second.

Page 22: Getting 20x Performance Improvement in Data Routing

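HashMap<Long, HashSet<Subscriber>>

[Diagram: hash map memory layout. Buckets point to lists of entries; each entry holds a key* pointer to a boxed long and a value* pointer to a Set<Subscriber>. Markers 1 to 4 number the memory accesses of a lookup, and "????" marks what lies behind the Set<Subscriber>.]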

Page 23: Getting 20x Performance Improvement in Data Routing

So why did we need a better data router?

• Lookups are O(1) ….
• Cache misses
• High memory overhead

Page 24: Getting 20x Performance Improvement in Data Routing

Routing table v2 - bloom filters

A Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set.

False positive matches are possible, but false negatives are not, thus a Bloom filter has a 100% recall rate
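A minimal Bloom filter over long keys might look like the sketch below. The class name `TinyBloomFilter`, the choice of three hash functions (matching the Hash 1 to Hash 3 labels on the following slides) and the SplitMix64-style mixer are assumptions made for illustration; the slides do not say which hash functions were used.

```java
// Sketch of a Bloom filter backed by an array of longs.
// Names and the hash family are hypothetical, not taken from the slides.
final class TinyBloomFilter {
    private static final int NUM_HASHES = 3;
    private final long[] words;
    private final int numBits;

    TinyBloomFilter(int numBits) {
        if (numBits <= 0) throw new IllegalArgumentException("numBits must be positive");
        this.numBits = numBits;
        this.words = new long[(numBits + 63) / 64];
    }

    // SplitMix64 finalizer, used here as a stand-in 64-bit mixer.
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Bit position for the i-th hash of the key.
    private int bitIndex(long key, int i) {
        long h = mix(key + i * 0x9E3779B97F4A7C15L);
        return (int) Long.remainderUnsigned(h, numBits);
    }

    // Write: set one bit per hash function.
    void add(long key) {
        for (int i = 0; i < NUM_HASHES; i++) {
            int bit = bitIndex(key, i);
            words[bit >>> 6] |= 1L << (bit & 63);
        }
    }

    // Read: false means definitely absent, true means possibly present.
    boolean mightContain(long key) {
        for (int i = 0; i < NUM_HASHES; i++) {
            int bit = bitIndex(key, i);
            if ((words[bit >>> 6] & (1L << (bit & 63))) == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        TinyBloomFilter filter = new TinyBloomFilter(16);
        filter.add(127829L);
        System.out.println(filter.mightContain(127829L));
    }
}
```

A key that was added always reads back as present, which is the "no false negatives" property. A key that was never added may still read as present if its three bits happen to be set by other keys.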

Page 25: Getting 20x Performance Improvement in Data Routing

Routing table v2 - write

Subscriber bloom filter (before): 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Key 127829: Hash 1 = 3, Hash 2 = 9, Hash 3 = 12

Subscriber bloom filter (after): 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0

Page 26: Getting 20x Performance Improvement in Data Routing

Routing table v2 - read hit

Subscriber bloom filter: 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0

Key 127829: Hash 1 = 3, Hash 2 = 9, Hash 3 = 12

Bits matched: 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0

Page 27: Getting 20x Performance Improvement in Data Routing

Routing table v2 - read miss

Subscriber bloom filter: 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0

Key 120422: Hash 1 = 3, Hash 2 = 9, Hash 3 = 14

Bits matched: 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0

Page 28: Getting 20x Performance Improvement in Data Routing

Typical bloom filter get

lookupKey    Hash 1    Hash 2    Hash 3
             43        168       312

long 0   long 1   long 2   long 3
long 4   long 5   long 6   long 7
long 8   long 9   long 10  long 11
long 12  long 13  long 14  long 15
long 16  long 17  long 18  long 19
long 20  long 21  long 22  long 23
long 24  long 25  long 26  long 27
long 28  long 29  long 30  long 31
long 32  long 33  long 34  long 35
long 36  long 37  long 38  long 39

[Markers 1, 2, 3: the three longs probed by the lookup]

Page 29: Getting 20x Performance Improvement in Data Routing

Routing table v2

Key    Hash 1    Hash 2    Hash 3
       43        168       312

[Diagram: Bloom Filter 1 through Bloom Filter 6, each an array of longs. Every filter is probed at three positions, marked 1, 2, 3.]

num_sub * 3 cache misses
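The per-subscriber layout can be sketched as one Bloom filter per subscriber, with a lookup that probes every filter. The class name `RoutingTableV2`, its methods and the hash mixer are hypothetical; the sketch only aims to show why the cost of a lookup grows with the number of subscribers.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of routing table v2: filters[s] holds the Bloom filter bits for
// subscriber s. Names and the hash family are assumptions for illustration.
final class RoutingTableV2 {
    private static final int NUM_HASHES = 3;
    private final long[][] filters;
    private final int bitsPerFilter;

    RoutingTableV2(int numSubscribers, int bitsPerFilter) {
        this.bitsPerFilter = bitsPerFilter;
        this.filters = new long[numSubscribers][(bitsPerFilter + 63) / 64];
    }

    // SplitMix64 finalizer, a stand-in for whatever hashes the real router used.
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    private int bitIndex(long key, int i) {
        return (int) Long.remainderUnsigned(mix(key + i * 0x9E3779B97F4A7C15L), bitsPerFilter);
    }

    void subscribe(int subscriber, long key) {
        for (int i = 0; i < NUM_HASHES; i++) {
            int bit = bitIndex(key, i);
            filters[subscriber][bit >>> 6] |= 1L << (bit & 63);
        }
    }

    // O(num_subscribers): each filter is a separate array, so every
    // subscriber costs up to three scattered memory reads.
    List<Integer> getSubscribers(long key) {
        int[] bits = new int[NUM_HASHES];
        for (int i = 0; i < NUM_HASHES; i++) bits[i] = bitIndex(key, i);

        List<Integer> matches = new ArrayList<>();
        for (int s = 0; s < filters.length; s++) {
            long[] words = filters[s];
            boolean hit = true;
            for (int bit : bits) {
                if ((words[bit >>> 6] & (1L << (bit & 63))) == 0) {
                    hit = false;
                    break;
                }
            }
            if (hit) matches.add(s); // may be a false positive
        }
        return matches;
    }

    public static void main(String[] args) {
        RoutingTableV2 router = new RoutingTableV2(6, 2560);
        router.subscribe(2, 1212450L);
        router.subscribe(4, 1212450L);
        System.out.println(router.getSubscribers(1212450L));
    }
}
```

The result can contain subscribers that never subscribed to the key, so a caller has to tolerate or filter false positives.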

Page 30: Getting 20x Performance Improvement in Data Routing

Progress so far

                  GetSubscribers()      Memory
Naive hash map    O(1)                  high
Bloom filter      O(num_subscribers)    low

Page 31: Getting 20x Performance Improvement in Data Routing

So why did we need a better data router?

• CPU intensive
  • What did the profiler say? Data router -> 32%
• Scaled poorly
  • CPU performance got worse with the number of subscribers

Page 32: Getting 20x Performance Improvement in Data Routing

So how can we do better?

Specialize: we have a limited number of subscribers present at any time, fewer than 128.

Page 33: Getting 20x Performance Improvement in Data Routing

ID transformation

Subscriber ID

1228

4412

12244

3244

Subscriber ID

0

1

2

127

subscriber coordination

publisher assignment
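One way to picture the ID transformation is an allocator that hands each live subscriber the lowest free ID in the range 0 to 127. This is a guess at the "publisher assignment" flavor only; the slides do not describe the actual protocol, and the class `DenseIdAllocator` and its methods are hypothetical.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Sketch: map sparse subscriber IDs (1228, 4412, ...) onto dense IDs 0-127.
// Hypothetical helper, not the mechanism described in the talk.
final class DenseIdAllocator {
    static final int MAX_SUBSCRIBERS = 128;

    private final Map<Long, Integer> denseIds = new HashMap<>();
    private final BitSet used = new BitSet(MAX_SUBSCRIBERS);

    // Returns the existing dense ID, or assigns the lowest free one.
    int assign(long subscriberId) {
        Integer existing = denseIds.get(subscriberId);
        if (existing != null) return existing;

        int id = used.nextClearBit(0);
        if (id >= MAX_SUBSCRIBERS) {
            throw new IllegalStateException("more than 128 live subscribers");
        }
        used.set(id);
        denseIds.put(subscriberId, id);
        return id;
    }

    // Frees the dense ID so a later subscriber can reuse it.
    void release(long subscriberId) {
        Integer id = denseIds.remove(subscriberId);
        if (id != null) used.clear(id);
    }

    public static void main(String[] args) {
        DenseIdAllocator ids = new DenseIdAllocator();
        System.out.println(ids.assign(1228L));
        System.out.println(ids.assign(4412L));
    }
}
```

Keeping the IDs dense is what later allows a subscriber set to be stored as 128 bits.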

Page 34: Getting 20x Performance Improvement in Data Routing

Routing table V3

subscribe message:

Subscriber ID (0 - 127)    Key (64 bit)
0                          3890

Producer routing table:

Data Key (8 bytes)    Set<Subscriber> (16 bytes bit set)
3890                  0000000000…..0001
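With at most 128 subscribers, a subscriber set fits in two longs. The sketch below shows one way to write that 16-byte bit set; the class name `SubscriberBitSet` and its methods are hypothetical.

```java
// Sketch of a 128-bit subscriber set stored in two longs (16 bytes).
// Names are assumptions for illustration.
final class SubscriberBitSet {
    private long low;  // bit i set: subscriber i (0-63) is subscribed
    private long high; // bit i set: subscriber 64 + i (64-127) is subscribed

    private static void checkRange(int subscriberId) {
        if (subscriberId < 0 || subscriberId > 127) {
            throw new IllegalArgumentException("subscriber ID must be in 0-127");
        }
    }

    void add(int subscriberId) {
        checkRange(subscriberId);
        if (subscriberId < 64) {
            low |= 1L << subscriberId;
        } else {
            high |= 1L << (subscriberId - 64);
        }
    }

    boolean contains(int subscriberId) {
        checkRange(subscriberId);
        long word = subscriberId < 64 ? low : high;
        return (word & (1L << (subscriberId & 63))) != 0;
    }

    // Lists the subscribed IDs in ascending order by peeling off the
    // lowest set bit of each word in turn.
    int[] toArray() {
        int[] ids = new int[Long.bitCount(low) + Long.bitCount(high)];
        int n = 0;
        for (long w = low; w != 0; w &= w - 1) {
            ids[n++] = Long.numberOfTrailingZeros(w);
        }
        for (long w = high; w != 0; w &= w - 1) {
            ids[n++] = 64 + Long.numberOfTrailingZeros(w);
        }
        return ids;
    }

    public static void main(String[] args) {
        SubscriberBitSet set = new SubscriberBitSet();
        set.add(0); // the subscribe message from the slide: subscriber 0
        System.out.println(set.contains(0));
    }
}
```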

Page 35: Getting 20x Performance Improvement in Data Routing

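Routing table V3 - regular hash map

[Diagram: hash map memory layout. Buckets point to lists of entries; each entry holds a key* pointer to a boxed long and a value* pointer to the bit set (long 1, long 2). Markers 1 to 4 number the memory accesses of a lookup.]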

Page 36: Getting 20x Performance Improvement in Data Routing

Routing table V4 - single array of longs

Empty

Empty

Empty

Empty

Empty

Empty

Empty

Empty

Empty

Empty

Empty

Empty

Key

Value 0-63

Value 64-127

Page 37: Getting 20x Performance Improvement in Data Routing

Routing table V4 - single array of longs

Key 0, hash 0

Empty

Empty

Empty

Empty

Empty

Empty

Empty

Empty

Empty

Empty

Empty

Empty

Page 38: Getting 20x Performance Improvement in Data Routing

Key 0

Value 0-63

Value 64-127

Empty

Empty

Empty

Empty

Empty

Empty

Empty

Empty

Empty

Routing table V4 - single array of longs

Key 0, hash 0

Page 39: Getting 20x Performance Improvement in Data Routing

Routing table V4 - single array of longs

Key 1, hash 0

Key 0

Value 0-63

Value 64-127

Empty

Empty

Empty

Empty

Empty

Empty

Empty

Empty

Empty

Page 40: Getting 20x Performance Improvement in Data Routing

Routing table V4 - single array of longs

Key 1, hash 0

Key 0

Value 0-63

Value 64-127

Key 1

Value 0-63

Value 64-127

Empty

Empty

Empty

Empty

Empty

Empty

Page 41: Getting 20x Performance Improvement in Data Routing

Routing table V4 - single array of longs

Key 2, hash 3

Key 0

Value 0-63

Value 64-127

Key 1

Value 0-63

Value 64-127

Empty

Empty

Empty

Empty

Empty

Empty

Page 42: Getting 20x Performance Improvement in Data Routing

Key 0

Value 0-63

Value 64-127

Key 1

Value 0-63

Value 64-127

Empty

Empty

Empty

Key 3

Value 0-63

Value 64-127

Routing table V4 - single array of longs

Key 2, hash 3

Page 43: Getting 20x Performance Improvement in Data Routing

Routing table V4 - single array of longs

Key 0

Value 0-63

Value 64-127

Key 1

Value 0-63

Value 64-127

Empty

Empty

Empty

Key 3

Value 0-63

Value 64-127

1 Key 1 hash 0

Page 44: Getting 20x Performance Improvement in Data Routing

Routing table V4 - single array of longs

Key 0

Value 0-63

Value 64-127

Key 1

Value 0-63

Value 64-127

Key 2

Value 0-63

Value 64-127

Key 1, hash 0

BitSet: 0, 2, 4, 127

Subscribers Array: Subscriber 0, Subscriber 1, Subscriber 2, Subscriber 3, Subscriber 4, …, Subscriber 127
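The V4 layout can be sketched as an open-addressing hash table stored in one long[]: each slot is three consecutive longs (key, bits 0-63, bits 64-127), and a collision moves to the next slot. Several details are assumptions not stated on the slides: the `EMPTY` sentinel, the power-of-two capacity, the hash mixer, the absence of resizing, and all names.

```java
// Sketch of routing table V4: one flat long[] with linear probing.
// Slot layout: [key, subscribers 0-63, subscribers 64-127].
final class RoutingTableV4 {
    private static final long EMPTY = Long.MIN_VALUE; // assumed sentinel, never a valid key
    private static final int SLOT_LONGS = 3;

    private final long[] data;
    private final int mask; // capacity - 1; capacity is a power of two

    RoutingTableV4(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        mask = capacity - 1;
        data = new long[capacity * SLOT_LONGS];
        for (int i = 0; i < data.length; i += SLOT_LONGS) data[i] = EMPTY;
    }

    // SplitMix64 finalizer, a stand-in hash function.
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Offset of the slot holding key, or of the first empty slot on its
    // probe path, or -1 if the table is full and the key is absent.
    private int find(long key) {
        int slot = (int) (mix(key) & mask);
        for (int probes = 0; probes <= mask; probes++) {
            int base = slot * SLOT_LONGS;
            if (data[base] == key || data[base] == EMPTY) return base;
            slot = (slot + 1) & mask; // collision: try the next slot
        }
        return -1;
    }

    void subscribe(long key, int subscriberId) {
        if (key == EMPTY) throw new IllegalArgumentException("reserved key");
        if (subscriberId < 0 || subscriberId > 127) throw new IllegalArgumentException("ID must be in 0-127");
        int base = find(key);
        if (base < 0) throw new IllegalStateException("table full (this sketch does not resize)");
        data[base] = key;
        data[base + 1 + (subscriberId >>> 6)] |= 1L << (subscriberId & 63);
    }

    boolean isSubscribed(long key, int subscriberId) {
        if (subscriberId < 0 || subscriberId > 127) return false;
        int base = find(key);
        if (base < 0 || data[base] != key) return false;
        return (data[base + 1 + (subscriberId >>> 6)] & (1L << (subscriberId & 63))) != 0;
    }

    public static void main(String[] args) {
        RoutingTableV4 router = new RoutingTableV4(8);
        router.subscribe(3890L, 0);
        System.out.println(router.isSubscribed(3890L, 0));
    }
}
```

The key and both value words sit next to each other in memory, so a lookup that finds its key in the first slot reads 24 contiguous bytes, and a probe to the next slot reads the adjacent 24.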

Page 45: Getting 20x Performance Improvement in Data Routing

Progress so far

                      GetSubscribers()      Memory
Naive hash map        O(1)                  high
Bloom filter          O(num_subscribers)    low
Optimized hash map    O(1)                  medium

Page 46: Getting 20x Performance Improvement in Data Routing

Results (library)

Page 47: Getting 20x Performance Improvement in Data Routing

Microbenchmark

• Method:
  • Heap: 3G
  • Number of subscribers: 128
  • Number of time series: 1048576
  • All time series have a random number of subscribers: [1, 128]
  • 2 million random queries

                      Writes            Reads             Memory
Naive hash map        34469 ms (42x)    11900 ms (21x)    2.6 GB (27x)
Bloom filter          31710 ms (39x)    54995 ms (97x)    80 MB (0.83x)
Optimized hash map    805 ms (1x)       565 ms (1x)       96 MB (1x)

Page 48: Getting 20x Performance Improvement in Data Routing

Results (application)

Page 49: Getting 20x Performance Improvement in Data Routing

CPU %

Page 50: Getting 20x Performance Improvement in Data Routing

CPU %

6 subscribers: 45 %

Page 51: Getting 20x Performance Improvement in Data Routing

Garbage collection

Page 52: Getting 20x Performance Improvement in Data Routing

Garbage collection

6 subscribers: 63 %

Page 53: Getting 20x Performance Improvement in Data Routing

Closing remarks / rant

• “Write code first, optimize later” ….
• Analyze your data
  • Metrics
  • Logging

Page 55: Getting 20x Performance Improvement in Data Routing

Q & A