Probabilistic Detection and Sampling of Concurrency Bugslcs.ios.ac.cn/~yancai/pdf/Talk_ASE16_FSE16_Simplified_YanCai.pdf · Execution of events ... • Access pairs not ordered by

Probabilistic Detection and Sampling

of Concurrency Bugs

Yan Cai (蔡彦)[email protected]

State Key Lab. of Computer Science,

Institute of Software, Chinese Academy of Sciences中科院软件所·计算机科学国家重点实验室

Radius-aware

Probabilistic Deadlock detection

ASE’16

Yan Cai and Zijiang Yang

Locks and Deadlocks

Read

Thread 1

Write

Thread 2

Data1

Read

Thread 1

Write

Write

Thread 2

Read

Data2

Deadlock

3

Thread t1 Thread t2acq(m)

acq(n)

acq(n)

acq(m)

Deadlock Testing

• Random testing

– OS scheduling + random manipulation

– Stress testing

– Heuristic directed random testing

– Systematic scheduling

4

No Guarantee to find a

concurrency bug (e.g., Deadlock)

• PCT Algorithm

– Mathematical randomness with Probabilistic Guarantees

5

Thread t1 Thread t2

s01 acq(m)

s02 acq(n)

s03 rel(n)

s04 rel(m)

s05 acq(n)

s06 acq(m)

s07 rel(m)

s08 rel(n)

k =8, n =2, d =2

1

2 × 82−1= 1/16

1

𝑛 × 𝑘𝑑−1n: #threads, k: #events, d: bug depth

PCT – Probabilistic Concurrency Testing


• PCT :

– Intuition of guaranteed probability:

1. satisfy the 1st order by assigning the thread a largest priority (1/𝑛)

2. select d – 1 priority change points at the remaining d – 1 order

position (1/𝑘 × 1/k ×…× 1/𝑘 = 1

𝑘𝑑−1) ⇒1

𝑛×𝑘𝑑−1

6

Thread t1 Thread t2

s01 acq(m)

s02 acq(n)

s03 rel(n)

s04 rel(m)

s05 acq(n)

s06 acq(m)

s07 rel(m)

s08 rel(n)

k =8, n =2, d =21

2 × 82−1= 1/16


• Provide a guarantee (a probability ):

But …

• Theoretical model, not consider

thread interaction:

real executions do not follow designed

executions

• Guaranteed probability decreases

exponentially with increase of bug

depth: due to factor 1

𝑘𝑑−1.

7

Threads t1, t2, … tn

(a) Uniform distribution (b) Centralized distribution

Execu

tio

n

Execution of events

Coordination among threads

Ranges of events to be selected by PCT or RPro


… …

1

𝑛 × 𝑘𝑑−1n: #threads, k: #events, d: bug depth

RPro- Radius aware

• Our approach: RPro – Radius aware Probabilistic testing

8


(a) Uniform distribution (b) Centralized distribution

Execu

tio

n

Execution of events

Coordination among threads

Ranges of events to be selected by PCT or RPro


… …

• Consider thread interaction

• Guaranteed probability

decreases:1

𝑟(not

1

𝑘, r≪ k)

1

𝑛 × 𝑘𝑑−1

1

𝑛 × 𝑘 × 𝑟𝑑−2

PCT v.s. RPro


…

RPro- Radius aware

• RPro: Theoretical guarantee

9

0

Bug Radius

Probability

1 𝑛 × 𝑘𝑑−1

0

1 𝑛 × 𝑘 × 𝑟𝑑−2

rbugrbug – 1

PCT: Guaranteed probability

RPro: Guaranteed probability

RPro: Probability in practice

r = k

How to find rbug?

Experiment

• Results

10

p=0.0020

r=17, p=0.0439

0.00

0.01

0.02

0.03

0.04

0.05

0 15 30 45 60 75 90 105 120 135 150

PCT

RPro

p=0.0385

r=3, p=0.0632

0.02

0.03

0.04

0.05

0.06

0.07

0 15 30 45 60 75 90 105 120 135 150

p=0.0005

r=11, p=0.0229

0.00

0.01

0.01

0.02

0.02

0.03

0 15 30 45 60 75 90 105 120 135 150

p=0.0680

r=5, p=0.1123

0.06

0.07

0.08

0.09

0.10

0.11

0.12

0 15 30 45 60 75 90 105 120 135 150

p=0.1755

r=2, p=0.453

0.15

0.20

0.25

0.30

0.35

0.40

0.45

0.50

0 15 30 45 60 75 90 105 120 135 150

p=0.4326

r=2, p=0.6863

0.40

0.45

0.50

0.55

0.60

0.65

0.70

0 15 30 45 60 75 90 105 120 135 150

p=0.0004

r=47, p=0.0022

-0.0001

0.0004

0.0009

0.0014

0.0019

0.0024

0 50 100 150 200 250 300

p=0.0088

r=27, p=0.0256

0.0050

0.0100

0.0150

0.0200

0.0250

0.0300

0 15 30 45 60 75 90 105 120 135 150

p=0.0000

r=114, p=0.0039

-0.0001

0.0009

0.0019

0.0029

0.0039

0.0049

0 50 100 150 200 250 300

p=0.0000

r=20, p=0.0062

-0.0001

0.0009

0.0019

0.0029

0.0039

0.0049

0.0059

0.0069

0 15 30 45 60 75 90 105 120 135 150

(a) JDBC-1 (b) JDBC-2

(c) JDBC-3 (d) JDBC-4

(e) Hawknl (f) SQLite

(g) MySQL-1 (h) MySQL-2

(i) MySQL-3 (j) MySQL-4

0

Bug Radius

Probability

1 𝑛 × 𝑘𝑑−1

0

1 𝑛 × 𝑘 × 𝑟𝑑−2

rbugrbug – 1

PCT: Guaranteed probability

RPro: Guaranteed probability

RPro: Probability in practice

r = k

Table 1. The best radiuses (rbest) of each benchmarks.

Benchmark #

events

#

threads

bug

depth 𝒓𝒃𝒆𝒔𝒕*

𝒓𝒃𝒆𝒔𝒕

#𝒆𝒗𝒆𝒏𝒕𝒔 Probability

Hawknl 28 3 3 2 - 0.4530

SQLite 16 3 3 2 - 0.6863

JDBC-2 5,050 3 3 3 0.059% 0.0632

JDBC-4 5,090 3 3 5 0.098% 0.1123

JDBC-3 5,080 3 3 11 0.217% 0.0229

JDBC-1 5,088 3 3 17 0.334% 0.0439

MySQL-4 444,621 19 3 20 0.005% 0.0062

MySQL-2 15,066 17 3 27 0.179% 0.0256

MySQL-1 19,300 16 3 47 0.244% 0.0022

MySQL-3 406,117 22 6 114 0.028% 0.0039

(* All rows are sorted on the data in this column.)

Deployable

Data Race Sampling

FSE’16

Yan Cai, Jian Zhang, Lingwei Cao, and Jian Liu

Concurrency bugs

• Difficult to detect

– Non-determinism (space explosion)

– Inadequate test inputs

– …

• Even after software release,

concurrency bugs may still occur

12

Concurrency bugs

• It is necessary to detect concurrency

bugs in deployed products

• Challenges:

not to disturb normal executions

– light-weighted

– …

Sample user executions

Detector

<5% overhead

13

Existing works

• Data Race

• Happens-before (HB Race)• Access pairs not ordered by happens-before relation (HBR)

Two threads concurrently access the same memory

location and at least one access is a write.

Thread t1

x++;sync(m){}

Thread t2

sync(m){x++;}

Thread t1

x++;

sync(m){}

Thread t2

sync(m){x++;}

Value of x: +2. Value of x: +1 or +2? 14

Existing works

• Happens-before Races

– Track full Happens-before relation

• Incurring many O(n) operations

Insight 1:

Not to track Full Happens-before Relation

0% sampling rate => ~30%

overhead (Pacer, PLDI’10)

~15% in our experiment

15

Existing works

• Hardware based (e.g., DataCollider, OSDI’10)

– Code Breakpoints and Data Breakpoints

(or Watchpoints)

– Collision Races

• A data race: two accesses

– Select a memory address =>

Set a data breakpoint =>

Wait for the breakpoint to be fired

– The waiting time directly increases the sampling overhead

Insight 2: Not to directly delay executions16

Existing works

• …

• See our paper for more insights

17

Our Proposal

• Clock Race

– For data race sampling purpose

• CRSampler

– To detect clock races

18

Thread 1 1

Thread 2

2

1 𝑘 is not changed between time1 and time2.

time1

time2

Tim

e el

ap

se

Clock Race

• Clock Race

– Thread-local clock: an integer for each thread, increased on

synchronization operation.

– Two accesses (with at least a write) form a Clock Race if:

at least one thread-local clock is not changed

in between the two accesses

sync

No clock races

sync

19

Thread 1 1

Thread 2

2


time1

time2

Tim

e el

ap

se

Clock Race

• A Quick Demonstration

Thread 1

acquire(l) onSync( );x = 0; sample(x);…

release(l) onSync( );

Thread 2

acquire(k) onSync( );…

release(k) onSync( );x ++;

1 𝑘 2 𝑘10 8

11 9

11 9

11 9

11 10

11 10

12

On this read, t1.clock remains 11,

a clock race on x is reported

Maintain thread-local clocks

Sampled access

20

Clock Race

• Clock Race

– Race checking does not need to delay any thread.

– But: after e1 appears, how much time is required to check two

accesses? • Given a short time, it is not enough to trap the second access.

• Given a long time, all threads’ lock clocks are changed.

Thread 1 1

Thread 2

2


time1

time2

Tim

e el

ap

se

One second,

or …

21

Setup

• Implementation

– Jikes RVM

– Sampling: Java class load time

– Memory accesses Linux Kernel

• Benchmarks

– Dacapo benchmark suite

JikesRVM

User-site

Agent

Kernel

SiteCPU

Set breakpoints

On firing

User space Kernel space

Netlink

Com.

Core of

DC/CR

Execu

tio

n

22

Setup

• Comparisons

– Sampling rate: 0.1% to 1.0%

– Pacer (PLDI’10)

– Data Collider (OSDI’10)

– CRSampler

• ThinkPad Workstation

– I7-4710MQ CPU, four cores, 16G memory, 250G SSD

23

15ms, 30msDC15, DC30

CR15, CR30

Experiments

• Overall Results

– Effectiveness

• CR: more data races at

low sampling rates

– Overhead

Bench-

marks

Binary

Size (KB) # of

threads

# of

sync. Pacer*

DC

15

DC

30

CR

15

CR

30

avrora09 2,086 7 3,312,801 3 3 3 5 3

xalan06 1,027 9 35,859,489 5 5 5 87 81

xalan09 4,827 9 12,599,144 0 2 2 84 91

sunflow09 1,017 17 1,590 0 0 2 46 45

pmd09 2,996 9 20,550 4 2 2 110 121

eclipse06 41,822 16 51,131,093 19 2 6 58 63

Sum: 31 14 20 390 404

y = 9.0749x + 0.1474

R² = 0.9397

y = 92.529x + 0.0588

R² = 0.9851

y = 173.46x + 0.0997

R² = 0.9796

0%

40%

80%

120%

160%

200%

0.1% 0.2% 0.3% 0.4% 0.5% 0.6% 0.7% 0.8% 0.9% 1.0%

Avere

age O

verh

ead

Sampling rate

Avg. Overhead of Three + TrendlinesPacer

DC15

DC30

CR15

CR30

y = 9.0749x + 0.1474

R² = 0.9397

y = 2.6312x + 0.0252

R² = 0.9624

y = 2.0859x + 0.0269

R² = 0.98990%

5%

10%

15%

20%

25%

30%

0.1% 0.2% 0.3% 0.4% 0.5% 0.6% 0.7% 0.8% 0.9% 1.0%

Avere

age O

verh

ead

Sampling rate

Avg. Overhead of Pacer and CRSampler + TrendlinesPacer

CR15

CR30

y = 9.0749x + 0.1474

R² = 0.9397

y = 92.529x + 0.0588

R² = 0.9851

y = 173.46x + 0.0997

R² = 0.9796

0%

40%

80%

120%

160%

200%

0.1% 0.2% 0.3% 0.4% 0.5% 0.6% 0.7% 0.8% 0.9% 1.0%

Avere

age O

verh

ead

Sampling rate


DC15

DC30

CR15

CR30

y = 9.0749x + 0.1474

R² = 0.9397

y = 2.6312x + 0.0252

R² = 0.9624

y = 2.0859x + 0.0269

R² = 0.98990%

5%

10%

15%

20%

25%

30%

0.1% 0.2% 0.3% 0.4% 0.5% 0.6% 0.7% 0.8% 0.9% 1.0%

Avere

age O

verh

ead

Sampling rate


CR15

CR30

DC30

DC15

CR30 , CR15Pacer

5%

24

Experiments

• Discussions

– DataCollider: overhead from its delays.• DC30 has almost 2 times overhead than DC15.

– Pacer: basic overhead ~15%

– CRSampler: ~5% overhead at 1.0% sampling rate.

25

y = 9.0749x + 0.1474

R² = 0.9397

y = 92.529x + 0.0588

R² = 0.9851

y = 173.46x + 0.0997

R² = 0.9796

0%

40%

80%

120%

160%

200%

0.1% 0.2% 0.3% 0.4% 0.5% 0.6% 0.7% 0.8% 0.9% 1.0%

Avere

age O

verh

ead

Sampling rate


DC15

DC30

CR15

CR30

y = 9.0749x + 0.1474

R² = 0.9397

y = 2.6312x + 0.0252

R² = 0.9624

y = 2.0859x + 0.0269

R² = 0.98990%

5%

10%

15%

20%

25%

30%

0.1% 0.2% 0.3% 0.4% 0.5% 0.6% 0.7% 0.8% 0.9% 1.0%

Avere

age O

verh

ead

Sampling rate


CR15

CR30

y = 9.0749x + 0.1474

R² = 0.9397

y = 92.529x + 0.0588

R² = 0.9851

y = 173.46x + 0.0997

R² = 0.9796

0%

40%

80%

120%

160%

200%

0.1% 0.2% 0.3% 0.4% 0.5% 0.6% 0.7% 0.8% 0.9% 1.0%

Avere

age O

verh

ead

Sampling rate


DC15

DC30

CR15

CR30

y = 9.0749x + 0.1474

R² = 0.9397

y = 2.6312x + 0.0252

R² = 0.9624

y = 2.0859x + 0.0269

R² = 0.98990%

5%

10%

15%

20%

25%

30%

0.1% 0.2% 0.3% 0.4% 0.5% 0.6% 0.7% 0.8% 0.9% 1.0%

Avere

age O

verh

ead

Sampling rate


CR15

CR30

DC30

DC15

CR30 , CR15Pacer

5%

26

Thanks~

Documents

Probabilistic Detection and Sampling of Concurrency Bugslcs.ios.ac.cn/~yancai/pdf/Talk_ASE16_FSE16_Simplified_YanCai.pdf · Execution of events ... • Access pairs not ordered by