In the world of multi-core programming, traditional parallel programming techniques with locks (mutexes and similar mechanisms) can create performance bottlenecks. Lockless programming is a set of techniques that use atomic operations to synchronize data exchange between threads. The talk introduces the audience to lockless programming and presents its benefits and pitfalls. The presenter will talk about support for atomic operations in different CPU families as well as support for them in lower- and higher-level languages. He will also cover reordering and memory barriers. He will end the talk with tips on designing lockless algorithms and practical examples of lockless data structures.
Tomasz Barański is a software developer working in Kraków for IBM T.J. Watson Research on projects related to high-performance computing. He has over 12 years of experience in the enterprise world, in the roles of developer, tester, interaction designer and go-to guy.
Lockless Programming
Tomasz Barański
IBM Research
Me
Making software for 15 years
IBM Research @ KRK
Lockless?
Programming with multiple threads that access
shared memory, where the threads cannot block
each other.
Why?
And also
(Dead|Live)locks
Priority inversion
Lock convoy
How?
Atomic operations (ἄτομος, "indivisible")
    CAS
    FAA | AAF
Memory barriers
    LoadLoad, LoadStore
    StoreLoad, StoreStore
Compare-And-Swap
cas(val, old, new) =
    if val == old
        val = new
        return SUCCESS
    else
        return FAIL
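The pseudocode above maps fairly directly onto C++11 atomics. The following is a minimal sketch, assuming an `int` payload; the names `cas`, `try_claim` and `cas_example` are illustrative and not from the talk.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical wrapper mirroring the slide's cas(val, old, new).
// Returns true (SUCCESS) only if val held expected_old and was replaced.
bool cas(std::atomic<int>& val, int expected_old, int new_value) {
    // On failure compare_exchange_strong writes the observed value into
    // its first argument; that local copy is simply discarded here.
    return val.compare_exchange_strong(expected_old, new_value);
}

// Example use: the first caller to claim a free slot wins (0 means "free").
bool try_claim(std::atomic<int>& owner, int my_id) {
    return cas(owner, 0, my_id);
}

// Usage sketch: a stale "old" value makes the operation fail.
void cas_example() {
    std::atomic<int> v{5};
    assert(cas(v, 5, 7));   // v was 5, becomes 7
    assert(!cas(v, 5, 9));  // v is no longer 5, nothing changes
    assert(v.load() == 7);
}
```

The whole compare-and-write step is one atomic operation, so two threads calling `try_claim` on the same slot cannot both succeed.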
Fetch-And-Add
faa(val, i) =
    tmp = val
    val += i
    return tmp
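As a rough illustration of the difference between fetch-and-add (FAA) and add-and-fetch (AAF), here is a small C++11 sketch. The function names are made up for this example; `std::atomic<int>::fetch_add` is the standard call and returns the value held before the addition.

```cpp
#include <atomic>
#include <cassert>

// FAA: add i and return the value held BEFORE the addition.
int faa(std::atomic<int>& val, int i) {
    return val.fetch_add(i);
}

// AAF: add i and return the value AFTER the addition.
// Derived from the returned old value, not from a second read of val.
int aaf(std::atomic<int>& val, int i) {
    return val.fetch_add(i) + i;
}

// Usage sketch.
void faa_example() {
    std::atomic<int> counter{10};
    assert(faa(counter, 5) == 10);  // old value returned, counter is now 15
    assert(aaf(counter, 5) == 20);  // new value returned, counter is now 20
}
```

Computing the AAF result from the returned old value matters: re-reading `val` afterwards could observe another thread's update.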
Sequential consistency
acquire lock
read X
read Y
(...)
store Y
store X
release lock
Pseudo-assembly
As written:
    acquire lock
    read X
    read Y
    (...)
    store Y
    store X
    release lock

After reordering (by the compiler, the JVM, or the CPU):
    acquire lock
    read Y
    (...)
    store X
    (...)
    read X
    (...)
    store Y
    release lock
Thread 1:
    read Y
    (...)
    store X
    (...)
    read X
    (...)
    store Y

Thread 2:
    read Y
    (...)
    store X
    (...)
    read X
    (...)
    store Y
What are X and Y?
Sequential consistency
All threads (on all CPUs) agree on the order of all memory operations, and that order is consistent with the order of the operations in the source code.
Memory barriers
With a LoadLoad barrier:
    read X
    LoadLoad Barrier
    read Y
    (...)
    store Y
    store X

A reordering that is still allowed (by the compiler, the JVM, or the CPU):
    read X
    (...)
    store X
    (...)
    read Y
    (...)
    store Y
With a StoreStore barrier:
    read X
    read Y
    (...)
    store Y
    StoreStore Barrier
    store X

A reordering that is still allowed (by the compiler, the JVM, or the CPU):
    read Y
    (...)
    store Y
    (...)
    read X
    (...)
    store X
With a LoadStore barrier:
    read X
    read Y
    (...)
    LoadStore Barrier
    store Y
    store X

A reordering that is still allowed (by the compiler, the JVM, or the CPU):
    read Y
    (...)
    read X
    (...)
    store X
    (...)
    store Y
With a StoreLoad barrier:
    store X
    store Y
    (...)
    StoreLoad Barrier
    read X
    read Y

A reordering that is still allowed (by the compiler, the JVM, or the CPU):
    store Y
    (...)
    store X
    (...)
    read X
    (...)
    read Y
Full barrier
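In C++11 the barriers above are not exposed one by one; `std::atomic_thread_fence` takes a memory order instead. As a rough correspondence (an approximation, not a statement from the talk), a release fence keeps earlier loads and stores from moving past later stores, an acquire fence keeps later loads and stores from moving before earlier loads, and a `memory_order_seq_cst` fence acts as a full barrier. The sketch below uses hypothetical names (`payload`, `ready`, `publish`, `try_consume`) to show the usual message-passing pattern.

```cpp
#include <atomic>
#include <cassert>

int payload = 0;                 // ordinary, non-atomic data
std::atomic<bool> ready{false};  // flag that publishes the data

// Writer side: store the data, then raise the flag.
void publish(int value) {
    payload = value;
    // Release fence: the payload store may not be reordered after the
    // flag store that follows.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

// Reader side: check the flag, then read the data.
bool try_consume(int& out) {
    if (!ready.load(std::memory_order_relaxed)) {
        return false;  // nothing published yet
    }
    // Acquire fence: the payload read may not be reordered before the
    // flag load above.
    std::atomic_thread_fence(std::memory_order_acquire);
    out = payload;
    return true;
}
```

Without the two fences, a reader on a weakly ordered CPU could see `ready == true` and still read a stale `payload`.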
Let's get practical!
Lock-free (FIFO) queue
(by John D. Valois)
enqueue(x) =
    acquire(lock)
    q = new Node
    q.value = x
    q.next = NULL
    tail.next = q
    tail = q
    release(lock)
enqueue(x) =
    q = new Node
    q.value = x
    q.next = NULL
    do
        p = tail
        succ = CAS(p.next, NULL, q)
        if !succ
            CAS(tail, p, p.next)
    while !succ
    CAS(tail, p, q)
dequeue() =
    do
        p = head
        if p.next == NULL
            error QUEUE_EMPTY
    while !CAS(head, p, p.next)
    return p.next.value
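The enqueue and dequeue pseudocode can be sketched in C++17 roughly as follows. This is an illustration under stated assumptions, not the presenter's code: the class name `LockFreeQueue` is made up, an empty queue is reported with `std::nullopt` instead of an error, `T` must be default-constructible, and dequeued nodes are deliberately never freed. Leaking nodes sidesteps memory reclamation and the ABA problem, which a production queue would have to solve.

```cpp
#include <atomic>
#include <cassert>
#include <optional>
#include <utility>

template <class T>
class LockFreeQueue {
    struct Node {
        T value{};
        std::atomic<Node*> next{nullptr};
    };
    std::atomic<Node*> head;  // always points at a dummy node
    std::atomic<Node*> tail;  // last node, or one step behind it

public:
    LockFreeQueue() {
        Node* dummy = new Node;
        head.store(dummy);
        tail.store(dummy);
    }

    void enqueue(T x) {
        Node* q = new Node;
        q->value = std::move(x);
        Node* p = nullptr;
        bool succ = false;
        do {
            p = tail.load();
            Node* expected = nullptr;
            // Try to link q after the node we believe is last.
            succ = p->next.compare_exchange_strong(expected, q);
            if (!succ) {
                // tail is lagging; expected now holds p->next, so help
                // move tail forward before retrying.
                tail.compare_exchange_strong(p, expected);
            }
        } while (!succ);
        // Swing tail to the new node; failure means someone helped already.
        tail.compare_exchange_strong(p, q);
    }

    std::optional<T> dequeue() {
        Node* p = nullptr;
        Node* next = nullptr;
        do {
            p = head.load();
            next = p->next.load();
            if (next == nullptr) {
                return std::nullopt;  // QUEUE_EMPTY
            }
        } while (!head.compare_exchange_weak(p, next));
        // next is the new dummy node; the old dummy p is leaked on purpose.
        return next->value;
    }
};
```

A single-threaded usage check: after `enqueue(1)` and `enqueue(2)`, two calls to `dequeue()` yield 1 and then 2, and a third call yields an empty optional.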
Never waits
Never blocks
Silver bullet?
More difficult
ABA problem
Solution?
Tagged reference
Intermediate nodes
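One way to picture a tagged reference: store a version tag next to the reference and bump it on every successful update, so a CAS holding a stale snapshot fails even when the reference has returned to its old value (A, then B, then A again). The sketch below is an assumption-laden illustration; it packs a 32-bit node index and a 32-bit tag into one 64-bit atomic, and all names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical packing: low 32 bits = node index, high 32 bits = tag.
constexpr std::uint64_t pack(std::uint32_t index, std::uint32_t tag) {
    return (static_cast<std::uint64_t>(tag) << 32) | index;
}
constexpr std::uint32_t index_of(std::uint64_t ref) {
    return static_cast<std::uint32_t>(ref);
}
constexpr std::uint32_t tag_of(std::uint64_t ref) {
    return static_cast<std::uint32_t>(ref >> 32);
}

// Replace the index only if the whole (index, tag) pair still equals the
// snapshot `seen`; every successful update increments the tag.
bool tagged_cas(std::atomic<std::uint64_t>& ref, std::uint64_t seen,
                std::uint32_t new_index) {
    return ref.compare_exchange_strong(seen,
                                       pack(new_index, tag_of(seen) + 1));
}

// Usage sketch: an A -> B -> A sequence is detected through the tag.
void aba_example() {
    std::atomic<std::uint64_t> top{pack(1, 0)};  // index 1 plays the role of "A"
    std::uint64_t stale = top.load();            // a slow thread's snapshot
    assert(tagged_cas(top, top.load(), 2));      // A -> B
    assert(tagged_cas(top, top.load(), 1));      // B -> A again
    assert(index_of(top.load()) == index_of(stale));
    assert(!tagged_cas(top, stale, 3));          // rejected: tag is now 2, not 0
}
```

The tag can wrap around after 2^32 updates, so this narrows the ABA window rather than closing it completely.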
LL/SC
Load-Link / Store-Conditional
Separates "storage has value"
from "storage has been changed".
PowerPC, ARM
but NOT: x86, SPARC
LoadLink(x) =
    read(x)
    mark(x)

StoreConditional(x) =
    if x marked
        store(x)
        unmark(x)
        return SUCCESS
    else
        return FAILURE
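C++ does not expose LL/SC directly. The closest portable form is `compare_exchange_weak`, which the standard allows to fail spuriously so that compilers can implement it with a load-link/store-conditional pair on architectures that have one. The sketch below, with the made-up helper `fetch_multiply`, shows the read, compute, conditionally store, retry shape that an LL/SC loop has.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: atomically multiply val by factor and return the
// value it held before the update.
int fetch_multiply(std::atomic<int>& val, int factor) {
    int seen = val.load();  // plays the role of LoadLink
    // compare_exchange_weak plays the role of StoreConditional. It may
    // fail because val changed or spuriously; in both cases `seen` is
    // refreshed with the current value and the loop retries.
    while (!val.compare_exchange_weak(seen, seen * factor)) {
    }
    return seen;
}

// Usage sketch.
void llsc_example() {
    std::atomic<int> v{6};
    assert(fetch_multiply(v, 7) == 6);
    assert(v.load() == 42);
}
```

Unlike true LL/SC, the comparison here is by value, so this loop alone does not protect against the ABA problem.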
Language support
C (gcc)
__sync_fetch_and_add (_sub, _or...)
__sync_add_and_fetch (_sub, _or...)
__sync_bool_compare_and_swap
__sync_val_compare_and_swap
__sync_synchronize
C++11
#include <atomic>
template <class T> struct atomic;
atomic_thread_fence(...)
::store(...)
::load(...)
::compare_exchange_weak(...), ::compare_exchange_strong(...)
::fetch_add(...)
Java
java.util.concurrent.atomic
AtomicInteger
    .addAndGet
    .getAndAdd
    .compareAndSet
AtomicIntegerArray
AtomicReference
AtomicStampedReference
?