Programming an SMP Desktop using Charm++ Laxmikant (Sanjay) Kale http://charm.cs.uiuc.edu Parallel Programming Laboratory Department of Computer Science University of Illinois at Urbana-Champaign


Page 1

Programming an SMP Desktop using Charm++

Laxmikant (Sanjay) Kale

http://charm.cs.uiuc.edu
Parallel Programming Laboratory
Department of Computer Science
University of Illinois at Urbana-Champaign

Supported in part by IACAT

Page 2

Prologue
• I will present an abbreviated version of the planned talk
  – We are running late.
  – Also, I realized that what I really intended to present, with code examples, will need an hour-long talk.
  – We will write that up in a report later (and maybe put it in the Charm++ documentation)

05/03/23 Charm++ Workshop 2

Page 3

Outline
– Charm++ designed for portability between shared and distributed memory
– Optimizing multicore Charm++
  • K-neighbor: its description and performance
  • What optimizations were carried out
– Abstractions:
  • Basic: shared object space, readonly data
  • Plain global variables: still work. More on disciplined use of these later
  • Nodegroups
  • Passing pointers to shared data structures, including sections of arrays
    – Readonly, write-exclusive: permissions by convention or “capability”

Page 4

Optimizing SMP implementation of Charm++
• Changed the memory allocator
  – to avoid acquiring a lock per memory allocation
• Reduced the granularity of the critical region
• Used thread local storage (__thread) to avoid false sharing
• Used a memory fence instead of a lock for pcqueue
• Reduced lock contention by using a separate msg queue for each of the other cores on the same node
• Simplified the data structure of pcqueue
  – Assumes the queue size is adequately large
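Note: the "memory fence instead of lock" and "separate queue per sending core" points can be illustrated with a single-producer/single-consumer ring buffer. The sketch below is a plain C++11 illustration under stated assumptions, not the Charm++ pcqueue source: the class name `SpscQueue` and its methods are hypothetical, the capacity is fixed up front (mirroring the "queue size is adequately large" assumption), and acquire/release atomics stand in for explicit fences.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-producer/single-consumer queue (illustrative only).
// With one queue per sending core, each queue has exactly one writer and
// one reader, so no lock is needed: ordering comes from acquire/release.
template <typename T>
class SpscQueue {
public:
    explicit SpscQueue(std::size_t capacity)
        : slots_(capacity + 1), head_(0), tail_(0) {
        assert(capacity > 0);  // one slot is kept empty to tell full from empty
    }

    // Called only by the producer thread. Returns false if the queue is full.
    bool push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % slots_.size();
        if (next == head_.load(std::memory_order_acquire)) return false;
        slots_[tail] = value;
        // Release: the slot write above becomes visible before the new tail.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Called only by the consumer thread. Returns false if the queue is empty.
    bool pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false;
        out = slots_[head];
        // Release: the slot read above completes before the slot is reused.
        head_.store((head + 1) % slots_.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<T> slots_;
    std::atomic<std::size_t> head_;  // next slot to read (consumer-owned)
    std::atomic<std::size_t> tail_;  // next slot to write (producer-owned)
};
```

In this sketch a full queue is reported to the caller rather than grown, which keeps the data structure simple at the cost of requiring a generous capacity.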

Page 5

Results on SMP Performance
• Improvement on K-Neighbor Test (8 cores, Mar’2009)

Page 6

Results on SMP Performance
• Improvement on K-Neighbor Test (24 cores, Mar’2009)

Page 7

Results on SMP Performance
• Improvement on K-Neighbor Test (16 cores, Apr’2009)

[Figure: kNeighbor test on a Power 5 node (15 elements, k=3). x-axis: msg size (bytes), 0 to 4096; y-axis: average iteration time (us), 0 to 2000; series: non-smp, smp.]

Page 8

We evaluated many of our applications to test and demonstrate the efficacy of the optimized SMP runtime

Page 9

[Figure: Jacobi 2D stencil computation on Power 5 (8000x8000 matrix size). x-axis: Number of processors, 1 to 16; y-axis: Efficiency, 0 to 1.2.]

Page 10

ChaNGa: Barnes-Hut based production astronomy code

Page 11

ChaNGa: Barnes-Hut based production astronomy code

Page 12

NAMD Scaling with Optimization

[Figure: Multicore vs net. NAMD apoa1 running on upcrc. x-axis: Number of Cores; y-axis: Time/step (s), 0 to 2; series: multicore, non-smp.]

Time/step (s) by number of cores:

Number of Cores   1       3       6       12      24
multicore         1.715   0.6     0.303   0.152   0.0826
non-smp           1.719   0.631   0.321   0.173   0.11
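Note: the time/step figures for this slide can be turned into speedup and parallel efficiency with the usual definitions S(p) = T(1) / T(p) and E(p) = S(p) / p. The helper functions below are a small sketch for that arithmetic; their names are hypothetical and they are not part of NAMD or Charm++.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Speedup of each run relative to the first entry (the 1-core run):
// S(p) = T(1) / T(p).
std::vector<double> speedups(const std::vector<double>& timePerStep) {
    assert(!timePerStep.empty());
    std::vector<double> s;
    for (double t : timePerStep) s.push_back(timePerStep.front() / t);
    return s;
}

// Parallel efficiency for each run: E(p) = S(p) / p.
std::vector<double> efficiencies(const std::vector<double>& timePerStep,
                                 const std::vector<int>& cores) {
    assert(timePerStep.size() == cores.size());
    std::vector<double> e = speedups(timePerStep);
    for (std::size_t i = 0; i < e.size(); ++i) e[i] /= cores[i];
    return e;
}
```

Applied to the slide's numbers, the 24-core speedup comes out at roughly 20.8 for multicore (1.715 / 0.0826) and roughly 15.6 for non-smp (1.719 / 0.11), that is, efficiencies of about 0.87 and 0.65.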

Page 13

Summary of constructs that use shared memory in Charm++

Page 14

Basic Mechanisms
• Chares and chare arrays constitute a “shared object space”
  – Analogous to shared address space
• Readonly globals
  – Initialized in main::main or any method called from it synchronously
• Shared global variables

Page 15

More powerful mechanisms
• Node groups
• Passing pointers to shared data structures, including sections of arrays
  – Readonly, write-permission

Page 16

Node Groups
• Node Groups - a collection of objects (chares)
  – Exactly one representative on each node
    • Ideally suited for system libraries on SMP
  – Similar to arrays:
    • Broadcasts, reductions, indexing
  – But not completely like arrays:
    • Non-migratable; one per node

Page 17

Conditional packing
• Pass data structure between chares
  – Pass pointer (dest. within the node)
  – PUP the entire structure (dest. outside the node)
• Who owns the data and frees it?
  – Data structure must inherit from CkConditional
    • Reference counted
• A data structure can contain info about an array section
  – Useful in cases like in-place sorting (e.g. quicksort)
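Note: the pointer-versus-PUP decision on this slide can be sketched in plain C++. This is an analogy under stated assumptions, not the CkConditional API: `Particles`, `pack`, `unpack` and `deliver` are hypothetical names, `std::shared_ptr` stands in for the reference counting, and node identity is passed in as plain integers.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical payload; stands in for a structure that would be PUPed.
struct Particles {
    std::vector<double> x;
};

// Flatten the structure into bytes, as crossing a node boundary requires.
std::vector<char> pack(const Particles& p) {
    std::vector<char> bytes(p.x.size() * sizeof(double));
    if (!bytes.empty()) std::memcpy(bytes.data(), p.x.data(), bytes.size());
    return bytes;
}

// Rebuild an independent copy of the structure on the receiving side.
std::shared_ptr<Particles> unpack(const std::vector<char>& bytes) {
    assert(bytes.size() % sizeof(double) == 0);
    auto p = std::make_shared<Particles>();
    p->x.resize(bytes.size() / sizeof(double));
    if (!bytes.empty()) std::memcpy(p->x.data(), bytes.data(), bytes.size());
    return p;
}

// Conditional packing: share the pointer inside a node, copy across nodes.
std::shared_ptr<Particles> deliver(const std::shared_ptr<Particles>& data,
                                   int srcNode, int destNode) {
    if (srcNode == destNode) return data;  // same address space: refcount +1
    return unpack(pack(*data));            // other node: receiver owns a copy
}
```

The reference count answers the "who frees it?" question in this sketch: the structure is released when the last holder within the node drops its pointer.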

Page 18

Sharing Data and Conditional packing
• Pointers can be sent in “messages”, but the underlying data structures are packed when the message goes across nodes
  – (feature in Chare Kernel since 1989 or so!)
• Data structure being shared should be encapsulated, with a read or write “capability”
  – If I give you write access, I promise not to modify it, read it, or grant access to someone else
  – If I give you read access, I promise not to change it until you are done
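Note: one way to picture the read and write "capabilities" is with C++ ownership types. The sketch below is an assumption for illustration only, not a Charm++ interface: `WriteCap`, `ReadCap`, `fillWithSquares`, `sum` and `freeze` are hypothetical names. A move-only pointer models write access (the giver can no longer touch the data), and a shared pointer to const models read access (nobody can change the data while readers hold it).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using Data = std::vector<int>;

// Write capability: move-only, so handing it over leaves the giver with
// nothing to read, modify, or pass to someone else.
using WriteCap = std::unique_ptr<Data>;

// Read capability: shared and const, so holders can read but not modify.
using ReadCap = std::shared_ptr<const Data>;

// Hypothetical worker: takes exclusive write access and hands it back.
WriteCap fillWithSquares(WriteCap cap) {
    assert(cap != nullptr);
    for (std::size_t i = 0; i < cap->size(); ++i) {
        (*cap)[i] = static_cast<int>(i * i);
    }
    return cap;
}

// Hypothetical reader: needs only read access.
long sum(const ReadCap& cap) {
    long total = 0;
    for (int v : *cap) total += v;
    return total;
}

// Give up write access for good and publish the data as read-only.
ReadCap freeze(WriteCap cap) {
    return ReadCap(std::move(cap));
}
```

Here the promises from the slide are checked by the compiler rather than kept by convention, which is the distinction the "capability" wording points at.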

Page 19

Disciplined Sharing
• My pet idea: shared arrays with restricted modes
  – Readonly, write-exclusive, accumulate, and “owner-computes”
  – Modes can change at well-defined global synch points
  – Captures a large fraction of uses of shared arrays
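Note: the restricted-modes idea can be sketched as an array wrapper that rejects operations that do not match the current mode. Everything below is a hypothetical illustration, not an existing Charm++ class: the name `ModedArray`, the interpretation of write-exclusive as "one writer per element per phase", and the single-threaded mode switch standing in for a global synchronization point are all assumptions. Owner-computes is left out for brevity, and a real multithreaded version would also need atomic accumulation.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

enum class Mode { ReadOnly, WriteExclusive, Accumulate };

constexpr int kNoWriter = -1;  // marks an element nobody has written yet

// Hypothetical mode-checked array (illustrative only).
class ModedArray {
public:
    explicit ModedArray(std::size_t n)
        : data_(n, 0.0), writer_(n, kNoWriter), mode_(Mode::ReadOnly) {}

    // Stands in for a global synch point: the only place the mode changes.
    void syncAndSetMode(Mode next) {
        writer_.assign(writer_.size(), kNoWriter);
        mode_ = next;
    }

    double read(std::size_t i) const {
        require(mode_ == Mode::ReadOnly, "read outside ReadOnly mode");
        return data_.at(i);
    }

    // Assumed semantics: each element has at most one writer per phase.
    void write(std::size_t i, double value, int worker) {
        require(mode_ == Mode::WriteExclusive, "write outside WriteExclusive mode");
        assert(worker >= 0);
        int& owner = writer_.at(i);
        require(owner == kNoWriter || owner == worker,
                "element already has another writer");
        owner = worker;
        data_.at(i) = value;
    }

    // Any worker may contribute; contributions are summed.
    void accumulate(std::size_t i, double value) {
        require(mode_ == Mode::Accumulate, "accumulate outside Accumulate mode");
        data_.at(i) += value;
    }

private:
    static void require(bool ok, const char* what) {
        if (!ok) throw std::logic_error(what);
    }

    std::vector<double> data_;
    std::vector<int> writer_;
    Mode mode_;
};
```

Because the mode only changes in `syncAndSetMode`, every access between two synch points is known to be of one kind, which is what makes the sharing "disciplined".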