BSC-Microsoft Research Centre Overview Osman Unsal Barcelona 26.March.2010


Page 1: BSC-Microsoft Research Center Overview

BSC-Microsoft Research Centre Overview

Osman Unsal

Barcelona 26.March.2010

Page 2: BSC-Microsoft Research Center Overview

BSC Departments

• Computer Science
• Computer Architecture
• Life Sciences
• Earth Sciences
• Computer Applications

Page 3: BSC-Microsoft Research Center Overview

Barcelona Supercomputing Center

• Mission
  – Investigate, develop and manage technology to facilitate the advancement of science.
• Objectives
  – Operate national supercomputing facility
  – R&D in Supercomputing and Computer Architecture
  – Collaborate in R&D e-Science
• Consortium
  – the Spanish Government (MEC)
  – the Catalonian Government (DURSI)
  – the Technical University of Catalonia (UPC)

Page 4: BSC-Microsoft Research Center Overview

Location

Page 5: BSC-Microsoft Research Center Overview

Private/Public Partnership in Research

• Two main models for company/university lablets
  – Closed
  – Open
• Finding the appropriate balance is key for success

[Diagram: spectrum from closed to open, with a Company/University pair at each end; annotations in slide order:]
• Students highly motivated
• Generated IP is private
• Higher visibility for researchers
• Requires more effort from company

Page 6: BSC-Microsoft Research Center Overview

BSCMSRC experience (I)

• Open model
  – No patents, public IP, papers and open source main focus
    • Making positive impact in the community
    • Contributing to standard setting bodies
  – Participation of Microsoft Cambridge Senior Researchers
  – Similar research agenda with parallel computing centers in US
    • Berkeley
    • Illinois
    • Rice
    • Stanford

Combining efforts is absolutely necessary

Page 7: BSC-Microsoft Research Center Overview

BSCMSRC experience (II)

• Internships/stays:
  – UPC students exposed to Microsoft Research through internships
  – Future extension: longer stays of Cambridge researchers in Barcelona
• Collaboration between BSCMSRC and other Microsoft institutes
  – Cosbi in planning phase
• BSC coordinating the 9-partner, 5M Euro EC VELOX project
  – Microsoft Cambridge acted as a catalyst in introducing potential partners
  – Microsoft is an observer to the project (as are Intel and SUN)

Page 8: BSC-Microsoft Research Center Overview

BSCMSR Centre: general overview

• Microsoft-BSC joint project
  – Combining research expertise
    • Computer Architectures for Parallel Paradigms at BSC: computer architecture
    • Microsoft Research Cambridge: programming systems
  – Kickoff meeting in April 2006
  – Total Effort:
    • 2 Senior BSC researchers, 23 PhD students
  – Support and part-time involvement of senior researchers and faculty from UPC
• BSCMSRC inaugurated January 2008

Page 9: BSC-Microsoft Research Center Overview

BSCMSR Centre: research approach

• Processor design historically bottom-up:
  – Build the chip and let the software guys sort it out
  – Difficult to defend in the era of multi- and many-core
• We prefer a top-down approach:
  – Let software developments guide hardware design
  – “Coger el toro por los cuernos” (take the bull by the horns)
  – Utilize feedback from the OS, compiler and programming language experts about the features they would like to see in future processors

Page 10: BSC-Microsoft Research Center Overview

The motivation

“It is better for Intel to get involved in this now so when we get to the point of having 10s and 100s of cores we will have the answers. There is a lot of architecture work to do to release the potential, and we will not bring these products to market until we have good solutions to the programming problem.”
Justin Rattner, Intel CTO

“Now, the grains inside these machines more and more will be multi-core type devices, and so the idea of parallelization won't just be at the individual chip level, even inside that chip we need to explore new techniques like transactional memory that will allow us to get the full benefit of all those transistors and map that into higher and higher performance.”
Bill Gates, Supercomputing 05 keynote

Marenostrum
• Most beautiful supercomputer (Fortune magazine, Sept. 2006)
• #1 in Europe, #5 in the World
• 100's of TeraFlops with general purpose Linux supercluster of commodity PowerPC-based Blade Servers

Source: Intel
• 80 processors in a die of 300 square mm
• Terabytes per second of memory bandwidth
• Note: The Teraflops barrier was first broken by Intel in 1996 using 10,000 Pentium Pro processors contained in more than 85 cabinets occupying 200 square meters
• This will be possible in 5 years from now

TOP-DOWN COMPUTER ARCHITECTURE
• Top-down architecture includes, as design drivers:
  – Application
  – Debugging
  – Programming models
  – Programming languages
  – Compilers
  – Operating Systems
  – Runtime environment

Page 11: BSC-Microsoft Research Center Overview

BSC Contributors

Name | Position | Country | Dedication
Vladimir Gajinov | PhD Student | Serbia | Full-time
Gokcen Kestor | PhD Student | Turkey | Full-time
Azam Seyedi | PhD Student | Iran | Full-time
Nikola Markovic | PhD Student | Serbia | Full-time
Ibrahim Hur | Doctor | Turkey/USA | Full-time
Adrian Cristal | Doctor | Argentina | Full-time
Osman Unsal | Doctor | Turkey | Full-time
Eduard Ayguade | Professor | Spain | Part-time
Sutirtha Sanyal | PhD Student | India | Full-time
Nehir Sonmez | PhD Student | Turkey | Full-time
Sasa Tomic | PhD Student | Serbia | Full-time
Ferad Zyulkyarov | PhD Student | Bulgaria | Full-time
Srdjan Stipic | PhD Student | Serbia | Full-time
Enrique Vallejo | PhD Student | Spain | Part-time
Mateo Valero | Professor | Spain | Part-time
Milovan Djuric | PhD Student | Serbia | Full-time
Timothy Hayes | PhD Student | Ireland | Full-time
Ivan Ratkovic | PhD Student | Serbia | Full-time
Nikola Bezanic | PhD Student | Serbia | Full-time
Milan Stanic | PhD Student | Serbia | Full-time
Oscar Palomar | PhD Student | Spain | Full-time
Vasilis Karakostas | M.S. Student | Greece | Full-time
Gulay Yalcin | PhD Student | Turkey | Full-time
Vesna Smiljovic | PhD Student | Serbia | Full-time
Nebosja Miletic | PhD Student | Serbia | Full-time
Oriol Arcas | M.S. Student | Spain | Intern
Adria Armejach | PhD Student | Spain | Full-time
Otto Pflucker | M.S. Student | Spain | Intern

Page 12: BSC-Microsoft Research Center Overview

Microsoft-BSC Project: A quick snapshot

• Project started in April 2006

• Two-year initial duration

• Initial topic: Transactional Memory

• BSC – Microsoft Research Centre inaugurated January 2008

Page 13: BSC-Microsoft Research Center Overview

Transactions vs. Locks

Transactions
• Non-blocking synchronization
• Deadlock free
• Composable
• Easy to program
• Efficiency of fine-grain locks

Locks
• Blocking synchronization
• Deadlock risk
• Non-composable
• Coarse-grain locks limit TLP
• Fine-grain locks are difficult to program
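The "fine-grain locks are difficult to program" point can be illustrated with a minimal Python sketch. The `Account` class and `transfer` function are hypothetical names introduced here for illustration; they are not from the slides. With one lock per account, the programmer has to impose a global lock order by hand, because two opposite transfers that each lock their source first could deadlock.

```python
import threading


class Account:
    """A bank account protected by its own fine-grain lock (illustrative)."""

    def __init__(self, ident, balance):
        self.ident = ident
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    """Move `amount` from src to dst while holding both account locks."""
    if src is dst:
        return  # nothing to do, and locking the same Lock twice would block
    # Acquire the two locks in a fixed global order (by ident). Locking
    # "src first, then dst" instead risks deadlock when transfer(a, b)
    # and transfer(b, a) run concurrently.
    first, second = sorted((src, dst), key=lambda acc: acc.ident)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount
```

Every function that touches two accounts has to know and respect this ordering rule, which is why lock-based code does not compose easily. With an atomic block, the body of `transfer` would be written without any ordering discipline.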


Page 15: BSC-Microsoft Research Center Overview

What is Transactional Memory (TM) ?

• TM optimistically runs transactions in parallel, in the hope that they do not perform conflicting memory accesses.

• If the transactions do not conflict then the optimism has paid off.

• If transactions do attempt conflicting accesses, then the TM must delay or abort the work of one or the other.

• The partial effects of aborted transactions are "rolled back" before they are re-executed.
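The optimistic run / conflict / re-execute cycle described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a real TM system: it covers a single shared location, and `VersionedCell` and `atomically` are hypothetical names introduced only for this example.

```python
import threading


class VersionedCell:
    """A shared memory location with a version counter (hypothetical helper)."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self.lock = threading.Lock()  # held only during the short commit step


def atomically(cell, update):
    """Run `update` optimistically and re-execute it after a conflict.

    Returns the number of attempts that were needed.
    """
    attempts = 0
    while True:
        attempts += 1
        seen_version = cell.version     # snapshot the version first
        new_value = update(cell.value)  # speculative work, no lock held
        with cell.lock:
            if cell.version == seen_version:
                # No conflicting commit happened: the optimism paid off.
                cell.value = new_value
                cell.version += 1
                return attempts
        # Conflict: the speculative result is discarded and the work retried.
```

For example, several threads that each call `atomically(cell, lambda v: v + 1)` never lose an increment: an update computed from a stale value fails the version check and is simply re-executed.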

Page 16: BSC-Microsoft Research Center Overview

Transaction Execution

• The TM system lets different threads execute the atomic regions speculatively.
• The TM system guarantees:
  – Atomicity – all tentative memory changes become visible to the other threads simultaneously at the time when a transaction commits.
  – Isolation – while a transaction executes, its tentative memory updates are not visible to the other threads.

Page 17: BSC-Microsoft Research Center Overview

Readset / Writeset

• Readset: The set of all the distinct memory locations read by a transaction

• Writeset: The set of all the distinct memory locations written by a transaction
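As a rough illustration of these two definitions, the sketch below records the read set and write set of a transaction as it runs. The `Transaction` class and the dictionary standing in for memory are assumptions made for this example. The two transactions mirror the pair of atomic blocks shown on the slides: one reads a and b and writes a, the other reads a and writes c.

```python
class Transaction:
    """Records which locations a transaction reads and writes (illustrative)."""

    def __init__(self, memory):
        self.memory = memory      # shared memory modelled as a dict
        self.read_set = set()     # distinct locations read
        self.write_set = set()    # distinct locations written
        self.pending = {}         # tentative values, not yet in memory

    def read(self, location):
        self.read_set.add(location)
        if location in self.pending:      # see our own tentative write
            return self.pending[location]
        return self.memory[location]

    def write(self, location, value):
        self.write_set.add(location)
        self.pending[location] = value


memory = {"a": 1, "b": 2, "c": 3}

t1 = Transaction(memory)          # first atomic block
t1.read("a")
t1.read("b")
t1.write("a", 10)

t2 = Transaction(memory)          # second atomic block
t2.read("a")
t2.write("c", 30)
```

Because both are sets, a location that is accessed several times still appears only once.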

Page 18: BSC-Microsoft Research Center Overview

Transaction 1:
atomic {
  …
  read (a);
  read (b);
  …
  write (a);
}

Transaction 2:
atomic {
  …
  read (a);
  write (c);
  …
}

Page 19: BSC-Microsoft Research Center Overview

Versioning and Conflict Resolution

• Basic implementation requirements
  – Data versioning
  – Conflict detection & resolution
• Conflicts happen if
  – One transaction (attempts to) read a data item while another one tries to write to the item
  – At least two transactions (attempt to) write a data item
• If a conflict is detected, one of the transactions could be aborted
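The two conflict conditions above reduce to set intersections over read sets and write sets. The function below is a minimal sketch of that rule; the name `conflicting_locations` is hypothetical, and the sketch assumes the two transactions overlap in time.

```python
def conflicting_locations(reads_1, writes_1, reads_2, writes_2):
    """Return the locations on which two concurrent transactions conflict."""
    # Read-write: one transaction reads an item the other writes.
    read_write = (reads_1 & writes_2) | (reads_2 & writes_1)
    # Write-write: both transactions write the same item.
    write_write = writes_1 & writes_2
    return read_write | write_write


# The two atomic blocks from the slides:
# transaction 1 reads a and b and writes a; transaction 2 reads a and writes c.
example = conflicting_locations({"a", "b"}, {"a"}, {"a"}, {"c"})
```

Here `example` is `{"a"}`: transaction 2 reads a while transaction 1 writes it. Two transactions that only read the same item do not conflict.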


Page 21: BSC-Microsoft Research Center Overview

Conflict Detection and Resolution

• Detect and handle conflicts between transactions
  – Read-Write and (often) Write-Write conflicts
  – For detection, a transaction tracks its read-set and write-set

1. Eager detection
• Check for conflicts during loads or stores
  – HW: check through coherence lookups
  – SW: check through locks and/or version numbers
• Use contention manager to decide to stall or abort
  – Various priority policies to handle common case fast

2. Lazy detection
• Detect conflicts when a transaction attempts to commit
  – HW: write-set of committing transaction compared to read-set of others (committing transaction succeeds; others may abort)
  – SW: validate write-set and read-set using locks and version numbers
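The difference between the two detection policies is mainly when the check runs. The sketch below models both over plain read/write sets; `Tx`, `eager_conflicts` and `lazy_commit` are hypothetical names. Coherence lookups, locks, version numbers and the contention manager are deliberately left out.

```python
class Tx:
    """Bookkeeping for one transaction (hypothetical helper, not a real TM API)."""

    def __init__(self, name):
        self.name = name
        self.read_set = set()
        self.write_set = set()
        self.aborted = False

    def record(self, location, is_write):
        (self.write_set if is_write else self.read_set).add(location)


def eager_conflicts(tx, location, is_write, active):
    """Eager detection: run on every load or store, before it is performed.

    Returns the transactions this access conflicts with; a contention
    manager (not modelled) would then decide who stalls or aborts.
    """
    rivals = []
    for other in active:
        if other is tx or other.aborted:
            continue
        if location in other.write_set:                # they wrote it
            rivals.append(other)
        elif is_write and location in other.read_set:  # they read it, we write
            rivals.append(other)
    return rivals


def lazy_commit(tx, active):
    """Lazy detection: run once, when tx attempts to commit.

    The committer's write-set is compared with the read-sets of the others;
    the committer succeeds and the conflicting transactions are aborted.
    """
    victims = [other for other in active
               if other is not tx and not other.aborted
               and tx.write_set & other.read_set]
    for other in victims:
        other.aborted = True
    return victims
```

With the two atomic blocks from the slides, eager detection flags the conflict at the moment transaction 1 stores to a. Lazy detection lets both run to the end and aborts transaction 2 only if transaction 1 commits first.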


Page 23: BSC-Microsoft Research Center Overview

Versioning

• Manage uncommitted (new) and committed (old) versions of data for concurrent transactions

1. Eager (undo-log based)
• Update memory location directly; maintain undo info in a log
+ Faster commit, direct reads (SW)
– Slower aborts, no fault tolerance, weak atomicity (SW)

2. Lazy (write-buffer based)
• Buffer writes until commit; update memory location on commit
+ Faster abort, fault tolerance, strong atomicity (SW)
– Slower commits, indirect reads (SW)
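The trade-off between the two versioning schemes shows up directly in where the work lands on commit and abort. The following single-threaded sketch assumes memory is a plain dictionary; `EagerTx` and `LazyTx` are hypothetical names, and conflict detection is omitted.

```python
class EagerTx:
    """Eager versioning: write in place, keep old values in an undo log."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []                 # (location, old value), in write order

    def write(self, location, value):
        self.undo_log.append((location, self.memory[location]))
        self.memory[location] = value      # update the memory location directly

    def read(self, location):
        return self.memory[location]       # direct read

    def commit(self):
        self.undo_log.clear()              # fast: new values are already in place

    def abort(self):
        # Slow: restore old values one by one, newest write first.
        for location, old_value in reversed(self.undo_log):
            self.memory[location] = old_value
        self.undo_log.clear()


class LazyTx:
    """Lazy versioning: buffer writes, publish them only at commit."""

    def __init__(self, memory):
        self.memory = memory
        self.write_buffer = {}

    def write(self, location, value):
        self.write_buffer[location] = value

    def read(self, location):
        # Indirect read: the transaction must see its own buffered writes.
        if location in self.write_buffer:
            return self.write_buffer[location]
        return self.memory[location]

    def commit(self):
        self.memory.update(self.write_buffer)  # slower: copy buffer to memory
        self.write_buffer.clear()

    def abort(self):
        self.write_buffer.clear()          # fast: memory was never touched
```

In the eager variant an uncommitted value is already sitting in memory, which relates to the weaker isolation noted on the slide. In the lazy variant memory holds only committed values until the commit copies the buffer over.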


Page 25: BSC-Microsoft Research Center Overview

Transactional Memory

[Diagram: research areas shared by MSRC and BSC: Applications; Programming model (Functional, Imperative); Transactional Memory (STM, HTM); Architecture]

• Advent of multicore chips
• Parallel programming commonplace
• Simplicity vs. wide applicability
• Full performance not the initial driving force, but the final objective

Page 26: BSC-Microsoft Research Center Overview

Research on TM

[Diagram: research areas shared by MSRC and BSC: Applications; Programming model (Functional, Imperative); Transactional Memory (STM, HTM); Architecture]

• Haskell Transactional Benchmark
• RMS applications
• Quake-TM
• Atomic-Quake
• Wormbench

Page 27: BSC-Microsoft Research Center Overview

Research on TM

[Diagram: research areas shared by MSRC and BSC: Applications; Programming model (Functional, Imperative); Transactional Memory (STM, HTM); Architecture]

• Haskell STM
• OpenMP + TM

Page 28: BSC-Microsoft Research Center Overview

Research on TM

[Diagram: research areas shared by MSRC and BSC: Applications; Programming model (Functional, Imperative); Transactional Memory (STM, HTM); Architecture]

• HW acceleration of STM
• Simulators
• Dynamic filtering
• EazyHTM

Page 29: BSC-Microsoft Research Center Overview

Selected Publications on TM

F. Zyulkyarov, T. Harris, O. Unsal, A. Cristal, M. Valero, “Debugging Programs that use Atomic Blocks and Transactional Memory”, 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’10), January 2010

S. Tomić, C. Perfumo, C. Kulkarni, A. Armejach, A. Cristal, O. Unsal, T. Harris, M. Valero, “EazyHTM, Eager-Lazy Hardware Transactional Memory”, 42nd International Symposium on Microarchitecture, Micro'09, December 2009

V. Gajinov, F. Zyulkyarov, A. Cristal, O. Unsal, T. Harris, M. Valero, “QuakeTM: Parallelizing a Complex Serial Application Using Transactional Memory”, 23rd International Conference on Supercomputing (ICS 2009), June 2009.

N. Sonmez, A. Cristal, T. Harris, O. Unsal, M. Valero, “Taking the heat off transactions: dynamic selection of pessimistic concurrency control”, 23rd International Parallel and Distributed Processing Symposium (IPDPS 2009), May 2009

F. Zyulkyarov, V. Gajinov, O. Unsal, A. Cristal, E. Ayguade, T. Harris, M. Valero, “Atomic Quake: Use Case of Transactional Memory in an Interactive Multiplayer Game Server”, 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), February 2009.

C. Perfumo, N. Sonmez, S. Stipic, O. Unsal, A. Cristal, T. Harris, M. Valero, “The Limits of Software Transactional Memory (STM): Dissecting Haskell STM Applications on a Many-Core Environment”, ACM International Conference on Computing Frontiers, May 2008

E. Vallejo, T. Harris, A. Cristal, O. Unsal, M. Valero, “Hybrid Transactional Memory to Accelerate Safe Lock-based Transactions”, Third ACM SIGPLAN Workshop on Transactional Computing TRANSACT, February 2008

M. Milovanović, R. Ferrer, O. Unsal, A. Cristal, X. Martorell, E. Ayguadé, J. Labarta and M. Valero, “Transactional Memory and OpenMP”, International Journal of Parallel Programming.

C. Perfumo, N. Sonmez, O. Unsal, A. Cristal, M. Valero, T. Harris, “Dissecting Transactional Executions in Haskell”, Second ACM Workshop on Transactional Computing TRANSACT, August 2007

T. Harris, A. Cristal, O. Unsal, E. Ayguade, F. Gagliardi, B. Smith, M. Valero, “Transactional Memory: An Overview”, IEEE Micro, May-June 2007

Page 30: BSC-Microsoft Research Center Overview

FP7 TM-Project VELOX

• Goal: Unified TM system stack
• Duration: 2008-2011
• Budget: 5M Euros
• BSC involvement initiated through BSC-MSR Centre
• Project participants:
  – BSC
  – EPFL
  – University of Neuchatel
  – Technical University of Dresden
  – Tel Aviv University
  – Chalmers University of Technology
  – AMD
  – Red Hat
  – VirtualLogix
• Project observers:
  – Microsoft
  – Intel
  – SUN

Page 31: BSC-Microsoft Research Center Overview

Empowering TM across the system stack

[Diagram: the VELOX system stack, top to bottom; the figure labels its components 1 to 7 and A to C]

• TM-based Applications and Benchmarks (Legacy software migration, TM-aware development, etc.)
• Application Environments (Application servers, event processing engine, etc.)
• Programming Language (Java/C/C++ integration, exception handling, etc.)
• Compiler (Code generation, bytecode instrumentation, etc.)
• TM Libraries (Word-based, object-based, PLS, TBB, etc.)
• Native Libraries (System libraries)
• Operating System (Low-level TM support)
• TM Hardware and Adaptation Layer (Multi-core CPUs, HW+SW transactions, etc.)

Spanning all layers:
• Integration (Adaptive TM, hybrid approaches, etc.)
• Theory (Contention, isolation, algorithms, etc.)

http://www.velox-project.eu/

Page 32: BSC-Microsoft Research Center Overview

Thank you!

http://www.bsc.es

http://www.bscmsrc.eu

http://www.velox-project.eu