HiPEACinfo 1

Network of Excellence on High Performance Embedded Architectures and Compilers

“FET – Proactive Initiative in Advanced Computing Architectures: Advanced Compiler Technologies and Processor Architectures

Deadline: 22 March 2005, http://www.cordis.lu/ist/fet/aca.htm”

Message from the HiPEAC coordinator

Message from the project officer

HiPEAC in a nutshell

A presentation of HiPEAC Sweden

Report from the HiPEAC Kickoff meeting

Modular simulation with MicroLib

FIT instrumentation framework

Steering Committee News

Current list of members

PhD news

Upcoming events

www.hipeac.net

appears quarterly | January 2005

intro

Message from the HiPEAC coordinator

Dear Colleagues,

Welcome to the first edition of the HiPEAC newsletter. As coordinator of the European Network of Excellence on High Performance Embedded Architectures and Compilers, I first wish to congratulate all HiPEAC members on achieving what is the first step in building up the opportunities for continued basic research in Europe on compilers, processors, and new architectures for high-performance applications. To non-HiPEAC members, I am happy to inform you of the creation of this network, which gathers more than 70 top European researchers from 39 academic institutions and 13 companies.

Our field is currently witnessing drastic changes in many areas: in architectures, with the emergence of alternatives to the ILP-based, VLIW or superscalar processors; in compilers, with the emergence of excessively complex architectures which require new program optimization approaches; in technology, where Moore’s law may be slowing down in the coming years, forcing us either to find new ways to scale up architecture performance or even to adapt to new technologies; and even in applications, where we are witnessing an increased overlap of the high-performance computing and embedded domains, with new devices belonging to both.

In these changing times, the role of researchers, from both academia and industry, will be key to enable architecture performance to keep scaling up. They are charged with the task of quickly identifying the most promising alternatives and the corresponding core issues. The need for innovation in all domains makes it critical to improve interactions between researchers from academia and industry, to spread information quickly among all researchers, and to entice researchers to gather and coordinate in order to increase their impact and efficiency and to take advantage of their complementary skills.

The role of HiPEAC

HiPEAC has set ambitious scientific and organizational goals. We want HiPEAC to act as a steering partner for researchers, outlining key fundamental or industry-relevant issues, whilst at the same time empowering researchers with the ability and freedom to quickly develop new approaches or highlight new issues, to inform the whole community and to start working or collaborating thereon. For that purpose, together with renowned non-European researchers from academia and industry, HiPEAC will draft a yearly roadmap of how it sees architectures and compilers evolving, detailing the key issues and the most promising approaches. At the same time, HiPEAC will provide seed funding to researchers in a quick and flexible manner for pursuing promising research directions and setting up collaborations within Europe.

In addition, we will also develop common simulation and compilation platforms that will serve as backbones for our research and help disseminate it to European industry and outside of Europe. HiPEAC will also set up multiple activities to increase the interactions and visibility of our community, such as online seminars, a summer school, this newsletter, a new conference and journal, and a common computing platform, amongst others.

The role of HiPEAC is also to act as an entry point to and for researchers: to interact with companies within and outside Europe, to set up relationships with academic groups outside Europe,

Mateo Valero, Coordinator, UPC [email protected]

Message from the project officer

My name is Mercè Griera i Fisa and I have the pleasure of being the project officer in charge of HiPEAC. My background is in computer science and I work in the Information Society Directorate-General of the European Commission, specifically in the unit responsible for Embedded Systems.

Recently, I read in a paper that the European university system (specifically, the EU-15) is delivering 0.56 PhDs in Science and Technology per thousand inhabitants. The US system is delivering 0.48 and the Japanese only 0.24. However, Europe is lagging behind the US and Japan in the number of researchers per thousand employees and in the number of highly cited papers. In my view, this clearly shows that Europe has great difficulty in keeping its best researchers and that Europe is not attractive enough to foreign talent.

The aim of HiPEAC is to create a virtual centre of excellence in high-performance compilers and architectures for embedded processors. The centre will gather the world’s largest critical mass of researchers, generate leading-edge results and offer high-level discussion forums to become a world reference point in the field.

Mercè Griera-I-Fisa ([email protected])

especially in the US (NSF) and Asia, and to make major funding institutions like the European Commission aware of the potential and needs of our field. On this last point, we recently scored our first success with the opening of a FET (Future and Emerging Technologies) Call on Advanced Computing Architectures in November 2004.

To achieve many of these goals, we need to improve, if not create, communication channels within our research community, and between our community and all our potential partners. The newsletter you now have in your hands will be one of our key means of disseminating information. We also want this newsletter to become a two-way communication channel. It is not only meant to inform you of the latest HiPEAC news and events, but also to encourage you to actively participate in future issues, by sharing important news and information from your institution with our community. We strongly encourage you, whether you belong to HiPEAC or not, to take advantage of this medium and submit to it.

To HiPEAC members, I want to welcome you again to the network, and I am sure that with your help, we can make HiPEAC one of the most important tools for our scientific community in the next few years. To non-HiPEAC members, I want to thank you for your interest in HiPEAC, and I hope we will have the pleasure of working together in the near future.

What are the core objectives of HiPEAC?

• to create a visible and integrated community of researchers; to establish tight relationships with European industry;
• to steer research efforts towards important scientific issues, either fundamental or industry-relevant, in the domain of high performance computer architecture and compilers; to make the community more reactive to novel issues and approaches, and to coordinate its efforts;
• to stimulate cooperation between computer architects and compiler builders, because increasingly complex architectures require increasingly complex compilers, and so designing both in conjunction becomes crucial to achieving good sustained performance that scales with technology.

HiPEAC’s management structure

HiPEAC in a nutshell

The network coordinator is UPC (Spain): Mateo Valero, UPC, Barcelona, Spain

The management of HiPEAC is carried out by the steering committee, consisting of the following institutions:
• Koen De Bosschere, Ghent University (Belgium)
• Olivier Temam, INRIA, Paris (France)
• Theo Ungerer, University of Augsburg (Germany)
• Manolis Katevenis, FORTH (Greece)
• Antonio Prete, University of Pisa (Italy)
• Josep Llosa, UPC (Spain)
• Per Stenström, Chalmers University (Sweden)
• Stamatis Vassiliadis, TU Delft (The Netherlands)
• Michael O’Boyle, Edinburgh University (UK)

The industrial advisory board consists of the following companies:
• ARM
• IBM Haifa
• Infineon
• Kayser Italia
• Virtutech
• STMicroelectronics

HiPEAC will bring together researchers on architectures with researchers on compilers from both academia and industry. Innovative research results will therefore be fed into industry at the earliest stages, helping to maintain European industry leadership in this highly competitive market.

I believe HiPEAC will help create innovation in industry and will offer talented researchers from Europe and abroad the new opportunities Europe needs to improve the “figures” in my readings. I look forward to contributing to HiPEAC’s progress.

HiPEAC members throughout Europe

HiPEAC stands for High Performance Embedded Architectures and Compilers. It is a Network of Excellence, funded by the 6th European Framework Programme (FP6), within the Information Society Technologies (IST) Priority. It started on September 1st, 2004 and will last for four years. HiPEAC addresses the design and implementation of high-performance commodity computing devices on the 10+ year horizon, covering both the processor design and the optimizing compiler infrastructure.

HiPEAC Partner

A presentation of HiPEAC Sweden

Sweden has a vibrant industry that relies on information technology in the embedded systems area. For example, in the field of telecommunications, work is focused on systems ranging from highly available telecommunication servers to handheld mobile clients. Design tradeoffs typically involve meeting challenging demands on availability, performance, and energy consumption. Another example is Swedish companies engaged in vehicular technology ranging from cars through aeroplanes to satellites. Embedded computer systems in these applications must be trustworthy because human life or enormous economic investments are at risk. HiPEAC’s focus on hardware/software tradeoffs in high-performance embedded computer architectures is of strategic importance for these and other industry segments.

Sweden has traditionally been proactive in recognizing the importance of information technology. In the late 1940s, the government sent a delegation to transfer knowledge and experience from the ENIAC project. Erik Stemme, today professor emeritus at Chalmers University of Technology, had the privilege of working with John von Neumann amongst others. After his return, he built the first Swedish computer, called BESK, which was deployed in 1953 and was then the world’s fastest computer, at least for a couple of months, with a performance of 18,000 additions per second. Another strategic action was a government program to systematically strengthen research competence by launching a large number of projects in computer science and engineering.

Two projects on multiprocessor technology were launched in the 1980s at Lund University and at the Swedish Institute of Computer Science (SICS) with the aim of advancing the state of the art in design principles, ranging from the implementation of parallel languages via operating systems to computer architecture. On the architecture side, one emphasis was on the design of scalable memory systems, a prerequisite for multiprocessor technology. At Lund University, one of the first experimental shared-memory multiprocessors in Europe was already in operation in 1981, based on microprocessor technology. Moreover, a parallel implementation of a high-level language was deployed, permitting experiments ranging all the way from the application to the architectural level. Per Stenström and Mats Brorsson, two of the principal Swedish members of HiPEAC, who were at the time Ph.D. students, were engaged in the design of the second generation of the system, which embodied a 38-node non-uniform-memory-access (NUMA) machine that came into operation in 1987. Erik Hagersten, the third principal Swedish member of HiPEAC, was at that time involved in the implementation of the Data Diffusion Machine (DDM) at SICS. DDM pushed cache principles to their fullest extent by replicating data across the processing nodes in a concept they named Cache Only Memory Architecture, or COMA for short. Work on the SimICS simulator, which has now been spun off as Virtutech, an industrial member of HiPEAC, is another example of results from the SICS research. In retrospect, scalable multiprocessing was a vision that today has manifested itself as a new breed of systems known as chip multiprocessors.

Per Stenström leads the High-Performance Computer Architecture group at Chalmers. His main research is oriented towards understanding how we can design high-performance systems that are cost-effective by investigating hardware/software tradeoffs. To this end, he has contributed to scalable multiprocessor systems, e.g., scalable cache coherence solutions, latency tolerance techniques, performance evaluation methodologies, and compiler optimization techniques. One of his current directions is to explore design principles for chip-multiprocessor technology with the goal of achieving high performance, low power consumption, ease of programming, and low design complexity.

Mats Brorsson leads the research group at KTH, the Royal Institute of Technology in Stockholm. During the last few years the group’s activities have been in programming models for shared memory parallel computing, in particular related to OpenMP, and in low-power computer architecture for embedded processors. A major contribution in the former research area has been an open source

What are the seven HiPEAC instruments?

1. To steer research, we set up Research Clusters, which are groups of individual researchers from academia and industry with a common topic of interest who gather in a flexible way to carry out joint architecture/compiler research. The researchers will also have access to common computing equipment, joint seminars, etc.

2. HiPEAC will define and promote an open common simulation platform and an open common compiler platform for use within the network and beyond. This will act as a backbone for our joint research activities, and will greatly facilitate the exchange of research results. It is our aim to make these platforms, in the long term, the de facto standard in Europe and abroad.

3. HiPEAC will publish a roadmap on high performance computer architecture and compilers, as well as on long-term issues (new paradigms and new technologies) and methodology issues (simulators, compilers). The purpose of the roadmap is twofold: to highlight industry issues and fundamental scientific issues. The first version of the roadmap is due in 2005.

research platform for OpenMP implementations for clusters of multiprocessors. In the latter area they are looking at issues such as scheduling principles for adaptive chip-multiprocessors, low-energy fetch engine improvements, and design-space exploration of chip-multiprocessors. In cooperation with ARM, they are using the OpenMP implementation as the primary vehicle for parallel execution on embedded CMPs.

Erik Hagersten returned to Sweden from Sun Microsystems six years ago and started the Uppsala Architecture Research Team (UART). Examples of their research results include the efficient new tool StatCache, which guides an application-writer toward more locality-aware algorithms, a crucial consideration for CMP architectures. Other examples are bundled prefetching and the Elbow cache, techniques to lower power consumption caused by snooping and cache lookups in a CMP, and DSZOOM, a software technique that can tie several CMPs together in a shared-memory manner while requiring little or no hardware support.

Report from the HiPEAC kickoff meeting

“The European Commission has high expectations of the HiPEAC network”

The kickoff meeting took place in Antibes Juan-les-Pins on September 30, 2004, with 66 registered participants. The meeting started with an introduction by the project officer Mercè Griera-I-Fisa explaining the objectives of the networks of excellence, and making clear that she has high expectations of the HiPEAC network. Her introduction was followed by a presentation about HiPEAC by Mateo Valero, the network coordinator, who explained in detail the different work packages. During the remainder of the day, the different member countries and companies gave an overview of their activities and the topics on which they would like to collaborate. After the kickoff meeting there was a first workshop on the simulation platform, which was also attended by numerous HiPEAC researchers.

Steering committee and industrial advisory board. From left to right: Theo Ungerer (U. Augsburg), Krisztian Flautner (ARM), Michael O’Boyle (U. Edinburgh), Mercè Griera-I-Fisa (EU Project officer), Stamatis Vassiliadis (TU Delft), Bilha Mendelsohn (IBM Israel), Manolis Katevenis (FORTH), Koen De Bosschere (Ghent U.), Pilar Armas (UPC), Erik Norden (Infineon), Mateo Valero (UPC), Marco Cornero (STM), Antonio Prete (U. Pisa), Jakob Engblom (Virtutech), Per Stenström (U. Chalmers), Olivier Temam (INRIA), Josep Llosa (UPC).

4. HiPEAC will start a conference and a journal on architecture and compilers. The aim is to disseminate research results and to increase the visibility of the community. The conference and the journal will start in 2005, and will be open to everybody.

5. HiPEAC will organize a yearly summer school on advanced computer architectures and compilers for embedded systems. This yearly event will grow to become one of the major events of the network. World-class experts will teach during the summer school and the participants will be able to discuss their research with one another. The first summer school will take place in July 2005 and it will be open to everybody.

6. HiPEAC will publish a quarterly newsletter with information about the network, targeted at everybody interested in the HiPEAC activities. The aim of the newsletter is to increase the visibility of the HiPEAC network inside and outside Europe and to improve the communication among the researchers of our community.

7. The network includes an administrative and technical staff that supports its day-to-day operation. The website is the main communication channel between the network partners. It can be visited at http://www.HiPEAC.net.

Chalmers University of Technology
Professor Per Stenström (partner)
http://www.ce.chalmers.se/~pers

Uppsala University
Professor Erik Hagersten (member)
http://www.it.uu.se/research/group/uart

Royal Institute of Technology
Professor Mats Brorsson (member)
http://www.it.kth.se/~matsbror

in the spotlights

Processor architecture simulation is the key method used by micro-architecture researchers for evaluating the performance and usefulness of new architecture ideas. Micro-architecture simulation often consists of writing very large software programs describing the detailed behaviour of processors, i.e., what is happening in the processor at every clock cycle, the architecture performance being often measured in numbers of cycles. While these simulators are not as detailed as circuit-level models, the software complexity is very much tied to the architecture complexity itself, and with high-performance processors quickly evolving from a few million transistors in the 1990s to a billion transistors or more in 2005, simulators are becoming excessively complex pieces of software, which are very time-consuming to develop and modify.

Consequently, a single research group can no longer afford to develop a whole new processor simulator as part of its normal research activities, i.e., to evaluate a few architecture ideas. As a result, most researchers rely on available models, and especially monolithic simulators, i.e., simulators built as a single piece of software describing a full processor architecture. However, a processor architecture is typically composed of several tens of components (cache, scheduler, branch predictor...), which have evolved differently over the years, and more importantly, researchers are usually specialized in just one or a few of these components. Because monolithic simulators do not reflect the modularity of processor architectures, it can be exceedingly difficult to update one or a few architecture components, or to extract a component proposed and implemented by a researcher in order to compare it with other similar components.

This situation has severe consequences for micro-architecture research: (1) researchers develop their ideas on outdated architecture models because they lack alternatives, (2) simulator development time is quickly increasing and researchers have no simple means for reusing and exploiting the development effort of other researchers, (3) researchers have no reasonably rigorous way to compare their performance results, and consequently, these results are often unverifiable and unreliable, (4) the discrepancy between the software structure and the processor structure is a source of inaccuracy, sometimes leading researchers to propose unrealistic architectures, and (5) as a result, the take-up of architecture ideas from academia by industry is hindered.

Modular simulation

In the past few years, the INRIA Alchemy group, together with CEA, University of Toulouse and University of Paris 6, has been working on an alternative approach to processor simulation called modular simulation within a project called MicroLib. The basic principle is to reflect the processor structure in the software structure, which alleviates many of the above-mentioned issues. The goal is to change the methodology in processor architecture research by encouraging architecture researchers to use modular simulation environments, and by setting up a central repository where researchers could easily exchange, reuse and compare architecture ideas through simulator components. This repository has existed since 2002 at http://www.microlib.org, and contains modules corresponding to different processor components, a few full

Ghent University makes FIT instrumentation framework available to the research community

MicroLib: Using modular simulation to facilitate the sharing, comparison and design of complex processor architectures

FIT is an ATOM-like tool for the generation of binary instrumentors. It supports back-ends for the Alpha, i386-Linux and ARM architectures under the Tru64 Unix, Linux and ARM Firmware execution environments. It is easily portable to any other Unix-based platform because it is built on top of the portable open-source executable code editing framework DIABLO. The current version of FIT is released under the GNU General Public License, so anyone is free to use, modify and redistribute it, as long as the changes are contributed back to the community.

About INRIA

INRIA is the leading French institution dedicated to computer science. It is a decentralized organization with more than 110 research projects, distributed across 6 research units in France. INRIA researchers benefit from permanent full-time research positions. The INRIA ALCHEMY project is located at the INRIA Futurs unit, south of Paris, and is focusing on novel processor architecture and compiler approaches, and the associated methodology issues. For further information, see www.inria.fr

About Ghent University

Ghent University offers high-quality, research-based education in all academic disciplines. Today Ghent University attracts over 26,000 students and employs more than 5,000 researchers. Ghent University invests an annual amount of more than 150 million euro in research projects on behalf of public and private partners. Further information can be found on www.ugent.be or www.techtransfer.ugent.be for technology transfer.

Steering committee

Pilar Armas was hired to support the steering committee.

Pilar received a degree in Electrical Engineering (1989) from Universitat Politècnica de Catalunya (UPC). She also obtained a Master’s in Business Administration (MBA) from Indiana University, USA in 1992 and a Master’s in International Relations from CEI (Barcelona, Spain) in 1997. She has previously worked for IBM and EDS Spain, where she was a project manager. She has valuable experience in the management of European projects. She will be the support coordinator for HiPEAC.

processor models, and tools for speeding up simulation. In a recent article published at MICRO in 2004, the Alchemy group showed that the lack of a methodology for easily comparing simulation results has a significant impact on research choices.
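To illustrate the modular principle described in this article (software structure mirroring processor structure), here is a minimal Python sketch. It is not MicroLib code: the class names, the `access` interface and all latencies are hypothetical, chosen only to show how components behind a common interface can be swapped and compared under the same harness.

```python
from typing import Protocol


class MemoryModule(Protocol):
    """Hypothetical common interface: every memory component returns a latency in cycles."""

    def access(self, addr: int) -> int: ...


class MainMemory:
    """Flat memory with a fixed (assumed) latency."""

    def __init__(self, latency: int = 100) -> None:
        self.latency = latency

    def access(self, addr: int) -> int:
        return self.latency


class DirectMappedCache:
    """Direct-mapped cache module that forwards misses to the next level."""

    def __init__(self, next_level: MemoryModule, sets: int = 4,
                 block: int = 16, hit_latency: int = 1) -> None:
        self.next_level = next_level
        self.sets = sets
        self.block = block
        self.hit_latency = hit_latency
        self.tags = [None] * sets  # one tag per set, None means empty

    def access(self, addr: int) -> int:
        block_addr = addr // self.block
        index = block_addr % self.sets
        tag = block_addr // self.sets
        if self.tags[index] == tag:
            return self.hit_latency
        self.tags[index] = tag  # fill the line on a miss
        return self.hit_latency + self.next_level.access(addr)


def run(trace, memory: MemoryModule) -> int:
    """Total cycles spent on memory accesses for a trace of addresses."""
    return sum(memory.access(addr) for addr in trace)


trace = [0, 4, 64, 0]
print(run(trace, MainMemory()))                     # no cache
print(run(trace, DirectMappedCache(MainMemory())))  # same harness, cache module plugged in
```

Because `run` depends only on the shared interface, a different cache module can be evaluated on the same trace without touching the rest of the simulator, which is the kind of component-level comparison the article argues for.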

This project may become part of the Common Simulation Platform activity of the HiPEAC network, headed by Per Stenström. It is also part of a collaboration with Princeton University, which has been a leading proponent of the modular simulation approach with the Liberty project, headed by David August. The rationale behind these partnerships is to establish a common modular simulation standard in the US and Europe in order to speed up adoption. To that end, the Liberty and MicroLib projects have recently agreed to move toward a joint environment and repository.

More detailed information is available at http://microlib.org

Current list of members

Baldacci, Stefano: Kayser Italia, Italy
Bartolini, Sandro: University of Siena, Italy
Bechini, Alessio: University of Pisa, Italy
Beivide, Ramon: University of Cantabria, Spain
Bernstein, David: IBM, Israel
Bilas, Angelos: University of Crete, Greece
Bodin, François: INRIA, France
Brinkschulte, Uwe: University of Karlsruhe, Germany
Brorsson, Mats: Royal Institute of Technology, Sweden
Clauss, Philippe: CNRS, France
Cohen, Albert: INRIA, France
Cornero, Marco: STMicro, Switzerland
Cotofana, Sorin: TU Delft, The Netherlands
D. Bruguera, Javier: University of Santiago de Compostela, Spain
Darte, Alain: CNRS, France
De Bosschere, Koen: Ghent University, Belgium
Deprettere, Ed: Leiden University, The Netherlands
Donati, Alessandro: Kayser Italia, Italy
Drach, Nathalie: CNRS, France
Duato, Jose: University Politecnica de Valencia, Spain
Duranton, Marc: Philips Research, The Netherlands
Eeckhout, Lieven: Ghent University, Belgium
Eisenbeis, Christine: INRIA, France
Engblom, Jakob: Virtutech, Sweden
Fanucci, Luca: University of Pisa, Italy
Feautrier, Paul: INRIA, France
Flautner, Krisztian: ARM, UK
Foglia, Pierfrancesco: University of Pisa, Italy
Furber, Steve: Manchester University, UK
Garcia, Jose Manuel: University Politecnica de Valencia, Spain
Gaydadjiev, Georgi: TU Delft, The Netherlands
Giorgi, Roberto: University of Siena, Italy
Grassmann, Cyprian: INFINEON, Germany
Hagersten, Erik: Uppsala University, Sweden
Hufeld, Knut: INFINEON, Germany
Jesshope, Chris: University of Amsterdam, The Netherlands
Karl, Wolfgang: University of Karlsruhe, Germany
Katevenis, Manolis: FORTH, Greece
Kaxiras, Stefanos: University of Patras, Greece
Kelly, Paul: Imperial College, UK
Leupers, Rainer: RWTH Aachen, Germany

continuation on page 9

FIT is flexible in that it allows users to define their own custom instrumentation routines, which can be called at instruction, basic block, function or program level. FIT allows the user to trade speed for accuracy. At the highest accuracy level, the instrumentation is slow, but 100% accurate (i.e., all addresses, handles, values and flags collected from the instrumented binary are exactly as they are in an uninstrumented execution).

More detailed information is available at http://www.elis.UGent.be/FIT

PhD news

In each newsletter we will publish a summary of recent PhD theses defended in the HiPEAC community in Europe.

Application-Specific Parallel Structures for Discrete Cosine Transform and Variable Length Decoding
by Jari Nikara, [email protected], Prof. Stamatis Vassiliadis, TU Delft, June 2004

This thesis considers the design of application-specific parallel structures for digital signal processing. Due to the breadth of the subject, the discussion has been restricted to the discrete cosine transform and variable length decoding. New area-efficient parallel structures, which process data in a sequential form at data rate, are developed for the discrete cosine transform. Comparison to a state-of-the-art design reveals up to 15% smaller estimated area than in the reference design. For variable length decoding, a novel multiple-symbol decoding scheme is proposed. The critical path of the resulting decoder is minimized by introducing a new multiplexed add unit. The performance of the decoder can be considered promising, with 16-100% better throughput at 2-3.6 times lower frequencies than the reference designs on the same FPGA technology.

Optimized Energy Consumption of a Multithreaded Processor with Real-time Capability
by Sascha Uhrig, [email protected], Prof. Theo Ungerer, University of Augsburg, February 2004

One of the key problems of modern processors is power dissipation and energy consumption. Two energy management technologies based on the multithreaded Komodo microcontroller with integrated real-time scheduling were developed. Both technologies are realized as hardware solutions and therefore do not suffer from additional software overhead. The first algorithm is an enhancement of Guaranteed Percentage real-time scheduling and the second works in combination with EDF scheduling. The evaluations showed energy savings between 23% and 47% in comparison to a comparable software-based algorithm.

Iterative Compilation and Performance Prediction for Numerical Applications
by Grigori Fursin, [email protected], Prof. Michael O’Boyle, University of Edinburgh, May 2004

This thesis presents a platform-independent optimization approach for numerical applications based on iterative feedback-directed program restructuring, using a new, reasonably fast and accurate performance prediction technique for guiding optimizations. New strategies for searching the optimization space, by means of profiling to find the best possible program variant, have been developed. These strategies have been evaluated using a range of kernels and programs on different platforms and operating systems. A significant performance improvement has been achieved using the new approaches when compared to state-of-the-art native static and platform-specific feedback-directed compilers.

The ρ-TriMedia Processor
by Mihai Sima, [email protected], Prof. Stamatis Vassiliadis, TU Delft, May 2004

An augmentation of the TriMedia-CPU64 VLIW processor with a Field-Programmable Gate Array (FPGA) is presented, and the performance of this hybrid (referred to as ρ-TriMedia) is assessed. To incorporate the FPGA into TriMedia-CPU64, an extension of the TriMedia-CPU64 instruction set architecture is proposed. Essentially, SET and EXECUTE instructions are provided. The SET instruction controls the reconfiguration of the FPGA, and the EXECUTE instruction launches the FPGA-mapped operations. The experiments carried out on a TriMedia-CPU64 cycle-accurate simulator indicate that a speed-up of more than 40% on ρ-TriMedia over the standard TriMedia-CPU64 is achieved for media-oriented tasks.

Avoiding Mapping Conflicts in Microprocessors
by Hans Vandierendonck, [email protected], Prof. Koen De Bosschere, Ghent University, January 2004

This thesis studies and proposes solutions to conflict misses in caches and aliasing in predictors. Conflicts in caches are avoided with XOR-based hash functions. Algorithms are presented to determine near-optimal hash functions. The hash functions obtained this way significantly reduce the miss rate (32% in an 8KiB direct mapped cache). This work shows that it suffices to hash n+2 address bits to compute n set index bits. Each set index bit is the XOR of at most 2 address bits. Models of inter-bank dispersion show similar properties for skewed-associative caches. Aliasing in the finite context method (FCM) value predictor is caused mostly by spreading the predictions for stride patterns over many entries in the level-2 table. The differential FCM predictor maps stride patterns to a single entry in the level-2 table, decreasing the prediction error by up to 33%.
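The XOR-based set indexing summarized in the abstract of “Avoiding Mapping Conflicts in Microprocessors” can be illustrated with a minimal Python sketch. The particular bit pairing below (the two lowest index bits each XORed with one of the two address bits just above the index field, so that n+2 address bits produce n index bits) and the 256-set cache are assumptions chosen for illustration; they are not the near-optimal functions derived in the thesis.

```python
def modulo_index(block_addr, n):
    """Conventional set index: the n low-order bits of the block address."""
    return block_addr & ((1 << n) - 1)


def xor_index(block_addr, bit_groups):
    """XOR-based set index.

    bit_groups[i] lists the one or two block-address bit positions that are
    XORed together to form set index bit i.
    """
    index = 0
    for i, positions in enumerate(bit_groups):
        if not 1 <= len(positions) <= 2:
            raise ValueError("each index bit uses at most 2 address bits")
        bit = 0
        for pos in positions:
            bit ^= (block_addr >> pos) & 1
        index |= bit << i
    return index


N = 8  # 2**8 = 256 sets (assumed cache geometry)

# Hypothetical hash: index bits 0 and 1 also mix in address bits 8 and 9,
# so N + 2 = 10 address bits are used in total.
BIT_GROUPS = [(0, 8), (1, 9)] + [(i,) for i in range(2, N)]

# Block addresses that differ only above the index field collide under
# conventional indexing.
conflicting = [k << N for k in range(4)]

print([modulo_index(a, N) for a in conflicting])      # [0, 0, 0, 0]
print([xor_index(a, BIT_GROUPS) for a in conflicting])  # [0, 1, 2, 3]
```

With conventional indexing, all four blocks compete for set 0 of a direct mapped cache; with the XOR hash they land in four different sets.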

Steering committee

Per Stenström was appointed editor in chief of the HiPEAC journal.

Per is a professor of computer engineering at Chalmers University of Technology and adjunct professor and deputy dean of the IT University of Göteborg. His research is focused on design principles for high-performance computer systems. He is an author of two textbooks and over a hundred research publications. He regularly serves on program committees of major conferences in the field of computer architecture. He has been an editor of IEEE Transactions on Computers, and is an editor of the Journal of Parallel and Distributed Computing and the IEEE TCCA Computer Architecture Letters. He has served as General as well as Program Chair of the ACM/IEEE Int. Symposium on Computer Architecture. He is a member of ACM and a senior member of the IEEE.


Software Methods to Improve Data Locality and Cache Behavior by Kristof Beyls, [email protected], Prof. Erik D’Hollander, Ghent University, June 2004.

The key to good cache behavior is good locality, which is measured by the ‘reuse distance’ metric. Based on reuse distances, two optimization strategies are followed. First, an automatic compiler optimization is presented that generates cache hints for the IA-64 architecture; these hints influence the replacement policy of the cache. This results in a speed-up of about 10% on average. Secondly, a visualization of long-distance reuses to programmers is proposed. Based on the information provided by the visualization, the locality of three SPEC2000 programs has been optimized in a platform-independent way. After optimization, these programs run three times faster on average on different platforms (Itanium, Pentium 4, Alpha).
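As an illustration of the metric, the reuse distance of an access is commonly defined as the number of distinct other addresses touched since the previous access to the same address. The sketch below is a minimal, unoptimized Python version of that definition, not the thesis tooling; the function names `reuse_distances` and `lru_hits` and the sample trace are assumptions made for this example.

```python
from collections import OrderedDict


def reuse_distances(trace):
    """Return one reuse distance per access (None for a first access).

    The distance counts the distinct other addresses seen since the
    previous access to the same address.
    """
    recency = OrderedDict()  # addresses ordered from least to most recently used
    distances = []
    for addr in trace:
        if addr in recency:
            order = list(recency)
            # Everything after addr in recency order was touched since its last use.
            distances.append(len(order) - 1 - order.index(addr))
            recency.move_to_end(addr)
        else:
            distances.append(None)
            recency[addr] = True
    return distances


def lru_hits(distances, capacity):
    """Count hits in a fully associative LRU cache holding `capacity` addresses.

    An access hits exactly when its reuse distance is smaller than the capacity.
    """
    return sum(1 for d in distances if d is not None and d < capacity)


# Hypothetical address trace, chosen only for illustration.
trace = ["a", "b", "c", "a", "b", "b"]
dists = reuse_distances(trace)
```

For this trace, a cache of two addresses captures only the short-distance reuse, while a cache of three captures all three reuses; long-distance reuses are the ones that need a program transformation to become cache hits.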

Adaptive Java optimization using machine learning by Shun Long, [email protected], Prof. Michael O’Boyle, University of Edinburgh, July 2004

This thesis presents a language- and architecture-independent approach to achieving portable high performance. It uses Pugh’s Unified Transformation Framework to specify a large optimization space. A heuristic random search algorithm is introduced to explore this space in a feedback-directed, iterative manner. It is then extended using a machine learning approach which enables the compiler to learn from its previous optimizations and apply the knowledge when necessary.

Experimental results show that the search algorithm can quickly find good points within the large space. In addition, the learning optimization approach is capable of finding good transformations for a given program from its prior experience with other similar programs.
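The general shape of a feedback-directed random search can be sketched as follows. This is a plain random search over a small invented space, not the heuristic algorithm or the transformation framework used in the thesis; `SPACE`, `toy_measure`, and `random_search` are hypothetical names, and the cost function merely stands in for compiling and timing a program.

```python
import random

# Hypothetical optimization space: each key is a transformation parameter.
SPACE = {
    "unroll": [1, 2, 4, 8],
    "tile": [0, 16, 32, 64],
    "interchange": [False, True],
}


def toy_measure(point):
    """Invented cost standing in for a measured execution time."""
    cost = abs(point["unroll"] - 4) + abs(point["tile"] - 32) / 16
    return cost + (0 if point["interchange"] else 1)


def random_search(space, measure, trials, seed=0):
    """Sample random points, measure each one, and keep the best seen so far."""
    rng = random.Random(seed)
    best_point, best_cost = None, float("inf")
    for _ in range(trials):
        point = {name: rng.choice(options) for name, options in space.items()}
        cost = measure(point)  # feedback from running the candidate
        if cost < best_cost:
            best_point, best_cost = point, cost
    return best_point, best_cost


best_point, best_cost = random_search(SPACE, toy_measure, trials=200)
```

A learning extension along the lines described above would reuse the points found good for earlier programs as starting candidates for a new, similar program instead of sampling from scratch.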

Compilation Techniques for High-Performance Embedded Systems with Multiple Processors By Björn Franke, [email protected], Prof. Michael O’Boyle, University of Edinburgh

This thesis develops an integrated optimization and parallelization strategy that can deal with low-level C codes and produces optimised parallel code for a homogeneous multi-DSP architecture with distributed physical memory and multiple logical address spaces. Performance optimization is achieved through exploitation of data locality on the one hand, and utilization of DSP-specific architectural features such as Direct Memory Access (DMA) transfers on the other hand. Source-to-source transformations of DSP codes yield an average speedup of 2.21 across four different DSP architectures. The parallelization scheme is, in conjunction with a set of locality optimizations, able to produce linear and even super-linear speedups on a number of relevant DSP kernels and applications.

Llosa, Josep UPC, Spain
Luque, Emilio University Autonoma de Barcelona, Spain
Lysne, Olav Simula Research Laboratory, Norway
Markatos, Evangelos University of Crete, Greece
Marwedel, Peter University of Dortmund, Germany
Mendelson, Bilha IBM, Israel
Meunier, Stéphanie INRIA, France
Michaud, Pierre INRIA, France
Moshovos, Andreas University of Athens, Greece
Navarro, Nacho UPC, Spain
Norden, Erik INFINEON, Germany
O’Boyle, Michael The University of Edinburgh, UK
Papadopoulos, George University of Patras, Greece
Pimentel, Andy University of Amsterdam, The Netherlands
Plata, Oscar University of Malaga, Spain
Pnevmatikatos, Dionisios University of Crete, Greece
Prete, Antonio University of Pisa, Italy
Ramirez, Alex UPC, Spain
Sainrat, Pascal CNRS, France
Sazeides, Yiannakis University of Cyprus, Cyprus
Seznec, André INRIA, France
Sips, Henk TU Delft, The Netherlands
Sousa, Leonel INESC-ID, Portugal
Stenstrom, Per Chalmers University of Technology, Sweden
Temam, Olivier INRIA, France
Tirado, Francisco University Complutense de Madrid, Spain
Topham, Nigel The University of Edinburgh, UK
Ungerer, Theo University of Augsburg, Germany
Valero, Mateo UPC, Spain
Vassiliadis, Stamatis TU Delft, The Netherlands
Viñals, Victor University of Zaragoza, Spain
Vounckx, Johan IMEC, Belgium
Watson, Ian Manchester University, UK
Zapata, Emilio L. University of Malaga, Spain


continuation from page 7


Processor and system modeling by Gilles Mouchard, [email protected], Prof. Olivier Temam, INRIA & CEA, September 2004, now at CEA

We propose a design methodology for building instruction set emulators and micro-architectural simulators in SystemC. This methodology defines an interface and a communication protocol between the simulation components. Our methodology has been applied to building a generic superscalar processor simulator (OoOSySC), and a library of modular simulation components (MicroLib). However, modular simulators are significantly slower than monolithic simulators. We propose a new fast SystemC simulation engine called FastSysC, which can speed up cycle-level SystemC simulators by a factor of 2.13 to 3.56 over the SystemC 2.0.1 engine. We also present DiST, a practical approach and tool for distributing any cycle-level simulator over several workstations. DiST achieves an average speedup of 7.35 on 10 machines with an average IPC error of 1.81%.

Sparse Matrix Vector Processing Formats By Pyrros Stathis, [email protected], Prof. Stamatis Vassiliadis, TU Delft, November 2004

In this thesis we propose two new storage formats for sparse matrices, denoted as Block Based Compression Storage (BBCS) format and Hierarchical Sparse Matrix (HiSM) storage. Regarding storage space, our proposed formats require 72% to 78% of the storage space needed for Compressed Row Storage (CRS) or the Jagged Diagonal (JD) storage, both commonly used sparse matrix storage formats. Regarding Sparse Matrix Vector Multiplication (SMVM), both BBCS and HiSM achieve a speedup of 5.3 and 4.07 versus CRS and JD respectively. Additionally, the operation of element insertion using HiSM can be sped up by a factor of 2-400 depending on the sparsity of the matrix, and the performance of the transposition operation is increased by a factor of 17.7 when compared to CRS.
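For readers unfamiliar with the baseline, the sketch below shows Compressed Row Storage and the sparse matrix vector multiplication it supports. It illustrates only the CRS reference format in plain Python; it does not attempt to reproduce BBCS or HiSM, and the function names and the sample matrix are assumptions made for this example.

```python
def dense_to_crs(matrix):
    """Convert a dense row-major matrix to CRS arrays.

    values  : the non-zero entries, row by row
    col_idx : the column index of each stored entry
    row_ptr : row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def crs_matvec(values, col_idx, row_ptr, x):
    """Compute y = A * x for a matrix A stored in CRS form."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect access to x through col_idx is what makes SMVM irregular.
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Hypothetical 3x3 matrix with four non-zeros.
A = [[4, 0, 0],
     [0, 0, 5],
     [1, 0, 2]]
values, col_idx, row_ptr = dense_to_crs(A)
y = crs_matvec(values, col_idx, row_ptr, [1, 2, 3])
```

Inserting a new non-zero into this layout means shifting `values` and `col_idx` and updating every later `row_ptr` entry, which suggests why element insertion and transposition are costly operations in CRS.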

Program optimization methodology for complex processors By David Parello, [email protected], Prof. Olivier Temam, INRIA, HP France & University of Paris Sud, September 2004, now at University of Paris Sud

Because processor architectures are increasingly complex, it is increasingly difficult to embed accurate machine models within compilers. In this thesis, we adopt a bottom-up approach to the architecture complexity issue: we assume we know everything about the behavior of the program on the architecture. We present a manual but systematic process for optimizing a program on a complex processor architecture using extensive dynamic analysis, and we find that a small set of run-time information is sufficient to drive an efficient process. We have experimentally observed on an Alpha 21264 that this approach can yield significant performance improvement on SPEC benchmarks, beyond peak SPEC. This thesis was funded by and done in cooperation with HP France, and this approach is currently used at HP France for optimizing customer applications.


Alternative Approaches to Improve Performance without ILP by Sami Yehia, [email protected], Prof. Olivier Temam, INRIA & University of Paris Sud, September 2004, now at ARM Research, Cambridge, UK.

In this thesis we propose alternative approaches to exploit on-chip space and reduce the memory wall effect for applications that have complex data structures or irregular data access patterns. For codes that have little ILP and low spatial locality, we propose a novel approach that collapses dependent instructions to functions that execute independently and in parallel. Because the collapsing approach is limited by dependent memory accesses, we propose the “load squared”, an approach that improves performance of dependent loads that have high miss ratios by adding logic closer to memory. We also investigate a generalization of this concept by presenting a decoupled architecture associated with a language extension that explicitly separates execution from data accesses.

Shared Memory and OpenMP on Clusters – Performance Aspects, Compilers, Methods, and Run-time Systems by Sven Karlsson, [email protected], Prof. Mats Brorsson, KTH, Royal Institute of Technology, Stockholm, Sweden, September 2004

Clusters, i.e., several computers interconnected with a communication network, provide a cost-efficient way to achieve high performance. Messages are the natural way of communicating in this kind of system. However, it is widely argued that using a shared memory programming model reduces the programming effort. Hence it is interesting to investigate systems that provide shared memory on clusters. This thesis describes some performance aspects of providing such a shared memory using software. Systems that implement a shared memory model in software are commonly called software distributed shared memory systems, or software DSM systems. The thesis consists of seven papers that each describe different aspects of software DSM systems.


Single Electron Tunneling Based Arithmetic Computation By Caspar Lageweg, [email protected], Prof. Stamatis Vassiliadis, TU Delft, November 2004

This thesis addresses the design and implementation of arithmetic computation in single electron tunneling (SET) technology, one of the candidate technologies for future nano-electronics based circuits. First, we investigate single electron encoded logic (SEEL), which is based on encoding logic values with single electrons. We propose threshold logic gates, Boolean logic gates, and memory elements that operate on the SEEL principle. Second, we investigate electron counting based arithmetic, which is based on encoding input operands as charge quantities. We propose electron counting components and schemes for addition and multiplication. We propose SET based implementations of the two main electron counting building blocks and demonstrate the schemes through examples.
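A threshold logic gate, in the generic textbook sense, outputs 1 when a weighted sum of its binary inputs reaches a threshold. The sketch below models only that Boolean function and says nothing about the SET circuits themselves; the function name, the weights, and the thresholds are assumptions chosen for illustration.

```python
from itertools import product


def threshold_gate(inputs, weights, threshold):
    """Return 1 if the weighted sum of the binary inputs reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def and2(a, b):
    # Two-input AND: both inputs must be 1 to reach the threshold of 2.
    return threshold_gate((a, b), (1, 1), 2)


def or2(a, b):
    # Two-input OR: a single 1 is enough to reach the threshold of 1.
    return threshold_gate((a, b), (1, 1), 1)


def majority3(a, b, c):
    # Three-input majority: at least two inputs must be 1.
    return threshold_gate((a, b, c), (1, 1, 1), 2)


# Truth tables over all input combinations, in lexicographic order.
and_table = [and2(a, b) for a, b in product((0, 1), repeat=2)]
or_table = [or2(a, b) for a, b in product((0, 1), repeat=2)]
maj_table = [majority3(a, b, c) for a, b, c in product((0, 1), repeat=3)]
```

The same gate structure yields different Boolean functions purely by changing weights and threshold, which is one reason threshold gates are attractive as a basic building block.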

The Molen Polymorphic Media Processor by Georgi Kuzmanov, [email protected], Prof. Stamatis Vassiliadis, TU Delft, December 2004

This thesis describes a reconfigurable processor, which can diminish and even overcome the computational power and wide data bandwidth limitations of media applications while remaining as flexible as a general purpose processor. The proposal is based on the co-processor architectural paradigm. The basic idea comprises a core general purpose processor, which controls the execution and the reconfiguration of a reconfigurable co-processor, tuning the latter to specific media algorithms. A fully operational prototype implemented in the Xilinx Virtex II Pro (TM) technology is proposed. An experimental evaluation of the prototype is performed considering MJPEG, MPEG-2, and MPEG-4. The experimentally obtained speedups reach up to 98% of the theoretically attainable maximums.

Fetch improvement mechanisms for next-generation processors By Ayose Falcón, [email protected], Prof. Mateo Valero and Prof. Alex Ramirez, Universitat Politecnica de Catalunya (UPC), January 2005.

The sustained throughput achieved by the fetch stage impacts the processor’s overall performance, because the throughput of all the subsequent pipeline stages depends on and cannot exceed the throughput provided by the fetch unit. This thesis addresses the issues that technological evolution will introduce into the design of fetch units for next-generation processors. The proposals presented in this thesis are not restricted to the superscalar environment, but also extend to Simultaneous Multithreading (SMT) processors. In both architectures, we attack the main factors that determine the fetch performance: branch prediction accuracy and instruction cache throughput.

Strategies to Reduce Energy and Resources in Chip Multiprocessor Systems by Magnus Ekman, [email protected], Prof. Per Stenström, Chalmers University of Technology, Goteborg, Sweden, December, 2004

The new architectural style, the chip multiprocessor (CMP), comes with many promises but also with new problems and new trade-offs. This thesis addresses the technical problem of how to design more efficient CMP systems in terms of energy and memory utilization. It contributes design strategies that fall into three categories: design principles to reduce energy consumption and main memory resources without reducing performance, design recommendations to balance the exploited instruction-level parallelism and thread-level parallelism in a CMP, and a methodology to reduce simulation time when evaluating future designs.

Are you a HiPEAC member, and do you want to include a recent PhD of your group in this list?

Send a short summary to [email protected]

Upcoming events


HPCA-11, 11th International Symposium on High-Performance Computer Architecture, Palace Hotel, San Francisco, February 12-16, 2005, http://www.hpcaconf.org/hpca11

DATE 2005, Design, Automation and Test in Europe, Messe Munich, March 7-11, 2005, http://www.date-conference.com

CC 2005: Compiler Construction, Edinburgh, Scotland, April 4-8, 2005, http://cc05.cs.berkeley.edu

CF ‘05, 2005 ACM International Conference on Computing Frontiers, Ischia, Italy, 4-6 May 2005, http://www.computingfrontiers.org

PLDI 2005: Conference on Programming Language Design and Implementation, Chicago, June 12-15, 2005, http://www.research.ibm.com/pldi2005

ICAC 2005: International Conference of Autonomic Computing, Seattle, WA, June 13-16, 2005, http://www.caip.rutgers.edu/~parashar/icac2005/index.htm

DAC 2005: Design Automation Conference, Anaheim Convention Center, Anaheim, CA, June 13-17, 2005, http://www.dac.com/42nd/index.html

PPoPP 2005: Conference on Principles and Practices of Parallel Programming, Chicago, IL, June 15-17, 2005, http://www.cs.cornell.edu/Conferences/PPoPP05

LCTES 2005: Conference on Languages, Compilers, and Tools for Embedded Systems, Chicago, IL, June 15-17, 2005, http://lctes05.snu.ac.kr

HiPEAC Summer School on advanced computer architectures and compilation for embedded systems, L’Aquila, Italy, July 24-30, 2005, http://www.HiPEAC.net

Euro-Par 2005, Lisbon, Portugal, August 30-September 2, 2005, http://europar05.di.fct.unl.pt/

HiPEAC Conference, Barcelona, Spain, November 2005, http://www.HiPEAC.net

In the next issue: A presentation of HiPEAC Spain

Announcement of the HiPEAC Summer School

Announcement of the HiPEAC Conference

Would you like to receive this newsletter? Please contact Mrs. Pilar Armas at

[email protected]

HiPEAC Info is a quarterly newsletter published by the HiPEAC network of excellence. It is funded by the 6th European Framework Programme (FP6), under contract no. IST-004408. Website: http://www.hipeac.net • Subscriptions: [email protected]
