NETWORK OF EXCELLENCE ON HIGH PERFORMANCE AND EMBEDDED ARCHITECTURE AND COMPILATION

AUTUMN COMPUTING SYSTEMS WEEK, SEPTEMBER 21-23, 2015, MILANO, ITALY

WELCOME TO ACACES’15, JULY 12-18, 2015, FIUGGI, ITALY

Follow us on LinkedIn! hipeac.net/linkedin

info 43
appears quarterly, july 2015




MESSAGE FROM THE HIPEAC COORDINATOR

CONTENT

intro

This spring, a new company, miDiagnostics, was launched in Belgium. That in itself is not special, but its product captured my attention. It will commercialize a disposable chip called miLab. This chip will be able to automatically carry out a sophisticated blood analysis (DNA, proteins, viruses, blood cells, …) in ten to fifteen minutes, starting from one drop of blood. The process will be as simple as the routine glucose test already used at home by millions of people diagnosed with diabetes. One chip will cost between 10 and 20 euro, and it will be for sale everywhere. The chip integrates a complete lab in just a few square centimeters, with the outcome of the analysis sent wirelessly to a smartphone. If this product is successful, it will change the way doctors work, because they will no longer have to wait for the outcome of a blood analysis. It will bring an affordable diagnostic tool to the poorest regions of the world, and it will disrupt the clinical microbiology market. Beyond that, police officers will be able to analyze the blood of suspected drunk drivers without having to call a doctor to take a blood sample, and monthly blood checks at home will become feasible, leading to the early detection of medical conditions long before they become life-threatening. The chip also has applications in other sectors where wet laboratories are used: the environment, the food industry, sports, and many more. By creating a new class of (disposable) devices, it also creates opportunities for the computing systems industry. Maybe one day we will buy disposable transistors, rather than transistors that have to last for a couple of years. The future will tell whether this technology becomes a game changer, and whether Europe will lead in this domain. The investors of miDiagnostics believe it will: the company starts with an investment of 60 M€, which makes it the highest-capitalized startup ever in Belgium. In May, HiPEAC underwent its third review. The reviewers concluded that the project has made very good progress during the third reporting period and that the con-

hipeac activity
THEMATIC SESSIONS IN THE OSLO CSW (MAY 5-7, 2015)

hipeac announce
COMPUTING: THE CURRENT AND ITS PROBABILITY BASED FUTURE
CEVELOP C++ IDE RELEASED

hipeac news
JUNIPER NETWORKS AND MAXELER TECHNOLOGIES ANNOUNCE NEW COMPUTE-INTEGRATED NETWORK SWITCH
EUROPEAN LLVM CONFERENCE 2015

in the spotlight
SAFER TRAVELS AND IMPLANTS WITH DESYRE SYSTEMS
SOCKETS OVER RDMA AND SHARED PERIPHERALS FOR ARM MICROSERVERS
THE AEGLE PROJECT
SUCCESSFUL CONCLUSION TO EU PARAPHRASE PROJECT
COLLECTIVE KNOWLEDGE: A FRAMEWORK FOR SYSTEMATIC PERFORMANCE ANALYSIS AND OPTIMIZATION

hipeac students
COLLABORATION GRANT: HECTOR ORTEGA
COLLABORATION GRANT: LUCIA G. MENEZO
COLLABORATION GRANT: ROEL JORDANS
COLLABORATION GRANT: JAIME ESPINOSA GARCIA
COLLABORATION GRANT: ALEJANDRO VALERO
COLLABORATION GRANT: ERKAN DIKEN
INTERNSHIP REPORT: SOMNATH MAZUMDAR
INTERNSHIP REPORT: MICHELE SCANDALE
INTERNSHIP REPORT: TURKEY ALSALKINI
INTERNSHIP REPORT: ANOUK VAN LAER

phd news

upcoming events

sortium certainly has the capacity and resources to continue delivering value to European stakeholders and to fully achieve the stated objectives. We are very happy with this outcome, and we are committed to continuing to support the European computing systems community in the future. This newsletter is the summer school issue. The summer school also marks the beginning of the summer break for me and for the HiPEAC staff. We wish you a relaxing summer with your family and friends, and we hope to see you again after the summer holiday in good health and full of plans for the year to come.

Koen De Bosschere_________


intro

MESSAGE FROM THE PROJECT OFFICER

I am just back from the “Computer Systems Week and block review” organised by HiPEAC with the University of Oslo, and I am really satisfied with what I saw: HiPEAC is a growing, active and hardworking community, with impressive know-how in all areas of computer science. What I found a bit less exciting in the same community is its capability to go beyond the purely technological work; for example, explaining the potential of the developed technologies to the wider public, or identifying and pursuing exploitation opportunities. We know that computers are going to change the world (they have already done it, actually), but a part of our community seems so busy with the details of the technology that they risk overlooking the possible applications. Well, this is not the best approach: advanced computing is a very powerful tool in our hands, and we should develop a vision of how this “magic wand” can create new useful applications in every existing sector of the economy, and even in new market segments that do not yet exist. And then we should be able to communicate this vision and make it understandable to everybody, not only to computer science majors, because it is not they who will finance our ideas and make them happen. This does not mean that every researcher should become a salesman: on the contrary, research should not be confused with product development, and we should fight for the freedom to fail while exploring untracked roads, because this is the only way to innovate; personally, I will continue to support this approach internally in the European Commission. What we all need, however, is the far-sighted vision and open mind needed to look beyond the computer screen and the lab walls, in order to understand how our technology can really make a difference. It is much more fun to change the world than to play in the lab, and it is also what the European Union asks as a condition of giving us money. The Horizon 2020 programme is described as “research and innovation to boost growth and jobs in Europe”, and future work programmes will increasingly ask for the creation of platforms, ecosystems, and technical/economic communities which can put technology into use, create value, and make advanced computing technologies really relevant in the world. The application of these technologies to the digitalization of European industry, to the Internet of Things and to autonomous vehicles and robots can radically change the world as we know it. Our role is not only to develop the technologies that make this change possible, but also to create applications which are socially and ethically acceptable, and to make sure that European industry can benefit from the change in terms of jobs and growth. If you do a good job – and I am sure that the HiPEAC community can do a really good job – we will have such visible results that I will finally be able to explain my work to my mother-in-law.

ACACES'14 Group Photo


The 2015 Spring Computing Systems Week took place in Oslo, Norway, from May 5 to May 7. During the week, several thematic sessions were organized by HiPEAC participants. Below, we summarize the topic and organizers of each session. Most presentations are available on the HiPEAC website at https://www.hipeac.net/csw/2015/oslo/schedule

hipeac activity

1. SECURITY INTELLIGENCE
Organized by Michael Vinov and Omer Boehm, from IBM Research Haifa, this session explored the most recent advances in computer security and the problems that still need to be solved. It looked at a wide range of recent computer security research, focusing on mitigation techniques and some of the future challenges we are facing. The session included two talks that presented real-world use cases:
• Vulnerability Detection using Symbolic Interpretation. Sharon Keidar-Barner (IBM Research Haifa)
• Preventing ROP Attacks. Omer Boehm (IBM Research Haifa)

2. EMBEDDED COMPUTER VISION
Organized by David Moloney (Movidius) and Oscar Deniz Suarez (University of Castilla-La Mancha), this session dealt with the increasing need for ‘intelligence’ and cognitive functions in embedded systems. Four invited speakers shared their experiences in this field with the audience:
• Next Generation Imaging Solutions for Smartphones. Peter Corcoran (FotoNation)
• Hyperspectral Imaging goes Embedded. Max Larin (XIMEA)
• Platforms and Applications for Embedded Computer Vision: Toys, Bees and Safety Devices. Emanuel Popovici (University College Cork)
• Accelerating OPENVX Applications on Embedded Many-core Accelerators. Giuseppe Tagliavini (University of Bologna)

3. PATTERNS OF PARALLELISM AND SOFTWARE ENGINEERING FOR MULTICORE/MANYCORE SYSTEMS
Organized by Kevin Hammond (Univ. of St. Andrews), this session explored the general problem of software engineering as it applies to multicore/manycore systems. Parallelism is increasingly important for software; it is not unreasonable to say that almost all future software development will need to consider it. Software engineering methodologies and practices are, however, firmly rooted in the single-core era, and the limited tools that exist do not cover all aspects of software development and are not integrated into coherent methodologies. The session included six talks:
• Advanced Parallel Programming with FastFlow. Marco Danelutto (Univ. of Pisa)
• Refactoring Parallel Programs. Chris Brown (Univ. of St. Andrews)
• Pattern-Based Approaches to Programming Heterogeneous Systems. Kevin Hammond (Univ. of St. Andrews)
• Concurrency and Parallelism in Modern C++. Daniel Garcia (Univ. Carlos III of Madrid)
• Using Machine Learning to Map Applications to Heterogeneous Parallel Systems. Vladimir Janjic (Univ. of St. Andrews)
• Experience with Programming Parallel Applications: an Industrial Perspective. Thomas Natschlager (SCCH)

4. FP7 PROJECTS HARPA, CLERECO AND EXCESS; CONVERGENCE, PERSPECTIVES AND JOINT VISION

This session was organized by Dimitrios Soudris (National Technical University of Athens), and included discussions on standardization, power vs. reliability, hardware vs. software reliability, data structures of HPC applications on embedded platforms, and exploration

THEMATIC SESSIONS IN THE OSLO CSW (MAY 5 - 7, 2015)

VisitOSLO/Normanns Kunstforlag/ Terje Bakke Pettersen


of synergies and common actions between the three involved projects. It included three talks and a panel discussion:
• HARPA: Harnessing Performance Variability. William Fornaciari (Politecnico di Milano)
• CLERECO: Cross-Layer Early Reliability Evaluation for the Computing cOntinuum. Stefano Dicarlo (Politecnico di Torino)
• EXCESS: Execution Models for Energy-Efficient Computing Systems. Christoph Kessler (Linköping University)
The panel discussion covered standardization, power vs. reliability, the debate between hardware and software reliability, how certain data structures affect dependability/reliability on embedded platforms, and ways to explore synergies between the EU projects.

5. TOWARDS PORTABLE LIBRARIES FOR HYBRID SYSTEMS
Organized by Christian Brugger and Christian De Schryver (Technical Univ. of Kaiserslautern), this session was about current challenges and feasible approaches for bundling hardware and software parts, with the required interconnect and runtime environment, into a library that runs on a wide range of compute platforms. It included four talks:
• Scalable Architecture and Shared-memory Programming for FPGA-based Heterogeneous Platforms. Paolo Burgio (Univ. Modena)
• Portable Libraries and Programming Environments. Jeronimo Castrillon (TU Dresden)
• Rule-based Program Transformation for Hybrid Architectures. Manuel Carro (IMDEA Software Institute)
• HW Flexibility & Runtime Optimizations. Ioannis Sourdis (Chalmers Univ. of Technology)

6. INTERNET OF THINGS (IOT): TECHNOLOGY AND APPLICATIONS FOR A GOOD SOCIETY

Organized by Donn Morrison and Lasse Natvig (Norwegian Univ. of Science and Technology), this session brought together experts from academia and industry who research and develop components and products dealing with the growth in the number of network-connected devices, known as the Internet of Things. It included six talks:
• Smarter Bees - Bees, IoT and Big Data. Torstein Dybdah (TD Research)
• Integrating Wireless Sensor Networks Into Internet of Things: Challenges. Yuming Jiang (NTNU)
• Developing Robust IoT Gateway Applications from Building. Frank Alexander Kraemer (Bitreactive)
• Robotics in IoT. Jim Tørresen (Univ. of Oslo)
• Engineering the IoT. Alf Syvertsen (Silicon Labs)
• Internet of Things - Marketing or Real Opportunity. Jo Uthus (ATMEL Norway)

7. BEYOND SELF-AWARE EMBEDDED COMPUTING
Organized by Stephan Wong (Delft Univ. of Technology), this session invited experts from the field of self-aware embedded computing to explore how their solutions can be further improved in a more interconnected world. It included three talks:
• Application Autotuning and Runtime Resource Management from Heterogeneous Manycore Architectures. William Fornaciari (Politecnico di Milano)
• Self-Awareness in Cyber-Physical Systems. Axel Jantsch (Technical Univ. Wien)
• Runtime support for self-awareness in interconnected CPS systems. Dionisios N. Pnevmatikatos (ICS-FORTH)
It ended with a panel discussion on “What is beyond self-aware embedded computing?”

8. EUROPEAN INITIATIVE ON RUNTIME SYSTEMS AND ARCHITECTURE CO-DESIGN

This session was organized by Miquel Moretó (BSC), Marc Casas (BSC), Vassilis Papaefstathiou (Chalmers) and Miquel Pericàs (Chalmers). It gathered representatives of the main European research groups in programming models and computer architecture co-design to discuss ways to achieve strengthened cooperation and improved interoperability of their runtime middlewares. The session included six talks:


• MECCA – Meeting the Challenges in Computer Architecture. Per Stenström (Chalmers Univ. of Technology)
• The Swan Task Dataflow Scheduler. Hans Vandierendonck (Queens Univ. of Belfast)
• Project Beehive: A HW/SW co-designed stack for runtime and architectural research. Christos Kotselidis (Univ. of Manchester)
• Task-based Runtimes for Multicore Architectures. Foivos S. Zakkak (FORTH-ICS)
• Runtime-Aware Architectures. Miquel Moretó (BSC)
• The StarPU Runtime System: Task Scheduling for Exploiting Heterogeneous Architectures. Olivier Aumage (INRIA Bordeaux)
The thematic session concluded with a panel during which the participants discussed the main scientific challenges addressed by their research, how their work could benefit from strengthened collaboration, and which instruments or funding vehicles could be used to improve collaboration across European research groups.

9. RISING VIRTUES OF HETEROGENEOUS SYSTEMS: RELIABILITY
This session was organized by Chris Fensch (Heriot-Watt University), Georgios Goumas (National Technical Univ. of Athens), and Marisa Gil (BSC-UPC Barcelona Tech). This new edition of the Programming Models Thematic Session started a series of meetings focusing on specific factors or components that influence performance but are also of concern to designers, programmers, and developers. The selected topic was reliability. The discussion centred on the factors that impact the resilience of a system, and on how overall resilience depends on the weakest link. Resilience is difficult to test, and simulators give less accurate results than real hardware. On the programming models side, a return to dataflow and task-based models (e.g. OpenMP 4.0 and OmpSs) looks promising. It included four talks and a panel discussion:
• Keynote: Heterogeneous Systems - a Blessing or a Curse for Massive Parallel Dependable Systems. Avi Mendelson (Technion - Israel Institute of Technology)
• Controlling Application Behavior in the Presence of Approximations and Errors. Christos Antonopoulos (CERTH)
• Variability-Aware Self-Adaptive Parallel Application in Many-core Chips. Fabien Chaix (FORTH)
• PID-Controlled DVFS for Absorbing Temporal Overheads of RAS Mechanisms. Dimitrios Rodopoulos (National Technical Univ. of Athens)
After the talks, the panel discussion dealt with the difficulty of testing resiliency implementations, how the programming model can improve resiliency, the trade-offs between low power consumption and reliability, and the lack of a benchmark set for evaluating resiliency techniques.

10. ERROR-AWARE SYSTEMS: OPPORTUNITIES AND CHALLENGES FOR HANDLING ERRORS AT MULTIPLE LEVELS

Organized by Dimitrios Nikolopoulos (Queen’s Univ. of Belfast) and Pedro Trancoso and Yiannakis Sazeides (Univ. of Cyprus), this session included an initial report on the topics covered during the discussions in the Thematic Session at the Athens CSW, talks reporting results from related EU projects, and an invited talk from BSC on related issues within the scope of the Mont-Blanc project. It included five talks:
• Report from the Error-Aware Thematic Session I. Yanos Sazeides (Univ. of Cyprus)
• Report from the EU Energy-Efficient Computing Systems Workshop. Koen De Bosschere (Ghent Univ.)
• Report from the 1st Workshop on Approximate Computing (WAPCO 2015). Georgios Karakonstantis (Queen’s Univ. of Belfast)
• Enablers and Roadblocks of Approximate and Error-Aware Computing. Christos Antonopoulos (Univ. of Thessaly)
• Understanding and Addressing the Resiliency Issues for Future Exascale Computing with the Mont-Blanc Prototype. Ferad Zyulkyarov (BSC)
The session concluded with a panel discussion._________


Where current computers struggle, and how probability-based computing can overcome this.

Developed in the REPARA project, the Eclipse-based Cevelop IDE for C++ aims to make programmers more productive by integrating automated refactoring and testing tools.

hipeac announce

Over the last decade, computing platforms have not progressed at the same rate as in previous years. This lack of progress is largely due to a combination of problems, ranging from silicon manufacturing issues to problems with the models being used, leading to, for example, the von Neumann bottleneck. When one takes a few steps back and surveys the situation, it becomes clear that current platforms have their limitations. Consequently, there is a need to start developing a new computing approach: one that is more biologically inspired and can deal with unreliable components, while at the same time offering more intelligent functionality. Probability-based computation has all the necessary characteristics to overcome the current problems, while also forming a better platform for machine learning approaches. There is still work to be done before these new systems become reality, but it is about time that people with knowledge and understanding of science and technology combine their knowledge and learn to deal with unreliability to ensure this brighter future does happen.

For more information please visit: http://users.telenet.be/wimmelis/professional/books/book_1.htm

Wim Melis, University of Greenwich_________

Cevelop is an integrated development environment (IDE) for C++ programmers that combines a variety of development tools into a one-stop download. Cevelop is based on the most recent version of the popular Eclipse C/C++ Development Tooling, combining a stable technical foundation with the ergonomics of a state-of-the-art IDE. The development of Cevelop was started in the FP7 REPARA project by the University of Applied Sciences Rapperswil, Switzerland. It provides the infrastructure to run REPARA’s static code analysis and transformation tools for heterogeneous parallelization refactorings. Cevelop ships with the CUTE unit testing framework, support for the SCons build system and new tools to refactor namespaces and macros. It also helps you upgrade your code to C++11/14 to automatically take advantage of new features such as initializer lists and smart pointers. Cevelop is free to use and is available for Windows, OS X, and Linux.

For more information please visit: https://www.cevelop.com/

Mirko Stocker, HSR University of Applied Sciences Rapperswil_________

COMPUTING: THE CURRENT AND ITS PROBABILITY BASED FUTURE

CEVELOP C++ IDE RELEASED


hipeac news

JUNIPER NETWORKS AND MAXELER TECHNOLOGIES ANNOUNCE NEW COMPUTE-INTEGRATED NETWORK SWITCH

Maxeler Technologies and Juniper Networks have joined forces and developed a compact, ground-breaking data center switch that integrates high-performance compute resources into the network. Juniper’s QFX5100-AA is a novel type of application acceleration switch that includes a jointly developed QFX-PFA packet flow accelerator module, capable of processing a large amount and variety of data streams at line rate.

Maxeler is pioneering a new dataflow-oriented approach to efficient high-performance computing, in which application experts in science, engineering or finance can develop and customise their algorithms in a high-level language, targeting Maxeler’s highly efficient dataflow systems. Maxeler multiscale dataflow technology exploits the inherent parallelism in applications, and improvements of one to two orders of magnitude in both throughput and power consumption, compared to standard servers of the same size, have been realised across a range of application domains, including finance, geology, weather modelling, genomics, and data analytics.

Including Maxeler’s technology in the latest Juniper QFX network switch product family vastly improves the performance of network applications that require complex processing. Network traffic can be decoded, transformed and re-encoded while sustaining line-rate throughput with minimal and highly predictable latency. Financial institutions will be able to instantly analyse and process massive quantities of information originating from various sources, in order to make better trading decisions, reduce risks or comply with the latest regulatory requirements. The QFX5100-AA switch enables market data decoding and risk analysis to take place directly inside the network infrastructure, with ultra-high throughput and predictable latency. Other application areas include, but are not limited to, the handling and analysis of social media feeds, line-rate video transcoding and big data analytics. The QFX-PFA packet flow accelerator is programmed and managed using Maxeler’s dataflow compiler and operating system. Full TCP/IP support is included.

More information about this product can be found at: http://newsroom.juniper.net/press-release/juniper-networks-delivers-unparalleled-application-performance-with-new-compute-integrated-switch

Tobias Becker, Maxeler Technologies_________

In April 2015, taking place in parallel with ETAPS, EuroLLVM attracted over 250 attendees!

EUROPEAN LLVM CONFERENCE 2015

2015 has seen the return of EuroLLVM to London, this year on April 13 and 14 at Goldsmiths College, University of London. Although better known for its Turner Prize winners, Goldsmiths has an active games masters programme that works, for example, on applying technology developed for triple-A games to the solution of high-performance computing problems.

This year, EuroLLVM not only had a great line-up of presentations, but it also hosted a Khronos UK Chapter meeting and was strategically organized in parallel with ETAPS 2015. Together, this attracted over 250 attendees, making it the largest EuroLLVM conference ever, with participation from industry, research and the LLVM community.

EuroLLVM had two inspiring keynotes, with Francesco Zappa Nardelli talking about the C++ memory model and Ivan Goddard presenting the Mill CPU, an unusual architecture that uses ‘bands’ instead of traditional registers. In addition to the keynotes, there was a diverse set of talks discussing SIMD challenges, such as mixed-width vector code generation and vectorization of control flow with the new AVX-512 instruction set, but also optimizations


such as loop fusion in the presence of control flow, high-level software pipelining and low-overhead Link-Time Optimization (LTO) with the ThinLTO framework. With Templight we saw a practical solution for interactive C++ template debugging; we also saw C++ code such as clang being executed in a web browser, a report on the use of LLVM in .NET’s CoreCLR, as well as talks on static analysis, partial evaluation and microprocessor emulation. The lightning talk sessions were as varied as ever, with talks on SPIR, Javascript, verification and symbolic execution, to name but a few. Besides presentations, there were several tea breaks and interactive sessions that allowed attendees to mingle and collaborate. The conference itself started with a half-day hacking session providing time for networking and free discussions. There was also a poster session, several developer BoFs, as well as tutorials on topics such as LLDB and LLVM’s debug info. Finally, the Khronos chapter presented demonstrations of Vulkan – a low-overhead API for graphics and compute similar to those found on game consoles.

Andy Thomason from Goldsmiths College, who organized EuroLLVM together with the LLVM community, said: “We are thankful for the support of Apple, ARM, Codeplay, Google, the HSA Foundation, Intel, the Khronos Group, the LLVM Foundation, Mentor, Qualcomm Innovation Center (QuIC), Solid Sands and Sony Computer Entertainment. Our partners have enabled us to support the growth of EuroLLVM while ensuring attendance remains affordable, particularly for enthusiasts and students.” If you now regret having missed out on EuroLLVM, you can look at the slides and video recordings of EuroLLVM 2015 (http://llvm.org/devmtg/2015-04/) and join the next meeting later this year in the Bay Area, or in Spring 2016 back in Europe!

For more information please visit: http://llvm.org/devmtg/2015-04/

Tobias Grosser, ETHz _________

hipeac news / in the spotlight

The DeSyRe project has developed a novel SoC architecture and underlying concepts for reliability. Such SoCs have been shown to typically use 28 percent less energy and 48 percent less chip area, while offering a nine-times-lower hardware failure rate. It is time to integrate this technology into safer cars and trains, more reliable medical devices, more advanced brain models, and embedded many-core systems.

The DeSyRe project brought together experts in fault-tolerant and self-repairing design. Industry partner YOGITECH helps silicon vendors and system integrators meet functional-safety challenges in the automotive, industrial automation, biomedical and railway markets. During the DeSyRe project, YOGITECH combined fault detection, fault diagnosis, and reconfiguration in safe and dependable architectures for reconfigurable embedded systems. An ARM-based demonstrator now shows the benefits of the newly developed IPs. Industry partner Recore Systems makes many-core programming easy. Their first application of DeSyRe’s fault-detection and reconfiguration techniques is for missions into deep space, where reliable and fault-tolerant systems are a must. The second application is in their FlexaWare many-core embedded platform. DeSyRe concepts of task-based

SAFER TRAVELS AND IMPLANTS WITH DESYRE SYSTEMS

Consortium Members:
Chalmers University of Technology (Sweden)
University of Bristol (UK)
EPFL (Switzerland)
FORTH (Greece)
Imperial College London (UK)
Neurasmus (The Netherlands)
Recore Systems (The Netherlands)
YOGITECH (Italy)

Project start date: 1 October 2011
Project end date: March 2015

Industry partner websites:
YOGITECH: www.yogitech.com
Recore Systems: www.flexaware.net
Neurasmus: neurasmus.com

Project coordinator: Chalmers University of Technology (Sweden)

Project website: http://www.desyre.eu


The Euroserver Prototype at FORTH-ICS


programming and runtime task migration are key to an intelligent many-core OS and crucial for the easy programming of many-cores. Neurasmus is an R&D company that develops new high-tech medical systems. DeSyRe stimulated the launch of two novel Neurasmus products: the Implant Toolbox – with fault-tolerance and security techniques for implantable medical devices – and BrainFrame – an intuitive-to-program high-performance platform for brain-modeling research and applications. Each DeSyRe idea spreads. One project benefits many!

Inès Nijman, Recore Systems_________

SOCKETS OVER RDMA AND SHARED PERIPHERALS FOR ARM MICROSERVERS

Communication among microservers in data centers must be carried out at low overhead, hence at low energy and low latency. The Euroserver project (www.euroserver-project.eu) builds microservers using low-power ARM cores that can be assembled in large numbers to maintain high multi-threaded performance. Because communication among server nodes is critical for the scalability of applications, it is essential to minimize small-message latency and maximize data transfer throughput. This is not feasible via traditional communication through the TCP/IP stack, which is not optimized for data center workloads and cannot take advantage of any underlying hardware mechanisms within the microserver.

In Crete, we implemented a first prototype of the Euroserver architecture (shown in the photograph) in the Computer Architecture and VLSI Laboratory of FORTH-ICS. It consists of eight compute nodes (MicroZed boards with ARM Cortex-A9 processors and FPGA logic), connected by Michael Ligerakis to a central FPGA board. A custom interconnection network by George Kalokairinos, built upon the AXI protocol and the ARM master/slave ports, allows remote memory access among the nodes via physical address translation. Furthermore, the prototype shares I/O resources, such as network interfaces (NICs) and storage devices, among the multiple compute nodes and their Linux OS instances.

We employ RDMA transfers over our custom interconnect instead of TCP-over-Ethernet communication. We completely bypass the TCP/IP stack of the Linux kernel to ensure low-latency transmission and reception of internal messages. To allow traditional socket applications to run without any modification, Dimitrios Poulios created our own library, which is invoked by intercepting socket-related system calls in the standard C library (libc). Transfers destined for the internal network pass through our RDMA driver, which handles internal connections. Remote notifications and control messages are delivered through the prototype’s hardware mailbox mechanism.

Communication to/from the external world is achieved through a shared virtualized 10 Gbps NIC, which resides in the central FPGA board. Kostas Harteros created dedicated Tx/Rx FIFOs per node in the 10 Gbps MAC layer, with frames transmitted round-robin. Incoming frames are routed to the appropriate nodes according to their destination MAC address. A device driver by John Velegrakis allows the Linux OS and the applications to view the NIC as a standard Ethernet device. The driver makes use of the underlying AXI DMA engine to send and receive frames to and from the MAC. Zero-copy between the kernel and the driver is achieved by maintaining rings of DMA descriptors and operating the DMA in scatter-gather mode.

For further information please contact: Manolis Marazakis, Iakovos Mavroidis, Manolis Katevenis {maraz,jacob,kateveni}@ics.forth.gr
FORTH-ICS, Heraklion, Crete, Greece
www.ics.forth.gr/carv

_________

HiPEAC info 43


in the spotlight

An Analytics Framework for Integrated and Personalized Healthcare Services in Europe

The AEGLE project aims to produce value out of big data in healthcare, with the goal of revolutionizing integrated and personalized healthcare services, offering analytic services at two levels, as shown in Figure 1. First, at the local level, AEGLE will focus on real-time processing of large volumes of raw data originating from patient monitoring services. Then, at the cloud level, AEGLE will offer an experimental big data research platform for data scientists, workers and data professionals across Europe. The platform consists of a large pool of semantically-annotated and anonymized healthcare data, state-of-the-art big data analytics methods and advanced visualisation tools, allowing data scientists to steer the analytics mechanisms with their own insights.

THE AEGLE PROJECT

Consortium Members:
Exodus S.A. (EXUS) - GR
Institute of Communications and Computer Systems (ICCS) - GR
Kingston University Higher Education Corporation (KINGSTON) - UK
Center for Research and Technology Hellas (CERTH) - GR
Maxeler Technologies (MAXELER) - UK
Uppsala University (UU) - SW
Universita' Vita-Salute San Raffaele (USR) - IT
Time.Lex (TML) - BE
Erasmus Universiteit Rotterdam (EUR) - NL
Croydon Health Services NHS Trust (CROYDON) - UK
Globaz Grupo S.A. (GLOBAZ) - PT
University Hospital of Heraklion (PAGNI) - GR
Gnúbila France (GN) - FR

Project Coordinator: Exodus S.A. (EXUS) - GR

Project website: http://www.aegle-uhealth.eu

Figure 1: The AEGLE infrastructure

AEGLE will go beyond the state of the art in big data services by introducing an integrated infrastructure that exploits FPGA-based dataflow acceleration across three different software levels:

• Algorithmic level: customized DataFlow Engines (DFEs) will accelerate the computation-intensive kernels found in the targeted big data analytics procedures, which will subsequently be mapped to MAXELER's devices. Advanced compiler- and datapath-level optimization techniques will be adapted for spatial computing with DFEs. Along with per-accelerator datapath optimization, maximizing the accelerator's scalability given the FPGA's computational and memory organization constraints will also be considered.

• MapReduce runtime level: specialized DFEs will be designed targeting the acceleration of the underlying MapReduce programming model. MapReduce allocates several resources from the software processors, reducing the overall performance of the big data application. In this case, acceleration targets the efficient implementation of internal procedures found in the MapReduce runtime. Early experimental analysis considering the implementation of a MapReduce-accelerated framework for FPGAs showed speedup gains of up to 32x with respect to a purely software solution. In addition, customized memory management schemes tailored to the memory hierarchy and organization of MAXELER's devices will be incorporated to efficiently handle the large number of key-value pairs usually generated by MapReduce semantics, as well as platform-specific task schedulers for balancing the load across the software processors and the DFEs.


• Storage and data management level: the database management system (DBMS) will be extended to support both adaptive data layout optimizations, e.g. a columnar versus row-wise storage model according to the type of queries, and query-specific hardware pipelines for dataflow-based acceleration. Many calculation-intensive operations executed by the DBMS to maintain the stored data, e.g. merging the update buffer into the main storage of an in-memory column store, can be efficiently accelerated through dataflow-based FPGA acceleration. Regarding DBMS acceleration for data analytics, MAXELER's in-memory capabilities are expected to fulfil the needs of highly demanding and fast data retrieval. Several DFE organizations and design options will be investigated, in order to tailor the hardware architecture of the DFEs to the set of most demanding queries.
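The row-wise versus columnar trade-off that drives the adaptive layout optimization can be illustrated with a small generic sketch (plain Python for illustration, not AEGLE code; the table contents are invented):

```python
# Sketch of the storage-layout trade-off: a column scan over row-wise
# storage touches every field of every record, while columnar storage
# lets the same query read one contiguous array.
rows = [(1, "alice", 30), (2, "bob", 25), (3, "carol", 41)]          # row-wise
columns = {"id": [1, 2, 3],
           "name": ["alice", "bob", "carol"],
           "age": [30, 25, 41]}                                      # columnar

def avg_age_rowwise(rows):
    return sum(r[2] for r in rows) / len(rows)       # strides over full rows

def avg_age_columnar(cols):
    ages = cols["age"]                               # contiguous column read
    return sum(ages) / len(ages)

assert avg_age_rowwise(rows) == avg_age_columnar(columns) == 32.0
```

Analytical scans favour the columnar form, while point queries and updates favour rows; this is why the DBMS adapts the layout to the query mix.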

Building upon the synergy of heterogeneous high performance computing, while exploiting reconfigurable architectures, cloud and big data computing technologies, AEGLE will provide a framework for big data analytics for healthcare that will enable and promote innovation activities, placing health in the spotlight.
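The "large number of key-value pairs" that the MapReduce runtime level must manage can be seen in a minimal, generic word-count sketch (plain Python for illustration; the project itself targets FPGA DataFlow Engines):

```python
# A minimal word-count in MapReduce style, showing the key-value pair
# traffic whose shuffling and memory management the accelerated runtime
# must handle efficiently. Generic illustration, not the AEGLE code.
from collections import defaultdict

def map_phase(records):
    """Emit one (key, 1) pair per word -- this is the pair traffic."""
    for rec in records:
        for word in rec.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Group by key and sum the values."""
    groups = defaultdict(int)
    for key, value in pairs:
        groups[key] += value
    return dict(groups)

counts = reduce_phase(map_phase(["big data", "big compute"]))
assert counts == {"big": 2, "data": 1, "compute": 1}
```

Even this toy example produces one intermediate pair per input word, which is why tailored memory management for the pair streams is a target of the acceleration work.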

_________

Finding patterns of parallelism in industrial software for heterogeneous systems

SUCCESSFUL CONCLUSION TO EU PARAPHRASE PROJECT

A world-leading team of academic researchers and industrial experts from across Europe is celebrating the conclusion of a four-year research collaboration tackling the challenges posed by the fastest and most powerful computing systems on the planet.

The €4.2M ParaPhrase project brought together academic and industrial experts from across Europe to improve the programmability and performance of modern parallel computing technologies.

“Future computers will consist of thousands or even millions of processors, which poses a real problem to traditional programmers not used to thinking in parallel,” said project leader Professor Kevin Hammond of the University of St Andrews.

“The sheer complexity of these systems means that powerful tools are needed to develop software that runs stably and efficiently while making the most of the ability to process in parallel. The technologies we have developed in ParaPhrase make it possible now to really exploit the power of these new systems.”

The ParaPhrase researchers have developed an approach that allows large parallel programs to be constructed out of standard building blocks called patterns. A refactoring tool allows these patterns to be reassembled in optimal ways without changing the functionality of the overall program.
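The pattern idea can be sketched generically (ParaPhrase's actual tools target Erlang and C++/FastFlow; this Python version only illustrates the concept, with invented names):

```python
# A 'farm' pattern applies a worker function to a stream of inputs in
# parallel. Swapping a sequential map for a farm changes how the code
# runs, not what it computes -- the essence of pattern-based refactoring.
from concurrent.futures import ThreadPoolExecutor

def farm(worker, stream, nworkers=4):
    """Farm pattern: apply worker to each stream element in parallel."""
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        return list(pool.map(worker, stream))

square = lambda x: x * x
assert farm(square, range(5)) == [x * x for x in range(5)]  # same result
```

Because the pattern fixes the coordination structure, a refactoring tool can introduce or re-tune it (e.g. change the number of workers) while guaranteeing the program's observable behaviour is unchanged.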

Further tools developed on the project allow the program components to be run on the system in ways that make best use of the available processors, maximising throughput and minimising the run time of large programs. The tools can even adapt the program while it is running to improve performance.

Professor Hammond said, “It was important to us that our research could be directly exploited by industry and other researchers. That's why we applied ParaPhrase to several important industrial case studies during the project.”

Indeed, the project team has used its extensive industrial expertise to develop Use Case Scenarios in a range of application areas including industrial optimisation, scientific simulation and data mining.

The outputs of the project have been impressive. As well as producing over 80 publications in leading international conferences and journals and being demonstrated at over 100 international conferences and other events, the project has produced a range of new software tools and programming standards to support the growing global community in parallel programming.

A Streaming Parallel Skeleton Library for the Erlang programming language has recently been made available, and a new release of the FastFlow parallel programming framework has already seen thousands of downloads. Industrial partners are already applying the technology in their own operations, and three recently-launched spin-out companies are set to take full commercial advantage of the technologies produced.

Already, the project partners are looking to the future. A number of follow-on projects are underway and more are in the pipeline. “ParaPhrase has been a tremendous success, but significant challenges remain. In the future, parallel programs will need to self-adapt to computing architectures we haven't even thought of yet,” said Professor Hammond.

For more information please visit: http://www.paraphrase-ict.eu/

Kevin Hammond, University of St Andrews
_________


We present the outcome of a technology transfer project between the non-profit cTuning Foundation, France (Grigori Fursin) and ARM, UK (Anton Lokhmotov, Ed Plowman). The six-month project, supported by the TETRACOM Coordination Action, has resulted in the development from scratch of the Collective Knowledge framework, its validation on realistic use cases, and the formation of a startup called dividiti.


COLLECTIVE KNOWLEDGE: A FRAMEWORK FOR SYSTEMATIC PERFORMANCE ANALYSIS AND OPTIMIZATION

Designing, modeling and benchmarking of computer systems in terms of performance, power consumption, size, reliability and other characteristics is becoming extraordinarily complex and costly. This is due to a large and continuously growing number of available design and optimization choices, a lack of common performance analysis and optimization methodologies, and a lack of common ways to create, preserve and reuse vast design and optimization knowledge. As a result, optimal characteristics are achieved only for a few ad-hoc benchmarks, often leaving real-world applications underperforming. Eventually, these problems lead to a dramatic increase in development, optimization and maintenance costs, increasing time to market for new products, eroding return on investment (ROI), and slowing down innovation in computer engineering.

Since 2012, the non-profit cTuning Foundation and ARM have been engaging in discussions on systematic performance analysis and optimization using statistical analysis, machine learning and crowd-tuning techniques. In November 2014, we started a six-month technology transfer project supported by the FP7 TETRACOM Coordination Action. The cTuning technology comprises a framework and repositories for collaborative and reproducible experimentation combined with predictive analytics. This technology has been successfully used in several EU projects, including the FP6 MILEPOST project.

The TETRACOM grant has allowed us to design and develop from scratch the fourth version of the cTuning technology, which we called Collective Knowledge (or CK for short). CK is a Python-based framework, repository and web service, supporting JSON interfaces and standard Git services such as GitHub and Bitbucket. CK allows engineers and researchers to organize, describe, cross-reference and share their code, data, experimental setups and meta information as unified and reusable components. CK users can assemble various experimental workflows from components, quickly prototype ideas, crowdsource experiments using spare computer resources such as mobile phones, and more. Importantly, CK allows experimental results to be exposed to powerful predictive analytics packages such as scikit-learn and R, in order to speed up decision making via statistical analysis, data mining and machine learning.
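The "unified JSON components" idea can be illustrated in a few lines (a hedged sketch of the concept only, deliberately not using CK's actual API; the record fields and values are invented):

```python
# Each experiment is kept as a JSON-serializable record with meta
# information, so results can be shared across tools and then fed to
# simple statistical analysis -- the workflow CK automates at scale.
import json
import statistics

experiments = [
    {"benchmark": "matmul", "compiler": "gcc",   "time_s": 1.42},
    {"benchmark": "matmul", "compiler": "clang", "time_s": 1.38},
    {"benchmark": "matmul", "compiler": "gcc",   "time_s": 1.40},
]

serialized = json.dumps(experiments)          # shareable, tool-agnostic form
restored = json.loads(serialized)

gcc_times = [e["time_s"] for e in restored if e["compiler"] == "gcc"]
assert round(statistics.mean(gcc_times), 2) == 1.41   # simple analytics step
```

Keeping every artifact in a self-describing, serializable form is what makes experiments reproducible and lets predictive analytics packages consume them directly.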

During the project, we successfully applied the Collective Knowledge framework to perform systematic analysis, data mining and online/offline learning on vast amounts of benchmarking data available at ARM. Our technology showed good potential to automatically find various important correlations between numerous in-house benchmarks, data sets, hardware, performance, energy and run-time state. Such correlations can, in turn, help derive representative benchmarks and data sets, quickly detect unexpected behavior, suggest how to improve architectures and compilers, and speed up machine-learning-based multi-objective autotuning.

Furthermore, our technology has also shown potential to enable collaborative research and development within and across groups. Therefore, we have released the Collective Knowledge framework under a permissive BSD license, and expect to grow the user community.

Finally, our positive results have motivated us to establish a UK-based startup called dividiti to accelerate computer engineering and research by further developing our technology and applying it to real-world problems.

Acknowledgments: We would like to thank the TETRACOM Steering Committee for accepting the project proposal and the TETRACOM manager Eva Haas for simplifying the paperwork. We would also like to thank our ARM colleagues Marco Cornero, Alexis Mather and Jem Davies for encouraging and supporting the project.

Further resources:
• http://tetracom.eu
• http://ctuning.org
• http://github.com/ctuning/ck
• http://hal.inria.fr/hal-01054763
• http://www.dividiti.com

Grigori Fursin, cTuning Foundation
_________


hipeac students

Host Institution: University of St Andrews, UK
Title: Exploiting parallel data-flow tools for graph traversing

COLLABORATION GRANT: HECTOR ORTEGA

During the last decade, parallel processing architectures have become a powerful tool to deal with massively parallel problems that require high performance computing (HPC). In order to support the massive demand for HPC, the latest trends focus on the use of heterogeneous environments including computational units of different natures, such as common CPU cores, graphics processing units (GPUs) and other hardware accelerators. The exploitation of these environments offers higher peak performance and better efficiency compared to classical homogeneous cluster systems.

Part of my research has been focused on the development of a model, and its corresponding framework implementation, whose main objectives are: to simplify the programming of these heterogeneous systems, by hiding the details of synchronization, deployment, and tuning; and to maximize their efficiency, by automatically using all available resources. For various reasons, however, it is not always true that exploiting all available computational units leads to the fastest results.

One of the interests of the host research group at the University of St Andrews is to efficiently parallelize the solution of a problem, by automatically distributing the computation between different heterogeneous processors, following a particular mapping for the resource usage. Learning this knowledge, and working together, also with another HiPEAC grant holder from Chalmers, Josef Svenningsson, has led to new methods that predict optimal configurations of the resource usage, leading, therefore, to improvements in the automatic distribution of the work. In order to enrich the obtained results, experiments have been conducted using very different server machines from both universities, solving real-world problems with different characteristics. These activities have resulted in a willingness to continue the collaboration at a distance, in order to extend the experimentation and to improve our respective frameworks with the newly developed knowledge.

I would like to thank HiPEAC for giving me this opportunity, as well as all the people at the University of St Andrews and Josef, for their unsurpassable treatment in both academic and personal contexts, and finally my supervisors at Universidad de Valladolid for their support and encouragement to participate in this rewarding experience.

Hector Ortega, Universidad de Valladolid, Spain
_________

Thanks to this HiPEAC Collaboration Grant, I have had the pleasure of spending the last three months of the year at the IBM Thomas J. Watson Research Center, and this report summarizes the research work carried out during this period. It is well known that the capacity to extract as much knowledge as possible from the large volumes of data that our society generates plays an important role in the development of systems. Traditional methodologies are no longer valid for managing extremely large and complex data sets, and a new computation paradigm known as big data, or data analytics, is gaining ground. From current social networks to the future Internet of Things (IoT), all these systems share the same characteristics: the amount of data generated per time unit is vast, highly variable and schema-less. The complexity of the problem lies in the fact that the majority of these data do not include relevant information, and the filtering of the data that do include useful information has to be done in a limited time. This is necessary to reach useful decisions in time. For this reason, many of these systems have to be carefully designed and planned in order to fulfil the workload requirements. Under these circumstances, it is necessary to propose new architectures suitable for overcoming the obstacles of these information systems. To achieve such a goal, it is essential to know the impact that these applications will have on the memory system of actual architectures.

During this three-month research collaboration, we have focused on studying the type of databases used to manage these high volumes of data in current systems. These are no longer structured with relational databases; instead, NoSQL databases are becoming more popular, since their distributed and horizontally scalable characteristics make them the perfect tool to manage these types of data. More specifically, our main database has been the Apache Cassandra distribution. In order to work with it, it is necessary to run some kind of client. For this task, we decided to use the Yahoo Cloud Serving Benchmark (YCSB), which is a highly configurable benchmark that allows the user to set different parameters, in order to test the database under different behavior patterns. Apache Cassandra was configured as a cluster database in a three-node system, each node with two Intel Sandy Bridge quad-core chips and a memory hierarchy of 12 MB of cache each and 48 GB of main memory. Its configuration details were set to the

Host Institution: IBM Thomas J. Watson, USA
Title: Processor-memory effects analysis on BigData workloads

COLLABORATION GRANT: LUCIA G. MENEZO


During this collaboration project we have designed and implemented a high-level software pipelining method which operates in the target-independent optimization layer of the compiler. This allowed us to create a software pipelining optimization pass that can easily be reused for many different architectures. Our current implementation is targeted at the Movidius SHAVE processor architecture. From our initial experiments we found that this method does indeed provide benefits over naively unrolling loops, while requiring only very limited information about the processor architecture.
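Why software pipelining beats naive unrolling can be seen in a toy cycle-count model (illustrative only; the actual pass works on LLVM IR, and the stage counts here are invented):

```python
# Toy model of software pipelining: with three dependent stages per
# iteration, a non-pipelined loop takes 3 cycles per iteration, while a
# pipelined loop overlaps stages of different iterations after a short
# prologue, approaching 1 cycle per iteration.
def cycles_sequential(n, stages=3):
    return n * stages              # stages of one iteration run back-to-back

def cycles_pipelined(n, stages=3):
    return n + (stages - 1)        # one iteration completes per cycle after fill

assert cycles_sequential(100) == 300
assert cycles_pipelined(100) == 102   # prologue of 2 cycles, then 1/iteration
```

Unrolling alone replicates the loop body but keeps the stage-to-stage dependences; pipelining overlaps stages of different iterations, which is where the additional speedup comes from.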

The current work of this project focuses mostly on stabilizing the code so that we can run a larger set of test benchmarks and start tuning the required processor architecture model interface. Once this is completed we plan to demonstrate the portability of our approach by applying it to several processor architectures already supported by LLVM, such as Qualcomm's Hexagon DSP and AMD's R600 series of GPUs. Finally, we will also present our design together with the results of our experiments at the upcoming EuroLLVM meeting (April 2015) in London.

I am grateful to HiPEAC for providing me with the opportunity to perform this research through a collaboration grant. It has been a great pleasure to work in such an international collaboration, meet interesting people, and make use of the vast amount of experience available within industry. It was a valuable experience and hopefully the beginning of a long-term scientific contact between our institutions. In particular I would like to thank David Moloney and Martin O'Riordan for hosting me and sharing their knowledge of processor architecture and compiler design.

Roel Jordans, Eindhoven University of Technology, The Netherlands
_________

Host Institution: Movidius, Ireland
Title: An implementation of software pipelining in LLVM

COLLABORATION GRANT: ROEL JORDANS

The project developed during the internship of Mr. Jaime Espinosa at BSC-CNS, a HiPEAC member, has been successfully completed. Thanks to the collaboration between the different institutions, the expertise of the Fault-Tolerant Systems Research Group (STF) at the Polytechnic University of Valencia in fault injection has been of great use when performing the experimentation required in the work. Likewise, many lessons have been learned when applying injection in complex systems such as the one used in the host institution. Furthermore, it has been an enriching experience to work with people more closely related to industry, since the view of academia is not always so closely aligned with it. The outcomes of the program have been remarkable.

Firstly, a joint publication has been published at DAC 2015, entitled “Analysis and RTL Correlation of Instruction Set Simulators for Automotive Microcontroller Robustness Verification” by J. Espinosa, C. Hernández, J. Abella, D. de Andrés and J.C. Ruiz.

Secondly, a longer collaboration has been fostered thanks to the internship, which will probably yield a second, extended publication with more in-depth results and conclusions. A set of interesting contacts has been made with people from BSC, which may inspire new research work and further fruitful discussion in the area. I would like to personally thank all the people at HiPEAC who have made this internship possible, as well as the people at BSC for their support.

Jaime Espinosa Garcia, Universitat Politècnica de València, Spain
_________

Host Institution: Barcelona Supercomputing Center, Spain
Title: Correlating microarchitectural fault injection with RTL fault injection experiments

COLLABORATION GRANT: JAIME ESPINOSA GARCIA

default values, mainly regarding the replication strategy and the snitches' behavior. The Cassandra tool named nodetool was used for managing the constructed cluster, and the whole testing infrastructure was completed with monitoring tools to analyze the behavior of the system, specifically processor and memory behavior.

This collaboration will be continued by constructing benchmarks from this structure, in order to integrate them into a flexible and reliable framework such as the gem5 simulator, with which we already have experience. This will allow us to evaluate new hardware proposals for the coherence protocol, especially conceived to accelerate big data or data analytics workloads. Finally, I would like to thank HiPEAC for this opportunity and IBM for accepting me. I would also like to thank the Computer Architecture Department at IBM T.J. Watson for receiving me in such a warm way from the first day, and especially Ravi Nair for our discussions. In addition to the extraordinary experience I had, I hope and wish that this rewarding collaboration may continue in the future.

Lucia G. Menezo, University of Cantabria, Spain
_________


Host Institution: University of Cambridge, UK
Title: Implementing Microarchitectural Mechanisms to Slow Down Ageing in Current Microprocessors

COLLABORATION GRANT: ALEJANDRO VALERO

I have spent three months in the Computer Laboratory at the University of Cambridge, UK, under the guidance of Dr. Timothy M. Jones. During this period I have learned how variations in the manufacturing process of current microprocessors affect the lifetime of their transistors. Degradation of the different microprocessor components affects the overall system, depending on the importance of the component. For instance, cache memories are implemented with a huge quantity of transistors and are critical for system performance. Caches have usually been implemented with Static Random-Access Memory (SRAM) cells composed of six transistors. Over the lifetime of processors, Negative Bias Temperature Instability (NBTI) and Hot Carrier Injection (HCI) gradually increase the threshold voltage of the SRAM's PMOS and NMOS transistors, respectively, causing slower transistor switching, which in turn results in timing violations and faulty operation in such cells.
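A toy model makes the uneven-ageing problem concrete (illustrative only, not the mechanism developed in the collaboration; the access pattern and remap policy are invented):

```python
# Toy model of uneven cell ageing and wear-levelling: if one hot line
# takes most of the accesses, its cells age fastest; periodically
# remapping which physical line a logical line uses spreads the stress.
def stress(accesses, nlines, remap_period=None):
    """Count per-physical-line activity with an optional rotating remap."""
    wear = [0] * nlines
    for t, logical in enumerate(accesses):
        shift = (t // remap_period) if remap_period else 0
        wear[(logical + shift) % nlines] += 1
    return wear

hot = [0] * 8                       # logical line 0 is accessed 8 times
assert stress(hot, 4) == [8, 0, 0, 0]                  # one line ages 4x faster
assert stress(hot, 4, remap_period=2) == [2, 2, 2, 2]  # stress homogenized
```

Since lifetime is limited by the worst-aged cells, equalizing the stress across cells directly lengthens the usable life of the whole structure.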

In this collaboration we have focused on new architectural mechanisms to lengthen the lifetime of the processor caches. We have identified that some memory cells are much more affected by both NBTI and HCI effects than others. Based on these observations, our mechanisms attempt to mitigate these effects by ensuring homogeneous degradation across the memory cells. This collaboration has allowed me to explore a new research direction, and it is the beginning of exciting new research opportunities. Besides, this grant has permitted us to establish a solid relationship between the University of Cambridge and the Universitat Politècnica de València. We are still working together after the end of the mobility, and we plan to apply jointly for future research project calls. I would like to thank HiPEAC for giving me the opportunity to participate in this internship, as well as my host Tim Jones, who was extremely committed to our collaboration from the very beginning and provided useful hints to develop our techniques. Finally, I would also like to thank all the fellows in the lab, who made this experience unforgettable for me.

Alejandro Valero, Universitat Politècnica de València, Spain
_________

Massive parallelism in many communication and multimedia applications shows up as significant opportunities for data-level parallelism (DLP). DLP is usually exploited by single-instruction multiple-data (SIMD) execution units, due to their low-power architecture, which applies a single instruction across many processing elements. Nowadays, SIMD execution units exist in many modern embedded and mainstream processor architectures.

Although SIMD execution is one of the main enablers of computing efficiency, programming for SIMD architectures is still a challenge and a hot research topic. Moreover, the auto-vectorization capabilities of compilers are very limited, and vectorization mostly requires tweaks and instrumentation (e.g. pragmas, target-specific intrinsics, etc.) to be added to the source code by the programmer. Generating vector code for an architecture with a single vector width is already difficult. It gets much more challenging when the vector code is to be generated for a VLIW architecture with multiple native vector widths. The SHAVE VLIW vector processor of Movidius is an example of such an architecture. SHAVE is a unique VLIW processor that provides hardware support for both native 32-bit (short) and 128-bit (long) vector operations.

During my HiPEAC internship at Movidius, I was continuously challenged to improve the SIMD code generation of an LLVM-based commercial compiler targeting the SHAVE processor family. The compiler was capable of SIMD code generation for long (128-/64-bit) vector operations. I focused on the compiler back-end support for short (32-bit) vector code generation. More specifically, the work aimed at generating SIMD code for a short vector type that can be executed next to the long vector SIMD code. As a result, the compiler is now able to generate mixed-width assembly code consisting of both short and long SIMD operations. To our knowledge, we implemented the first (prototype) compiler producing such mixed-width SIMD code.
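The shape of mixed-width code generation can be illustrated with a simple simulation (conceptual only; the real work emits SHAVE assembly from LLVM IR, and the widths here just stand in for the long/short vector types):

```python
# Conceptual picture of mixed-width SIMD covering: a loop over n elements
# is covered with wide 'long' vector operations, and the remainder with
# narrower 'short' operations, so both widths appear in the emitted code.
def vector_add(a, b, long_width=4):
    out, ops = [], []
    i = 0
    while i + long_width <= len(a):                 # long-vector chunks
        out += [x + y for x, y in zip(a[i:i+long_width], b[i:i+long_width])]
        ops.append("long")
        i += long_width
    while i < len(a):                               # short-vector remainder
        out.append(a[i] + b[i])
        ops.append("short")
        i += 1
    return out, ops

res, ops = vector_add([1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60])
assert res == [11, 22, 33, 44, 55, 66]
assert ops == ["long", "short", "short"]
```

Covering the remainder with short vector operations instead of scalar code is what lets the whole loop stay in SIMD form on a machine with two native vector widths.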

The research paper explaining the results of this work will be presented at the 26th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2015, in Toronto, Canada. Moreover, our experiences with LLVM compiler development were presented at the EuroLLVM 2015 conference in London, UK.

Finally, I would like to thank the team at Movidius, in particular Martin J. O’Riordan and David Moloney, for all their support during my stay in Dublin, as well as HiPEAC for making this internship possible.

Erkan Diken, Eindhoven University of Technology, The Netherlands
_________

Host Institution: Movidius, Ireland
Title: Mixed-width SIMD code generation in an LLVM-based compiler for the SHAVE VLIW Vector Processor

COLLABORATION GRANT: ERKAN DIKEN


I am a second-year PhD student at the University of Siena (UniSi), Siena, Italy. I have just completed my internship at the Ericsson Research Lab, Lund, Sweden. During my stay at Ericsson, I worked with the Cloud Computing Research Group, and I enjoyed working with Cloud Principal Researcher Johan Eker and his team. The main goal of the internship was to implement actor-based models on Parallella boards. To build a highly complex Cloud/HPC system, we now have many options, but in recent times there has been a sea change in the semiconductor industry. Each technology has its pros and cons. To build highly complex and scalable computer systems, there exist multiprocessors, multi-coprocessors, Accelerated Processing Units (APUs), GPUs and FPGA-based devices. But each device comes with 3P (Performance, Programming and Power) issues. To exploit the massive levels of parallelism of these processors, we must find the best-suited programming model for each. Dataflow models and actor-based models are examples of well-known programming models that have recently gained popularity for achieving high levels of parallelism. It was a nice experience for me to explore Parallella boards and to implement the functionalities of actor-based models on these boards.

I especially want to sincerely thank my supervisor Prof. Roberto Giorgi for his support and inspiration to apply for this internship in the first year of my PhD. I would also like to thank HiPEAC for giving me the opportunity to work with this group of highly skilled and inspiring professional researchers at Ericsson.
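The actor model mentioned above can be sketched minimally (a generic Python illustration of the concept, not the Ericsson/Parallella implementation):

```python
# Minimal actor sketch: each actor owns a private mailbox and reacts to
# messages one at a time, so actors share no state and can be distributed
# across cores or boards by a runtime.
from collections import deque

class Actor:
    def __init__(self, behavior):
        self.mailbox = deque()
        self.behavior = behavior
        self.state = []

    def send(self, msg):
        self.mailbox.append(msg)          # asynchronous message passing

    def run(self):
        while self.mailbox:               # process messages sequentially
            self.behavior(self.state, self.mailbox.popleft())

accumulator = Actor(lambda state, msg: state.append(msg * 2))
for value in (1, 2, 3):
    accumulator.send(value)
accumulator.run()
assert accumulator.state == [2, 4, 6]
```

Because all interaction happens through mailboxes, an actor runtime is free to place each actor on any core, which is what makes the model attractive for many-core boards like the Parallella.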

Somnath Mazumdar, University of Siena (UniSi), Siena, Italy
_________

I am a final year PhD student at Politecnico di Milano, Italy. For the HiPEAC Industrial Internship,I worked at the Media Pro­cessing Group (MPG) of ARM in Cambridge, from August to December of 2014, under the super vision of Marco Cornero. The topic of the internship was to investigate improvements in the pro grammability of tightly­integrated CPU­GPU sys tems. The cost of commu nication between CPUs and GPUs is decreasing, in particular thanks to multi­core chips and memory coherency tech niques. This trend can also be noticed in current programming models, for example in the new OpenCL 2.0 standard, which introduces shared virtual memory profiles, and in the HSA Foundation. The idea developed during the internship is to fully exploit the benefits provided by full shared virtual memory, in order to simplify hetero geneous systems programming, in parti cular the host­device interaction API. We defined a prototype programming model inspired by the Khronos SyCL provisional specification, which we exten­ded to support almost all the features of

the C++ language, on both CPU and GPU, with the least possible restric tions. The sharing of data and data pointers is implicit and almost free, thanks to shared virtual memory, while special care is needed for the executable code, given the presence of multiple instruction sets in the hetero geneous system. The main topic of the work was to provide full support for the C++ language also for the device kernels. We identified three main challen­ges: support for virtual member functions and representation of virtual tables for multiple devices, support for function poin ters, and support for generalized sys tem calls (host services accessed remotely from the devices). These features share the fact that they involve code for both the CPU and the GPU. Because of the different instruction sets, functions must be compiled for multiple targets. In this context, a device that invokes a function from a function pointer needs to access the version of the function that corres­ponds to its ISA, or otherwise if such a version is not present, it should remotely

invoke the function on the host. During the internship, various solutions for the aforementioned functionality, with different requirements, were evaluated and prototyped. A considerable effort was put into not breaking the host ABI, which has a strong impact on ensuring interoperability with existing codebases. Furthermore, I started the development of a prototype compiler based on the Clang/LLVM framework to support the new programming model. In addition, I actively contributed to the development of the compiler backend for ARM's next-generation GPUs, in order to support some of the features required by the heterogeneous programming model investigations.

Michele Scandale, Politecnico di Milano, Italy_________

INTERNSHIP REPORT: SOMNATH MAZUMDAR
Host Institution: Ericsson Research Lab, Sweden
Title: Programming Parallel Cloud Hardware

INTERNSHIP REPORT: MICHELE SCANDALE
Host Institution: ARM, UK
Title: Exploiting Tight CPU/GPU integration to Improve GPGPU Programming Models

HiPEAC info 43 17


hipeac students

Host Institution: Samsung R&D Institute (SRUK), United Kingdom
Title: On Heterogeneous Programming with MPSoC Application Programming Studio “MAPS”

INTERNSHIP REPORT: TURKEY ALSALKINI

I am a PhD student at Heriot-Watt University, working on a mechanism to balance the runtime load and optimize resource use on heterogeneous architectures. As part of a HiPEAC-sponsored industrial internship, I spent four months, part-time, at Samsung R&D Institute (SRUK), London, United Kingdom. Samsung R&D Institute focuses on investigating and optimising native libraries for embedded multi-core heterogeneous architectures. Android is a mobile operating system based on the Linux kernel, currently developed by Google. With a user interface based on direct manipulation, Android is designed primarily for touch-screen mobile devices such as smartphones and tablet computers. The number of cores in mobile devices is increasing to meet the needs of performance-demanding applications. To take advantage of such a

multi-core architecture, these resources should be fully utilized. The main task of this internship was to improve the start-up time of Android applications in order to reduce power consumption. We started by analysing the memory heap of Android applications to investigate memory usage, and we looked for memory leaks that might have an impact on application performance. Then we moved on to explore the Android application launch process. In this process, the current layout inflator serially instantiates the UI elements from the XML file. Instantiating a UI element does not take a long time unless the element has an image to decode. Our solution assigns the decoding of images to worker threads, in order to let the main thread inflate other components while the threads are decoding the images. Our modifications were to the Android

framework, through adding a task manager, runnable tasks and future results. We started the implementation by applying our changes to the Android framework, and we tested the results on the Emulator, the Samsung Nexus 10 and the Samsung Note 4. The results show that the inflator offloads a task to a worker thread whenever it encounters a UI element with an image to decode. As a result, the start-up time improved by utilizing the multi-core architecture of the mobile device. I would like to thank HiPEAC and Samsung for giving me this opportunity to meet and work with highly skilled people. I would also like to thank my supervisor Greg Michaelson for his support.

Turkey Alsalkini, Heriot-Watt University, Edinburgh, UK _________

As part of the HiPEAC industrial internship program for PhD students, I spent four months at ARM (Cambridge, UK) investigating the effects of false sharing.

False sharing arises in shared-memory multiprocessors with an invalidation-based coherence protocol. As multiple addresses are mapped onto the same cache line, two cores working on distinct addresses might end up sharing a cache line. This can cause false invalidations, where a write to address A by one core invalidates the copy of the cache line containing A cached at another location, even though the remote cache never accessed A and never will. In the worst-case scenario, both cores write to distinct addresses mapped onto the same cache line, causing a ping-pong effect where they continuously invalidate each other's copy. This means that false sharing causes

misses that could have been avoided. False sharing can be addressed in software by remapping the data structures that cause it, but this requires active collaboration from the programmer. Hardware can also be used to detect false sharing, but at the cost of additional logic in the cache controllers.

The goal of this internship was to investigate the severity of the false sharing problem using the computing system simulator gem5. I instrumented the cache controllers with additional code that assigns access vectors to all cache lines to track which words in a cache line have been used locally. By comparing these with the access vector of remote requests, false sharing can be detected. This also allowed other statistics to be gathered, such as the utilisation of cache lines. I really liked doing research in an industrial

environment, as it has a different focus than research in an academic setting. I felt really integrated into the group and its research, which allowed me to learn a lot during my time at ARM. The work I did also contributed to internal projects, which was a nice change from PhD work, which can at times be a bit solitary. ARM gave me the chance to extend my stay by another two months, which I took so that I could completely finish the project. I am very thankful to HiPEAC and ARM for offering these industrial internships: I not only learnt more about computer architecture, but I also think this experience will contribute to my becoming a better researcher. I would also like to thank everybody at ARM I worked with for an interesting and enjoyable experience.

Anouk Van Laer, University College London, UK _________

Host Institution: ARM, UK
Title: False Sharing Cache Misses Detection and Elimination

INTERNSHIP REPORT: ANOUK VAN LAER



PhD news

ADAPTIVE PARALLELISM MAPPING IN DYNAMIC ENVIRONMENTS USING MACHINE LEARNING

Murali Emani, University of Edinburgh, UK
Advisor: Prof. Michael O’Boyle
Graduation date: March 2015

Modern-day hardware platforms, ranging from mobiles to data centers, are parallel and diverse. The execution environment, composed of workloads and hardware resources, is dynamic and unpredictable, which makes efficiently matching program parallelism to machine parallelism under uncertainty hard. My thesis proposes solutions to the mapping of parallel programs in dynamic environments. It employs predictive modelling techniques to determine the best degree of parallelism. Firstly, this thesis proposes a machine-learning-based model that uses static code and dynamic runtime features as input to determine the optimal thread number for a target program. Next, this thesis proposes a novel solution to monitor the proposed offline model and adjust its decisions in response to any drastic environment changes. Furthermore, considering the multitude of potential execution scenarios, where no single policy is best suited in all cases, this work proposes an approach based on the idea of mixture of experts. It considers a number of offline experts, or mapping policies, each specialized for a given scenario, and learns online the expert that is optimal for the current execution. _________

TOWARDS DEPENDABLE NETWORK-ON-CHIP ARCHITECTURES

Changlin Chen, Delft University of Technology, NL
Advisor: Assoc. Prof. Sorin Cotofana
Graduation date: May 2015

In this dissertation, we propose several novel NoC-tailored mechanisms to tolerate faults induced by transistor miniaturization, as well as to efficiently utilize still-functional NoC components. We first introduce a low-cost method to allow for correct flit transmission even when soft errors occur in the router control plane. Then we propose a Flit Serialization (FS) strategy to tolerate broken link wires and to efficiently utilize the remaining link bandwidth. Within the FS framework, heavily defected links whose fault levels exceed a certain threshold value are deactivated. Moreover, we design a distributed-logic-based routing algorithm able to tolerate totally broken links as well as to efficiently utilize UnPaired Functional (UPF) links in partially defected interconnects. We also introduce a link-bandwidth-aware runtime task mapping algorithm to improve the mapping quality for newly injected applications in MPSoCs. Last but not least, we discuss the application of the aforementioned strategies in 3D NoC systems and propose a Bus Virtual channel Allocation (BVA) mechanism to enable vertical wormhole switching in 3D NoC-Bus hybrid systems. All proposals are evaluated in our NoC simulation platform, and their advantages over state-of-the-art counterparts are demonstrated by means of experimental results. _________



upcoming events

International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS XV)
20-23 July 2015, Samos Island, Greece
http://samos-conference.com/

The 2015 International Conference on High Performance Computing & Simulation (HPCS 2015)
20-24 July 2015, Amsterdam, the Netherlands
http://hpcs2015.cisedu.info/

International Symposium on Low Power Electronics and Design (ISLPED)
22-24 July 2015, Rome, Italy
http://www.islped.org/2015/index.html

Euro-Par 2015
24-28 August 2015, Vienna, Austria
http://www.europar2015.org/

22nd European Conference on Circuit Theory and Design (ECCTD2015)
24-26 August 2015, Trondheim, Norway
http://www.ntnu.edu/ecctd2015/

ParCo2015
1-4 September 2015, Edinburgh, UK
http://www.parco.org/

International Conference on Field-programmable Logic and Applications (FPL 2015)
2-4 September 2015, London, UK
http://www.fpl2015.org/

IEEE 9th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-15)
23-25 September 2015, Turin, Italy
http://mcsoc-forum.org/2015/

2015 IEEE Nordic Circuits and Systems Conference (NORCAS)
26-28 October 2015, Oslo, Norway
http://www.norcas.org/

22nd IEEE International Symposium on High Performance Computer Architecture (HPCA), 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), and 2016 International Symposium on Code Generation and Optimization (CGO)
12-16 March 2016, Barcelona, Spain
http://hpca22.site.ac.upc.edu/
http://conf.researchr.org/home/PPoPP-2016
http://cgo.org/cgo2016/

HIPEAC 2016 CONFERENCE, 18-20 JANUARY 2016, PRAGUE, CZECH REPUBLIC
WWW.HIPEAC.NET/2016/PRAGUE

info 43

hipeac info is a quarterly newsletter published by the hipeac network of excellence, funded by the 7th european framework programme (fp7) under contract no. fp7/ict 287759
website: https://www.hipeac.net/
subscriptions: https://www.hipeac.net/publications/newsletter/

contributions
If you are a HiPEAC member and would like to contribute to future HiPEAC newsletters, please visit https://www.hipeac.net/publications/newsletter/

design: www.magelaan.be