NO.075 Putting Heterogeneous High-Performance Computing at the Fingertips of Domain Experts
November 17 - 20, 2015 (Check-in: November 16, 2015)
- Wim Vanderbauwhede
- University of Glasgow, UK
- Sven-Bodo Scholz
- Heriot-Watt University, Scotland
- Tetsuya Takemi
- Kyoto University, Japan
Description of the meeting
Perspectives, needs and ideas from the research communities involved in using, operating and developing HPC software and systems.
High-performance computing (HPC) is critically important in many scientific fields that rely on numerical models which are computationally intensive, both in terms of computation speed and in terms of memory usage.
For example, large-eddy simulation (LES) models have recently become increasingly common in meteorological and atmospheric research because they can explicitly represent the turbulent nature of atmospheric flow and dispersion at fine spatial scales. LES is a very promising tool for numerical weather forecasting, emergency response, hazard/disaster assessment and air-quality assessment, but it is much more computationally expensive than the models currently used for numerical weather forecasting.
Writing code that will run with good performance on modern computer systems is becoming increasingly hard because of the advent of many-core systems and an increasingly broad range of accelerators such as Graphics Processing Units (GPUs) and Field-Programmable Gate Arrays (FPGAs).
HPC clusters built from such systems pose an even greater challenge. Traditionally, such highly parallel HPC systems were used predominantly by a rather small group of people. Applications were limited to a relatively well-defined set of scientific codes which were tuned and adjusted over long periods of time. The most demanding applications were typically of high economic relevance and could therefore afford HPC experts with extensive experience in programming parallel systems, who devoted large amounts of time to fine-tuning the codes for the hardware they ran on.
Since the turn of the century, the situation has changed dramatically. Not only has the information age that we are embarking on created new and increased data processing demands, sometimes referred to as “big data”, but heterogeneous HPC systems are also becoming mainstream facilities. As these systems become more affordable, they also become more widely available. As a result, we see a much larger cohort of application programmers from domains such as climate modelling, image processing, finance, personalised medicine, biology, chemistry, etc.
The compilers and analysis tools required to achieve high performance on modern systems are themselves highly complex, and are aimed at HPC experts rather than domain experts. However, many domain experts do not have the means to employ HPC experts, nor can they afford the time to acquire the necessary expertise themselves. As a result, the gap between the performance of code written by many domain experts and the capability of modern computer systems is growing steadily. Even for HPC experts it is becoming increasingly difficult to achieve optimal performance, due to the huge complexity of heterogeneous many-core systems. This is already the case for systems with accelerators such as GPUs or Intel’s Many Integrated Core (MIC) architecture, but it is particularly acute for FPGAs: these devices are very promising for HPC because they can achieve very high performance per Watt; however, achieving good performance on FPGAs is still very difficult.
To safeguard the progress of the domain experts and, with it, the progress of scientific research relying on numerical computations, addressing this performance gap is crucial.
Bridging the gap
The current workflow for scientific computing is typically as follows: domain experts write single-threaded code, usually in Fortran or C/C++. If their team has parallel programming expertise, or they can afford support from HPC experts, the code will be parallelised for clusters through MPI and for multicore processors via OpenMP. Porting the code to GPUs requires manually rewriting parts of it in CUDA or OpenCL. For FPGAs, the situation is even more complex: here, the code needs to be re-implemented in a hardware description language.
At the same time, there are many efforts in the computing science community to create languages and compilation approaches that can target heterogeneous systems without manual rewrites, for example OpenACC, Single-Assignment C and Halide, as well as languages with explicit parallelism support such as Chapel, Cilk, Co-array Fortran and many others. However, this research assumes knowledge of programming paradigms, architectures, cost models, etc., which comes naturally to computing scientists and HPC programmers but not to domain experts.
We therefore want to bring these communities together to exchange views, so that the computing science research will benefit the domain experts much more directly.
Aim of the meeting
The aim of this Shonan meeting is to bring together researchers from the disciplines involved, in particular
- domain experts such as geophysicists, meteorologists etc.,
- High-Performance Computing experts,
- computing scientists with expertise in programming languages and compilers for heterogeneous many-core systems,
- specifically FPGA and GPU experts,
to discuss the challenges each community faces and ways to bridge the gap between the domain experts and today’s and tomorrow’s clusters of heterogeneous many-core systems.
We want to address questions relating performance to code analysis, refactoring, compilation and run-time adaptation, as well as user experience design.
The different communities will ask very different questions. Computing scientists might ask: “Is static code analysis possible? Do we require run-time analysis? Should the compiler suggest changes to the source code for better performance? Do we need tools to predict performance on various platforms? Can we develop analysis tools that provide useful feedback to a non-expert end user?” HPC experts might ask: “What obstacles do we face when rewriting existing code to get top performance? Could the compiler give us suggestions based on the machine architecture?” And domain experts might ask: “Why do I need to write code at all? Can we not use the equations as inputs? Why does the compiler not automatically parallelise my code? How do I select my numerical algorithm? Why is there no tool to help me with this choice?”
All these questions are interlinked, and the different communities will have very different perspectives on the issues involved. We want to stimulate this discussion so that the different communities can learn from one another and arrive at shared ideas that will put Heterogeneous High-Performance Computing at the fingertips of the domain experts.