No.056 Staging and high-performance computing: theory and practice


NII Shonan Meeting Seminar 056

Discussion materials

Specializing Sparse Matrix-Vector Multiplication

Baris Aktemur

Sparse matrix-vector multiplication (SpMV) is a kernel operation used in many scientific computation domains. SpMV is amenable to optimization by using generative programming to specialize the function based on the structure and values of the sparse matrix. In this work we evaluate the performance gains obtainable from specialized SpMV by generating code using several different methods. We then discuss how to predict which code generation method will give the best performance. We finally share our experiences regarding generating high-performant SpMV code quickly.

On the Mechanics of a Program-Generator Generator

Robert Glück

We present principles and implementation behind a light-weight program-generator generator based on partial evaluation techniques for a recursive flowchart language.

An Innocent Solution to Shonan Challenge 2012

Yukiyoshi Kameyama

The first Shonan Challenge collects a set of problems for generating optimized programs in the domain of high performance computing. Even though several solutions have been already posted by programming language experts, it was not clear how a non-expert can solve the problems. In this talk I will report a student’s solution to the Hidden Markov Model (HMM) problem, and how he developed his solution to accommodate several optimizations. I also briefly mention the evaluation of his solution and the lessons learned. This talk is based on the work by Haruki Shimizu.

Code Generation Challenge for Staggered Mesh

Takayuki Muranushi

In ordered-mesh based solvers of partial differential equations, every
discretized physical values lie on integer coordinates. Staggered mesh is where rational-number coordinates such as half-integers are allowed. For example in hydrodynamics simulations, gas density variables may lie in the center of the cubic cells while gas flux variables may sit on the center of the cell surfaces. Staggered meshes are sometimes consequences of higher-order differentiation, and sometimes important for numerical stability or conservation laws. On the other hand, staggered mesh algorithms require slightly different treatment for different variables or different vector components. This poses challenge to abstraction, and the failure of abstraction often results in Fortran or C code where x-component logic is copied and slightly modified for y-component and then z-component, and so on.

In my talk I’ll illustrate this staggered mesh problem code by taking elastic wave equations as an example.

Paraiso: an automated tuning framework for explicit solvers of partial differential equations

Takayuki Muranushi

Paraiso Wiki

DSL embedding – how do we know if it’s worth it?

Ryan Newton

The embedded DSL approach has established benefits but also suffers from problems that do not necessarily show up in code figures of academic papers: e.g. terrible error messages and other limitations of syntax overloading. In this talk I will argue against the the traditional cost-benefit analysis ascribed to embedded/standalone DSL design decision. To illustrate, I will discuss three hypothetical alternative front-ends to the Accelerate GPU DSL: Haskell-embedded (the original), standalone with language subsetting, and Racket-embedded with its own type system.

Embedding Application Knowledge for Improved Dynamic Adaptation

Sven-Bodo Scholz

Generating efficient code for a range of heterogeneous hardware from DSLs is hard. Trying to achieve the same for Not-So-Domain-Specific-Languages is even harder. This talk presents some of our latest ideas on how to provide the compiler/code generator with domain knowledge without modifying the code generator at all. Instead, we use programmer-provided alternatives in conjunction with dynamic adaptation to achieve overall efficiency. This is very much work in progress; besides presenting the motivation and basics of the approach it provides several open questions.

More challenges for HPC program generation

Reiji Suda

I introduce several optimizations (program transformations) that are frequently used by HPC programmers. I present several questions for staging.


May 26 (Monday)

  • 15:00 -: Hotel Check In (early check-in from 12:00 is negotiable if informed in advance)
  • 19:00 – 21:30: Welcome Reception

May 27 (Tuesday)

  • 7:30 – 9:00: Breakfast
  • 9:00 – 9:15 Welcome, Overview, Administrative issues
  • 9:15 – 10:15: Self-introduction
    Chair: Yukiyoshi Kameyama
    • Georg Ofenbeck
    • Sven Bodo-Scholz
    • Robert Glück
    • Jeremy Yallop
    • Takayuki Muranushi
    • Tobias Grosser
    • Ryan Newton
    • Daisuke Takahashi
    • Tiark Rompf
    • Nada Amin
  • 10:15 – 10:45: Tea Break
  • 10:45 – 12:00 continue self-introductions
    Chair: Oleg Kiselyov

    • Yukiyoshi Kameyama
    • Baris Aktemur
    • Jonathan Ragan-Kelley
    • Zach DeVito
    • Lindsay Errington
  • 12:00 – 13:30: Lunch
  • 13:30 – 14:30: Daisuke Takahashi Performance Tuning for High Performance Computing Applications (Overview of HPC)
  • 14:30 – 15:05: Baris Aktemur: Shonan Challenges
  • 15:05 – 15:30: Yukiyoshi Kameyama: An Innocent Solution to Shonan Challenge 2012
  • 15:30 – 16:00: Tea break
  • 16:00 – 16:30: Nada Amin: Scala/LMS/Delight
  • 16:30 – 17:00: Tiark Romp: Scala answer to Shonan challenges
  • 17:00 – 18:00: Final discussion
  • 18:00 – 19:30: Dinner

May 28 (Wednesday)
Theme: Control, interactivity, feedback

May 29 (Thursday)
Theme: more challenges

May 30 (Friday)

  • 7:30 – 9:00: Breakfast
  • 9:00 – 9:30: Robert Glück On the Mechanics of Program-Generator Generators
  • 9:30 -10:00: Lindsay Errington
  • 10:00 – 10:30: New Shonan challenges
  • 10:30 – 11:00: Tea Break
  • 11:00 – 12:00: Final panel discussion
    How to continue (abstracts? Position paper? Special issue?)
  • 12:00 – 14:00: Lunch

Travel Information

General information

Getting to Shonan Village

The following instructions cover the common cases for a non-resident of Japan to move from Tokyo and its airports to Shonan Village. In general, first take trains to either JR’s Zushi station or Keikyu’s Shin-Zushi station, then take a bus or taxi. The Shonan Village Web site gives more information, but please do not hesitate to ask the organizers for help!

If you get lost

Railway staff (especially at ticketing offices) tend to speak some English and know about destinations. Bus drivers, maybe less likely.

Step 1 of 2: tracks

You can search for train schedules online. However, if using the “N’EX”, please focus on trains labeled “Narita Express” or “JR” and disregard the fares calculated.

From Tokyo’s Narita airport

Take JR trains to Zushi. JR is Japan’s “national” railway company. At the airport after exiting customs, go downstairs and find a JR office (not a Keisei office). Buy the N’EX
TOKYO Direct Ticket (One-way)
(it saves you more than 2000 yen). Take “N’EX” (Narita Express) to Yokohama or Ofuna (better), then take the Shonan-Shinjuku or Yokosuka line to Zushi. The N’EX ticket will cover the JR train trip all the way to Zushi (about 2 hours) and, if you buy the round-trip N’EX ticket, back. Tell the JR representative that you’re going to Zushi and you know you have to change trains at Yokohama or Ofuna. The Narita Express train splits en route so board the car of your assigned seat.

From Tokyo’s Haneda airport

Warning! Trains in Japan do not run at night. Night buses are rare. If you come to Haneda airport at about 10pm (which is typical for many US flights), you will be stuck in Tokyo for the night and should plan accordingly (e.g., make a reservation for a hotel in the vicinity of Haneda or the Tokyo station — which is as far as you could possibly go.) It’s best to plan to arrive to Japan before 6-7pm.

Take Keikyu trains (not the monorail) to Shin-Zushi. Keikyu is one of Japan’s many private railway companies. At the airport, buy a PASMO or Suica card with about 2000 yen. (You might get a slightly better deal if you buy a PASMO card from a Keikyu ticket office and tell them you are going to Shin-Zushi.) Use the card to go to Shin-Zushi terminal (about 1 hour), possibly changing trains at Keikyu-Kamata and/or Kanazawa-Hakkei.

From central Tokyo

Take JR’s Yokosuka line (e.g., from Tokyo or Shinagawa station) or Shonan-Shinjuku line (e.g., from Ikebukuro, Shinjuku, or Shibuya) to Zushi (about 1 hour). You can also take Keikyu trains (e.g., from Shinagawa) to Shin-Zushi.

Step 2 of 2: roads

A Keikyu bus departs once or twice per hour from Zushi and Shin-Zushi stations (bay 1) to Shonan Village (the last stop). It takes 30 minutes and costs 350 yen. The bus time table is online. The departure times from Shin-Zushi are 2 minutes after the departure times from Zushi.

Pay for the bus with your Suica or PASMO card by touching it to the blue “IC” panel when you get on (through the center door) and again when you get off (through the front door). It is when you get off that you pay, so don’t ask the driver for a ticket when you get on. After you get off, your destination building is right across the road (north, in other words).

A taxi from Zushi or Shin-Zushi station to Shonan Village takes 20 minutes and costs 2500 or 3000 yen.

Travel tips

Arranging for a ride together

Please add a comment to this post to announce your arrival information and arrange to share transportation. The comments will be deleted after the seminar.


If you have never experienced even a small earthquake, you may wish to check the page “What should I do when…?” just in case (just as you should know how to evacuate in case of a fire even if the chance is very small).

Meeting points in Tokyo

Haneda airport, international terminal: after exiting baggage claim and customs, turn right and you’ll see the tourist information office to your right. There are plenty of seats in front of the office.

Narita airport: please see

JR Tokyo station, underground within the ticket gates: There are two meeting points. The “Gin-no-suzu” (a large silver bell) is traditional and next to many shops and services. It is on the first underground floor. Another place on the same floor is a big square in front of a relief steam locomotive structure.

JR Ueno station: the ticket gates at the exit to the park, which is a good place to let daylight attenuate your jetlag.


Generative programming, in particular, in the form of staging, is widely recognized in HPC as the leading approach to resolve the conflict between high-performance on one hand, and portability and maintainability on the other hand. In its general form, staging is an implementation technique for efficient domain-specific languages (DSL), letting HPC experts conveniently express their domain-specific knowledge and optimization heuristics.? However, the results and tools of the current staging research are little used in the HPC community. Partly this is because HPC practitioners are not aware of the progress in staging; staging researchers are likewise unaware of what HPC practitioners really need. Bringing together HPC practitioners and programming language (PL) researchers to break this awareness barrier was the goal of the first Shonan meeting “Bridging the theory of staged programming languages and the practice of high-performance computing” (Shonan Seminar 19), which took place in May 2012. The meeting aimed to solicit and discuss real-world applications of assured code generation in HPC that would drive PL research in meta-programming.

The first Shonan meeting succeeded in its aim. It developed a set of benchmarks, representative HPC examples, where staging could be of help in producing more maintainable code and letting domain experts perform modifications at a higher-level of abstraction. This set was dubbed `Shonan Challenge’.

Shonan Challenge has greatly stimulated research and development of staging, resulting in extensible compilers based on staging (Rompf et al., POPL 2013) and the revival of MetaOCaml (ML 2013). The answers to Shonan Challenges have been presented in the overview paper (Aktemur et el., PEPM 2013), in (Rompf et al., POPL 2013) and in a poster at APLAS 2012. Shonan Challenge problems (specifically, the Hidden-Markov Model benchmark) were discussed at the staging tutorials given in 2013 at the premier PL conferences PLDI, ECOOP, and ICFP/CUFP.

It is time to report the progress in staging back to the HPC practitioners who posed the challenges, to evaluate how well the developed tools meet the needs of the HPC practice already, and what is yet to be done.

As before, we anticipate the workshop participants to consist of three groups of people: PL theorists, HPC researchers, and PL-HPC intermediaries (that is, people who are working with HPC professionals, translating insights from PL theory to HPC practice). To promote the mutual understanding, we plan for the workshop to have lots of time for discussion. We will emphasize tutorial, brainstorming and working-group sessions rather than mere conference-like presentations.




  1. Baris Aktemur
  2. Nada Amin
  3. Kenichi Asai
  4. Sven Bodo-Scholz
  5. Zach DeVito
  6. Lindsay Errington
  7. Robert Glück
  8. Tobias Grosser
  9. Jun Inoue
  10. Yukiyoshi Kameyama (organizer)
  11. Oleg Kiselyov (organizer)
  12. Frédéric Loulergue
  13. Takayuki Muranushi
  14. Ryan Rhodes Newton
  15. Georg Ofenbeck
  16. Jonathan Ragan-Kelley
  17. Tiark Rompf
  18. Jeremy Siek (organizer)
  19. Reiji Suda
  20. Daisuke Takahashi
  21. Jeremy Yallop