May 31, 2014 Comments Off
Specializing Sparse Matrix-Vector Multiplication
Sparse matrix-vector multiplication (SpMV) is a kernel operation used in many scientific computation domains. SpMV is amenable to optimization by using generative programming to specialize the function based on the structure and values of the sparse matrix. In this work we evaluate the performance gains obtainable from specialized SpMV by generating code using several different methods. We then discuss how to predict which code generation method will give the best performance. We finally share our experiences regarding generating high-performant SpMV code quickly.
On the Mechanics of a Program-Generator Generator
We present principles and implementation behind a light-weight program-generator generator based on partial evaluation techniques for a recursive flowchart language.
An Innocent Solution to Shonan Challenge 2012
The first Shonan Challenge collects a set of problems for generating optimized programs in the domain of high performance computing. Even though several solutions have been already posted by programming language experts, it was not clear how a non-expert can solve the problems. In this talk I will report a student’s solution to the Hidden Markov Model (HMM) problem, and how he developed his solution to accommodate several optimizations. I also briefly mention the evaluation of his solution and the lessons learned. This talk is based on the work by Haruki Shimizu.
Code Generation Challenge for Staggered Mesh
In ordered-mesh based solvers of partial differential equations, every
discretized physical values lie on integer coordinates. Staggered mesh is where rational-number coordinates such as half-integers are allowed. For example in hydrodynamics simulations, gas density variables may lie in the center of the cubic cells while gas flux variables may sit on the center of the cell surfaces. Staggered meshes are sometimes consequences of higher-order differentiation, and sometimes important for numerical stability or conservation laws. On the other hand, staggered mesh algorithms require slightly different treatment for different variables or different vector components. This poses challenge to abstraction, and the failure of abstraction often results in Fortran or C code where x-component logic is copied and slightly modified for y-component and then z-component, and so on.
In my talk I’ll illustrate this staggered mesh problem code by taking elastic wave equations as an example.
DSL embedding – how do we know if it’s worth it?
The embedded DSL approach has established benefits but also suffers from problems that do not necessarily show up in code figures of academic papers: e.g. terrible error messages and other limitations of syntax overloading. In this talk I will argue against the the traditional cost-benefit analysis ascribed to embedded/standalone DSL design decision. To illustrate, I will discuss three hypothetical alternative front-ends to the Accelerate GPU DSL: Haskell-embedded (the original), standalone with language subsetting, and Racket-embedded with its own type system.
Embedding Application Knowledge for Improved Dynamic Adaptation
Generating efficient code for a range of heterogeneous hardware from DSLs is hard. Trying to achieve the same for Not-So-Domain-Specific-Languages is even harder. This talk presents some of our latest ideas on how to provide the compiler/code generator with domain knowledge without modifying the code generator at all. Instead, we use programmer-provided alternatives in conjunction with dynamic adaptation to achieve overall efficiency. This is very much work in progress; besides presenting the motivation and basics of the approach it provides several open questions.
More challenges for HPC program generation
I introduce several optimizations (program transformations) that are frequently used by HPC programmers. I present several questions for staging.