No.136 Functional Stream Libraries and Fusion: What’s Next?

NII Shonan Meeting:

@ Shonan Village Center, October 22 – 25, 2018

Organizers

  • Aggelos Biboudis, EPFL, Switzerland
  • Oleg Kiselyov, Tohoku University, Japan
  • Martin Odersky, EPFL, Switzerland

Overview

Description of the meeting

Stream processing defines a pipeline of operators that transform, combine, or reduce (even to a single scalar) large amounts of data. Characteristically, data is accessed strictly linearly rather than randomly and repeatedly—and processed uniformly. The upside of the limited expressiveness is the opportunity to process large amount of data efficiently, in constant and small space.

 

Stream processing underlies one of the first programming languages (COBOL), and it has been coming to forefront again, with the popularity of big data and MapReduce. The widespread automated trading is one example of extremely high-performant stream processing; software-defined radio is another such example. Widely-used stream libraries are now available for virtually all modern OO and functional languages, from Java to C# to Scala to OCaml to Haskell, for small and big data. Streams also motivate a vast range of topics in computer science such as dependent types, termination and partial evaluation. Streams stress virtual machines and CPU memory access paths, and pose challenges for compiler writers and hardware designers.

 

Functional stream libraries let us easily build stream processing pipelines, by composing sequences of simple transformers such as map or filter with producers (backed by an array, a file, or a generating function) and consumers (reducers). The purely applicative approach of building a complex pipeline from simple immutable pieces simplifies programming and reasoning: the assembled pipeline is an executable specification. To be practical, however, a library has to be efficient: at the very least, it should avoid creating intermediate structures – especially structures like files and lists whose size grows with the length of the stream. Even the bounded-size intermediate structures significantly, up to two orders of magnitude, slow down the processing. Eliminating the intermediate strctures is the central problem in stream processing: so-called stream fusion.

 

Stream fusion has been the subject of intensive research since late 1950’s. By now, the low-hanging fruit in stream processing has been all picked up – although some of it though quite recently, POPL 2017. Stream fusion made it to the front page of CACM1. Java 8 Streams and Haskell compilers, among others, have implemented some of the earlier research results. We have attained a milestone. What are the further challenges?

 

We thus propose a discussion-heavy workshop to assess the state of the art and to answer:
  1. Can the recent advances in ‘traditional’ stream fusion be extended to non-traditional domains such as GPGPU, heterogenous architectures (FPGA)? Can the methods of stream fusion be applied to functional reactive programming (FRP) and optimizing database queries?
  2. Can we bridge functional and effectful stream libraries?
  3. What are the next big challenges? Which real world scenarios require new advances in stream fusion?
Just as the two Shonan meetings (No.2012-4 “Bridging the Theory of Staged Programming Languages and the Practice of High-Performance Computing” and No. 2014-7 “Staging and High-Performance Computing: Theory and Practice”) aimed to solicit and discuss real-world applications of assured code generation in HPC (High-Performance Computing) that would drive programming language research, we likewise aim to compile the case studies requiring further research in stream fusion, and distill them into a benchmark.

 

To promote mutual understanding, we plan for the workshop to have lots of time for discussion. We will emphasize tutorial, brainstorming and working-group sessions rather than mere conference-like presentations.