No.113 Meta-Programming for Statistical Machine Learning

NII Shonan Meeting Seminar 113

Overview

Statistical machine learning (ML) is a broad branch of machine learning that aims to draw conclusions and learn from inherently uncertain data, using “ideas from probability theory and statistics to address uncertainty while incorporating tools from logic, databases, and programming languages to represent structure” (Getoor and Taskar, 2007).

As the importance of ML grows, the problem of scaling the development of ML applications becomes more and more pressing. Currently, carrying out a non-trivial machine learning task requires expertise both in the modeled domain and in probabilistic inference methods and their efficient implementation on modern hardware. The tight coupling between the model and the efficient inference procedure makes changes difficult and precludes reuse. When the model changes significantly, the inference procedure often has to be rewritten from scratch.

Probabilistic programming, which decouples modeling and inference and lets them be separately written, composed and reused, has the potential to make it markedly easier to develop new ML tasks and keep adjusting them, while increasing confidence in the accuracy of the results. That promise has been recognized by the U.S. Defense Advanced Research Projects Agency (DARPA), which initiated the broad program Probabilistic Programming for Advancing Machine Learning (PPAML), started in March 2013 and running through 2017. The range of targeted applications can be seen from the PPAML Challenge problems.
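
The decoupling of modeling and inference mentioned above can be illustrated with a minimal sketch. It assumes a toy coin-bias model, and all names in it (`coin_model`, `importance_posterior_mean`, `rejection_posterior_mean`) are hypothetical; they are not taken from any PPAML system or existing library. The model is written once, and two interchangeable inference routines reuse it unchanged.

```python
import random

def coin_model(rng, heads, flips):
    """Hypothetical toy model: uniform prior on the coin bias.

    Returns a bias sampled from the prior and the likelihood of the
    observed data (up to the constant binomial coefficient)."""
    bias = rng.random()
    likelihood = bias ** heads * (1.0 - bias) ** (flips - heads)
    return bias, likelihood

def importance_posterior_mean(model, rng, n, **data):
    """Likelihood weighting: average prior samples weighted by likelihood."""
    total_weight = 0.0
    weighted_sum = 0.0
    for _ in range(n):
        value, weight = model(rng, **data)
        total_weight += weight
        weighted_sum += weight * value
    return weighted_sum / total_weight

def rejection_posterior_mean(model, rng, n, bound, **data):
    """Rejection sampling: accept a prior sample with probability weight / bound.

    `bound` must be an upper bound on the likelihood returned by the model."""
    accepted = []
    while len(accepted) < n:
        value, weight = model(rng, **data)
        if rng.random() * bound < weight:
            accepted.append(value)
    return sum(accepted) / len(accepted)

rng = random.Random(0)
data = {"heads": 7, "flips": 10}
# The likelihood is maximal at bias = 0.7, which gives the rejection bound.
is_mean = importance_posterior_mean(coin_model, rng, 20000, **data)
rs_mean = rejection_posterior_mean(coin_model, rng, 2000,
                                   bound=0.7 ** 7 * 0.3 ** 3, **data)
```

Under these assumptions the exact posterior is Beta(8, 4), so both estimates should come out close to 2/3. Changing the model (say, to a different prior) would not require touching either inference routine, which is the kind of reuse the paragraph above describes.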

Developing the potential of probabilistic programming requires applying recent insights from programming language (PL) research, such as supercompilation from meta-programming (with very promising results shown in Lingfeng Yang et al., AISTATS 2014). A surprising challenge is correctness: a number of well-known and widely used libraries and systems, such as Stan, may produce patently wrong results on some problems (as demonstrated in Hur et al., FSTTCS 2015).

Hand in hand with the interest of ML researchers in programming language topics (evidenced by the PPAML PI meetings, in which one of the organizers participated) is the growing interest of programming language researchers in probabilistic programming, if the record attendance of the first two POPL-affiliated Probabilistic Programming Semantics (PPS) workshops is any indication.

We propose a discussion-heavy workshop to promote the evident and growing interest of the developers of ML/probabilistic domain-specific languages in program generation and transformation, and of programming language researchers in ML applications. We expect many participants to come from PPAML teams and from the PPS workshops. The Shonan meeting coincides with the conclusion of the PPAML program. We hope it will be a venue to discuss the not-yet-answered challenges and the issues raised at the PPS workshops, but in more depth and detail.

We anticipate that the workshop participants will consist of three groups of people: statistical machine learning researchers; researchers and practitioners who build, use, and adjust probabilistic learning systems; and PL researchers with some connection to ML.

  • Probabilistic programming is coming of age and could really help ML practitioners in some cases. Selling points: correctness by construction (ML code is very hard to debug and test) and some consistency in performance (which saves time otherwise spent on many optimizations and on writing custom code).

  • Many implementors of probabilistic languages and libraries have come to realize the importance of meta-programming and of PL research in general: determining the validity of optimizations and transformations, knowing transformation techniques and good algorithms for applying them, and knowing tools such as Lightweight Modular Staging (LMS), partial evaluators, and staged languages.
  • Treating programs as subjects of probabilistic computation, in the sense of learning facts about programs from data, i.e., learning from “big code”.

Our goal is to bring the three groups together and see what more probabilistic programming can do and, above all, how we can apply advances in PL and meta-programming more consciously and profitably (and, if we cannot, what the PL community should then be investigating).

Just as two earlier Shonan meetings (No.2012-4 “Bridging the Theory of Staged Programming Languages and the Practice of High-Performance Computing” and No.2014-7 “Staging and High-Performance Computing: Theory and Practice”) aimed to solicit and discuss real-world applications of assured code generation in HPC (High-Performance Computing) that would drive PL research in meta-programming, we propose a similar direction for ML and meta-programming.

To promote mutual understanding, we plan for the workshop to have lots of time for discussion. We will emphasize tutorial, brainstorming and working-group sessions rather than mere conference-like presentations.
