Seminars

NO.045 Towards the ground truth: Exact algorithms for bioinformatics research

Shonan Village Center

March 17 - 20, 2014 (Check-in: March 16, 2014 )

Organizers

  • Sebastian Böcker
    • Friedrich Schiller University Jena, Germany
  • Gunnar W. Klau
    • Centrum Wiskunde & Informatica (CWI), The Netherlands
  • Veli Mäkinen
    • Helsinki Institute for Information Technology (HIIT), University of Helsinki, Finland
  • Hon Wai Leong
    • National University of Singapore, Singapore

Overview

Today, bioinformatics has become an integral and indispensable part of life science research: Success stories include the assembly and deciphering of genomes, understanding the complexity of cellular processes by means of biological networks, recovering the “tree of life”, and deciding on treatment plans for HIV or cancer patients. Applications range from fundamental questions such as the origin of life to multi-billion dollar decisions on novel drug leads and molecular modeling. None of these questions could be approached without massive support from bioinformatics.

Many of the core challenges in bioinformatics can be described as combinatorial optimization problems. Examples are the identification of genes and regulatory structures within genomes; discovering genomic or transcriptomic variations; mining biological networks for, say, protein-protein interactions; or, establishing the evolutionary history of organisms, to name just a few.
Unfortunately, a large fraction?and arguably the majority?of these problems are NP-hard:
Prominent problems are Multiple Sequence Alignment or Maximum Parsimony in phylogenetics, but there are many more?the query “bioinformatics NP-hard” yields over 10,000 hits in Google Scholar.

It is common practice in bioinformatics to approach these NP-hard problems using heuristics.
Although the mathematical model provides only an imperfect approximation to the true goal, namely, to discover nature’s ground truth, finding optimal solutions is indispensable to rigorously evaluate the quality of the model. Heuristics and approximation algorithms are useless for this purpose, for which exact algorithms are needed. Furthermore, good exact algorithms provide deep insight in the structure of the underlying combinatorial problem, which leads to a better understanding of what exactly makes the biological question hard to solve.

In particular, modern measurement techniques such as high-throughput sequencing provide such direct access to the biological ground truth, so that problem modeling can be focused to reverse engineer the biotechnology protocol. While the combinatorial modeling of, for example, assembly problems related to sequencing typically lead to NP-hard problems, the dramatic decrease in sequencing costs also enables multiplexing divide-and-conquer approaches such that inputs to each problem instance become smaller. In these scenarios, exact algorithms for hard problems can be feasible both from the computational and economical perspective.

Recently, there has been much progress on solving combinatorial problems in bioinformatics to provable optimality, despite their hardness. Different techniques have contributed to this progress: in particular, Integer Linear Programming, data reduction and kernelization, and fixed-parameter algorithms. In addition, Algorithm Engineering techniques, which exploit the fact that the structure in realistic problem instances often deviates from the worst case scenario, have contributed to the success of many exact approaches. In contrast, classical exponential-time algorithms such as exhaustive search or higher-dimensional Dynamic Programming have played a negligible role in bioinformatics research.

The aim of this workshop is to bring together researchers active in exact approaches for combinatorial bioinformatics problems. We want to tackle the difficult issues these problems pose, and to exchange ideas and theoretical frameworks that allow the design and implementation of algorithms and methods for their solution. Researchers in this workshop come from different areas of algorithmics, such as kernelization and Integer Linear Programming; assembling their views and ideas will foster the applicability of exact algorithms in bioinformatics. Through discussion and sharing knowledge, we will promote collaborations, contribute to the progress in this growing field, and make the field more visible for other scientists.


Report

No-045.pdf