No. 045 Towards the ground truth: Exact algorithms for bioinformatics research

Icon

NII Shonan Meeting Seminar 045

Protected: Internal information

This content is password protected. To view it please enter your password below:

Organizers

  • Sebastian Böcker, Friedrich Schiller University Jena, Germany
  • Gunnar W. Klau, Centrum Wiskunde & Informatica (CWI), Amsterdam, The Netherlands
  • Hon Wai Leong, National University of Singapore

Participants

  • Sebastian Böcker, Friedrich Schiller University Jena
  • Paola Bonizzoni, Università di Milano-Bicocca
  • Laurent Bulteau, TU Berlin
  • Francis Chin, The Universiy of Hong Kong
  • Mohammed El-Kebir, CWI Amsterdam
  • Mike Fellows, Charles Darwin University
  • Iman Hajirasouliha, Brown University
  • Falk Hüffner, Technische Universität Berlin
  • Mikko Koivisto, University of Helsinki
  • Christian Komusiewicz, TU Berlin & Univ. Nantes
  • Hon Wai Leong, National University of Singapore
  • Matthias Müller-Hannemann, Martin-Luther University Halle-Wittenberg
  • Marco Pellegrini, CNR
  • Nadia Pisanti, University of Pisa
  • Alberto Policriti, University of Udine and Applied Genomic Institute
  • Sven Rahmann, University of Duisburg-Essen
  • Frances Rosamond, Charles Darwin University
  • Kunihiko Sadakane, NII
  • Chuan Yi Tang, Providence University
  • Annelyse Thévenin, Bielefeld University
  • Alexandru Tomescu, University of Helsinki
  • Fabio Vandin, Brown University
  • Gunnar W. Klau, CWI Amsterdam
  • Annegret Wagler, Université Blaise Pascal
  • Tim White, FSU Jena
  • Siu Ming Yiu, The University of Hong Kong

Schedule

Arrival day: Sunday, March 16th, 2014

  • 15:00 ? Hotel Check In (early check-in from 12:00 is negotiable if informed in advance)
  • 19:00 ? 21:00: Welcome banquet

Day 1: Monday, March 17th

  • 7:30 ? 9:00: Breakfast
  • 9:00 – 10:30: Seminar start. Introduction
  • 10:30 – 11:00: Tea break
  • 11:00 – 12:00: Session 1 (topic tba)
  • 12:00 – 13:30: Lunch
  • 13:30 – 14:00: Group photo shooting
  • 14:00 – 15:30: Session 2 (topic tba)
  • 15:30 – 16:00: Tea break
  • 16:00 – 18:00: Session 3 (topic tba)
  • 18:00 – 19:30: Dinner

Day 2: Tuesday, March 18th

  • 7:30 ? 9:00: Breakfast
  • 9:00 – 10:30: Session 4 (topic tba)
  • 10:30 – 11:00: Tea break
  • 11:00 – 12:00: Session 5 (topic tba)
  • 12:00 – 13:30: Lunch
  • 13:30 – 15:30: Session 6 (topic tba)
  • 15:30 – 16:00: Tea break
  • 16:00 – 18:00: Session 7 (topic tba)
  • 18:00 – 19:30: Dinner

Day 3: Wednesday, March 19th

  • 7:30 ? 9:00: Breakfast
  • 9:00 – 10:30: Session 8 (topic tba)
  • 10:30 – 11:00: Tea break
  • 11:00 – 12:00: Session 9 (topic tba)
  • 12:00 – 13:30: Lunch
  • 13:30 – 18:00: Excursion/hiking
  • 19:00 – 21:30: Main Banquet

Day 4: Thursday, March 20th

  • 7:30 ? 9:00: Breakfast
  • 9:00 – 10:30: Session 10 (topic tba)
  • 10:30 – 11:00: Tea break
  • 11:00 – 12:00: Concluding session
  • 12:00 – 13:30: Lunch

Overview

Today, bioinformatics has become an integral and indispensable part of life science research: Success stories include the assembly and deciphering of genomes, understanding the complexity of cellular processes by means of biological networks, recovering the “tree of life”, and deciding on treatment plans for HIV or cancer patients. Applications range from fundamental questions such as the origin of life to multi-billion dollar decisions on novel drug leads and molecular modeling. None of these questions could be approached without massive support from bioinformatics.

Many of the core challenges in bioinformatics can be described as combinatorial optimization problems. Examples are the identification of genes and regulatory structures within genomes; discovering genomic or transcriptomic variations; mining biological networks for, say, protein-protein interactions; or, establishing the evolutionary history of organisms, to name just a few. Unfortunately, a large fraction?and arguably the majority?of these problems are NP-hard: Prominent problems are Multiple Sequence Alignment or Maximum Parsimony in phylogenetics, but there are many more?the query “bioinformatics NP-hard” yields over 10,000 hits in Google Scholar.

It is common practice in bioinformatics to approach these NP-hard problems using heuristics. Although the mathematical model provides only an imperfect approximation to the true goal, namely, to discover nature’s ground truth, finding optimal solutions is indispensable to rigorously evaluate the quality of the model. Heuristics and approximation algorithms are useless for this purpose, for which exact algorithms are needed. Furthermore, good exact algorithms provide deep insight in the structure of the underlying combinatorial problem, which leads to a better understanding of what exactly makes the biological question hard to solve. In particular, modern measurement techniques such as high-throughput sequencing provide such direct access to the biological ground truth, so that problem modeling can be focused to reverse engineer the biotechnology protocol. While the combinatorial modeling of, for example, assembly problems related to sequencing typically lead to NP-hard problems, the dramatic decrease in sequencing costs also enables multiplexing divide-and-conquer approaches such that inputs to each problem instance become smaller. In these scenarios, exact algorithms for hard problems can be feasible both from the computational and economical perspective.

Recently, there has been much progress on solving combinatorial problems in bioinformatics to provable optimality, despite their hardness. Different techniques have contributed to this progress: in particular, Integer Linear Programming, data reduction and kernelization, and fixed-parameter algorithms. In addition, Algorithm Engineering techniques, which exploit the fact that the structure in realistic problem instances often deviates from the worst case scenario, have contributed to the success of many exact approaches. In contrast, classical exponential-time algorithms such as exhaustive search or higher-dimensional Dynamic Programming have played a negligible role in bioinformatics research.

The aim of this workshop is to bring together researchers active in exact approaches for combinatorial bioinformatics problems. We want to tackle the difficult issues these problems pose, and to exchange ideas and theoretical frameworks that allow the design and implementation of algorithms and methods for their solution. Researchers in this workshop come from different areas of algorithmics, such as kernelization and Integer Linear Programming; assembling their views and ideas will foster the applicability of exact algorithms in bioinformatics. Through discussion and sharing knowledge, we will promote collaborations, contribute to the progress in this growing field, and make the field more visible for other scientists.