NO.250 Advances and opportunities in artificial intelligence for computational metabolomics
June 29 - July 2, 2026 (Check-in: June 28, 2026)
Organizers
- Tomáš Pluskal, IOCB Prague, Czech Republic
- Wout Bittremieux, University of Antwerp, Belgium
- Ichigaku Takigawa, The University of Tokyo, Japan
Overview
Description of the Meeting
The convergence of biology, chemistry, and computer science is opening unprecedented opportunities to decode the molecular underpinnings of life. At the heart of this endeavor lies metabolomics, the study of small molecules that serve as the building blocks and dynamic drivers of biological processes. These metabolites play critical roles in cellular growth, development, communication, and response to environmental changes. Furthermore, their relevance extends to drug discovery and development, as many pharmaceuticals are small molecules targeting specific pathways. Metabolomics offers a uniquely direct window into the functional state of biological systems, providing unparalleled insights into phenotype—the observable characteristics and traits of organisms.
Mass spectrometry is the key analytical technique for metabolomics, offering unparalleled sensitivity and precision in detecting and characterizing metabolites. Advances in instrumentation have enabled the generation of increasingly complex and large datasets, revealing rich biochemical diversity across various biological and environmental systems.
However, this flood of data presents significant computational challenges. Only 5–10% of the mass spectra generated in untargeted experiments can be confidently annotated, leaving vast swathes of the metabolome unexplored. The result is a “dark metabolome,” a largely uncharted frontier that holds immense potential for scientific discovery.
The limitations of traditional computational approaches for metabolite identification underscore the need for innovative solutions. Biomedical research is emerging as a pivotal domain for AI breakthroughs, yet computational metabolomics has lagged behind fields such as genomics, proteomics, and drug discovery. The metabolomics community stands at a critical juncture, where the integration of advanced AI techniques could fundamentally redefine the field.
This Shonan Seminar seeks to address this pressing need by fostering cross-disciplinary collaboration among computational scientists, biochemists, mass spectrometrists, and machine learning researchers. Our goal is to catalyze the development of AI-driven methods tailored to metabolomics, leveraging the unique setting and collaborative format of the Shonan meeting to tackle these challenges. The seminar builds on the successes of previous Shonan seminars in 2017 and 2023, which laid the foundation for vibrant community initiatives such as the CompMS Slack Workspace (over 350 active members) and open source projects such as DreaMS, a foundational model for small molecule mass spectrometry, and MassSpecGym, a public benchmark for annotation tasks in untargeted metabolomics. It aims to broaden the scope and impact of the field.
The seminar will gather experts from diverse domains, including quantum chemistry, bioinformatics, machine learning, and metabolomics, to collaboratively define the next generation of computational tools for metabolomics. Uniquely, the Shonan Seminar format enables deep, focused engagement over several days, fostering a level of collaboration and intellectual exchange that traditional conferences cannot match. The proposed seminar is ideally timed to capitalize on the growing momentum in computational metabolomics, engaging the broader AI and computer science communities to help illuminate the dark metabolome and unlock its potential for biology, medicine, and environmental science. This seminar aims to not only advance computational metabolomics but also establish it as a field where AI and domain-specific expertise converge to solve challenging scientific questions.
Potential Topics for Discussion
Building Foundational Models for Metabolomics: The potential of large datasets of unannotated spectra or molecular structures to construct robust foundational models for metabolomics is immense. How can advanced training strategies, such as contrastive learning, masked spectrum modeling, or other self-supervised techniques, address the current annotation bottlenecks? Are there innovative approaches, such as incorporating domain-specific architectures or multi-modal learning, that could be uniquely suited to mass spectrometry data?
Generative AI for Molecular Structures: Recent advances in generative models for graphs and molecules, including diffusion models, flow-based approaches, and autoregressive models, offer exciting opportunities for metabolomics. How can these methods be adapted to the specific challenges of the field, such as conditioning generative models on mass spectral data to produce molecular structures? What are the most effective ways to integrate domain knowledge to improve model accuracy and applicability?
Modeling Uncertainty in Molecular Identification: Molecular identification from mass spectra often involves inherent uncertainties, especially when spectral data lacks sufficient structural information. How can uncertainty quantification techniques improve the reliability of predictions? What are the best practices for reporting predicted molecular structures, ensuring reproducibility and transparency while acknowledging the limits of current models?
Efficient Data Management and Standardization: The mass spectrometry community faces significant challenges in handling and utilizing large datasets. What are the best practices for working with high-dimensional, unannotated data? Can recent innovations such as vector databases or neural search methods enable more efficient data manipulation, retrieval, and annotation? Should the field prioritize the construction of large, standardized hypothetical metabolite databases to expand coverage?
Generalization Beyond Annotated Molecular Space: With the molecular diversity of metabolites far exceeding what annotated databases cover, how can models generalize to predict truly novel molecular structures? Techniques like few-shot learning, domain adaptation, and test-time training hold promise for overcoming this limitation. How can these approaches be leveraged to explore “unknown unknowns” in metabolomics and unlock hidden biochemical knowledge?
Quantum Chemistry and Machine Learning Integration: Simulating fragmentation spectra has traditionally relied on computationally expensive quantum chemical methods. Can machine learning approximate these simulations with sufficient accuracy, enabling scalable solutions? How can hybrid approaches that combine quantum chemistry and machine learning improve the simulation of fragmentation spectra?
Enhancing Molecular Networking with AI: Molecular networking is a widely used technique for organizing and visualizing relationships in untargeted metabolomics data using heuristic algorithms. Can machine learning enhance molecular networking by incorporating repository-scale data, enabling more nuanced connections and discoveries?
Community and Talent Development: The challenges of computational metabolomics require a diverse and engaged community. How can we spread awareness of these challenges and opportunities to attract talent from machine learning and data science communities? Can targeted outreach initiatives, including academic-industry partnerships and public benchmarks like MassSpecGym, foster deeper collaboration?
Organizing a Kaggle Competition: A Kaggle competition could serve as a gateway for engaging the broader machine learning and data science communities. Building on the recently established MassSpecGym benchmark, such a competition would provide clear and accessible entry points for researchers unfamiliar with metabolomics.
These topics encapsulate some of the critical challenges and opportunities at the intersection of AI and metabolomics. Through the Shonan Seminar’s unique interdisciplinary format, we aim to foster a collaborative environment where computer scientists, chemists, and biologists can come together to tackle these issues, pushing the boundaries of what is possible in computational mass spectrometry and metabolomics.