NO.258 Advancing Open Machine Learning

March 15-18, 2027 (Check-in: March 14, 2027)

Organizers

Joaquin Vanschoren
- Eindhoven University of Technology, Netherlands
Saso Dzeroski
- Jožef Stefan Institute, Slovenia
Takashi Washio
- Kansai University, Japan

Overview

Description of the Meeting

Machine learning (ML) is a key AI technology and has a major impact on scientific research, industry, and people's daily lives. It is therefore critically important that machine learning is also open. This means that data, code, models, and methodologies should be open, accessible, and well organized so that others can discover and use them. Such openness fosters innovation, transparency, accessibility, and trust in technology. This is important for:

・ Reproducibility, ensuring that results can be verified and trusted, allowing others to replicate experiments and validate conclusions.
・ Transparency, ensuring that the inner workings of ML models are open to scrutiny, enabling people to understand how results are generated and identify limitations.
・ Collaborative problem-solving, allowing researchers, engineers, and enthusiasts worldwide to contribute to and improve upon existing tools and frameworks.
・ Democratizing access to AI, making data, code, and models freely available to researchers, educators, and practitioners worldwide.
・ Fostering trust in AI, ensuring that data and models can be audited for ethical concerns, biases, and fairness, and building public trust in AI technologies.
・ Open standards, allowing us to interconnect data platforms and AI tools and making it easier to discover, share, and reuse AI resources.
・ Interdisciplinary collaboration, allowing AI researchers to help solve societal challenges in areas such as climate change, energy, healthcare, and education.

Recent years have seen a trend towards more proprietary ML systems. While powerful, these systems are often opaque, expensive, and controlled by a few corporations. This creates risks of monopolization and a lack of accountability. Open-source ML provides a counterbalance, ensuring that advances in AI are not dictated exclusively by private interests but remain a public good.

In this workshop, we bring together accomplished researchers from all over the world who actively promote and build open machine learning and who lead key research and industry projects in this area. Specifically, the workshop will center on expanding and improving OpenML, a collaborative open science platform for machine learning available at https://openml.org. It will also bring together communities in Europe, America, and Asia to find new synergies and start new initiatives.
OpenML is a fully open platform where researchers can share their machine learning data, algorithms, models, and experiments. OpenML evaluates and organizes all results into a coherent whole so that they can be objectively compared, reproduced, and reused. All data is accessible through the website and through programmatic interfaces in languages familiar to researchers, such as Python, R, and Java. OpenML launched in 2014 and is now used by over 300,000 researchers. All components are open source (https://github.com/openml), and development of the platform is sustained by data and code contributions from volunteers in our community. Most of these contributions are made during hackathons and seminars that we have regularly organized across Europe, including three seminars at Dagstuhl. OpenML collaborates with many other open science initiatives. Examples include the Croissant standard for machine learning datasets, created with MLCommons, Google, Kaggle, and Hugging Face, and the OpenML dataset loader in scikit-learn.

We have long aspired to organize a seminar at Shonan to foster diverse research collaborations, bringing open machine learning researchers from Europe, America, and Asia together in the same room. We aim to enable more open machine learning research by expanding the OpenML catalog and infrastructure to novel AI tasks, models, and data types.

Key research challenges to be discussed may include:
・ LLMs and agentic AI are changing the way researchers conduct their research. How can OpenML ensure that datasets, models, and benchmarks are used correctly by AI tools?
・ How should we evaluate large foundation models? If most benchmarks are broken, how do we evaluate these models to gain a deeper understanding and drive real progress?
・ Fragmented infrastructure: datasets, models, and evaluations are spread across different, often disconnected, platforms. How can we make them interoperable?
・ How can we make it easier for researchers to track the complete lifecycle of AI experiments and create a "collective memory" of AI research?
・ Not all researchers have equitable access to the high-performance computing (HPC) resources needed to train and run complex AI models. Can we remedy this?
・ How can we foster an open science culture that combines good scientific practices with good AI practices, and how can we support it through better infrastructure?

We also want to facilitate open research by building strong new benchmarks for state-of-the-art challenges. We further aim to accelerate science by initiating collaborations that use OpenML to conduct cutting-edge research. We believe this event will produce impactful work and, in the process, improve open science infrastructure.

We propose a standard 4-day seminar schedule that balances focused work on specific topics with shorter break-out sessions to discuss or work on smaller problems. The exact content of the seminar will be shaped entirely by input from the participants.

・ Prior to the seminar:
- Disseminate material so that participants can explore OpenML and become familiar with it. This will be accompanied by (online) office hours, where people can ask questions or get help.
- Brainstorm topics that may be interesting to work on during the seminar.

・ Day 1:
- Timeslot 1: Short introduction to OpenML - plenary talk and live demonstration.
- Timeslot 2: Pitch topic ideas, and self-organize into topic groups on popular ideas.
We aim for each topic group to have 4-8 members. This size is small enough that everyone has the opportunity to make individual contributions, yet large enough to accomplish a meaningful amount of work in the limited time of the seminar.
- Timeslot 3: Additional time for people to discuss ideas or hold a short break-out session (a brief meeting centered on a specific topic or goal). We will also host an interactive OpenML tutorial for participants who want to become more familiar with the platform.
- Timeslot 4: The topic groups will work on their projects.
・ Day 2:
- Timeslot 1&2: The topic groups will work on their projects.
- Timeslot 3&4: Not scheduled (yet). People may choose to continue working in their topic groups or propose break-out sessions.

・ Day 3:
- Timeslot 1&2: The topic groups will work on their projects.

・ Day 4:
- Timeslot 1: The topic groups continue their work, with the goal of either finishing their project or creating a plan for continuing it after the seminar.
- Morning, 11:00-12:00: Plenary session to wrap up the seminar, in which the topic groups present their work.
