NO.235 LLM-guided Synthesis, Verification, and Testing of Learning-Enabled CPS
March 9–12, 2026 (Check-in: March 8, 2026)
Organizers
- Xi Zheng, Macquarie University, Australia
- Simon Thompson, Tier IV, Japan
- Sanjoy Baruah, Washington University in St. Louis, USA
Overview
Research Area and Meeting Agenda
The fusion of machine learning (ML) and cyber-physical systems (CPS) has enabled a range of revolutionary applications, including autonomous
vehicles (Waymo [29], Tesla Autopilot [24], Uber ATG [27]), delivery drones (Amazon Prime Air [1], Google Wing [9], Zipline [33]), service robots (Softbank Robotics [22], Boston Dynamics [4], Savioke [19]), and automated surgery systems (Da Vinci [5], Mazor [16], Mako [15]). Despite their potential, these learning-enabled CPS applications have encountered critical safety issues, leading to human fatalities and significant economic losses [17, 28, 8, 21].
To effectively manage critical safety issues in learning-enabled CPS, robust verification and validation are crucial. Traditional methods, designed for deterministic software behaviors and formal properties, fall short in the CPS development life cycle, which is inherently more complex due to the integration of data and machine learning. Unlike traditional systems, learning-enabled CPS require navigating the quality and representativeness of training data and handling out-of-distribution scenarios during real-world deployment. Moreover, the probabilistic nature and opaqueness of machine learning models, along with the absence of formal modeling tools suitable for these systems, pose further challenges to conventional verification approaches [30, 18, 31, 32].
While several well-established research communities are addressing different aspects of the puzzle, these efforts are often fragmented and lack coordination, particularly in tackling the testing and verification challenges within complex machine learning pipelines and additional data contexts. One of the most daunting tasks is the generation of diverse corner cases that expose unsafe behaviors in such systems, which remains a significant hurdle due to a vast search space and limited domain knowledge [14, 25, 13, 7, 26, 6]. Furthermore, although model learning techniques have progressed from Biermann’s offline approach to Angluin’s more efficient online L* algorithm, their practical implementation is still fraught with challenges arising from the complexities of real system tests [12, 23, 2, 10]. In model-based testing, existing tools that rely on symbolic execution, random testing, and mutation testing, such as Modbat and MoMuT for navigating state machine models, do not adequately meet the specialized needs of learning-enabled CPS [20, 3, 11]. There is an urgent need for cutting-edge testing
and verification methods; leveraging large language models (LLMs) presents immense opportunities in this regard.
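To make the corner-case search challenge concrete, the following is a minimal, self-contained Python sketch of robustness-guided falsification by random search over a toy car-following scenario. The dynamics, parameter ranges, and robustness metric (minimum gap) are illustrative assumptions, not taken from any of the tools cited above; falsifiers such as S-TaLiRo [2] operate on far richer system models and full temporal-logic specifications.

```python
import random

def simulate(scenario, steps=200, dt=0.1):
    """Toy car-following simulation: a lead car brakes hard at some time,
    and the ego car brakes (more weakly) once the gap is small.
    Returns the minimum gap observed, used as a robustness value."""
    gap = scenario["initial_gap"]             # metres between lead and ego
    ego_v = scenario["ego_speed"]             # m/s
    lead_v = scenario["lead_speed"]           # m/s
    brake_time = scenario["lead_brake_time"]  # seconds until lead brakes
    min_gap = gap
    for t in range(steps):
        if t * dt >= brake_time:
            lead_v = max(0.0, lead_v - 6.0 * dt)  # lead brakes at 6 m/s^2
        if gap < 15.0:
            ego_v = max(0.0, ego_v - 4.0 * dt)    # ego reacts at only 4 m/s^2
        gap += (lead_v - ego_v) * dt
        min_gap = min(min_gap, gap)
    return min_gap  # negative => collision, i.e. a safety violation

def random_scenario(rng):
    """Sample one point from the (illustrative) scenario space."""
    return {
        "initial_gap": rng.uniform(5.0, 50.0),
        "ego_speed": rng.uniform(10.0, 30.0),
        "lead_speed": rng.uniform(10.0, 30.0),
        "lead_brake_time": rng.uniform(0.0, 10.0),
    }

def falsify(budget=2000, seed=0):
    """Random-search falsification: sample scenarios and keep the one
    with the lowest robustness; a value below zero is a counterexample."""
    rng = random.Random(seed)
    worst = None
    for _ in range(budget):
        s = random_scenario(rng)
        rob = simulate(s)
        if worst is None or rob < worst[1]:
            worst = (s, rob)
    return worst

if __name__ == "__main__":
    scenario, robustness = falsify()
    print(f"worst robustness found: {robustness:.2f} m")
```

Even in this four-parameter toy space, uninformed random search burns its budget on uninteresting scenarios; real driving scenarios have orders of magnitude more parameters, which is precisely where LLM-supplied domain knowledge could steer the search toward plausible, high-risk regions.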
The meeting aims to unite experts from software engineering, system engineering, control theory, robotics, formal methods, and AI across academia and industry. The primary goal is to forge cutting-edge testing and verification methodologies for learning-enabled CPS utilizing LLMs. We are also looking into innovative ways to synthesize learning-enabled CPS that are, by design, easier to verify and test.
Motivated by the broad adoption of LLMs, this gathering seeks to harness these models to exploit the human knowledge encapsulated in existing rules and to analyze the large amounts of data produced by CPS, including sensor outputs and event logs. Our discussions will focus on how LLMs, through mining human knowledge and data analysis, can enhance our understanding of system behaviors and generate a rich set of realistic, high-quality test data. We will also explore how improved data quality can aid in applying both active and passive learning methods to better define and refine formal specifications. Additionally, the agenda includes enhancing model-based testing through these refined specifications, aiming for a more systematic approach than traditional heuristic techniques such as search-based testing. Another intriguing topic is the potential synergy between human expertise and LLMs in synthesizing, from specifications, critical components that are easier to verify and test, as well as integrating design-time testing with real-time verification. The overarching goal of the meeting is to champion this vital research trajectory and foster its industrial adoption, providing practical, efficient solutions for the testing and verification of reliable learning-enabled CPS.
Participants will be encouraged to share short summaries of their ongoing or prior work before the seminar, so that everyone can learn a bit about the other participants beforehand. The meeting will feature two main types of sessions. The first type involves short presentations in which participants share their research, highlighting newly discovered problems and research directions they wish to explore further during the meeting. These presentations are planned for the first two days. The second type of session will focus on group discussions. The discussion topics have been preliminarily outlined in the proposal but may be adjusted based on the presentations from the first two days. Topics will be addressed in an opportunistic order, dedicating these sessions to in-depth discussions on the chosen subjects. For each discussion group, a research agenda will be established, and individuals will be invited to lead these agendas. Our aim is to compile and publish our findings in a leading journal, such as Communications of the ACM.
Related Previous Workshops and Seminars
We acknowledge that there are both past and upcoming scientific events on related topics; these are similar to our seminar, but each is distinct from it. Some of them are:
Shonan Seminar No. 156: Software Engineering for Machine Learning Systems: This seminar is centered on the development paradigm for machine learning systems, whereas our emphasis is on the testing and verification perspective for learning-enabled CPS, facilitated by large language models.
Shonan Seminar No. 178: Formal Methods for Trustworthy AI-based Autonomous Systems: This seminar is dedicated to exploring formal methods for enhancing the trustworthiness of AI-based autonomous systems. Our approach, however, delves into practical testing techniques and investigates how these can be integrated with verification methods, leveraging large language models.
Shonan Seminar No. 204: DevOps for Cyber-Physical Systems: This seminar explores the role of DevOps in enhancing the development efficiency of CPS. Our seminar complements this by also considering DevOps, yet with a focus on seamlessly integrating testing and verification into the industrial DevOps pipeline, facilitated by the use of large language models.
Shonan Seminar No. 176: Foundation Models and Software Engineering: Challenges and Opportunities: This seminar concentrates on addressing challenges such as hallucination, logical reasoning, and prompt engineering in foundation models, including large language models. This theme complements our seminar, which specifically targets the application of large language models for testing and verification in learning-enabled CPS.
Shonan Seminar No. 208: Trustworthy Machine Learning System Engineering Techniques for Practical Applications: This seminar is centered on innovative engineering practices aimed at enhancing the trustworthiness of ML-based systems, covering a wide array of topics. In contrast, our focus is more targeted, specifically on the testing and verification challenges in learning-enabled CPS and their connection to system trustworthiness. Furthermore, we explore how these testing and verification challenges can be addressed through the innovative approaches outlined in our proposal, utilizing large language models.
Shonan Seminar No. 222: The Future of Development Environments with AI Foundation Models: This seminar explores the use of AI foundation models in enhancing integrated development environments. It complements our seminar, which specifically delves into the cutting-edge synthesis, testing, and verification of learning-enabled CPS, facilitated by large language models.
References
[1] Amazon Prime Air. 2023. URL: https://shorturl.at/otyU3.
[2] Y. Annpureddy et al. “S-taliro: A tool for temporal logic falsification for hybrid systems”. In: TACAS. Springer. 2011.
[3] C. V. Artho et al. “Modbat: A model-based API tester for event-driven systems”. In: HVC. Springer. 2013.
[4] Boston Dynamics. 2023. URL: http://www.bostondynamics.com/.
[5] Da Vinci Surgical System. 2023. URL: http://www.intuitivesurgical.com/.
[6] Y. Deng et al. “A declarative metamorphic testing framework for autonomous driving”. In: IEEE Transactions on Software Engineering (2022). DOI: 10.1109/TSE.2022.3206427.
[7] Y. Deng et al. “An analysis of adversarial attacks and defenses on autonomous driving models”. In: PerCom. IEEE. 2020. DOI: 10.1109/PerCom45495.2020.9127389.
[8] DW. Volkswagen: Robot kills worker installing it. 2015. URL: https://shorturl.at/cnCEN.
[9] Google Wing. 2023. URL: http://www.wing.com/.
[10] S. Jha et al. “Telex: Passive stl learning using only positive examples”. In: RV. Springer. 2017.
[11] W. Krenn et al. “MoMuT::UML model-based mutation testing for UML”. In: ICST. IEEE. 2015.
[12] M. Leucker. “Learning meets verification”. In: FMCO. Springer. 2006.
[13] G. Li et al. “Av-fuzzer: Finding safety violations in autonomous driving systems”. In: ISSRE. IEEE. 2020. DOI: 10.1109/ISSRE5003.2020.00012.
[14] G. Lou et al. “Testing of autonomous driving systems: where are we and where should we go?” In: FSE. 2022. DOI: 10.1145/3540250.3549111.
[15] Mako Robotic-Arm. 2023. URL: http://www.stryker.com/.
[16] Mazor Robotics Guidance Systems. 2023. URL: http://www.medtronic.com/.
[17] Associated Press. Nearly 400 car crashes in 11 months involved automated tech, companies tell regulators. 2022. URL: https://shorturl.at/rEHS1.
[18] A. A. Santos, A. F. da Silva, and F. Pereira. “Simulation of Cyber-Physical Intelligent Mechatronic Component Behavior Using Timed Automata Approach”. In: International Conference Innovation in Engineering. Springer. 2022.
[19] Savioke. 2023. URL: http://www.savioke.com/.
[20] I. Schieferdecker and A. Hoffmann. “Model-based testing”. In: IEEE software 29.1 (2012).
[21] S. Singh et al. “Instrument malfunction during robotic surgery: A case report”. In: Indian Journal of Urology: IJU: Journal of the Urological Society of India 32.2 (2016). DOI: 10.4103/0970-1591.174781.
[22] Softbank Robotics. 2023. URL: http://www.softbankrobotics.com/.
[23] B. Steffen, F. Howar, and M. Merten. “Introduction to active automata learning from a practical perspective”. In: SFM (2011).
[24] Tesla. 2023. URL: http://www.tesla.com/autopilot/.
[25] H. Tian et al. “MOSAT: finding safety violations of autonomous driving systems using multi-objective genetic algorithm”. In: FSE. 2022. DOI: 10.1145/3540250.3549100.
[26] Y. Tian et al. “Deeptest: Automated testing of deep-neural-network-driven autonomous cars”. In: ICSE. 2018. DOI: 10.1145/3180155.3180220.
[27] Uber. 2023. URL: https://t.ly/0K7H5.
[28] The Verge. Food delivery drone lands on power lines resulting in power outage for thousands. 2022. URL: https://t.ly/BmDBn.
[29] Waymo. 2023. URL: http://www.waymo.com/.
[30] X. Yang et al. “A framework for identification and validation of affine hybrid automata from input-output traces”. In: ACM Transactions on Cyber-Physical Systems (TCPS) 6.2 (2022).
[31] X. Zheng et al. “BraceAssertion: Runtime verification of cyber-physical systems”. In: 2015 IEEE 12th International Conference on Mobile Ad Hoc and Sensor Systems. IEEE. 2015.
[32] X. Zheng et al. “Perceptions on the state of the art in verification and validation in cyber-physical systems”. In: IEEE Systems Journal 11.4 (2015).
[33] Zipline. 2023. URL: http://www.flyzipline.com/.