Seminars

NO.252 Exploring Responsible AI through Security and Privacy

Shonan Village Center

October 19 - 23, 2026 (Check-in: October 18, 2026)

Organizers

  • Jun Sakuma
    • Institute of Science Tokyo, Japan
  • Reza Shokri
    • National University of Singapore, Singapore
  • Florian Tramèr
    • ETH Zurich, Switzerland
  • Amin Karbasi
    • Cisco, USA

Overview

Description of the Meeting

The evolution of artificial intelligence (AI) is driving innovation in our lives and industries, and its adoption across a wide range of fields is accelerating. The rise of large language models (LLMs) and increasingly capable generative AI is pushing this adoption even further. These technologies demonstrate unprecedented ability in areas such as natural language processing, image generation, and decision support. With this evolution, however, security and privacy have become more important than ever before.

AI systems derive their performance from vast amounts of data, but this reliance also raises the risk of data leakage and unauthorized use; if these issues are ignored, trust in AI technology may be undermined. Large language models and generative AI learn from vast corpora and generate new content from them, which increases the risk of unintentionally leaking personal information and confidential data. In addition, the outputs of these models may be misused in unintended ways, and the models themselves may be attacked and exploited for illicit purposes.

Moreover, the increasing complexity of AI systems often introduces unintended vulnerabilities, which raises the risk of unauthorized manipulation. At the same time, there is a growing need to protect data that contains personal information and to use data ethically and transparently, making the balance between security and privacy an increasingly pressing concern in the environment surrounding AI.

Over the last decade, research and development in AI security and privacy has made great progress. Differential privacy, which bounds the influence of any individual record on the learning process, and federated learning, which trains models in a distributed environment without centralizing data on a server, are widely applied as privacy-preserving technologies in AI. As attack techniques against AI, such as adversarial examples, data poisoning, and membership inference, have developed, the vulnerabilities of AI models have become clearer, and defensive technologies to counter them have also advanced significantly. On the other hand, current defenses and privacy-protection technologies are not sufficiently resistant to increasingly sophisticated attacks. Furthermore, research on security and privacy in rapidly developing large language models and generative AI is still at an early stage and requires further work.
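
To make the first of these techniques concrete, the sketch below shows the core of differentially private SGD in plain NumPy: each per-example gradient is clipped to a fixed norm and Gaussian noise calibrated to that norm is added before the update. It is a minimal illustration only; the clipping norm, noise multiplier, and learning rate are placeholder values, not recommendations.

    import numpy as np

    def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.1):
        # Clip each per-example gradient so no single record dominates the update.
        clipped = []
        for g in per_example_grads:
            norm = np.linalg.norm(g)
            clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
        # Average the clipped gradients and add Gaussian noise scaled to the clipping norm.
        grad = np.mean(clipped, axis=0)
        noise = np.random.normal(0.0, noise_multiplier * clip_norm / len(clipped), size=grad.shape)
        return params - lr * (grad + noise)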

AI security and privacy is recognized as an important research area that attracts many researchers, but the community is split across sub-communities in machine learning (NeurIPS, ICLR, etc.), security (ACM CCS, IEEE Security & Privacy, etc.), computer vision (CVPR, ICCV, etc.), and natural language processing (ACL, EMNLP, etc.), and is geographically scattered, especially across East Asia, Europe, and North America. Collaborative efforts are therefore needed to bring the community together and create synergies among the different directions of AI security and privacy research.

At the Shonan Meeting, we will work to develop new solutions for the following issues in AI Security and Privacy:

1. Privacy issues in training data and generated content

Deep learning models may implicitly memorize the data used for training. Deploying such models creates risks such as leakage of private data, identification of training records through membership inference, and attribute estimation through model inversion. For generative models, the risk of personal information being reproduced in generated content must also be considered.
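
As a minimal illustration of membership inference, the sketch below implements the simplest loss-threshold variant: a record is flagged as a likely training member when the target model's loss on it is unusually low compared with known non-members. The model_loss interface and the percentile used here are assumptions made purely for the example.

    import numpy as np

    def loss_threshold_membership_inference(model_loss, candidates, known_nonmembers):
        # Calibrate a threshold from records known not to be in the training set.
        threshold = np.percentile([model_loss(r) for r in known_nonmembers], 5)
        # Records with conspicuously low loss are flagged as probable training members.
        return [r for r in candidates if model_loss(r) < threshold]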

2. Utilization of privacy protection technology in large-scale models

Although privacy-protection technologies such as differential privacy, federated learning, and secure computation are maturing theoretically, they are not immediately applicable to practical-scale models such as large language models; issues remain around performance degradation, computational cost, and implementation difficulty.
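
For illustration, the sketch below shows one round of federated averaging, the basic building block of federated learning: clients train locally and only their parameters, weighted by local data size, are aggregated by the server, so raw data never leaves the clients. The function signature is hypothetical, and real deployments layer secure aggregation, compression, and differential privacy on top, which is exactly where the scaling issues mentioned above arise.

    import numpy as np

    def federated_averaging(client_params, client_sizes):
        # client_params: one flattened parameter vector (np.ndarray) per client,
        # obtained from local training; client_sizes: number of local examples.
        weights = np.asarray(client_sizes, dtype=float)
        weights /= weights.sum()
        # The server only ever sees parameters, never the clients' raw data.
        return sum(w * p for w, p in zip(weights, client_params))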

3. Defense technology against adversarial attacks

Compared with the rapid progress of research on attack techniques such as adversarial examples, poisoning, and backdoors in a variety of settings, the technologies for detecting these attacks, hardening models against them, and neutralizing them are not sufficiently mature. Defense strategies are needed that remain effective over the long term and can respond to a wide range of attacks. Furthermore, the risk of adversarial attacks on models that interact with users in complex ways, such as large language models, has not yet been fully characterized.
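
As a reference point for the attack side, the sketch below shows the fast gradient sign method, one of the simplest ways to craft an adversarial example: the input is perturbed in the direction that increases the model's loss, within a small L-infinity budget. The gradient oracle is assumed to be available (a white-box setting), and epsilon is a placeholder value.

    import numpy as np

    def fgsm_adversarial_example(x, grad_loss_wrt_input, epsilon=0.03):
        # Move each input dimension by +/- epsilon in the direction that raises the loss.
        perturbation = epsilon * np.sign(grad_loss_wrt_input(x))
        # Keep the perturbed input in a valid range (e.g., pixel intensities in [0, 1]).
        return np.clip(x + perturbation, 0.0, 1.0)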

4. Alignment of large-scale models and a basic understanding of security and privacy risks

Large language models and foundation models above a certain scale are known to acquire the ability to solve general-purpose tasks, but they do not always generate content in line with human expectations. To mitigate security and privacy risks in content generation, alignment of generative models is necessary.

The proposed seminar will focus on the issues above. The following technical topics are expected to be of particular interest:

1. Red-teaming for large-scale models
Red-teaming detects vulnerabilities and potential risks, and proposes mitigations, by mounting a large number of adversarial attacks on the target model from multiple perspectives (a minimal automated harness is sketched after the list below). Red-teaming large-scale generative models with billions to trillions of parameters presents the following difficulties:
(1) Due to the complexity of their behavior, large-scale models have a wide variety of vulnerabilities. How can we discover and evaluate unknown vulnerabilities that attackers might exploit?
(2) Comprehensively testing large-scale models for vulnerabilities and risks is difficult because of the computational cost involved. How can we efficiently discover vulnerabilities and risks in large-scale generative models without spending substantial computing resources?
(3) Unlike discriminative models, the interaction between models and data in large-scale generative models is complex and diverse, and the threat model can also become exponentially complex. How can we efficiently evaluate vulnerabilities with limited resources?
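
The sketch below outlines such a minimal automated red-teaming harness under simplifying assumptions: model_generate, mutate, and is_unsafe are hypothetical callables standing in for the target model, an attack-prompt generator, and a safety classifier. Difficulties (1)-(3) correspond to how mutate explores the prompt space, how large a query budget is affordable, and how faithfully is_unsafe captures the threat model.

    def red_team(model_generate, seed_prompts, mutate, is_unsafe, budget=1000):
        # Repeatedly mutate candidate prompts, query the target model, and record
        # any prompt that elicits unsafe output; successful attacks seed further mutation.
        findings = []
        frontier = list(seed_prompts)
        for _ in range(budget):
            prompt = mutate(frontier)
            output = model_generate(prompt)
            if is_unsafe(output):
                findings.append((prompt, output))
                frontier.append(prompt)
        return findings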

2. Privacy-Preserving Training Methods for Large Language Models
This topic explores advanced techniques for maintaining privacy during the training of large language models while preserving their effectiveness:
(1) How can we scale differential privacy techniques to massive language models without significantly degrading model performance?
(2) What are the theoretical bounds and practical limitations of federated learning when applied to foundation models?
(3) How can we develop new privacy-preserving training architectures that balance the trade-off between model utility and privacy guarantees?
(4) What methods can be developed to verify and audit the privacy preservation of training procedures in large-scale deployments?

3. Adversarial Defense Strategies for Generative AI
This section focuses on developing robust defense mechanisms against emerging threats to generative AI systems:
(1) How can we design detection mechanisms for identifying manipulated or poisoned training data in real-time?
(2) What techniques can be developed to make generative models inherently robust against prompt injection and jailbreaking attempts? (A naive baseline screen is sketched after this list.)
(3) How can we implement efficient verification methods to ensure the authenticity and safety of generated content?
(4) What are effective strategies for maintaining model robustness while preserving generation quality and creative capabilities?
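
As a point of reference for question (2), the sketch below shows the kind of naive baseline that deployed systems often start from: a keyword screen over untrusted text before it is concatenated into a prompt. The pattern list is purely illustrative, and the limitations of such filters are precisely what motivates the stronger, model-level defenses this topic targets.

    import re

    SUSPICIOUS_PATTERNS = [
        r"ignore (all|any|previous|the above) instructions",
        r"disregard the system prompt",
        r"you are now",
    ]

    def flag_possible_injection(untrusted_text):
        # Flag untrusted text containing instruction-like phrases before it reaches the model.
        lowered = untrusted_text.lower()
        return any(re.search(p, lowered) for p in SUSPICIOUS_PATTERNS)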

4. AI Alignment and Safety Mechanisms
This topic addresses the fundamental challenges of ensuring AI systems behave in accordance with human values and safety requirements:
(1) How can we develop scalable methods for value alignment that work effectively with increasingly powerful language models?
(2) What techniques can be employed to detect and prevent harmful or unethical outputs while maintaining model utility?
(3) How can we implement transparent and verifiable safety bounds in generative AI systems?
(4) What methodologies can be developed for continuous monitoring and adjustment of alignment in deployed systems?

The workshop will facilitate collaborative discussions and research initiatives around these critical areas, with the goal of advancing the state-of-the-art in AI security and privacy. Participants will be encouraged to:

  • Share recent research findings and ongoing work in these areas
  • Form working groups to address specific challenges identified during the workshop
  • Develop new research directions and potential solutions
  • Establish collaborative networks across different sub-communities
  • Create frameworks for evaluating and comparing different approaches

Expected Outcomes:
  • Joint research papers addressing key challenges in AI security and privacy
  • New methodologies for securing and evaluating large-scale AI systems
  • Establishment of benchmark datasets and evaluation metrics
  • Formation of ongoing research collaborations across institutions
  • Development of best practices and guidelines for secure AI development

The workshop will conclude with a session dedicated to planning future collaborative efforts and establishing concrete next steps for advancing the field of AI security and privacy.

Reasons for the proposal by the four organizers:
AI security and privacy spans many aspects. Sakuma will lead discussion on offensive security and the cryptographic aspects of AI models, Shokri on the privacy aspects of AI models, Tramèr on the security and privacy of language models and foundation models, and Karbasi on robust optimization and optimization theory for security and privacy. We believe these four perspectives together will enable a multifaceted discussion at the Shonan meeting. In addition, to ensure regional diversity among participants and to motivate high-quality participants to attend, the four organizers are drawn one each from North America, Europe, East Asia, and Southeast Asia.
AI security and privacy are composed of various aspects. Sakuma will provide a discussion from the perspective of offensive security and cryptographic aspects of AI models, Shokri will provide a discussion from the perspective of privacy aspects of AI models, Tramer will provide a discussion from the perspective of security and privacy of language models and infrastructure models , and Karbasi from the perspective of robust optimization and optimization theory for security and privacy. We thought that the four of them would be appropriate for making up a multifaceted discussion at the Shonan meeting. In addition, in order to ensure regional diversity among participants and motivate high-quality participants to attend the meeting, four organizers are appointed, one each from North America, Europe, East Asia, and Southeast Asia.