NO.244 AR and AI: Everyday AR through AI-in-the-Loop
June 8 - 11, 2026 (Check-in: June 7, 2026)
Organizers
- Ryo Suzuki (University of Colorado Boulder, USA / Tohoku University, Japan)
- Mar Gonzalez-Franco (Google, USA)
- Misha Sra (University of California, Santa Barbara, USA)
- David Lindlbauer (Carnegie Mellon University, USA)
Overview
Meeting Description
Introduction
Augmented Reality (AR) technologies are continuously advancing, both in hardware and software. Smaller form factors, extended battery life, and constant connectivity are driving a paradigm shift in how we perceive AR. While current AR scenarios focus on specialized applications, such as productivity or maintenance [13, 12], we believe that everyday AR is becoming increasingly feasible. By "everyday AR," we refer to making AR always available to users, enabling seamless interactions with the digital world, and potentially replacing or augmenting technologies such as smartphones and desktop computing. Users will be able to ubiquitously query and interact with digital information, communicate with other users and virtual agents [4], and rely on AR for a majority of their interactions.
The rise of everyday AR is fueled not only by advancements in hardware and software infrastructure but also by recent breakthroughs in Artificial Intelligence (AI) and Machine Learning (ML). Generative AI techniques [14], for instance, allow the creation of multimodal digital content on-the-fly, while Large Language Models [5] enable natural, text-based, and multimodal interactions with virtual agents. Leveraging these advances will empower researchers and practitioners to design novel and refined AR interactions. To make everyday AR a reality, we propose adopting an AI-in-the-loop approach, where digital interactions and content continuously anticipate and adapt to users’ ever-changing needs and contexts. This combination of AR and AI can help move beyond monolithic AR applications, creating truly user-centered and adaptive experiences where both scene understanding and content generation are dynamic.
The goal of this workshop, an evolution of our UIST 2023 [26] and CHI 2025 [27] workshops (https://xr-and-ai.github.io/), is to bridge the fields of AR and AI while discussing the feasibility and requirements of AI-enabled everyday AR. Our previous workshops primarily focused on cultivating a community within HCI through short, one-day events. In contrast, this Shonan workshop will center on four days of in-depth discussion to establish grand challenges that will guide the future of AR and AI research. The workshop thus represents a natural progression: it is intended not only to foster collaboration but also to define a way forward for the community. As a tangible outcome, we aim to produce a research vision paper based on the discussions, to be submitted to a leading venue such as ACM CHI. This reference paper will consolidate research efforts and set a clear trajectory for advancing the integration of AR and AI.
Topics for Discussion and Perspectives of Interest
This workshop welcomes researchers and practitioners in AR, AI, machine learning, and computational interaction to share diverse perspectives and expertise. Several domains remain underexplored in the literature on XR and AI [19]. We plan to discuss topics that include, but are not limited to, the following areas:
Adaptive and Context-Aware AR: Unlike traditional interfaces, AR embeds virtual elements into the user's physical world. Without careful design, such interfaces can overwhelm and distract users. Context-aware AR addresses this by adapting virtual elements to users' needs, context, and environment [23, 24, 15]. Recent work has explored blending AR interfaces into physical objects [17], using everyday objects as virtual affordances [18, 20], and applying adaptive AR to improve accessibility for people with low vision [21]. As this research expands to broader everyday AR applications, advanced AI can deepen systems' understanding of contextual cues such as room geometry, object affordances, and user activities, enabling more seamless and adaptive experiences.
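As a rough illustration of this direction, the Python sketch below ranks candidate virtual elements by a simple context score and shows only the most relevant ones. The context fields, weights, and element names are hypothetical placeholders for real scene-understanding output, not an actual system.

    # Minimal sketch of context-aware AR element selection. All data structures
    # and weights are hypothetical illustrations.
    from dataclasses import dataclass

    @dataclass
    class Context:
        user_activity: str          # e.g., "cooking", "walking", "reading"
        gaze_target: str            # label of the object the user looks at
        free_surfaces: list[str]    # surfaces reported by scene understanding

    @dataclass
    class ARElement:
        name: str
        relevant_activities: set[str]
        preferred_surface: str

    def visibility_score(element: ARElement, ctx: Context) -> float:
        """Score how strongly an element should be shown in the current context."""
        score = 0.0
        if ctx.user_activity in element.relevant_activities:
            score += 1.0                      # relevant to the ongoing task
        if element.preferred_surface in ctx.free_surfaces:
            score += 0.5                      # a suitable placement exists
        if ctx.gaze_target == element.name:
            score += 0.5                      # user is already attending to it
        return score

    def adapt_interface(elements, ctx, budget=2):
        """Keep only the top-'budget' elements to avoid overwhelming the user."""
        ranked = sorted(elements, key=lambda e: visibility_score(e, ctx), reverse=True)
        return [e for e in ranked[:budget] if visibility_score(e, ctx) > 0]

    if __name__ == "__main__":
        ctx = Context("cooking", "recipe_card", ["countertop", "wall"])
        elements = [
            ARElement("recipe_card", {"cooking"}, "countertop"),
            ARElement("timer", {"cooking"}, "wall"),
            ARElement("email_panel", {"reading"}, "wall"),
        ]
        for e in adapt_interface(elements, ctx):
            print("show:", e.name)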
Always-on AI Assistant Integrating LLMs and AR: Large Language Models (LLMs) have the potential to transform AR experiences. Unlike traditional interfaces requiring explicit interaction, integrating LLMs into AR enables always-on AI assistants that seamlessly support users in their environments. This integration moves beyond typing on screens, facilitating intuitive, multimodal interactions with digital content. Recent work has shown how AR can enhance object intelligence [9], enable mixed-reality document enhancements [16], and support gaze and gesture-based interactions [22]. We aim to explore how LLMs’ multimodal capabilities can leverage AR’s unique visual, tangible, and spatial modalities to enable richer, more adaptive experiences.
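As a minimal sketch of such grounding, the snippet below assembles detected scene context (gaze target and nearby objects) into a text prompt so a multimodal model can resolve deictic references such as "this." The call_llm function is a placeholder for whichever model API a system would actually use; the context fields are assumptions for illustration.

    # Minimal sketch of grounding an always-on AR assistant query in scene context.
    import json

    def build_prompt(utterance: str, gaze_object: str, nearby_objects: list[str]) -> str:
        """Resolve deictic speech ("this", "that") by attaching AR scene context."""
        context = {
            "gaze_object": gaze_object,          # from eye tracking + object detection
            "nearby_objects": nearby_objects,    # from the headset's scene understanding
        }
        return (
            "You are an AR assistant. Use the scene context to resolve pronouns.\n"
            f"Scene context: {json.dumps(context)}\n"
            f"User said: {utterance}"
        )

    def call_llm(prompt: str) -> str:
        """Placeholder: send the prompt to an LLM endpoint and return its reply."""
        return f"[LLM reply for prompt of {len(prompt)} characters]"

    if __name__ == "__main__":
        prompt = build_prompt(
            utterance="How many calories are in this?",
            gaze_object="banana",
            nearby_objects=["banana", "coffee mug", "keyboard"],
        )
        print(call_llm(prompt))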
AI-Assisted Task Guidance in AR: AI-assisted task guidance surpasses traditional video-based instruction [6] by offering real-time, personalized feedback on movements and activities. This interactive approach can accelerate learning, enhance performance, and reduce bad habits through tailored guidance on form, timing, and technique. Beyond full-body activities like fitness or sports training, we are interested in manual tasks such as cooking, repair, or assembly. Additionally, AI systems can adapt to user progress, increasing task complexity and providing encouragement, creating a more engaging learning experience compared to passive video instruction.
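A toy example of this kind of feedback loop: the sketch below compares tracked joint angles against a reference pose and emits corrective cues that an AR overlay could display. The joint names, reference values, and tolerance are invented for illustration, not taken from any cited system.

    # Minimal sketch of AI-assisted task guidance: compare tracked joint angles
    # against a reference motion and surface corrective feedback in AR.
    REFERENCE_SQUAT = {"knee": 90.0, "hip": 95.0, "back": 170.0}  # degrees (hypothetical)
    TOLERANCE = 15.0

    def feedback(tracked_angles: dict[str, float]) -> list[str]:
        """Return human-readable cues for joints that deviate from the reference."""
        cues = []
        for joint, target in REFERENCE_SQUAT.items():
            error = tracked_angles.get(joint, target) - target
            if abs(error) > TOLERANCE:
                direction = "bend" if error > 0 else "extend"
                cues.append(f"{direction} your {joint} a little (off by {error:+.0f} deg)")
        return cues

    if __name__ == "__main__":
        # Angles as they might arrive from a body-tracking model, frame by frame.
        frame = {"knee": 120.0, "hip": 100.0, "back": 168.0}
        for cue in feedback(frame):
            print("AR overlay:", cue)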
AI-in-the-Loop On-Demand AR Content Creation: Generative AI has the potential to revolutionize AR content creation by enabling real-time, on-demand generation. Advances in generative models, such as GANs and Stable Diffusion, now support content creation, 3D and avatar animation [2], and immersive environment design [11]. Beyond static images, generative AI can create interactive AR content through LLM-based code generation [8, 1] and make static objects interactive in AR [7]. We are particularly interested in leveraging AR's unique physical and spatial aspects, such as 3D scene understanding and spatial interactions, to generate diverse content, including text, visuals, motion, video, and 3D objects. We aim to explore the challenges and opportunities of integrating these capabilities into AR environments for richer, more dynamic experiences.
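To make such a pipeline concrete, the sketch below prompts a (mocked) language model for a structured scene description and validates it before instantiation, loosely in the spirit of LLM-driven scene generation such as LLMR [8]. The prompt format, object schema, and mock output are hypothetical; a real system would call an actual model and a renderer.

    # Minimal sketch of AI-in-the-loop AR content generation. The LLM call is
    # mocked; a real pipeline would validate model output before instantiating it.
    import json

    SCENE_PROMPT = (
        "Return a JSON list of objects to add to the user's AR scene. Each object "
        "needs 'type', 'anchor_surface', and 'scale'. Request: {request}"
    )

    def mock_llm(prompt: str) -> str:
        """Placeholder for a real LLM call; returns a canned scene description."""
        return json.dumps([
            {"type": "potted_plant", "anchor_surface": "desk", "scale": 0.3},
            {"type": "info_panel", "anchor_surface": "wall", "scale": 1.0},
        ])

    def generate_scene(request: str) -> list[dict]:
        """Prompt the model, then validate its output before handing it to the renderer."""
        raw = mock_llm(SCENE_PROMPT.format(request=request))
        objects = json.loads(raw)
        required = {"type", "anchor_surface", "scale"}
        return [o for o in objects if required <= o.keys()]

    if __name__ == "__main__":
        for obj in generate_scene("decorate my desk and show today's agenda"):
            print("spawn:", obj)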
AI for Accessible AR Design: AI has the potential to make AR design and experiences more inclusive, addressing barriers faced by individuals with hand-motor impairments. Currently, traditional input methods like bimanual keyboard-mouse setups for XR design tools (e.g., Unity) and handheld controllers for XR experiences exclude many users. To overcome these limitations, we propose exploring AI-driven input modalities such as voice commands, gaze tracking, facial expression recognition [28], and subtle body movement interpretation. Additionally, AI can adapt content difficulty, interfaces, and narratives to individual abilities, creating more accessible and personalized experiences. This approach not only broadens access but also innovates interaction paradigms, potentially transforming how all users engage with virtual worlds.
Real-World-Oriented AI Agents: We are also interested in exploring how to design AI agents that enhance our physical world. The future of AI agents should bridge the gap between the digital and physical worlds by understanding spatial relationships, object affordances, and user activities. For example, such AI agents could assist users in navigating their environment, support complex everyday tasks, or provide personalized assistance. Recent work has shown promising developments in creating agents that guide and visualize users' attention through mixed-reality avatars [29]. We aim to discuss the possibilities of designing AI agents for AR that are aware of and responsive to real-world context.
Plans to Publish Workshop Outcomes and Participants’ Background
We plan to publish the workshop outcomes as a grand-challenges paper, similar to prior work [25, 10, 3], focusing on the intersection of AR and AI to highlight key findings and establish a roadmap for future research. This will provide a foundation for advancing AI-driven AR experiences and for addressing the challenges and opportunities in this emerging field. We aim for the workshop to act as a catalyst for building a collaborative and interdisciplinary community around AR and AI. To achieve this, we invite participants from diverse cultural and professional backgrounds, with expertise spanning AR, AI, HCI, computer vision, and cognitive science. This diversity will enrich discussions by offering unique perspectives and fostering innovative ideas to shape the future of AI-enabled AR.
References
[1] Setareh Aghel Manesh, Tianyi Zhang, Yuki Onishi, Kotaro Hara, Scott Bateman, Jiannan Li, and Anthony Tang. How people prompt generative ai to create interactive vr scenes. In Proceedings of the 2024 ACM Designing Interactive Systems Conference, pages 2319–2340, 2024.
[2] Karan Ahuja, Eyal Ofek, Mar Gonzalez-Franco, Christian Holz, and Andrew D Wilson. Coolmoves: User motion accentuation in virtual reality. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 5(2):1–23, 2021.
[3] Jason Alexander, Anne Roudaut, Jürgen Steimle, Kasper Hornbæk, Miguel Bruns Alonso, Sean Follmer, and Timothy Merritt. Grand challenges in shape-changing interface research. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, pages 1–14, 2018.
[4] Riccardo Bovo, Steven Abreu, Karan Ahuja, Eric J Gonzalez, Li-Te Cheng, and Mar Gonzalez-Franco. Embardiment: an embodied ai agent for productivity in xr. arXiv preprint arXiv:2408.08158, 2024.
[5] Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. Language models are few-shot learners. arXiv preprint arXiv:2005.14165, 2020.
[6] Sonia Castelo, Joao Rulff, Erin McGowan, Bea Steers, Guande Wu, Shaoyu Chen, Iran Roman, Roque Lopez, Ethan Brewer, Chen Zhao, et al. Argus: Visualization of ai-assisted task guidance in ar. IEEE Transactions on Visualization and Computer Graphics, 2023.
[7] Neil Chulpongsatorn, Mille Skovhus Lunding, Nishan Soni, and Ryo Suzuki. Augmented math: Authoring ar-based explorable explanations by augmenting static math textbooks. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, pages 1–16, 2023.
[8] Fernanda De La Torre, Cathy Mengying Fang, Han Huang, Andrzej Banburski-Fahey, Judith Amores Fernandez, and Jaron Lanier. Llmr: Real-time prompting of interactive worlds using large language models. In Proceedings of the CHI Conference on Human Factors in Computing Systems, pages 1–22, 2024.
[9] Mustafa Doga Dogan, Eric J Gonzalez, Andrea Colaco, Karan Ahuja, Ruofei Du, Johnny Lee, Mar Gonzalez-Franco, and David Kim. Augmented object intelligence: Making the analog world interactable with xr-objects. arXiv preprint arXiv:2404.13274, 2024.
[10] Barrett Ens, Benjamin Bach, Maxime Cordeil, Ulrich Engelke, Marcos Serrano, Wesley Willett, Arnaud Prouzeau, Christoph Anthes, Wolfgang Büschel, Cody Dunne, et al. Grand challenges in immersive analytics. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, pages 1–17, 2021.
[11] Ran Gal, Lior Shapira, Eyal Ofek, and Pushmeet Kohli. Flare: Fast layout for augmented reality applications. In 2014 IEEE international symposium on mixed and augmented reality (ISMAR), pages 207–212. IEEE, 2014.
[12] Mar Gonzalez-Franco, Julio Cermeron, Katie Li, Rodrigo Pizarro, Jacob Thorn, Windo Hutabarat, Ashutosh Tiwari, and Pablo Bermell-Garcia. Immersive augmented reality training for complex manufacturing scenarios. arXiv preprint arXiv:1602.01944, 2016.
[13] Mar Gonzalez-Franco and Andrea Colaco. Guidelines for productivity in virtual reality. Interactions, 31(3):46–53, 2024.
[14] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in Neural Information Processing Systems, 27, 2014.
[15] Jens Grubert, Tobias Langlotz, Stefanie Zollmann, and Holger Regenbrecht. Towards pervasive augmented reality: Context-awareness in augmented reality. IEEE Transactions on Visualization and Computer Graphics, 23(6):1706–1724, 2017.
[16] Aditya Gunturu, Shivesh Jadon, Nandi Zhang, Morteza Faraji, Jarin Thundathil, Tafreed Ahmad, Wesley Willett, and Ryo Suzuki. Realitysummary: Exploring on-demand mixed reality text summarization and question answering using large language models. arXiv preprint arXiv:2405.18620, 2024.
[17] Violet Yinuo Han, Hyunsung Cho, Kiyosu Maeda, Alexandra Ion, and David Lindlbauer. Blendmr: A computational method to create ambient mixed reality interfaces. Proceedings of the ACM on Human-Computer Interaction, 7(ISS):217–241, 2023.
[18] Fengming He, Xiyun Hu, Jingyu Shi, Xun Qian, Tianyi Wang, and Karthik Ramani. Ubi edge: Authoring edge-based opportunistic tangible user interfaces in augmented reality. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, pages 1–14, 2023.
[19] Teresa Hirzle, Florian Müller, Fiona Draxler, Martin Schmitz, Pascal Knierim, and Kasper Hornbæk. When xr and ai meet - a scoping review on extended reality and artificial intelligence. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, pages 1–45, 2023.
[20] Rahul Jain, Jingyu Shi, Runlin Duan, Zhengzhe Zhu, Xun Qian, and Karthik Ramani. Ubi touch: Ubiquitous tangible object utilization through consistent hand-object interaction in augmented reality. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, pages 1–18, 2023.
[21] Jaewook Lee, Andrew D Tjahjadi, Jiho Kim, Junpu Yu, Minji Park, Jiawen Zhang, Jon E Froehlich, Yapeng Tian, and Yuhang Zhao. Cookar: Affordance augmentations in wearable ar to support kitchen tool interactions for people with low vision. arXiv preprint arXiv:2407.13515, 2024.
[22] Jaewook Lee, Jun Wang, Elizabeth Brown, Liam Chu, Sebastian S Rodriguez, and Jon E Froehlich. Gazepointar: A context-aware multimodal voice assistant for pronoun disambiguation in wearable augmented reality. In Proceedings of the CHI Conference on Human Factors in Computing Systems, pages 1–20, 2024.
[23] David Lindlbauer. The future of mixed reality is adaptive. XRDS: Crossroads, The ACM Magazine for Students, 29(1):26–31, 2022.
[24] David Lindlbauer, Anna Maria Feit, and Otmar Hilliges. Context-aware online adaptation of mixed reality interfaces. In Proceedings of the 32nd annual ACM symposium on user interface software and technology, pages 147–160, 2019.
[25] Florian Floyd Mueller, Pedro Lopes, Paul Strohmeier, Wendy Ju, Caitlyn Seim, Martin Weigel, Suranga Nanayakkara, Marianna Obrist, Zhuying Li, Joseph Delfa, et al. Next steps for human-computer integration. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pages 1–15, 2020.
[26] Ryo Suzuki, Mar Gonzalez-Franco, Misha Sra, and David Lindlbauer. Xr and ai: Ai-enabled virtual, augmented, and mixed reality. In Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, UIST ’23 Adjunct, New York, NY, USA, 2023. Association for Computing Machinery.
[27] Ryo Suzuki, Mar Gonzalez-Franco, Misha Sra, and David Lindlbauer. Everyday ar through ai-in-the-loop. In Extended Abstracts of the 2025 CHI Conference on Human Factors in Computing Systems, CHI ’25 Extended Abstracts, New York, NY, USA, 2025. Association for Computing Machinery.
[28] Atieh Taheri, Ziv Weissman, and Misha Sra. Exploratory design of a hands-free video game controller for a quadriplegic individual. In Proceedings of the Augmented Humans International Conference 2021, pages 131–140, 2021.
[29] Santawat Thanyadit, Matthias Heintz, and Effie LC Law. Tutor in-sight: Guiding and visualizing students’ attention with mixed reality avatar presentation tools. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, pages 1–20, 2023.