Seminars

NO.255 The Future of Refactoring in the Era of Generative Artificial Intelligence

Shonan Village Center

November 16 - 19, 2026 (Check-in: November 15, 2026)

Organizers

  • Toufique Ahmed
    • IBM Research, USA
  • Danny Dig
    • University of Colorado, Boulder and JetBrains Research, USA
  • Shinpei Hayashi
    • Institute of Science Tokyo, Japan
  • Ying Zou
    • Queen’s University, Canada

Overview

Description of the Meeting:

Refactorings are source-to-source program transformations that programmers carry out to improve the design of existing code and ensure the longevity and health of code bases. Among the hundreds of kinds of refactorings, the best known are renaming program elements for clarity, extracting methods to reduce complexity and promote reuse, and moving classes or methods to improve code modularity. Refactoring technology has gained tremendous momentum both in academia, with more than 5,000 research papers published on refactoring in the last two decades, and in industry. Refactoring is very popular with programmers who practice Agile Software Development in all of its nuances; these developers constantly carry out refactorings to control the level of code entropy and prevent the internal code quality from decaying. Modern Integrated Development Environments (IDEs) such as IntelliJ IDEA [1], used daily by millions of developers, offer refactoring in the top-level menu alongside File and Edit, which shows how widely refactoring is appreciated in the practice of Software Engineering. Research tools such as RefactoringMiner [2] pinpoint thousands of refactorings that open-source developers applied in the history of their projects. Moreover, empirical studies [3][4] on the practice of refactoring in real-world software systems identify consistent refactoring during code development interspersed with intensive, short-burst refactoring activities.
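To make the Extract Method refactoring named above concrete, the following is a toy before/after sketch; all function and field names are invented for illustration and do not come from any tool or study:

```python
# Hypothetical before/after illustration of an Extract Method refactoring.

def report_before(orders):
    # Before: the total computation is tangled with formatting concerns.
    total = 0
    for o in orders:
        total += o["price"] * o["qty"]
    return f"Total: {total:.2f}"

def compute_total(orders):
    # After: the summation is extracted into its own function, reducing
    # the complexity of the caller and making the computation reusable.
    return sum(o["price"] * o["qty"] for o in orders)

def report_after(orders):
    return f"Total: {compute_total(orders):.2f}"
```

The transformation is behavior-preserving: for any input, `report_before` and `report_after` return the same string, while the extracted `compute_total` can now be tested and reused on its own.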

With advances in generative artificial intelligence (GenAI), Large Language Models (LLMs) such as Codex [5], GPT-4 [6], and StarCoder [7] accelerate the automatic generation of immense amounts of code. LLMs are trained on vast open-source code bases that may contain poor-quality code and lack focus on specific domains. Therefore, if the training code has flaws, the generated code can suffer from poor design and introduce severe technical debt. Automatic code refactoring is even more needed now. Recent research shows promising results when LLMs are prompted with the code to be refactored. However, the refactorings generated by LLMs have a high rate of hallucinations: up to 80% of them introduce syntax or semantic errors. Moreover, many of these refactoring recommendations cannot pass automated test cases because LLMs have limited knowledge and lack awareness of the context of the code to be refactored.
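The two failure modes described above, refactorings that do not parse and refactorings that break behavior, suggest a simple validation gate. The following is a minimal sketch, assuming a single trusted entry-point function; all names are invented for illustration, not taken from any existing tool:

```python
import ast

def validate_refactoring(original_src, refactored_src, test_inputs, entry="f"):
    """Sketch: accept an LLM-proposed refactoring only if it (1) parses
    and (2) behaves like the original on a suite of test inputs."""
    # 1. Reject syntax hallucinations: the proposal must at least parse.
    try:
        ast.parse(refactored_src)
    except SyntaxError:
        return False

    # 2. Reject semantic hallucinations: run both versions on the same
    #    inputs and compare outputs. `exec` assumes trusted/sandboxed code.
    def load(src):
        namespace = {}
        exec(src, namespace)
        return namespace[entry]

    before, after = load(original_src), load(refactored_src)
    return all(before(*args) == after(*args) for args in test_inputs)
```

A real pipeline would of course sandbox execution, diff intermediate program states, and use the project's own test suite rather than ad hoc inputs; the sketch only shows where the two hallucination checks sit.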

Questions to be Studied:

In the proposed Shonan Meeting, we will invite researchers and practitioners who share interests in refactoring, machine learning, and GenAI to exchange ideas and address the following questions:

  • How can we increase the quality of the immense amounts of automatically generated code through refactoring? Do we need new code refactoring tools and detection approaches?
  • What are new strategies to refactor AI-generated code? Can traditional refactoring strategies for human-written code be applied to AI-generated code? How can autonomous agents be employed for refactoring?
  • What are the requirements for code refactoring tools to handle AI-generated code?
  • How can developers be involved in the process of refactoring AI-generated code?
  • What are the challenges of integrating LLM-based refactoring within IDEs? How can IDEs leverage LLMs to support automatic refactoring? How can LLMs leverage the safety and raw power of IDEs?
  • What are the strategies for detecting and reducing LLM hallucinations in code refactoring?
  • How can we test AI-generated code as well as AI-refactored code?
  • What is the future of refactoring moving beyond code in the next 5 to 10 years?

Potential Impact of the Meeting

We will invite participants who are world-renowned experts from multiple disciplines, including software engineering, IDE development at various companies (e.g., Google Android Studio, IBM Eclipse IDE, and JetBrains), artificial intelligence, trustworthy AI systems, explainable AI, and human and social aspects of software engineering, to produce guidelines for the future direction of refactoring. We expect the NII Shonan Meeting to have lively discussions about various emerging challenges and future research directions on refactoring in the era of GenAI. Furthermore, we expect to identify the key issues that can be solved by academics working closely with industry researchers, and which are of great importance to practitioners. By being in close proximity to industrial participants, researchers will be able to understand pain points and emerging solutions from the practitioners’ perspectives. The participants will be able to identify collaborators who may or may not come from the same research community. We expect that these collaborations will promote research synergies between software engineering and GenAI through many high-impact publications. In addition, we expect to come up with an agenda on how to design and teach courses on GenAI-powered refactoring as well as on integrating GenAI as programming assistants. While refactoring was transformative in the previous era of code development, we believe that the stakes and impact can be even greater in the GenAI era.

We will summarize the talks and discussions during the meeting and prepare a paper submission to the Communications of NII Shonan Meetings by Springer. Moreover, we plan to propose a special issue on the topic of refactoring in the era of GenAI with the ACM Transactions on Software Engineering and Methodology (TOSEM). If the proposal is approved by TOSEM, we will encourage the meeting participants to submit research papers and also have an open call for papers to welcome interested researchers from the research community to submit their work to the special issue.

References

  1. https://www.jetbrains.com/idea/
  2. Nikolaos Tsantalis, Ameya Ketkar, and Danny Dig. 2022. RefactoringMiner 2.0. IEEE Transactions on Software Engineering 48, 3 (2022), 930–950.
  3. Shayan Noei, Heng Li, Stefanos Georgiou, and Ying Zou. 2023. An Empirical Study of Refactoring Rhythms and Tactics in the Software Development Process. IEEE Transactions on Software Engineering 49, 12 (2023), 5103–5119.
  4. Leonardo Sousa, Willian Oizumi, Alessandro Garcia, Anderson Oliveira, Diego Cedrim, and Carlos Lucena. 2020. When Are Smells Indicators of Architectural Refactoring Opportunities: A Study of 50 Software Projects. In Proceedings of the 28th International Conference on Program Comprehension. 354–365.
  5. Mark Chen et al. 2021. Evaluating Large Language Models Trained on Code. https://huggingface.co/papers/2107.03374
  6. OpenAI. 2023. GPT-4 Technical Report. https://cdn.openai.com/papers/gpt-4.pdf
  7. Anton Lozhkov et al. 2024. StarCoder 2 and The Stack v2: The Next Generation. https://arxiv.org/abs/2402.19173