NO.176 Foundation Models and Software Engineering: Challenges and Opportunities

March 25 - 28, 2024 (Check-in: March 24, 2024)

Organizers

Zhen Ming (Jack) Jiang
- York University, Canada
Ahmed E. Hassan
- Queen's University, Canada
Yasutaka Kamei
- Kyushu University, Japan

Overview

Description of the meeting

Generative AI refers to AI technology capable of generating text, images, or other media using generative models [1]. According to Gartner's 2023 Hype Cycle for Emerging Technologies, generative AI is currently at the Peak of Inflated Expectations [21]. Bloomberg Intelligence expects its market to grow explosively over the next ten years, at a compound annual growth rate (CAGR) of 42%, from $40 billion in 2022 to $1.3 trillion in 2032 [2]. This growth is driven mainly by the increasing adoption of Foundation Models (FMs), such as Large Language Models, and of the applications built on them (e.g., OpenAI’s ChatGPT and GitHub’s Copilot).

FMs are trained on massive datasets and can be adapted (e.g., fine-tuned) to a wide range of downstream tasks [3]. Currently, the most popular type of FM is the Large Language Model (LLM), which is trained on a large corpus of unlabelled text. Although LLMs can perform tasks ranging from language translation to code generation, several serious drawbacks prevent them from being used on their own as general-purpose components across wider usage contexts. Rather than waiting for FMs themselves to overcome these issues, researchers in industry and academia have devised various engineering solutions and frameworks to mitigate them. A few examples follow:

Grounding: Hallucination refers to LLMs sometimes generating text that is incorrect or purely fictional [4]. A common mitigation is to combine LLMs with domain-specific knowledge stores, such as a vector database [7] or a graph such as Microsoft Graph [8], so that relevant context is retrieved through semantic search and the answers are grounded in that context.
External Tool Use: Benchmarking studies have shown that LLMs perform poorly on reasoning tasks such as mathematical calculation [5] and logical inference [9]. In contrast, traditional software applications are rule-based and excel at such tasks. Hence, various solutions enable LLMs to invoke external tools [11] or third-party plugins [13].
Prompt Engineering: Interacting with LLMs is quite different from interacting with traditional ML models or classic software applications. Users interact with LLMs through natural-language instructions called prompts. However, LLMs may produce incorrect or suboptimal results when they misunderstand the intent behind a prompt. Hence, a new discipline called prompt engineering [12] has emerged, which focuses on developing tools and techniques for optimizing prompts so that LLMs can accomplish a variety of tasks.
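To make the first two techniques more concrete, the following Python sketch shows, under heavy simplifying assumptions, how an application might ground a prompt in retrieved context and route arithmetic to a deterministic tool instead of relying on the model. The document list, the bag-of-words `embed` function (a stand-in for a neural embedding model plus a vector database), `grounded_prompt`, and the `calculator` tool are all hypothetical illustrations, not the API of any framework cited here, and no LLM is actually called.

```python
import ast
import math
import operator
import re
from collections import Counter

# Hypothetical in-memory knowledge base standing in for a vector database.
DOCUMENTS = [
    "BloombergGPT is a large language model for finance.",
    "OWL is a large language model for IT operations.",
    "Toolformer lets language models teach themselves to use tools.",
]


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words term counts (not a neural embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query (semantic-search stand-in)."""
    q = embed(query)
    ranked = sorted(DOCUMENTS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def grounded_prompt(question: str) -> str:
    """Build a prompt that asks the model to answer only from retrieved context."""
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# External tool: a small arithmetic evaluator that the application calls
# instead of trusting the model to do the calculation itself.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculator(expr: str) -> float:
    """Evaluate numeric literals combined with binary + - * / only."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")
    return ev(ast.parse(expr, mode="eval"))


if __name__ == "__main__":
    print(grounded_prompt("Which model is for finance?"))
    print(calculator("2 * (3 + 4)"))
```

For example, `grounded_prompt("Which model is for finance?")` places the BloombergGPT sentence in the prompt's context, and `calculator("2 * (3 + 4)")` returns 14 without involving the model. Restricting the evaluator to numeric literals and four binary operators keeps it deterministic and avoids calling `eval` on untrusted model output; a real system would replace the toy embedding with an embedding model and a proper vector store.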

Beyond these techniques, various frameworks (e.g., LangChain [18], HuggingGPT [19], and AutoGPT [20]) have been developed to facilitate building better FM-powered applications. Such applications interact with one or more FMs as well as with third-party tools and frameworks, which makes them capable of completing more complex tasks such as price matching and enterprise search.

The synergy between software engineering and FMs is only beginning, and it needs to continue throughout this new era of generative AI for three main reasons: (1) FMs are black boxes, so researchers and practitioners must explore and experiment in various ways to uncover their emergent behaviors and limitations. On the one hand, new emergent capabilities (e.g., chain-of-thought reasoning and instruction following [10]) continue to be reported for FMs. On the other hand, experimentation and trial and error will gradually reveal new risks and drawbacks. (2) New types of FMs (e.g., multimodal FMs [14] and world models [15]) and domain-specific FMs (e.g., FMs for finance [16] and IT operations [17]), equipped with new or enhanced capabilities, are being proposed at an ever faster pace. (3) As FM-powered software applications gradually move from research labs into production, concerns beyond technical novelty (e.g., legal and trustworthiness concerns, cost, and efficiency) must be properly evaluated and addressed before such products are launched. Failure to address these concerns can result in lost profits and/or major trustworthiness issues.

This meeting will bring together leading researchers to discuss current and future trends and challenges related to FMs and software engineering, for example:

What will software look like in the FM era? Will legacy systems remain in their current form?
Do existing programming models (e.g., object-oriented or functional programming) remain suitable for developing and maintaining FM-powered software applications?
What roles do autonomous agents play in the development and maintenance of FM-powered applications?
What kinds of release engineering practices do we need for FM-powered software applications? Are LLMOps practices suitable for the new types of FMs?
How do we debug and monitor FM-powered software applications?

Academic Impact

We expect the NII Shonan meeting to foster lively discussions about various emerging challenges, in order to identify key issues that academics can solve and that are of great importance to practitioners. Furthermore, through discussions with industrial participants, researchers may gain access to valuable industrial monitoring datasets that would otherwise be unavailable. We also expect researchers to identify, among the other invitees, suitable collaborators for the problems they wish to work on, whether or not those collaborators come from the same research community. We expect these collaborations to push the boundaries of FM and SE research through many high-impact publications. In addition, we expect to produce an agenda for designing and teaching courses in the area of FM and SE.

Industrial Impact

Gartner places generative AI at the Peak of Inflated Expectations in its 2023 Hype Cycle for Emerging Technologies [21]. According to Bloomberg Intelligence, the generative AI market is expected to grow explosively over the next ten years, at a compound annual growth rate (CAGR) of 42%, from $40 billion in 2022 to $1.3 trillion in 2032 [2]. Industrial participants will benefit greatly from this meeting by learning about and discussing state-of-the-art research and by finding suitable collaborators for their problems.

References:

[1] What are foundation models? IBM Research Blog. https://research.ibm.com/blog/what-are-foundation-models
[2] Bloomberg Intelligence. New Report Finds That the Emerging Industry Could Grow at a CAGR of 42% Over the Next 10 Years. https://www.bloomberg.com/company/press/generative-ai-to-become-a-1-3-trillion-market-by-2032-research-finds/
[3] On the Opportunities and Risks of Foundation Models. https://arxiv.org/pdf/2108.07258.pdf
[4] Survey of Hallucination in Natural Language Generation. ACM Computing Surveys. 2023.
[5] AI Language Models Are Struggling to “Get” Math. IEEE Spectrum. https://spectrum.ieee.org/large-language-models-math. 2022.
[6] ChatGPT and generative AI are booming, but the costs can be extraordinary. CNBC. https://www.cnbc.com/2023/03/13/chatgpt-and-generative-ai-are-booming-but-at-a-very-expensive-price.html. 2023.
[7] What is a Vector Database? Pinecone. https://www.pinecone.io/learn/vector-database/
[8] Overview of Microsoft Graph. Microsoft Learn. https://learn.microsoft.com/en-us/graph/overview
[9] The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A”. https://owainevans.github.io/reversal_curse.pdf
[10] Emergent Abilities of Large Language Models. https://arxiv.org/abs/2206.07682
[11] Toolformer: Language Models Can Teach Themselves to Use Tools. https://arxiv.org/abs/2302.04761
[12] Prompt Engineering Guide. https://www.promptingguide.ai/
[13] ChatGPT Plugins. OpenAI. https://openai.com/blog/chatgpt-plugins
[14] Bringing the world closer together with a foundational multimodal model for speech translation. Meta AI. https://ai.meta.com/blog/seamless-m4t/. 2023.
[15] A Path Towards Autonomous Machine Intelligence. https://openreview.net/pdf?id=BZ5a1r-kVsf. 2022.
[16] BloombergGPT: A Large Language Model for Finance. https://arxiv.org/abs/2303.17564
[17] OWL: A Large Language Model for IT Operations. https://arxiv.org/abs/2309.09298
[18] LangChain. https://www.langchain.com/
[19] HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face. https://arxiv.org/abs/2303.17580
[20] AutoGPT. https://github.com/Significant-Gravitas/AutoGPT
[21] Gartner Places Generative AI on the Peak of Inflated Expectations on the 2023 Hype Cycle for Emerging Technologies. https://www.gartner.com/en/newsroom/press-releases/2023-08-16-gartner-places-generative-ai-on-the-peak-of-inflated-expectations-on-the-2023-hype-cycle-for-emerging-technologies

Report

No.176.pdf
