No.084 Mining & Modeling Unstructured Data in Software - Challenges for the Future


NII Shonan Meeting Seminar 084


To analyze, comprehend, and reverse engineer software projects and their software development processes, developers rely on various sources of information. Bug reports, execution logs, mailing lists, code review reports, change logs, requirements documents, and the actual source code contain implicit developer knowledge about the project and past development efforts. Most of this knowledge is captured as unstructured information, that is, natural language text used to exchange information among people.

Researchers in the Information Retrieval (IR), Data Mining (DM), and Natural Language Processing (NLP) fields have experimented with various techniques (such as Latent Dirichlet Allocation and the Vector Space Model) and ad hoc approaches to enable the mining of unstructured data from software artifacts. However, these techniques were not designed to handle the complexities and peculiarities of unstructured software engineering data, and thus are not readily applicable to the software engineering research domain.

The challenges for both researchers and practitioners are to determine which set of techniques is appropriate for the problem at hand and to understand how to use those techniques effectively.

