NO.105 Data-Driven Search-Based Software Engineering

Shonan Village Center

December 11 - 14, 2017 (Check-in: December 10, 2017)


  • Markus Wagner
    • University of Adelaide, Australia
  • Leandro Minku
    • University of Leicester, UK
  • Ahmed E. Hassan
    • Queen’s University, Canada
  • John Clark
    • University of Sheffield, UK


Search-Based Software Engineering (SBSE) is a well-established research area that not only involves the application of search and optimisation techniques in software engineering, but also promotes rethinking and reformulation of classical software engineering problems in different ways. By doing so, it has inspired many improvements to software engineering, in terms of both the engineering process and the software product. In particular, it has shown how difficult software engineering problems can be solved more effectively using search and optimisation algorithms [1], and how reformulating software engineering problems as multi-objective optimisation problems can lead to better solutions as well as richer information for software engineers [2,3].

In recent years, there has been renewed interest in this area, driven by the need to cope with increased software size and complexity. Moreover, advances in search-based algorithms, especially in genetic programming, are now enabling the successful application of SBSE to automatic programming, an area that SBSE had previously struggled to address. For example, there have been recent advances in bug fixing [4,7], speeding up software execution [5] and making software more energy efficient [6]. These advances show that SBSE is able to improve non-trivial real-world programs.

Meanwhile, software processes and products have been generating a wealth of various data, e.g., source change history, test cases, bug reports, operation logs, field crashes, etc. Hidden in these data is rich and valuable information about the quality of software and services and the dynamics of software development.

The software data analytics community has been achieving promising results in using such data to gain insights into several tasks, e.g., identifying which software modules are most likely to contain bugs [10, 11], estimating the amount of effort likely to be required to develop new software projects or Web applications [12, 13], determining which software changes are most likely to induce bugs [14, 15], and tracking how the productivity of a company changes over time [16].

The availability of software data and the promising results being achieved have resulted in a steep growth of the software data analytics community. This is very well illustrated by the Working Conference on Mining Software Repositories (MSR), whose number of submissions grew from around 40 in 2004 to more than 180 in 2015. Companies (Microsoft, Google, Facebook, Cisco, Yahoo, IBM, RIM, etc.) are also increasingly giving analytics an important role in their organizations, leveraging the wealth of data produced around their software or services.

Such a wealth of data also has the potential to guide the search and optimisation process in SBSE towards promising solutions that take into account the specific environment where the software process or product operates. It has the potential to take SBSE to yet another level: that of creating context-aware solutions. Nevertheless, very few data-driven SBSE approaches have been proposed so far, e.g., [8,9]. Moreover, the software data analytics and SBSE communities are largely disjoint.

With that in mind, this proposed NII Shonan Meeting aims to bring these two communities together to discuss software engineering problems that can benefit from the integration of software data analytics and SBSE, and potential ways to combine these two areas. We expect this meeting to identify the key challenges and opportunities in integrating software data analytics with SBSE, and to form new research collaborations among members of these communities. Ultimately, this will push the boundaries of research in both software data analytics and SBSE, helping to consolidate the area of data-driven SBSE. We will also encourage discussions on the topics of software data analytics and SBSE themselves, as improvements in these areas are also necessary for advancing data-driven SBSE.


[1] M. Harman and B. F. Jones, Search-based software engineering. Information and Software Technology, 43:833-839, 2001.

[2] K. Praditwong, M. Harman and X. Yao, Software Module Clustering as a Multi-Objective Search Problem. IEEE Transactions on Software Engineering, 37(2):264-282, 2011.

[3] Z. Wang, K. Tang and X. Yao, Multi-objective Approaches to Optimal Testing Resource Allocation in Modular Software Systems. IEEE Transactions on Reliability, 59(3):563-575, 2010.

[4] A. Arcuri and X. Yao, A novel co-evolutionary approach to automatic software bug fixing. Proceedings of the 2008 IEEE Congress on Evolutionary Computation (CEC2008), Piscataway, NJ, pp. 162-168, IEEE Press, 2008.

[5] W. B. Langdon and M. Harman, Optimising existing software with genetic programming. IEEE Transactions on Evolutionary Computation (TEVC), 2014.

[6] Mario Linares-Vásquez, Gabriele Bavota, Carlos Bernal-Cárdenas, Rocco Oliveto, Massimiliano Di Penta, Denys Poshyvanyk. Optimizing Energy Consumption of GUIs in Android Apps: A Multi-objective Approach. Proceedings of the 10th Joint Meeting on Foundations of Software Engineering, pp. 143-154, 2015.

[7] C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer. GenProg: A generic method for automatic software repair. IEEE Transactions on Software Engineering, 38(1):54–72, 2012.

[8] Shin Yoo, Amortised Optimisation of Non-Functional Property in Production Environment. Proceedings of the Symposium on Search-Based Software Engineering (SSBSE), 2015.

[9] Harman et al. Genetic Improvement for Adaptive Software Engineering. 9th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, 2014.

[10] Hall, T., Beecham, S., Bowes, D., Gray, D., Counsell, S.: A systematic literature review on fault prediction performance in software engineering. IEEE Transactions on Software Engineering 38(6), 1276–1304, 2012.

[11] Menzies, T., Milton, Z., Turhan, B., Cukic, B., Jiang, Y., Bener, A.: Defect prediction from static code features: current results, limitations, new approaches. Automated Software Engineering 17(4), 375–407, 2010.

[12] Dejaeger, K., Verbeke, W., Martens, D., Baesens, B.: Data mining techniques for software effort estimation: a comparative study. IEEE Transactions on Software Engineering 38(2), 375–397, 2012.

[13] Mendes, E., Mosley, N.: Web Engineering. Springer Science & Business Media, New York, 2006.

[14] An, L., Khomh, F.: An empirical study of crash-inducing commits in Mozilla Firefox. In: Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE), pp. 5.1–5.10, 2015.

[15] Kamei, Y., Shihab, E., Adams, B., Hassan, A., Mockus, A., Sinha, A., Ubayashi, N.: A large-scale empirical study of just-in-time quality assurance. IEEE Transactions on Software Engineering 39(6), 757–773, 2013.

[16] Minku, L., Yao, X.: How to make best use of cross-company data in software effort estimation? In: Proceedings of the 36th International Conference on Software Engineering, pp. 446–456, 2014.