No.053 Computational Intelligence for Software Engineering


NII Shonan Meeting Seminar 053

Talks

“Searching Software Design Spaces Using Genetic Programming”, by Bradley Alexander

This talk summarises a small number of projects I have worked on that exploit heuristic search to produce human-competitive software artefacts. These projects range from the automatic construction of rules defining the core of an optimising compiler, through the discovery of non-trivial recurrences from partial call trees and the evolution of navigation functions for robotic control, to the search for patches in automated software repair.

A common thread through all of this research has been aggressive reduction of the search space, both through the target representations used and by exploiting various mechanisms to keep the search focused on feasible solutions where possible. I conclude the talk with proposals to combine recently developed frameworks to further focus the search in automated software repair.

 

“Computational Stupidity: Language-Independent Program Slicing”, by Jens Krinke

Current slicing techniques cannot handle systems written in multiple programming languages, because slicing requires knowledge of the semantics of every language used and of their interactions. Observation-Based Slicing (ORBS) is a language-independent slicing technique capable of slicing multi-language systems, including systems that contain (third-party) binary components. A potential slice obtained through repeated statement deletion is validated by observing the behaviour of the program: if the slice and the original program behave the same under the slicing criterion, the deletion is accepted. The resulting slice is similar to a dynamic slice. We evaluate five variants of ORBS on ten programs of different sizes and languages, showing that it is less expensive than similar existing techniques. We also evaluate it on bash and four other systems to demonstrate feasible large-scale operation, in which a parallelised ORBS needs up to 82% less time when using four threads. The results show that an ORBS slicer is simple to construct, effective at slicing, and able to handle systems written in multiple languages without specialist analysis tools.
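
A minimal sketch of the ORBS deletion loop may help make the idea concrete. Here `oracle` stands for whatever builds and runs a candidate and returns the values observed at the slicing criterion; the name and the windowed-deletion details are illustrative assumptions, not the authors' implementation.

    # Illustrative sketch of observation-based slicing (ORBS). `lines` is the
    # program as a list of source lines; `oracle(lines)` builds and runs the
    # candidate and returns the values observed at the slicing criterion
    # (or None if it fails to compile or run). Names are hypothetical.
    def orbs_slice(lines, oracle, max_window=3):
        expected = oracle(lines)           # behaviour of the original program
        changed = True
        while changed:                     # iterate deletions to a fixed point
            changed = False
            i = 0
            while i < len(lines):
                # try deleting windows of max_window..1 consecutive lines at i
                for w in range(max_window, 0, -1):
                    candidate = lines[:i] + lines[i + w:]
                    if oracle(candidate) == expected:
                        lines = candidate  # behaviour preserved: keep deletion
                        changed = True
                        break
                else:
                    i += 1                 # nothing deletable here; move on
        return lines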

 

“Genetic Improvement”, by Bill Langdon

Genetic programming can optimise software. There may be many ways to trade off resource consumption (such as time, memory, or battery life) against functionality. Human programmers cannot try them all. Moreover, the best multi-objective Pareto point may change with time, with hardware infrastructure, with network characteristics, and especially with individual user behaviour. It may be that GP can automatically suggest different trade-offs for each new market. Recent results include substantial speed-ups obtained by evolving a new version of a program customised for a special case, medical image registration on GPUs, and even growing small grafts containing new code (grow-and-graft GP, GGGP) which can be inserted into much bigger human-written code.
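
To make the multi-objective framing concrete, the following toy sketch (my illustration, not from the talk) keeps the non-dominated front of program variants scored on two objectives to minimise, such as runtime and functional error; how variants are produced and measured is left abstract.

    # Toy Pareto-front maintenance for genetic improvement: each variant is
    # scored on two objectives to minimise, e.g. (runtime, functional error).
    def dominates(a, b):
        """True if vector a is no worse than b everywhere and better somewhere."""
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    def pareto_front(scored):
        """scored: list of (variant, objectives). Keep the non-dominated ones."""
        front = []
        for v, obj in scored:
            if not any(dominates(o2, obj) for _, o2 in scored if o2 != obj):
                front.append((v, obj))
        return front

    variants = [("orig", (10.0, 0.0)), ("fast", (4.0, 0.1)), ("slow", (12.0, 0.2))]
    print(pareto_front(variants))  # "orig" and "fast" survive; "slow" is dominated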

Slides: Bill Langdon

“Efficient Search-Based Regression Test Case Prioritization”, by Zheng Li

Test case prioritization (TCP) aims to improve the effectiveness of regression testing by ordering the test cases so that the most beneficial are executed first. TCP can be formulated as a search problem, and various search techniques have been applied to it. In this presentation, we discuss multi-objective search algorithms used in TCP, including NSGA-II, PSO, ACO, and a coevolutionary algorithm. We illustrate the efficiency problem that arises when applying these multi-objective algorithms to TCP, and present a GPU-based parallel fitness evaluation together with three parallel crossover computation schemes, based on ordinal and sequential representations, which can greatly speed up the computation of the evolutionary algorithms for TCP.
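
As a concrete example of the kind of fitness function such algorithms evaluate, here is APFD (Average Percentage of Faults Detected), a standard TCP metric; the data below is illustrative, and undetected faults are penalised at position n+1 (the textbook APFD assumes every fault is eventually detected).

    # APFD: higher is better. `order` is a permutation of test indices;
    # `detects[t]` is the set of faults that test t reveals.
    def apfd(order, detects, num_faults):
        n = len(order)
        first = {}                      # fault -> 1-based position of first detection
        for pos, t in enumerate(order, start=1):
            for f in detects[t]:
                first.setdefault(f, pos)
        tf_sum = sum(first.get(f, n + 1) for f in range(num_faults))
        return 1.0 - tf_sum / (n * num_faults) + 1.0 / (2 * n)

    detects = {0: {0}, 1: {1, 2}, 2: set()}
    print(apfd([1, 0, 2], detects, 3))  # faults 1,2 found at pos 1; fault 0 at pos 2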

 

“Can Big Data bring us new opportunities for software automation?”, by Hong Mei

Software automation (SA) is the process of automatically generating artifacts, especially programs, across the whole life cycle with a computer, based on an informal (or formal) specification. Historically, many researchers devoted effort to this field, and there were some exciting results in certain areas of academic research, such as automated programming, but SA has not been successfully adopted in practical software development. With the coming of the Big Data era, massive software engineering data, including various large repositories of source code, are available in many software companies and over the Internet. Can Big Data in software engineering bring new opportunities to software automation? Can we develop a data-driven approach (past research on software automation mainly focused on rule-based approaches) to automatically synthesizing real, large programs based on source code with rich documents and other related data in these repositories, using machine learning and statistical data analysis techniques? In this presentation, I will give a brief historical overview of software automation and some thoughts on Big Data based approaches to software engineering, especially software automation.

 

“Software Effort Estimation Models — Past, Present and Future”, by Leandro Minku

As the effort (e.g., person-hours, person-months) required to develop software projects is frequently the most important factor for calculating the cost of software projects, the task of estimating effort is of strategic importance for software companies. Due to the difficulty of this task, researchers have been investigating the use of machine learning to create software effort estimation models that can be used as decision-support tools for software engineers. In this talk, I will briefly go through some key points in terms of the past, present and future of the field of machine learning for creating software effort estimation models. I will go from (a) conclusion instability to (b) ensembles and locality to (c) the importance of performing temporal and cross-company learning, generating insights, and obtaining a better understanding of when, why and how our models work (or don’t work). Even though this talk is in the context of software effort estimation models, several of these ideas are also applicable to other types of software prediction models, such as defect predictors.
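
For readers unfamiliar with the setup, a minimal sketch of such a decision-support model: train a regressor on historical projects and query it for a new one. The features and figures below are invented for illustration; they are not from the talk.

    # Minimal sketch of a machine-learned effort estimator.
    from sklearn.ensemble import RandomForestRegressor

    # columns: [function_points, team_size, requirements_volatility] (hypothetical)
    X_train = [[120, 5, 0.2], [300, 8, 0.5], [80, 3, 0.1], [200, 6, 0.3]]
    y_train = [950.0, 3100.0, 500.0, 1700.0]   # effort in person-hours (made up)

    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    print(model.predict([[150, 4, 0.2]]))      # estimated effort for a new project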

 

“Automatic Improvement Programming”, by Jerry Swan

A recent methodology for automated improvement of (traditionally human-designed) programs is Genetic Improvement Programming (GIP), which has typically used methods inspired by Darwinian evolution to generate programs (variously in source or binary form) that meet some user-specified quality measure. This talk will describe two recent research directions in “Automatic Improvement Programming” – an extension of GIP which features:

  • An “online adaptive” approach, as exemplified by the Gen-O-Fix monitor framework for self-improving software systems.
  • The incorporation of formal techniques from category theory, yielding a significant new direction, viz. semantics-preserving transformations that come with a guarantee of asymptotic improvement (the flavour of such a rewrite is sketched below).
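
To convey the flavour of a semantics-preserving rewrite with an asymptotic payoff, here is a classic textbook example (my illustration, not one of the talk's transformations): Horner's rule, justified by an equational law, improves polynomial evaluation from O(n^2) multiplications to O(n).

    # The rewrite is justified by the law
    #   a0 + a1*x + ... + an*x^n == a0 + x*(a1 + x*(... + x*an)).
    def eval_naive(coeffs, x):
        return sum(a * x**i for i, a in enumerate(coeffs))   # O(n^2) multiplications

    def eval_horner(coeffs, x):
        acc = 0
        for a in reversed(coeffs):
            acc = acc * x + a                                # O(n), same semantics
        return acc

    coeffs = [3, 0, 2, 5]                                    # 3 + 2x^2 + 5x^3
    assert eval_naive(coeffs, 2) == eval_horner(coeffs, 2) == 51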

 

“Maximising Axiomatization Coverage and Minimizing Regression Testing Time”, by Markus Wagner

The correctness of program verification systems is of great importance, as they are used to formally prove that safety- and security-critical programs follow their specification. One of the contributing factors to the correctness of the whole verification system is the correctness of the background axiomatization, which captures the semantics of the target program language. We present a framework for the maximization of the proportion of the axiomatization that is used (“covered”) during testing of the verification tool. The diverse set of test cases found not only increases the trust in the verification system, but it can also be used to reduce the time needed for regression testing.
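
One standard way to pursue both goals at once, sketched below under the assumption that each test's axiom coverage and runtime are known, is a greedy reduction that repeatedly picks the test with the best coverage gain per unit of runtime. This is an illustration of the general idea, not the framework's actual algorithm.

    # Greedy test-suite reduction: maximise axiom coverage per second of runtime.
    def greedy_reduce(tests):
        """tests: list of (name, covered_axioms, runtime_seconds)."""
        covered, suite = set(), []
        while True:
            best = max(tests, key=lambda t: len(t[1] - covered) / t[2])
            if len(best[1] - covered) == 0:     # no test adds coverage: stop
                return suite, covered
            suite.append(best[0])
            covered |= best[1]

    tests = [("t1", {"ax1", "ax2"}, 2.0), ("t2", {"ax2", "ax3", "ax4"}, 3.0),
             ("t3", {"ax1"}, 0.5)]
    suite, covered = greedy_reduce(tests)
    print(suite, covered)                       # a small suite covering ax1..ax4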

Slides: Markus Wagner

 

“Predicting Release Time Based on Generalized Software Reliability Model”, by Hironori Washizaki

Development environments have changed drastically, development periods are shorter than ever and the number of team members has increased. These changes have led to difficulties in controlling the development activities and predicting when the development will end. We propose a generalized software reliability model (GSRM) based on a stochastic process, and simulate developments that include uncertainties and dynamics. We also compare our simulation results to those of other software reliability models. Using the values of uncertainties and dynamics obtained from GSRM, we can evaluate the developments in a quantitative manner. Additionally, we use equations to define the uncertainty regarding the time required to complete a development, and predict whether or not a development will be completed on time.
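
The following simplified Euler-Maruyama simulation conveys the idea of a stochastic reliability-growth process; it is my simplification for illustration, not the exact GSRM equations: fault resolution N(t) grows toward a total K, with a noise term standing in for the uncertainties and dynamics of the development.

    # Simplified stochastic reliability-growth simulation (not the GSRM itself).
    import random

    def simulate(K=100.0, alpha=0.1, sigma=0.05, dt=0.1, t_end=80.0, seed=1):
        random.seed(seed)
        t, n, path = 0.0, 0.0, []
        while t < t_end:
            dw = random.gauss(0.0, dt ** 0.5)             # Brownian increment
            n += alpha * (K - n) * dt + sigma * (K - n) * dw
            n = min(max(n, 0.0), K)                       # keep N(t) in [0, K]
            path.append((t, n))
            t += dt
        return path

    print(simulate()[-1])   # (time, faults resolved) near the end of development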

Slides: Hironori Washizaki

“Search-Based System Testing”, by Andreas Zeller

Writing tests is hard; maintaining them is even worse. Where symbolic reasoning is impossible because of scale or complexity, generating test cases is a scalable, fully automatic technique to detect errors. Search-based techniques use genetic algorithms to systematically evolve a population of inputs towards a coverage goal. They form a middle ground between the scalability and applicability of random testing and the corner-case reasoning of symbolic techniques. When applied to generate system inputs, one gets powerful test generators that can be easily adapted towards arbitrary testing goals simply by changing the fitness function.
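
A toy version of this evolutionary loop, assuming a crude string-distance fitness toward a target branch; everything here is illustrative, not a real tool's API.

    # Evolve string inputs toward covering the branch `x == "secret"`.
    import random

    TARGET = "secret"
    ALPHABET = "abcdefghijklmnopqrstuvwxyz"

    def fitness(s):
        # crude branch distance: 0 means the target branch is covered
        return sum(abs(ord(a) - ord(b)) for a, b in zip(s, TARGET))

    def mutate(s):
        i = random.randrange(len(s))
        return s[:i] + random.choice(ALPHABET) + s[i + 1:]

    random.seed(0)
    population = ["".join(random.choice(ALPHABET) for _ in range(len(TARGET)))
                  for _ in range(50)]
    for gen in range(1000):
        population.sort(key=fitness)
        if fitness(population[0]) == 0:
            break
        parents = population[:10]                       # truncation selection
        population = parents + [mutate(random.choice(parents)) for _ in range(40)]
    print(gen, population[0])                           # generations used, best input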

Slides: Andreas Zeller

“Backtracking search techniques in software testing and analysis”, by Jian Zhang

Abstract TBC.

 

“The Structure Problem in Search-Based Software Engineering”, by Lu Zhang

In many areas of software engineering, a candidate solution typically has a structure, often a complex one. In my talk, I argue that such structure may pose an obstacle to search-based software engineering. When we search for a good solution via a meta-heuristic algorithm, the structure may make it difficult to generate and/or synthesize new candidate solutions from existing ones. When we search for a good solution via an existing mathematical optimization framework (such as ILP or SAT), the structure may inflate the size of the formulated ILP or SAT problem. In my opinion, this difficulty may stem from the fact that existing search frameworks do not allow incomplete solutions during the search, which suggests that we may need to invent new search frameworks for search-based software engineering.
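
A toy illustration of the point (mine, not from the talk): when candidates must be permutations, e.g. an ordering of components, naive one-point crossover breaks the structure, and a repair step is needed before the offspring is even a valid solution.

    # Naive crossover on structured candidates produces infeasible offspring.
    def one_point_crossover(p1, p2, cut):
        return p1[:cut] + p2[cut:]

    def repair(child, universe):
        # drop duplicates (keep first occurrence), then append missing elements
        seen, fixed = set(), []
        for x in child:
            if x not in seen:
                seen.add(x)
                fixed.append(x)
        fixed += [x for x in universe if x not in seen]
        return fixed

    p1, p2 = [1, 2, 3, 4, 5], [5, 4, 3, 2, 1]
    child = one_point_crossover(p1, p2, 2)
    print(child)                     # [1, 2, 3, 2, 1] -- not a valid permutation
    print(repair(child, p1))         # [1, 2, 3, 4, 5] -- feasibility restored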

Slides: Lu Zhang

 

“Debugging with Online Slicing and Dryrun”, by Jianjun Zhao

Efficient tools are indispensable in the battle against software bugs during both development and maintenance. In this talk, we will introduce two techniques that target different phases of an interactive and iterative debugging session. To help fault diagnosis, we split the costly computation of backward slicing into an online and an offline part, and employ incremental updates after program edits. The result is a vast reduction in slicing cost.
The possibility of running slicing in situ and with instant response time gives rise to the possibility of editing-time validation, which we call dryrun. The idea is that a pair of slices, one forward from the root cause and one backward from the bug site, defines the scope within which to validate a fix. This localization makes it possible to invoke symbolic execution and constraint solving, which would otherwise be too costly at whole-program scale.
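
As background for the pair-of-slices idea, here is the textbook graph view of slicing that it builds on: a slice is reachability over a dependence graph. This sketch is the standard formulation, not the authors' implementation.

    # Slices as reachability over a dependence graph. `deps` maps each
    # statement to the statements it depends on.
    from collections import defaultdict

    def backward_slice(deps, criterion):
        """All statements the criterion transitively depends on."""
        seen, stack = set(), [criterion]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(deps.get(node, ()))
        return seen

    def forward_slice(deps, root):
        """All statements transitively affected by root (reverse the edges)."""
        inv = defaultdict(list)
        for n, ds in deps.items():
            for d in ds:
                inv[d].append(n)
        return backward_slice(inv, root)

    deps = {"s3": ["s1"], "s4": ["s2", "s3"], "s5": ["s4"]}
    scope = forward_slice(deps, "s1") & backward_slice(deps, "s5")
    print(scope)    # the statements a candidate fix must be validated against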