No.100 Analysis and Verification of Pointer Programs

Icon

NII Shonan Meeting Seminar 100

Overview

Computer software is ubiquitous in today’s information society, and ensuring its correctness is of great importance. This is particularly true for safety-critical systems, which occur in transportation, communication, healthcare, and many other application domains. Consequently, software failures can have severe consequences. To tackle the problem of correctness and reliability of such software, formal methods are increasingly being employed. They enable the exhaustive and mathematically founded analysis of all possible behaviours of a computer program and the verification of properties such as functional correctness. They also allow to reduce the effort and, thus, the cost of testing activities. Due to their benefits, they are increasingly becoming an integral part of the development cycle of safety-critical systems.

Many software bugs can be traced back to the erroneous use of pointers, i.e., references to memory addresses. They constitute an essential concept in modern programming languages, and are used for realising (dynamic) data structures like lists, trees etc., which are organised in the computer’s memory as a so-called heap. Pointers are also abundantly present in object-oriented software such as Java collections, albeit in the somewhat implicit form of object references. Pointer-handling operations occur in device drivers, operating systems, and all kinds of application codes including those implementing safety-critical systems. The analysis of such software is a highly demanding and important task, as memory leaks, dereferencing null pointers or the accidental invalidation of data structures can cause great damage especially when software reliability is at stake. Moreover, the increasing presence of concurrency in modern computing raises additional problems such as the non-synchronised access to memory areas, which can entail so-called data races. Even worse, the formal analysis of concurrent software poses additional challenges caused by the non-deterministic execution order (interleaving) between different strands of concurrent activities.

In consequence, the complexity of state spaces arising from dynamic data structures, recursive method or procedure calls, and dynamic creation of and interleaving between concurrent threads imposes challenges which cannot be handled by standard verification algorithms such as finite-state model checking. This problem has been a topic of continuous research interest since the early 1970s. A common approach are abstraction techniques that employ symbolic representations of sets of program states using suitable formalisms. Various such formalisms have been considered for this purpose, which can be distinguished with regard to their expressiveness, precision, efficiency, and automatability. The most important ones are sketched in the following.

Automata-based approaches
Regular model checking is a generic technique for algorithmic verification of infinite-state systems which uses (finite) word, tree, or forest automata to finitely represent potentially infinite sets of reachable configurations. As the related problems are typically undecidable, research activities in this area have focused on developing semi-algorithms, decidability results for restricted cases, and systematic over-approximations of the state space by means of acceleration techniques.

Logic-based approaches
Shape analysis is a form of pointer analysis that attempts to characterize collections of heap cells reachable from particular pointers, for example, to determine whether the cells form a list or a tree. In particular, three-valued logic can be used as a framework for designing such analyses. It is instantiated by supplying both predicates that capture different relationships between cells and the functions that determine how the predicates are updated by particular program operations. Another logic-based technique is separation logic, which extends classical Hoare logic by operators that enable modular reasoning about heap structures. While the logic was originally designed as a calculus for manually verifying low-level pointer manipulating programs, it has subsequently become the basis for several abstract interpretation based verifiers. Since then, this topic has been actively studied with regard to both its theoretical foundations and its practical applications. Its importance is witnessed by the 2016 Gödel Prize, which was awarded to Stephen Brookes and Peter W. O’Hearn for their invention of Concurrent Separation Logic.

Graph-based approaches
As heap data structures can be formalised by graphs, it is quite natural to employ graph transformation techniques for both specifying symbolic execution of pointer programs and abstraction mappings on heap structures. With respect to the former, methods for verifying graph transformation systems based on model checking and similar techniques are being studied. With respect to abstraction methods, graph grammars such as regular tree grammars and hyperedge replacement grammars are employed for describing the shape of complex data structures.

Extensions for concurrency
As described earlier, another source of infinite state spaces is unbounded concurrency, i.e., dealing with a parametric number or with unbounded dynamic creation of threads. In order to meet this challenge, for all of the approaches mentioned above, extensions for concurrency have been developed. They are ranging from adaptations of regular model checking over specific kinds of abstraction and automated induction based on network invariants to thread-modular reasoning. An especially challenging task is to verify programs that involve both unbounded concurrency as well as unbounded data structures. In fact, the survey on software model checking by R. Jhala and R. Majumdar states that “the scalable verification of heap-based data structures is perhaps the least understood direction in software model checking” and that “scaling verification techniques in the presence of expressive heap abstractions and concurrent interactions remain outstanding open problems”.

Aims of the meeting
Although many of these approaches address similar concerns, their formulations and formalisms often seem dissimilar and sometimes even unrelated. Thus, the insights and results gained in one description of heap abstraction may not directly carry over to some other descriptions. This Shonan meeting shall bring together both theoreticians and practitioners working on different techniques for heap abstraction and pointer program analysis. It aims to provide a broad understanding of the various techniques to support the exploitation of their commonalities such that they can benefit from each other.

More specifically, the following questions will be discussed:

  • What are the commonalities and differences between the various techniques?
  • Is it possible to exploit interrelations between different methods? In particular, by utilising results from related areas such as graph theory, is it possible for tools supporting separation logic to reach a similar degree of automation as state-of-art static analyzers based on abstract interpretation?
  • Some approaches are more successful in automation and extension for concurrency than
    others. What can these less successful techniques learn from the more successful ones?
  • Which theoretical questions should be addressed in the future?

Category: Uncategorized

Tagged:

Comments are closed.