Ericpony's Blog: May 2015

Thursday, May 28, 2015

A note on formal verification of programs

Formal verification for softwares involves proving properties of programs under various assumptions. In order to automate the verifying process, we have to formalize the program to verify. This usually amounts to transforming the program from source code to model while preserving the properties of interests. In this post, we shall present an example formalization of programs. We shall also demonstrate how program properties can be verified using the presented formalization.

Let's first consider a simple imperative programming language that can encode assignments, arithmetic operations, conditional branches, skips, sequences and non-recursive function calls. Clearly, any program written in this language has a finite number of possible execution paths, as well as a finite number of steps in each execution path. This fact allows us to transform a program to a logical formula, such that each model of the formula corresponds to a valid execution path of the program and vice versa.
More precisely, suppose $F(x_1,...,x_n)$ is a program that takes $n$ inputs and gives one output. Then $F$ can be represented as a formula (called a path formula) in form of $$ \phi_F(x_1,...,x_n,y) = \bigvee\nolimits_{i \in I}\left( p_i \wedge y = e_i \right),$$where $I$ is a finite index set, $p_i$ is a predicate and $e_i$ is an expression. A path formula $\phi$ is called valid if the following conditions are satisfied: i) $\bigvee_{i \in I}\ p_i$ is a tautology, which means all possible paths are covered by the formula, ii) $\neg\bigvee_{i \neq j}(p_i \wedge p_j)$ is a tautology, which means no two paths will be taken simultaneously, and iii) an assignment to $\{x_1,..., x_n, y\}$ satisfies $\phi_F$ if and only if $f(x_1,..., x_n)=y$, which means solutions of the formula precisely characterizes the input-output relation of program $F$.
One can employ techniques of symbolic execution to convert programs of various imperative languages into path formulae, see e.g., [1,2]. Such conversions usually allow one to reduce the problems of verifying program properties to those of checking satisfiability for first-order constraints. For example, to verify that function $F(x_1,x_2)$ is commutative, i.e., $\forall x_1,x_2.\ F(x_1,x_2) = F(x_2,x_1)$, it suffices to check that constraint $\phi_F(x_1,x_2,y) \wedge \phi_F(x_2,x_1,y') \wedge y \neq y'$ is unsatisfiable.
Modern SMT solvers effectively decide formulas over combinations of theories such as uninterpreted functions, boolean algebra, integer arithmetic, bit-vectors, arrays, sets and maps [3]. Program verification thus can benefit from advances in SMT solvers by modelling programs as first-order constraints augmented with theories, see e.g., [4-7].

Handling recursive functions

Uninterpreted functions are often leveraged to facilitate automatic reasoning about recursive functions. The idea is to abstract each recursive call to an uninterpreted function call. If a constraint is unsatisfiable assuming uninterpreted functions, then it is also unsatisfiable assuming the correct interpretation. On the other hand, if a constraint is satisfiable assuming uninterpreted functions, then no reliable conclu-sion could be drawn for the correct interpretation, as the assumptions can be wrong. In [4], the authors proposes a technique to refine assumptions iteratively when the constraint is satisfiable. This technique is the cornerstone of the Leon verification system [5].
As an illustration of the technique, consider function

function size (list) {
  if (list == null) { return 0 } else { return 1 + size(list.next) }
}

with path formula $ \phi_{size} \equiv (b_1 \wedge y = 0) \vee (\neg b_1 \wedge y = 1 + \mathrm{size(list.next)}),$ and define $$ \phi = (\mathrm{size(list)} = y) \wedge (b_1 \Leftrightarrow \mathrm{list} = \mathrm{null}) \wedge \phi_{size}.$$Note that a boolean variable $b_1$, called a control literal, is introduced to represent the truth value of the branching condition in $\mathrm{size}$. The purpose of introducing control literals to a path formula is to guard invocations of uninterpreted functions. It turns out that we can switch between under- and over-approximation of a path formula by changing assignments of its control literals [4], as explained below.
Suppose we want to verify property $\psi$, say $\psi \equiv \mathrm{size(list)} \ge 0$. Define $\psi_1 \equiv \neg\psi \wedge \phi$. Given a correct interpretation of $\mathrm{size}$, $\psi_1$ is satisfiable iff the $\mathrm{size}$ function satisfies property $\psi$. In particular, if $\psi_1$ is unsat, then so is $\psi$ for any interpretation of $\mathrm{size}$. However, if a counterexample is reported assuming uninterpreted functions, then this may be because $\mathrm{size(list.next)}$ is assigned an incorrect value. To exclude such case, we check the satisfiability of $\psi_1 \wedge b_1$. If the later formula is sat, then so is $\psi_1$ and we are done. Otherwise, we have to explore the branch restricted by $b_1$. For this purpose, we unfold the definition of $\mathrm{size(list.next)}$ one more time in $\phi$, and then rewrite formula $\psi_1$ with appropriate substitutions, e.g., replacing $\mathrm{list}$ with $\mathrm{list.next}$, using fresh variables for $b_1$ and $y$, etc. The new formula is denoted as $\psi_2$, and we proceed to check the satisfiabilty of $\psi_2$ and $\psi_2 \wedge b_2$, where $b_2$ is the control literal introduced by the unfolding.
We repeat these steps and produce a sequence of alternating approximations. Thus the verification procedure for property $\psi$ would look like

    $n := 1$
    while true do
        if $\psi_n$ is unsat then output UNSAT and break
        if $\psi_n\wedge b_n$ is sat then output SAT and break
        $n := n + 1$
    done

Observe that the procedure iteratively computes a succession of under- and over-approximations of the recursive calls in the provided constraint. Also, it finds a counterexample for the constraint whenever the constraint is satisfiable. Intuitively, this happens because a counterexample corresponds to an execution that violates the property, and the procedure enumerates all possible executions in increasing lengths. On the other hand, since our demo language becomes Turing complete after it is equipped with recursions, we cannot expect the procedure to always terminate when the provided constraint is unsatisfiable. For an important class of recursive functions, however, the approach outlined here acts as a decision procedure and terminates in all cases [8]. For instance, the $\mathrm{size}$ function listed above falls into this class.

(To be completed)

References

1. Liu, Chang, et al. "Automating distributed partial aggregation." SoCC, 2014.
2. Srivastava, et al. "Path-based inductive synthesis for program inversion." PLDI, 2010.
3. De Moura, Leonardo, et al. "Satisfiability modulo theories: introduction and applications." Comm. of ACM, 2011.
4. P. Suter, A. S. Koksal, et al. "Satisfiability modulo recursive programs." SAS, 2011.
5. Blanc, Regis, et al. "An overview of the Leon verification system." SCALA, 2013.
6. Bjørner, et al. "Program Verification as Satisfiability Modulo Theories." SMT@ IJCAR, 2012.
7. Bjørner, et al. "Higher-order Program Verification as Satisfiability Modulo Theories with Algebraic Data-types." Higher-Order Program Analysis, 2013.
8. Suter, P., et al. "Decision procedures for algebraic data types with abstractions." POPL, 2010.