This is ericpony's blog
Showing posts with label Logic. Show all posts
Showing posts with label Logic. Show all posts

Wednesday, November 15, 2017

Liveness to safety reduction with predicate abstraction

An automata-based approach for LTL model checking

Linear-time Temporal Logic (LTL) can be model checked for finite-state systems using symbolic automata-based techniques [6]. Given a transition system $S$ and an LTL formula $φ$, the model checking problem (denoted as $S ⊨ φ$) asks whether $φ$ is true on all infinite execution paths of $S$. A classical automata-based approach [6] toward this problem consists of building a Buchi automaton $S_{¬φ}$ with a fairness condition $f_{¬φ}$ such that $S ⊨ φ$ iff $S × S_{¬φ} ⊨ FG\ ¬f_{¬φ}$. Therefore, when we model check an LTL formula, we may assume without loss of generality that the automata-based transformation has been applied and consider the problem in form of $S ⊨ FG\ ¬f_{¬φ}$. In the following, we shall drop the subscript $¬φ$ and simply regard $FG\ ¬f$ as the target property to verify.

Liveness-to-safety reduction for finite-state systems

Liveness-to-safety reduction [4, 5] is a technique to translate an LTL model checking problem to an invariant verification problem. The invariant encodes the the absence of counterexamples to $FG\ ¬f$, namely, the system does not have a computation that visits some state $s ⊨ f$ infinitely often. The reduction algorithm augments the states of the system with flags seen, triggered, and loop, as well as maintaining a fresh copy of the state variables. It then simulates the system execution and in the meantime guesses the entry point of a cycle by non-deterministically flagging a state as "seen" and saving the valuation of the state variables. An execution will be "triggered" if it encounters a state $s ⊨ f$ after having flagged a state as "seen". Finally, a loop is detected if a triggered execution enters a seen state. The safety property thus specifies that no loop is detected on any execution. This reduction is sound and complete for finite-state and parameterised systems, since a liveness property is violated in such systems iff a lasso-shaped counterexample is present.

Liveness-to-safety reduction for infinite-state systems

The above liveness-to-safety reduction is unsound for infinite-state systems where an execution can violate a liveness property without visiting the same state twice. To overcome this problem, the authors of [1] propose to incorporate the notion of predicate abstraction in the reduction. The abstraction is supposed to be coarse enough to map a real counterexample (i.e. a path from an initial state that visits the fairness condition infinitely often) to a lasso-shaped abstract path, but at the same time precise enough not to admit spurious abstract cycles. The soundness of this approach is based on the fact that existential abstraction preserves any temporal logic without existential quantification over paths, e.g., $\forall$CTL$^*$ and LTL [7, 8].

Overview of the reduction procedure

Below we introduce the liveness-to-safety reduction proposed in [1] for infinite-state systems. Give a system $S$ and an LTL property $FG\ ¬f$, the procedure generates a sequence of safety verification problems $$S_0 ⊨_{inv} φ_0 ,\quad S_1 ⊨_{inv} φ_1,\quad \dots.$$ We shall denote the encoding of the $i$-th safety property as $$S_i, φ_i := {\rm encode}(S, f, P_i, W_i),$$ where $P_i$ is a set of state predicate, and $W_i$ is a set well-founded relations. Furthermore, the encoding ensures that $S ⊨ FG\ ¬f$ holds if $S_i ⊨_{inv} φ_i$ for some $i$; here $φ_i$ essentially states that cycles consisting of states satisfying $f$ is unreachable. The iteration terminates if some generated safety property is satisfied. Otherwise, suppose we have found $S_i \not⊨_{inv} φ_i$ for some $i$. We analyse a finite counterexample path $π$ of $S_i$ to determine whether it corresponds to an infinite real counterexample for $FG\ ¬f$ in $S$. If so, then we conclude that the property does not hold. Otherwise, if we can conclude that $π$ does not correspond to any real counterexample in $S$, we try to extract a set of new predicates $P'$ and/or well-founded relations $W'$ to produce a refined encoding: $$S_{i+1}, φ_{i+1} := {\rm encode}(S, f, P_i ∪ P', W_i ∪ W'),$$ where $P', W' := {\rm refine}(S_i, π, P_i, W_i)$. It is possible that the algorithm can neither confirm nor refute the existence of real counterexamples, in which case it aborts and returns “unknown”. The algorithm might also diverge and/or exhaust resources in various intermediate steps (e.g. in checking $S_i ⊨_{inv} φ_i$ or during refinement).
In the following, we shall describe the encoding and refinement procedures in more details. We begin with a simplified version that only uses predicates from $P$, i.e., $W = ∅$. We then describe how to extend the encoding to exploit also well-founded relations.

Counterexamples and refinement

Consider a safety checking problem $S_i ⊨_{inv} φ_i$. Each state of $S_i$ is a state of $S$ augmented with flags seen, triggered, and loop as in the case of finite-state system. However, instead of comparing the variation of state variables, the augmented system $S_i$ compares the bit-vector induced by the predicates $P_i$ to determine the equivalence between two states. A counterexample $π$ for a safety property $S_i ⊨_{inv} φ_i$ is a finite sequence of tuples where each tuple consists of an abstract state and a concrete state. The projection of the sequence to abstract states forms a lasso-shaped abstract path, and the projection to concrete states recovers the execution from which this counterexample is obtained. When the invariant is violated, we need to make sure that the obtained counterexample does not result from an insufficient precision of the abstraction induced by $P_i$. To do this, the algorithm first searches for a concrete lasso witnessing a real violation of the LTL property, using standard bounded model checking. If this fails, it tries to prove that the abstract lasso is infeasible by checking an increasing number of finite unrollings of the loop. Upon infeasibility, new predicates can be extracted from sequence interpolants with standard predicate refinement strategies for invariant properties [3].

Enhancement with well-founded relations

In the refinement operation described above, new predicates are extracted at the point when the unrolling of a spurious abstract counterexample becomes infeasible. However, it is possible that all finite unrollings of the abstract loop are feasible, but there is no concrete path executing the loop for an infinite number of times. If this happens, the refinement step might not terminate even though the abstract path cannot be concretised. The following example illustrate one such situations.

Example. Suppose that we want to verify an LTL property $FG\ ¬(x≤y)$ against a transition system $$S=(ℕ \times ℕ, \{(0,y) \mid y≥0\}, \{(x,y) \to (x+1, y)\}).$$ Let $P_0 = \{(x≤y), (y=0)\}$ be the initial set of predicates. Then $S_0 ⊨_{inv} \varphi_0$ admits an abstract counterexample path through the self loop at $\langle(x≤y), ¬(y=0)\rangle$. Unrolling this abstract path, however, will not terminate: any path formula obtained by unrolling the abstract loop for $i$ times is feasible by starting from the concrete initial state $(0,i).$

To address the spurious abstract paths whose finite prefixes are all feasible, we extend the encoding to incorporate well-founded relations over the concrete states. The idea is to block these spurious abstract paths by finding suitable termination arguments in form of (disjunctive) well-founded transition invariants. More precisely, given a finite set $W$ of well-founded relations, we encode $S, φ$ such that a lasso-shaped abstract path leads to a counterexample only if the loop contains at least two concrete states not covered by any relation in $W$. The resulting abstraction is sound as it preserves all concrete infinite paths. Indeed, if there is an infinite path such that any two states on it are covered by some relation in $W$, then by Ramsey's Theorem, there must be a relation in $W$ admitting an infinite descending chain, a contradiction. Below we show how spurious counterexamples can be removed by a suitably chosen well-founded relation.

Example. Consider the spurious abstract counterexample in the previous example. Projecting its loop to concrete states leads to two states $(a,b)$ and $(a+1,b)$ for some integers $a ≤ b$. Including relation $R := \{((x,y), (x',y)) \mid x < x' ≤ y  \}$ in $W$ will remove the path from the abstraction and prove the desired LTL property.

k-liveness as a well-founded relation. There are various heuristics to come up with new abstraction predicates and well-founded relations for refinement (see [1] for the details). That said, it is still possible that at some point all heuristics get stuck, failing to produce new predicates or relations to remove a spurious counterexample. In this situation, we may exploit the notion of k-liveness in hope to make further progress. Intuitively, we can maintain an auxiliary integer variable $k$ in the transition system to count the occurrences of $f$. With this counter, the refinement may always discover a new well-founded relation $R := \{ (s, s') ∣ k < k' ≤ n \}$ where $n$ is the number of occurrences of $f$ in the current counterexample. By adding this relation, we allow the procedure to progress by blocking all counterexamples where the occurrences of $f$ is less than or equal to $n$. This heuristic is sound since $f$ occurs infinitely often in a real counterexample.

References

1. Infinite-state Liveness-to-Safety via Implicit Abstraction and Well-founded Relations
2. Counterexample-guided Abstraction Refinement for Symbolic Model Checking
3. Abstractions from Proofs
4. Liveness Checking as Safety Checking
5. Liveness Checking as Safety Checking for Infinite State Spaces
6. An Automata-Theoretic Approach to Linear Temporal Logic

7. Model Checking and Abstraction
8. Property preserving abstractions for the verification of concurrent systems

Thursday, August 10, 2017

Logic grid puzzles and Beth's Definability Theorem


Logic Grid Puzzles are popular in magazines and newspapers. In such puzzles, the reader is asked to deduce an answer to a question from a set of clues in some scenario. For example, the scenario may be as follows: five guys are having drinks in a room. The clues are often information like "They all come from different countries", "The British is in blue" or "People in red don't drink alcohol". The question may be "What does the German drink?". The reader usually tries to deduce the answer by filling out a matrix with the clues, like the right one.
Puzzles of this kind involve turning an implicit characterisation (i.e. the clues) into an explicit one (i.e. a direct answer to the question). In formal logic, we may define a relation explicitly with a formula. More precisely, given a theory $S$ (i.e. a set of sentences), an n-ary relation $R(x_1,...,x_n)$ is explicitly defined by $\Phi$ w.r.t. $S$ if$$S \models \forall x_1,...,x_n: R(x_1,...,x_n)\equiv\Phi(x_1,...,x_n),$$where $\Phi(x_1,...,x_n)$ is a formula that doesn't refer to $R$. (If $A,B$ are theories, $A\models B$ means that every model of $A$ is also a model of $B.$) In contrast, a theory referring to a relation may not uniquely determine that relation. Intuitively, a theory characterises a relation $R$ iff we cannot find two models of the theory that are only different in their interpretations of the symbol $R$. More formally, if $S$ is a theory of vocabulary $\tau$, then $S$ implicitly defines a non-logical symbol $R\in\tau$ iff for all structures $A$ of vocabulary $\tau\setminus\{R\}$, there exist at most one relation $R^A$ over the domain of $A$ such that $A$ becomes a model of $S$ by interpreting $R$ to $R^A$. There are several equivalent ways to express this concept (e.g. this and this), but we shall use the following one for our convenience of proof:
Definition. A theory $S$ implicitly defines an n-ary relation $R(x_1,...,x_n)$ if$$S\cup S' \models \forall x_1,...,x_n: R(x_1,...,x_n)\equiv R'(x_1,...,x_n),$$where $S'$ is obtained by replacing every occurrence of $R$ in $S$ with a fresh relation symbol $R'$.
If a $R$ is explicitly defined by $\Phi$ w.r.t. $S$, then it is clear that $R$ can also be implicitly defined by $S$. The converse result is proved by Beth for first-order logic.
Theorem. (Beth's Definability Theorem for FOL) If a first-order property is implicitly definable then it can also be defined explicitly.
Beth's result is a corollary of Craig's Interpolation Theorem.
Lemma. If $A$ is a theory and $A\models \varphi$, then there is a finite set $A'\subseteq A$ such that $A'\models \varphi.$
Proof of lemma. Note that $A\models \varphi$ if and only if $A\cup \{\neg \varphi\}$ is unsatisfiable. By the compactness theorem, there is a finite subset $E\subseteq A\cup \{\neg \varphi\}$ that is unsatisfiable. If $\neg \varphi\not\in E$ then let $A' = E$. Otherwise, let $A' = E\setminus\{\neg \varphi\}$. Then $A'$ is the desired set.
Proof of Beth's theorem. Suppose we have$$S\cup S' \models \forall x_1,...,x_n: R(x_1,...,x_n)\equiv R'(x_1,...,x_n).$$By the above lemma, we can assume w.l.o.g. that $S,S'$ are finite. Let $X=\bigwedge S$ and $X'=\bigwedge S'$. Our assumption implies that $$X\wedge X' \rightarrow \forall x_1,...,x_n: R(x_1,...,x_n)\equiv R'(x_1,...,x_n)$$is valid. By introducing fresh variables $p_1,...,p_n$, we can rewrite the above sentence to $X\wedge X' \rightarrow (R(p_1,...,p_n)\equiv R'(p_1,...,p_n))$, which is further equivalent to$$(X\wedge R(p_1,...,p_n)) \rightarrow (X' \rightarrow R'(p_1,...,p_n)).$$Since $X$ doesn't contain $R'$, and $X'$ doesn't contain $R$, by Craig's theorem there exist a formula $\Phi(p_1,...,p_n)$ containing neither $R$ nor $R'$, such that both$$(X\wedge R(p_1,...,p_n)) \rightarrow \Phi(p_1,...,p_n) \quad\mbox{and}\quad \Phi(p_1,...,p_n) \rightarrow (X' \rightarrow R'(p_1,...,p_n))$$ hold. Note that the latter can be rewritten to $\Phi(p_1,...,p_n) \rightarrow (X \rightarrow R(p_1,...,p_n))$ by replacing every occurrences of $R'$ with $R.$ It follows from the currying law that$$X \rightarrow (R(p_1,...,p_n) \rightarrow \Phi(p_1,...,p_n))\quad\mbox{and}\quad X \rightarrow (\Phi(p_1,...,p_n) \rightarrow R(p_1,...,p_n))$$are valid. Conjuncting the two formulas leads to $$X \rightarrow (R(p_1,...,p_n) \equiv \Phi(p_1,...,p_n)),$$ which further implies that $$S \models \forall x_1,...,x_n: R(x_1,...,x_n) \equiv \Phi(x_1,...,x_n).$$ This concludes the proof of the theorem. 
Craig's interpolation can be computed via syntactical transformation, as is shown in Craig's 1957 paper. Hence, we can always construct an explicit definition out of an implicit one. Since explicit definition is a syntactical notion (i.e. defined with a formula) while implicit definition tends to be more semantical (i.e. defined with a theory), Beth's theorem indicates that the syntactical definability of first-order logic is "complete" in some sense.

References

Tuesday, July 18, 2017

A note on useful decidable logics & theories

Note that this post discusses decidable fragments that allows at least some degree of quantifications, not decidable theories of ground formulas used in SMT solvers.

Decidable fragments of first-order logic

Monadic first-order logic. Also known as the relational monadic fragment (RMF) or the Löwenheim class [2], this fragment is a classical example of decidable first-order logic without equality. The RMF consists of the first-order formulas where all relation symbols are unary:$$\phi ::= R(x) \mid \neg \phi \mid \phi \vee \phi \mid \phi \wedge \phi \mid \exists x.\phi \mid \forall x.\phi.$$Note that equality is not allowed in this logic and there are no constant symbols. Satisfiability of the logic is first proved to be NE-complete [2], where NE = $\bigcup_c NTIME(2^{cn})$. Later it is shown that NE = NEXPTIME [1], and thus the problem is NEXPTIME-complete. The Löb-Gurevich class (cf. [8]) is a decidable extension of the RMF where unary function symbols are also allowed. Both of the classes enjoy the finite model property, cf [8].

Effectively propositional logic (EPR). [5] Also known as the Bernays-Schönfinkel class, this fragment consists of first-order formulas such that:
1. The vocabulary is restricted to constant and relation symbols.
2. The quantifier prefix is restricted to $\exists^*\forall^*$ for formulas in prenex normal form.
Condition 2 implies that EPR is not closed under negation. EPR enjoys the finite model property, meaning that a satisfiable formula is guaranteed to have a finite model. The satisfiability problem of EPR is NEXPTIME-complete.
We can extend EPR to allow quantifier alternation and function symbols while maintaining the same properties, as long as the formula in the extended logic is "stratified" [11]. More precisely, we define the quantifier alternation graph $G(\phi)$ for a formula $\phi$ as a directed graph where the set of vertices is the set of sorts, and for each function symbol $f : (s_1,\ldots, s_n) \to s$ in the skolem normal form of $\phi$, there is an edge $s_i \to s$ for $1 \le i \le n$. $\phi$ is called stratified if $G(\phi)$ does not contain a cycle through the sort of some universally quantified variable. For example, $\phi := \forall x. \exists y. f(x) = y$ is skolemized to $\forall x. f(x) = g(y)$ with $f,g : A \to B$. Hence $G(\phi)$ is acyclic and thus $\phi$ is in the decidable fragment.
Stratified formulas are also decidable, for there is no way to generate an infinite sequence of instantiation terms. By contrast, $\psi := \forall x. \exists y. f(y) = x$ is skolemized to $\forall x. f(g(x)) = x$ where $f: A \to B$ and $g: B \to A$. Hence $G(\psi)$ contains a cycle through the sort of $x$, and thus $\psi$ is not in the decidable fragment.
Formulas in the extended EPR can be effectively translated into propositional logic formulas through a instantiation process over a finite Henken domain, see e.g., [12,13]. Extended EPR is supported by first-order logic provers such as iProver (which is known as the most efficient solver for EPR), and by modern SMT solvers such as Z3.

Presburger arithmetic. The first-order theory of structure $(\mathbb N, +)$ can be decided by encoding $n\in \mathbb N$ in binary (LSB first), such that the atomic ternary-relation $+$ corresponds to a regular relation $+_2$ over this coding and $\mathbb N$ corresponds to the $\omega$-regular language $(0+1)^*0^\omega$. Buchi has shown in 1962 that given any formula $\phi(x_1,...,x_n)$ over ${\rm FO}(\mathbb N, +)$, one can effectively compute an $\omega$-automaton $A(x_1,...,x_n)$ that defines the same relation modulo convolution $\otimes$, namely, $\phi(x_1,...,x_n)$ holds iff $A$ accepts the $\omega$-word $x_1 \otimes \cdots \otimes x_n.$ Satisfiability problem for ${\rm FO}(\mathbb N, +)$ thus reduces to the emptiness problem for $\omega$-automata and is thus decidable. See [9] for a review of the algorithmic and computational complexity aspects of PA. Also see a recent historical review in this paper and its presentation.

A list of decidable theories can be found on Wikipedia.

References

1. Structural properties of complete problems for exponential time. 1997.
2. Complexity results for classes of quantification formulas. 2001.
3. Verification of randomised security protocols.
4. Decidability of second-order theories and automata on infinite trees. 1969.
5. Complexity results for classes of quantificational formulas. 1980.
6. The theory of ends, pushdown automata, and second-order logic. 1985.
7. Bisimulation through Probabilistic Testing. 1991.
8. The Classical Decision Problem. 2001.
9. A Survival Guide to Presburger Arithmetic. 2018.
10. Automata: From Logics to Algorithms. 2007.
11. Complete instantiation for quantified formulas in satisfiabiliby modulo theories. 2009.
12. Deciding effectively propositional logic using DPLL and substitution sets. 2008.
13. Proof systems for effectively propositional logic. 2008.

Sunday, June 18, 2017

A note on useful decidable temporal logics

Subclasses of FO(LTL)

In the following, we assume that non-constant function symbols and equality are not allowed. The addition of them can lead to the loss of decidability for any of the subclasses introduced here.

Monadic FO(LTL), proposed by Hodkinson et al. [1] in 2000, is the first non-trivial decidable subclass of FO(LTL). This class contains all sub-formulas beginning with a temporal operator (Since or Until) have at most one free variable. Thus, we can talk about objects in the intended domain using the full power of first-order logic; however, temporal operators may be used to describe the development in time of only one object (two are enough to simulate the behaviour of Turing machines). The satisfiability problem for formulas in this class can be reduced to fragments of classical two-sorted first-order logics in which one sort is intended for temporal reasoning. The decidability results also hold for the temporal logics with finite domains.

Remark. FO(LTL) provides only ‘implicit’ access to time: quantification over points in time in the sense of first-order logic is not permitted, and the only means of expressing temporal properties is by the temporal operators Since and Until. It turns out that monadic FO(LTL) is weaker than the two-sorted first-order language, where one sort of which refers to points in time and the other to the first-order domain. For example, the formula $∃t_1 ∃t_2 (t_1 < t_2 ∧ ∀x(P(t_1, x) ↔ P(t_2, x)))$ is not expressible in FO(LTL) over any interesting class of linear time structures [2, 3].

References

1. Decidable fragments of first-order temporal logics.
2. Temporal Logic. Part I & Part II. Clarendon Press, Oxford.
3. Temporal connectives versus explicit timestamps in temporal query languages.

Friday, May 19, 2017

A note on useful decidable logics & theories - Part II

Decidable fragments of second-order logic

(Weak) MSO with one successor. The monadic second-order theory of structure $(\mathbb N, <)$ (denoted by S1S) is decidable. Buchi showed in 1962 that given any formula $\phi(x_1,...,x_n)$ over ${\rm MSO}(\mathbb N, <)$, one can effectively compute an $\omega$-automaton $A(x_1,...,x_n)$ defining the same relation modulo convolution. There is a similar connection between weak $MSO(\mathbb N, <)$ (denoted by WS1S) and finite-word automata. See [10] for a historical survey.

MSO over binary trees. The structure of infinite binary trees is $T_2 := (\{1,2\}^*,S_1,S_2)$ with $S_i := \{(v,vi) : v \in \{1,2\}^*\}$. Rabin's Tree Theorem [4] states that the MSO-theory of the infinite binary tree (denoted by S2S) is decidable, or in other terminology: that model-checking the binary tree with respect to MSO-properties is decidable. In [4], Rabin also proved the decidability of a bunch of theories using interpretation in $T_2$, such as MSO over n-ary trees (see below). Note that Rabin's theorem is subsumed by a more general result stating that MSO over the prefix-rewriting systems is decidable [?].

MSO over n-ary trees. The decidability results of MSO-theory over binary trees $T_2$ can be extended to $n$-ary branching trees $T_n$ with vertices in $[1..n]^*$ by encoding a vertex $a_1 a_2 \dots a_k$ of $T_n$ by a vertex $1^{a_1} 2 1^{a_2} 2 \dots 2 1^{a_k} 2$ of $T_2$. The translation of MSO-formulas is straightforward.

MSO over pushdown automata. Let $G := (V,E)$ denote the configuration graph of a pushdown automaton, where $V\subseteq Q\times\Sigma^*$. Let $n := \max(|Q|,|\Sigma|).$ One can MSO-interpret $G$ in $T_n$ by representing the configuration $(q, s_1,...,s_k)$ as the vertex $s_1...s_k\ q$ of $T_n$. Each step between the configurations in $G$ is translated in MSO to a local move from one vertex to another in $T_n$. Decidability of MSO over binary trees hence subsumes the fundamental result of Muller and Schupp [6] that checking MSO properties is decidable for the configuration graph of a pushdown automaton.

Decidable modal logic

Probabilistic modal logic (PML). Let $\Sigma$ be a finite set of actions. The syntax of the logic is defined as follows:$$\phi ::= \top \mid \bot \mid \phi\vee\phi \mid \neg\phi \mid (\phi)_a\ {\rm op}~p,$$where $a \in \Sigma$, ${\rm op}\in \{=,<\}$, and $0\le p\le 1$. The semantics of PML is defined over a discrete non-deterministic probabilistic transition system such that for $s\in S$, e.g.,$$s \models (\phi)_a \le p \quad {\rm iff} \quad \Pr(s' \models \phi \mid s\xrightarrow{a} s') \le p.$$Larsen and Skou [7] show that, when the transition probabilities are rational and bounded above zero, given any PML formula $\phi$ and fixed $0<\epsilon<1$, one can compute an experiment $t_{\epsilon,\phi}$ and an observation $o_{\epsilon,\phi}$ such that $s \models \phi$ iff $\Pr(o_{\epsilon,\phi}$ is observed by running $t_{\epsilon,\phi}$ on $s) \ge 1-\epsilon$, and $s \not\models \phi$ iff $\Pr(o_{\epsilon,\phi}$ is observed by running $t_{\epsilon,\phi}$ on $s) \le \epsilon$. Such experiments can be computed, executed and observed in polynomial time. Larsen and Skou show that PML characterises probabilistic bisimulation in the following sense: two states are probabilistic bisimilar iff they satisfy exactly the same set of PML formulas.

Misc

Reduction of decidability. Suppose that model checking $L$-theory of structure $S$ is known to be decidable. The model checking problem for a structure $S'$ with respect to $L'$-theory is decidable if we can give an $L$-description of $S'$ within structure $S$. That is, we characterise structure $S'$ using $L$-formulas over $S$. In other words, we provide a satisfiability-preserving translation of any $L'$-formulas $\phi$ into $L$-formula $\psi$ such that $S'\models \phi$ iff $S\models \psi$. Hence $L'$-theory over structure $S'$ is also decidable.

References

1. Structural properties of complete problems for exponential time. 1997.
2. Complexity results for classes of quantification formulas. 2001.
3. Verification of randomised security protocols.
4. Decidability of second-order theories and automata on infinite trees. 1969.
5. Complexity results for classes of quantificational formulas. 1980.
6. The theory of ends, pushdown automata, and second-order logic. 1985.
7. Bisimulation through Probabilistic Testing. 1991.
8. The Classical Decision Problem. 2001.
9. A Survival Guide to Presburger Arithmetic. 2018.
10. Automata: From Logics to Algorithms. 2007.
11. Complete instantiation for quantified formulas in satisfiabiliby modulo theories. 2009.

Saturday, May 6, 2017

A note on recognisability and definability

Definitions. An $n$-ary relation is associated with a set of $n$-tuples; an $n$-ary function $f$ is associated with a set of $(n+1)$-tuples $\{(a_1,...,a_n,f(a_1,...,a_n)\}$, which is called the graph of function $f.$ A sequence $s_1,s_2,...$ over $A$ is associated with a function $s:\mathbb N^n\rightarrow A$ such that $s(i)=s_i.$ Given an integer $n$, we use $(n)_p$ to denote its p-ary expansion; given a p-expansion $w\in\{0,...,p-1\}^*$, we use $[w]_p$ to denote its value $\sum_{i=0}^{|w|-1}p^{|w|-1-i}w[i].$ By convention, we set $(0)_p=\epsilon$ and $[\epsilon]_p=0.$

Recognisable Sets

Definitions. A set is called regular or recognisable if it can be recognised by a finite DFA. For $L\subseteq\Sigma^*$, $\sim_L$ is called a right congruence iff $u\sim_L v \iff uw\sim_L vw$ for all $w\in\Sigma^*.$ Note that a right congruence is an equivalence relation.

Myhill-Nerode Theorem. A set $L$ is regular iff $\sim_L$ has finite index over $\Sigma^*.$

Definitions. (p-recognisability) A set of integers is called p-recognisable iff the p-ary expansion of the integers is recognised by a finite automaton with alphabet $\{0,...,p-1\}.$ The automaton would read a word from left to right and ignores the leading zeros, so that if it accepts $w$ then it accepts all words in $0^*w.$
Fact. A sequence $s_1,s_2,...$ over $A$ is p-recognisable if $s^{-1}(a)$ is p-recognisable for each $a\in A.$ When $A$ is finite, it is easier to define a sequence in this way.

Definitions of recognisability generalise to n-ary relations naturally by replacing the alphabet $\{0,...,p-1\}$ to $\{0,...,p-1\}^n.$ All properties and facts listed above still hold in the general case.

Definitions. (p-automata) A p-automaton recognising a sequence $s_1,s_2,...$ over a finite domain is a complete deterministic Moore machine that reads $(n)_p$ and outputs $s_n$ for all $n\ge 1.$ In such case the sequence is called p-recognisable. Note that p-recognisability of sequences generalises p-recognisability of sets, since a set is p-recognisable iff there exists a p-automaton that outputs 1 on $(n)_p$ if $n$ belongs to the set and outputs 0 otherwise. Alternatively, we can describe a set of integers using a characteristic sequence. The characteristic sequence of a set $S$ refers to a sequence over $\{0,1\}$ where the nth bit is 1 iff $n\in S.$ A set is p-recognisable iff its characteristic sequence is p-recognisable.
p-automaton can recognise an n-ary relation by recognising an n-dimensional sequence $s:\mathbb N^n\rightarrow\{0,1\}.$ The input language $((a_1)_p,...,(a_n)_p)$ is cleverly stipulated as a subset of $(\{0,...,p-1\}^n)^*$ instead of $(\{0,...,p-1\}^*)^n.$ Namely, components of a relation convoluted over alphabet $\{0,...,p-1\}^n$. In this way, theorems such as Myhill-Nerode still hold for $n\ge2.$

Definable Sets

Definitions. A set $L$ of $n$-tuples is called (first-order) definable by a formula $\phi$ in a first-order structure $\mathcal S$ iff $\phi$ has exactly $n$ free variables and $L=\{(a_1,...,a_n)\in D^n :$ $S\models \phi(a_1,...,a_n)\}$, where $D$ is the domain of $\mathcal S.$ A set is definable in a structure $\mathcal S$ if it is definable by a formula in $\mathcal S.$ When the structure is clear in the context we simply called a set definable. A sequence $s_1,s_2,...$ over $A$ is definable if the graph of the function $s:\mathbb N\rightarrow A$ associated with the sequence is definable.
Fact. A sequence $s_1,s_2,...$ over $A$ is definable iff $s^{-1}(a)$ is definable for each $a\in A.$ When $A$ is finite, it is easier to define a sequence in this way.

Definitions. (p-definability) Given $x\in\mathbb N$, define $V_p(x)\equiv$ the largest power of $p$ that divides $x.$ Let $V_p(0)\equiv 1.$ A subset of $\mathbb N$ is called $p$-definable if it is definable in $[\mathbb N,+,V_p].$ Given $x\in\mathbb N$, define $V_p(x)\equiv$ the largest power of $p$ that divides $x$ and $V_p(0)\equiv 1.$ Define a predicate $P_p$ that $P_p(x)$ holds iff $x$ is a power of $p.$ Define a binary relation $B_p$ such that $B_p(x,y)$ holds iff $y$ is a power of $p$ occurring in the p-ary expansion of $x.$ (Thus $B_p(x,y)$ iff $y<x$, $P_p(y)$, and there exits a $0\le z <y$ such that $y|(x-z).$)

Great efforts were made in the history to characterise 2-recognisable sets in an FO structure. Three candidate structures were considered for this purpose in order: $[\mathbb N, +, P_2]$, $[\mathbb N, +, B_2]$, and $[\mathbb N, +, V_2].$ We note two facts here about these structures:
Fact. $[\mathbb N, +, B_p]$ and $[\mathbb N, +, V_p]$ are (FO-)interpretable in each other.
Fact. $[\mathbb N, +, P_p]$ is strictly weaker than $[\mathbb N, +, V_p].$
To see the first, note that for $x>0$, $V_p(x)=y$ iff $B_2(x,y)\wedge \forall 0\le z<y.\ \neg B_p(x,z)$, and that $B_p(x,y)$ holds iff $\exists 0\le z <y.\ V_p(x-z, y).$ The second fact follows from an interesting decidability result. Let $p^x$ denote the exponential function. It is known that $[\mathbb N,+,V_p]$ and $[\mathbb N,+,p^x]$ are decidable, but $[\mathbb N,+,p^x,V_p]$ is undecidable. Clearly $P_p$ can be defined by $V_p$ and $p^x.$ If $[\mathbb N, +, P_p]$ is equal to $[\mathbb N, +, V_p]$, then $V_p$ will be definable by $p_x$, which implies $[\mathbb N,+,V_p]$ and $[\mathbb N,+,p^x,V_p]$ are mutually interpretable. This is impossible since the first is decidable and the second is not. The ultimate characterisation of 2-recognisable sets is given by the following theorem.
Theorem I. For any prime number p, a set is p-definable iff it is p-recognisable.
Proof. ($\Rightarrow$) It suffices to show that given any formula $\phi$ from a structure $\mathcal S$ equivalent to $[\mathbb N,+,V_p]$, we can construct a p-automaton $A_\phi$ recognising the set definable by $\phi$, i.e., $\mathcal S \models \phi(a_1,...,a_n)$ implies that $A_\phi$ accepts $([a_1]_p,...,[a_n]_p).$ To simply our proof, we choose $\mathcal S=[\mathbb N,R_1,R_2]$, where $R_1(x,y,z)$ holds iff $x+y=z$ and $R_2(x,y)$ holds iff $V_p(x)=y.$ Now we prove the claim by structural induction. First note that the three atom formulas $x=y$, $R_1(x,y,z)$, and $R_2(x,y)$ in $\mathcal S$ are p-recognisable (Exercise!). To complete the induction, it suffices to show that we can construct p-automata $A_{\neg\phi}$, $A_{\phi\vee\psi}$ and $A_{\exists x\phi}$ from p-automata $A_\phi$ and $A_\psi.$ Negation construction is immediate since p-automata is closed under complementation. For union, we modify the transition functions of the two p-automata so that they operate over the same set of input variables. For instance, suppose the free variables in $\phi$ are $x_1,...,x_m,y_1,...,y_n$ and the free variables in $\psi$ are $y_1,...,y_n,z_1,...,z_r.$ Then after the modification, the transition functions of the p-automata have the same domain $(x_1,...,x_m,$ $y_1,...,y_n,$ $z_1,...,z_r).$ The union of the two modified automata gives $A_{\phi\vee\psi}.$ For projection, to construct $\exists x.\phi(x,x_1,...,x_n)$ we just relabel the transitions of the p-automaton by replacing $(a,a_1,...,a_n)$ with $(a_1,...,a_n).$ Note that we may need to add new loops after the relabelling to recognise leading zeros caused by the projection. See [3] for details of this proof.

Corollary. The theory of $[\mathbb N,+,V_p]$ is decidable.
Proof. Given a sentence in $[\mathbb N,+,V_p]$, we can rewrite it to the form of either $\exists x.\phi(x)$ or $\neg\exists x.\phi(x).$ By Theorem I, $\phi(x)$ is p-recognisable. Hence we can construct an finite automaton $A$ that accepts a word $w$ iff $[\mathbb N,+,V_p]\models\phi([w]_p).$ Hence the truth of the sentence can be checked by checking emptiness of a recognisable language, which is decidable.

Corollary. The set of squares $\{n^2:n\in\mathbb N\}$ is not p-recognisable.
Proof. Suppose to the contrary that the set of squares is p-recognisable. By Theorem I, there exists a formula $\phi$ in $[\mathbb N,+,V_p]$ such that $\phi(x)$ holds iff $x$ is a square. Define $$\psi(c,d):=\phi(d)\wedge\forall d'.(d'>c\wedge\phi(d')\implies d'\ge d),$$ which asserts that $d$ is the smallest square larger than $c$. Observe that the set $\{(x,y)\in\mathbb N^2: y=x^2\}$ is p-definable by $$\lambda(x,y):=\phi(y)\wedge\exists a.\ \psi(y,a)\wedge a=y+2x+1.$$ Also, the set $\{(x,y,z)\in\mathbb N^3: z=xy\}$ is p-definable by formula
$\exists a,b,c.\ \lambda(a,x)\wedge\lambda(b,y)\wedge\phi(c)\wedge c=a+b+2z \wedge\exists d.\ \psi(c,d)\wedge d=c+2x+2y+1$
The validity of this formula exploits the fact that $c=(x+y)^2$ if $c$ is a square and the smallest square greater than $c$ is $c+2x+2y+1$ for some $x,y\in\mathbb N$. It then follows that multiplication is p-definable, which is impossible since $[\mathbb N,+,\cdot]$ is not decidable but $[\mathbb N,+,V_p]$ is.
(To be continued)

References and further reading

1. Recognizable sets of integers, Michel Rigo.
2. Languages, Automata, and Logic, Wolfgang Thomas.
3. Logic and p-recognizable sets of integers, V. Bruyère et al.

Sunday, October 23, 2016

A note on LTL vs CTL

  • Syntax. A quantifier in a CTL formula are in form of a path quantifier $A,E$ following by a state quantifier $F,X,G,U,W$. A quantifier in an LTL is a combination of state quantifiers.
  • States and paths. A CTL formula $\phi$ is a state formula, meaning that the temporal operators are state-based in structure: the derivation of a given formula is per-state and depends on the derivation of subformulas for subsequent states. For example, computing the set of states satisfying $AG\,φ$ relies on computing the set of states satisfying $φ$.
    In contrast, LTL formula $\psi$ is a path formula: the derivation of a given formula depends on the derivation of subformulas for suffixes of a path.
  • Entailments. We write $M,s \models \phi$ and $M,p \models \psi$ for state $s$ and path $p$. Given a set $I$ of initial states of $M$, we write $M ⊨ φ$ if $M, s ⊨ φ$ for all $s ∈ I$, and write $M ⊨ ψ$ if $M, p ⊨ ψ$ for any path $p$ starting from $I$.
  • LTL and CTL are incomparable. The reset property $AG (EF\,φ)$ (i.e. there is always a possibility that $\phi$ could hold, though it may never hold) cannot be expressed in LTL. The stability property $FG\,φ$ cannot be expressed in CTL. Consider a system $M: (s_0,s_0),(s_0,s_1),(s_1,s_2),(s_2,s_2)$ such that $s_0$ is the initial state, $s_0,s_2 \models φ$, and $s_1 \models \neg φ$. Then $M \not\models AFAG\,φ$ but $M \models FG\,φ$.
    It is known that an LTL formula can be over-approximated with ∀CTL [6]: if an ∀CTL property (consisting of AG and AF sub-properties) holds, then an analogous LTL property (where we replace AG with G and AF with F) also holds.
  • Structure vs traces. CTL can characterize the structure while LTL only characterizes the traces. Consider the following two systems:
    $A:$ $(s_1,s_2),\ (s_1,s_3),\ (s_2,s_4),\ (s_3,s_5),\ (s_4,s_1),\ (s_5,s_1)$
    $B:$ $(s_1,s_2),\ (s_2,s_4),\ (s_2,s_5),\ (s_4,s_1),\ (s_5,s_1)$
    where $L(s_1)=a,\ L(s_2)=L(s_3)=b,\ L(s_4)=c,\ L(s_5)=d$. Both systems have traces $(ab(c+d))^\omega$ and thus cannot be distinguished in LTL. On the other hand, $B$ is a model of $AG(b\Rightarrow EX\ c)$ while $A$ isn't.
  • There are lots of discussions over the best logic to express properties for software verification (cf. [2,5]). LTL can express important properties for software system modelling such as fairness, while the CTL must have a new semantics to express them, for example CTL* [4] or CTL+Fair [3]. But CTL algorithms are usually more efficient and can use BDD-based algorithms. Hence reachers have tried to approximate LTL verification using CTL model checking techniques, e.g., [7].

    References

    1. Making prophecies with decision predicates, POPL 2011.
    2. Branching vs. Linear Time: Final Showdown, TACAS 2001.
    3. Fairness for Infinite-State Systems, TACAS 2015.
    4. Verifying Increasingly Expressive Temporal Logics for Infinite-State Systems, JACM 2017.
    5. Concurrency: the Works of Leslie Lamport, ACM Press 2019.
    6. The common fragment of CTL and LTL, FOCS, 2000.
    7. Temporal verification of programs, PhD Thesis, 2012.

    Sunday, March 20, 2016

    Equality in first-order logic

    First-order logic (FOL) can be discussed with and without regarding identity as a in-built relation. This difference matters because the semantic of identity ("anything is equal to nothing but itself") is not definable in FOL. Hence, different ways to express the concept of identity lead to different models. By contrast, identity is definable in SOL, so the issues discussed here is not a problem in SOL.

    First-order logic with identity

    FOL with identity is a logic where "$=$" is a primitive logic symbol, i.e., symbols with the same semantics across all possible model. As a logical symbol, we stipulate that any model $\mathcal A$ of the logic always interpreted "$=$" to the identity relation on ${\rm Dom}(\mathcal A).$ Furthermore, the incorporation of equality adds three "Axioms of Equality" to the deduction system of the logic: reflexivity and the substitution for formulas, where the second axiom (aka. Leibniz's law) is an axiom schema. Other properties such as symmetry and transitivity can be derived from these two axioms.
    Remark. Note that the Axioms of Equality are actually not axioms but inference rules. In fact, these axioms do NOT characterise equality––what they characterise are arbitrary equivalence relations on the domain of discourse that is congruent with respect to the functions and relations of the interpretation.
    Example. Peano Arithmetic is the FO theory of $(\omega, +, succ)$ with identity, but Modulo Arithmetic is a FO theory without identity. Instead, for a fixed $n>0$, an equivalence relation $\equiv_n$ is defined such that $a \equiv_n b$ iff $a \equiv b$ mod $n$.

    First-order logic without identity

    Consider a vocabulary $\sigma$ where equality is not a logical symbol. A theory that contains the Axioms of Equality is called a theory with equality. Let $T$ be such a theory of $\sigma$ where the Axioms of Equality, denoted as $A_{\rm eq}$, are defined for a binary relation $\sim$. Given a model $\mathcal A$ of $T$, it is possible that $a\sim^{\mathcal A} b$ but $a\neq b$ although $a$ and $b$ will satisfy exactly the same formulas in ${\mathcal A}$ by Leibniz's law. On the other hand, we can define the quotient structure $\mathcal B := \mathcal A/\!\!\sim$ of $\mathcal A$. Clearly, $a\sim^{\mathcal B} b$ if and only if $a=b$.

    Now we have two methods to define equality in FOL. What would be the difference anyway? The first thing to note is that satisfiability is independent of which method is used. To see this, let $F$ be a formula of vocabulary $\sigma$ plus equality and let $F'$ be the formula obtained by replacing all "$=$" in $F$ with "$\sim$". Consider the theory $T_F := A_{\rm eq}\cup\{F'\}$. Note that if $\mathcal A$ is a model of $T_F$ then $\mathcal A/\!\!\sim$ is a model of $F$. It follows that $F$ and $T_F$ are equi-satisfiable, as concluded in the following facts:
    Fact. A formula $F$ is satisfiable in FOL with identity iff $T_F$ is satisfiable in FOL w/o identity.
    Fact. A formula $F$ is valid in FOL with identity iff $A_{\rm eq}\implies F$ is valid in FOL w/o identity.
    Let $T$ be a theory of a vocabulary with $\sim$. A model $\mathcal A$ is called a normal model of $T$ if $\mathcal A$ is a model of $T$ and $a\sim^{\mathcal A} b$ implies $a=b$. Hence, a theory with equality has a model iff it has a normal model. Results about FOL in the literature often assume identity with the notion of normal model, these results can be easily rephrased without identity. For example,
    Theorem. (Compactness with identity) Given a set of formula $\Sigma$, if any finite subset of $\Sigma$ has a model, then $\Sigma$ also has a model.
    Theorem. (Compactness w/o identity) Given a set of formula $\Sigma$, if for any finite subset of $\Sigma$ has a normal model, then $\Sigma$ also has a normal model.

    References

    The Logic of Equality, Leon Henkin, 1977.

    Wednesday, February 10, 2016

    A note on the expressiveness of logic

    Reachability problem. Fix a graph $\mathcal{G}$. A vertex $u$ is said to be reachable from a vertex $v$ in $\mathcal{G}$ if there exist $r \ge 0$ vertices $x_1$, ..., $x_r$, such that $(v, x_1)$, $(x_1, x_2)$, ..., $(x_{r−1}, u)$ are edges of $\mathcal{G}$. A formula $\phi(x,y)$ with exactly two free variables $x, y$ and one relation symbol $E$ is said to "express" reachability if and only if for any finite graph $\mathcal{G}$, vertex $u$ is reachable from vertex $v$ in $\mathcal{G}$ if and only if $\mathcal{G}$ is a model of $\phi(x,y)$ with respect to some interpretation that associates $E$ to edge relation and $x,\ y$ to vertices $u,\ v$.
    Reachability is not expressible in first-order logic. Suppose $\phi(x,y)$ is a FO logic formula expressing reachability. Define three sentences $s_1$, $s_2$, and $s_3$, such that
    $s_1 \equiv \forall x \forall y. \phi(x, y)$, i.e., all vertices are reachable from all vertices;
    $s_2 \equiv \forall x. (\exists y. E(x, y) \wedge \forall z. E(x, z) \implies z = y)$, i.e., every vertex has out-degree one;
    $s_3 \equiv \forall x. (\exists y. E(y, x) \wedge \forall z. E(z, x) \implies z = y)$, i.e., every vertex has in-degree one.
    Let $s \equiv s_1 \wedge \ s_2 \wedge \ s_3$. Observe that a graph is a model of $s$ if and only if the graph is cycle. Since there are finite cycles with as many vertices as desired, $s$ has arbitrary large finite models. It then follows by Lowenheim-Skolem theorem that $s$ has an infinite model. But infinite cycles do not exist, and thus our assumption that there exists a FO logic formula expressing reachability is contradicted.
    Reachability is expressible in second-order logic. Given constants $x$ and $y$ and a binary relation symbol $L$, we use the following sentences to characterize $L$ as a path from $x$ to $y$:
    1. $L$ is a linear order over the vertices : $$ \begin{align*} \psi_1 \equiv\ & \forall u. \neg L(u,u) \\ & \wedge \forall u\forall v. (L(u,v) \implies \neg L(v,u)) \\ & \wedge \forall u\forall v \forall w. (L(u,v) \wedge L(v,w) \implies L(u,w)) \end{align*} $$ 2. Any two consecutive vertices in the order must be joined by an edge: $$ \begin{align*} \psi_2 \equiv\ &\forall u \forall v. ((L(u,v) \wedge \forall w. (\neg L(u,w) \vee \neg L(w,v))) \implies E(u,v)). \end{align*}$$3. The first vertex in the order is $x$, the last vertex is $y$, and $x$, $y$ are in the order: $$ \psi_3 \equiv (\forall u.\ \neg L(u,x)) \wedge (\forall v.\ \neg L(y,v)) \ \wedge L(x,y).$$ Now an expression for reachability can be defined as$$\phi(x,y) \equiv \exists L. ((x=y) \vee (\psi_1 \wedge \psi_2 \wedge \psi_3)).$$

    Expressiveness of first-order Logic

    It was known that the natural deduction proof system for FO logic is both sound and complete. Therefore, since verifying a proof is mechanical, deriving a proof must be as hard as checking its validity. Hence it is impossible to build a perfect automatic theorem prover for FO logic. In fact, validity of FO is undecidable since FO can formulate the Post correspondence problem. On the other hand, validity of FO is easier than satisfiability, since the former is enumerable while the latter is not.
    Suppose we have only equality and edge relation in our language. A $\phi$-graph problem is to decide whether there exists an interpretation such that a given finite graph is a model of formula $\phi$. The class L-graph is formed by all $\phi$-graph problems where $\phi$ is a formula defined in logic L.
    Fact. FO-graph is strictly in $P$.
    The inclusion is strict because reachability problem (which is P-complete) is not expressible in FO and thus not captured by FO-graph. In fact, any FO-graph problem can be decided in LOGSPACE (check this). Immerman and Vardi (1982) added inductive definitions to FO to get the logic LFP, which is sufficient to express P problems over ordered graphs, namely, there is a distinguished relation $<$ that is interpreted as a linear order over the vertices.
    Theorem. (Immerman-Vardi 1982) LFP-graph is in $P$.
    However, for graphs without the ordering assumption, there are easily poly-time computable properties that are not definable in LFP. Hence LFP is still too weak to capture $P$. Grohe, Holm, and Laubner have added linear-algebraic operators to LFP. This properly extends the power of LFP with counting, but its relationship to $P$ remains an open question.

    Expressiveness of second-order Logic

    Second-order logic (SO) extends first-order logic with quantifications over relations and sets. Monadic second-order logic (MSO) is the restriction of SO where the second-order quantifiers are only over sets. Existential second-order logic (ESO) is a restricted version of SO where only existential quantifications are allowed.
    Fact. MSO-graph is strictly between P and NP, since 3-colorability is in MSO-graph but Hamiltonian path problem is not (the latter two problems are NP-complete). However, Hamiltonian path problem can be captured by extending MSO with quantifiers over sets of edges.
    The following result shows that ESO is in some senses the normal form of SO.
    Proposition. (Buchi 1976) Any MSO formula can be re-written to an equivalent EMSO formula.
    ESO-graph captures NP. It is straightforward to show that ESO-graph is in NP by non-deterministically choosing a relation for in polynomial time. Now let $\phi_1$ and $\phi_2$ be defined as before. In addition, define $\phi_4 \equiv \forall u\forall v. L(u, v)\vee L(v, u) \vee u = v$. Observe that $\mathcal{G}$ is a model of $\exists L. \psi_1 \wedge \psi_2 \wedge \psi_4$ if and only if $\mathcal{G}$ contains a Hamiltonian path. Since deciding the existence of a Hamiltonian path is NP-complete, we have
    Theorem. (Fagin, 1974) $NP$ = ESO-graph.
    By Fagin's theorem, ESO is closed under negation if and only if NP = co-NP. Since P $\subseteq$ NP$~\cap~$co-NP, second-order logic covers all languages in the polynomial time hierarchy. Other second-order concepts, e.g., least fixed-point operator and transitive closure operator, lead to logics which characterise further complexity classes like P, NLOGSPACE, PSPACE. For these results of descriptive complexity theory see [1].
    The "weak" MSO (WMSO) is a further restricted version of MSO such that the set quantifiers range only over finite sets. The WMSO theory of $(\mathbb N, <)$ is a particular useful theory called Weak (monadic) Second-order Logic with one Successor (WS1S). Buchi has shown that WS1S is decidable via automata-theoretic approaches.
    Theorem. (Buchi 1960) One an effectively construct an input-free finite automaton $A_\varphi$ from a sentence $\varphi$ in WS1S, such that $L(A_\varphi)\neq \emptyset$ iff $\varphi$ holds. Conversely, given an input-free automaton $A$ one can defined a sentence $\varphi_A$ in WS1S such that $L(A)\neq \emptyset$ iff $\varphi_A$ holds.
    The idea of reduction is to represent numbers in binary, namely, as finite words over $\{0,1\}$, and to view a finite word as the (characteristic function of) a finite set via finite-set interpretation [3,4]. For instance, integer $21$ corresponds to the word $10101$, which is interpreted to a finite set $\{1,3,5\}$.

    It is interesting to note that the expressiveness power of WS1S is equivalent to a first-order theory. Presburger Arithmetic (PA) is the first-order theory of the structure $(\mathbb N, +)$. In Buchi's 1960 paper it is noted that the decidability of PA can be inferred from that of WS1S. The decidability of Presburger arithmetic follows by translating first-order sentences in the signature $+$ to WS1S. To translate back from WS1S (or from finite automata), we would need to extend PA with some arithmetical means such as $|_2$, which associates each number $m>0$ with the greatest power of 2 that divides $m$. [4]
    Theorem. A relation is definable in WS1S iff it is definable in the FO theory of $(\mathbb N, +, |_2)$.

    References

    1. Finite Model Theory, Springer-Verlag, New York 1995.
    2. Languages, Automata, and Logic, Handbook of Formal Languages, Volume 3.
    3. Transforming structures by set interpretations. LMCS 2007.
    4. Automatic Structures. PhD Thesis. 2004.

    Thursday, August 20, 2015

    Proving theorems with Leon

    Here are some high level guides to proving theorems with Leon:
    1. To prove $A\implies B$, you usually need to finds some statements $A_1,...,A_n$ and prove $A\implies A_1$, $A_1\implies A_2$, ..., $A_n\implies B$. These intermediate results are called Lemmas. $A\implies A_1 \implies ... A_n \implies B$ is called an implication chain.
    2. You will find yourself get stuck at some step in the implication chain very often, say at $A_k\implies A_{k+1}$. To get through, you can try to find another route either from $A$ to $B$ or from $A_k$ to $A_{k+1}$. For example, if $A_k$ looks too weak for Leon to infer $A_{k+1}$, you can try to strengthen the hypothesis. At a high level, it is possible to find a sequence of statements $H_m, ..., H_k$, such that beginning from some $m$th step you can prove $A_m\implies A_{m+1}\wedge H_m$ and then $A_i\wedge H_{i-1} \implies A_{i+1}\wedge H_i$ for $i=m,...,k$ in turn, eventually reaching the desired result $A_k\wedge H_{k-1} \implies A_{k+1}$.
      (In Leon, a lemma is just a boolean function. Hence, to strength $A_i \implies A_{i+1}$ to $A_i\wedge H_{i-1} \implies A_{i+1}\wedge H_i$, you have to add $H_{i-1}$ to the pre-condition and add $H_i$ to the post-condition of the boolean function that asserts $A_i \implies A_{i+1}$.)
    3. In general, it is a good idea to made as strong assumption as possible for the proposition to verify, and set as weak requirements as possible for auxiliary lemmas. If the proposition assumes properties of a function (e.g. always returning nonzero values), putting the assumptions in the post-condition of that function can improve the odds Leon successfully verifying your proposition.
    4. I observe that Leon performs far better at proving a property as a post-condition than as a proposition. In many cases, a property that takes complicated proof hints and dozens of auxiliary lemmas to be verified as a proposition can be verified directly as a post-condition. Unfortunately, currently I are not very sure what makes this difference.
    5. Isolating a statement in a proof as a standalone lemma enables Leon to verify that statement via a separate induction process (by adding @induct to the new lemma), and to use a difference base case than the one, if any, used in the main proof.
    6. Use @library and @ignore annotations to help Leon concentrate on the part of interest in your code. Also, don't forget to use the (currently undocumented) --watch command-line option to turn on continuous verification.
    7. Avoid using higher-order functions. For example, try to define list operation from scratch without using helper functions such as map, filter, and foreach. The resulting code, while more tedious, is far easier to be verified by Leon as a whole.
    8. If you have provided logically sufficient lemmas to prove a proposition, and you find that Leon is able prove all the lemmas but the proposition, then check that if @induct annotation is used. If it is, check that if you set a pre-condition for the proposition (by using require) to prove. If you do, then it is possible that the pre-condition is not satisfied in the base case, and thus Leon cannot finish the induction.
    9. There seems to be some kind of non-determinism with Leon. In most cases, a property is either verifiable in milliseconds, or cannot be verified within even an hour. Hence, for efficiency, one usually set timeout as a small amount say 3 seconds. However, I notice that the time Leon takes to verify a lemma may vary drastically under some conditions yet unclear to me. For one of my lemmas, the time Leon takes to verify it ranges from 2 to 20 seconds in a sequence of 20 executions. Be careful not to waste time revising a proof that (suddenly) needs more time to be verified.
    10. Be careful about cyclic arguments. Leon cannot detect cyclic arguments, and is likely to regard them as valid proofs. The simplest example for this logic fallacy may be as follows. If you provide two lemmas A and B, and claim that lemma A holds because of lemma B, and lemma B holds because of lemma A. Then Leon will report that both A and B holds.
    11. More to come.

    Thursday, May 28, 2015

    A note on formal verification of programs

    Formal verification for softwares involves proving properties of programs under various assumptions. In order to automate the verifying process, we have to formalize the program to verify. This usually amounts to transforming the program from source code to model while preserving the properties of interests. In this post, we shall present an example formalization of programs. We shall also demonstrate how program properties can be verified using the presented formalization.

    Let's first consider a simple imperative programming language that can encode assignments, arithmetic operations, conditional branches, skips, sequences and non-recursive function calls. Clearly, any program written in this language has a finite number of possible execution paths, as well as a finite number of steps in each execution path. This fact allows us to transform a program to a logical formula, such that each model of the formula corresponds to a valid execution path of the program and vice versa.
    More precisely, suppose $F(x_1,...,x_n)$ is a program that takes $n$ inputs and gives one output. Then $F$ can be represented as a formula (called a path formula) in form of $$ \phi_F(x_1,...,x_n,y) = \bigvee\nolimits_{i \in I}\left( p_i \wedge y = e_i \right),$$where $I$ is a finite index set, $p_i$ is a predicate and $e_i$ is an expression. A path formula $\phi$ is called valid if the following conditions are satisfied: i) $\bigvee_{i \in I}\ p_i$ is a tautology, which means all possible paths are covered by the formula, ii) $\neg\bigvee_{i \neq j}(p_i \wedge p_j)$ is a tautology, which means no two paths will be taken simultaneously, and iii) an assignment to $\{x_1,..., x_n, y\}$ satisfies $\phi_F$ if and only if $f(x_1,..., x_n)=y$, which means solutions of the formula precisely characterizes the input-output relation of program $F$.
    One can employ techniques of symbolic execution to convert programs of various imperative languages into path formulae, see e.g., [1,2]. Such conversions usually allow one to reduce the problems of verifying program properties to those of checking satisfiability for first-order constraints. For example, to verify that function $F(x_1,x_2)$ is commutative, i.e., $\forall x_1,x_2.\ F(x_1,x_2) = F(x_2,x_1)$, it suffices to check that constraint $\phi_F(x_1,x_2,y) \wedge \phi_F(x_2,x_1,y') \wedge y \neq y'$ is unsatisfiable.
    Modern SMT solvers effectively decide formulas over combinations of theories such as uninterpreted functions, boolean algebra, integer arithmetic, bit-vectors, arrays, sets and maps [3]. Program verification thus can benefit from advances in SMT solvers by modelling programs as first-order constraints augmented with theories, see e.g., [4-7].

    Handling recursive functions

    Uninterpreted functions are often leveraged to facilitate automatic reasoning about recursive functions. The idea is to abstract each recursive call to an uninterpreted function call. If a constraint is unsatisfiable assuming uninterpreted functions, then it is also unsatisfiable assuming the correct interpretation. On the other hand, if a constraint is satisfiable assuming uninterpreted functions, then no reliable conclu-sion could be drawn for the correct interpretation, as the assumptions can be wrong. In [4], the authors proposes a technique to refine assumptions iteratively when the constraint is satisfiable. This technique is the cornerstone of the Leon verification system [5].
    As an illustration of the technique, consider function
    function size (list) {
      if (list == null) { return 0 } else { return 1 + size(list.next) }
    }
    
    with path formula $ \phi_{size} \equiv (b_1 \wedge y = 0) \vee (\neg b_1 \wedge y = 1 + \mathrm{size(list.next)}),$ and define $$ \phi = (\mathrm{size(list)} = y) \wedge (b_1 \Leftrightarrow \mathrm{list} = \mathrm{null}) \wedge \phi_{size}.$$Note that a boolean variable $b_1$, called a control literal, is introduced to represent the truth value of the branching condition in $\mathrm{size}$. The purpose of introducing control literals to a path formula is to guard invocations of uninterpreted functions. It turns out that we can switch between under- and over-approximation of a path formula by changing assignments of its control literals [4], as explained below.
    Suppose we want to verify property $\psi$, say $\psi \equiv \mathrm{size(list)} \ge 0$. Define $\psi_1 \equiv \neg\psi \wedge \phi$. Given a correct interpretation of $\mathrm{size}$, $\psi_1$ is satisfiable iff the $\mathrm{size}$ function satisfies property $\psi$. In particular, if $\psi_1$ is unsat, then so is $\psi$ for any interpretation of $\mathrm{size}$. However, if a counterexample is reported assuming uninterpreted functions, then this may be because $\mathrm{size(list.next)}$ is assigned an incorrect value. To exclude such case, we check the satisfiability of $\psi_1 \wedge b_1$. If the later formula is sat, then so is $\psi_1$ and we are done. Otherwise, we have to explore the branch restricted by $b_1$. For this purpose, we unfold the definition of $\mathrm{size(list.next)}$ one more time in $\phi$, and then rewrite formula $\psi_1$ with appropriate substitutions, e.g., replacing $\mathrm{list}$ with $\mathrm{list.next}$, using fresh variables for $b_1$ and $y$, etc. The new formula is denoted as $\psi_2$, and we proceed to check the satisfiabilty of $\psi_2$ and $\psi_2 \wedge b_2$, where $b_2$ is the control literal introduced by the unfolding.
    We repeat these steps and produce a sequence of alternating approximations. Thus the verification procedure for property $\psi$ would look like
        $n := 1$ while true do if $\psi_n$ is unsat then output UNSAT and break if $\psi_n\wedge b_n$ is sat then output SAT and break $n := n + 1$ done
    Observe that the procedure iteratively computes a succession of under- and over-approximations of the recursive calls in the provided constraint. Also, it finds a counterexample for the constraint whenever the constraint is satisfiable. Intuitively, this happens because a counterexample corresponds to an execution that violates the property, and the procedure enumerates all possible executions in increasing lengths. On the other hand, since our demo language becomes Turing complete after it is equipped with recursions, we cannot expect the procedure to always terminate when the provided constraint is unsatisfiable. For an important class of recursive functions, however, the approach outlined here acts as a decision procedure and terminates in all cases [8]. For instance, the $\mathrm{size}$ function listed above falls into this class.

    (To be completed)

    References

    1. Liu, Chang, et al. "Automating distributed partial aggregation." SoCC, 2014.
    2. Srivastava, et al. "Path-based inductive synthesis for program inversion." PLDI, 2010.
    3. De Moura, Leonardo, et al. "Satisfiability modulo theories: introduction and applications." Comm. of ACM, 2011.
    4. P. Suter, A. S. Koksal, et al. "Satisfiability modulo recursive programs." SAS, 2011.
    5. Blanc, Regis, et al. "An overview of the Leon verification system." SCALA, 2013.
    6. Bjørner, et al. "Program Verification as Satisfiability Modulo Theories." SMT@ IJCAR, 2012.
    7. Bjørner, et al. "Higher-order Program Verification as Satisfiability Modulo Theories with Algebraic Data-types." Higher-Order Program Analysis, 2013.
    8. Suter, P., et al. "Decision procedures for algebraic data types with abstractions." POPL, 2010.

    Monday, December 29, 2014

    Basic dependent type theory ─ Part I

    Computational trinitarianism. Proof Theory, Category Theory, and Type Theory, to a significant degree, can be regarded as fields that see the same subject from different points of view. They can be related via the follows:
    TT ⇔ CT: category semantics; WTT corresponds to morphisms and categories.
    PT ⇔ CT: category logic; an algebraic way of thinking about logic and proofs.
    PT ⇔ TT: thinking propositions as types and WTT as proofs.
    When you are learning a new idea in one field, it is helpful to transpose it to the other two fields and seek its correspondence there. See here and here for more details.
    Judgements. We start with three kinds of categorical judgements:
    $A\ type$   ─  $A$ is a type.
    $a \in A$    ─  $a$ is a value of type $A.$
    $a_1=a_2: A$ ─  $a_1$ and $a_2$ are "definitionally equivalent" values of type $A.$
    A judgement in form of $x_1:A_1,...,x_n:A_n \models a(x_1,...,x_n):A$ is called a hypothetical judgement. One may regard such a judgement as a mapping from $A_1,...,A_n$ to $A.$ $a(x_1,...,x_n)$ is called an open term that might involve free variables $x_1,...,x_n.$ In practice, we often consider the open term parametrized at $x_1,...,x_n.$ A closed term of type $A$ can be determined categorically after the values of $x_1,...,x_n$ are plugged in.
    Definitional equality is 1) an equivalence relation, 2) a congruence (see typing rules), and 3) respected by maps. That is, $$\frac{\Gamma, x:A \models b:B, \Gamma \models a_1\equiv a_2: A}{\Gamma \models [a_1/x]b \equiv [a_2/x]b : B}.$$The last feature is called the principle of functionality property. This property and its related concepts will play a central role later and in Homotopy type theory.
    Definitional equality has its limits. For example, one cannot prove definitional equivalence between $0+1$ and $1+0$ even given all the arithmetic rules about natural numbers. Definitional equality corresponds to calculating by simplification. There however are equations (such as $0+1=1+0$) that you wish to hold cannot be reached that way. These equations need not just calculation but proof, and thus cannot hold definitionally.
    Typing rules. We have the following kinds of rules to reason about types. Rules 1-5 come from Proof Theory and Rule 6 comes from Category Theory.
    1. Formation rules: how to construct a type.
    2. Introduction rules: how to create elements of a type.
    3. Elimination rules: how to use elements of a type.
    4. Congruence rules: how to interact with definitional equivalence.
    5. Computation rules: how to do calculation with types
    6. Uniquicity/Unisersality rules: characterize precisely types

    Negative types. E.g., Unit, product, and function. Elimination-oriented. Uniquicity of product: $\Gamma \vdash \langle fst(c), snd(c) \rangle \equiv c : A\times B.$ Computation of function is substitution. Elimination of function is application. Extension of function can be written as $$\frac{\Gamma, x:A \vdash a_1(x)\equiv a_2(x):B}{\Gamma \vdash a_1\equiv a_2: A\rightarrow B}$$This means that functions are characterized by their input-output behaviors.
    Positive types. E.g., Void and sum. Introduction-oriented. The Uniquicity property of positive types is problematic, and cannot be shown as for the negative types as directly. Elimination rule for the Void type:$$\frac{\Gamma\vdash a:\emptyset}{\Gamma\vdash abort(a): A}$$This means if you have a value of the Void type, you can abort the program with any type. Note that one can never build a term of type $A$ in this way since there is no closed term of type $\emptyset.$ One application of this rule is that if you are in a conditional branch, then one side of the conditional may be contradictory, and you can build an $abort$ element of type $A$ to make sure both elements for the condition have the same type.
    Negative types vs. positive types. Approximately, the two types differ most noticeably in how they behave under elimination. Reasoning about a negative type has noting to do with its structure. For example, a product is characterized by its behaviors on $\texttt{fst}$ and $\texttt{snd}$ without caring about what it actually is. A product can be realized by an ordered pair or just a black box that computes the $\texttt{fst}$ and $\texttt{snd}$ components on demand. Also, there is no induction principle for negative types. For example, all you know about a function is that it is a lambda term. Thus you cannot induct on its structure via the typing rules. In contrast, the structure of a positive type is enforced in the theory. The elimination form of positive types work by case analysis─the actual element in a sum is extracted by $\texttt{case}$ in the process of inference. Thus it cannot be anything else after the inference.
    A correspondence btw STT and IPT. $Unit$ is $true.$ $Void$ is $false.$ Sum is disjunction. Product is conjunction. Function is implication. $A\ type$ is $A\ prop.$ $\exists e: A$ is $A\ true.$ $\exists b.\, a:A \vdash b:B$ is $A\le B.$ Thus $A\le B$ means that there is a function transforming a proof for $A$ to a proof for $B.$ Note that $\le$ is a pre-order: that is, it is reflexive and transitive. We use $A\supset B$ to denote the largest element $x$ satisfying $A\wedge x \le B$, i.e., $C \wedge A \le B$ implies $C \le A \supset B.$ If an ordered structure satisfies 1) $\forall A: A\le \top$; 2) $\forall A: \bot \le A$; 3) $A\wedge B$ (meet) is a glb for $A,\,B$; 4) $A\vee B$ (join) is a sub for $A,\,B$; 5) $\wedge$ is distributive over $\vee$; 6) $A \supset B$ exists for every $A,B$, then it is called a Heyting pre-algebra. (It is called a pre-algebra because $\le$ is not anti-reflexive.)
    Negations. $\neg\ A$ is defined as $A \supset \bot.$ Thus we can get a contradiction from a value of $A.$ The fact $A\le \neg\neg\ A$ follows directly by the properties of $\supset.$ Its converse $\neg\neg\ A \le A$ however doesn't hold in general (it takes quite some efforts to check this statement). Similarly, while $A \vee \neg\ A \le \top$ is trivial, $\top \le A \vee \neg\ A$ cannot be proved in general without knowing something about $A.$
    By definition, $\neg A$ is the largest element inconsistent with $A.$ That is, $A\wedge\neg A \le \bot$ and $B\wedge\neg A \le \bot \implies B\le \neg A.$ The compliment $\overline{A}$ of $A$ is defined as the smallest element $B$ such that $\top\le A\vee\overline{A}.$ Hence $\top\le A\vee B$ implies $\overline{A}\le B.$ Note that i) $\neg A$ always exists while $\overline{A}$ does not; ii) it is reasonable to add the largest/smallest requirement to the definitions, for otherwise we can trivially choose $\neg\ A=\bot$ and $\overline{A}=\top$; iii) it is easy to check $\neg\ A \le \overline{A}$ (things in $\neg\ A$ are definitely not in $A$). The converse $\overline{A} \le \neg\ A$ however does not hold in general (the negation does not fill up the whole space).
    A Boolean algebra is also called a complemented Heyting algebra in that $\overline{A}$ exists for every $A$, with i) $\top \le A\vee \neg\ A$ (which implies $A\vee \neg\ A=A\vee \overline{A}$), and ii) $\overline{\overline{A}} \le A$ (note that $A \le \overline{\overline{A}}$ follows by definition). Heyting pre-algebras generalize Boolean pre-algebras in the sense that a Heyting pre-algebra satisfying $1 \le A \vee \neg A$ (law of excluded middle), or equivalently $\neg\neg A \le A$ (law of double negation), is a Boolean pre-algebra.
    Heyting algebras. An exponential $y^x$ is defined as $\overline{x}\vee y.$ A Heyting algebra can also be defined as a lattice with exponentials: $H$ is called a Heyting algebra if i) it has a preorder; ii) it has finite meets and joins, and iii) satisfies $y^x$ is the largest element $z$ such that $x \wedge z \le y.$
    Exercise. Show that every Heyting algebra is distributive.
    We give hints about proving $x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z).$ First, note that $z \wedge x \le y$ iff $z \le y^x.$ Hence, to prove direction $x \wedge (y \vee z) \le (x \wedge y) \vee (x \wedge z)$, it suffices to show $y \vee z \le ((x \wedge y) \vee (x \wedge z))^x$, which amounts to show that $u \le ((x \wedge y) \vee (x \wedge z))^x$ for $u=y, z.$ However, $u \le ((x \wedge y) \vee (x \wedge z))^x$ iff $u \wedge y \le (x \wedge y) \vee (x \wedge z).$ For $u=y, z$ this follows by definition. As to the other direction, note that $x \le y$ iff $\forall z.\,(y\le z$ $\implies x \le z).$ Hence, to prove $(x \wedge y) \vee (x \wedge z) \le x \wedge (y \vee z)$, it suffices to show that $(x \wedge y) \vee (x \wedge z) \le u$ implies $x \wedge (y \vee z) \le u$, which is equivalent to $y \vee z \le u^x.$ Note that $(x \wedge y) \vee (x \wedge z) \le u$ implies both $x \wedge y \le u$ and $x \wedge z \le u.$ Thus we have $y \le u^x$ and $z \le u^x$, that is, $y \vee z \le u^x.$
    As an alternative, you can also use the completeness theorem for IPT. Completeness assures that anything that holds in every Heyting algebra must be provable in IPT, and vice versa. Since preorder is interpreted to entailment in IPT, you can transfer e.g., $x \wedge (y \vee z) \le (x \wedge y) \vee (x \wedge z)$ to $X \wedge (Y \vee Z)\ true \vdash (X \wedge Y) \vee (X \wedge Z)\ true$, and then reason about the logical connectives to do the proof. You can further add proof terms to obtain a FP problem $\Gamma, x: A \times (B + C) \vdash M: (A \times B) + (A \times C).$ Resolving this problem amounts to writing a program $M$ of type $(A \times B) + (A \times C)$, and it is not hard to see that $M = \texttt{case}(z.\,\texttt{inl}(\langle \texttt{fst}(x), z\rangle);\, z.\,\texttt{inr}(\langle \texttt{fst}(x), z\rangle))\ \texttt{snd}(x).$ It is sort of a cyclelogical matter for people to transfer problems around the trinitarian when they are doing proofs.

    Wednesday, December 10, 2014

    A logical interpretation of the function subtyping rule

    We discussed the subtyping rule for function type in this post. In short, a function type is contravariant in the argument types and covariant in the result type. Several days ago, I happened to notice that the Monotonicity Theorem for implication (see p.70 in [1]) looks very similar to the aforementioned subtyping rule in form. Given propositions $p,p',q,q'$ such that $p'\Rightarrow p$ and $q'\Rightarrow q$, the theorem asserts $$p'\vee q' \Rightarrow p \vee q.$$ Besides, it is immediate to see that $p \vee q \equiv \neg q \Rightarrow p$ and $q' \Rightarrow q \equiv \neg q \Rightarrow \neg q'$, assuming the law of double negation. Now, by writing $p$, $p'$, $\neg q$ and $\neg q'$ as $P$, $P'$, $Q$ and $Q'$, respectively, we obtain \begin{equation} (Q' \Rightarrow P') \Rightarrow (Q \Rightarrow P) \label{subtyping} \end{equation} given that $P'\Rightarrow P$ and $Q\Rightarrow Q'$.

    The connection between the above fact and the subtyping rule of function types can be established by treating the implication connective "$\Rightarrow$" as both a function-type constructor and a subtype-of relation. Indeed, if Apple is a subtype of Fruit then being an Apple implies being a Fruit. It is thus reasonable to use "Apple $\Rightarrow$ Fruit" to express the subtyping relation between Apple and Fruit. In view of this, the logical entailment in \eqref{subtyping} leads us to the following statement for function types: If $P'$ is a subtype of $P$ (i.e. $P' \Rightarrow P$) and $Q$ is a subtype of $Q'$ (i.e. $Q \Rightarrow Q'$), then function type $Q' \Rightarrow P'$ is a subtype of function type $Q \Rightarrow P$ (i.e. $(Q' \Rightarrow P') \Rightarrow (Q \Rightarrow P)$). In other words, a function type is contravariant in the argument types and covariant in the result type.

    Reference

    1. Gries, David, and Fred B. Schneider. "A logical approach to discrete math." Springer, 1993.

    Monday, September 8, 2014

    A note on expectation transformer semantics

    An expectation is a function that maps program states to either a nonnegative real value, or a non-negative real-valued expression over state variables. When the program contains randomized operations, an expectation here is in fact a nonnegative random variable over the distribution induced by the reachability probability of program states. An expectation is called a post-expectation when it is to be evaluated in on final states. An expectation is called a pre-expectation if it is to be evaluated on initial states.

    Intuitively, we may regard a post-expectation as a random variable representing nonnegative "rewards" of final states. One can examine the program beforehand to estimate for each initial state the expected rewards when the program runs repeatedly from there. This estimate, called the weakest pre-expectation, is a nonnegative function that maps an initial state to the expected reward over the final states starting from that state. In other words, a weakest pre-expectation evaluates to the expected value of the post-expectation over the distribution of final states produced by program executions.

    Given a probabilistic program $prog$ and a post-expectation $postE$ over final states, we use $wp(prog,postE)$ to denote the weakest pre-expectation of $prog$ with respect to $postE$. As a function over state variables, $wp(prog,postE)$ maps an initial state of $prog$ to the expected value of $postE$ in the final distribution reached via executions of $prog$. In particular, $wp(prog,1)$ maps an initial state to the probability that the program will terminate from that state, and $wp(prog,[pred])$ is the probability that the program will terminate in a state satisfying predicate $pred$. (Here we use square brackets $[\cdot]$ to denote an indicator function, so that given a predicate $pred$, $[pred]$ evaluates to 1 on states satisfying $pred$ and evaluates to 0 otherwise.)

    Expectations are quantitative analogue to predicates in the predicate transformer semantics; they yield expectation transformer semantics of probabilistic programs. An expectation transformer is a total function between two expectations on the states of a program. The expectation transformer $wp(prog,postE)$ gives the least expected value of $postE$ over the distribution produced by the executions of $prog$. The annotation $\langle {preE} \rangle\; prog\;\langle {postE} \rangle$ holds for total correctness if and only if \begin{equation} preE\le wp(prog,postE),\label{eq:pre-wp-post} \end{equation} where $\le$ is interpreted in a point-wise manner. In other words, $preE$ gives in each initial state a lower bound for the expected value of $postE$ on final states when $prog$ starts from that initial state.

    In the special case when expectations $preE$ and $postE$ are given by $[pre]$ and $[post]$, respectively, where $pre$ and $post$ are predicates, annotation $\langle {preE} \rangle\; prog\;\langle {postE} \rangle$ is just the embedded form of a standard Hoare-triple specification $\{pre\}\; prog\;\{post\}$, i.e., we can assume the truth of $post$ after the execution of $prog$, provided $pre$ holds in the initial state. More precisely, we have \[ [pre]\le wp(prog,[post])\quad\mbox{iff}\quad pre\Rightarrow wp(prog,post). \]In this view, inequality between expectations in expectation transformer semantics can be seen as a generalization of implication between predicates in predicate transformer semantics.

    Reference

    McIver, Annabelle, and Charles Carroll Morgan. "Abstraction, refinement and proof for probabilistic systems." Springer, 2006.

    Monday, September 1, 2014

    A note on predicate transformers and loop invariants

    For standard programs, the computational model of execution supports a "logical" view—given a set of final states, we can examine the program to determine the largest set of initial states from which execution of the program is guaranteed to reach the final states. The sets of states are characterized by logical formula called predicates, and the program is being regarded as a predicate transformer, i.e., a total function that maps one predicate to another. We give some important terminology before we proceed:
    • An assertion is a predicate placed in a program to assert that it holds at that point
    • A precondition is an assertion placed before a program segment
    • A postcondition is an assertion placed after a program segment
    • A formula in form $\{precond\} \; program \; \{postcond\}$ is called a Hoare triple
    • A Hoare triple is valid iff whenever the precondition holds, the postcondition holds after the execution of the program segment terminates
    • A loop invariant is an assertion supposed to be true before and after each iteration of a loop
    The following example puts all ingredients together:
    // Precondition: (x=0 and y=0 and n≥0)
    while (y<n) { 
        // Loop invariant: 2*x = y*(y-1)
        x+=y; y++; 
    }
    // Postcondition: 2*x = n*(n-1)
    
    Observe that $\{x=0 \wedge y=0 \wedge n\ge0\}$ while(y<n){x+=y;y++;} $\{2x=n(n-1)\}$ is a valid Hoare triple and $2x = y(y-1)$ is an invariant for the while loop.

    Predicate transformer semantics provide an effective algorithm to verify a Hoare triple by reducing the verification problem to the problem of proving logical entailment: Given a Hoare triple $\{pre\} \; prog \; \{post\}$, we first compute the weakest precondition of executing $prog$ starting from $post$, and then check that it entails $pre$. This amounts to proving that $$pre \Rightarrow wp(prog, post).$$When $prog$ doesn't contain a loop, the weakest precondition is simply given by syntactic rules. However, the presence of loops may pose a problem because the precondition is given in terms of a least fixed point. Besides, a separate proof is required to establish that the loops terminate from their initial state. Formally, if $G$ is a predicate and $body$ is a loop-free program segment, then \begin{equation} wp({\rm while}(G)\{body\}, post) = \mu X.((G\Rightarrow wp(body, X)) \wedge (\neg G\Rightarrow post)),\label{loop-wp} \end{equation} where $\mu$ is the least fixed point operator w.r.t. logic implication. Given that the loop terminates and the least fixed point exists, the construction of the weakest precondition can be sketched in set theory as follows. Let $\Sigma$ denote the state space. We define a family $\{A_k:k\ge0\}$ of subsets of $\Sigma$ by induction over $k$: \begin{array}{rcl} A_0 & = & \emptyset \\ A_{k+1} & = & \left\{\ y \in \Sigma: ((G \Rightarrow wp(body, x \in A_k)) \wedge (\neg G \Rightarrow post))[x \leftarrow y]\ \right\} \\ \end{array} Informally, $A_k$ represents the set of initial states that makes the postcondition $post$ satisfied after less than $k$ iterations of the loop. We can then define the weakest precondition in \eqref{loop-wp} as a first-order predicate $\exists k. x \in A_k$.

    While an abstract construction as above is elegant, it can not be handled efficiently by theorem provers in practice. Fortunately, using special assertions called loop invariants we may by and large avoid the need of calculating fixed points. Suppose that we want to verify \begin{equation} pre \Rightarrow wp({\rm while} (G) \{body\}, post).\label{loop-hoare} \end{equation} Instead of calculating the least fixed point of the loop directly, we may divide the problem into simpler subtasks:
    1. Find an assertion $I$ such that $pre \Rightarrow I$ and $I\wedge\neg G \Rightarrow post$.
    2. Show that $I$ is an invariant: $I\wedge G \Rightarrow wp(body, I)$.
    3. Show that $I$ is sound: $I \Rightarrow wp({\rm while}(G)\{body\}, I\wedge\neg G)$.
    The first task ensures that the existence of $I$ establishes the desired relation between $pre$ and $post$. The second task considers a single iteration of the loop body and ensures that the execution of the loop preserves the validity of $I$. Note that it is easy to compute $wp(body, I)$ because $body$ is loop-free by assumption. In the third task, we check the soundness property such that every execution of the loop from a state satisfying the invariant can only terminate in a state that also satisfies the invariant and violates the guard $G$. It is still an open problem in the literature to give a necessary and sufficient condition for the soundness. On the other hand, soundness property can be established if one can show, e.g., by using a loop variant, that the loop terminates when it started from a state satisfying $I$.

    With invariant $I$, the relation between $pre$ and $post$ in \eqref{loop-hoare} is established as $$pre \Rightarrow I \Rightarrow wp({\rm while}(G)\{body\}, I\wedge\neg G)\quad {\rm and }\quad I\wedge\neg G \Rightarrow post.$$ Checking correctness for code without loops turns out be easy in practice. Hence, the crucial point in verifying a Hoare triple is to discover the necessary loop invariants for each loop in the program.

    References

    Gries, David. "The science of programming." Vol. 1981. Heidelberg: Springer, 1981.

    Friday, July 4, 2014

    Constructive Logic: The Law of the Excluded Middle

    In constructive logic, a proposition $P$ is called decidable if $P\vee\neg P \;true$ is provable constructively. That is, either we can write a proof to show $P \;true$, or we can write a proof to show $\neg P \;true$. Since $P$ might express an open problem for which we have neither a proof nor a refutation at present, not every proposition is decidable in the constructive sense. Hence, the Law of the Excluded Middle (LEM) cannot be expected to hold universally in constructive logic as it does in classic logic, where every proposition is decidable under the truth table interpretation.

    Note that this doesn't mean constructive logic refutes LEM. In fact, we are free to assume LEM in constructive logic without fear of causing inconsistency. Even when $P$ is not known to be decidable, it may still be proved constructively with the proviso that $P\vee\neg P \;true$ holds. Adding such an assumption is always safe, precisely because constructive logic does not refute any instance of LEM. In fact, classical logic can be regarded as a special case of constructive logic in which LEM holds for all propositions. Proofs in classical logic may, and often do, rely implicitly on the universal validity of LEM, whereas in constructive logic we often prove without it. Hence, any proposition provable in constructive logic is also provable in classical logic.

    What if one attempts to prove or disprove LEM, i.e., $P\vee\neg P \;true$, in constructive logic? In order to prove $P\vee\neg P \;true$, we must either prove $P \;true$ or prove $\neg P \;true$. However, we cannot expect to do this for arbitrary proposition $P$ in the absence of further information about it. Similarly, the only option to prove $\neg (P\vee\neg P) \;true$ is to assume either $P \;true$ or $\neg P \;true$, which leads to $P\vee\neg P \;true$, and then deduce absurdity. This is also impossible, for doing so for arbitrary $P$ means refuting all propositions or their negations. Hence, constructive logic neither affirms nor refutes LEM.

    Having known that one cannot prove either $P\vee\neg P \;true$ or $\neg (P\vee\neg P) \;true$, it may be surprising that $\neg \neg (P\vee\neg P) \;true$ is in fact provable. Here is an outline of the proof:
    1. Note that $\neg Q$ is a syntactic abbreviation of $Q\rightarrow \perp$. Hence, $\neg Q\;true$ can be proved by assuming $Q \;true$ and deriving a contradiction (i.e. $\perp \;true$) from the assumption. Also recall that a contradiction can be derived by assuming $Q\;true$ and $\neg Q\;true$, which is known as the Principle of Explosion.
    2. The strategy to prove $\neg \neg (P\vee\neg P) \;true$ is then to derive $P\vee\neg P \;true$ assuming $\neg (P\vee\neg P) \;true$, since $\neg (P\vee\neg P)\;true\Rightarrow P\vee\neg P \;true$ implies $\neg (P\vee\neg P)\Rightarrow \neg (P\vee\neg P)\;true, (P\vee\neg P)\;true$, which further implies $\neg (P\vee\neg P)\;true\Rightarrow\perp\;true$, i.e., the assumption leads to a contradiction.
    3. Assume that $\neg (P\vee\neg P) \;true$. If $P \;true$ is provable, then so is $P\vee\neg P \;true$. Since this leads to a contradiction, we have $P\;true\Rightarrow\perp\;true$, i.e., $\neg P \;true$ is provable. But again, this implies $P\vee\neg P \;true$ is provable, a contradiction. Since $\neg (P\vee\neg P) \;true$ always leads to contradictions, we have $\neg (P\vee\neg P)\;true \Rightarrow \perp\;true$, i.e., we found a proof of $\neg \neg (P\vee\neg P) \;true$.
    The derivation described above can be formally written as follows [2, 3]:
    $$ \genfrac{ }{ }{0.5}{0} { \genfrac{ }{ }{0.5}{0} { \genfrac{ }{ }{0.5}{0}{ } {\Gamma\vdash\neg(P\vee\neg P)\;true} R^u \quad \genfrac{ }{ }{0.5}{0} {\genfrac{ }{ }{0.5}{0}{ }{\Gamma\vdash P\;true} R^v} {\Gamma\vdash (P\vee\neg P)\;true} \vee\!{-}I{-}L } { \genfrac{ }{ }{0.5}{0} {\Gamma \vdash\!\bot\; true} { \genfrac{ }{ }{0.5}{0}{ }{\Gamma\vdash\neg(P\vee\neg P)\; true} R^u \quad \genfrac{ }{ }{0.5}{0} {\;\Gamma\vdash\neg P\;true} {\;\Gamma\vdash P\vee\neg P\; true} \vee\!{-}R } \neg I^v } } { \genfrac{ }{ }{0.5}{0} {\Gamma\vdash\!\bot\;true} {\neg\neg(P\vee\neg P)\;true} \neg{-}I^u } $$
    The fact that $\neg \neg (P\vee\neg P) \;true$ is provable means, while it is consistent to add LEM to constructive logic, we cannot add the refutation of LEM without degenerating the logic into inconsistency. Recall that, while LEM is not provable in general, we can assume it in a proof without fear of contradiction. In contrast, since the double negation of LEM is provable for every proposition, assuming the denial of LEM would soon lead us to absurdity.

    As is shown in the foregoing argument, $P \;true$ is in general not provable even though $\neg\neg P \;true$ is provable. On the other hand, it is straightforward to show that $P\;true\Rightarrow\neg\neg P\;true$. Therefore, $\neg\neg P$ is a proposition strictly weaker than $P$ itself in constructive logic. Namely, being able to reject any refutation of $P$ does not mean we actually know a proof of $P$, but being able to prove $P$ suffices to reject any refutation of it by  absurdity. It is possible to interpret classical logic in terms of constructive logic by doubly negating all statements, rendering them provable in constructive logic. From this perspective, constructive logic is more powerful than classical logic, in that it can distinguish constructive theorems from non-constructive ones whereas classical logic cannot.

    References and further readings

    1. Practical Foundations for Programming Languages, Robert Harper, 1st ed., 2012
    2. Supplements of Constructive Logic (Spring 2005), taught by Robert Harper
    3. Handouts of Constructive Logic (Fall 2009), taught by Frank Pfenning
    4. On the Meanings of the Logical Constants and the Justifications of the Logical Laws, Per Martin-Lof
    5. Truth of a Proposition, Evidence of a Judgement, Validity of a Proof, Per Martin-Lof
    6. A compilation of lecture notes on Intuitionistic Type Theory given by Per Martin-Lof, and a friendlier introduction to ITT by Silvio Valentini.
    7. Intuitionistic Logic, Dirk van Dalen

    Thursday, May 29, 2014

    Equivalence Checking in Lambda Calculus

    Definitional equivalence checking is the problem of determining whether two λ-terms in λ-calculus are equivalent in definition. (By "definitional", we mean that equivalence is defined directly by rules, rather than by an appeal to an operational semantics.) Equivalence of λ-terms in untyped λ-calculus is undecidable, see Scott's Theorem in [3]. This result is not surprising, since untyped λ-calculus is Turing complete. On the other hand, it is decidable for simply typed λ-calculus (aka. simple type theory, STT) with a single base type, see Sec. 6 of [4]. Equivalence checking would become undecidable once we extend STT with dependent types such as lists, or recursive data types such as natural numbers, see Chap 4 and 5 of [5].

    It is interesting to check the weaker relation that whether two λ-terms are equivalent in operational semantics, i.e., they reduce to the same value given the same context. For example, $\lambda x.x:\;Nat\rightarrow Nat$ and $\lambda x.pred(suc\ x): Nat\rightarrow Nat$ are not definitionally equivalent but they are operationally equivalent. For untyped λ-calculus this problem is clearly undecidable, but for STT I am not sure. I haven't found enough materials to say anything on this subject by far.

    References

    1. Notes on Simply Typed Lambda Calculus (another lecture note)
    2. Simply Typed Lambda Calculus on Wiki
    3. Practical Foundations for Programming Languages (Preview of the 1st ed.)
    4. Advanced Topics in Types and Programming Languages (available online)
    5. http://www.cs.cmu.edu/~rwh/courses/logic/www/handouts/logic.pdf
    Related Posts Plugin for WordPress, Blogger...