This is ericpony's blog

Wednesday, November 15, 2017

Liveness to safety reduction with predicate abstraction

An automata-based approach for LTL model checking

Linear-time Temporal Logic (LTL) can be model checked for finite-state systems using symbolic automata-based techniques [6]. Given a transition system $S$ and an LTL formula $φ$, the model checking problem (denoted as $S ⊨ φ$) asks whether $φ$ is true on all infinite execution paths of $S$. A classical automata-based approach [6] toward this problem consists of building a Buchi automaton $S_{¬φ}$ with a fairness condition $f_{¬φ}$ such that $S ⊨ φ$ iff $S × S_{¬φ} ⊨ FG\ ¬f_{¬φ}$. Therefore, when we model check an LTL formula, we may assume without loss of generality that the automata-based transformation has been applied and consider the problem in the form $S ⊨ FG\ ¬f_{¬φ}$. In the following, we shall drop the subscript $¬φ$ and simply regard $FG\ ¬f$ as the target property to verify.

Liveness-to-safety reduction for finite-state systems

Liveness-to-safety reduction [4, 5] is a technique to translate an LTL model checking problem to an invariant verification problem. The invariant encodes the absence of counterexamples to $FG\ ¬f$, namely, the system does not have a computation that visits some state $s ⊨ f$ infinitely often. The reduction algorithm augments the states of the system with flags seen, triggered, and loop, as well as a fresh copy of the state variables. It then simulates the system execution and in the meantime guesses the entry point of a cycle by non-deterministically flagging a state as "seen" and saving the valuation of the state variables. An execution will be "triggered" if it encounters a state $s ⊨ f$ after having flagged a state as "seen". Finally, a loop is detected if a triggered execution enters a seen state. The safety property thus specifies that no loop is detected on any execution. This reduction is sound and complete for finite-state and parameterised systems, since a liveness property is violated in such systems iff a lasso-shaped counterexample is present.
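To make the construction concrete, the following Python sketch expresses the augmentation for an explicit-state system. The interface (init, next_states, fair) and the name encode_l2s are our own illustrative choices; the encoding in [4, 5] is symbolic rather than explicit-state.

def encode_l2s(init, next_states, fair):
    """Liveness-to-safety reduction for a finite-state system (sketch).

    init        : iterable of initial concrete states (hashable)
    next_states : function mapping a state to its successors
    fair        : predicate marking the states that satisfy f

    Returns (aug_init, aug_next, bad): an augmented state is a tuple
    (s, saved, seen, triggered), and bad(aug) holds iff a triggered
    execution re-enters the saved state, i.e. a fair lasso has been found."""
    def aug_init():
        for s in init:
            yield (s, None, False, False)
            yield (s, s, True, fair(s))        # the loop may start at an initial state

    def aug_next(aug):
        s, saved, seen, triggered = aug
        for t in next_states(s):
            # keep the current guess about the loop entry point ...
            yield (t, saved, seen, triggered or (seen and fair(t)))
            # ... or non-deterministically flag t as "seen" and save it
            if not seen:
                yield (t, t, True, fair(t))

    def bad(aug):
        s, saved, seen, triggered = aug
        return triggered and s == saved

    return aug_init, aug_next, bad

Checking that no bad augmented state is reachable (e.g. by a simple breadth-first search) then decides $FG\ ¬f$ for a finite-state system.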

Liveness-to-safety reduction for infinite-state systems

The above liveness-to-safety reduction is unsound for infinite-state systems where an execution can violate a liveness property without visiting the same state twice. To overcome this problem, the authors of [1] propose to incorporate the notion of predicate abstraction in the reduction. The abstraction is supposed to be coarse enough to map a real counterexample (i.e. a path from an initial state that visits the fairness condition infinitely often) to a lasso-shaped abstract path, but at the same time precise enough not to admit spurious abstract cycles. The soundness of this approach is based on the fact that existential abstraction preserves any temporal logic without existential quantification over paths, e.g., $\forall$CTL$^*$ and LTL [7, 8].

Overview of the reduction procedure

Below we introduce the liveness-to-safety reduction proposed in [1] for infinite-state systems. Given a system $S$ and an LTL property $FG\ ¬f$, the procedure generates a sequence of safety verification problems $$S_0 ⊨_{inv} φ_0 ,\quad S_1 ⊨_{inv} φ_1,\quad \dots.$$ We shall denote the encoding of the $i$-th safety property as $$S_i, φ_i := {\rm encode}(S, f, P_i, W_i),$$ where $P_i$ is a set of state predicates, and $W_i$ is a set of well-founded relations. Furthermore, the encoding ensures that $S ⊨ FG\ ¬f$ holds if $S_i ⊨_{inv} φ_i$ for some $i$; here $φ_i$ essentially states that cycles visiting states that satisfy $f$ are unreachable. The iteration terminates if some generated safety property is satisfied. Otherwise, suppose we have found $S_i \not⊨_{inv} φ_i$ for some $i$. We analyse a finite counterexample path $π$ of $S_i$ to determine whether it corresponds to an infinite real counterexample for $FG\ ¬f$ in $S$. If so, then we conclude that the property does not hold. Otherwise, if we can conclude that $π$ does not correspond to any real counterexample in $S$, we try to extract a set of new predicates $P'$ and/or well-founded relations $W'$ to produce a refined encoding: $$S_{i+1}, φ_{i+1} := {\rm encode}(S, f, P_i ∪ P', W_i ∪ W'),$$ where $P', W' := {\rm refine}(S_i, π, P_i, W_i)$. It is possible that the algorithm can neither confirm nor refute the existence of real counterexamples, in which case it aborts and returns “unknown”. The algorithm might also diverge and/or exhaust resources in various intermediate steps (e.g. in checking $S_i ⊨_{inv} φ_i$ or during refinement).
In the following, we shall describe the encoding and refinement procedures in more detail. We begin with a simplified version that only uses predicates from $P$, i.e., $W = ∅$. We then describe how to extend the encoding to also exploit well-founded relations.
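The overall loop can be summarised by the following Python skeleton. The procedures encode, check_invariant, concretise and refine are passed in as parameters because they stand for the non-trivial sub-procedures described in this post; the skeleton only sketches the control flow, not the actual implementation of [1].

def verify_fg_not_f(S, f, encode, check_invariant, concretise, refine,
                    P0=frozenset(), W0=frozenset(), max_iters=50):
    """Refinement loop of the liveness-to-safety procedure (sketch only)."""
    P, W = set(P0), set(W0)
    for _ in range(max_iters):
        S_i, phi_i = encode(S, f, P, W)
        holds, pi = check_invariant(S_i, phi_i)   # pi is a counterexample when not holds
        if holds:
            return "valid"                        # S |= FG !f
        if concretise(S, f, pi):                  # pi corresponds to a real infinite counterexample
            return "invalid"
        new_P, new_W = refine(S_i, pi, P, W)
        if not new_P and not new_W:
            return "unknown"                      # no new predicates or relations found
        P |= set(new_P)
        W |= set(new_W)
    return "unknown"                              # iteration budget exhausted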

Counterexamples and refinement

Consider a safety checking problem $S_i ⊨_{inv} φ_i$. Each state of $S_i$ is a state of $S$ augmented with flags seen, triggered, and loop as in the finite-state case. However, instead of comparing the valuations of the state variables, the augmented system $S_i$ compares the bit-vectors induced by the predicates $P_i$ to determine the equivalence between two states. A counterexample $π$ for a safety property $S_i ⊨_{inv} φ_i$ is a finite sequence of tuples where each tuple consists of an abstract state and a concrete state. The projection of the sequence to abstract states forms a lasso-shaped abstract path, and the projection to concrete states recovers the execution from which this counterexample is obtained. When the invariant is violated, we need to make sure that the obtained counterexample does not result from insufficient precision of the abstraction induced by $P_i$. To do this, the algorithm first searches for a concrete lasso witnessing a real violation of the LTL property, using standard bounded model checking. If this fails, it tries to prove that the abstract lasso is infeasible by checking an increasing number of finite unrollings of the loop. Upon infeasibility, new predicates can be extracted from sequence interpolants with standard predicate refinement strategies for invariant properties [3].

Enhancement with well-founded relations

In the refinement operation described above, new predicates are extracted at the point when the unrolling of a spurious abstract counterexample becomes infeasible. However, it is possible that all finite unrollings of the abstract loop are feasible, but there is no concrete path executing the loop for an infinite number of times. If this happens, the refinement step might not terminate even though the abstract path cannot be concretised. The following example illustrates one such situation.

Example. Suppose that we want to verify an LTL property $FG\ ¬(x≤y)$ against a transition system $$S=(ℕ \times ℕ, \{(0,y) \mid y≥0\}, \{(x,y) \to (x+1, y)\}).$$ Let $P_0 = \{(x≤y), (y=0)\}$ be the initial set of predicates. Then $S_0 ⊨_{inv} \varphi_0$ admits an abstract counterexample path through the self loop at $\langle(x≤y), ¬(y=0)\rangle$. Unrolling this abstract path, however, will not terminate: any path formula obtained by unrolling the abstract loop $i$ times is feasible by starting from the concrete initial state $(0,i).$

To address the spurious abstract paths whose finite prefixes are all feasible, we extend the encoding to incorporate well-founded relations over the concrete states. The idea is to block these spurious abstract paths by finding suitable termination arguments in the form of (disjunctive) well-founded transition invariants. More precisely, given a finite set $W$ of well-founded relations, we encode $S, φ$ such that a lasso-shaped abstract path leads to a counterexample only if the loop contains at least two concrete states not covered by any relation in $W$. The resulting abstraction is sound as it preserves all concrete infinite paths. Indeed, if there is an infinite path such that any two states on it are covered by some relation in $W$, then by Ramsey's Theorem, there must be a relation in $W$ admitting an infinite descending chain, a contradiction. Below we show how spurious counterexamples can be removed by a suitably chosen well-founded relation.

Example. Consider the spurious abstract counterexample in the previous example. Projecting its loop to concrete states leads to two states $(a,b)$ and $(a+1,b)$ for some integers $a ≤ b$. Including relation $R := \{((x,y), (x',y)) \mid x < x' ≤ y  \}$ in $W$ will remove the path from the abstraction and prove the desired LTL property.

k-liveness as a well-founded relation. There are various heuristics to come up with new abstraction predicates and well-founded relations for refinement (see [1] for the details). That said, it is still possible that at some point all heuristics get stuck, failing to produce new predicates or relations to remove a spurious counterexample. In this situation, we may exploit the notion of k-liveness in the hope of making further progress. Intuitively, we can maintain an auxiliary integer variable $k$ in the transition system to count the occurrences of $f$. With this counter, the refinement may always discover a new well-founded relation $R := \{ (s, s') ∣ k < k' ≤ n \}$ where $n$ is the number of occurrences of $f$ in the current counterexample. By adding this relation, we allow the procedure to progress by blocking all counterexamples where the number of occurrences of $f$ is at most $n$. This heuristic is sound since $f$ occurs infinitely often in a real counterexample.
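A minimal sketch of the counter-based encoding, with illustrative names (an actual tool would maintain the counter symbolically):

def add_k_counter(next_states, fair):
    """Augment a system with a counter k recording how often f has occurred (sketch)."""
    def aug_next(aug):
        s, k = aug
        for t in next_states(s):
            yield (t, k + 1 if fair(t) else k)
    return aug_next

def k_liveness_relation(n):
    """The well-founded relation R described above: it covers a pair of augmented
    states iff the counter strictly increases but stays below the bound n."""
    def R(src, dst):
        (_, k), (_, k2) = src, dst
        return k < k2 <= n
    return R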

References

1. Infinite-state Liveness-to-Safety via Implicit Abstraction and Well-founded Relations
2. Counterexample-guided Abstraction Refinement for Symbolic Model Checking
3. Abstractions from Proofs
4. Liveness Checking as Safety Checking
5. Liveness Checking as Safety Checking for Infinite State Spaces
6. An Automata-Theoretic Approach to Linear Temporal Logic
7. Model Checking and Abstraction
8. Property preserving abstractions for the verification of concurrent systems

Friday, October 20, 2017

Verifying liveness property for array systems

Numerous results in the literature of regular model checking have focused on parameterised concurrent systems, which consist of a parameterised number of identical concurrent processes. However, only a handful of these results have addressed parameterised systems allowing concurrent infinite-state processes, such as processes operating on variables over an unbounded data domain. Such systems are also known as parameterised array systems, since a parameterised number of processes can easily be encoded as a parameterised array containing the local states of the processes. Existing results for verifying such systems have mostly dealt with safety properties. For liveness properties, it is sometimes possible to apply an ad-hoc finitary abstraction so that the system under consideration can be handled in the classical context of regular model checking. The following example demonstrates how finitary abstraction may be applied to verify an array system.

The Chang-Roberts algorithm is a leader election protocol for a clockwise directed ring of processes where each process has a unique ID. Initially, all processes are active. The algorithm starts from one initiator sending a message to the next process in the ring, tagged with its ID. When an active process with ID $p$ receives a message tagged with $q$, there are three cases: i) $q < p$ : then $p$ drops the message. ii) $q > p$ : then $p$ becomes passive, and passes on the message. iii) $p = q$ : then $p$ becomes the leader. A passive process simply forwards every message it receives. If a process receives a message while it is holding one, the message with the larger ID replaces the other one.
The idea behind the Chang-Roberts algorithm is that only the message with the highest ID will complete the round trip and finish the election by the third case. The parameterised version of the algorithm has two sources of infinity: the number of processes and the number of IDs. However, observe that we do not need the exact ID to run the election correctly---it suffices to know which ID is the highest. Hence, we can apply an abstraction that maps the highest ID to 1 and the other IDs to 0. It is clear that a process is elected as the leader in the concrete system iff it is so in the abstracted system. Verification of the liveness property "a leader will eventually be elected" then reduces to a fair termination problem amenable to off-the-shelf regular model checking tools.
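As a sanity check, the following Python sketch simulates a multi-initiator variant of the protocol, in which every process initially circulates its own ID. This deviates slightly from the single-initiator description above, but illustrates the key point that only the highest ID survives a full round trip.

import random

def chang_roberts(ids, seed=0):
    """Simulate a multi-initiator variant of the Chang-Roberts election (sketch).

    Every process starts by sending its own ID to its clockwise neighbour.
    An active process drops smaller IDs, becomes passive and forwards larger
    ones, and is elected when its own ID completes the round trip."""
    n = len(ids)
    rng = random.Random(seed)
    active = [True] * n
    inbox = [[ids[(i - 1) % n]] for i in range(n)]   # message from the counter-clockwise neighbour
    while True:
        ready = [i for i in range(n) if inbox[i]]
        i = rng.choice(ready)                        # fair but otherwise arbitrary scheduling
        q = inbox[i].pop()
        if q == ids[i]:                              # own ID returned: i is the leader
            return i
        if q > ids[i]:                               # a larger ID: become passive and forward
            active[i] = False
            inbox[(i + 1) % n].append(q)
        elif not active[i]:                          # passive processes forward every message
            inbox[(i + 1) % n].append(q)
        # otherwise (active and q < ids[i]): drop the message

assert chang_roberts([3, 7, 2, 5]) == 1              # the process holding the highest ID wins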
Unfortunately, not every parameterised array system can be successfully simplified using a finitary abstraction. The following example provides evidence for this fact.

Dijkstra's self-stabilising mutual exclusion algorithm assumes a clockwise directed ring of processes $p_0,\dots,p_{N-1}$ with $N\ge 2$. Each process $p_i$ holds an integer variable $x_i$ that can be read by the process $p_{i-1~{\rm mod}~N}$ in the ring. The value of each $x_i$ ranges over $\{0, \dots, K-1\}$, where $K\ge N$. A process is privileged if one of the following two conditions holds:
  • For $i>0$, $p_i$ is privileged if $x_i \neq x_{i-1}$.
  • $p_0$ is privileged if $x_0 = x_{N-1}$.
A privileged process $p_i$ is allowed to update its local variable $x_i$, causing the loss of its privilege:
  • For $i>0$, $p_i$ can copy the value of $x_{i-1}$ to $x_i$ if $x_{i-1} \neq x_i$.
  • $p_0$ can set $x_0 \leftarrow (x_0+1)~{\rm mod}~K$ if $x_0 = x_{N-1}$.
Dijkstra's algorithm is self-stabilising in that the system eventually converges to states where exactly one process is privileged. As before, the parameterised version of Dijkstra's algorithm has two sources of infinity: concurrency and numeric data type. Nevertheless, the liveness property of Dijkstra's algorithm cannot be as easily verified with a finitary abstraction as in the previous example. To see this, consider an abstraction function $\alpha$ that maps the concrete values of the variables $x_0,\dots,x_{N-1}$ to a finite abstract domain. Then for sufficiently large $K$, there is $k\in\{0,\dots,K-1\}$ such that $\alpha(k)=\alpha(k+1)$. Under this abstraction, however, any concrete state where $x_0,\dots,x_{N-1}\in\{k,k+1\}$ will result in a spurious fair path that does not stabilise. Clearly, such spurious counterexamples would appear in any finitary abstraction of the data domain. What if we keep the data intact but instead apply finitary abstraction to the underlying topology? Unfortunately, it turns out that two incomparable processes alone suffice to produce a spurious counterexample.
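For intuition about the protocol itself, the following Python sketch runs Dijkstra's ring under a random central daemon (one privileged process fires at a time) and checks that the run reaches a state with exactly one privilege. It is a simulation of one schedule, not a proof.

import random

def dijkstra_step(xs, K, rng):
    """Fire one privileged process of Dijkstra's ring, chosen by a random daemon.
    Returns the list of processes that were privileged before the move."""
    N = len(xs)
    privileged = ([0] if xs[0] == xs[N - 1] else []) + \
                 [i for i in range(1, N) if xs[i] != xs[i - 1]]
    i = rng.choice(privileged)          # at least one process is always privileged
    if i == 0:
        xs[0] = (xs[0] + 1) % K
    else:
        xs[i] = xs[i - 1]
    return privileged

def stabilises(xs, K, max_steps=10000, seed=0):
    """Run the protocol from the given state and report whether it reaches a
    state with exactly one privileged process within max_steps moves (sketch)."""
    rng, xs = random.Random(seed), list(xs)
    for _ in range(max_steps):
        if len(dijkstra_step(xs, K, rng)) == 1:
            return True
    return False

assert stabilises([2, 0, 3, 1, 4], K=5)   # N = 5 <= K, as the algorithm requires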

Instead of verifying the desired property as a whole, we may break it into lemmas and prove them separately. More precisely, we can decompose the liveness property of Dijkstra's algorithm into three lemmas:

Lemma 1: The value of $x_0$ is updated infinitely often.

Lemma 2: Eventually $x_0 \neq x_i$ for all $i \neq 0$.

Lemma 3: Eventually $p_0$ is the only privileged process.

Lemma 1 implies Lemma 2 due to the assumption that $N \le K$ and the pigeonhole principle. Lemma 2 implies Lemma 3 by a simple propagation argument. With appropriate abstraction, these two implication relations can be modelled as liveness properties of simple token-passing rings and are amenable to standard liveness checkers (e.g. [1]) for regular model checking. Below we shall concentrate on how to automate the proof of Lemma 1.

We first establish an invariant "it always holds that at least one process is privileged". This invariant can be seen by abstracting the value of each $x_i$ to EQ if $x_i = x_0$, and to NE if $x_i \neq x_0$. The invariant then follows from the fact that an abstract state is either of the form [EQ...EQ], where $p_0$ is privileged, or of the form [...EQ NE...], where $p_i$ is privileged for some $i\ge 1$.

With the above invariant, we know that the system never gets stuck. Now we show that $x_0$ is updated infinitely often. Given an infinite run of the system, consider a *maximal* segment $\pi$ of the run such that the privilege status of $p_0$ does not change on $\pi$ (i.e. $p_0$ is privileged throughout $\pi$, or unprivileged throughout $\pi$). We argue that the length of $\pi$ is finite. Consider abstract states w ∈ (P+N)* such that w[i] = P if $p_i$ is privileged, and w[i] = N if $p_i$ is not privileged. Then the transitions on $\pi$ are abstracted to

x·P·N·y → x·N·P·y
x·P·P·y → x·N·P·y
x·P·P·y → x·N·N·y
x·P     → x·N

where x ∈ (P+N)(P+N)* and y ∈ (P+N)*. Now it is clear that $\pi$ is finite, since each abstract transition either moves a P rightward, or eliminates at least one P from the state. Therefore the privilege status of $p_0$ will change infinitely often on any infinite run, which implies that $x_0$ is updated infinitely often.
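The claim that each abstract transition either moves a P rightward or eliminates a P (and hence that $\pi$ is finite) can be checked mechanically on short words. The following sketch uses a ranking function of our own choosing, in which each P at position i contributes len(w) - i, and verifies that every rule application strictly decreases it.

from itertools import product

def rank(w):
    """Ranking function: each P at position i contributes len(w) - i."""
    return sum(len(w) - i for i, c in enumerate(w) if c == "P")

def successors(w):
    """Apply the four abstract rewrite rules above at every applicable position."""
    n = len(w)
    for i in range(1, n):                        # x is non-empty, so position 0 is never rewritten
        if w[i] == "P" and i + 1 < n and w[i + 1] == "N":
            yield w[:i] + "NP" + w[i + 2:]
        if w[i] == "P" and i + 1 < n and w[i + 1] == "P":
            yield w[:i] + "NP" + w[i + 2:]
            yield w[:i] + "NN" + w[i + 2:]
        if w[i] == "P" and i == n - 1:
            yield w[:i] + "N"

# every rule application strictly decreases the rank, so any segment on which
# p_0's privilege status is unchanged must be finite
for n in range(2, 9):
    for w in map("".join, product("PN", repeat=n)):
        assert all(rank(v) < rank(w) for v in successors(w))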

As a result, we can reduce Lemma 1 to a safety property and a termination property of simple token ring protocols, and verify them separately with safety verifiers (e.g. [2]) and termination verifiers (e.g. [3]).

References

2. Learning to Prove Safety over Parameterised Concurrent Systems
3. Fair Termination for Parameterized Probabilistic Concurrent Systems

Thursday, August 10, 2017

Logic grid puzzles and Beth's Definability Theorem


Logic Grid Puzzles are popular in magazines and newspapers. In such puzzles, the reader is asked to deduce an answer to a question from a set of clues in some scenario. For example, the scenario may be as follows: five guys are having drinks in a room. The clues are often information like "They all come from different countries", "The British is in blue" or "People in red don't drink alcohol". The question may be "What does the German drink?". The reader usually tries to deduce the answer by filling out a matrix (a logic grid) with the clues.
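For illustration, a tiny instance of such a puzzle can be solved by brute force in Python. The three-person scenario and the clues below are made up for this sketch; the point is that the implicit characterisation (the clues) determines a unique explicit answer.

from itertools import permutations

nationalities = ["British", "German", "French"]
drinks        = ["tea", "beer", "water"]

def consistent(nat, drk):
    """The (hypothetical) clues, phrased as constraints on an assignment."""
    who = {name: i for i, name in enumerate(nat)}
    return (drk[who["British"]] == "tea"         # clue 1: the British drinks tea
            and drk[who["German"]] != "water"    # clue 2: the German does not drink water
            and drk[who["French"]] != "beer")    # clue 3: the French does not drink beer

# enumerate all assignments of nationalities and drinks to the three persons;
# the clues determine the German's drink uniquely, giving an explicit answer
answers = {d[n.index("German")]
           for n in permutations(nationalities)
           for d in permutations(drinks)
           if consistent(n, d)}
assert answers == {"beer"}                       # the German drinks beer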
Puzzles of this kind involve turning an implicit characterisation (i.e. the clues) into an explicit one (i.e. a direct answer to the question). In formal logic, we may define a relation explicitly with a formula. More precisely, given a theory $S$ (i.e. a set of sentences), an n-ary relation $R(x_1,...,x_n)$ is explicitly defined by $\Phi$ w.r.t. $S$ if$$S \models \forall x_1,...,x_n: R(x_1,...,x_n)\equiv\Phi(x_1,...,x_n),$$where $\Phi(x_1,...,x_n)$ is a formula that doesn't refer to $R$. (If $A,B$ are theories, $A\models B$ means that every model of $A$ is also a model of $B.$) In contrast, a theory referring to a relation may not uniquely determine that relation. Intuitively, a theory characterises a relation $R$ iff we cannot find two models of the theory that differ only in their interpretations of the symbol $R$. More formally, if $S$ is a theory of vocabulary $\tau$, then $S$ implicitly defines a non-logical symbol $R\in\tau$ iff for all structures $A$ of vocabulary $\tau\setminus\{R\}$, there exists at most one relation $R^A$ over the domain of $A$ such that $A$ becomes a model of $S$ by interpreting $R$ as $R^A$. There are several equivalent ways to express this concept, but we shall use the following one for convenience of proof:
Definition. A theory $S$ implicitly defines an n-ary relation $R(x_1,...,x_n)$ if$$S\cup S' \models \forall x_1,...,x_n: R(x_1,...,x_n)\equiv R'(x_1,...,x_n),$$where $S'$ is obtained by replacing every occurrence of $R$ in $S$ with a fresh relation symbol $R'$.
If $R$ is explicitly defined by $\Phi$ w.r.t. $S$, then it is clear that $R$ can also be implicitly defined by $S$. The converse was proved by Beth for first-order logic.
Theorem. (Beth's Definability Theorem for FOL) If a relation is implicitly definable in first-order logic, then it can also be defined explicitly.
Beth's result is a corollary of Craig's Interpolation Theorem.
Lemma. If $A$ is a theory and $A\models \varphi$, then there is a finite set $A'\subseteq A$ such that $A'\models \varphi.$
Proof of lemma. Note that $A\models \varphi$ if and only if $A\cup \{\neg \varphi\}$ is unsatisfiable. By the compactness theorem, there is a finite subset $E\subseteq A\cup \{\neg \varphi\}$ that is unsatisfiable. If $\neg \varphi\not\in E$ then let $A' = E$. Otherwise, let $A' = E\setminus\{\neg \varphi\}$. Then $A'$ is the desired set.
Proof of Beth's theorem. Suppose we have$$S\cup S' \models \forall x_1,...,x_n: R(x_1,...,x_n)\equiv R'(x_1,...,x_n).$$By the above lemma, we can assume w.l.o.g. that $S,S'$ are finite. Let $X=\bigwedge S$ and $X'=\bigwedge S'$. Our assumption implies that $$X\wedge X' \rightarrow \forall x_1,...,x_n: R(x_1,...,x_n)\equiv R'(x_1,...,x_n)$$is valid. By introducing fresh variables $p_1,...,p_n$, we can rewrite the above sentence to $X\wedge X' \rightarrow (R(p_1,...,p_n)\equiv R'(p_1,...,p_n))$, which is further equivalent to$$(X\wedge R(p_1,...,p_n)) \rightarrow (X' \rightarrow R'(p_1,...,p_n)).$$Since $X$ doesn't contain $R'$, and $X'$ doesn't contain $R$, by Craig's theorem there exists a formula $\Phi(p_1,...,p_n)$ containing neither $R$ nor $R'$, such that both$$(X\wedge R(p_1,...,p_n)) \rightarrow \Phi(p_1,...,p_n) \quad\mbox{and}\quad \Phi(p_1,...,p_n) \rightarrow (X' \rightarrow R'(p_1,...,p_n))$$ hold. Note that the latter can be rewritten to $\Phi(p_1,...,p_n) \rightarrow (X \rightarrow R(p_1,...,p_n))$ by replacing every occurrence of $R'$ with $R.$ It follows from the currying law that$$X \rightarrow (R(p_1,...,p_n) \rightarrow \Phi(p_1,...,p_n))\quad\mbox{and}\quad X \rightarrow (\Phi(p_1,...,p_n) \rightarrow R(p_1,...,p_n))$$are valid. Conjoining the two formulas leads to $$X \rightarrow (R(p_1,...,p_n) \equiv \Phi(p_1,...,p_n)),$$ which further implies that $$S \models \forall x_1,...,x_n: R(x_1,...,x_n) \equiv \Phi(x_1,...,x_n).$$ This concludes the proof of the theorem.
A Craig interpolant can be computed via a syntactic transformation, as shown in Craig's 1957 paper. Hence, we can always construct an explicit definition out of an implicit one. Since explicit definition is a syntactical notion (i.e. defined with a formula) while implicit definition tends to be more semantical (i.e. defined with a theory), Beth's theorem indicates that the syntactical definability of first-order logic is "complete" in some sense.

Friday, July 21, 2017

A note on linking ideas in academic writing

Flow of information

Sentences and ideas should be ordered and linked to present information in a reasonable manner. Here reasonable means the reader can easily find what he expects to see as he reads. The following example shows how sentences can be re-ordered to improve readability. It is extracted from the first paragraph of an essay with title "The main sources of diversity in work values":
Original: The importance of education should be pointed out as one of the crucial sources of diversity in attitudes to work. The rising educational level has been perceived as one of the most significant investments in the general quality of life. Tertiary education in particular plays a very special role in this respect.

Revised: One of the crucial sources of diversity in attitudes to work is education. The rising educational level has been perceived as one of the most significant investments in the general quality of life. In this respect, tertiary education in particular plays a very special role.

Comments. It is preferred that the first sentence connects directly to the title, as that is what a reader expects when he starts reading the paragraph. "The importance of education" is redundant here because education is already pointed out as a crucial source. "In this respect" should be put at the beginning of a sentence since it serves as a linking phrase.

Text cohesion

Cohesion refers to the way in which elements of language are linked together to make a text. The sequence of given information followed by new information can contribute to the cohesion of a text because the new information of one sentence often becomes the given information of the next sentence. The following is an extract from a thesis about the construction of stance in academic writing. The comments indicate how old information is rephrased or repeated to smooth the transitions to new information.
Passages, with comments in brackets:
  • For the thesis writer, the construction of stance takes a distinctive form which derives from the nature of the thesis itself. [Comment: Clarify the subject of this passage and identify the expected target readers—the thesis writers.]
  • As noted above, writers have to convince the supervisors and examiners to accept their research, which means that the thesis is inherently persuasive. [Comment: Clarify the goal of thesis writers to bring up the key characteristic of a thesis: being persuasive.]
  • However, acceptance depends upon satisfying two quite distinct requirements. [Comment: The subject "acceptance" (of persuasion) follows logically from the key old info "being persuasive".]
  • The first is that the thesis should make an original contribution to knowledge. [Comment: Introduce a claim.]
  • In creating new knowledge and conveying it to other members of the discipline, the writer’s stance is that of a professional. [Comment: Rephrase the claim and reach a partial conclusion.]
  • The second requirement is that the thesis must show mastery of the field. [Comment: Make another claim.]
  • The display of established knowledge is required because the work will be assessed and thus the writer must also take a stance as a candidate. [Comment: Rephrase the claim and make another partial conclusion.]
  • However combining the roles of candidate and professional can lead to considerable conflict and uncertainty. [Comment: Summarise the partial conclusions and start a new discussion.]
As this example demonstrates, the new information of one sentence often becomes the given information of the next sentence. This sequence of given information → new information links the ideas of the text together and creates cohesion.

Devices to link ideas

We present more examples showing how ideas can be linked between sentences grammatically. In the second sentence of each example, the given information is referred to using three kinds of grammar patterns. New ideas can be introduced by adding extra information to the given information.
Examples, with comments in brackets:
  • Contrast from differently doped regions immediately appears when the in-lens detector is switched on. This huge effect is believed to hold some important implications. [Comment: A comment (i.e. the effect is huge) is implicitly introduced when the old information, which is a neutral description of a fact, is referred to. It is preferable to separate description and evaluation clearly in academic writing.]
  • It will be seen in the sample that the simulation is able to adapt again. The ability of this model to emulate changes in the characteristics of different sample types is cited as strong support for its validity. [Comment: The old information is rephrased by changing part of speech and inverting subject and object. By such rephrasing we avoid using "..., which ..." repeatedly for further explanation/elaboration.]
  • In this analysis, "norm" refers to a prescribed or proscribed standard of behaviour. Such a definition is therefore broader than rules alone. [Comment: Using "such a" instead of "this" means that the author is going to generalise. In this case, the author intends to discuss a class of definitions instead of the definition of "norm".]
  • The fact that the same British forces had to cover various commitments at once did stretch them to the limit, and perhaps beyond. The problem of over-stretch had emerged in the late 1950s. [Comment: The use of "problem" and "over" indicates the author's position in the following discussion on the aforementioned fact. Description and evaluation are cleanly separated.]
  • For example, it would be interesting to find out if the active region contains more crystalline polymer as a result of operation. Such an investigation would involve using staining techniques. [Comment: The noun "investigation" is used to encapsulate the text "to find out ...". The use of "such a" indicates that the following discussion would not be restricted to that particular example.]
Remark. The main device for linking ideas in these examples is called nominalisation, which is the use of a noun to express an idea that was previously expressed in the form of a clause or a sentence. Nominalisation is very useful in the construction of an argument, as we can present what we already said in a shortened form and then move on to the new point following from that information. Here is an example:
I explained the results as such and such. This explanation is in agreement with earlier work.
We can add more details to the noun using prepositional phrases and clauses:
What I have sought to disprove is that there is an incompatibility between a "bad" nationalism and a "good" patriotism. The assertion that nationalism and patriotism are incompatible causes complete confusion when historians, even great ones, are confronted by past thinkers.

Tuesday, July 18, 2017

A note on useful decidable logics & theories

Note that this post discusses decidable fragments that allow at least some degree of quantification, not decidable theories of ground formulas used in SMT solvers.

Decidable fragments of first-order logic

Monadic first-order logic. Also known as the relational monadic fragment (RMF) or the Löwenheim class [2], this fragment is a classical example of decidable first-order logic without equality. The RMF consists of the first-order formulas where all relation symbols are unary:$$\phi ::= R(x) \mid \neg \phi \mid \phi \vee \phi \mid \phi \wedge \phi \mid \exists x.\phi \mid \forall x.\phi.$$Note that equality is not allowed in this logic and there are no constant symbols. Satisfiability of the logic was first proved to be NE-complete [2], where NE = $\bigcup_c NTIME(2^{cn})$. Since NE-complete problems are also NEXPTIME-complete (by a padding argument, cf. [1]), the problem is NEXPTIME-complete. The Löb-Gurevich class (cf. [8]) is a decidable extension of the RMF where unary function symbols are also allowed. Both of the classes enjoy the finite model property, cf. [8].

Effectively propositional logic (EPR). [5] Also known as the Bernays-Schönfinkel class, this fragment consists of first-order formulas such that:
1. The vocabulary is restricted to constant and relation symbols.
2. The quantifier prefix is restricted to $\exists^*\forall^*$ for formulas in prenex normal form.
Condition 2 implies that EPR is not closed under negation. EPR enjoys the finite model property, meaning that a satisfiable formula is guaranteed to have a finite model. The satisfiability problem of EPR is NEXPTIME-complete.
We can extend EPR to allow quantifier alternation and function symbols while maintaining the same properties, as long as the formula in the extended logic is "stratified" [11]. More precisely, we define the quantifier alternation graph $G(\phi)$ for a formula $\phi$ as a directed graph where the set of vertices is the set of sorts, and for each function symbol $f : (s_1,\ldots, s_n) \to s$ in the Skolem normal form of $\phi$, there is an edge $s_i \to s$ for $1 \le i \le n$. $\phi$ is called stratified if $G(\phi)$ does not contain a cycle through the sort of some universally quantified variable. For example, $\phi := \forall x. \exists y. f(x) = y$ is skolemized to $\forall x. f(x) = g(x)$ with $f,g : A \to B$. Hence $G(\phi)$ is acyclic and thus $\phi$ is in the decidable fragment.
Stratified formulas are decidable because there is no way to generate an infinite sequence of instantiation terms. By contrast, $\psi := \forall x. \exists y. f(y) = x$ is skolemized to $\forall x. f(g(x)) = x$ where $f: A \to B$ and $g: B \to A$. Hence $G(\psi)$ contains a cycle through the sort of $x$, and thus $\psi$ is not in the decidable fragment.
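The stratification test amounts to a cycle check on the quantifier alternation graph. The following Python sketch performs it, using the two formulas above as test cases; the symbol and sort names follow the text (under the given signatures, the sort of $x$ in $\psi$ is B).

def is_stratified(functions, universal_sorts):
    """Stratification check described above (sketch).

    functions       : dict mapping each function symbol of the Skolemised
                      formula to (argument_sorts, result_sort)
    universal_sorts : sorts of the universally quantified variables

    The formula is stratified iff the quantifier alternation graph has no
    cycle passing through the sort of a universally quantified variable."""
    edges = {(a, res) for args, res in functions.values() for a in args}

    def on_cycle(start):
        # is `start` reachable from itself by following one or more edges?
        frontier, seen = {w for (v, w) in edges if v == start}, set()
        while frontier:
            v = frontier.pop()
            if v == start:
                return True
            if v not in seen:
                seen.add(v)
                frontier |= {w for (u, w) in edges if u == v}
        return False

    return not any(on_cycle(s) for s in universal_sorts)

# phi := forall x. exists y. f(x) = y, Skolemised with g for y: f, g : A -> B
assert is_stratified({"f": (["A"], "B"), "g": (["A"], "B")}, {"A"})
# psi := forall x. exists y. f(y) = x, Skolemised with g for y: f : A -> B, g : B -> A
assert not is_stratified({"f": (["A"], "B"), "g": (["B"], "A")}, {"B"})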
Formulas in the extended EPR can be effectively translated into propositional logic formulas through an instantiation process over a finite Herbrand domain, see e.g., [12,13]. Extended EPR is supported by first-order logic provers such as iProver (which is known as the most efficient solver for EPR), and by modern SMT solvers such as Z3.

Presburger arithmetic. The first-order theory of structure $(\mathbb N, +)$ can be decided by encoding $n\in \mathbb N$ in binary (LSB first), such that the ternary relation $+$ corresponds to a regular relation $+_2$ over this coding and $\mathbb N$ corresponds to the $\omega$-regular language $(0+1)^*0^\omega$. Buchi showed in 1962 that given any formula $\phi(x_1,...,x_n)$ over ${\rm FO}(\mathbb N, +)$, one can effectively compute an $\omega$-automaton $A(x_1,...,x_n)$ that defines the same relation modulo convolution $\otimes$, namely, $\phi(x_1,...,x_n)$ holds iff $A$ accepts the $\omega$-word $x_1 \otimes \cdots \otimes x_n.$ The satisfiability problem for ${\rm FO}(\mathbb N, +)$ thus reduces to the emptiness problem for $\omega$-automata and is thus decidable. See [9] for a review of the algorithmic and computational complexity aspects of PA.
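For intuition, the regular relation $+_2$ is recognised by a small carry automaton. The following finite-word Python sketch reads the LSB-first binary encodings padded with zeros (instead of $\omega$-words with $0^\omega$ suffixes) and tracks the carry as the automaton state.

def accepts_addition(x, y, z, width=None):
    """Finite-word version of the LSB-first automaton for the relation x + y = z (sketch)."""
    if width is None:
        width = max(x.bit_length(), y.bit_length(), z.bit_length()) + 1
    bits = lambda n: [(n >> i) & 1 for i in range(width)]   # LSB-first encoding, zero-padded
    carry = 0                                               # the automaton state
    for a, b, c in zip(bits(x), bits(y), bits(z)):
        if (a + b + carry) & 1 != c:
            return False                                    # mismatch: move to a rejecting sink
        carry = (a + b + carry) >> 1
    return carry == 0                                       # accept iff no pending carry remains

assert accepts_addition(5, 7, 12) and not accepts_addition(5, 7, 13)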

A list of decidable theories can be found on Wikipedia.

References

1. Structural properties of complete problems for exponential time. 1997.
2. Complexity results for classes of quantification formulas. 2001.
3. Verification of randomised security protocols.
4. Decidability of second-order theories and automata on infinite trees. 1969.
5. Complexity results for classes of quantificational formulas. 1980.
6. The theory of ends, pushdown automata, and second-order logic. 1985.
7. Bisimulation through Probabilistic Testing. 1991.
8. The Classical Decision Problem. 2001.
9. A Survival Guide to Presburger Arithmetic. 2018.
10. Automata: From Logics to Algorithms. 2007.
11. Complete instantiation for quantified formulas in satisfiability modulo theories. 2009.
12. Deciding effectively propositional logic using DPLL and substitution sets. 2008.
13. Proof systems for effectively propositional logic. 2008.

Tuesday, June 20, 2017

A note on bisimulation and coinduction

A machine $(X,Y,S,\delta,\lambda)$ is a 5-tuple composed of an input alphabet $X$, an output alphabet $Y$, a state space $S$, a transition function $\delta:S\times X \rightarrow S$, and an output function $\lambda: S\rightarrow Y.$ We assume that $\delta$ is total without loss of generality. Its domain can be extended to $X^*$ by defining $\delta^*(s,\epsilon)=s$ and $\delta^*(s,wx)=\delta(\delta^*(s,w),x).$ A bisimulation between two machines $M=(X,Y,S,\delta,\lambda)$ and $M'=(X,Y,S',\delta',\lambda')$ is a relation $R\subseteq S\times S'$ such that $(s,s')\in R$ implies $\lambda(s)=\lambda'(s')$ and $(\delta(s,x),\delta'(s',x))\in R$ for all $x\in X.$ The union of all bisimulations is the greatest bisimulation and is often denoted by $\sim$. When $\sim$ is defined for one machine, it forms an equivalence relation and is called bisimilarity. Bisimilarity can be characterised inductively as follows. We define a sequence of equivalence relations $R_0 \supseteq R_1 \supseteq \cdots$ by letting $R_0 = S\times S$ and, for $n\ge 1$, $(s, s') \in R_n$ iff $\lambda(s)=\lambda(s')$ and $(\delta(s, x), \delta(s', x)) \in R_{n-1}$ for all $x\in X.$ It is easy to verify that the $R_n$ converge to the bisimilarity $\sim$, namely, $\bigcap_{n\ge 0} R_n=\sim$. (To see this, note that $\supseteq$ holds as $R_n \supseteq \sim$ for all $n\ge 0.$ Also, $\bigcap_{n\ge 0} R_n$ is a bisimulation and thus $\subseteq$ holds.) This inductive definition gives a simple fixpoint procedure for computing bisimilarity, see below.

Bisimulation and language equivalence

We call a machine a (deterministic) automaton if its output alphabet, denoted by $\mathcal{B}$, consists of only "rejected" and "accepted", denoted by $\bot$ and $\top$, respectively. It is interesting to note that a bisimulation $R$ between two automata $M$ and $M'$ over the same input alphabet $X$ can itself be described by an automaton $(X,\mathcal{B},R,\delta_R,\lambda_R)$, such that
1. $\delta_R((s,s'), x)=(\delta(s,x),\delta'(s',x))$ iff $(\delta(s,x),\delta'(s',x))\in R,$ and
2. $\lambda_R((s,s'))=\lambda(s)=\lambda'(s').$
Given an automaton $M=(X,Y,S,\delta,\lambda)$, the state language of a state $s$ of $M$ is defined as $L(s)=\{w\in X^{*}:\lambda\circ\delta^*(s,w)=\top\}.$ We call $L(s)$ the language of $M$ when $s$ is the initial state of $M$. The following implication can be shown by induction on the length of words:$$s\sim s' \implies L(s)=L(s').$$
Bisimulation equivalence is also called behavioural or observational equivalence by regarding the observable behaviours of a state as its state language. In the world of finite-state systems, two other notions of equivalence are often used in practice to prove language equivalence. The first notion is graph isomorphism. As the minimal automaton recognising an ($\omega$-)regular language is unique up to graph isomorphism, checking language equivalence amounts to checking graph isomorphism between their minimised automata. The second notion is trace equivalence, which relates two states iff they yield the same set of traces. Trace equivalence and bisimulation equivalence coincide on deterministic machines but the latter is finer in general (e.g. consider $a(b+c)$ vs. $ab+ac$). Here is a comparison among the three relations on finite machines:
Complexity: Bisimulation equivalence is $O(m\log n)$; Graph isomorphism is in NP; Trace equivalence is PSPACE-Complete.
Fineness: Graph isomorphism (without minimisation) $\subseteq$ Bisimulation equivalence $\subseteq$ Trace equivalence $\subseteq$ Language equivalence

The proof principle of coinduction

Coinduction is a mathematical tool to define bisimulation. To demonstrate its use, we present here a technique that proves language equivalence using coinduction [1]. A $w$-derivative of a language $L\subseteq X^{*}$ is defined as $L_{w}=\{u\in X^{*}:wu\in L\}.$ The set $\mathcal{L}$ of languages over $X$ can be viewed as an automaton $M_{\mathcal{L}}=(X,\mathcal{B},\mathcal{L},\delta,\lambda)$ such that $\delta(L,x)=L_{x}$ and $\lambda(L)=\top$ iff $\epsilon\in L$. It turns out that $L(l)=l$ for all $l\in\mathcal{L}$, namely the state language of each state in $M_{\mathcal{L}}$ is the state itself. Hence we have$$L\sim L' \iff L=L'.$$This fact provides a uniform way to prove the equivalence of two languages. The more familiar method of proof by induction requires that one start from the equality of the base case (which is the pair of the minimal words in the two languages) and proceed to establish the equivalence of the pairs of longer words. Induction however doesn't work when the languages contain words of infinite length. In contrast, proof by coinduction starts from $\{(L,L')\}$ and continues to include more and more pairs via the transitions until a least fixed point is reached. This fixed point, whenever it exists, is a bisimulation relation contained in the bisimilarity $\sim$ and thus implies that $L = L'.$
The principle of coinduction is not always effective—the process of constructing a bisimulation may not terminate even when the target languages are equivalent. This fact can be expected since otherwise the Halting problem would be solvable. On the other hand, the coinduction process always terminates when the two languages under consideration are regular: it either constructs a bisimulation when they are equivalent, or finds a counterexample when they are not. In the case when the two languages are the same regular language, the process actually constructs the smallest finite automaton recognising them (more details later). This fact is a corollary of Kleene's Theorem, stating that a language is regular iff it is accepted by a finite automaton; see Section 8 of [1] for details. In fact, the most efficient equivalence checking algorithm for regular languages to date is based on coinduction [4].
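The following Python sketch runs the coinductive construction for two DFAs rather than for language derivatives as in [1] (and without the up-to-congruence optimisation of [4]): starting from a pair of initial states, it keeps adding successor pairs, and the set of pairs accumulated on success is a bisimulation witnessing language equivalence.

from collections import deque

def bisimilar(dfa1, s1, dfa2, s2, alphabet):
    """Coinductive language-equivalence check for two DFAs (sketch).

    A DFA is given as (delta, accepting) where delta maps (state, letter)
    to a state.  On success the accumulated relation is a bisimulation
    containing (s1, s2), hence L(s1) = L(s2)."""
    (d1, acc1), (d2, acc2) = dfa1, dfa2
    relation, queue = set(), deque([(s1, s2)])
    while queue:
        p, q = queue.popleft()
        if (p, q) in relation:
            continue
        if (p in acc1) != (q in acc2):
            return False                        # counterexample found
        relation.add((p, q))
        queue.extend((d1[p, a], d2[q, a]) for a in alphabet)
    return True

# two DFAs over {a, b} for "words with an even number of a's"
even_a  = ({("e", "a"): "o", ("e", "b"): "e", ("o", "a"): "e", ("o", "b"): "o"}, {"e"})
even_a2 = ({(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}, {0})
assert bisimilar(even_a, "e", even_a2, 0, "ab")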

Bisimulation and fixed points

Given a machine $M = (X,Y,S,\delta,\lambda)$, one can define a monotone function $$F(A) := \{(s,s')\in S\times S : \lambda(s)=\lambda(s') \wedge \forall x\in X.\ (\delta(s,x),\delta(s',x))\in A\}$$on the lattice $(2^{S\times S},\subseteq).$ In the terminology of fixed-point theory, bisimulations are the post-fixed points of $F$ and bisimilarity is the greatest post-fixed point $\nu X.~F(X).$ (Note: these fixed points collapse when $\delta$ defines deterministic transitions.) In this regard, proof by coinduction can be seen as showing that a property is closed under backward computation and contained in a coinductively defined set (ie. the greatest fixed point). In contrast, induction involves showing a property is closed under forward computation and containing an inductively defined set (ie. the least pre-fixed point). Intuitively, backward means that the set is obtained by repeatedly refining larger sets until the first (ie. the greatest) fixpoint is reached; and forward oppositely means that the set is obtained by repeatedly extending smaller sets until the first (ie. the smallest) fixpoint is reached.
The following procedure gives a naive fixpoint computation for the bisimilarity over a finite and non-deterministic machine. The bisimilarity is represented as a functional relation $\rho \subset S\times\mathbb N$ that maps states to labels of the equivalence classes induced by the bisimilarity.
Procedure ComputeBisimilarity
Begin
  Set $\rho' := \{(s, 1) : s \in S\}$
  Repeat
    Set $\rho := \rho'$
    Set $\gamma := \{(s, \{ (x, \emptyset) : \delta(s,x)\neq\emptyset \}) : s \in S \}$
    For all $s, x, s'$ such that $s'\in\delta(s,x)$
      Set $\gamma(s)(x) := \gamma(s)(x) \cup \{ \rho(s') \}$
    Set $id := 0$, $A' := \mathbb N$, $\rho' := \emptyset$
    Let $L$ be a listing of $\gamma$ sorted by the second component
    For each $(s,A)$ in $L$ do
      If $A\neq A'$ then
        Set $A' := A$, $id := id + 1$
      Set $\rho' := \rho' \cup \{(s,id)\}$
  Until $\rho$ and $\rho'$ induce the same partition of $S$
  Output $\rho'$ as the equivalence classes induced by the bisimilarity
End
The procedure starts from a trivial binary relation where all states are equivalent. In each iteration, the algorithm computes a mapping $\gamma$ from each state to the labelling of its successors induced by the current bisimulation. A state is then re-labelled whenever the labelling of its successors changes. The refinement will continue until the labelling is stable, meaning that the greatest fixpoint (i.e. the unique largest bisimulation) is reached. Surveys of more efficient algorithms for computing bisimulations are available in the literature.
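A compact Python rendering of the procedure is given below. It additionally takes the output function lam and uses it for the initial partition; the pseudocode above corresponds to the special case of a constant output function.

def compute_bisimilarity(states, alphabet, delta, lam):
    """Python rendering of the ComputeBisimilarity procedure above (sketch).

    delta(s, x) returns the set of x-successors of s (non-deterministic) and
    lam(s) the output of s.  Returns a map from states to block identifiers;
    two states are bisimilar iff they end up with the same identifier."""
    # initial partition induced by the outputs
    rho = {s: lam(s) for s in states}
    while True:
        # signature of a state: its current block plus, for each letter,
        # the set of blocks of its successors
        sig = {s: (rho[s],) + tuple(frozenset(rho[t] for t in delta(s, x))
                                    for x in alphabet)
               for s in states}
        # canonical identifier of a block: its smallest member
        new_rho = {s: min(t for t in states if sig[t] == sig[s]) for s in states}
        if new_rho == rho:
            return rho
        rho = new_rho

# three states over {a}: state 0 is accepting, states 1 and 2 are bisimilar
succ = {(0, "a"): {1}, (1, "a"): {2}, (2, "a"): {1}}
cls = compute_bisimilarity([0, 1, 2], "a",
                           lambda s, x: succ.get((s, x), set()),
                           lambda s: s == 0)
assert cls[1] == cls[2] != cls[0]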
The fact that bisimulation relations can be constructed inductively implies that coinductive and inductive reasoning should yield the same results on inductively defined structures such as lists and trees, and make a difference only on structures that contain cycles or infinite paths. Further, while bisimulation coincides with bisimilarity on deterministic machines (such as $M_{\mathcal L}$), coinduction in general does not pinpoint bisimilarity for non-deterministic machines.

Bisimulation and minimisation

A homomorphism between two machines $M$ and $M'$ over the same input alphabets is any function $f:S\rightarrow S'$ satisfying $f(\delta(s,x))=\delta'(f(s),x)$, namely $$ \array{ S\times X &\overset{f\times id_X}{\longrightarrow}& S'\times X \\ \delta \downarrow & & \downarrow \delta' \\ S &\overset{f}{\longrightarrow}& S' }$$A homomorphism is called unique if given any homomorphism $g$ there exists an isomorphism $h$ such that $f=h\circ g$. The notion of unique homomorphisms is closely related to that of bisimulations. First, a mapping $f:S\rightarrow S'$ is a homomorphism iff $\{(s,f(s)):s\in S\}$ is a bisimulation between machines $M$ and $M'$. Furthermore, given a homomorphism $f:S\rightarrow S'$,
i) If $R$ is a bisimulation on $M$, then $f(R)=\{(f(u),f(v)):(u,v)\in R\}$ is a bisimulation on $M'$. [p.34, 5]
ii) If $R'$ is a bisimulation on $M'$, $f^{-1}(R')=\{(u,v):(f(u),f(v))\in R'\}$ is a bisimulation on $M$. [p.34, 5]
iii) $f$ is unique iff for any homomorphism $g:S\rightarrow S'$, the set $\{(f(s),g(s)):s\in S\}$ is a bisimulation on $M'$.
Unique homomorphism is a powerful utility to study languages and automata. For example, fix an automaton $M$ and consider a mapping $\phi: s \mapsto L(s)$ from $M$ to $M_{\mathcal L}.$ It is straightforward to check that $\phi$ is a unique homomorphism. One interesting fact about $\phi$ is that it identifies precisely the bisimilar states in $M$. That is, $$u\sim v \iff \phi(u)=\phi(v),$$which follows from the fact that $\{(u,v)\in S\times S: \phi(u)=\phi(v)\}$ is a bisimulation on $M$ and that $\{(\phi(u),\phi(v))\in \mathcal{L}\times \mathcal{L}:u\sim v\}$ is a bisimulation on $M_{\mathcal L}.$ Another amazing fact is that $\phi$ effectively minimises automata. To see this, let $\langle s \rangle_N$ (resp. $\langle S \rangle_N$) denote the sub-automaton generated by state $s$ (resp. sets of states $S$), i.e., the automaton induced by the states reachable from $s$ (resp. $S$), in automaton $N$. Also, we lift $\phi$ to a graph homomorphism such that given an automaton $N$, $\phi(N)$ is the sub-automaton of $M_{\mathcal L}$ generated by $\{\phi(s):s\in State(N)\}$. It turns out that$$\langle L(s) \rangle_{M_{\mathcal L}} = \langle \phi(s) \rangle_{M_{\mathcal L}} = \phi(\langle s \rangle_M).$$Since given $L$ we are free to choose any $s$ and $M$ satisfying $L(s)=L$, it follows that
i) If $L$ is regular, then $\langle L \rangle_{M_{\mathcal L}}$ is the canonical minimum automaton accepting $L$, and
ii) $\phi(\langle s \rangle_M)$ is the canonical minimisation of automaton $\langle s \rangle_M$.
In particular, $L$ is accepted by a finite automaton iff $\{L_w: w\in X^*\}$ is finite. This result is equivalent to the well-known characterisation of regular languages by Nerode and Myhill:
Theorem. (Nerode-Myhill) $L$ is accepted by a finite automaton iff $R_L$ has a finite index. (Namely, there are finitely many classes in the equivalence relation $R_L$ on $X^*$ defined by $(u,v)\in R_L$ iff $\forall w\in X^*.\ (uw\in L \iff vw\in L).$)

References and further reading

1. Automata and Coinduction (An Exercise in Coalgebra)
2. Introduction to Bisimulation and Coinduction (book) (slides)
3. An introduction to (co)algebra and (co)induction
4. Checking NFA Equivalence with Bisimulations up to Congruence
5. Universal Coalgebra: A Theory of Systems
6. Bisimulation and Language Equivalence

Sunday, June 18, 2017

A note on useful decidable temporal logics

Subclasses of FO(LTL)

In the following, we assume that non-constant function symbols and equality are not allowed. Adding either of them can lead to the loss of decidability for any of the subclasses introduced here.

Monadic FO(LTL), proposed by Hodkinson et al. [1] in 2000, is the first non-trivial decidable subclass of FO(LTL). This class contains the formulas in which every sub-formula beginning with a temporal operator (Since or Until) has at most one free variable. Thus, we can talk about objects in the intended domain using the full power of first-order logic; however, temporal operators may be used to describe the development in time of only one object (two free variables are enough to simulate the behaviour of Turing machines). The satisfiability problem for formulas in this class can be reduced to fragments of classical two-sorted first-order logics in which one sort is intended for temporal reasoning. The decidability results also hold for the temporal logics with finite domains.

Remark. FO(LTL) provides only ‘implicit’ access to time: quantification over points in time in the sense of first-order logic is not permitted, and the only means of expressing temporal properties is by the temporal operators Since and Until. It turns out that monadic FO(LTL) is weaker than the two-sorted first-order language, in which one sort refers to points in time and the other to the first-order domain. For example, the formula $∃t_1 ∃t_2 (t_1 < t_2 ∧ ∀x(P(t_1, x) ↔ P(t_2, x)))$ is not expressible in FO(LTL) over any interesting class of linear time structures [2, 3].

References

1. Decidable fragments of first-order temporal logics.
2. Temporal Logic. Part I & Part II. Clarendon Press, Oxford.
3. Temporal connectives versus explicit timestamps in temporal query languages.

Friday, May 19, 2017

A note on useful decidable logics & theories - Part II

Decidable fragments of second-order logic

(Weak) MSO with one successor. The monadic second-order theory of structure $(\mathbb N, <)$ (denoted by S1S) is decidable. Buchi showed in 1962 that given any formula $\phi(x_1,...,x_n)$ over ${\rm MSO}(\mathbb N, <)$, one can effectively compute an $\omega$-automaton $A(x_1,...,x_n)$ defining the same relation modulo convolution. There is a similar connection between weak $MSO(\mathbb N, <)$ (denoted by WS1S) and finite-word automata. See [10] for a historical survey.

MSO over binary trees. The structure of infinite binary trees is $T_2 := (\{1,2\}^*,S_1,S_2)$ with $S_i := \{(v,vi) : v \in \{1,2\}^*\}$. Rabin's Tree Theorem [4] states that the MSO-theory of the infinite binary tree (denoted by S2S) is decidable, or in other terminology: that model-checking the binary tree with respect to MSO-properties is decidable. In [4], Rabin also proved the decidability of a number of theories by interpretation in $T_2$, such as MSO over n-ary trees (see below). Note that Rabin's theorem is subsumed by a more general result stating that MSO over prefix-rewriting systems is decidable [?].

MSO over n-ary trees. The decidability results of MSO-theory over binary trees $T_2$ can be extended to $n$-ary branching trees $T_n$ with vertices in $[1..n]^*$ by encoding a vertex $a_1 a_2 \dots a_k$ of $T_n$ by a vertex $1^{a_1} 2 1^{a_2} 2 \dots 2 1^{a_k} 2$ of $T_2$. The translation of MSO-formulas is straightforward.
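The encoding of vertices and its inverse are straightforward; a small Python sketch:

def encode_vertex(v):
    """Encode a vertex of the n-ary tree T_n (a list of directions in 1..n) as a
    vertex of the binary tree T_2: a_1 a_2 ... a_k  |->  1^{a_1} 2 1^{a_2} 2 ... 1^{a_k} 2."""
    return "".join("1" * a + "2" for a in v)

def decode_vertex(w):
    """Inverse mapping: split the binary-tree vertex at each 2."""
    return [len(block) for block in w.split("2")[:-1]]

assert encode_vertex([3, 1, 2]) == "111212112"
assert decode_vertex(encode_vertex([3, 1, 2])) == [3, 1, 2]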

MSO over pushdown automata. Let $G := (V,E)$ denote the configuration graph of a pushdown automaton, where $V\subseteq Q\times\Sigma^*$. Let $n := \max(|Q|,|\Sigma|).$ One can MSO-interpret $G$ in $T_n$ by representing the configuration $(q, s_1,...,s_k)$ as the vertex $s_1...s_k\ q$ of $T_n$. Each step between the configurations in $G$ is translated in MSO to a local move from one vertex to another in $T_n$. Decidability of MSO over binary trees hence subsumes the fundamental result of Muller and Schupp [6] that checking MSO properties is decidable for the configuration graph of a pushdown automaton.

Decidable modal logic

Probabilistic modal logic (PML). Let $\Sigma$ be a finite set of actions. The syntax of the logic is defined as follows:$$\phi ::= \top \mid \bot \mid \phi\vee\phi \mid \neg\phi \mid (\phi)_a\ {\rm op}~p,$$where $a \in \Sigma$, ${\rm op}\in \{=,<,\le\}$, and $0\le p\le 1$. The semantics of PML is defined over a discrete non-deterministic probabilistic transition system such that for $s\in S$, e.g.,$$s \models (\phi)_a \le p \quad {\rm iff} \quad \Pr(s' \models \phi \mid s\xrightarrow{a} s') \le p.$$Larsen and Skou [7] show that, when the transition probabilities are rational and bounded away from zero, given any PML formula $\phi$ and fixed $0<\epsilon<1$, one can compute an experiment $t_{\epsilon,\phi}$ and an observation $o_{\epsilon,\phi}$ such that $s \models \phi$ iff $\Pr(o_{\epsilon,\phi}$ is observed by running $t_{\epsilon,\phi}$ on $s) \ge 1-\epsilon$, and $s \not\models \phi$ iff $\Pr(o_{\epsilon,\phi}$ is observed by running $t_{\epsilon,\phi}$ on $s) \le \epsilon$. Such experiments can be computed, executed and observed in polynomial time. Larsen and Skou show that PML characterises probabilistic bisimulation in the following sense: two states are probabilistic bisimilar iff they satisfy exactly the same set of PML formulas.

Misc

Reduction of decidability. Suppose that model checking the $L$-theory of a structure $S$ is known to be decidable. The model checking problem for a structure $S'$ with respect to an $L'$-theory is decidable if we can give an $L$-description of $S'$ within the structure $S$. That is, we characterise the structure $S'$ using $L$-formulas over $S$. In other words, we provide a satisfiability-preserving translation of any $L'$-formula $\phi$ into an $L$-formula $\psi$ such that $S'\models \phi$ iff $S\models \psi$. Hence the $L'$-theory of the structure $S'$ is also decidable.

References

1. Structural properties of complete problems for exponential time. 1997.
2. Complexity results for classes of quantification formulas. 2001.
3. Verification of randomised security protocols.
4. Decidability of second-order theories and automata on infinite trees. 1969.
5. Complexity results for classes of quantificational formulas. 1980.
6. The theory of ends, pushdown automata, and second-order logic. 1985.
7. Bisimulation through Probabilistic Testing. 1991.
8. The Classical Decision Problem. 2001.
9. A Survival Guide to Presburger Arithmetic. 2018.
10. Automata: From Logics to Algorithms. 2007.
11. Complete instantiation for quantified formulas in satisfiability modulo theories. 2009.

Saturday, May 6, 2017

A note on recognisability and definability

Definitions. An $n$-ary relation is associated with a set of $n$-tuples; an $n$-ary function $f$ is associated with a set of $(n+1)$-tuples $\{(a_1,...,a_n,f(a_1,...,a_n))\}$, which is called the graph of function $f.$ A sequence $s_1,s_2,...$ over $A$ is associated with a function $s:\mathbb N\rightarrow A$ such that $s(i)=s_i.$ Given an integer $n$, we use $(n)_p$ to denote its p-ary expansion; given a p-ary expansion $w\in\{0,...,p-1\}^*$, we use $[w]_p$ to denote its value $\sum_{i=0}^{|w|-1}p^{|w|-1-i}w[i].$ By convention, we set $(0)_p=\epsilon$ and $[\epsilon]_p=0.$

Recognisable Sets

Definitions. A set is called regular or recognisable if it can be recognised by a DFA. For $L\subseteq\Sigma^*$, define $u\sim_L v$ iff $uw\in L \Leftrightarrow vw\in L$ for all $w\in\Sigma^*.$ The relation $\sim_L$ is a right congruence, i.e., $u\sim_L v$ implies $uw\sim_L vw$ for all $w\in\Sigma^*$; note that it is also an equivalence relation.

Myhill-Nerode Theorem. A set $L$ is regular iff $\sim_L$ has finite index over $\Sigma^*.$

Definitions. (p-recognisability) A set of integers is called p-recognisable iff the p-ary expansions of its elements form a language recognised by a finite automaton over the alphabet $\{0,...,p-1\}.$ The automaton reads a word from left to right and ignores leading zeros, so that if it accepts $w$ then it accepts all words in $0^*w.$
Fact. A sequence $s_1,s_2,...$ over $A$ is p-recognisable if $s^{-1}(a)$ is p-recognisable for each $a\in A.$ When $A$ is finite, it is easier to define a sequence in this way.

Definitions of recognisability generalise to n-ary relations naturally by replacing the alphabet $\{0,...,p-1\}$ with $\{0,...,p-1\}^n.$ All properties and facts listed above still hold in the general case.

Definitions. (p-automata) A p-automaton recognising a sequence $s_1,s_2,...$ over a finite domain is a complete deterministic Moore machine that reads $(n)_p$ and outputs $s_n$ for all $n\ge 1.$ In such a case the sequence is called p-recognisable. Note that p-recognisability of sequences generalises p-recognisability of sets, since a set is p-recognisable iff there exists a p-automaton that outputs 1 on $(n)_p$ if $n$ belongs to the set and outputs 0 otherwise. Alternatively, we can describe a set of integers using a characteristic sequence. The characteristic sequence of a set $S$ refers to a sequence over $\{0,1\}$ where the $n$-th bit is 1 iff $n\in S.$ A set is p-recognisable iff its characteristic sequence is p-recognisable.
A p-automaton can recognise an n-ary relation by recognising an n-dimensional sequence $s:\mathbb N^n\rightarrow\{0,1\}.$ The input $((a_1)_p,...,(a_n)_p)$ is cleverly stipulated as an element of $(\{0,...,p-1\}^n)^*$ instead of $(\{0,...,p-1\}^*)^n.$ Namely, the components of a tuple are convoluted into a single word over the alphabet $\{0,...,p-1\}^n$. In this way, theorems such as Myhill-Nerode still hold for $n\ge2.$
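A small Python sketch of the convolution, padding the MSB-first expansions with leading zeros:

def expansion(n, p):
    """Most-significant-digit-first p-ary expansion (n)_p, with (0)_p = epsilon."""
    digits = []
    while n > 0:
        n, d = divmod(n, p)
        digits.append(d)
    return digits[::-1]

def convolution(tup, p):
    """Convolute a tuple of naturals into one word over {0,...,p-1}^n:
    pad the shorter expansions with leading zeros and read the digits in parallel."""
    exps = [expansion(a, p) for a in tup]
    width = max(map(len, exps), default=0)
    exps = [[0] * (width - len(e)) + e for e in exps]
    return list(zip(*exps))

assert convolution((5, 3, 8), 2) == [(0, 0, 1), (1, 0, 0), (0, 1, 0), (1, 1, 0)]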

Definable Sets

Definitions. A set $L$ of $n$-tuples is called (first-order) definable by a formula $\phi$ in a first-order structure $\mathcal S$ iff $\phi$ has exactly $n$ free variables and $L=\{(a_1,...,a_n)\in D^n :$ $\mathcal S\models \phi(a_1,...,a_n)\}$, where $D$ is the domain of $\mathcal S.$ A set is definable in a structure $\mathcal S$ if it is definable by a formula in $\mathcal S.$ When the structure is clear from the context we simply call a set definable. A sequence $s_1,s_2,...$ over $A$ is definable if the graph of the function $s:\mathbb N\rightarrow A$ associated with the sequence is definable.
Fact. A sequence $s_1,s_2,...$ over $A$ is definable iff $s^{-1}(a)$ is definable for each $a\in A.$ When $A$ is finite, it is easier to define a sequence in this way.

Definitions. (p-definability) Given $x\in\mathbb N$, define $V_p(x)\equiv$ the largest power of $p$ that divides $x$, and let $V_p(0)\equiv 1.$ A subset of $\mathbb N$ is called $p$-definable if it is definable in $[\mathbb N,+,V_p].$ Define a predicate $P_p$ such that $P_p(x)$ holds iff $x$ is a power of $p.$ Define a binary relation $B_p$ such that $B_p(x,y)$ holds iff $y$ is a power of $p$ occurring in the p-ary expansion of $x.$ (Thus $B_p(x,y)$ iff $P_p(y)$ and the digit of $(x)_p$ at the position of $y$ is nonzero, i.e., $\lfloor x/y\rfloor \not\equiv 0 \pmod p.$)

Historically, considerable effort went into characterising the 2-recognisable sets in a first-order structure. Three candidate structures were considered for this purpose, in order: $[\mathbb N, +, P_2]$, $[\mathbb N, +, B_2]$, and $[\mathbb N, +, V_2].$ We note two facts about these structures:
Fact. $[\mathbb N, +, B_p]$ and $[\mathbb N, +, V_p]$ are (FO-)interpretable in each other.
Fact. $[\mathbb N, +, P_p]$ is strictly weaker than $[\mathbb N, +, V_p].$
To see the first fact, note that for $x>0$, $V_p(x)=y$ iff $B_p(x,y)\wedge \forall 0\le z<y.\ \neg B_p(x,z)$, and conversely $B_p(x,y)$ holds iff $\exists 0\le z <y.\ V_p(x-z)= y.$ The second fact follows from an interesting decidability result. Let $p^x$ denote the exponential function. It is known that $[\mathbb N,+,V_p]$ and $[\mathbb N,+,p^x]$ are decidable, but $[\mathbb N,+,p^x,V_p]$ is undecidable. Now, $P_p$ is clearly definable from $p^x$ ($P_p(y)$ iff $\exists x.\ y=p^x$). Hence, if $[\mathbb N, +, P_p]$ were as expressive as $[\mathbb N, +, V_p]$, then $V_p$ would be definable in $[\mathbb N,+,p^x]$, and $[\mathbb N,+,p^x,V_p]$ would be interpretable in $[\mathbb N,+,p^x].$ This is impossible since the latter is decidable and the former is not. The ultimate characterisation of 2-recognisable (and, more generally, p-recognisable) sets is given by the following theorem.
Theorem I. For any prime number p, a set is p-definable iff it is p-recognisable.
Proof. ($\Rightarrow$) It suffices to show that, given any formula $\phi$ of a structure $\mathcal S$ equivalent to $[\mathbb N,+,V_p]$, we can construct a p-automaton $A_\phi$ recognising the set defined by $\phi$, i.e., $\mathcal S \models \phi(a_1,...,a_n)$ iff $A_\phi$ accepts $([a_1]_p,...,[a_n]_p).$ To simplify the proof, we choose $\mathcal S=[\mathbb N,R_1,R_2]$, where $R_1(x,y,z)$ holds iff $x+y=z$ and $R_2(x,y)$ holds iff $V_p(x)=y.$ Now we prove the claim by structural induction. First note that the three atomic formulas $x=y$, $R_1(x,y,z)$, and $R_2(x,y)$ of $\mathcal S$ are p-recognisable (Exercise! A sketch for $R_1$ is given below.) To complete the induction, it suffices to show that we can construct p-automata $A_{\neg\phi}$, $A_{\phi\vee\psi}$ and $A_{\exists x\phi}$ from p-automata $A_\phi$ and $A_\psi.$ Negation is immediate, since p-automata are closed under complementation. For disjunction, we first modify the transition functions of the two p-automata so that they operate over the same tuple of input variables. For instance, suppose the free variables of $\phi$ are $x_1,...,x_m,y_1,...,y_n$ and the free variables of $\psi$ are $y_1,...,y_n,z_1,...,z_r.$ After the modification, the transition functions of both p-automata read tuples over $(x_1,...,x_m,$ $y_1,...,y_n,$ $z_1,...,z_r).$ The union of the two modified automata then gives $A_{\phi\vee\psi}.$ For projection, to construct $A_{\exists x.\phi(x,x_1,...,x_n)}$ we relabel the transitions of $A_\phi$ by replacing $(a,a_1,...,a_n)$ with $(a_1,...,a_n)$; the result is nondeterministic and is then determinised. Note that we may also need to add new loops after the relabelling to recognise leading zeros introduced by the projection. See [3] for the details of this proof.
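As an illustration of the exercise, here is a sketch of the p-automaton for the atomic formula $R_1(x,y,z)$, i.e. $x+y=z$, in base $p=2$ (Python; it assumes the three expansions are zero-padded to a common length and convolved, as stipulated above). The state records the value of the read prefix of $x$ plus that of $y$ minus that of $z$; on any run that can still be completed to an accepting one this value stays within $\{-1,0,1\}$, so four states suffice.

SINK = None
def step(d, letter, p=2):
    # State update of the addition checker: d' = p*d + a + b - c.  Once |d'| >= 2,
    # the remaining digits can never bring x + y - z back to 0, so we fall into a sink.
    if d is SINK:
        return SINK
    a, b, c = letter
    d = p * d + a + b - c
    return d if -1 <= d <= 1 else SINK
def accepts_addition(word):
    # word: a list of digit triples (a, b, c) over {0,1}^3, MSB first, zero-padded
    d = 0
    for letter in word:
        d = step(d, letter)
    return d == 0
def convolve(x, y, z):
    n = max(x.bit_length(), y.bit_length(), z.bit_length(), 1)
    digits = lambda v: [int(b) for b in format(v, "0%db" % n)]
    return list(zip(digits(x), digits(y), digits(z)))
assert accepts_addition(convolve(13, 29, 42)) and not accepts_addition(convolve(13, 29, 41))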

Corollary. The theory of $[\mathbb N,+,V_p]$ is decidable.
Proof. Given a sentence in $[\mathbb N,+,V_p]$, we can rewrite it in the form $\exists x.\phi(x)$ or $\neg\exists x.\phi(x).$ By Theorem I, the set defined by $\phi(x)$ is p-recognisable. Hence we can construct a finite automaton $A$ that accepts a word $w$ iff $[\mathbb N,+,V_p]\models\phi([w]_p).$ The truth of the sentence thus reduces to checking (non-)emptiness of a recognisable language, which is decidable.
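The emptiness test at the end of the proof is just graph reachability on the automaton. A minimal sketch (Python; the automaton interface below is an illustrative simplification):

from collections import deque
def is_empty(initial, accepting, delta):
    # delta maps a state to the collection of its successor states (over all letters)
    seen, queue = {initial}, deque([initial])
    while queue:
        q = queue.popleft()
        if q in accepting:
            return False            # some accepting state is reachable
        for r in delta.get(q, ()):
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return True                     # no accepting state is reachable
assert not is_empty(0, {2}, {0: [1], 1: [2]}) and is_empty(0, {3}, {0: [1], 1: [2]})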

Corollary. The set of squares $\{n^2:n\in\mathbb N\}$ is not p-recognisable.
Proof. Suppose to the contrary that the set of squares is p-recognisable. By Theorem I, there exists a formula $\phi$ in $[\mathbb N,+,V_p]$ such that $\phi(x)$ holds iff $x$ is a square. Define $$\psi(c,d):=\phi(d)\wedge\forall d'.(d'>c\wedge\phi(d')\implies d'\ge d),$$ which asserts that $d$ is a square and that every square larger than $c$ is at least $d$ (so whenever $d>c$, $d$ is the smallest square larger than $c$). Observe that the set $\{(x,y)\in\mathbb N^2: y=x^2\}$ is p-definable by $$\lambda(x,y):=\phi(y)\wedge\exists a.\ \psi(y,a)\wedge a=y+2x+1.$$ Also, the set $\{(x,y,z)\in\mathbb N^3: z=xy\}$ is p-definable by the formula
$\exists a,b,c.\ \lambda(x,a)\wedge\lambda(y,b)\wedge\phi(c)\wedge c=a+b+2z \wedge\exists d.\ \psi(c,d)\wedge d=c+2x+2y+1$
The correctness of this formula relies on the fact that if $c$ is a square and the smallest square greater than $c$ is $c+2x+2y+1$, then $c=(x+y)^2$ (and then $c=a+b+2z$ forces $z=xy$). It follows that multiplication is p-definable, which is impossible since $[\mathbb N,+,\cdot]$ is undecidable while $[\mathbb N,+,V_p]$ is decidable.
(To be continued)

References and further reading

1. Recognizable sets of integers, Michel Rigo.
2. Languages, Automata, and Logic, Wolfgang Thomas.
3. Logic and p-recognizable sets of integers, V. Bruyère et al.

Wednesday, April 26, 2017

Deadlock-freeness, safety, and liveness

Deadlock-freeness of a concurrent system refers to the property that, at any point in time, at least one process in the system is allowed to take an action. In other words, every reachable state of a deadlock-free system has a successor. At first glance, deadlock-freeness is a safety property and can be handled as such by safety verifiers, by marking all states with no successors as unsafe. However, there is a subtle fact about deadlock-freeness that makes it different from safety.
Recall that the safety of a transition system $T$ can be proved by verifying a system with richer behaviours than $T$. Many tools exploit this fact and compute over-approximations for safety verification. However, an over-approximation that is sound for safety properties may not be sound for deadlock-freeness, since deadlock-freeness is not preserved by disabling transitions. Hence, deadlock-freeness is in general not suitable for verification with conventional safety verifiers, as that may lead to unexpected results.
That said, if we are able to determine the set of deadlock states a priori, it is still possible to model deadlock-freeness as a safety property (e.g., see [1]). For instance, we can mark all states with no successors as unsafe, and check that the system remains forever in safe states. The marking is static and will not be changed by adding or removing transitions. (In fact, with a suitable marking one may even model a liveness property as a safety property. But this doesn't simplify the problem, since coming up with a correct marking would require checking the liveness property first.)
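To illustrate the static marking on a tiny finite-state example (the two-process system below is made up for the illustration, and the sketch is explicit-state Python rather than a real safety verifier): the successor-less states are marked once, and a plain reachability check then decides deadlock-freeness.

from collections import deque
def reachable(initial, succ):
    seen, queue = set(initial), deque(initial)
    while queue:
        s = queue.popleft()
        for t in succ.get(s, ()):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen
def is_deadlock_free(initial, succ):
    # The unsafe marking (states without successors) is fixed up front; it would NOT be
    # recomputed if transitions were added or removed, which is why the reduction is only
    # sound with such a static marking.
    return all(succ.get(s) for s in reachable(initial, succ))
succ = {("idle", "idle"): [("cs", "idle"), ("idle", "cs")],
        ("cs", "idle"): [("idle", "idle")],
        ("idle", "cs"): [("idle", "idle")]}
assert is_deadlock_free([("idle", "idle")], succ)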

References

1. Alpern, B. "Defining liveness", 1985.
2. A. Biere, C. Artho, and V. Schuppan. "Liveness checking as safety checking", 2002.
3. V. Schuppan and A. Biere. "Efficient reduction of finite state model checking to reachability analysis", 2004.
4. V. Schuppan and A. Biere. "Liveness checking as safety checking for infinite state spaces", 2006.

Thursday, April 6, 2017

Existence of sets that are regular but not computable

A regular set is called computable if a finite-state automaton recognising it can be effectively constructed. Given a finite alphabet $\Sigma$, a rational transition system is a tuple $(I, T)$ consisting of a regular set $I\subseteq\Sigma^*$ of initial states and a rational relation $T\subseteq\Sigma^*\times\Sigma^*$ as the transition relation. We will use $T^k$ to denote the rational relation formed by composing $T$ with itself $k$ times, and $T^*$ to denote the reflexive and transitive closure of $T$. In this way, $T^*(I)$ represents the reachability set of the system $(I,T)$, i.e., the set of words that can be reached from words in $I$ through a finite number of transitions.
It is clear that $T^k(I)$ is regular and computable for any $0\le k<\infty.$ It may however be less obvious that the reachability set $T^*(I)$ is not necessarily regular. To see this, consider the system with $\Sigma\equiv\{a,b\}$, $I\equiv\{(ab)^n:n\ge 1\}$ and $T\equiv\{(xbay, xaby): x,y\in \Sigma^*\}.$ Then $T^*(I) \cap a^*b^* =\{a^nb^n:n\ge1\}$ is a well-known non-regular set, which implies that $T^*(I)$ is non-regular as well. On the other hand, there are rational transition systems whose reachability sets are guaranteed to be regular, such as pushdown systems. In fact, the reachability relation of a pushdown system is regular and poly-time computable [12]. Surprisingly, $T^*(I)$ is not necessarily computable even when it is regular. Namely, given a regular language $A$ and a rational transition system $(I, T)$, it is undecidable to check whether $T^*(I)=A$, even when the former is known to be a regular set. This post will try to present a self-contained argument about the existence of such systems.
Subword ordering. Our argument centres around a reflexive and transitive relation called "is a subword of": given $x,y\in\Sigma^*$, $x\preceq y$ holds iff $x$ can be obtained from $y$ by removing zero or more letters.
Lemma. (Higman) Every set of words whose elements are pairwise incomparable w.r.t. $\preceq$ is finite.
Proof. We show the stronger statement of Lemma 3 below: every infinite sequence of words contains indices $i<j$ with $w_i\preceq w_j$; the lemma follows, since an infinite set of pairwise incomparable words would yield an infinite sequence without such a pair. Call a sequence bad if it contains no such pair, and suppose some bad sequence exists. Pick a minimal bad sequence $w_0,w_1,...$, i.e., each $w_k$ is a shortest word extending $w_0,...,w_{k-1}$ to a prefix of some bad sequence. No $w_k$ is empty (the empty word is a subword of everything), so write $w_k=a_kv_k$ with $a_k\in\Sigma.$ Some letter occurs infinitely often among the $a_k$, say at positions $k_1<k_2<...$ Then $w_0,...,w_{k_1-1},v_{k_1},v_{k_2},...$ is again bad: $v_{k_i}\preceq v_{k_j}$ would give $w_{k_i}\preceq w_{k_j}$, and $w_m\preceq v_{k_j}$ for $m<k_1$ would give $w_m\preceq w_{k_j}.$ Since $v_{k_1}$ is strictly shorter than $w_{k_1}$, this contradicts the minimality of the chosen sequence. ☐
Given a set $X\subseteq \Sigma^*$, define the upward closure of $X$ as $uc(X)\equiv\{w:\exists w'\in X.\ w'\preceq w\}.$ We say $X$ is upward closed iff $X=uc(X).$ Higman's lemma tells us that every upward closed set has a finite representation.
Lemma 1. Every upward closed set is regular.
Proof. Note that for any $X\subseteq\Sigma^*$, elements in $min(X)\equiv\{w\in X: \forall w'\in X.\ w'\preceq w \implies w'=w\}$ are mutually incomparable. It follows from Higman's lemma that $min(X)$ is finite. Also, $X$ is upward closed iff $X=uc(X)=\bigcup_{w\in min(X)}uc(w)$, which implies $X$ is regular since $min(X)$ is finite and $uc(w)$ is regular for every $w\in\Sigma^*.$ ☐
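A small sketch of the objects used above (Python): the subword test, the minimal elements of a finite set of words, and a regular expression for $uc(w)=\Sigma^*w_1\Sigma^*w_2\Sigma^*\cdots w_n\Sigma^*.$

import re
def is_subword(x, y):
    # x <= y iff x can be obtained from y by deleting letters
    it = iter(y)
    return all(c in it for c in x)
def minimal(words):
    # the <=-minimal elements of a finite set of words (a finite antichain)
    return {w for w in words if not any(v != w and is_subword(v, w) for v in words)}
def uc_regex(w, sigma="ab"):
    # uc(w) is recognised by the regular expression Sigma* w_1 Sigma* ... w_n Sigma*
    gap = "[%s]*" % re.escape(sigma)
    return re.compile(gap + gap.join(re.escape(c) for c in w) + gap + r"\Z")
assert is_subword("ab", "aabb") and not is_subword("ba", "ab")
assert minimal({"ab", "ba", "aabb"}) == {"ab", "ba"}
assert uc_regex("ab").match("baab") and not uc_regex("ab").match("bba")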
Another consequence of Higman's lemma is that every increasing (in the sense of set inclusion) sequence of upward closed sets eventually stabilises.
Lemma 2. If $X_0\subseteq X_1\subseteq X_2\subseteq...$ and $X_i=uc(X_i)$ for all $i\ge0$, then there exists a finite $n$ such that $X_k=X_n$ for all $k\ge n.$
Proof. Let $X\equiv \bigcup_{i\ge 0} X_i.$ The set $min(X)$ is finite by Higman's lemma, and each of its elements belongs to some $X_i$; since the sequence is increasing, there exists $n$ such that $min(X)\subseteq X_n.$ As $X_n$ is upward closed, $X=uc(min(X))\subseteq X_n$, and hence $X_k=X_n$ for all $k\ge n.$ ☐
Lemma 3. For every infinite sequence of words $w_0,w_1,...\in \Sigma^*$, there exists $i<j$ such that $w_i\preceq w_j.$
Lemma 4. For any sets $X_0,X_1,...\subseteq\Sigma^*$, $uc(\bigcup_{i\ge 0}X_i)=\bigcup_{i\ge 0}uc(X_i).$
Lossy channel systems. We briefly introduce the concept of lossy channel systems. A channel system with $n$ queues is like a pushdown system with $n$ stacks, except that it uses FIFO queues instead of LIFO stacks. Channel systems are strictly more powerful than pushdown systems: a single queue already suffices to simulate Turing machines, whereas a pushdown system would need two stacks to do so. A lossy channel system is a channel system where the contents of the queues can be spontaneously lost. Technically, the capability of losing data can be introduced to a channel system by adding to each control state a non-deterministic self-loop that removes an arbitrary letter from the queues.
Note that channel systems (lossy or not) can be expressed as rational transition systems where a configuration with $n$ queues is encoded as a word of the form [control state]#[contents of the 1st queue]#...#[contents of the nth queue]#. We can define a subword ordering $\preceq_L$ over the configurations of a channel system as follows:
$c$#$w_1$#$...$#$w_n$# $\preceq_L$ $c'$#$w_1'$#$...$#$w_n'$#$\quad$ iff $\quad c=c'$ and $w_i\preceq w_i'$ for $i=1,...,n.$
Since the number of control states is finite, Lemmas 1-3 also hold for $\preceq_L.$
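To make the word encoding and the ordering $\preceq_L$ concrete, here is a toy lossy channel system with a single queue (Python; the two-rule protocol is made up for the illustration).

RULES = [("s0", "!", "m", "s1"),   # in control state s0: send m, move to s1
         ("s1", "?", "m", "s0")]   # in control state s1: receive m, move back to s0
def successors(conf):
    state, queue = conf            # a configuration: (control state, queue contents)
    succ = set()
    for src, act, letter, dst in RULES:
        if src == state and act == "!":
            succ.add((dst, queue + letter))
        elif src == state and act == "?" and queue.startswith(letter):
            succ.add((dst, queue[1:]))
    for i in range(len(queue)):    # lossy steps: drop one letter, keep the control state
        succ.add((state, queue[:i] + queue[i + 1:]))
    return succ
def leq_L(c1, c2):
    # (q, w) <=_L (q', w') iff q = q' and w is a subword of w'
    it = iter(c2[1])
    return c1[0] == c2[0] and all(ch in it for ch in c1[1])
assert successors(("s0", "m")) == {("s1", "mm"), ("s0", "")}
assert leq_L(("s0", "m"), ("s0", "mm")) and not leq_L(("s0", "m"), ("s1", "mm"))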
Lossy channel systems enjoy many good properties. Among them, two are of particular interest for our development:
Fact 1. If $u\rightarrow^* w$ then $u\rightarrow^* w'$ for all $w'\preceq_L w.$ It follows that the complement of the reachability set is upward closed w.r.t. $\preceq_L.$
Fact 2. For any set of configurations $C$, the set $pre^*(C)\equiv\{w:\exists w'\in C.\ w\rightarrow^* w'\}$ of its predecessors is upward closed w.r.t. $\preceq_L.$
Theorem 1. The reachability set of a lossy channel system is regular.
Proof. This theorem is an immediate consequence of Fact 1, Lemma 1, and the fact that regular sets are closed under complementation. ☐
Given a system $(I, T)$ and a regular set $B\subseteq\Sigma^*$, the reachability problem asks whether $T^*(I)\cap B \neq\emptyset$, i.e., words in $B$ are reachable from words in $I$ through a finite number of transitions.
Theorem 2. The reachability problem of a lossy channel system is decidable.
Proof. Historically, this result was first proved in [6] using backward analysis as follows. Define the backward transition function $pre(C)\equiv\{w:\exists w'\in C.\ w\rightarrow w'\}$, which would preserve regularity when the forward transition relation is regular. Given a regular set $B$ of the bad configurations, let $X_0\equiv B$ and $X_{n+1}\equiv X_n\cup pre(X_n)$ for $n\ge 0$. Define $X\equiv\bigcup_{i\ge 0}X_i$. Note that $X$ is the set of configurations backward reachable from $B$, and thus is upward closed by Fact 2. By Lemma 4, we can construct another sequence $X_0',X_1',...$ converging to $X$ by defining $X_n'\equiv uc(X_n).$ Each $X_{n+1}'$ can be computed from $X_n'$ for $n\ge0.$ Hence, by Lemma 2, the fixed point $X$ can be computed within a finite number of steps. Since $T^*(I)\cap B\neq\emptyset$ if and only if $X\cap I\neq\emptyset$, the reachability problem is decidable. ☐
Remark. The above proof actually gives an algorithm that constructs (a representation of) the set of configurations backward reachable from $B$. However, the algorithm relies on Higman's Lemma for termination, and its worst-case complexity is in fact not primitive recursive [8,9].
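Here is a sketch of the backward fixed-point computation on a toy well-quasi-ordered system (Python). To keep it tiny, the ordering is $\le$ on a single "lossy counter" rather than $\preceq_L$: a perfect step adds 2 to the counter and a lossy step subtracts 1. Upward closed sets are represented by finite sets of minimal elements, exactly as in the proof above.

def pre_basis(m):
    # a finite basis of pre(uc({m})): n+2 >= m gives n >= m-2, and n-1 >= m gives n >= m+1
    return {max(m - 2, 0), m + 1}
def minimize(elems):
    return {m for m in elems if not any(x < m for x in elems)}
def backward_reach_basis(bad_basis):
    # minimal elements of pre*(uc(bad_basis)); termination is guaranteed by Lemma 2
    basis = minimize(set(bad_basis))
    while True:
        new = minimize(basis | {x for m in basis for x in pre_basis(m)})
        if new == basis:           # the increasing chain of upward closed sets has stabilised
            return basis
        basis = new
basis = backward_reach_basis({5})  # bad configurations: counter value >= 5
assert basis == {0}                # every value can be driven above 5, in particular
assert any(0 >= m for m in basis)  # the initial value 0, so the bad set is reachable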

Now we proceed to show that the reachability set of a lossy channel system, albeit regular, is not effectively computable. Given a system $(I,T)$ and a control state $q$, the recurrent state problem asks whether there exists an infinite trace from $I$ that visits $q$ (i.e., configurations with control state $q$) infinitely often.
Theorem 3. [6] The recurrent state problem is undecidable for lossy channel systems.

By a well-quasi-ordering argument, $q$ is visited infinitely often on some trace from $I$ iff there is a configuration $c$ with control state $q$ such that $c \in T^*(I)$ and $c \in T^+(\{c\})$: if a trace visits $q$ infinitely often then, by Lemma 3, two of the visited configurations satisfy $c_i\preceq_L c_j$ with $i<j$, so by lossiness $c_i$ can reach itself again in at least one step. Note that each of these two conditions is decidable for any given $c$ (Theorem 2); the difficulty lies entirely in the quantification over $c$. It is shown in the literature (see [2, 5, 6]) that this difficulty is essential: although individual membership and reachability queries about $T^*(I)$ are decidable, there exists no procedure that computes a finite-state automaton (or any symbolic representation allowing effective equivalence checks against regular sets) for $T^*(I)$, even though that set is regular.
Corollary. There exists a rational transition system $(I,T)$ such that $T^*(I)$ is regular but not computable.
Remark. Does the above corollary hold when $T$ is length-preserving? I did not find any result about this at the time of writing this post.

Discussion. The proof of Theorem 2 relies on properties of the subword ordering (a well-quasi-ordering in general) to guarantee that the backward search terminates. In fact, unreachability is semi-decidable whenever a regular proof exists: just launch two procedures at the same time, one enumerating all possible counterexamples and one enumerating all possible proofs. Since reachability is always semi-decidable, at least one of the two procedures will terminate, and they cannot both succeed. (Here a counterexample is an execution path from $I$ to $B$, and a proof is a regular set $P$ satisfying the following three proof rules: i) $I\subseteq P$, ii) $T(P)\subseteq P$, and iii) $P\cap B=\emptyset.$) In particular, reachability is decidable for a transition system whenever its set of reachable configurations is regular.
More generally, the above argument works whenever a "regular" symbolic proof exists and the one-step image $T(X)$ or pre-image $T^{-1}(X)$ can be computed effectively from $X$. The "regularity" here refers to the existence of any finitary representations that are computable and closed under complement and intersection. This argument has been applied to prove decidability results for transition systems such as Petri nets and lossy counter systems [10]. See [11] for a general discussion on this kind of proof techniques.

References and further reading

1. Transitive Closures of Regular Relations for Verifying Infinite-state Systems, 2000.
2. Unreliable Channels are Easier to Verify than Perfect Channels, 1996.
3. On-the-fly Analysis of Systems with Unbounded, Lossy FIFO Channels, 2005.
4. Effective Lossy Queue Languages, 2001.
5. Undecidable Problems in Unreliable Computations, 2003.
6. Undecidability of Verifying Programs with Unreliable Channels, 1996.
7. Verifying Systems with Infinite but Regular State Spaces, 1998.
8. Post Embedding Problem is not Primitive Recursive, 2007.
9. The Ordinal Recursive Complexity of Lossy Channel Systems, 2008.
10. Lossy Counter Machines Decidability Cheat Sheet, 2010.
11. A unified approach for studying the properties of transition systems, TCS, 1982.
12. On the regular structure of prefix rewriting. TCS, 1992.

Saturday, March 4, 2017

A note on parallelizable functions

Yesterday I was unexpectedly asked what kind of problems benefit the most from distributed computing frameworks such as MapReduce. I responded immediately that any problem that can be tackled by a divide-and-conquer strategy should fit the framework. On second thought, however, I found the answer misses the point, because it remains unclear what kind of problems are solvable by divide and conquer. For example, sorting can be done in this way because sort(a++b) can be obtained from sort(a) and sort(b). How about the Boolean function is_sorted? The value of is_sorted(a++b) cannot be determined merely from the return values of is_sorted(a) and is_sorted(b). So there must be some characteristic that tells sort and is_sorted apart.
Let's call a function parallelizable if it can be computed by divide and conquer. It is possible to "lift" a non-parallelizable function to a parallelizable one, such that the output of the original function can be obtained from that of the lifted function. In the above example, the function is_sorted can be made parallelizable by augmenting its return value with the additional information max(s) and min(s) for an input list s, since $a$$+\!\!+$$b$ is sorted in ascending order if and only if $a$ and $b$ are sorted in ascending order and $\max(a)\le \min(b)$. This parallelization is also effective, because it reduces the time complexity from linear to logarithmic in appropriate computation models. So here we can ask several interesting questions: 1) What functions are parallelizable? 2) Is it always possible to lift a non-parallelizable function to a parallelizable one? 3) Do parallelizable functions always have efficient parallel implementations?
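Here is a minimal sketch of the lifting just described (Python; the summary tuple (sorted?, min, max) and the chunking are illustrative choices, not a full MapReduce job).

def lift(chunk):
    # summary of a non-empty chunk: (is it sorted?, its minimum, its maximum)
    ok = all(chunk[i] <= chunk[i + 1] for i in range(len(chunk) - 1))
    return (ok, min(chunk), max(chunk))
def combine(left, right):
    (ok_l, lo_l, hi_l), (ok_r, lo_r, hi_r) = left, right
    # a ++ b is sorted iff a and b are sorted and max(a) <= min(b)
    return (ok_l and ok_r and hi_l <= lo_r, min(lo_l, lo_r), max(hi_l, hi_r))
def is_sorted_parallel(xs, chunk_size=4):
    chunks = [xs[i:i + chunk_size] for i in range(0, len(xs), chunk_size)]
    summary = lift(chunks[0])              # in a real setting each lift runs on a worker
    for c in chunks[1:]:
        summary = combine(summary, lift(c))
    return summary[0]
assert is_sorted_parallel(list(range(10))) and not is_sorted_parallel([1, 3, 2, 4, 5])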

Homomorphism and Parallelism

After a brief survey, I found that the rough ideas in my mind actually fall within a popular line of research with a long history. In particular, there is a well-known result connecting homomorphisms with functions that admit divide-and-conquer implementations.
Definition. A function $f : [A]\rightarrow B$ is $\odot$-homomorphic for operator $\odot:B\times B\rightarrow B$ if $f(s$$+\!\!+$$t) = f(s)\odot f(t)$ for any lists $s$ and $t$ on $[A]$, the set of finite lists of type $A.$
Note that the definition implies that $\odot$ is associative on $B$, since $+\!\!+$ is associative on $[A].$ Also, $f([\ ])$ is a unit of $\odot$ if $f$ is defined on $[\ ]$, which is a unit of $+\!\!+$ on $[A]$.
The First Homomorphism Theorem. ([1]) A function $f : [A]\rightarrow B$ can be computed by divide and conquer if and only if it is $\odot$-homomorphic for some $\odot$ on $B$.
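A sketch of the "if" direction (Python; the names are mine, not from [1]): a $\odot$-homomorphic $f$ is determined by its values on singletons and on the empty list, and can therefore be evaluated by splitting, mapping and combining.

def hom(f1, op, unit):
    # evaluate the homomorphism with f([x]) = f1(x), f(s ++ t) = op(f(s), f(t)), f([]) = unit
    def go(xs):
        if not xs:
            return unit
        if len(xs) == 1:
            return f1(xs[0])
        mid = len(xs) // 2         # the two recursive calls are independent and could run in parallel
        return op(go(xs[:mid]), go(xs[mid:]))
    return go
total = hom(lambda x: x, lambda a, b: a + b, 0)                 # sum is a homomorphism
slow_sort = hom(lambda x: [x], lambda a, b: sorted(a + b), [])  # the inefficient sort discussed below
assert total([3, 1, 4, 1, 5]) == 14 and slow_sort([3, 1, 4, 1, 5]) == [1, 1, 3, 4, 5]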
It is clear that the $\odot$ operator captures the semantics of "combine" in a divide-and-conquer algorithm. In MapReduce, the conquer phase is done by a mapper and the combine phase by a reducer. However, as MapReduce does not specify the order of reduction, the $\odot$ operator (i.e. the reducer) has to be commutative for the outcome to be deterministic [2].
The above theorem tells us that a function is parallelizable iff it is homomorphic. Interestingly, it turns out that any non-homomorphic function can be trivially lifted to a homomorphism using a trick called "tupling". For example, this trick lifts $is\_sorted: [Num]\rightarrow Bool$ to a function $is\_sorted': [Num]\rightarrow [Num]\times Bool$ that is $\odot$-homomorphic with respect to $(s, a)\odot (t, b) \equiv (s$$+\!\!+$$t,\ is\_sorted(s$$+\!\!+$$t))$. The parallel implementation based on this lift is clearly no better than its sequential counterpart. Similarly, $sort:[Num]\rightarrow[Num]$ is $\odot$-homomorphic when $\odot$ is defined by $s \odot t \equiv sort'(s$$+\!\!+$$t)$, where $sort'$ can be any sorting algorithm such as insertion sort, though the resulting parallel algorithm is inefficient and useless. Therefore, finding an efficient parallel implementation for a function, parallelizable or not, amounts to finding a $\odot$ operator with low computation costs. (To be continued)

References

1. Bird, Richard S. "An introduction to the theory of lists." 1987.
2. Chen, Y. F., et al. "Commutativity of Reducers." 2015.

Monday, February 20, 2017

DFA as proof for (recurrent) unreachability

In a previous post we discussed how to prove an unreachability property $f^*(I)\cap B=\emptyset$ of an infinite-state transition system. In particular, we used the L* learning algorithm to learn a regular inductive invariant. In order to answer membership queries, we put the restriction that the transition relation $f$ is length-preserving, so that it is decidable whether a given configuration is reachable. Is this restriction the only way to make learning a proof possible? In this post, we shall discuss an elegant learning framework that can be applied to prove not only unreachability (aka safety verification) but also recurrent unreachability (aka liveness verification). We begin by briefly recapitulating the definitions.

Learning to prove unreachability for non-length-preserving systems

Given a finite set $\Sigma$ and a binary relation $f:\Sigma^* \to \mathcal P(\Sigma^*)$, define $F:\mathcal P(\Sigma^*) \to \mathcal P(\Sigma^*)$ by $F(S):= \bigcup_{v\in S}f(v).$ A binary relation is called regular if it can be accepted by an FA over $(\Sigma\cup\{\epsilon\})\times(\Sigma\cup\{\epsilon\}).$ Given two regular sets $I$ and $B$ and a regular relation $f$, the unreachability problem $(I,F,B)$ asks whether $B$ is unreachable from $I$ through $F$, namely, $F^*(I)\cap B=\emptyset.$ A proof for unreachability is a set $P\subseteq \Sigma^*$ satisfying $I\subseteq P$, $f(P)\subseteq P$, and $P\cap B=\emptyset.$ Such a set exists iff $B$ is unreachable from $I$.
Suppose we want to find a proof using the L* learning algorithm. In order to make membership queries decidable, Vardhan [1] proposed to learn, instead of the set of configurations, a set of configuration-witness pairs, where a witness refers to the number of steps needed to reach a configuration from the initial configurations. More precisely, in Vardhan's framework, the proof we aim to learn is a regular subset $W$ of $\Omega^*$, where $\Omega:=(\Sigma\cup\{\bot\})\times\{0,1\}$ and $\bot$ is a padding symbol. Each word in $W$ encodes a pair $(c,n)$ such that $c$ is a configuration and $c\in F^n(I)$. We shall use $[c,n]$ to denote the encoding of $(c,n).$ Now, given $I,f,B$, we consider an unreachability problem for $I',f',B'$ such that
$\begin{eqnarray*}
I' & := & \{[c,0]: c\in I\};\\
f'(S) &:=& \{[f(c),n+1]: [c,n]\in S\};\\
B' &:= & \{[c,n]:c\in B, n\ge 0\}.
\end{eqnarray*}$
Note that if the sets $I,f,B$ are regular then so are the sets $I',f',B'$.
It is straightforward to learn a proof for the unreachability problem $(I',F',B')$ using L*: membership queries are decidable thanks to the presence of the witness, and equivalence queries can be answered as we did before, aiming to find an inductive invariant as an over-approximation. After L* learns a proof for the new problem, we obtain a proof for the original problem by projection. However, note that the transformation from configurations to configuration-witness pairs doesn't preserve regularity. Hence, the transformed problem $(I',F',B')$ doesn't necessarily have a regular proof even when the original problem $(I,F,B)$ does (in which case the problem is decidable). In particular, our previous technique is guaranteed to terminate when $F^*(I)$ is regular, while the technique presented here isn't. It would be interesting to characterise the classes for which this technique is effective, namely, the classes where regularity of $F^*(I)$ implies that of $F'^*(I')$.
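For concreteness, here is one possible encoding of a configuration-witness pair $[c,n]$ as a word over $(\Sigma\cup\{\bot\})\times\{0,1\}$ (Python; pairing each letter of $c$ with one bit of the binary expansion of $n$ is an assumption made for this sketch, as the post does not fix a convention).

PAD = "_"                             # stands for the padding symbol ⊥
def encode(c, n):
    bits = format(n, "b")
    length = max(len(c), len(bits))
    return list(zip(c.ljust(length, PAD), bits.rjust(length, "0")))
def decode(word):
    c = "".join(ch for ch, _ in word).rstrip(PAD)
    n = int("".join(b for _, b in word), 2)
    return c, n
w = encode("abba", 6)                 # [('a','0'), ('b','1'), ('b','1'), ('a','0')]
assert decode(w) == ("abba", 6)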

Learning to prove recurrent unreachability for length-preserving systems

The notion of "learning configuration-witness pairs instead of configurations" can also be applied to learn proofs for recurrent unreachability. Given $I,f,B$ as before, we say $B$ is recurrently reachable from $I$ iff there exists $w\in I$ such that the trace $f*(w)$ visited $B$ infinitely often. To prove recurrent unreachability, a witness needs to make it decidable to check whether a configuration is visited i.o. from $I$. This doesn't look possible as a single witness would then need to encode an infinite amount of information. (Recall that a witness for reachability only needs to encode a finite path.) For length-preserving systems, however, a counterexample to recurrent reachability is a lasso-shaped path and hence has a finite encoding. In fact, we can apply the well-known liveness-to-safety reduction for regular model checking to translate a recurrent reachability problem to a unreachability problem. More precisely, given a length-preserving transition system $(I,T,F)$ over alphabet $\Sigma$, we can specify a unreachability problem $(I',T',B')$ over alphabet $\Sigma\times\Sigma$ where
$I' := F\times F$,
$T' := \{((u,v),(u,w)) : u\in\Sigma^* \wedge |u|=|v| \wedge (v,w) \in T\}$, and
$B' := (T^*(I) \cap F)\times(T^*(I) \cap F)$.
Observe that $B'$ is reachable from $I'$ in $(I',T',B')$ iff $F$ can be visited infinitely often from $I$ in $(I,T,F)$. Even though $B'$ is in general not computable, the unreachability problem is still amenable to the automata learning method introduced in our last post. To see this, suppose that the learner has made an equivalence query for a candidate proof $P$. If either $T'(P)\subseteq P$ or $I'\subseteq P$ fails, a counterexample for the query can be computed as before. To check whether $P \cap B' = \emptyset$, we first check whether there is some $y \in P \cap (F\times F)$. If no such $y$ exists, then $P$ is a proof for the unreachability problem. Otherwise, we proceed to check whether $y \in T'^*(I')$, which is decidable since $T'$ is length-preserving. If $y \in T'^*(I')$, then $y$ is a witness for the fact that $B'$ is reachable from $I'$. Otherwise, we can use $y$ as a counterexample for the equivalence query.

Tuesday, January 31, 2017

DFA as proof for unreachability

Preliminaries. Given a finite set $\Sigma$ and a binary relation $f:\Sigma^* \to \mathcal P(\Sigma^*)$, we use $F$ to denote the set function $\mathcal P(\Sigma^*) \to \mathcal P(\Sigma^*)$ defined by $F(S):= \bigcup_{v\in S}f(v).$ Also,$$F^0(S):= S;\quad F^{n+1}(S):= F(F^n(S))\cup F^n(S);\quad F^*(S):= \bigcup\nolimits_{n\ge 0} F^n(S).$$Intuitively, $F^*(S)$ is the set reachable from $S$ by applying $f$ finitely many times. A binary relation is called regular if it can be recognised by a finite automaton over the alphabet $(\Sigma\cup\{\epsilon\})\times(\Sigma\cup\{\epsilon\}).$
Unreachability analysis. Suppose $I$ and $B$ are two regular sets and $f$ is a regular relation. The unreachability problem $(I,F,B)$ asks whether $B$ is unreachable from $I$ through $F$, namely, $F^*(I)\cap B=\emptyset$. In software verification, many safety properties of software can be modelled as (un)reachability problems [1] for an infinite-state system. In particular, the semantics of a program is modelled by a binary relation over configurations $\Sigma^*.$ If we let $I$ and $B$ denote the sets of initial and bad configurations, respectively, then $F^*(I)\cap B=\emptyset$ means that the execution of the program can never go wrong.
Tractability. Given $(I,F,B)$, $F^*(I)$ is not necessarily regular even though $F^n(I)$ is regular for all $n\ge0.$ Also, most interesting properties of $F^*(I)$ are undecidable. This fact is not surprising, as the transition relation of a Turing machine is regular (see the remark below). To make the verification problem more tractable, we assume that $f$ is length-preserving, i.e., $w\in f(u)$ only if $|w|=|u|$, which weakens the expressive power of the transition system $(I,f)$ from a Turing machine to a linear-bounded automaton (LBA), a machine that may only use the space occupied by the input string. Membership of $F^*(I)$ becomes decidable with the length-preserving restriction, but unreachability remains undecidable: the intersection of an LBA language with a regular language is again an LBA language, and emptiness of LBA languages is undecidable. The latter follows from the fact that one can encode a given TM $T$ as an LBA $A$ such that $L(A)$ consists of precisely the halting runs of $T$.
Remark. While a regular transition system is as powerful as a TM, not all transition systems are regular. A single regular transition cannot, for instance, map $w$ to $w\#w$ (even though a TM can compute the function $f(w)=w\#w$ over several steps). Transition systems induced by higher-order pushdown automata allow transitions like $w \to w\#w$ and are thus not regular.
Regular proof synthesis. While the problem is undecidable, we can give a positive answer to it whenever we manage to find a proof. Such a proof can be a set $S$, called an inductive invariant, that satisfies $I\subseteq S$, $F(S)\subseteq S$ (hence $F^*(I)\subseteq S$), and $B\cap S=\emptyset.$ Recall that in an earlier post we introduced L*, an algorithm that can learn a DFA by querying the target language. Below, we construct a teacher for L* to learn a regular proof for the unreachability problem $(I,F,B).$ Upon a membership query for a word $w$, the teacher checks whether $I$ is reachable from $w$ through $f^{-1}$. This check is decidable thanks to the assumption that $f$ is length-preserving, which guarantees that a word is only reachable from a finite number of words. Upon an equivalence query for a DFA $A$, the teacher performs the following four steps in turn (a small explicit-state sketch of such a teacher is given after the steps):
Step 1. If $\exists w\in I\setminus L(A)$ then return $w$.  // check that $I\subseteq L(A)$
Step 2. If $\exists w\in L(A)\cap B$ then halt and report "reachable" if $Mem(w)=yes$; return $w$ otherwise.  // check that $L(A)$ doesn't overlap $B$
Step 3. If there exist $u\in L(A)$ and $w\in f(u)\setminus L(A)$, then return $w$ if $Mem(u)=yes$ and return $u$ otherwise.  // check that $L(A)$ is inductive
Step 4. Halt and report "unreachable".
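A small explicit-state sketch of such a teacher (Python). A real teacher works symbolically on automata; this toy version only enumerates words up to a length bound, and the candidate language, the sets $I$ and $B$, the successor function and the oracle Mem are passed in as plain Python objects.

from itertools import product
def equivalence_query(lang, init, bad, succ, mem, sigma, max_len):
    words = ["".join(w) for k in range(max_len + 1) for w in product(sigma, repeat=k)]
    for w in words:                                  # Step 1: I is contained in L(A)
        if w in init and not lang(w):
            return ("counterexample", w)
    for w in words:                                  # Step 2: L(A) does not overlap B
        if lang(w) and w in bad:
            return ("reachable", w) if mem(w) else ("counterexample", w)
    for u in words:                                  # Step 3: L(A) is inductive
        if lang(u):
            for w in succ(u):
                if not lang(w):
                    return ("counterexample", w if mem(u) else u)
    return ("unreachable", None)                     # Step 4
# toy demo on the (ab)^n / ba -> ab system of the example below, with B empty
swap = lambda u: {u[:i] + "ab" + u[i+2:] for i in range(len(u) - 1) if u[i:i+2] == "ba"}
init = {"ab" * n for n in range(1, 3)}
mem_stub = lambda w: True                            # Mem oracle stub; not exercised in this demo
print(equivalence_query(lambda w: w == "ab", init, set(), swap, mem_stub, "ab", 4))
# prints ('counterexample', 'abab'), matching iteration 1 of the example below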
If a candidate DFA $A$ passes Steps 1-3, then $L(A)$ is clearly an inductive invariant. Observe that each counterexample falls in the symmetric difference of $L(A)$ and $F^*(I).$ When $F^*(I)$ is regular, it follows from the complexity results of L* that an inductive invariant can be learnt with at most $n$ equivalence queries, where $n$ is the number of states in the minimal DFA of $F^*(I)$, and at most $m\!\cdot\!(n+1)$ membership queries, where $m$ is an upper bound on the length of the counterexamples returned by the teacher. Several remarks are in order:
On the length-preserving restriction. We have assumed that the transition relation preserves lengths. In this way, a bad configuration is reachable from some initial configuration if and only if it can be reached within a bounded amount of memory. This restriction is not essential for practical unreachability analysis as long as we allow "padding" in the initial set. That is, for every word $w$ in the initial configurations, we pad it with an arbitrary number of placeholders, say $\bot$, so that $w\in I$ implies $w\bot^*\subseteq I$. In this way, the amount of memory available for the computation from $w$ is finite but effectively unbounded. On the other hand, though length-preserving relations are powerful enough to model many useful transition systems, they cannot model systems whose transitions change the length of a word (i.e., that require $\epsilon$ in the transducer alphabet), such as lossy channel machines.
Minimality of proof. While L* always learns a minimal DFA by the Myhill-Nerode theorem, here it does not necessarily learn the smallest inductive invariant in terms of set inclusion. Instead, depending on the counterexamples returned by the teacher, L* converges to one of the inductive invariants and, once it finds one, the learnt automaton is the minimal DFA for that particular invariant.
Completeness of the method. When $F^*(I)$ is regular, L* is guaranteed to find a proof. When $F^*(I)$ is not regular, however, our method may fail to find a regular proof even when one exists. On the other hand, checking unreachability is in fact decidable whenever a regular proof exists, since we can launch two procedures at the same time, one looking for a counterexample and one looking for a proof. In this respect, L* learning is weaker in completeness than brute-force enumeration. (Note that a DFA for $F^*(I)$ may still not be effectively computable even when $F^*(I)$ is regular.)
A non-terminating example. Consider an unreachability problem $(I,F,B)$ where $I:=\{(ab)^n:n\ge 1\}$, $f:= \{(xbay, xaby): x,y\in \{a,b\}^*\}$, and $B$ is some trivial set such as $\emptyset.$ Note that $F^*(I)$ is not regular, since $F^*(I)\cap a^*b^*=\{a^nb^n:n\ge1\}$ is not regular. Now consider a teacher implementing Steps 1-4 for equivalence queries that returns a shortest counterexample for each candidate proof. Below we list the queries and the automata $A$ learnt by L* in each iteration.
Iteration 1: $A=\emptyset$; containment check of $I$ fails with counterexample $ab$; add $ab.$
Iteration 3: $A=a(ba)^*b$; inductiveness check fails at $abab \to aabb$; add $aabb.$
digraph finite_state_machine {
rankdir=LR;
size="8,5"
node [shape = doublecircle]; 3 ;
node [shape = circle];
1 -> 2 [ label = "a" ];
2 -> 3 [ label = "b" ];
3 -> 2 [ label = "a" ];
"init" [shape = point];
"init" -> 1;
}
Iteration 4: $A=a(ba|ab)^*b$; inductiveness check fails at $aababb \to aaabbb$; add $aaabbb.$
digraph finite_state_machine {
rankdir=LR;
size="8,5"
node [shape = doublecircle]; 4 ;
node [shape = circle];
2 -> 3 [ label = "b" ];
1 -> 3 [ label = "a" ];
3 -> 2 [ label = "a" ];
3 -> 4 [ label = "b" ];
4 -> 3 [ label = "a" ];
"init" [shape = point];
"init" -> 1;
}
Iteration 5: $A=a(ba|a(ab)^*b)^*b$; inductiveness check fails at $aaababbb \to aaaabbbb$; add $aaaabbbb.$
digraph finite_state_machine {
rankdir=LR;
size="8,5"
node [shape = doublecircle]; 5 ;
node [shape = circle];
2 -> 3 [ label = "b" ];
3 -> 2 [ label = "a" ];
3 -> 4 [ label = "b" ];
1 -> 4 [ label = "a" ];
4 -> 3 [ label = "a" ];
4 -> 5 [ label = "b" ];
5 -> 4 [ label = "a" ];
"init" [shape = point];
"init" -> 1;
}
One can observe that L* gets trapped from the 3rd iteration onwards: it keeps splitting state 2, creating a lineage that leads to a divergent sequence of counterexamples. On the other hand, L* has indeed tried its best after the 3rd iteration: each time L* receives a counterexample of length $2n$, it succeeds in finding an inductive proof for the reachable configurations of length $\le 2n$, namely $A_n:=\{w\in \{a,b\}^*:|w|\le n$ and $0\le$#$a-$#$b\le n$ in all prefixes of $w \}.$ Note that $|A_{n+1}|=|A_{n}|+1.$ Due to the fact that L* progresses incrementally in terms of automaton size, and that $a^{n+1}b^{n+1}$ is the shortest counterexample for $A_n$, L* will never reach an automaton covering all of the $A_n$.

Learning with witnesses. We have discussed how to use the L* learning algorithm to prove $F^*(I)\cap B=\emptyset$. In order to answer membership queries, we assumed that the transition relation is length-preserving. However, it is possible to use L* without this assumption. For example, Vardhan et al. [3] proposed to learn a set of configuration-witness pairs as a proof, where a witness refers to the number of steps needed to reach a configuration. More precisely, the proof they proposed to learn is a regular subset of $((\Sigma\cup\{\bot\})\times\{0,1\})^*$, where $\bot$ serves as a padding symbol. Each word in this set encodes a pair $(c,n)$, meaning that $c$ is a configuration contained in $F^n(I)$. Let $[c,n]$ denote the encoding of $(c,n).$ Given $I$, $f$, and $B$, we define $I'$, $f'$, and $B'$ as
$\begin{eqnarray*}
I' & := & \{[c,0]: c\in I\};\\
f'(S) &:=& \{[f(c),n+1]: [c,n]\in S\};\\
B' &:= & \{[c,n]:c\in B, n\ge 0\}.
\end{eqnarray*}$
Note that if $I$, $f$, and $B$ are regular then so are $I'$, $f'$, and $B'.$
It is straightforward to learn a proof for the unreachability problem $(I',F',B')$ using L*: membership queries are decidable thanks to the presence of the witness, and equivalence queries can be answered as we did before, aiming to find an inductive invariant as an over-approximation. After L* learns a proof for the new problem, we obtain a proof for the original problem by projection. However, note that the transformation from configurations to configuration-witness pairs doesn't preserve regularity. Hence, the transformed problem $(I',F',B')$ doesn't necessarily have a regular proof even when the original problem $(I,F,B)$ does. In particular, unlike the previous method, the learning process here is not guaranteed to terminate when $F^*(I)$ is regular. It would be interesting to characterise the classes for which this technique is effective, namely, the classes where regularity of $F^*(I)$ implies that of $F'^*(I')$.

References

1. A unified approach for studying the properties of transition systems, TCS 1982.
2. Learning to prove safety over parameterised concurrent systems, FMCAD 2017.
3. Learning to verify safety properties, ICFEM 2004.