This is ericpony's blog

Wednesday, December 7, 2016

A note on induction and coinduction

$\newcommand{\bigsqcap}{\mathop {\huge ⨅}}$ $\newcommand{\lfp}{{\sf lfp}}$ $\newcommand{\gfp}{{\sf gfp}}$
An illustrative example. Consider a context-free grammar (CFG) with start symbol $S$, alphabet $\Sigma=\{0,1,2\}$, and the following production rules:\begin{equation}S\rightarrow 0\ |\ 1\ |\ 0S\ |\ 1S\label{eg1-cfg1}\end{equation}A CFG is conventionally interpreted inductively, which means a string belongs to the grammar's language if and only if it can be derived from the start symbol through a finite number of rule applications. Hence, the language of the above grammar is $\{0,1\}^+$, for this set contains exactly the strings over $\Sigma$ that can be derived from $S$. On the other hand, a grammar can also be interpreted coinductively, meaning that a string belongs to the grammar's language if and only if its existence is consistent with the grammar rules. In this sense, a (finite or infinite) string is in the language of \eqref{eg1-cfg1} if and only if it is of one of the forms $0$, $1$, $0x$ or $1x$ with $x$ in the language, and, therefore, the language of the grammar is $\{0,1\}^+\cup\{0,1\}^\omega$, the set of nonempty finite or infinite strings over $\{0,1\}$. Note that no string containing 2 is in the language: if a string $x2y$ with $|x|=n$ were in the language, the rules would force some string $x'2y$ with $|x'|=n-1$ to be in the language as well, and repeating this argument eventually yields a string that begins with 2 and matches no rule, a contradiction. Now consider another CFG:\begin{equation}S\rightarrow 0\ |\ 1\ |\ S\ |\ SS\label{eg1-cfg2}\end{equation}It is clear that grammars \eqref{eg1-cfg1} and \eqref{eg1-cfg2} produce the same language when they are interpreted inductively. However, under a coinductive interpretation, the language of \eqref{eg1-cfg2} becomes $\Sigma^+\cup\Sigma^\omega$, since any string satisfies rule $S \rightarrow S$. In short, an inductive interpretation of a set of rules looks for the minimal satisfying set (a string is in the language iff it can be derived via the rules), while a coinductive interpretation looks for the maximal one (a string is in the language iff it doesn't violate the rules).
The concepts of induction and coinduction can be unified in order theory. To see this, fix a domain $D$ and consider a predicate $Q: D \rightarrow \{true,false\}$. We shall use $Q$ to capture the minimal set of elements satisfying a set of recurrent equations. For example, the language induced by grammar \eqref{eg1-cfg1} can be captured by $Q$ with $D=\Sigma^+\cup\Sigma^\omega$ and recurrent equations $Q(0)=true$, $Q(1)=true$, $Q(0x)=Q(x)$, and $Q(1x)=Q(x)$, such that a string $x$ is in the language iff $Q(x)=true$. Now we rewrite these equations by replacing the $Q$ on the left-hand side of each equation with $F(Q)$, where $F$ is a functional over $D\rightarrow \{true,false\}$, obtaining a set of equations defining $F$: \begin{equation}F(Q)(0)=true;\quad F(Q)(1)=true;\quad F(Q)(0x)=Q(x);\quad F(Q)(1x)=Q(x).\label{eg1-fp1}\end{equation}Consider the (standard) complete lattice $(D\rightarrow \{true,false\},\ \sqsubseteq)$ of predicates where $(Q_1\sqcap Q_2)(d) = Q_1(d)\wedge Q_2(d)$, $(Q_1\sqcup Q_2)(d) = Q_1(d)\vee Q_2(d)$, $\bot$ maps all $d\in D$ to $false$, $\top$ maps all $d\in D$ to $true$, and $Q_1 \sqsubseteq Q_2$ means $Q_1(d)\Rightarrow Q_2(d)$ for all $d\in D$. Observe that $F$ is a monotone function on this lattice: $Q_1 \sqsubseteq Q_2 \Rightarrow F(Q_1) \sqsubseteq F(Q_2)$. Hence, by Tarski's Fixed Point Theorem, $F$ has a least fixed point (denoted by $\lfp(F)$) and a greatest fixed point (denoted by $\gfp(F)$). The relationship between these fixed points is$$\bot\sqsubseteq F^k(\bot)\sqsubseteq \bigsqcup\nolimits_n F^n(\bot) \sqsubseteq \lfp(F) \sqsubseteq \gfp(F) \sqsubseteq \bigsqcap\nolimits_n F^n(\top) \sqsubseteq F^k(\top) \sqsubseteq \top.$$Take the $F$ defined in \eqref{eg1-fp1} as an example. $F^n(\bot)$ captures the set of strings that can be derived from grammar \eqref{eg1-cfg1} with a parse tree of height at most $n$. 
Furthermore, since each equation has only a finite number of $Q$'s on the right-hand side, $F$ is in fact a Scott-continuous function and thus $\lfp(F)=\bigsqcup_n F^n(\bot)$, i.e., $\lfp(F)(x)=true$ iff $x \in\{0,1\}^+$, the set of strings that can be derived from $S$ in a finite number of steps. To find $\gfp(F)$, define $A_n=\{$nonempty binary strings of length at most $n\}$ and $B_n=\{$finite or infinite strings of length greater than $n$ whose first $n$ symbols are binary$\}$. Observe that $F^n(\top)$ captures the set $A_n \cup B_n$. Thus $\bigsqcap\nolimits_n F^n(\top)$ will capture the set $\lim_{n\rightarrow\infty}(A_n\cup B_n)$ if the limit exists. Since $A_n \nearrow \{0,1\}^+$ and $B_n \searrow\{0,1\}^\omega$, we have $$\limsup_{n\rightarrow\infty}(A_n\cup B_n) = \bigcap_{n\ge 1}\bigcup_{k\ge n} (A_k \cup B_k) = \bigcap_{n\ge 1}(\{0,1\}^+\cup B_n) = \{0,1\}^+\cup\{0,1\}^\omega,$$and$$\liminf_{n\rightarrow\infty}(A_n\cup B_n) = \bigcup_{n\ge 1}\bigcap_{k\ge n} (A_k \cup B_k) \supseteq \bigcup_{n\ge 1}(A_n \cup \{0,1\}^\omega) = \{0,1\}^+\cup\{0,1\}^\omega.$$Since the liminf is always contained in the limsup, it follows that $\lim_{n\rightarrow\infty}(A_n\cup B_n)=\{0,1\}^+\cup\{0,1\}^\omega$, which is also the set captured by $\gfp(F)$, as $F$ moreover preserves meets of descending chains (again because each equation inspects $Q$ at only finitely many arguments).
Proof by induction. In mathematical induction, we usually set $D=\mathbb N=\{0,1,2,\ldots\}$ and try to prove that $Q(0)$ holds and $Q(n)\Rightarrow Q(n+1)$ for all $n\ge 0$. Generally, proving a property $Q$ by induction amounts to proving that $Q$ holds in the base cases and is preserved by the rules, i.e., $F(Q)(d)\Rightarrow Q(d)$ for all $d\in D$. By Tarski's theorem, $\lfp(F)$ is the least predicate satisfying this proof rule, so any $Q$ satisfying the rule fulfils $\lfp(F)\sqsubseteq Q$; that is, $Q$ holds on the entire inductively defined set. When $F$ is Scott continuous, $\lfp(F)$ equals $\bigsqcup_n F^n(\bot)$. In this sense, proving by induction corresponds to taking the least fixed point. In program verification, induction and least fixed points are extensively used to reason about semantics, because the semantics of a program should precisely describe what will happen during the evaluation of the program.
Proof by coinduction. Suppose $Q$ is the greatest fixed point of $F$. By Tarski's theorem, $\gfp(F)$ is the join of all post-fixed points, so any predicate $Q'$ satisfying $Q' \sqsubseteq F(Q')$ also satisfies $Q' \sqsubseteq Q$. Hence, to show that $Q(d)$ holds it suffices to find some predicate $Q'$ such that $Q'(d)$ holds and $Q'(d')\Rightarrow F(Q')(d')$ for all $d' \in D$. The rationale is to regard being acceptable (i.e., $Q(d)$ holds) as being consistent with the rules (i.e., there exists a $Q'$ such that $Q'(d)$ holds and $Q'(d')\Rightarrow F(Q')(d')$ for all $d'$). In program verification, coinduction and greatest fixed points are extensively used to check consistency with specifications, because the specification of a program should not forbid a behaviour unless it explicitly says so.
Collapse of fixed points. The least and the greatest fixed points in a complete lattice can coincide when Banach's Fixed Point Theorem applies: a "contractive" operator in a metric space has a unique fixed point. A functional $F$ is contractive if there exists a well-founded relation $\prec$ on $D$ such that each clause of $F(Q)(d)$ only calls $Q$ on arguments smaller than $d$ with respect to $\prec$. For example, the $F$ defined in \eqref{eg1-fp1} is contractive when $D$ is set to $\Sigma^*$; in that case, both $\lfp(F)$ and $\gfp(F)$ capture $\{0,1\}^+$. By contrast, the functional that expresses grammar \eqref{eg1-cfg2}, i.e., the functional $F$ that satisfies\begin{equation}F(Q)(0)=true;\ F(Q)(1)=true;\ F(Q)(x)=Q(x);\ F(Q)(xy)=Q(x)\wedge Q(y),\label{eg1-fp2}\end{equation}is not contractive even when $D=\Sigma^*$, as the clause $F(Q)(x)=Q(x)$ calls $Q$ on $x$ itself. Indeed, one can see that $\lfp(F)$ captures $\{0,1\}^+$, but $\gfp(F)$ captures all of $\Sigma^*$, since that clause makes every string consistent with the rules.
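These claims can be tested concretely on a finite restriction of the domain. The following sketch (the length cut-off and the encodings are assumptions made purely for illustration) represents predicates as Python sets over the nonempty strings of length at most 3 and computes both fixed points of the functionals \eqref{eg1-fp1} and \eqref{eg1-fp2} by Kleene iteration, which is exact on a finite lattice:

```python
from itertools import product

# Finite universe: all nonempty strings over {0,1,2} of length <= 3
# (a finite stand-in for Sigma^+; the cut-off is an arbitrary choice).
U = {''.join(p) for n in (1, 2, 3) for p in product('012', repeat=n)}

def F1(Q):
    """Functional for S -> 0 | 1 | 0S | 1S, as in (eg1-fp1)."""
    return {s for s in U
            if s in ('0', '1') or (s[0] in '01' and s[1:] in Q)}

def F2(Q):
    """Functional for S -> 0 | 1 | S | SS, as in (eg1-fp2)."""
    return {s for s in U
            if s in ('0', '1')
            or s in Q                                  # clause S -> S
            or any(s[:i] in Q and s[i:] in Q for i in range(1, len(s)))}

def fixpoint(F, start):
    """Kleene iteration: apply F until the predicate stops changing."""
    Q = start
    while True:
        Q2 = F(Q)
        if Q2 == Q:
            return Q
        Q = Q2

binary = {s for s in U if set(s) <= {'0', '1'}}
assert fixpoint(F1, set()) == binary   # lfp(F1): the derivable strings
assert fixpoint(F1, U) == binary       # gfp(F1): collapse, F1 is contractive
assert fixpoint(F2, set()) == binary   # lfp(F2): same language inductively
assert fixpoint(F2, U) == U            # gfp(F2): S -> S keeps every string
```

The two functionals have the same least fixed point, but the non-contractive clause of F2 makes its greatest fixed point degenerate to the whole universe.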

Inductive and coinductive datatypes

An introduction to (co)algebra and (co)induction.
https://pages.lip6.fr/Pierre-Evariste.Dagand/stuffs/notes/20150126_coinduction.txt
http://wiki.c2.com/?CoinductiveDataType

Sunday, November 27, 2016

Some interesting proofs using oracles

A proof with oracles exploits the following two facts about Turing Machines (TMs): 1) there exist effective representations of TMs as strings, and 2) a TM can simulate other TMs, taking their representations as input. Any argument that relies only on these two facts treats TMs as black boxes. This post collects some results with interesting oracle-style proofs and sketches the proof ideas.$\newcommand{\P}{{\sf P}}$ $\newcommand{\NP}{{\sf NP}}$ $\newcommand{\PSPACE}{{\sf PSPACE}}$ $\newcommand{\EXPTIME}{{\sf EXPTIME}}$
Diagonalization. Any proof that exploits diagonalization to prove properties of TMs is a proof with oracles. A well-known application of this technique is to show that the Halting problem is undecidable; that is, no TM can determine, for arbitrary $M$ and $x$, whether $M$ halts on string $x$. The key factors leading to this result are precisely the two facts stated earlier: any TM can be represented as a string, and a TM can simulate another TM based on its string representation. To see that the Halting problem is undecidable, suppose to the contrary that some TM can decide whether $M$ halts on $x$ for any $M$ and $x$. (It is a convention to regard $M(x)$ as diverging if $M$ is not a valid TM.) Define a TM $N$ such that $N(M)$ halts if $M(M)$ diverges, and $N(M)$ loops if $M(M)$ halts. Then $N(N)$ halts iff $N(N)$ diverges, contradicting our assumption.
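The diagonal construction can be mimicked in Python. Since no correct halting decider exists, the sketch below feeds in a deliberately wrong stub (everything here is illustrative): the machine $N$ built from any claimed decider refutes the decider's own prediction on input $N$.

```python
def make_N(halts):
    """Build the diagonal machine N from a claimed halting decider."""
    def N(M):
        if halts(M, M):
            while True:      # decider says "M(M) halts": loop forever
                pass
        return None          # decider says "M(M) loops": halt immediately
    return N

# A (necessarily wrong) candidate decider that always answers "loops".
claimed_halts = lambda M, x: False
N = make_N(claimed_halts)
assert N(N) is None                   # N(N) halts...
assert claimed_halts(N, N) is False   # ...yet the decider predicted "loops"
```

Had the stub answered "halts" on $(N,N)$ instead, $N(N)$ would loop forever, so the decider is wrong either way; that is exactly the contradiction in the proof.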

Fixed-Point Theorem. Two TMs $A$ and $B$ are called semantically equivalent, denoted by $A\simeq B$, iff $A(x)=B(x)$ for any string $x$ (setting $M(x):=\nearrow$ if a TM $M$ diverges on $x$). We claim that if $f$ is a TM mapping TMs to TMs, then $f$ has a fixed point, i.e., there exists a TM $M$ such that $f(M)\simeq M$. The key to proving this claim is to consider the special TM $G(M):=M(M)$. More precisely, given a string $M$, $G(M)$ is a TM that loops on every input if $M(M)$ diverges or is not a valid TM, and produces $M(M)(x)$ on input $x$ otherwise. We guess that a fixed point occurs at $G(N)$ for some TM $N$. Writing down the equation $f(G(N))\simeq G(N)\simeq N(N)$ with unknown $N$, one finds that $N:=f\circ G$ is a solution, and thus $G(f\circ G)$ is a fixed point of $f$. By using integers as an encoding of TMs and adopting the convention that $M(x):=\nearrow$ on all $x$ when $M$ is not a valid TM, we obtain Rogers' Fixed-Point Theorem: any total computable function has a fixed point.

P vs NP. A canonical example used to show the power of proofs with oracles is to show that the $\P$ vs $\NP$ problem doesn't relativize. That is, there exist oracles $A$ and $B$ such that a) $\P^A=\NP^A$ and b) $\P^B\neq \NP^B$. First, note that $A$ can be any language complete for a class $C$ with $\NP\subseteq C$ and $\NP^C\subseteq C$. For example, $C$ can be $\PSPACE$ or $\EXPTIME$. For such $A$, $C\subseteq \P^A\subseteq \NP^A$ clearly holds. Further, $\NP^A\subseteq C$ holds, as a machine in $C$ can simulate any $\NP$ machine and answer each of its queries to $A$ by itself. To prove (b), we will construct an oracle $B$ such that the language $L_B:=\{1^n: B$ contains a string of length $n\}$ lies in $\NP^B \setminus \P^B$. To force $L_B \not\in \P^B$, we enumerate all polynomial-time oracle machines as $M_1, M_2,\ldots$ and construct $B$ so that for each $M_i$ there exists an integer $n_i$ such that $M_i^B$ accepts $1^{n_i}$ iff $1^{n_i} \not\in L_B$. This will then imply $L_B \not\in \P^B.$ The fact that $L_B \in \NP^B$ is clear: an $\NP^B$ machine can guess a string of length $n$ and verify its membership in $B$ with a single oracle query.
Let $B_0=\emptyset$ and $n_0=0$. For each $i\ge 1$, we construct $B_i$ as follows. Let $n_i$ be any number larger than both $2^{n_{i-1}}$ and the maximal length of the strings in $B_{i-1}$; this guarantees that strings added at later stages are too long to be queried within $2^{n_i}/3$ steps. Now simulate $M_i^{B_{i-1}}$ on $1^{n_i}$ for at most $2^{n_i}/3$ computation steps. If $M_i^{B_{i-1}}$ accepts $1^{n_i}$, then we set $B_i:=B_{i-1},$ which implies $1^{n_i}\not\in L_{B_i}$. If $M_i^{B_{i-1}}$ rejects $1^{n_i}$, then, since fewer than $2^{n_i}$ of the $2^{n_i}$ strings of length $n_i$ can be queried within the time bound, there exists a string $s$ of length $n_i$ not queried by $M_i^{B_{i-1}}$. Define $B_i:=B_{i-1}\cup\{s\}$, which implies $1^{n_i} \in L_{B_i}$. Hence, by construction, $M_i^{B_{i-1}}$ cannot decide the membership of $1^{n_i}$ in $L_{B_i}$ within $2^{n_i}/3$ steps. Define $B:=\bigcup_i B_i$. Observe that $B_i\nearrow B$ and $L_{B_i}\nearrow L_B$; moreover, for any string $x$, $M^B(x)=M^{B_{i}}(x)$ as long as $M^B(x)$ terminates in $2^{n_i}/3$ steps. Note that every TM $M$ has infinitely many TMs that i) are semantically equivalent to $M$, ii) are of arbitrarily larger size than $M$, and iii) have the same running time as $M$. Since $n_i\nearrow\infty$ as $i\rightarrow\infty$, for each oracle machine $M$ in $\P$ there exists an $i$ such that $M_i^B\simeq M^B$ and the running time of $M_i^B$ on $1^{n_i}$ is less than $2^{n_i}/3$. It follows that $M^B$ accepts $1^{n_i}$ iff $1^{n_i}$ is not in $L_B$, i.e., $M^B$ doesn't decide $L_B$. Hence, $L_B$ is a language in $\NP^B$ but not in $\P^B$.

Sunday, October 23, 2016

A note on LTL vs CTL

  • Syntax. A temporal connective in a CTL formula is a path quantifier $A,E$ followed by a state quantifier $F,X,G,U,W$. A temporal connective in an LTL formula is a combination of state quantifiers alone.
  • States and paths. A CTL formula $\phi$ is a state formula, meaning that the temporal operators are state-based in structure: a formula is evaluated per state, and its evaluation depends on the evaluation of its subformulas at subsequent states. For example, computing the set of states satisfying $AG\,φ$ relies on computing the set of states satisfying $φ$.
    In contrast, an LTL formula $\psi$ is a path formula: its evaluation on a path depends on the evaluation of its subformulas on suffixes of that path.
  • Entailments. We write $M,s \models \phi$ and $M,p \models \psi$ for state $s$ and path $p$. Given a set $I$ of initial states of $M$, we write $M ⊨ φ$ if $M, s ⊨ φ$ for all $s ∈ I$, and write $M ⊨ ψ$ if $M, p ⊨ ψ$ for any path $p$ starting from $I$.
  • LTL and CTL are incomparable. The reset property $AG (EF\,φ)$ (i.e. there is always a possibility that $\phi$ could hold, though it may never hold) cannot be expressed in LTL. The stability property $FG\,φ$ cannot be expressed in CTL; in particular, the natural candidate $AFAG\,φ$ is strictly stronger. Consider a system $M$ with transitions $(s_0,s_0),(s_0,s_1),(s_1,s_2),(s_2,s_2)$ such that $s_0$ is the initial state, $s_0,s_2 \models φ$, and $s_1 \models \neg φ$. Then $M \not\models AFAG\,φ$ but $M \models FG\,φ$.
    It is known that an LTL formula can be over-approximated with ∀CTL [6]: if an ∀CTL property (consisting of AG and AF sub-properties) holds, then an analogous LTL property (where we replace AG with G and AF with F) also holds.
  • Structure vs traces. CTL can characterize the structure while LTL only characterizes the traces. Consider the following two systems:
    $A:$ $(s_1,s_2),\ (s_1,s_3),\ (s_2,s_4),\ (s_3,s_5),\ (s_4,s_1),\ (s_5,s_1)$
    $B:$ $(s_1,s_2),\ (s_2,s_4),\ (s_2,s_5),\ (s_4,s_1),\ (s_5,s_1)$
    where $L(s_1)=a,\ L(s_2)=L(s_3)=b,\ L(s_4)=c,\ L(s_5)=d$. Both systems have traces $(ab(c+d))^\omega$ and thus cannot be distinguished in LTL. On the other hand, $B$ is a model of $AG(b\Rightarrow EX\ c)$ while $A$ isn't.
  • There has been much discussion about the best logic for expressing properties in software verification (cf. [2,5]). LTL can express important properties for software system modelling, such as fairness, while CTL needs extensions to express them, for example CTL* [4] or CTL with fairness [3]. But CTL algorithms are usually more efficient and can use BDD-based algorithms. Hence researchers have tried to approximate LTL verification using CTL model checking techniques, e.g., [7].
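The stability example above can be replayed with the standard fixed-point characterizations of CTL: $AG\,φ$ is a greatest fixed point and $AF\,φ$ a least fixed point. The sketch below (the state names and the set-based encoding are assumptions for illustration) confirms that $s_0$ fails $AFAG\,φ$ because of its self-loop:

```python
# The Kripke structure from the stability example: s0 -> {s0, s1},
# s1 -> s2, s2 -> s2, with φ holding at s0 and s2.
succ = {'s0': {'s0', 's1'}, 's1': {'s2'}, 's2': {'s2'}}
states = set(succ)
phi = {'s0', 's2'}

def AG(T):
    """Greatest fixed point: states from which every path stays in T."""
    X = set(T)
    while True:
        X2 = {s for s in X if succ[s] <= X}   # keep s only if all successors stay
        if X2 == X:
            return X
        X = X2

def AF(T):
    """Least fixed point: states from which every path reaches T."""
    Y = set(T)
    while True:
        Y2 = Y | {s for s in states if succ[s] <= Y}
        if Y2 == Y:
            return Y
        Y = Y2

assert AG(phi) == {'s2'}                # s0 drops out: it can reach s1
assert AF(AG(phi)) == {'s1', 's2'}      # s0 is excluded by its self-loop,
                                        # so M does not satisfy AF AG φ,
                                        # although every path satisfies FG φ
```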

    References

    1. Making prophecies with decision predicates, POPL 2011.
    2. Branching vs. Linear Time: Final Showdown, TACAS 2001.
    3. Fairness for Infinite-State Systems, TACAS 2015.
    4. Verifying Increasingly Expressive Temporal Logics for Infinite-State Systems, JACM 2017.
    5. Concurrency: the Works of Leslie Lamport, ACM Press 2019.
    6. The common fragment of CTL and LTL, FOCS, 2000.
    7. Temporal verification of programs, PhD Thesis, 2012.

    Monday, October 3, 2016

    Enumeration in C#

    Enumerating a collection

    In C#, a collection class needs to implement IEnumerable to be used in foreach. This interface provides a method GetEnumerator that returns an IEnumerator instance (called an enumerator, which may or may not be the collection itself) supporting the operations Reset, Current, and MoveNext. An enumerator serves as a proxy to sequentially access the elements in a collection. However, the behavior of an enumerator can be problematic if the underlying collection is modified while the enumeration is in progress. There are several approaches to deal with this problem. One is to take a snapshot of the collection for enumeration. Another approach is to locate the elements on demand, so that the ith enumerated element is always the ith element, if any, in the current collection. To guarantee this, it may take linear time to visit each element and thus quadratic time to enumerate a collection using foreach. The third approach is to throw an exception when the enumerator detects that the contents of the collection changed during enumeration. The last approach is the one most frequently adopted in C#'s standard libraries, for performance reasons. Even so, the first approach can be realized at the call site by creating a snapshot manually using ICollection.CopyTo.
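    The same trade-off appears in other languages. As a cross-language illustration (not C#-specific), Python's built-in containers take the third, fail-fast approach, while the snapshot approach is a one-liner at the call site:

```python
# Python dict iterators "fail fast", like C#'s standard collections:
# structural modification during iteration raises an exception.
d = {'a': 1, 'b': 2}
caught = None
try:
    for k in d:
        d['c'] = 3            # grow the dict mid-iteration
except RuntimeError:
    caught = 'RuntimeError'
assert caught == 'RuntimeError'

# The snapshot approach, realized at the call site: iterate over a copy.
d = {'a': 1, 'b': 2}
for k in list(d):             # list(d) is the manual snapshot
    d[k.upper()] = d[k]
assert set(d) == {'a', 'b', 'A', 'B'}
```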
    In Java, Iterator.remove is the only safe way to modify a collection during iteration; the behavior of an iterator is unspecified if the underlying collection is modified in any other way. Besides, if you happen to have a ListIterator then you can use ListIterator.add to insert elements in addition to removing them during iteration.

    Enumerators are not Enumerable

    While it is enumerators that are used for enumeration, foreach has to work with an IEnumerable instance from which an enumerator is obtained. Namely, one cannot directly use an enumerator in a foreach loop. In Java, this is not a problem, because an anonymous interface instance can easily be created on the fly when needed. For example, one common scenario for instantiating an interface anonymously in Java is to create a task for a thread:
    new Thread (new Runnable() { public void run() { ... } }).start();
    
    In the snippet, an anonymous Runnable class is created, used, and then discarded at runtime. Anonymous classes come in handy when Java programmers want to create classes that serve as workers, listeners, adapters, etc. By contrast, C# programmers are expected to use anonymous functions (delegates or lambdas) for these purposes. While C# does support anonymous classes, anonymous classes in C# look more like structs than classes, as they can only have public read-only properties and are not allowed to extend classes (other than object) or implement interfaces.
    Although one cannot use an enumerator with foreach, it is almost as easy to use one, say e, in a while loop:
    while (e.MoveNext()) { Process(e.Current); }
    
    However, in situations where an enumerable instance is explicitly required (e.g., by the constructor of a collection), it will take some tricks to convert an enumerator to an enumerable instance at runtime.
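    For contrast, this conversion problem disappears in languages whose iterator protocol makes every iterator its own iterable. A small Python illustration (not a C# workaround, just the analogous protocol):

```python
# In Python, an iterator's __iter__ returns itself, so any iterator can
# be used directly wherever an iterable is expected, e.g. in a for loop.
it = iter([1, 2, 3])
assert iter(it) is it          # an iterator is its own iterable
total = sum(x for x in it)     # consume the iterator like a collection
assert total == 6
```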

    Wednesday, June 22, 2016

    Limits of sets

    Wikipedia has an article that rigorously defines this concept using scary-looking symbols such as limsup, liminf, bigcap, and bigcup. However, I found a post on MO that explains the same idea intuitively. The key point is that we can classify every point that ever appears in the infinite sequence of sets $S_1,S_2,\ldots$ into three classes:
    I) Present in all but finitely many sets
    II) Present only in finitely many sets
    III) Present in and absent from the sets infinitely often
    If class III is nonempty then the limit of the sequence clearly doesn't exist. If the limit does exist, then it contains exactly the points in class I. The limit of a monotone set sequence always exists, because every point in such a sequence falls into the first two classes. Hence, for any sequence of sets $S_1,S_2,\ldots$ the following sequences converge: $A_n\equiv \bigcup_{i=1}^n S_i$, $B_n\equiv \bigcup_{i=n}^\infty S_i$, $C_n\equiv \bigcap_{i=1}^n S_i$, $D_n\equiv \bigcap_{i=n}^\infty S_i.$
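    The three classes are easy to see computationally. The sketch below (truncating the infinite sequence at a finite window is the simplifying assumption here) evaluates limsup and liminf for a sequence in which one point oscillates:

```python
# limsup/liminf of a set sequence, approximated on a finite prefix.
# Only the first half of the prefix is used as starting index n, so that
# every tail S[n:] is long enough to show the oscillating behavior.
def limsup(S):   # points in infinitely many S_n:  ∩_n ∪_{k>=n} S_k
    return set.intersection(*[set.union(*S[n:]) for n in range(len(S) // 2)])

def liminf(S):   # points in all but finitely many S_n:  ∪_n ∩_{k>=n} S_k
    return set.union(*[set.intersection(*S[n:]) for n in range(len(S) // 2)])

# S_n = {0, 1} for even n and {0, 2} for odd n:
#   0 is in class I, while 1 and 2 fall into class III.
S = [{0, 1} if n % 2 == 0 else {0, 2} for n in range(10)]
assert limsup(S) == {0, 1, 2}   # present infinitely often
assert liminf(S) == {0}         # present in all but finitely many
```

    Since limsup and liminf disagree, the limit of this sequence does not exist, exactly as the class-III criterion predicts.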

    Sunday, June 19, 2016

    A note on frameworks for computational logics

    A fundamental computational assumption

    SMT-based verification techniques such as [1,..] make fundamental assumptions about a background structure $\mathfrak{S}$, an r.e. universe $U$, and an r.e. set $\Phi$ of quantifier-free formulas such that
    1. All elements of $U$ can be named by ground terms in the language of $\mathfrak{S}$.
    2. All quantifier-free formulas in the language of $\mathfrak{S}$ are in $\Phi$.
    3. If $a\in U$ and $\varphi \in \Phi$, then $\varphi[a/x]\in \Phi$.
    4. If $\varphi_1, \varphi_2 \in \Phi$, then $\varphi_1\vee \varphi_2$, $\varphi_1\wedge \varphi_2$, and $\neg \varphi_1$ are all in $\Phi$.
    5. Satisfiability of formulas in $\Phi$ is decidable.
    Expressiveness. The above formalism does not capture all decidable languages. For example, it cannot express the context-free languages, as they are not closed under intersection while $\Phi$ is closed under conjunction. However, the formalism does embody the basic properties of the theories supported by any state-of-the-art SMT solver.
    Working with a solver. The restriction to quantifier-free formulas is important for SMT solvers, since theory combination in SMT solvers uses quantifier-free formulas and becomes undecidable when quantifiers are allowed. This fact limits what we can express in $\mathfrak{S}$ without sacrificing decidability: in general, only positive existential quantifiers are allowed, while other quantifiers are not.
    Computation of witness. The assumption that $U$ is r.e. allows us to effectively compute a witness whenever a formula is satisfiable. Here effectiveness means that we have a solver which decides satisfiability and finds a witness in a finite number of steps.

    Multi-typed universe & structure

    In [2, 3], the background universe $U$ is an r.e. multi-typed carrier set. For each type $\tau,$ $U$ contains a sub-universe $U^\tau$ of elements of type $\tau$. For example, $U^{Bool} = \{⊥, ⊤\}$ and $U^{Int} = ℤ$. Terms are defined by induction over the language of $𝔖$ and are assumed to be well-typed. In a multi-typed structure, formulas are simply terms of type $Bool$, i.e., no distinction is made between formulas and Boolean terms. As before, we assume that all elements in $U$ can be named by the ground terms in the language.
    Interpretations. An uninterpreted function symbol of arity n ≥ 1 is a function symbol $f$ with a domain sort $\sigma_1 × ⋯ × \sigma_n$ and a range sort $\sigma$. An interpretation for $f$ is a function from $U^{\sigma_1} × ⋯ × U^{\sigma_n}$ to $U^\sigma$. An interpretation of a variable of sort $\tau$ is an element in $U^\tau$. A constant is also called a function symbol of arity 0.
    Semantics. The Tarski semantics of a closed term $t : \tau$ is denoted by $[\![ t ]\!] ∈ U^\tau$. For example, given a $\lambda $-term $\lambda x. t : \tau → \sigma$, $[\![ \lambda x. t ]\!]$ denotes a function mapping $a ∈ U^\tau$ to $[\![ t[a/x] ]\!] ∈ U^\sigma$. A $\lambda $-term $φ : \tau → Bool$ is called a $\tau$-predicate. Note that the semantics of $\tau$-predicates is defined differently from other $\lambda $-terms: given a $\tau$-predicate $φ$, we shall use $[\![ φ ]\!]$ to denote the set $\{ a ∈ U^\tau ∣ [\![ φ ]\!](a) = ⊤ \}$. Hence $\tau$-predicates are closed under Boolean operations, e.g., $[\![ φ ∧ ψ ]\!] = [\![ φ ]\!] ∩ [\![ ψ ]\!]$, $[\![ ¬φ ]\!] = U^\tau \setminus [\![ φ ]\!]$, etc. Under the previous assumption, satisfiability for $\tau$-predicates is decidable and a witness is computable. Hence we can assume for $\tau$-predicates an effective function $w_\tau$ such that $w_\tau(φ) ∈ [\![ φ ]\!] ∪ \{ ⊥ \}$.
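    A finite toy model makes the set semantics of $\tau$-predicates concrete. In the sketch below, the finite sub-universe, the predicates, and the witness strategy are all illustrative assumptions; $[\![ φ ]\!]$ is represented directly as a Python set:

```python
# A toy model of τ-predicates as sets over a finite sub-universe.
U_int = range(-5, 6)                  # finite stand-in for U^Int

def denote(f):                        # [[φ]] = { a in U^Int | φ(a) = ⊤ }
    return {a for a in U_int if f(a)}

phi = denote(lambda a: a > 1)
psi = denote(lambda a: a % 2 == 0)
assert phi & psi == {2, 4}                      # [[φ ∧ ψ]] = [[φ]] ∩ [[ψ]]
assert set(U_int) - phi == set(range(-5, 2))    # [[¬φ]] = complement

def w(S):
    """Effective witness function: an element of [[φ]], or None if unsat."""
    return min(S) if S else None

assert w(phi & psi) == 2
assert w(denote(lambda a: False)) is None       # unsatisfiable predicate
```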

    References

    1. Monadic Decomposition, Journal of ACM, 2017.
    2. Symbolic Transducers, Microsoft Technical Report, 2011
    3. Symbolic Finite State Transducers: Algorithms and Applications, POPL 2012.

    Wednesday, May 4, 2016

    Active Set in GraphX Pregel

    The original Pregel computation abstraction takes a vertex-centric view to express graph programs. In this abstraction, a vertex is a two-state machine that is either inactive or active. Each active vertex concurrently runs a vertex program, which can send messages to other vertices, update the value of the vertex, or deactivate the vertex (i.e., vote to halt). A deactivated vertex remains inactive in subsequent execution steps unless it receives a message. A vertex activated by a message must vote to halt again to become deactivated. No computation is associated with edges.
    The GraphX computation framework realizes Pregel based on the Gather-Apply-Scatter (GAS) pattern. GAS breaks the computation of a vertex program down into purely edge-parallel and vertex-parallel stages with three user-defined functions, called gather, apply and scatter: the gather function is an associative and commutative operator that reduces the inbound messages of a vertex; the apply function operates on a vertex and updates its value based on the reduced inbound message; the scatter function computes and sends messages along each edge. Note that, unlike more standard implementations of Pregel, GraphX Pregel does not allow modifying edge attributes or graph topology during execution. Besides, it allows message exchanges only between neighboring vertices. These restrictions are imposed to increase parallelism and enable more optimizations within GraphX.
    GraphX Pregel does not provide operations to deactivate a vertex (i.e. to let a vertex vote to halt). Instead, it automatically avoids executing the scatter function on edges where neither endpoint has received a message. These edges remain skipped until their endpoints receive a message again. Hence, an edge in GraphX Pregel is also a two-state machine, just like a vertex in the original Pregel: an edge is deactivated if its endpoints do not receive any message, and is activated once an endpoint receives a message. In practice, GraphX decides whether an edge is activated by bookkeeping the set of vertices that received messages in the last execution step, called the active set. Hence, an edge is activated iff it is incident to some vertex in the active set. In addition, GraphX also uses the size of the active set as an indicator of workload and adjusts its message-delivery strategy accordingly at runtime.
    The bookkeeping mechanism allows GraphX Pregel to avoid unnecessary computation and adjust its message-delivery strategy at runtime. The mechanism, however, makes it harder to reason about the behaviors of a Pregel program. Fortunately, it turns out that enabling bookkeeping does not change the observable behaviors of a Pregel program under certain conditions. More precisely, we have the following two facts:
    Fact 1. The outcome of a GraphX Pregel program is not affected by the choice of message delivery strategy if the gather function is associative and commutative.
    Fact 2. The outcome of a GraphX Pregel program is not affected by the bookkeeping mechanism if the scatter function only depends on the edge triplet, namely, the edge attribute, the source vertex, and the destination vertex.
    The first fact is clear. To see why the second fact is true, consider two versions of the GraphX framework, one of which executes the scatter function only on activated edges while the other executes it on all edges. Given the condition in Fact 2 and a deterministic apply function, we will show that a Pregel program produces the same output graph in both frameworks from the same initial message and input graph. The proof is by induction on the number of execution steps taken by the program:
    Step 0. Since the apply function is deterministic and the initial message is sent to all vertices without executing the scatter function, the program produces the same graph in both frameworks after the first step.
    Step n+1. Suppose the program produces the same graph after the nth step. Consider an edge e in the (n+1)-th step. If e is active, then the scatter function behaves the same on it in both frameworks. If e is inactive, then two facts about its endpoints hold in the last step: i) they did not receive any message, and ii) hence they did not change their values; in particular, the scatter function sent no message along e in the last step. Since the scatter function only inspects the edge triplet of e, which is unchanged since the last step, it will again send no message along e even if it is executed on e in this step. Hence, the scatter function sends the same messages to the same vertices in both frameworks. Moreover, since the gather function is associative and commutative, each vertex receives the same reduced inbound message in both frameworks. Finally, since the apply function is deterministic, the program produces the same graph in both frameworks after the (n+1)-th step. This concludes the proof.
    Note that the condition in Fact 2 holds when the scatter function is pure. Hence, we can safely assume that all edges are activated when simulating or reasoning about a pure Pregel program in GraphX. In the original Pregel model, however, the active set is essential for deciding whether the vertex program should be executed on a vertex. It is the GAS decomposition and the restriction on edge modification that enable GraphX to eliminate the use of the active set.
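    The induction argument can be exercised on a toy implementation. The sketch below is a hypothetical, heavily simplified Pregel loop (not the GraphX API: messages flow only from source to destination, and all names are made up). It runs a max-propagation program with and without the active-set optimization and checks that the outcomes agree, as Fact 2 predicts for a scatter that depends only on the edge triplet:

```python
def pregel(vertices, edges, init, gather, apply_, scatter, use_active_set):
    """Minimal Pregel loop; compares the two frameworks in Fact 2."""
    val = {v: apply_(x, init) for v, x in vertices.items()}  # superstep 0
    active = set(val)                  # every vertex got the initial message
    while active:
        msgs = {}
        for (s, d, attr) in edges:
            if use_active_set and s not in active and d not in active:
                continue               # inactive edge: scatter is skipped
            m = scatter(attr, val[s], val[d])
            if m is not None:
                msgs[d] = gather(msgs[d], m) if d in msgs else m
        for d, m in msgs.items():
            val[d] = apply_(val[d], m)
        active = set(msgs)             # vertices that received a message
    return val

# Max-propagation on a 4-cycle: gather = apply = max, and scatter sends
# the source value forward only when it beats the destination value.
verts = {1: 3, 2: 7, 3: 1, 4: 5}
cycle = [(1, 2, None), (2, 3, None), (3, 4, None), (4, 1, None)]
prog = (verts, cycle, 0, max, max,
        lambda attr, sv, dv: sv if sv > dv else None)
assert pregel(*prog, True) == pregel(*prog, False) == {1: 7, 2: 7, 3: 7, 4: 7}
```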

    References

    1. Pregel API in GraphX Programming Guide and its source code on GitHub.
    2. Xin, Reynold S., et al. "GraphX: A resilient distributed graph system on Spark." First International Workshop on Graph Data Management Experiences and Systems. 2013.
    3. Xin, Reynold S., et al. "GraphX: Unifying data-parallel and graph-parallel analytics." arXiv preprint. 2014.
    4. Gonzalez, Joseph E., et al. "GraphX: Graph processing in a distributed dataflow framework." OSDI. 2014.

    Sunday, March 20, 2016

    Equality in first-order logic

    First-order logic (FOL) can be discussed with or without regarding identity as an in-built relation. This difference matters because the semantics of identity ("anything is equal to nothing but itself") is not definable in FOL. Hence, different ways to express the concept of identity lead to different models. By contrast, identity is definable in SOL, so the issues discussed here do not arise there.

    First-order logic with identity

    FOL with identity is a logic where "$=$" is a primitive logical symbol, i.e., a symbol with the same semantics across all possible models. As a logical symbol, "$=$" is stipulated to be interpreted by any model $\mathcal A$ of the logic as the identity relation on ${\rm Dom}(\mathcal A).$ Furthermore, the incorporation of equality adds two "Axioms of Equality" to the deduction system of the logic: reflexivity and substitution for formulas, where the second axiom (aka Leibniz's law) is an axiom schema. Other properties such as symmetry and transitivity can be derived from these two axioms.
    Remark. Note that the Axioms of Equality are actually not axioms but inference rules. In fact, these axioms do NOT characterise equality: what they characterise are arbitrary equivalence relations on the domain of discourse that are congruent with respect to the functions and relations of the interpretation.
    Example. Peano Arithmetic is the FO theory of $(\omega, +, succ)$ with identity, while Modular Arithmetic is a FO theory without identity: for a fixed $n>0$, an equivalence relation $\equiv_n$ is defined such that $a \equiv_n b$ iff $a \equiv b \pmod n$.
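    The remark above can be checked mechanically on a finite structure. The sketch below (the domain and the successor function are chosen purely for illustration) verifies that congruence mod 3 on $\{0,\dots,8\}$ satisfies reflexivity and the substitution property with respect to a successor operation, while being strictly coarser than identity:

```python
# The Equality axioms pin down a congruence, not true identity:
# over D = {0,...,8}, a ~ b iff a ≡ b (mod 3) satisfies them all
# with respect to successor-mod-9, yet ~ is not the identity relation.
D = range(9)
sim = lambda a, b: a % 3 == b % 3
succ = lambda a: (a + 1) % 9          # a function of the structure

assert all(sim(a, a) for a in D)                          # reflexivity
assert all(sim(succ(a), succ(b))                          # substitution:
           for a in D for b in D if sim(a, b))            # ~ is a congruence
assert any(sim(a, b) and a != b for a in D for b in D)    # ~ is not identity
```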

    First-order logic without identity

    Consider a vocabulary $\sigma$ where equality is not a logical symbol. A theory that contains the Axioms of Equality is called a theory with equality. Let $T$ be such a theory of $\sigma$ where the Axioms of Equality, denoted as $A_{\rm eq}$, are defined for a binary relation $\sim$. Given a model $\mathcal A$ of $T$, it is possible that $a\sim^{\mathcal A} b$ but $a\neq b$ although $a$ and $b$ will satisfy exactly the same formulas in ${\mathcal A}$ by Leibniz's law. On the other hand, we can define the quotient structure $\mathcal B := \mathcal A/\!\!\sim$ of $\mathcal A$. Clearly, $a\sim^{\mathcal B} b$ if and only if $a=b$.
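    As a concrete (and hypothetical) illustration of quotienting by a congruence, take the structure $(\{0,\dots,8\},\, + \bmod 9)$ with $a\sim b$ iff $a\equiv b \pmod 3$: the operation induced on $\sim$-classes is well defined, which is exactly what passing to $\mathcal A/\!\!\sim$ requires. A minimal Python sketch:

```python
# Quotient of ({0..8}, + mod 9) by the congruence a ~ b iff a = b (mod 3).
# cls picks a canonical representative of each ~-class.
dom = range(9)
cls = lambda a: a % 3

# ~ is a congruence for +: the operation induced on classes is well defined,
# so the quotient structure is isomorphic to ({0,1,2}, + mod 3).
for a in dom:
    for b in dom:
        assert cls((a + b) % 9) == cls((cls(a) + cls(b)) % 3)
```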

    Now we have two methods to define equality in FOL. What difference does the choice make? The first thing to note is that satisfiability is independent of which method is used. To see this, let $F$ be a formula of vocabulary $\sigma$ plus equality and let $F'$ be the formula obtained by replacing all "$=$" in $F$ with "$\sim$". Consider the theory $T_F := A_{\rm eq}\cup\{F'\}$. Note that if $\mathcal A$ is a model of $T_F$ then $\mathcal A/\!\!\sim$ is a model of $F$. It follows that $F$ and $T_F$ are equi-satisfiable, as summarised in the following facts:
    Fact. A formula $F$ is satisfiable in FOL with identity iff $T_F$ is satisfiable in FOL w/o identity.
    Fact. A formula $F$ is valid in FOL with identity iff $A_{\rm eq}\implies F$ is valid in FOL w/o identity.
    Let $T$ be a theory of a vocabulary with $\sim$. A model $\mathcal A$ is called a normal model of $T$ if $\mathcal A$ is a model of $T$ and $a\sim^{\mathcal A} b$ implies $a=b$. Hence, a theory with equality has a model iff it has a normal model. Results about FOL in the literature often assume identity; with the notion of normal models, these results can be easily rephrased without identity. For example,
    Theorem. (Compactness with identity) Given a set of formulas $\Sigma$, if every finite subset of $\Sigma$ has a model, then $\Sigma$ also has a model.
    Theorem. (Compactness w/o identity) Given a set of formulas $\Sigma$, if every finite subset of $\Sigma$ has a normal model, then $\Sigma$ also has a normal model.

    References

    The Logic of Equality, Leon Henkin, 1977.

    Friday, March 18, 2016

    A note on semi-linear and regular sets

    Regular sets

    A set is called regular if it can be recognised by a DFA. For $L\subseteq\Sigma^*$, define $u\sim_L v$ iff $uw\in L \iff vw\in L$ for all $w\in\Sigma^*.$ Note that $\sim_L$ is an equivalence relation and a right congruence: $u\sim_L v$ implies $uw\sim_L vw$ for all $w\in\Sigma^*.$ The classes of $\sim_L$ induce a minimal complete DFA recognising $L.$ Intuitively, words belonging to the same class of $\sim_L$ lead to the same state in the DFA. The classes containing the empty string and the strings in $L$ correspond to the initial state and the final states, respectively.
    Myhill-Nerode Theorem. A set $L$ is regular iff $\sim_L$ has finite index over $\Sigma^*.$
    Rational sets. A set is called rational iff it can be generated from a finite number of finite subsets of $\Sigma^*$ through a finite number of rational operations, namely union, concatenation and the Kleene star.
    Kleene's theorem. A set of words is regular iff it is rational. (Note: the only-if part doesn't hold for non-length-preserving relations.)
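    To make the Myhill-Nerode relation concrete, one can approximate $\sim_L$ by comparing finite truncations of residual languages. For the (assumed) example $L$ = words over $\{a,b\}$ with an even number of $a$'s, the approximation already separates exactly two classes, matching the two states of the minimal DFA. A small Python sketch:

```python
from itertools import product

def words(alphabet, max_len):
    # all words over the alphabet up to the given length
    for n in range(max_len + 1):
        for w in product(alphabet, repeat=n):
            yield "".join(w)

def signature(u, in_lang, alphabet, depth):
    # finite approximation of the residual language u^{-1} L;
    # u ~_L v holds iff their (full) residuals coincide
    return frozenset(w for w in words(alphabet, depth) if in_lang(u + w))

# L = words over {a, b} with an even number of a's
in_lang = lambda w: w.count("a") % 2 == 0
sigs = {signature(u, in_lang, "ab", 3) for u in words("ab", 3)}
# two classes survive: even vs. odd number of a's read so far
```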

    Semi-linear sets

    A subset of $\mathbb N^k$ is linear if it can be expressed as$$u_0 + \langle u_1, \dots, u_m \rangle = \{u_0 + a_1u_1 + \dots + a_mu_m \mid a_1, \dots, a_m \in \mathbb{N}\}$$such that $u_0,...,u_m$ are vectors of dimension $k$. A set is called semi-linear if it is a union of finitely many linear sets.
    Fact. Semi-linear sets are closed under complement, intersection, projection, etc., in an effective way.
    Fact. A set is semi-linear iff it is Presburger-definable.
    Fact. If a set is upward-closed or downward-closed, then it is semi-linear.
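    As a sanity check on the definition, membership in a linear set can be tested by brute force over bounded coefficients (a toy sketch, not an efficient decision procedure; the coefficient bound is an assumption):

```python
from itertools import product

def in_linear_set(v, u0, periods, bound=20):
    # brute-force test of v in u0 + <periods>, trying coefficients up to `bound`
    k = len(v)
    for coeffs in product(range(bound + 1), repeat=len(periods)):
        cand = tuple(u0[i] + sum(c * p[i] for c, p in zip(coeffs, periods))
                     for i in range(k))
        if cand == tuple(v):
            return True
    return False

# the linear set (1,0) + <(2,0), (0,3)>: odd first coordinate,
# second coordinate divisible by 3
assert in_linear_set((5, 6), (1, 0), [(2, 0), (0, 3)])
assert not in_linear_set((4, 6), (1, 0), [(2, 0), (0, 3)])
```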

    Parikh image. Let $\Sigma=\{a_1,...,a_k\}$. Given a word $w$, we use $|w|(a)$ to denote the number of occurrences of symbol $a$ in $w$. The Parikh image is the function $\phi:\Sigma^*\rightarrow\mathbb N^k$ such that the $i$-th component of $\phi(w)$ is $|w|(a_i).$ [note]

    Fact. Parikh image is a morphism from monoid $(\Sigma^*,\cdot,\epsilon)$ to monoid $(\mathbb N^k,+,0).$
    Fact. A set $S\subseteq \mathbb N^k$ is semi-linear iff there is a regular set $R\subseteq\Sigma^*$ such that $\phi(R)=S$, where $\phi$ is the Parikh image. [1]

    From RE to its Parikh image. The key to computing the Parikh image of a regular set is to describe the depths of Kleene stars by constraints on the dependencies between symbol frequencies. For example, $(a^*b)^*$ can be reduced to $a^*b^*$ plus a constraint "$|w|(b)=0$ implies $|w|(a)=0$", which can be further resolved into two linear conditions "$|w|(b)=0\wedge |w|(a)=0$" and "$|w|(b)\ge 1 \wedge |w|(a)\ge 0$", corresponding to a union of two linear sets. See [2] and the references therein for more details about the constructions.

    Parikh's Theorem. [1] If $L$ is a context-free language then there exists a regular language $R$ such that $\phi(L)=\phi(R).$ (This implies that context-free and regular languages are indistinguishable when the alphabet is commutative.)
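    Parikh's theorem can be illustrated on the classic context-free language $\{a^nb^n\}$: its Parikh image coincides with that of the regular language $(ab)^*$. A toy check over bounded $n$:

```python
def parikh(w, alphabet="ab"):
    # Parikh image: vector of symbol counts
    return tuple(w.count(a) for a in alphabet)

# the CFL { a^n b^n } and the regular language (ab)* have the same Parikh image
cfl = {"a" * n + "b" * n for n in range(6)}
reg = {"ab" * n for n in range(6)}
assert {parikh(w) for w in cfl} == {parikh(w) for w in reg}
```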

    Real numbers

    A relation in a real closed field is definable in first-order logic if and only if it is a Boolean combination of polynomial equations and inequalities [?].

    Connection with logic

    Connection with MSO. [Buchi] A set $L\subseteq\Sigma^*$ is regular iff it can be defined in $MSO(\Sigma, <_{\mathbb N}, \{P_a\}_{a\in\Sigma})$. [slide]
    Connection with LTL. [?] A set is LTL-definable iff it is $FO(\Sigma^\omega, <_{\mathbb N}, \{P_a\}_{a\in\Sigma})$-definable.
    Connection with FO. A regular expression built without the Kleene star (but allowing complement) is called star-free. For example, the language of $(ab)^*$ is star-free but that of $(aa)^*$ is not. McNaughton showed that a regular set is star-free iff it can be defined in $FO(\Sigma, <_{\mathbb N}, \{P_a\}_{a\in\Sigma})$, cf. [3]. Hence, checking star-freeness of a regular expression is equivalent to checking whether an MSO-formula is FO-definable. The former problem is shown to be PSPACE-complete, see here and here. This complexity result generalises to checking star-freeness of rational relations, see here.
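    The counting distinction behind star-freeness can be checked directly on letter actions: a DFA is aperiodic iff every word action $t$ on its states satisfies $t^n = t^{n+1}$ for some $n$. In the minimal DFA of $(aa)^*$ the letter $a$ swaps the two states (a non-trivial cycle), so no such $n$ exists. A sketch, with both transformations as assumed examples:

```python
def power(t, n):
    # n-fold composition of a state transformation given as a dict
    f = {q: q for q in t}
    for _ in range(n):
        f = {q: t[f[q]] for q in f}
    return f

# minimal DFA of (aa)*: the action of 'a' swaps the two states,
# so t^n never stabilises -- the DFA is not aperiodic
t_swap = {0: 1, 1: 0}
assert all(power(t_swap, n) != power(t_swap, n + 1) for n in range(1, 8))

# a collapsing action stabilises immediately, as in an aperiodic DFA
t_sink = {0: 1, 1: 1}
assert power(t_sink, 1) == power(t_sink, 2)
```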


    Wednesday, February 17, 2016

    A note on checking properties of regular languages

    Deterministic Finite Automata

    Equivalence. If a DFA with $n$ states does not accept every string, it rejects some string of length at most $n-1$. It follows that the languages of two DFA $A$ and $B$ coincide iff they accept the same words up to length $|A|\cdot|B| - 1$. (Hint: consider automaton $A\times B$.) This observation leads to an exponential-time algorithm for language equivalence checking. In practice, equivalence between DFA is often checked by first minimising the DFA (see below). Isomorphism between DFA can be checked in linear time when an appropriate representation is used, cf. Hopcroft's method.
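    The product-automaton hint above can be turned into a direct polynomial-time check: explore $A\times B$ and report inequivalence as soon as a reachable product state is accepting in exactly one component. A sketch (the DFA encoding is an assumption):

```python
from collections import deque

def dfa_equivalent(d1, d2, alphabet):
    # each DFA is (transitions: dict[(state, symbol)] -> state, start, accepting set)
    (t1, s1, f1), (t2, s2, f2) = d1, d2
    seen, queue = {(s1, s2)}, deque([(s1, s2)])
    while queue:
        p, q = queue.popleft()
        if (p in f1) != (q in f2):   # reachable state in the symmetric difference
            return False
        for a in alphabet:
            nxt = (t1[(p, a)], t2[(q, a)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True

# two DFAs for "even number of a's" over different state sets
t1 = {("e", "a"): "o", ("o", "a"): "e"}
t2 = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 1}
assert dfa_equivalent((t1, "e", {"e"}), (t2, 0, {0, 2}), "a")
```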

    Minimisation. The minimal DFA of a regular language is unique by the Myhill-Nerode Theorem. We can hence check language equivalence of DFA by checking isomorphism between DFA. DFA minimisation takes $O(|\Sigma|\cdot n\log n)$ time using Hopcroft's method, which is the most commonly used approach in practice. Aho, Hopcroft and Ullman proposed an algorithm that uses union-find to create and locate the equivalence classes of states in a DFA (cf. Sec. 3 of [2]). The running time is $O(kn\!\cdot\!A(kn,n))$ with $k=|\Sigma|$.
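    For illustration, the simpler quadratic partition-refinement scheme (Moore's algorithm, not Hopcroft's faster variant) repeatedly splits classes by the classes their successors fall into. A sketch on an assumed 3-state example:

```python
def minimise(states, alphabet, delta, accepting):
    # Moore-style partition refinement; returns a map from each state
    # to the id of its Myhill-Nerode class
    part = {q: int(q in accepting) for q in states}
    while True:
        sig = {q: (part[q], tuple(part[delta[(q, a)]] for a in alphabet))
               for q in states}
        ids = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new = {q: ids[sig[q]] for q in states}
        if len(set(new.values())) == len(set(part.values())):
            return new
        part = new

# a 3-state DFA over {a} for "even number of a's" with two mergeable states
delta = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 1}
classes = minimise([0, 1, 2], "a", delta, {0, 2})
assert classes[0] == classes[2] != classes[1]   # the minimal DFA has 2 states
```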

    Universality can be checked by checking that all reachable states are accepting.
    Complement is obtained by simply flipping the accepting and the non-accepting states.
    Aperiodicity (i.e., the absence of non-trivial cycles) of a DFA is PSPACE-complete to decide, cf. here.
    Intersection vacuity of an unbounded number of DFA is PSPACE-complete, cf. here and here.
    Reversal. The trivial way to get the DFA of the reverse language is to determinise the NFA obtained by flipping the initial and the accepting states and reversing the directions of the transitions. This approach leads to exponential blowup. The lower bound for this problem is unclear.

    Non-deterministic Finite Automata

    Equivalence. The NFA equivalence problem is PSPACE-complete; in fact, it was the first problem shown to be complete for this class, see [5]. In practice, checking equivalence of NFA is often done by first converting the NFA to DFA using the subset construction, which, however, can lead to exponential growth in automaton size. There are other determinisation methods in the literature, cf. wiki and [4].

    Universality. By determinisation, if an NFA with $n$ states does not accept every string then it rejects some string of length at most $2^n-1$. Together with the classic result that reachability can be checked in NL, we easily see that universality checking for NFA is in PSPACE. One can show that the problem is PSPACE-complete, and it remains so even when every state of the NFA is accepting.

    Complement. Complementing a DFA can be done by simply flipping the accepting and the non-accepting states. This method doesn't work for NFA: the language of the "flipped" NFA is in general a superset of the complement (e.g., consider the NFA $\{q_0 \xrightarrow{a} q_1, q_0 \xrightarrow{a} q_2\}$ where only $q_1$ is accepting). In [6], the authors give very precise bounds on the complexity of complementing NFA. In particular, there are NFA with exponentially large complements: the language of all words over $\{1,\dots,n\}$ that do not contain every symbol is accepted by an NFA of size $O(n)$, but any NFA for its complement needs $2^n$ states. Hence, complementing an NFA can require an unavoidable exponential blowup; no general method for complementing NFA can do asymptotically better than the subset construction.
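    The pitfall can be replayed directly on the two-transition NFA from the example: after flipping the accepting states, the word $a$ is still accepted (via $q_2$), so the flipped automaton is not the complement. A sketch:

```python
def nfa_accepts(delta, start, accepting, w):
    # standard NFA run: track the set of reachable states
    states = {start}
    for sym in w:
        states = {q for s in states for q in delta.get((s, sym), set())}
    return bool(states & accepting)

delta = {("q0", "a"): {"q1", "q2"}}
# the original NFA accepts exactly {"a"} (via q1)
assert nfa_accepts(delta, "q0", {"q1"}, "a")
# flipping the accepting states does NOT complement:
# "a" is still accepted, now via q2
assert nfa_accepts(delta, "q0", {"q0", "q2"}, "a")
```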

    Minimisation. NFA minimisation is PSPACE-hard: given a DFA or an NFA, it is PSPACE-hard to compute the size of the minimal equivalent NFA. Hardness-of-approximation results show that even approximating this size is hard. Since a regular expression can be converted to an NFA of comparable size, these results imply that we cannot expect an efficient algorithm for converting a regular expression to a minimal NFA. See these lecture notes by Artem Kaznatcheev for more details.

    Regular expressions

    Equivalence. As for NFA, checking equivalence of RE is PSPACE-complete. This is to be expected: since RE can be converted to and from NFA of roughly the same size, the complexity of checking properties of RE should be comparable to that of checking NFA. However, if intersection is allowed then the problem becomes EXPSPACE-complete, and if negation is allowed it becomes non-elementary [5].

    Star-freeness of RE is PSPACE-complete, see here. This problem corresponds to checking the non-existence of non-trivial cycles in DFA, which is also PSPACE-complete.
    Emptiness for regular expressions with intersection is PSPACE-complete.
    (Often, checking a property for DFA is far more efficient than checking the same property for RE. So it is surprising to note that the above two problems have the same complexity for both DFA and RE.)

    Star-free Regular expressions

    Checking equivalence of star-free RE is NP-complete, cf. Section 2 of [3].


    References

    1. Equivalence of regular languages.
    2. Testing the Equivalence of Regular Languages
    3. On the equivalence, containment, and covering problems for the regular and context-free languages
    4. Five determinisation algorithms
    5. Word problems requiring exponential time
    6. State Complexity of Concatenation and Complementation of Regular Languages
    7. Complexity of Some Problems from the Theory of Automata

    Wednesday, February 10, 2016

    A note on the expressiveness of logic

    Reachability problem. Fix a graph $\mathcal{G}$. A vertex $u$ is said to be reachable from a vertex $v$ in $\mathcal{G}$ if there exist $r \ge 0$ vertices $x_1$, ..., $x_r$, such that $(v, x_1)$, $(x_1, x_2)$, ..., $(x_r, u)$ are edges of $\mathcal{G}$. A formula $\phi(x,y)$ with exactly two free variables $x, y$ and one relation symbol $E$ is said to "express" reachability if and only if for any finite graph $\mathcal{G}$, vertex $u$ is reachable from vertex $v$ in $\mathcal{G}$ if and only if $\mathcal{G}$ is a model of $\phi(x,y)$ with respect to some interpretation that associates $E$ to the edge relation and $x,\ y$ to the vertices $u,\ v$.
    Reachability is not expressible in first-order logic. Suppose $\phi(x,y)$ is a FO logic formula expressing reachability. Define three sentences $s_1$, $s_2$, and $s_3$, such that
    $s_1 \equiv \forall x \forall y. \phi(x, y)$, i.e., all vertices are reachable from all vertices;
    $s_2 \equiv \forall x. (\exists y. E(x, y) \wedge \forall z. E(x, z) \implies z = y)$, i.e., every vertex has out-degree one;
    $s_3 \equiv \forall x. (\exists y. E(y, x) \wedge \forall z. E(z, x) \implies z = y)$, i.e., every vertex has in-degree one.
    Let $s \equiv s_1 \wedge \ s_2 \wedge \ s_3$. Observe that a finite graph is a model of $s$ if and only if the graph is a cycle. Since there are finite cycles with as many vertices as desired, $s$ has arbitrarily large finite models, and hence (by compactness) an infinite model. But an infinite graph cannot be a cycle, which contradicts our assumption that there exists an FO formula expressing reachability.
    Reachability is expressible in second-order logic. Given constants $x$ and $y$ and a binary relation symbol $L$, we use the following sentences to characterize $L$ as a path from $x$ to $y$:
    1. $L$ is a linear order over the vertices : $$ \begin{align*} \psi_1 \equiv\ & \forall u. \neg L(u,u) \\ & \wedge \forall u\forall v. (L(u,v) \implies \neg L(v,u)) \\ & \wedge \forall u\forall v \forall w. (L(u,v) \wedge L(v,w) \implies L(u,w)) \end{align*} $$ 2. Any two consecutive vertices in the order must be joined by an edge: $$ \begin{align*} \psi_2 \equiv\ &\forall u \forall v. ((L(u,v) \wedge \forall w. (\neg L(u,w) \vee \neg L(w,v))) \implies E(u,v)). \end{align*}$$3. The first vertex in the order is $x$, the last vertex is $y$, and $x$, $y$ are in the order: $$ \psi_3 \equiv (\forall u.\ \neg L(u,x)) \wedge (\forall v.\ \neg L(y,v)) \ \wedge L(x,y).$$ Now an expression for reachability can be defined as$$\phi(x,y) \equiv \exists L. ((x=y) \vee (\psi_1 \wedge \psi_2 \wedge \psi_3)).$$
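    On a fixed finite graph, the existential second-order quantifier $\exists L$ can be evaluated by brute force: enumerate every binary relation on the vertices and test $\psi_1\wedge\psi_2\wedge\psi_3$. A toy check on the 3-vertex path $0\to1\to2$ (the graph and its encoding are assumptions):

```python
from itertools import product

V = [0, 1, 2]
E = {(0, 1), (1, 2)}          # the path 0 -> 1 -> 2

def reachable_so(x, y):
    if x == y:
        return True
    pairs = [(u, v) for u in V for v in V]
    # existential second-order quantification over L: try every relation
    for bits in product([False, True], repeat=len(pairs)):
        L = {p for p, b in zip(pairs, bits) if b}
        psi1 = (all((u, u) not in L for u in V)                          # irreflexive
                and all((v, u) not in L for (u, v) in L)                 # antisymmetric
                and all((u, w) in L
                        for (u, v1) in L for (v2, w) in L if v1 == v2))  # transitive
        psi2 = all((u, v) in E                       # consecutive pairs are edges
                   for (u, v) in L
                   if not any((u, w) in L and (w, v) in L for w in V))
        psi3 = (all((u, x) not in L for u in V)      # x first, y last, x before y
                and all((y, v) not in L for v in V)
                and (x, y) in L)
        if psi1 and psi2 and psi3:
            return True
    return False

assert reachable_so(0, 2)
assert not reachable_so(2, 0)
```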

    Expressiveness of first-order Logic

    It is known that the natural deduction proof system for FO logic is both sound and complete, so a formula is valid iff it has a proof, and proof search semi-decides validity. Still, it is impossible to build a perfect automatic theorem prover for FO logic: validity of FO is undecidable, since FO can formulate the Post correspondence problem. On the other hand, validity is easier than satisfiability in the sense that the set of valid formulas is recursively enumerable while the set of satisfiable formulas is not.
    Suppose we have only equality and edge relation in our language. A $\phi$-graph problem is to decide whether there exists an interpretation such that a given finite graph is a model of formula $\phi$. The class L-graph is formed by all $\phi$-graph problems where $\phi$ is a formula defined in logic L.
    Fact. FO-graph is strictly contained in $P$.
    The inclusion is strict because the reachability problem (which is P-complete) is not expressible in FO and thus not captured by FO-graph. In fact, any FO-graph problem can be decided in LOGSPACE (check this). Immerman and Vardi (1982) added inductive definitions to FO to obtain the logic LFP, which suffices to express the problems in P over ordered graphs, i.e., graphs with a distinguished relation $<$ interpreted as a linear order over the vertices.
    Theorem. (Immerman-Vardi 1982) LFP-graph = $P$ over ordered graphs.
    However, for graphs without the ordering assumption, there are easily poly-time computable properties that are not definable in LFP. Hence LFP is still too weak to capture $P$. Grohe, Holm, and Laubner have added linear-algebraic operators to LFP. This properly extends the power of LFP with counting, but its relationship to $P$ remains an open question.

    Expressiveness of second-order Logic

    Second-order logic (SO) extends first-order logic with quantifications over relations and sets. Monadic second-order logic (MSO) is the restriction of SO where the second-order quantifiers are only over sets. Existential second-order logic (ESO) is a restricted version of SO where only existential quantifications are allowed.
    Fact. MSO-graph contains NP-complete problems but does not capture all of NP: 3-colorability is in MSO-graph while the Hamiltonian path problem is not (both are NP-complete). However, the Hamiltonian path problem can be captured by extending MSO with quantifiers over sets of edges.
    The following result shows that ESO is in some sense a normal form of SO.
    Proposition. (Büchi 1976) Any MSO formula can be rewritten into an equivalent EMSO formula.
    ESO-graph captures NP. It is straightforward to show that ESO-graph is in NP: a nondeterministic machine guesses the quantified relations and then verifies the first-order part in polynomial time. Now let $\psi_1$ and $\psi_2$ be defined as before. In addition, define $\psi_4 \equiv \forall u\forall v.\ L(u, v)\vee L(v, u) \vee u = v$. Observe that $\mathcal{G}$ is a model of $\exists L.\ \psi_1 \wedge \psi_2 \wedge \psi_4$ if and only if $\mathcal{G}$ contains a Hamiltonian path. Since deciding the existence of a Hamiltonian path is NP-complete, we have
    Theorem. (Fagin, 1974) $NP$ = ESO-graph.
    By Fagin's theorem, ESO is closed under negation if and only if NP = co-NP. Since full second-order logic is closed under negation, it covers all languages in the polynomial-time hierarchy. Other second-order concepts, e.g., the least fixed-point operator and the transitive closure operator, lead to logics that characterise further complexity classes such as P, NLOGSPACE and PSPACE. For these results of descriptive complexity theory, see [1].
    The "weak" MSO (WMSO) is a further restricted version of MSO in which the set quantifiers range only over finite sets. The WMSO theory of $(\mathbb N, <)$ is a particularly useful theory called Weak (monadic) Second-order Logic with one Successor (WS1S). Büchi has shown that WS1S is decidable via an automata-theoretic approach.
    Theorem. (Büchi 1960) One can effectively construct an input-free finite automaton $A_\varphi$ from a sentence $\varphi$ in WS1S such that $L(A_\varphi)\neq \emptyset$ iff $\varphi$ holds. Conversely, given an input-free automaton $A$, one can define a sentence $\varphi_A$ in WS1S such that $L(A)\neq \emptyset$ iff $\varphi_A$ holds.
    The idea of the reduction is to represent numbers in binary, namely, as finite words over $\{0,1\}$, and to view a finite word as (the characteristic function of) a finite set via finite-set interpretation [3,4]. For instance, the integer $21$ corresponds to the word $10101$, which is interpreted as the finite set $\{1,3,5\}$ of positions carrying a 1.
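    The finite-set interpretation in the example can be written out directly; positions are 1-indexed from the most significant bit, matching the reading of $21 = 10101 \mapsto \{1,3,5\}$:

```python
def to_set(m):
    # finite-set interpretation: the set of 1-indexed positions
    # (most significant bit first) that carry a 1
    bits = bin(m)[2:]
    return {i + 1 for i, b in enumerate(bits) if b == "1"}

assert to_set(21) == {1, 3, 5}   # 21 = 10101 in binary
```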

    It is interesting to note that the expressive power of WS1S is equivalent to that of a first-order theory. Presburger Arithmetic (PA) is the first-order theory of the structure $(\mathbb N, +)$. In Büchi's 1960 paper it is noted that the decidability of PA can be inferred from that of WS1S: it follows by translating first-order sentences over the signature $+$ into WS1S. To translate back from WS1S (or from finite automata), we need to extend PA with some extra arithmetical means such as $|_2$, which maps each number $m>0$ to the greatest power of 2 that divides $m$. [4]
    Theorem. A relation is definable in WS1S iff it is definable in the FO theory of $(\mathbb N, +, |_2)$.

    References

    1. Finite Model Theory, Springer-Verlag, New York 1995.
    2. Languages, Automata, and Logic, Handbook of Formal Languages, Volume 3.
    3. Transforming structures by set interpretations. LMCS 2007.
    4. Automatic Structures. PhD Thesis. 2004.

    Thursday, January 28, 2016

    Intuition behind a mathematical puzzle

    Sometimes, it is helpful to reason about a mathematical puzzle in a concrete context. A solution may reveal itself after you reword the problem statement in intuitive terms. The following example shows how the solution to a seemingly nontrivial problem becomes obvious after the statement of the problem is reworded appropriately.
    Example. Consider the following function:
    function f (n: int) {
      if n == 1 return 0
      choose integers a, b ≥ 1 such that a + b == n
      return a * b + f(a) + f(b)
    }
    
    Our goal is to show that the output of f(n) does not depend on how a and b are chosen in the function body. This fact is far from obvious from inspecting the code. A standard approach to facts of this kind is mathematical induction. However, to establish the induction hypothesis, one must first express the value of f(n) in terms of $n$, which is not easy given that the values of a and b are unspecified. Besides, an inductive proof typically proceeds by manipulating formulas; such a proof is not very helpful for gaining insight into the fact it proves.
    On the other hand, the fact becomes evident once you notice what the function intends to compute. The key observation is that f(n) effectively counts the number of handshakes in a group of $n$ people. A group of size $n$ is divided into two groups of sizes $a$ and $b.$ People from different groups then shake hands with each other. This process continues recursively until no group can be divided further. In the end, each person has shaken hands with everybody else exactly once. Therefore, the value of f(n) is $n\cdot(n-1)/2$ regardless of how the groups are divided during the process.
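    The claim is easy to stress-test: pick the split of $n$ at random at every step and check that the result is always $n(n-1)/2$. A Python sketch (with f(1) = 0, as the handshake reading requires):

```python
import random

def f(n):
    # handshake count; the split a + b == n is chosen arbitrarily
    if n == 1:
        return 0
    a = random.randint(1, n - 1)
    b = n - a
    return a * b + f(a) + f(b)

# regardless of the random splits, f(n) always equals n*(n-1)/2
assert all(f(n) == n * (n - 1) // 2 for n in range(1, 20) for _ in range(5))
```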

    A technique called double counting is closely related to the idea expressed above.
    Example. We know that there are ${n \choose k}$ ways to select a subset of size $k$ from a set $S$ of $n$ distinct elements. If $S$ is divided into two parts of sizes $a$ and $b$, then selecting a subset of $S$ amounts to selecting a subset from each of the parts respectively. Therefore, no matter how $S$ is split, i.e., for any $a + b = n$, we have $$\sum_{i+j=k}{a \choose i}{b \choose j} = {n \choose k}$$(recall that ${n \choose 0}=1$ and ${n \choose k}=0$ for $k<0$ or $k>n$).
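    This identity (Vandermonde's convolution) can be verified numerically for small parameters:

```python
from math import comb

n, k = 10, 4
for a in range(n + 1):
    b = n - a
    # selecting k elements from S splits into i from one part, k - i from the other
    assert sum(comb(a, i) * comb(b, k - i) for i in range(k + 1)) == comb(n, k)
```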

    Tuesday, January 12, 2016

    A note on Angluin’s Learning Algorithm

    The L* Learning Algorithm, also known as Angluin's Learning Algorithm for DFAs, was first developed by Angluin [1] and later improved by Rivest and Schapire [2]. L* learns an unknown regular language, say $U$ over an alphabet $\Sigma$, and produces a DFA $C$ such that $L(C) = U$. L* works by incrementally producing a sequence of candidate DFAs $C_1, C_2, \dots$ converging to $C$. In order to learn $U$, L* needs a Teacher to answer two types of questions. The first type is a membership query, consisting of a string $\sigma \in \Sigma^*$; the answer is true if $\sigma \in U$, and false otherwise. The second type is a conjecture, a candidate DFA $C$ whose language the algorithm believes to be identical to $U$. The answer is true if $L(C) = U$; otherwise the Teacher returns a counterexample, which is a string $\sigma'$ in the symmetric difference of $L(C)$ and $U$.
    At a high level, L* builds a table in which it incrementally records whether strings in $\Sigma^*$ belong to $U$, using membership queries to the Teacher. At various stages L* decides to make a conjecture: it constructs a candidate automaton $C$ based on the information contained in the table and asks the Teacher whether the conjecture is correct. If it is, the algorithm terminates. Otherwise, L* uses the counterexample returned by the Teacher to expand the table with strings that witness differences between $L(C)$ and $U$.
    The Learner maintains a Boolean table with index set $X\times Y \subset \Sigma^*\times\Sigma^*.$ The rows of the table are indexed by $X$ while the columns are indexed by $Y$. Each cell $(x,y)$ indicates whether or not $xy \in U$. Two strings $a,b\in X$ are called $Y$-equivalent, written as $a\equiv_Y b$, if $ay\in U \iff by\in U$ for all $y\in Y.$ Note that $a\equiv_Y b$ iff the rows indexed by $a$ and $b$ are identical binary strings. The table is called consistent iff all distinct strings $x,x'\in X$ are not $Y$-equivalent. The table is called closed iff for all $x\in X$ and $a\in\Sigma$, there exists $x'\in X$ such that $xa\equiv_Y x'$.
    The table determines a reduced DFA when it is consistent and closed. The states of the DFA are $\{[x]_Y:x\in X\}$ (here $[\cdot]_Y$ is the equivalence classes induced by $\equiv_Y$); accepting states are $\{[x]_Y: x\in X\cap U\}$; the transition function $\delta:[X]_Y\times\Sigma\to [X]_Y$ is given by $\delta([x]_Y,a)=[xa]_Y.$ This DFA is reduced as every two states of it can be distinguished by some string in $Y$ by the definition of consistency. The L* algorithm can now be sketched as follows:
    Step 1. Start with a one-cell table with index set $\{\epsilon\}\times \{\epsilon\}$.
    Step 2. Fill the table cells using membership queries, expanding the index set when necessary, until the table is consistent and closed. This procedure is guaranteed to find a minimal expansion of the table that is consistent and closed. Construct a conjecture DFA $C$ from the table.
    Step 3. Check the conjecture using an equivalence query. If it is correct then output it and halt. Otherwise, expand the table with the provided counterexample and go to Step 2.
    We give more details about Step 3. Suppose the Teacher returns a counterexample $w=w_1\dots w_n \in \Sigma^n$ for conjecture $C$. Let the trace of this string in $C$ be$$x_0\overset{w_1}{\longrightarrow}x_1\overset{w_2}{\longrightarrow}\cdots\overset{w_n}{\longrightarrow}x_n,$$for $x_0,...,x_n\in X.$ Then there is $1\le i\le n$ such that $x_i w_{i+1}...w_n \in U \iff w \not\in U.$ The table is then expanded by including $x_{i-1}w_i$ and $w_{i+1}...w_n$ in $X$ and $Y$, respectively.
    Recall that the number of states of a conjecture DFA equals the number of distinct rows in the table, and that the number of distinct rows increases by at least one each time an equivalence query fails. The L* algorithm therefore terminates after making at most $|C|-1$ incorrect equivalence queries and learns the reduced automaton $C$ for the target language.
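    The table predicates above can be prototyped in a few lines. Here the target is an assumed toy language, the even-length words over a unary alphabet: the initial $\{\epsilon\}\times\{\epsilon\}$ table is not closed, and expanding $X$ with the string "a" closes it, yielding a 2-state conjecture:

```python
def member(w):
    # membership oracle for an assumed target language:
    # even-length words over the unary alphabet {a}
    return len(w) % 2 == 0

def row(x, Y):
    # the row of x in the observation table indexed by X x Y
    return tuple(member(x + y) for y in Y)

def is_closed(X, Y, alphabet="a"):
    # closed: every one-letter extension of a row equals some existing row
    rows = {row(x, Y) for x in X}
    return all(row(x + a, Y) in rows for x in X for a in alphabet)

assert not is_closed([""], [""])        # the initial table is not closed
assert is_closed(["", "a"], [""])       # adding "a" to X closes it
```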

    Difficulty of learning DFAs

    Difficulty of offline learning. Let $\mathcal{A}, \mathcal{B} \subset \Sigma^*$ be two finite sets of strings. Given $k$, it is NP-complete to decide whether there exists a DFA $D$ with $k$ states such that $\mathcal{A} \subseteq L(D)$ and $\mathcal{B} \subseteq \Sigma^* \setminus L(D)$. It is likewise NP-hard to find such a DFA of minimum size; in fact, even finding a DFA whose number of states is polynomial in that of the minimum one is NP-hard. If all strings of length $n$ or less are given, then the problem can be solved in time polynomial in the input size. Note, however, that the size of the input is itself exponential in the number of states of the resulting DFA. Angluin has shown that the problem remains NP-complete if an arbitrarily small fixed fraction of the strings is missing from the input set.
    Difficulty of online learning. Learning a minimal DFA becomes easier if the algorithm is allowed to make queries to the unknown automaton. The L* algorithm proposed by Angluin [1] solves the problem in polynomial time by allowing the algorithm to ask membership and equivalence queries. Rivest and Schapire [2] later improved the algorithm so that it does not need to reset the unknown automaton to the initial state after each query.

    References

    1. D. Angluin. Learning regular sets from queries and counterexamples. Information and Computation, 1987.
    2. R. L. Rivest and R. E. Schapire. Inference of finite automata using homing sequences. Information and Computation, 1993.
    3. Michael Kearns and Umesh Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.
    4. Ron Rivest, Machine Learning Lecture Notes at MIT, Lecture 23 & 24.
    5. Rajesh Parekh, Vasant Honavar. Learning DFA from Simple Examples. Machine Learning, 2001.
