Ericpony's Blog: February 2016

Wednesday, February 17, 2016

A note on checking properties of regular languages

Deterministic Finite Automata

Equivalence. If a DFA with $n$ states does not accept every string, it rejects some string of length at most $n-1$. It follows that the languages of two DFA $A$ and $B$ coincide iff they accept the same words up to length $|A|\cdot|B| - 1$. (Hint: consider automaton $A\times B$.) This observation leads to an exponential-time algorithm for language equivalence checking. In practice, equivalence between DFA is often checked by first minimising the DFA (see below). Isomorphism between DFA can be checked in linear time when an appropriate representation is used, cf. Hopcroft's method.

Minimisation. The minimal DFA of a regular language is unique by the Myhill-Nerode Theorem. We can hence check language equivalence of DFA by checking isomorphism between DFA. DFA minimisation takes $O(|\Sigma|\cdot n\log n)$ time using Hopcroft's method, which is the most commonly used approach in practice. Aho, Hopcroft and Ullman proposed an algorithm that uses union-and-find to create and locate the equivalence classes for states in a DFA (cf. Sec. 3 of [2]). The running time is $O(kn\!\cdot\!A(kn,n))$ with $k=|\Sigma|$.

Universality can be checked by checking that all reachable states are accepting.
Complement is obtained by simply flipping the accepting and the non-accepting states.
Aperiodicity (ie. existence of non-trivial cycles) of a DFA is PSPACE-complete, cf. here.
Intersection vacuity of an unbounded number of DFA is PSPACE-complete, cf. here and here.
Reversal. The trivial way to get the DFA of the reverse language is to determinise the NFA obtained by flipping the initial and the accepting states and reversing the directions of the transitions. This approach leads to exponential blowup. It is unclear as to the lower bound of this problem.

Non-deterministic Finite Automata

Equivalence. NFA equivalence problem is PSPACE-complete --- in fact, it is the first problem shown to be in this complexity class, see [5]. Practically, checking equivalence of NFA is often done by first converting NFA to DFA using subset construction, which however can lead to exponential growth in automata size. There are other determinisation methods in the literature, c.f. wiki and [4].

Universality. Using determinisation, if an NFA of size $n$ does not accept every string then it rejects some string of length at most $|\Sigma|^n-1$. Together with the classic result that reachability can be checked is in NL, we easily see that university checking for NFA is in PSPACE. One can show that the problem is PSPACE-complete and it is so even when every state of the NFA is accepting.

Complement. Complementing a DFA can be easily done by flipping the accepting and the non-accepting states. This method doesn't work for NFA: the language of the "flipped" NFA will be the superset of its complement in general (e.g., consider the NFA $\{q_0 \xrightarrow{a} q_1, q_0 \xrightarrow{a} q_2\}$ where $q_1$ is accepting). In [6], the authors give very precise bounds for complexities for complement on NFA. Particularly, there are NFA with exponentially large complement: the language of all words over $\{1,…,n\}$ which don't contain all symbols is accepted by an NFA of size $n$, but its complement needs $2^n$ states. Hence, complement of NFA is EXPSPACE-complete; no general method for complementing NFA can have better complexity than doing subset construction in theory.

Minimisation. Minimisation to/from NFA is PSPACE-hard. Given a DFA or an NFA, it is PSPACE-hard to find the size of the minimal equivalent NFA. Some hardness of approximation results have shown that it is even hard to approximate the size of the minimal equivalent NFA. Since a regular expression can be converted to an NFA of comparable size, these hardness results show that we can't expect any efficient algorithm to convert an regular expression to a minimal NFA. See these lecture notes by Artem Kaznatcheev for more details.

Regular expressions

Equivalence. Like NFA, checking equivalence of RE is PSPACE-complete. This can be expected since RE can be converted to and from NFA of roughly the same size, the complexity of checking properties of RE should be comparable to that of checking NFA. However, if intersection is allowed then the difficulty rises to EXPTIME. Further, if negation is allowed then the problem becomes non-elementary [5].

Star-freeness of RE is PSPACE-complete, see here. This problem corresponds to checking the non-existence of non-trivial cycles in DFA, which is also PSPACE-complete.
Emptiness for regular expressions with intersection is PSPACE-complete.
(Often, checking the a property for DFA is far more efficient than checking the same property for RE. So it is surprising to note that the above two problems have the same complexity for both DFA and RE.)

Star-free Regular expressions

Checking equivalence of star-free RE is NP-complete, cf. Section 2 of [3].

Resources

Online converters: RE-to-NFA, NFA-to-RE, NFA-to-DFA
RE equivalence matching in the wild:
1.Backtracking (used in Perl, Python, etc): Discussion, Demo, Code
2.Virtual-machine method (used in Ruby): Explanation

References

1. Equivalence of regular languages.
2. Testing the Equivalence of Regular Languages
3. On the equivalence, containment, and covering problems for the regular and context-free languages
4. Five determinisation algorithms
5. Word problems requiring exponential time
6. State Complexity of Concatenation and Complementation of Regular Languages
7. Complexity of Some Problems from the Theory of Automata

Wednesday, February 10, 2016

A note on the expressiveness of logic

Reachability problem. Fix a graph $\mathcal{G}$. A vertex $u$ is said to be reachable from a vertex $v$ in $\mathcal{G}$ if there exist $r \ge 0$ vertices $x_1$, ..., $x_r$, such that $(v, x_1)$, $(x_1, x_2)$, ..., $(x_{r−1}, u)$ are edges of $\mathcal{G}$. A formula $\phi(x,y)$ with exactly two free variables $x, y$ and one relation symbol $E$ is said to "express" reachability if and only if for any finite graph $\mathcal{G}$, vertex $u$ is reachable from vertex $v$ in $\mathcal{G}$ if and only if $\mathcal{G}$ is a model of $\phi(x,y)$ with respect to some interpretation that associates $E$ to edge relation and $x,\ y$ to vertices $u,\ v$.
Reachability is not expressible in first-order logic. Suppose $\phi(x,y)$ is a FO logic formula expressing reachability. Define three sentences $s_1$, $s_2$, and $s_3$, such that
$s_1 \equiv \forall x \forall y. \phi(x, y)$, i.e., all vertices are reachable from all vertices;
$s_2 \equiv \forall x. (\exists y. E(x, y) \wedge \forall z. E(x, z) \implies z = y)$, i.e., every vertex has out-degree one;
$s_3 \equiv \forall x. (\exists y. E(y, x) \wedge \forall z. E(z, x) \implies z = y)$, i.e., every vertex has in-degree one.
Let $s \equiv s_1 \wedge \ s_2 \wedge \ s_3$. Observe that a graph is a model of $s$ if and only if the graph is cycle. Since there are finite cycles with as many vertices as desired, $s$ has arbitrary large finite models. It then follows by Lowenheim-Skolem theorem that $s$ has an infinite model. But infinite cycles do not exist, and thus our assumption that there exists a FO logic formula expressing reachability is contradicted.
Reachability is expressible in second-order logic. Given constants $x$ and $y$ and a binary relation symbol $L$, we use the following sentences to characterize $L$ as a path from $x$ to $y$:
1. $L$ is a linear order over the vertices : $$ \begin{align*} \psi_1 \equiv\ & \forall u. \neg L(u,u) \\ & \wedge \forall u\forall v. (L(u,v) \implies \neg L(v,u)) \\ & \wedge \forall u\forall v \forall w. (L(u,v) \wedge L(v,w) \implies L(u,w)) \end{align*} $$ 2. Any two consecutive vertices in the order must be joined by an edge: $$ \begin{align*} \psi_2 \equiv\ &\forall u \forall v. ((L(u,v) \wedge \forall w. (\neg L(u,w) \vee \neg L(w,v))) \implies E(u,v)). \end{align*}$$3. The first vertex in the order is $x$, the last vertex is $y$, and $x$, $y$ are in the order: $$ \psi_3 \equiv (\forall u.\ \neg L(u,x)) \wedge (\forall v.\ \neg L(y,v)) \ \wedge L(x,y).$$ Now an expression for reachability can be defined as$$\phi(x,y) \equiv \exists L. ((x=y) \vee (\psi_1 \wedge \psi_2 \wedge \psi_3)).$$

Expressiveness of first-order Logic

It was known that the natural deduction proof system for FO logic is both sound and complete. Therefore, since verifying a proof is mechanical, deriving a proof must be as hard as checking its validity. Hence it is impossible to build a perfect automatic theorem prover for FO logic. In fact, validity of FO is undecidable since FO can formulate the Post correspondence problem. On the other hand, validity of FO is easier than satisfiability, since the former is enumerable while the latter is not.
Suppose we have only equality and edge relation in our language. A $\phi$-graph problem is to decide whether there exists an interpretation such that a given finite graph is a model of formula $\phi$. The class L-graph is formed by all $\phi$-graph problems where $\phi$ is a formula defined in logic L.
Fact. FO-graph is strictly in $P$.
The inclusion is strict because reachability problem (which is P-complete) is not expressible in FO and thus not captured by FO-graph. In fact, any FO-graph problem can be decided in LOGSPACE (check this). Immerman and Vardi (1982) added inductive definitions to FO to get the logic LFP, which is sufficient to express P problems over ordered graphs, namely, there is a distinguished relation $<$ that is interpreted as a linear order over the vertices.
Theorem. (Immerman-Vardi 1982) LFP-graph is in $P$.
However, for graphs without the ordering assumption, there are easily poly-time computable properties that are not definable in LFP. Hence LFP is still too weak to capture $P$. Grohe, Holm, and Laubner have added linear-algebraic operators to LFP. This properly extends the power of LFP with counting, but its relationship to $P$ remains an open question.

Expressiveness of second-order Logic

Second-order logic (SO) extends first-order logic with quantifications over relations and sets. Monadic second-order logic (MSO) is the restriction of SO where the second-order quantifiers are only over sets. Existential second-order logic (ESO) is a restricted version of SO where only existential quantifications are allowed.
Fact. MSO-graph is strictly between P and NP, since 3-colorability is in MSO-graph but Hamiltonian path problem is not (the latter two problems are NP-complete). However, Hamiltonian path problem can be captured by extending MSO with quantifiers over sets of edges.
The following result shows that ESO is in some senses the normal form of SO.
Proposition. (Buchi 1976) Any MSO formula can be re-written to an equivalent EMSO formula.
ESO-graph captures NP. It is straightforward to show that ESO-graph is in NP by non-deterministically choosing a relation for in polynomial time. Now let $\phi_1$ and $\phi_2$ be defined as before. In addition, define $\phi_4 \equiv \forall u\forall v. L(u, v)\vee L(v, u) \vee u = v$. Observe that $\mathcal{G}$ is a model of $\exists L. \psi_1 \wedge \psi_2 \wedge \psi_4$ if and only if $\mathcal{G}$ contains a Hamiltonian path. Since deciding the existence of a Hamiltonian path is NP-complete, we have
Theorem. (Fagin, 1974) $NP$ = ESO-graph.
By Fagin's theorem, ESO is closed under negation if and only if NP = co-NP. Since P $\subseteq$ NP$~\cap~$co-NP, second-order logic covers all languages in the polynomial time hierarchy. Other second-order concepts, e.g., least fixed-point operator and transitive closure operator, lead to logics which characterise further complexity classes like P, NLOGSPACE, PSPACE. For these results of descriptive complexity theory see [1].
The "weak" MSO (WMSO) is a further restricted version of MSO such that the set quantifiers range only over finite sets. The WMSO theory of $(\mathbb N, <)$ is a particular useful theory called Weak (monadic) Second-order Logic with one Successor (WS1S). Buchi has shown that WS1S is decidable via automata-theoretic approaches.
Theorem. (Buchi 1960) One an effectively construct an input-free finite automaton $A_\varphi$ from a sentence $\varphi$ in WS1S, such that $L(A_\varphi)\neq \emptyset$ iff $\varphi$ holds. Conversely, given an input-free automaton $A$ one can defined a sentence $\varphi_A$ in WS1S such that $L(A)\neq \emptyset$ iff $\varphi_A$ holds.
The idea of reduction is to represent numbers in binary, namely, as finite words over $\{0,1\}$, and to view a finite word as the (characteristic function of) a finite set via finite-set interpretation [3,4]. For instance, integer $21$ corresponds to the word $10101$, which is interpreted to a finite set $\{1,3,5\}$.

It is interesting to note that the expressiveness power of WS1S is equivalent to a first-order theory. Presburger Arithmetic (PA) is the first-order theory of the structure $(\mathbb N, +)$. In Buchi's 1960 paper it is noted that the decidability of PA can be inferred from that of WS1S. The decidability of Presburger arithmetic follows by translating first-order sentences in the signature $+$ to WS1S. To translate back from WS1S (or from finite automata), we would need to extend PA with some arithmetical means such as $|_2$, which associates each number $m>0$ with the greatest power of 2 that divides $m$. [4]
Theorem. A relation is definable in WS1S iff it is definable in the FO theory of $(\mathbb N, +, |_2)$.

References

1. Finite Model Theory, Springer-Verlag, New York 1995.

2. Languages, Automata, and Logic, Handbook of Formal Languages, Volume 3.
3. Transforming structures by set interpretations. LMCS 2007.
4. Automatic Structures. PhD Thesis. 2004.