Ericpony's Blog: A note on checking properties of regular languages

Wednesday, February 17, 2016

A note on checking properties of regular languages

Deterministic Finite Automata

Equivalence. If a DFA with

$n$ states does not accept every string, it rejects some string of length at most

$n-1$ . It follows that the languages of two DFA

$A$ and

$B$ coincide iff they accept the same words up to length

$|A|\cdot|B| - 1$ . (Hint: consider automaton

$A\times B$ .) This observation leads to an exponential-time algorithm for language equivalence checking. In practice, equivalence between DFA is often checked by first minimising the DFA (see below). Isomorphism between DFA can be checked in linear time when an appropriate representation is used, cf. Hopcroft's method.

Minimisation. The minimal DFA of a regular language is unique by the Myhill-Nerode Theorem. We can hence check language equivalence of DFA by checking isomorphism between DFA. DFA minimisation takes

$O(|\Sigma|\cdot n\log n)$ time using Hopcroft's method, which is the most commonly used approach in practice. Aho, Hopcroft and Ullman proposed an algorithm that uses union-and-find to create and locate the equivalence classes for states in a DFA (cf. Sec. 3 of [2]). The running time is

$O(kn\!\cdot\!A(kn,n))$ with

$k=|\Sigma|$ .

Universality can be checked by checking that all reachable states are accepting.
Complement is obtained by simply flipping the accepting and the non-accepting states.
Aperiodicity (ie. existence of non-trivial cycles) of a DFA is PSPACE-complete, cf. here.
Intersection vacuity of an unbounded number of DFA is PSPACE-complete, cf. here and here.
Reversal. The trivial way to get the DFA of the reverse language is to determinise the NFA obtained by flipping the initial and the accepting states and reversing the directions of the transitions. This approach leads to exponential blowup. It is unclear as to the lower bound of this problem.

Non-deterministic Finite Automata

Equivalence. NFA equivalence problem is PSPACE-complete --- in fact, it is the first problem shown to be in this complexity class, see [5]. Practically, checking equivalence of NFA is often done by first converting NFA to DFA using subset construction, which however can lead to exponential growth in automata size. There are other determinisation methods in the literature, c.f. wiki and [4].

Universality. Using determinisation, if an NFA of size

$n$ does not accept every string then it rejects some string of length at most

$|\Sigma|^n-1$ . Together with the classic result that reachability can be checked is in NL, we easily see that university checking for NFA is in PSPACE. One can show that the problem is PSPACE-complete and it is so even when every state of the NFA is accepting.

Complement. Complementing a DFA can be easily done by flipping the accepting and the non-accepting states. This method doesn't work for NFA: the language of the "flipped" NFA will be the superset of its complement in general (e.g., consider the NFA

$\{q_0 \xrightarrow{a} q_1, q_0 \xrightarrow{a} q_2\}$ where

$q_1$ is accepting). In [6], the authors give very precise bounds for complexities for complement on NFA. Particularly, there are NFA with exponentially large complement: the language of all words over

$\{1,…,n\}$ which don't contain all symbols is accepted by an NFA of size

$n$ , but its complement needs

$2^n$ states. Hence, complement of NFA is EXPSPACE-complete; no general method for complementing NFA can have better complexity than doing subset construction in theory.

Minimisation. Minimisation to/from NFA is PSPACE-hard. Given a DFA or an NFA, it is PSPACE-hard to find the size of the minimal equivalent NFA. Some hardness of approximation results have shown that it is even hard to approximate the size of the minimal equivalent NFA. Since a regular expression can be converted to an NFA of comparable size, these hardness results show that we can't expect any efficient algorithm to convert an regular expression to a minimal NFA. See these lecture notes by Artem Kaznatcheev for more details.

Regular expressions

Equivalence. Like NFA, checking equivalence of RE is PSPACE-complete. This can be expected since RE can be converted to and from NFA of roughly the same size, the complexity of checking properties of RE should be comparable to that of checking NFA. However, if intersection is allowed then the difficulty rises to EXPTIME. Further, if negation is allowed then the problem becomes non-elementary [5].

Star-freeness of RE is PSPACE-complete, see here. This problem corresponds to checking the non-existence of non-trivial cycles in DFA, which is also PSPACE-complete.
Emptiness for regular expressions with intersection is PSPACE-complete.
(Often, checking the a property for DFA is far more efficient than checking the same property for RE. So it is surprising to note that the above two problems have the same complexity for both DFA and RE.)

Star-free Regular expressions

Checking equivalence of star-free RE is NP-complete, cf. Section 2 of [3].

Resources

Online converters: RE-to-NFA, NFA-to-RE, NFA-to-DFA
RE equivalence matching in the wild:
1.Backtracking (used in Perl, Python, etc): Discussion, Demo, Code
2.Virtual-machine method (used in Ruby): Explanation

References

1. Equivalence of regular languages.
2. Testing the Equivalence of Regular Languages
3. On the equivalence, containment, and covering problems for the regular and context-free languages
4. Five determinisation algorithms
5. Word problems requiring exponential time
6. State Complexity of Concatenation and Complementation of Regular Languages
7. Complexity of Some Problems from the Theory of Automata

Ericpony's Blog