This is ericpony's blog

Thursday, January 28, 2016

Intuition behind a mathematical puzzle

Sometimes, it is helpful to reason about a mathematical puzzle in a concrete context. A solution may reveal itself after you reword the problem statement in intuitive terms. The following example shows how the solution to a seemingly nontrivial problem becomes obvious after the statement of the problem is reworded appropriately.
Example. Consider the following function:
function f (n: int) {
  if n == 1 return 0
  choose integers a, b ≥ 1 such that a + b == n
  return a * b + f(a) + f(b)
}
Our goal is to show that the output of f(n) does not depend on how a and b are chosen in the function body. This fact is far from obvious from inspecting the code. A standard approach to proving this kind of fact is mathematical induction. However, to establish the induction hypothesis, one must first express the value of f(n) in terms of $n$, which is not easy given that the values of a and b are unspecified. Besides, an inductive proof typically proceeds by manipulating formulas, so the proof itself offers little insight into the fact it proves.
On the other hand, the fact becomes evident once you notice what the function intends to compute. The key observation is that f(n) effectively counts the number of handshakes in a group of $n$ people: a group of size $n$ is divided into two groups of sizes $a$ and $b$, and people from different groups then shake hands with each other. This process continues recursively until no smaller group can be formed, at which point each person has shaken hands with everybody else exactly once. Therefore, the value of f(n) is $n\cdot(n-1)/2$ regardless of how the groups are divided during the process.
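To see the claim in action, here is a direct Python transcription of f in which the split is chosen at random (the random choice is my own device to exercise different values of a and b; any strategy gives the same output):

# A runnable version of f; the split (a, b) varies from call to call,
# yet the output depends only on n.
import random

def f(n: int) -> int:
    if n == 1:
        return 0
    a = random.randint(1, n - 1)   # any a, b >= 1 with a + b == n
    b = n - a
    return a * b + f(a) + f(b)

for n in range(1, 20):
    assert f(n) == n * (n - 1) // 2   # the handshake count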

A technique called double counting is closely related to the idea expressed above.
Example. We know that there are ${n \choose k}$ ways to select a subset of size $k$ from a set $S$ of $n$ distinct elements. If $S$ is divided into two parts of sizes $a$ and $b$, then selecting a $k$-subset from $S$ amounts to selecting a subset from each of the two parts. Therefore, no matter how $S$ is partitioned, i.e., for any $a + b = n$, we have $$\sum_{i+j=k}{a \choose i}{b \choose j} = {n \choose k}$$ (recall that ${n \choose 0}=1$ and ${n \choose k}=0$ for $k<0$ or $k>n$). This identity is known as Vandermonde's convolution.
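The identity is also easy to check numerically. Below is a small sketch, assuming Python 3.8+ for math.comb (which already returns 0 when $k>n$):

# A brute-force check of the identity for small a, b, k.
from math import comb

def lhs(a: int, b: int, k: int) -> int:
    return sum(comb(a, i) * comb(b, k - i) for i in range(k + 1))

for a in range(6):
    for b in range(6):
        for k in range(a + b + 1):
            assert lhs(a, b, k) == comb(a + b, k)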

Tuesday, January 12, 2016

A note on Angluin’s Learning Algorithm

The L* Learning Algorithm, also known as Angluin’s Learning Algorithm for DFAs, was first developed by Angluin [1] and later improved by Rivest and Schapire [2]. L* learns an unknown regular language, say $U$ over an alphabet $\Sigma$, and produces a DFA $C$ such that $L(C) = U$. It works by incrementally producing a sequence of candidate DFAs $C_1, C_2, \dots$ converging to $C$. In order to learn $U$, L* needs a Teacher to answer two types of questions. The first type is a membership query, consisting of a string $\sigma \in \Sigma^*$; the answer is true if $\sigma \in U$, and false otherwise. The second type is a conjecture, which is a candidate DFA $C$ whose language the algorithm believes to be identical to $U$. The answer is true if $L(C) = U$; otherwise the Teacher returns a counterexample, which is a string $\sigma'$ in the symmetric difference of $L(C)$ and $U$.
At a high level, L* maintains a table in which it incrementally records whether strings in $\Sigma^*$ belong to $U$, filling the entries by making membership queries to the Teacher. At various stages L* decides to make a conjecture: it constructs a candidate automaton $C$ from the information contained in the table and asks the Teacher whether the conjecture is correct. If it is, the algorithm terminates. Otherwise, L* uses the counterexample returned by the Teacher to expand the table with strings that witness differences between $L(C)$ and $U$.
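As an illustration, here is a toy Teacher in Python. The class and its names are my own, and the equivalence query is only approximated by enumerating strings up to a fixed length; a real Teacher would decide $L(C) = U$ exactly:

# A toy Teacher: membership is answered by a predicate for U, and
# equivalence is approximated by testing all strings up to max_len.
from itertools import product

class Teacher:
    def __init__(self, in_U, alphabet, max_len=8):
        self.in_U = in_U              # membership predicate for U
        self.alphabet = alphabet
        self.max_len = max_len

    def membership(self, s: str) -> bool:
        return self.in_U(s)

    def equivalence(self, accepts):   # accepts: the candidate DFA's predicate
        for n in range(self.max_len + 1):
            for t in product(self.alphabet, repeat=n):
                s = ''.join(t)
                if accepts(s) != self.in_U(s):
                    return s          # a string in the symmetric difference
        return None                   # no difference found: accept the conjecture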
The Learner maintains a Boolean table with index set $X\times Y \subset \Sigma^*\times\Sigma^*$. The rows of the table are indexed by $X$ and the columns by $Y$; the cell at $(x,y)$ indicates whether or not $xy \in U$. Two strings $a,b\in\Sigma^*$ are called $Y$-equivalent, written $a\equiv_Y b$, if $ay\in U \iff by\in U$ for all $y\in Y$. Note that for $a,b\in X$, $a\equiv_Y b$ iff the rows indexed by $a$ and $b$ are identical binary strings. The table is called consistent iff no two distinct strings $x,x'\in X$ are $Y$-equivalent, and closed iff for all $x\in X$ and $a\in\Sigma$ there exists $x'\in X$ such that $xa\equiv_Y x'$.
When the table is consistent and closed, it determines a reduced DFA. The states of the DFA are $\{[x]_Y:x\in X\}$ (here $[\cdot]_Y$ denotes the equivalence class induced by $\equiv_Y$); the accepting states are $\{[x]_Y: x\in X\cap U\}$; the transition function $\delta:[X]_Y\times\Sigma\to [X]_Y$ is given by $\delta([x]_Y,a)=[xa]_Y$. This DFA is reduced because, by the definition of consistency, any two distinct states can be distinguished by some string in $Y$.
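These definitions can be spelled out in a short Python sketch (the names are illustrative; `member` stands for the membership oracle, and $\epsilon\in X$ is assumed so that $[\epsilon]_Y$ is the initial state):

# Rows are tuples of Booleans; strings are Y-equivalent iff their rows agree.
def row(x, Y, member):
    return tuple(member(x + y) for y in Y)

def is_consistent(X, Y, member):
    rows = [row(x, Y, member) for x in X]
    return len(set(rows)) == len(rows)      # all rows pairwise distinct

def is_closed(X, Y, alphabet, member):
    X_rows = {row(x, Y, member) for x in X}
    return all(row(x + a, Y, member) in X_rows
               for x in X for a in alphabet)

def make_dfa(X, Y, alphabet, member):
    # assumes the table is consistent and closed; states are the rows
    start = row('', Y, member)
    accepting = {row(x, Y, member) for x in X if member(x)}
    delta = {(row(x, Y, member), a): row(x + a, Y, member)
             for x in X for a in alphabet}
    return start, accepting, delta

The L* algorithm can now be sketched as follows: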
Step 1. Start with a one-cell table with index set $\{\epsilon\}\times \{\epsilon\}$.
Step 2. Fill the table cells with membership queries, expanding the index set when necessary, until the table is consistent and closed. This procedure is guaranteed to find the minimal expansion of the table that is consistent and closed (see the sketch after this list). Construct a conjecture DFA $C$ from the table.
Step 3. Check the conjecture using an equivalence query. If it is correct then output it and halt. Otherwise, expand the table with the provided counterexample and go to Step 2.
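Under the definitions above, the expansion loop of Step 2 can be sketched as follows, reusing row() from the earlier sketch and taking X to be a mutable set of strings. Since only strings with fresh rows are added, the table stays consistent while it is being closed:

# Add missing representatives to X until the table is closed.
def close_table(X, Y, alphabet, member):
    while True:
        X_rows = {row(x, Y, member) for x in X}
        missing = next((x + a for x in X for a in alphabet
                        if row(x + a, Y, member) not in X_rows), None)
        if missing is None:
            return
        X.add(missing)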
We give more details about Step 3. Suppose the Teacher returns a counterexample $w=w_1\dots w_n \in \Sigma^n$ for conjecture $C$. Let the trace of this string in $C$ be $$x_0\overset{w_1}{\longrightarrow}x_1\overset{w_2}{\longrightarrow}\cdots\overset{w_n}{\longrightarrow}x_n,$$ where $x_0,\dots,x_n\in X$. Then there is some $1\le i\le n$ such that $x_i w_{i+1}\dots w_n \in U \iff w \not\in U$. The table is then expanded by including $x_{i-1}w_i$ in $X$ and $w_{i+1}\dots w_n$ in $Y$.
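Identifying each state of $C$ with its representative in $X$ (so that $x_0=\epsilon$), this analysis amounts to the following scan; the names below are mine, and Rivest and Schapire [2] locate such an index $i$ faster by binary search:

# Find the breakpoint of a counterexample w; delta maps a pair
# (representative, symbol) to the representative of the successor state.
def split_counterexample(w, delta, member):
    xs = ['']                                # xs[j]: representative after w[:j]
    for c in w:
        xs.append(delta[(xs[-1], c)])
    in_U = member(w)
    for i in range(1, len(w) + 1):
        if member(xs[i] + w[i:]) != in_U:    # first place the residual flips
            return xs[i - 1] + w[i - 1], w[i:]   # new row index, new column
    raise AssertionError('w is not a counterexample')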
Recall that the number of states of a conjecture DFA equals the number of distinct rows in the table, and that the number of distinct rows increases by at least one each time an equivalence query fails. The L* algorithm therefore terminates after at most $|C| - 1$ incorrect equivalence queries and learns the reduced automaton $C$ for the target language.

Difficulty of learning DFAs

Difficulty of offline learning. Let $\mathcal{A}, \mathcal{B} \subset \Sigma^*$ be two finite sets of strings. Given $k$, it is NP-complete to decide whether there exists a DFA $D$ with $k$ states such that $\mathcal{A} \subseteq L(D)$ and $\mathcal{B} \subseteq \Sigma^* \setminus L(D)$. Finding such a DFA of minimum size is also NP-hard; in fact, even finding a DFA whose number of states is polynomial in that of the minimum one is NP-hard. If all strings of length $n$ or less are given as examples, the problem can be solved in time polynomial in the input size. Note, however, that the size of the input is then itself exponential in the number of states of the resulting DFA. Angluin has shown that the problem remains NP-complete even if only an arbitrarily small fixed fraction of the strings is missing from the input set.
Difficulty of online learning. Learning a minimal DFA becomes easier if the algorithm is allowed to make queries to the unknown automaton. The L* algorithm proposed by Angluin [1] solves the problem in polynomial time by allowing the algorithm to ask membership and equivalence queries. Rivest and Schapire [2] later improved the algorithm so that it does not need to reset the unknown automaton to the initial state after each query.

References

1. D. Angluin. Learning regular sets from queries and counterexamples. Information and Computation, 1987.
2. R. L. Rivest and R. E. Schapire. Inference of finite automata using homing sequences. Information and Computation, 1993.
3. M. Kearns and U. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.
4. R. L. Rivest. Machine Learning lecture notes, MIT, Lectures 23 & 24.
5. R. Parekh and V. Honavar. Learning DFA from Simple Examples. Machine Learning, 2001.
