Ericpony's Blog: December 2016

Wednesday, December 7, 2016

A note on induction and coinduction

An illustrative example. Consider a context-free grammar (CFG) with start symbol $S$, alphabet $\Sigma=\{0,1,2\}$, and the following production rules:\begin{equation}S\rightarrow 0\ |\ 1\ |\ 0S\ |\ 1S\label{eg1-cfg1}\end{equation}A CFG is conventionally interpreted inductively, which means a string belongs to a grammar's language if and only if it can be transformed from the start symbol through a finite number of rule applications. Hence, the language of the above grammar is $\{0,1\}^+$, for this set contains exactly the strings in $\Sigma$ that can be derived from $S$. On the other hand, a grammar can also be interpreted coinductively, meaning that a string belongs to the a grammar's language if and only if its existence is consistent with the grammar rules. In this sense, a string is in the language of \eqref{eg1-cfg1} if and only if it is of one of the forms 0, 1, $0x$ or $1x$, and, therefore, the language of the grammar is $\{0,1\}^+\cup\{0,1\}^\omega$, the set of nonempty strings comprised of 0's and 1's. Note that any string containing 2 is not in the language, since if $x2y=\{0,1\}\cdot z$ is in the language, then so is a string $x'2y$ with $|x'| = |x| - 1$ and this will eventually lead to a contradiction. Now consider another CFG as follows:\begin{equation}S\rightarrow 0\ |\ 1\ |\ S\ |\ SS\label{eg1-cfg2}\end{equation}It is clear that grammars \eqref{eg1-cfg1} and \eqref{eg1-cfg2} produce the same language when they are interpreted inductively. However, if a coinductive interpretation is taken, then the language of \eqref{eg1-cfg2} will become $\Sigma^\omega$ since any string satisfies rule $S \rightarrow S$. In short, an inductive interpretation of a set of rules looks for the minimal satisfying set (a string is in the language iff it can be derived via the rules), while a coinductive interpre-tation looks for a maximal one (a string is in the language iff it doesn't violate the rules).
The concepts of induction and coinduction can be unified in order theory. To see this, fix a countable domain $D$ and consider a predicate $Q: D \rightarrow \{true,false\}$. We shall use $Q$ to capture the minimal set of elements satisfying a set of recurrent equations. For example, the language induced by grammar \eqref{eg1-cfg1} can be captured by $Q$ with $D=\Sigma^\omega$ and recurrent equations $Q(0)=true$, $Q(1)=true$, $Q(0x)=Q(x)$, and $Q(1x)=Q(x)$, such that a string $x$ is in the language iff $Q(x)=true$. Now we rewrite these equations by replacing the $Q$'s on the left-hand side of each equation with a functional $F$ over $D\rightarrow \{true,false\}$, obtaining a set of equations defining $F$: \begin{equation}F(Q)(0)=true;\quad F(Q)(1)=true;\quad F(Q)(0x)=Q(x);\quad F(Q)(1x)=Q(x).\label{eg1-fp1}\end{equation}Consider the (standard) complete lattice $(D\rightarrow \{true,false\},\ \sqsubseteq)$ of predicates where $(Q_1\sqcap Q_2)(d) = Q_1(d)\wedge Q_2(d)$, $(Q_1\sqcup Q_2)(d) = Q_1(d)\vee Q_2(d)$, $\bot$ maps all $d\in D$ to $false$, $\top$ maps all $d\in D$ to $true$, and $Q_1 \sqsubseteq Q_2$ means $Q_1(d)\Rightarrow Q_2(d)$ for all $d\in D$. Observe that $F$ is a monotone function on this lattice: $Q_1 \sqsubseteq Q_2 \Rightarrow F(Q_1) \sqsubseteq F(Q_2)$. Hence, by Tarski's Fixed Point Theorem, $F$ has a least fixed point (denoted by $\lfp(F)$) and a greatest fixed point (denoted by $\gfp(F)$). The relationship between these fixed points is$$\bot\sqsubseteq F^k(\bot)\sqsubseteq \bigsqcup\nolimits_n F^n(\bot) \sqsubseteq \lfp(F) \sqsubseteq \gfp(F) \sqsubseteq \bigsqcap\nolimits_n F^n(\top) \sqsubseteq F^k(\top) \sqsubseteq \top.$$Take the $F$ defined in \eqref{eg1-fp1} as an example. $F^n(\bot)$ captures the set of strings that can be derived from grammar \eqref{eg1-cfg1} with a parse tree of height less than $n+1$. Furthermore, since each equation only has a finite number of $Q$'s on the right-hand side, $F$ is in fact a Scott continuous function and thus $\lfp(F)=\bigsqcup_n F^n(\bot)$, i.e., $\lfp(F)(x)=true$ iff $x \in\{0,1\}^+$, the set of strings that can be derived from $S$ in a finite number of steps. To find $\gfp(F)$, define $A_n=\{$nonempty binary strings of length less than $n+1\}$ and $B_n=\{$infinite strings in $\Sigma^*$ containing a binary prefix of length less than $n+1\}$. Observe that $F^n(\top)$ captures the set $A_n \cup B_n$, $F^n(\top) \sqcap F^{n+1}(\top)$ captures the set $A_{n+1} \cup B_{n+1}$. Thus $\bigsqcap\nolimits_n F^n(\top)$ will capture the set $\lim_{n\rightarrow\infty}(A_n\cup B_n)$ if the limit exists. However, since $A_n \nearrow \{0,1\}^+$ and $B_n \searrow\{0,1\}^\omega$, we have $$\limsup_{n\rightarrow\infty}(A_n\cup B_n) = \bigcap_{n\ge 1}\bigcup_{k\ge n} (A_k \cup B_k) = \bigcap_{n\ge 1}(\{0,1\}^+\cup B_n) = \{0,1\}^+\cup\{0,1\}^\omega,$$and$$\liminf_{n\rightarrow\infty}(A_n\cup B_n) = \bigcup_{n\ge 1}\bigcap_{k\ge n} (A_k \cup B_k) = \bigcup_{n\ge 1}(A_n \cup \{0,1\}^\omega) = \{0,1\}^+\cup\{0,1\}^\omega.$$It follows that $\lim_{n\rightarrow\infty}(A_n\cup B_n)=\{0,1\}^+\cup\{0,1\}^\omega$, which is also the set captured by $\gfp(F)$ as $F$ is Scott continuous.
Proof by induction. In mathematical induction, we usually set $D=\mathbb N\cup\{0\}$ and try to prove that $Q(0)$ holds and $Q(n)\Rightarrow Q(n+1)$ for all $n\ge 0$. Generally, proving a property $Q$ by induction amounts to proving that $Q$ holds in the base cases and $Q(d)\Rightarrow F(Q)(d)$ for all $d\in D$. The predicate $Q$ satisfying this proof rule turns out to be $\bigsqcup_n F^n(\bot)$, which, when $F$ is Scott continuous, agrees with the least fixed point of $F$. In this sense, proving by induction usually corresponds to taking the least fixed point. In program verification, inductions and least fixed points are extensively used to reason about semantics, because the semantics of a program should precisely describe what will happen during the evaluation of the program.
Proof by coinduction. Suppose $Q$ is the greatest fixed point of $F$. By definition, any predicate $Q'$ satisfying $Q' \sqsubseteq F(Q')$ also satisfies $Q' \sqsubseteq Q$, since $Q$ is an upperbound of any ascending chain in the lattice. Hence, to show $Q(d)$ holds it suffices to find some predicate $Q'$ such that $Q'(d)$ holds and $Q'(d')\Rightarrow F(Q')(d')$ for all $d' \in D$. The rationale behind doing this is to regard the idea of being acceptable (i.e., $Q(d)$ holds) as being consistent with the induction rules (i.e., $\exists Q': Q'(d) \wedge Q'(d')\Rightarrow F(Q')(d')$ for all $d'$). In program verification, coinductions and greatest fixed points are extensively used to check consistence with specifications, because the specification of a program should not forbid a behaviour unless it explicitly specifies so.
Collapse of fixed points. The least fixed point and the greatest fixed point in a complete lattice can coincide when Banach's Fixed Point Theorem can be applied. Namely, a "contractive" operator in a metric space has a unique fixed point. A functional $F$ will be contractive if there exists a well-founded relation $\prec$ on $D$ such that the clause of $F(Q)(d)$ only call $Q$ on arguments smaller than $d$ with respect to $\prec$. For example, the $F$ defined in \eqref{eg1-fp1} is contractive when $D$ is set to $\Sigma^*$; in such case, both $\lfp(F)$ and $\gfp(F)$ capture $\{0,1\}^+$. By contrast, the functional that expresses grammar \eqref{eg1-cfg2}, i.e., the functional $F$ that satisfies\begin{equation}F(Q)(0)=true;\ F(Q)(1)=true;\ F(Q)(x)=Q(x);\ F(Q)(xy)=Q(x)\wedge Q(y),\label{eg1-fp2}\end{equation}is not contractive even when $D=\Sigma^*$. Indeed, one can see that $\lfp(F)$ captures $\{0,1\}^+$ but $\gfp(F)$ captures $\{0,1\}^*$.

Inductive and coinductive datatypes

An introduction to (co)algebra and (co)induction.
https://pages.lip6.fr/Pierre-Evariste.Dagand/stuffs/notes/20150126_coinduction.txt
http://wiki.c2.com/?CoinductiveDataType