This is ericpony's blog

Monday, December 29, 2014

Basic dependent type theory ─ Part I

Computational trinitarianism. Proof Theory, Category Theory, and Type Theory can, to a significant degree, be regarded as fields that approach the same subject from different points of view. They can be related as follows:
TT ⇔ CT: categorical semantics; types correspond to objects and well-typed terms (WTT) to morphisms.
PT ⇔ CT: categorical logic; an algebraic way of thinking about logic and proofs.
PT ⇔ TT: thinking of propositions as types and of well-typed terms as proofs.
When you are learning a new idea in one field, it is helpful to transpose it to the other two fields and seek its correspondence there. See here and here for more details.
Judgements. We start with three kinds of categorical judgements:
$A\ type$   ─  $A$ is a type.
$a \in A$    ─  $a$ is a value of type $A.$
$a_1\equiv a_2: A$ ─  $a_1$ and $a_2$ are "definitionally equal" values of type $A.$
A judgement of the form $x_1:A_1,...,x_n:A_n \vdash a(x_1,...,x_n):A$ is called a hypothetical judgement. One may regard such a judgement as a mapping from $A_1,...,A_n$ to $A.$ The term $a(x_1,...,x_n)$ is called an open term: it may involve the free variables $x_1,...,x_n.$ In practice, we often consider the open term as parametrized by $x_1,...,x_n.$ A closed term of type $A$ is determined categorically once values for $x_1,...,x_n$ are plugged in.
Definitional equality is 1) an equivalence relation, 2) a congruence (see the typing rules), and 3) respected by maps. That is, $$\frac{\Gamma, x:A \vdash b:B \qquad \Gamma \vdash a_1\equiv a_2: A}{\Gamma \vdash [a_1/x]b \equiv [a_2/x]b : B}.$$The last feature is called the principle of functionality. This property and its related concepts will play a central role later and in Homotopy Type Theory.
Definitional equality has its limits. For example, with $+$ defined by recursion on its first argument, the open terms $0+x$ and $x+0$ are not definitionally equal, even given all the arithmetic rules about natural numbers. Definitional equality corresponds to calculating by simplification: $0+x$ simplifies to $x$, but $x+0$ is stuck, since the recursion cannot proceed on the variable $x$. Equations like $x+0=x$ that we wish to hold cannot be reached by calculation alone; they need not just calculation but proof (by induction), and thus need not hold definitionally.
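The flavor of "equality by calculation" can be sketched with a toy term rewriter (a hypothetical illustration, not part of the post): Peano naturals with $+$ defined by recursion on the first argument, where closed terms always compute but open terms can get stuck.

```python
# A minimal sketch: terms for Peano naturals with "+" defined by recursion on
# the FIRST argument, so normalization (repeated simplification) decides
# definitional equality of closed terms but gets stuck on open terms like x+0.

def Z(): return ("Z",)
def S(t): return ("S", t)
def Var(x): return ("Var", x)
def Add(a, b): return ("Add", a, b)

def normalize(t):
    """Simplify a term using the computation rules 0+b ≡ b and S(a)+b ≡ S(a+b)."""
    tag = t[0]
    if tag in ("Z", "Var"):
        return t
    if tag == "S":
        return ("S", normalize(t[1]))
    # tag == "Add": normalize the left argument, then try to step
    a, b = normalize(t[1]), normalize(t[2])
    if a[0] == "Z":
        return b                               # 0 + b  ≡  b
    if a[0] == "S":
        return ("S", normalize(Add(a[1], b)))  # S(a) + b  ≡  S(a + b)
    return ("Add", a, b)                       # stuck: left argument is a variable

def defeq(t1, t2):
    """Definitional equality = syntactic identity of normal forms."""
    return normalize(t1) == normalize(t2)

one = S(Z())
x = Var("x")

print(defeq(Add(Z(), one), Add(one, Z())))  # True: closed terms both compute to S(Z)
print(defeq(Add(Z(), x), x))                # True: 0+x simplifies to x
print(defeq(Add(x, Z()), x))                # False: x+0 is stuck; needs induction
```

Note how the closed terms $0+1$ and $1+0$ both normalize to the same numeral, while the open term $x+0$ stays stuck.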
Typing rules. We have the following kinds of rules to reason about types. Rules 1-5 come from Proof Theory and Rule 6 comes from Category Theory.
1. Formation rules: how to construct a type.
2. Introduction rules: how to create elements of a type.
3. Elimination rules: how to use elements of a type.
4. Congruence rules: how to interact with definitional equivalence.
5. Computation rules: how to calculate with types.
6. Unicity/universality rules: how to characterize types precisely.

Negative types. E.g., unit, product, and function. Elimination-oriented. Unicity of products: $\Gamma \vdash \langle fst(c), snd(c) \rangle \equiv c : A\times B.$ Computation for functions is substitution; elimination for functions is application. Extensionality of functions can be written as $$\frac{\Gamma, x:A \vdash a_1(x)\equiv a_2(x):B}{\Gamma \vdash a_1\equiv a_2: A\rightarrow B}$$This means that functions are characterized by their input-output behavior.
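As an illustration (a sketch in Python, not from the post), a negative product can be realized by anything that answers $\texttt{fst}$ and $\texttt{snd}$ requests, including a lazy black box:

```python
def pair(a, b):
    # introduction form: a black box that answers fst/snd requests
    return lambda selector: a if selector == "fst" else b

def fst(p): return p("fst")
def snd(p): return p("snd")

def lazy_pair(make_a, make_b):
    # another realization: components are computed only on demand
    return lambda selector: make_a() if selector == "fst" else make_b()

p = pair(1, "hello")
q = lazy_pair(lambda: 1, lambda: "hello")
# Observationally, both realizations satisfy the unicity principle
# <fst(c), snd(c)> ≡ c: they are indistinguishable under the eliminators.
print(fst(p) == fst(q) and snd(p) == snd(q))   # True
```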
Positive types. E.g., void and sum. Introduction-oriented. The unicity property of positive types is problematic, and cannot be shown as directly as for the negative types. Elimination rule for the void type:$$\frac{\Gamma\vdash a:\emptyset}{\Gamma\vdash abort(a): A}$$This means that from a value of the void type you can abort the program at any type. Note that one can never actually build a term of type $A$ in this way, since there is no closed term of type $\emptyset.$ One application of this rule: if one branch of a conditional is contradictory, you can place an $abort$ element of type $A$ there to make sure both branches of the conditional have the same type.
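A positive sum, in contrast, carries its injection tag, and its eliminator inspects that structure by case analysis. A minimal sketch (hypothetical Python encoding):

```python
def inl(a): return ("inl", a)
def inr(b): return ("inr", b)

def case(s, on_inl, on_inr):
    tag, v = s
    return on_inl(v) if tag == "inl" else on_inr(v)

# Both branches must produce the same type; in a contradictory branch one
# would place an abort-style eliminator, but since the void type has no
# introduction form, such a branch can never actually run.
def describe(s):
    return case(s, lambda n: f"number {n}", lambda msg: f"error: {msg}")

print(describe(inl(42)))       # number 42
print(describe(inr("oops")))   # error: oops
```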
Negative types vs. positive types. Roughly, the two kinds of types differ most noticeably in how they behave under elimination. Reasoning about a negative type has nothing to do with its structure. For example, a product is characterized by its behavior under $\texttt{fst}$ and $\texttt{snd}$, without caring about what it actually is: a product can be realized by an ordered pair, or by a black box that computes the $\texttt{fst}$ and $\texttt{snd}$ components on demand. Also, there is no induction principle for negative types. For example, all you know about a function is that it is a lambda term, so you cannot induct on its structure via the typing rules. In contrast, the structure of a positive type is enforced by the theory. The elimination forms of positive types work by case analysis: the actual element of a sum is extracted by $\texttt{case}$ during inference, so it cannot be anything other than a tagged injection.
A correspondence between STT and IPT. $Unit$ is $true.$ $Void$ is $false.$ Sum is disjunction. Product is conjunction. Function is implication. $A\ type$ is $A\ prop.$ $\exists e.\, e: A$ is $A\ true.$ $\exists b.\, x:A \vdash b:B$ is $A\le B.$ Thus $A\le B$ means that there is a function transforming a proof of $A$ into a proof of $B.$ Note that $\le$ is a pre-order: that is, it is reflexive and transitive. We use $A\supset B$ to denote the largest element $x$ satisfying $A\wedge x \le B$, i.e., $C \wedge A \le B$ iff $C \le A \supset B.$ If an ordered structure satisfies 1) $\forall A:\ A\le \top$; 2) $\forall A:\ \bot \le A$; 3) $A\wedge B$ (meet) is a glb of $A,\,B$; 4) $A\vee B$ (join) is a lub of $A,\,B$; 5) $\wedge$ is distributive over $\vee$; 6) $A \supset B$ exists for every $A,B$, then it is called a Heyting pre-algebra. (It is called a pre-algebra because $\le$ need not be antisymmetric.)
Negations. $\neg A$ is defined as $A \supset \bot.$ Thus we can get a contradiction from a value of $A.$ The fact $A\le \neg\neg A$ follows directly from the properties of $\supset.$ Its converse $\neg\neg A \le A$, however, does not hold in general (it takes some effort to check this statement). Similarly, while $A \vee \neg A \le \top$ is trivial, $\top \le A \vee \neg A$ cannot be proved in general without knowing something about $A.$
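These failures can be checked concretely on a small model. The following sketch (an illustration; the three-element chain is a standard example, not taken from the post) implements the Heyting chain $0 < \frac12 < 1$ and verifies that $A\le \neg\neg A$ holds everywhere while $\neg\neg A \le A$ and $\top \le A\vee\neg A$ both fail at $A=\frac12$:

```python
H = [0, 0.5, 1]          # the chain 0 = ⊥ < 1/2 < 1 = ⊤
meet, join = min, max

def imp(x, y):           # Heyting exponential: largest z with x ∧ z ≤ y
    return 1 if x <= y else y

def neg(x):              # ¬x := x ⊃ ⊥
    return imp(x, 0)

for a in H:
    assert a <= neg(neg(a))            # A ≤ ¬¬A holds everywhere
assert neg(neg(0.5)) == 1              # but ¬¬(1/2) = ⊤ > 1/2: ¬¬A ≤ A fails
assert join(0.5, neg(0.5)) == 0.5      # and (1/2) ∨ ¬(1/2) = 1/2 < ⊤: LEM fails
print("checked")
```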
By definition, $\neg A$ is the largest element inconsistent with $A.$ That is, $A\wedge\neg A \le \bot$, and $B\wedge A \le \bot$ implies $B\le \neg A.$ The complement $\overline{A}$ of $A$ is defined as the smallest element $B$ such that $\top\le A\vee B.$ Hence $\top\le A\vee B$ implies $\overline{A}\le B.$ Note that i) $\neg A$ always exists while $\overline{A}$ may not; ii) it is reasonable to add the largest/smallest requirements to the definitions, for otherwise we could trivially choose $\neg A=\bot$ and $\overline{A}=\top$; iii) it is easy to check $\neg A \le \overline{A}$ (things in $\neg A$ are definitely not in $A$). The converse $\overline{A} \le \neg A$, however, does not hold in general (the negation need not fill up the whole space).
A Boolean algebra is also called a complemented Heyting algebra: one in which $\overline{A}$ exists for every $A$, with i) $\top \le A\vee \neg A$ (which implies $A\vee \neg A=A\vee \overline{A}$), and ii) $\overline{\overline{A}} \le A$ (note that $A \le \overline{\overline{A}}$ follows by definition). Heyting pre-algebras generalize Boolean pre-algebras in the sense that a Heyting pre-algebra satisfying $\top \le A \vee \neg A$ (law of the excluded middle), or equivalently $\neg\neg A \le A$ (law of double negation), is a Boolean pre-algebra.
Heyting algebras. In a Boolean algebra, the exponential $y^x$ can be defined as $\overline{x}\vee y.$ In general, a Heyting algebra can be defined as a lattice with exponentials: $H$ is called a Heyting algebra if i) it is preordered; ii) it has finite meets and joins; and iii) for all $x,y$ there is an exponential $y^x$, the largest element $z$ such that $x \wedge z \le y.$
Exercise. Show that every Heyting algebra is distributive.
We give hints for proving $x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z).$ First, note that $z \wedge x \le y$ iff $z \le y^x.$ Hence, to prove the direction $x \wedge (y \vee z) \le (x \wedge y) \vee (x \wedge z)$, it suffices to show $y \vee z \le ((x \wedge y) \vee (x \wedge z))^x$, which amounts to showing $u \le ((x \wedge y) \vee (x \wedge z))^x$ for $u=y, z.$ Now, $u \le ((x \wedge y) \vee (x \wedge z))^x$ iff $u \wedge x \le (x \wedge y) \vee (x \wedge z).$ For $u=y, z$ this follows by definition. As for the other direction, note that $x \le y$ iff $\forall z.\,(y\le z$ $\implies x \le z).$ Hence, to prove $(x \wedge y) \vee (x \wedge z) \le x \wedge (y \vee z)$, it suffices to show that $(x \wedge y) \vee (x \wedge z) \le u$ implies $x \wedge (y \vee z) \le u$, which is equivalent to $y \vee z \le u^x.$ Note that $(x \wedge y) \vee (x \wedge z) \le u$ implies both $x \wedge y \le u$ and $x \wedge z \le u.$ Thus we have $y \le u^x$ and $z \le u^x$, that is, $y \vee z \le u^x.$
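As a quick sanity check (illustration only; a chain is trivially distributive, so this does not replace the proof), one can verify the distributive law by brute force on a small Heyting algebra:

```python
H = [0, 0.5, 1]        # a small Heyting algebra: the chain 0 < 1/2 < 1
meet, join = min, max

# check x ∧ (y ∨ z) == (x ∧ y) ∨ (x ∧ z) over all triples
ok = all(meet(x, join(y, z)) == join(meet(x, y), meet(x, z))
         for x in H for y in H for z in H)
print(ok)   # True
```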
As an alternative, you can also use the completeness theorem for IPT. Completeness assures that anything that holds in every Heyting algebra is provable in IPT, and vice versa. Since the preorder is interpreted as entailment in IPT, you can transfer, e.g., $x \wedge (y \vee z) \le (x \wedge y) \vee (x \wedge z)$ to $X \wedge (Y \vee Z)\ true \vdash (X \wedge Y) \vee (X \wedge Z)\ true$, and then reason about the logical connectives to do the proof. You can further add proof terms to obtain a functional-programming problem $\Gamma, x: A \times (B + C) \vdash M: (A \times B) + (A \times C).$ Resolving this problem amounts to writing a program $M$ of type $(A \times B) + (A \times C)$, and it is not hard to see that $M = \texttt{case}(\texttt{snd}(x);\; z.\,\texttt{inl}(\langle \texttt{fst}(x), z\rangle);\; z.\,\texttt{inr}(\langle \texttt{fst}(x), z\rangle)).$ It is something of a psychological matter for people to move problems around the trinity when doing proofs.
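The proof term $M$ can be transcribed into ordinary code. A sketch in Python (the tags and tuple encodings are my own choices):

```python
def inl(v): return ("inl", v)
def inr(v): return ("inr", v)

def case(s, on_inl, on_inr):
    tag, v = s
    return on_inl(v) if tag == "inl" else on_inr(v)

def M(x):                       # x : A × (B + C), i.e. (a, s) with s a tagged sum
    a, s = x
    return case(s, lambda z: inl((a, z)), lambda z: inr((a, z)))

print(M((1, inl("b"))))   # ('inl', (1, 'b')) : the (A × B) side
print(M((1, inr("c"))))   # ('inr', (1, 'c')) : the (A × C) side
```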

Wednesday, December 10, 2014

A logical interpretation of the function subtyping rule

We discussed the subtyping rule for function types in this post. In short, a function type is contravariant in the argument type and covariant in the result type. Several days ago, I happened to notice that the Monotonicity Theorem (see p.70 in [1]) looks very similar in form to the aforementioned subtyping rule. Given propositions $p,p',q,q'$ such that $p'\Rightarrow p$ and $q'\Rightarrow q$, the theorem asserts $$p'\vee q' \Rightarrow p \vee q.$$ Moreover, it is immediate that $p \vee q \equiv \neg q \Rightarrow p$ and $q' \Rightarrow q \equiv \neg q \Rightarrow \neg q'$, assuming the law of double negation. Now, by writing $p$, $p'$, $\neg q$ and $\neg q'$ as $P$, $P'$, $Q$ and $Q'$, respectively, we obtain \begin{equation} (Q' \Rightarrow P') \Rightarrow (Q \Rightarrow P) \label{subtyping} \end{equation} given that $P'\Rightarrow P$ and $Q\Rightarrow Q'$.

The connection between the above fact and the subtyping rule of function types can be established by treating the implication connective "$\Rightarrow$" as both a function-type constructor and a subtype-of relation. Indeed, if Apple is a subtype of Fruit then being an Apple implies being a Fruit. It is thus reasonable to use "Apple $\Rightarrow$ Fruit" to express the subtyping relation between Apple and Fruit. In view of this, the logical entailment in \eqref{subtyping} leads us to the following statement for function types: If $P'$ is a subtype of $P$ (i.e. $P' \Rightarrow P$) and $Q$ is a subtype of $Q'$ (i.e. $Q \Rightarrow Q'$), then function type $Q' \Rightarrow P'$ is a subtype of function type $Q \Rightarrow P$ (i.e. $(Q' \Rightarrow P') \Rightarrow (Q \Rightarrow P)$). In other words, a function type is contravariant in the argument types and covariant in the result type.
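The rule can also be observed in code. Below is a small Python sketch (class names hypothetical): a context expecting a handler of type Apple → Fruit safely accepts a handler of type Fruit → Apple, which takes more general inputs (contravariant argument) and returns more specific outputs (covariant result):

```python
class Fruit: pass
class Apple(Fruit): pass

def serve(handler):
    """A context that expects handler : Apple -> Fruit."""
    return handler(Apple())

def fruit_to_apple(f: Fruit) -> Apple:
    return Apple()            # accepts any Fruit, always returns an Apple

result = serve(fruit_to_apple)    # Fruit -> Apple  <:  Apple -> Fruit
print(isinstance(result, Fruit))  # True
```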

Reference

1. Gries, David, and Fred B. Schneider. "A logical approach to discrete math." Springer, 1993.

Tuesday, October 21, 2014

A note on the strong law of large numbers

I.i.d. r.v.s with zero mean

Consider a sequence of i.i.d. random variables $X_0, X_1, ....$ When the variance is finite, the sequence $S_0, S_1, ...$ of partial sums is expected to have a faster rate of convergence than the rate guaranteed by the Strong Law of Large Numbers (SLLN). In fact, we have the following theorem:

Theorem 2.5.1. If $Var(X_n)<\infty$ and $E[X_n]=0$, then $\lim_{n\rightarrow\infty} (S_n/n^p)=0$ a.s. for all $p>\frac{1}{2}$.

This theorem can be proven using criteria of convergent series and the Borel-Cantelli lemma.
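A quick simulation (illustration only, with parameters of my own choosing; a numerical experiment is of course not a proof) shows the effect for i.i.d. $\pm1$ coin flips:

```python
import random
random.seed(0)

n = 10**6
S = sum(random.choice((-1, 1)) for _ in range(n))   # S_n for i.i.d. ±1 steps
p = 0.9                                             # any p > 1/2 works
ratio = abs(S) / n**p
print(ratio < 0.1)   # True: |S_n| is typically of order sqrt(n) = 1000 here
```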

Indep. r.v.s with zero mean and finite sum of variance

It is possible to use a denominator that grows more slowly than $n^p$. However, we will need a stronger assumption on the variances to ensure convergence even without a divergent denominator.

Theorem 2.5.3. If $\sum_{n\ge1} Var(X_n)<\infty$ and $E[X_n]=0$, then $S_n$ converges a.s.

For sequences with $E[X_n]\neq0$, we can consider $Y_n=X_n-E[X_n]$ instead of $X_n$. Note that $Var(Y_n)=Var(X_n)$ and $E[Y_n]=0$. It then follows from the theorem that $\sum Y_n=\sum (X_n-E[X_n])$ converges a.s.

The ultimate characterisation of independent r.v.s with convergent partial sums is given by Kolmogorov.

Theorem 2.5.4. (Kolmogorov's Three-Series Theorem) Given independent $\{X_n\}$ and $A>0$, define $Y_n=X_n\cdot 1\{|X_n|\le A\}$. Then $S_n$ converges a.s. iff the following all hold:
(i) $\sum \Pr\{|X_n|>A\}<\infty$, (ii) $\sum E[Y_n]$ converges, and (iii) $\sum Var(Y_n)<\infty$.

For the "if" direction: by the 3rd condition and Theorem 2.5.3, $\sum_{n\ge1}(Y_n-E[Y_n])$ converges a.s. Together with the 2nd condition, this implies that $\sum_{n\ge1}Y_n$ converges a.s. Finally, $\Pr\{X_n=Y_n$ for all large $n\}=1$ by the 1st condition and the Borel-Cantelli lemma. Hence, we see that $\sum_{n\ge1}X_n$ converges a.s. We need the Lindeberg-Feller theorem to prove the other direction.

Convergence of series is linked to the SLLN by the following theorem.

Theorem 2.5.5. (Kronecker's Lemma) If $a_n\nearrow\infty$ and $\sum_{n\ge1}b_n/a_n$ converges, then $\frac{1}{a_N}\sum_{n=1}^{N} b_n \rightarrow 0$ as $N\rightarrow\infty$.

I.i.d. r.v.s with finite mean

The SLLN follows by Kolmogorov's Three-Series Theorem and Kronecker's Lemma.

Theorem 2.5.6. (SLLN) If $E[|X_n|]<\infty$ and $E[X_n]=\mu$, then $S_n/n\rightarrow\mu$ a.s.

Faster rate of convergence

We can prove a faster rate of convergence under stronger assumptions.

Theorem 2.5.7. Suppose that $\{X_n\}$ are i.i.d., $E[X_n]=0$ and $\sigma^2=E[X_n^2]<\infty$. Then for all $\epsilon>0$, $$\lim_{n\rightarrow\infty}\frac{S_n}{\sqrt n (\log n)^{1/2+\epsilon}}=0\quad a.s.$$The most exact estimate is obtained from Kolmogorov's test (Theorem 8.11.3), i.e., the law of the iterated logarithm: $$\limsup_{n\rightarrow\infty}\frac{S_n}{\sqrt n (\log \log n)^{1/2}}=\sigma\sqrt 2\quad a.s.$$
Theorem 2.5.8. Suppose that $\{X_n\}$ are i.i.d., $E[X_n]=0$ and $E[|X_n|^p]<\infty$ for some $1<p<2$. Then $\lim S_n/n^{1/p}=0$ a.s.

Examples

Example 2.5.3. Suppose $\{X_n\}$ are independent and $\Pr\{X_n=\pm n^{-p}\}=1/2$. Then $S_n$ converges a.s. iff $p>1/2$. (Hint: use Theorem 2.5.4 with $A=1$.)

References

Rick Durrett. "Probability: Theory and Examples." Edition 4.1.

Friday, October 10, 2014

A note on the Riemann integral of sin(x)/x

In this post, we will prove the famous result that \[ \int_{0}^{\infty}\frac{\sin x}{x}dx=\frac{\pi}{2}. \] We first fix some $0<a<\infty$ and consider $e^{-xy}\sin x$. We shall show that $e^{-xy}\sin x$ is integrable on $0<x< a$ and $y>0$. As $\left|\sin x\right|\le1$, one may instead try to prove the integrability of $e^{-xy}$ for this purpose. Unfortunately, the attempt would fail since \[ \int_{0}^{a}\int_{0}^{\infty}e^{-xy}\,dy\,dx=\int_{0}^{a}\frac{1}{x}dx=\infty \] diverges. Bounding $e^{-xy}\sin x$ with $e^{-xy}$ turns out to be too rough near $x=0$, because $1/x\rightarrow\infty$ but $\sin x/x\rightarrow1$ when $x\searrow0$. We therefore need to use a tighter bound near $x=0$ to make the integral finite. Observe that, since $\sin x/x\rightarrow1$, given $\varepsilon>0$ there exists $\delta>0$ such that $\left|\sin x/x-1\right|< \varepsilon$ for $0< x< \delta$. Hence, \begin{eqnarray*} \int_{0}^{a}\int_{0}^{\infty}e^{-xy}\sin x\,dy\,dx & = & \int_{0}^{\delta}\int_{0}^{\infty}e^{-xy}\sin x\,dy\,dx+\int_{\delta}^{a}\int_{0}^{\infty}e^{-xy}\sin x\,dy\,dx\\ & \le & \int_{0}^{\delta}\frac{\sin x}{x}dx+\int_{\delta}^{a}\frac{1}{x}dx\\ & \le & \left(1+\varepsilon\right)\delta+\ln a-\ln\delta < \infty. \end{eqnarray*} Similarly, \[\int_{0}^{a}\int_{0}^{\infty}e^{-xy}\sin x\,dy\,dx \ge \left(1-\varepsilon\right)\delta-\ln a+\ln\delta > -\infty. \] It follows that the integral indeed exists. 
Now we can perform the double integral in two orders to obtain \[ \int_{0}^{a}\left(\int_{0}^{\infty}e^{-xy}\, dy \right)\sin x\, dx=\int_{0}^{a}\frac{\sin x}{x}dx \] and \begin{eqnarray*} \int_{0}^{\infty}\left(\int_{0}^{a}e^{-xy}\sin x \, dx \right) dy & = & \int_{0}^{\infty}\left(\dfrac{1}{y^{2}+1}-\dfrac{\left( \cos a + y\,\sin a \right)e^{-ay}}{y^{2}+1}\right) dy \\ & = & \frac{\pi}{2}-\cos a\int_{0}^{\infty}\frac{e^{-ay}}{1+y^{2}}dy-\sin a\int_{0}^{\infty}\frac{y\,e^{-ay}}{1+y^{2}} dy \end{eqnarray*} Hence, \begin{eqnarray*} \left|\int_{0}^{a}\frac{\sin x}{x}dx-\frac{\pi}{2}\right| & \le & \left|\cos a\right|\int_{0}^{\infty}\frac{e^{-ay}}{1+y^{2}}dy+\left|\sin a\right|\int_{0}^{\infty}\frac{y\,e^{-ay}}{1+y^{2}}dy\\ & \le & \int_{0}^{\infty}\left(1+y\right)e^{-ay}\, dy\\ & = & a^{-1}+a^{-2}. \end{eqnarray*} By taking $a\rightarrow\infty$ we obtain the desired result.
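The error bound is easy to check numerically. A sketch (the parameters are my own; composite Simpson's rule over $[0,a]$ with $\sin x/x$ extended by $1$ at $x=0$):

```python
import math

def sinc(x):
    # sin(x)/x, extended continuously by 1 at x = 0
    return 1.0 if x == 0.0 else math.sin(x) / x

def simpson(f, lo, hi, n):           # composite Simpson's rule, n even
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    s += sum((4 if k % 2 else 2) * f(lo + k * h) for k in range(1, n))
    return s * h / 3

a = 40.0
I = simpson(sinc, 0.0, a, 80_000)
print(abs(I - math.pi / 2) <= 1 / a + 1 / a**2)   # True: within the bound
```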
You can find different proofs of the same result here. Besides, this thread shows why $\sin{x}/x$ is Riemann integrable but not Lebesgue integrable on $(0,\infty)$.

Thursday, September 18, 2014

An algebraic method for a graph decomposition problem

Let $K_{n}$ denote an $n$-clique and $K_{n,m}$ denote a complete bipartite graph. We are interested in the least number $m$, denoted $m(n)$, such that $K_{n}$ can be edge-decomposed into $m$ complete bipartite graphs. Observe that $m(1)=0$, $m(2)=1$, $m(3)=2$, $m(4)=3$, etc. It is not difficult to deduce $m(n)\le n-1$ after some case studies, since we can obtain a decomposition of $K_{n+1}$ by adding $K_{1,n}$ to a decomposition of $K_{n}$, which implies $m(n+1)\le m(n)+1$. The nontrivial part lies in the reverse direction: how can we prove a matching lower bound for $m(n)$? If we want to establish the lower bound by analyzing the graphs directly, we have to show for every $K_n$ that there is no better decomposition than the one we offer, and this is far from an easy job.
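The recursive construction behind $m(n)\le n-1$ can be checked mechanically. A sketch in Python (illustration only): decompose $K_n$ into the stars $(\{v_k\},\{v_{k+1},\dots,v_n\})$ for $k=1,\dots,n-1$ and verify that every edge is covered exactly once:

```python
from itertools import combinations

def star_decomposition(n):
    # star k: ({v_k}, {v_{k+1}, ..., v_n}), for k = 1, ..., n-1
    return [({k}, set(range(k + 1, n + 1))) for k in range(1, n)]

def covers_each_edge_once(n, parts):
    edges = []
    for A, B in parts:
        edges += [tuple(sorted((a, b))) for a in A for b in B]
    # every edge of K_n appears exactly once
    return sorted(edges) == sorted(combinations(range(1, n + 1), 2))

for n in range(2, 8):
    assert covers_each_edge_once(n, star_decomposition(n))
print(len(star_decomposition(7)))   # 6 = 7 - 1 stars
```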

R. Graham proposed a very clever method to establish $m(n)\ge n-1$ based on elementary knowledge from linear algebra. He used $n$ variables $x_{1},...,x_{n}$ to represent the $n$ vertices $v_{1},...,v_{n}$ of $K_{n}$. An edge $\overline{v_{i}v_{j}}$ is selected iff $x_{i},x_{j}$ are both nonzero. The key observation is: if $(A_{1},B_{1}),...,(A_{m},B_{m})$ is a decomposition of $K_{n}$ into complete bipartite graphs, then the equation \[ \sum_{1\le i < j\le n}x_{i}x_{j}=\sum_{k=1}^{m}\sum_{x\in A_{k}}\sum_{y\in B_{k}}xy=\sum_{k=1}^{m}\left(\sum_{x\in A_{k}}x\right)\left(\sum_{y\in B_{k}}y\right) \] must hold for any assignment of $x_{1},...,x_{n}$. (An equation like this is called a double counting in the buzzwords of combinatorics.)

With the observation in mind, he considered the following homogeneous system of linear equations: \[ \begin{cases} \sum_{i=1}^{n}x_{i}=0,\\ \sum_{x\in A_{k}}x=0, & k=1,...,m \end{cases} \] Now suppose that $m < n-1$. The system then has $n$ unknowns and $m+1 < n$ equations, which implies that it has a nontrivial solution. In particular, there exists a nontrivial assignment of $x_{1},...,x_{n}$ such that \[ 2\cdot\sum_{1\le i < j\le n}x_{i}x_{j}=\left(\sum_{i=1}^{n}x_{i}\right)^{2}- \sum_{i=1}^{n}x_{i}^{2}=-\sum_{i=1}^{n}x_{i}^{2}=0, \] which is impossible since the last equality forces $x_{1}=...=x_{n}=0$. It follows that $m(n)\ge n-1$ by contradiction. Hence, we can conclude $m(n)=n-1$.

Monday, September 8, 2014

A note on expectation transformer semantics

An expectation is a function that maps program states to nonnegative real values; it is usually written as a nonnegative real-valued expression over the state variables. When the program contains randomized operations, an expectation is in fact a nonnegative random variable over the distribution induced by the reachability probabilities of program states. An expectation is called a post-expectation when it is to be evaluated on final states, and a pre-expectation when it is to be evaluated on initial states.

Intuitively, we may regard a post-expectation as a random variable representing nonnegative "rewards" of final states. One can examine the program beforehand to estimate, for each initial state, the expected reward when the program is run from there. This estimate, called the weakest pre-expectation, is a nonnegative function that maps an initial state to the expected reward over the final states reachable from that state. In other words, a weakest pre-expectation evaluates to the expected value of the post-expectation over the distribution of final states produced by program executions.

Given a probabilistic program $prog$ and a post-expectation $postE$ over final states, we use $wp(prog,postE)$ to denote the weakest pre-expectation of $prog$ with respect to $postE$. As a function over state variables, $wp(prog,postE)$ maps an initial state of $prog$ to the expected value of $postE$ in the final distribution reached via executions of $prog$. In particular, $wp(prog,1)$ maps an initial state to the probability that the program will terminate from that state, and $wp(prog,[pred])$ is the probability that the program will terminate in a state satisfying predicate $pred$. (Here we use square brackets $[\cdot]$ to denote an indicator function, so that given a predicate $pred$, $[pred]$ evaluates to 1 on states satisfying $pred$ and evaluates to 0 otherwise.)
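For intuition, $wp$ of a loop-free randomized program can be estimated by sampling. A toy sketch (the program and its parameters are hypothetical): the program increments $x$ with probability $0.7$, so $wp(prog,[x=1])$ at initial state $x=0$ equals $0.7$:

```python
import random
random.seed(1)

def prog(x, p=0.7):
    """Toy randomized program: x := x + 1 with probability p, else skip."""
    return x + 1 if random.random() < p else x

runs = 100_000
# Monte Carlo estimate of wp(prog, [x = 1]) at the initial state x = 0
estimate = sum(prog(0) == 1 for _ in range(runs)) / runs
print(abs(estimate - 0.7) < 0.02)   # True: the estimate concentrates around 0.7
```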

Expectations are the quantitative analogue of predicates in predicate transformer semantics; they yield an expectation transformer semantics for probabilistic programs. An expectation transformer is a total function between expectations on the states of a program. The expectation transformer $wp(prog,postE)$ gives the least expected value of $postE$ over the distributions produced by the executions of $prog$. The annotation $\langle {preE} \rangle\; prog\;\langle {postE} \rangle$ holds for total correctness if and only if \begin{equation} preE\le wp(prog,postE),\label{eq:pre-wp-post} \end{equation} where $\le$ is interpreted point-wise. In other words, $preE$ gives, for each initial state, a lower bound on the expected value of $postE$ over final states when $prog$ starts from that initial state.

In the special case when expectations $preE$ and $postE$ are given by $[pre]$ and $[post]$, respectively, where $pre$ and $post$ are predicates, annotation $\langle {preE} \rangle\; prog\;\langle {postE} \rangle$ is just the embedded form of a standard Hoare-triple specification $\{pre\}\; prog\;\{post\}$, i.e., we can assume the truth of $post$ after the execution of $prog$, provided $pre$ holds in the initial state. More precisely, we have \[ [pre]\le wp(prog,[post])\quad\mbox{iff}\quad pre\Rightarrow wp(prog,post). \]In this view, inequality between expectations in expectation transformer semantics can be seen as a generalization of implication between predicates in predicate transformer semantics.

Reference

McIver, Annabelle, and Charles Carroll Morgan. "Abstraction, refinement and proof for probabilistic systems." Springer, 2006.

Monday, September 1, 2014

A note on predicate transformers and loop invariants

For standard programs, the computational model of execution supports a "logical" view: given a set of final states, we can examine the program to determine the largest set of initial states from which execution of the program is guaranteed to reach the final states. The sets of states are characterized by logical formulas called predicates, and the program is regarded as a predicate transformer, i.e., a total function that maps one predicate to another. We give some important terminology before we proceed:
  • An assertion is a predicate placed in a program to assert that it holds at that point
  • A precondition is an assertion placed before a program segment
  • A postcondition is an assertion placed after a program segment
  • A formula of the form $\{precond\} \; program \; \{postcond\}$ is called a Hoare triple
  • A Hoare triple is valid iff whenever the precondition holds, the postcondition holds after the execution of the program segment terminates
  • A loop invariant is an assertion supposed to be true before and after each iteration of a loop
The following example puts all ingredients together:
// Precondition: (x=0 and y=0 and n≥0)
while (y<n) { 
    // Loop invariant: 2*x = y*(y-1)
    x+=y; y++; 
}
// Postcondition: 2*x = n*(n-1)
Observe that $\{x=0 \wedge y=0 \wedge n\ge0\}$ while(y<n){x+=y;y++;} $\{2x=n(n-1)\}$ is a valid Hoare triple and $2x = y(y-1)$ is an invariant for the while loop.

Predicate transformer semantics provide an effective algorithm to verify a Hoare triple by reducing the verification problem to the problem of proving logical entailment: Given a Hoare triple $\{pre\} \; prog \; \{post\}$, we first compute the weakest precondition of executing $prog$ starting from $post$, and then check that $pre$ entails it. This amounts to proving that $$pre \Rightarrow wp(prog, post).$$When $prog$ doesn't contain a loop, the weakest precondition is simply given by syntactic rules. However, the presence of loops may pose a problem because the precondition is given in terms of a least fixed point. Besides, a separate proof is required to establish that the loops terminate from their initial states. Formally, if $G$ is a predicate and $body$ is a loop-free program segment, then \begin{equation} wp({\rm while}(G)\{body\}, post) = \mu X.((G\Rightarrow wp(body, X)) \wedge (\neg G\Rightarrow post)),\label{loop-wp} \end{equation} where $\mu$ is the least fixed point operator w.r.t. logical implication. Given that the loop terminates and the least fixed point exists, the construction of the weakest precondition can be sketched in set theory as follows. Let $\Sigma$ denote the state space. We define a family $\{A_k:k\ge0\}$ of subsets of $\Sigma$ by induction over $k$: \[ \begin{array}{rcl} A_0 & = & \emptyset \\ A_{k+1} & = & \left\{\ y \in \Sigma: ((G \Rightarrow wp(body, x \in A_k)) \wedge (\neg G \Rightarrow post))[x \leftarrow y]\ \right\} \end{array} \] Informally, $A_k$ is the set of initial states from which the postcondition $post$ is satisfied after less than $k$ iterations of the loop. We can then define the weakest precondition in \eqref{loop-wp} as the first-order predicate $\exists k.\, x \in A_k$.
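The $A_k$ construction can be carried out mechanically on a finite state space. A sketch (the bounds and the choice $n=5$ are my own) for the example loop above, while (y<n) { x+=y; y++; } with postcondition $2x=n(n-1)$:

```python
n = 5
states = [(x, y) for x in range(0, 40) for y in range(0, n + 1)]

def G(s):    return s[1] < n                   # loop guard y < n
def body(s): return (s[0] + s[1], s[1] + 1)    # x += y; y++
def post(s): return 2 * s[0] == n * (n - 1)

A = set()                                      # A_0 = ∅
while True:
    # A_{k+1}: either take the loop branch into A_k, or exit into post.
    # (States whose successor leaves the finite slice are conservatively dropped.)
    A_next = {s for s in states if (body(s) in A if G(s) else post(s))}
    if A_next == A:                            # least fixed point reached
        break
    A = A_next

print((0, 0) in A)   # True: from x=0, y=0 the loop ends with 2x = 20 = n(n-1)
print((1, 0) in A)   # False: from x=1 it ends with 2x = 22
```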

While an abstract construction as above is elegant, it cannot be handled efficiently by theorem provers in practice. Fortunately, using special assertions called loop invariants, we can largely avoid the need to calculate fixed points. Suppose that we want to verify \begin{equation} pre \Rightarrow wp({\rm while} (G) \{body\}, post).\label{loop-hoare} \end{equation} Instead of calculating the least fixed point of the loop directly, we may divide the problem into simpler subtasks:
  1. Find an assertion $I$ such that $pre \Rightarrow I$ and $I\wedge\neg G \Rightarrow post$.
  2. Show that $I$ is an invariant: $I\wedge G \Rightarrow wp(body, I)$.
  3. Show that $I$ is sound: $I \Rightarrow wp({\rm while}(G)\{body\}, I\wedge\neg G)$.
The first task ensures that the existence of $I$ establishes the desired relation between $pre$ and $post$. The second task considers a single iteration of the loop body and ensures that the execution of the loop preserves the validity of $I$. Note that it is easy to compute $wp(body, I)$ because $body$ is loop-free by assumption. In the third task, we check the soundness property: every execution of the loop from a state satisfying the invariant can only terminate in a state that also satisfies the invariant and violates the guard $G$. It is still an open problem in the literature to give a necessary and sufficient condition for soundness. On the other hand, soundness can be established if one can show, e.g., by using a loop variant, that the loop terminates when started from a state satisfying $I$.
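For the example loop above with invariant $I: 2x=y(y-1)$, the entailments in the subtasks can be checked by exhaustive enumeration over a finite slice of the state space (a sketch; the bound on $x$ and the choice $n=5$ are my own):

```python
n = 5
def pre(x, y):  return x == 0 and y == 0
def G(x, y):    return y < n
def post(x, y): return 2 * x == n * (n - 1)
def I(x, y):    return 2 * x == y * (y - 1)          # the loop invariant
def wp_body(x, y, Q):                                # wp(x+=y; y++, Q) by substitution
    return Q(x + y, y + 1)

for x in range(0, 40):
    for y in range(0, n + 1):
        assert not pre(x, y) or I(x, y)                       # task 1: pre ⇒ I
        assert not (I(x, y) and not G(x, y)) or post(x, y)    # task 1: I ∧ ¬G ⇒ post
        assert not (I(x, y) and G(x, y)) or wp_body(x, y, I)  # task 2: I ∧ G ⇒ wp(body, I)
# Task 3 (soundness) follows from termination here: y strictly increases toward n.
print("all checks passed")
```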

With invariant $I$, the relation between $pre$ and $post$ in \eqref{loop-hoare} is established as $$pre \Rightarrow I \Rightarrow wp({\rm while}(G)\{body\}, I\wedge\neg G)\quad {\rm and }\quad I\wedge\neg G \Rightarrow post.$$ Checking correctness for code without loops turns out to be easy in practice. Hence, the crucial point in verifying a Hoare triple is to discover the necessary loop invariant for each loop in the program.

References

Gries, David. "The Science of Programming." Springer, 1981.

Monday, July 21, 2014

Multivariate Lagrange Interpolation

\[ \def\R{\mathbb{R}} \def\v#1{\mathbf{#1}} \def\vv#1#2{\mathbf{#1}_{#2}} \]
Let $P_{m}^{n}$ denote the vector space formed by real-coefficient, $m$-variable polynomials of degree at most $n$. The dimension of $P_{m}^{n}$ is $d=\tbinom{m+n}{n}$, since there are exactly $\tbinom{m+n}{n}$ monomials of degree at most $n$ in $m$ variables. Therefore, if $P\subset P_{m}^{n}$ is a linearly independent set of cardinality $d$, then every polynomial in $P_{m}^{n}$ can be uniquely represented as a linear combination of polynomials in $P$. Consider a function $f\in P_{m}^{n}$ that can only be accessed as a black box. That is, we can request the value $f(\v s)$ at any point $\v s\in\R^{m}$, but the explicit representation of $f$ is hidden from us. In such a case, Lagrange interpolation offers us the possibility of determining the function by sampling and evaluating points from its domain.

The basic idea behind Lagrange's method is to choose $d$ points $\vv s1,\dots,\vv sd\in\R^{m}$ and compute a set of polynomials $P=\{L_{1},\dots,L_{d}\}\subset P_{m}^{n}$, which we shall call a Lagrange basis, such that $L_{i}(\vv sj)=[i=j]$ for $1\le i,j\le d$. Observe that $P$ is a linearly independent set, and thus the polynomials in $P$ uniquely determine the polynomials in $P_{m}^{n}$.

Given a set of points $\vv s1,\dots,\vv sd\in\R^{m}$, we can use a procedure proposed in [1] to compute a Lagrange basis from the points. First, let $\{q_{1},\dots,q_{d}\}\subset P_{m}^{n}$ denote the set of monomials in $P_{m}^{n}$. Consider polynomial functions $M_{1},\dots,M_{d}\in P_{m}^{n}$ defined as: \[ M_{i}(\v x)=\det\begin{bmatrix}q_{1}(\vv s1) & \cdots & q_{d}(\vv s1)\\ \vdots & & \vdots\\ q_{1}(\v x) & \cdots & q_{d}(\v x)\\ \vdots & & \vdots\\ q_{1}(\vv sd) & \cdots & q_{d}(\vv sd) \end{bmatrix}\leftarrow\mbox{the $i$th row} \] Observe that $M_{i}(\vv sj)=0$ for $i\neq j$, and $M_{1}(\vv s1)=\cdots=M_{d}(\vv sd)=M$ for some $M\in\R$. If $M=0$, then there is no Lagrange basis associated with the points. If $M\neq0$, then $\{M_{i}(\v x)/M:\,i=1,\dots,d\,\}$ is a Lagrange basis. Hence, any polynomial $f\in P_m^n$ can be written in the Lagrange form $$f(\v x)=\sum_{i=1}^{d}f(\vv si)M_{i}(\v x)/M.$$ Remark. The fact that $M=0$ reflects a geometric dependency among the $d$ sampling points, in which case it is impossible to determine a unique polynomial from these points. Characterizing the geometric configurations of points that lead to $M=0$ is an intricate research problem. See, e.g., [2] for more details.
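The determinant construction is straightforward to implement for small $m,n$. A sketch (my own toy instance: $m=2$, $n=1$, so $d=3$ with monomials $\{1,x,y\}$ and sample points $(0,0),(1,0),(0,1)$), using exact rational arithmetic:

```python
from fractions import Fraction

pts = [(0, 0), (1, 0), (0, 1)]          # the d = 3 sample points

def q(s):
    # row of monomial values [1, x, y] evaluated at the point s
    return [Fraction(1), Fraction(s[0]), Fraction(s[1])]

def det3(m):                            # 3x3 determinant by cofactor expansion
    a, b, c = m
    return (a[0]*(b[1]*c[2] - b[2]*c[1]) - a[1]*(b[0]*c[2] - b[2]*c[0])
            + a[2]*(b[0]*c[1] - b[1]*c[0]))

def M_i(i, x):                          # replace the i-th row by the monomials at x
    rows = [q(s) for s in pts]
    rows[i] = q(x)
    return det3(rows)

M = M_i(0, pts[0])                      # the common value M_1(s_1) = ... = M_d(s_d)
assert M != 0                           # the points admit a Lagrange basis

def interpolate(f, x):
    # f in Lagrange form: sum_i f(s_i) * M_i(x) / M
    return sum(f(s) * M_i(i, x) for i, s in enumerate(pts)) / M

f = lambda s: 2*s[0] - 3*s[1] + 1       # a degree-1 polynomial to recover
print(interpolate(f, (2, 5)))           # -10 = f(2, 5)
```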

References

1. Saniee, K. "A simple expression for multivariate Lagrange interpolation." SIAM Undergraduate Research Online, 2008.
2. Olver, Peter J. "On multivariate interpolation." Studies in Applied Mathematics 116.2, 2006.

Friday, July 4, 2014

Constructive Logic: The Law of the Excluded Middle

In constructive logic, a proposition $P$ is called decidable if $P\vee\neg P \;true$ is provable constructively. That is, either we can write a proof of $P \;true$, or we can write a proof of $\neg P \;true$. Since $P$ might express an open problem for which we have neither a proof nor a refutation at present, not every proposition is decidable in the constructive sense. Hence, the Law of the Excluded Middle (LEM) cannot be expected to hold universally in constructive logic as it does in classical logic, where every proposition is decidable under the truth-table interpretation.

Note that this doesn't mean constructive logic refutes LEM. In fact, we are free to assume LEM in constructive logic without fear of causing inconsistency. Even when $P$ is not known to be decidable, it may still be proved constructively with the proviso that $P\vee\neg P \;true$ holds. Adding such an assumption is always safe, precisely because constructive logic does not refute any instance of LEM. In fact, classical logic can be regarded as a special case of constructive logic in which LEM holds for all propositions. Proofs in classical logic may, and often do, rely implicitly on the universal validity of LEM, whereas in constructive logic we often prove without it. Hence, any proposition provable in constructive logic is also provable in classical logic.

What if one attempts to prove or disprove LEM, i.e., $P\vee\neg P \;true$, in constructive logic? In order to prove $P\vee\neg P \;true$, we must either prove $P \;true$ or prove $\neg P \;true$. However, we cannot expect to do this for an arbitrary proposition $P$ in the absence of further information about it. Similarly, the only option to prove $\neg (P\vee\neg P) \;true$ is to assume either $P \;true$ or $\neg P \;true$, which leads to $P\vee\neg P \;true$, and then deduce absurdity. This is also impossible, for doing so for an arbitrary $P$ means refuting all propositions or their negations. Hence, constructive logic neither affirms nor refutes LEM.

Knowing that one can prove neither $P\vee\neg P \;true$ nor $\neg (P\vee\neg P) \;true$, it may be surprising that $\neg \neg (P\vee\neg P) \;true$ is in fact provable. Here is an outline of the proof:
  1. Note that $\neg Q$ is a syntactic abbreviation of $Q\rightarrow \perp$. Hence, $\neg Q\;true$ can be proved by assuming $Q \;true$ and deriving a contradiction (i.e. $\perp \;true$) from the assumption. Also recall that a contradiction can be derived by assuming $Q\;true$ and $\neg Q\;true$, which is known as the Principle of Explosion.
  2. The strategy to prove $\neg \neg (P\vee\neg P) \;true$ is then to derive $P\vee\neg P \;true$ under the assumption $\neg (P\vee\neg P) \;true$. If this derivation succeeds, then the assumption yields both $\neg (P\vee\neg P)\;true$ and $(P\vee\neg P)\;true$, and hence $\perp\;true$. That is, $\neg (P\vee\neg P)\;true\Rightarrow\perp\;true$, i.e., the assumption leads to a contradiction.
  3. Assume that $\neg (P\vee\neg P) \;true$. If $P \;true$ is provable, then so is $P\vee\neg P \;true$. Since this leads to a contradiction, we have $P\;true\Rightarrow\perp\;true$, i.e., $\neg P \;true$ is provable. But again, this implies $P\vee\neg P \;true$ is provable, a contradiction. Since $\neg (P\vee\neg P) \;true$ always leads to contradictions, we have $\neg (P\vee\neg P)\;true \Rightarrow \perp\;true$, i.e., we found a proof of $\neg \neg (P\vee\neg P) \;true$.
The derivation described above can be formally written as follows [2, 3]:
$$ \genfrac{ }{ }{0.5}{0} { \genfrac{ }{ }{0.5}{0} { \genfrac{ }{ }{0.5}{0}{ } {\Gamma\vdash\neg(P\vee\neg P)\;true} R^u \quad \genfrac{ }{ }{0.5}{0} {\genfrac{ }{ }{0.5}{0}{ }{\Gamma\vdash P\;true} R^v} {\Gamma\vdash (P\vee\neg P)\;true} \vee\!{-}I{-}L } { \genfrac{ }{ }{0.5}{0} {\Gamma \vdash\!\bot\; true} { \genfrac{ }{ }{0.5}{0}{ }{\Gamma\vdash\neg(P\vee\neg P)\; true} R^u \quad \genfrac{ }{ }{0.5}{0} {\;\Gamma\vdash\neg P\;true} {\;\Gamma\vdash P\vee\neg P\; true} \vee\!{-}R } \neg I^v } } { \genfrac{ }{ }{0.5}{0} {\Gamma\vdash\!\bot\;true} {\neg\neg(P\vee\neg P)\;true} \neg{-}I^u } $$
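Under the propositions-as-types reading, the derivation above can also be transcribed as a program. A sketch in Scala, rendering $\perp$ as Nothing, $\neg Q$ as a function $Q \Rightarrow$ Nothing, and disjunction as Either:

```scala
// Propositions as types: falsehood is Nothing, negation is "implies falsehood",
// and disjunction is Either.
type Not[Q] = Q => Nothing

// The proof term for ¬¬(P ∨ ¬P). Given a refutation k of P ∨ ¬P, step 3 of
// the outline builds a proof of ¬P (the inner lambda), injects it into
// P ∨ ¬P with Right, and feeds it back to k to reach the contradiction.
def notNotLEM[P]: Not[Not[Either[P, Not[P]]]] =
  (k: Not[Either[P, Not[P]]]) => k(Right((p: P) => k(Left(p))))
```

That this term type-checks is precisely the constructive provability of $\neg \neg (P\vee\neg P)$, while no closed term can inhabit Either[P, Not[P]] for an arbitrary P.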
The fact that $\neg \neg (P\vee\neg P) \;true$ is provable means that, while it is consistent to add LEM to constructive logic, we cannot add the refutation of LEM without degenerating the logic into inconsistency. Recall that, while LEM is not provable in general, we can assume it in a proof without fear of contradiction. In contrast, since the double negation of LEM is provable for every proposition, assuming the denial of LEM would soon lead us to absurdity.

As is shown in the foregoing argument, $P \;true$ is in general not provable even when $\neg\neg P \;true$ is provable. On the other hand, it is straightforward to show that $P\;true\Rightarrow\neg\neg P\;true$. Therefore, $\neg\neg P$ is a proposition strictly weaker than $P$ itself in constructive logic. Namely, being able to reject any refutation of $P$ does not mean we actually know a proof of $P$, but being able to prove $P$ suffices to reject any refutation of it by reductio ad absurdum. It is possible to interpret classical logic in terms of constructive logic by doubly negating all statements, rendering them provable in constructive logic. From this perspective, constructive logic is more powerful than classical logic, in that it can distinguish constructive theorems from non-constructive ones whereas classical logic cannot.
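The easy direction $P\;true\Rightarrow\neg\neg P\;true$ has a one-line proof term under propositions-as-types. A self-contained Scala sketch (again reading $\neg Q$ as $Q \Rightarrow$ Nothing):

```scala
type Not[Q] = Q => Nothing

// Double-negation introduction: from a proof p of P, refute any
// refutation k of P simply by applying k to p.
def doubleNegIntro[P](p: P): Not[Not[P]] = (k: Not[P]) => k(p)
```

No term of type Not[Not[P]] => P exists for arbitrary P, which mirrors the claim that $\neg\neg P$ is strictly weaker than $P$ constructively.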

References and further readings

1. Practical Foundations for Programming Languages, Robert Harper, 1st ed., 2012
2. Supplements of Constructive Logic (Spring 2005), taught by Robert Harper
3. Handouts of Constructive Logic (Fall 2009), taught by Frank Pfenning
4. On the Meanings of the Logical Constants and the Justifications of the Logical Laws, Per Martin-Lof
5. Truth of a Proposition, Evidence of a Judgement, Validity of a Proof, Per Martin-Lof
6. A compilation of lecture notes on Intuitionistic Type Theory given by Per Martin-Lof, and a friendlier introduction to ITT by Silvio Valentini.
7. Intuitionistic Logic, Dirk van Dalen

Saturday, June 21, 2014

Verifying the Commutativity of Reduce Functions

MapReduce is a programming model that assumes as input a large, unordered stream of data. Computation essentially proceeds in three steps:
Map A map function is applied to each input, which outputs zero or more intermediate key-value pairs of an arbitrary type.
Shuffle All intermediate key-value pairs are grouped and sorted by key, so that pairs with the same key can be reduced together.
Reduce A reduce function combines values with the same key and produces results associated with that key in the final output.
Note that there is in fact an optional optimization step between map and shuffle that combines values by key on a map node. This is useful for reducing the number of key-value pairs locally before they are shuffled. We ignore this step because it is irrelevant to the topics discussed later.

While records are sorted by key before they reach the reduce function, for any particular key, the order that the values appear is not stable from one run to the next, since they come from different map tasks, which may finish at different times from run to run. As a result, most MapReduce programs are written so as not to depend on the order that the values appear to the reduce function. In other words, an implementation of a reduce function is expected to be commutative with respect to its input. Formally, a reduce function, which we may alternatively call a reducer hereafter, is commutative if for each input list $L$, it returns the same result for each permutation $\sigma$ of that list: $$\forall L, \sigma: reduce(k, L) = reduce(k, \sigma(L))$$ Note that we treat reduce functions in a very high-level manner: A reducer in our mind is just a function that takes a key and a list as input and returns a value as output. Moreover, we may assume without loss of generality that keys and values are all integers, say by considering their hash values instead of their actual values.
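As a toy illustration (the reducers here are hypothetical, not from any MapReduce library), the following Scala sketch checks commutativity on a given input list by brute force over its permutations. Note that such a bounded check can refute commutativity, but can never prove it for all inputs:

```scala
// A commutative reducer: the sum ignores the order of the values.
def sum(key: Int, values: List[Int]): Int = values.sum

// A non-commutative reducer: the result depends on iteration order.
def firstMinusLast(key: Int, values: List[Int]): Int =
  if (values.isEmpty) 0 else values.head - values.last

// Bounded check: does the reducer return the same result on every
// permutation of this particular list?
def isCommutativeOn(reduce: (Int, List[Int]) => Int, L: List[Int]): Boolean =
  L.permutations.map(p => reduce(0, p)).toSet.size <= 1
```

For example, isCommutativeOn(sum, List(1, 2, 3)) holds, while isCommutativeOn(firstMinusLast, List(1, 2, 3)) fails.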
The main purpose of this post is to initiate an investigation into model checking the commutativity of reduce functions. Our modeling language supports Boolean and integer data types, Boolean operations and integer arithmetic, if-then-else and the control structures that can be constructed from it, the return statement, and a while loop over the iterator of the input list. Moreover, we introduce non-determinism to the language so that we can model uninterpreted function calls, etc. While this model looks like an over-simplification of the reducers used in practical MapReduce programs, it hopefully captures enough ingredients for us to derive non-trivial results that may shed light on more realistic use cases. Our first theorem about this model is as follows.
Theorem. The commutativity of reducers is undecidable.
Proof. We shall reduce the Diophantine problem, i.e., determining the solvability of Diophantine equation systems, which is known to be undecidable in general, to the problem of determining commutativity of reducers. Given a Diophantine equation system $P: P_1(x_1,...,x_k)=0$, ..., $P_n(x_1,...,x_k)=0$, we define a reducer as follows:
public int reduce(int key, Iterator<Integer> values) {
    int x1, ..., xk;
    if(values.hasNext()) x1 = values.next(); else return 0;
    ...
    if(values.hasNext()) xk = values.next(); else return 0;

    if(P1(x1,...,xk)!=0) return 0;
    ...
    if(Pn(x1,...,xk)!=0) return 0;

    int y1, y2;
    if(values.hasNext()) y1 = values.next(); else return 0;
    if(values.hasNext()) y2 = values.next(); else return 0;
    return y1 - y2;
}
It is clear that if equation system $P$ has no solution, the reduce function always returns zero regardless of its input, i.e., it is commutative. On the other hand, if there is a solution, then its return value depends on its input values as well as the order that they are iterated, i.e., the function is not commutative. Note that the return value $y_1-y_2$ makes the reducer non-commutative even when $P$ has a unique solution with $x_1=...=x_k$. In this way, we reduced the problem of checking the solvability of $P$ to that of checking the commutativity of a reducer. This concludes our proof. $\square$
As verifying commutativity is undecidable, the best we can hope for is a semi-algorithm that is effective on some interesting case studies. Our first attempt is to use abstract interpretation, along with counter abstraction for lists, and reduce the verification problem to a reachability problem. Let $t,n\in\mathbb N$ be refinement parameters. A state is a 4-tuple $(L,pc,itr,V)$ where $L$ is an abstract list value, $pc$ is the program counter, $itr$ is the position of the iterator, and $V$ is the valuation in the abstract domain $[-n, n]$. An abstract list is a prefix of size $t$ followed by a subset of $\{-n,...,n\}$ that represents the "tail". For example, the abstract list $1\rightarrow 2\rightarrow \{1,3\}$ concretizes to the set of lists $12\{1,3\}^*$ (as a regular expression). We say an abstract list is "likely" a permutation of another abstract list if some permutation of a list in its concretization is contained in the concretization of the other abstract list. Finally, a state is called accepting if its $pc$ equals the line number of some return statement.
Now, given two initial states $(L,pc,itr,V)$ and $(L',pc',itr',V')$ such that $L$ is likely a permutation of $L'$, we carry out two coordinated symbolic executions, one from each state, and see if they can reach accepting states after the same number of steps (note that we have to allow nondeterministic NOP's so that the executions can represent real paths of different lengths even though they are coordinated). We say that the executions reach a bad state if the two paths reach some return statements, say "return x" and "return y", respectively, and $V(x)\neq V'(y)$. If no bad state is reached, then we have proved that the reducer is commutative. Otherwise, we check the bad state we reach and see if it is in effect concrete, i.e., abstract lists do not have tails and all valuations are in $(-n, n)$. If so, then we have proved that the reducer is not commutative. If the bad state is not concrete, then we have found a plausible bug, but it may be a false alarm since abstract states are over-approximations of concrete states. In such a case, we need to refine the abstraction, e.g., by increasing parameters $t,n$, and re-run the execution from the beginning. Continuing in this fashion, we obtain a definite answer whenever the process terminates. Of course, it is possible that the process never stops and we just keep doing refinements again and again. This possibility cannot be eliminated, however, since the question we want to answer is undecidable per se.

References and further readings

1. Yu-Fang Chen, Chih-Duo Hong, Nishant Sinha, and Bow-Yaw Wang. "Commutativity of Reducers", TACAS, 2015.
2. A note on Hilbert's 10th Problem by Yuri Matiyasevich, or see this overview paper for a glimpse of the literature.
3. Csallner, Christoph, Leonidas Fegaras, and Chengkai Li. "New ideas track: testing MapReduce-style programs." ACM SIGSOFT, 2011.

Monday, June 16, 2014

Functors and Monads in Scala ─ Part I

A type constructor is an operator on types to which you can apply type arguments to construct a new type: given a type A and a type constructor M, you get a new type M[A]. A functor is a type constructor, say M, together with a function
    lift: (A -> B) -> M[A] -> M[B]
that lifts a function over the original types into a function over the new types. Moreover, we require that the lifting function should preserve composition and identities. That is, if function h is formed by composing functions g and f, then lift(h) should be equivalent to the composition of lift(g) and lift(f). Formally, this means
    h(a) = g(f(a))  implies  lift h a = lift g (lift f a).
Also, if h is an identity function on variables of type A, then lift(h) should be an identity function on variables of type M[A]:
    h(a) = a  implies  lift h a = a.
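For instance, Scala's built-in Option is a functor with lift(f) = _.map(f). The two laws can be spot-checked on sample values (a check on samples is evidence, of course, not a proof):

```scala
// lift for the Option functor
def lift[A, B](f: A => B): Option[A] => Option[B] = _.map(f)

val f  = (n: Int) => n + 1
val g  = (n: Int) => n * 2
val h  = (n: Int) => g(f(n))   // h = g ∘ f
val id = (n: Int) => n
```

Here lift(h)(Some(3)) and lift(g)(lift(f)(Some(3))) both yield Some(8), and lift(id) leaves Some(3) (and None) unchanged.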
A monad is a functor associated with two special operations, unitizing and binding:
    unit: A -> M[A]
    bind: (A -> M[B]) -> M[A] -> M[B]
A unitizing operation takes an object of the original type and maps it to an object of the new type. A binding operation is useful in composing two monadic functions, i.e., functions with signature $*\rightarrow M[*]$. Using binding, the composition of two monadic functions f: A -> M[B] and g: B -> M[C] can be defined as
    a => bind g f(a),
which has type A -> M[C]. Composition of monadic functions is supposed to be associative and has unit as a left and right identity. For this purpose, definitions of bind and unit must satisfy three "monad laws". That is, the following three equations
1.    bind (a => bind g f(a)) la = bind g (bind (a => f(a)) la)
2.    bind f (unit a) = f(a)
3.    bind unit la = la
should hold for any values a with type A, la with type M[A], functions f with type A -> M[B] and g with type B -> M[C].
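The laws can likewise be spot-checked in Scala using the built-in Option, with unit = Some and bind defined via flatMap (again, checking sample values is evidence rather than a proof):

```scala
def unit[A](a: A): Option[A] = Some(a)
def bind[A, B](f: A => Option[B])(m: Option[A]): Option[B] = m.flatMap(f)

// Two monadic functions: a safe square root and a safe reciprocal.
val f = (n: Int)    => if (n >= 0)   Some(math.sqrt(n.toDouble)) else None
val g = (x: Double) => if (x != 0.0) Some(1.0 / x)               else None
```

On these definitions, bind f (unit 4) equals f(4), bind unit la equals la, and composing f with g associates either way.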

A monad can be thought of as representing computations just like a type represents values. These computations are operations waiting to be carried out. Presumably, the programmer has some interest in not carrying them out directly. For example, the computations to be done may not be fully known at compile time. One can find similar ideas of representing computations as objects in the Command design pattern of GoF. Regarding monads as computations, the purpose of unit is to coerce a value into a computation; the purpose of bind is to construct a computation that evaluates another computation and yields a value. Informally, unit gets us into a monad and bind gets us around the monad. To get out of the monad, one usually uses an operation with type M[A] -> A, though the precise type depends on the purpose of the monad at hand.

In Scala, a monad is conventionally treated like a collection associated with list operations: the unit operation corresponds to a constructor of singleton lists; lift and bind are named map and flatMap, respectively, in the sense that map transforms a list element-wise into another list, while flatMap maps a list to a list of lists and then flattens the result into a one-layer list. The map and flatMap operations in effect subsume each other given the flatten operation, as
    flatMap g la = flatten (map g la)
    map (a => f(a)) la = flatMap (a => unit f(a)) la
Hence, the minimal set of operations a monad must provide can be {unit, flatMap} or {unit, map, flatten}. Scala doesn't define a base trait for Monad and leaves the decision of interface to the programmer. On the other hand, if a monad implements methods map, flatMap and filter, then it can be composed with other monads via for-comprehension to streamline monadic computations.

Examples

Optional values. Option (aka. Maybe) monads are used extensively in many functional programming languages: see here for a more general treatment of option monads, and here for a detailed explanation of their usages in Scala. Below is a minimalistic implementation of Option type, and its companion subtypes Some and None:
sealed trait Option[T] {
    def map[U](f: T => U):Option[U]
    def flatMap[U](f: T => Option[U]): Option[U]
}
case class Some[T](val t:T) extends Option[T] { 
    override def map[U](f: T => U): Option[U] = new Some[U](f(t))
    override def flatMap[U](f: T => Option[U]): Option[U] = f(t)
}
case class None[T]() extends Option[T] {
    override def map[U](f: T => U): Option[U] = new None[U]
    override def flatMap[U](f: T => Option[U]): Option[U] = new None[U]
}
The canonical advantage of the option monad is that it allows composition of partial functions, i.e., functions that return a null pointer when they fail to give correct results. Instead, a partial function returns None (representing an "empty" Option) and this result may be composed with other functions. In other words, it allows transformation of partial functions into total ones. For example, suppose g1, g2 and g3 are functions that return either a null pointer or an addable value, and we want to print g3(g2(x) + g1(x)) given that the result is meaningful. Without Option, one has to use nested if-then null checks or a try-catch block to handle possible null inputs/outputs, for example,
try{ 
  val y = g3(g2(x) + g1(x))
  if(y != null) println(y)
}catch {
  case e : NullPointerException => ; 
}
If we make g1, g2 and g3 return an option monad via a wrapper, we can achieve the same purpose using for-comprehension, so that nothing would be printed if any of g1, g2 or g3 returns null:
def monadify[T](f: T => T) =
  (x: T) => { 
    val y = if(x!=null) f(x) else x
    if(y != null) Some(y) else None
  }
for {
  y <- monadify(g1)(x)
  z <- monadify(g2)(x)
  r <- monadify(g3)(y + z)
} yield println(r)
Whether the functional ("idiomatic") way of handling null pointers is preferable to the traditional one is still under debate, especially among sophisticated users of imperative programming languages. Since these two approaches differ very little in performance and Scala supports both of them, the choice is really more a matter of taste than truth IMHO.

An important thing to notice about this example is that the monad is used here as a control structure. We have got a sequence of operations chained together in a monad via for-comprehension. When we evaluate the monadic sequence, the implementation of the binding operator that combines monads actually does control flow management. This trick of using the chaining operator to insert control flow into monadic sequences is an incredibly useful technique and is used in Scala by a variety of monads beside Option, such as Try, Either, Future and Promise.

State transformers. A stateful computation can be modeled as a function that takes inputs with an initial state and produces an output paired with the new state. Due to the presence of states, a function may produce different results even though it is given the same inputs. Let Env be a type representing the possible states of the environment. An Effect is a mapping from Env to Env that describes changes of states, i.e., the side-effects caused by a computation, and the monad type M adjoins an extra side-effect to any type:
    Effect = Env -> Env
    M[A] = (A, Effect)
The unit function pairs up a value with the identity on side-effects; the map function propagates a state leaving it unchanged:
    unit a = (a, e => e)
    (a, e) map f = (f(a), e)
If you want to adjoin two side-effects e and e', you can use bind to compose them and get a single effect e' ○ e, where ○ denotes function composition:
    (a, e) bind (a => (f(a), e')) = (f(a), e' ○ e)    
In Scala, the state monad described above may be written as
case class M[+A](a: A, e: Env => Env) {
    private def bind[B](f: A => M[B]): M[B] = {
      val fa = f(a); M(fa.a, env => fa.e(e(env)))
    }
    override def toString = showM(this)
    def map[B](f: A => B): M[B] = bind(x => unitM(f(x)))
    def flatMap[B](f: A => M[B]): M[B] = bind(f)
}
def unitM[A](a: A) = M(a, env => env)
The following example shows a state monad that increments a counter each time an arithmetic operation is invoked. (Note that one can actually declare the counter as a field member, avoiding the use of monads. This may explain the viewpoint that state monads are arguably less important in impure FPs like Scala than in pure FPs like Haskell.) The add and div operations demonstrate how to handle a monad wrapped in another monad using nested for-comprehensions.
type Env = Int
// Ad hoc operations for this particular monad
def showM[A](m: M[A]) = "Value: " + m.a + "; Counter: " + m.e(0)
val tickS = M(0, e => e + 1)

// Following code demonstrates the use of monads
type Value = Double
def add(a: M[Option[Value]], b: M[Option[Value]]): M[Option[Value]] = 
  for {
    _  <- tickS
    aa <- a 
    bb <- b
    c = for {
      a <- aa if aa.nonEmpty
      b <- bb if bb.nonEmpty
    } yield a + b     // yield None if aa is None or bb is None
  } yield c

def div(a: M[Option[Value]], b: M[Option[Value]]): M[Option[Value]] = 
  for {
    _  <- tickS
    aa <- a
    bb <- b
    c = for {
      a <- aa if aa.nonEmpty
      b <- bb if bb.nonEmpty
      b <- Some(b) if b != 0
    } yield a / b     // yield None if b==0 or aa is None or bb is None
  } yield c

val const0 = unitM(Some(0.0))
val const1 = unitM(Some(1.0))
val const2 = unitM(Some(2.0))
val const3 = unitM(Some(3.0))
val expr = div(add(const3, add(const1, const2)), const2)  
println(expr)                            // Value: Some(3.0); Counter: 3
println(add(div(expr, const0), const1))  // Value: None; Counter: 5
The syntax sugar of for-comprehensions abstracts the details and helps the user manipulate monads without getting involved in the underlying machinery of binding and lifting. The fact that the abstraction works as expected is a consequence of the monad laws, and it is up to the designer of the monad to assure that the implementation does keep the laws.

We just stop here as this post is getting too long. More examples, techniques and discussions will be introduced in the Part II of this tutorial.

References and resources

1. Monads are Elephants, Part 1 ~ Part 4, is a gentle and pragmatic tutorial that explains exactly what a Scala programmer needs to know to exploit monads. Scala Monads: Declutter Your Code With Monadic Design is one of the videos on YouTube that teaches you how to write modular and extensible programs with monadically structured code.
2. Many excellent papers and slides about monads could be found in the homepage of Philip Wadler, one of the most enthusiastic advocates of monadic design in the FP community. Among others, The Essence of Functional Programming and Monads for Functional Programming are two must-reads for anyone who is struggling to turn monadic reasoning into his intuitions.
3. For a general and rigid mathematical treatment of monads, see e.g., Arrows, Structures, and Functors: The Categorical Imperative, where Section 10.2 describes monads as generalized monoids in a way that basically makes everything that we can write down or model using abstract syntax a monad. Mathematically mature computer science readers will find everything they need to know about the subject in the celebrated book Categories for the Working Mathematician by Mac Lane, the co-founder of category theory.
4. There are also books available online that introduce programmers with moderate math background into the subject, including Category Theory for Computing Science by M. Barr and C. Wells, and Computational Category Theory by D.E. Rydeheard and R.M. Burstall. Also check this thread and references therein about the relations between category theory, type theory, and functional programming.

Thursday, May 29, 2014

Equivalence Checking in Lambda Calculus

Definitional equivalence checking is the problem of determining whether two λ-terms in λ-calculus are equivalent in definition. (By "definitional", we mean that equivalence is defined directly by rules, rather than by an appeal to an operational semantics.) Equivalence of λ-terms in untyped λ-calculus is undecidable, see Scott's Theorem in [3]. This result is not surprising, since untyped λ-calculus is Turing complete. On the other hand, it is decidable for simply typed λ-calculus (aka. simple type theory, STT) with a single base type, see Sec. 6 of [4]. Equivalence checking would become undecidable once we extend STT with dependent types such as lists, or recursive data types such as natural numbers, see Chap 4 and 5 of [5].

It is also interesting to check the weaker relation of whether two λ-terms are operationally equivalent, i.e., whether they reduce to the same value when placed in the same context. For example, $\lambda x.x:\;Nat\rightarrow Nat$ and $\lambda x.pred(suc\ x): Nat\rightarrow Nat$ are not definitionally equivalent, but they are operationally equivalent. For untyped λ-calculus this problem is clearly undecidable, but for STT I am not sure. I haven't found enough material to say anything on this subject so far.

References

1. Notes on Simply Typed Lambda Calculus (another lecture note)
2. Simply Typed Lambda Calculus on Wiki
3. Practical Foundations for Programming Languages (Preview of the 1st ed.)
4. Advanced Topics in Types and Programming Languages (available online)
5. http://www.cs.cmu.edu/~rwh/courses/logic/www/handouts/logic.pdf

Friday, May 23, 2014

Scala: Mapping a set removes duplicates

Consider the following code:
def sumSizes(collection: Iterable[List[_]]): Int = collection.map(_.size).sum
The function shall count the accumulated size of all the contained Lists. The problem is that if collection is a Set, then after mapping the objects to their lengths, the result is still a Set and all Lists of the same size are reduced to a single representative. For example, sumSizes(List(List(1,2), List(3,4))) outputs 4 but sumSizes(Set(List(1,2), List(3,4))) mistakenly outputs 2. To avoid such pitfalls one may write the method as
/* version 1 */
def sumSizes(collection: Iterable[List[_]]): Int = collection.toSeq.map(_.size).sum
/* version 2 */
def sumSizes(collection: Iterable[List[_]]): Int = collection.foldLeft(0)(_+_.size)
The first version requires an unnatural discipline of "always using toSeq before mapping", and worse, it introduces the overhead of allocating memory and copying elements, which is seldom necessary. The second version doesn't incur this overhead, but it is asymmetric and thus less elegant. It is even potentially less efficient in a context where the underlying collection is parallel, since sum, which is implemented as fold(0)(_+_) in the Scala library, can be optimized better than foldLeft or foldRight in such a context.

Accidental removal of duplicates is often harder to spot in real code, especially when you write utility code which simply takes a Traversable and/or use the output of methods which are declared to return a Traversable, which may bring surprising results when the implementation happens to be a Set. Imagine the following situation:
val edgeNum = graph.nodes.map(_.edges).map(_.size).sum / 2
You fetch a collection of Node objects from a graph, use them to get their adjacent edges, and sum over the edge counts. This works if graph.nodes returns a Seq, but it breaks if someone decides to make Graph return its nodes as a Set. A way to defend against such surprises is to label types explicitly, for example by annotating methods with return types even where annotations are not required. The annotation serves as a caution for the programmer if he wants to change the type of graph.nodes from Seq to Set.

All in all, the best practice to prevent this type of error is good unit test coverage with representative data, together with sensible APIs, coupled with knowledge of how the Scala compiler maintains source types through map/for generators, etc. If you are returning a Set of something, you should make that obvious, as returning a Collection/Traversable hides a relevant implementation detail, and that sometimes causes unexpected problems.

Saturday, May 3, 2014

Scala: Subtyping

When a class inherits from another, the first class is said to be a nominal subtype of the other one. It is a nominal subtype because each type has a name, and the names are explicitly declared to have a subtyping relationship. On the other hand, a structural subtype relationship holds if two types have the same members. While the compiler only verifies this property at the level of type checking, a subtype should correctly implement the contracts of its supertypes so that Liskov's Substitution Principle applies, which essentially says that beside supporting the same operations, the operations in the subtype should require no more and provide no less than the corresponding operations in the supertype.

The Scala compiler allows a type’s subtypes to be used as a substitute wherever that type is required. For classes and traits that take no type parameters, the subtype relationship simply mirrors the subclass relationship. For classes and traits that take type parameters, however, variance comes into play. For example, class List is covariant in its type parameter, and thus List[Cat] is a subtype of List[Animal] if Cat is a subtype of Animal. These subtype relationships exist even though the class of each of these types is List. By contrast, class Array is nonvariant in its type parameter, so Array[Cat] is not a subtype of Array[Animal]. See here for a more detailed explanation of Scala's variance rules.

Subtyping rules of function types

Suppose that Cat and Dog are subtypes of type Animal. Consider function types A and B defined as follows:
type A := Animal => Dog
type B := Cat => Animal
What should be the subtyping relation between A and B? According to Liskov's Substitution Principle, $T$ is a subtype of $T'$ iff whatever we can do with $T'$, we can do it with $T$. In this example, we can pass a cat to a function of type B and get some animal in return. We can do the same thing to A: if we pass a cat to a function of type A, it returns a dog, which is an animal. Hence, A satisfies the same contract as B (in fact its contract says more than that, as it guarantees that the returned animal is a dog), and thus A is a subtype of B.

In general, if we denote the "subtype of" relation by $<$, then we can express the subtyping rule as $(A \Rightarrow B) < (C \Rightarrow D)$ iff $C < A$ and $B < D$. To see this, note that arguments are something that’s required, whereas return values are something that’s provided. To satisfy Liskov's Substitution Principle, a function subtype must require less and provide more than its supertype. Since a subtype contains more information than its supertype, a reasonable function type should be contravariant in the argument types and covariant in the result type.
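Scala encodes exactly this rule in its standard library: trait Function1[-T1, +R] is declared contravariant in its argument and covariant in its result, so the Animal/Cat/Dog example above type-checks as a plain subtype assignment:

```scala
class Animal
class Cat extends Animal
class Dog extends Animal

// (Animal => Dog) <: (Cat => Animal): the left type requires less
// (any Animal, in particular a Cat) and provides more (a Dog, which
// is an Animal).
val a: Animal => Dog = (_: Animal) => new Dog
val b: Cat => Animal = a   // compiles thanks to Function1[-T1, +R]
```

Flipping either direction, e.g. assigning a Cat => Animal where an Animal => Dog is expected, is rejected by the compiler.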

Behavioral subtyping

Let A and B be two types. We say A is a behavioral subtype of B iff the set of properties an object of type A satisfies is a superset of the properties satisfied by an object of type B. In terms of Liskov's principle, this means that a property provable by objects of type B should also be provable by objects of type A.

Behavioral subtyping is a stronger notion than the classical subtyping rules of function types, which rely merely on type signatures. It is a semantic rather than syntactic relation, because it intends to guarantee semantic interoperability of types in a hierarchy. Behavioral subtyping is thus undecidable in general: if the property to prove is "a function always terminates for arguments of type A", then it is impossible in general for a program (e.g. a compiler) to verify that the property holds for every subtype of A.

Behavioral subtyping of function types imposes requirements on signatures just like the classical subtyping rules do. In addition, there are a number of behavioral conditions a function subtype must meet:
  • Preconditions cannot be strengthened in a subtype.
  • Postconditions cannot be weakened in a subtype.
  • Invariants of the supertype must be preserved in a subtype.
  • History constraint: objects are regarded as being modifiable only through their methods.
A violation of the history constraint can be illustrated by defining a mutable point as a subtype of an immutable point. On the other hand, fields added to the subtype may be safely modified, because they are not observable through the supertype's methods. Thus, one may derive a circle with a fixed center but a mutable radius from an immutable point without violating subtyping rules.

Monday, April 21, 2014

Java: HashMap vs. TreeMap vs. Hashtable vs. LinkedHashMap

Map is one of the most important data structures. In this post, I shall show you the main differences between HashMap, TreeMap, Hashtable and LinkedHashMap.
There are four commonly used implementations of Map in Java SE: HashMap, TreeMap, Hashtable and LinkedHashMap. If we use one sentence to describe each implementation, it would be the following:
HashMap is a hash table with no ordering on keys, and with amortized constant-time access.
TreeMap is a red-black tree based map that allows access to entries in key order with logarithmic costs.
LinkedHashMap is a hash table that uses a linked list to preserve the insertion order of entries.
Hashtable is the thread-safe counterpart of HashMap that does not permit null keys or values.
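The ordering differences can be observed directly; a small sketch using the `java.util` classes from Scala:

```scala
val entries = Seq("banana" -> 2, "apple" -> 1, "cherry" -> 3)

val hash   = new java.util.HashMap[String, Int]
val tree   = new java.util.TreeMap[String, Int]
val linked = new java.util.LinkedHashMap[String, Int]
for ((k, v) <- entries) { hash.put(k, v); tree.put(k, v); linked.put(k, v) }

println(tree.keySet)    // [apple, banana, cherry] -- sorted by key
println(linked.keySet)  // [banana, apple, cherry] -- insertion order
// hash.keySet has no guaranteed iteration order.
```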

We give more specific remarks concerning the use of maps below.

HashMap

If you want to use self-defined objects as keys, make sure the equals and hashCode methods are implemented correctly. By default, the hashCode and equals methods inherited from the Object class are used. The default hash codes are distinct for distinct objects, and two keys are equal only if they wrap the same primitive value or are references to the same object.
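In Scala, a case class derives structurally consistent equals and hashCode from its fields, so case-class instances are safe to use as keys; a quick sketch:

```scala
// equals and hashCode are generated from the fields (x, y).
case class Point(x: Int, y: Int)

val m = new java.util.HashMap[Point, String]
m.put(Point(1, 2), "origin-ish")

// A structurally equal key finds the entry, even though it is a different object:
println(m.get(Point(1, 2)))  // origin-ish
```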

TreeMap

Keys of a TreeMap must implement the Comparable interface (or a Comparator must be supplied to the constructor), since they have to be compared with each other. Note that TreeMap uses compareTo rather than equals to decide whether two keys are identical. Another difference between TreeMap and the other map implementations is that it provides guaranteed O(log n) time cost for the containsKey, get, put and remove operations, while the others provide amortized O(1) time (O(n) in the worst case) for these operations.
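The compareTo-versus-equals distinction becomes visible with a comparator that identifies keys which are not equal, e.g. a case-insensitive TreeMap:

```scala
// Keys differing only in case compare as equal, so the second put
// overwrites the first entry even though "Key".equals("key") is false.
val t = new java.util.TreeMap[String, Int](String.CASE_INSENSITIVE_ORDER)
t.put("Key", 1)
t.put("key", 2)
println(t.size)       // 1
println(t.get("KEY")) // 2
```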

Hashtable

The Hashtable class is roughly equivalent to HashMap, except that it is synchronized and it does not permit null keys or null values.

LinkedHashMap

LinkedHashMap is a subclass of HashMap. It inherits the features of HashMap and, in addition, the output of the entrySet method preserves the insertion order (or, if so configured, the access order) of the entries. There is also a special method, removeEldestEntry, that allows the implementer to evict entries according to custom criteria. One direct application of LinkedHashMap in computer science is the LRU (least-recently-used) cache (example code).
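A minimal LRU cache along these lines, sketched as an access-ordered LinkedHashMap that overrides removeEldestEntry:

```scala
// The third constructor argument (true) enables access order;
// removeEldestEntry evicts the least-recently-used entry once the
// capacity is exceeded.
class LruCache[K, V](capacity: Int)
    extends java.util.LinkedHashMap[K, V](16, 0.75f, true) {
  override protected def removeEldestEntry(
      eldest: java.util.Map.Entry[K, V]): Boolean = size() > capacity
}

val cache = new LruCache[Int, String](2)
cache.put(1, "a"); cache.put(2, "b")
cache.get(1)           // touch 1, so 2 becomes the eldest entry
cache.put(3, "c")      // evicts 2
println(cache.keySet)  // [1, 3]
```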

    Thursday, March 13, 2014

    Scala: Views of Collections

    In Scala, you can create views for many types of collections. A view is a non-strict version of a collection, meant to be treated much like a database view: elements are computed on access rather than eagerly, as in normal collections. As an example, consider the following code:
    val ys = (1 to 10).view.map { x => println(x); x }
    
    This will not print anything, but every access to the sequence will perform the computation and print the value, i.e. every call to ys.head will result in a number being printed. If you want to get a strict version of the collection again, you can call ys.force, which will print all the numbers out.

    One use of views is to traverse a collection of values that are expensive to compute when only one value is needed at a time. Views also let you build lazy sequences by calling toStream on them, which will cache the evaluated elements. Another use of views is to avoid intermediate copies of collections. For example, the following code
    (1 to 10).map(_-1).map(_*2).map(-_)
    creates a new list for each call of map operation. To avoid the intermediate results, one can turn the list first into a view, applying all transformations to the view, and finally forcing the view to a sequence:
    (1 to 10).view.map(_-1).map(_*2).map(-_).force
    The three stored functions (_-1), (_*2), (-_) get applied as part of the execution of the force operation and a new sequence is constructed. That way, no intermediate data structure is needed. Another example of the use of view would be (assuming that we have very little memory at hand):
    scala> (1 to 1000000000).take(5).toList
    java.lang.OutOfMemoryError: GC overhead limit exceeded
    
    Here Scala tries to create a collection with 1000000000 elements in order to access the first five. Using a view, only the first five elements are generated and accessed:
    scala> (1 to 1000000000).view.take(5).force.toList
    res2: List[Int] = List(1, 2, 3, 4, 5)
    
    The third use case applies to views over mutable sequences. Many transformer functions on such views provide a window into the original sequence, which can then be used to selectively update some elements of that sequence. The view does not copy these elements; it just provides references to them. For example,
    def neg(a: collection.mutable.Seq[Int]) = a.indices.foreach(i => {a(i) = -a(i)})
    val A = Array(1, 2, 3, 4, 5, 6)
    neg(A.slice(1, 4))      // A doesn't change: slice copies elements 2, 3, 4
    neg(A.view.slice(1, 4)) // A becomes Array(1, -2, -3, -4, 5, 6)
    

    Remarks. For small collections, the added overhead of forming and applying closures in views is often greater than the gain from avoiding the intermediate data structures. A possibly more important caveat is that evaluation in views can be very confusing if the delayed operations have side effects. Because of these pitfalls, one should usually force a view after applying the transformations, or keep it as a view only when few elements are expected to ever be fetched compared to the total size of the view.

    Tuesday, March 11, 2014

    Loop Invariants & Loop Variants

    Loop invariant

    In Hoare logic, the partial correctness of a while loop is governed by the following rule of inference:$$\frac{\{C\land I\}\;S\;\{I\}} {\{I\}\;\mathbf{while}\; C \; \mathbf{do}\;S\;\{\lnot C\land I\}}$$ The rule above is a deductive step where the premise states that the loop's body does not change $I$ from true to false given the condition $C$, and the conclusion states that if the loop terminates, then it leads from a state satisfying $I$ to a state satisfying $\lnot C\land I$. The formula $I$ in this rule is known as the loop invariant. In words, a loop invariant
    • holds before the loop begins
    • establishes the desired result (i.e. the postcondition) given that $\lnot C$ holds
    • holds before and after the execution of each iteration
    • describes what has been done so far and what remains to be done at each iteration
    Finding a useful invariant is often a tricky task requiring sophisticated heuristics. As an illustration, consider the following program that computes the factorial: $$\{x=n \land y=1\}\quad\mathbf{while}\;x\neq 0\;\mathbf{do}\;y:=y\cdot x;\ x:=x-1\quad\{y=n!\}$$ In each iteration, $y$ holds the result so far, $x!$ is what remains to be computed, and $n!$ is the desired result. Also note that $y$ is increasing while $x$ is decreasing. Hence, we can put the decrement of $x$ and the increment of $y$ together to obtain an invariant $x!\cdot y=n!$, as desired.
    Now consider a similar program that also computes $n!$: $$\{x=1 \land y=1\}\quad\mathbf{while}\; x<n\; \mathbf{do}\;x:=x+1;\ y:=y\cdot x\quad\{y=n!\}$$This time, both $x$ and $y$ increase as the loop iterates. One can immediately find an invariant $y=x!$ according to rules of thumb. This invariant, however, turns out to be too weak to prove the desired result: we need $y=n!$ after the loop terminates, while the iteration rule only gives $x\ge n \land y=x!$. To fix this, we can strengthen the invariant by conjoining another invariant $x\le n$ with it. The new invariant $x\le n \land y=x!$ yields $x\ge n\land x\le n \land y=x!$ on termination, which is precisely what we need. (A similar problem would occur in the first example if the guard was $x\ge 1$ instead of $x\neq 0$. Again, this can be solved by strengthening the invariant with $x\ge 0$.)
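The invariant of the first program can also be checked at run time; a sketch in Scala, where `fact` is a hypothetical helper used only in the assertions:

```scala
def fact(k: BigInt): BigInt = if (k <= 0) 1 else k * fact(k - 1)

val n = BigInt(5)
var x = n
var y = BigInt(1)
while (x != 0) {
  // The invariant x! * y = n! (strengthened with x >= 0) holds at the loop head.
  assert(fact(x) * y == fact(n) && x >= 0)
  y = y * x
  x = x - 1
}
// On exit, x = 0 together with the invariant gives y = n!.
assert(y == fact(n))
println(y)  // 120
```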
    We refer the reader to this note for more interesting examples of loop invariants.

    Loop variant

    In Hoare logic, the termination of a while loop can be ensured by the condition $[V=z]\;S\;[V<z]$, where $<$ is a well-founded relation, $V$ is called the loop variant, and $z$ is an unbound symbol that is universally quantified. Combining it with the aforementioned rule of partial correctness, we obtain a rule that expresses the total correctness of the program:$$\frac{[I \land C \land V=z ]\;S\;[I \land V < z]}{[I]\;\mathbf{while}\;C\; \mathbf{do}\;S\;[I\land\lnot C]}.$$ The existence of a variant implies that a while loop terminates. It may seem surprising that the converse is also true, as long as we assume the Axiom of Choice: every while loop that terminates (given its invariant) has a variant. To prove this, we need the following theorem:
    Theorem. Assuming the Axiom of Choice, a partially ordered set does not possess any infinite descending chains if and only if the corresponding strict order is well-founded.
    Assume that the while loop terminates with invariant $I$ and the total correctness assertion $[C\land I]\;S\;[I]$ holds. Consider a "successor" relation $>$ on the state space $\Sigma$ induced by the execution of statement $S$, such that $s>s'$ iff (i) $s$ satisfies $C\land I$, and (ii) $s'$ is the state reached from state $s$ after executing statement $S$. Note that $s\neq s'$, for otherwise the loop would not terminate. Next, we define the "descendant" relation, denoted by $\ge$, as the reflexive and transitive closure of the successor relation. That is, $s\ge s'$ if either $s=s'$, or there exist states $s_1, ..., s_n$ (with $n\ge0$) such that $s>s_1>...>s_n>s'$. Note that $\ge$ is antisymmetric, since $s\ge s'$ and $s'\ge s$ implies $s=s'$ under the termination assumption. Therefore, $(\Sigma,\ge)$ is a partially ordered set. Observe that each state has only finitely many distinct descendants, and thus there is no infinite descending chain. Assuming the Axiom of Choice, it follows from the above theorem that the successor relation we defined for the loop is well-founded on the state space, since it is strict (irreflexive) and contained in the $\ge$ relation. Thus the identity function on this state space is a variant for the while loop, as we have shown that the state must strictly decrease (to its successor) each time the body $S$ is executed given the invariant $I$ and the condition $C$.

    A fixed-point characterization

    In predicate transformer semantics, invariants and variants are built by mimicking the Kleene fixed-point theorem. Below, this construction is sketched in set theory. Let $\Sigma$ denote the state space as before. We first define a family $\{A_k\}_{k\ge0}$ of subsets of $\Sigma$ by induction over the natural number $k$. Informally, $A_k$ represents the set of initial states that make the postcondition $P$ satisfied after fewer than $k$ iterations of the loop: \begin{array}{rcl} A_0 & = & \emptyset \\ A_{k+1} & = & \left\{\ y \in \Sigma\ | \ ((C \Rightarrow wp(S, x \in A_k)) \wedge (\neg C \Rightarrow P))[x \leftarrow y]\ \right\} \\ \end{array}We can then define the invariant $I$ as the predicate $\exists k.\ x \in A_k$, and the variant $V$ as the identity function on the state space, with a well-founded partial order $<$ such that for $y,z\in\Sigma$, $y<z$ iff $\exists i.\forall j.(y \in A_i \wedge (z \in A_j \Rightarrow i < j))$.

    While the theory looks elegant, such an abstract construction cannot be handled efficiently by theorem provers in practice. Hence, loop invariants and variants are often provided by human users, or are inferred by some abstract interpretation procedure.

    Sunday, March 9, 2014

    Algorithm: Linked List Problems on Leetcode

    Copy List with Random Pointer

    Problem A linked list is given such that each node contains an additional random pointer which could point to any node in the list or null. Return a deep copy of the list.
    Solution It is straightforward to use a hash table to record the new node for each existing node.  The new list can then be established from the table in linear time (code). The drawback of a hash table is the space overhead. Here we offer another solution that takes $O(n)$ time and $O(1)$ auxiliary space (code). The idea is as follows:
    Step 1: Create a new node for each existing node and interleave them, e.g. A->B->C becomes A->A'->B->B'->C->C'. Here n' is a copy of n, except that the random pointer in n' is not yet set.
    Step 2: Copy the random links to the new nodes: for each node n with a non-null random pointer,
    n'.random = n.random.next.
    Step 3: Detach the new nodes from the list: for each node n,
    n.next = n.next.next; n'.next = n'.next.next.
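The three steps above can be sketched in Scala as follows (the `Node` class here is a hypothetical stand-in; the original post links to separate example code):

```scala
// A singly linked node with an extra random pointer.
class Node(var value: Int) {
  var next: Node = null
  var random: Node = null
}

def copyRandomList(head: Node): Node = {
  if (head == null) return null
  // Step 1: interleave a copy after each node: A -> A' -> B -> B' -> ...
  var n = head
  while (n != null) {
    val copy = new Node(n.value)
    copy.next = n.next
    n.next = copy
    n = copy.next
  }
  // Step 2: copy the random links; n.random.next is the copy of n.random.
  n = head
  while (n != null) {
    if (n.random != null) n.next.random = n.random.next
    n = n.next.next
  }
  // Step 3: detach the copies and restore the original list.
  val newHead = head.next
  n = head
  while (n != null) {
    val copy = n.next
    n.next = copy.next
    if (copy.next != null) copy.next = copy.next.next
    n = n.next
  }
  newHead
}

// Tiny usage: A -> B with A.random = B and B.random = B.
val a = new Node(1); val b = new Node(2)
a.next = b; a.random = b; b.random = b
val copied = copyRandomList(a)
println(copied.random eq copied.next)  // true: the copy's random points into the copy
```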

    Merge k Sorted Lists

    Problem Merge $k$ sorted linked lists and return them as one sorted list.
    Solution Denote the size of the $i$th list by $n_i$. Merging two lists with sizes $n_i$ and $n_j$ takes $\Theta(n_i+n_j)$ time. A naive approach to merge $k$ sorted lists would be merging them one by one. The time complexity would be $$\Theta(n_1+n_2) + \Theta(n_1+n_2+n_3) + ... + \Theta(n_1+...+n_k)=\Theta(kn_1+(k-1)n_2+...+n_k).$$
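The naive one-by-one merge can be sketched with Scala Lists standing in for linked lists (an illustration, not the post's code):

```scala
// Merge two sorted lists in Theta(n_i + n_j) time.
def mergeTwo(a: List[Int], b: List[Int]): List[Int] = (a, b) match {
  case (Nil, _) => b
  case (_, Nil) => a
  case (x :: xs, y :: ys) =>
    if (x <= y) x :: mergeTwo(xs, b) else y :: mergeTwo(a, ys)
}

// Naive approach: fold the k lists into one, merging them one by one.
def mergeK(lists: List[List[Int]]): List[Int] =
  lists.foldLeft(List.empty[Int])(mergeTwo)

println(mergeK(List(List(1, 4, 7), List(2, 5), List(3, 6))))
// List(1, 2, 3, 4, 5, 6, 7)
```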
    (To be continued)

    Tuesday, March 4, 2014

    Node.js: Reading a Large Text File Line by Line

    We all know the famous slogan of Perl: "there is more than one way to do it", which encourages a Perl programmer to do things in a creative way using Perl's exuberant syntax. On the other hand, the advocates of Python take a minimalist approach and argue that "there should be one—and preferably only one—obvious way to do it", making a point of programming discipline and code readability. As a beginner adopting Node.js in my daily projects, I found that Node.js blends these two opposite philosophies in a funny manner: there are many ways to do it, but only one or two of them are preferable. However, telling the good one from the others is far from obvious, even for veteran programmers.
    For example, if you want to read and process a large text file one line at a time, in Perl you can do it like this:
    while (<>) { chomp; process_line($_); }
    This line of Perl code does almost everything you expect it to do, including
    • has no limit on file size or number of lines
    • has no limit on length of lines
    • can handle full Unicode in UTF-8
    • can handle *nix, Mac and Windows line endings
    • does not use any external libraries not included in the core language distribution
    Surprisingly, it turns out that there is no simple solution in Node.js that fulfills all of the above requirements. Node.js questions about routine jobs of this kind recur in forums like StackOverflow, and the suggested solutions are often either pointed out to be inefficient by other posters, or far less intuitive than one would expect from a scripting language.
    As for reading a large text file line by line, there are many third-party modules off the shelf for this job. However, many of these modules are either out-of-date and abandoned by their developers, or reported to have horrifying bugs such as missing the last line, leaking massive amounts of memory, etc. Even when de-facto standard modules exist, they are usually under rapid development without proven reliability.

    Friday, February 21, 2014

    Paradoxes in the World of Probability

    Gambler’s ruin

    Consider a symmetric random walk on the line $x\ge 0$, starting from some $c>0$. Suppose the walker stops when he reaches $x=0$. Let $X(t)$ denote the location of the walker at time $t$. Then $\Pr(X(t)=0\ \mbox{for some}\ t)=1$, but $E[X(t)]=c$ for all $t$. The latter can be seen by noting that $X(t) = X(t-1) + Y$ before absorption, where $\Pr(Y=1)=\Pr(Y=-1)=0.5$. (It might be surprising that $E[X(t)]$ is the same as in an unbounded random walk.)
    Furthermore, if we define the time to stop as $T_k=\inf\{t:X(t)=k\}$, then for any $c>0$ we have $\Pr\{T_0<\infty\}=1$ but $E[T_0]=\infty$. This is because the distribution of $T_0$ is heavy-tailed: in fact, one can show that $\Pr\{T_0\ge n\}\approx \frac{a}{\sqrt{n}}$ for some constant $a$. An intuition for $E[T_0]=\infty$ is as follows. Let $T_{c-1}$ denote the first time to reach $c-1$ starting from $c$. By translation invariance, $E[T_0] = c\cdot E[T_{c-1}]$, so it suffices to consider $E[T_{c-1}]$. Conditioning on the direction of the first step (a step to the right reaches $c+1$, from which returning to $c-1$ takes two independent unit-down passages): \begin{eqnarray*} E[T_{c-1}] & = & E[T_{c-1}\ |\ \mbox{left}] \Pr(\mbox{left}) + E[T_{c-1}\ |\ \mbox{right}] \Pr(\mbox{right})\\ & = & 1\cdot 2^{-1} + (1 + 2\cdot E[T_{c-1}]) \cdot 2^{-1}\ =\ 1 + E[T_{c-1}]. \end{eqnarray*} Hence the only possibility is $E[T_{c-1}] = \infty.$

    The St. Petersburg paradox I

    Consider a game where a player flips a fair coin and wins $2^n$ dollars if the head first turns up at the $n$th flip. The game continues until the head turns up. Then the expected amount of money a player can win from the game is $\sum_{n\ge1} 2^n\cdot 2^{-n} = \infty$. Hence, to make this game fair, a player should pay an infinite amount of money to play it!

    The St. Petersburg paradox II

    Following the previous example, how much should a player be charged if he is only allowed to play at most $N$ rounds? Denote the player's reward by $S_N$. It is tempting to estimate the reward by $E[S_N]=\sum_{n=1}^N 2^n\cdot 2^{-n} = N$. However, it can be shown that $\frac{S_N}{N\lg N}\rightarrow 1$ in probability. Namely, when $N$ is large, the player is very likely to win about $N\lg N$ dollars from this game. Hence, $N\lg N$ is arguably a more reasonable price than $N$ to charge the player.

    A winning strategy for a fair game

    Consider a game where a player bets $X_n\ge0$ dollars on the $n$th round, $n\ge1$. After $X_n$ is decided, a fair coin is flipped. The player loses the bet if the tail turns up, and wins another $X_n$ dollars otherwise. Here is a strategy that guarantees that a player wins 1 dollar almost surely: (i) set $X_1=1$; (ii) set $X_{n+1}=2^n$ if he lost the $n$th round; (iii) set $X_n=0$ for $n>N$ if he won the $N$th round. Suppose the head first turns up at the $N$th flip. Then the amount of money he wins from the game is $2^N-(1+2^1+\cdots+2^{N-1})=1$ dollar. However, although the expected number of rounds under this strategy is $1/0.5=2$, the expected total amount you wager is $\sum_{n\ge1}2^{n-1}\cdot 2^{-(n-1)}=1+1+\cdots=\infty$ dollars before you can pocket that one dollar.

    Waiting time for a better offer

    Suppose you want to sell your car on the Internet. It turns out that if the offers to buy your car are i.i.d., then you may expect to wait forever for an offer better than the first one. To see this, denote the first offer by $X_0$ and the sequence of offers by $\{X_n:n\ge0\}$. Let $N=\inf\{n:X_n>X_0\}$ be the waiting time until the first offer better than $X_0$. Note that $$\Pr\{N>n\}=\Pr\{X_0=\max_{0\le k \le n} X_k\}\ge \frac{1}{n+1}$$ by symmetry. Hence $E[N]=\sum_{n\ge0} \Pr\{N>n\} \ge \sum_{n\ge0} \frac{1}{n+1} = \infty$. This fact suggests that you should accept the first offer if you have to make a decision immediately after each offer is made, and cannot change your mind after you made a decision.
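For continuous i.i.d. offers (where ties have probability zero), the bound $\Pr\{N>n\}\ge\frac{1}{n+1}$ holds with equality, which we can check by enumerating the orderings of $n+1$ distinct offers; a small sketch:

```scala
// Pr{N > n} = Pr{X_0 is the maximum of X_0..X_n}. With distinct values,
// every ordering is equally likely, so the probability is 1/(n+1).
def probFirstIsMax(n: Int): Double = {
  val perms = (0 to n).permutations.toSeq
  perms.count(p => p.head == n).toDouble / perms.size  // n is the largest value
}
println(probFirstIsMax(3))  // 0.25
```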

    Power of hints in a fair game

    A banker tosses two fair coins and then covers the coins. To play this game, you choose to uncover one of them in blind. You win this game if the coin you choose is head.
    Case 1. Without further information, you have a 1/2 chance to win regardless of the coin you choose.
    Case 2. Suppose the banker uncovers one of the two coins at random. If you decide to choose the other coin, your chance of winning is still 1/2 regardless of the result of the uncovered coin. Hence, uncovering a coin provides no information about the other coin.
    Case 3. Suppose the banker peeked at the coins before they were covered, and he tells you that one of the coins is a head. Then whichever coin you choose, you have a 2/3 probability of winning.
    Case 4. Following Case 3, suppose the banker not only tells you that one of the coins is a head, but also uncovers that coin. If you choose the other coin, your chance of winning declines to 1/3.
    Case 5. Suppose the banker peeked at the coins before they were covered. He tells you that one of the coins is a tail. Then whichever coin you choose, you have a 1/3 probability of winning.
    Case 6. Following Case 5, suppose the banker not only tells you that one of the coins is a tail, but also uncovers that coin. If you choose the other coin, your chance of winning increases to 2/3.
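Case 3 can be verified by exhaustively enumerating the four equally likely outcomes; a sketch:

```scala
val outcomes = for (c1 <- Seq(true, false); c2 <- Seq(true, false)) yield (c1, c2)

// Case 3: condition on "at least one head", then pick either coin uniformly.
val survivors = outcomes.filter { case (a, b) => a || b }  // HH, HT, TH remain
val heads = survivors.flatMap { case (a, b) => Seq(a, b) }.count(identity)
println(s"$heads / ${survivors.size * 2}")  // 4 / 6, i.e. 2/3
```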

    Pairwise independence does not imply mutual independence

    Let a set $\{X_i\}$ of random variables be $k$-independent if $$\Pr(X_i) = \Pr(X_i\ |\ X_{a_1},...,X_{a_k})$$for any $i\not\in\{a_1,...,a_k\}$. Namely, one cannot infer the distribution of $X_i$ given the values of up to $k$ other random variables. Given any $k\ge2$, it is easy to construct $k+1$ random variables $X_0,...,X_k$ that are pairwise independent but not $k$-independent. Let $U_0,...,U_k$ be i.i.d. random variables uniformly distributed over $\{0,...,N-1\}$, and define $X_i=(U_i-U_{(i+1)\bmod(k+1)}) \bmod N$. Since $X_0+\cdots+X_k=0 \pmod N$, they are obviously not $k$-independent. It is also easy to show that $X_i$ and $X_j$ are independent for $i\neq j$; in fact, $X_0,...,X_k$ are $m$-independent for any $2\le m <k.$
    An immediate consequence of this fact is that $\Pr(X) = \Pr(X\ |\ Y)$ and $\Pr(X) = \Pr(X\ |\ Z)$ do not imply $\Pr(X) = \Pr(X\ |\ Y,Z)$.
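The construction can be checked exhaustively for small parameters; a sketch for $k=2$, $N=3$:

```scala
// X_i = (U_i - U_{(i+1) mod 3}) mod 3, over all 27 triples (U_0, U_1, U_2).
val N = 3
def m(a: Int) = ((a % N) + N) % N
val triples = for (u0 <- 0 until N; u1 <- 0 until N; u2 <- 0 until N)
  yield (m(u0 - u1), m(u1 - u2), m(u2 - u0))

// The sum is always 0 mod N, so the three variables are not mutually independent:
assert(triples.forall { case (a, b, c) => (a + b + c) % N == 0 })
// Yet each pair, e.g. (X_0, X_1), is uniform over the N*N combinations,
// hence the variables are pairwise independent:
val pairCounts = triples.groupBy { case (a, b, _) => (a, b) }
assert(pairCounts.size == N * N && pairCounts.values.forall(_.size == N))
println("checked")
```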

    Thursday, February 20, 2014

    Scala: Duck Typing

    Scala offers a feature known as structural typing, which enables behaviour very similar to what dynamic languages provide when they support duck typing. The main difference between the two is that structural typing is a type-safe implementation checked statically at compile time. This means that you can write a method that expects a duck, and the compiler will check that anything passed to the method can actually quack like a duck.
    We give some examples below to show how structural typing is used in Scala.

    Example 1

    case class ForInt(x: Int) {
      def myPrint() = println("Int: " + x)
    }
    case class ForDouble(x: Double) {
      def myPrint() = println("Double: " + x)
    }
    case class ForString(x: String) {
      def myPrint() = println("String: '" + x + "'")
    }
    val forInt: ForInt = new ForInt(3)
    val forString: ForString = new ForString("Hello World")
    val forDouble: ForDouble = new ForDouble(3.14159)
    /* 
    * This is like defining a "Printable" interface in Java. With structural typing,
    * however, a type doesn't have to implement a Printable interface explicitly 
    * in order to be recognized as printable. Scala compiler will check that for you.
    */
    type T = { def myPrint(): Unit } 
    val list: List[T] = List[T](forInt, forString, forDouble)
    list.foreach(_.myPrint())
    

    Example 2

    import java.io.{BufferedWriter, FileOutputStream, OutputStreamWriter, PrintWriter}

    def using[A <: { def close(): Unit }, B](res: A)(op: A => B): B =
      try op(res) finally res.close()
     
    def writeToFile(path: String, data: String): Unit = 
      using(new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream(path), "UTF-8")))(_.write(data))
     
    def appendToFile(path: String, data: String): Unit =
      using(new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream(path, true), "UTF-8")))(_.write(data))
      
    def printToFile(path: String)(op: PrintWriter => Unit): Unit = 
      using(new PrintWriter(new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream(path), "UTF-8"))))(op)
    

    Technical remarks. In Scala 2.10 or later, you should either add
    import scala.language.reflectiveCalls
    to any source file that uses structural typing, or add the -language:reflectiveCalls option to the scalac command line; otherwise you will get a compile warning. See SIP-18 for more details on why.

    Calling methods through structural types is implemented using reflection, which means it is much slower than a normal method call through a trait. For that reason, structural types in Scala should generally be avoided; use them only in situations where the performance impact won't be significant. See here for other arguments against the usefulness of structural typing in Scala. Another concern with duck typing in dynamic languages is that it is not type safe. A way to do something similar in a type-safe and performant manner is to use type classes. Unfortunately, type classes cannot be realized very well in a JVM language, as Java bytecode erases type information. See this paper for a proposal arguably better than reflection for doing structural typing in JVM languages.
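A sketch of the type-class alternative for Example 1 (the names here are illustrative): the printing behaviour is supplied by implicit instances resolved at compile time, so no reflection is involved.

```scala
trait MyPrint[A] { def myPrint(a: A): Unit }

// One instance per supported type, found implicitly by the compiler.
implicit val intPrint: MyPrint[Int] =
  (x: Int) => println("Int: " + x)
implicit val stringPrint: MyPrint[String] =
  (x: String) => println("String: '" + x + "'")

// The instance is chosen statically from the element type.
def printAll[A](xs: List[A])(implicit p: MyPrint[A]): Unit =
  xs.foreach(p.myPrint)

printAll(List(1, 2, 3))
printAll(List("Hello", "World"))
```

Unlike the structural-type version, a `printAll` call for a type without an instance fails at compile time rather than at run time.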
