This is ericpony's blog

Friday, February 21, 2014

Paradoxes in the World of Probability

Gambler’s ruin

Consider a symmetric random walk on the half-line $x\ge 0$ starting from some integer $c>0$. Suppose the walker stops when he reaches $x=0$. Let $X(t)$ denote the location of the walker at time $t$. Then $\Pr(X(t)=0\ \mbox{for some}\ t)=1$, but $E[X(t)]=c$ for all $t$. The latter can be seen by noting that $X(t) = X(t-1) + Y$ with $\Pr(Y=1)=\Pr(Y=-1)=0.5$ whenever $X(t-1)>0$, and $X(t)=X(t-1)$ once the walker has stopped; in either case $E[X(t)\ |\ X(t-1)]=X(t-1)$, so $E[X(t)]=c$ by induction. (It might be surprising that $E[X(t)]$ is the same as in an unbounded random walk.)
Furthermore, if we define the hitting time $T_k=\inf\{t:X(t)=k\}$, then for any $c>0$ we have $\Pr\{T_0<\infty\}=1$ but $E[T_0]=\infty$. This is because the distribution of $T_0$ is heavy-tailed: in fact, one can show that $\Pr\{T_0\ge n\}\approx \frac{a}{\sqrt{n}}$ for some constant $a$. An intuitive way to see $E[T_0]=\infty$ is as follows. Let $D$ denote the expected time for the walk to first move one level down, say from $c$ to $c-1$; by translation invariance $D$ does not depend on $c$, and $E[T_0] = c\cdot D$, so it suffices to show $D=\infty$. Conditioning on the direction of the first step: \begin{eqnarray*} D & = & E[\mbox{descent time}\ |\ \mbox{first step left}]\cdot 2^{-1} + E[\mbox{descent time}\ |\ \mbox{first step right}]\cdot 2^{-1}\\ & = & 1 \cdot 2^{-1} + (1 + 2D) \cdot 2^{-1} = 1 + D, \end{eqnarray*} since after a step to the right the walk must descend two levels, which takes $2D$ steps in expectation. Hence the only possibility is $D = \infty.$
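To make the heavy tail concrete, here is a small simulation sketch (not from the original post; the object name, random seed, and truncation cap are arbitrary choices). Each run is truncated at a cap so that it terminates; the median of $T_0$ stays small, while the truncated sample mean keeps growing as the cap is raised, consistent with $E[T_0]=\infty$.

import scala.util.Random

object GamblersRuinSim extends App {
  val rng = new Random(0)
  // Walk from x = c until x = 0, giving up after `cap` steps so every run terminates.
  def hittingTime(c: Int, cap: Long = 1000000L): Long = {
    var x = c
    var t = 0L
    while (x > 0 && t < cap) {
      x += (if (rng.nextBoolean()) 1 else -1)
      t += 1
    }
    t
  }
  val times = Seq.fill(10000)(hittingTime(5))
  println(s"median T_0 = ${times.sorted.apply(times.size / 2)}")   // modest
  println(s"mean   T_0 = ${times.sum.toDouble / times.size}")      // keeps growing as `cap` grows
}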

The St. Petersburg paradox I

Consider a game where a player flips a fair coin and wins $2^n$ dollars if the first head turns up on the $n$th flip. The game continues until a head turns up. Then the expected amount of money a player can win from the game is $\sum_{n\ge1} 2^n\cdot 2^{-n} = \infty$. Hence, to make this game fair, a player should pay an infinite amount of money to play it!

The St. Petersburg paradox II

Following the previous example, how much should a player be charged if he is allowed to play at most $N$ rounds of the game? Denote the player's total reward by $S_N$. It is tempting to estimate the reward by the truncated expectation $\sum_{n=1}^N 2^n\cdot 2^{-n} = N$. However, it can be shown that $\frac{S_N}{N\lg N}\rightarrow 1$ in probability (a classical result due to Feller). Namely, when $N$ is large, the player is very likely to win about $N\lg N$ dollars from the game. Hence, $N\lg N$ is arguably a more reasonable price than $N$ to charge the player.
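The following simulation sketch (not from the original post; the object name and parameters are illustrative) plays the game $N$ times and compares the total winnings $S_N$ with both $N$ and $N\lg N$.

import scala.util.Random

object StPetersburgSim extends App {
  val rng = new Random(0)
  // One play: keep flipping until the first head; the payoff is 2^(number of flips).
  def playOnce(): Double = {
    var payoff = 2.0
    while (!rng.nextBoolean()) payoff *= 2
    payoff
  }
  val N = 100000
  val total = Iterator.fill(N)(playOnce()).sum
  val lgN = math.log(N) / math.log(2)
  println(f"S_N / N        = ${total / N}%.2f")          // typically much larger than 1
  println(f"S_N / (N lg N) = ${total / (N * lgN)}%.2f")  // typically close to 1
}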

A winning strategy for a fair game

Consider a game where a player bets $X_n\ge0$ dollars on the $n$th round, $n\ge1$. After $X_n$ is decided, a fair coin is flipped. The player loses the bet if a tail turns up, and wins another $X_n$ dollars otherwise. Here is a strategy that guarantees the player wins 1 dollar almost surely: (i) set $X_1=1$; (ii) set $X_{n+1}=2^n$ if he lost the $n$th round; and (iii) set $X_n=0$ for all $n>N$ if he won the $N$th round. Suppose the first head turns up at the $N$th flip. Then the player has lost $1+2+\cdots+2^{N-2}=2^{N-1}-1$ dollars in the first $N-1$ rounds and wins $X_N=2^{N-1}$ dollars in the $N$th round, so his net gain is exactly 1 dollar. However, although the average number of rounds for this strategy is $1/0.5=2$, the expected amount lost before the winning round is $\sum_{N\ge1}(2^{N-1}-1)\cdot 2^{-N}=\sum_{N\ge1}(2^{-1}-2^{-N})=\infty$; that is, you are expected to put up an infinite amount of money before you collect that one dollar.
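A simulation sketch of the doubling strategy (not from the original post; the names and parameters are made up). Every run indeed ends with a net gain of exactly one dollar, but the amount lost before the winning round is heavy-tailed, so its sample mean is unstable and grows with the number of runs.

import scala.util.Random

object DoublingStrategySim extends App {
  val rng = new Random(0)
  // One session: double the bet after every loss; stop at the first win.
  // Returns (net gain, total amount lost before the winning round).
  def playOnce(): (Long, Long) = {
    var bet = 1L
    var lost = 0L
    while (!rng.nextBoolean()) {   // tails: lose the current bet and double it
      lost += bet
      bet *= 2
    }
    (bet - lost, lost)             // heads: win the current bet
  }
  val runs = Seq.fill(100000)(playOnce())
  println(runs.forall(_._1 == 1L))                    // true: the net gain is always 1 dollar
  println(runs.map(_._2).max)                         // occasionally huge
  println(runs.map(_._2).sum.toDouble / runs.size)    // unstable; grows with more runs
}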

Waiting time for a better offer

Suppose you want to sell your car on the Internet. It turns out that if the offers you receive are i.i.d., then you may expect to wait forever for an offer better than the first one. To see this, denote the sequence of offers by $\{X_n:n\ge0\}$, with $X_0$ the first offer. Let $N=\inf\{n\ge1:X_n>X_0\}$ be the waiting time until the first offer better than $X_0$. Note that $$\Pr\{N>n\}=\Pr\{X_0=\max_{0\le k \le n} X_k\}\ge \frac{1}{n+1}$$ by symmetry (with equality when ties have probability zero). Hence $E[N]=\sum_{n\ge0} \Pr\{N>n\} \ge \sum_{n\ge1} \frac{1}{n} = \infty$. This fact suggests that you should accept the first offer if you have to decide immediately after each offer is made and cannot change your mind afterwards.
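A quick simulation sketch (not from the original post; the uniform offers, the cap, and the names are illustrative choices). The median waiting time is small, but the sample mean drifts upward as more runs are taken, consistent with $E[N]=\infty$.

import scala.util.Random

object BetterOfferSim extends App {
  val rng = new Random(0)
  // Waiting time until an offer exceeds the first one, truncated at `cap` offers.
  def waitingTime(cap: Int = 10000000): Int = {
    val first = rng.nextDouble()
    var n = 1
    while (rng.nextDouble() <= first && n < cap) n += 1
    n
  }
  val samples = Seq.fill(100000)(waitingTime())
  println(s"median N = ${samples.sorted.apply(samples.size / 2)}")
  println(s"mean   N = ${samples.map(_.toLong).sum.toDouble / samples.size}")
}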

Power of hints in a fair game

A banker tosses two fair coins and then covers them. To play this game, you choose one of the coins to uncover, without seeing either. You win if the coin you choose shows heads. Consider the following cases; a small enumeration sketch checking cases 3-6 follows the list.
Case 1. Without further information, you have a 1/2 chance of winning, regardless of which coin you choose.
Case 2. Suppose the banker uncovers one of the two coins at random. If you decide to choose the other coin, your chance of winning is still 1/2, regardless of what the uncovered coin shows. Hence, uncovering a coin provides no information about the other coin.
Case 3. Suppose the banker peeked at the coins before they were covered, and he tells you that at least one of the coins is heads. Then whichever coin you choose, you win with probability 2/3.
Case 4. Following case 3, suppose the banker not only tells you that one of the coins is heads, but also uncovers that coin. If you choose the other coin, your chance of winning drops to 1/3.
Case 5. Suppose the banker peeked at the coins before they were covered, and he tells you that at least one of the coins is tails. Then whichever coin you choose, you win with probability 1/3.
Case 6. Following case 5, suppose the banker not only tells you that one of the coins is tails, but also uncovers that coin. If you choose the other coin, your chance of winning increases to 2/3.
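The probabilities in cases 3-6 can be checked by enumerating the four equally likely outcomes of two fair coins. Below is a small sketch (not from the original post; the object name and encoding are arbitrary, with true standing for heads).

object CoinHintsCheck extends App {
  val outcomes = for (c1 <- Seq(true, false); c2 <- Seq(true, false)) yield (c1, c2)

  // Case 3: condition on "at least one head"; you pick coin 1.
  val atLeastOneHead = outcomes.filter { case (a, b) => a || b }
  println(atLeastOneHead.count(_._1) / atLeastOneHead.size.toDouble)                      // 2/3

  // Case 4: the banker reveals a head, so the other coin is heads only if both are heads.
  println(atLeastOneHead.count { case (a, b) => a && b } / atLeastOneHead.size.toDouble)  // 1/3

  // Case 5: condition on "at least one tail"; you pick coin 1.
  val atLeastOneTail = outcomes.filter { case (a, b) => !a || !b }
  println(atLeastOneTail.count(_._1) / atLeastOneTail.size.toDouble)                      // 1/3

  // Case 6: the banker reveals a tail, so the other coin is heads unless both are tails.
  println(atLeastOneTail.count { case (a, b) => a || b } / atLeastOneTail.size.toDouble)  // 2/3
}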

Pairwise independence does not imply mutual independence

Call a set $\{X_i\}$ of random variables $k$-independent if $$\Pr(X_i=x) = \Pr(X_i=x\ |\ X_{a_1},\ldots,X_{a_k})$$ for every value $x$ and any $i\not\in\{a_1,\ldots,a_k\}$. Namely, one cannot infer anything about $X_i$ from the values of up to $k$ other random variables in the set. Given any $k\ge2$, it is easy to construct $k+1$ random variables $X_0,\ldots,X_k$ that are pairwise independent but not $k$-independent. Let $U_0,\ldots,U_k$ be i.i.d. random variables uniformly distributed over $\{1,\ldots,N\}$, and define $X_i=U_i-U_{(i+1)\bmod(k+1)} \bmod N$. Since $X_0+\cdots+X_k\equiv0\pmod N$, the values of any $k$ of them determine the remaining one, so they are obviously not $k$-independent. It is also easy to show that $X_i$ and $X_j$ are independent for $i\neq j$. In fact, $X_0,\ldots,X_k$ are $m$-independent for every $m<k$.
An immediate consequence of this fact is that $\Pr(X) = \Pr(X\ |\ Y)$ and $\Pr(X) = \Pr(X\ |\ Z)$ together do not imply $\Pr(X) = \Pr(X\ |\ Y,Z)$.
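For a concrete instance, take $k=2$ and $N=2$ in the construction above, so that subtraction mod 2 is just XOR. The sketch below (not from the original post; the names are arbitrary) enumerates all outcomes and checks that the three variables are pairwise independent, yet any two of them determine the third.

object PairwiseNotMutual extends App {
  // X0 = U0 xor U1, X1 = U1 xor U2, X2 = U2 xor U0 for i.i.d. fair bits U0, U1, U2.
  val triples = for (u0 <- 0 to 1; u1 <- 0 to 1; u2 <- 0 to 1)
    yield (u0 ^ u1, u1 ^ u2, u2 ^ u0)                    // 8 equally likely outcomes

  // Pairwise independence: P(X0 = 0, X1 = 0) = 1/4 = P(X0 = 0) * P(X1 = 0).
  println(triples.count { case (x0, x1, _) => x0 == 0 && x1 == 0 } / 8.0)   // 0.25

  // Not 2-independent: X0 and X1 determine X2, since X0 + X1 + X2 = 0 (mod 2).
  println(triples.forall { case (x0, x1, x2) => (x0 + x1 + x2) % 2 == 0 })  // true
}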

Thursday, February 20, 2014

Scala: Duck Typing

Scala offers a feature known as structural typing, which allows behaviour very similar to the duck typing found in dynamic languages. The main difference between the two is that structural typing is a type-safe, statically typed mechanism checked at compile time. This means that you can write a method that expects a duck, and the compiler will verify that anything passed to the method can actually quack like a duck.
We give some examples below to show how structural typing is used in Scala.

Example 1

case class ForInt(x: Int) {
  def myPrint() = println("Int: " + x)
}
case class ForDouble(x: Double) {
  def myPrint() = println("Double: " + x)
}
case class ForString(x: String) {
  def myPrint() = println("String: '" + x + "'")
}
val forInt: ForInt = new ForInt(3)
val forString: ForString = new ForString("Hello World")
val forDouble: ForDouble = new ForDouble(3.14159)
/* 
* This is like defining a "Printable" interface in Java. With structural typing,
* however, a type doesn't have to implement a Printable interface explicitly 
* in order to be recognized as printable. The Scala compiler will check that for you.
*/
type T = { def myPrint(): Unit } 
val list: List[T] = List[T](forInt, forString, forDouble)
list.foreach(_.myPrint())

Example 2

import java.io.{BufferedWriter, FileOutputStream, OutputStreamWriter, PrintWriter}

def using[A <: { def close(): Unit }, B](res: A)(op: A => B): B =
  try op(res) finally res.close()
 
def writeToFile(path: String, data: String): Unit = 
  using(new BufferedWriter(new OutputStreamWriter(
    new FileOutputStream(path), "UTF-8")))(_.write(data))
 
def appendToFile(path: String, data: String): Unit =
  using(new BufferedWriter(new OutputStreamWriter(
    new FileOutputStream(path, true), "UTF-8")))(_.write(data))
  
def printToFile(path: String)(op: PrintWriter => Unit): Unit = 
  using(new PrintWriter(new BufferedWriter(new OutputStreamWriter(
    new FileOutputStream(path), "UTF-8"))))(op)

Technical remarks. In Scala 2.10 or later, you should either add
import scala.language.reflectiveCalls
to any source file that uses structural typing, or pass the -language:reflectiveCalls option to scalac; otherwise you will get a compiler warning. See SIP-18 for the rationale.

Method calls through a structural type are performed via reflection, which makes them much slower than ordinary method calls through a trait. For that reason, structural types in Scala should generally be avoided; use them only when you really need them and when the performance impact is insignificant. See here for further arguments against the usefulness of structural typing in Scala. Another concern about duck typing is that it is not type safe. A type-safe and performant way to achieve something similar is to use type classes. Unfortunately, type classes do not map cleanly onto the JVM, since Java bytecode erases generic type information. See this paper for a proposal that is arguably better than reflection for implementing structural typing in JVM languages.
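For comparison, here is a minimal sketch (the trait and instance names are made up, not from any library) of how the printing behaviour from Example 1 could be expressed as a type class. No reflection is involved, and a call with an unsupported element type is rejected at compile time. Note that, unlike the structural-type version, this does not directly handle a heterogeneous list such as List(forInt, forString, forDouble).

trait MyPrintable[A] {
  def myPrint(a: A): Unit
}
object MyPrintable {
  implicit val intPrintable: MyPrintable[Int] = new MyPrintable[Int] {
    def myPrint(x: Int): Unit = println("Int: " + x)
  }
  implicit val stringPrintable: MyPrintable[String] = new MyPrintable[String] {
    def myPrint(x: String): Unit = println("String: '" + x + "'")
  }
}
def printAll[A](xs: List[A])(implicit p: MyPrintable[A]): Unit = xs.foreach(p.myPrint)

printAll(List(1, 2, 3))        // fine
printAll(List("a", "b"))       // fine
// printAll(List(1.0, 2.0))    // does not compile: no MyPrintable[Double] in scope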

Wednesday, February 12, 2014

Algorithm: Detecting cycles in a linked list

Detecting a cycle in a linked list I

Problem Detect if there is a cycle in a singly-linked list. Do it in $O(n)$ time and $O(1)$ space.

Solution Assume without loss of generality that a list node never points to itself. To detect a cycle, we traverse the list with two pointers, one moving faster than the other. We argue that the two pointers eventually coincide (i.e., point to the same node) if the list contains a cycle.
Suppose that in each iteration, the faster pointer advances $m$ steps and the slower pointer advances one step, and that we check whether the two pointers coincide at the end of each iteration. Let $d(x,y)\in[0,\infty]$ denote the minimal number of moves a pointer has to make to get from node $x$ to node $y$. Let $A$ be the first node of the cycle reached from the head, let $B$ be the node the pointers point to when they coincide, and denote the number of nodes in the cycle by $k$.
When the two pointers coincide, the number of moves made by the faster pointer is $m$ times the number of moves made by the slower pointer. Hence, writing $h=d(head,A)$ and $v=d(A,B)\le k-1$, the two pointers eventually coincide iff there exist integers $p,q\ge0$ such that $h+pk+v=m(h+qk+v)$, where $m,k\ge 2$. This holds iff there exist $p,q,v\ge 0$ satisfying \begin{equation}k(m-1) > v(m-1) = k(p-mq)-h(m-1) \ge 0,\label{coin-eg1}\end{equation}which, by setting $X=p-mq$, can be written as $$k (m-1) > kX - h (m-1) \ge 0.$$ If $k\ge h\ge1$, then all $p,q\ge 0$ such that $X=m-1$, namely $p=mq+m-1$, satisfy \eqref{coin-eg1}. (If $h=0$, the head already lies on the cycle and $X=0$, namely $p=mq$, works.) If $k < h$, then there exists $n\ge 1$ such that $kn\ge h > k (n-1)$. In this case, observe that when $X=n(m-1)$, namely $p=mq+nm-n$, we have
  • $kX-h(m-1) = kn(m-1) - h(m-1) \ge 0$, and
  • $k(m-1) = kn(m-1) - k(n-1)(m-1) > kn(m-1) - h(m-1) = kX - h(m-1)$.
It follows that all $p,q\ge 0$ such that $p=mq+mn-n$ satisfy \eqref{coin-eg1}. This concludes the proof.
Discussion. The above analysis gives two sufficient conditions for coincidence in which the only constraint on $q$ is $q\ge 0$. By setting $q=0$, we see that the pointers coincide by the $(h+v)$-th iteration, where $h+v<n$ and $n$ is the number of nodes in the list; this is essentially the fewest iterations one could hope for. Also, with $q=0$ we have $v=kp/(m-1)-h$, where $p$ is the smallest integer making the right-hand side a non-negative integer. Once node $B$ is found, the size $k$ of the cycle can be obtained with an additional $k$ iterations (advance one pointer around the cycle until it returns to $B$).

Detecting a cycle in a linked list II

Problem Detect if there is a cycle in a singly-linked list, and if there is one, find the node where the cycle begins. Do it in $O(n)$ time and $O(1)$ space.

Solution We assume that the list contains a cycle and use the notation defined above. Traverse the list as in the previous solution, with $m=2$. By the above argument, there exist $p,q\ge0$ such that $h+pk+v=2(h+qk+v)$, which implies $h+v=k(p-2q)$. Hence, if a pointer advances $h$ steps from node $B$, it arrives at node $A$ (having gone around the cycle $p-2q$ times, counting from $A$). To find node $A$, we can therefore launch two fresh pointers from the head and from node $B$ respectively, both moving one node forward at a time; they will point to the same node, namely $A$, after exactly $h$ moves. See here for an implementation of the algorithm.
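For reference, here is a minimal Scala sketch of the two-pointer procedure with $m=2$ (it is not the implementation linked above; the class and method names are my own).

final class ListNode[A](val value: A, var next: ListNode[A] = null)

def findCycleStart[A](head: ListNode[A]): Option[ListNode[A]] = {
  var slow = head
  var fast = head
  // Phase 1: advance the pointers at speeds 1 and 2 until they coincide.
  while (fast != null && fast.next != null) {
    slow = slow.next
    fast = fast.next.next
    if (slow eq fast) {
      // Phase 2: restart one pointer from the head; moving both one step at a time,
      // they meet after h moves at the first node of the cycle (node A in the proof).
      var p = head
      var q = slow
      while (!(p eq q)) { p = p.next; q = q.next }
      return Some(p)
    }
  }
  None // the fast pointer fell off the end of the list, so there is no cycle
}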

Discussion. In this solution, we fixed the speed ratio $m=2$ for the two pointers. I don't know whether it is possible to find the starting node of the cycle within the required complexity for an arbitrary speed ratio. Also, for ease of proof, we check for coincidence of the pointers only at the end of each iteration. A conscientious reader may want to prove the correctness of an algorithm that checks for coincidence after every single step, and see whether it is indeed more efficient than the simplified version presented here.

Further readings

1. http://en.wikipedia.org/wiki/Cycle_detection

Scala: Case Classes

In essence, case classes can be seen as plain, immutable data-holding objects that depend exclusively on their constructor arguments.
Case classes differ from ordinary classes in that they
  • automatically define hashCode, equals, and toString methods
  • automatically define getter methods for the constructor arguments
This functional concept allows us to
  • use a compact initialization syntax, e.g., Node(Leaf(2), EmptyLeaf)
  • decompose them using pattern matching
  • have equality comparisons implicitly defined
Case classes are immutable by default, since their constructor parameters are implicitly val. This makes them suitable for use with hash maps and sets, as they have a valid, stable hashCode. One can override the default behaviour by declaring constructor arguments with var instead of val; in that case the compiler generates setters as well as getters, just as for ordinary classes. However, making a case class mutable causes its equals and hashCode methods to vary over time. If an object performs stateful computations internally or exhibits other kinds of complex behaviour, it should be an ordinary class rather than a case class. [1]

A case class automatically gets a companion object with the same name as the class, containing apply and unapply methods. The apply method enables constructing instances without the new keyword. The unapply extractor method enables the pattern matching mentioned above. The compiler also optimizes the speed of pattern matching for case classes. [2]
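As a minimal illustration (Point is a made-up example, not from the post), the generated companion object is what makes the following work:

case class Point(x: Int, y: Int)

val p = Point(1, 2)            // sugar for Point.apply(1, 2); no `new` needed
p match {                      // the generated unapply drives this pattern
  case Point(x, y) => println(s"x = $x, y = $y")
}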

In combination with inheritance, case classes are often used to mirror algebraic data types. A very simple example of such a type is a tree. A binary tree, for instance, can be implemented like this:
sealed abstract class Tree
case class Node(left: Tree, right: Tree) extends Tree
case class Leaf[A](value: A) extends Tree
case object EmptyLeaf extends Tree
This enables us to do the following:
// DSL-like assignment:
val treeA = Node(EmptyLeaf, Leaf(5))
val treeB = Node(Node(Leaf(2), Leaf(3)), Leaf(5))
// modification through cloning:
val treeC = treeA.copy(left = treeB.left)
// Pretty printing
println("Tree A: " + treeA)
println("Tree B: " + treeB)
println("Tree C: " + treeC)
// Comparison
println("Tree A == Tree B: %s" format (treeA == treeB).toString)
println("Tree B == Tree C: %s" format (treeB == treeC).toString)
// Pattern matching
treeA match {
  case Node(EmptyLeaf, right) => println("Can be reduced to " + right)
  case Node(left, EmptyLeaf) => println("Can be reduced to " + left)
  case _ => println(treeA + " cannot be reduced")
}
Note that trees are constructed and deconstructed (through pattern matching) with the same syntax, which is also exactly how they are printed (minus spaces).

Technically, a case class is an instance of Product and thus inherits the following methods:
def productElement(n: Int): Any
def productArity: Int
def productIterator: Iterator[Any]
where productArity returns the number of class parameters, productElement(i) returns the $i$-th parameter (counting from zero), and productIterator allows iterating through them.
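For instance, on a Node from the tree example above (a small sketch, not from the post):

val n = Node(Leaf(2), Leaf(3))
println(n.productArity)                // 2
println(n.productElement(0))           // Leaf(2)
n.productIterator.foreach(println)     // Leaf(2), then Leaf(3)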

References

1. Case classes are cool
2. Case Classes and Extractors, p.15

Wednesday, February 5, 2014

Deriving the Y Combinator in Javascript

A fixpoint combinator is a higher-order function that computes a fixed point of other functions. Fixpoint computation is essential in the theory of programming languages, because it allows recursion to be implemented in the form of rewrite rules, without explicit support from the language's runtime engine. Fixpoint combinators are also used to show that the untyped lambda calculus is Turing complete, an important result in the theory of computation that provides a theoretical foundation for functional programming.

The Y combinator is one of the many fixpoint combinators in lambda calculus. These fixpoint combinators implement recursion without requiring a function to call itself. The requirement, though, is that the language supports anonymous functions and call-by-name evaluation. If you want to use this in a strict, call-by-value language, you will need the related Z combinator instead; the pure Y combinator would lead to an infinite call stack. (The version derived below avoids this by eta-expanding the self-application.)

Three derivations of fixpoint combinators are introduced in this post:

Derivation based on the Fixpoint Theorem

Let $D$ be a chain-complete partially ordered set (abbr. ccpo) and let $FIX$ be a fixpoint operator for Scott-continuous functions on $D$. That is, if $g:D\rightarrow D$ is Scott continuous, then $FIX(g)$ is the least upper bound of the chain $\{g^n(\perp):n\ge0\}$. The fact that the operator $FIX$ is well-defined follows from the Kleene Fixpoint Theorem; see here or [2] for more details. It turns out that the fixpoint property $g(FIX(g))=FIX(g)$ suffices to characterize the computation of $FIX$:
function FIX(g) { return function(x) { return g(FIX(g))(x); }; }
This function effectively finds a fixpoint of a functional $g$ if one exists. Note that the return value is eta-expanded so as to avoid infinite recursion under strict evaluation. By exploiting a construct called the U combinator, we can eliminate explicit recursion from the definition of $FIX$ (see here for a step-by-step example of recursion elimination using the U combinator) and obtain
function FIX(g) { 
    return (function(h) { return h(h,g); })
    (function (h,g) { 
        return function(x) { return g(h(h,g))(x); }
    }) 
}
Now fix a function $f$ on the non-negative integers. Let $f_m$ be the partial function such that $f_m(n)=f(n)$ for $0\le n \le m$ and $f_m(n)=\perp$ otherwise. Suppose that $g$ is a functional such that $g(f_m)=f_{m+1}$ for all $m\ge0$. Since $g$ is Scott continuous on the chain $\{f_m:m\ge0\}$, it follows from the Fixpoint Theorem that $FIX(g)$ exists and $FIX(g)=\bigsqcup \{f_m\}=f$. For example, if $f$ is the factorial function and $g$ is
function g(f) { return function(n) { return n<=1 ? 1 : n*f(n-1); } }
then $FIX(g)(n)=f(n)=n!$ for $n=0,1,2,...$.

An ad hoc derivation of the Y combinator

In the following, I start from an ordinary recursive factorial function, apply transformations to it, eliminate the recursive structure step by step, and arrive at the Y combinator in the end. A recursive factorial function is
function fact (n) {
    if(n<2) return 1; else return fact(n-1)*n;
}
Now consider a function, denoted by makeFact, that takes a factorial function as a parameter and generates another factorial function that "unwinds" the computation of the factorial by one step:
function makeFact (fact) {
    return function(n) {
        if(n<2) return 1; else return fact(n-1)*n;
    }
}
Here we have makeFact(fact)(n) = fact(n). It seems that we cannot use makeFact to generate a factorial function without already having one at hand. However, it turns out that makeFact does not really need a factorial function. Given that makeFact takes a factorial function and returns a factorial function, the following function takes makeFact as input and generates a factorial function:
function createFact (makeFact) {
    var fact = function(n) { return makeFact(fact)(n); }
    return fact;
}
Instead of using a factorial function, createFact uses a function that generates a factorial function to generate another factorial function, i.e., createFact(makeFact)(n) = fact(n). Thus it avoids the need for a predefined factorial function. However, the definition of the variable fact (in the function body of createFact) is still recursive, since fact needs to call itself to compute its return value. Such a reference can be avoided if we define another function that makes the reference for it:
function createFact (makeFact) {
    var genFact = function() {
        var fact = function(n) { return genFact()(n); };
        return makeFact(fact);
    };
    return genFact();
}
Now, the definition of fact is not recursive, but we have a new function genFact that refers to itself. This reference can be made via a parameter, rather than calling the function directly:
function createFact (makeFact) {
    var genFact = function(genFactRef) {
        var fact = function(n) { return genFactRef(genFactRef)(n); };
        return makeFact(fact);
    };
    return genFact(genFact);
}
Now, all the functions (i.e., genFact and fact) declared in the function body of createFact are non-recursive. We proceed to show that the declarations can be eliminated as well, and that the result is a version of the famous Y combinator. First, we avoid declaring fact by making it an anonymous function:
function createFact (makeFact) {
    var genFact = function(genFactRef) {
        return makeFact(function(n) { return genFactRef(genFactRef)(n) });
    };
    return genFact(genFact);
}
Also, we use an anonymous helper function function(f){ return f(f) } to compute genFact(genFact), avoiding the need to declare genFact:
function createFact (makeFact) {
    return function(f){ return f(f); }(function(genFactRef){
        return makeFact(function(n){ return genFactRef(genFactRef)(n) });
    });
}
We have now reached a function with no declarations, only parameters. After renaming the parameters to generic names, we call this function the Y combinator (for single-parameter functions):
function Y (rec) {
    return function(f){ return f(f) }(function(f) {
        return rec(function(x) { return f(f)(x) });
    });
} // In untyped lambda calculus: Y = λf.(λx.x x) (λx.f (x x))
Take a close look at the definition of Y. It uses only three kinds of expressions: anonymous functions, variable references, and function application. It also makes no explicit reference to an outside variable or to itself. The Y combinator allows a concise transformation from a recursive function to a non-recursive one. For example, we can use the Y combinator to define a non-recursive factorial function and a non-recursive Fibonacci function. To compute factorials,
Y(function(f) { return function(n) { return n<2 ? 1 : n*f(n-1); }})
To compute Fibonacci number,
Y(function(f) { return function(n) { return n<=2 ? 1 : f(n-1)+f(n-2); }})
Finally, if you need a Y combinator for functions that take more than one argument, you can use JavaScript's built-in arguments object to forward however many arguments are passed:
function Y(rec) {
    return function(f) { return f(f) }(function(f) {
        return rec(function() { 
            return f(f).apply(null, arguments) 
        });
    });
}

An ad hoc derivation of the U combinator

The Y combinator is not the only way to eliminate recursive calls from a program. The U combinator also does the job, and it is much simpler than the Y combinator. The derivation of the U combinator is also more intuitive and straightforward, at least for those who are comfortable with currying. The ease of derivation, however, comes at a cost: the input function of the U combinator has to make self-application explicit, which may make the user program (the factorial function here) look more artificial.
First note that a reference to the function itself can be provided to the function as a parameter, avoiding the self-reference:
function fact0(f,n) { 
    return n<2 ? 1 : n*f(f,n-1);
}
Hence we have fact0(fact0, n) = fact(n). If we define a function U as
function U(f,n) { return f(f,n); }
then U(fact0, n) yields fact(n). Here we have reached a function with two inputs, but we are looking for a combinator that takes only one input. Can we work around this problem? Well, a smart guy by the name of Haskell Curry had a neat trick: if you have good higher-order functions, then you only need functions of one variable. The proof is that you can get from a function of two (or more, in the general case) variables to functions of one variable with a purely mechanical text transformation called currying. For example, by currying f(x,y) we can obtain a function g such that g(x)(y) = f(x,y):
var f = function(x,y) { ... } // original
var g = function(x) { return function(y) { return f(x,y) } } // curried
Now we just apply this transformation in the right places to get our final version of the U combinator.
function fact0(f,n) { // fact0(fact0,n) = fact(n)
    return n<2 ? 1 : n*f(f,n-1);
}
function fact0(f) { // curried, now fact0(fact0)(n) = fact(n)
    return function(n) { return n<2 ? 1 : n*f(f)(n-1); } 
}
function U(f,n) { // U(fact0,n) = fact(n)
    return f(f)(n);
}
function U(f) { // curried, reaching the famous U combinator λf.(f f)
    return function(n) { return f(f)(n); }
}
function U(f) { // allowing an arbitrary number of parameters
    return function() {
        return f(f).apply(null, arguments); 
    }
}
One can check that U(fact0)(n)=fact(n), where U(fact0) can be expanded to
U(function(f) { return function(n) { return n<2 ? 1 : n*f(f)(n-1); }})

References

1. Many faces of the fixed-point combinator: This web page also describes in detail the fix-point combinators for mutual recursion, called poly-variadic fixpoint combinators. It derives the simplest combinator I've seen so far.
2. Semantics with Applications: An Appetizer, Section 5.2.
3. Fixed-point combinators in JavaScript: Memoizing recursive functions
4. Also check Y combinators on Rosetta Code, and many other fixed-point combinators on Wiki.
