Ericpony's Blog

Wednesday, February 5, 2014

Deriving the Y Combinator in Javascript

A fixpoint combinator is a higher-order function that computes a fixed point of other functions. Fix-point computation is essential in the theory of programming languages, because it allows the implementation of recursion in the form of rewrite rules, without explicit support from the language's runtime engine. Fixpoint combinators are also used to show that typed lambda calculus is Turing complete, which is an important result in the theory of computation that provides a theoretical foundation for functional programming.

The Y combinator is one of the many fixpoint combinators in lambda calculus. These fixpoint combinators implement recursion without requiring a function to call itself. The requirement, though, is that the language supports anonymous and call-by-name functions. If you want to use this in a strict call-by-value language, you will need in addition the related Z combinator; otherwise the Y combinator derived here will lead to an infinite call stack.

Three derivations of fixpoint combinators are introduced in this post:

Derivation based on the Fixpoint Theorem

Let $D$ be a chain-complete partially ordered set (abbr. ccpo) and $FIX$ be a fixpoint operator for Scott continuous function on $D$. Hence, if $g:D\rightarrow D$ is Scott continuous then $FIX(g)$ is the least upper bound of chain $\{g^n(\perp):n\ge0\}$. The fact that operator $FIX$ is well-defined follows from the Kleene Fixpoint Theorem, see here or [2] for more details. It turns out that the property $g(FIX(g))=FIX(g)$ of a fixpoint suffices to characterize the computation of $FIX$:

function FIX(g) = { return function(x){ g(FIX(g))(x); } }

This function effectively finds a fixpoint of functional $g$ if there exists one. Note the return value is eta-abstracted so as to avoid infinite recursion due to strict evaluation. By exploiting a construct called the U combinator, we can eliminate explicit recursion from the definition of $FIX$ (see here for a step-by-step example of recursion elimination using the U combinator) and obtain

function FIX(g) { 
    return (function(h) { return h(h,g); })
    (function (h,g) { 
        return function(x) { return g(h(h,g))(x); }
    }) 
}

Now fix a function $f$ on non-negative integers. Let $f_m$ be a partial function such that $f_m(n)=f(n)$ for $0\le n \le m$ and $f_m(n)=\perp$ otherwise. Suppose that $g$ is a functional such that $g(f_m)=f_{m+1}$ for all $m\ge0$. Since $g$ is Scott continuous on chain $\{f_m:m\ge0\}$, it follows from the Fixpoint Theorem that $FIX(g)$ exists and $FIX(g)=\bigsqcup \{f_m\}=f$. For example, if $f$ is the factorial function and $g$ is

function g(f) { return function(n) { return n<=1 ? 1 : n*f(n-1); } }

then $FIX(g)(n)=f(n)=n!$ for $n=0,1,2,...$.

An ad hoc derivation of the Y combinator

In the following, I shall start from an ordinary recursive factorial function, apply transformations over the initial function, eliminate the recursive structure step by step, and derive the Y combinator in the end. A recursive factorial function is

function fact (n) {
    if(n<2) return 1; else return fact(n-1)*n;
}

Now consider a function, denoted by makeFact, that takes a factorial function as parameter and generates another factorial function that "unwind" the computation of factorial for one step:

function makeFact (fact) {
    return function(n) {
        if(n<2) return 1; else return fact(n-1)*n;
    }
}

Here we have makeFact(fact)(n) = fact(n). It seems that we cannot use makeFact to generate factorial without already having a factorial function at hand. However, it turns out that makeFact does not really need a factorial function. Given that makeFact takes a factorial function and returns a factorial function, the following function will take makeFact as input and generates a factorial function:

function createFact (makeFact) {
    var fact = function(n) { return makeFact(fact)(n); }
    return fact;
}

Instead of using a factorial function, createFact uses a function that generates a factorial function to generate another factorial function, ie., createFact(makeFact)(n) = fact(n). Thus it avoids the need of a predefined factorial function. However, the definition of variable fact (in the function body of makeFact) is recursive, since it needs to call itself to compute the return value. Such a reference can be avoided if we define another function that makes the reference for it:

function createFact (makeFact) {
    var genFact = function() {
        var fact = function(n) { return genFact()(n); };
        return makeFact(fact);
    };
    return genFact();
}

Now, the definition of fact is not recursive, but we have a new function genFact that refers to itself. This reference can be made via a parameter, rather than calling the function directly:

function createFact (makeFact) {
    var genFact = function(genFactRef) {
        var fact = function(n) { return genFactRef(genFactRef)(n); };
        return makeFact(fact);
    };
    return genFact(genFact);
}

Now, all the functions (ie. genFact and fact) declared in the function body of createFact are non-recursive. We proceed to show that the declarations can be further eliminated and the result is a version of the famous Y combinator. First, we avoid declaring fact by making it an anonymous function:

function createFact (makeFact) {
    var genFact = function(genFactRef) {
        return makeFact(function(n) { return genFactRef(genFactRef)(n) });
    };
    return genFact(genFact);
}

Also, we use an anonymous helper function function(f){ return f(f) } to return genFact(genFact), avoiding the need of declaring genFact:

function createFact (makeFact) {
    return function(f){ return f(f); }(function(genFactRef){
        return makeFact(function(n){ return genFactRef(genFactRef)(n) });
    });
}

Now we reach a function which has no declarations, only parameters. By renaming the parameters to general names, we call this function the Y combinator (for single-parameter functions):

function Y (rec) {
    return function(f){ return f(f) }(function(f) {
        return rec(function(x) { return f(f)(x) });
    });
} // In untyped lambda calculus: Y = λf.(λx.x x) (λx.f (x x))

Take a close look at the definition of Y. It uses only three kinds of expression: anonymous functions, variable reference and function application. It also makes no explicit reference to an outside variable or to itself. The Y combinator allows a concise transformation from a recursive function to a non-recursive function. For example, we can use the Y combinator function to define non-recursive factorial function and a Fibonacci function. To compute factorial,

Y(function(f) { return function(n) { return n<2 ? 1 : n*f(n-1); }})

To compute Fibonacci number,

Y(function(f) { return function(n) { return n<=2 ? 1 : f(n-1)*f(n-2); }})

Finally, if you need a Y combinator for functions that takes more than one argument, you can use Javascript's built-in argument object to obtain all the arguments indefinitely:

function Y(rec) {
    return function(f) { return f(f) }(function(f) {
        return rec(function() { 
            return f(f).apply(null, arguments) 
        });
    });
}

An ad hoc derivation of the U combinator

The Y combinator it not the only way to eliminate recursive calls from a program. The U combinator also does the job, and it is much simpler than the Y combinator. The derivation of the U combinator is also more intuitive and straightforward, at least for those who are comfortable with currying. The ease of derivation, however, comes at the cost that the input function of the U combinator has to make self-application explicit, which may make the user program (which is the factorial function here) look more artificial.

First note that a reference to function itself can be provided to the function as a parameter, avoiding the self reference:

function fact0(f,n) { 
    return n<2 ? 1 : n*f(f,n-1);
}

Hence we have fact0(fact0, n) = fact(n). If we define a function U as

function U(f,n) { return f(f,n); }

then U(fact0, n) yields fact(n). Here we reach a function with two inputs, but we are looking for a combinator that should take only one input. Can we work around this problem? Well a smart guy by the name of Haskell Curry has a neat trick: if you have good higher-order functions then you only need functions of one variable. The proof is that you can get from functions of two (or more in the general case) variables to one variable with a purely mechanical text transformation called currying. For example, by currying f(x,y) we can obtain a function g such that g(x)(y) = f(x,y):

var f = function(x,y) { ... } // original
var g = function(x) { return function(y) { return f(x,y) } } // curried

Now we just apply this transformation in the right places to get our final version of the U combinator.

function fact0(f,n) { // fact0(fact,n) = fact(n)
    return n<2 ? 1 : n*f(n-1);
}
function fact0(f) { // curried, now fact0(fact0)(n) = fact(n)
    return function(n) { return n<2 ? 1 : n*f(f)(n-1); } 
}
function U(f,n) { // U(fact0,n) = fact(n)
    return f(f)(n);
}
function U(f) { // curried, reaching the famous U combinator λf.(f f)
    return function(n) { return f(f)(n); }
}
function U(f) { // allowing an arbitrary number of parameters
    return function() {
        return f(f).apply(null, arguments); 
    }
}

One can check that U(fact0)(n)=fact(n), where U(fact0) can be expanded to

U(function(f) { return function(n) { return n<2 ? 1 : n*f(f)(n-1); }})

References

1. Many faces of the fixed-point combinator: This web page also describes in detail the fix-point combinators for mutual recursion, called poly-variadic fixpoint combinators. It derives the simplest combinator I've seen so far.
2. Semantics with Applications: An Appetizer, Section 5.2.
3. Fixed-point combinators in JavaScript: Memoizing recursive functions
4. Also check Y combinators on Rosetta Code, and many other fixed-point combinators on Wiki.

Friday, January 31, 2014

Tossing coins

Simulate a biased coin with a fair coin

The following algorithm simulates a biased coin with head probability 0<p<1, using a random number generator toss_coin that returns 0 and 1 with equal probability.

// Pre-condition: p is a rational number and 0 < p < 1
while (true) {
    var p = p * 2,
    var p2 = p - floor(p);
    var msb = p - p2;
    var random_bit = toss_coin(); // 0 or 1
    if (random_bit < msb) {
        print "head";
        break;
    }
    if (random_bit > msb) {
        print "tail";
        break;
    }
    p = p2;
}
// Post-condition: "head" is outputted with probability p
//                 "tail" is outputted with probability 1-p

To see why it works, consider a random number 0 < r < 1 generated by enumerating its binary representation with a fair coin. Observe that given 0 < p < 1, the probability that 0 < r < p is exactly p. The algorithm then enumerates p and r, digit by digit, and determines the return value according to the following fact: r < p if and only if r_i < p_i for some i and r_j = p_j for all j < i, where x_k denotes the kth digit in the binary representation of x.

The following code fragment is adapted from the book Abstraction, Refinement and Proof for Probabilistic Systems, A McIver, C Morgan, 2005. Succinct as it is, its behavior is not as straightforward to understand as the above one and we skip the analysis of its correctness.

// Pre-condition: p is a rational number and 0 < p < 1
var x = p;
while (toss_coin() == 1) {
    x = 2 * x;
    if(x > 1){
      x = x - 1;
    }
}
// Post-condition: x > 0.5 with probability p
print x > 0.5 ? "head" : "tail";

Simulate a fair coin with a biased coin

while (true) {
    var a = toss_biased_coin();
    var b = toss_biased_coin();
    if(a == b){
      print a == 1 ? "head" : "tail";
      break;
    }
}

The expected running time is $\frac{2}{1-2p(1-p)}$.

Checking whether a coin is fair

First, note that it is impossible to rule out arbitrarily small deviations from fairness such as might be expected to affect only one flip in a lifetime of flipping. Also, it is always possible for a biased coin to happen to turn up exactly 10 heads in 20 flips. As such, any fairness test must only establish a certain degree of confidence in a certain degree of fairness (a certain maximum bias). In more rigorous terminology, this problem can be viewed as a special case of determining the parameters of a Bernoulli process, given only a limited sample of Bernoulli trials.
Suppose that there are $h$ heads and $t$ tails in an experiment of $n=h+t$ tosses. The best estimator for the actual head probability $p$ is $$q=\frac{h}{h+t}.$$ One can associate this estimator with a margin of error $\epsilon$ at a particular confidence level, which is given by the Z-value of a standard normal distribution in the following sense: $$P\{|p-q|\le z\sigma\}=P\{|Z|\le z\},$$ where $\sigma$ is the standard deviation of error of the estimate and $Z$ is a standard-normal distributed random variable. For $n$ Bernoulli trials, the standard deviation is $$\sigma=\sqrt{\frac{p(1-p)}{n}}\le\frac{1}{2\sqrt n}.$$ Hence, suppose that $P\{|Z|\le z\}=c$ then we have $$P\{|p-q|\le \frac{z}{2\sqrt n}\}\ge c.$$ For example, by looking up the table we see that $z=2.5759$ for $c=99\%$. Suppose now $h=54$ and $t=46$, then the actual head probability $p$ falls into $0.54\pm 0.12879$ at 99% level of confidence.

Saturday, January 11, 2014

Scala: Applications of implicit constructs

Prefix-to-Infix conversion. Suppose we have $F: Int \rightarrow Double \rightarrow Boolean$ and we would like to define an infix operator $\otimes$ such that $a \otimes b = F(a, b)$. The infix call is technically interpreted as a.⊗(b), so we have to add a $\otimes$ method to class Int. However, we cannot do this easily because Int is defined in the core Scala library. This situation is where Scala's implicit conversion can come to play: an implicit class defined as follows would perfectly serve our purpose:

implicit class Converter(b: Double) {
  def ⊗(a: Int): Boolean = F(a, b)
}

Note that the class name doesn't matter. The point is to use the keyword implicit and provide the function $\otimes$ with the correct signature. Now, when the compiler encounters something like a⊗b and tries to resolve the method call, it does not find a method named "$\otimes$" in class Int. It will then look for any implicit conversions it could use to transform a into something that has a method named "$\otimes$" with an applicable signature. Assuming our Converter class is in the right scope, the compiler will implicitly construct a Converter object from a and call $\otimes$ on that object instead.
PS. Scala defines a default converter from type Any to type String, which simply invokes the toString method of Any.

Type classes. Suppose we have a function $F: A \rightarrow B$ and we would like it to work only for a restricted subset $A'$ of $A$. There is no way to express such type constraint through inheritance. In Scala, however, we can use the type class pattern to set the constraint. Here is the idea: 1) create an abstract class that will represent a type class, 2) create elements representing the classes belonging to that type class, and 3) use an implicit parameter to restrict the type parameter of $F$ to that type class.
More concretely, suppose $A$ is the Numeric type and we would like $F$ to work only for Int and Long, but not any other subtype of Numeric. We first define an abstract type class:

abstract class Acceptable[T]

Next, we create the members of the type class, one for Int and another for Long. To make them available without need for import statements, we put them inside the companion class to Acceptable.

object Acceptable {
  implicit object AllowInt extends Acceptable[Int]
  implicit object AllowLong extends Acceptable[Long]
}

Finally, we allow $F$ to accept only type Int and Long by letting it take an implicit parameter:

def F[T](t: T)(implicit constraint: Acceptable[T]) = ...

Now, if you try to call $F(x)$ on an argument $x$ of type other than Int and Long, you will get an error at compile time, because the compiler cannot find the implicit object for that type. You can increase the number of allowed types by defining new implicit objects in the right scope.

Tuesday, November 19, 2013

Message authentication code

In almost all situations the damage that modified message can do is far greater than the damage of leaking the plaintext. Therefore, you should always combine encryption with authentication.

A secure MAC must have existential unforgeability under an adaptive chosen-message attack.

Let $F:\mathcal K\times \mathcal X \rightarrow \mathcal Y$ be a PRF. Then if we set $I=(S,V)$ such that $S(k,m)=F(k,m)$ and $V(k,m,t)=yes$ iff $F(k,m)=t$, then for all PPT-adversary $A$ that attacks $I$, there is a PPT-adversary $B$ that attacks $F$ and $$Adv_{MAC}[A,I]\le Adv_{PRF}[B,F] + \frac{1}{|\mathcal Y|}.$$It follows that if $F$ is secure, then the derived MAC is secure given that $|\mathcal Y|$ is large, say $|\mathcal Y|=2^{20}.$

Algorithm: Order Statistics

Finding dominating members

Problem Given $k\ge2$ and array $A$ with $|A|=n$, find the elements with frequency greater than $n/k$ in $O(nk)$ time. (When $k=2$, this problem reduces to finding the majority.)

Solution Given a multiset $A$ and $k>0$, we call a number $(A,k)$-dominating if its frequency in $A$ is greater than $|A|/k$. Observe that if $S\subset A$ and $|S|=k$, then $(A,k)$-dominating implies $(A/S,k)$-dominating.
Now we can solve this problem by removing sets of $k$ distinct elements from $A$ until no such set exists. To do this efficiently, we enumerate $A$ and in the meantime maintain $k-1$ counters to record the frequencies of duplicate elements. For each element, we check if all $k-1$ counters are nonzero. If so, we can remove $k$ distinct elements from $A$ by decrementing all counters by 1. Otherwise, we record the element by incrementing the associated counter by 1. At the end of enumeration, elements that dominate $A$ must have the maximal counter values in the record. Note that the reverse is not true, since it is possible that dominating elements don't exist in the first place. Hence, we collect the elements associated with the maximal counter values and return them if they indeed dominate. [Example Code]

Sunday, August 18, 2013

Cryptography: Notions of security

This article briefly discusses the relations between the following notions of security:

Ciphertext integrity
Ciphertext indistinguishability
Semantic security
Existential unforgeability
Collision resistance

Ciphertext integrity

It is computationally infeasible to create a ciphertext that decrypts correctly. More precisely, if the decryption key is chosen at random, then a ciphertext forged in polynomial time fails to decrypt by that key with very high probability. The main purpose of CI is to ensure that the ciphertext is not modified during the transmission from the sender to the receiver. Note that when we say CI, we actually means CI under CPA.

In general, good encryption does not necessarily imply integrity.
Encryption schemes that never reject cannot be CI. For example, CBC with random IV never rejects a ciphertext, hence the adversary easily wins CI game.

Ciphertext indistinguishability

It is computationally infeasible to distinguish two ciphertexts with respect to their plaintexts. More precisely, given two plaintexts $m_0, m_1$ such that $|m_0|=|m_1|$ and $c\in_r\{E(k,m_0),E(k,m_1)\}$, a polynomial-time adversary has no efficient way to tell the plaintext of $c$ significantly better than random guess. Ciphertext indistinguishability (IND) is identified by the type of attack and can be ordered according to their strength of security as follows: IND-CPA < IND-CCA1 < IND-CCA2. Note that IND-CPA plus CI imply IND-CCA1.

An encryption system with ciphertext indistinguishability must be randomized. For public key encryption system, the adversary have an public key issued by the challenger, and thus he can encrypt any plaintexts at his disposal. However, the probabilistic nature of the encryption means that the ciphertexts are nearly random, and therefore encrypting $m_0, m_1$ and comparing the resulting ciphertexts with the challenge ciphertext does not afford any non-negligible advantage to the adversary.

Semantic security

It is computationally infeasible to learn information about the plaintext from the ciphertext. More precisely, if two plaintexts $m_0, m_1$ satisfy $|m_0|=|m_1|$, then $\{E(k, m_0)\} \approx_p \{E(k, m_1)\}$. Note that when we say semantic security, we actually mean semantic security under CPA. Hence, the two probability distributions above are in fact conditioned on a polynomial number of plaintext-ciphertext pairs where the plaintexts are chosen by the adversary.

Semantic security is shown to be equivalent to IND-CPA, and many cryptographic proofs use these two definitions interchangeably. Like IND, deterministic encryption cannot be semantically secure.

Existential unforgeability

It is computationally infeasible to forge a message along with a correct tag. More precisely, if $(S_k,V_k)$ is a MAC with existential unforgeability and a polynomial-time adversary forges $m,t$ without $k$, then $V_k(m,t)=0$ with very high probability. Note that when we say existential unforgeability, we actually mean existential unforgeability under CPA, ie., the adversary can request a polynomial number of messages for their tags before the challenge.

Fact. Let $(E,D)$ be a symmetric encryption system and define a MAC by letting $T_k(m)=E(k,m)$ and $V_k(m,t)=1$ iff $D(k,m)=t$. Then this MAC has existential unforgeability iff the encryption system has ciphertext integrity.

Collision resistance

It is computationally infeasible to find two inputs with the same output. More precisely, if $H$ is a function with collision resistance, then no polynomial-time adversary can find $m_1\neq m_2$ such that $H(m_1)=H(m_2)$ with a chance significantly better than random guess. Collision resistance is a property of a cryptographic hash function, among the following two properties: (i) Preimage resistance: For any element in the image, it is computationally infeasible to find its preimage; (ii) Second-preimage resistance: For any element in the domain, it is computationally infeasible to find another element in the domain with the same image.