Ericpony's Blog: Scala

Showing posts with label Scala. Show all posts

Thursday, August 20, 2015

Proving theorems with Leon

Here are some high level guides to proving theorems with Leon:

To prove $A\implies B$, you usually need to finds some statements $A_1,...,A_n$ and prove $A\implies A_1$, $A_1\implies A_2$, ..., $A_n\implies B$. These intermediate results are called Lemmas. $A\implies A_1 \implies ... A_n \implies B$ is called an implication chain.
You will find yourself get stuck at some step in the implication chain very often, say at $A_k\implies A_{k+1}$. To get through, you can try to find another route either from $A$ to $B$ or from $A_k$ to $A_{k+1}$. For example, if $A_k$ looks too weak for Leon to infer $A_{k+1}$, you can try to strengthen the hypothesis. At a high level, it is possible to find a sequence of statements $H_m, ..., H_k$, such that beginning from some $m$th step you can prove $A_m\implies A_{m+1}\wedge H_m$ and then $A_i\wedge H_{i-1} \implies A_{i+1}\wedge H_i$ for $i=m,...,k$ in turn, eventually reaching the desired result $A_k\wedge H_{k-1} \implies A_{k+1}$.
(In Leon, a lemma is just a boolean function. Hence, to strength $A_i \implies A_{i+1}$ to $A_i\wedge H_{i-1} \implies A_{i+1}\wedge H_i$, you have to add $H_{i-1}$ to the pre-condition and add $H_i$ to the post-condition of the boolean function that asserts $A_i \implies A_{i+1}$.)
In general, it is a good idea to made as strong assumption as possible for the proposition to verify, and set as weak requirements as possible for auxiliary lemmas. If the proposition assumes properties of a function (e.g. always returning nonzero values), putting the assumptions in the post-condition of that function can improve the odds Leon successfully verifying your proposition.
I observe that Leon performs far better at proving a property as a post-condition than as a proposition. In many cases, a property that takes complicated proof hints and dozens of auxiliary lemmas to be verified as a proposition can be verified directly as a post-condition. Unfortunately, currently I are not very sure what makes this difference.
Isolating a statement in a proof as a standalone lemma enables Leon to verify that statement via a separate induction process (by adding @induct to the new lemma), and to use a difference base case than the one, if any, used in the main proof.
Use @library and @ignore annotations to help Leon concentrate on the part of interest in your code. Also, don't forget to use the (currently undocumented) --watch command-line option to turn on continuous verification.
Avoid using higher-order functions. For example, try to define list operation from scratch without using helper functions such as map, filter, and foreach. The resulting code, while more tedious, is far easier to be verified by Leon as a whole.
If you have provided logically sufficient lemmas to prove a proposition, and you find that Leon is able prove all the lemmas but the proposition, then check that if @induct annotation is used. If it is, check that if you set a pre-condition for the proposition (by using require) to prove. If you do, then it is possible that the pre-condition is not satisfied in the base case, and thus Leon cannot finish the induction.
There seems to be some kind of non-determinism with Leon. In most cases, a property is either verifiable in milliseconds, or cannot be verified within even an hour. Hence, for efficiency, one usually set timeout as a small amount say 3 seconds. However, I notice that the time Leon takes to verify a lemma may vary drastically under some conditions yet unclear to me. For one of my lemmas, the time Leon takes to verify it ranges from 2 to 20 seconds in a sequence of 20 executions. Be careful not to waste time revising a proof that (suddenly) needs more time to be verified.
Be careful about cyclic arguments. Leon cannot detect cyclic arguments, and is likely to regard them as valid proofs. The simplest example for this logic fallacy may be as follows. If you provide two lemmas A and B, and claim that lemma A holds because of lemma B, and lemma B holds because of lemma A. Then Leon will report that both A and B holds.
More to come.

Thursday, July 16, 2015

Correctness specification for Pregel programs

In this post, we show how to prove correctness of programs written in the Pregel abstraction by specifying loop invariants and variants for the supersteps. This post is only a sketch and will be elaborated incrementally when I have time.

In the following examples, we use $G=(V,E)$ to denote the base graph. Without loss of generality, we maintain a global counter to record the number of supersteps taken so far. Let $C_v(n)$ denote the attribute of vertex $v$ in the $n$-th Pregel iteration. A state of a Pregel program is a tuple $(k, G, \{C_v(k): v \in V\})$, where $k$ denotes the value of the global counter. We shall use $c(s)$ to denote the counter value in state $s$. An invariant for a Pregel program is a sequence of assertions $\{f_n\}_{n\ge0}$ over program states, such that each $f_k$ holds after the $k$-th superstep (i.e., $c(s)=k$ implies $f_k(s)$ is true). Moreover, $f_{k+1}$ should hold given that $f_{k}$ holds and that there are active vertices after the $k$-th superstep. A variant $V$ for a Pregel program is mapping from program states to a poset $(P,\succ)$, such that i) $P$ doesn't contain a infinite descending chain, and ii) $s_1\leadsto s_2 \wedge c(s_2)>c(s_1)$ implies $V(s_1)\succ V(s_2)$. Here we simply set this poset to $(\mathbb N\cup\{0\}, >)$.

Our first example is the connected components algorithm from the Spark library. This algorithm, after it terminates, assigns each vertex the smallest vertex id among the vertices in the same connected components. The vertex attribute $C_v(n)$ records the the lowest vertex id that vertex $v$ has found up to the $n$-th superstep. Let $C_v(0)=id(v)$ for all $v\in V$. It can be shown that
Invariant: $f_n$ $\equiv$ "$C_v(n) = \min\{C_u(n-1) : u\in N(v)\}$"
Variant: $V \equiv \sum_{v\in V} C_v(n)$.

The second example is the shortest landmark distances algorithm from the Spark library. Given a set of landmarks $L \subset V$, the algorithm computes for each vertex $v$ a mapping $C_v(n): L \rightarrow Int$, such that $C_v(n)(u)$ records the shortest distances from $v$ to $u$ that vertex $v$ has found up to the $n$-th superstep. For all $v,u\in V$, let $C_v(0)(u) = 0$ if $v=u \in L$, and $C_v(0)(u) = \infty$ otherwise. It turns out that
Invariant: $f_n$ $\equiv$ "For each $u \in L$, $C_v(n)(u)$ is the distance of the shortest paths from $v$ to $u$ with at most $n$ edges."
Variant: $V \equiv \sum_{v\in V}\sum_{u\in L} C_v(n)(u)$.

Sometimes, loop invariants and variants cannot be found as intuitively even for simple Pregel programs. Consider the following vertex coloring algorithm that colors the vertices using at most $\Delta+1$ colors, where $\Delta$ denotes the maximum degree of the base graph. Also, let the set of available colors be cyclically ordered with cardinality $\Delta+1$.

    Initialize all vertices to the same color.
    while there exists $(v, u) \in E$ such that $v$ and $u$ share the same color do
        Let $w$ denote the vertex from ${v, u}$ with a larger vertex id.
        Set the color of $w$ to the minimal color larger than its current color.
    done

Obviously, there exists a $(\Delta+1)$-coloring for the graph. Besides, the algorithm terminates only if it found a proper coloring. Now the question is: Is it possible that this algorithm doesn't terminate, say, given some order in which the vertices are colored? A variant cannot be found as easily as in the previous examples, because the relation $\succ$ is anti-reflexive but the number of colliding vertices doesn't decrease strictly as the algorithm iterates. However, after some reflection, we can still specify a variant as follows:

Variant: $V \equiv $ the maximal number of vertices that collide with their neighbors.

More precisely, after the $n$-th superstep finishes, there exists at least $n$ vertices in the graph, such that the coloring of these $n$ vertices is a subset of a proper $(\Delta+1)$-coloring of the graph. Observe that existence of $V$ here happens to assure both the termination and the correctness of this vertex coloring algorithm.

Monday, June 15, 2015

Writing decentralized algorithms with Pregel: A case study

In this post, we demonstrate how to implement a purely functional decentralized algorithm using the Pregel abstraction offered by the Spark GraphX library.

A vertex coloring of a graph is a way to color the vertices such that no two adjacent vertices share the same color. Formally, given an undirected graph $G=(V,E)$, a k-coloring of $G$ is a map $f: V \rightarrow \{1,...,k\}$ such that $f(v) \neq f(u)$ for any $(v,u) \in E$. A graph is called k-colorable if it has a $k$-coloring.

It takes linear time to find a $(\Delta+1)$-coloring for any graph with maximum degree $\Delta$. On the other hand, finding a 4-coloring for a 3-colorable graph is already NP-hard. There are many randomized algorithms in the literature that color a graph in exponential time and linear space. Among them, the Communication-Free Learning (CFL) algorithm [1] is a fully decentralized algorithm inspired by the problem of allocating non-interfering channels in IEEE 802.11 wireless networks.

The CFL algorithm works as follows. For each vertex $v \in V$, let $C_n(v) \in \{1,...,k\}$ denote the color of $v$ in the $n$-th iteration. Also, let $P_n(v,c)$ denote the probability that vertex $v$ has color $c$ in the $n$-th iteration. That is, $\Pr\{C_n(v) = c\} = P_n(v,c)$. At $n = 0$, the color of each vertex $v$ is chosen uniformly at random, i.e., $P_0(v, c) = 1/k$ for all $v \in V$ and $c = 1,...,k$. For $n \ge 1$, the color of $v$ is chosen randomly according to the following rule: If no adjacent vertex to $v$ shares the same color with $v$, then we let $P_n(v, c) = 1$ for $c = C_{n-1}(v)$ and let $P_n(v, c) = 0$ otherwise. Note that this implies $C_n(v) = C_{n-1}(v)$. If there is a collision of color between $v$ and its neighbors, then we choose $C_n(v)$ according to probability distribution $P_n(v, \cdot)$, such that for $c = 1,...,k$,
$$ P_{n}(v,c)=\begin{cases} (1-\beta)\cdot P_{n-1}(v,c), & c=C_{n-1}(v)\\ (1-\beta)\cdot P_{n-1}(v,c)+\beta\cdot(k-1)^{-1}, & \mbox{otherwise,} \end{cases} $$ where $\beta \in (0,1)$ is a pre-determined parameter. Observe that the vertex colors stabilize if and only if they form a $k$-coloring.
The CFL algorithm can be executed via the following decentralized procedure:

Set up $C_0(v)$ and $P_0(v,\cdot)$ for each vertex $v$ and let all vertices be active.
A) An active vertex $v$ is deactivated if no vertices in $N(v)$ share the same color with $v$. An inactive vertex $v$ is activated if there is a vertex in $N(v)$ sharing the same color with $v$.
B) Each vertex $v$ computes $C_n(v)$ and $P_n(v,\cdot)$ upon the $n$-th change of the vertex state (from active to inactive or vice versa).
Repeat Step 2 until all vertices are simultaneously inactive.

Thus we implement the algorithm by carrying out the above procedure with Pregel. A vertex $v$ is represented as a 4-tuple

type Palette = (Color, List[Double], Boolean, Random)

composed of the vertex color $C_n(v)$, the color distribution $P_n(v,\cdot)$, the vertex state (active/inactive), and a random number generator in turn. Given a graph RDD graph, we first construct its base graph and initialize the attributes of the vertices:

val seed = Random.nextInt
val baseGraph = graph.mapVertices((id, _) => {
  val rng = new Random(seed + id)
  (rnd.nextInt(k + 1), initColorDist, true, rng)
})

Note that we choose to maintain an RNG at each vertex. One may feel tempted to declare a global RNG and share it among the vertices. However, since Spark serializes the closure as it dispatches jobs, doing so will lead each vertex to use a copy of the global RNG instead of sharing it. Hence, all vertices will obtain the same sequence of random numbers upon their queries, which is definitely an undesired situation.
We use functions sendMsg and mergeMsg to carry out Part A of the second step in the procedure:

def sendMsg (edge: EdgeTriplet[Palette, _]): Iterator[(VertexId, Boolean)] = {
  if (edge.srcAttr._1 == edge.dstAttr._1)
    return Iterator((edge.srcId, true))
  if (edge.srcAttr._3)
    return Iterator((edge.srcId, false))
  Iterator.empty
}
def mergeMsg (msg1: Boolean, msg2: Boolean) = msg1 || msg2

For each edge $(v, u)$, the sendMsg function can activate or deactivate the source vertex $v$ by sending it a Boolean message. The messages are aggregated by mergeMsg via disjunction, such that a vertex is activated if any of its neighbors wants to activate it, and is deactivated if all of its neighbors want to deactivate it.
Part B of the second step is carried out by the vprog function:

def vprog (id: VertexId, attr: Palette, active: Boolean): Palette = {
  val color = attr._1
  val dist  = attr._2
  val rng   = attr._4
  val newDist = dist.foldLeft((1, List[Double]())) {
    case ((i, list), weight) => (i + 1,
      if (active)
        list :+ (weight * (1 - beta) + (if (color == i) 0.0 else beta / (maxNumColors - 1)))
      else
        list :+ (if (color == i) 1.0 else 0.0))
  }._2
  val newColor = if (active) sampleColor(newDist, rng.nextDouble) else color
  (newColor, newDist, active, rng)
}

The vprog function uses a helper function colorSampler to color a vertex randomly according to its current color distribution:

def sampleColor (dist: List[Double], rnd: Double): Int = {
  dist.foldLeft((1, 0.0)) {
    case ((color, mass), weight) => {
      val m = mass + weight
      (if (m < rnd) color + 1 else color, m)
    }}._1
}

Finally, we invoke Pregel with the above functions to compute a $k$-coloring for the base graph baseG:

Pregel(baseGraph, true)(vprog, sendMsg, mergeMsg).mapVertices((_, attr) => attr._1)

It is shown in [2] that the CFL algorithm colors a graph in exponential time with high probability. More precisely, given any $\epsilon \in (0,1)$, the algorithm can find a $k$-coloring for a $k$-colorable graph with probability $1-\epsilon$ in $O(N\exp^{cN\lg 1/\epsilon})$ iterations, where $N$ is the number of vertices and $c$ is a constant depending on $\beta$.

References

1. D. J. Leith, P. Clifford, "Convergence of distributed learning algorithms for optimal wireless channel allocation." IEEE CDC, 2006.
2. Duffy, Ken R., et al. "Complexity analysis of a decentralised graph colouring algorithm." Information Processing Letters, 107.2, 2008.

Friday, January 23, 2015

Deterministic aggregate semantics in Scala and Spark

The aggregate function in Scala has method signature as follows:

def aggregate[A](z: => A)(seqOp: (A, B) => A, combOp: (A, A) => A): A

The semantics of aggregate depend on the collection upon which it is invoked. When aggregate is invoked on a sequential traversable collection such as a List, it fallbacks to foldLeft and ignores the provided compOp. In this case, aggregate behaves just like the ordinary fold in typical functional programming languages such as Lisp and ML. When aggregate is invoked on a parallel traversable collection, however, it performs a two-stage aggregation over the data elements in parallel. In Scala, a parallel collection can be splitted into partitions and distributed over several cores or nodes. Invoking aggregate on a parallel collection reduces a value for the collection by first folding each partition separately using seqOp, and then merging the results one by one using combOp.

Note that the Scala API document doesn't specify the way in which the collection is partitioned, as well as the order in which the intermediate results are merged. Hence, if a programmer hopes to obtain the same output whenever aggregate is invoked on the same parallel collection, he has to make sure himself that seqOp is associative and combOp is both associative and commutative.

Aggregate v.s fold

The fold function in Scala has method signature as follows:

def fold[A1 >: A](z: A1)(combOp: (A1, A1) => A1): A1

One can observe two differences between the signatures of fold and aggregate: 1) fold only needs one operator, while aggregate needs two; 2) fold requires the result type to be a supertype of the element type, while aggregate doesn't. Note that both methods fallback to foldLeft when they are invoked on a sequential collection. One may notice that Scala's aggregate is in effect the ordinary FP fold

fold : (A \to B \to A) \to A \to ([B] \to A)

and Scala's fold is just a special form of its aggregate. The idea behind this design is to allow for parallel FP fold. In Scala, fold and aggregate are both done in two stages, namely folding and combining. For fold, the same operator is used for folding and combining, and this requires the supertype relationship between the element and the result type. Without the supertype relationship, it will take two operators to fold and combine, and this is why Scala provides aggregate that takes two operators as arguments.

Aggregate in Spark

Spark implements its own aggregate operations for RDD. One can observe from the source code of Spark that the intermediate results of aggregate are always iterated and merged in a fixed order (see the Java, Python and Scala implementations for details). Hence, we don't have to use commutative combOp to obtain deterministic results from aggregate, given that at least one of the following conditions holds:
(i) there doesn't exist an empty partition in any partitioning of the collection, or
(ii) z is an identity of combOp, i.e., for all a of type A, combOp(z,a) = combOp(a,z) = a.
Note that this observation is not assured by the official Spark API document and is thus not guaranteed to hold in the future versions of Spark. However, it is still useful if we want to model the behaviours of a Spark program formally.

Below, we shall try to formalize Spark's implementation of aggregate. While Scala is an impure functional language, we only consider pure functions here for simplicity. Let $R(L,m)$ denote an RDD object obtained from partitioning a list $L \in \mathbb{D}^*$ into $m \ge 1$ sub-lists. Suppose that we have fixed an element $zero \in \mathbb{E}$, as well as operators $seq:{\mathbb{E} \times \mathbb{D} \to \mathbb{E}}$ and $comb:{\mathbb{E} \times \mathbb{E} \to \mathbb{E}}$. An invocation of aggregate is said to be deterministic with respect to $zero$, $seq$ and $comb$ if the output of $$R(L,\cdot).\textrm{aggregate}(zero)(seq,\ comb)$$ only depends on $L$. Define a function $\textrm{foldl}:({\mathbb{E} \times \mathbb{D} \to \mathbb{E}}) \times \mathbb{E} \times \mathbb{D}^* \to \mathbb{E}$ by \begin{eqnarray*} \textrm{foldl}(f,\ a,\ Nil) & = & a \\ \textrm{foldl}(f,\ a,\ b \!::\! bs) & = & \textrm{foldl}(f,\ f(a,b),\ bs), \end{eqnarray*} where $f\in {\mathbb{E} \times \mathbb{D} \to \mathbb{E}}$, $a \in \mathbb{E}$, $b \in \mathbb{D}$ and $bs \in \mathbb{D}^*$. Suppose that $1\le m \le |L|$ and $R(L,m)$ partitions $L$ into $\{L_1, \dots, L_m\}$ such that $L=L_1 \cdots L_m$. According to the source code of Spark, we can define the aggregate operation in Spark as \begin{eqnarray*} R(L,m).\textrm{aggregate}(zero)(seq,\ comb) = \textrm{foldl}(comb,\ zero,\ L'), \end{eqnarray*} where $L' \in \mathbb{E}^*$ is a list of length $m$ and $L'[i] = \textrm{foldl}(seq,\ zero,\ L_i)$ for $i=1,\dots,m$.

Note that the partitioning of $L$ is unspecified and may be non-deterministic. Hence, we have to use an associative $comb$ operator if we want to ensure that the result of aggregation is deterministic. Suppose that $comb$ is associative and at least one of conditions (i) and (ii) is satisfied. It turns out that the output of aggregate is deterministic with respect to $zero$, $seq$ and $comb$ if and only if for all lists $L_1$ and $L_2$, $$comb(zero,\ \textrm{foldl}(seq,\ zero,\ L_1L_2)) = comb(\textrm{foldl}(seq,\ zero,\ L_1),\ \textrm{foldl}(seq,\ zero,\ L_2)).$$ While the above condition is undecidable in general, it can be effectively checked for some classes of programs such as FSMs over finite alphabets.

Concluding remarks

Aggregate in Spark is the parallelized version of the conventional fold, which was already shown to be extremely useful in functional programming. It is therefore of practical interests to study the properties and behaviours of the aggregate function. In particular, it is interesting to establish and verify the conditions under which aggregate is deterministic with respect to the provided $zero$, $seq$ and $comb$. We may further investigate this problem in the coming posts.

Monday, June 16, 2014

Functors and Monads in Scala ─ Part I

A type constructor is a type that you can apply type arguments to construct a new type: given a type A and a type constructor M, you get a new type M[A]. A functor is a type constructor, say M, together with a function

    lift: (A -> B) -> M[A] -> M[B]

that lifts a function over the original types into a function over the new types. Moreover, we require that the lifting function should preserve composition and identities. That is, if function h is formed by composing functions g and f, then lift(h) should be equivalent to the composition of lift(g) and lift(f). Formally, this means

    h(a) = g(f(a))  implies  lift h a = lift g (lift f a).

Also, if h is an identity function on variables of type A, then lift(h) should be an identity function on variables of type M[A]:

    h(a) = a  implies  lift h a = a.

A monad is a functor associated with two special operations, unitizing and binding:

    unit: A -> M[A]
    bind: (A -> M[B]) -> M[A] -> M[B]

A unitizing operation takes an object of the original type and maps it to an object of the new type. A binding operation is useful in composing two monadic functions, i.e., functions with signature $*\rightarrow M[*]$. Using binding, the composition of two monadic functions f: A -> M[B] and g: B -> M[C] can be defined as

    a => bind g f(a),

which has type A -> M[C]. Composition of monadic functions is supposed to be associative and has unit as a left and right identity. For this purpose, definitions of bind and unit must satisfy three "monad laws". That is, the following three equations

1.    bind (a => bind g f(a)) la = bind g (bind (a => f(a)) la)
2.    bind f (unit a) = f(a)
3.    bind unit la = la

should hold for any values a with type A, la with type M[A], functions f with type A -> M[B] and g with type B -> M[C].

A monad can be thought of as representing computations just like a type represents values. These computations are operations waiting to be carried out. Presumably, the programmer has some interest in not carrying them out directly. For example, the computations to be done may not be fully known at compile time. One can find similar ideas of representing computations as objects in the Command design pattern of GoF. Regarding monads as computations, the purpose of unit is to coerce a value into a computation; the purpose of bind is to construct a computation that evaluates another computation and yields a value. Informally, unit gets us into a monad and bind gets us around the monad. To get out of the monad, one usually uses an operation with type M[A] -> A, though the precise type depends on the purpose of the monad at hand.

In Scala, a monad is conventionally treated like a collection associated with list operations: the unit operation corresponds to a constructor of singleton lists; lift and bind are named as map and flatMap, respectively, in the sense that lift maps a list to another list and bind maps a list of lists to another list of lists and then flattens the result to a one-layered list. Map and flatMap operations kind of subsume each other given the flatten operation, as

    flatMap g la = flatten (map g la)
    map (a => f(a)) la = flatMap (a => unit f(a)) la

Hence, the minimal set of operations a monad must provide can be {unit, flatMap} or {unit, map, flatten}. Scala doesn't define a base trait for Monad and leaves the decision of interface to the programmer. On the other hand, if a monad implements methods map, flapMap and filter, then it can be composed with other monads via for-comprehension to streamline monadic computations.

Examples

Optional values. Option (aka. Maybe) monads are used extensively in many functional programming languages: see here for a more general treatment of option monads, and here for a detailed explanation of their usages in Scala. Below is a minimalistic implementation of Option type, and its companion subtypes Some and None:

sealed trait Option[T] {
    def map[U](f: T => U):Option[U]
    def flatMap[U](f: T => Option[U]): Option[U]
}
case class Some[T](val t:T) extends Option[T] { 
    override def map[U](f: T => U): Option[U] = new Some[U](f(t))
    override def flatMap[U](f: T => Option[U]): Option[U] = f(t)
}
case class None[T] extends Option[T] {
    override def map[U](f: T => U): Option[U] = new None[U]
    override def flatMap[U](f: T => Option[U]): Option[U] = new None[U]
}

The canonical advantage of the option monad is that it allows composition of partial functions, i.e., functions that return a null pointer when they fail to give correct results. Instead, a partial function returns None (representing an "empty" Option) and this result may be composed with other functions. In other words, it allows transformation of partial functions into total ones. For example, suppose g1, g2 and g3 are functions that return either a null pointer or an addable value, and we want to print g3(g2(x) + g1(x)) given that the result is meaningful. Without Option, one has to use nested if-then null checks or a try-catch block to handle possible null inputs/outputs, for example,

try{ 
  val y = g3(g2(x) + g1(x))
  if(y != null) println(y)
}catch {
  case e : NullPointerException => ; 
}

If we make g1, g2 and g3 return an option monad via a wrapper, we can achieve the same purpose using for-comprehension, so that nothing would be printed if any of g1, g2 or g3 returns null:

def monadify[T](f: T => T) =
  (x: T) => { 
    val y = if(x!=null) f(x) else x
    if(y != null) Some(y) else None
  }
for {
  y <- monadify(g1)(x)
  z <- monadify(g2)(x)
  r <- monadify(g3)(y + z)
} yield println(r)

Whether the functional ("idiomatic") way of handling null pointers is preferable to the traditional one is still under debate, especially among sophisticated users of imperative programming languages. Since these two approaches differ very little in performance and Scala supports both of them, the choice is really only a matter of taste than truth IMHO.

An important thing to notice about this example is that the monad is used here as a control structure. We have got a sequence of operations chained together in a monad via for-comprehension. When we evaluate the monadic sequence, the implementation of the binding operator that combines monads actually does control flow management. This trick of using the chaining operator to insert control flow into monadic sequences is an incredibly useful technique and is used in Scala by a variety of monads beside Option, such as Try, Either, Future and Promise.

State transformers. A stateful computation can be modeled as a function that takes inputs with an initial state and produces an output paired with the new state. Due to the presence of states, a function may produce different results even though it is given the the same inputs. Let Env be a type representing the possible states of the environment. An Effect is a mapping from Env to Env that describes changes of states, i.e., the side-effects caused by a computation, and the monad type M adjoins an extra side-effect to any type:

    Effect = Env -> Env
    M[A] = (A, Effect)

The unit function pairs up a value with the identity on side-effects; the map function propagates a state leaving it unchanged:

    unit a = (a, e => e)
    (a, e) map f = (f(a), e)

If you want to adjoin two side-effects e and e', you can use bind to compose them and get a single effect e' ○ e, where ○ denotes function composition:

    (a, e) bind (a => (f(a), e')) = (f(a), e' ○ e)

In Scala, the state monad described above may be written as

case class M[+A](a: A, e: Env => Env) {
    private def bind[B](f: A => M[B]): M[B] = {
      val fa = f(a); M(fa.a, env => fa.e(e(env)))
    }
    override def toString = showM(this)
    def map[B](f: A => B): M[B] = bind(x => unitM(f(x)))
    def flatMap[B](f: A => M[B]): M[B] = bind(f)
}
def unitM[A](a: A) = M(a, env => env)

The following example shows a state monad that increments a counter each time an arithmetic operation is invoked. (Note that one can actually declare the counter as a field member, avoiding the use of monads. This may explain the viewpoint that state monads are arguably less important in impure FPs like Scala than in pure FPs like Haskell.) The add and div operations demonstrate how to handle a monad wrapped in another monad using nested for-comprehensions.

type Env = Int
// Ad hoc operations for this particular monad
def showM[A](m: M[A]) = "Value: " + m.a + "; Counter: " + m.e(0)
val tickS = M(0, e => e + 1)

// Following code demonstrates the use of monads
type Value = Double
def add(a: M[Option[Value]], b: M[Option[Value]]): M[Option[Value]] = 
  for {
    _  <- tickS
    aa <- a 
    bb <- b
    c = for {
      a <- aa if aa.nonEmpty
      b <- bb if bb.nonEmpty
    } yield a + b     // yield None if aa is None or bb is None
  } yield c

def div(a: M[Option[Value]], b: M[Option[Value]]): M[Option[Value]] = 
  for {
    _  <- tickS
    aa <- a
    bb <- b
    c = for {
      a <- aa if aa.nonEmpty
      b <- bb if bb.nonEmpty
      b <- Some(b) if b != 0
    } yield a / b     // yield None if b==0 or aa is None or bb is None
  } yield c

val const0 = unitM(Some(0.0))
val const1 = unitM(Some(1.0))
val const2 = unitM(Some(2.0))
val const3 = unitM(Some(3.0))
val expr = div(add(const3, add(const1, const2)), const2)  
println(expr)                            // Value: Some(3.0); Counter: 2
println(add(div(expr, const0), const1))  // Value: None; Counter: 5

The syntax sugar of for-comprehensions abstracts the details and help the user manipulate monads without getting involved in the underlying machinery of binding and lifting. The fact that the abstraction works as expected is a consequence of the monad laws, and it is up to the designer of the monad to assure that the implementation does keep the laws.

We just stop here as this post is getting too long. More examples, techniques and discussions will be introduced in the Part II of this tutorial.

References and resources

1. Monads are Elephants, Part 1 ~ Part 4, is a gentle and pragmatic tutorial that explains exactly what a Scala programmer needs to know to exploit monads. Scala Monads: Declutter Your Code With Monadic Design is one of the videos on YouTube that teaches you how to write modular and extensible programs with monadically structured code.
2. Many excellent papers and slides about monads could be found in the homepage of Philip Wadler, one of the most enthusiastic advocates of monadic design in the FP community. Among others, The Essence of Functional Programming and Monads for Functional Programming are two must-reads for anyone who is struggling to turn monadic reasoning into his intuitions.
3. For a general and rigid mathematical treatment of monads, see e.g., Arrows, Structures, and Functors: The Categorical Imperative, where Section 10.2 describes monads as generalized monoids in a way that basically makes everything that we can write down or model using abstract syntax a monad. Mathematically mature computer science readers will find everything they need to know about the subject in the celebrated book Categories for the Working Mathematician by Mac Lane, the co-founder of category theory.
4. There are also books available online that introduce programmers with moderate math background into the subject, including Category Theory for Computing Science by M. Barr and C. Wells, and Computational Category Theory by D.E. Rydeheard and R.M. Burstall. Also check this thread and references therein about the relations between category theory, type theory, and functional programming.

Friday, May 23, 2014

Scala: Mapping a set removes duplicates

Consider the following code:

def sumSizes(collection: Iterable[List[_]]): Int = collection.map(_.size).sum

The function shall count the accumulated size of all the contained Lists. The problem is that if collection is a Set, then after mapping the objects to their lengths, the result is still a Set and all Lists of the same size are reduced to a single representative. For example, sumSizes(List(List(1,2), List(3,4))) outputs 4 but sumSizes(Set(List(1,2), List(3,4))) mistakenly outputs 2. To avoid such pitfalls one may write the method as

/* version 1 */
def sumSizes(collection: Iterable[List[_]]): Int = collection.toSeq.map(_.size).sum
/* version 2 */
def sumSizes(collection: Iterable[List[_]]): Int = collection.foldLeft(0)(_+_.size)

The first version requires an unnatural discipline of "always using toSeq before mapping" in programming, and worse, it introduces an overhead of allocating memory and copying elements which is seldomly necessary. The second version doesn't causes overheads, but it is asymmetric and thus less elegant. It is even potentially less efficient in a context where the underlying collection is parallel, since sum, which is implemented as fold(0)(_+_) in the Scala library, can be optimized better than foldLeft or foldRight in such a context.

Accidental removal of duplicates is often harder to spot in real code, especially when you write utility code which simply takes a Traversable and/or use the output of methods which are declared to return a Traversable, which may bring surprising results when the implementation happens to be a Set. Imagine the following situation:

val edgeNum = graph.nodes.map(_.edges).map(_.size).sum / 2

You fetch a collection of Node objects from a graph, use them to get their adjacent edges and sum over them. This works if graph.nodes returns a Seq, and it ruins if someone decides to make Graph return its nodes as a Set. A way to defend against surprises is to explicitly label types. For example, annotating methods with return types, even if it's not required. The annotation serves as a caution for the programmer if he wants to change the type of graph.nodes from Seq to Set.

All in all, the best practice to prevent these types of errors is good unit test coverage with representative data, together with sensible API's coupled with knowledge of how the scala compiler maintains source types through map/for generators etc. If you are returning a Set of something you should make that obvious, as returning a Collection/Traversable is hiding a relevant implementation detail and that sometimes causes unexpected problems.

Saturday, May 3, 2014

Scala: Subtyping

When a class inherits from another, the first class is said to be a nominal subtype of the other one. It is a nominal subtype because each type has a name, and the names are explicitly declared to have a subtyping relationship. On the other hand, a structural subtype relationship holds if two types have the same members. While the compiler only verifies this property at the level of type checking, a subtype should correctly implement the contracts of its supertypes so that Liskov's Substitution Principle applies, which essentially says that beside supporting the same operations, the operations in the subtype should require no more and provide no less than the corresponding operations in the supertype.

The Scala compiler allows a type’s subtypes to be used as a substitute wherever that type is required. For classes and traits that take no type parameters, the subtype relationship simply mirrors the subclass relationship. For classes and traits that take type parameters, however, variance comes into play. For example, class List is covariant in its type parameter, and thus List[Cat] is a subtype of List[Animal] if Cat is a subtype of Animal. These subtype relationships exist even though the class of each of these types is List. By contrast, class Array is nonvariant in its type parameter, so Array[Cat] is not a subtype of Array[Animal]. See here for a more detailed explanation of Scala's variation rules.

Subtyping rules of function types

Suppose that Cat and Dog are subtypes of type Animal. Consider function types A and B defined as follows:
type A := Animal => Dog
type B := Cat => Animal
What should be the subtyping relation between A and B? According to Liskov's Substitution Principle, $T$ is a subtype of $T'$ iff whatever we can do with $T'$, we can do it with $T$. In this example, we can pass a cat to a function of type B and get some animal in return. We can do the same thing to A: if we pass a cat to a function of type A, it returns a dog, which is an animal. Hence, A satisfies the same contract with B (in fact its contract says more than that, as it guarantees that the returned animal is a dog), and thus A is a subtype of B.

In general, if we denote the "subtype of" relation by $<$, then we can express the subtyping rule as $(A \Rightarrow B) < (C \Rightarrow D)$ iff $C < A$ and $B < D$. To see this, note that arguments are something that’s required, whereas return values are something that’s provided. To satisfy Liskov's Substitution Principle, a function subtype must require less and provide more than its supertype. Since a subtype contains more information than its supertype, a reasonable function type should be contravariant in the argument types and covariant in the result type.

Behavioral subtyping

Let A and B be two types. We say A is a behavioral subtype of B iff the set of properties an object of type A satisfies is a superset of the properties satisfied by an object of type B. In terms of Liskov's principle, this means that a property provable by objects of type B should also be provable by objects of type A.

Behavioral subtyping is a stronger notion than classical subtyping rules of function types, which rely merely on type signatures. It is a semantic rather than syntactic relation because it intends to guarantee semantic interoperability of types in a hierarchy. Behavioral subtyping is thus trivially undecidable: if the property to prove is "a function always terminates for arguments of type A", then it is impossible in general for a program (e.g. a compiler) to verify that property holds for any subtype of A.

Behavioral subtyping of function types imposing requirements on signatures just like the classical subtyping rules. In addition, there are a number of behavioral conditions a function subtype must meet:

Preconditions cannot be strengthened in a subtype.
Postconditions cannot be weakened in a subtype.
Invariants of the supertype must be preserved in a subtype.
History constraint: objects are regarded as being modifiable only through their methods.

A violation of history constraint can be illustrated by defining a mutable point as a subtype of an immutable point. On the other hand, fields added to the subtype may be safely modified because they are not observable through the supertype methods. Thus, one may derive a circle with fixed center but mutable radius from immutable point without violating subtyping rules.

Thursday, March 13, 2014

Scala: Views of Collections

In Scala, you can create Views for many types of collections. Views are non-strict versions of collections meant to be treated much like a database view. This means that the elements are calculated at access and not eagerly as in normal collections. As an example consider the following code:

val ys = (1 to 10).view.map { x => println(x); }

This will not print anything but every access to the list will perform the calculation and print the value, i.e. every call to ys.head will result in a number being printed. If you want to get a strict version of the collection again you can call ys.force, which will print all numbers out.

One use of views is to traverse a collection of values that are expensive to compute and only one value is needed at a time. Also, views let you build lazy sequences by calling toStream on them, which will also cache the evaluated elements. Another use of views is to avoid intermediate copies of collection. For example, the following code

(1 to 10).map(_-1).map(_*2).map(-_)

creates a new list for each call of map operation. To avoid the intermediate results, one can turn the list first into a view, applying all transformations to the view, and finally forcing the view to a sequence:

(1 to 10).view.map(_-1).map(_*2).map(-_).force

The three stored functions (_-1), (_*2), (-_) get applied as part of the execution of the force operation and a new sequence is constructed. That way, no intermediate data structure is needed. Another example of the use of view would be (assuming that we have very little memory at hand):

scala> (1 to 1000000000).take(5).toList
java.lang.OutOfMemoryError: GC overhead limit exceeded

Here Scala tries to create a collection with 100000000 elements to access the first five. Using view, only the first five elements will be generated and accessed:

scala> (1 to 1000000000).view.take(5).force.toList
res2: List[Int] = List(1, 2, 3, 4, 5)

The third use case applies to views over mutable sequences. Many transformer functions on such views provide a window into the original sequence that can then be used to update selectively some elements of that sequence. The view does not copy these elements, it just provides a reference to them. For example,

def neg(a: collection.mutable.Seq[Int]) = a.indices.foreach(i => {a(i) = -a(i)})
var A = Array(1,2,3,4,5,6)
neg(A.slice(1,4))      // A doesn't change
neg(A.view.slice(1,4)) // A becomes (1,-2,-3,-4,-5,6)

Remarks. For smaller collection sizes the added overhead of forming and applying closures in views is often greater than the gain from avoiding the intermediary data structures. A possibly more important reason is that evaluation in views can be very confusing if the delayed operations have side effects. Because of the penalties incurred by views, one should usually force it after applying the transformations, or keep it as a view if only few elements are expected to ever be fetched, compared to the total size of the view.

Thursday, February 20, 2014

Scala: Duck Typing

Scala offers a functionality known as structural typing, which allows to set a behaviour very similar to what dynamic languages allow to do when they support duck typing. The main difference between the two is that structural typing is a type-safe, static-typed implementation checked up at compile time. This means that you can create a method that receives an expected duck. At compile time, it would be checked that anything passed to the method can actually quack like a duck.
We give some examples below to show how structural typing is used in Scala.

Example 1

case class ForInt(x: Int) {
  def myPrint() = println("Int: " + x)
}
case class ForDouble(x: Double) {
  def myPrint() = println("Double: " + x)
}
case class ForString(x: String) {
  def myPrint() = println("String: '" + x + "'")
}
val forInt: ForInt = new ForInt(3)
val forString: ForString = new ForString("Hello World")
val forDouble: ForDouble = new ForDouble(3.14159)
/* 
* This is like defining a "Printable" interface in Java. With structural typing,
* however, a type doesn't have to implement a Printable interface explicitly 
* in order to be recognized as printable. Scala compiler will check that for you.
*/
type T = { def myPrint(): Unit } 
val list: List[T] = List[T](forInt, forString, forDouble)
list.foreach(_.myPrint())

Example 2

def using[A <: {def close(): Unit}, B](res: A)(op: A => B):
  B = try op(res) finally res.close
 
def writeToFile(path: String, data: String): Unit = 
  using(new BufferedWriter(new OutputStreamWriter(
    new FileOutputStream(path), "UTF-8")))(_.write(data))
 
def appendToFile(path: String, data: String): Unit =
  using(new BufferedWriter(new OutputStreamWriter(
    new FileOutputStream(path, true), "UTF-8")))(_.write(data))
  
def printToFile(path: String)(op: PrintWriter => Unit): Unit = 
  using(new PrintWriter(new BufferedWriter(new OutputStreamWriter(
    new FileOutputStream(path), "UTF-8"))))(op)

Technical remarks. In Scala 2.10 or later you should either add

import scala.language.reflectiveCalls

to any classes that use structural typing, or add the -language:reflectiveCalls option to the scalac command line, otherwise you will get a compile warning. For more info on why, see SIP 18 for more details.

Calling methods in structural types is made using reflection, which means it will be much slower than a normal method call using a trait. For that reason, structural types in Scala should generally be avoided. Use them when you need them and in situations where the performance impact won't be too significant. See here for arguments on other points against the usefulness of structural typing in Scala. Another concern of duck typing is that it is not type safe. You can do something similar to duck typing in a type-safe and performant manner is to use type classes. Unfortunately, type classes cannot be realized very well in a JVM language, as Java bytecode is type-erasure. See this paper for a proposal arguably better than reflection to do structural typing in JVM languages.

Wednesday, February 12, 2014

Scala: Case Classes

In essential, case classes can be seen as plain and immutable data-holding objects that should exclusively depend on their constructor arguments.
Case classes differ with ordinary classes in that they

automatically define hashCode, equals, and toString methods
automatically define getter methods for the constructor arguments

This functional concept allows us to

use a compact initialization syntax, e.g., Node(Leaf(2), EmptyLeaf)
decompose them using pattern matching
have equality comparisons implicitly defined

Case classes are by default immutable data structures, as they implicitly use val constructor parameters. They are suitable to works with hash maps or sets, since they have a valid, stable hashCode. One can override the default behavior by prepending each constructor argument with var instead of val for case classes. In this way, you get the same getter/setter generation just like ordinary classes. However, making case classes mutable causes their equal and hashCode methods to be time variant. If an object performs stateful computations on the inside or exhibits other kinds of complex behaviour, it should instantiate an ordinary class instead of a case class. [1]

A case class automatically creates a companion object with the same name as the class, which contains apply and unapply methods. The apply method enables constructing instances without prepending with new. The unapply extractor method enables the pattern matching that mentioned in Note I. The compiler optimizes the speed of pattern matching for case classes. [2]

In combination with inheritance, case classes are often used to mirror algebraic data types. A very simple example of such types are trees. A binary tree, for instance, can be implemented like this:

sealed abstract class Tree
case class Node(left: Tree, right: Tree) extends Tree
case class Leaf[A](value: A) extends Tree
case object EmptyLeaf extends Tree

That enable us to do the following:

// DSL-like assignment:
val treeA = Node(EmptyLeaf, Leaf(5))
val treeB = Node(Node(Leaf(2), Leaf(3)), Leaf(5))
// modification through cloning:
val treeC = treeA.copy(left = treeB.left)
// Pretty printing
println("Tree A: " + treeA)
println("Tree B: " + treeB)
println("Tree C: " + treeC)
// Comparison
println("Tree A == Tree B: %s" format (treeA == treeB).toString)
println("Tree B == Tree C: %s" format (treeB == treeC).toString)
// Pattern matching
treeA match {
  case Node(EmptyLeaf, right) => println("Can be reduced to " + right)
  case Node(left, EmptyLeaf) => println("Can be reduced to " + left)
  case _ => println(treeA + " cannot be reduced")
}

Note that trees construct and deconstruct (through pattern match) with the same syntax, which is also exactly how they are printed (minus spaces).

Technically, one may treat case classes as instances of Product and thus inherit these methods:

def productElement(n: Int): Any
def productArity: Int
def productIterator: Iterator[Any]

where the productArity returns the number of class parameters, productElement(i) returns the ith parameter, and productIterator allows iterating through them.

References

1. Case classes are cool
2. Case Classes and Extractors, p.15

Saturday, January 11, 2014

Scala: Applications of implicit constructs

Prefix-to-Infix conversion. Suppose we have $F: Int \rightarrow Double \rightarrow Boolean$ and we would like to define an infix operator $\otimes$ such that $a \otimes b = F(a, b)$. The infix call is technically interpreted as a.⊗(b), so we have to add a $\otimes$ method to class Int. However, we cannot do this easily because Int is defined in the core Scala library. This situation is where Scala's implicit conversion can come to play: an implicit class defined as follows would perfectly serve our purpose:

implicit class Converter(b: Double) {
  def ⊗(a: Int): Boolean = F(a, b)
}

Note that the class name doesn't matter. The point is to use the keyword implicit and provide the function $\otimes$ with the correct signature. Now, when the compiler encounters something like a⊗b and tries to resolve the method call, it does not find a method named "$\otimes$" in class Int. It will then look for any implicit conversions it could use to transform a into something that has a method named "$\otimes$" with an applicable signature. Assuming our Converter class is in the right scope, the compiler will implicitly construct a Converter object from a and call $\otimes$ on that object instead.
PS. Scala defines a default converter from type Any to type String, which simply invokes the toString method of Any.

Type classes. Suppose we have a function $F: A \rightarrow B$ and we would like it to work only for a restricted subset $A'$ of $A$. There is no way to express such type constraint through inheritance. In Scala, however, we can use the type class pattern to set the constraint. Here is the idea: 1) create an abstract class that will represent a type class, 2) create elements representing the classes belonging to that type class, and 3) use an implicit parameter to restrict the type parameter of $F$ to that type class.
More concretely, suppose $A$ is the Numeric type and we would like $F$ to work only for Int and Long, but not any other subtype of Numeric. We first define an abstract type class:

abstract class Acceptable[T]

Next, we create the members of the type class, one for Int and another for Long. To make them available without need for import statements, we put them inside the companion class to Acceptable.

object Acceptable {
  implicit object AllowInt extends Acceptable[Int]
  implicit object AllowLong extends Acceptable[Long]
}

Finally, we allow $F$ to accept only type Int and Long by letting it take an implicit parameter:

def F[T](t: T)(implicit constraint: Acceptable[T]) = ...

Now, if you try to call $F(x)$ on an argument $x$ of type other than Int and Long, you will get an error at compile time, because the compiler cannot find the implicit object for that type. You can increase the number of allowed types by defining new implicit objects in the right scope.