This is ericpony's blog

Thursday, July 16, 2015

Correctness specification for Pregel programs

In this post, we show how to prove correctness of programs written in the Pregel abstraction by specifying loop invariants and variants for the supersteps. This post is only a sketch and will be elaborated incrementally when I have time.

In the following examples, we use $G=(V,E)$ to denote the base graph. Without loss of generality, we maintain a global counter to record the number of supersteps taken so far. Let $C_v(n)$ denote the attribute of vertex $v$ in the $n$-th Pregel iteration. A state of a Pregel program is a tuple $(k, G, \{C_v(k): v \in V\})$, where $k$ denotes the value of the global counter. We shall use $c(s)$ to denote the counter value in state $s$. An invariant for a Pregel program is a sequence of assertions $\{f_n\}_{n\ge0}$ over program states, such that each $f_k$ holds after the $k$-th superstep (i.e., $c(s)=k$ implies $f_k(s)$ is true). Moreover, $f_{k+1}$ should hold given that $f_{k}$ holds and that there are active vertices after the $k$-th superstep. A variant $V$ for a Pregel program is mapping from program states to a poset $(P,\succ)$, such that i) $P$ doesn't contain a infinite descending chain, and ii) $s_1\leadsto s_2 \wedge c(s_2)>c(s_1)$ implies $V(s_1)\succ V(s_2)$. Here we simply set this poset to $(\mathbb N\cup\{0\}, >)$.

Our first example is the connected components algorithm from the Spark library. This algorithm, after it terminates, assigns each vertex the smallest vertex id among the vertices in the same connected components. The vertex attribute $C_v(n)$ records the the lowest vertex id that vertex $v$ has found up to the $n$-th superstep. Let $C_v(0)=id(v)$  for all $v\in V$. It can be shown that
Invariant: $f_n$ $\equiv$ "$C_v(n) = \min\{C_u(n-1) : u\in N(v)\}$"
Variant: $V \equiv \sum_{v\in V} C_v(n)$.

The second example is the shortest landmark distances algorithm from the Spark library. Given a set of landmarks $L \subset V$, the algorithm computes for each vertex $v$ a mapping $C_v(n): L \rightarrow Int$, such that $C_v(n)(u)$ records the shortest distances from $v$ to $u$ that vertex $v$ has found up to the $n$-th superstep. For all $v,u\in V$, let $C_v(0)(u) = 0$ if $v=u \in L$, and $C_v(0)(u) = \infty$ otherwise. It turns out that
Invariant: $f_n$ $\equiv$ "For each $u \in L$, $C_v(n)(u)$ is the distance of the shortest paths from $v$ to $u$ with at most $n$ edges."
Variant: $V \equiv \sum_{v\in V}\sum_{u\in L} C_v(n)(u)$.

Sometimes, loop invariants and variants cannot be found as intuitively even for simple Pregel programs. Consider the following vertex coloring algorithm that colors the vertices using at most $\Delta+1$ colors, where $\Delta$ denotes the maximum degree of the base graph. Also, let the set of available colors be cyclically ordered with cardinality $\Delta+1$.
  Initialize all vertices to the same color.   while there exists $(v, u) \in E$ such that $v$ and $u$ share the same color do       Let $w$ denote the vertex from ${v, u}$ with a larger vertex id.       Set the color of $w$ to the minimal color larger than its current color.   done
Obviously, there exists a $(\Delta+1)$-coloring for the graph. Besides, the algorithm terminates only if it found a proper coloring. Now the question is: Is it possible that this algorithm doesn't terminate, say, given some order in which the vertices are colored? A variant cannot be found as easily as in the previous examples, because the relation $\succ$ is anti-reflexive but the number of colliding vertices doesn't decrease strictly as the algorithm iterates. However, after some reflection, we can still specify a variant as follows:

Variant: $V \equiv $ the maximal number of vertices that collide with their neighbors.

More precisely, after the $n$-th superstep finishes, there exists at least $n$ vertices in the graph, such that the coloring of these $n$ vertices is a subset of a proper $(\Delta+1)$-coloring of the graph. Observe that existence of $V$ here happens to assure both the termination and the correctness of this vertex coloring algorithm.

Related Posts Plugin for WordPress, Blogger...