A vertex coloring of a graph is a way to color the vertices such that no two adjacent vertices share the same color. Formally, given an undirected graph $G=(V,E)$, a k-coloring of $G$ is a map $f: V \rightarrow \{1,...,k\}$ such that $f(v) \neq f(u)$ for any $(v,u) \in E$. A graph is called k-colorable if it has a $k$-coloring.
It takes linear time to find a $(\Delta+1)$-coloring for any graph with maximum degree $\Delta$. On the other hand, finding a 4-coloring for a 3-colorable graph is already NP-hard. There are many randomized algorithms in the literature that color a graph in exponential time and linear space. Among them, the Communication-Free Learning (CFL) algorithm [1] is a fully decentralized algorithm inspired by the problem of allocating non-interfering channels in IEEE 802.11 wireless networks.
The CFL algorithm works as follows. For each vertex $v \in V$, let $C_n(v) \in \{1,...,k\}$ denote the color of $v$ in the $n$-th iteration. Also, let $P_n(v,c)$ denote the probability that vertex $v$ has color $c$ in the $n$-th iteration. That is, $\Pr\{C_n(v) = c\} = P_n(v,c)$. At $n = 0$, the color of each vertex $v$ is chosen uniformly at random, i.e., $P_0(v, c) = 1/k$ for all $v \in V$ and $c = 1,...,k$. For $n \ge 1$, the color of $v$ is chosen randomly according to the following rule: If no adjacent vertex to $v$ shares the same color with $v$, then we let $P_n(v, c) = 1$ for $c = C_{n-1}(v)$ and let $P_n(v, c) = 0$ otherwise. Note that this implies $C_n(v) = C_{n-1}(v)$. If there is a collision of color between $v$ and its neighbors, then we choose $C_n(v)$ according to probability distribution $P_n(v, \cdot)$, such that for $c = 1,...,k$,
$$ P_{n}(v,c)=\begin{cases} (1-\beta)\cdot P_{n-1}(v,c), & c=C_{n-1}(v)\\ (1-\beta)\cdot P_{n-1}(v,c)+\beta\cdot(k-1)^{-1}, & \mbox{otherwise,} \end{cases} $$ where $\beta \in (0,1)$ is a pre-determined parameter. Observe that the vertex colors stabilize if and only if they form a $k$-coloring.
The CFL algorithm can be executed via the following decentralized procedure:
- Set up $C_0(v)$ and $P_0(v,\cdot)$ for each vertex $v$ and let all vertices be active.
-
A) An active vertex $v$ is deactivated if no vertices in $N(v)$ share the same color with $v$. An inactive vertex $v$ is activated if there is a vertex in $N(v)$ sharing the same color with $v$.
B) Each vertex $v$ computes $C_n(v)$ and $P_n(v,\cdot)$ upon the $n$-th change of the vertex state (from active to inactive or vice versa). - Repeat Step 2 until all vertices are simultaneously inactive.
type Palette = (Color, List[Double], Boolean, Random)composed of the vertex color $C_n(v)$, the color distribution $P_n(v,\cdot)$, the vertex state (active/inactive), and a random number generator in turn. Given a graph RDD
graph
, we first construct its base graph and initialize the attributes of the vertices:val seed = Random.nextInt val baseGraph = graph.mapVertices((id, _) => { val rng = new Random(seed + id) (rnd.nextInt(k + 1), initColorDist, true, rng) })Note that we choose to maintain an RNG at each vertex. One may feel tempted to declare a global RNG and share it among the vertices. However, since Spark serializes the closure as it dispatches jobs, doing so will lead each vertex to use a copy of the global RNG instead of sharing it. Hence, all vertices will obtain the same sequence of random numbers upon their queries, which is definitely an undesired situation.
We use functions
sendMsg
and mergeMsg
to carry out Part A of the second step in the procedure:def sendMsg (edge: EdgeTriplet[Palette, _]): Iterator[(VertexId, Boolean)] = { if (edge.srcAttr._1 == edge.dstAttr._1) return Iterator((edge.srcId, true)) if (edge.srcAttr._3) return Iterator((edge.srcId, false)) Iterator.empty } def mergeMsg (msg1: Boolean, msg2: Boolean) = msg1 || msg2For each edge $(v, u)$, the
sendMsg
function can activate or deactivate the source vertex $v$ by sending it a Boolean message. The messages are aggregated by mergeMsg
via disjunction, such that a vertex is activated if any of its neighbors wants to activate it, and is deactivated if all of its neighbors want to deactivate it.Part B of the second step is carried out by the
vprog
function:def vprog (id: VertexId, attr: Palette, active: Boolean): Palette = { val color = attr._1 val dist = attr._2 val rng = attr._4 val newDist = dist.foldLeft((1, List[Double]())) { case ((i, list), weight) => (i + 1, if (active) list :+ (weight * (1 - beta) + (if (color == i) 0.0 else beta / (maxNumColors - 1))) else list :+ (if (color == i) 1.0 else 0.0)) }._2 val newColor = if (active) sampleColor(newDist, rng.nextDouble) else color (newColor, newDist, active, rng) }The
vprog
function uses a helper function colorSampler
to color a vertex randomly according to its current color distribution:def sampleColor (dist: List[Double], rnd: Double): Int = { dist.foldLeft((1, 0.0)) { case ((color, mass), weight) => { val m = mass + weight (if (m < rnd) color + 1 else color, m) }}._1 }Finally, we invoke
Pregel
with the above functions to compute a $k$-coloring for the base graph baseG
:Pregel(baseGraph, true)(vprog, sendMsg, mergeMsg).mapVertices((_, attr) => attr._1)It is shown in [2] that the CFL algorithm colors a graph in exponential time with high probability. More precisely, given any $\epsilon \in (0,1)$, the algorithm can find a $k$-coloring for a $k$-colorable graph with probability $1-\epsilon$ in $O(N\exp^{cN\lg 1/\epsilon})$ iterations, where $N$ is the number of vertices and $c$ is a constant depending on $\beta$.
References
1. D. J. Leith, P. Clifford, "Convergence of distributed learning algorithms for optimal wireless channel allocation." IEEE CDC, 2006.2. Duffy, Ken R., et al. "Complexity analysis of a decentralised graph colouring algorithm." Information Processing Letters, 107.2, 2008.