This is ericpony's blog

Sunday, March 22, 2015

Designing algorithms with Spark aggregate – Part II

In the previous post, I proposed a framework for Spark aggregate and listed several examples derived from the framework. One may observe that the $\mathrm{comb}$ functions are associative and commutative in all but the last example. One may also observe that the results of aggregate calls are deterministic in all but the last example. Now a question rises: is it necessary to use an associative and commutative $\mathrm{comb}$ function in an aggregate call, so that the result can be deterministic? The answer is (somewhat surprisingly) no. In this post, I shall give a simple guideline to find deterministic aggregate examples with non-commutative $\mathrm{comb}$ functions.

Given domain $A$ and range $B$, the idea is to find $B' \subset B$, $\mathrm{seq}: B \rightarrow A \rightarrow B'$ and $\mathrm{comb}: B \rightarrow B \rightarrow B'$, such that $\mathrm{aggregate(zero,\ seq,\ comb,}\ as)$ is deterministic w.r.t. the provided $\mathrm{zero} \in B$ and all $as \in A^+$. Moreover, $\mathrm{comb}$ is non-commutative over $B$, but the restriction of $\mathrm{comb}$ to $B'$ is associative and commutative.

The above idea can be instantiated using an abelian subgroup $(B', ⊗)$ of a non-abelian group $(B, ⊗)$. For example, we can let $B'$ be a cyclic subgroup of a symmetric group $B$. Here ⊗ is function composition and $A$ is the set of integers:

$\quad \mathrm{zero}$: an arbitrary element from $B$.
$\quad \mathrm{seq}\ (b, n) = b^n$.
$\quad \mathrm{comb}\ (b_1, b_2) = b_1 ⊗ b_2$.

Observer that the range of $\mathrm{comb}$ is confined to $B'$ due to the closure property of group operators. For another example, we can let $B$ be the group comprised of all non-singular 2-by-2 square matrices, and $B'$ be the subgroup formed by the rotation matrices in $B$. Here we set $A = B$ and let ⊗ be matrix multiplication:

$\quad \mathrm{zero}$: the identity matrix.
$\quad \mathrm{seq}\ (b, a) = b ⊗ [a]$, where $[x] = x \cdot |\det(x)|^{-1}$, the normalized matrix of $x$.
$\quad \mathrm{comb}\ (b_1, b_2) = b_1 ⊗ b_2$.

We can also construct examples using an abelian base group. We just have to "twist" the $\mathrm{comb}$ function such that it is commutative only over $B'$. For example, let $B$ be the set of n-by-n square matrices, and B' be the subset of $B'$ formed by all matrices whose last row is zero. Moreover, set $A = B$ and use matrix addition $+$ as a commutative group operator. Then we have the following example:

$\quad \mathrm{zero}$: the zero matrix.
$\quad \mathrm{seq}\ (b, a) = b + \{a\}$, where $\{x\}$ is obtained from setting the last row of ${x}$ to zero.
$\quad \mathrm{comb}\ (b_1, b_2) = b_1 + \langle b_2\rangle$, where $\langle x\rangle$ is obtained from doubling the last row of $x$.

Observe that $\mathrm{seq}$ and $\mathrm{comb}$ behave just like ordinary matrix additions over $B'$. They however become non-commutative operators when they are applied to elements outside $B'$. It turns out that we can use a textbook of elementary abstract algebra as a free source of interesting aggregate examples, at least in theory.

Related Posts Plugin for WordPress, Blogger...