This is ericpony's blog

Saturday, March 4, 2017

A note on parallelizable functions

 click me to see the origin
Yesterday I was accidentally asked that what kind of problems can be benefited the most from distributed computing frameworks such as MapReduce. I responded immediately that any problem that can be tackled by divide-and-conquer strategies should fit in the framework. However, on second thought, I found the answer missing the point because it remains unclear as to what kind of problems are solvable by divide and conquer. For example, sorting can be done in this way because sort(a++b) can be obtained from sort(a) and sort(b). How about the Boolean function is_sorted? The value of is_sort(a++b) can not be determined merely from the return values of is_sort(a) and is_sort(b). So there must be some characteristics that tell sort and is_sort apart.
Let's call a function parallelizable if it can be computed by divide and conquer. It is possible to "lift" a non-parallelizable function to a parallelizable one, such that the output of the original function can be obtained from that of the lifted function. In the above example, function is_sort can be made parallelizable by augmenting its return value with additional information max(s) and min(s) for input list s, since $a$$+\!\!+$$b$ is sorted in ascending order if and only if $a$ and $b$ are sorted in ascending order and $\max(a)\le \min(b)$. This parallelization is also effective because it reduces the time complexity from linear to logarithmic in appropriate computation models. So here we can ask several interesting questions: 1) What functions are parallelizable? 2) Is it always possible to lift a non-parallelizable function to a parallelizable one? 3) Do parallelizable functions always have efficient parallel implementations?

Homomorphism and Parallelism

After a brief survey, I found the rough ideas in my mind actually fall into a popular line of research over a long period of time. Particularly, there is a well-known result connecting homomorphisms with functions that admit divide-and-conquer imple-mentations.
Definition. A function $f : [A]\rightarrow B$ is $\odot$-homomorphic for operator $\odot:B\times B\rightarrow B$ if $f(s$$+\!\!+$$t) = f(s)\odot f(t)$ for any lists $s$ and $t$ on $[A]$, the set of finite lists of type $A.$
Note that the definition implies that $\odot$ is associative on $B$, since $+\!\!+$ is associative on $[A].$ Also, $f([\ ])$ is a unit of $\odot$ if $f$ is defined on $[\ ]$, which is a unit of $+\!\!+$ on $[A]$.
The First Homomorphism Theorem. ([1]) A function $f : [A]\rightarrow B$ can be computed by divide and conquer if and only if it is $\odot$-homomorphic for some $\odot$ on $B$.
It is clear that the $\odot$ operator captures the semantic of "combine" in a divide-and-conquer algorithm. In MapReduce, the conquer phase is done by a mapper and the combine phase by a reducer. However, as MapReduce does not specify the order of reduction, the $\odot$ operator (i.e. the reducer) has to be commutative for the outcome to be deterministic [2].
The above theorem tells us that a function is parallelizable iff it is homomorphic. Interestingly, it turns out that any non-homomorphic function can be trivially lifted to a homomorphism using a trick called "tupling". For example, this trick would lift $is\_sort: [Num]\rightarrow Bool$ to a function $is\_sort': [Num]\rightarrow [Num]\times Bool$ that is $\odot$-homomorphic with respect to $(s, a)\odot (t, b) \equiv (s$$+\!\!+$$t,\ is\_sort(s$$+\!\!+$$t))$. The parallel implementation based on this lift is clearly no better than its sequential counterpart. Similarly, $sort:[Num]\rightarrow[Num]$ is $\odot$-homomorphic when $\odot$ is defined by $s \odot t \equiv sort'(s$$+\!\!+$$t)$, where $sort'$ can be any sorting algorithm such as insertion sort, though the resulted parallel algorithm is inefficient and useless. Therefore, finding an efficient parallel implementation for a function, parallelizable or not, amounts to finding a $\odot$ operator with low computation costs. (To be continued)

References

1. Bird, Richard S. "An introduction to the theory of lists." 1987.
2. Chen, Y. F., et al. "Commutativity of Reducers." 2015.

No comments:

Post a Comment

Related Posts Plugin for WordPress, Blogger...