This is ericpony's blog

Saturday, February 21, 2015

An example of online algorithm

Online algorithms. An online algorithm is designed to receive and serve a finite but unbounded sequence of requests $\sigma=\sigma(1),\sigma(2),...,\sigma(m)$ using bounded memory. The requests are served in order of occurrence: when serving request $\sigma(t)$, the algorithm has no knowledge of the requests $\sigma(t')$ for $t'>t$, and the total number of requests is unknown until the last request is served. Serving requests incurs cost, and the goal of the algorithm designer is to minimize the total cost paid to serve the entire request sequence.
Competitive analysis. The performance of an online algorithm is usually evaluated by comparing it with an optimal offline algorithm in the worst case. Let $ALG$ be an online algorithm, and let $OPT$ be an algorithm that knows the entire request sequence in advance and can therefore serve it at minimal cost. Given a request sequence $\sigma$, let $ALG(\sigma)$ and $OPT(\sigma)$ denote the costs incurred by $ALG$ and $OPT$, respectively. Algorithm $ALG$ is called $c$-competitive if there exists a constant $b$ such that for all sequences $\sigma$, $$ALG(\sigma) \le c \cdot OPT(\sigma) + b.$$The competitive ratio of algorithm $ALG$ is defined as $c(ALG)=\sup_{\sigma}ALG(\sigma)/OPT(\sigma).$ The best competitive ratio an algorithm can achieve is thus $c^*=\inf_{ALG}c(ALG)$.

Example. The ski rental problem is one of the classic online optimization problems. The scenario is as follows. You are having a vacation at a ski resort without skis. You can either rent or buy skis from the local shops, and you are determined to ski on every sunny day. Suppose the cost of renting skis is one dollar per day, and the cost of buying skis is $s>1$ dollars. Denote the number of sunny days in your vacation by $d$. Clearly, if $d$ is known beforehand (e.g., from a magical local weather forecast service), then the optimal (offline) strategy is to rent skis all the way when $d<s$, and to buy skis on the first day when $d\ge s$. Now, suppose you are in the opposite situation, where $d$ is unknown and cannot be estimated. You need an online algorithm to help you decide whether to rent or buy skis.
Suppose a deterministic online algorithm tells you to rent skis up to day $k$ and to buy skis on day $k+1$ if $k<d$. Then the competitive ratio of this algorithm is computed by\begin{align*} \max\left\{\min\limits_{k}\max\limits_{d\le k,\, d\le s}\left\{\frac{d}{d}\right\}, \min\limits_{k}\max\limits_{d\le k,\, d>s}\left\{\frac{d}{s}\right\}, \min\limits_{k}\max\limits_{d>k,\, d\le s}\left\{\frac{k+s}{d}\right\}, \min\limits_{k}\max\limits_{d>k,\, d>s}\left\{\frac{k+s}{s}\right\}\right\}. \end{align*}The four terms in the braces correspond to the online/optimal cost ratios in the four cases. In each case, the ratio is first maximized by the adversary (i.e., the weather), which chooses $d$ knowing the values of $k$ and $s$. The online algorithm then minimizes the maximal ratio (i.e., the worst case) by selecting $k$. In this way, we obtain a tight upper bound on the cost ratio for each case. Since competitive analysis is a worst-case analysis, the largest of the four upper bounds is taken as the competitive ratio:$$\max\left\{1, \frac{s+1}{s}, \min_{k<s}\left\{\frac{k+s}{k+1}\right\}\right\} = \max\left\{1, \frac{s+1}{s}, \frac{2s-1}{s}\right\} = 2 - \frac{1}{s}.$$This ratio is attained by renting for $s-1$ days and buying on day $s$ when $s\le d$. No deterministic algorithm can achieve a competitive ratio smaller than $2-\frac{1}{s}$.
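The min-max computation above can be verified by brute force. Here is a small Python sketch (the helper ratio and the horizon D are my own constructions, not from the analysis above) that assumes integer $d$, $k$, and $s$:

```python
from fractions import Fraction

def ratio(k, d, s):
    """Online/offline cost ratio of the strategy "rent for k days, then buy",
    for d sunny days and buy price s."""
    online = d if k >= d else k + s      # never bought vs. rented k days then bought
    offline = min(d, s)                  # offline optimum: rent every day or buy at once
    return Fraction(online, offline)

s, D = 10, 100                           # D bounds the adversary's choice of d
# For each k the adversary picks the worst d; the algorithm then picks the best k.
best = min(max(ratio(k, d, s) for d in range(1, D + 1)) for k in range(D + 1))
assert best == 2 - Fraction(1, s)        # the 2 - 1/s competitive ratio
```

For $s=10$ the search returns $19/10$, i.e., $2-\frac{1}{s}$, and the minimum is attained at $k=s-1$ as claimed.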

Randomized online algorithms. In many problems, online algorithms can achieve better performance if they are allowed to make random choices. A randomized online algorithm $ALG$ is called $c$-competitive if there exists a constant $b$ such that $E[ALG(\sigma)] \le c \cdot OPT(\sigma)+b$ for all request sequences $\sigma$, where the expectation is taken over the random choices made by $ALG$. The best competitive ratio for randomized online algorithms is thus $c^*=\inf_{ALG}\sup_{\sigma}E[ALG(\sigma)]/OPT(\sigma).$ For simplicity, here we only consider requests generated by an oblivious adversary, that is, an adversary that generates the entire request sequence in advance, before any requests are served and any random choices are made, and is charged the optimal offline cost. One can also consider competitive ratios against more powerful adversaries; see [3] for more details.

Example. In the ski rental problem, a randomized algorithm chooses some day $k$ at random according to some distribution, rents for up to $k$ days, and buys skis on day $k+1$. Knowing the distribution of $k$ (but not its realization), the adversary chooses $d$ to maximize the expected online/optimal cost ratio in each case. The distribution of $k$ is then chosen to minimize the maximal expected ratio over all cases. It turns out that the optimal competitive ratio of randomized online algorithms is roughly $\frac{e}{e-1}\approx 1.58$ [1,2]. (TODO: give the derivation of $\frac{e}{e-1}$.)
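Although the derivation is deferred above, the continuous-time variant admits an easy numerical sanity check: with the buy price normalized to 1, the strategy that buys at a random time drawn from the density $e^t/(e-1)$ on $[0,1]$ incurs expected cost $\frac{e}{e-1}\cdot d$ against any number of ski days $d\le 1$, hence the ratio $e/(e-1)$. The Python sketch below (my own code; the density is the standard one for the continuous problem) checks this by numerical integration:

```python
import math

def expected_ratio(d, n=20000):
    """Expected online/optimal cost ratio of the continuous-time strategy that
    buys at a random time t drawn from the density e^t / (e - 1) on [0, 1],
    with the buy price normalized to 1 and d <= 1 days of skiing."""
    h = d / n
    # E[cost] = integral_0^d (t + 1) f(t) dt   (bought while still skiing)
    #         + d * Pr[t > d]                  (kept renting for all d days)
    buy_part = h * sum((1 + (i + 0.5) * h) * math.exp((i + 0.5) * h) for i in range(n))
    rent_part = d * (math.e - math.exp(d))
    expected_cost = (buy_part + rent_part) / (math.e - 1)
    return expected_cost / d                   # offline optimum is min(d, 1) = d

for d in (0.25, 0.5, 1.0):
    assert abs(expected_ratio(d) - math.e / (math.e - 1)) < 1e-4
```

The ratio comes out the same for every $d$, which is exactly what makes this distribution optimal: the adversary gains nothing from any particular choice of $d$.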

Lower-bound analysis. A randomized algorithm may be interpreted as a random choice among deterministic algorithms. In this sense, the problem of finding the best competitive ratio may be viewed as a two-person zero-sum game, where i) the players are the Algorithm Designer and the Adversary, ii) Adversary's strategies are all request sequences, iii) Designer's strategies are all deterministic online algorithms, and iv) the payoff for Designer is $ALG(\sigma)/OPT(\sigma)$. In addition, the players are allowed to use randomized strategies. Suppose that Designer chooses algorithms $ALG$ according to probability distribution $P$, and Adversary chooses sequences $\sigma$ according to probability distribution $Q$. Then it follows from Yao's principle that the best competitive ratio $c^*$ of a randomized online algorithm satisfies, for every distribution $Q$, $$c^*\ge \inf_{ALG}E_Q[ALG(\sigma)/OPT(\sigma)].$$That is, lower bounds for $c^*$ can be obtained by computing the minimal expected online/optimal cost ratio of deterministic online algorithms against random request sequences. (In fact, using a more general result called von Neumann's Minimax Theorem, one can show that $c^*=\sup_{Q}\inf_{ALG}E_{Q}[ALG(\sigma)/OPT(\sigma)].$)
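To make the technique concrete, here is a small Python sketch applying Yao's principle to the ski rental problem: fixing any distribution $Q$ over $d$ and minimizing the expected ratio over the deterministic rent-for-$k$-days algorithms yields a valid lower bound on $c^*$. The uniform $Q$ below is an arbitrary illustrative choice of mine, so the bound it gives is valid but not tight:

```python
from fractions import Fraction

def ratio(k, d, s):
    """Online/offline cost ratio of the deterministic "rent k days, then buy" strategy."""
    online = d if k >= d else k + s
    return Fraction(online, min(d, s))

s = 10
# Any fixed distribution Q over d yields a valid lower bound on c*; here Q is
# uniform over {1, ..., 2s} (an arbitrary choice, not the optimal adversary).
Q = {d: Fraction(1, 2 * s) for d in range(1, 2 * s + 1)}
lower = min(sum(q * ratio(k, d, s) for d, q in Q.items()) for k in range(3 * s + 1))
assert 1 < lower <= Fraction(2 * s - 1, s)   # nontrivial, and below the deterministic 2 - 1/s
```

A stronger lower bound would come from searching over distributions $Q$, i.e., approximating the $\sup_Q$ in the minimax formulation.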

References

1. https://en.wikipedia.org/wiki/Ski_rental_problem
2. http://cs.brown.edu/~claire/Talks/skirental.pdf
3. http://www14.in.tum.de/personen/albers/papers/inter.pdf

Thursday, February 12, 2015

Designing algorithms with Spark aggregate – Part I

In this post, I shall give a simple framework to design algorithms in terms of Spark's aggregate function. This framework is just a proof of concept and should not be applied blindly in practice. However, it does induce efficient algorithms in some special cases, as will be shown in the examples.

In general, our goal is to compute a property of any given instance from a domain. Let $G$ denote the domain of all instances of interest, and $C$ the domain of the desired property. For example, if $G$ denotes graphs and the property to compute is the MST, then $C$ will be the set of trees. Furthermore, we assume that there exists an associative and commutative "composition" operator $\oplus : G \rightarrow G \rightarrow G$, such that an instance of interest can be decomposed into (and composed back from) sub-instances in $G$. We shall denote the desired property of instance $g$ by $f(g)$. For example, if $g$ is a graph, then $f(g)$ can be some graph property such as the diameter, an MST, or a vertex cover. Note that the mapping $f : G \rightarrow C$ can be non-deterministic if the property is not unique w.r.t. the given instance.

To use the aggregate function, we specify zero, seq and comb as follows:
$\mathrm{zero} \in (G, C)$ is a pair comprised of the zero values of $G$ and $C$. If there are no such zero values, we can set $\mathrm{zero} = (g, f(g))$ for an arbitrary sub-instance $g$.
$\mathrm{seq} : (G, C) \rightarrow G \rightarrow (G, C)$ takes a partially aggregated result $(g_1, f(g_1))$ and a sub-instance $g_2$. It outputs $(g_3, f(g_1 \oplus g_2))$ such that $f(g_3 \oplus g) = f(g_1 \oplus g_2 \oplus g)$ for all $g \in G$.
$\mathrm{comb} : (G, C) \rightarrow (G, C) \rightarrow (G, C)$ takes two partially aggregated results $(g_1, f(g_1))$ and $(g_2, f(g_2))$. It outputs $(g_3, f(g_1 \oplus g_2))$ such that $f(g_3 \oplus g) = f(g_1 \oplus g_2 \oplus g)$ for all $g \in G$.
The rest of the steps are standard: we first decompose the input instance into a sequence of sub-instances. The sequence is then converted to an RDD through parallelize. Finally, we invoke RDD.aggregate(zero)(seq,comb) to compute the desired property of the instance.
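The pattern can be sketched without a Spark cluster. The helper below is a local stand-in for aggregate (PySpark exposes the same contract as RDD.aggregate(zeroValue, seqOp, combOp)); the toy domain and all names are my own:

```python
from functools import reduce
import random

def aggregate(partitions, zero, seq, comb):
    """Local stand-in for RDD.aggregate: fold each partition with seq starting
    from zero, then merge the per-partition partial results with comb.
    Spark does the same, except the partitions live on different workers."""
    partials = [reduce(seq, part, zero) for part in partitions]
    return reduce(comb, partials, zero)

# Toy instance of the framework: G = lists of numbers, oplus = concatenation,
# f = max.  Since f(g1 ++ g2) = max(f(g1), f(g2)), no residual sub-instance g3
# is needed (the special case in remark 3 below), so we aggregate over C alone.
data = list(range(100))
random.shuffle(data)
partitions = [data[i::4] for i in range(4)]       # mimic a 4-partition RDD
seq = lambda c, x: max(c, x)                      # fold one element into a partial result
comb = lambda c1, c2: max(c1, c2)                 # merge two partial results
assert aggregate(partitions, float("-inf"), seq, comb) == 99
```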

Remarks

1. While functions seq and comb look very similar in their specifications, they can use different methods to compute $f(g_3)$. Given the partial results they receive, seq is suited to incremental methods, while comb can use (the merge step of) divide-and-conquer methods to compute the desired property. See the convex hull example below.
2. The assumption that operator $\oplus$ is associative and commutative over $G$ can be relaxed to hold only with respect to the given property $f$. For instance, suppose that $G$ is a set of finite strings and $\oplus$ is the string concatenation operator. While $\oplus$ is not commutative per se, it is commutative w.r.t. the string-length property, that is, $\mathrm{strlen}(g_1 \oplus g_2) = \mathrm{strlen}(g_2 \oplus g_1)$ for all $g_1, g_2 \in G$. The framework is thus applicable to computing the length of a string. See the majority algorithm below for a less trivial example.
3. The framework is practical only if $|g_3|$ can be made significantly smaller than $|g_1 \oplus g_2|$, for otherwise we would have to pass almost the whole instance from node to node. In the case that $f(g_1 \oplus g_2)$ can be computed from $f(g_1)$ and $f(g_2)$ alone, there is no need to output $g_3$ at all, i.e., $|g_3|=0$. This special case allows us to operate on simpler domains (e.g., the type of seq becomes $C \rightarrow G \rightarrow C$) and to obtain more efficient aggregate algorithms thanks to the lower communication overhead.

Examples

The above naive framework can yield efficient algorithms if $f(g_1 \oplus g_2)$ can be computed solely based on $f(g_1)$ and $f(g_2)$. Below we give three examples that satisfy this condition. The first is a variant of the reverse-delete algorithm that finds an MST of a graph. The second is an outline of a convex hull algorithm. The last is Boyer and Moore's algorithm that finds the majority element in a list.

Finding MST. Let $G$ be the set of all weighted undirected graphs, and $C$ be the set of trees. Given a graph $g$ from $G$, we assign a unique index to each of its edges. Let $w(e)$ and $idx(e)$ denote the weight and index of edge $e$, respectively. Given two edges $e_1$ and $e_2$ of $g$, we say $e_1 < e_2$ iff either $w(e_1) < w(e_2)$, or $w(e_1) = w(e_2)$ and $idx(e_1) < idx(e_2)$. Now we define an operator msf over $G$ that computes a minimum spanning forest of $g_1 \cup g_2$ from $g_1$ and $g_2$ as follows:
def msf (g1, g2) = {
  var g = g1 ∪ g2
  while (g contains a cycle c)
    remove the largest edge of c from g  // largest w.r.t. the edge order above
  return g
}
Let RDD(g) denote the RDD obtained by decomposing and parallelizing graph $g$. Then RDD(g).aggregate(Nil)(msf,msf) computes a minimum spanning forest of $g$, which is an MST if $g$ is connected. Note that the result is deterministic if the assignment of indices is deterministic.
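As a sketch of what msf might look like when made executable, the Python version below replaces the cycle-deletion loop with Kruskal's algorithm over the combined edge set, which yields the same forest; the (weight, index, u, v) edge representation is my own choice for this sketch:

```python
from functools import reduce

def msf(g1, g2):
    """Minimum spanning forest of g1 ∪ g2, with edges given as
    (weight, index, u, v) tuples.  Kruskal's algorithm with union-find
    produces the same forest as repeatedly deleting the largest edge
    of every cycle."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]           # path halving
            x = parent[x]
        return x
    forest = []
    for w, idx, u, v in sorted(set(g1) | set(g2)):  # ties broken by the unique index
        ru, rv = find(u), find(v)
        if ru != rv:                                # keep the edge iff it closes no cycle
            parent[ru] = rv
            forest.append((w, idx, u, v))
    return forest

# A 4-cycle a-b-c-d plus a chord; the MST keeps the three cheapest acyclic edges.
g = [(1, 0, "a", "b"), (2, 1, "b", "c"), (3, 2, "c", "d"),
     (4, 3, "d", "a"), (5, 4, "a", "c")]
result = reduce(msf, [g[:2], g[2:]], [])            # plays the role of aggregate(Nil)(msf, msf)
assert sorted(e[0] for e in result) == [1, 2, 3]
```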

Finding convex hull. Here $G$ is the collection of finite point sets in the Euclidean plane and $C$ is the set of convex polygons. The convex hull of a set of points is the smallest convex polygon containing all the points. If the points are given as a sequence, we can use aggregate to find their convex hull by computing smaller convex hulls incrementally in seq and then merging them in comb to obtain the final convex hull. Both seq and comb can be implemented to take linear time (see the links for pseudocode).
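A minimal Python sketch of this scheme (my own code, based on Andrew's monotone chain; for brevity, both seq and comb simply recompute a hull from scratch in $O(n\log n)$ rather than using the linear-time incremental and merge procedures mentioned above):

```python
from functools import reduce

def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive iff o->a->b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def hull(points):
    """Convex hull via Andrew's monotone chain."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for chain, ordered in ((lower, pts), (upper, list(reversed(pts)))):
        for p in ordered:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
    return lower[:-1] + upper[:-1]

seq = lambda h, p: hull(h + [p])     # fold one point into the running hull
comb = lambda h1, h2: hull(h1 + h2)  # merge two hulls by re-hulling their union

pts = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0), (0, 1)]
parts = [pts[:4], pts[4:]]           # mimic two RDD partitions
h = reduce(comb, (reduce(seq, part, []) for part in parts), [])
assert sorted(h) == [(0, 0), (0, 2), (2, 0), (2, 2)]
```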

Finding majority. Let $C$ be a set of comparable elements. Fix some element $\bot \in C$ and let $G = (C - \{\bot\})^*$ be a set of finite sequences. The majority problem is to determine whether a given sequence from $G$ contains an element that occurs in more than half of the positions, and if so, to find this element. Click here to see a linear-time algorithm that solves the problem using aggregate. Note that the aggregate invocation only finds a candidate for the majority; a further pass is needed to check that the candidate is indeed the majority element.
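For reference, here is a sketch of how Boyer and Moore's voting algorithm fits the seq/comb interface (plain Python emulating the aggregate roles; the names are mine):

```python
from functools import reduce

# Partial result: a (candidate, count) pair.  seq is the classic Boyer-Moore
# voting step; comb cancels the counts of two differing candidates against
# each other, which preserves the invariant that a true majority element
# always survives as the final candidate.
def seq(acc, x):
    c, n = acc
    if n == 0:
        return (x, 1)
    return (c, n + 1) if x == c else (c, n - 1)

def comb(a, b):
    (c1, n1), (c2, n2) = a, b
    if c1 == c2:
        return (c1, n1 + n2)
    return (c1, n1 - n2) if n1 >= n2 else (c2, n2 - n1)

data = list("aababcaaba")            # 'a' occurs 6 times out of 10
parts = [data[:5], data[5:]]         # mimic two RDD partitions
cand, _ = reduce(comb, (reduce(seq, part, (None, 0)) for part in parts), (None, 0))
# aggregate only yields a candidate; a second pass confirms it is the majority.
assert cand == "a" and data.count(cand) > len(data) // 2
```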
