This is ericpony's blog

Thursday, March 13, 2014

Scala: Views of Collections

In Scala, you can create Views for many types of collections. Views are non-strict versions of collections meant to be treated much like a database view. This means that the elements are calculated at access and not eagerly as in normal collections. As an example consider the following code:
val ys = (1 to 10).view.map { x => println(x); }
This will not print anything but every access to the list will perform the calculation and print the value, i.e. every call to ys.head will result in a number being printed. If you want to get a strict version of the collection again you can call ys.force, which will print all numbers out.

One use of views is to traverse a collection of values that are expensive to compute and only one value is needed at a time. Also, views let you build lazy sequences by calling toStream on them, which will also cache the evaluated elements. Another use of views is to avoid intermediate copies of collection. For example, the following code
(1 to 10).map(_-1).map(_*2).map(-_)
creates a new list for each call of map operation. To avoid the intermediate results, one can turn the list first into a view, applying all transformations to the view, and finally forcing the view to a sequence:
(1 to 10).view.map(_-1).map(_*2).map(-_).force
The three stored functions (_-1), (_*2), (-_) get applied as part of the execution of the force operation and a new sequence is constructed. That way, no intermediate data structure is needed. Another example of the use of view would be (assuming that we have very little memory at hand):
scala> (1 to 1000000000).take(5).toList
java.lang.OutOfMemoryError: GC overhead limit exceeded
Here Scala tries to create a collection with 100000000 elements to access the first five. Using view, only the first five elements will be generated and accessed:
scala> (1 to 1000000000).view.take(5).force.toList
res2: List[Int] = List(1, 2, 3, 4, 5)
The third use case applies to views over mutable sequences. Many transformer functions on such views provide a window into the original sequence that can then be used to update selectively some elements of that sequence. The view does not copy these elements, it just provides a reference to them. For example,
def neg(a: collection.mutable.Seq[Int]) = a.indices.foreach(i => {a(i) = -a(i)})
var A = Array(1,2,3,4,5,6)
neg(A.slice(1,4))      // A doesn't change
neg(A.view.slice(1,4)) // A becomes (1,-2,-3,-4,-5,6)

Remarks. For smaller collection sizes the added overhead of forming and applying closures in views is often greater than the gain from avoiding the intermediary data structures. A possibly more important reason is that evaluation in views can be very confusing if the delayed operations have side effects. Because of the penalties incurred by views, one should usually force it after applying the transformations, or keep it as a view if only few elements are expected to ever be fetched, compared to the total size of the view.

No comments:

Post a Comment

Related Posts Plugin for WordPress, Blogger...