This is ericpony's blog

Thursday, May 29, 2014

Equivalence Checking in Lambda Calculus

Definitional equivalence checking is the problem of determining whether two terms of the λ-calculus are equivalent by definition. (By "definitional", we mean that equivalence is defined directly by rules, rather than by appeal to an operational semantics.) Equivalence of λ-terms in the untyped λ-calculus is undecidable; see Scott's Theorem in [3]. This result is not surprising, since the untyped λ-calculus is Turing-complete. On the other hand, equivalence is decidable for the simply typed λ-calculus (a.k.a. simple type theory, STT) with a single base type; see Sec. 6 of [4]. Equivalence checking becomes undecidable once we extend STT with dependent types such as lists, or with recursive data types such as the natural numbers; see Chapters 4 and 5 of [5].

It is interesting to check the weaker relation of whether two λ-terms are equivalent in operational semantics, i.e., whether they reduce to the same value given the same context. For example, $\lambda x.x:\;Nat\rightarrow Nat$ and $\lambda x.pred(suc\ x): Nat\rightarrow Nat$ are not definitionally equivalent, but they are operationally equivalent. For the untyped λ-calculus this problem is clearly undecidable, but for STT I am not sure. I haven't found enough material on this subject so far.
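To make the notion concrete, here is a minimal (and necessarily partial) equivalence checker sketched in Scala. It represents terms with de Bruijn indices, so α-equivalence becomes plain structural equality, and it β-normalizes both terms under a step bound, since untyped reduction need not terminate. This is only an illustration under my own naming, not an algorithm taken from the references.

```scala
// λ-terms with de Bruijn indices: α-equivalent terms are structurally equal.
sealed trait Term
case class Var(index: Int)           extends Term
case class Lam(body: Term)           extends Term
case class App(fun: Term, arg: Term) extends Term

object Equiv {
  // shift free variables >= cutoff by d
  private def shift(t: Term, d: Int, cutoff: Int = 0): Term = t match {
    case Var(i)    => if (i >= cutoff) Var(i + d) else Var(i)
    case Lam(b)    => Lam(shift(b, d, cutoff + 1))
    case App(f, a) => App(shift(f, d, cutoff), shift(a, d, cutoff))
  }

  // substitute s for variable j in t
  private def subst(t: Term, j: Int, s: Term): Term = t match {
    case Var(i)    => if (i == j) s else Var(i)
    case Lam(b)    => Lam(subst(b, j + 1, shift(s, 1)))
    case App(f, a) => App(subst(f, j, s), subst(a, j, s))
  }

  // one leftmost-outermost β-reduction step, if any
  private def step(t: Term): Option[Term] = t match {
    case App(Lam(b), a) => Some(shift(subst(b, 0, shift(a, 1)), -1))
    case App(f, a)      => step(f).map(App(_, a)).orElse(step(a).map(App(f, _)))
    case Lam(b)         => step(b).map(Lam(_))
    case _              => None
  }

  // normalize with a step bound; None means we ran out of fuel
  def normalize(t: Term, fuel: Int = 1000): Option[Term] =
    if (fuel == 0) None
    else step(t) match {
      case None     => Some(t)
      case Some(t2) => normalize(t2, fuel - 1)
    }

  // semi-decision procedure: a definite answer only when both terms normalize
  def equivalent(a: Term, b: Term, fuel: Int = 1000): Option[Boolean] =
    for (na <- normalize(a, fuel); nb <- normalize(b, fuel)) yield na == nb
}
```

For instance, `Equiv.equivalent(App(Lam(Var(0)), Lam(Var(0))), Lam(Var(0)))` yields `Some(true)`, while feeding it the divergent term Ω exhausts the fuel and yields `None`, reflecting that for untyped terms the procedure can only be partial.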

References

1. Notes on Simply Typed Lambda Calculus (lecture notes)
2. Simply Typed Lambda Calculus, Wikipedia
3. Practical Foundations for Programming Languages (preview of the 1st ed.)
4. Advanced Topics in Types and Programming Languages (available online)
5. http://www.cs.cmu.edu/~rwh/courses/logic/www/handouts/logic.pdf

Friday, May 23, 2014

Scala: Mapping a set removes duplicates

Consider the following code:
def sumSizes(collection: Iterable[List[_]]): Int = collection.map(_.size).sum
The function is meant to sum the sizes of all the contained Lists. The problem is that if collection is a Set, then after mapping the lists to their sizes the result is still a Set, so all Lists of the same size collapse to a single representative. For example, sumSizes(List(List(1,2), List(3,4))) outputs 4, but sumSizes(Set(List(1,2), List(3,4))) mistakenly outputs 2. To avoid such pitfalls one may write the method as
/* version 1 */
def sumSizes(collection: Iterable[List[_]]): Int = collection.toSeq.map(_.size).sum
/* version 2 */
def sumSizes(collection: Iterable[List[_]]): Int = collection.foldLeft(0)(_+_.size)
The first version requires the unnatural discipline of "always calling toSeq before mapping", and worse, it introduces an overhead of allocating memory and copying elements that is seldom necessary. The second version avoids that overhead, but it is asymmetric and thus less elegant. It can even be less efficient when the underlying collection is parallel, since sum, which is implemented as fold(0)(_+_) in the Scala library, can be optimized better than foldLeft or foldRight in such a context.
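The difference is easy to check directly; the helper names below are mine, not from the original snippet:

```scala
// The buggy version: mapping keeps the source collection type,
// so a Set input deduplicates the sizes before summing.
def sumSizesBuggy(collection: Iterable[List[_]]): Int =
  collection.map(_.size).sum

// Fix 1: convert to a Seq first (extra allocation and copying).
def sumSizesSeq(collection: Iterable[List[_]]): Int =
  collection.toSeq.map(_.size).sum

// Fix 2: fold directly, never building an intermediate collection.
def sumSizesFold(collection: Iterable[List[_]]): Int =
  collection.foldLeft(0)(_ + _.size)

val asList = List(List(1, 2), List(3, 4))
val asSet  = Set(List(1, 2), List(3, 4))

assert(sumSizesBuggy(asList) == 4)
assert(sumSizesBuggy(asSet)  == 2)  // the two sizes 2 collapse into one
assert(sumSizesSeq(asSet)    == 4)
assert(sumSizesFold(asSet)   == 4)
```

Note that the buggy version goes wrong only for a Set input, which is exactly what makes the bug easy to miss in testing.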

Accidental removal of duplicates is often harder to spot in real code, especially when you write utility code that simply takes a Traversable, or use the output of methods declared to return a Traversable: the results can be surprising when the implementation happens to be a Set. Imagine the following situation:
val edgeNum = graph.nodes.map(_.edges).map(_.size).sum / 2
You fetch a collection of Node objects from a graph, use them to get their adjacent edges, and sum over the edge counts. This works as long as graph.nodes returns a Seq, and it breaks if someone decides to make Graph return its nodes as a Set. One way to defend against such surprises is to label types explicitly, e.g., by annotating methods with return types even where annotations are not required. The annotation then serves as a caution to anyone who later wants to change the type of graph.nodes from Seq to Set.
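As an illustration, with a hypothetical Node and Graph (these names are mine, not from any real library):

```scala
// A node that knows the ids of its adjacent edges.
case class Node(edges: Seq[Int])

class Graph(ns: Seq[Node]) {
  // Explicit return type: callers can rely on Seq semantics (no dedup).
  // Changing this to Set later requires a visible signature change.
  def nodes: Seq[Node] = ns
}

val g = new Graph(Seq(Node(Seq(1, 2)), Node(Seq(3, 4))))
val edgeNum = g.nodes.map(_.edges).map(_.size).sum / 2
assert(edgeNum == 2)  // 4 edge endpoints, each edge counted twice
```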

All in all, the best defence against this kind of error is good unit-test coverage with representative data, together with sensible APIs and some knowledge of how the Scala compiler preserves source collection types through map, for-comprehensions, etc. If you are returning a Set of something, you should make that obvious: returning a Collection/Traversable hides a relevant implementation detail, and that sometimes causes unexpected problems.

Saturday, May 3, 2014

Scala: Subtyping

When a class inherits from another, the first class is said to be a nominal subtype of the other. It is a nominal subtype because each type has a name, and the names are explicitly declared to be in a subtyping relationship. By contrast, a structural subtype relationship holds whenever two types have the same members. The compiler only verifies such properties at the level of type checking; a subtype should also correctly implement the contracts of its supertypes so that Liskov's Substitution Principle applies, which essentially says that besides supporting the same operations, the operations of the subtype should require no more and provide no less than the corresponding operations of the supertype.

The Scala compiler allows a type's subtypes to be used as a substitute wherever that type is required. For classes and traits that take no type parameters, the subtype relationship simply mirrors the subclass relationship. For classes and traits that take type parameters, however, variance comes into play. For example, class List is covariant in its type parameter, and thus List[Cat] is a subtype of List[Animal] if Cat is a subtype of Animal. These subtype relationships exist even though the class of each of these types is List. By contrast, class Array is nonvariant in its type parameter, so Array[Cat] is not a subtype of Array[Animal]. See here for a more detailed explanation of Scala's variance rules.
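Both claims can be checked by the compiler itself; a small sketch:

```scala
// List[+A] is covariant: List[Cat] <: List[Animal].
// Array[A] is nonvariant: Array[Cat] is NOT a subtype of Array[Animal].
class Animal
class Cat extends Animal

val cats: List[Cat] = List(new Cat)
val animals: List[Animal] = cats             // compiles thanks to covariance

// the compiler can witness the relation directly:
implicitly[List[Cat] <:< List[Animal]]
// implicitly[Array[Cat] <:< Array[Animal]]  // would be a compile error

assert(animals.length == 1)
```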

Subtyping rules of function types

Suppose that Cat and Dog are subtypes of type Animal. Consider function types A and B defined as follows:
type A := Animal => Dog
type B := Cat => Animal
What should the subtyping relation between A and B be? According to Liskov's Substitution Principle, $T$ is a subtype of $T'$ iff whatever we can do with $T'$, we can also do with $T$. In this example, we can pass a cat to a function of type B and get back some animal. We can do the same with A: if we pass a cat to a function of type A, it returns a dog, which is an animal. Hence A satisfies the same contract as B (in fact, its contract says more, as it guarantees that the returned animal is a dog), and thus A is a subtype of B.

In general, if we denote the "subtype of" relation by $<$, the subtyping rule can be expressed as $(A \Rightarrow B) < (C \Rightarrow D)$ iff $C < A$ and $B < D$. To see this, note that arguments are something that is required, whereas return values are something that is provided. To satisfy Liskov's Substitution Principle, a function subtype must require less and provide more than its supertype. Hence a reasonable function type should be contravariant in the argument type and covariant in the result type.
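Scala's Function1 implements exactly this rule, so the example above type-checks as expected:

```scala
// Function1[-A, +R] is contravariant in the argument and covariant in
// the result, so (Animal => Dog) is a subtype of (Cat => Animal).
class Animal
class Cat extends Animal
class Dog extends Animal

val a: Animal => Dog = _ => new Dog      // type A from the text
val b: Cat => Animal = a                 // compiles: A is a subtype of B

// the compiler can witness the relation directly:
implicitly[(Animal => Dog) <:< (Cat => Animal)]

assert(b(new Cat).isInstanceOf[Dog])     // the returned animal is a dog
```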

Behavioral subtyping

Let A and B be two types. We say that A is a behavioral subtype of B iff the set of properties satisfied by objects of type A is a superset of the properties satisfied by objects of type B. In terms of Liskov's principle, this means that any property provable about objects of type B should also be provable about objects of type A.

Behavioral subtyping is a stronger notion than the classical subtyping of function types, which relies merely on type signatures. It is a semantic rather than syntactic relation, because it intends to guarantee semantic interoperability of the types in a hierarchy. Behavioral subtyping is undecidable in general: if the property to be proved is "a function always terminates for arguments of type A", then it is impossible in general for a program (e.g., a compiler) to verify that the property holds for all subtypes of A.

Behavioral subtyping of function types imposes requirements on signatures just like the classical subtyping rules. In addition, there are a number of behavioral conditions a function subtype must meet:
  • Preconditions cannot be strengthened in a subtype.
  • Postconditions cannot be weakened in a subtype.
  • Invariants of the supertype must be preserved in a subtype.
  • History constraint: objects are regarded as being modifiable only through their methods.
A violation of the history constraint can be illustrated by defining a mutable point as a subtype of an immutable point. On the other hand, fields added in a subtype may be safely modified, because they are not observable through the supertype's methods. Thus one may derive a circle with a fixed center but a mutable radius from an immutable point without violating the subtyping rules.
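Both cases can be sketched in Scala; the Point, MutablePoint, and Circle names below are hypothetical:

```scala
// An immutable point: clients may assume x and y never change.
class Point(px: Double, py: Double) {
  def x: Double = px
  def y: Double = py
}

// Violates the history constraint: through a plain Point reference,
// the observed coordinate can change between two reads.
class MutablePoint(var mx: Double, my: Double) extends Point(mx, my) {
  override def x: Double = mx
}

// Safe: the added mutable field is not observable via Point's methods.
class Circle(cx: Double, cy: Double, var radius: Double) extends Point(cx, cy)

val p: Point = new MutablePoint(0, 1)
val before = p.x
p.asInstanceOf[MutablePoint].mx = 42   // mutate through the subtype
assert(before == 0.0 && p.x == 42.0)   // Point's "immutability" is broken

val c = new Circle(0, 0, 1)
c.radius = 2                           // fine: invisible through Point
```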
