Jeremy Siek: June 2012

I recently ran across a nice paper, "Extensible Programming with First-Class Cases" by Blume, Acar, and Chae. The paper is about a kind of type-safe disjoint union. As usual, there is a case construct that inspects a union to see which case it is and dispatches to the code for handling that case. What's different about their approach is that the handlers are first-class entities that may be defined separately from the case construct. First-class handlers can be used to improve modularity, for example, to solve the expression problem.
Their paper presents first-class handlers in the setting of a language with type inference, which complicates the formal system and makes it a bit difficult to see the essential idea. In this blog post, I present the idea of first-class handlers in its simplest form, following the style of presentation in Pierce's Types and Programming Languages (TAPL). If you are familiar with variants in TAPL, think of the following as a nice improvement on them. As in TAPL, we present immutable variants, that is, there's no assignment to union fields, only initialization. (Otherwise the subtyping rules would need to be more restrictive.)
In addition to the usual types in the simply-typed lambda calculus, and the variant type, we add the type for a handler, which is like a function type but where the parameter type is fixed to be a variant type. Departing a little from the variants in TAPL, the ordering of labels in the variant and handler types does not matter.
$\begin{array}{rcl} T & ::= & \ldots \mid \langle l_1:T_1, \ldots, l_n:T_n \rangle \mid \langle l_1:T_1,\ldots,l_n:T_n \rangle \Rightarrow T \end{array}$
The relevant syntax definitions are as follows.
$\begin{array}{rcl} e & ::= & \ldots \mid \langle l = e \rangle \mid \langle l:T \Rightarrow e \rangle \mid e \,\mathtt{|}\, e \mid \mathbf{case}\, e \,\mathbf{of}\, e \end{array}$

The expression $\langle l = e \rangle$ creates (or "introduces") a union object with the union field l initialized to the value of e.
The expression $\langle l:T \Rightarrow e \rangle$ creates a handler object for label l. The expression e in the handler must evaluate to a function whose parameter type is T, the idea being that the value stored in union field l will be passed to the handler function.
The vertical bar expression, $e_1\,\mathtt{|}\, e_2$ , combines two sub-handlers into a composite handler that can handle the union of all of the cases that can be handled by the two sub-handlers.
The case expression has two subexpressions. The first must evaluate to a union object and the second must evaluate to a handler. The case then looks at the union and invokes the appropriate sub-handler.

To get a feel for how these features work together, let's look at an example. The following code creates two union objects, two handlers, and assigns one of the two union objects to z, depending on whether w is true or false (which we leave unspecified at the moment). The code then combines the two handlers into hs and uses a case to invoke the appropriate handler for z.
$\begin{array}{l} \mathbf{let}\, u_1 = \langle i = 0 \rangle \,\mathbf{in} \\ \mathbf{let}\, h_1 = \langle i:\mathtt{Nat} \Rightarrow \lambda x:\mathtt{Nat}. x == 0 \rangle \,\mathbf{in}\\ \mathbf{let}\, u_2 = \langle b = \mathbf{true} \rangle \,\mathbf{in} \\ \mathbf{let}\, h_2 = \langle b:\mathtt{Bool} \Rightarrow \lambda y:\mathtt{Bool}. \mathbf{not}\,y \rangle \,\mathbf{in}\\ \mathbf{let}\, z = \mathbf{if}\, w \, \mathbf{then}\, u_1 \, \mathbf{else}\, u_2 \, \mathbf{in}\\ \mathbf{let}\, hs = h_1 \mathtt{|} h_2 \,\mathbf{in}\\ \mathbf{case}\, z \, \mathbf{of}\, hs \end{array}$
If w is true, then the result of this program is true (because 0 is equal to 0). Otherwise, the result is false (because not true is false).
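For readers who prefer running code, here is a minimal, untyped Python sketch of the same example. The encoding is my own assumption, not something from the paper: a union object is a (label, value) pair, a handler is a dict from labels to functions, and the helper names `inject`, `handler`, `combine`, `case`, and `run` are hypothetical.

```python
# Hypothetical encoding (not from the paper): a union object is a
# (label, value) pair and a handler is a dict from labels to functions.

def inject(label, value):
    """<label = value>: build a union object."""
    return (label, value)

def handler(label, fn):
    """<label : T => fn>: build a single-label handler (types are not tracked)."""
    return {label: fn}

def combine(h1, h2):
    """h1 | h2: merge two handlers; their label sets must be disjoint."""
    overlap = h1.keys() & h2.keys()
    if overlap:
        raise ValueError(f"overlapping labels: {sorted(overlap)}")
    return {**h1, **h2}

def case(union, h):
    """case union of h: dispatch on the label and apply the matching function."""
    label, value = union
    return h[label](value)

def run(w):
    """The example program, with w supplied as an argument."""
    u1 = inject("i", 0)
    h1 = handler("i", lambda x: x == 0)
    u2 = inject("b", True)
    h2 = handler("b", lambda y: not y)
    z = u1 if w else u2
    hs = combine(h1, h2)
    return case(z, hs)

print(run(True))   # True, because 0 == 0
print(run(False))  # False, because not True is False
```

Because Python is dynamically typed, this sketch only mirrors the run-time behavior; the disjointness check in `combine` happens at run time here, whereas the type system catches it statically.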
Next we define the type system for first-class cases.
$\begin{gather*} \frac{\Gamma \vdash e : T}{\Gamma \vdash \langle l = e \rangle : \langle l:T \rangle} \qquad \frac{\Gamma \vdash e : T_1 \rightarrow T_2}{\Gamma \vdash \langle l:T_1 \Rightarrow e \rangle : \langle l:T_1 \rangle \Rightarrow T_2} \\[2ex] \frac{\begin{array}{l} \Gamma \vdash e_1 : \langle l_1:T_1,\ldots,l_n:T_n \rangle \Rightarrow T \\ \Gamma \vdash e_2 : \langle l_{n+1}:T_{n+1},\ldots,l_m:T_m \rangle \Rightarrow T\\ \{l_1,\ldots,l_n\} \cap \{l_{n+1},\ldots,l_m \} = \emptyset \end{array} } {\Gamma \vdash e_1 \,\mathtt{|}\, e_2 : \langle l_1:T_1,\ldots,l_m:T_m\rangle \Rightarrow T} \\[2ex] \frac{\begin{array}{l} \Gamma \vdash e_1 : \langle l_1:T_1,\ldots,l_n:T_n \rangle \\ \Gamma \vdash e_2 : \langle l_1:T_1,\ldots,l_n:T_n \rangle \Rightarrow T \end{array} }{\Gamma \vdash \mathbf{case}\,e_1\,\mathbf{of}\,e_2 : T} \\[2ex] \frac{\Gamma \vdash e : T_1 \quad T_1 <: T_2 }{\Gamma \vdash e : T_2} \end{gather*}$
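The rules above can be read as a small type checker. The following Python sketch covers only the four new expression forms; the tuple encoding of types and expressions, the `lit` form (a stand-in for the elided lambda-calculus forms), and the names `kind` and `type_of` are all assumptions made for illustration. It also simplifies subsumption: the case rule is checked algorithmically by allowing the union to mention fewer labels than the handler covers, and field types are compared by equality rather than by subtyping.

```python
# Types (hypothetical encoding): "Nat", "Bool", ("fun", T1, T2),
# ("variant", {label: T}), ("handler", {label: T}, T_result).
# Dicts make the order of labels irrelevant, as in the post.

def kind(t):
    """Return the tag of a structured type, or the base type name itself."""
    return t[0] if isinstance(t, tuple) else t

def type_of(e):
    tag = e[0]
    if tag == "lit":                       # ("lit", value, T): constant of known type
        return e[2]
    if tag == "inj":                       # ("inj", l, e): <l = e>
        _, label, body = e
        return ("variant", {label: type_of(body)})
    if tag == "hnd":                       # ("hnd", l, T, e): <l : T => e>
        _, label, param, body = e
        fn = type_of(body)
        if kind(fn) != "fun" or fn[1] != param:
            raise TypeError(f"handler body must be a function from {param}")
        return ("handler", {label: param}, fn[2])
    if tag == "bar":                       # ("bar", e1, e2): e1 | e2
        h1, h2 = type_of(e[1]), type_of(e[2])
        if kind(h1) != "handler" or kind(h2) != "handler":
            raise TypeError("both operands of | must be handlers")
        if h1[1].keys() & h2[1].keys():
            raise TypeError("handler labels must be disjoint")
        if h1[2] != h2[2]:
            raise TypeError("handlers must have the same result type")
        return ("handler", {**h1[1], **h2[1]}, h1[2])
    if tag == "case":                      # ("case", e1, e2): case e1 of e2
        u, h = type_of(e[1]), type_of(e[2])
        if kind(u) != "variant" or kind(h) != "handler":
            raise TypeError("case needs a union and a handler")
        for label, t in u[1].items():      # simplified subsumption (see lead-in)
            if h[1].get(label) != t:
                raise TypeError(f"no handler for label {label} at type {t}")
        return h[2]
    raise TypeError(f"unknown expression form: {tag}")

# The two handlers from the example, with the lambdas replaced by typed constants.
h1 = ("hnd", "i", "Nat", ("lit", None, ("fun", "Nat", "Bool")))
h2 = ("hnd", "b", "Bool", ("lit", None, ("fun", "Bool", "Bool")))
u1 = ("inj", "i", ("lit", 0, "Nat"))
print(type_of(("case", u1, ("bar", h1, h2))))  # Bool
```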
The subtyping rules are the usual ones for functions, etc. (see TAPL) plus the following subtyping rule for unions.
$\frac{\begin{array}{l} \{l_1,\ldots,l_n \} \subseteq \{ l'_1,\ldots,l'_m\} \\ \forall i\in 1..n, \exists j, l_i = l'_j \text{ and } T_i <: T'_j \end{array} } {\langle l_1:T_1,\ldots,l_n:T_n \rangle <: \langle l'_1:T'_1,\ldots,l'_m:T'_m \rangle }$
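As a sanity check on the direction of this rule, here is a small Python sketch of the subtype test. The tuple encoding of types and the name `is_subtype` are assumptions for illustration. The sketch implements only the union rule above and the usual function rule; subtyping between handler types is not spelled out in this post, so the sketch leaves it out rather than guess.

```python
# Types (hypothetical encoding): "Nat", "Bool", ("fun", T1, T2),
# ("variant", {label: T}).

def is_subtype(s, t):
    """Return True if s <: t under the rules sketched here."""
    if s == t:                                   # reflexivity
        return True
    if isinstance(s, tuple) and isinstance(t, tuple) and s[0] == t[0]:
        if s[0] == "variant":
            # Every label of s must appear in t at a supertype of its field type.
            s_fields, t_fields = s[1], t[1]
            return all(
                label in t_fields and is_subtype(field, t_fields[label])
                for label, field in s_fields.items()
            )
        if s[0] == "fun":
            # Usual function rule: contravariant parameter, covariant result.
            return is_subtype(t[1], s[1]) and is_subtype(s[2], t[2])
    return False

small = ("variant", {"i": "Nat"})
large = ("variant", {"i": "Nat", "b": "Bool"})
print(is_subtype(small, large))  # True: fewer labels is the subtype
print(is_subtype(large, small))  # False
```

This is what lets the if expression in the example type check: both $\langle i:\mathtt{Nat} \rangle$ and $\langle b:\mathtt{Bool} \rangle$ are subtypes of $\langle i:\mathtt{Nat}, b:\mathtt{Bool} \rangle$.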
Next we specify the dynamic semantics of first-class cases. The reduction rule for case finds the matching handler $v_i$ and applies it to the union's field value v.
$\mathbf{case}\, \langle l_i= v \rangle \,\mathbf{of}\, \langle l_1{:}T_1 \Rightarrow v_1 \rangle \,\mathtt{|} \ldots \mathtt{|}\, \langle l_n{:}T_n \Rightarrow v_n \rangle \longrightarrow v_i\,v$
To specify order of evaluation, we define evaluation frames (a shallow version of evaluation contexts that I picked up indirectly from Andrew Myers's lecture notes) and a reduction rule for evaluating inside a frame. We write $F[e]$ for replacing the box inside F with the expression e.
$\begin{array}{l} F ::= \Box\,e \mid v\,\Box \mid \langle l = \Box \rangle \mid \langle l : T \Rightarrow \Box \rangle \mid \Box \,\mathtt{|}\,e \mid v \,\mathtt{|}\,\Box \mid \mathbf{case}\,\Box\,\mathbf{of}\,e \mid \mathbf{case}\,v\,\mathbf{of}\,\Box \\[2ex] \dfrac{e \longrightarrow e'}{F[e] \longrightarrow F[e']} \end{array}$
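To see the frames and the case rule working together, here is a small-step evaluator sketched in Python. Everything about the encoding is an assumption for illustration: expressions are tuples, `("prim", f)` wraps a Python function and stands in for a lambda value (so applying it replaces real beta-reduction), and the names `is_value`, `lookup`, `decompose`, `step`, and `evaluate` are hypothetical.

```python
def is_value(e):
    tag = e[0]
    if tag in ("const", "prim"):
        return True
    if tag == "inj":                               # <l = v>
        return is_value(e[2])
    if tag == "hnd":                               # <l : T => v>
        return is_value(e[3])
    if tag == "bar":                               # v | v
        return is_value(e[1]) and is_value(e[2])
    return False

def lookup(h, label):
    """Find the function stored for `label` in a handler value, or None."""
    if h[0] == "hnd":
        return h[3] if h[1] == label else None
    if h[0] == "bar":
        found = lookup(h[1], label)
        return found if found is not None else lookup(h[2], label)
    return None

def decompose(e):
    """Split e into (F, e0) with e = F[e0]; F is a function that fills the box."""
    tag = e[0]
    if tag == "inj":
        _, l, body = e
        if not is_value(body):
            return (lambda x: ("inj", l, x)), body
    elif tag == "hnd":
        _, l, t, body = e
        if not is_value(body):
            return (lambda x: ("hnd", l, t, x)), body
    elif tag in ("app", "bar", "case"):            # left operand first, then right
        _, a, b = e
        if not is_value(a):
            return (lambda x: (tag, x, b)), a
        if not is_value(b):
            return (lambda x: (tag, a, x)), b
    return None

def step(e):
    """Perform one reduction step; return None if e is a value or stuck."""
    tag = e[0]
    if tag == "app" and e[1][0] == "prim" and is_value(e[2]):
        return e[1][1](e[2])                       # stand-in for beta-reduction
    if tag == "case" and is_value(e[1]) and is_value(e[2]) and e[1][0] == "inj":
        _, (_, label, v), h = e
        fn = lookup(h, label)
        return None if fn is None else ("app", fn, v)   # case ... --> v_i v
    split = decompose(e)
    if split is None:
        return None
    frame, inner = split
    reduced = step(inner)
    return None if reduced is None else frame(reduced)  # F[e] --> F[e']

def evaluate(e):
    while not is_value(e):
        nxt = step(e)
        if nxt is None:
            raise RuntimeError(f"stuck expression: {e!r}")
        e = nxt
    return e

h1 = ("hnd", "i", "Nat", ("prim", lambda v: ("const", v[1] == 0)))
h2 = ("hnd", "b", "Bool", ("prim", lambda v: ("const", not v[1])))
hs = ("bar", h1, h2)
print(evaluate(("case", ("inj", "i", ("const", 0)), hs)))     # ('const', True)
print(evaluate(("case", ("inj", "b", ("const", True)), hs)))  # ('const', False)
```

The `decompose` function is where the left-to-right evaluation order lives: each branch corresponds to one frame form, and the right operand is only entered once the left one is a value.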
To finish things off, it is always good to understand the values of a language, that is, the expressions that are well-typed and that cannot be further reduced. In addition to the usual value forms of the simply-typed lambda calculus, we add the following:
$\begin{array}{rcl} v & ::= & \ldots \mid \langle l = v \rangle \mid \langle l : T \Rightarrow v \rangle \mid v \,\mathtt{|}\,v \end{array}$
One last remark. This notion of first-class cases is quite similar to the categorical notion of a coproduct (or sum), where the unique arrow of a coproduct, $\langle f|g \rangle$ , roughly corresponds to the vertical bar operator defined above. I had seen coproducts many times, but it wasn't until reading Blume et al.'s paper that I realized that one might actually want to have $\langle f|g \rangle$ in a programming language!
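To make the correspondence a little more concrete, write $\iota_1$ and $\iota_2$ for the coproduct injections (names assumed here; they do not appear above). The arrow $\langle f|g \rangle$ is characterized by the equations
$\begin{array}{rcl} \langle f|g \rangle \circ \iota_1 & = & f \\ \langle f|g \rangle \circ \iota_2 & = & g \end{array}$
and the two-label instance of the reduction rule for case has roughly the same shape, with the union introduction playing the role of the injection:
$\begin{array}{l} \mathbf{case}\, \langle l_1 = v \rangle \,\mathbf{of}\, \langle l_1{:}T_1 \Rightarrow f \rangle \,\mathtt{|}\, \langle l_2{:}T_2 \Rightarrow g \rangle \longrightarrow f\,v \\ \mathbf{case}\, \langle l_2 = v \rangle \,\mathbf{of}\, \langle l_1{:}T_1 \Rightarrow f \rangle \,\mathtt{|}\, \langle l_2{:}T_2 \Rightarrow g \rangle \longrightarrow g\,v \end{array}$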

Friday, June 29, 2012

First-class Cases