Rethinking OOP Through Dependent Types and Codata

Table Of Links

DEPENDENTLY-TYPED OBJECT-ORIENTED PROGRAMMING

Typical programs written in object-oriented and functional languages have many differences. We will now look at how these differences appear when we consider programs written in the objectoriented style.

2.1 Method Call Syntax and Self Parameters

Object-oriented programmers are familiar with the method call syntax 𝑜.f(𝑒) to invoke a method f taking an argument 𝑒 on an object 𝑜. This is sometimes presented under the name “uniform function call syntax” as an alternative notation for the call f(𝑜, 𝑒), where the first argument of the method f is called the self parameter. We take this simple syntactic observation, see how we have to modify it to the dependently typed setting, and how this influenced our design of codata types.

As a starting point, we take the Agda proof from Figure 4. In that proof, the Booleans are defined as a datatype, negation is defined as a non-dependent function from Booleans to Booleans, and neg_- inverse is defined as a dependent function from a boolean 𝑥 to a proof that 𝑥 is equal to neg(neg 𝑥). In our system, we want to express both neg and neg_inverse without using non-dependent or dependent functions. For the non-dependent case, we can express negation directly as an observation on Booleans:

def Bool.neg: Bool { True => False, False => True }

If we want to express the dependent function neg_inverse as an observation on Booleans in a similar way, then we have to add a feature, self parameters. We can bind the term that we observe to a variable, which we have here called self, and use this variable in the return type of the observation:

def (self: Bool).neg_inverse: Eq(Bool, self, self.neg.neg) { True => Refl(Bool, True), False => Refl(Bool, False) }

The refunctionalization of these methods with self-parameters dictates the first feature of dependent codata types: self-parameters in destructors.

codata Bool { neg: Bool, (self: Bool).neg_inverse: Eq(Bool, self, self.neg.neg) }

It is self-parameters in codata declarations which give us the expressive power to properly represent verified interfaces and classes in an object-oriented style.

2.2 Verified Interfaces and Classes

When verifying data structures and algorithms in dependently typed languages, we can choose between two general approaches: intrinsic and extrinsic verification. Using intrinsic verification, we define the data structure or algorithm together with its correctness proofs. To intrinsically verify data structures, we commonly express properties using type indices, such as the number of elements contained in a length-indexed list.

We can employ a similar approach in an objectoriented style using indexed codata types [Thibodeau et al. 2016]. Using type indices, we can ensure that observations are called only on objects which are in the right state. This can be seen in the following example, where the observation read may be called only on non-empty buffers.

codata Buffer(m: Nat) { Buffer(S(n)).read(n: Nat): Pair(Bool, Buffer(n)) } codef EmptyBuffer: Buffer(Z) { read(n) absurd } codef Singleton(b: Bool): Buffer(S(Z)) { read(n) => MkPair(Bool, Buffer(Z), b, EmptyBuffer) }

We can see that, as usual for dependent (co)pattern matching, infeasible pattern matches may arise which need to be marked as absurd. That is, when we implement the buffer interface for the empty buffer we don’t have to implement the read method since it can never be called. However, in this work, we go beyond indexed codata types, which admit an intrinsic verification style.

We also want to support the extrinsic approach, where we want to separate our objects from their specifications. Jacobs [1995] provides us with an initial concept of how to attain that goal. He proposes a system of coalgebraic specifications that can be used to verify object-oriented classes. As an example, Figure 5a shows a coalgebraic specification for a one-element buffer that exhibits persistent read (PR) behavior: After an element has been stored, it cannot be replaced using the store method.

Instead, one needs to call the method empty to explicitly empty the buffer. Reading from the empty buffer returns an error. This specification of the buffer is given as a set of assertions that reference the buffer state s. In our system, we can realize this concept using self-parameters on destructors, allowing us to express specifications as observations on codata types. The codata type in Figure 5 defines the verified interface for persistent read buffers in our system. Similarly, we can apply this approach to express verified interfaces such as functors or monads.

2.3 Dependent Functions

Unlike in most other dependent type theories, the Π-type of dependent functions is not part of our core theory, but can be defined in a library. The Π-type is defined as a codata type indexed over a type family p, for which we use the ordinary non-dependent function type:

-- | Non-dependent Functions codata Fun(a b: Type) { Fun(a, b).ap(a b: Type, x: a): b } -- | Dependent Functions codata Π(a: Type, p: a -> Type) { Π(a, p).dap(a: Type, p: a -> Type, x: a): p.ap(a, Type, x) }

We propose that both dependent and non-dependent functions should be user-defined instead of built-in. The designers of Java decided to follow this approach when they introduced lambda abstractions as instances of functional interfaces [Goetz et al. 2014; Setzer 2003] in Java 8. This shows that our proposal is not radical, and we think it is also useful.

Apart from reducing the complexity of the core language, they simplify the situation if we have more than one function type. This is the case in substructural systems where we have linear and non-linear functions. For instance, in the Rust programming language, there are three different built-in function traits Fn, FnOnce, and FnMut, which differ in the modality of the receiver3 .

2.4 Weak and Strong Dependent Pairs

In the previous section, we showed how to define the Π-type. For the Π-type we had no choice but to define it as a codata type4 , but for the Σ-type we can choose whether to model it as a data or codata type. This choice distinguishes weak and strong Σ-types [Howard 1980]: Strong Σ-types are defined as a codata type with two projections, where the second projection mentions the result of the first projection in its return type; weak Σ-types, by contrast, are defined as a data type with one constructor which pairs the first and second element. This difference is more obvious if we first consider the case of non-dependent pairs, which can also be written as either a data or codata type.

These two representations can be obtained from each other by defunctionalization and refunctionalization. This is still the case when we generalize non-dependent pairs to the Σ-type. Similar to the Π-type in Section 2.3, the Σ-type is indexed by a type family 𝑇 . As a data type, it is defined by one constructor Pair which takes the type family 𝑇 , an element 𝑥 of type 𝐴 and a witness 𝑤 as arguments. As a codata type, we still have two projections 𝜋1 and 𝜋2 as in the case of non-dependent pairs. But the second projection now uses the self-parameter to guarantee that an element of type 𝑇 applied to self .𝜋1 is returned.

In fact, Agda can already represent Σ-types in both of these ways. But there is one caveat: Agda was not originally designed with codata types in mind, and its codata types are implemented on top of dependent records, which limits what kind of codata types are possible. For example, the order of the destructors in a codata type matter for Agda, so we cannot reorder the first and second projection. In our system the destructors of a codata type are not ordered and can mutually refer to each other, which precisely mirrors how definitions are mutually recursive on the toplevel.

But why should we care about these two alternative encodings of the Σ-type? Take, for example, Eisenberg et al. [2021] who discuss the addition of existential types to Haskell. Since Haskell both is lazy and supports type erasure, Eisenberg et al. are driven to a design that uses strong existential types. We think that by using the framework of data and codata types we can make these kind of differences even clearer.

2.5 Codata Encodings of Natural Numbers

Starting with the inception of the lambda calculus, researchers have been interested in functional encodings of data types such as booleans, natural numbers and lists. Classical examples of functional encodings are the Church, Scott and Parigot encodings of data types (cf. Geuvers [2014]; Koopman et al. [2014]). Functions are from our perspective just one particular instance of a codata type, so we are interested in the more general problem of codata encodings instead of functional encodings.

Most codata encodings of data types can be obtained by refunctionalizing a data type with an appropriate observation. That the Church encoding can be obtained from refunctionalizing a program with Peano numbers and an iter function has already been observed by Ostermann and Jabs [2018]; we restate this example in Figure 6.

We can observe that the codata type in Figure 6b which represents the Church encoding of natural numbers is not recursive. This corresponds to the well-known theorem that Church encodings can be typed in pure system F. If we apply the same method to obtain the Scott or Parigot encoding of natural numbers, then we can observe that the resulting codata type is recursive. This corresponds to the other well-known theorem that these encodings can not be typed in pure System F and require recursive types. We can even go one step further.

Geuvers [2001] showed that these previous encodings cannot express induction or dependent elimination. One way to obtain typed functional encodings which can express induction is to add a form of self types to the system; this kind of encoding was introduced by Fu and Stump [2014]. While it is hard to prove an exact correspondence, we think that the essential idea of the encoding of Fu and Stump can be expressed in our system in Figure 7 and Figure 8.

In Figure 7 we have encoded induction using a helper codata type StepFun which encodes the induction step for a given predicate 𝑃 on natural numbers. Induction is then expressed as the observation ind on a natural number 𝑛 which expects the base case and the induction step of the induction as arguments. The argument 𝑛 on which we define the observation occurs itself in the return type. Refunctionalization of this program results in Figure 8.

We think that this is further evidence that the self-parameters we introduced to the system occur naturally when we go from the non-dependent to the dependent setting.

Authors:

This paper is available on arxiv under CC 4.0 license.