Exotic Programming Ideas: Module Systems(stephendiehl.com) |
Exotic Programming Ideas: Module Systems(stephendiehl.com) |
I don't really want to read a long form description of the OCaml implementation of modules. I want a comparison to the languages he dismissed at the beginning of the article, and a discussion of why this feature has some value that isn't provided by those languages.
Basically - This feels like a different take on generics to me. There might be a lot of value in how this is implemented when compared to generics in a language like Java/C#/Typescript, but I didn't find that content anywhere in the article...
interface Mappable<Array> {
map<A, B>(f: (a: A) => B, fa: Array<A>): Array<B>
}
Likewise, for `Promise`s we could write: interface Mappable<Promise> {
map<A, B>(f: (a: A) => B, fa: Promise<A>): Promise<B>
}
But in order to generalize this interface to an arbitrary type constructor such that `F: * -> *`, we would need to write interface Mappable<F> {
map<A, B>(f: (a: A) => B, fa: ?): ?
}
which is not possible in TypeScript since it does not support higher-kinded types or type parameters that take type parameters or parametrized modules. interface Mappable<P extends Mappable<P, unknown>, T> {
flatMap<U>(f: (x: T) => Mappable<P, U>): Mappable<P, U>;
}
class Maybe<T> implements Mappable<Maybe<unknown>, T> {
x: T | undefined;
public flatMap<U>(f: (x: T) => Maybe<U>): Maybe<U> {
if (this.x) {
return f(this.x);
}
return Maybe.nothing();
}
}Secondly, generics depend upon (a) having a means to discuss functionality which abstracts over one or more types and certain behaviors those types must support, (b) having a means to bundle up one or more types along with some behaviors, and (c) being able to combine those two.
In Typescript/Java/C# this is mostly carried out by classes and subtyping. Abstraction occurs when we ask not for a specific type but instead for something a little less than that specific type, one of its supertypes; bundling occurs in classes; and the combination occurs naturally as subtypes are transparently upcast to their supertypes.
There are two practical drawbacks to this approach:
First, it's hard to abstract over behavior that doesn't merely consume your abstract type but also returns it. When we do (c) via subclassing we have to upcast and it's not always clear or possible to re-downcast things back to the appropriate type. OO has tons of workarounds for this issue and related ones.
Second, it's hard to abstract over multiple interrelated types at once. For instance, a generic graph implementation might want to be abstract both in the types of nodes and the type of edges. The generic implementation can thus handle annotations at either the edges or the nodes. In OO abstraction, you might do something like have the edges be an associated type of the nodes, but this creates an unnecessary asymmetry.
The solution is a classic one. Instead of having the class represent an object, have the class represent a bundle of operations which act on abstract objects (the C++ vtable approach). For example, in pseudocode
class GRAPH
type Graph
type Node
type Edge
# These are hard to do with subclassing since Graph will often be upcast on return
def emptyGraph(): Graph
def simplify(g: Graph): Graph
# These represent non-trivial interactions between multiple types abstracted simultaneously
def addNode(g: Graph, n: Node): Graph
def neighbors(g: Graph, n: Node): List<Node>
And this, with the appropriate type discipline, is what OCaml does. Unfortunately, what you'll find is that OCaml's type discipline is critical and difficult to emulate. Making this sort of modularity work consistently involves some notions of equivalences and transparency that are natural to discuss when talking about modules but rarely show up in OO systems.In Zig, structs and modules are equivalent, and type declarations can be manipulated at compile time just like any other value. That, among other things[1], lets you write:
fn LinkedList(comptime T: type) type {
return struct {
pub const Node = struct {
prev: ?*Node,
next: ?*Node,
data: T,
};
first: ?*Node,
last: ?*Node,
len: usize,
};
}
I wonder if there's anything that OCaml functors can do but this can't.[1] for example, you can implement a very efficient printf that gives an error at compile time when the format string is invalid. See https://ziglang.org/documentation/master/#comptime for more details.
My reaction to the title was "But they are not exotic, I use them every day!"
It's definitely the feature I miss the most every time I work in other languages, even presumable "advanced" ones, like Haskell. One notable attempt to add them elsewhere is "modular C"[1].
I really like research language 1ML's approach to modules[0]. This allows monomorphic types to be treated like values, avoiding all of OCaml's module syntax (which can be a bit complex and verbose).
https://github.com/rossberg/1ml
There isn't special module syntax. Modules are more-or-less just structs, and the structs can contain types.
Modules (which are just top level classes and contain nested classes) have no import statement and no hard linked external dependencies. When you instantiate the module (~class) you pass in dependencies it needs.
Actually Mesa.
https://play.rust-lang.org/?version=stable&mode=debug&editio...
This can be really useful especially as traits with differing concrete types diverge, you can create a unified interface trait object to allow trait objects for things like container classes.
Unfortunately, Unix (and C) really botched signals by limiting their number in the user space (in particular, they cannot be stacked), and so the idea largely fell out of favor as an error-handling paradigm.
The PL/I condition mechanism was an integral part of the OS and fault handling. For example, a floating point exception or integer overflow was detected by hardware, caused a fault, the information from the fault was packaged into a condition frame (an extended stack frame), and a condition was raised. The OS looked backward through the stack looking for an "onunit" (aka condition handler) that handled this condition - basically like try & except.
Condition handlers were very flexible, with the option to partially handle the condition then let it continue with older frames, ignore the condition in this handler and let it continue, or completely handle it.
If no running program had a handler for a condition, the default OS handler would run, which often raised a new condition and printed an error message. If the problem causing the error was corrected (for example, some files were deleted from a full disk), you could use the REN (re-enter) command to return from the disk full condition and the system call causing the condition would be automatically re-tried, sort of like when EINTR occurs on a system call and it has to be (manually) retried.
Instead of using numbers, conditions had arbitrary names and arbitrary data could be associated with a condition.
In hindsight it was very similar to Python's try/except/finally error handling mechanism. There was a condition called "cleanup$", which was executed whenever a procedure was aborted because of a stack unwind to allow it to close files, delete temp files, etc.
I wrote an emulator for the Prime and you can see all this with:
telnet em.prirun.com 8001
There are all kinds of manuals online too, including a full PL/I implementation, all from the 80's and early 90's.
A big helper is when the module interface uses named parameters and has sensible defaults for unspecified parameters. This allows the module designer to add features without breaking existing code and makes it easier for someone to integrate the module in the first place. Having the documentation built into the module itself is also a huge win.
I think its module system is a huge bonus. It makes code a lot easier to organize and reason. It forces toplevel module X to be defined in X.agda so there is no argument what the module should be named or where it should be found.
(I'm also not sure if Zig does compile-time check if the type T is compatible with LinkedList. OCaml does this type of check, making sure the signature of T fits the requirement of the functor LinkedList. The module T may have certain functions, for example a compare function associated with T to ensure sorting of the elements in your data structure.)
The zig equivalent of a module, a struct, is also a first-class object in zig, so should be the same as OCaml there.
However, Zig does not have anything like a signature -- in that sense, it's "dynamically typed" at compile time (i.e., like type args to C++'s templates, however not nearly as bad because the errors you get are immediate and understandable -- think of it as more like "I fucked up the types and got a python-style error" than I "I fucked up the types and got a C++ templates error").
Also, as noted in other comments, Zig's type parameters are also dynamically typed. This leads to implementation details leaking out. This is not an issue in OCaml and other ML-style languages.
The power of the Ocaml module system is really in the functor, which the article only touches upon. You define a module and refer in its definition to types/functions of other modules that must be provided at the time you construct the module. Even better, you can define relations between the types and do type substitutions, e.g.: to construct this module, you need to give me 2 other modules, each with a specific set of functions, and for which type t1 of the first module, matches type t2 of the second module, while the actual type of t1/t2 does not matter.
See https://dev.realworldocaml.org/functors.html for more examples.
> Even better, you can define relations between the types and do type substitutions, e.g.: to construct this module, you need to give me 2 other modules, each with a specific set of functions, and for which type t1 of the first module, matches type t2 of the second module, while the actual type of t1/t2 does not matter.
Isn't this essentially the same as generic type arguments in other languages? Like in this pseudo TypeScript:
class CustomModule<T1 extends Module1Interface, T2 extends Module2Interface> {
constructor(t1: T1, t2: T2) { ... }
}So you can write a signature like:
module type VectorSpace = sig
module Field : Field
type t
val zero : t
val (+) : t * t -> t
val (*) : Field.t * t -> t
end
In OOP it becomes hard to write an interface like this simple example, and more complex examples become harder still.C++ templates can be used in a similar way, but it's not perfect. The closest match I can think of is a template class with static members.
D has one more mechanism that is conceptually very different, but could be used to express somewhat similar patterns: template mixins. These are best described as parameterized template sections of code worh one or more declarations that have to be reasonably well formed on their own and are virtually pasted into the code at the location where they are instantiated. They exist somewhere in the space between templates and C preprocessor macros.
Modules relax this single-type requirement, and let you define multiple types in it, which makes certain things more natural to express, without a need to create multiple shallow classes or manager-like classes.
Also, functors (i.e. "module functions") in the OCaml module system allows generics, when a module is essentially parameterized by one or more other modules.
Modules are structurally typed, so the more typical sub-typing behavior expected from mainstream OOP languages is not applicable to them.
C has (unparametrized) modules, calling them “compilation unit”.
Parametric types help with part (a) by allowing you to specify only part of the structure of your type. That can help enormously, though they also force some amount of concretion in your type which isn't always good. Ultimately, OCaml's module system is pointed in the direction of ad hoc polymorphism where you pass in behavior with your abstracted types.
Subtyping supports this passing of behavior as it lets you specify a whole space of types abstractly. In that way, it's a little more supportive of the pathway to ad hoc polymorphism.
interface HKT<F, A> {
_URI: F
_A: A
}
interface Mappable<F> {
map<A, B>(f: (a: A) => B, fa: HKT<F, A>): HKT<F, B>
}
where F is a unique identifier representing the type constructor and A its type parameter.[1]: https://www.cl.cam.ac.uk/~jdy22/papers/lightweight-higher-ki... [2]: https://github.com/gcanti/fp-ts [3]: https://gist.github.com/gcanti/2b455c5008c2e1674ab3e8d5790cd...
module Data = struct
type pair = int * float
type numbers = int list
let make_pair () = (0, 0.0)
let make_numbers () = []
end
This example is not very illustrative, but just explain the idea what I mean. We have two constructors: make_pair and make_numbers. So in a way, we can have multiple types in the module, if they are meant to be tightly related. We are not forced to make three classes here (Pair, Numbers, Data), everything is in one module.EDIT: OOP classes have a primary type in your class, and sometimes this creates artificial chicken or egg type of problem, where you cannot decide what is more fundamental (message vs receiver vs sender). Modules don't force this on you.
Where in a module system its perfectly fine to have multiple instances of Ord for a given type. One gives global consistency where the other gives local consistency.
Trying to stay neutral, I think its no suprise that some people prefer typeclasses...
[1]: https://existentialtype.wordpress.com/2011/04/16/modules-mat...
However, it does serve a similar purpose. For example, if you want to create a datatype for an OrderedList then you'd create a higher-order-module (functor) that receives as an argument another module containing all the necessary comparison functions for the list elements. For example, if you apply the OrderedList functor to the IntOrdering module the result would be an OrderedListOfInt module that provides an abstract data type implementing an ordered list of integers.
In the Ocaml standard library the names would be different but that's the basic idea.
module type Comparable = sig
type t
val compare : t -> t -> int
endProbably closer would be (not sure if this is possible in Typescript):
class CustomModule<S, T1 extends Module1Interface<S>, T2 extends Module2Interface<S> > {
constructor(t1: T1, t2: T2) { ... }
}
Meaning that, for example, within Module1Inteface, there is some function f1 that returns an S, and within Module2Interface, there is some function f2 that takes an S as argument.This does become a bit tedious notation-wise, if possible at all. In Ocaml, this would look like:
module CustomModule(M1: Module1)(M2: Module2 with type s = M1.s)Here's an example. This example is a bit contrived, because I couldn't think of a better example that was simple and yet demonstrated the power of modules. So this example can be re-phrased and re-structured into a more natural OO fit, but try to look past that. Suppose you want to make a module or set of interfaces that describe a classic board game (i.e. Chess or Checkers). So you have a Board. A Board has a series of Pieces, and each Piece has a set of valid moves on the board. And a move can be applied to a Board to modify the state of the game. Again, very much glossing over the details here to get to the meat of it. So you could write a series of interfaces
interface Board {
List<Piece> getPieces();
}
interface Piece {
List<Move> getValidMoves(b: Board);
}
interface Move {
void apply(b: Board);
}
But on it's own that isn't enough, because you don't want to be able to mix-and-match different interfaces for different games, like trying to find the set of valid moves for a chess piece on a checker board. So you need to apply generics. interface Board<P> {
List<P> getPieces();
}
interface Piece<B, M> {
List<M> getValidMoves(b: B);
}
interface Move<B> {
void apply(b: B);
}
Only that's not enough either, because on it's own these parametric definitions don't enforce that the set of pieces a board returns are actually valid pieces for that game. With constraints you end up with this (in a psuedo-language where you can use a special Self type, I don't know typescript and don't think this can actually be implemented in plain Java) interface Board<P extends Piece<Self,Move<Self>>> {
List<P> getPieces();
}
interface Piece<B extends Board<Self,M>, M extends Move<B>> {
List<M> getValidMoves(b: B);
}
interface Move<B extends Board<?>> {
void apply(b: B);
}
And so you have a bunch of these weird circular definitions to get these components to play together nicely. Meanwhile, you can define a module for this without using Functors or type substitutions or anything terribly complicated (Note the below is sort of mixing OCaml with more Java-like syntax just because I'm not super familiar with OCaml): module type Game = sig
type Board
type Piece
type Move
val getPieces: Board -> List<Piece>
val getValidMoves: Piece -> Board -> List<Move>
val apply: Move -> Board -> unit
end
And I think that's the really interesting thing you can do with modules that is more awkward with the traditional OOP interfaces. It makes it more natural to talk about multiple different data types all working together.The only way to implement that as nicely with an OOP interface would be to wrap everything in a top-level object
interface Game<B, P, M> {
List<P> getPieces(b: B);
List<M> getMoves(p: P, b: B);
void apply(m: M, b: B);
}
Though now you have to make a singleton Game object and pass that around everywhere you need it, which may or may not be idiomatic or obvious depending on the language and your preferences.