Algebraic Data Types for C99(github.com) |
Algebraic Data Types for C99(github.com) |
Algebraic Data Types are almost always one of the things I miss when I use imperative languages. I have to do Java at work, and while I've kind of come around on Java and I don't think it's quite as bad as I have accused it of being, there's been several dozen instances of "man I wish Java had F#'s discriminated unions".
Obviously I'm aware that you can spoof it with a variety of techniques, and often enums are enough for what you need, but most of those techniques lack the flexibility and terseness of proper ADTs; if nothing else those techniques don't have the sexy pattern matching that you get with a functional language.
This C extension looks pretty sweet since it appears to have the pattern matching I want; I'll see if I can use it for my Arduino projects.
:)
To me it feels very similar to an interface (trait) implemented by a bunch of classes (structs). I have multiple times wondered which of those two approaches would be better in a given situation, often wanting some aspects of both.
Being able to exhaustively pattern match is nice. But being able to define my classes in different places is also nice. And being able to define methods on the classes is nice. And defining a function that will only accept particular variant is nice.
From my perspective a discriminant vs a vtable pointer is a boring implementation detail the compiler should just figure out for me based on what would be more optimal in a given situation.
I do find it a little annoying that it's taken so long for Java to get a feature that, in my opinion, was so clearly useful; it feels like they were about a decade later on this than they should have been, but I'll take whatever victories I can get.
Java has https://github.com/functionaljava/functionaljava
which is unsupported but stable.
It wasn't that I though that the JVM was incapable of doing something like an ADT, just that vanilla Java didn't support it. While it's easy to say that "companies should just use Kotlin", that's a bit of a big ordeal if you already have a 15 year old codebase that's written in Java.
I've heard of but never used the Functional Java library, though it'd be a tough sell to get my work to let me import a library that hasn't been updated in two years.
For Java, see https://www.baeldung.com/java-lts-21-new-features
Kotlin's: https://www.baeldung.com/kotlin/when
Make up your own mind.
Algebraic data types and pattern matching actually work really well in imperative languages, too. See eg Rust.
They've recently added support for compiler-enforced pattern matching over sealed classes, which I suppose does get you halfway there though.
Think Scala, Elm and Haskell have it as well.
Having that and elixirs pattern matching would be insane.
I only sometimes use it as a "I would recommend this repo" -- how can one do that anyways, given that the repo could morph into something one would no longer recommend?
I've known C for almost 20 years, and never would I have thought the macro system was powerful enough to allow such black magic.
This is awesome!
The author is only 19 years old. I feel really dumb now.
They are kind of cursed but at their core they are actually incredibly simple and a reliable tool for reducing cognitive complexity and boilerplate in C based projects.
There is a lengthy blog post about the same stuff, except that the author doesn't seem to have come across the said wiki section yet: https://nandakumar.org/blog/2023/12/paradigms-in-disguise.ht...
Kudos to the dev of datatype99 for showing the problem with such ad-hoc methods in the readme right away.
Seems like a pretty big footgun. But otherwise, very cool.
I had written a immediate mode UI out of macros, and this reminded me of that although in my case it is not a problem, although some blocks are ones that you can use "break". For example, you can use "break" to exit out of a win_form block ("goto" also works), while a win_command block does not capture "break" so using break (or goto) inside of a win_command block will break out of whatever block the win_command is in (probably a win_form block; for example, this would commonly be used in the case of a "Cancel" button).
So it's not just about being slightly better in some ways, but smoothing over so many paper cuts that it can be hard to see how they have added up overtime across ecosystems, like CPython and co having so many of its own vocab types, or HPC libs.
For example, the problem with this macro that causes this wouldn't even be problems in a well written Rust macro. They're artifacts of smart people trying to work around C's limitations.
But then the macro wouldn't have been written anyway because this is a port of a native Rust feature (which means it gets taken advantage of in community software).
Let's say further that you already know Rust exists, and aren't going to use it for reasons that anyone writing a C program already knows.
At least consider Zig. Here's a little something I wrote in Zig two days ago:
/// Adjust a label-bearing OpCode by `l`. No-op if no label.
pub fn adjust(self: *OpCode, l: i16) void {
switch (self.*) {
inline else => |*op| {
const PayType = @TypeOf(op.*);
if (PayType != void and @hasField(PayType, "l")) {
op.*.l += l;
}
},
}
}
This uses comptime (inline else) to generate all branches of a switch statement over a tagged union, and add an offset to members of that union which have an "l" field. You can vary the nature of the branches on any comptime-available type info, which is a lot, and all the conditions are compile-time, each branch of the switch has only the logic needed to handle that variant."But my program is already in C, I just need it for one file" right. Try Zig. You might like it.
The only issue is you can't do a clean switch statement that matches on the specific value of a field, but nested switch statements aren't that messy.
The trouble with this approach is that there's a lot of mental overhead in dotting all of your i's and crossing all of your t's. It's draining, so you start to, e.g., shoehorn additional functionality into existing classes instead of making new ones.
You eventually wind up perceiving the abstraction as costly which lessons your use of it at the expense of producing a more elegant solution to the problem(s) you're solving.
tl,dr? The ability to just state "Darmok and Jalad at Tanagra" is transformative when the alternative is telling an entire story every time you want to reference a complex idea.
The modelling aspects can be simulated, yes, but that's barely half of the benefits of ADTs. Pattern matching is a big ergonomic benefit.
datatype(
BinaryTree,
(Leaf, int),
(Node, BinaryTree *, int, BinaryTree *)
);
No names for the struct fields, so you need to rely on the position.And then used:
int sum(const BinaryTree *tree) {
match(*tree) {
of(Leaf, x) return *x;
of(Node, lhs, x, rhs) return sum(*lhs) + *x + sum(*rhs);
}
// Invalid input (no such variant).
return -1;
}
Where lhs, x, rhs magically match the types defined above. What a nonsense design!The worst codebases to inherit as a maintenance programmer are the ones where people got clever with the C preprocessor. Impossible to debug and impossible to maintain.
This library on the other hand addresses a nasty papercut whose presence usually stops folks with modern language experience from choosing C when it might otherwise be valid. Plus you can't beat C's long-term stability.
Though I agree that 90+% who _think_ they still need C should probably move on to making Rust work for them, instead.
Considered the example given. Now my eyes hurt. I think I started appreciating Lisp more.
https://gist.github.com/unclechu/eb37cc81e80afbbb5e74990b62e...
The difference here is that Nim compiles to C and you can turn the garbage collector off: Zig compiles C and there's no garbage collector. That means the entire standard library is available when generating object code. It's also trivial to opt-in to the C ABI on a fine-grained basis, by defining a function or struct with the extern keyword.
I believe this is still fairly current about the difficulties of building Nim dylibs for C programs: https://peterme.net/dynamic-libraries-in-nim.html
I expect Nim will stabilize about where D has: it will have a dialect of the language which, with relatively painless accommodations, is able to produce object code which speaks C ABI. Zig is different. The language is relentlessly focused on providing a better alternative to C while occupying the same niche, and a lot of design time has been spent on making it practical to take an existing C program and start writing the new parts of it in Zig.
It's a good language, Nim, and getting better. I'd recommend it for someone who is considering Go, for example.
The critical thing is that the compiler (or macro system) needs to check that you've checked all the alternatives.
(And in fact Algol 68 had a better implementation than most later languages, but Algol 68 was missing completely any documentation suitable for newbies, like tutorials and programming examples, while not being promoted by any hardware vendor, like IBM or DEC, so it was doomed.)
https://web.eecs.umich.edu/~bchandra/courses/papers/Hoare_Hi...
I also would love a future where ADTs are more common in imperative languages
contrived :: ([a], Char, (Int, Float), String, Bool) -> Bool
contrived ([], 'b', (1, 2.0), "hi", True) = False
To achieve a result like this using Zig's switch syntax would seem to involve a huge amount of boilerplate code and nested switch statements.Most of the common languages today have product types.
Java[1], Rust, Haskell, etc. have sum types.
I think it gets a bit more escoteric beyond that though - i don't doubt that there's probably some haskell extension for quotient types[2] or some other category theory high-jinx.
Most languages have ADTs built in.
[1] https://blogs.oracle.com/javamagazine/post/inside-the-langua... [2] https://en.wikipedia.org/wiki/Quotient_type
https://medium.com/dartlang/dart-3-1-a-retrospective-on-func...
You can do the same thing with boolean logic and just have not-and, but thankfully we have and, or, not, xor. Similarly, you don't need greater-than-or-equal because you can just write 'x > y || x == y'.
Comprehension is linked to how closely you can express the idea of what you're doing in the object language that you have to look at. It might be convenient to compile everything down to SK combinators so that your optimizer and evaluator can be simpler, but people should never look at that level (at least not until you suspect a compiler defect).
So we get to object oriented programming. Where our data expression has an AND property (a class has an INT field AND a STRING field), an existential property (interfaces: there exists some object with these methods), and inheritance (a truly bizarre feature where we duck tape subtyping to a method and field grouping mechanism with a bunch of hooks).
With interfaces and inheritance you can simulate both a universal property (generic) and an OR property. But because it's not a direct expression, we leave this giant gap for what people intended to happen to diverge from what actually happens. Especially after time passes, defects are found, and requirements change. [For example, when using interfaces to simulate an OR property, there really isn't any mechanism to let everyone know that this construct is closed. So if something erroneously gets added, you won't know to check the entire code base. And if requirement change and you need to add a new case, then you have to check the entire code base. Completeness checking of ADTs give you this for free in your pattern matches.]
Too many non-trivial architectural messes that I've encountered in my career have been due to either someone trying to solve all of their problems with interfaces or the same with inheritance* when a simple OR data structure would have made everything simple, clear, and correct.
[*] - Inheritance being more problematic when someone tries to create a non-trivially sized category hierarchy, which ruins the day when requirements change and suddenly the tree needs to be reorganized but doing so would invalidate entire swaths of the code base already accepting types with a different assumed (and undocumented) hierarchal tree structure. Thankfully most people have gotten the memo and switched to interfaces.
They were both introduced in the same decade.
I think that's actually wrong for "Sum types". Product types, sure. The idea of storing a bunch of fields in a single thing matches the way we've been organizing information since we started writing things down.
But I genuinely don't think I've seen an attempt at a sum/union/enumerant/whatever syntax in a programming language that wasn't horrifyingly confusing.
Where by extension: class-based inheritance is actually pretty simple to understand. The classic "IS-A" relationship isn't as simple as "fields in a struct", but it's not hard to understand (c.f. all the animal analogies), and the syntax for expressing it is pretty clean in most languages.
Is it the "best" way to solve a problem? Maybe not. Neither are ADT sum types. But I think there's a major baby-in-the-bathwater problem with trying to be different. I really don't think, for the case of typical coders writing typical code, that ADTs are bringing as much to the table as the experts think.
Syntaxes like: `A | B(Int) | C(String)`. That means A, B, or C.
> Where by extension: class-based inheritance is actually pretty simple to understand. The classic "IS-A" relationship isn't as simple as "fields in a struct", but it's not hard to understand
This value is either `A`, an Int (`B(Int)`) or a String (`C(String)`). Or: this knapsack either contains an A, B, or C. Difficult?
> (c.f. all the animal analogies),
Reminds me of the fact that software isn’t typically as static as animal taxonomies.
> and the syntax for expressing it is pretty clean in most languages.
I’m most used to Java where you spread `extends` over N files. (Sealed classes in Java is an exception.)
It’s fine. I don’t understand how it is particularly clean.
> Is it the "best" way to solve a problem? Maybe not. Neither are ADT sum types.
Is this an argument? ’Cause I don’t see it.
> But I think there's a major baby-in-the-bathwater problem
Inheritance is something that needs to have some concrete implementation affordance. Baby in the bathwater? I don’t see how you bolt this onto the struct model in a way that gets out of the way for people who don’t want to use it (the zero-cost-abstraction (in the if you don’t use it sense) is important to some low-level languages).[1]
Maybe the designers of hypothetical language X thinks that algebraic data types is enough. What baby are you missing then?
[1] For algebraic data types: structs are straightforward enough. While the “sum type” can be implemented by leaving enough space for the largest variant. That one-size-fits-all strategy isn’t perfect for all use-cases but it seems to have been good enough for Rust which has a lot, a lot of design discussions over minutiae.
> trying to be different.
With a technology from the ’70s. I also saw your “one oddball new idea is a revolution” snark. You’re clearly being very honest.
data Bool = True | FalseSimple to understand, a nightmare to debug, as you'll be chasing where your data and data contracts across a ton of files.
It honestly does annoy me that a lot of mainstream languages still haven't really adopted ADTs; when Java 8 added a lot of (well-needed) new syntax, it felt like that was an ideal opportunity to add ADTs and pattern matching (though I'm sure that was easier said than done).
Well at least Java does now (as of Java 21) have pattern matching (including nested record destructuring) and sealed classes, which let you have decent sum types.
The one issue is that everything is nullable, but that's a wider Java issue.
Also, algebraic data types can be seen as hierarchy consisting of abstract base class and several final children classes. So it is an inheritance model, just restricted one.
However
What we do see is a bunch of mathematical disciplines that end up creating properties like: AND, OR, Universal, Existential, Implication, (and a few others). They end up in places like: set theory, type theory, category theory, various logics, lattice theory, etc.
Now, maybe they're only copying one another and this is more of a memetic phenomena. Or maybe they've hit upon something that's important for human comprehensibility.
That would be the 'evidence' of the positive effect of ADTs (scare quotes because it might just be math memes and not fundamental). But we can also think about what I feel is legit evidence for the negative effect of lacking ADTs.
Consider what happens if instead of having the standard boolean logic operators and, or, not, xor, we only have the universal not-and operator. Now a straightforward statement like: A && B || C becomes (((A !& B) !& (A !& B)) !& ((A !& B) !& (A !& B))) !& (B !& B) [I think...]. It's more complicated to tell what's actually supposed to be going on AND the '&&' simulation can get intertwined with the '||' simulation. The result being that requirements changes or defect fixes end up modifying the object level expression in a way where there is no longer any mapping back to standard boolean logic. Comprehensibility approaches zero.
And we've seen this happen with interfaces and inheritance being used to implement what would otherwise be a relatively simple OR property (with the added benefit that pattern matching ADTs often comes with totality checking; not something you can do with interfaces which can always have another instance even up to and including objects loaded at runtime).
Swift, Kotlin and Scala all have had ADTs for a while, even Java has it now.
Niklaus Wirth is well known as a critic of Algol 68 (before the design of Algol 68 was finalized), but in the case of his variant records he has completely failed to create something competitive.
In practice you need to couple it with an enum, and your visitation mechanism is a switch statement. But C doesn't impose that on you and lets you do it as you see fit.
It also has all the features of Haskell, since you can implement a Haskell compiler in C.
JetBrains has prioritised compatibility with Java and it shows. Of course, there are some gotchas (such as nullability or checked exceptions which don't exist in Kotlin), but you can really mix Kotlin and Java code relatively freely.
Inheritance addresses similar issues by delegating the responsibility, something that is easy to understand. Does it have downsides? Sure. But I'm not sold that ADT has the advantage here at all.
I find positional destructuring of records a bad idea.
This code is ADTs for C.
At some point somebody's going to call it CADT and jwz will explode.
Sum typing might be better! But frankly the jury is still out, and the impact is clearly going to be smaller than what you're imagining.
Now, I should add that I did not mean my question to be a criticism of them! I'm genuinely curious on evidence that they are a basic building block. Feels save to say they are a good building block, and those aren't the same thing.
As an easy example for them not being basic building blocks, I can't remember ever seeing anything like them in any assembly instructions for things. Put together a batting net for the kids. Lots of instructions, but nothing algebraic, in this sense. Looking at recipes for food. Nothing algebraic, really? Maybe I can squint and see some, but it would be hard. Exercise plans? Music lessons? Playbooks for a sport?
Again, though, I /do not/ intend this as a criticism of them. Genuinely curious on any investigation into them.
There is the still the ongoing debate about how much human perception and human reason are shaped by cultural forces vs. universal forces (where the latter asserts humans reason in the same/similar ways).
There's evidence that certain optical illusions don't work across cultures for example (I seem to remember those in Western countries have a tendency to mentally group things in rectangular boxes). The exact balance between cultural and universal forces isn't known and I doubt we could say anything about sum types in that regard.
Rust somehow has more elegance than Go, if only in small parts. Nothing compares to Scheme in the elegance category IMO :)
Meanwhile, getting addicted to scheme many years ago is a lot of why I still mostly write perl, I need 'my' (i.e. expression/block level scoping and you actually have to declare your variables to scope them before use) and closures for the style of programming I like to do.
Rules out python, ruby, and pre-ES6 JS, though JS with 'let' and arrow functions is pretty survivable and will be even more so if the 'do' and 'match' proposals land.
(note: I know go has :=, and while I'm ambivalent about the syntax, the semantics are pretty much what I expect; the above was talking about scripting class languages though hence not mentioning it there)
It's been a long time since I've written any Perl, but I can say I don't care for it as much as I do Lisps in general.
Example:
sealed interface BinaryTree {
record Leaf(int value) implements BinaryTree {}
record Node(BinaryTree lhs,
BinaryTree rhs,
int value) implements BinaryTree {}
}
public class Hello {
static int sum(BinaryTree tree) {
return switch (tree) {
case BinaryTree.Leaf(var value) -> value;
case BinaryTree.Node(var lhs, var rhs, var value) -> sum(lhs) + value + sum(rhs);
};
}
public static void main(String... args) {
var tree = new BinaryTree.Node(
new BinaryTree.Leaf(1),
new BinaryTree.Node(
new BinaryTree.Leaf(2),
new BinaryTree.Leaf(3),
4),
5);
System.out.println(tree);
System.out.println("Sum: " + sum(tree));
}
}
If you added a new subtype to BinaryTree you would need to fix the switch.EDIT: I didn't handle the `null` case above... so it would be a NullPointerException if someone passed null... apparently, Java decided to make handling `null` optional. More information: https://www.baeldung.com/java-lts-21-new-features
final boolean hasUncollectedSecret =
switch (each)
{
case Wall() -> false;
case Goal() -> false;
case Player p -> false;
case BasicCell(Underneath(_, var collectible), _)
->
switch (collectible)
{
case NONE, KEY -> false;
case SECRET -> true;
};
case Lock() -> false;
};Java has had comprehensive pattern matching since Java 21, like one year ago (current Java version is 22).
I posted an answer to the same parent comment with the C example written in Java...
You can read more about it here: https://www.baeldung.com/java-lts-21-new-features
I have a mid-sized application I built on 17 that's used to deliver a particular project, really looking forward to finish the project so I get to move to 21 and refactor it with these new features and make it more general.
At least give Nim a try.
I do agree that, on an even playing field, Nim should be considered when any of Rust, D, C++, or Go, are on the table. This is less true of C, and therefore, Zig. There's a difference between something being possible, and something being easy and pleasant.
There are also people using C, who really should be using some other language. Which is, in that case, probably not Zig. When you understand that C is the rational choice for some sorts of programming, you'll understand why Zig is a contender, in a way that the other languages we're discussing are not.
L = { tag: "a", payload: string } | { tag: "b", payload: number }
R = { tag: "b", payload: number } | { tag: "c", payload: boolean }
T = L | R
whereas a proper sum type `L + R` would have four.It barely feels like the same language as sysadmin scripts are written in, it just happens to share a compiler and a VM.
... and it doesn't exactly even share syntax since perl supports libraries defining their own keywords, e.g. while I won't be surprised if somebody makes it a feature of the VM eventually, our async/await is currently provided by https://metacpan.org/pod/Future::AsyncAwait and as you can see at https://metacpan.org/pod/Future::AsyncAwait#WITH-OTHER-MODUL... there's quite a few other useful ones out there.
Of course, many people complain vociferously about the idea that you have to use *libraries* to get syntax, but as somebody who still loves scheme I regard the ability to build the language up towards the problem to that extent to be a feature.
My first ever cpan release was making continuations pass back out to perl as a subroutine reference from Guile so I could write linear top level logic in scheme and use continuation capturing to sit that atop an event driven I/O stack - axisofeval.blogspot.com's lispx (https://github.com/lispx/lispx/) provides an fexpr based lisp that does similar for JS.
(trying to think of a good example of perl code of mine that shows an example of this is annoying me because I don't have any public code right now that's new enough for me to not have already decided I don't like it anymore ;)
MyType= (Scalar4, Real4, NullTerminatedStringC);
MyUntaggedRecType=
RECORD
CASE MyType OF
Scalar4: (longC: ARRAY[1..150] OF longint);
Real4: (floatC: ARRAY[1..150] OF real);
END;
MyTaggedRecType=
RECORD
CASE tag: MyType OF
Scalar4: (longC: ARRAY[1..150] OF longint);
Real4: (floatC: ARRAY[1..150] OF real);
END;
...
{ set all to 0.0 without running through the MC68881 }
FOR j := 1 TO 150 DO
longC[j]:= 0;
...
CASE tag OF
Scalar4: percentReal = longC[1];
floatC: percentReal = floatC[1]*100;
ELSE
percentReal = 0.0/0.0;
edit: don't have a pascal compiler handy, but that's the ideaIt included many innovations that appeared again in other programming languages only decades later.
Niklaus Wirth was a good teacher and writer and the success of his languages is due mostly to his books and due to his languages being used for teaching in many universities, not due to their technical qualities.
Most of his languages ended up losing, because they weren't being shipped with a free beer OS, coupled with a free beer compiler.
AFAIK Pascal is C and Algol 68 is C++
people used Pascal because the compiler was blazing fast, it was easier to implement and learn and the features it lacked against Algol did not really matter most of the time (at the time)
More features doesn't automatically means "better"
Also Pascal had quite strong technical qualities, not very common among other contemporary languages
edit: can I ask the reason for the downvote? I would really like to hear an opinion on what Pascal did wrong, having used it extensively in the late 80s until the end of the 90s and why my comment was awarded with a negative score.
For two, addition is a wildly disparate thing everywhere we use it. We like to joke that computers made that hard, but literally half of intro chemistry is learning how to get thing to add together in a meaningful way, no? Balancing a chemical equation is a thing.
Logic doesn't really have a direction, it works backwards or forwards. Even if you're solving a system "backwards", whatever that means, you still have to satisfy all of the necessary AND and OR constraints for a solution to be valid, so you're effectively still building ADTs or records just using a different evaluation order.
You can /model/ it that way. But you are making a symbolic model to justify how a solution is reached.
Now, it can be frustrating to consider that this model could produce an agent that is better at the ball game than the players. But it is silly to think that means you have mirrored them.
Implication is one of the primitives in logic, and gives us several of the classic logical fallacies: affirming the consequent, denying the antecedent, fallacy of the converse, and fallacy of the inverse.
All of which are examples of trying to work logic as though it doesn't have a direction.
[1] L. Augustsson. Compiling Pattern Matching. In Functional Programming Languages and Computer Architecture, pages 368– 381, 1985.
[2] P. Wadler. Efficient Compilation of Pattern Matching. In S.L. Peyton Jones, editor, The Implementation of Functional Programming Languages, pages 78–103. Prentice Hall, 1987.
What a lot of people miss about that paper is that he wasn't just talking about goto statements. He was also making a more general observation about how more powerful and general programming language features are not necessarily desirable, because they tend to adversely impact developer productivity.
The reason I, as a user, prefer structured control flow statements over goto is not that I believe they are powerful. It's precisely because they are less powerful. The resulting constraints on how the program can be structured make it easier for me to read and reason about existing code. That makes maintaining code easier. It also makes optimization and static analysis easier. And it makes writing tests easier, too.
I have similar feelings about ADTs. The reason I prefer them to other ways of doing composite data types is not that I think they're more powerful. It's that they create constraints that tend to reduce the semantic complexity of the domain models people create in programming languages that use them. And that makes my job easier.
The corollary to that, though, is that I'm not actually all that hype about adding ADTs to existing languages. For reasons that are similar to how the mere availability of structured, reentrant function calls is small consolation in a codebase that's already riddled with goto statements. The real win doesn't come from using ADTs, it comes from not having to worry about all those other confusing overpowered things that aren't ADTs.
But they're hard to read for anyone who isn't an expert on not just the language but the type in question (c.f. Rust's Option() idioms all looks like line noise to newbies, etc...). And that's a bad trade.
In essence, this stuff is just Perl all over again. It's a language feature that prioritizes concision over comprehension. And I say that as someone who really likes coding in perl. But "people" don't like perl, and the community moved on, and the reasons are... the same reason that uptake in ADTs is lagging where the experts want it to be.
All Turing Complete programming languages are Turing equivalent to one another. Programs written in one language can be mechanically transformed into those written in another. This is irrelevant to the discussion of programming languages. The whole point of creating different programming languages is to explore different ways to express the same program!
There's also a great talk on the matter [1], if somebody is interested in formalities.
Better known as the Turing Tar Pit.
So, while you are formally right, the need of shortcuts in pattern matching is undeniable to me.
Isn't that true of ADTs in all languages? I can't think of a single language with ADTs that lets you change the tag/variant of an existing value.
#[derive(Debug)]
pub enum Example {
Foo(i32),
Bar(&'static str),
}
let mut ex: Example = Example::Foo(42);
println!("{ex:?}"); // Foo(42)
let ex_ref: &mut Example = &mut ex;
*ex_ref = Example::Bar("hello");
println!("{ex:?}"); // Bar("hello")
Given a mutable reference to a value of enum type, you can replace it with another variant. Or you can swap it out with any other value of the same type, even if the variants are different. This is most commonly used for Option<T>, where you can insert or remove the contained value as long as you have a reference.The limitation here is that for as long as the mutable reference is live, no other code can access the value. So when you do change the variant, you can't have any other references sitting around that point to the removed subfields.
[0] https://play.rust-lang.org/?version=stable&mode=debug&editio...
You can accomplish the same thing in C and C++ because they also have in-place value semantics and allow you to take the address of any variable. You can't do that in Java only because Java doesn't let you take references to variables.
By the way, I would claim Java's sum types are less limited than Rust because in Rust, variants don't have their own type. The consequence is that you can't have functions that only accept some variant, as far as I know (I remember having this problem once), or add "methods" only to one variant... while in Java, because variants are just normal types, you can do both, and doing that is pretty damn useful.
For an exercise, I had to write a system with Admins and Customers, with the ability to upgrade a Customer into an Admin, or vice versa.
My thought was to put them as two subclasses under a User superclass, so that I could put them under a single Users table, and not have to link and unlink things over the conversion. Hibernate ORM supports storing subclasses by adding an implicit discriminator field.
However, its object model specifies that a single row always corresponds to a particular instance, so it has no support for changing the subclass of a row. Ultimately, I ended up with a hacky solution of creating a new record with the primary key copied over.
> By the way, I would claim Java's sum types are less limited than Rust because in Rust, variants don't have their own type. The consequence is that you can't have functions that only accept some variant, as far as I know (I remember having this problem once), or add "methods" only to one variant... while in Java, because variants are just normal types, you can do both, and doing that is pretty damn useful.
At least in Rust, you can simulate this pretty trivially by having each variant store a struct value with all the data and methods you want. See proc_macro::TokenTree [0] for an example of this. Of course, it's not ideal in how verbose it is (though not much worse than Java!), but it can be workable on the consumer's side if you add some extra From impls.
[0] https://doc.rust-lang.org/proc_macro/enum.TokenTree.html
Wirth's Pascal was a language designed for the purpose of teaching programming and it was adequate for that, but it was completely inappropriate for any serious work.
It had no means for writing big programs that must be divided into multiple source files and it had a lot of design errors, like including the size of an array in its data type (which made impossible the writing of any linear algebra library) or the way of handling the variant records (which was insecure and verbose).
The well known paper "Why Pascal Is Not My Favorite Programming Language" by Brian W. Kernighan contains valid arguments against Pascal (against Wirth's Pascal, there have been many extended Pascals, especially following Turbo Pascal, which have corrected some of the most egregious defects of the original Pascal).
Small-C, RatC,....
Additionally forgetting that Modula-2 came out in 1978, as means the sort out all those issues, a language designed for systems programming, instead of the "Python" from 1970, designed for teaching.
With features that C is yet to have, 50 years later, while HN is hypping for having them in a C like syntax delivered in a Zig package.
- no escape (AKA no casting): good!
- no default clause in case: good idea, not so good implementation (undefined behaviour)
- no break outside for loops: inconvenient, but that's how FP works. it is still debated today if breaking loops is considered a good or a bad practice
- no separated compilation: I will quote Kernighan on this Theoretically, there is no need for separate compilation - if one's compiler is very fast Pascal compiler was fast, maybe not very fast, but speed was one of the primary goals for Wirth.
many other issues were similar in other languages and in C
Pascal had obviously its faults, but every language back then had some
Pascal was simple enough to make it easy to compile and implement. That's what Wirth thaught, he's the author of Compiler Construction after all, it wasn't like learning Python today as a data scientist
make the language more complex (and more useful/interesting) and you're stuck with a very slow or very buggy compiler that very few people would know how to implement
I think there's a reason why we had Turbo Pascal and not Turbo Algol 68
The plan has been to gradually replace it with the new version of the software (which runs on Java 17), unfortunately this plan has been ongoing for 10 years and it's not gonna be done anytime soon.
Such are the sad realities of working on legacy code sometimes.
Sometimes the upgrade path from 6 onwards isn't as nice as it usually is from 8 up, especially if you built with some old undead libraries that require an heroic effort to understand well enough to reimplement. Takes a very special organisation to divert some person-years to pull it off, and as some middle or other manager it's highly unlikely to catch you a new, better job even if it goes well.
I do worry that there's "Y2K-esque" bugs hiding in Java 6 programs somewhere. I don't think java.util.date uses 32 bit integers for time so the 2038 problem probably won't be an issue, but I do wonder if there's other stuff hiding.
By now most types of issues ought to have popped up though, since it was so widely used for such a long time.
'Values' in Rust have no identity, except for their address. They're just a bunch of bytes in a row. What could it mean for a value to exist, except for it to be present at a set place in memory? If I have an instance of a Java class, and I change all the fields, I'd hardly say the instance has been replaced with a new instance.
If you insist, I'd say "Java's sealed classes are more limited in that when you have a bunch of references to the same thing, you can change the values in the fields of that thing (and have it be reflected in other references), but you can't change which variant it is." Call that thing a 'value' or a 'variable', it doesn't change the visible outcome compared to enums in Rust, or discriminated unions in C/C++.
Pointers and value semantics are nice, but have nothing to do with ADTs. For example, in Rust, you can also do:
let mut ex: i32 = 42;
println!("{ex:?}"); // 42
let ex_ref: &mut i32 = &mut ex;
*ex_ref = 53;
println!("{ex:?}"); // 53
And in C: int x = 42;
printf("%d\n", x); // 42
int* x_ref = &x;
*x_ref = 53;
printf("%d\n", x); // 53
In these examples, we haven't mutated the number 42 into the number 53. We've simply stored an entirely new value in the location of `x`. In your Rust example, you're doing the exact same thing with an ADT. The variant case isn't being changed. You're creating a new value and storing it in an existing storage location. Every pointer pointing to that storage location sees the update because they all point to the same storage.ADTs are closed to extension with new cases but open to extension with new functions, eg. anytime you want to add new cases, you have to update all functions that depend on the ADT, but you can add as many functions for that ADT as you like with no issues.
Traits are open to extension with new cases but closed to extension with new functions, eg. you can add as many impl as you like with no issues (new cases), but if you want to add a new function to the trait you have to update all impl to support it.
They are logical duals, and the problem of designing systems that are open to extension in both cases and functions is known as the expression problem:
For example, a result is a success value or an error. A stock order is a market order or a limit order, and nothing else, at least until someone updates the spec and recompiles the code. Situations like this happen all the time. I don’t want to extend a result to include gizmos in addition to success value or errors, nor do I generally want to extend the set of functions that operate on a certain sort of result. But I very, very frequently want to represent values with a specific, simple schema, and ADTs fit the bill. A bunch of structs/classes, interfaces/traits and getters/setters can do this, but the result would look like the worst stereotypes of enterprise Java code to accomplish what a language with nice ADTs can do with basically no boilerplate.
But that's just it, specs are rarely complete because reality is fluid. For example, a result is a success or an error, until maybe you want an errors to prompt the user to correct something and then the computation can be resumed (see resumable exceptions).
Should you even have to recompile your code to handle new cases? Why can't you just add the new case, and define new handlers for the functions that depend on your ADT without recompiling that code? That's the expression problem.
This is possible to achieve (or hack your way through, if you will) by parameterizing the type and using a nullary type (a type which is impossible to have) to exclude specific cases of a sum type. In Haskell this would look like this:
data Weather a b c = Sunny a | Rainy b | Snowy c
-- can't snow in the summer!
onlySummerWeather :: forall a b. Weather a b Void -> String
onlySummerWeather weather = case weather of
Sunny _ -> "Got sunny weather"
Rainy _ -> "Got rainy weather"
Snowy v -> absurd v
where `absurd :: forall a. Void -> a` "if you give me something you can't ever have, I will give you anything in return". absurd :: Void -> (forall a. a)
drusba :: (forall a. a) -> Void
drusba void = void @VoidThere are lots of open design questions for every feature you propose, but all of them have been discussed and have higher or lower chance of making it into the language.
Then you might not fully grok sum types yet.
> From my perspective a discriminant vs a vtable pointer is a boring implementation detail the compiler should just figure out for me based on what would be more optimal in a given situation.
Disagree. It's a design choice that should be decided by the programmers. There's a tradeoff—choosing which should be easier: adding a new variant or adding a new function/method. It's called the Expression Problem: https://wiki.c2.com/?ExpressionProblem
Surely any Turing complete PL can express a sum type? I can't imagine a language that can support products but not sums.
---
We could add implicit enums to impl Trait, so that you could return different types from a function:
fn foo() -> enum impl Display {
if rand() > 0.5 {
"str"
} else {
42
}
}
which would let you get around the problem of returning a type erased object for a Trait that isn't object safe: trait Trait {
const C: i32 = 0;
}
impl Trait for i32 {}
impl Trait for &'static str {}
fn foo() -> Box<dyn Trait> {
if true {
Box::new("")
} else {
Box::new(42)
}
}
error[E0038]: the trait `Trait` cannot be made into an object
--> f500.rs:6:17
|
6 | fn foo() -> Box<dyn Trait> {
| ^^^^^^^^^ `Trait` cannot be made into an object
|
note: for a trait to be "object safe" it needs to allow building a vtable to allow the call to be resolvable dynamically; for more information visit <https://doc.rust-lang.org/reference/items/traits.html#object-safety>
--> f500.rs:2:11
|
1 | trait Trait {
| ----- this trait cannot be made into an object...
2 | const C: i32 = 0;
| ^ ...because it contains this associated `const`
= help: consider moving `C` to another trait
= help: the following types implement the trait, consider defining an enum where each variant holds one of these types, implementing `Trait` for this new enum and using it instead:
&'static str
i32
---Relax object safety rules, like making all assoc consts implicitly `where Self: Sized`.
---
We could make enum variants types on their own right, allowing you to write
fn foo() -> Result<i32, i32>::Ok { Ok(42) }
let Ok(val) = foo();
There's some work on this, under the umbrella of "patterns in types". For now the only supported part of it is specifying a value range for integers, but will likely grow to support arbitrary patterns.---
Having a way to express `impl Trait for Enum {}` when every `Enum` variant already implement `Trait` without having to write the whole `impl`.
---
Anonymous enums:
fn foo() -> Foo | Bar | Baz
---Being able to match on Box<dyn Any> or anonymous enums
match foo() {
x: Foo => ...,
x: Bar => ...,
_ => ...,
}
---Stop needing to create a new struct type in order to box a single variant
enum Foo {
Bar(Box<struct { a: i32, b: i32 }>),
}
---These are of the top of my head, there are many things that you can do to make trait objects and enums feel closer than they do today, to make changing the way your code works a "gradient" instead of a "jump". My go-to example for this is: if you have a type where every field is Debug, you can derive it. As soon as you add one field that isn't Debug, you have to implement the whole impl for your type. That's a "jump". If we had default values for structs you could still use the derive by specifying a default value in the definition. That makes the syntactic change "distance" be as far as the conceptual change "distance".
Haskell's type classes are a bit like Rust's traits. Type classes are open by default, but optionally can be closed off.
You're attempting a sleight of hand here by saying "they're" not "doing trig". Clearly they are not doing anything like that consciously, but equally clearly some part of their brain is triangulating objects and predicting trajectories based on gradients, meaning that part is "doing trig and calculus" subconsciously. What else does it mean to "do something" if not "process X is isomorphic to process Y"?
> You can /model/ it that way. But you are making a symbolic model to justify how a solution is reached.
I really don't understand what you think people are doing when they're compiling a grocery list. They're clearly thinking, "I need x AND y OR I can substitute z".
Or if they're planning to paint their fence, they're thinking, "I need paint AND brushes AND I have to start before lunch OR I won't finish before dinner".
You seem to think I have to prove your model false to show others don't do that. But I am specifically not claiming your model is false. I'm saying folks don't think that way, necessarily. For example, many build lists for shopping that they were taught. Not that they reasoned.
This is one of the social factors of programming language design and it's one of the main reasons successful programming languages work so hard to establish a coherent philosophy and a set of best practices or idioms within the language. For similar reasons, I believe this is why "anything goes" languages such as LISP have struggled to gain widespread adoption: with no philosophy every programmer becomes an island unto themselves.
That's why the most important difference between C++ and Rust isn't some technicality even though the technical differences are huge, it's cultural. Rust has a Safety Culture and everything else is subservient to that difference.
Sugar matters, Rust's familiar looking loops are just sugar, it only "really" has a single way to do loops, the loop construct, an infinite loop you can break out of. But despite that, people deliberately write the other loops - and the linter strongly recommends that they write them, because the programs aren't just for machines to compile, they're for other humans to read, and a while let loop is an intuitive thing to read for example, so is the traditional for-each style iterator loop.
Pattern matching on the next level down is a power tool to be used with care.
Having used some pattern matching languages for quite some time, I find anything much deeper than that is a code smell at best and pathological at worst. Pattern matching creates coupling proportional to the depth/complexity/precision of the pattern match. The top-level coupling is often unavoidable; if you're pattern matching at all, you certainly care which "branch" you are on and there is likely no refactoring that away. But the danger rises rapidly the deeper in you go. It's just so easy to pattern match on a third-level part of the complex object when you really ought to be wrapping that behind a function somewhere, possibly itself emitting a sum type value.
... but if all you really need is that "top level" match, a lot of pattern matching syntax and features are not really necessary (if not positively dangerous).
Which is exactly how Perl apologia arguments went.
Those suggestion of your look interesting, but I haven't thought them through enough to have an opinion.
For all purposes and intents, the "b" type in L and R should be treated the same, no? What do you gain by not doing that??
As a concrete example, consider a map with a method get(key: K) -> Option<V>. How do you tell the difference between a missing key and a key which contains `null` as a value?
If you have a function that will normally return a string, but can sometimes fail due to reasons, you may wish to yield an error message in the latter case. So you're going to be returning a string, or a string.
It's not what the content of the data is; it's how you're supposed to interpret it. You have two cases, success and failure, and control will flow differently depending on that, not based strictly on the type of data at hand. We just model those cases in a type.
T = {tag: "L", payload: L} | {tag: "R", payload: R}
The real issue is typescript doesn't have pattern-matching, which make operating on these sum types inelegantMost people learn to build lists through mimicry. They literally mimic the lists they were taught they need to build to go to the store. With enough experience, many of us learn to build our own lists from other principals, but it almost all starts with mimicry and simulation. Is why "going shopping" is such a fun game for kids. They are learning.
None of which is to say that you can't make your thinking better with logic. You almost certainly can. But you are begging the question with your examples and explanations. Heavily.
And if I want `error` to be `string` so I can just provide an error message? Proper disjoint sum types let me keep the two sides disjoint, even if they happen to be the same type. Union types will collapse on any overlap, so if I happen to instantiate `error` at `string` then I can no longer tell my error case from my data case.
You have missed the thread context, which is whether `Either a a` (also written `a + a`) has any merits over simply `a` (which is identical to `a | a`). If you're on the `Either` train already, we are arguing over imaginary beef.
> No disrespect, but that still sounds entirely useless to me.
It is disrespectful to say something "makes zero sense", regardless of anything you might say to the contrary. You've misrepresented my point: nobody wants to model something as `String | String`.
If you have, say, an `Int | Bool`, and you pass both sides through some kind of stringification function, you're naturally going to get `String | String` without ever having written that type down yourself. You wouldn't necessarily want to collapse that to `String`, however, because you may -- for instance -- want to give strings and ints different decorations around them before finally flattening them. You might write such a function as something like
(ib) => ib
.mapBoth(showInt, showBool)
.visit(prepend("int: "), prepend("bool: "));
You couldn't run this if the result of the `mapBoth` had type `String | String`: that type is indistinguishable from `String`, and since you can't tell which case you're in, you wouldn't know which tag to prepend.You could write the same function without passing through `String | String`:
(ib) => ib.visit(
sequence(showInt, prepend("int: ")),
sequence(showBool, prepend("bool: ")));
And yes, in this especially contrived example, perhaps you may find that to make more sense anyway. But in longer pipelines, sometimes separated over multiple functions, often involving generic code, it becomes much harder to simply always fuse steps in this manner. This is why we say sum types (e.g. `String + String`) compose better: they don't behave any differently depending on whether the two sides are the same or not. You have explicit control over when the two sides rejoin.With this type you would have to check/match an extra case!
The type you use there also takes more memory than Option<T> or Maybe<T>. So it has some other downsides.
There are already two misconceptions.
First: "Lisp has no programming philosophies" and styles.
Not every program starts by zero. Since Lisp exists since the end 1950s, it has seen quite a lot in programming styles over the years and it may contain traces of several. Generally it may support more than one programming paradigm. For example during the Common Lisp standardization there was a wish to have a standardized object system. So instead of the multiple possible approaches (actors, message passing, prototype-based, ...), Common Lisp has just one: CLOS, the Common Lisp Object System. So, much of the object-oriented code written in CL is implemented in one particular object system: CLOS. Object Lisp, Flavors, LOOPs, Common Objects, and a bunch of other once had thus been replaced by one standard.
CLOS also defines a bunch of user-level macros: DEFCLASS, DEFMETHOD, DEFGENERIC, ... Everyone using CL & CLOS will use those macros.
Second: "every programmer becomes an island unto themselves". If we look at the way CLOS was designed: there was a core group of six people from three companies. Around that there was a mailing-list based communication with a large group of interested people. Early on a prototype was implemented as a portable implementation of CLOS. This was widely distributed among interested parties: implementors, companies, research groups, ... Then reports about the language extension and its layers were published, books were published, application & library code was published.
One of famous books coming out of this effort: "The Art of the Meta-Object Protocol". It contained also a toy implementation of CLOS in Common Lisp. Book and the implementation of CLOS (both the larger prototype and the toy implementation) showed in excellent quality how to write object-oriented Lisp code.
https://mitpress.mit.edu/9780262610742/the-art-of-the-metaob...
So, there are communities, which share code and coding styles. Not every programmer is alone and starts from zero.
You misquoted me. I said no philosophy, singular. In the programming language context, a philosophy is a convention or a standard. Just as many standards implies that there is no standard, many philosophies implies no philosophy.
Everything else you said is evidence for my premise. Hire 3 different programmers, one from each of the communities, and you might as well have 3 different programming languages. That’s not a standard. That’s not a philosophy. That’s anything goes!
Most of the Common Lisp code that is accessible via public repositories conforms to conventions and is understandable.
Lisp programmers are highly motivated toward encouraging collaboration, since there aren't that many people in Lisp where you can afford to be turning people away toward projects that are easier to get into.
Also, you can easily hire 3 developers and get 3 different languages in, oh, Java or JavaScript. One person is doing Kotlin, another one Scala, ...
Three C++ programmers in the same room could also not understand each other. The greybeard speaking only C++98 with a bit of 2003 doesn't grok the words coming out of the C++20 girl's mouth and so it goes.
That makes no sense.
> Hire 3 different programmers, one from each of the communities, and you might as well have 3 different programming languages.
Maybe not. They build on the same foundation, a language with a large standard, which is largely unchanged since three decades. A language which can be incrementally extended, without invalidating the rest. A language where extensions can be embedded, without invalidating the rest of the language or its tools. Many projects use SBCL (now itself 25 years old and only incrementally grown) and a bunch of core libraries for it.
> That’s not a standard. That’s not a philosophy. That’s anything goes!
Most languages support widely different software development practices. Take JavaScript: it includes imperative, object-oriented and functional elements (similar to Lisp). It has huge amounts of competing frameworks (many more than any Lisp), where many of them have been superseded and many of them are built on a multitude of other libraries. The developer can pick and choose. Each projects will be different from other projects, depending on which libraries and programming frameworks it uses - and which of those the developer re-invents.
Any half-way powerful language (C++ -> templates, Ruby -> meta objects, Java -> objects & byte code & reflection & class loader, Smalltalk -> meta objects, Rust -> macros, C++ -> language interpreters, Java -> external configuration languages, C -> macro processor, ...) has ways to adapt the language to a certain style & domain.
Any large Java framework defines new standards, new configuration mechanisms, new ways to use new Java features (lambdas, streams, ...). See the large list of features added to the Java language over time: https://en.wikipedia.org/wiki/Java_version_history For many of them there were competing proposals. Each Java code base will use some subset/superset of these features, depending on what is needed and what the personal preferences are. And then the Java architect in the project will not be satisfied and will invent yet another configuration system, this time not using XML or JSON for the syntax, but develop a new embedded scripting language for the JVM and integrate that with his software. I have seen Java architects which eventually didn't understand their own configuration system any more.
If you think a language like Common Lisp is special here, then in reality it is not. It's just slightly different in that it has extensibility as one of its philosophies and provides defined interfaces for that. There is one macro mechanism, which is powerful enough, that for decades it has not been replaced. Every syntactic extension mechanism will use this macro system, which is documented, stable and widely used.
Why do you think that Lisp is an "anything goes" language? What's your baseline? I think that C is no less an "anything goes" language, but with a much less pleasant UI.
> with no philosophy every programmer becomes an island unto themselves
Some people actually think that Lispers tend to be too philosophical
For all its faults, C is quite easy to read and understand, even by beginner C programmers. Yes, C also has macros but their clumsiness helps to discourage their use.