Algebraic Data Types for C99

Algebraic Data Types for C99 (github.com)

376 points by bondant 2 years ago | 225 comments

tombert 2 years ago |

Interesting.

Algebraic Data Types are almost always one of the things I miss when I use imperative languages. I have to do Java at work, and while I've kind of come around on Java and I don't think it's quite as bad as I have accused it of being, there have been several dozen instances of "man I wish Java had F#'s discriminated unions".

Obviously I'm aware that you can spoof it with a variety of techniques, and often enums are enough for what you need, but most of those techniques lack the flexibility and terseness of proper ADTs; if nothing else, those techniques don't have the sexy pattern matching that you get with a functional language.

This C library looks pretty sweet, since it appears to have the pattern matching I want; I'll see if I can use it for my Arduino projects.

estebank 2 years ago | |

Everyone who hasn't used ADTs and pattern matching doesn't get what the big deal is all about. Everyone who is used to ADTs and pattern matching doesn't get what the big deal is all about, until they have to work in a language that doesn't have them. And everyone who just found out about them can't shut up about them being the best thing since sliced bread.

im3w1l 2 years ago | | |

I have mainly used them in Rust. They are nice I suppose, but nothing mindblowing.

To me it feels very similar to an interface (trait) implemented by a bunch of classes (structs). I have multiple times wondered which of those two approaches would be better in a given situation, often wanting some aspects of both.

Being able to exhaustively pattern match is nice. But being able to define my classes in different places is also nice. And being able to define methods on the classes is nice. And defining a function that will only accept a particular variant is nice.

From my perspective a discriminant vs a vtable pointer is a boring implementation detail the compiler should just figure out for me based on what would be more optimal in a given situation.
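To make the "discriminant vs vtable pointer" distinction concrete, here is a minimal C sketch (all names are hypothetical) of the same two-variant shape type represented both ways: once as a tag plus a union dispatched with a switch, and once as objects carrying a pointer to a table of function pointers.

```c
#include <assert.h>

/* (A) Discriminant: the tag says which union member is live,
   and each operation dispatches on it with a switch. */
typedef enum { SHAPE_RECT, SHAPE_SQUARE } ShapeTag;

typedef struct {
    ShapeTag tag;
    union {
        struct { int w, h; } rect;
        struct { int side; } square;
    } as;
} TaggedShape;

static int tagged_area(const TaggedShape *s) {
    switch (s->tag) {
    case SHAPE_RECT:   return s->as.rect.w * s->as.rect.h;
    case SHAPE_SQUARE: return s->as.square.side * s->as.square.side;
    }
    assert(0 && "invalid ShapeTag");
    return -1;
}

/* (B) Vtable pointer: each object points at a table of function
   pointers; a new variant is a new table, with no switch to update. */
typedef struct VShape VShape;

typedef struct {
    int (*area)(const VShape *self);
} ShapeVtable;

struct VShape {
    const ShapeVtable *vt;
    int a, b; /* payload; its meaning depends on the concrete "class" */
};

static int rect_area(const VShape *self)   { return self->a * self->b; }
static int square_area(const VShape *self) { return self->a * self->a; }

static const ShapeVtable RECT_VT   = { rect_area };
static const ShapeVtable SQUARE_VT = { square_area };

static int virtual_area(const VShape *s) { return s->vt->area(s); }
```

The tagged form keeps each operation in one switch, so adding an operation is local but adding a variant touches every switch; the vtable form has the opposite trade-off, which is roughly the choice the comment would like the compiler to make automatically.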

acchow 2 years ago | | |

I’m in the latter camp (from OCaml) and now using Go. Go feels clunky and awkward.

AlecBG 2 years ago | |

Sealed interfaces in Java 21 allow pattern matching.

tombert 2 years ago | | |

Yeah I know, we just don't use Java 21 at work yet. I'm super excited for that update, and it actually looks like we will be transitioning to that by the end of the year, but I haven't had a chance to play with it just yet.

I do find it a little annoying that it's taken so long for Java to get a feature that, in my opinion, was so clearly useful; it feels like they were about a decade later on this than they should have been, but I'll take whatever victories I can get.

lupire 2 years ago | |

Kotlin is JVM compatible and has ADTs.

Java has https://github.com/functionaljava/functionaljava

which is unsupported but stable.

tombert 2 years ago | | |

Sure, and Scala has had ADTs since its inception as well I think, and that's also JVM. It's not ADTs, but Clojure does have some level of pattern matching/destructuring as well.

It wasn't that I thought that the JVM was incapable of doing something like an ADT, just that vanilla Java didn't support it. While it's easy to say that "companies should just use Kotlin", that's a bit of a big ordeal if you already have a 15-year-old codebase that's written in Java.

I've heard of but never used the Functional Java library, though it'd be a tough sell to get my work to let me import a library that hasn't been updated in two years.

brabel 2 years ago | | |

Java 21's pattern matching (you don't need functionaljava, and shouldn't really use that unless you're really into FP) is kind of nicer than Kotlin's, because you can automatically "destruct" records in your matches.

For Java, see https://www.baeldung.com/java-lts-21-new-features

Kotlin's: https://www.baeldung.com/kotlin/when

Make up your own mind.

eru 2 years ago | |

> Algebraic Data Types are almost always one of the things I miss when I use imperative languages.

Algebraic data types and pattern matching actually work really well in imperative languages, too. See, e.g., Rust.

klysm 2 years ago |

If I ever implement a product from scratch again, discriminated unions with compiler enforced exhaustive pattern matching is a hard requirement. It’s too powerful to not have.

nmfisher 2 years ago | |

Union types are my biggest wish for Dart, but unfortunately it doesn't look like they'll be added any time soon.

They've recently added support for compiler-enforced pattern matching over sealed classes, which I suppose does get you halfway there though.

actionfromafar 2 years ago | |

What languages could fit that description today? I don't really understand what it even means but maybe I could understand better if I could look at examples in languages which have them.

phi-go 2 years ago | | |

For example Rust: https://doc.rust-lang.org/std/keyword.enum.html

seivan 2 years ago | | |

Swift, TypeScript (depending on your opinion of structural typing), and Rust.

I think Scala, Elm, and Haskell have it as well.

Having that and Elixir's pattern matching would be insane.

naasking 2 years ago |

Definitely looks nicer and probably works better than my older attempt [1], but uses 8x more code and depends on the awesome but kinda scary Metalang99 macro toolkit. I think libsum is a good intro if you want to see how algebraic data types work underneath.

[1] https://github.com/naasking/libsum

Hirrolot 2 years ago | |

I have a star on your repository, so it seems I was looking into it while designing Datatype99 :)

cryptonector 2 years ago | | |

GH stars kinda function as a bookmark system, except I never go looking at what all I've starred, so it's more of an optimistic bookmark system.

I only sometimes use it as an "I would recommend this repo" marker; how can one do that anyway, given that the repo could morph into something one would no longer recommend?

linkdd 2 years ago |

This is the work of a wizard.

I've known C for almost 20 years, and never would I have thought the macro system was powerful enough to allow such black magic.

This is awesome!

cl91 2 years ago | |

> I've known C for almost 20 years

The author is only 19 years old. I feel really dumb now.

clnhlzmn 2 years ago | |

You might also be interested in Metalang99 by the same author.

CuriousCosmic 2 years ago | |

Yeah, X macros (the style of macro use) are pretty fancy. "Classically" they are used for creating and accessing type-safe generics or for reducing boilerplate in hardware register and interrupt definitions.

They are kind of cursed but at their core they are actually incredibly simple and a reliable tool for reducing cognitive complexity and boilerplate in C based projects.
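For readers who haven't seen the pattern, a minimal X-macro sketch (COLOR_LIST and its entries are made up for illustration): one list macro is expanded several times with different per-row macros, so the enum and its parallel tables cannot drift out of sync.

```c
#include <assert.h>

/* The list is written once; each X(...) row is one entry. */
#define COLOR_LIST(X)       \
    X(RED,   0xFF0000u)     \
    X(GREEN, 0x00FF00u)     \
    X(BLUE,  0x0000FFu)

/* Expansion 1: enum tags COLOR_RED, COLOR_GREEN, COLOR_BLUE, plus a count. */
#define AS_ENUM(name, rgb) COLOR_##name,
typedef enum { COLOR_LIST(AS_ENUM) COLOR_COUNT } Color;
#undef AS_ENUM

/* Expansion 2: a name table in the same order as the enum. */
#define AS_NAME(name, rgb) #name,
static const char *const color_names[] = { COLOR_LIST(AS_NAME) };
#undef AS_NAME

/* Expansion 3: a value table, again in the same order. */
#define AS_RGB(name, rgb) rgb,
static const unsigned long color_rgb[] = { COLOR_LIST(AS_RGB) };
#undef AS_RGB

static const char *color_name(Color c) {
    assert((unsigned)c < (unsigned)COLOR_COUNT);
    return color_names[c];
}
```

Adding a row to COLOR_LIST updates the enum, the count, and both tables at once, which is the boilerplate reduction the comment describes.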

lupire 2 years ago | |

ADTs are mostly string replacement on generic structs and unions, plus tagging on the union. It's not a complicated use of macros.
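A minimal sketch of what this comment describes, assuming a hypothetical DEFINE_RESULT macro: token pasting stamps out a tag enum, a union of payloads, and constructor functions for a given pair of types. It is plain text substitution with no pattern-matching syntax on top, which is the part a library like Datatype99 adds.

```c
#include <assert.h>

/* Hypothetical macro: generates a tagged union Result_T_E by token pasting.
   T and E must be single identifiers, so typedef pointer types first. */
#define DEFINE_RESULT(T, E)                                          \
    typedef struct {                                                 \
        enum { Result_##T##_##E##_Ok, Result_##T##_##E##_Err } tag;  \
        union { T ok; E err; } data;                                 \
    } Result_##T##_##E;                                              \
    static Result_##T##_##E Result_##T##_##E##_ok(T v) {             \
        Result_##T##_##E r;                                          \
        r.tag = Result_##T##_##E##_Ok;                               \
        r.data.ok = v;                                               \
        return r;                                                    \
    }                                                                \
    static Result_##T##_##E Result_##T##_##E##_err(E e) {            \
        Result_##T##_##E r;                                          \
        r.tag = Result_##T##_##E##_Err;                              \
        r.data.err = e;                                              \
        return r;                                                    \
    }                                                                \
    static T Result_##T##_##E##_unwrap(Result_##T##_##E r) {         \
        assert(r.tag == Result_##T##_##E##_Ok);                      \
        return r.data.ok;                                            \
    }

typedef const char *cstr;
DEFINE_RESULT(int, cstr)

/* Example user: parse a single decimal digit. */
static Result_int_cstr parse_digit(char c) {
    if (c >= '0' && c <= '9')
        return Result_int_cstr_ok(c - '0');
    return Result_int_cstr_err("not a digit");
}
```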

drycabinet 2 years ago |

Wikipedia has something interesting on this (how unions can be implemented using "class hierarchy in object-oriented programming"): https://en.wikipedia.org/wiki/Tagged_union#Class_hierarchies...

There is a lengthy blog post about the same stuff, except that the author doesn't seem to have come across the said wiki section yet: https://nandakumar.org/blog/2023/12/paradigms-in-disguise.ht...

Kudos to the dev of datatype99 for showing the problem with such ad-hoc methods in the readme right away.

mingodad 2 years ago |

There is also https://melt.cs.umn.edu/ which has an extension that adds templates and algebraic data types to C: https://github.com/melt-umn/ableC-template-algebraic-data-ty...

modeless 2 years ago |

> PLEASE, do not use top-level break/continue inside statements provided to of and ifLet; use goto labels instead.

Seems like a pretty big footgun. But otherwise, very cool.

zzo38computer 2 years ago | |

Using goto instead isn't a problem, but knowing not to use break/continue inside of such blocks is something that you will have to be aware of.

I had written an immediate-mode UI out of macros, and this reminded me of that, although in my case it is not a problem, since some blocks are ones you can use "break" in. For example, you can use "break" to exit a win_form block ("goto" also works), while a win_command block does not capture "break", so using break (or goto) inside a win_command block will break out of whatever block the win_command is in (probably a win_form block; for example, this would commonly be used for a "Cancel" button).

392 2 years ago | |

What's neat about Rust is that in its macro land, writing the code that checked for this condition would be not only possible, but doable, imaginable, and aided by easily installable OSS libraries.

So it's not just about being slightly better in some ways, but about smoothing over so many paper cuts that it can be hard to see how they have added up over time across ecosystems, like CPython and co. having so many of their own vocabulary types, or HPC libs.

For example, the problems with this macro that cause this wouldn't even be problems in a well-written Rust macro. They're artifacts of smart people trying to work around C's limitations.

But then the macro wouldn't have been written anyway because this is a port of a native Rust feature (which means it gets taken advantage of in community software).

linkdd 2 years ago | |

goto is a footgun only if you use it to jump from function to function, which, by the way, was what "goto considered harmful" was about. That practice has disappeared, and now goto within a function is pretty harmless and essentially equivalent to break/continue.
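The function-local goto this comment defends is most often seen in C's cleanup idiom: forward jumps to labels that release resources in reverse order of acquisition. A minimal sketch, where make_pair and its fail_second flag (which only simulates an allocation failure) are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates two buffers; on failure, jumps forward to labels that
   undo only what was already acquired, in reverse order. */
static int make_pair(size_t n, int fail_second, int **out_a, int **out_b) {
    assert(out_a != NULL && out_b != NULL);
    int *a = NULL, *b = NULL;

    a = malloc(n * sizeof *a);
    if (a == NULL)
        goto fail_a;

    /* fail_second simulates the second allocation failing */
    b = fail_second ? NULL : malloc(n * sizeof *b);
    if (b == NULL)
        goto fail_b;

    *out_a = a;
    *out_b = b;
    return 0;

fail_b:
    free(a);
fail_a:
    return -1;
}
```

Every jump stays inside one function and only moves forward to an exit path, which is the structured use of goto the comment has in mind.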

modeless 2 years ago | | |

Goto isn't the footgun. The footgun is if you use break/continue by accident then some unspecified bad thing will happen, silently I'm guessing.
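To illustrate why the README warns against break/continue, here is a sketch of one common way C block macros bind variables: a one-shot for loop wrapped around the user's statement. WITH_VALUE is a hypothetical macro, not Datatype99's actual expansion; the point is that a break inside the block is captured by the hidden loop, silently, while goto reaches the intended target.

```c
#include <assert.h>

/* Hypothetical block macro: binds `decl = init` for the following
   statement using one-shot for loops (a common C macro technique). */
#define WITH_VALUE(decl, init)                 \
    for (int once_ = 1; once_; once_ = 0)      \
        for (decl = (init); once_; once_ = 0)

/* Intends to stop the OUTER loop when v == stop. */
static int sum_until(int n, int stop) {
    assert(n >= 0);
    int total = 0;
    for (int i = 0; i < n; i++) {
        WITH_VALUE(int v, i) {
            if (v == stop)
                break; /* exits the macro's hidden loop, NOT the outer loop */
            total += v;
        }
    }
    return total;
}

/* Same logic with goto, which leaves both loops as intended. */
static int sum_until_goto(int n, int stop) {
    assert(n >= 0);
    int total = 0;
    for (int i = 0; i < n; i++) {
        WITH_VALUE(int v, i) {
            if (v == stop)
                goto done;
            total += v;
        }
    }
done:
    return total;
}
```

sum_until(10, 3) returns 42 rather than 3: the break only skipped the value 3, acting like a continue on the outer loop, and the compiler gives no diagnostic.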

samatman 2 years ago |

Let's say you have a C program to write, and you really want exhaustive pattern matching on the tags of unions (which is what Datatype99 provides: "Put simply, Datatype99 is just a syntax sugar over tagged unions").

Let's say further that you already know Rust exists, and aren't going to use it for reasons that anyone writing a C program already knows.

At least consider Zig. Here's a little something I wrote in Zig two days ago:

    /// Adjust a label-bearing OpCode by `l`. No-op if no label.
    pub fn adjust(self: *OpCode, l: i16) void {
        switch (self.*) {
            inline else => |*op| {
                const PayType = @TypeOf(op.*);
                if (PayType != void and @hasField(PayType, "l")) {
                    op.*.l += l;
                }
            },
        }
    }

This uses comptime (inline else) to generate all branches of a switch statement over a tagged union, adding an offset to the members of that union that have an "l" field. You can vary the nature of the branches based on any comptime-available type info, which is a lot, and since all the conditions are evaluated at compile time, each branch of the switch contains only the logic needed to handle its variant.

"But my program is already in C, I just need it for one file" right. Try Zig. You might like it.

jackling 2 years ago |

Could you not get most of the benefits of ADTs using structs + unions + enums? I've used the pattern where I had a union of several types and an enum to differentiate which one to pick. Something like std::variant seems to work a bit like a sum type.

The only issue is you can't do a clean switch statement that matches on the specific value of a field, but nested switch statements aren't that messy.
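A minimal sketch of the struct + union + enum pattern described above, including a nested switch that matches on a particular field value (Token, TokenKind, and precedence are hypothetical names):

```c
#include <assert.h>

/* Hand-written sum type: enum tag + union payload with named fields. */
typedef enum { TOK_NUM, TOK_OP, TOK_EOF } TokenKind;

typedef struct {
    TokenKind kind;
    union {
        struct { long value; } num;
        struct { char symbol; } op;
    } as;
} Token;

/* Outer switch selects the variant; inner switch matches a field value. */
static int precedence(const Token *t) {
    switch (t->kind) {
    case TOK_OP:
        switch (t->as.op.symbol) {
        case '+': case '-': return 1;
        case '*': case '/': return 2;
        default:            return -1; /* unknown operator */
        }
    case TOK_NUM:
    case TOK_EOF:
        return 0;
    }
    assert(0 && "invalid TokenKind");
    return -1;
}
```

With no default in the outer switch, GCC's -Wswitch (enabled by -Wall) and Clang's equivalent warning flag an enumerator that has no case, which recovers some of the exhaustiveness checking pattern matching gives you; nothing, however, stops code from reading the wrong union member.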

acuozzo 2 years ago | |

Yes, and you can also get many of the benefits of OOP with convention and discipline, but doing so requires you to frequently get down in the weeds since, e.g., vtables must be dealt with manually.

The trouble with this approach is that there's a lot of mental overhead in dotting all of your i's and crossing all of your t's. It's draining, so you start to, e.g., shoehorn additional functionality into existing classes instead of making new ones.

You eventually wind up perceiving the abstraction as costly, which lessens your use of it at the expense of a more elegant solution to the problem(s) you're solving.

tl;dr? The ability to just state "Darmok and Jalad at Tanagra" is transformative when the alternative is telling an entire story every time you want to reference a complex idea.

naasking 2 years ago | |

> Could you not get most of the benefits of ADTs using structs + unions + enums?

The modelling aspects can be simulated, yes, but that's barely half of the benefits of ADTs. Pattern matching is a big ergonomic benefit.

otikik 2 years ago |

What a madlad. Kudos for implementing this.

WhereIsTheTruth 2 years ago |

Tagged Union is a must have in a programming language

rurban 2 years ago |

I certainly won't use that. It's not type safe, and it doesn't even allow names for its pattern-matching sugar. Why he calls this simple struct-matching sugar via tagged unions "Algebraic Data Types" is beyond my understanding. He cannot even do nested structs or unions.

    datatype(
        BinaryTree,
        (Leaf, int),
        (Node, BinaryTree *, int, BinaryTree *)
    );

No names for the struct fields, so you need to rely on the position.

And then used:

    int sum(const BinaryTree *tree) {
        match(*tree) {
            of(Leaf, x) return *x;
            of(Node, lhs, x, rhs) return sum(*lhs) + *x + sum(*rhs);
        }

        // Invalid input (no such variant).
        return -1;
    }

Where lhs, x, rhs magically match the types defined above. What a nonsense design!
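The positional bindings are less magical than they look: Datatype99 describes itself as syntax sugar over tagged unions, and in the quoted sum each binding is used as a pointer into the active variant's payload (hence *x and sum(*lhs)). A rough hand-written equivalent follows; the field names _0, _1, _2 are hypothetical, chosen only to mirror positions, and are not taken from Datatype99's generated code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct BinaryTree BinaryTree;

typedef enum { LeafTag, NodeTag } BinaryTreeTag;

/* Tagged union; hypothetical positional field names _0, _1, _2. */
struct BinaryTree {
    BinaryTreeTag tag;
    union {
        struct { int _0; } Leaf;
        struct { BinaryTree *_0; int _1; BinaryTree *_2; } Node;
    } data;
};

/* Each "binding" is a pointer to a payload slot of the active variant. */
static int sum(const BinaryTree *tree) {
    assert(tree != NULL);
    switch (tree->tag) {
    case LeafTag: {
        const int *x = &tree->data.Leaf._0;
        return *x;
    }
    case NodeTag: {
        BinaryTree *const *lhs = &tree->data.Node._0;
        const int *x = &tree->data.Node._1;
        BinaryTree *const *rhs = &tree->data.Node._2;
        return sum(*lhs) + *x + sum(*rhs);
    }
    }
    return -1; /* invalid input (no such variant) */
}
```

Writing the union by hand lets you give the fields real names, which answers the positional-field complaint, at the cost of the boilerplate the macro hides.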

different_base 2 years ago |

One of the crimes of modern imperative programming languages is not having ADTs (except maybe Rust) built-in. It is such a basic mental model of how humans think and solve problems. But instead we got inheritance and enums which are practically very primitive.

KerrAvon 2 years ago |

Anyone considering using this should strongly consider Swift or Rust instead. You can build almost any given language idea with the C preprocessor, but that doesn't mean it's a good idea to ship production code that uses it.

The worst codebases to inherit as a maintenance programmer are the ones where people got clever with the C preprocessor. Impossible to debug and impossible to maintain.

endgame 2 years ago | |

C99 is a stable target for writing bootstrappable software: there are multiple mature compiler implementations, at least one of which is bootstrappable down to hex0, and the bootstrap chain is not too long.

392 2 years ago | |

I find that most abuses of the preprocessor are by folks unwilling/unable to simplify their design into a form that's (a) native to the C language/runtime or (b) not repetitive to type.

This library on the other hand addresses a nasty papercut whose presence usually stops folks with modern language experience from choosing C when it might otherwise be valid. Plus you can't beat C's long-term stability.

Though I agree that 90+% who _think_ they still need C should probably move on to making Rust work for them, instead.

    sealed interface BinaryTree {
        record Leaf(int value) implements BinaryTree {}
        record Node(BinaryTree lhs, BinaryTree rhs, int value) implements BinaryTree {}
    }

    public class Hello {
        static int sum(BinaryTree tree) {
            return switch (tree) {
                case BinaryTree.Leaf(var value) -> value;
                case BinaryTree.Node(var lhs, var rhs, var value) -> sum(lhs) + value + sum(rhs);
            };
        }

        public static void main(String... args) {
            var tree = new BinaryTree.Node(
                new BinaryTree.Leaf(1),
                new BinaryTree.Node(
                    new BinaryTree.Leaf(2),
                    new BinaryTree.Leaf(3),
                    4),
                5);
            System.out.println(tree);
            System.out.println("Sum: " + sum(tree));
        }
    }

    final boolean hasUncollectedSecret = switch (each) {
        case Wall() -> false;
        case Goal() -> false;
        case Player p -> false;
        case BasicCell(Underneath(_, var collectible), _) -> switch (collectible) {
            case NONE, KEY -> false;
            case SECRET -> true;
        };
        case Lock() -> false;
    };

    #[derive(Debug)]
    pub enum Example {
        Foo(i32),
        Bar(&'static str),
    }

    fn main() {
        let mut ex: Example = Example::Foo(42);
        println!("{ex:?}"); // Foo(42)
        let ex_ref: &mut Example = &mut ex;
        *ex_ref = Example::Bar("hello");
        println!("{ex:?}"); // Bar("hello")
    }

    MyType = (Scalar4, Real4, NullTerminatedStringC);
    MyUntaggedRecType = RECORD
        CASE MyType OF
            Scalar4: (longC: ARRAY[1..150] OF longint);
            Real4: (floatC: ARRAY[1..150] OF real);
    END;
    MyTaggedRecType = RECORD
        CASE tag: MyType OF
            Scalar4: (longC: ARRAY[1..150] OF longint);
            Real4: (floatC: ARRAY[1..150] OF real);
    END;
    ...
    { set all to 0.0 without running through the MC68881 }
    FOR j := 1 TO 150 DO longC[j] := 0;
    ...
    CASE tag OF
        Scalar4: percentReal := longC[1];
        Real4: percentReal := floatC[1]*100;
    ELSE percentReal := 0.0/0.0;
    END;

    trait Trait {
        const C: i32 = 0;
    }
    impl Trait for i32 {}
    impl Trait for &'static str {}
    fn foo() -> Box<dyn Trait> {
        if true { Box::new("") } else { Box::new(42) }
    }

    error[E0038]: the trait `Trait` cannot be made into an object
     --> f500.rs:6:17
      |
    6 | fn foo() -> Box<dyn Trait> {
      |                 ^^^^^^^^^ `Trait` cannot be made into an object
      |
    note: for a trait to be "object safe" it needs to allow building a vtable to allow the call to be resolvable dynamically; for more information visit <https://doc.rust-lang.org/reference/items/traits.html#object-safety>
     --> f500.rs:2:11
      |
    1 | trait Trait {
      |       ----- this trait cannot be made into an object...
    2 |     const C: i32 = 0;
      |           ^ ...because it contains this associated `const`
      = help: consider moving `C` to another trait
      = help: the following types implement the trait, consider defining an enum where each variant holds one of these types, implementing `Trait` for this new enum and using it instead:
                &'static str
                i32