Parse, Don't Validate – In a Language That Doesn't Want You To (cekrem.github.io)
Zod is the acceptable middle ground, in my opinion. Zod lets you throw a schema against an object, and it'll tell you "yes, the result fits your schema". This is fine for most projects.
If you want to go zero-dependency, you can see how far you can get with TS's type system. Branded types are kinda cool. NewTypes are also cool, but also high maintenance. Unless you're building a library that millions depend on, it's probably not worth it.
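For readers who haven't seen them, here is a minimal, dependency-free sketch of a branded type. `Brand`, `NonEmptyString`, `parseNonEmpty` and `greet` are illustrative names, not from the article or any library; the brand is a phantom property that never exists at runtime.

```typescript
// A brand is a phantom property: it never exists at runtime, it only makes
// the type nominal so that a plain `string` is not assignable to it.
type Brand<T, B extends string> = T & { readonly __brand: B };

type NonEmptyString = Brand<string, "NonEmptyString">;

// The parser is the one place that should produce a NonEmptyString.
// It returns null instead of throwing when the input does not qualify.
function parseNonEmpty(input: string): NonEmptyString | null {
  const trimmed = input.trim();
  return trimmed.length > 0 ? (trimmed as NonEmptyString) : null;
}

// Downstream code demands the branded type and never re-checks it.
function greet(who: NonEmptyString): string {
  return `Hello, ${who}!`;
}

const parsed = parseNonEmpty("  Ada ");
const greeting = parsed !== null ? greet(parsed) : "nobody to greet";
// greet("Ada") would not compile: a plain string lacks the brand.
```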
What do you mean?
I've been into Effect for a long time, and it really scales well the more complex your applications get.
Schema is way more advanced than Zod, by the way, both at the type level and in functionality: it has a proper decoder/encoder architecture.
At the type level you can encode not just "string -> non-empty string -> valid email pattern" but even "a confirmed email the user has clicked on", by leveraging effectful schemas (and durable workflows if you want).
You may not need it 99% of the time (I myself rarely use it), but it's not a fair comparison.
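To make the layered idea concrete without Effect, here is a rough approximation with plain TypeScript brands (this is not Effect Schema's API). Each refinement is a subtype of the previous one, so a `VerifiedEmail` works wherever an `EmailAddress` or `string` is expected, but not the reverse. `confirmEmail` is a hypothetical stand-in for the effectful step; a real system would check a stored token instead of comparing two arguments.

```typescript
// Plain TypeScript brands (NOT Effect Schema). Each layer refines the last.
type NonEmpty = string & { readonly __nonEmpty: true };
type EmailAddress = NonEmpty & { readonly __email: true };
type VerifiedEmail = EmailAddress & { readonly __verified: true };

function toNonEmpty(s: string): NonEmpty | null {
  return s.length > 0 ? (s as NonEmpty) : null;
}

// Deliberately loose pattern, for illustration only.
function toEmail(s: NonEmpty): EmailAddress | null {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(s) ? (s as EmailAddress) : null;
}

// Hypothetical stand-in for the effectful step: a real system would look up
// the token it issued and compare it with the one the user clicked.
function confirmEmail(email: EmailAddress, clicked: string, issued: string): VerifiedEmail | null {
  return clicked === issued ? (email as VerifiedEmail) : null;
}

// Only a confirmed address is accepted here.
function sendNewsletter(to: VerifiedEmail): string {
  return `queued for ${to}`;
}

// Walks the chain; each step can fail and reports where it stopped.
function pipeline(input: string, clicked: string, issued: string): string {
  const nonEmpty = toNonEmpty(input);
  if (nonEmpty === null) return "empty";
  const email = toEmail(nonEmpty);
  if (email === null) return "invalid email";
  const verified = confirmEmail(email, clicked, issued);
  if (verified === null) return "unverified";
  return sendNewsletter(verified);
}
```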
Zod is more ergonomic, has easier APIs, and is perfect for most users. I would not recommend Schema unless you buy into the whole package.
The friction with the rest of the ecosystem is real, though. Most code out there expects you to handle errors with exceptions.
I get the impression that polymorphic return types could get in the way of JSC/V8/SpiderMonkey's JIT, but I haven't measured it and I'm not sure of the actual impact on hot and cold paths. Same for all the allocations caused by custom Option<T>/Result<T,E> implementations.
I think using Zod at the edge (with branded types and whatnot), while keeping return types as T/Promise<T> to keep a sane relationship with the ecosystem is a good middle ground.
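A sketch of "parse at the edge, keep plain return types". To stay dependency-free it hand-writes the parser instead of using Zod (Zod's API is not shown); `ApiUser`, `UserId` and the `u_<digits>` id format are hypothetical. Failures throw, so the rest of the code keeps ordinary `T` signatures.

```typescript
type UserId = string & { readonly __brand: "UserId" };
type ApiUser = { id: UserId; displayName: string };

// Boundary parser: untrusted input in, fully typed value out, or an exception.
function parseApiUser(input: unknown): ApiUser {
  if (typeof input !== "object" || input === null) {
    throw new TypeError("expected an object");
  }
  const { id, displayName } = input as Record<string, unknown>;
  // Hypothetical id format: "u_" followed by digits.
  if (typeof id !== "string" || !/^u_[0-9]+$/.test(id)) {
    throw new TypeError("invalid id");
  }
  if (typeof displayName !== "string") {
    throw new TypeError("invalid displayName");
  }
  return { id: id as UserId, displayName };
}

// Hypothetical edge handler: raw JSON in, a plain ApiUser out.
function handleRequest(body: string): ApiUser {
  return parseApiUser(JSON.parse(body));
}
```

Everything behind `handleRequest` can take `ApiUser` and trust the brand without re-validating.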
If I could add one feature to Typescript it would be something like "as" that actually validates the result against the type system and can fail. Unfortunately, that's way, way easier said than done. It's the bad type of keyword that has unbounded runtime cost because it would have to be a runtime comparison, and there are a lot of design questions about how to write it. However, I still petulantly want it even though I can hardly define it. "zod" is pretty good but you can see how trying to add that as a "keyword" is nightmare fuel for a language-level change.
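Probably the closest thing TypeScript has today is an assertion function: it can fail at runtime and narrows the variable afterwards. The catch, which is exactly the comment's point, is that the runtime check has to be written by hand. `Point`, `isPoint` and `checkedAs` are illustrative names.

```typescript
type Point = { x: number; y: number };

// Hand-written runtime check: TypeScript cannot derive this from the type.
function isPoint(value: unknown): value is Point {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.x === "number" && typeof v.y === "number";
}

// `asserts value is T` narrows the caller's variable once this returns,
// which is about as close as today's syntax gets to an `as` that can fail.
function checkedAs<T>(value: unknown, guard: (v: unknown) => v is T): asserts value is T {
  if (!guard(value)) {
    throw new TypeError("value does not match the expected type");
  }
}

const data: unknown = JSON.parse('{"x": 1, "y": 2}');
checkedAs(data, isPoint);
const sum = data.x + data.y; // `data` is narrowed to Point here
```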
Suppose I have a User with some attributes like birthday, email and whether they have been verified.
In a typical codebase you see `if (user.verified_at != null)` or something along those lines; with parsed code, I feel like I should have a type (or interface) for each of them:
- UserWithBirthday
- VerifiedUser, UnverifiedUser
- UserWithEmail, UserWithoutEmail
(And imagine having a method that accepts a user with a birthday and an email, to send an email the day before their birthday: would you create a UserWithBirthdayAndEmail type?) It feels like it is going to bloat the interface space. How do you tackle this problem?
The reason I haven't: say there's an optional field. Currently we call that null, probably, and check each time whether it's there or not. I could instead make types like User and UserWithPhoneNumber. Should we be making a type for each combination of present/absent fields? That can't be right.
The classic answer is to move the logic inside the domain object, or have a helper function outside the object, so you aren't constantly checking for field presence/absence, but are instead writing the logic once and calling some code.
I'm not sure in practice types can help with this. But I'd love to be proven wrong.
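A sketch of the "write the logic once" suggestion: one helper owns the null checks and callers just ask it a question. The `User` shape and `birthdayReminderTarget` are hypothetical, and dates are compared in local time.

```typescript
type User = {
  name: string;
  email: string | null;
  birthday: Date | null;
};

// The presence checks live here once; callers never poke at nullable fields.
// Returns the address to mail if tomorrow is the user's birthday, else null.
function birthdayReminderTarget(user: User, today: Date): string | null {
  if (user.email === null || user.birthday === null) return null;
  const tomorrow = new Date(today.getTime());
  tomorrow.setDate(tomorrow.getDate() + 1); // rolls over months correctly
  const isTomorrow =
    tomorrow.getMonth() === user.birthday.getMonth() &&
    tomorrow.getDate() === user.birthday.getDate();
  return isTomorrow ? user.email : null;
}
```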
> Booleans look tidy until somebody adds a third case and exhaustiveness silently doesn’t kick in. Strings narrow honestly.
Like, nobody truly writes like that. It wouldn't get past any competent editor.
Strings narrow honestly? What does that even mean? This kind of three-word "precision" is useless, and it appears everywhere in the article. We get the point from the first sentence; there's no need to add more.
If the result is better for having used AI, why wouldn't an author want to disclose it?
I don't think disclosing helps here. If the article wasn't obviously generated, why would that affect you?
The only issue I have is being halfway through the article and realizing I am reading hallucinated text. If I can mark the author once, I won't see them again. This works fine for me. You could argue that disclosing would fix this issue, but the issue is not that AI was used; it's that the output was not curated.
I don't speak TypeScript, so I'm probably missing something obvious, but why would you parse an email (or anything, really) into a string (or string equivalent)? When parsed, it should end up as a specific email object, that is, something closer to a C struct. What is the article's dance doing?
In sufficiently nominal type systems, I can hide the constructor for an EmailAddress type (as in: nobody can just construct an EmailAddress). In Haskell speak, I can then export a function `parseEmailAddress :: String -> Maybe EmailAddress`. That function is the only place with access to the constructor, which means the only way to turn a string into an EmailAddress is by calling parseEmailAddress.
Note that at runtime EmailAddress is just a string. The boundaries live in the type system, not at the value level. A structural type system (as in TypeScript) does not enable that; it forces you to turn EmailAddress into something other than just a string.
Are you confusing Email with EmailAddress? I think in many cases you would prefer EmailAddress to be represented as a dumb string at runtime. But if you don't, you will easily find other examples where you have two structurally similar types that you don't want to mix up.
The article's dance is to avoid having extra fields that are completely unnecessary here. They want some kind of nominal email type that is actually a string, so it can be used in places where a string is needed, but when a method requires an "email", you can't pass just any string.
It's a pretty common pattern in functional programming, and in many other languages nowadays.
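A minimal sketch of how TypeScript code commonly approximates the hidden-constructor trick, assuming a `unique symbol` brand. The value stays a plain string at runtime. Unlike Haskell's hidden constructor this is only a convention: anyone who can name the type can still forge one with `as`. All names here are illustrative.

```typescript
// `declare const` creates a type-only symbol: nothing is emitted at runtime.
declare const emailBrand: unique symbol;

// At runtime an EmailAddress is just a string; the brand lives only in types.
type EmailAddress = string & { readonly [emailBrand]: true };

// In a real module you would export the type and this parser, but not the
// symbol, so callers cannot write out the brand by hand.
function parseEmailAddress(raw: string): EmailAddress | null {
  const trimmed = raw.trim();
  // Deliberately simple check, for illustration only.
  return /^[^\s@]+@[^\s@]+$/.test(trimmed) ? (trimmed as EmailAddress) : null;
}

function sendWelcome(to: EmailAddress): string {
  // Still usable anywhere a string is expected.
  return `Welcome mail to ${to.toLowerCase()}`;
}

const addr = parseEmailAddress(" Ada@Example.com ");
const sent = addr !== null ? sendWelcome(addr) : null;
// sendWelcome("ada@example.com") would not compile: a plain string lacks the brand.
```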
It's the same thing. In the latter case, something has validated that your NonEmpty has a first and a last element. It's all validation before you stick it in a type that asserts that the validation is guaranteed to have occurred so every function receiving it doesn't need to do it itself.
Any non-trivial use of a type system will involve making guarantees the type system itself cannot actually express. There's nothing wrong with saying "this is a valid email in accordance with my standards" in a type. Merely using the type system to assert "I have some sort of value in the name and host fields" is valid, but a degenerate use. `struct Email { name: Name, host: Hostname }` is an even stronger use of the type system, where Name and Hostname are themselves values you can only get by passing an incoming string through a validation process. Asserting that these things exist is just the most basic check possible, but your type still permits {name: "\0\0\0\0\0\0", host: "!"}, whereas under my definition, assuming that Name and Hostname are reasonably defined, that value can never be witnessed.
In fact in general, while I don't absolutely rigidly apply this, especially in smaller script-like programs, when a "string" appears in my strong types that specifically means "this has unbounded contents". It's an appropriate type for "stuff I got off a network" or "stuff a user typed". What stuff? Don't know. Haven't checked it yet. When I do it'll get a more specific type like a Username or DecodedUTF8String or something else. Thanks to people using way too many "strings" and "ints" in the world I have to constantly explain to my LLM that I want stronger types. I'm yet to find the invocation to put into my CLAUDE.md or equivalent to get it to do it right the first time consistently.
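The `Email { name, host }` idea sketched in TypeScript, where `Name` and `Hostname` can only come out of their own parsers. The validation rules are hypothetical and deliberately simplified; this is not full RFC 5321/5322 address parsing.

```typescript
// Hypothetical, deliberately strict rules chosen for illustration.
type Name = string & { readonly __brand: "Name" };
type Hostname = string & { readonly __brand: "Hostname" };
type Email = { readonly name: Name; readonly host: Hostname };

function parseName(raw: string): Name | null {
  return /^[A-Za-z0-9._+-]{1,64}$/.test(raw) ? (raw as Name) : null;
}

// At least two dot-separated labels; each label is alphanumeric with inner hyphens.
function parseHostname(raw: string): Hostname | null {
  const label = "[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?";
  const pattern = new RegExp(`^${label}(?:\\.${label})+$`);
  return pattern.test(raw) ? (raw.toLowerCase() as Hostname) : null;
}

// The only way to get an Email is through the part parsers, so a value like
// { name: "\0\0\0", host: "!" } is never produced (short of a deliberate `as`).
function parseEmail(raw: string): Email | null {
  const at = raw.lastIndexOf("@");
  if (at < 0) return null;
  const local = parseName(raw.slice(0, at));
  const host = parseHostname(raw.slice(at + 1));
  return local !== null && host !== null ? { name: local, host } : null;
}
```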
What did you mean by that? You don't accept mutability or any inputs on your state of mind?
This one barely scrapes by at what feels like 30-40% "slop": "honestly", "the one thing", etc...
...but I did learn something about "Brand" types, and have personally tried to do more of "parse don't validate" in my own code.
Recently I did a similar trick in Python for `exec( ValidExecutable(...) )`, where it required tagging/washing through a private function/variable to "get" the private bit.
All the scanners tend to light up when they see "exec" at all (e.g. `exec( "pandoc" )` for PDF generation), but I needed to hard-code a few "expected" pandoc locations so the imaginary hackers couldn't shadow "pandoc" on a path location they controlled.
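The original was Python; here is the same idea transliterated to TypeScript as a rough sketch, not the commenter's code. `EXPECTED_PANDOC_PATHS`, `parseExecutable` and `describeRun` are hypothetical, and the actual process spawn is left out so the sketch has no side effects.

```typescript
// Hypothetical allowlist; the real locations depend on the deployment.
const EXPECTED_PANDOC_PATHS: readonly string[] = [
  "/usr/bin/pandoc",
  "/usr/local/bin/pandoc",
  "/opt/homebrew/bin/pandoc",
];

type ValidExecutable = string & { readonly __brand: "ValidExecutable" };

// Only exact allowlisted absolute paths pass. A bare "pandoc" is rejected
// because it would be resolved via PATH, which an attacker might control.
function parseExecutable(candidate: string): ValidExecutable | null {
  return EXPECTED_PANDOC_PATHS.indexOf(candidate) !== -1
    ? (candidate as ValidExecutable)
    : null;
}

// A real wrapper would pass `exe` to the process-spawning API and accept
// nothing but ValidExecutable; this one only builds the command line.
function describeRun(exe: ValidExecutable, args: readonly string[]): string {
  return [exe, ...args].join(" ");
}
```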
```typescript
type User = {
  name: string;
  lastName: string;
  verified: boolean;
  email?: string;
  birthday?: string | { year: string; month: string; date: string };
};

// The birthday type itself, minus `undefined` (Pick<...> would wrap it in an object).
type Birthday = NonNullable<User["birthday"]>;
type UserWithBirthday = User & { birthday: Birthday };
type VerifiedUser = User & { verified: true; email: string };
type VerifiedUserWithBirthday = UserWithBirthday & VerifiedUser;

// Narrows only if the user is verified AND has both an email and a birthday.
const userHasBDayAndEmail = (user: User): user is VerifiedUserWithBirthday =>
  user.verified && user.email !== undefined && user.birthday !== undefined;
```
Any caller of userHasBDayAndEmail knows, for the rest of its nested call stack, whether the provided user is a plain User or a VerifiedUserWithBirthday. The types are cheap to write (they're all derived), have no runtime impact (types are erased at build/compile time), and these parsing functions are quite small to write.
https://www.typescriptlang.org/play/?#code/FAFwngDgpgBAqgZyg...
Suppose you want to add one more property to VerifiedUserWithBirthday and UnverifiedUserWithBirthday: you might get two more types, and somewhere higher up the call chain you need to know which enclosing type to pass so that some method at the bottom of the chain will accept it.
I am sure there are more elegant ways, but I am struggling to generalize this to most enterprise SaaS CRUD apps, where you have one object with a bunch of properties and code logic that branches conditionally on them.
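In your case, you could have:

```typescript
// Branded string: only a validating parser should produce one.
type EmailAddress = string & { readonly __brand: "EmailAddress" };

type User = {
  // ... rest of fields
  email: {
    verified: boolean;
    // branded type here ensures that this string is a proper email address
    value: EmailAddress;
  };
  birthday: Date | null;
};
```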
In this case, a method that needs a birthday and an email has all the information it needs to make its choice: `fn send_birthday_mail(user: {u: User, u.birthday != null})`
Contracts are a similar solution that restricts the predicates to appearing only in function types. The difference between this and an assert is that it gets checked at compile time (though the check can get quite expensive).
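Mainstream TypeScript has no predicates in types, but `{u: User, u.birthday != null}` can be approximated with an intersection plus a type guard. This sketch repeats a trimmed-down version of the `User` shape above so it stands alone; `UserWithKnownBirthday`, `hasBirthday` and `sendBirthdayMail` are hypothetical names, and the check happens at runtime rather than being proven at compile time.

```typescript
type EmailAddress = string & { readonly __brand: "EmailAddress" };

type User = {
  email: { verified: boolean; value: EmailAddress };
  birthday: Date | null;
};

// Roughly `{u: User, u.birthday != null}`: same User, narrower birthday.
type UserWithKnownBirthday = User & { birthday: Date };

// Runtime check that hands the compiler the refinement.
function hasBirthday(user: User): user is UserWithKnownBirthday {
  return user.birthday !== null;
}

function sendBirthdayMail(user: UserWithKnownBirthday): string {
  // No null check needed: the parameter type guarantees a birthday.
  const when = user.birthday.toISOString().slice(0, 10);
  return `to ${user.email.value}: birthday on ${when}`;
}
```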
What can you do in mainstream languages? As much as is worth and no more than that. String -> User is worth it, User -> UserWithBirthday is not.
The combinatorial explosion you're picturing only shows up if you make a separate type per combination of present fields, but you don't need to. An independent optional field stays one `T | null`. You only reach for distinct types when fields are correlated and present together because they represent a state, and then it's a discriminated union on a status field, which is N states, not 2^N.
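A sketch of that split, with hypothetical states and field names: correlated fields move into a discriminated union on `status` (three states, three variants), while an independent field like `birthday` stays `Date | null`.

```typescript
// Correlated fields live inside the state that owns them.
type Account =
  | { status: "invited"; email: string }
  | { status: "active"; email: string; verifiedAt: Date }
  | { status: "suspended"; email: string; reason: string };

// Independent optional field: stays a plain `T | null`.
type Profile = { account: Account; birthday: Date | null };

function describeAccount(account: Account): string {
  switch (account.status) {
    case "invited":
      return `invite pending for ${account.email}`;
    case "active":
      return `active since ${account.verifiedAt.toISOString().slice(0, 10)}`;
    case "suspended":
      return `suspended: ${account.reason}`;
    default: {
      // Exhaustiveness check: adding a fourth status fails to compile here.
      const unreachable: never = account;
      return unreachable;
    }
  }
}
```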
Using types like this also means you can more easily avoid assignment errors, as everything will have a very specific type (e.g. Age instead of int).
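A small illustration of the assignment-error point, assuming branded numbers; `Age`, `Quantity`, their bounds and `makeOrder` are made up for the example.

```typescript
type Age = number & { readonly __brand: "Age" };
type Quantity = number & { readonly __brand: "Quantity" };

// Hypothetical bounds, for illustration only.
function toAge(n: number): Age | null {
  return Math.floor(n) === n && n >= 0 && n <= 150 ? (n as Age) : null;
}

function toQuantity(n: number): Quantity | null {
  return Math.floor(n) === n && n > 0 ? (n as Quantity) : null;
}

type Order = { customerAge: Age; quantity: Quantity };

// Swapping the arguments, makeOrder(quantity, age), is a compile error,
// even though both are plain numbers at runtime.
function makeOrder(age: Age, quantity: Quantity): Order {
  return { customerAge: age, quantity };
}
```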
The short version is: the shape of a type is inherent to the type itself, but the optionality of its members is dependent on the situation. A type system that solves this problem separates these concepts to allow for this distinction.
I _suspect_ it's possible to implement something like that in typescript but I haven't tried it myself (and I doubt it's very ergonomic).
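One possible approximation in TypeScript, assuming a hypothetical `WithRequired` helper: the shape is declared once with optional members, and each use site states which members it needs. Whether this is ergonomic enough in practice is exactly the open question.

```typescript
// Hypothetical helper: make the listed keys required and non-null,
// leaving the rest of T's shape untouched.
type WithRequired<T, K extends keyof T> = T & { [P in K]-?: NonNullable<T[P]> };

// One shape, declared once, with optional members.
type User = {
  name: string;
  email?: string | null;
  phone?: string | null;
  birthday?: Date | null;
};

// The birthday-mail feature's view of a User: same shape, two members required.
type BirthdayMailUser = WithRequired<User, "email" | "birthday">;

// Runtime check that narrows a value to the situational type.
function hasFields<T extends object, K extends keyof T>(
  value: T,
  keys: readonly K[],
): value is WithRequired<T, K> {
  return keys.every((k) => value[k] !== undefined && value[k] !== null);
}

function birthdayGreeting(user: BirthdayMailUser): string {
  // email and birthday are non-nullable here without further checks.
  return `${user.email}: ${user.birthday.getUTCMonth() + 1}/${user.birthday.getUTCDate()}`;
}
```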
`class User { phone: ?PhoneNumber }`
over `class User { phone: ?string }`. It's more about writing
`struct User { phone: MaybePhoneNumber } // give or take, it's a monoid`
over `struct User { phone: Option<String> }`.

It’s frankly depressing when (2018) oldies-but-goodies get reposted here for the Nth time. The clarity of thought and obvious effort that went into communicating that thought that used to be expected in top-voted posts has largely been submerged into a daily dump of “the LLM just cleaned up my notes” slop.
> monoid
Nullables with `??` and `?.` are also give-or-take monoids. Is it common, though, to `or` two MaybePhoneNumbers together, or to apply a PhoneNumber -> MaybePhoneNumber function to one? If not, then why mention it?
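For what the monoid framing looks like with `??`: restricted to `T | undefined`, "first present value wins" is associative with `undefined` as the identity (mixing in `null` is where the "give or take" comes in, since `null ?? undefined` yields `undefined`). One place it can show up is folding several candidate sources; `PhoneNumber`, `orElse` and `firstPhone` are illustrative names.

```typescript
type PhoneNumber = string & { readonly __brand: "PhoneNumber" };
type MaybePhoneNumber = PhoneNumber | undefined;

// Associative "first present value wins", with `undefined` as the identity.
const orElse = (a: MaybePhoneNumber, b: MaybePhoneNumber): MaybePhoneNumber => a ?? b;
const emptyPhone: MaybePhoneNumber = undefined;

// Folding a list of candidates is where the identity element earns its keep.
const firstPhone = (candidates: readonly MaybePhoneNumber[]): MaybePhoneNumber =>
  candidates.reduce(orElse, emptyPhone);
```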
Let's see something meaningfully different, like a database schema.
At the end of the day, the ideas within the content are what matter. An idea has merit or not regardless of whether it was produced entirely by a person, by a person using AI as an editor, or 100% by AI. If you need a disclosure of whether an idea was produced by AI, you are saying that you have no interest in debating the content on the grounds of the arguments it makes, while simultaneously conceding that you can't tell the difference between someone using AI and someone who isn't (which undermines one of the primary arguments against AI: that it produces inferior output).
> power being used by the data center is renewable
That doesn't change anything about the content itself. AI writing is a disservice to the reader. Why should I care to read an article you didn't care to write yourself? At this point a 300-character tweet would've achieved the same effect.
A translation app changes nearly 100% of the content, often changes the writer's style/voice, and can introduce hard to detect errors. But there's a far closer correspondence to what was written by the original writer. The basic ideas are still from the writer. A translation app is not expanding a short idea into something longer, and including some things the original writer never thought in the process.
***
Pre-LLMs, I did in fact disclose when I was using a translation app in some translations of scientific articles I produced. It would be weird to disclose the use of spell checking, grammar checking, or who previously taught me writing, as these things are ubiquitous. I will also acknowledge people who were influential in my thinking. If an LLM is doing a lot of the thinking for me, then I do think disclosing LLM use is appropriate.