Parse, Don't Validate – In a Language That Doesn't Want You To (cekrem.github.io)

60 points by fagnerbrack 2 hours ago | 42 comments

ramon156 34 minutes ago |

The author ran into the square-peg-in-a-round-hole problem with TS. Functions can throw implicitly, and there's no enforced annotation telling you that a function might throw. FP solves this with Result/Option, but that doesn't fit TS well. Effect is there to find a solution but will fail.

Zod is the acceptable middle ground, in my opinion. Zod lets you check an object against a schema, and it tells you "yes, the result fits your schema". This is fine for most projects.

If you want to go zero-dependency, you can see how far you can get with TS's type system. Branded types are kinda cool. NewTypes are also cool, but also high maintenance. Unless you're building a library that millions depend on, it's probably not worth it.
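For anyone who hasn't seen the zero-dependency version, here is a minimal sketch of a branded type. The `__brand` property exists only in the type system, so an `Email` is still a plain string at runtime. `Email`, `parseEmail` and `sendWelcome` are hypothetical names, and the regex is a deliberately loose placeholder rather than a real email validator.

```typescript
// A "brand" is a phantom property that exists only in the type system.
// At runtime an Email is still a plain string.
type Email = string & { readonly __brand: "Email" };

// Hypothetical smart constructor: the only sanctioned way to obtain an Email.
// The regex is a loose placeholder, not a full RFC 5322 check.
function parseEmail(raw: string): Email | null {
  const trimmed = raw.trim();
  return /^[^\s@]+@[^\s@]+$/.test(trimmed) ? (trimmed as Email) : null;
}

// Downstream code can demand an Email instead of re-validating a string.
function sendWelcome(to: Email): string {
  return `Sending welcome mail to ${to}`;
}

const maybe = parseEmail("  ada@example.com ");
if (maybe !== null) {
  console.log(sendWelcome(maybe)); // OK: narrowed to Email
}
// sendWelcome("ada@example.com"); // compile error: string is not assignable to Email
```

The only escape hatch is an explicit `as Email` cast, which is easy to grep for or forbid with a lint rule.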

epolanski 28 minutes ago | |

> Effect is there to find a solution but will fail.

What do you mean?

I've been using Effect for a long time, and it scales really well as your applications get more complex.

Schema is way more advanced than Zod, by the way, both at the type level and in functionality: it has a proper decoder/encoder architecture.

At the type level you can encode not just "string -> non-empty string -> valid email pattern" but also "a confirmed email the user has clicked on", by leveraging effectful schemas (and durable workflows if you want).

You may not need it 99% of the time (I rarely use it myself), but it's not a fair comparison.

Zod is more ergonomic, has easier APIs and is perfect for most users. I would not recommend Schema unless you buy into the whole package.

programmarchy 4 minutes ago | | |

I haven’t used Effect but the problem I see with using it is that it seems to want to completely swallow the whole app architecture. At that point, why not just use a functional language?

Altern4tiveAcc 56 minutes ago |

Zod is by far the most ergonomic way to express those ideas in TypeScript these days. I miss it when writing code in other languages.

The friction with the rest of the ecosystem is real, though. Most code out there expects you to handle errors with exceptions.

I get the impression that polymorphic return types could get in the way of JSC/V8/SpiderMonkey's JIT, but I haven't measured it and I'm not sure of the actual impact on hot and cold paths. Same for all the allocations caused by custom Option<T>/Result<T,E> implementations.

I think using Zod at the edge (with branded types and whatnot), while keeping return types as T/Promise<T> to keep a sane relationship with the ecosystem is a good middle ground.
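A zero-dependency sketch of that middle ground: parse untrusted input once at the boundary, throw on failure, and hand plain `T` / `Promise<T>` to the rest of the app. A hand-written parser stands in for a Zod schema here; `Signup`, `ParseError`, `parseSignup` and `handleRequest` are hypothetical names.

```typescript
type Email = string & { readonly __brand: "Email" };

// The parsed, trusted shape the rest of the app works with.
interface Signup {
  readonly email: Email;
  readonly age: number;
}

class ParseError extends Error {}

// Boundary parser (stand-in for a Zod schema): takes untrusted input,
// e.g. JSON.parse output, and either returns a fully typed value or throws,
// so the signature stays a plain `Signup` instead of a Result type.
function parseSignup(input: unknown): Signup {
  if (typeof input !== "object" || input === null) {
    throw new ParseError("expected an object");
  }
  const { email, age } = input as Record<string, unknown>;
  if (typeof email !== "string" || !/^[^\s@]+@[^\s@]+$/.test(email)) {
    throw new ParseError("invalid email");
  }
  if (typeof age !== "number" || !Number.isInteger(age) || age < 0) {
    throw new ParseError("invalid age");
  }
  return { email: email as Email, age };
}

// Inside the app, functions keep ordinary T / Promise<T> signatures,
// and failures surface through the usual try/catch or rejected promise.
async function handleRequest(body: string): Promise<Signup> {
  return parseSignup(JSON.parse(body));
}
```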

jerf 8 minutes ago | |

I haven't done a lot of TypeScript, but I've done at least a couple of months' worth now, and every time I have to type "as", my inner Haskell programmer screams.

If I could add one feature to TypeScript, it would be something like "as" that actually validates the value against the type at runtime and can fail. Unfortunately, that's way, way easier said than done. It's the bad kind of keyword, with unbounded runtime cost, because it would have to be a runtime comparison, and there are a lot of design questions about how to write it. Still, I petulantly want it even though I can hardly define it. "zod" is pretty good, but you can see why trying to add that as a "keyword" is nightmare fuel for a language-level change.
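The closest userland approximation I know of is a user-defined type guard wrapped in a function that throws, sketched below. `checkedAs`, `Guard`, `Point` and `isPoint` are hypothetical names. The hand-written `isPoint` is exactly the part a language-level keyword would have to generate for every type, which is where the design trouble starts.

```typescript
// A type guard is an ordinary function whose return type tells the compiler
// what a `true` result proves about its argument.
type Guard<T> = (value: unknown) => value is T;

// Userland "checked as": verify at runtime, narrow at compile time, or throw.
function checkedAs<T>(value: unknown, guard: Guard<T>, what: string): T {
  if (!guard(value)) {
    throw new TypeError(`value is not a valid ${what}`);
  }
  return value; // narrowed to T by the guard
}

interface Point {
  x: number;
  y: number;
}

// Hand-written runtime check for a hypothetical Point type.
const isPoint: Guard<Point> = (v): v is Point =>
  typeof v === "object" &&
  v !== null &&
  typeof (v as Record<string, unknown>).x === "number" &&
  typeof (v as Record<string, unknown>).y === "number";

// Unlike `JSON.parse(...) as Point`, this fails loudly on bad input.
const p = checkedAs(JSON.parse('{"x":1,"y":2}'), isPoint, "Point");
```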

throwaw12 43 minutes ago |

I personally love the idea and concept, but I struggle to apply it to real projects.

Suppose I have a User with some attributes like birthday, email and whether they have been verified.

In a typical codebase you'll see `if (user.verified_at != null)` or something along those lines; with parsed code, I feel like I should have types (or interfaces) for each of them:

    - UserWithBirthday
    - VerifiedUser, UnverifiedUser
    - UserWithEmail, UserWithoutEmail

(And imagine having a method that accepts a user with a birthday and an email, to send them an email the day before their birthday: would you create a UserWithBirthdayAndEmail type?)

It feels like this is going to bloat the interface space. How do you tackle this problem?
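One common way around the combinatorial explosion, sketched under the assumption that all optional fields live on a single `User` type: a generic `With<T, K>` helper that marks chosen fields as present, plus one generic type guard. `User`, `With`, `hasFields` and `birthdayReminder` are hypothetical names, not from the article.

```typescript
interface User {
  readonly name: string;
  readonly email?: string;
  readonly birthday?: Date;
  readonly verifiedAt?: Date;
}

// With<T, K>: "a T whose fields K are guaranteed to be present".
// One generic helper replaces hand-written UserWithX / UserWithXAndY types.
type With<T, K extends keyof T> = T & { [P in K]-?: NonNullable<T[P]> };

// Generic type guard that checks the listed keys at runtime.
function hasFields<T, K extends keyof T>(
  value: T,
  ...keys: K[]
): value is With<T, K> {
  return keys.every((k) => value[k] !== undefined && value[k] !== null);
}

// Callers state exactly which fields they need, right in the signature.
function birthdayReminder(user: With<User, "email" | "birthday">): string {
  return `Remind ${user.email} about ${user.birthday.toISOString().slice(0, 10)}`;
}

const ada: User = { name: "Ada", email: "ada@example.com", birthday: new Date("1815-12-10") };
if (hasFields(ada, "email", "birthday")) {
  console.log(birthdayReminder(ada)); // ada is narrowed here
}
```

The method from the question then declares `With<User, "email" | "birthday">` in its signature, so no `UserWithBirthdayAndEmail` type has to be written by hand.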

exceptione 45 minutes ago |

It's nice that the author mentioned F#, because if you want to target the browser (or any JavaScript runtime), you can do so directly from F# with Fable (https://fable.io). This lets you program in a type-safe manner by default, without having to play tricks to circumvent the limits of structural typing.

robrenaud 6 minutes ago | |

I suspect idiomatic TypeScript or idiomatic F# are both way better solutions in the real world than abstruse TypeScript emulating idiomatic F#.

robertlagrant 2 hours ago |

This feels right, and I also have never done it (or had the guts to get others to do it).

The reason I've not is - say there's an optional field. Currently we call that null, probably, and check each time if it's there or not. I could instead make a type, like User and UserWithPhoneNumber. Should we be making types for each combination of present/absent fields? That can't be right.

The classic answer is to move the logic inside the domain object, or have a helper function outside the object, so you aren't constantly checking for field presence/absence, but are instead writing the logic once and calling some code.

I'm not sure in practice types can help with this. But I'd love to be proven wrong.

rzmmm 51 minutes ago |

Is there a benefit to using this branded type over just encapsulating the raw string in a private variable in a closure or class? This feels a bit like forced nominal typing. The Email type doesn't have to be a string; it can be encapsulated so that invalid Emails are not representable.

iainmerrick 43 minutes ago | |

The main advantage of branding is that it’s a zero-cost abstraction -- the boilerplate vanishes at runtime. Just using a string instead of a containing object can give you a lighter-weight runtime.
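To make that trade-off concrete, here are both options side by side, as a sketch. `EmailBox`, `BrandedEmail` and `parseBrandedEmail` are hypothetical names, and the `@` check is a placeholder. The class allocates a wrapper object per value; the branded version is the original string at runtime.

```typescript
// Option A: encapsulation. The constructor is private, so the only way in is
// EmailBox.parse. TypeScript `private` is compile-time only; a closure or an
// ES `#private` field would also hide the value at runtime.
class EmailBox {
  private readonly value: string;
  private constructor(value: string) {
    this.value = value;
  }
  static parse(raw: string): EmailBox | null {
    return raw.includes("@") ? new EmailBox(raw) : null;
  }
  toString(): string {
    return this.value;
  }
}

// Option B: branding. A similar guarantee at the type level, but no wrapper:
// at runtime the value is just the original string.
type BrandedEmail = string & { readonly __brand: "Email" };
function parseBrandedEmail(raw: string): BrandedEmail | null {
  return raw.includes("@") ? (raw as BrandedEmail) : null;
}

const boxed = EmailBox.parse("ada@example.com");
const branded = parseBrandedEmail("ada@example.com");
console.log(typeof boxed, typeof branded); // logs: object string
```

The branded value can also be passed straight to any API that expects a `string`, whereas the boxed one needs an explicit `toString()`.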

lumpysnake 1 hour ago |

We should make authors disclose how much AI was used to write an article. This reeks of Opus 4.8.

lijok 1 hour ago | |

Why should they disclose how much AI was used to write an article?

lumpysnake 31 minutes ago | | |

Because I would've completely avoided the article if I knew that I would be served slop. I was interested in the content, but I was immediately thrown off by the writing style, which closely resembles what I've been getting from Opus 4.8 lately in my dev work. Filler language and useless metaphors everywhere.

> Booleans look tidy until somebody adds a third case and exhaustiveness silently doesn’t kick in. Strings narrow honestly.

Like, nobody truly writes like that. It wouldn't get past any competent editor.

Strings narrow honestly? What does that even mean? This kind of three-word "precision" is useless, and it appears everywhere in the article. We get the point within the first sentence; there's no need to add more.
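For what it's worth, the quoted line probably refers to a pattern like the following (illustrative, not from the article): with a boolean, a third state forces a type change, and existing `if (verified) ... else ...` code silently lumps the new state into one branch. With a union of string literals, a `switch` plus a `never` check makes the compiler flag every place that doesn't handle the new case. `Verification`, `assertNever` and `label` are hypothetical names.

```typescript
type Verification = "verified" | "unverified" | "pending";

// Exhaustiveness helper: only callable with a value of type never.
function assertNever(x: never): never {
  throw new Error(`unhandled case: ${String(x)}`);
}

function label(status: Verification): string {
  switch (status) {
    case "verified":
      return "Verified";
    case "unverified":
      return "Not verified";
    case "pending":
      return "Awaiting confirmation";
    default:
      // If a new member is added to Verification and not handled above,
      // `status` is no longer `never` here and this line fails to compile.
      return assertNever(status);
  }
}
```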

Bjartr 54 minutes ago | | |

If nothing else, it should be done as a courtesy to those who would like to avoid such content.

If the result is better for having used AI, why wouldn't an author want to disclose it?

ramon156 30 minutes ago | |

I recently made a Firefox extension to mark authors as slop, with the same goal but not for the same reason.

I don't think disclosing helps here. If the article wasn't obviously generated, why would that affect you?

The only issue I have is being half-way through the article and realizing I am reading hallucinated text. If I can mark the author once, I won't see them again. This works fine for me. You could argue that disclosing would fix this issue, but the issue is not that AI was used, but that it was not curated.

somat 47 minutes ago |

"TypeScript is structurally typed, which means two types with the same shape are the same type. string is string is string"

I don't speak TypeScript, so I'm probably missing something obvious, but why would you parse an email (or anything, really) into a string (or string equivalent)? When parsed, it will end up as a specific email object, that is, something closer to a C struct. What is the article's dance doing?

exceptione 30 minutes ago | |

JavaScript doesn't have structs. The idea is that you have data on one hand and a type witness about that data on the other. The type witness is something for the type system. But here you run into the limits of structural typing versus nominal typing, because structural typing isn't able to express that witness directly.

In sufficiently nominal type systems, I can hide the constructor for an EmailAddress type (as in: nobody else can construct an EmailAddress directly). In Haskell terms, I can then export a function `parseEmailAddress :: String -> Maybe EmailAddress`. The function parseEmailAddress is the only place that has access to the constructor, which means the only way to turn a string into an EmailAddress is by calling parseEmailAddress.

Note that at runtime EmailAddress is just a string. The boundaries live in the type system, not at the value level. A structural type system (as in TypeScript) does not enable that; it forces you to turn EmailAddress into something other than just a string.
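A rough TypeScript approximation of that hidden constructor, as a sketch: brand the string with a `unique symbol` that the module declares but would not export. At the type level `EmailAddress` is then no longer "just a string", while at runtime it still is. Outside code can still forge one with a cast, so this is weaker than a hidden Haskell constructor. The names mirror the comment; the regex is a placeholder.

```typescript
// The symbol exists only at the type level (`declare` emits no JS). In a real
// module you would export EmailAddress and parseEmailAddress but not this
// symbol, so other modules cannot spell out the brand themselves.
declare const emailAddressBrand: unique symbol;

type EmailAddress = string & { readonly [emailAddressBrand]: true };

// The sanctioned way to obtain an EmailAddress, analogous to exporting
// parseEmailAddress while hiding the EmailAddress constructor.
function parseEmailAddress(raw: string): EmailAddress | undefined {
  return /^[^\s@]+@[^\s@]+$/.test(raw) ? (raw as EmailAddress) : undefined;
}

// At runtime the parsed value is still the same plain string.
const addr = parseEmailAddress("ada@example.com");
```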

Are you confusing Email with EmailAddress? I think in many cases you would prefer EmailAddress to be represented as a dumb string at runtime. But even if you don't, you will easily find other examples where you have two structurally similar types that you don't want to mix up.

camdenreslink 42 minutes ago | |

In some languages you can create a type that is equivalent to a string but is its own distinct type (sometimes called the newtype pattern). I guess that's the same as a struct with a single field, but languages offer syntactic sugar for it, and depending on the implementation it doesn't allocate an extra wrapper object on the heap (which would happen in JavaScript/TypeScript).

LelouBil 43 minutes ago | |

Look up NewTypes.

The article's dance is to avoid having extra fields that are completely unnecessary here. They want some kind of nominal email type that is actually a string, so it can be used in places where a string is needed, but when a method requires an "email", you can't pass just any string.

It's a pretty common pattern in functional programming and in many other languages nowadays

wwalexander 30 minutes ago |

This is just validation that is using the type system to indicate the validation has already occurred. I think the real point of “parse, don’t validate” is to make the type system give you structural guarantees that couldn’t exist otherwise (e.g. always having a first/last element in the NonEmpty example from the original article). If you’re just branding the types as “parsed” (in reality, simply validated), you still have to know that the invariants you care about hold when using the “parsed” type (e.g. splitting the email type on “@” will always yield 2 elements), instead of the structure of the type holding that info inherently (e.g. struct Email { name: String, host: String }).
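For contrast, a sketch of the structural kind of guarantee, modeled on the NonEmpty example from the original article: after parsing, the type itself records that a first element exists, so later code needs no check. `NonEmpty`, `parseNonEmpty` and `head` are illustrative names.

```typescript
// A tuple type with at least one element: the compiler knows index 0 exists.
type NonEmpty<T> = readonly [T, ...T[]];

// Parsing turns a plain array into a NonEmpty, or reports failure.
function parseNonEmpty<T>(xs: readonly T[]): NonEmpty<T> | null {
  return xs.length > 0 ? (xs as NonEmpty<T>) : null;
}

// head needs no runtime check and no `T | undefined` in its return type,
// even with `noUncheckedIndexedAccess` enabled.
function head<T>(xs: NonEmpty<T>): T {
  return xs[0];
}
```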

jerf 14 minutes ago | |

"This is just validation that is using the type system to indicate the validation has already occurred. I think the real point of “parse, don’t validate” is to make the type system give you structural guarantees that couldn’t exist otherwise (e.g. always having a first/last element in the NonEmpty example from the original article)."

It's the same thing. In the latter case, something has validated that your NonEmpty has a first and a last element. It's all validation before you stick it in a type that asserts that the validation is guaranteed to have occurred so every function receiving it doesn't need to do it itself.

Any non-trivial use of a type system will involve making guarantees the type system itself cannot actually express. There's nothing wrong with saying "this is a valid email in accordance with my standards" in a type. Merely using the type system to assert "I have some sort of value in the name and host fields" is valid but a degenerate use. "struct Email { name: Name, host: Hostname }" is an even stronger use of the type system, where Name and Hostname are themselves values you can only get by passing some incoming string through a validation process. Asserting that these things exist is just the most basic check possible, but your type still permits {name: "\0\0\0\0\0\0", host: "!"}, whereas under my definition, assuming that Name and Hostname are reasonably defined, that value will never be something that can be witnessed.
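A sketch of that stronger version in TypeScript: `Name` and `Hostname` are themselves branded types obtainable only through their own parsers, and `Email` is built from them. The validation rules are deliberately simplistic placeholders, and all names are illustrative.

```typescript
type Name = string & { readonly __brand: "Name" };
type Hostname = string & { readonly __brand: "Hostname" };

// Email can only be built from already-validated parts.
interface Email {
  readonly name: Name;
  readonly host: Hostname;
}

// Placeholder rule for the local part.
function parseName(raw: string): Name | null {
  return /^[A-Za-z0-9._%+-]+$/.test(raw) ? (raw as Name) : null;
}

// Placeholder rule: two or more dot-separated labels; normalized to lowercase.
function parseHostname(raw: string): Hostname | null {
  return /^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$/.test(raw)
    ? (raw.toLowerCase() as Hostname)
    : null;
}

function parseEmail(raw: string): Email | null {
  const at = raw.lastIndexOf("@");
  if (at < 0) return null;
  const name = parseName(raw.slice(0, at));
  const host = parseHostname(raw.slice(at + 1));
  return name !== null && host !== null ? { name, host } : null;
}
```

With these parsers, `{name: "\0\0\0\0\0\0", host: "!"}` can never come out of `parseEmail`.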

In fact in general, while I don't absolutely rigidly apply this, especially in smaller script-like programs, when a "string" appears in my strong types that specifically means "this has unbounded contents". It's an appropriate type for "stuff I got off a network" or "stuff a user typed". What stuff? Don't know. Haven't checked it yet. When I do it'll get a more specific type like a Username or DecodedUTF8String or something else. Thanks to people using way too many "strings" and "ints" in the world I have to constantly explain to my LLM that I want stronger types. I'm yet to find the invocation to put into my CLAUDE.md or equivalent to get it to do it right the first time consistently.

hankbond 1 hour ago |

As a new TypeScript user, I've found that these concepts have greatly helped me simplify my code and improve reliability independently of testing. Many LLMs guide you in this direction if you loosely ask them, but having a concise post like this with the what and the why is fantastic as reference material. The suggestion to use Separation and a linter rule is something I'm going to look into immediately for my current project. Great post!

ivolimmen 1 hour ago |

One of the pillars of Domain Driven Design. I love working on a pure DDD application but I do not often convince my team (I am a constant) that this is the best way ...

jve 55 minutes ago | |

> I am a constant

What did you mean by that? You don't accept mutability or any inputs on your state of mind?

ramses0 1 hour ago |

Meta: in addition to upvotes and downvotes, we almost need a slop/not-slop slider.

This one barely scrapes by at what feels like 30-40% "slop": "honestly", "the one thing", etc...

...but I did learn something about "Brand" types, and have personally tried to do more of "parse don't validate" in my own code.

Recently I did a similar trick for `exec( ValidExecutable(...) )` [Python], where the value had to be tagged/washed through a private function/variable to "get" the private bit.

All the scanners tend to light up when they see "exec" at all (eg: `exec( "pandoc" )` for PDF generation), but I needed to hard code a few "expected" pandoc locations so the imaginary hackers couldn't shadow "pandoc" on a path location they controlled.
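The same idea translated to TypeScript, as a sketch: the original was Python and its code isn't shown, so `TrustedExecutable`, `trustExecutable`, `buildCommand` and the allowlisted paths are all hypothetical. Only paths from a hard-coded allowlist can become a `TrustedExecutable`, so a `pandoc` shadowed on a `PATH` directory someone else controls is never picked up.

```typescript
type TrustedExecutable = string & { readonly __brand: "TrustedExecutable" };

// Hypothetical allowlist of absolute paths where pandoc is expected to live.
const PANDOC_LOCATIONS: readonly string[] = [
  "/usr/bin/pandoc",
  "/usr/local/bin/pandoc",
  "/opt/homebrew/bin/pandoc",
];

// The only way to mint a TrustedExecutable. Bare names like "pandoc",
// which would be resolved through PATH, are rejected outright.
function trustExecutable(path: string): TrustedExecutable | null {
  return PANDOC_LOCATIONS.includes(path) ? (path as TrustedExecutable) : null;
}

// Anything that runs a command accepts only a TrustedExecutable. The actual
// spawn is left out; it could forward to node:child_process execFile.
function buildCommand(exe: TrustedExecutable, args: readonly string[]): string[] {
  return [exe, ...args];
}
```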

conartist6 1 hour ago |

Don't forget to freeze the objects

    type User = {
      name: string;
      verified: boolean;
      email?: string;
      lastName: string;
      birthday?: string | { year: string; month: string; date: string };
    };

    // The birthday value itself, without `undefined`.
    type Birthday = NonNullable<User['birthday']>;
    type UserWithBirthday = User & { birthday: Birthday };
    type VerifiedUser = User & { verified: true; email: string };
    type VerifiedUserWithBirthday = UserWithBirthday & VerifiedUser;

    // Type guard: only narrows after checking every field the target type promises.
    const userHasBDayAndEmail = (user: User): user is VerifiedUserWithBirthday =>
      user.verified && user.email !== undefined && user.birthday !== undefined;

    type User = {
      // ... rest of fields
      email: {
        verified: boolean,
        // branded type here ensures that this string is a proper email address
        value: EmailAddress,
      },
      birthday: Date | null,
    };
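On freezing the objects: a sketch that combines `Object.freeze` at runtime with `Readonly` at the type level, so a parsed value can't later be mutated back into an invalid state. `UserRecord` and `makeUser` are hypothetical names. `Object.freeze` is shallow, so the nested object is frozen separately.

```typescript
interface UserRecord {
  name: string;
  birthday: { year: number; month: number; day: number } | null;
}

// Freeze the nested birthday object too, since Object.freeze is shallow.
function makeUser(name: string, birthday: UserRecord["birthday"]): Readonly<UserRecord> {
  return Object.freeze({
    name,
    birthday: birthday === null ? null : Object.freeze({ ...birthday }),
  });
}

const u = makeUser("Ada", { year: 1815, month: 12, day: 10 });
// u.name = "Bob"; // compile error: name is read-only (and a TypeError at runtime in strict mode)
```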