Why static languages suffer from complexity(hirrolot.github.io) |
Why static languages suffer from complexity(hirrolot.github.io) |
Most of the problems that cause non-trivial bugs come from invariant violations. At point A, there's some assumption, and way over there at point B, that assumption is violated. That's an invariant violation.
Type systems prevent some invariant violations. Because that works, there are ongoing attempts to extend type systems to prevent still more invariant violations. That creates another layer of confusing abstraction. Some invariants are not well represented as types, and trying makes for a bad fit. What you're really trying to do is to substitute manual specification of attributes for global analysis.
The Rust borrow checker is an invariant enforcer. It explicitly does automatic global analysis, and reports explicitly that what's going on at point B is inconsistent with what point A needs. This is real progress in programming language design, and is Rust's main contribution.
That's the direction to go. Other things might be dealt with by global analysis, Deadlock detection is a good example. If P is locked before Q on one path, P must be locked before Q on all paths. There must be no path that leads to P being locked twice. That sort of thing. Rust has a related problem with borrows of reference counted items, which are checked at run time and work a lot like locks. Those potentially have a double-borrow problem related to program flow. I've heard that someone is working on that for Rust.
> ...
> The Rust borrow checker is an invariant enforcer. [...] This is real progress in programming language design, and is Rust's main contribution.
I'm so confused by your stance here. You essentially say "type systems are not useful" and then "oh but this most recent advance in type systems — that one is useful." Do you find type systems useful or not?
There are a lot of properties we can analyze statically, and practically all of them essentially amount to extensions of type systems. Any of them increases our ability to rule out undesirable programs from every beginning execution. Some of them have unintuitive syntax, but many of them are no more syntactically burdensome than most other type systems. This is especially true if you consider how far we've come with type inference, so we no longer have to write code with the verbosity of Java just to get some meager guarantees. It's still a very active area of research, but we're clearly making progress in useful ways (which you even highlight), so I don't really know what point it is you've set out to make.
> It explicitly does automatic global analysis
They appear to think that the borrow checker isn't achieved with type theory, but with some other technique ("global analysis").
Although, to be fair, my understanding of making a practical affine type checker is that things get kind of wonky if you do it purely logically. So practically you do some data flow analysis. Which is, I believe, what rust is doing. This also explains why MIR was such a big deal for certain issues with borrow checking. They ended up with a format that was easier to run a data flow analysis on, and that allowed the borrow checker to handle things like non-lexical lifetimes, etc.
[I've only read about such things. So I might have mis-remembered some of the details. However, this is my take on why someone might not call rust's advances purely type theoretic (even if they can be handwaved as type theory at a high level).]
Not the original author but it seems like they're saying that type-systems are non-specific invariant enforcers and so have costs without necessarily having benefits whereas a user-specifiable invariant enforcer is more guaranteed to have the benefits.
Two examples from the top of my head:
1. Encoding matrix sizes into the data- and function-types, so that you can safely have a function `mat[c,b] mat_mult(mat[a,b] a, mat[c,d] b)` or even `mat[w-2,h-2] convolve(mat[w,h] input, mat[3,3] kernel)` and have the compiler check that you never use a matrix of the wrong size.
2. Actually checking the correctness of your implementation.
There is a very nice online demo of Liquid Haskell [1], where they defined the properties of ordered lists (each element has to be smaller or equal to the one before, line 119). Then they define a function that takes an (unordered) list and spits out a ordered one by applying a simple quicksort.
Now, if you break the algorithm (e.g. flip the < in line 193) and run Check, the compiler will tell you that you messed up your sorting algorithm.
Pretty neat.
edit: I just realized that LiquidHaskell is almost 10 years old. Sad to see that basically nothing made it into "production".
And yet type theory is an excellent way to express all kinds of invariants. The more rich the type system the more you can express. If you get to dependent types you essentially have all of mathematics at your disposal. This is the basis of some of the most advance proof automation available.
What is super cool is that proofs are programs. You can write your programs and use the same language to prove theorems about them.
This is still fairly advanced stuff for most programming activities but the languages have been steadily getting better and the automation faster and I think the worlds will eventually collide in some area of industrial programming. We're already seeing it happen in security, privacy, and networking.
I don't think type systems suffer from complexity. They merely reveal the inherent complexity. You can use languages that hide it from you but you pay a price: errors only become obvious at run time when the program fails. For small programs that's easy enough to tolerate but for large ones? Ones where certain properties cannot fail? Not so much in my experience.
update: clarified wording of "proofs are programs"
Why is this interesting? You pay an extremely heavy price in terms of language complexity. In practise, you almost never have the invarants at all or correct when you begin programming, and your programs evolve very rapidly. Since with dependent types you loose type-inference, you now what to evolve two programs rather than one. Moreover proofs are non-compositional: you make a tiny change somewhere and you might have to change all proofs. In addition we don't have full dependent types for any advanced programming language features, we have them only for pure functions that terminate.
> the same language to prove theorems about them
That sounds like a disadvantage. Ultimately verification is about comparing two 'implementations' against each other (in a very general sense of implementations where logical specs and tests also count as implementations). And the more similar the two implementations, the more likely you are to make the same mistake in both. After all, your specification is just as likely to be buggy as your implementation code.
> type systems suffer from complexity.
This is clearly false for just about any reasonable notion of complexity. For a start pretty much as soon as you go beyond let-polymorphism in terms of typing system expressivity, you looks type-inference. Even type-checking can easily become undecidable when the ambient typing system is too expressive.
There is no free lunch.
The only difference is: instead of brittle hierarchies, we get ossified compositions (depending on how much nominal vs structural typing happens).
We, of course, agree that we are quite some distance from having advanced type systems brought to day-to-day industrial programming.
Didn't Kurt Gödel and Alan Turing do some work on proving statements within a system?
A success story in this regard is the async keyword. Very quickly you can get used to it and it feels like any other imperative programming.
In C# if I can add assertions and have C# compile time check the source that the assertion will not be violated. This would be great. I know they do this for null checking.
Ah yes. And then you end up writing entire prgrams in types. So the next logical setep would be to start unit- and integration tests for these types, and then invent types for those types to more easily check them...
> you essentially have all of mathematics at your disposal.
Most of the stuff we do has nothing to do with mathematics.
And yet Go is adding generics in 1.8. And I'm sure its type system in another 5 years will be much more expressive than 1.8's. The community has long been saying that the minimal type system isn't enough.
Isn't it stil mostly Java and C++? That's what I hear all the time here.
Also, I'm not sure what point you're trying to make. You start by saying that fascination with types systems is not useful in practice, and end with an example where it is useful (Rust). While Go can stick a GC to avoid most of the issues that Rust is trying to solve, it stil has to ship with a defer mechanism (no linear/affine types/RAII) and a data race detector.
I'm out of touch, but I would expect that there is a lot more Go code by now, and it also didn't catch up with C++ or Java.
Go's type system is much weaker and less expressive than either Java's or C++'s. C++ in particular has parametric polymorphism, type constructors, and dependent types. Go has none of those.
Which is exactly what a type error is!
> The Rust borrow checker is an invariant enforcer. It explicitly does automatic global analysis, and reports explicitly that what's going on at point B is inconsistent with what point A needs. This is real progress in programming language design, and is Rust's main contribution.
> That's the direction to go.
The borrow checker is an ad-hoc informally specified implementation of half of an affine type system. Having to switch programming languages every time you want to add a new invariant is a poor paradigm. What we need is a generic framework that allows you to express invariants that are relevant to your program - but again, that's exactly what a type system is.
Rust has done a great thing in showing that this is possible, but linear Haskell or Idris - where borrow checking is not an ad-hoc language feature that works by gnarly compiler internals, but just a normal part of the everyday type system that you can use and customize like any other library feature - are the approach that represents a viable future for software engineering.
In principle you could implement a form of borrow check with linear types (I don’t think affine is good enough), but the ergonomics would be horrible.
I think typescript with gradual and structural typing and similar like mypy or sorbet are making real difference.
Type systems provide multiple benefits, performance, self-documentation, better tooling and more explicit data model.
Rust has automatic memory management.
> or even explicit multithreading.
You don't need explicit multithreading to run into data races. Languages that allow any kind of unchecked mutable state sharing and allow any form of concurrency (explicit or hidden) are prone to that problem.
Even single-threaded programs with aliasing of mutable variables are hard to reason about and Rust improves on that considerably by not allowing accidental aliasing.
You mean, one of the companies with the largest number of developers on the world, paying one of the highest average salaries for them is able to use the language?
That means absolutely nothing.
At least so far. Rust might change that in the future.
In theory this is true. If the compiler is decent, compile times and analysis shouldn't really be affected. Maybe libraries will use X but otherwise they would use a manual implementation of X anyways.
But in practice developers misuse features, so adding a feature actually leads to worse code. It also creates a higher learning curve, since you have to decide whether to use a new feature or just re-implement it via old features. See: C++ and over-engineered Haskell. So each feature has a "learnability cost", and only add features which are useful enough to outweigh the cost.
But most features actually are useful, at least for particular types of programs. It's much harder to write an asynchronous program without some form of async; it's much harder to write a program like a video game without objects. This may be controversial, but I really don't like Go and Elm (very simple languages) because I feel like have to write so much boilerplate vs. other languages where I could just use an advanced feature. And this boilerplate isn't just hard to create, it's hard to maintain because small changes require rewriting a lot of code.
So ultimately language designers need to balance number of features with expressiveness: the goal is to use as few simple but powerful features to make your language simple but really expressive. And different languages for different people. Personally I like working with Java and Kotlin and Swift (the middle languages in the author's meme) because I can establish coding conventions and stick to them, C++ and Haskell are too complicated and it's harder to figure out and stick to the "ideal" conventions.
I guess the author just hasn't encountered Nim before, where anything becomes compile time by just assigning to a const, and macros have access to the real AST without substitution. Macros also allow compile time type inspection, as they are a first class citizen rather than tacked on.
The compile time print, AFAICT, already exists in Nim as the `&` macro in strformat. That lets you interpolate what you like at compile time, and supports run time values too.
> Yes, you can write virtually any software in Zig, but should you? My experience in maintaining high-level code in Rust and C99 says NO.
Maybe gain some experience with Zig in order to draw this conclusion about Zig?
This was indeed weird to read, given that only Zig (and soon the JVM) solves this problem, and is well known for the fact. Especially when language design and type theory are an area of interest.
But hey, silver lining: Zig still kind of came out on top.
But anyway, thanks for your work on Zig. Your metaprogramming concepts were heavily influential for some of the ideas in my own language, Empirical.
How?
> Sometimes, software engineers find their languages too primitive to express their ideas even in dynamic code. But they do not give up . . .
Is this a failure of the language, or a failure of the engineer?
> If we make our languages fully dynamic, we will win biformity and inconsistency,[^] but will imminently lose the pleasure of compile-time validation and will end up debugging our programs at mid-nights . . . One possible solution I have seen is dependent types. With dependent types, we can parameterise types not only with other types but with values, too.
Types are a productive abstraction/model in programming languages. One of many. Each has its strengths and weaknesses; each is appropriate in some circumstances and not in others. Types are not the solution to all problems, any more than currying or OOP or whatever else is.
Production languages (like prolog or make) don’t need an if statement or operator as selection is implicit when a production matches.
https://github.com/fsharp/fslang-suggestions/issues/243#issu...
https://old.reddit.com/r/ProgrammingLanguages/comments/placo...
F# designer Don Syme is making the "biformity" argument, e.g. needing a debugger for compile time as well as runtime.
and
Syme & Matsakis: F# in the Static v. Dynamic divide https://old.reddit.com/r/ProgrammingLanguages/comments/rpcm6...
I still think something an application language with something like Zig's comptime would fill a big niche. (As opposed to a systems language.)
It seems that fixing this is a research problem, which would lead to the holy grail of programming languages, i.e. an ultimate language that is as expressive as Idris and as efficient as Rust, and is thus essentially perfect.
Programming languages are not theoretical things. They're concrete, practical tools that _enable_ other stuff. Engineering, not science.
>Programming languages are not theoretical things. They're concrete, practical tools that _enable_ other stuff. Engineering, not science.
You can't escape theory, engineering is applied science.
Depends on what you mean. Idris's notion of multiplicities essentially subsumes Rust's borrowing (there's some differences with affine vs linear types), so I can't think off the top of my head of things that you can ensure with Rust that you can't with Idris, but Rust has a lot more quality of life improvements that make things less clunky (also having a GC, Idris can get away with a lot less need for borrowing in the first place).
Currently working my way through some complex Python code written in that style and it's completely impossible to understand it. In fact, the only way I can actually do it is transforming all these ad hoc data structures into proper types so I can make sense of it.
As for how you know what’s in there - you should only know whether what’s relevant to your function is in there and not care about the rest of the world. For the former, tools like clojure.spec are helpful but ultimately good design helps the most (something that typed languages can often obscure).
> In this "1ML", functions, functors, and even type constructors are one and the same construct; likewise, no distinction is made between structures, records, or tuples. Or viewed the other way round, everything is just ("a mode of use of") modules. Yet, 1ML does not require dependent types, and its type structure is expressible in terms of plain System Fω, in a minor variation of our F-ing modules approach.
> An alternative view is that 1ML is a user-friendly surface syntax for System Fω that allows combining term and type abstraction in a more compositional manner than the bare calculus.
On the other hand, from the "engineer" point of view, all abstractions melting into one may not be desirable. It's nice to be able to use weak abstractions for simple stuff and powerful abstractions for more powerful stuff. Being exposed to the full complexity of your language all the time sounds like a recipe for disaster.
Also, the main argument is that separating features into those used at compile-time (AKA static) and run-time (AKA dynamic) is necessarily creating separate languages (i.e. a "static language", which may involve types, macros, preprocessors, etc.; and a "dynamic language", which may involve memory allocation, branching, I/O, etc.)
For example, perhaps my `calculate_price` function only depends on 2 attributes of the order which has 65 attributes. Am I creating a 2-element data type for that function to process? no! I'm specifying that it processes an Order data type, with all its 65 elements. But implicitly then I'm saying the function has 65 input parameters of all these specific types and nobody can call it now without providing them all. What a pain! Huge amount of extra code, refactoring, unit testing, because of this.
So either you end up with a cambrian explosion of micro-types or you have these way overspecified interfaces everywhere.
Compare with dynamic languages (or structural typing, Go etc) that only care that things "quack like a duck". The calculate_price function doesn't care what object you give it, as long as it has the two attributes it needs. Now I can unit test `calculate_price` with a 2-element object rather than needlessly creating the 23 irrelevant required elements of a valid Order.
I think a lot could be solved with culture shift. Where data types are really known and locked in, use the crap out of them. As soon as things get ambiguous or flexible, go right ahead and specify that your function takes a Map<String,Object>. If a useful concrete interface emerges at some point factor it out then. The problem is that this is really frowned upon in a lot of places.
But dynamically typed languages produce at least the same amount of accidental complexity, just in different ways.
Complexity in the types happens when the type system isn't expressive enough. Or when you're trying to do something that would make the compiler try to solve the halting problem.
To that last point, this is why the PLT community has pushed in the direction that Agda / Idris has. Kind of like how we realized years (decades?) ago that we didn't need pointer arithmetic, there's been a realization that "total" isn't actually that helpful, and it's okay if we didn't have languages that could express the halting problem.
Maybe we should judge compile-time constraint systems by how easy it is for the library author to add good error messages for misuse?
who's "we" here? pointer arithmetic is useful for all kinds of things.
Rather "I believe there is a silver bullet, but I don't know where yet". Probably I am too naive!
We do or we don't. There is no "might". Spending money on "might" has been the death of many projects.
If we didn't, and now we do, we could write a fn to map the car to parts, or we could define the car struct in terms of its parts, or we could just do away with the car altogether.
But far more valuable would be an analysis of what changed about the requirements that the model no longer works.
Now, don't get me wrong: I'd love a better language, and by better I mean "as fast as assembly but 'dynamic'". The problem is that, at the end of the day, all compilers are just "premature optimizations" or perhaps "willing premature optimizations". We could all be happily programming in smalltalk or build a runtime using predicate logic, but a) the number of people who could program in it is vanishingly small and b) it would be fucking slow. These languages don't solve a problem that I have, or rather they don't solve a problem that I don't already have a far better solution for. They solve a problem that academics have.
We can get a more direct Idris implementation by inlining the parser (toFmt) into the interpreter (PrintfType). That lets us throw away `Fmt`, `toFmt`, etc. to just get:
PrintfType : (fmt : List Char) -> Type
PrintfType ('*' :: xs) = ({ty : Type} -> Show ty => (obj : ty) -> PrintfType xs)
PrintfType ( x :: xs) = PrintfType xs
PrintfType [] = String
printf : (fmt : String) -> PrintfType (unpack fmt)
printf fmt = printfAux (unpack fmt) [] where
printfAux : (fmt : List Char) -> List Char -> PrintfType fmt
printfAux ('*' :: fmt) acc = \obj => printfAux fmt (acc ++ unpack (show obj))
printfAux ( c :: fmt) acc = printfAux fmt (acc ++ [c])
printfAux [] acc = pack acc f : Char -> Type
f '0' = Int
f _ = Char
g : (c : Char) -> (f c)
g '0' = 0
g c = cprintf is what you call when you want to print X in hexadecimal with at least two digits, left justified on an eight-character wide field. I don't see how the sanity of whatever string representation the programming language uses is relevant here.
It's amazing what you can do when you have compiler transformations and targets always available.
Suddenly, "little DSLs" (MLIR dialects) don't seem so bad, since they are defined the same way and map in semantically-sound ways to lower-level dialects. You can have dedicated dialects, like Halide, for doing something as concrete as image processing kernels.
Oh, and you can output those kernels to both the CPU and GPU, including automatically introducing async functions, host-side sync barriers, etc. Good luck doing that automatically with a general purpose programming language and a combination of macros, AST manipulations, and derived types! You really need a compiler to stay sane.
> "Programming languages ought to be rethought."
Indeed.
Also, most of our code needs to support suspend/resume on another machine, either in the middle of an action or more often between actions. So, a "behavior" might begin on machine A and then migrate to machine B to do more work, then on to machine C. While doing work, its execution state might be serialized to Postgres while some dependency is waited on—say, a human task that doesn't get done until the following Monday. It's then resumed in the same execution state, potentially on an entirely different worker/machine, and continues executing.
The suspend/resume stuff completely destroys the code if you're writing it by hand, as does moving from machine to machine.
So we write the core logic in our own internal MLIR dialect and then output code that has the suspend/resume semantics automatically (i.e. literal compiler transformations, plus our own "interpreter" (which is just JavaScript/v8 with all of the extra suspend/resume cruft added in).
We don't translate out of SSA form at all, our codegen can execute it directly. We also insert debug hooks so when there's an error, you can map the execution state to the original code.
Most of the cool machine learning stuff MLIR can do, we're not even doing yet outside of some internal prototypes. So far, just the methodology of MLIR has made a huge impact—it gives really nice structure (read: tooling) for the kinds of code transforms we've needed to do.
HTH
For example in Typescript, I use tsc-macro to run "*.macro.ts", they can import any functions and modules just like normal source code. And their evaluated result are saved as "*.ts"
The generated ts are then compiled alone with other hand-written typical source files into js for deployment and execution.
Hoist the "Final Words" section to the top and make it a "tldr" introduction, that way your reader can begin with a high level understanding of your argument, which you can hone and refine as you progress.
def foo(bar, baz):
bar(baz)
...
What the heck is 'bar' and 'baz'? I deduce no more than 'bar' can be called with a single 'baz'. I can't use my editor/IDE to "go to definition" of bar/baz to figure out what is going on because everything is dynamically determined at runtime, and even grep -ri '\(foo\|bar\|baz\)' --include \*.py
Won't tell me much about foo/bar/baz, it will only start a hound dog on a long and windy scent trail.All features are useful. That's table stakes. But usefulness is insufficient to warrant inclusion. How does a feature interact with all existing features? Are there ambiguities? Are there conflicts? A language is not a grab-bag of capabilities, it's a single cohesive thing that requires design and thought.
Is that really a problem on the language's side, though? Devs are capable of mis-using any feature, even extremely basic ones that almost every language has (variable names, for instance (although I'm laughing in FORTH)). Code standards and code reviews are necessary tools in the first place because it doesn't matter what language you give a programmer - they're perfectly capable of constructing a monstrosity in it.
I argue that preventing programmers from doing dumb things with well-designed language features (so, hygenic Scheme macros, and not raw C pointers) is a social and/or organizational problem, and it's better to solve that at that level than to try to solve it (inadequately) at a technical level.
("I keep dereferencing null pointers", on the other hand, is an example of a technical problem that can be solved on the technical level with better language design)
Yes, for a language to be good in practice you need to look at what developers actually do and not how a perfectly rational developer would use the language.
I have found the opposite to be true. Missing features often leads to what one would call "design patterns". When the language adds official support to solve the problem you're trying to solve with that pattern, the code becomes clearer.
I think the message is more nuanced than that (otherwise wouldn't lisp with its homoiconicity and compile time macros fit the bill perfectly?). Idris uses the same language, but is still too complex. And Zig not general purpose enough. I don't want to put words in the author's mouth but I think the implication is that this is a large space to explore and we don't have a solution yet; there's nothing like "just make your language like this and it'll be good." They're just pointing out the problem they see, and some (non-)solutions to it.
I thought it was more nuanced too as they were explaining how integer types can be derived, until I finished the article, and they really did just seem to be complaining that there's a mismatch between compile time and run time.
Dynamic types don't really solve the problems they mention as far as I can tell either (perhaps I am misunderstanding), they just don't provide any guarantees at all and so "work" in the loosest sense.
> otherwise wouldn't lisp with its homoiconicity and compile time macros fit the bill perfectly?
That's a good point, I do wonder why they didn't mention Lisp at all.
> we don't have a solution yet
What they want to do with print can, as far as I can see, be implemented in Nim easily in a standard, imperative form, without any declarative shenanigans. Indeed, it is implemented as part of the stdlib here: https://github.com/nim-lang/Nim/blob/ce44cf03cc4a78741c423b2...
Of course, that implementation is more complex than the one in the article because it handles a lot more (e.g., formatting and so on).
At the end of the day, it's really a capability mismatch at the language level and the author even states this:
> Programming languages ought to be rethought.
I'd argue that Nim has been 'rethought' specifically to address the issues they mention. The language was built with extension in mind, and whilst the author states that macros are a bad thing, I get the impression this is because most languages implement them as tacked on substitution mechanisms (C/C++/Rust/D), and/or are declarative rather than "simple" imperative processes. IMHO, most people want to write general code for compile time work (like Zig), not learn a new sub-language. The author states this as well.
Nim has a VM for running the language at compile time so you can do whatever you want, including the recursive type decomposition (this lib isn't implementing Peano arithmetic but multiprecision stack based bignums): https://github.com/status-im/nim-stint and specifically here: https://github.com/status-im/nim-stint/blob/ddfa6c608a6c2a84...
func zero*[bits: static[int]](T: typedesc[Stuint[bits] or Stint[bits]]): T {.inline.} =
## Returns the zero of the input type
discard
func one*[bits: static[int]](T: typedesc[Stuint[bits]]): T {.inline.} =
## Returns the one of the input type
result.data = one(type result.data)
It also has 'real' macros that aren't substitutions but work on the core AST directly, can inspect types at compile time, and is a system language but also high level. It seems to solve their problems, but of course, they simply might not have used or even heard of it.Are both Idris's expressiveness and Rust's efficiency (given stronger guarantees) perfect? Aren't theese languages really complex both to learn and to write? There are poblems without a solution, perfect and unique to all of them.
Go is a statically typed language. Unless you're referring to interfaces at run time?
> I'm saying the function has 65 input parameters of all these specific types and nobody can call it now without providing them all. What a pain! Huge amount of extra code, refactoring, unit testing, because of this.
I've seen this "cambrian explosion of micro-types" argument before on HN but I think it comes from a misunderstanding about what you actually do in a static language. No one's creating types for every combination of parameters.
Either you'd pass the two arguments directly (`a, b: int` or whatnot), or you'd pass the Order type and just use the bits you need.
If you have multiple Order types, you'd use generics or something like it to get duck typing. If you used a field that wasn't there, you'd get a compile error.
The reality is the code would look pretty much the same in both static or dynamic languages.
Dear god, please don't. Some of the worst spaghetti code I have disentangled used this pattern. Typos in the string literals used as keys, object type mismatches, etc.
If you only want some of the fields from another struct, you have a few options
* Define another struct `FooArgs`. This is easy, and I (respectfully) reject the claim that nobody does it.
* Just define your function to take those two fields directly.
[0] https://insights.stackoverflow.com/survey/2021#technology-mo...
The issue with this is it makes it difficult to understand code. If anything works potentially anywhere then the flip side is you have no idea what works anywhere without running the code.
Running code is slower iteration cycles than a type checker. Also you don’t actually know what it returned. So it was able to produce an output, was it what you expected? Or was it subtly different in ways that will break your code downstream.
I work at a company with a large untyped code base and the product is constantly breaking in these ways.
Python can do the same using what they call Protocol. Here protocols need to be defined upfront.
This is usually called Structural typing, as opposed to nominal typing where classes inherit from a base.
You just make it way harder for people to understand your code and contribute to it.
Though in my experience, sane code structure and informative comments trump everything else when it comes to understanding big and unknown codebase. I still shudder when I think about working years ago on various Java codebases (mostly business IT systems). What a convoluted mess of n-levels deep interface hierarchies. Types? Yeah, but good luck unraveling what exactly is happening in the runtime.
I agree that on balance type signatures are better -- and that's why modern Python has evolved to incorporate them. But they aren't a magic cure-all, and do they impose a significant tax of their own.
Python is easy to grok, and if you have programmers writing code like bar(foo,baz) then the problem is not Python. You can write crap in any language.
Unit tests do much of what typing checks anyway ... and here's the thing ... you NEED unit tests no matter what. No typing system can tell you that you wrote > when you should have written <.
(defn advisories
[config]
{:pre [
(map? config)
(:download-advisories-dir config)
]
:post [
(map? %)
]
}
(let [
dir (:download-advisories-dir config)
]
;; more code here> The claim is simple: in a static type system, you must declare the shape of data ahead of time, but in a dynamic type system, the type can be, well, dynamic! It sounds self-evident, so much so that Rich Hickey has practically built a speaking career upon its emotional appeal. The only problem is it isn’t true.
Is this somehow supposed to relevant to the posted article or did you just want to start a tangentially related dynamic-vs-static flame war here in the comments section?
While you still can make a static language that is confusing, it's a lot harder... I challenge you to write a function signature in Rust that is both:
1) Useful
2) As opaque as the python signature above.
> You find automatically inferred types
A minority of static languages do type inference in function signatures. I think it's a bad idea for exactly the same reason the python code is bad. On the other hand, every dynamic language allows you to omit any information about a type signature.
Static vs dynamic makes for such difference in the detailed workflow, both in terms of changing existing code & in terms of writing new/(more) from scratch code, yet they can both be quite fruitful, and can both be abused in absurdum.
It seems like people naturally fall into one of the two camps (either by personality or by training), and the other side just seems kind of insane: "how can you even work that way!?". Then culture and idioms emerge over time and strengthen the tribalism.
I've gone back and forth between the two over the course of my career, and it's quite a mind-shift when switching, with a fair bit of pain involved ("but it would be so easy to do this in [old language]", or "what the hell is this garbage anyway!?") and then eventually it settles in and it's not all painful, all the time ;)
(Going back and forth between Scala and Python right now, so this hit a bit of a nerve)
Oh yes, they do. Even inferred, the types are there and pretty easy to locate, even if you're not using an IDE.
And traits! "Oh look, this functionality is implemented in a trait implemented by a trait implemented by a trait implemented by what you're looking at. Maybe"
const foo = JSON.parse(arbitraryJsonString);
and not having to worry about the structure up front.That's not power, that's a shotgun aimed at your crotch whose trigger is connected to a cosmic ray detector.
The API then becomes a runtime fallible one, which is perfectly sound.
let foo: serde_json::Value = serde_json::from_str(arbitraryJsonString)?;
There, just as powerful [1]. But you know what's even more powerful? After you've done your dynamic checks, you can do this on the entire JSON tree, or on a subtree: let bar: MyStaticType = serde_json::from_value(foo)?;
and you get a fully parsed instance of a static type, with all the guarantees and performance benefits that entails.[1] Value represents a JSON tree: https://docs.serde.rs/serde_json/enum.Value.html
Now you get type checking on JSON at compile time :)
As soon as you try to do anything useful to foo it's not arbitrary anymore. You have to make some kind of an assumption on the underlying type, check for keys, nulls, maybe it's a number (the right number?), maybe it's a list. So now you have to scatter some boilerplate checks everywhere you touch a part of foo.
If you could parse it into a typed structure up front, you'd only have to deal with this in one spot, and have guarantees for everything else that follows.
Bonus: if your typed language has good support for records, you can even do this in a way that only provides structure to the parts you care about, and is robust to changes to any other parts of the json.
eval('alert("hello, ' + userInput.name + '!")')So there's no docstring? And the actual variables are that random and indecipherable?
Sounds like the problem is that you're tasked with looking at code written by someone who is either inexperienced or fundamentally careless. When dealing with reasonably maintained codebases, this kind of situation would seem pretty rare. In modern python we now have type hints of course, which have helped quite a lot.
In static languages this simply isn't a problem. Types are checked for consistency at compile time and you don't have to rely on people toiling on this busy work.
Not to say documentation isn't necessary, or good, but isn't something you need to create working programs because otherwise no one knows wtf any variable is without running the program.
I had to laugh, hard.
If your Python program uses any library whatsoever, chances are that library won't have types, so you can't really use them.
Even super widely used libraries like numpy don't have good support for types, much less any library that consumes numpy for obvious reasons.
Function names and doc comments describe behavior, not argument and return types.
What I like about Rust is that it even checks code in your 'docstring'. So it is easier to keep it maintained.
I use "Find Usages" on foo to see where it is used. Once you see where it is used, you know what the types can be. It's not great, but it's also something that can be progressively remedied.
In the event that the function is not as trivial as your example suggests, the author should have written a docstring to help you understand what it is trying to do, in addition to type annotations that will make it more readable.
In this example, bar can either be a function, a class, or any object with __call__(), so the type information is less important in this case, than actual docstrings that express intent.
I just gave up and introduced globals and used as flags for some places in the code.
To make things even more fun, big parts of the system was written in 2.7 called from runtime generated bat files from 3.4 as remnants of a rewrite and the consultant that had his funding cut.
But also, there’s nothing stopping the code from being much clearer about its intention than this weirdly contrived example (I have a lot of code and most the function names are pretty unique). And surely you want to search for ‘foo(‘ to find invocations.
That said, you're right that the current borrow checker ("non-lexial lifetimes") is built off of a control flow analysis, absolutely. But it still operates only within bodies, not globally.
No, I think they appear to think that the borrow checker is not about the programmer stuffing the program with type theory annotations for this purpose, but the borrow checker handles the analysis automatically across the program.
Of course it doesn't handle it perfectly, so there's some "fighting the borrow checker" and lifetime annotations sometimes needed - but it's not like designing the whole of your program based on type theory - the same way you're somewhat free from thinking about freeing memory in a gc language.
There are quite a few internal projects written in Go, but it feels like any time one of them gets big enough to matter, someone ends up reimplementing the important parts in C++ for performance reasons.
Also, while it looks like the matrix sizes have to be known at compile time, this is not the case. You are still free to use the same matrix types with dynamic sizes (or even mix them, useful for said convolutions).
In Haskell there that is achieved by using a "KnownNat", basically you elevate an integer from the value into the type level during run-time.
Hmm, that fixed point operator, rings a bell, can't quite put my finger on it . . . :-)
That's not what I wrote about. I've seen people implement entire DSLs in types. So now you have to test that.
I know that casting to interface{} is the equivalent of casting to Object, but Go 1.18 should hopefully avoid most of that.
Just wondering what else you're thinking of, asking as someone who knows Java well and Go okay.
That's kind of the problem though. Software is written by humans, and humans are fallible. We don't always make the correct choices. Also, there are economic pressures, deficiencies in specification, and changes in business requirements.
Personally, I believe businesses should accept the aforementioned reality and optimise for cost of change.
Literally all type systems could be described as “a static analyzer” that tries to assign and validate properties over the code it’s analyzing.
All compilers also rely on the results of that static analysis to direct codegen.
Rust’s type system implements substructural typing, and the borrow checker is an integral element of that type system.
Types define what operations can be performed. The borrow checker looks at allocations and determines when they can be freed. It really has nothing to do with types. An ideal implementation of the borrow checker could be totally type unaware and work with dynamically typed languages.
I was under the impression that it's substantially an implementation of an affine type system (https://en.m.wikipedia.org/wiki/Substructural_type_system).
Probably you could build such a thing without thinking about types per-se, but I don't think the designers of Rust did that, and I am not sure that makes it "not a type system" anyway.
> any static analyzer is a type system
"Any static analysis checking correctness is either a linter or a type checker" isn't a claim I would make, but I am not sure offhand how I would argue against it.
See here for more re Swift: https://github.com/apple/swift/blob/main/docs/OwnershipManif...
No, it's the thing that enforces a type system.
The borrow/move semantics it enforces are a (part of a) type system.
The point of my original comment was to suggest that some of the flexibility offered by duck typing can be achieved in FP, so they should seem similar.
I would still just pass the fields as two parameters.
I'm just saying that core problem it solves -- "bad developers" -- is going bite you no matter what (if not addressed at the source). And that supposed magical efficacy of measures designed to protect against it is somewhat overstated.
None of these are as significant as generics, but things do change.
It was before my time so I don't know what the practical impact was, just inherited some code that needed updating a few years ago...
I have used tagless casting a half-dozen times to deal with JSON vagaries; and overlapping methods today almost constantly. Slice conversions to array pointers is too new to say generally but I had one use-case pretty much immediately.
Otherwise, since Assembly provides the most control out of all, would you consider it the most expressive? ;-)
Maybe this definition is idiosyncratic, though. It's certainly not objective.
> engineering is applied science.
Absolutely. But the metrics are different.
In any case, I wrote "doesn't need", though perhaps you consider that hair splitting.
"predicateA is true" if "clauseB is true" and "clauseC is true".
or in prolog
predicateA :-
clauseB,
clauseC.
so :- is if, just in it's own, Prolog-y way. if (clauseB && clauseC)
predicateA = true;
else
predicateA = false;
which is clearly an "if". Or you could write it as: predicateA = claseB && clauseC;
which is not an "if" at all, but just a boolean calculation. (Unless you regard all boolean calculations as "if"s in disguise...)The prolog version seems to me to be more in the spirit of the second C version.
The second part is using Peano numbers to enforce the constraint. I guess you could try and force that into some mainstream languages, probably C++. With its template programming you could get something going in this vein, though I'm not sure how well it would work if the number were calculated at runtime rather than compile time. You'd still end up with a dynamic check somewhere.
Very often I'm looking at code and "how to use the interface" is not a question I'm looking to find answers for.
%foo(+A,-B)
foo(A,42) :-
A < 10.
foo(A,24) :-
A >= 10.
foo(_,100).
which would be bool foo(int *A, int *B){
if (*A < 10){
*B = 42;
}elseif (*A >= 10){
*B = 24;
}else{
*B = 100;
}
return true;
} /* my pointer knowledge is rusty though, so grain of salt */Also, how do you then access nested objects, like data['key'][0]['attr'] in Python?
Then you write one short operator (and I agree that some static languages make this more cumbersome than it should be) to say so, and either handle the case where it isn't, or explicitly declare yourself partial and not handling it.
> Also, how do you then access nested objects, like data['key'][0]['attr'] in Python?
With lenses, something like:
data ^? (key "key") >>> (nth 0) >>> (key "attr")
If you do several unsafe operations in a row then this is cumbersome by design - you want to be clear which parts of your program are safe and which are unsafe, so that readers can understand and know where to review. But a good language should let you compose together several unsafe operations in a lightweight way and then execute them as a single unsafe operation, for cases like this where you want to work in the unsafe part of the language for a bit.But my main point is that HideousKojima's "statically-typed" solution would result in a runtime type error if it was given unexpected input, just like a dynamically typed solution.
It’s what you’re used to. I personally find Python horrible to read because I used to a whole different class of programming languages. But I’m sure some of my code might be hard to read by others who aren’t used to that particular programming language too.
> Unit tests do much of what typing checks anyway ... and here's the thing ... you NEED unit tests no matter what.
Some, not all. Strictly typed languages are handy when it comes to refactoring and unit tests can sometimes fail there if the design is being changed enough that the unit tests need rewriting too.
> No typing system can tell you that you wrote > when you should have written <.
Not technically true. Some languages with a richer set of types and operator overloading could have code written to detect that sort of thing. But I do get your point that unit tests are import too.
I’ve been programming for > 30 years and in dozens of different languages. In that time I’ve felt strictly typed languages make larger and more mature code based slightly easier to maintain. While loosely typed languages are easier for smaller and/or younger code based. But largely it boils more down to personal preference than anything.
I will caveat that by saying the fact that Python supports type annotations should be telling that even dynamic languages benefit from a stricter approach to typing.
People who just learn programming probably think the same about whatever language they are learning.
> No typing system can tell you that you wrote > when you should have written <.
The more the compiler can figure out for you, the quicker problems can be identified and fixed. I stopped using python altogether, because it was just infuriating to have the tiniest mistakes blowing up in spectacular and inscrutable ways. Mixing up values of complex types often does not fail at the actual site of the error, but much, much later. Sometimes literally later in time, as in hours, days, or months until you get an obscure "FooType does not Bar" error, and how the thing in question ever became a FooType is inscrutable at that point. If the result even is a runtime error at all! (Bonus points if your production database is now full of junk as well.[1])
The unit test did not catch it because it did not test the offending composition of classes and functions. Meanwhile, a compiler would have caught it immediately: "The thing you're doing here leads to your data structures being nested wrong."
When I started using async/await in python, at first it was just over, since in plain python that introduces another layer of typing without any assistance whatsoever. Then I discovered mypy which actually lets me do some amount of static typing in python, and it was very enjoyable and now python is back on the table for smaller projects.
There is a reason Haskell has the reputation of "if it compiles, it works". There is a reason why system programmers that work on critical systems are jealously eyeing Rust if their shop still does C.
By the way, dependent type systems absolutely can tell you if you wrote > instead of <. But since that usually comes at the expense of not being Turing complete anymore, it's more used for very critical systems, or for theorem provers.
[1] Yes sqlite, I'm looking at you. The decision to make database column dynamically typed, and hence have for example an INTEGER column silently accept data that is very much not an INTEGER at all, caused me some grief on a widely deployed system once.
Python is not easy to grok at all, when you consider you have to grok implementations to understand what they are supposed to do, and require extensive runtime debugging to figure out if it is behaving as expected before you can even write unit tests.
Compare to decent statically typed languages, which have quicker write/debug cycles since checking type definitions is faster than checking code behavior, and the structural unit testing is covered automatically by the compiler.
It's like getting more than 50% of your programs' test coverage, for free!
It's not, though, it just gives you that illusion.
The code might be easier to read but it's harder to understand and to modify safely because of the absence of type annotations.
You can also make a book easier to read by ripping out all its pages.
If you eliminate the content you need to read to understand something, what have you actually made easier?
> No typing system can tell you that you wrote > when you should have written <.
There are many that can; e.g. via SMT-decidable refinement types, or even full undecidable dependent types coupled with automated solvers and manual proofs.
I don't think HideousKojima ever called it a "statically-typed solution". Their point was that statically-typed languages still let you write unchecked code when you want to - and yes, of course such unchecked code can fail at runtime - but give you the option of having checking in the cases where you want it.
Yes, just like you have to evolve your specification/documentation. Similarly, in the exploratory phase you'll stick to very 'rough' typing and next to no proofs and as the program gets clearer and solidifies, you can continuously refine your types (with the amount of refinement depending on how much certainty one wants). A mature program with a rigid spec is going to be harder to change, but that's just how it is anyway.
>Moreover proofs are non-compositional: you make a tiny change somewhere and you might have to change all proofs.
People on HN keep repeating this, but it's trivial because it's actually just a statement about the nature of properties. Proven invariants provide an interface to be used by the rest of the program. If the changes you make don't break your invariants but only the proofs, then you just have local adjustments to make. If your change cascades through the entire program, then it's because all the invariants get invalidated and your 'tiny' change actually changes a good deal about the semantics (and thus the spec) of the program.
The exact same thing applies to unit tests, but you don't see people constantly bemoan the non-compositional nature of unit tests even though a 'tiny' change could break all your unit tests.
>After all, your specification is just as likely to be buggy as your implementation code.
Not only are specifications declarative, they're generally magnitudes smaller. If you're as confident in your spec as in your code, something is very, very wrong with your spec. Well, or your code is your spec, but in that case you get what you pay for.
That is correct, and also one of the core reasons why in the vast majority of cases either no specification/documentation exists, or will only cover a small case of the actual specification. For example I would bet money that not a single function in the C, C++, Java and Python standard libraries is fully specified, in the sense of nailing down the program up to observational equivalence. (Aside: I can still count the programmers who would be able to sketch the observational equivalence using in e.g. C++.)
> If your change cascades through the entire program, then it's because all the invariants get invalidated and your 'tiny' change actually changes a good deal about the semantics (and thus the spec) of the program.
This is not borne out in practise.
A lot of code refactoring I've done was trivial (e.g. changing the order or arguments), but ripples through the program and proof structure. HoTT was invented in order to automate some of those trivialities. Java exception specifications are an example: you call a different library function somewhere and need to change all exceptions specs up to the main function, rippling through millions of LoCs. That's why exception specs were abandoned. Another example are termination proofs (note that a full specification must involve termination proofs, which is why the existing expressive type theories don't give you unrestricted recursion, and also the reason why program logics are typically only for partial correctness). In my experience, termination bugs are rare, and it would be insanely counterproductive if I had to refactor all proofs globally just because I've made a slight change to some printf somewhere in a large code base.
> unit tests, but you don't see people constantly bemoan
The reason is that no programming language forces you to do unit tests. In contrast, expressive type-theories constantly force you to prove a lot of trivialities.
> declarative, they're generally magnitudes smaller.
I don't know what you mean by declarative (other than: leaving out some detail). But they cannot be smaller in general: if every program P had a full specification S that was shorter (here full specification means specifying P up to chosen notion of observational equivalence) then you've an impossibly strong compressor which you can use to prove that every string can be compressed even more. Contradiction.
What you see in practise is that you only specify some properties of the program you are working on. For example sorting routines in C, C++, Java etc. I have never seen a specification that says what happens when the sorting predicate is nicely behaved (e.g. returns a < b on first call but a > b on second). It's fine to omit details, but that limits the correctness you get from your spec (and also limits program extraction). Moreover, if you only work with a partial specification, you can ask the question: what level of partiality in my specification gives me the best software engineering results. My personal and anecdotal experience (which includes dependently typed languages) has consistently been that the full automation given by let-polymorphism is the sweet spot for non-trivial programs (lets say > 10k LoC).
I feel that is much too pessimistic.
>will only cover a small case of the actual specification.
If the same applies to proofs: so be it. Don't let perfect be the enemy of good!
>For example I would bet money that not a single function in the C, C++, Java and Python standard libraries is fully specified, in the sense of nailing down the program up to observational equivalence.
I'd imagine so as well, but I think that's more indicative of how even a (superficially) simple language like C is not all that amenable to specification.
> A lot of code refactoring I've done was trivial (e.g. changing the order or arguments), but ripples through the program and proof structure.
This is not my experience. If you use sufficient proof automation, something like this should have next to no impact on proof structure. Univalence is useful, but a lot of refactoring is not just switching to isomorphic representations, so I'm convinced that larger scale proof automation is way more essential than HoTT.
> Java exception specifications
I'm not convinced that this is fundamental to specifying exceptions rather than just Java having particularly poor ergonomics for it. I've never met a person that actually liked implicit exceptions and if you ask people who dislike exceptions, that's often one key part of it.
> In contrast, expressive type-theories constantly force you to prove a lot of trivialities.
For all large dependently typed languages that actually bill themselves as programming languages (Agda, Idris, Lean) there is some sort of 'unsafe' feature that allows you to turn the termination checker off - at your own peril of course. But you only pay for what you use, if you don't need the additional correctness, you don't need to put in any more work, just like with unit tests.
(There are also ways to safely defer termination proofs, but to be fair the experience isn't the best currently.)
>I don't know what you mean by declarative (other than: leaving out some detail).
Specs tell you the what but not (necessarily) the how. Take the spec for all sorting algorithms (giving observational equivalence):
1. The output list is a permutation of the input
2. The output is sorted in regards to some predicate.
That's a couple lines at most (or a one-liner if you don't count the building blocks like the definition of a permutation), which is a good amount shorter and easier to verify than e.g. a full heapsort implementation.
>But they cannot be smaller in general: if every program P had a full specification S that was shorter [...] then you've an impossibly strong compressor
The compression is stripping away the 'how', that's why you can't 'write a spec for a spec' and compress further.
>What you see in practise is that you only specify some properties of the program you are working on.
Sure, writing a fully specified program of non-trivial size is currently really only possible if you're willing to ~waste~ invest a decade or more.
>Moreover, if you only work with a partial specification, you can ask the question: what level of partiality in my specification gives me the best software engineering results.
Why would you assume that there is a single level of partiality that gives the best results? I agree that HM style types are a great fit for 'general' programming because it has such low impedance, however I also believe that most programs have some areas where they would benefit from more specification. (I think people have a clear desire for this and that it's at least partially responsible for some of the crazy type hackery as seen in Haskell or Scala, which could often be greatly simplified with a richer type system.)
Having a richer type system doesn't mean that you always have to fully use it. It's perfectly possible to just use a HM style/System F fragment. Using dependent types just for refinement is already one of the dominant styles for verification. If dependent types ever become mainstream in any way, I imagine it will also be largely in that style.
As you say, there is no free lunch. Not having a useful type system introduces its own complexity. It depends on what abstractions you find most useful.
The present limitations of dependently typed languages will not be limitations tomorrow. Evolution in the field of proof engineering is providing new frameworks for making proofs more compositional and being able to extract programs from the proofs. It's not amazing and super useful today but it's a lot better than it was even five years ago and I suspect will continue to improve.
> After all, your specification is just as likely to be buggy as your implementation code
If you can't think of the right theorems or specifications I doubt you will write a correct program.
One is a lot easier to reason about than the other.
> Even type-checking can easily become undecidable when the ambient typing system is too expressive.
I'm not sure I follow. I understand how type inference can become undecidable but how does a sound type theory end up this way? The judgement and inference rules for CoC are rather small [0].
> There is no free lunch.
I don't disagree. I still enjoy programming in C. And I might even choose it for a project. And if it so happened that I had certain requirements like the system had to be real-time and could not deadlock then... I might be making a trade off to write my proofs and specifications in another language than my implementation.
We're not at a place yet where we can extract a full program from a specification and not in a place where we can write dependently-typed programs with deterministic run times either.
I would like to have my cake and eat it too but that's where we are.
I agree, I am not promoting dynamically typed languages. My intuition about this is more that there is a sweet-spot between automation and expressivity that gives you the best software engineering experience in most practical programming tasks. Milner's let-polymorphism is closer to that sweet-spot than full-on dependent types a la Calculus-of-Constructions
> extract programs from the proofs.
In practise you don't have specifications in > 95% of programming tasks. What, for example, is the full specification of a climate simulation, or of TikTok? One could argue that the shipped product is the (first and only) specification. Every program I've ever been part of constructing started from an informal, intuitive, vague idea what the software should do. This includes safety-critical software. To quote from a famous paper [1]: "We have observed that the errors we find are divided roughly evenly between errors in the test data generators, errors in the specification, and errors in the program."
> If you can't think of the right theorems or specifications I doubt you will write a correct program.
I strongly disagree. I have written many programs that are, as far as I can see, correct, but I am not sure I fully know why. Here is an example: the Euclidean algorithm for computing the GCD. Why does it terminate? I worked out my informal understand why at some point, but that was a lot later than my implementation.
More importantly: in practise, you do not have a specification to start off with. The specification emerges during programming!
> how does a sound type theory end up this
I'm not sure what you are referring to. Type inference and type checking can be undecidable. For example System F, a la Curry.
> extract a full program from a specification
Let me quip: Proof are programs! In other words: if you want to be able to extract a full program from proofs, the specification / proof must already contain all the information the eventual program should have. So any bug that you can make in programming, you also must be able to make in proving. Different syntax and abstraction levels clearly corresponds to to different programmer error probabilities. But fundamentally you cannot get away from the fact that specifications can and often do contain bugs.
[1] K. Claessen, J. Hughes, QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs.
If you need to know exactly which implementation is being used in a particular context then maybe you shouldn't be using an interface, but should be using the concrete type?
Look again at the original comment by dandotway. It's not a question of "what type should I be using here", nor is it "how should I use this interface", but "what do I need to do to understand this code?". Even if an abstract interface is used correctly and is the right thing to do, you still need to understand the code before you can (debug|rewrite|extend|whatever) it.
And it's a pipe dream to say you can look just at the interface. Something blows up, is it the interface used wrong? Maybe, maybe not? Is it the interface implemented wrong? Maybe, first you need to know what implementation you're looking at. Subtle interactions between abstract and concrete send you spelunking through the layers when you're debugging or trying to extend the interface to account for something the original author didn't anticipate, and often the devil (in the details) comes from concrete implementations.
When conventions are followed they do exactly that, actually. And unless the code is a complete trainwreck, it's pretty easy to tell what the return type is, even without explicit annotation in the docstring.
What all comes down to is: if you really have people on your project writing code like the foo/bar/baz example way up above - then you have problems way bigger than static type checks can possibly help you with.
Historically, most cases have been either compile time (static) or run time (dynamic) type checking. And left between one or the other, and experiences like the above, people make their binary choice.
More and more in my Python code, I do some type annotations I can. My feeling is that the annotation coverage ROI is non linear. I get a lot of mileage out of the types I can add easily. When it gets complicated (interesting unions or deep containment) it gets harder and harder. Enough so that sometimes it influences my design decisions just to make the typing job easier.
I’m left to wonder how this scales as the code base/team size scales. If the pain of 0 types is 100 in a large project, and 100% types cuts it to 10, what happens if we all do the easier 80% annotations. Is the pain 80% less too? Or is my personal experience mirrored and it actually is quite a bit better than just 80% reduction?
But it is not that black and white, is it? Python is actually somewhat static in that it checks (some) types during runtime. Other dynamically typed languages live completely by the "when it quacks like a duck" playbook.
On the other hand, Haskell is completely statically typed. Still you can write many programs without annotating any types at all, as the compiler is pretty good at inferring types from their context.
This is a very good thing. You definitely want the type system (and tests!) to guide your system design.
However, without a type checker, choosing your names wisely will get you a long way.
I think anyone advocating type systems should spend a year working in a dynamic language, to get out of their echo chamber and form a more objective opinion.
I really don't see the advantages of dynamic languages. Supposedly you develop faster, but in my experience that's absolutely not true.
This seems to me to be extremely uncharitable point of view.
But let's roll with it.
I advocate type systems.
I've also worked in several non-trivial projects in lua. Several non-trivial projects in python. Several trivial projects in common lisp. Several trivial projects in erlang.
Additional I've worked in a non trivial project in ruby for several months. And one non trivial project in node for a year. Both of these in a professional 8 hours a day capacity.
I still advocate type systems. More so after working in dynamic languages.
I remember sitting with them at OOPSLA and discussing just this at length. Java was in full momentum ascendancy. Eclipse was the new will-solve-everyones-problems IDE. There was a LOT of interest in doing for typed Java in Eclipse what John and Don had both done in Smalltalk during their advanced degrees. They too thought typing would make it easier.
And what they shared with me is that it actually made it harder. Type systems are complicated to model. It gets combinatorial complex to model transformations across all the edge cases.
It may be argued that this was/is more a statement about the Java type system in particular, which is not generally extolled anyway.
But the basic take away was that refactoring becomes easier and easier as the language syntax and model gets simpler. A more complicated language like Ruby, without types, was also harder.
Or put another way… refactoring engines work with AST models. The more complicated your AST, the harder refactoring gets. I think type systems generally add, rather than remove, complexity from language models.
But perhaps there is a way to make it so. Just sharing some thoughts of yore from some guys that spent a lot of time studying refactoring.
I learnt Clojure, to the point that it's now my favourite language and the one I use for many things. Nevertheless, I still think that typed languages have two advantages that make them a more practical option for most situations:
- It's easier to evolve your code. This includes refactor but also normal program evolution through aggregation.
- It allows to express abstractions in pure code, rather than in documents or comments.
The first advantage requires little explanation. The second I think is more difficult to appreciate without enough experience with typed systems and an interface oriented programming style.
How these two things are important for a given person is difficult to say. I have come to the conclusion that it relates a lot to how your brain works. Unfortunately, mine can only track 3 or 4 things at the same time, so I need to abstract away anything else than the little piece of code I'm working on and get the compiler to worry about how things glue together.
I need the comfort of getting the compiler to help me change my hats and be able not to think about anything else that the concrete piece of code I'm working on and the interfaces of the other parts it uses. When I don't have this possibility, I miss it even more than the comfort of programming using inmutable data structures that Clojure spoilt me with. I think need to seriously try Scala 3 at some point, since it seems to combine inmutable data structures by default with an advance type system (although the lenses libraries I've seen look like an abomination in comparison with Clojure's update-in and company, not to mention the absolute elegance and triviality of the implementation of the latter: https://github.com/clojure/clojure/blob/clojure-1.10.1/src/c...).
So I would second your recommendation and encourage people trapped in dynamic language echo chambers to try for a year something like Typescript or Kotlin to appreciate other ways of doing things in languages with practical type systems. Perhaps some of them will discover that it suits them better or help them better understand why they prefer the dynamic style.
The opposite is much less likely to be true; at best they may have done a project or two in Java or started migrating to TypeScript.
(Please also keep in mind: The average ML user probably reads, or intentionally avoids, HN. The average JS user doesn’t know it exists.)
It's fine if you want to insulate yourself. But I don't see that you're making much of a point here.
Look, I just don't buy the suggestion that static typing magically solves a huge set of problems (or that it does so without imposing negative tradeoffs of its own -- the very topic of the original article). Or that dynamic languages are plainly crippled, and that one has to be a kind of a simpleton not to see this obvious fact.
For anything even remotely production-y I'll always prefer explicitly parsing JSON into a known structure, but there's a lot of value in in being able to do some exploratory scripting without those constraints.
const Json = union(enum) {
null,
number: f64,
bool: bool,
string: []const u8,
array: []const Json,
object: HashMap([]const u8, Json),
};
Usually it's a tagged union of the base JSON types which can easily be consumed by most statically typed languages or a variant of it.EDIT: added "tagged"
If you need to statically check the construction of values in Haskell, there are things like refinement types[0].
Except that in the first example from the first link you sent me, there is no static guarantee that the inputs to the constructor are valid, thus the error branch (it would be unnecessary if static guarantees could be made regarding the use of the constructor). And that was my point, that you still end up with dynamic checks on the values which is where pre/post conditions step in to cover what static typing cannot (or, again, cannot easily in mainstream languages, which would not be Haskell).
There are also recursive types that help you model JSON, but knowing that it's an arbitrary deep nested map/list of maps/lists and number and bool and string mixed like a Bloody Mary cocktail doesn't really help :)
With NestJS it's very easy to add decorators/annotations to fields of a class, and the framework handles validation (throws HTTP 422 with nice descriptions of what failed) and then in your controller you can again work in a type safe environment.
https://www.typescriptlang.org/docs/handbook/release-notes/t...
https://www.typescriptlang.org/docs/handbook/2/narrowing.htm...
Protobuf is moving us that way with microservices too. Since they're a strongly typed message format, it's harder to make mistakes in the interface between two services.
I also like that languages can have complete local static analysis. Sure, the business requirements might be large and spread across many areas, but I will break them down into smaller chunks and encode invariants into the type system so that if the small chunk compiles, I am confident it does exactly what I expect, and I don't need to remember exactly where it fits in the larger picture
There's certainly a subclass of programmers who believe this, yes.
It's not that simple. You also have to specify the effect set of the algorithm, meaning, assuming we do in place sort: every memory cell other that the input array are unchanged after termination. (If you return a fresh array, you have to specify something related). You also have to specify what happens for general sorting predicates, for example if the sorting predicate prints out the element it compares then this reveals (much of) the algorithm.
> The compression is stripping away the 'how'
The sorting example shows that this largely doesn't work in practise. In my experience for non-trivial programs you tend to have to carry around invariants whose complexity matches the size of the program you are verifying.
> I'm convinced that larger scale proof automation is way more essential than HoTT.
I strongly agree, but currently this proof automation is largely a hope, rather than a reality.
> crazy type hackery as seen in Haskell or Scala
Haskell and Scala have (local) type-inference. That makes those complex types (somewhat) digestible.
> dependent types just for refinement
If / when this really works well, so that you don't pay a price when you are not using complex types, then this would be very nice. I don't think PL technology is there yet. (I'm currently using a refinement typing approach in production code.)
I implicitly assumed that state would be encapsulated, since after all dependent types are fundamentally incompatible with observable state, but you're right that this is part of the spec. However I have severe doubts about the usability of any specification language that takes more than a one-liner to specify this.
>You also have to specify what happens for general sorting predicates
If you want to admit more general sorting predicates, sure, but normally you'd just allow pure functions.
>The sorting example shows that this largely doesn't work in practise. In my experience for non-trivial programs you tend to have to carry around invariants whose complexity matches the size of the program you are verifying.
If you include lemmas derived from the spec then I'd agree, but then the spec you have to check would still be a lot smaller. If not, then the only specs where I've experienced anything close to this are ones that are just the operational semantics of existing programs i.e. cases where you want to analyze a specific program. Otherwise I'm frankly uncertain as to what even the point of giving a spec would be.
>Haskell and Scala have (local) type-inference. That makes those complex types (somewhat) digestible.
Might make it more tractable to use, but I find my issue is that it's less direct and often obfuscates the meaning.
You can extrude hidden state in ML-like languages, which gives rise to all manner of subtle behaviour in combination with higher-order functions. As Pitts, Odersky, Stark and others have shown in the early 1990s, even just the ability to generate fresh names (in ML-like languages, this corresponds to the type Ref(Unit)), in conjunction with higher-order functions gives you essentially all the specification and reasoning problems of state.
> normally you'd just allow pure functions.
This is way too restrictive for non-trivial programs. This restriction work for sorting. And in reality even there you'd run into problem, for example if you take into account that a predicate might throw an exception (which, in practise you cannot avoid, think overflow or taking the head of an empty list).
> the only specs where I've experienced anything close to ...
I think that's because you have not attempted to prove substantial theorems about substantial code. The SeL4 verification is interesting in this context: it's specification (a purely functional specification of the OS behaviour) had, IIRC about 1/3 of the size of the OS's C code. Which is not much of a compression.
That's why I liked JavaScript -> TypeScript. I can bang out a prototype of some small piece in JavaScript and then add the types once I'm sure it worked. I save tons of time getting to MVP / proof of concept first and then filling in the constraints later.
And some of it is just bad language syntax or culture making it obtuse, not the staticness of the language. It also has an increased learning curve.
But also because we got derailed from my initial point and context.
> Pre/post conditions are complementary to a type system.
Do you disagree or agree with this statement? Because you never addressed it either.
> They can ensure logical properties that may not be encodable in your underlying type system
Note the "may", because that's important. I didn't say that there were no languages in which my example could be encoded in the type system. And maybe it wasn't the best example, but the point itself was that there is no type system (that I'm aware of, not even Idris as far as I know) which can prove in its static type system every piece of logic about a program. This means that some properties of the system will end up being checked (if you bother to) at runtime and not at compile time. That's where pre/post conditions are useful, they contain information (and in a more deliberate form in cases like the Clojure example) about the properties of the system that are hard or impossible to encode directly in the type system.
Different type systems (both static and dynamic) let you express more or less with them, which reduces the need/desire to have checks like these in your program. But I seriously doubt that any mainstream language will ever totally remove their utility, as complements to the rest of the type system.
Many static type systems can prove anything that can be proven. Notionally one might write a program that relies on something unproven like the Collatz conjecture, though I'm not sure that would happen in practice (e.g. it's easy to write a program that relies on the Collantz conjecture for numbers below 2^64, but that's quite provable). Whether it's actually worth writing out the proof is another question though.
> This means that some properties of the system will end up being checked (if you bother to) at runtime and not at compile time. That's where pre/post conditions are useful, they contain information (and in a more deliberate form in cases like the Clojure example) about the properties of the system that are hard or impossible to encode directly in the type system.
This is true but makes surprisingly little difference, because you still want to keep track of which values have or haven't had that runtime check applied. So you almost always want your postconditions expressed as types (even if they're just "marker" types that indicate that a given runtime check was passed). Put another way, any metadata you would want to be able to track about a function's argument values or return value, you almost always want to be able to carry that metadata around with that value before it's passed in or after it's returned - but at that point that metadata is a type, and it's easiest to represent it as one.
> > Pre/post conditions are complementary to a type system.
> Do you disagree or agree with this statement? Because you never addressed it either.
I agree with it. I write web applications in Haskell for a living, and the very nature of web applications is that most values coming into the system are not known until runtime. It is not a reasonable design goal to want every possible value in the system to be verified at compile time. However, it is valuable to be able to statically verify the relationships between functions and values as those values — once they have been parsed at runtime into a more principled type — move through the system.
There are type hints that you can give the compiler, but I believe that's only useful for generating faster assembly. Also you can get compile time warnings from macros, but that's much closer in nature to getting a parse error (something pretty much every language does that I'm aware of static or dynamically typed).
I wouldn't be surprised to learn that there exists a common lisp typing extension that someone made with macros (after all racket has something like that iirc), however if it exists I didn't use it regardless.
Well, maybe you don't feel it slows you down, but it is manual work you must do to get a reliable product only because of dynamic typing. Not only that, but you have to then refer to these docs to check you're not creating a type calamity at some nebulous point down the run time road. Static languages just won't let you mess that up, and often have utilities to generate this documentation for you at no effort.
> I just don't buy the suggestion that static typing magically solves a huge set of problems
Static typing really does "magically" solve a whole class of problems without any negative tradeoffs, assuming the language has decent type inference.
Not all problems, but a specific class of them that you should do extra work to guard against in dynamic languages. Whether that is extra documentation that has to be reliably updated and checked, or run time code to check the types are what you expect at the point of use.
Take for example JavaScript, where typing is not only dynamic, but weak. Numbers in particular can be quite spicy when mixed with strings as I'm sure you know. Strong, static typing forces you to be explicit in these cases and so removes this problem entirely.
By the way, no one's saying anyone is a simpleton. The reality is our field is wide and varied, and different experiences are valid.
Dynamic languages can do some things that static languages can't. For example, you can return completely different types from different execution paths in the same function.
This has been something that has confused me when reading Python, but it does make it easier for stuff like tree parsing. In a static language you need to specify some variant mechanism that knows all data possibilities ahead of time to allow this. From my perspective the dynamic typing trade off isn't worth these bits of 'free' run time flexibility, but YMMV! It really depends what arena you're working in and what you're doing.
What I was referring to as "not slowing one's work down very frequently" was the corner case situation that someone brought up 4 comments above yours, which (in their view) renders the general type checking capabilities of Python >= 3.5 moot. I don't buy that logic, but that was their argument, not yours.
But to shift gears: if there are languages besides JS that you feel get their type system "just right", I'd be curious as to what they are, for the benefit of that future moment when I have the luxury of time to think about these things more.
Put back into context, your reply makes sense as these popular libraries are pretty battle tested. Having said that, it is a valid point that type hints being voluntary means they can only be relied upon with discipled developers and for code you control. Of course, the same point could be made for any code you can't control, especially if the library is written in a weakly typed language like C (or JS).
> I just don't see its absence as the crippling dealbreaker
My genuine question would be: what does dynamic typing offer over static typing? Verbosity would be my expectation, but that only really seems to apply without type inference. The other advantage often mentioned is that it's faster to iterate. Both of these don't seem particularly compelling (or even true) to me, but I'm probably biased as I've spent all of my career working with static typing, aside from a few projects with Python and JS.
> if there are languages besides JS that you feel get their type system "just right", I'd be curious as to what they are
This is use case dependent, of course. Personally I get on well with Nim's (https://nim-lang.org/) type system: https://nim-lang.org/docs/manual.html#types. It's certainly not perfect, but it lets me write code that evokes a similar 'pseudocode' feel as Python and gets out of my way, whilst being compile time bound and very strict (the C-like run time performance doesn't hurt, too). It can be written much as you'd write type hinted Python, but it's strictness is sensible.
For example, you can write `var a = 1.5; a += 1` because `1` can be implicitly converted to a float here, but `var a = 1; a += 1.5` won't compile because int and float aren't directly compatible - you'd need to type cast with something like `a += int(1.5)`, which makes it obvious something weird is happening.
Similarly `let a = 1; let b: uint = a` will not compile because `int` and `uint` aren't compatible (you'd need to use `uint(a)`). You can however write `let b: uint = 1` as the type can be implicitly converted. You can see/play with this online here: https://play.nim-lang.org/#ix=3MRD
This kind of strict typing can save a lot of head scratching issues if you're doing low level work, but it also just validates what you're doing is sensible without the cognitive overhead or syntactic noise that comes from something like Rust (Nim uses implicit lifetimes for performance and threading, rather than as a constraint).
Compared to Python, Nim won't let you silently overwrite things by redefining them, and raises a compile time error if two functions with the same name ambiguously use the same types. However, it has function overloading based on types, which helps in writing statically checked APIs that are type driven rather than name driven.
One of my favourite features is distinct types, which allow you to model different things that are all the same underlying type:
type
DataId = distinct int
KG = distinct int
Data = object
age: Natural # Natural is a positive only integer.
weight: KG
var data: seq[Data]
proc newData: DataId =
data.setLen data.len + 1
DataId(data.high) # Return the new index as our distinct type.
proc update(id: DataId, age: Natural, weight: KG) =
data[id.int] = Data(age: age, weight: weight)
let id = newData()
id.update(50, 50.KG) # Works.
50.update(50, 50.KG) # Type mismatch got int but expected DataId.
id.update(50, 50) # Type mismatch got int but expected KG.
id += 1 # Type mismatch += isn't defined for DataId.
As you can imagine, this can save a lot of easy to make accidents from happening but also enriches simple integers to serve other purposes. In the case of modelling currencies (e.g., https://nim-lang.org/docs/manual.html#types-distinct-type) it can prevent costly mistakes, but you can `distinct` any type. Beyond that there's structural generics, typeclasses, metaprogramming, and all that good stuff. All this to say, personally I value strict static typing, but don't like boilerplate. IMHO, typing should give you more modelling options whilst checking your work for you, without getting in your way. // compiles and runs, but does bad things
function foo(x, y) {
someDangerousEffect();
return x + y;
}
-- does not compile; huge sets of problems magically solved
foo :: Int -> Int -> Int
foo x y = someDangerousEffect >> pure $ x + yAnd you're neglecting the part of my statement you conveniently truncated.
Our experiences have been wildly different :)
I laughed hard at that suggestion.
Can you maybe just show how to type a Python function such that it does the absurdly simple thing of taking a numpy array of integers?
def fun(nump_array_of_ints: ???): ...
Just to show everyone just how "useful" type hints in Python _actually_ are. import numpy as np
import numpy.typing as npt
x = np.array([1, 2, 3])
def foo(x: npt.NDArray[np.int_]) -> bool:
return True let b: uint = a
that you're really just saying let b: uint = uint(a)
And BTW don't you get tired of typing (and reading) `uint` twice in the latter setting? That's what I mean about "side effects" after all. let b: uint = uint(a)
# can be written as:
let b = uint(a)
The type is inferred from the right hand side during assignment. The only reason I wrote this let b: uint = a
is because in my example `a` was an `int`, so let b = a
Would infer an `int` type for `b`, which compiles fine, and doesn't show the type mismatch I wanted to present.It "can", but it's a design decision not to by default because mixing `uint` and `int` is usually a bad idea.
This is telling the compiler you want to add an `int` that represents (say) 63 bits of data with a +/- sign bit to a `uint` that doesn't have a sign bit. If `a = -1` then `b = uint(a)` leaves `b == 18446744073709551615`. Is that expected? Is it a bad idea? Yes. So, the explicit casting is "getting in your way" deliberately so you don't make these mistakes. If `a` is a `uint`, it can't be set to `-1`, and adding them is freely allowed.
Incidentally `uint` shouldn't be used for other reasons too, for instance unsigned integers wrap around on overflow, whereas integers raise overflow errors. The freedom of mixing types like this are why languages like C have so many footguns.
In short, explicit is better than implicit when data semantics are different. When the semantics are the same, like with two `int` values, there's no need to do this extra step.
You could create a converter to automatically convert between these types, but you should know what you're doing; the compiler is trying to save you from surprises. For `int`/`float`, there is the lenientops module: https://nim-lang.org/docs/lenientops.html. This has to be deliberately imported so you're making a conscious choice to allow mixing these types.
> don't you get tired of typing (and reading) `uint` twice in the latter setting?
Well, no because I wouldn't be writing this code. This example is purely to show how the typing system lets you write pythonesque code with inferred typing for sensible things, and ensures you're explicit for less sensible things.
For just `int`, there's no need to coerce types:
var
a = 1
b = a + 2
intro = "My name is "
name = "Foo"
greeting = ""
b *= 10
# Error: type mismatch: can't concatenate a string with the `b` int.
# greeting = intro & name & " and I am " & b & " years old"
# The `$` operator converts the `b` int to a string.
greeting = intro & name & " and I am " & $b & " years old"
# If we wanted, we could allow this with a proc:
proc `&`(s: string, b: int): string = s & $b
# Now this works.
greeting = intro & name & " and I am " & b & " years old"
echo greeting # "My name is Foo and I am 30 years old"
# Normally, however, we'd probably be using the built in strformat.
# Incidentally, this is similar to the printf macro mentioned in the article.
import strformat
echo &"My name is {name} and I am {b} years old" let a: int = 1
let b: float = a
Why wouldn't we want our dream language to infer a coercion here?That said, Python's behavior (though correct to spec) is arguably worse:
a: int = 1
b: float = a
print(b, type(b))
>>> 1 <class 'int'>
With no complaints from mypy.However, we don't need to specify types until the point of conversion:
let a = 1
let b = a.float
> Python's behavior (though correct to spec) is arguably worseYeah that is not ideal. Looking at the code it seems logical at first glance to expect that `b` would be a `float`. In this case, the type hints are deceptive. Still, it's not as bad as JavaScript which doesn't even have an integer type! Just in case you haven't seen this classic: https://www.destroyallsoftware.com/talks/wat
Another gotcha I hit in Python is the scoping of for loops, e.g.,https://stackoverflow.com/questions/3611760/scoping-in-pytho...
Python takes a very non-obvious position on this from my perspective.
Ultimately, all these things are about the balance of correctness versus productivity.
I don't want to be writing types everywhere when it's "obvious" to me what's going on, yet I want my idea of obvious confirmed by the language. At the other end of the scale I don't want to have to annotate the lifetime of every bit of memory to formally prove some single use script. The vast majority of the time a GC is fine, but there are times I want to manually manage things without it being a huge burden.
Python makes a few choices that seem to be good for productivity but end up making things more complicated as projects grow. For me, being able to redefine variables in the same scope is an example of ease of use at the cost of clarity. Another is having to be careful of not only what you import, but the order you import, as rather than raise an ambiguity error the language just silently overwrites function definitions.
Having said that, as you mention, good development practices defend against these issues. It's not a bad language. Personally, after many years of experience with Nim I can't really think of any technical reason to use Python when I get the same immediate productivity combined with a static type checking and the same performance as Rust and C++ (also no GIL). Plus the language can output to C, C++, ObjC and JavaScript so not only can I use libraries in those languages directly, and use the same language for frontend and backend, but (excluding JS) I get small, self contained executables that are easily distributable - another unfortunate pain point with Python.
For everything else, I can directly use Python from Nim and visa versa with Nimpy: https://github.com/yglukhov/nimpy. This is particularly useful if you have some slow Python code bottlenecking production, since the similar syntax makes it relatively straightforward to port over and use the resultant compiled executable within the larger Python code base.
Perhaps ironically, as it stands the most compelling reason not use Nim isn't technical: it's that it's not a well known language yet so it can be a hard sell to employers who want a) to hire developers with experience from a large pool, and b) want to know that a language is well supported and tested. Luckily, it's fairly quick to onboard people thanks to the familiar syntax, and the multiple compile targets make it able to utilise the C/C++/Python ecosystems natively. Arguably the smaller community means companies can have more influence and steer language development. Still this is, in my experience, a not insignificant issue, at least for the time being.