Pedagogical Downsides of Haskell(ciobaca.substack.com) |
It's sometimes implied that those who trip over Haskell just aren't big-brained enough to understand the important concepts related to it, but I've found those concepts are usually very easy to grasp, provided the explanation doesn't use Haskell examples. If all programming were Haskell, I probably never would have become a programmer in the first place. It would have taken me too long to figure any of it out, and I probably would have concluded I wasn't smart enough to be a programmer.
I do wonder if there are shared experiences or common patterns among those who tend to love Haskell and those who don't. I also feel nigh-dyslexic trying to read math formulas. Human languages and broadly C-family programming languages, on the other hand, seemed easy and natural to me, almost effortless to pick up. I wonder if there's a "mathy"-person versus "languagey"-person divide in finding Haskell legible.
I'm not sure it's the whole thing, but I think I've also figured out that I find algorithm-type reasoning far easier to follow and work with than equations or proofs. Like, the only way I can begin to get traction with an unfamiliar equation is to break down what each term and operation "does" to something "moving through" it—it's tedious as hell. Might be something there.
Just because you can look at something as describing a computation doesn't mean you always should. For example:
data BinaryTree x
  = Leaf x
  | Node (BinaryTree x) (BinaryTree x)
instance Functor BinaryTree where
  fmap f t = t >>= (Leaf . f)
instance Applicative BinaryTree where
  pure = Leaf
  tf <*> tx = tf >>= \f -> fmap f tx
instance Monad BinaryTree where
  -- return defaults to pure, i.e. Leaf
  -- replace a leaf with the result of calling f on its label
  Leaf x >>= f = f x
  -- traverse down the tree, ultimately replacing all the leaf nodes with a new subtree
  Node l r >>= f = Node (l >>= f) (r >>= f)
You can choose to interpret a binary tree as describing nondeterministic computation where you have two choices at every step, but I rarely do. Most of the time trees are just trees.
And what the hell is an interface for sequential computation? I think I understand what these Maybe types are and what they accomplish, but "interface for sequential computation" sounds a lot like those buzzwords people mix together that could mean anything.
I think the issue is that your experience has led you to believe that a monad is an interface for sequential computation. A monad is often used for ordering computations, but Haskell's monads can also be commutative (like the Reader monad), in which case they do not order anything.
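A minimal sketch of that point, using only base: the function monad `((->) r)` behaves like Reader (mtl's `Reader` wraps the same idea in a newtype). Both do-blocks read the same environment, and swapping the two binds does not change the result. `pairAB` and `pairBA` are illustrative names, not from the thread.

```haskell
-- The function monad ((->) r) in base is Reader without the newtype wrapper:
-- every action just reads the same shared environment.
pairAB :: Int -> (Int, Int)
pairAB = do
  a <- (+ 1) -- read the environment and add one
  b <- (* 2) -- read the same environment and double it
  return (a, b)

-- The same two reads, performed in the opposite order.
pairBA :: Int -> (Int, Int)
pairBA = do
  b <- (* 2)
  a <- (+ 1)
  return (a, b)
```

Both `pairAB 10` and `pairBA 10` give `(11,20)`: neither read can affect the other, so there is nothing to order.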
The real issue is that naming a typeclass after a concept from category theory means wildly different things to different developers. For instance, I would expect a type or typeclass named for some categorical construct to behave the way that construct behaves, and that would be the extent of what I use it for. However, another developer may see one particular usage of the same construct and conclude that the construct is intrinsically tied to that algorithmic pattern of usage.
So the problem is one of controlling expectations and maintaining consistency across the dev community. I doubt Haskell will ever move away from the category-theory-inspired libraries and the naming conventions that follow from them; see, for example, the fairly lively development of profunctor-optics-based work. But I can certainly see how it may distract or confuse newcomers.
Writing and reading Haskell feels more like doing proofs than programming.
Even other functional programming languages don't give me that 'in the math class' feeling that Haskell does.
It's a bit fun because it's very short to write and concise, and working only with dicts, tuples, etc. helps a lot.
Not sure if it's faster, though: it always takes a bit longer to write and think about, and I'm not sure it's easier to read and understand.
Sometimes it feels a bit like code golfing, because you can do a lot of things with very few lines.
It's immensely better to remove 99% of side effects, the code is shorter and more compartmentalized, so it's just easier to deal with.
Although I'm doing this alone, and I'm not confident that I could enforce this sort of software design in a team.
Though I'm glad newer languages are starting to adopt more features from the functional territory for the situations where it just makes more sense.
For me it's the opposite. Once I figure out what an expression is, I do not want it to change on the next clock cycle.
You can make Haskell about as human-readable as Ruby if you choose to.
Instead of building on this reality, Haskell asserts that the starting point is not physical reality but a mathematical formalism called "the lambda calculus". The physical machine is regarded with disdain and pity, its limitations to be worked around in order to provide the one true abstraction. This is the original sin of Haskell, because it is an attitude driven not by a need to make things but by aesthetics and a peculiar intellectual dogma about building, and that attitude ultimately becomes a stumbling block.
In my view, you have to respect the machine. Abstractions can be beautiful, but they are ephemeral, changeable, unreal. The danger is that these illusions become a siren song to makers who are always looking for better tools, and to these makers the abstractions become more real than the machine. Haskell's power users famously don't actually make anything with it (modulo pandoc and jekyll), and my guess is that this is because either they find that 90% of the real-world things you want to do are "ugly" from Haskell's point of view, and so leave them as distasteful "exercises for the reader", or they get so distracted by the beauty of their tools that they never finish.
In any event, Haskell is a road less traveled for good reason.
> There is also a school of thought that you should start Haskell by teaching the IO monad first, but I am not convinced: in my experience, if someone gets exposed to IO early on, they will contaminate all their functions with IO. They will essentially end up writing Java in Haskell.
I don't think this is such a bad starting place. Crawling before walking. Purifying an unnecessarily IO-typed function into an ordinary function is a good exercise.
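A small illustration of that purifying exercise, assuming a made-up word-counting task (`countWordsIO`, `countWords`, and `report` are hypothetical names): the logic moves out of IO into an ordinary function, and IO is kept at the edge.

```haskell
-- Before: counting words needlessly lives in IO.
countWordsIO :: String -> IO Int
countWordsIO text = do
  let n = length (words text)
  return n

-- After: the logic is an ordinary, pure function that can be tested directly.
countWords :: String -> Int
countWords = length . words

-- IO is confined to the edge of the program, which calls the pure function.
report :: String -> IO ()
report text = putStrLn ("words: " ++ show (countWords text))
```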
Trying to enforce non-IO from the start would be like enforcing 'no new keyword & factories only' in another language.
For example, to implement a simple "hello world" program in Rust you have to use a macro (println!), so you can't even look for a function signature in the standard library docs to help. So you can either just say "don't worry about this for now, just trust me" or spend a whole chapter diving into macro syntax. The number of concepts you need to implement a basic program is pretty large and you could easily spend a chapter going into any of them.
Personally I'm not a fan of the approach in this post to just "lie" to people but I do find myself showing a non-optimal implementation because that's all the syntax I've introduced up to that point. Then later I show how to do it better. I know some readers just want the final answer up front though.
I also show likely dependencies from the course assessment to the various topics. For instance, there is a strong dependency on the IO monad, but a weaker/optional dependency on (general) monads.
In terms of presentation order, I tend to over-simplify early in the course and circle back and make things more precise later.
(I'm teaching a 2nd year university course on Functional Programming with Haskell for the first time, so I found the OP fascinating. Thanks!)
One of the reasons the standard Prelude includes partial functions and specialized versions of `map` and `filter` is to support the pedagogical use case (as far as I understand the situation). Most production applications use a custom Prelude of some kind, in order to keep programmers away from foot-guns like `head` or to make things more general in the case of `map` and `filter`.
It turns out using linked lists for everything isn't the best idea, but a lot of Haskell applications use them because they're in the Prelude.
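A sketch of the habits such a custom Prelude tends to enforce, written with base only (the function names are hypothetical): handle the empty list explicitly instead of calling the partial `head`, and use the general `fmap` where Prelude's `map` is list-only.

```haskell
import Data.Maybe (listToMaybe)

-- Prelude's head is partial: head [] throws an exception at runtime.
-- listToMaybe (from base's Data.Maybe) forces the caller to handle the empty case.
firstOrDefault :: a -> [a] -> a
firstOrDefault def xs = case listToMaybe xs of
  Nothing -> def
  Just x -> x

-- Prelude's map works only on lists; fmap is the general Functor version.
incrementAll :: Functor f => f Int -> f Int
incrementAll = fmap (+ 1)
```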
Bit of a balancing act supporting both use cases.
But the effort to fix up the fixable issues mentioned in the post is about the same as writing the post was. Getting it distributed to the students may be a bit harder, depending on the local setup.
But it's definitely fixable with Haskell as it is today.
Linked lists are particularly tricky in Haskell, because as a data structure manifested in memory, they really stink. But as a lazy data structure traversed exactly once and thus just serving as a mechanism for providing "the next thunk", they're fine. Haskell and its laziness completely conflates the two of these, so it ends up being easy to think you have one and end up with the other.
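A sketch of the distinction, with hypothetical function names. Treat the memory comments as the usual mental model rather than a guarantee: what GHC actually allocates depends on optimization (with `-O`, `sumTo` may be fused into a loop that never builds the list at all).

```haskell
import Data.List (foldl')

-- Consumed exactly once: each cons cell can be garbage-collected as soon as
-- foldl' moves past it, so the list behaves like a stream of "next thunks".
sumTo :: Int -> Int
sumTo n = foldl' (+) 0 [1 .. n]

-- Consumed twice: xs is shared by sum and length, so the whole list must stay
-- in memory until both passes finish; now it is a real in-memory linked list.
mean :: Int -> Double
mean n = fromIntegral (sum xs) / fromIntegral (length xs)
  where
    xs = [1 .. n]
```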
It compiles quickly, the guidance offered in its error messages is best in class, it's small, and the mental model is consistent.
In fact I think it's far easier to learn Elm (and also perhaps web UI development wouldn't be such a shitshow if programmers earlier in their career used Elm to build their mental model) than it is to learn:
- React
- Redux
- Immutable.js
- Lodash/Ramda
- ES${CURRENT_YEAR}
- Webpack/Parcel/Grunt/Groan/Whatever
- etc…
I've seen so many early programmers go through some React course thinking they've learned FP, and yet struggle to solve basic problems by applying functions to values.
1. Using an online editor with a rich built-in library removes any toolchain problems.
2. A custom standard library simplifies pedagogically unnecessary details like Foldable
3. The custom standard library also avoids currying (f(a, b) for functions rather than f a b)
4. Custom error messages improve the feedback students get from the compiler
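To make the currying point concrete, here is the difference between the two calling styles in standard Haskell. This is not CodeWorld's library; `add`, `addPair`, and `addPair'` are hypothetical names, while `curry` and `uncurry` are the Prelude functions that convert between the styles.

```haskell
-- Curried (standard Haskell): called as add 2 3; partial application like add 2 is allowed.
add :: Int -> Int -> Int
add a b = a + b

-- Uncurried: one tuple argument, so calls look like addPair (2, 3), close to f(a, b).
addPair :: (Int, Int) -> Int
addPair (a, b) = a + b

-- Prelude's uncurry turns the curried version into the tuple-taking one.
addPair' :: (Int, Int) -> Int
addPair' = uncurry add
```

The curried form is what makes `map (add 10) [1, 2]` work without a lambda; the tuple form trades that away for call syntax that looks familiar to beginners.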
I would highly recommend Code World to anybody looking to teach programming with Haskell. If you want to teach Haskell in a way that fits the existing ecosystem, it's also possible to run Code World without the custom standard library[2].
[1]: https://code.world/#
foldr k z = go
  where
    go []     = z
    go (y:ys) = y `k` go ys
or
foldr k z = foldr_k_z
  where
    foldr_k_z []     = z
    foldr_k_z (y:ys) = y `k` foldr_k_z ys
Scheme has a bit of syntactic sugar called "named let" which makes this internal-helper pattern more concise/direct.
But calling auxiliary functions "go" is almost always bad naming.
Calling auxiliary functions "go" is like calling loop variables "i".
Of course, PureScript has its own downsides not apparent in GHC.
I don't think this article is helpful for beginners.
Don't know why you thought it would be an article for beginners, but good on you for linking a resource regardless.
Where can we find 'no abstractions' these days? Even if you write in ASM, there will be tons of abstractions. Instructions will run out of order. Memory is abstracted. Even the ASM you write will be translated to microcode.
The closest you'll ever find to 'the physical reality of a machine' are microcontrollers (and even then, only some of them) and machines from the 80s. I have one sitting right next to me for which I can tell you exactly how many cycles every CPU instruction takes.
Everything else is an abstraction. C abstracts a machine that doesn't exist (it was closer to machines that did exist at the time it was created). Even something as simple as a short-circuit expression in your if statement is an abstraction. Even in C you sometimes have to fight the abstractions when you are trying to, say, use caches effectively.
You can write relatively low-level Haskell a lot more easily than you can write low-level JavaScript. You just don't have to.
There are more facilities in Haskell to deal with this reality than in almost any other language you can think of.
Also, I'm sorry for any discomfort. To use an analogy, if your friend starts dating a girl that you know is bad for him, you can't just tell him that. You'll get punched. Especially early on when he's totally in love. It doesn't matter if you're right or wrong about her, there's no argument that is going to win against love, and to say anything ill of her is only going to cause pain and harm your relationship with your friend. And love is love, whether it's for a person or a software tool.
I'm sorry for the discomfort, but I'm telling the truth as I see it and am not trying to hurt you. But Haskell, I think she's bad for you.
The goal of a programming language is to allow a human to express a sufficiently rigorous solution to a problem. From there, every step along the chain of execution is allowed to make 'unobservable' (for various definitions of the word) changes to execution. Your compiler might unroll your loops, or eliminate some unneeded intermediate variable, or even replace your entire function with a lookup table. Your CPU's microcode might do some weird fuckery with predictive execution. You shouldn't care, as long as the solution is, as far as you can observe, identical to your given one.
Whether functional programming is a better expression of computation than imperative programming is its own problem, but it's both silly and wrong to assert that imperative is better because it matches the behavior of the machine.
To the contrary, the simplicity of Haskell lets you understand a program simply by rewriting expressions according to the rules and definitions you write. You don't have to worry about memory, effects, or so many other things that have nothing to do with the _logic_ of what you are trying to do.
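A tiny sketch of what that rewriting looks like in practice, with hypothetical functions `double` and `quadruple`. Because neither has side effects, every step just replaces a call by its definition; the comment trace describes meaning, not the order in which GHC actually evaluates things.

```haskell
double :: Int -> Int
double x = x + x

quadruple :: Int -> Int
quadruple x = double (double x)

-- Evaluating quadruple 3 by hand, substituting definitions step by step:
--   quadruple 3
-- = double (double 3)    -- definition of quadruple
-- = double 3 + double 3  -- definition of double, with x = double 3
-- = (3 + 3) + (3 + 3)    -- definition of double, twice
-- = 12
```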
Of course, programs in reality often need to be changed to improve performance, but this isn't relevant when teaching.
The machine has no types. The machine has no variables. The machine has no functions, procedures, scoping, or information hiding. The machine has no assembly language. The machine has no machine code. The machine, ignoring the physical reality and focusing on an abstraction which could still potentially be in the realm of software and not physics, has a certain number of bits in flip-flops perturbed by other bits coming in on pins.
> Haskell's power users famously don't actually make anything with it (modulo pandoc and jekyll)
Self-contradiction is self-negation. You've destroyed your own argument, such as it was.
This is a lie that you're perpetuating.
Myself and many of my friends, colleagues, and associates make a living writing Haskell.
Those are too difficult for compiler writers to reason about. While you're mutating the finite set of registers in your high-level C code - just like a real computer does - clang is swapping those out for operations on an infinite number of immutable registers.
That isn't true. There are graph reduction machines whose natural model of computation is lambda calculus and they are generally very efficient compared to sequential processors implementing Turing machines.
That was enough to do most basic tasks and only later was it explained why they can't be mixed directly.
Starting without IO makes sure that you actually start to "get" how the language works, so that once you jump into IO, the weird parts, and how to mix IO in with the logic written elsewhere, make a lot more sense.
Agree! And I would add that you can "purify" a monadic function without having to rewrite it in non-monadic style. You can make it polymorphic over all monads and relegate the "impurity" to monadic functions that you pass as arguments/dependencies. A trivial example:
twice :: IO ()
twice = do
  putStrLn "foo"
  putStrLn "foo"
-- twice is recovered as twice' (putStrLn "foo")
twice' :: Monad m => m () -> m ()
twice' action = do
  action
  action
This is not that different from having a Spring bean that doesn't perform any effect directly (say, a direct invocation of "Instant.now()") but instead receives a "Clock" object through dependency injection.
Haskell lets you express the idea of "program logic that only has effects through its dependencies" by being polymorphic over all monads.
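Continuing the trivial example, here is a self-contained sketch (it restates `twice'` with its `Monad` constraint) that injects a pure stand-in instead of `putStrLn`. It relies on base's `Monad` instance for pairs such as `([String], a)`, which behaves like a minimal Writer; `fakePutStrLn` and `recorded` are hypothetical names playing the role of the test-double `Clock`.

```haskell
-- Program logic that only has effects through its argument.
twice' :: Monad m => m () -> m ()
twice' action = do
  action
  action

-- In production: the injected dependency really prints.
twiceIO :: IO ()
twiceIO = twice' (putStrLn "foo")

-- In a test: a pure stand-in. The pair monad ((,) w) from base appends to w
-- (here a list of "printed" lines) instead of performing IO.
fakePutStrLn :: String -> ([String], ())
fakePutStrLn s = ([s], ())

recorded :: ([String], ())
recorded = twice' (fakePutStrLn "foo")
```

`fst recorded` is `["foo","foo"]`, so the logic can be checked without running any IO.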
And allowing beginners to write actual meaningful programs is a huge pedagogical benefit.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
(Not to mention that Haskell and its base libraries have plenty of abstractions useful also for the programmer to deal with the inherently stateful nature of computers, e.g. the IO type, STM, State (it's in the name!), Channels, etc.)
On top of this machine shape, the OS adds a process abstraction and a way to talk to devices. It is no coincidence that this process shape looks like the machine shape: lines of source correspond to instructions, declared structures correspond to main memory... And from here we programmers pick a coordinate system and begin to build. But whatever coordinates we pick, the space, the degrees of freedom, are always the same: as vast as Turing could fathom. The interesting part of coordinate systems is the kinds of shapes you get for the constraints you picked. Haskell seems to be a coordinate system with some valid constraint ideas (a clear division between purity and side effects, immutability), but with a mistaken sense of its identity, failing to see itself as merely one coordinate system within this larger structure.
But what's the point? Which languages do we reject because they are sufficiently dissimilar to transistors? Should we all start writing code in VHDL?
I'd argue "of course not". You already know that it's the foldr worker because it's a local recursive definition inside foldr. And you already know it closes over k and z because it uses them without defining them locally. Nothing in that name provides additional semantic value.
You could still use it anyway, on the argument that a little redundancy can help aid reading. But the more Haskell code you read and write, the less that redundancy helps you with anything. On the other hand, the proliferation of names that contain almost no semantic content starts to drag on you. And so an idiom was developed for naming recursive workers that do the core job of what the parent's name promises: just name it "go". Nothing to think about. It's reduced down to a level that communicates exactly that it's not clever. It's just doing the thing it has to do. And it's standardized. If you see it, you know exactly what it's doing. There's no need to waste time mapping a new name into your existing set of well-known patterns.
So... As to the original argument's point? I think it probably is awkward for pedagogy. But it's absolutely better for actively using the language.
(In fact, I just checked the IORef documentation and it actually references the x86/64 architecture manual to explain some of the behaviour that can be expected. I would be surprised if any part of the C standard did that.)
[1]: I mean, if we're using an x86 derivative we're still talking about a very fancy PDP-11 emulator.
In practice, though, when I have some piece of memory-mapped hardware attached and I want to talk to it, in C I can say:
*(volatile uint32_t *)0xF00BA4 = 0x0102ABCD;
or whatever I need to flip the bits. C lets me actually control the whole machine. Whereas Haskell... I don't know, but I suspect it lets me actually use the physical machine a lot less.
import Data.Word (Word32)
import Foreign.Ptr (Ptr, nullPtr, plusPtr)
import Foreign.Storable (poke)
do
  let ptr = nullPtr `plusPtr` 0xF00BA4 :: Ptr Word32
  poke ptr 0x0102ABCD
should have you covered.
Really? Can you run micro-ops?
Like, there's literally nothing stopping you. You can use FFI, and you can also write C inline.
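As a minimal sketch of the FFI route, this binds C's `sin` from `math.h` directly. Writing C inline needs a separate package and is not shown; `c_sin` and `sinViaC` are hypothetical names.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Types (CDouble (..))

-- Bind the C library's sin() directly; "unsafe" is fine here because sin
-- neither blocks nor calls back into Haskell.
foreign import ccall unsafe "math.h sin"
  c_sin :: CDouble -> CDouble

-- Convert between Haskell's Double and C's double at the boundary.
sinViaC :: Double -> Double
sinViaC x = realToFrac (c_sin (realToFrac x))
```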
You can have the best of both, if you want.
Like opening the jar before sticking a knife into the peanut butter.
`go` is instead doing something similar to the parent that is easily transformed into the right thing (e.g. accumulated in reverse or something), or something more general that does the right thing when called with specific arguments. Communicating how and why the function does what it does, and how it works in conjunction with the top-level wrapper, actually matters.
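A sketch of the accumulating case the comment alludes to, with a worker whose name says what its more general job is. `myReverse` and `reverseOnto` are hypothetical names; whether this reads better than `go` is exactly the point under debate.

```haskell
-- The worker does a more general job than the wrapper: it reverses its input
-- onto an existing accumulator. The wrapper picks the starting accumulator.
myReverse :: [a] -> [a]
myReverse xs = reverseOnto [] xs
  where
    -- reverseOnto acc ys == reverse ys ++ acc
    reverseOnto acc [] = acc
    reverseOnto acc (y : ys) = reverseOnto (y : acc) ys
```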
But it's no more interesting than a "for" or "while" loop that takes up most of the body of a function in C or Java. People don't demand descriptive names for those, because they realize such a name would contain no useful information. That's equally true in functional programming.
foldr k z = foldr' k z
  where
    foldr' k z []     = z
    foldr' k z (y:ys) = y `k` foldr' k z ys
That's obviously not sufficient, so it must have something to do with the nature of the closure. In this case I presume that it's because the closure captures k and z, although if you have any evidence to the contrary that would be interesting to see.
When this happens, note that it's actually no longer constructing a closure at runtime. It has essentially closed over the values at compile time, using some very trivial transformations. If you use a definition that is too complex for those trivial transformations, you're getting in the way of the compiler doing its job. I always prefer to write my code with sympathy for the compiler. The less magic it needs to do, the better it does its job.
Consider the following:
myBigSubroutine :: Maybe Int -> Maybe Int -> Maybe Bool
myBigSubroutine ma mb = do
  a <- ma
  b <- mb
  return (a > b)
Here we are sequencing the "effect" of optionality. `ma` must be evaluated before `mb`, and if it returns a `Nothing` then we short-circuit and do not evaluate `mb`.
fn my_big_subroutine(ma: Option<isize>, mb: Option<isize>) -> Option<bool> {
    match (ma, mb) {
        (Some(a), Some(b)) => Some(a > b),
        (_, _) => None,
    }
}
It’s clearer to me what the intent is. I’m not sure why the other syntax is so hard for me, but it feels hard to understand for some reason.
fn my_big_subroutine(ma: Option<isize>, mb: Option<isize>) -> Option<bool> {
    match ma {
        Some(a) => match mb {
            Some(b) => Some(a > b),
            None => None,
        },
        None => None,
    }
}
Which is equivalent in this example; however, I'm trying to stress the sequencing. Imagine `mb` had some very expensive computation in it: then it would remain an unevaluated thunk if we short-circuit on `ma`.
> it’s clearer to me what the intent is. I’m not sure why the other syntax is so hard for me but it feels hard to understand for some reason.
We can write `myBigSubroutine` with case matching:
myBigSubroutine ma mb =
  case ma of
    Nothing -> Nothing
    Just a ->
      case mb of
        Nothing -> Nothing
        Just b -> Just (a > b)
In fact, the `do` notation version desugars to something equivalent to this snippet.
The motivation for using the `Monad` instance (and thus `do` notation) is that it allows us to be polymorphic over the effect described by `>>=` (and thus `do`).
This lets us have a customized version of sequencing computations specialized to whatever "effect" we need, not just casing on optional values.
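A sketch of that polymorphism: the same do-block, given a `Monad m` constraint instead of being fixed to `Maybe` (`compareM` is a hypothetical name). The "effect" then depends on which monad the caller picks.

```haskell
-- The do-block from myBigSubroutine, generalized over any Monad m.
compareM :: Monad m => m Int -> m Int -> m Bool
compareM ma mb = do
  a <- ma
  b <- mb
  return (a > b)
```

With `Maybe` it short-circuits on `Nothing`, with `Either String` it short-circuits on the first `Left`, and with lists it tries every combination: `compareM [1, 5] [3]` gives `[False, True]`.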
No - either one can be evaluated first, with the other being short-circuited. If you swap the order of those lines, the function is exactly the same (in terms of inputs and outputs, at least).
ma >>= \a -> mb >>= \b -> return (a > b)
We can then inline the definition of `>>=` and `return` to get:
case ma of
  Nothing -> Nothing
  Just a ->
    case mb of
      Nothing -> Nothing
      Just b -> Just (a > b)
Imagine that `mb` is actually a really expensive computation that we don't want to perform unless `ma` returns a value. Sequencing our case statements in this way allows us to do that. `mb` will remain an unevaluated thunk until `ma` evaluates to a `Just a` value.
It probably confuses people because this is a problem Haskell created for itself.
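A way to observe the short-circuit directly, reusing `myBigSubroutine` from the thread and using `undefined` as a stand-in for an expensive or failing `mb` (`shortCircuits` is a hypothetical name):

```haskell
myBigSubroutine :: Maybe Int -> Maybe Int -> Maybe Bool
myBigSubroutine ma mb = do
  a <- ma
  b <- mb
  return (a > b)

-- `undefined` stands in for an expensive (or failing) computation.
-- Because ma is Nothing, the first bind short-circuits and mb is never forced.
shortCircuits :: Bool
shortCircuits = myBigSubroutine Nothing undefined == Nothing
```

If the two binds were swapped, the same call would force `undefined` and throw. So for fully defined inputs both orders give the same answer, but they differ in what they force.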