Let's make a Teeny Tiny compiler(austinhenley.com) |
If you want to make a single long version of the page that includes all three parts, I'd be happy to arrange a repost. It would be best to email hn@ycombinator.com about it.
For those not aware of the bike shedding metaphor, it's the assertion that when discussing the design of a nuclear power plant, everyone will want to discuss the color of the shed where the workers store their bikes because they understand it. Meanwhile, nobody will want to discuss the nuclear reactor itself, because that's complicated and they don't really understand what's going on with it.
In compilers, parsing is the bike shed, and code emitters are the nuclear reactor. I've read probably hundreds of articles on "compilers" at this point, and they're all actually just about parsers. I can't point to a single one that actually emitted working assembly.
Congrats on at least having an emitter, but I'm still searching for an article that shows how to emit assembly of any kind.
I know what the bike shedding metaphor is, and frankly you're stretching it a bit, because all this person is trying to do is educate, and they're not even saying this is the only way to do it.
It seems odd to pull out the bike shed metaphor for every case where there's an abundance of technical articles on a subject. There are lots of tutorials on for loops in X language. Do you consider that bike shedding?
There is plenty of documentation in the form of tutorials on nearly every aspect of every programming language I can think of. I think it's important to keep in mind that we all learn differently, and sometimes one explanation can make no sense while another makes a lot of sense.
That said, I think I get your frustration in that sometimes the process of _finding_ the explanation that clicks for whatever your question is (how to emit assembler?) can be really painful because all the explanations are shallow.
However, I don't think it's fair to take that frustration out on the writers (I got the impression you were, but maybe you weren't and just felt like venting). I for one encourage engineers to write, if for no other reason than to better cement their own understanding.
I do wonder, though, if maybe there are some improvements we can make to how we filter / search for long-form technical articles outside of Google so that the content is more relevant.
While I agree with your overall point about different learning styles, that's not really the problem I'm describing. Introductory material on emitting assembly, or anything similar, doesn't exist for any learning style as far as I know. The best that I know of are some dead-tree books, and their emitters target dead or obscure architectures. None of the code examples are ones I could run on my machine.
> However, I don't think it's fair to take that frustration out on the writers (I got the impression you were, but maybe you weren't and just felt like venting).
That's a fair criticism: I apologized to the writer in a different post.
Assembly is just another language (or more precisely: family of languages).
Compiling a for loop to a slightly different syntax of for loop is a much different problem from compiling a for loop to conditional jumps.
To address your post: I'm not aware of any widely used compilers that compile from a general-purpose language to C. There are a handful of DSLs that do, and there may be some mainstream general-purpose language compiler I don't know of that does.
Compiling to LLVM or GCC's RTL has a lot more in common with compiling to Assembly than it does with compiling to C.
Compiling to LLVM/RTL/assembly is fundamentally different from compiling to another high-level language. When compiling to C for example, you get to compile your for loops into C for loops--it's a fairly easy one-to-one conversion. Compiling to conditional jumps is a much more complicated endeavor, requiring more architecture.
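To make the difference concrete, here is a minimal sketch of lowering a counted loop into labels and conditional jumps. The class `JumpEmitter`, its methods, and the `do_work` routine are hypothetical names made up for illustration, and the output uses x86-style mnemonics only as generic pseudo-assembly. It ignores register allocation and calling conventions (a real `call` could clobber the loop counter), which is part of the extra architecture being described.

```python
import itertools


class JumpEmitter:
    """Hypothetical emitter that lowers a counted loop to labels and jumps."""

    def __init__(self):
        self.lines = []
        self._ids = itertools.count()

    def new_label(self, hint):
        # Every label gets a unique suffix so nested loops never collide.
        return f"{hint}_{next(self._ids)}"

    def emit(self, line):
        self.lines.append(line)

    def emit_for(self, var, start, limit, body):
        """Lower: for (var = start; var < limit; var++) { body }"""
        top = self.new_label("loop_top")
        end = self.new_label("loop_end")
        self.emit(f"    mov {var}, {start}")   # initialise the counter
        self.emit(f"{top}:")
        self.emit(f"    cmp {var}, {limit}")   # test the loop condition
        self.emit(f"    jge {end}")            # leave once var >= limit
        body(self)                             # caller emits the loop body
        self.emit(f"    inc {var}")            # step the counter
        self.emit(f"    jmp {top}")            # back edge
        self.emit(f"{end}:")


emitter = JumpEmitter()
emitter.emit_for("ecx", 0, 10, lambda e: e.emit("    call do_work"))
print("\n".join(emitter.lines))
```

When targeting C, the same construct would be a single `for (...) { ... }` line; here the compiler has to invent labels, order the test and the back edge itself, and keep label names unique.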
My favorite was always context-sensitive, interprocedural points-to analysis. And dataflow analysis in the presence of higher-order controlflow constructs.
so instead of
self.emitter.emitLine("printf(\"" + self.curToken.text + "\\n\");")
you do something like
self.emitter.emitLine("STRING DB '" + self.curToken.text + "', '$'")
...
self.emitter.emitLine("LEA DX,STRING")
self.emitter.emitLine("MOV AH,09H")
self.emitter.emitLine("INT 21H")
You said in your other post that this can be done with minor modifications, but I can already foresee a few modifications that would need to be made which aren't minor.
And then there's the problem that you may want to target more than one architecture. We can write two completely different code generators, but it would be nice if there were an architecture that could share some of the code.
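One common way to share code between targets is to lower the source into a small intermediate representation first and keep only the last step target-specific. The sketch below is a toy under heavy assumptions: the three-entry `IR`, the `li`/`add` opcode names, the virtual registers `v0`/`v1`, and both backend classes are invented for illustration, and the fixed register maps stand in for a real register allocator.

```python
# Toy IR: (opcode, operands...) over virtual registers. All names are hypothetical.
IR = [
    ("li", "v0", 5),       # load immediate 5 into v0
    ("li", "v1", 7),       # load immediate 7 into v1
    ("add", "v0", "v1"),   # v0 = v0 + v1
]


class X86Backend:
    """x86-style output (Intel syntax, two-operand add)."""
    regs = {"v0": "eax", "v1": "ebx"}

    def li(self, dst, imm):
        return f"mov {self.regs[dst]}, {imm}"

    def add(self, dst, src):
        return f"add {self.regs[dst]}, {self.regs[src]}"


class ArmBackend:
    """32-bit ARM-style output (three-operand add, '#' immediates)."""
    regs = {"v0": "r0", "v1": "r1"}

    def li(self, dst, imm):
        return f"mov {self.regs[dst]}, #{imm}"

    def add(self, dst, src):
        d = self.regs[dst]
        return f"add {d}, {d}, {self.regs[src]}"


def generate(ir, backend):
    # Shared driver: walks the IR once and dispatches each opcode
    # to the backend method of the same name.
    return [getattr(backend, op)(*args) for op, *args in ir]


print("\n".join(generate(IR, X86Backend())))
print("\n".join(generate(IR, ArmBackend())))
```

Everything up to and including `generate` is shared; adding a target means writing one more small backend class rather than a second code generator.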
> You said in your other post that this can be done with minor modifications
And it probably can, depending on the flavor of assembly you want to use. There are dozens (hundreds?) of them, and I'm sure some will allow you to inline the string declaration. The example I gave probably doesn't even work, since I haven't programmed in 8086 in close to 20 years and I don't even remember how to set up data blocks and code blocks in it any more.
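For comparison with the DOS snippet, here is a rough sketch of the same PRINT idea aimed at a modern machine. It assumes x86-64 Linux and NASM syntax, with the output assembled by something like `nasm -f elf64` and linked with `ld`; none of that comes from the article. The function name `emit_print_program` is hypothetical. The string is emitted as a list of byte values so that quotes in the source text cannot break the assembler syntax.

```python
def emit_print_program(text):
    """Return NASM source (x86-64 Linux assumed) that prints text and exits."""
    data = (text + "\n").encode("utf-8")
    byte_list = ", ".join(str(b) for b in data)
    lines = [
        "section .data",                  # data block: the string bytes
        f"msg: db {byte_list}",
        f"msg_len equ {len(data)}",
        "",
        "section .text",                  # code block
        "global _start",
        "_start:",
        "    mov rax, 1          ; sys_write",
        "    mov rdi, 1          ; fd 1 = stdout",
        "    mov rsi, msg        ; buffer address",
        "    mov rdx, msg_len    ; byte count",
        "    syscall",
        "    mov rax, 60         ; sys_exit",
        "    xor rdi, rdi        ; exit status 0",
        "    syscall",
    ]
    return "\n".join(lines) + "\n"


print(emit_print_program("hello, world"))
```

A compiler with several PRINT statements would need one uniquely named data label per string, and the exit sequence would be emitted once at the end of the program rather than after every print.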
> And then there's the problem that you may want to target more than one architecture.
This is a toy compiler written by a professor of computer science meant to teach you the basics of building a compiler (lexing, parsing, emitting). This isn't a tutorial on building the next GCC.
I guess that's why we have things like LLVM, which let you generate an intermediate representation that gets converted to a bunch of different instruction sets.
The real barrier for me at this point is targeting a pragmatic real-world architecture.
That's a fair criticism.
I'm frustrated with the lack of material on emitting assembly, but it wasn't right of me to take that out on the author of this post. I apologized in a different post.
> And it probably can, depending on the flavor of assembly you want to use, there are dozens (hundreds?) of them
How about one I can run on my machine? There are maybe 5 that are useful targets I can think of:
* x86 or ARM (depending on your machine)
* LLVM
* GCC RTL
* WebAssembly
* Parrot? Maybe the JVM has some low-level bytecode?