can you show me a parser generator that produces this kind of visualization?
Whenever my "compiler" hit a syntax error in the test suite, I could load the part of the source around the error and track down my parser's mistake or omission by running the parser for smaller and smaller parts of the grammar on smaller and smaller parts of the input.
It was 12 years ago.
And yes, it is fun. ;)
However (and this is just me talking), I don't see the point in a javascript-based compiler. Surely any file format/DSL/programming language you write will be parsed server-side?
JavaScript is a full programming language. Why wouldn't it be a fine choice to write a compiler in? People have a funny idea that compilers are more complex software or are somehow something low-level? In reality they're conceptually simple - as long as your language lets you write a function from one array of bytes to another array of bytes, then you can write a compiler in it. And for practicalities beyond that you just need basic records or objects or some other kind of structure, and you can have a pleasant experience writing a compiler.
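To make the "function from one array of bytes to another" point concrete, here is a minimal sketch in Python (chosen only for brevity; the same shape works in JavaScript). The source language (integers joined by `+`) and the `PUSH`/`ADD` target instructions are made up for illustration and don't correspond to any real VM.

```python
def compile_sum(source: bytes) -> bytes:
    """Compile a '+'-separated list of integers into toy stack-machine text."""
    terms = source.decode("ascii").split("+")
    lines = []
    for index, term in enumerate(terms):
        value = int(term.strip())  # raises ValueError on malformed input
        lines.append(f"PUSH {value}")
        if index > 0:
            # Every term after the first is added to the running total.
            lines.append("ADD")
    return "\n".join(lines).encode("ascii")


print(compile_sum(b"1 + 2 + 3").decode("ascii"))
```

Everything else in a real compiler (an AST, symbol tables, error reporting) is structure layered on top of this bytes-in, bytes-out shape.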
> Surely any file format/DSL/programming language you write will be parsed server-side?
JavaScript can be used user-side, or anywhere else. It's just a regular programming language.
TypeScript, Sass, JSX... There are a lot of languages running on top of JS. Or you might want to do colorizing or autoformatting on input in the browser?
Along with all that, there's as mentioned nodejs, deno for running server side.
But at any rate - lots of front-end problems involve various kinds of parsing/validation and transformation (eg: processing.js).
Javascript doesn't seem suited to compiler construction because it lacks lots of features that make compiler construction pleasant (e.g. strong rich types, algebraic data types, etc.)
It might be "fine" but it's not "good".
“If I send someone an executable, they will never download it. If I send them a URL, they have no excuse.”
If someone interested in a compiler doesn't download it, it's not an excuse, it's a filter. Or a warning sign.
The choice of language often matters a lot less than how familiar you are with it (and its ecosystem(s)). I think it's totally reasonable to want to use JS for a compiler in, e.g., a Node project if for no other reason than to not have to learn too many extra things at once to be productive with the new tool.
I also don't think it's fair to assume everything will be parsed, tokenized, etc server-side. Even assuming that data originates server-side (since if it didn't you very well might have a compelling case for handling it client-side if for no other reason than latency), it's moderately popular nowadays to serve a basically static site describing a bunch of dynamic things for the frontend to do. Doing so can make it easier/cheaper to hit any given SLA at the cost of making your site unusable for underpowered clients and pushing those costs to your users, and that tradeoff isn't suitable everywhere, but it does exist.
It's interesting that you seem to implicitly assume the only reason somebody would choose JS is that they're writing frontend code. It's personally not my first choice for most things, but it's not too hard to imagine that some aspect of JS (e.g., npm) might make it a top contender for a particular project despite its other flaws and tradeoffs.
But I’m standing my ground because I’m not even writing a proper “compiler” - in my case, the output is JSON. So it just kinda feels like it makes sense to stick with JS.
(and you can always decide that you need more speed - if you have a grammar defined, it's almost trivial to feed it to some other parser-generator)
Well, Javascript has been used for over a decade heavily on the server side, with Node, WASM and other projects.
And as far as raw speed goes, something like V8 smokes all scripting languages bar maybe LuaJIT.
So, there's that...
There is a world of difference in accessibility between a tool that requires installation and a tool that you can use by following a hyperlink.
My CC is Javascript based (well it was initially, then TypeScript, now a lot of it is written in itself).
99% of the time I use the actual languages I make in it server side (nodejs), but I am able to develop the languages in my browser using https://jtree.treenotation.org/designer/. It's super easy and fun (at least for me, UX sucks for most people at the moment). There's something somewhat magical about being able to tweak a language from my iPhone and then send the new lang to someone via text. (Warning: Designer is still hard to use and a big refresh is overdue).
It works great for our use-case though I have been eyeing tree-sitter[2] for its ability to do partial parses.
[1] USFM: https://ubsicap.github.io/usfm/ [2] https://tree-sitter.github.io/tree-sitter/
I was disappointed with how they do operator precedence. They use the usual trick for making a PEG handle operator precedence, which looks cool when you apply it to two levels of precedence, but if you try to implement C or Python with it, it gets unwieldy. Most of your AST winds up being nodes that exist just to force precedence in your grammar, and working with the AST is a mess.
For all the horrors of the Bell C compilers, having an explicit numeric precedence for operators was a feature in yacc that newer parser gens often don't have.
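For anyone who hasn't seen what an explicit precedence table buys you, here is a minimal sketch using precedence climbing in Python. To be clear, this is a hand-written parser, not what yacc does internally and not a PEG transformation; the `PRECEDENCE` table, the tuple-shaped AST, and the function names are all assumptions made for the example.

```python
import re

# Hypothetical precedence table: a higher number binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def tokenize(text):
    """Split the input into integer literals, operators, and parentheses."""
    return re.findall(r"\d+|[-+*/()]", text)


def parse(tokens):
    """Parse a token list into nested (op, left, right) tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def atom():
        tok = advance()
        if tok == "(":
            node = expr(1)
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return int(tok)

    def expr(min_prec):
        left = atom()
        # Keep consuming operators that bind at least as tightly as min_prec.
        while peek() in PRECEDENCE and PRECEDENCE[peek()] >= min_prec:
            op = advance()
            # The +1 makes every operator left-associative.
            right = expr(PRECEDENCE[op] + 1)
            left = (op, left, right)
        return left

    tree = expr(1)
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse(tokenize("1 + 2 * 3")))
print(parse(tokenize("(1 + 2) * 3")))
```

The resulting tree has exactly one node per operator, with no wrapper nodes per precedence level. It also drops parentheses entirely, so a tool that needs to reproduce the source text would have to record them separately.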
I worked out the math and it is totally possible to add a stage that adds the nodes to a PEG to make numeric precedence work and also delete the fake nodes from the parsed AST. Unparsing I'm not so sure of, since if someone wrote
int a = (b + c);
how badly you want to keep the parens is up to you; a system like that MUST have an unparse-parse identity in terms of 'value of the expression', but for sw eng automation you want to keep the text of the source code as stable as you can.
> You can use it to parse custom file formats or quickly build parsers, interpreters, and compilers for programming languages.
ident ::= name | name ("." name)+
Because with PEGs, the parser tries the first rule, then the second, and because whenever the second rule matches, the first one will also match, we will never parse the second rule. That's kinda annoying.
Of course with PEG tools you could probably solve this by computing the first sets for both rules and noticing that they're the same. Hopefully that's what this tool does.
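The ordered-choice problem described above is easy to reproduce with a few hand-rolled combinators. This is a minimal sketch in Python, not how any particular PEG tool is implemented; `name`, `dotted`, and `choice` are hypothetical helpers that return the end position of a match or `None`.

```python
import re

NAME = re.compile(r"[A-Za-z_]\w*")


def name(text, pos):
    """Match a single name; return the end position or None."""
    m = NAME.match(text, pos)
    return m.end() if m else None


def dotted(text, pos):
    """Match name ("." name)+ ; return the end position or None."""
    end = name(text, pos)
    if end is None:
        return None
    count = 0
    while end < len(text) and text[end] == ".":
        nxt = name(text, end + 1)
        if nxt is None:
            break
        end = nxt
        count += 1
    return end if count >= 1 else None


def choice(*alternatives):
    """PEG ordered choice: the first alternative that succeeds wins."""
    def parser(text, pos):
        for alt in alternatives:
            end = alt(text, pos)
            if end is not None:
                return end  # later alternatives are never tried
        return None
    return parser


ident_as_written = choice(name, dotted)   # ident ::= name | name ("." name)+
ident_reordered = choice(dotted, name)    # longer alternative first

print(ident_as_written("foo.bar", 0))  # 3: stops after "foo"
print(ident_reordered("foo.bar", 0))   # 7: consumes "foo.bar"
```

Putting the longer alternative first is the usual manual fix; whether a given tool detects the shadowed alternative for you is a separate question.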
https://github.com/harc/ohm/commit/4611bf63c5ecb90d782112d68...
2014
Neat tool. I write parsers by hand though. More fun, and you can be a lot sleazier.
Now, ohm survives as an open-source project, Bret Victor continues work with Dynamicland and Vi Hart is currently employed at Microsoft Research.
Don’t remember anything about an office suite. Related names I remember are Alan Kay, Dan Amelang, Alessandro Warth and Ian Piumarta.
https://en.m.wikipedia.org/wiki/Ometa (including reference section)
Or go to: http://www.vpri.org/writings.php
If I recall correctly you want: "STEPS Toward the Reinvention of Programming, 2012 Final Report Submitted to the National Science Foundation (NSF) October 2012" (and earlier reports)
Discussed on hn: https://news.ycombinator.com/item?id=11686325
And: https://news.ycombinator.com/item?id=585360
Notable for implementing tcp/ip by parsing the rfc.
"A Tiny TCP/IP Using Non-deterministic Parsing Principal Researcher: Ian Piumarta
For many reasons this has been on our list as a prime target for extreme reduction. (...) See Appendix E for a more complete explanation of how this “Tiny TCP” was realized in well under 200 lines of code, including the definitions of the languages for decoding header format and for controlling the flow of packets."
(...)
"Appendix E: Extended Example: A Tiny TCP/IP Done as a Parser (by Ian Piumarta) Elevating syntax to a 'first-class citizen' of the programmer's toolset suggests some unusually expressive alternatives to complex, repetitive, opaque and/or error-prone code. Network protocols are a perfect example of the clumsiness of traditional programming languages obfuscating the simplicity of the protocols and the internal structure of the packets they exchange. We thought it would be instructive to see just how transparent we could make a simple TCP/IP implementation. Our first task is to describe the format of network packets. Perfectly good descriptions already exist in the various IETF Requests For Comments (RFCs) in the form of "ASCII-art diagrams". This form was probably chosen because the structure of a packet is immediately obvious just from glancing at the pictogram. For example:
+-------------+-------------+-------------------------+----------+----------------------------------------+
| 00 01 02 03 | 04 05 06 07 | 08 09 10 11 12 13 14 15 | 16 17 18 | 19 20 21 22 23 24 25 26 27 28 29 30 31 |
+-------------+-------------+-------------------------+----------+----------------------------------------+
|   version   | headerSize  |      typeOfService      |                      length                       |
+-------------+-------------+-------------------------+----------+----------------------------------------+
|                   identification                    |  flags   |                 offset                 |
+---------------------------+-------------------------+----------+----------------------------------------+
|        timeToLive         |        protocol         |                     checksum                      |
+---------------------------+-------------------------+---------------------------------------------------+
|                                              sourceAddress                                              |
+---------------------------------------------------------------------------------------------------------+
|                                           destinationAddress                                            |
+---------------------------------------------------------------------------------------------------------+
If we teach our programming language to recognize pictograms as definitions of accessors for bit fields within structures, our program is the clearest of its own meaning. The following expression creates an IS grammar that describes ASCII art diagrams."
We detached this subthread from https://news.ycombinator.com/item?id=26604134.
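For a feel of what "pictograms as definitions of accessors for bit fields" from the quoted Appendix E could mean in practice, here is a rough sketch in plain Python. It is a stand-in, not the IS grammar from the report: `parse_fields` and `get_field` are names I made up, and the approach (counting the bit labels in the header row above each cell) assumes the diagram is column-aligned as drawn here.

```python
DIAGRAM = """
+-------------+-------------+-------------------------+----------+----------------------------------------+
| 00 01 02 03 | 04 05 06 07 | 08 09 10 11 12 13 14 15 | 16 17 18 | 19 20 21 22 23 24 25 26 27 28 29 30 31 |
+-------------+-------------+-------------------------+----------+----------------------------------------+
|   version   | headerSize  |      typeOfService      |                      length                       |
+-------------+-------------+-------------------------+----------+----------------------------------------+
|                   identification                    |  flags   |                 offset                 |
+---------------------------+-------------------------+----------+----------------------------------------+
|        timeToLive         |        protocol         |                     checksum                      |
+---------------------------+-------------------------+---------------------------------------------------+
|                                              sourceAddress                                              |
+---------------------------------------------------------------------------------------------------------+
|                                           destinationAddress                                            |
+---------------------------------------------------------------------------------------------------------+
"""


def parse_fields(diagram):
    """Map each field name to (bit_offset, bit_width) using column positions."""
    rows = [line for line in diagram.splitlines() if line.startswith("|")]
    header, field_rows = rows[0], rows[1:]
    fields = {}
    offset = 0
    for row in field_rows:
        bars = [i for i, ch in enumerate(row) if ch == "|"]
        for left, right in zip(bars, bars[1:]):
            field_name = row[left + 1:right].strip()
            # Count the bit labels in the header row that sit above this cell.
            width = len(header[left + 1:right].replace("|", " ").split())
            fields[field_name] = (offset, width)
            offset += width
    return fields


def get_field(packet, offset, width):
    """Read `width` bits starting `offset` bits into a big-endian packet."""
    value = int.from_bytes(packet, "big")
    shift = len(packet) * 8 - offset - width
    return (value >> shift) & ((1 << width) - 1)


FIELDS = parse_fields(DIAGRAM)

# A sample 20-byte IPv4 header, used here only as test data.
packet = bytes.fromhex("4500003c1c4640004006b1e6ac100a63ac100a0c")
for field_name in ("version", "headerSize", "length", "timeToLive", "protocol"):
    print(field_name, get_field(packet, *FIELDS[field_name]))
```

The diagram is the only place the layout is written down; the accessors are derived from it rather than maintained by hand alongside it.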
http://www.kylheku.com/cgit/txr/tree/share/txr/stdlib/optimi...
The type is fine whether or not the line is present. It's all about that invariant.
None of the hair pulling I've experienced in compiler debugging had anything even remotely to do with type, which is something flushed out by testing.
Whenever doing anything, like an optimization test case, I put in print statements during development to see that it's being called, and what it's doing. You'd never add a new case into a compiler that you never tested. Just from the sheer psychology of it: too much work goes into it to then not bother running it. Plus the curiosity of seeing how often the case happens over a corpus of code.
Help! That's what I did. I chose to write the compiler in OCaml, a language that's already ~30 years old by now. But I cannot find any type annotations! What should I do? I'm stuck!
Lisp is one of the best compiler implementation languages. Doing the same in C or C++ is about 3-20x more effort.
There's nothing magical about Lisp that makes it super fit for compiler development.
My current fun public focus is powering Scroll (https://scroll.publicdomaincompany.com/). "Scrolldown" now powers my blog (an example post: https://github.com/breck7/breckyunits.com/blob/main/insist-o...). From what I'm seeing so far, I think Scrolldown may be one of the first Tree Lang breakouts. Simple but powerful thanks to its extensibility.
TreeBase is used extensively at a few moderately successful websites. I think Tree Notation (or 2D langs generally) will be used OOMs more in this domain. It integrates so incredibly seamlessly with Git.
At Our World in Data, Tree Notation is used by the researchers to build interactive data visualizations (https://www.youtube.com/watch?v=vn2aJA5ANUc&t=145s). That one uses its own implementation called "GridLang" because I didn't want to depend on jtree, which is a bit too R&D for a site with that kind of traffic. The 2D lang/Tree Notation ideas are so simple that it's easy to roll your own code and you don't have to use "jtree". I view the "jtree" project in a way as just an experiment to confirm that yes, you can do anything/everything without any visible syntax characters. Space is all you need.
On the contracting side I'm helping a crypto group with a shockingly ambitious 2-D crypto.
Tons of brilliant people and huge companies have made software in X and found out it would have been better in Y. There's always tomorrow, as long as you learned today.
> If someone interested in a compiler doesn't download it, it's not an excuse, it's a filter. Or a warning sign.
You're so invested in gatekeeping that you're confusing the point of research with technofetishism.
Here's what Joe Armstrong had to say in "The Mess We're In":
"I downloaded this program, and I followed the instructions, and it said I didn't have grunt installed! [...Then] I installed grunt, and it said grunt was installed, and then I ran the script that was gonna make my slides... and it said 'Unable to find local grunt'."
Looks like someone needs to go dig up Joe and let him know that the real problem is that there was a mistake in letting him get past the point where he was supposed to be filtered out.
If it's a decently large project, sure. But if it's a small project with only a couple contributors who I've never heard of? There's the potential for that to be hiding malicious code. Plus the potential complexity of getting a project that's only ever been built on (say) 2 computers to successfully compile and run on my system. Plus figuring out whatever build system and weird flags they happen to use. And potentially wrangling a bunch of dependencies.
All that just to take a quick look at a language that might not actually be of interest to me in the end. The browser offers huge benefits here - follow a link and play around in a text box. It just works. (This is also why I use Godbolt - I don't want to bother with a Windows VM or wrangle ten different versions of Clang and GCC myself.)
Javascript is of course torture for other reasons, but lisp, prolog, clojure et al do make sense. Lisp being the language with the most implemented compilers. Prolog probably being the easiest. Prolog compilers are usually much shorter and better than lisp ones. Super fit is only OCaml because it already comes with all the infrastructure, C parsers and such. In lisp you'd need to write 50 lines.
> language that's already ~30 years old by now
Relevance?
Smalltalk and everything since then would like a word with you.
> And tagged unions (or algebraic datatypes) [...] type-directed features like pattern matching.
Erlang and Prolog would like to have a chat, too.
"You can write a compiler in a weakly typed language that resembles a compiler in a strongly typed language." Happy now?
The point, which you have ignored, is that there are strongly typed languages where the features you're relying on are not present. In fact, this is true of a bunch of the compilers that are among the most widely used in the world--ones that people are using to build projects written in C and C++ and things like the language support baked into IDEs for Java, C#, etc. So the relevant factor is not "strong vs. weak?" but rather those features (structural matching, etc) that you are relying on.
And let's be real, the original comment ("I'd rather put my hand in boiling water than develop a compiler in a dynamic weak typed language"; now flagged) was no more than a drive-by insult.
Sure, that's a better claim.
> The point, which you have ignored, is that there are strongly typed languages where the features you're relying on are not present.
There sure are! I don't think I was ever trying to say otherwise.
> In fact, this is true of a bunch of the compilers that are among the most widely used in the world--ones that people are using to build projects written in C and C++ and things like the language support baked into IDEs for Java, C#, etc.
Sorry, I'm having trouble parsing this. Are you referring to compilers of C/C++ here, or compilers written in those languages? The architectures of compilers I've worked with that were built in C++ were specifically on my mind when I wrote my comment.
> And let's be real, the original comment... was no more than a drive-by insult.
I think you're maybe reading too much into me here? I didn't write that comment you're referring to. I responded very narrowly to a claim in the light of a common misconception about how type systems factor into software architecture (namely, that they don't do anything as long as your code is "correct"). I'm picking up a lot of hostility that I don't think I've earned.
Since it was a research project, STEPS never quite achieved a cohesive, unified experience, but they proved that the individual components could be substantially minimized and the cost of developing them amortized over a large project like a full GUI environment. Nile and some of the applications of Maru, like a minimal but functioning TCP/IP stack that can be compiled to bare metal by virtue of being made in Maru, still fascinate me.
Work on Maru is ongoing, albeit run by a community (with some input from Ian), Nile has been somewhat reborn of late, Ohm is again under active development as the successor to OMeta and Alan is still around.
(Source: Dan is a friend and colleague, and I've met a few of the STEPS/VPRI people that way.)
I'm an outsider and also never got Frank to work. I was waiting for the Nile/Gezira thesis to get a high-level (but hopefully also somewhat detailed) description of how they handled graphics. I vaguely remember getting parts of idst working but for each of these projects, there were always multiple versions lying around. Sometimes in odd places.
I read Alex Warth's thesis and it's well written, in a way that makes it very easy to understand. So, of course, I had to implement my own OMeta variant [2].
Also, the VPRI website itself says it's shutting down (presumably folks moved to HARC at that time?).
Edit to add that OMeta is the language agnostic parser and compiler!
[1] https://www.mail-archive.com/fonc@vpri.org/ [2] https://github.com/asrp/pymetaterp
Maru development is documented on an active mailing list.[1] Ohm development is being coordinated through GitHub. I'd personally like to take the extant code from OMeta/JS and the JS implementation of Nile & Gezira, and modernize them.
Recently I've been wondering if there's enough interest for a Discord server or something. (In the spirit of STEPS, it'd be ideal to make a new collaborative thing that's really different than static text/audio/video on the web, but gotta start somewhere. :) ) Unfortunately, I have had other, higher-priority projects at the moment, so I have taken no initiative to try to build a community.
I will also say that in my opinion, it's not clear to many of the people who made this stuff how special it is. The only exception to that is Bret Victor, who actually is not well-understood, but even the banana pudding versions of his ideas are typically much better than the industry's.
> I'm an outsider and also never got Frank to work. I was waiting for the Nile/Gezira thesis to get a high-level (but hopefully also somewhat detailed) description of how they handled graphics. I vaguely remember getting parts of idst working but for each of these projects, there were always multiple versions lying around. Sometimes in odd places.
I've never gotten Frank to work, and I abandoned my attempts. I've seen it run, though. The name was fully truthful: it really is Frankenstein's monster.
I did get Nile + Gezira to work (albeit in a very crude way by printing numbers to the console rather than hooking it up to a frame buffer). That's how I met Dan. I don't want to betray any confidences with him, but there is ongoing work with Nile.
Here's Dan himself presenting a related language at Oracle Open World in a demo (around 25 mins in).[2] (Full disclosure: I worked on the demo.)
If it were me getting started, I would take a look at the JavaScript implementation of Nile in Dan's Nile repo on GitHub. It should more or less work out of the box, and there's an HTML file containing a fairly full subset of Gezira. The only problem is that the JS style is way out of date, and so it does some things that are heavily frowned upon today. It may not work with tools like Webpack.
The Maru-based Nile is trickier to get working, but it does work. The issue with Ian's Maru is that it's quite hard to reason about and lacks clear debugging tools. I've gotten both up and running. I seem to remember the Boehm GC was pivotal in getting Maru to bootstrap and then run Nile.
> I read Alex Warth's thesis and it's well written, in a way that makes it very easy to understand. So, of course, I had to implement my own OMeta variant [2].
Pymetaterp is cool! I agree: Warth's work on OMeta was impressive. In some ways, Ohm feels inferior to me, though they're both good tools with lots of potential.
OMeta is the one tool from STEPS that is basically simple to understand and use without having to do a bunch of code archaeology.
> Also, the VPRI website itself says it's shutting down (presumably folks moved to HARC at that time?).
VPRI closed because STEPS ended and because Alan had to retire at some point. HARC and/or CDG Labs continued the work, but then closed as well. (I don't know all of the details, but someone here suggested SAP withdrew funding. That would track with what I do know.)
Today, Ian is teaching in Japan, Dan is at Vianai, Alex is at (IIRC) Google, Yoshiki is at Croquet, Bret Victor is doing Dynamicland, Vi Hart is at Microsoft Research and then Alan is retired. There were quite a few others I'm missing, and they are all doing interesting things as well.
[1] https://groups.google.com/g/maru-dev
[2] https://www.oracle.com/openworld/on-demand.html?bcid=6092429...
You could use a VM to work around this issue, no?
It seems to me that making a working demo of Frank as an open source project should be the first priority, even if it runs only in a 32-bit VM, because then if the demo is interesting, you may even get help from others for "modernizing" Frank so that it runs natively.
Dan did a demo of a related language to Nile at Oracle Open World in September 2019. (Full disclosure: I worked on the demo.) I would predict that more information will be forthcoming about Nile this year.
[1] https://groups.google.com/g/maru-dev
[2] https://www.oracle.com/openworld/on-demand.html?bcid=6092429...
"STEPS Toward the Reinvention of Programming, 2012 Final Report Submitted to the National Science Foundation (NSF) October 2012"
Quite painfully ironic for a software research project that they didn't properly use a VCS.
Diving into this a bit, I remembered that fonc had its own (now defunct) wiki. [3] It seems like a lot of the important pages were unfortunately not updated though.
[1] https://join.slack.com/t/footprintsorg/shared_invite/zt-o7ch... [2] https://hackmd.io/SB4QqG7bSxmgoUvPPoSzUA [3] http://vpri.org/fonc_wiki/ [3, archive.org] https://web.archive.org/web/20110901193854/http://vpri.org/f...
I would love to hear more about how you believe not only outsiders, but also the people who made this misunderstand this work?
How do you see the importance of STEPS and Bret Victor's work? I'm a big fan, and you clearly have a lot of knowledge. I'd love to read more!
> Recently I've been wondering if there's enough interest for a Discord server or something. (In the spirit of STEPS, it'd be ideal to make a new collaborative thing that's really different than static text/audio/video on the web, but gotta start somewhere. :) ) Unfortunately, I have had other, higher-priority projects at the moment, so I have taken no initiative to try to build a community.
I don't really like Discord because they keep asking for phone verification and, early on, they pretty aggressively shut down alternative client attempts.
What about Mattermost? I could try to set one up though initially, we wouldn't have email notifications or a CDN. Might not be so good if the initial group is small.
Slack? Don't know how they compare to Discord but at least they don't ask for phone verification.
A subreddit? A mailing list? Some kind of fediverse thing?
If there's some possibility of migrating to our own platform, I guess it doesn't matter as much where we start.
I could try to set something up in the coming week. But interest in this HN thread will still have died by that time.
> I did get Nile + Gezira to work (albeit in a very crude way by printing numbers to the console rather than hooking it up to a frame buffer). That's how I met Dan. I don't want to betray any confidences with him, but there is ongoing work with Nile.
Nice! I'm not anywhere near that. I'm still looking for a description of what it _is_ and, at a very high level, how it works internally. Something like "it's mathematical notation to describe the pixel positions/intensities implicitly via constraint equations; it uses a <something> solver for ...". What's in quotes could be way off; it's from memory of what I saw.
> I've gotten both up and running. I seem to remember the Boehm GC was pivotal in getting Maru to bootstrap and then run Nile.
I also vaguely remember something about getting the right Boehm GC version so that some of
> Pymetaterp is cool! I agree: Warth's work on OMeta was impressive. In some ways, Ohm feels inferior to me, though they're both good tools with lots of potential.
Thanks! I share similar thoughts about Ohm. Having a visual editor is very nice, though I tend to use breakpoints for parser debugging [1].
Edit to add that id-objmodel [2] is another STEPS project I found to be simple and useful as an idea.
[1] See, for example, "Debugging" in https://blog.asrpo.com/adding_new_statement [2] https://www.piumarta.com/software/id-objmodel/