Ohm – A library and language for building parsers, interpreters, compilers, etc. (github.com)

348 points by testing_1_2_3_4 5 years ago | 100 comments

gklitt 5 years ago |

Ohm’s key selling point for me is the visual editor environment, which shows how the parser is executing on various sample inputs as you modify the grammar. It makes writing parsers fun rather than tedious. One of the best applications of “live programming” I’ve seen.

https://ohmlang.github.io/editor/

Waterluvian 5 years ago | |

A lot of regex testers do this and I can't imagine writing a regex or a parser without.

anon_tor_12345 5 years ago | | |

>a parser without

can you show me a parser generator that produces this kind of visualization?

thesz 5 years ago | |

I used to debug the parsing process for a VHDL grammar (which is ambiguous at the lexeme level) with parser combinators and a Haskell REPL.

Whenever my "compiler" found a syntax error in the test suite, I could load the part of the source around the error and track down my parser's error or omission by running parsers for smaller and smaller parts of the grammar on smaller and smaller parts of the input.

It was 12 years ago.

And yes, it is fun. ;)

joshmarinacci 5 years ago |

I'm so happy to see this on HN. I've used Ohm for several projects. If you want a tutorial for building a simple programming language using Ohm, check out this series I put on GitHub.

https://github.com/joshmarinacci/meowlang

tovej 5 years ago |

Compiler compilers are great, I love writing DSLs for my projects. I usually use yacc/lex, or write my own compiler (typically in go these days).

However (and this is just me talking), I don't see the point in a javascript-based compiler. Surely any file format/DSL/programming language you write will be parsed server-side?

chrisseaton 5 years ago | |

> I don't see the point in a javascript-based compiler

JavaScript is a full programming language. Why wouldn't it be a fine choice to write a compiler in? People have a funny idea that compilers are more complex software or are somehow something low-level? In reality they're conceptually simple - as long as your language lets you write a function from one array of bytes to another array of bytes, then you can write a compiler in it. And for practicalities beyond that you just need basic records or objects or some other kind of structure, and you can have a pleasant experience writing a compiler.

> Surely any file format/DSL/programming language you write will be parsed server-side?

JavaScript can be used user-side, or anywhere else. It's just a regular programming language.
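The "function from one array of bytes to another array of bytes" framing above can be made concrete with a toy example. This is only a sketch: the postfix source syntax, the opcode numbers, and the function names are all invented for illustration and have nothing to do with Ohm.

```python
# Toy "compiler": source bytes in, stack-machine bytecode bytes out.
# The opcode values and the postfix source syntax are assumptions of this sketch.
PUSH, ADD, MUL = 0x01, 0x02, 0x03

def compile_postfix(source: bytes) -> bytes:
    """Translate space-separated postfix arithmetic into bytecode."""
    out = bytearray()
    for token in source.split():
        if token == b"+":
            out.append(ADD)
        elif token == b"*":
            out.append(MUL)
        elif token.isdigit() and int(token) < 256:
            out += bytes([PUSH, int(token)])  # operand fits in one byte
        else:
            raise SyntaxError(f"unexpected token {token!r}")
    return bytes(out)

def run(bytecode: bytes) -> int:
    """Minimal interpreter, used only to check the compiler's output."""
    stack, pc = [], 0
    while pc < len(bytecode):
        op = bytecode[pc]
        if op == PUSH:
            stack.append(bytecode[pc + 1])
            pc += 2
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == ADD else a * b)
            pc += 1
    return stack.pop()
```

For instance, `run(compile_postfix(b"1 2 + 3 *"))` evaluates (1 + 2) * 3. Nothing here needs more than byte strings, a loop, and a list, which is the point being made about language choice.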

e12e 5 years ago | | |

> I don't see the point in a javascript-based compiler

Typescript, sass, jsx... There are a lot of languages running on top of js. Or you might want to do colorizing, autoformatting on input in the browser?

Along with all that, there's as mentioned nodejs, deno for running server side.

But at any rate - lots of front-end problems involve various kinds of parsing/validation and transformation (eg: processing.js).

centimeter 5 years ago | | |

> Why wouldn't it be a fine choice to write a compiler in?

Javascript doesn't seem suited to compiler construction because it lacks lots of features that make compiler construction pleasant (e.g. strong rich types, algebraic data types, etc.)

It might be "fine" but it's not "good".

acarabott 5 years ago | |

I interned with the PI behind Ohm (Alex Warth) and one of his reasons for using the browser was simple:

“If I send someone an executable, they will never download it. If I send them a URL, they have no excuse.”

BiteCode_dev 5 years ago | | |

We are talking about a compiler here.

If someone interested in a compiler doesn't download it, it's not an excuse, it's a filter. Or a warning sign.

hansvm 5 years ago | |

(also just me talking -- here are some potential counterpoints)

The choice of language often matters a lot less than how familiar you are with it (and its ecosystem(s)). I think it's totally reasonable to want to use JS for a compiler in, e.g., a Node project if for no other reason than to not have to learn too many extra things at once to be productive with the new tool.

I also don't think it's fair to assume everything will be parsed, tokenized, etc server-side. Even assuming that data originates server-side (since if it didn't you very well might have a compelling case for handling it client-side if for no other reason than latency), it's moderately popular nowadays to serve a basically static site describing a bunch of dynamic things for the frontend to do. Doing so can make it easier/cheaper to hit any given SLA at the cost of making your site unusable for underpowered clients and pushing those costs to your users, and that tradeoff isn't suitable everywhere, but it does exist.

It's interesting that you seem to implicitly assume the only reason somebody would choose JS is that they're writing frontend code. It's personally not my first choice for most things, but it's not too hard to imagine that some aspect of JS (e.g., npm) might make it a top contender for a particular project despite its other flaws and tradeoffs.

tenaciousDaniel 5 years ago | | |

This makes me feel really good. I’m working on my first DSL and I’m writing it in JS. I really don’t know what I’m doing, and it felt like JS wasn’t as good a choice as a more “serious” language like C++.

But I’m standing my ground because I’m not even writing a proper “compiler” - in my case, the output is JSON. So it just kinda feels like it makes sense to stick with JS.

TheRealPomax 5 years ago | |

If your ecosystem is JS, having a JS-based compiler is pretty convenient. As long as it's just "slower by some constant" rather than by a runtime order, the fact that it's not as fast as yacc/bison etc. is pretty much irrelevant. Being able to keep everything in JS is quite powerful, both for people new to the idea who started their programming career using JS and for seasoned devs working in large JS codebases.

(and you can always decide that you need more speed - if you have a grammar defined, it's almost trivial to feed it to some other parser-generator)

coldtea 5 years ago | |

>Surely any file format/DSL/programming language you write will be parsed server-side?

Well, Javascript has been used for over a decade heavily on the server side, with Node, WASM and other projects.

And as far as raw speed goes, something like v8 smokes all scripting languages bar maybe LuaJit.

So, there's that...

peterhunt 5 years ago | |

There’s definitely a use for js based parsing for tooling that runs in the browser (autocomplete, documentation browsing etc). Integration with the Monaco editor is a common use case.

RodgerTheGreat 5 years ago | |

There's a great deal of value to making programming environments available in a browser, especially in the context of creative coding and education. I have built and used many such tools which are purely client-side.

There is a world of difference in accessibility between a tool that requires installation and a tool that you can use by following a hyperlink.

breck 5 years ago | |

> I don't see the point in a javascript-based compiler

My CC is Javascript based (well it was initially, then TypeScript, now a lot of it is written in itself).

99% of the time I use the actual languages I make in it server side (nodejs), but I am able to develop the languages in my browser using https://jtree.treenotation.org/designer/. It's super easy and fun (at least for me, UX sucks for most people at the moment). There's something somewhat magical about being able to tweak a language from my iPhone and then send the new lang to someone via text. (Warning: Designer is still hard to use and a big refresh is overdue).

iamwil 5 years ago | | |

Wait, what do you use treenotation for? What are the languages for? I think I'm just a little surprised someone's using treenotation other than to play with it.

branneman 5 years ago | |

In that case, may I ask why you are not a Racket user? Sounds like it'll save you a ton of time and keep your implementations high level.

kesava 5 years ago | |

A ton of front end templating languages/frameworks. They involve compilers to different degrees, don't they?

j0e1 5 years ago |

This is an example of a library we built using Ohm: https://github.com/Bridgeconn/usfm-grammar [1]

It works great for our use-case though I have been eyeing tree-sitter[2] for its ability to do partial parses.

[1] USFM: https://ubsicap.github.io/usfm/ [2] https://tree-sitter.github.io/tree-sitter/

corysama 5 years ago |

This is a follow-up to a major component of the http://vpri.org/writings.php project that created a self-contained office suite, OS and compiler suite in something like 100-200k lines of code without external dependencies.

PaulHoule 5 years ago |

Each PEG generator promises a revolution but only burns a car.

I was disappointed with how they do operator precedence. They use the usual trick to make a PEG do operator precedence, which looks cool when you apply it to two levels of precedence, but if you tried to implement C or Python in it, it gets unwieldy. Most of your AST winds up being nodes that exist just to force precedence in your grammar, so working with the AST is a mess.

For all the horrors of the Bell C compilers, having an explicit numeric precedence for operators was a feature in yacc that newer parser gens often don't have.

I worked out the math and it is totally possible to add a stage that adds the nodes to a PEG to make numeric precedence work and also delete the fake nodes from the parsed AST. Unparsing I'm not so sure of, since if someone wrote

   int a = (b + c);

how badly you want to keep the parens is up to you; a system like that MUST have an unparse-parse identity in terms of 'value of the expression', but for sw eng automation you want to keep the text of the source code as stable as you can.
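To illustrate the "fake nodes" complaint above, here is a minimal sketch, not Ohm's implementation: a hand-written parser that mimics the layered-rule trick (one rule per precedence level), followed by a pass that deletes the nodes that exist only to forward a single child. The layer names, the tuple tree shape, and the decision to keep a `Paren` node are all assumptions of this sketch.

```python
import re

# Precedence layers, lowest first; each layer stands for one grammar rule.
# Names and the tuple-based tree shape are hypothetical.
LAYERS = [("Add", "+-"), ("Mul", "*/")]

def tokenize(text):
    return re.findall(r"\d+|[-+*/()]", text)

def parse_layer(tokens, pos, level):
    """Parse one precedence layer; always wraps its result in a node."""
    if level == len(LAYERS):
        return parse_primary(tokens, pos)
    name, ops = LAYERS[level]
    first, pos = parse_layer(tokens, pos, level + 1)
    children = [first]
    while pos < len(tokens) and tokens[pos] in ops:
        rhs, next_pos = parse_layer(tokens, pos + 1, level + 1)
        children += [tokens[pos], rhs]
        pos = next_pos
    return (name, *children), pos

def parse_primary(tokens, pos):
    if tokens[pos] == "(":
        inner, pos = parse_layer(tokens, pos + 1, 0)
        return ("Paren", inner), pos + 1  # skip the closing ")"
    return ("Num", int(tokens[pos])), pos + 1

def collapse(node):
    """Drop layer nodes that only forward a single child."""
    if not isinstance(node, tuple):
        return node  # operator strings and integers pass through
    name, *children = node
    children = [collapse(c) for c in children]
    if name in dict(LAYERS) and len(children) == 1:
        return children[0]
    return (name, *children)
```

Parsing `7` alone yields `("Add", ("Mul", ("Num", 7)))` before collapsing and `("Num", 7)` after. With only two layers the wrapping is tolerable, but every added precedence level adds another wrapper around every leaf. Keeping `Paren` as a real node is one way to preserve the source text when unparsing.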

crazypython 5 years ago |

This title is misleading. It's a library and language for building parsers. Full stop. Parsing toolkit, as they say themselves.

exdsq 5 years ago | |

The title copies the second sentence of their readme:

> You can use it to parse custom file formats or quickly build parsers, interpreters, and compilers for programming languages.

UncleMeat 5 years ago | | |

I guess it depends on what it means to somebody to build a compiler. Something like yacc says "compiler compiler" in the name but really it is a parser generator. The hard part of industrial compilers is the optimization.

hardwaregeek 5 years ago |

I've used PEGs in the past. They're nice since they combine the mental model of LL grammars with the automation of LALR parser generators. However, it is quite easy to accidentally write rules where you never parse the second rule due to the ordering priority for rules. For instance:

    ident ::= name | name ("." name)+

Because with PEGs, the parser tries the first rule, then the second, and because whenever the second rule matches, the first one will also match, we will never parse the second rule. That's kinda annoying.
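The shadowing described above can be reproduced with a few PEG-style combinators. This is a minimal sketch with hypothetical helper names (`regex`, `seq`, `choice`, `plus`), not Ohm's API; each parser takes a string and a position and returns the new position, or `None` on failure.

```python
import re

def regex(pattern):
    compiled = re.compile(pattern)
    def parse(text, pos):
        m = compiled.match(text, pos)
        return m.end() if m else None
    return parse

def seq(*parsers):
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    """Ordered choice: commit to the first alternative that succeeds."""
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse

def plus(parser):
    """One or more repetitions, greedy."""
    def parse(text, pos):
        pos = parser(text, pos)
        if pos is None:
            return None
        while (nxt := parser(text, pos)) is not None:
            pos = nxt
        return pos
    return parse

name = regex(r"[A-Za-z_]\w*")
dotted = seq(name, plus(seq(regex(r"\."), name)))

ident_shadowed = choice(name, dotted)  # the rule as written in the comment
ident_fixed = choice(dotted, name)     # longer alternative listed first
```

With these definitions, `ident_shadowed("a.b.c", 0)` stops at position 1 because `name` succeeds first, while `ident_fixed("a.b.c", 0)` consumes all five characters and still accepts a bare `a`.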

Of course with PEG tools you could probably solve this by computing the first sets for both rules and noticing that they're the same. Hopefully that's what this tool does.

sleavey 5 years ago | |

This is what's called left-recursion, and there's indeed a way to deal with it in PEG parsers: https://github.com/PhilippeSigaud/Pegged/wiki/Left-Recursion.

fjfaase 5 years ago |

I recently wrote a similar parser, maybe less fancy, for a workshop on parsing. It does display the abstract syntax tree with d3.js and also has a built-in evaluator for a limited set of language constructs. https://fransfaase.github.io/ParserWorkshop/Online_inter_par... It is based on a parser I implemented in C++.

jweissman 5 years ago |

I’ve built a number of toy language projects with Ohm and it’s really wonderful. Just a joy to use the visual tooling also. All around really beautiful machinery

recursivedoubts 5 years ago |

Always fun to find the first commit:

https://github.com/harc/ohm/commit/4611bf63c5ecb90d782112d68...

2014

Neat tool. I write parsers by hand though. More fun, and you can be a lot sleazier.

branneman 5 years ago |

When should one use Ohm over Racket?

coldtea 5 years ago | |

When they want a library and toolkit for building parsers and languages, rather than a general programming language based on Scheme.

dunefox 5 years ago | | |

So, I guess you don't know why OP specifically asked about Racket: https://www.cs.utah.edu/plt/dagstuhl19/ https://beautifulracket.com/stacker/why-make-languages.html

branneman 5 years ago | | |

... but racket basically exists to create parsers and languages. It happens to also be a general programming language. But so is JS nowadays with Node.

scroot 5 years ago |

We are using Ohmjs on a project at work and it is fantastic. I'm hoping one day that Ohmjs and Ohm/s (Squeak) can be compatible again -- would love to have the Smalltalk version of our interpreter and environment we built using this

tobr 5 years ago |

Speaking of - what’s the status of HARC? Is it defunct?

azeirah 5 years ago | |

Yep, HARC is no more. I don't recall the exact history but iirc SAP withdrew its funding and HARC basically ceased to exist.

Now, ohm survives as an open-source project, Bret Victor continues work with Dynamicland and Vi Hart is currently employed at Microsoft Research.

jagger27 5 years ago | |

Defunct enough to let their TLS cert expire.

pjmlp 5 years ago |

Love it, this is great for teaching purposes.

codr7 5 years ago |

I recently created a library for the other part of an interpreter.

https://github.com/codr7/liblg

https://github.com/codr7/liblgpp

TheRealPomax 5 years ago |

It'd be cool if the online editor dispensed with the need to "write the grammar" entirely. A node based parser-generator in addition to Ohm being yet another grammar based parser-generator would be pretty great.

ampdepolymerase 5 years ago | |

Even better would be to generate a parser from examples. See the Microsoft Research Excel Flash Fill paper.

f430 5 years ago |

If I want to modify GraphQL to support custom syntax, would Ohm work? Or does a solution exist already for my needs?

dw-im-here 5 years ago |

I'd rather put my hand in boiling water than develop a compiler in a dynamic, weakly typed language.

rkagerer 5 years ago |

OHM is also the acronym for Open Hardware Monitor, a great open-source project for monitoring computer temperatures, fan speeds, voltages, etc: https://openhardwaremonitor.org/

    +-------------+-------------+-------------------------+----------+----------------------------------------+
    | 00 01 02 03 | 04 05 06 07 | 08 09 10 11 12 13 14 15 | 16 17 18 | 19 20 21 22 23 24 25 26 27 28 29 30 31 |
    +-------------+-------------+-------------------------+----------+----------------------------------------+
    | version     | headerSize  | typeOfService           | length                                            |
    +-------------+-------------+-------------------------+----------+----------------------------------------+
    | identification                                      | flags    | offset                                 |
    +---------------------------+-------------------------+----------+----------------------------------------+
    | timeToLive                | protocol                | checksum                                          |
    +---------------------------+-------------------------+---------------------------------------------------+
    | sourceAddress                                                                                           |
    +---------------------------------------------------------------------------------------------------------+
    | destinationAddress                                                                                      |
    +---------------------------------------------------------------------------------------------------------+
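The bit layout in the diagram above appears to match the fixed 20-byte part of an IPv4 header. Assuming that reading, a binary format like this can be decoded with Python's standard `struct` module. The function name `parse_ipv4_header` is hypothetical, the dictionary keys simply reuse the field names from the diagram, and header options are ignored.

```python
import struct
from ipaddress import IPv4Address

def parse_ipv4_header(data: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    # "!" = network byte order; B = 8 bits, H = 16 bits, 4s = 4 raw bytes.
    (ver_ihl, tos, length, ident, flags_off,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", data[:20])
    return {
        "version": ver_ihl >> 4,           # bits 00-03
        "headerSize": ver_ihl & 0x0F,      # bits 04-07, counted in 32-bit words
        "typeOfService": tos,
        "length": length,
        "identification": ident,
        "flags": flags_off >> 13,          # bits 16-18 of the second word
        "offset": flags_off & 0x1FFF,      # bits 19-31 of the second word
        "timeToLive": ttl,
        "protocol": proto,
        "checksum": checksum,
        "sourceAddress": str(IPv4Address(src)),
        "destinationAddress": str(IPv4Address(dst)),
    }
```

The two sub-byte splits (version/headerSize and flags/offset) are the only places where shifting and masking are needed; every other field falls on a byte boundary.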