Shunting-yard algorithm

Shunting-yard algorithm (en.wikipedia.org)

75 points by lkurusa 7 years ago | 26 comments

chubot 7 years ago |

My blog links to some runnable Python code for this algorithm, with tests:

Code for the Shunting Yard Algorithm http://www.oilshell.org/blog/2017/04/22.html

https://github.com/bourguet/operator_precedence_parsing

It seems like there are 2 common algorithms for expressions: the shunting yard algorithm and Pratt parsing [1]. As best as I can tell, which one you use is a coin toss. I haven't been able to figure out any real difference.

They're both nicer than grammar-based approaches when you have many levels of precedence, like C.

But it doesn't seem like there is any strict relationship between the two algorithms, which is interesting. I guess there are other places where that happens, like there being at least two unrelated algorithms for topological sort.

Trivia: The original C compilers used the shunting yard algorithm. IIRC the source made reference to Dijkstra.

[1] http://www.oilshell.org/blog/2017/03/31.html

userbinator 7 years ago | |

> I haven't been able to figure out any real difference.

The biggest difference is that Pratt/precedence climbing does not require manipulating an extra stack, but instead uses the normal procedure stack in the same manner as recursive descent. This leads to somewhat simpler code. I've seen far more uses of recursive descent/Pratt than shunting-yard.
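
A minimal sketch of precedence climbing in Python, to show the recursion standing in for the operator stack. The `PRECEDENCE` table, the `tokenize` helper, and the tuple-based tree are assumptions made for illustration; only four left-associative binary operators and parentheses are handled, with no error checking.

```python
import re

# Hypothetical precedence table: higher binds tighter; all left-associative.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def tokenize(text):
    # Numbers, identifiers, operators, and parentheses; whitespace is skipped.
    return re.findall(r"\d+\.?\d*|[A-Za-z_]\w*|[-+*/()]", text)

def parse(tokens, min_prec=1):
    """Parse tokens (consumed from the front) into a nested tuple tree."""
    left = parse_atom(tokens)
    # Keep absorbing operators that bind at least as tightly as min_prec.
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # The recursive call plays the role of the explicit stack: the right
        # operand may only contain operators that bind strictly tighter than op.
        right = parse(tokens, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left

def parse_atom(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        node = parse(tokens)
        tokens.pop(0)  # drop the closing ")"
        return node
    return tok

print(parse(tokenize("x + y * z")))  # ('+', 'x', ('*', 'y', 'z'))
```

Pending operators live in the Python call frames rather than in a list the code manages by hand.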

chubot 7 years ago | | |

I understand that, but I think it's mostly a matter of opinion ("somewhat simpler"). It seems like the code comes out the same length either way, and they're both linear time algorithms.

I can't imagine any case where a user will care -- it's an internal detail. Whereas there are places where a user would notice if you used a CFG vs. a PEG.

A long time ago, people might have been concerned about the recursive calls of Pratt parsing, but it seems like a non-issue now.

I do think Pratt parsing is easier to understand, but that might be because I encountered it first.

cdoxsey 7 years ago |

This algorithm is great.

I was working on a new metric alert evaluation system and built an expression parser for things like:

    system.disk.free{*} / 1024 by {host}

The naive approach to parsing will result in the wrong order of operations, since they'll just be done in the order they appear:

    x + y * z   =>   (x + y) * z

Rather than:

    x + (y * z)

As it should be. The shunting yard algorithm will rearrange the expressions.
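
A rough sketch of that rearrangement in Python, converting infix to postfix (RPN). The `PRECEDENCE` table and the `to_rpn` name are assumptions for illustration; it covers only four left-associative binary operators and parentheses, and does no error handling.

```python
import re

# Hypothetical precedence table: higher binds tighter; all left-associative.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_rpn(text):
    output, ops = [], []  # output queue and operator stack
    for tok in re.findall(r"\d+\.?\d*|[A-Za-z_]\w*|[-+*/()]", text):
        if tok in PRECEDENCE:
            # Pop operators that bind at least as tightly before pushing tok.
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                output.append(ops.pop())
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            # Flush back to the matching "(" and discard it.
            while ops[-1] != "(":
                output.append(ops.pop())
            ops.pop()
        else:
            output.append(tok)  # operand goes straight to the output
    while ops:
        output.append(ops.pop())
    return " ".join(output)

print(to_rpn("x + y * z"))    # x y z * +
print(to_rpn("(x + y) * z"))  # x y + z *
```

In the first result `*` is emitted before `+`, so a postfix evaluator multiplies `y` and `z` first.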

After implementing it I noticed my results were different than the reference system... and that's when I discovered we did arithmetic wrong in the main app and had for years.

So I had to hard code a toggle for whether to do math properly :(. It's a sort of Hippocratic oath when it comes to these things... first do no harm, and even if it was wrong, people were relying on the existing functionality, and changing it would likely result in sudden alerts for folks.

In the end we did fix it in the main app, but you always feel kind of dirty writing code like that.

zzzcpan 7 years ago | |

Just the other day I was thinking about how all this infix business is needlessly complicated and leads to subtle bugs and hard-to-understand code. Every time I encounter an uncommon operator in some language, I have to look up its precedence. So much time is wasted trying to satisfy this silly familiarity with math notation. Just imagine how much easier things could be if all infix operators were, for example, left-associative and had the same precedence. No more parsing bugs, no implicit orders and behaviors to remember, and a consistent order that is even more natural and familiar than math notation, letting you focus on things that matter and forget about precedence and associativity.
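
For a sense of what that rule would mean in practice, here is a small sketch, assuming a hypothetical `eval_flat` helper that handles only numbers and four binary operators, with no parentheses. With one precedence level and left associativity, evaluation is a plain left-to-right fold.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def eval_flat(text):
    # Tokens alternate: number, operator, number, operator, ...
    tokens = re.findall(r"\d+\.?\d*|[-+*/]", text)
    result = float(tokens[0])
    # Apply each operator to the running result, strictly in source order.
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        result = OPS[op](result, float(operand))
    return result

print(eval_flat("1 + 2 * 3"))  # 9.0, where conventional precedence gives 7
```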

JadeNB 7 years ago | | |

If you're going to put all operators at the same level of precedence, you've accepted lots of parentheses anyway; so why not just go all the way and require parentheses everywhere, meaning that no associativity need be specified either?

User23 7 years ago | | |

I've always thought it would be cool to have a language where functions were tagged as distributive, associative, symmetric/commutative, and monotonic and the optimizer (and editor!) used these properties to determine optimizations or simplifications. Note that this would be for all expressions, not just arithmetical ones.

I vaguely recall hearing about some fancy C++ template based techniques for accomplishing this, actually, but haven't looked into it.

mamcx 7 years ago | | |

I also wonder about that. However, how much dislike would it cause to force the use of parentheses in math code? For me, I could live with that, but I'm not a math-heavy coder...

How much other stuff, apart from math, could be impacted?

stevefan1999 7 years ago | |

Because, essentially, the shunting-yard algorithm is just an LR(0) parser: it first waits for sufficient information in a post-order manner (shifting), then pushes to the final queue each time a production is found (reducing).

rootbear 7 years ago |

I implemented this as a class assignment in Univac 1100 assembly language in the late 70s. I always thought it was a cool algorithm. I had an HP scientific calculator at the time and was a fan of RPN.

Insanity 7 years ago |

It's a neat algorithm :) When learning Go I made an implementation of Shunting-Yard at some point: https://github.com/DylanMeeus/GoPlay/blob/master/ExpressionP...

As part of something else I was trying. As I was just learning Go at that point, the code is not that clean :D But AFAIK it did work. My memory is a bit hazy on that :)

akhilcacharya 7 years ago |

This is one of those algorithms that everyone should understand or memorize. I had variations of this algorithm in the coding interviews or coding challenges of 6 companies (!) my last cycle.

wenc 7 years ago | |

> This is one of those algorithms that everyone should understand or memorize.

Understand... perhaps; it is a fascinating algorithm. It's also not complicated at all.

Memorize? Perhaps for code challenges, interviews and special occasions like that... but I haven't found it to be useful enough to commit to memory because in order to parse truly arbitrary expressions, you need to remember much more than just the shunting yard rules.

I say this as someone who authored a mathematical modeling DSL in grad school and had to implement this algorithm to correctly parse arbitrarily complicated math expressions. I was deep in the weeds of this. But I don't think I would have been able to reproduce it from memory even during that time. (Also, I had to implement a more general version of the algorithm that dealt with function composition, multi-argument functions, unary operators, special keywords like sum/product over sets, etc.)

Of course nowadays I'm so lazy that I just do "import asteval" in Python. The reason is that once you get beyond the simple operators, arbitrary math expression parsing can be quite hard to get right, and I prefer using something that's heavily tested with no unhandled corner cases.

abecedarius 7 years ago | | |

A good way to rederive table-driven expression parsing on the spot is to start with recursive descent and notice how it makes a recursive call for every level of precedence, even the levels where "nothing happens" because there's no operator of that precedence level at this place in the input. Use the table to call the "right place in the grammar" directly. This works out to be the precedence-climbing algorithm.

I guess you'd get to the shunting yard from there by turning the remaining recursion into an explicit stack, but I haven't worked that out to check.
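
For comparison, here is a sketch of the explicit-stack form: a two-stack evaluator in the shunting-yard style. This is not a worked derivation from precedence climbing, only the shape one would expect to arrive at. The `OPS` table and the `evaluate` name are assumptions for illustration, and it handles only numbers, four left-associative operators, and parentheses.

```python
import operator
import re

# Hypothetical table: operator -> (precedence, function); higher binds tighter.
OPS = {
    "+": (1, operator.add),
    "-": (1, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
}

def evaluate(text):
    values, ops = [], []  # operand stack and operator stack

    def reduce_top():
        # Apply the operator on top of ops to the two topmost values.
        _, fn = OPS[ops.pop()]
        right = values.pop()
        left = values.pop()
        values.append(fn(left, right))

    for tok in re.findall(r"\d+\.?\d*|[-+*/()]", text):
        if tok in OPS:
            # Reduce while the stacked operator binds at least as tightly.
            while ops and ops[-1] != "(" and OPS[ops[-1]][0] >= OPS[tok][0]:
                reduce_top()
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                reduce_top()
            ops.pop()  # discard the matching "("
        else:
            values.append(float(tok))
    while ops:
        reduce_top()
    return values[0]

print(evaluate("1 + 2 * 3"))    # 7.0
print(evaluate("(1 + 2) * 3"))  # 9.0
```

The `while ... >=` comparison does the job that the `min_prec` argument does in a precedence-climbing parser; the `ops` list replaces the call frames.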

bhoeting 7 years ago |

Hah, a few years ago I learned Go by building a simple programming language (super basic with Lua-like syntax) and definitely used this article to help me. I made a slight modification so it could support functions with multiple arguments. Fun times.

https://github.com/bhoeting/blast/blob/master/parser.go#L82

IIAOPSW 7 years ago |

I was hoping for an algorithm that solves railway shunting problems. Anyone know of something that does that?

https://en.wikipedia.org/wiki/Train_shunting_puzzle

cdoxsey 7 years ago | |

This is really interesting. Found: https://www.researchgate.net/publication/225576076_Shunting_...

I wonder what real railway yards do.

kmnt 7 years ago | |

Train shunting problems seem like a variation of the Tower of Hanoi: https://en.wikipedia.org/wiki/Tower_of_Hanoi

fooker 7 years ago | |

Encode it into a SAT instance and use a solver.

User23 7 years ago |

The original paper is a good read to get some insight into the thought process: https://www.cs.utexas.edu/~EWD/MCReps/MR35.PDF