On modern hardware the min-max heap beats a binary heap (probablydance.com)

244 points by pedro84 5 years ago | 43 comments

noctune 5 years ago |

And if you only need a monotone priority queue (i.e. a priority queue where the popped elements are monotonically increasing or decreasing), you should consider using a radix heap. The monotone requirement can be satisfied more often than you would expect, e.g. when using time as the key or in pathfinding. I have a simple radix heap implementation here: https://github.com/mpdn/radix-heap
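For readers who have not met the structure, here is a minimal sketch of a radix heap for non-negative integer keys. It is not the API of the linked crate; the class name `RadixHeap` and the `key_bits` parameter are made up for illustration, and it assumes keys fit in `key_bits` bits. Each key is bucketed by the highest bit in which it differs from the last popped key, which only works because pushes are never allowed below that key.

```python
class RadixHeap:
    """Monotone min-priority queue for non-negative integer keys (sketch)."""

    def __init__(self, key_bits=32):
        self.last = 0  # last popped key; every pushed key must be >= this
        self.buckets = [[] for _ in range(key_bits + 1)]
        self.size = 0

    def _bucket(self, key):
        # 0 if key == last, else 1 + index of the highest differing bit.
        return (key ^ self.last).bit_length()

    def push(self, key, value=None):
        if key < self.last:
            raise ValueError("monotone queue: key is below the last popped key")
        self.buckets[self._bucket(key)].append((key, value))
        self.size += 1

    def pop(self):
        if self.size == 0:
            raise IndexError("pop from empty RadixHeap")
        if not self.buckets[0]:
            # Take the first non-empty bucket, make its minimum the new
            # reference key, and redistribute its items into lower buckets.
            i = next(i for i, b in enumerate(self.buckets) if b)
            items, self.buckets[i] = self.buckets[i], []
            self.last = min(k for k, _ in items)
            for k, v in items:
                self.buckets[self._bucket(k)].append((k, v))
        self.size -= 1
        return self.buckets[0].pop()
```

Items in higher buckets never need to move during a pop, because the new reference key agrees with the old one in all the higher bits.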

twotwotwo 5 years ago |

Like folks mentioned there, I wonder if a higher-fanout heap (people asked about 4-ary) might also do well in practice. Looking at Wikipedia (https://en.wikipedia.org/wiki/D-ary_heap), it looks like maybe so -- the growth in number of comparisons isn't that bad, and it mentions 4-ary heaps working well specifically.

(Like how linear search wins for small N, or insertion sort helps with small subarrays in introsort: more steps, but each step is also a lot cheaper on modern hardware due to fewer branch mispredictions, better cache locality, or something else that better fits the hardware's strengths.)
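To make the higher-fanout idea concrete, here is a minimal sketch of an array-backed d-ary min-heap. The class name `DaryMinHeap` is hypothetical and this is not the article's code; it only shows the index arithmetic (parent of `i` is `(i - 1) // d`, children start at `d * i + 1`) and the trade-off of a shallower tree against more comparisons per level.

```python
class DaryMinHeap:
    """Array-backed d-ary min-heap (illustrative sketch)."""

    def __init__(self, d=4):
        if d < 2:
            raise ValueError("d must be at least 2")
        self.d = d
        self.a = []

    def push(self, x):
        a = self.a
        a.append(x)
        i = len(a) - 1
        # Sift up: one comparison per level, and there are fewer levels
        # than in a binary heap.
        while i > 0:
            parent = (i - 1) // self.d
            if a[parent] <= a[i]:
                break
            a[parent], a[i] = a[i], a[parent]
            i = parent

    def pop(self):
        a = self.a
        if not a:
            raise IndexError("pop from empty heap")
        top = a[0]
        last = a.pop()
        if a:
            a[0] = last
            i, n = 0, len(a)
            while True:
                first = self.d * i + 1
                if first >= n:
                    break
                # Sift down: pick the smallest of up to d adjacent children.
                child = min(range(first, min(first + self.d, n)),
                            key=a.__getitem__)
                if a[i] <= a[child]:
                    break
                a[i], a[child] = a[child], a[i]
                i = child
        return top
```

The children of a node sit next to each other in the array, which is the cache-locality argument for d > 2; whether that wins in practice depends on the hardware and the element type.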

mehrdadn 5 years ago | |

I've tried D-ary heaps before (even min-max d-ary heap). Somehow my memory was that D = 3 or 4 sometimes performed better than 2, but now that I checked, in my code (which I spent a lot of time optimizing) I settled on plain binary heap at the end. So maybe my memory was faulty? Though I could swear I saw a performance improvement for D > 2 at some point. Sadly I don't recall why I reverted to binary heap exactly, or even whether it was related to speed or something else. Anybody tried it and remembered how it turned out?

nayuki 5 years ago | |

Another technique to speed up priority queues in a system with a memory hierarchy (i.e. modern computers) is the https://en.wikipedia.org/wiki/B-heap . I wonder if the author of the article is aware of this and is able to benchmark this.

barbecue_sauce 5 years ago | |

I don't have a formal background in CS. Any insight into why "d-" is used as the prefix?

teraflop 5 years ago | | |

It probably stands for "degree". In a tree, the degree of a node is how many children it has.

(And before anybody gets pedantic: technically, this is the node's "outdegree" when the tree is represented as a directed graph. If it was an undirected graph, we would have to count the parent node as well.)

ncmncm 5 years ago | | |

There are only two important things taught in CS that most people don't seem to pick up on the job. The most important is order notation. The second is the relation between grammars and state machines. Both are worth as much attention as you can afford for a week.

Things not taught in CS that you need to know on the job are legion. By far the most important of these is the use of invariants. Second might be the memory hierarchy, and cache semantics. Third might be use of bitmaps and bitwise operations.

dpbriggs 5 years ago | | |

It's the number of children each node can have

innocenat 5 years ago | |

I remember doing tests years ago and finding that the extra comparisons and loop control can tank the performance for n-ary heaps when n>2. But that was >10 years ago.

bjoli 5 years ago | |

For very large heaps, I have had speedups of about 3x due to better cache locality with B-heaps.

gliese1337 5 years ago |

I have used a min-max heap once. I don't remember why I needed it at the time--it was a previous job--but I do remember that I had to roll my own, because it's just not that popular of a data structure, and it was the obvious and only good solution to the problem at the time.

So, it's nice to see a detailed analysis of the structure like this! Perhaps if it becomes more popular, I will find more places to use it.

bjo590 5 years ago | |

I used a min heap in a FANG interview. It's an obvious/good solution if the problem has a mix of reading/removing the smallest number in a data structure and writing new numbers to the data structure.

rgossiaux 5 years ago | | |

A min heap is different from a min-max heap. A min-max heap supports the operations of both a min heap and a max heap (essentially by interleaving the two). A normal min heap is a standard data structure, a min-max heap less so.
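A small sketch of what "interleaving" means in the usual array layout, assuming the standard definition (even depths are min levels, odd depths are max levels). The function names are made up for illustration, and this only checks the invariant and reads both ends; it does not implement push or pop.

```python
def is_min_level(i):
    # Depth of index i in an array-based binary tree is floor(log2(i + 1)).
    # Even depths are min levels, odd depths are max levels.
    return ((i + 1).bit_length() - 1) % 2 == 0


def is_min_max_heap(a):
    """Check the min-max heap invariant against children and grandchildren."""
    n = len(a)
    for i in range(n):
        children = [c for c in (2 * i + 1, 2 * i + 2) if c < n]
        grandchildren = [g for c in children
                         for g in (2 * c + 1, 2 * c + 2) if g < n]
        for j in children + grandchildren:
            if is_min_level(i) and a[i] > a[j]:
                return False
            if not is_min_level(i) and a[i] < a[j]:
                return False
    return True


def peek_min(a):
    return a[0]  # the root is always the minimum


def peek_max(a):
    # The maximum is the root (single element) or one of its two children.
    return max(a[:3])


heap = [1, 90, 80, 5, 10, 7, 3, 50, 60]
```

With this layout both `peek_min(heap)` and `peek_max(heap)` are constant-time, which is what a plain min heap cannot offer.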

mav3rick 5 years ago | |

IIRC it's used in chess programs to evaluate moves.

nwallin 5 years ago | | |

Perhaps you're thinking of minimax? It's an unrelated concept to min-max heaps.

https://en.wikipedia.org/wiki/Minimax

https://en.wikipedia.org/wiki/Min-max_heap

Hello71 5 years ago | | |

you're thinking of minimax.

ncmncm 5 years ago |

> "...C++20 finally added a way of accessing this instruction, using std::countl_zero. I don’t have C++20 though so I had to do it the old platform specific ways ..."

You don't need C++20. Even C++98 had std::bitset<>::count(), which has a nice conversion from unsigned long, and which, when compiled for a target that has POPCNT (e.g. with SSE4.2 enabled, Nehalem or later), uses a POPCNT instruction. It is pretty simple to produce various other results, including countl_zero, from a popcount, with just a couple of additional bitwise ops.

Modern compilers are happy to keep a bitset<32> in the same register as an unsigned long, so the conversion takes exactly zero cycles. POPCNT takes three cycles, and the extra bitwise ops another couple.
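One well-known way to get a leading-zero count out of a popcount is to smear the highest set bit into every lower position and count what is left. The sketch below shows the bit trick in Python for 32-bit values; the helper names are made up, `bin(x).count("1")` stands in for the hardware popcount, and this is not necessarily the exact sequence the comment has in mind.

```python
MASK32 = 0xFFFFFFFF


def popcount32(x):
    # Stand-in for a hardware popcount on a 32-bit value.
    return bin(x & MASK32).count("1")


def countl_zero32(x):
    x &= MASK32
    # Smear the highest set bit into all lower bit positions.
    x |= x >> 1
    x |= x >> 2
    x |= x >> 4
    x |= x >> 8
    x |= x >> 16
    # Every bit from the highest set bit downward is now 1, so the
    # zeros that remain are exactly the leading zeros.
    return 32 - popcount32(x)
```

For an input of 0 nothing is smeared, so the result is 32, which matches the convention std::countl_zero uses for a 32-bit zero.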

ghj 5 years ago |

In certain domains, the trend has been to give up constant factors in order to increase programmer productivity (e.g., Python pays a 10x slowdown but is batteries included).

So in that case I would use this data structure even if it weren't faster. I can't count the number of times I have had to mess with inserting negative priorities into a min heap to create a max heap! We should just have one data structure that does both.

(though taking this idea to the logical extreme means we should just use Order Statistic Tree for everything since it not only gives you log(n) min/max, but also log(n) find kth and get rank of x)
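For anyone who has not had to do it, the negated-priority workaround looks roughly like this with Python's `heapq`, which only provides a min heap. The helper name `k_largest` is made up for illustration, and the trick only works for keys that can be negated.

```python
import heapq


def k_largest(items, k):
    """Return the k largest numbers in descending order via a min heap."""
    heap = []
    for x in items:
        heapq.heappush(heap, -x)  # store negated keys so the largest pops first
    # Negate again on the way out to recover the original values.
    return [-heapq.heappop(heap) for _ in range(min(k, len(heap)))]
```

Every push and pop has to remember the sign flip, which is the bookkeeping a double-ended structure would remove.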

nickcw 5 years ago |

If you need a min-max heap (or double ended heap) in Go here is one I've used: https://github.com/aalpar/deheap

Very useful when you need it!

cellularmitosis 5 years ago |

It would be neat to fire this up on an older processor which doesn’t have modern instruction-level parallelism and verify the difference in performance.

brandmeyer 5 years ago | |

On x86 you'd have to search pretty far back before the available ILP really dropped off. Some of the lower-end OoO ARMs might be a good testing ground, though. Say, a Raspberry Pi 4? Earlier-gen RPi used in-order cores.

usefulcat 5 years ago |

I've been using the author's flat_hash_map (https://github.com/skarupke/flat_hash_map) for several years now and have been really impressed. I've yet to find a single bug, it's nearly as fast as folly's hash maps (for my use case anyway) but far easier to integrate than folly.

gpderetta 5 years ago |

Impressive. Looking forward to the d-heap article.

IshKebab 5 years ago | |

Yeah I was going to say, it sounds like it is faster because it has higher arity rather than because it is min and max. So if you only need min or max a d-heap is probably better. Hopefully he will update the article.

https://en.wikipedia.org/wiki/D-ary_heap

(Also I didn't know they were called d-heaps, thanks!)

ww520 5 years ago |

Wow. This is a very good analysis of a fundamental algorithm. Haven’t seen a high quality analysis like this for a good while.