H3: Uber’s Hexagonal Hierarchical Spatial Index (2018)

H3: Uber’s Hexagonal Hierarchical Spatial Index (2018)(eng.uber.com)

262 points by felipap 6 years ago | 74 comments

I did some indexing using elasticsearch and some home grown stuff based on geohashes about seven years ago. At the time Elasticsearch was just adding support for geoshape indexing as well. Initially this was also based on geohashes. Later they added proper quad tree support (instead of indexing the geohash as a term), and recently they revamped the implementation using BKD trees. The current implementation is way faster and scalable than what they had a few years back and allegedly quite nice for handling complex GIS data.

What's the advantage of Uber's approach over any of this. Even my primitive geohash solution worked pretty nicely and you can implement it on just about any type of DB. I had a simple algorithm to cover any shape with geohashes of a certain size as well as a quick way to generate a polygon for a circle. The two combined allowed me to do radius searches for any shape overlapping or contained by the circle with simple terms queries on the geohash prefixes. My main headache was keeping the number of terms in a query (i.e. number of geohashes) to a reasonable size (below 1024, which if I recall was a the default limit).

mtrovo 6 years ago | |

They go into some detail on this talk https://youtu.be/ay2uwtRO3QE?t=712.

What I get from their explanation is that hexagon is a better shape for map grids because they are the most complex shape that can tesselate (the other two are triangles and squares). As they are more close to a circle, distances within a cell are more stable, also computing the distance from a cell center to its neighbours is stable in hexagons as well.

I think the reason hex is not more common is that subdivisions are hard to create compared to triangles or squares. Uber solved this by subdividing in 7 smaller hex and tilting it so they cover the bigger shape with some small overlap.

Also a big problem is distortion, I never thought this would be that huge of a problem but it makes sense. They go into a lot of the details later on the same video.

Pamar 6 years ago | | |

Which is why, btw, Hexagons have been used for decades in wargames (and - to a more limited extent, in other types of boardgames).

twic 6 years ago | | |

> hexagon is a better shape for map grids because they are the most complex shape that can tesselate

[Penrose tiling intensifies]

AlchemistCamp 6 years ago | | |

The Techzing podcast talked about the origin of this several years ago, back when one of the hosts had a critical role in its creation. Good stuff.

sroussey 6 years ago | |

geohashes are cool, and lopping off a letter reduces precision.

But if you look at have they overlay London, you get quite a split higher up and two next to each other don’t look like they are and so I can see where the number of terms would get big.

The new ElasticSearch implementation is now the default and I think they are deprecating the prefix versions like geohash.

Too bad their stuff is not embeddable into an app. Know of lib for this?

jillesvangurp 6 years ago | | |

This is all part of Lucene and not specific to Elasticsearch as far as I understand.

Update with link: https://lucene.apache.org/core/7_1_0/core/org/apache/lucene/...

Mr_P 6 years ago |

I'd be curious to see a more thorough comparison to S2, which IMHO seems simpler (it's just quadtrees) and likely faster (it supports O(1) lookup vs. needing a hierarchical search).

Cactus2018 6 years ago |

Previously https://news.ycombinator.com/item?id=16135302

kiwidrew 6 years ago |

Uber really likes hexagons. Their internal name for driver promotions is even "hexcentive"! I've long wondered what compelled Uber to use hexagons to define their various promotion boundaries, because it's not a very user-friendly choice -- the lines on a map look very jagged, for instance.

This post goes some way towards explaining why hexagons:

* "minimize the quantization error introduced when users move through a city"

* "allow us to approximate radiuses easily"

* non-hexagonal regions (such as postal code boundaries) "have unusual shapes and sizes which are not helpful for analysis, and are subject to change"

traverseda 6 years ago |

>Hexagons are just pentagons without one side

My favorite quote in the video

kjeetgill 6 years ago | |

Don't they have one more side?

Yolta 6 years ago | | |

They do.

traverseda 6 years ago | | |

Oops, typed that backwards

hamandcheese 6 years ago |

How does this compare to S2 from google? It is not mentioned in the post at all.

http://s2geometry.io/

seanhandley 6 years ago | |

You get more aesthetic visuals from H3.

S2 cells distort quite heavily depending on which part of the globe you're mapping.

phkahler 6 years ago | | |

I think this is the real answer. I've like hex tiles for a long time, but they don't nest properly like squares in a quadtree. I think someone at Uber decided to "make it work" because it looks cool. They traded one set of problems for another, and I think from a technical point of view they made a poor choice.

snek 6 years ago | |

There is a deep comparison in the video in the post.

dang 6 years ago |

Thread from last year: https://news.ycombinator.com/item?id=17417545

scrappyjoe 6 years ago |

If anyone wants to use this in R, the h3-r package works quite nicely. It’s in C which, in my tests, is an order of magnitude faster than the v8 implementation. We use it to chop up spatial data into equally sized chunks that we can use as observations for regressions and the like. https://github.com/crazycapivara/h3-r

seanhandley 6 years ago |

I work for a delivery company. We currently use H3 to help calculate ETAs in realtime.

lvh 6 years ago | |

Does that work? I’d expect you want to do a fairly traditional road-based ETA? Otherwise you’ll be wildly off if you’re in a cul de sac next to a major highway or on said highway, or in massive traffic vs not at all.

seanhandley 6 years ago | | |

It's more to figure out how many couriers are in a given cell at a given time, and track that data over time so it can be forecasted.

Then we can say if you need a package collecting at address X, there are usually Y available couriers within Z distance. Combine that with a road mapping/traffic API and you're done.

no_identd 6 years ago |

I hate Uber, but I now opened this issue suggesting a way to massively increase the performance of this anyway:

https://github.com/uber/h3/issues/258

(Note: by "massively" I mean pushing it to O(1) for the fundamental coordinate operations. So, perhaps read that as "most massively".)

lacker 6 years ago |

I might be the only one pedantically annoyed here, but... the very first image on this page contains a mix of hexagons and pentagons.

Which is it - an inaccurate image, and the system is really hexagon-based, or does the system also use pentagons? My guess is that it is actually all hexagon-based since the overlap between granularity is already only approximate.

nexuist 6 years ago |

Is UberCon going on or something? Why so many releases in the past few days? I'm not complaining; just curious.

kyruzic 6 years ago | |

This article is over a year old.

drewm1980 6 years ago |

I wonder why healpix from NASA wasn't an option for them. Maybe the nonpermissive license...

evrydayhustling 6 years ago |

Why does each cell contain seven finer cells instead of six, resulting in imperfect containment?

wlesieutre 6 years ago | |

You can't divide a hexagon into 6 regular hexagons, that would mean using triangular cells to subdivide the hexagon.

Which isn't to say that you couldn't tile the planet with triangles, but they point out that the consistent relationship between neighboring hexagon tiles is useful:

>Using a hexagon as the cell shape is critical for H3. As depicted in Figure 6, hexagons have only one distance between a hexagon centerpoint and its neighbors’, compared to two distances for squares or three distances for triangles. This property greatly simplifies performing analysis and smoothing over gradients.

As you noticed, you can't divide a hexagon into 7 regular hexagons either. But it's apparently close enough:

>H3 supports sixteen resolutions. Each finer resolution has cells with one seventh the area of the coarser resolution. Hexagons cannot be perfectly subdivided into seven hexagons, so the finer cells are only approximately contained within a parent cell.

evrydayhustling 6 years ago | | |

Thanks! It turns out I am bad at hexagons.

flamtap 6 years ago | |

I wonder if it has something to do with the number of hexes it takes to cover a sphere with? It seems to me like the larger hexes/pentagons are mostly just illustrative borders.

pfortuny 6 years ago | |

They are not all hexagons on the top picture: there is no hexagonal polyhedron (with regular hexagons). You can see pentagons here and there.

the_seraphim 6 years ago | |

because you can't do that, that's not how shapes work.

Animats 6 years ago |

There's a lot to be said for a hexagonal grid. I looked at this once for a collision detection system. It's convenient to get rid of the special cases needed where four cells meet. You have to invent some new bookkeeping, but it can be worth it.

jacobolus 6 years ago | |

The aliasing / moiré artifacts are also dramatically less objectionable with hexagonal grids.

Now that image sensors and e.g. smartphone displays are very high resolution, and most content ends up going through multiple resampling routines on its way from capture -> storage/processing -> display, I would love to see people start to experiment with hexagon-grid cameras and displays. The slightly more complicated resampling wouldn’t really be a big deal for modern DSP hardware, and the visual output could be significantly better for the same pixel count.

bayesian_horse 6 years ago |

I love the aesthetics of hexagons.

It should be possible to turn this projection into quite a nice 3D printable model. Challenging, though. Probably one could try to split the sphere into equal hexagons for printing and then assemble.

jpab 6 years ago | |

You can't make a sphere entirely out of hexagons; the math doesn't work [1]. In the article they note that they include 12 pentagons too, to solve this problem. Neatly, they arrange to have all the pentagons in areas of water where they presumably won't have to analyse traffic patterns for a while.

(I mention it mostly because I think it's an interesting little mathematical factoid.)

[1] https://math.stackexchange.com/q/2121175

jacobolus 6 years ago | | |

One thing you can do though is tile an octahedron by just hexagons. At each of the 6 corners you end up with 2 hexagons which border each-other along 2 edges (instead of the usual 1). If you blow this octahedron up into a sphere those hexagons appear to be pentagons, because two of their edges are colinear (i.e. the same great-circle arc).

This can be nicer in some cases: the edge case your hexagon-grid algorithms have to deal with is having a hexagon with one of the same neighbors twice, instead of needing to worry about pentagons per se.

na85 6 years ago | | |

That link is a great example of why I stopped using stack exchange. The OP asked a question about tiling spheres with pentagons and the third response is basically "you can tile a sphere if it's not a sphere."

I had analogous experiences every time I asked a question there. One would ask a very clear question like, say, "how do I print to stdout in C?" And the first or second answer you get is inevitably about taking input from stdin. Or polymorphism.

starpilot 6 years ago |

Python bindings on Windows working yet?

seanhandley 6 years ago | |

David Ellis is working on this https://github.com/dfellis/h3-py/pull/1

jeanlucas 6 years ago |

Second time seeing it, would love to use it just for fun, but no idea what to do

peter_d_sherman 6 years ago |

Utterly Brilliant!

thstart 6 years ago |

H3 is working for the specific Ubers’ needs. Not much useful as S2.

kjeetgill 6 years ago | |

I wish you weren't downvoted, this is exactly right.

The nice thing about S2 is that is subdivides cleanly: A square can be composed of smaller squares while hexagons can't be. This property makes S2 much more broadly useful for a bigger range of applications.

In H3, each hierarchical level of hexagon doesn't fit cleanly in the one below. For Uber's uses, this is acceptable because hexagons have more uniform adjacency but the "zoom in and out" math is pretty gnarly.

But even S2 had the funkyness of first mapping a sphere to a cube. They're both fairly interesting to read up on.

mtrovo 6 years ago | |

Could you give more details? After reading the article I think the main reason is the subdivision algorithm which for S2 is easier but for everything else hexagon seems like a better approach.

HelloNurse 6 years ago | | |

Subdividing tiles can be more important than everything else: knowing in which tile a point lies is a fundamental operation for a spatial indexing system and it has to be very easy and efficient.

ggg3 6 years ago |

did not look at the code yet, but the algo for subdiving might be a nightmare.

square and triangle are weird for distance, maybe. but they can all hold precise subdivisions of the shape inside each other. hexagon "bleed", so why exactly they boast so many times in the article that h3 is so great for nesting different details levels? even their diagrams show very bad bleeding (worse than it should be if optimized) and they never mention it on this summary.

ThouYS 6 years ago |

Beautiful blog post!