H3: Uber’s Hexagonal Hierarchical Spatial Index (2018)(eng.uber.com) |
H3: Uber’s Hexagonal Hierarchical Spatial Index (2018)(eng.uber.com) |
What's the advantage of Uber's approach over any of this. Even my primitive geohash solution worked pretty nicely and you can implement it on just about any type of DB. I had a simple algorithm to cover any shape with geohashes of a certain size as well as a quick way to generate a polygon for a circle. The two combined allowed me to do radius searches for any shape overlapping or contained by the circle with simple terms queries on the geohash prefixes. My main headache was keeping the number of terms in a query (i.e. number of geohashes) to a reasonable size (below 1024, which if I recall was a the default limit).
What I get from their explanation is that hexagon is a better shape for map grids because they are the most complex shape that can tesselate (the other two are triangles and squares). As they are more close to a circle, distances within a cell are more stable, also computing the distance from a cell center to its neighbours is stable in hexagons as well.
I think the reason hex is not more common is that subdivisions are hard to create compared to triangles or squares. Uber solved this by subdividing in 7 smaller hex and tilting it so they cover the bigger shape with some small overlap.
Also a big problem is distortion, I never thought this would be that huge of a problem but it makes sense. They go into a lot of the details later on the same video.
[Penrose tiling intensifies]
But if you look at have they overlay London, you get quite a split higher up and two next to each other don’t look like they are and so I can see where the number of terms would get big.
The new ElasticSearch implementation is now the default and I think they are deprecating the prefix versions like geohash.
Too bad their stuff is not embeddable into an app. Know of lib for this?
Update with link: https://lucene.apache.org/core/7_1_0/core/org/apache/lucene/...
This post goes some way towards explaining why hexagons:
* "minimize the quantization error introduced when users move through a city"
* "allow us to approximate radiuses easily"
* non-hexagonal regions (such as postal code boundaries) "have unusual shapes and sizes which are not helpful for analysis, and are subject to change"
My favorite quote in the video
S2 cells distort quite heavily depending on which part of the globe you're mapping.
Then we can say if you need a package collecting at address X, there are usually Y available couriers within Z distance. Combine that with a road mapping/traffic API and you're done.
https://github.com/uber/h3/issues/258
(Note: by "massively" I mean pushing it to O(1) for the fundamental coordinate operations. So, perhaps read that as "most massively".)
Which is it - an inaccurate image, and the system is really hexagon-based, or does the system also use pentagons? My guess is that it is actually all hexagon-based since the overlap between granularity is already only approximate.
Which isn't to say that you couldn't tile the planet with triangles, but they point out that the consistent relationship between neighboring hexagon tiles is useful:
>Using a hexagon as the cell shape is critical for H3. As depicted in Figure 6, hexagons have only one distance between a hexagon centerpoint and its neighbors’, compared to two distances for squares or three distances for triangles. This property greatly simplifies performing analysis and smoothing over gradients.
As you noticed, you can't divide a hexagon into 7 regular hexagons either. But it's apparently close enough:
>H3 supports sixteen resolutions. Each finer resolution has cells with one seventh the area of the coarser resolution. Hexagons cannot be perfectly subdivided into seven hexagons, so the finer cells are only approximately contained within a parent cell.
Now that image sensors and e.g. smartphone displays are very high resolution, and most content ends up going through multiple resampling routines on its way from capture -> storage/processing -> display, I would love to see people start to experiment with hexagon-grid cameras and displays. The slightly more complicated resampling wouldn’t really be a big deal for modern DSP hardware, and the visual output could be significantly better for the same pixel count.
It should be possible to turn this projection into quite a nice 3D printable model. Challenging, though. Probably one could try to split the sphere into equal hexagons for printing and then assemble.
(I mention it mostly because I think it's an interesting little mathematical factoid.)
This can be nicer in some cases: the edge case your hexagon-grid algorithms have to deal with is having a hexagon with one of the same neighbors twice, instead of needing to worry about pentagons per se.
I had analogous experiences every time I asked a question there. One would ask a very clear question like, say, "how do I print to stdout in C?" And the first or second answer you get is inevitably about taking input from stdin. Or polymorphism.
The nice thing about S2 is that is subdivides cleanly: A square can be composed of smaller squares while hexagons can't be. This property makes S2 much more broadly useful for a bigger range of applications.
In H3, each hierarchical level of hexagon doesn't fit cleanly in the one below. For Uber's uses, this is acceptable because hexagons have more uniform adjacency but the "zoom in and out" math is pretty gnarly.
But even S2 had the funkyness of first mapping a sphere to a cube. They're both fairly interesting to read up on.
square and triangle are weird for distance, maybe. but they can all hold precise subdivisions of the shape inside each other. hexagon "bleed", so why exactly they boast so many times in the article that h3 is so great for nesting different details levels? even their diagrams show very bad bleeding (worse than it should be if optimized) and they never mention it on this summary.
Representation systems for geospatial data models is an amazingly deep theoretical rabbit hole. Common systems are almost always optimized for presentation as most were originally designed for cartographic use cases. If you were looking at representation systems optimized for fast, scalable geospatial analytics, for example, you'd use some type of 3-space embedding representation. There is a lot of diversity.
H3 also has efficient means for finding a cell’s neighbors, and comes with some nice algorithms - like the “compact” fill. See https://uber.github.io/h3/#/documentation/overview/use-cases
The hierarchical search data is built into a grid’s ID in both H3 and S2, which helps when comparing ID’s to see how close they are to each other.
For visualization, I prefer the way H3 looks over S2. While that’s just an opinion, H3 grid is exposed directly to drivers and in lots of tools.
I work for Uber and have used H3, but don’t work on the library.
Ah, yes this makes a lot of sense. The video in the link actually does have a nice comparison at 17:30, where this is called out. It seems to me to be the most compelling argument for hexagons.
I use this library every day and absolutely love it.
Bear in mind that H3 only has 16 levels of resolution, so traversing from top level down to the square-metre level is still a constant time operation i.e. O(16)
Unlike single-surface representations, these have the advantage of being essentially free of computational edge cases if you design them correctly. They are also amenable to implementations that are extremely computationally efficient to use, which is a bit of an afterthought for most presentation-optimized designs but important for high-scale geospatial analytics.
A common reflexive criticism of these representations is that they use equal volume sharding, which means that sharding them is not a good approximation of equal area on the embedded surface. An equal area decomposition only makes sense in the context of presentation (e.g. tiling) because the underlying data distribution is naturally extremely and unpredictably skewed, leading to non-uniform cell loading no matter how you decompose it. The assumption that equal area decomposition helps to ensure uniform cell loading is trivially false in practice, making it a non-optimization. Therefore, any competent implementation always requires a separate mechanism for ensuring uniform loading independent of the decomposition model.
The term of art for all of this is discrete global grid systems (commonly "DGGS"). The vast majority of the literature is focused on presentation optimized systems, and the design of single-surface representations, but other types of representations are discussed. It has a very rich taxonomy. I have an article I've been sporadically writing which I should probably finish that steps through the design of a state-of-the-art 3-space embedding representation system for scale-out analytics, based on a (currently stalled) effort to produce a formal standard for industry. A good 3-space embedding has a relatively simple description and implementation but there is much technical subtlety as to why it is designed a specific way.
Or another way to say this: if you start with 4 hexagons, with each glued together with each other along two adjacent edges, and you add the appropriate folds, you can make an octahedron.
Then you can subdivide each of those starting hexagons into n hexagons for any of these numbers, https://oeis.org/A003136 (power-of-4 sizes may be the most convenient among these, so that the overall grid has 2^n by 2^n size)
based on what criteria? H3 vs S2 is a tradeoff.
Well for one, the fact that they don't nest properly. They mention that you can get the address of the parent hexagon by truncating that of the current one. The problem is that doesn't actually work all the way up because certain tiles on the border will flip back and forth in strange ways. They had to do something to deal with this, just like the would have had to do something else with squares having two kinds ot neighbors. The squares issues are more obvious though and the code to deal with it will probably be very localized.
I mean, don't make a nested spatial index with spacial regions that don't actually nest. The structure fails at its primary claim.