Global spatial indexes

Oxford Progamme for Sustainable Infrastructure Systems (OPSIS)

Intro slide

What are we talking about ?

Geospatial indexes: discrete global grid systems consisting of a multi-precision tiling of the sphere (plane) with hierarchical indexes.

Wide and evolving topic

This presentation will cover the main spatial indexes that are out there. Usage of one of them could potentially increase efficiency of workflows, especially when working with multiple layers of large (global) data sets from different sources and at various resolutions. Standardizing the data layers other a single index can have a positive impact when sharing data sets included in the various packages. Spatial data is a lot faster to query and analyse when properly indexed, potentially allowing for great performance gains on the server and client sides of APIs.

Context

While data base systems rely usually on a set of slightly different indexing techniques that adapt to the data provided, such as B-Trees, R-Trees and their many derivatives, there has been a strong significant development of tools that are more adapted for a general purpose data science workflow, providing a predefined (in-memory) subdivision system that can serve as a basis from which any other spatial data set can be expressed. On top of allowing very efficient querying and analysis, it also constitutes a form of representation of the data. This document reviews some of the most famous and widely adopted ones.

HTM

Formalised in Szalay et al. (n.d.)

HTM grid levels 0 to 5

Geohash

Developed by Gustavo Niemeyer, 2008

  • de facto standard on many systems (postGIS)

  • open-source

  • OpenStreetMap shortlink predecessor

  • space-filling Morton curve along which indexation happens

Grid example with Morton curve

Geohash

More material

Similar systems

S2

Developed by Google

  • original library in C++/Java

  • open source

  • bindings to many other languages/systems

  • each cell is a quadrilateral bounded by four geodesics, see Figure 1

    • in other words, a square on a sphere
  • levels from 0 to 30

    • Every \(cm^2\) on Earth can be represented using a 64-bit integer
  • cells are ordered sequentially along a space filling curve, see Figure 2

Figure 1: Grid cells
Figure 2: Space filling curve

S2

More material

H3

Developped by Uber

  • original library in C
  • open-source
  • bindings to many other languages/systems
  • hexagonal shapes
  • hierarchical (see Figure 3)
  • Theory: (Sahr 2014)

H3

  • edge indexation (useful to model flows for example), nearest neighbours

  • node indexation

  • the nature of the projection imposes that a constant number of 12 pentagons is present in the index at each level.

More material

Similar systems

  • Placekey: built on top of H3 with additional information on the type of feature being encoded, relevant for infrastructure mapping for example.

More online material

Others

Mainly legacy ones

Application

Example H3

Find all the locations of amenities along roads in a part of Tanzania.

We pick a convenient grid resolution, for example level py h3_level=8.

To get the data, follow the process explained in this doc, for example.

import duckdb as db
import geopandas as gpd
import h3

# pois
pois = gpd.read_parquet("tanzania_pois.geoparquet")

# roads
roads = gpd.read_parquet("tanzania_roads.geoparquet")

Pick a resolution

# h3 resolution
res=8

Projecting on the H3 grid

  • The idea is to map things to the H3 grid, to then express them as sets of h3 cells.
  • The nature of the index should significantly speed up computations when doing all sorts of geometric operations

Getting the h3 index of all the road nodes:

coords = roads_bb.get_coordinates()

coords_h3 = coords.apply(lambda row: h3.latlng_to_cell(row[1],row[0],res), axis=1)

Filling the gaps

Computing cells along shortest paths between nodes (approximating all the edges)

# get the list of h3 indexes for a linestring and return the full set of hex codes it touches
def path_from_h3(h3_list):
    return flatten(map(h3.grid_path_cells,h3_list[:-1],h3_list[1:]))

h3_paths = coords_h3.groupby(level=0).apply(lambda x: path_from_h3(x))

We can now pass this to a deck.gl map with the pydeck package.

Mapping

References

Sahr, Kevin M. 2014. “Central Place Indexing: Optimal Location Representation for Digital Earth.”
Szalay, Alexander S., Jim Gray, George Fekete, Peter Z. Kunszt, Peter Kukol, and Ani Thakar. n.d. “Indexing the Sphere with the Hierarchical Triangular Mesh.” https://doi.org/10.48550/arXiv.cs/0701164.