Preliminary results and methods
Oxford Programme for Sustainable Infrastructure Systems (OPSIS)
Wrangling of economic activity datasets (DOSE, Wenz et al. (2023))
Data projection onto the H3 hierarchical indexed grid
Developing the right assumptions about the nature of the observed variables
Increasing the resolution under these assumptions, using high-resolution proxy layers
Implementation of a workflow covering this process: the scale-nav package
Resolution - spatial precision of a value, usually an integer from 0 up to a maximum that depends on the grid system (15 in the case of H3, 30 in the case of S2); a greater value corresponds to a finer resolution, in other words greater spatial precision, while lower values are coarser resolutions.
Cell size - the area of a grid cell; it depends on the resolution: a greater resolution means a smaller cell size, and vice versa.
Cell radius/diameter - in the case of hexagonal cells, analogous to circle radius/diameter.
ex: on the H3 grid at resolution 10, the average cell area is about 0.015 km², which corresponds to a cell diameter of roughly 150 m.
Downscaling - increasing the spatial resolution, reducing the size of cells we work with.
Grid - set of cells at a specific resolution
A value observed across a continuous area of interest can be projected onto a discrete grid, with a certain loss of information.
A grid is a discrete representation of an area of interest, which is generally continuous; a greater resolution (smaller cells) therefore leads to more accurate values.
Interpolation is a useful technique to infer intermediate values.
Downscaling, similarly to interpolation, allows inferring values at a finer resolution than what is originally available in the data.
Hierarchical index - an indexing method that consistently covers various resolutions.
Parent, children - the relation between a cell at a lower resolution (the parent) and the higher-resolution cells it contains (its children).
The hierarchical indexation captures both the spatial proximity of cells, and their relations across scales (the hierarchy).
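To make the parent/children relations concrete, here is a minimal sketch of a toy hierarchical index. It is not H3: it uses a quadtree with string keys ("quadkeys", aperture 4), whereas H3 uses hexagons with aperture 7, where children are only approximately contained in their parent. The function names are hypothetical and only illustrate how a shared key prefix encodes both spatial nesting and the hierarchy across scales.

```python
# Hypothetical toy hierarchical index: quadtree "quadkeys" (aperture 4).
# H3 uses hexagons with aperture 7; this only illustrates the parent/children logic.


def parent(key: str) -> str:
    """Parent cell: drop the last digit (one resolution coarser)."""
    if not key:
        raise ValueError("the root cell has no parent")
    return key[:-1]


def children(key: str) -> list[str]:
    """Children cells: append one digit 0-3 (one resolution finer)."""
    return [key + d for d in "0123"]


def resolution(key: str) -> int:
    """The resolution of a cell is the number of digits in its key."""
    return len(key)


def is_descendant(key: str, ancestor: str) -> bool:
    """A cell descends from another iff the ancestor key is a strict prefix."""
    return key.startswith(ancestor) and len(key) > len(ancestor)


cell = "0213"
print(resolution(cell), parent(cell), children(cell))
```

A cell and its siblings share the parent's prefix, so grouping by a truncated key is enough to aggregate values back to a coarser resolution.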
pycnophylactic - mass preserving
We use an enhanced DOSE dataset in which a number of gaps have been filled with national-level World Development Indicators data.
Developed by Uber
Consists of deriving the finer-scale values from the values known at the given (lower) resolution, distributed according to a known finer-scale distribution; in the trivial case, the parent value is split evenly (the average). This method ensures that the change of scale is mass preserving (pycnophylactic).
\[ W(H_i) = \sum_{j=0}^{6} w(h_{ij}) \] The children values add up to the parent value.
\[ w(h_{ij}) = g_W(h_{ij})W(H_i) \] where \(g_W\), with \(\sum_j g_W(h_{ij}) = 1\), is a density distribution associated with the variable \(W\) inside location \(H_i\) at the higher resolution, which can be taken from external proxy data. In the trivial case, \(g_W\equiv \frac{1}{N_c}\), with \(N_c = 7\) the number of children.
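The extensive downscaling rule above can be sketched in a few lines of Python. This is a minimal illustration, not the scale-nav implementation: the function name and the made-up proxy values are assumptions, and the proxy (for instance a built-up surface per child cell) is assumed to be non-negative. The proxy is normalised into \(g_W\), and the trivial even split is used when the proxy carries no information.

```python
def downscale_extensive(parent_value: float, proxy: list[float]) -> list[float]:
    """Split an extensive parent value W(H_i) among its children.

    Each child receives g_W(h_ij) * W(H_i), where g_W is the (assumed
    non-negative) proxy normalised to sum to 1.
    """
    n_c = len(proxy)
    if n_c == 0:
        raise ValueError("a parent cell needs at least one child")
    total = sum(proxy)
    if total <= 0:
        # Trivial case: no usable proxy, g_W = 1 / N_c (even split).
        return [parent_value / n_c] * n_c
    # Normalise the proxy into a density g_W that sums to 1.
    g_w = [p / total for p in proxy]
    return [g * parent_value for g in g_w]


# Usage: one parent with 7 children (as for an H3 hexagon); proxy values are made up.
w = downscale_extensive(21.0, [5.0, 5.0, 0.0, 2.0, 5.0, 5.0, 5.0])
print(w, sum(w))  # sum(w) matches 21.0 up to floating-point rounding
```

Because the weights sum to 1, the children always add back up to the parent value, which is the pycnophylactic property.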
For an intensive variable (a ratio), the parent value is obtained from the associated extensive variables \(v, q\): \[ W(H_i)= \frac{\sum_jv_{ij}}{\sum_jq_{ij}} \]
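A short sketch of why the intensive aggregation uses a ratio of sums rather than an average of the children ratios. The function name and the numbers (GDP \(v\) and population \(q\) in two child cells) are made up for illustration.

```python
def aggregate_intensive(v: list[float], q: list[float]) -> float:
    """Upscale an intensive variable W = V / Q from its children.

    Implements W(H_i) = sum_j v_ij / sum_j q_ij: a ratio of sums of the
    associated extensive variables, not the mean of the children ratios.
    """
    return sum(v) / sum(q)


# Made-up example: GDP v (USD) and population q in two child cells.
v = [200.0, 800.0]
q = [10.0, 90.0]
ratio_of_sums = aggregate_intensive(v, q)
mean_of_ratios = sum(vi / qi for vi, qi in zip(v, q)) / len(v)
print(ratio_of_sums, mean_of_ratios)  # the naive mean overweights the sparsely populated cell
```

Here the parent GDP per capita is 1000 / 100 = 10, while naively averaging the two children ratios gives about 14.4.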
\[ w(h_{ij}) = W(H_i) \]
in the trivial case, i.e. each child inherits the parent value. A more advanced approach is to downscale the associated extensive variables separately: if \(W(H_i)=\frac{V(H_i)}{Q(H_i)}\), then \(w(h_{ij})=\frac{g_V(h_{ij})V(H_i)}{g_Q(h_{ij})Q(H_i)}\). Example: GDP per capita, with \(V\) the GDP and \(Q\) the population.
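The component-wise approach for an intensive variable can be sketched as follows, under the assumption that \(g_V\) and \(g_Q\) are already available as densities over the children (each summing to 1). The function name and the example densities are hypothetical. Children with no \(Q\) (e.g. no population) get NaN, since the ratio is undefined there.

```python
import math


def downscale_intensive(V: float, Q: float, g_v: list[float], g_q: list[float]) -> list[float]:
    """Downscale W = V / Q by downscaling V and Q separately.

    w(h_ij) = g_V(h_ij) V(H_i) / (g_Q(h_ij) Q(H_i)).
    Children where the downscaled Q is zero get NaN (undefined ratio).
    """
    if len(g_v) != len(g_q):
        raise ValueError("g_v and g_q must cover the same children")
    out = []
    for gv, gq in zip(g_v, g_q):
        q_child = gq * Q
        out.append(gv * V / q_child if q_child > 0 else math.nan)
    return out


# Made-up example: GDP V = 1000 USD and population Q = 100 in the parent,
# with hypothetical densities over three children for brevity.
g_gdp = [0.5, 0.3, 0.2]
g_pop = [0.25, 0.25, 0.5]
w = downscale_intensive(1000.0, 100.0, g_gdp, g_pop)
print(w)  # GDP per capita in each child
```

With the trivial densities \(g_V = g_Q = 1/N_c\), every child gets \(V/Q\), which recovers the trivial case \(w(h_{ij}) = W(H_i)\).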
A parent cell \(H_i^{(n)}\) with resolution \(n\) contains a set of children \(\{h^{(n+1)}_{ij}\}_{j=0,\dots,6}\) with resolution \(n+1\). A value \(W(H_i^{(n)})\) measured at the parent is downscaled into the children values \(w(h^{(n+1)}_{ij})\).
For an extensive variable, if \(W(H^{(n)}_i)=21\), we can downscale into \[w(h_{ij})=\frac{W(H^{(n)}_i)}{N_c}=\frac{21}{7}=3\] for all \(j\).
If we have a known underlying distribution at a finer scale, say \(g_W(h_{ij}) = 18/105\) for \(j\in\{0,1,4,5,6\}\), \(g_W(h_{i2})=1/21\) and \(g_W(h_{i3})=2/21\) (which sum to 1), we incorporate it into the equation and get:
\[ w(h_{ij})=g_W(h_{ij})W(H_i) = \begin{cases} 3.6, & \text{for } j=0,1,4,5,6 \\ 1, & \text{for } j=2 \\ 2, & \text{for } j=3 \end{cases} \] for each child cell.
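The worked example can be checked with exact rational arithmetic. This sketch simply reproduces the numbers above using Python's standard `fractions` module, confirming that \(g_W\) is a proper density and that the children add back up to 21.

```python
from fractions import Fraction

W = Fraction(21)  # extensive value at the parent cell H_i
# Known fine-scale distribution g_W from the example (children j = 0..6).
g = {j: Fraction(18, 105) for j in (0, 1, 4, 5, 6)}
g[2] = Fraction(1, 21)
g[3] = Fraction(2, 21)

assert sum(g.values()) == 1           # g_W is a proper density over the 7 children
w = {j: g[j] * W for j in sorted(g)}  # w(h_ij) = g_W(h_ij) W(H_i)
assert sum(w.values()) == W           # mass preserving: the children add up to 21
print({j: float(x) for j, x in w.items()})
# → {0: 3.6, 1: 3.6, 2: 1.0, 3: 2.0, 4: 3.6, 5: 3.6, 6: 3.6}
```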
At a given administrative level (Admin0, Admin1), we know the GDP of an area (in USD) for given sectors. We also know the amount of non-residential infrastructure in the region at a fine resolution of 100 m × 100 m (GHSL Data Package 2023, 2023). We assume a typical GDP generated per unit of infrastructure and rescale to obtain a density layer of infrastructure for the given area, which we then combine with the known economic output at the coarser scale to downscale it.
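A minimal sketch of this region-level downscaling, assuming a uniform GDP per unit of non-residential built-up surface within each region. The region ids, cell ids, surfaces and GDP values are made up, and the even-split fallback for regions with no built-up surface is an assumed design choice that keeps each regional total unchanged.

```python
from collections import defaultdict


def downscale_by_region(region_value: dict[str, float],
                        cells: list[tuple[str, str, float]]) -> dict[str, float]:
    """Distribute a regional extensive value (e.g. sector GDP in USD) to fine cells.

    cells holds (cell_id, region_id, proxy) triples, the proxy being the
    non-residential built-up surface of the cell. Each cell receives
    region_value * proxy / (total proxy of its region); regions without any
    built-up surface fall back to an even split over their cells.
    """
    proxy_total: dict[str, float] = defaultdict(float)
    n_cells: dict[str, int] = defaultdict(int)
    for _, region, proxy in cells:
        proxy_total[region] += proxy
        n_cells[region] += 1
    out = {}
    for cell, region, proxy in cells:
        total = proxy_total[region]
        share = proxy / total if total > 0 else 1 / n_cells[region]
        out[cell] = region_value[region] * share
    return out


# Hypothetical regions "A", "B" and cell ids; built-up surfaces in m².
gdp = {"A": 1000.0, "B": 500.0}
cells = [("a1", "A", 3000.0), ("a2", "A", 1000.0), ("b1", "B", 0.0), ("b2", "B", 0.0)]
print(downscale_by_region(gdp, cells))
# → {'a1': 750.0, 'a2': 250.0, 'b1': 250.0, 'b2': 250.0}
```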
We have a raster grid of agricultural yields from MAPSPAM (“Spatially-Disaggregated Crop Production Statistics Data in Africa South of the Sahara for 2017,” n.d.). The raster grid cell size is \(10\,\text{km}\times 10\,\text{km}\). We can try to downscale it while taking into account the human settlements covered by the grid, using the GHSL total built-up surface. By removing the cells that are covered by infrastructure, we downscale the agriculture layer under a constraint on where crops are actually likely to be cultivated.
In both cases, we rely on high-resolution proxy/constraint layers and low-resolution data layers.
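The masking constraint can be sketched as below. Several assumptions are made here: the quantity being redistributed is extensive (e.g. production or harvested area; a yield, being a ratio, would go through its extensive components as in the intensive case), the 0.5 built-up threshold is hypothetical, and the even-split fallback when every fine cell is masked is an assumed choice that preserves the coarse total.

```python
def downscale_with_mask(coarse_value: float, built_up_share: list[float],
                        threshold: float = 0.5) -> list[float]:
    """Redistribute a coarse agricultural quantity over its fine cells.

    built_up_share[k] is the fraction (0..1) of fine cell k covered by built-up
    surface. Cells at or above the hypothetical threshold are treated as
    non-cultivable (weight 0); the others are weighted by their non-built-up
    fraction. If every cell is masked, fall back to an even split so the
    coarse total is still preserved.
    """
    weights = [0.0 if s >= threshold else 1.0 - s for s in built_up_share]
    total = sum(weights)
    if total == 0:
        return [coarse_value / len(weights)] * len(weights)
    return [coarse_value * w / total for w in weights]


# Made-up example: 100 t of production in one coarse cell, four fine cells.
fine = downscale_with_mask(100.0, [0.0, 0.2, 0.9, 0.6])
print(fine)  # the two heavily built-up cells receive nothing
```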
From low resolution (parent cells) to high resolution (descendant cells)
From high resolution (descendant cells) to lower resolution (parent cells)
There are many challenges, both in the conceptual understanding of what is going on and in the right choice of base data layers, constraints, etc.
Question for the audience: what are good proxy and constraint data layers?
There is also the technical implementation, as the number of cells grows exponentially with each level of downscaling (about sevenfold per H3 resolution step), and all the data sizes with it.
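A quick back-of-the-envelope sketch of that growth. It relies on the H3 structure of 122 base cells (110 hexagons and 12 pentagons), where each hexagon has 7 children and each pentagon 6, which gives \(2 + 120 \cdot 7^r\) cells at resolution \(r\). The 16 bytes per cell (a 64-bit index plus one float64 value, with no storage overhead or compression) is a hypothetical figure chosen only to give an order of magnitude.

```python
BYTES_PER_CELL = 16  # hypothetical: 64-bit cell index + one float64 value, no overhead


def h3_cell_count(res: int) -> int:
    """Number of H3 cells covering the globe at resolution `res`."""
    return 2 + 120 * 7 ** res


def descendants(levels: int) -> int:
    """Descendants of one hexagonal cell after `levels` resolution steps."""
    return 7 ** levels


for res in range(6, 11):
    n = h3_cell_count(res)
    print(f"res {res:2d}: {n:>16,d} cells ~ {n * BYTES_PER_CELL / 1e9:,.1f} GB global")
```

Downscaling a single cell by three resolution levels already multiplies its row count by 343, which is why an out-of-core engine is attractive here.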
We currently use DuckDB with Ibis and their geospatial extensions.
Tobler (1979)
pycnophylactic: mass preserving
Repository with notebooks visualizing, cleaning and combining various data sets.
Roudier et al. (2017)
Bürger et al. (2012); Giuliani et al. (2022); Malone et al. (2012); Vrac et al. (2007); Schoof (2013); Frías et al. (2006); Murakami and Yamagata (2019); Khan, Coulibaly, and Dibike (2006); Ekström, Grose, and Whetton (2015)
Gonzalez (2022)
Python package: dl4ds
source: https://carlos-gg.github.io/dl4ds/dl4ds.html
R
pyconphy: pycnophylactic interpolation
dissever
ClimDown