Interpolating global economic activity data

Preliminary results and methods

Oxford Programme for Sustainable Infrastructure Systems (OPSIS)

Introduction

Looking at economic activity at a finer scale than the current data allow

  • Wrangling of economic activity datasets (DOSE, Wenz et al. (2023))

  • Data projection on the H3 hierarchical indexed grid

  • Developing the right assumptions about the nature of the observed variables

  • Increasing the resolution under these assumptions, using high-resolution proxy layers

  • Implementation of a workflow covering this process: package scale-nav

Vocabulary note

Resolution - spatial precision of a value, usually an integer between 0 and some maximum (15 in the case of H3, 30 in the case of S2). A greater value corresponds to a finer resolution, in other words greater spatial precision; lower values are coarser resolutions.

Cell size - the area of a grid cell. It depends on the resolution: a greater resolution means a smaller cell size, and conversely.

Cell radius/diameter - in the case of hexagonal cells, analogous to circle radius/diameter.

e.g. on the H3 grid at resolution 10, the cell diameter is roughly 150 m (average edge length of about 75 m).

Downscaling - increasing the spatial resolution, reducing the size of cells we work with.

Grid - set of cells at a specific resolution

A value observed across a continuous area of interest can be projected onto a discrete grid, with a certain loss of information.

A grid is a discrete representation of an area of interest, which is generally continuous; a greater resolution (smaller cell size) therefore leads to more accurate values.

Interpolation is a useful technique to infer intermediate values.

Downscaling, similarly to interpolation, allows inferring values at a finer resolution than what is originally available in the data.

Hierarchical index - an indexing method that consistently covers several resolutions.

Parent, children - relation between a cell at a lower resolution (the parent) and the cells at a higher resolution that it contains (its children).

The hierarchical indexation captures both the spatial proximity of cells, and their relations across scales (the hierarchy).

pycnophylactic - mass preserving

Input data

We use an enhanced DOSE dataset, in which a number of gaps have been filled with national-level World Development Indicators data.

Hierarchical indexed grids

H3

Developed by Uber

  • original library in C
  • open-source
  • bindings to many other languages/systems
  • hexagonal shapes
  • hierarchical (see Figure 1)
  • Theory: (Sahr 2014)
  • edge indexation (useful to model flows for example), nearest neighbours
  • node indexation
  • the nature of the projection means that exactly 12 pentagonal cells are present in the index at every resolution.

Example : Interpolation

Example : downscaling

Data transformation

Rescaling approach

Consists in updating the finer-scale values from a known distribution, given the value at the coarser (lower) resolution. In the trivial case, a uniform distribution (the average) is used. This method ensures that the scale change is mass preserving (pycnophylactic).

Decreasing resolution

Increasing resolution

Extensive case

\[ W(H_i) = \sum_{j=0}^{6} w(h_{ij}) \] The values of the children add up to the value of the parent.

\[ w(h_{ij}) = g_W(h_{ij})W(H_i) \] where \(g_W\), with \(\sum_j g_W(h_{ij}) = 1\), is a density distribution associated with the variable \(W\) inside location \(H_i\) at the higher resolution; it can be taken from external proxy data. In the trivial case, \(g_W\equiv \frac{1}{N_c}\), where \(N_c\) is the number of children of \(H_i\) (7 here).
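
Mass preservation follows directly from the normalisation of \(g_W\). Spelled out with the symbols defined above:

\[ \sum_{j} w(h_{ij}) = \sum_{j} g_W(h_{ij})\,W(H_i) = W(H_i)\sum_{j} g_W(h_{ij}) = W(H_i) \]

Any choice of proxy is therefore pycnophylactic, as long as its shares are normalised within each parent cell.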

Intensive case

\[ W(H_i)= \frac{\sum_jv_{ij}}{\sum_jq_{ij}} \]

using the associated extensive variables \(v,q\).

\[ w(h_{ij}) = W(H_i) \]

in the trivial case. A more advanced approach is to consider the associated extensive variables. If \(W_{i}=\frac{V_{i}}{Q_{i}}\), then \(w_{ij}=\frac{V_i g_{V}(h_{ij})}{Q_ig_Q(h_{ij})}\). Example: GDP per capita, with \(V\) the GDP and \(Q\) the population.
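
A minimal Python sketch of the intensive case, assuming the two extensive variables and their shares are available as plain lists. The function name, the shares and the numbers are illustrative only; they are not taken from scale-nav.

```python
def downscale_intensive(V, Q, g_V, g_Q):
    """Downscale W = V / Q to child cells via its extensive parts.

    V, Q     : parent totals of the two extensive variables (e.g. GDP, population)
    g_V, g_Q : per-child shares of V and Q, each summing to 1 (hypothetical proxies)
    """
    if abs(sum(g_V) - 1) > 1e-9 or abs(sum(g_Q) - 1) > 1e-9:
        raise ValueError("shares must sum to 1")
    v = [V * g for g in g_V]   # child values of the numerator
    q = [Q * g for g in g_Q]   # child values of the denominator
    # The ratio is undefined where the denominator is zero (e.g. no population).
    w = [vj / qj if qj else float("nan") for vj, qj in zip(v, q)]
    return w, v, q


# Illustrative numbers only: GDP of 700 and a population of 70 in the parent.
g_V = [0.4, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]   # GDP concentrated in child 0
g_Q = [1 / 7] * 7                            # population spread uniformly
w, v, q = downscale_intensive(700.0, 70.0, g_V, g_Q)

# Aggregating the children with the intensive formula recovers W = V / Q.
parent_again = sum(v) / sum(q)
```

Here child 0 ends up with a value of about 28 and the others with about 7, while the aggregated ratio stays at 10. The intensive values themselves do not add up; only their extensive parts do.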

In practice : simple example

A parent cell \(H_i^{(n)}\) with resolution \(n\) contains a set of children \(\{h^{(n+1)}_{ij}\}_{j=0,\dots,6}\) with resolution \(n+1\). A value \(W(H)\) measured at \(H_i^{(n)}\) is downscaled into values \(w(h)\) on the children.

Trivial rescaled case

For an extensive variable, if \(W(H_i^{(n)})=21\), we can downscale into \[w(h_{ij})=\frac{W(H_i^{(n)})}{N_c}=\frac{21}{7}=3\] for all \(j\).

General rescaled case

If we have a known underlying distribution at a finer scale, say \(g_W(h_{ij}) = 18/105\) for \(j\in\{0,1,4,5,6\}\), and \(g_W(h_{i2})=1/21\), \(g_W(h_{i3})=2/21\), we incorporate this data into the equation and get:

\[ w(h_{ij})=g_W(h_{ij})W(H_i) = \begin{cases} 3.6,& \text{for } j=0,1,4,5,6 \\ 1, & \text{for } j=2\\ 2, & \text{for } j=3\\ \end{cases} \] for each child cell.
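
The two rescaled cases can be checked with a few lines of Python. This is only a sketch: the function name is ours, and exact fractions are used so that the shares sum to exactly 1.

```python
from fractions import Fraction


def downscale_extensive(parent_value, shares=None, n_children=7):
    """Split an extensive parent value among its children.

    shares: one share g_W(h_ij) per child, summing to 1.
            If None, the trivial uniform share 1 / n_children is used.
    """
    if shares is None:
        shares = [Fraction(1, n_children)] * n_children
    if sum(shares) != 1:
        raise ValueError("shares must sum to 1 to preserve mass")
    return [g * parent_value for g in shares]


# Trivial case: W(H) = 21 over 7 children.
uniform = downscale_extensive(21)

# General case with the shares used above (children j = 0..6).
g = [Fraction(18, 105)] * 2 + [Fraction(1, 21), Fraction(2, 21)] + [Fraction(18, 105)] * 3
general = downscale_extensive(21, g)

print([float(x) for x in uniform])   # [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
print([float(x) for x in general])   # [3.6, 3.6, 1.0, 2.0, 3.6, 3.6, 3.6]
print(sum(general))                  # 21
```

In both cases the children sum back to 21, which is the pycnophylactic property.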

source : https://h3geo.org/

In practice : GDP + Non-res buildings data

For a certain administrative level (Admin0, Admin1), we know the GDP of an area (in USD) for given sectors. We also know the amount of non-residential infrastructure in the region at a fine resolution of 100 m × 100 m (GHSL data package 2023. 2023). We assume a typical value of GDP generated per unit of infrastructure and rescale to obtain a density layer of infrastructure for the given area. We then combine this layer with the known economic output at the coarser scale to downscale it.
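
A sketch of this step in Python, under the assumption that GDP per unit of built surface is constant within the region, so that the constant cancels out when the proxy is normalised. The function name, the built surfaces and the GDP figure are made up for illustration, and the uniform fallback for an empty proxy is our own choice.

```python
def shares_from_proxy(proxy):
    """Turn a raw, non-negative proxy layer into shares that sum to 1.

    Falls back to uniform shares when the proxy is zero everywhere, so
    that the parent value is still fully distributed (an assumption).
    """
    if any(p < 0 for p in proxy):
        raise ValueError("proxy values must be non-negative")
    total = sum(proxy)
    if total == 0:
        return [1 / len(proxy)] * len(proxy)
    return [p / total for p in proxy]


# Hypothetical non-residential built surface (m^2) in the 7 children of one cell.
built_m2 = [0.0, 1200.0, 300.0, 0.0, 4500.0, 0.0, 0.0]
regional_gdp = 12_000_000.0   # hypothetical sector GDP of the parent, in USD

shares = shares_from_proxy(built_m2)
gdp_cells = [regional_gdp * s for s in shares]
```

With these numbers, three quarters of the regional GDP lands in the child holding 4500 of the 6000 m² of built surface, and children without any non-residential building receive nothing.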

Further

In Practice 2: Agriculture yield + constraint layer

We have a raster grid with agricultural yields from MAPSPAM (“Spatially-Disaggregated Crop Production Statistics Data in Africa South of the Sahara for 2017,” n.d.). The raster grid cell size is \(10\,\mathrm{km}\times 10\,\mathrm{km}\). We can try to downscale this layer while taking into account the human settlements covered by the grid, using the GHSL total built-up surface. By removing the cells that are covered by infrastructure, we downscale the agriculture layer under a constraint on where crops are in fact more likely to be cultivated.
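
A possible sketch of such a constraint in Python. It treats the downscaled quantity as extensive (crop production rather than yield per hectare, which would need the intensive treatment), and the threshold `max_built`, the function name and the numbers are assumptions for illustration.

```python
def downscale_with_constraint(parent_value, built_fraction, max_built=0.5):
    """Distribute an extensive value uniformly over unconstrained children.

    built_fraction: share of each child cell covered by built-up surface (0..1).
    max_built     : hypothetical threshold above which a child is excluded.
    """
    allowed = [f <= max_built for f in built_fraction]
    n_allowed = sum(allowed)
    if n_allowed == 0:
        raise ValueError("every child is excluded by the constraint")
    return [parent_value / n_allowed if ok else 0.0 for ok in allowed]


# Hypothetical built-up fractions for the 7 children of one crop-production cell.
built = [0.9, 0.1, 0.0, 0.7, 0.2, 0.0, 0.05]
production = downscale_with_constraint(100.0, built)
print(production)   # [0.0, 20.0, 20.0, 0.0, 20.0, 20.0, 20.0]
```

The two mostly built-up children are removed and the parent total is shared among the remaining five, so the total is still preserved.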

Two directions

In any case, we rely on high-resolution proxy/constraint layers and low-resolution data layers.

Top-down

From low resolution (parent cells) to high resolution (descendant cells)

  • A low resolution data layer is downscaled into high resolution with the use of appropriate high resolution proxy layers.

Bottom-up

From high resolution (descendant cells) to lower resolution (parent cells)

  • A high resolution proxy/constraint layer is upscaled to aggregate it at a lower resolution.
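
The bottom-up direction amounts to a group-by on the parent cell. A minimal sketch for an extensive layer, with hypothetical cell identifiers and a hand-written child-to-parent lookup standing in for the hierarchical index:

```python
from collections import defaultdict


def upscale_extensive(child_values, parent_of):
    """Sum child values into their parent cells (mass preserving)."""
    totals = defaultdict(float)
    for cell, value in child_values.items():
        totals[parent_of[cell]] += value
    return dict(totals)


# Hypothetical cell identifiers; in practice the parent would come from the
# hierarchical index itself rather than from a hand-written lookup table.
parent_of = {"a0": "A", "a1": "A", "a2": "A", "b0": "B", "b1": "B"}
built_m2 = {"a0": 10.0, "a1": 0.0, "a2": 5.0, "b0": 2.5, "b1": 7.5}

coarse = upscale_extensive(built_m2, parent_of)
print(coarse)   # {'A': 15.0, 'B': 10.0}
```

An intensive layer would instead be aggregated through its extensive numerator and denominator, each summed in this way.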

Workflow

Conclusion

Challenges

There are many challenges, both in the conceptual understanding of what is going on and in the right choice of base data layers, constraints, etc.

Question for the audience: what are good proxy and constraint data layers?

There are also challenges in the technical implementation, as the number of cells grows exponentially with each level of downscaling, and the data sizes with it.
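
To give a sense of scale: the global number of H3 cells at resolution \(r\) is \(2 + 120 \cdot 7^r\), so each extra level multiplies the cell count by roughly 7. A small sketch tabulating this growth (the function name is ours):

```python
def h3_cell_count(resolution):
    """Number of H3 cells covering the globe at a given resolution (0 to 15)."""
    if not 0 <= resolution <= 15:
        raise ValueError("H3 resolutions range from 0 to 15")
    return 2 + 120 * 7 ** resolution


for res in (0, 5, 10):
    print(res, f"{h3_cell_count(res):,}")
# 0 122
# 5 2,016,842
# 10 33,897,029,882
```

In practice only the cells covering the area of interest are materialised, but the same factor of about 7 per level applies to any region.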

Currently using DuckDB with Ibis and their geospatial extensions.

Next steps

  • Robust workflows to project all sorts of spatial data onto the grid with the pycnophylactic principle.

Literature

Tobler (1979)

pycnophylactic : mass preserving

Online

Spatial economics data: ischlo/global-econ-data

Repository with notebooks visualizing, cleaning and combining various data sets.

Literature

Roudier et al. (2017)

Bürger et al. (2012);

Giuliani et al. (2022);

Malone et al. (2012);

Vrac et al. (2007);

Schoof (2013);

Frías et al. (2006);

Murakami and Yamagata (2019);

Khan, Coulibaly, and Dibike (2006);

Ekström, Grose, and Whetton (2015)

Literature

Contemporary methods: DL

Gonzalez (2022)

python package: dl4ds

source : https://carlos-gg.github.io/dl4ds/dl4ds.html

Online material

Packages

R

References

Bürger, G., T. Q. Murdock, A. T. Werner, S. R. Sobie, and A. J. Cannon. 2012. “Downscaling Extremes: An Intercomparison of Multiple Statistical Methods for Present Climate.” Journal of Climate 25 (12): 4366–88. https://doi.org/10.1175/JCLI-D-11-00408.1.
Ekström, Marie, Michael R Grose, and Penny H Whetton. 2015. “An Appraisal of Downscaling Methods Used in Climate Change Research.” WIREs Climate Change 6 (3): 301–19. https://doi.org/10.1002/wcc.339.
Frías, M. D., E. Zorita, J. Fernández, and C. Rodríguez-Puebla. 2006. “Testing Statistical Downscaling Methods in Simulated Climates.” Geophysical Research Letters 33 (19): 2006GL027453. https://doi.org/10.1029/2006GL027453.
GHSL data package 2023. 2023. LU: Publications Office. https://data.europa.eu/doi/10.2760/098587.
Giuliani, Gregory, Denisa Rodila, Nathan Külling, Ramona Maggini, and Anthony Lehmann. 2022. “Downscaling Switzerland Land Use/Land Cover Data Using Nearest Neighbors and an Expert System.” Land 11 (5): 615. https://doi.org/10.3390/land11050615.
Gonzalez, Carlos Alberto Gomez. 2022. “DL4DS – Deep Learning for Empirical DownScaling,” May. http://arxiv.org/abs/2205.08967.
Khan, Mohammad Sajjad, Paulin Coulibaly, and Yonas Dibike. 2006. “Uncertainty Analysis of Statistical Downscaling Methods.” Journal of Hydrology 319 (1-4): 357–82. https://doi.org/10.1016/j.jhydrol.2005.06.035.
Malone, Brendan P., Alex B. McBratney, Budiman Minasny, and Ichsani Wheeler. 2012. “A General Method for Downscaling Earth Resource Information.” Computers & Geosciences 41: 119–25. https://doi.org/10.1016/j.cageo.2011.08.021.
Murakami, Daisuke, and Yoshiki Yamagata. 2019. “Estimation of Gridded Population and GDP Scenarios with Spatially Explicit Statistical Downscaling.” Sustainability 11 (7): 2106. https://doi.org/10.3390/su11072106.
Roudier, P., B. P. Malone, C. B. Hedley, B. Minasny, and A. B. McBratney. 2017. “Comparison of Regression Methods for Spatial Downscaling of Soil Organic Carbon Stocks Maps.” Computers and Electronics in Agriculture 142: 91–100. https://doi.org/10.1016/j.compag.2017.08.021.
Sahr, Kevin M. 2014. “Central Place Indexing: Optimal Location Representation for Digital Earth.”
Schoof, Justin T. 2013. “Statistical Downscaling in Climatology.” Geography Compass 7 (4): 249–65. https://doi.org/10.1111/gec3.12036.
“Spatially-Disaggregated Crop Production Statistics Data in Africa South of the Sahara for 2017.” n.d. https://doi.org/10.7910/DVN/FSSKBW.
Tobler, Waldo R. 1979. “Smooth Pycnophylactic Interpolation for Geographical Regions.” Journal of the American Statistical Association 74 (367): 519–30. https://doi.org/10.1080/01621459.1979.10481647.
Vrac, M., M. L. Stein, K. Hayhoe, and X.-Z. Liang. 2007. “A General Method for Validating Statistical Downscaling Methods Under Future Climate Change.” Geophysical Research Letters 34 (18): 2007GL030295. https://doi.org/10.1029/2007GL030295.
Wenz, Leonie, Robert Devon Carr, Noah Kögel, Maximilian Kotz, and Matthias Kalkuhl. 2023. “DOSE Global Data Set of Reported Sub-National Economic Output.” Scientific Data 10 (1): 425. https://doi.org/10.1038/s41597-023-02323-8.