Oxford Programme for Sustainable Infrastructure Systems (OPSIS)
BHM - Bayesian Hierarchical Model
MCMC - Markov Chain Monte Carlo
PDF - probability density function
RV - random variable
Marginal Probability - PDF of a single variable, obtained by integrating out the others
Pycnophylactic - mass preserving

But at a higher cost.
The methodology was significantly expanded to use Bayesian Hierarchical Modelling (BHM) (Schoot et al. 2021).
Thomas Bayes (18th century); long avoided due to technical complications, but has made a comeback since the 1980s. Now used across a wide range of disciplines, from physics and engineering to economics, the social sciences and medicine.
Bayesian modelling is generally complex: theory and formalism (Bayes' rule, Markov Chain Monte Carlo, information theory, statistics…), a software ecosystem that is a world of its own, PPLs (Probabilistic Programming Languages)…
As the name suggests, at the core of this method lies the familiar Bayes rule. Let us remind ourselves. If we have two phenomena (expressed as RV), we might be asking the following kind of questions:
What is the probability of a joint event, or of a conditional event, having observed the outcome of a single one?
This, in practice, takes the form \(p(x,z)\). And in such cases, Bayes rule tells us that
\[ p(x,z) = p(z) \cdot p(x|z) \\ p(x,z) = p(x) \cdot p(z|x) \]
Which can be combined into:
\[ p(x) = \frac{p(z) \cdot p(x|z)}{p(z|x)} = \frac{p(x|z)}{p(z|x)}\cdot p(z) \]
The term \(p(x|z)\) is called the likelihood. If we can estimate the RHS using \(z\), then we can estimate the marginal of \(x\).
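To make the rule concrete, here is a small numerical check on a hypothetical 2×2 joint distribution (numpy only; the values are made up for illustration):

```python
import numpy as np

# Hypothetical joint distribution p(x, z) over two binary RVs.
p_xz = np.array([[0.1, 0.3],
                 [0.2, 0.4]])  # rows: x, columns: z

p_x = p_xz.sum(axis=1)             # marginal p(x)
p_z = p_xz.sum(axis=0)             # marginal p(z)
p_x_given_z = p_xz / p_z           # p(x|z), each column normalised
p_z_given_x = p_xz / p_x[:, None]  # p(z|x), each row normalised

# Bayes rule rearranged: p(x) = p(x|z) / p(z|x) * p(z), for any fixed z.
recovered_p_x = p_x_given_z[:, 0] / p_z_given_x[:, 0] * p_z[0]
print(np.allclose(recovered_p_x, p_x))  # True
```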

Now the most important point in BHM is that we apply such reasoning to the parameters of our model. In other words, we are asking the question:
What are the chances of observing certain data from my model under some parameters \(\mathbf{\theta}\)?
And we refine the parameters if we have some observations \(\{X_i\}\).
Let’s say we have some data \(\{ X_i \}\) and we model it as \(X \sim \mathcal{N}(\mu,\sigma)\). Usually, we would use the data to estimate the values of \((\mu,\sigma)\), with both parameters fixed and computed from the data. But what if we give one parameter variability? In that case, the model becomes parametrised as \(p_X \equiv p_X(.|\mu)\). Formally, we model the parameters as RVs as well: the values we draw from our original model are conditioned on the parameter, which is itself sampled. This is where Bayesian thinking comes in. We define an additional probability density, associated with our belief about the value of the mean, \(\mu \sim \mathcal{N}(\tau,\gamma)\). We now have a model defined not only for the data, but for the parameters of our model. The distribution on \(\mu\), together with the one for \(X\), forms our prior, encompassing the best of our knowledge about the observed phenomenon. The parameters \((\tau,\gamma)\) are referred to as hyper-parameters and can be used to fine-tune our belief about the mean of the data we observe and sample down the line.
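The two-level generative story above can be sketched in a few lines of numpy (the hyper-parameter values are made up for illustration, loosely evoking heights in cm):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hyper-parameters (tau, gamma): our belief about the mean mu.
tau, gamma = 170.0, 10.0   # hypothetical values, e.g. heights in cm
sigma = 7.0                # fixed spread of the data around mu

# Hierarchical generative process: first draw the parameter, then the data.
mu = rng.normal(tau, gamma)            # mu ~ N(tau, gamma)      (prior on the mean)
X = rng.normal(mu, sigma, size=1000)   # X_i | mu ~ N(mu, sigma) (data model)

print(mu, X.mean())  # the sample mean scatters around the drawn mu
```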
Let’s say now we have some observed sample data \(\{X_i\}\). Having laid out a general behavior for the model, we can now turn to it and ask the question:
“What are the chances of observing the sample \(\{X_i\}\) in our model, conditioned on the parameter \(\mu\)?”
In other words, we are looking at \(p(\{X_i\}|\mu)\); recall Bayes' rule from an earlier slide.
Ex: height distribution in the human population

From this point, the model has to learn the best possible parameters, given the observed data and the prior beliefs that we communicated to it. This step relies on MCMC, sampling from the prior distribution and updating it based on the likelihood. At the end of this iterative process, we get the posterior distribution, which, under the given set of priors and parameters and for a number of samples, gives us the best belief about the parameters for the model to generate data.
The posterior distribution emerges once we have adapted the prior using the likelihood we measure with respect to the observed data. This step is similar to a learning epoch in the training process of Deep Learning.
The method lets us fine-tune the posterior distribution by sampling synthetic data from the prior and adapting it to be more similar to the observed data at every new iteration. Markov Chain Monte Carlo is the tool that allows us to do this.
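As a minimal sketch of this iterative updating, here is a random-walk Metropolis sampler for the posterior of \(\mu\) (numpy only; real PPLs implement far more efficient MCMC variants, and all values here are synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "observed" data with unknown mean and known sigma.
sigma = 2.0
X = rng.normal(5.0, sigma, size=200)

# Prior on mu: N(tau, gamma), with weakly informative hyper-parameters.
tau, gamma = 0.0, 10.0

def log_posterior(mu):
    # log prior + log likelihood, up to an additive constant
    log_prior = -0.5 * ((mu - tau) / gamma) ** 2
    log_lik = -0.5 * np.sum(((X - mu) / sigma) ** 2)
    return log_prior + log_lik

# Random-walk Metropolis: propose a move, accept with the posterior ratio.
mu_current, trace = 0.0, []
for _ in range(5000):
    mu_proposed = mu_current + rng.normal(0.0, 0.5)
    if np.log(rng.uniform()) < log_posterior(mu_proposed) - log_posterior(mu_current):
        mu_current = mu_proposed
    trace.append(mu_current)

posterior = np.array(trace[1000:])  # discard burn-in
print(posterior.mean())  # concentrates near the sample mean of X
```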
| | Data | Parameters |
|---|---|---|
| Prior | \(f(.|\theta)\) | \(\pi(\theta)\) |
| Posterior | \(\int f(.|\theta)\,\pi(\theta|\{X_i\})\,d\theta\) | \(\pi(\theta|\{X_i\}) \propto f(X|\theta) \cdot \pi(\theta)\) |
The family of \(f\) is a choice made by the user and can have a huge impact on the modelling.
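The proportionality \(\pi(\theta|\{X_i\}) \propto f(X|\theta)\cdot\pi(\theta)\) can be checked directly with a grid approximation (a toy example with made-up data, not the actual OPSIS model):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(3.0, 1.0, size=50)  # observed data, known unit variance

# Grid over the parameter theta (here a mean), prior pi(theta) = N(0, 5).
theta = np.linspace(-10.0, 10.0, 2001)
log_prior = -0.5 * (theta / 5.0) ** 2

# Log-likelihood log f(X|theta) on the grid (log-space for numerical stability).
log_lik = np.array([-0.5 * np.sum((X - t) ** 2) for t in theta])

# Posterior: pi(theta|X) proportional to f(X|theta) * pi(theta), then normalise.
log_post = log_prior + log_lik
post = np.exp(log_post - log_post.max())
post /= post.sum()

print(theta[np.argmax(post)])  # posterior mode, close to X.mean()
```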
After this rushed summary, we get into the specific details of our problem. The idea is to embed an econometric model predicting the fine-scale values into this Bayesian framework: on the one hand, informing the behavior at the fine spatial scale, as dictated by the econometric model; on the other, checking that this behavior aligns with our prior knowledge and constraints at the coarse spatial and sector scale through the BHM.


We use a linear model, inspired by (spatial) econometrics (Anselin 1988; Redding 2024; Zellner 1985), to inform the Bayesian method on how we expect our predictor variables to be linked to the industry-level output. We apply this model at the fine resolution to every location that has some non-zero predictor variable.
\[ \mu_{S_i} = \sum_m \omega_m \cdot x_m \]
where \(\omega_m\) are learned parameters and \(x_m\) are the proxy variables that we assign to the relevant sectors. The result is a set of linear equations, one for each sector, each with its own chosen variables. We select only a subset of the available variables for each output to reduce the dimensionality of the problem and simplify our prior.
The linear model in turn yields a value \(\mu_{S_i}\) for a specific sector, which the Bayesian method interprets as a mean value for that sector in a location, and which is combined with an uncertainty metric \(\sigma\), measured beforehand as the average availability of data in the system. The tuple \((\mu_{S_i}, \sigma)\), with \(\sigma\) fixed and \(\mu_{S_i}\) obtained from the sampled linear combination of the \(\omega\) parameters, is then used as the parameters of a \(LogNormal\) model predicting the economic activity of a location.
\[Y_{S_i} \sim LogNormal(\mu_{S_i}, \sigma)\] We integrate our coarse spatial and sectoral output constraints by adding aggregation layers, which sum up the high-resolution values and check their validity against the observed totals at whatever resolution these are available.
Further, we introduce normal noise \(\mathcal{N}(\mathcal{S}_k, \frac{\mathcal{S}_k}{10})\) on the reported outputs, which avoids setting hard constraints and can be helpful down the line when dealing with uncertain reports.
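The layers described above can be sketched as a single generative pass for one sector (numpy only; the proxy variables, weights, and reported total are all hypothetical, and in the actual BHM the \(\omega\) are sampled rather than fixed):

```python
import numpy as np

rng = np.random.default_rng(3)

n_loc = 100                               # fine-resolution locations, one sector
x = rng.gamma(2.0, 1.0, size=(n_loc, 3))  # hypothetical proxy variables x_m
omega = np.array([0.5, 0.2, 0.1])         # weights; sampled, not fixed, in the BHM
sigma = 0.3                               # fixed uncertainty metric

# Linear econometric layer: mu_{S_i} = sum_m omega_m * x_m at every location.
mu = x @ omega

# Output layer: Y_{S_i} ~ LogNormal(mu_{S_i}, sigma).
Y = rng.lognormal(mean=mu, sigma=sigma)

# Aggregation layer: the fine-scale values should match the reported coarse
# total S_k, up to the N(S_k, S_k/10) noise placed on the report.
S_k = Y.sum() * 1.02                      # hypothetical reported total
z_score = (Y.sum() - S_k) / (S_k / 10.0)  # standardised residual
print(abs(z_score) < 3.0)                 # soft consistency check passes
```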
POIs, GHSL NRES, DOSE-WDI, Copernicus, GHSL pop, Bureau of Economic Analysis (BEA), ILOSTAT, UK Business Value Added, EU IO tables
GEM, CGFI, Climatrace, Edgar, MAPSPAM
It is up on the cluster, with access and manipulation facilitated by the scalenav2 package













The BHM side of things 
We use existing available downscaled data sets in which one of the two dimensions (spatial, sectoral) is coarse, and reduce our data to validate at the resolution of the available data. An example is (Kummu, Taka, and Guillaume 2018b), where the data is total GDP at a fine spatial scale. We reduce the BHM model data by aggregating each location's output across sectors.
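A minimal sketch of this reduction, assuming the model output is an array of per-location, per-sector values and the reference data (Kummu-style) gives total GDP per location; both arrays here are stand-ins:

```python
import numpy as np

rng = np.random.default_rng(4)

n_loc, n_sec = 500, 8
# Stand-in for the fine spatial x sectoral BHM output.
model_output = rng.lognormal(0.0, 1.0, size=(n_loc, n_sec))

# Reduce across the sector dimension so the model matches the resolution
# of the reference data (total GDP per location).
model_totals = model_output.sum(axis=1)

# Stand-in reference data: the same totals observed with ~10% noise.
reference = model_totals * rng.normal(1.0, 0.1, size=n_loc)

rel_err = np.abs(model_totals - reference) / reference
print(np.median(rel_err))  # median relative error against the reference
```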











Using national reports on per sector productivity at coarse geographic resolution.
The flexibility of BHM allows us to reverse the use of the finer-resolution economic output data and use it as prior knowledge. This is a work-in-progress feature that is almost implemented.
We use a mix of methods, benefiting from their strengths, but we also inherit their limitations and challenges.
Dimensionality
Econ model
BHM
Data
Predictive modelling