Geostatistics Glossary

Geostatistics Glossary#

Michael J. Pyrcz, Professor, The University of Texas at Austin

Chapter of e-book “Applied Geostatistics in Python: a Hands-on Guide with GeostatsPy”.

Cite this e-Book as:

Pyrcz, M.J., 2024, Applied Geostatistics in Python: a Hands-on Guide with GeostatsPy [e-book]. Zenodo. doi:10.5281/zenodo.15169133

The workflows in this book and more are available here:

Cite the GeostatsPyDemos GitHub Repository as:

Pyrcz, M.J., 2024, GeostatsPyDemos: GeostatsPy Python Package for Spatial Data Analytics and Geostatistics Demonstration Workflows Repository (0.0.1) [Software]. Zenodo. doi:10.5281/zenodo.12667036. GitHub Repository: GeostatsGuy/GeostatsPyDemos

By Michael J. Pyrcz
© Copyright 2024.

This chapter is a summary of essential Geostatistics Terminology.

Motivation for Geostatistics Concepts#

Firstly, why do this? I have received the request for a course glossary from the students in my Data Analytics and Geostatistics undergraduate course. While I usually dedicated a definition slide in the lecture slide decks for salient terms, some of my students have requested course glossary, list of terminology in their course review. The e-book provides a great vehicle and motivation for this.

Let me begin with a confession. There is a Geostatistical Glossary and Multilingual Dictionary written by my good friend Dr. Ricardo A. Olea an excellent geologist and statistician from the USGS. For those seeking the in depth, comprehensive list of geostatistical terms please use this book!

By writing my own I can limit the scope and descriptions to course content. I fear that many students would be overwhelmed by the size and mathematical notation of a standard geostatistics glossary.
Also, by including a glossary in the e-book I can link from glossary entries to the chapters in the e-book for convenience. I will eventual populate all the chapters with hyperlinks to the glossary to enable moving back and forth between the chapters and the glossary.

Finally, like the rest of the book, I want the glossary to be a evergreen living document.

Addition Rule (probability)#

Probability Concepts: when we add probabilities (the union of outcomes), the probability of \(A\) or \(B\) is calculated with the probability addition rule,

\[ P(A \cup B) = P(A) + P(B) - P(A,B) \]

given mutually exclusive events we can generalize the addition rule as,

\[ P\left( \bigcup_{i=1}^k A_i \right) = \sum_{i=1}^k P(A_i) \]

Affine Correction#

Distribution Transformations: a distribution rescaling that can be thought of as shifting, and stretching or squeezing of a univariate distribution (e.g., histogram). For the case of affine correction of \(X\) to \(Y\),

\[ y_i = \frac{\sigma_y}{\sigma_x}(x_i - \overline{x}) + \overline{y}, \quad \forall \quad i, \ldots, n \]

where \(\overline{x}\) and \(\sigma_x\) are the original mean and variance, and \(\overline{y}\) and \(\sigma_y\) are the new mean and variance.

We can see above that the affine correlation method first centers the distribution (by subtracting the original mean), then rescales the dispersion (distribution spread) by the ratio of the new standard deviation to the original standard deviation and then shifts the distribution to centered on the new mean.

there is no shape change for affine correction. For shape change consider Distribution Transformation like Gaussian Anamorphosis.

Anisotropic or Directional (variogram)#

Variogram Calculation: variogram calculated or modelled such that direction (azimuth) is considered,

for an anisotropic experimental variogram set the azimuth tolerance less than 90 degrees, then the experimental variogram is sensitive to direction
for an anisotropic variogram model set the major range greater than the minor range, then the variogram model is sensitive to direction

Azimuth Tolerance (variogram)#

Variogram Calculation: for calculating directional experimental variograms, the tolerance \(+/-\Delta\) in azimuth applied to identify data pairs for that specific azimuth.

for example, given an azimuth of 90 degrees (along the positive x axis in map view), we could assign an azimuth tolerance of 20 degrees and then all pairs with azimuths between 70 degrees and 110 degrees will be pooled to calculate the experimental variogram for this azimuth

Some general observations for setting azimuth tolerance,

it is common practice for directional experimental variograms to use a azimuth tolerance of 22.5 degrees, this result in a 45 degrees angle inscribed in the search window. This is seen as a good balance to provide smooth, interpretable experimental variograms without mixing spatial continuity information over too many directions
may be increased to smooth the experimental variogram for improved interpretability
for isotropic (also called omnidirectional) experimental variograms (not sensitive to direction) use an azimuth tolerance of 90.0, resulting in 180.0 degrees angle inscribed in the search window to remove the influence of direction

Bandwidth (variogram)#

Variogram Calculation: the maximum orthogonal deviation from the lag vector when identifying data pairs to calculate an experimental variogram.

What is the motivation for using bandwidth?

when working with a cross section (axes, y is vertical and x is horizontal) at large lag distances, \(\bf{h}\), azimuth (dip) tolerance may result in including data from adjacent (mixing between) stratigraphic units.
bandwidth is typically applied to the vertical direction to reduce the potential for mixing between adjacent units

When not to use bandwidth?

bandwidth should not be used to calculate isotropic (also called omnidirectional) experimental variograms
bandwidth is rarely used to calculate horizontal experimental variograms
bandwidth is removed by setting it very large relative to the model extent

Bayesian Probability#

Probability Concepts: probabilities based on a degree of belief (expert judgement and experience) in the likelihood of an event. The general approach,

start with prior probability, prior to the collection of new information
formulate a likelihood probability, based on new information alone
update prior with likelihood to calculate the updated posterior probability
continue to update as new information is available
solve probability problems that we cannot use simple frequencies, i.e., frequentist probability approach
Bayesian updating is modeled with Bayes’ Theorem

Bayes’ Theorem (probability)#

Probability Concepts: the mathematical model central to Bayesian probability for the Bayesian updating from prior probability, with likelihood probability from new information to posterior probability.

\[ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} \]

where \(P(A)\) is the prior, \(P(B|A)\) is the likelihood, \(P(B)\) is the evidence term and \(P(A|B)\) is the posterior. If is convenient to substitute more descriptive labels for \(A\) and \(B\) to better conceptualize this approach,

\[ P(\text{Model} | \text{New Data}) = \frac{P(\text{New Data} | \text{Model}) \cdot P(\text{Model})}{P(\text{New Data})} \]

demonstrating that we are updating our model with new data

Big Data#

Geostatistical Concepts: you have big data if your data has a combination of these criteria:

Data Volume - many data samples and features, difficult to store, transmit and visualize
Data Velocity - high-rate collection, continuous data collection relative to decision making cycles, challenges keeping up with the new data while updating the models
Data Variety - data form various sources, with various types of data, types of information, and scales
Data Variability - data acquisition changes during the project, even for a single feature there may be multiple vintages of data with different scales, distributions, and veracity
Data Veracity - data has various levels of accuracy, the data is not certain

For common subsurface applications most, if not all, of these criteria are met. Subsurface engineering and geoscience are often working with big data!

Big Data Analytics#

Geostatistical Concepts: the process of examining large and varied data (big data) sets to discover patterns and make decisions, the application of statistics to big data.

Bootstrap#

Distribution Transformations: a statistical resampling procedure to calculate uncertainty in a calculated statistic from the sample data itself. Some general comments,

sampling with replacement - \(n\) (number of data samples) Monte Carlo simulations from the dataset cumulative distribution function, this results in a new realization of the data
simulates the data collection process - the fundamental idea is to simulate the original data collection process. Instead of actually collecting new sample sets, we randomly select from the data to get data realizations
bootstrap any statistic - this approach is very flexible as we can calculate realizations of any statistics from the data realizations
computationally cheap - repeat this approach to get realizations of the statistic to build a complete distribution of uncertainty. Use a large number of realizations, \(L\), for a reliable uncertainty model.
calculates the entire distribution of uncertainty - for any statistic, you calculate any summary statistic for the uncertainty model, e.g., mean, P10 and P90 of the uncertainty in the mean
bagging for machine learning - is the application fo bootstrap to obtain data realizations to train predictive model realizations to aggregate predictions over ensembles of prediction models to reduce model variance

What are the limitations of bootstrap?

biased sample data will likely result in a biased bootstrapped uncertainty model, you must first debias the samples, e.g., declustering
you must have a sufficient sample size
integrates uncertainty due to sparse samples in space only
does not account for the spatial context of the data, i.e., sample data locations, volume of interest nor the spatial continuity. There is a variant of bootstrap called spatial bootstrap.

Categorical Feature#

Geostatistical Concepts: a feature that can only take one of a limited, and usually fixed, number of possible values

Categorical Nominal Feature#

Geostatistical Concepts: a categorical feature without any natural ordering, for example,

facies = {boundstone, wackystone, packstone, brecia}
minerals = {quartz, feldspar, calcite}

Categorical Ordinal Feature#

Geostatistical Concepts: a categorical feature with a natural ordering, for example,

geologic age = {Miocene, Pliocene, Pleistocene} - ordered from older to younger rock
Mohs hardness = \(\{1, 2, \ldots, 10\}\) - ordered from softer to harder rock

Cell-based Declustering#

Declustering to Correct Sampling Bias: a declustering method to assign weights to spatial samples based on local sampling density, such that the weighted statistics are likely more representative of the population. Data weights are assigned such that,

samples in densely sampled areas receive less weight
samples in sparsely sampled areas receive more weight

The goal of declustering is for the sample statistics to be independent of sample locations, e.g., infill drilling or blast hole samples should not change the statistics for the area of interest due to increased local sample density.

Cell-based declustering proceeds as follows:

a cell mesh is placed over the spatial data and weights are set as proportional to the inverse of the number of samples in the cell
the cell mesh size is varied, and the cell size that minimizes the declustered mean (in the sample mean is biased high) or maximizes the declustered mean (if the sample mean is biased low) is selected
to remove the impact of cell mesh position, the cell mesh is randomly moved several times and the resulting declustering weights are averaged for each datum

The weights are calculated as:

\[ w(\bf{u}_j) = \frac{1}{n_l} \cdot \frac{n}{L_o} \]

where \(n_l\) is the number of data in the current cell, \(L_o\) is the number of cells with data, and \(n\) is the total number of data.

Here are some highlights for cell-based declustering,

expert judgement to assign cell size based on the nominal sample spacing (e.g., data spacing before infill drilling) will improve the performance over the automated method for cell size selection based on minimum or maximum declustered mean (mentioned above)
cell-based declustering is not aware of the boundaries of the area of interest; therefore, data near the boundary of the area of interest may appear to be more sparsely sampled and receive more weight
cell-based was developed by Professor Andre Journel in 1983, [Jou83]

Cloud Transform#

Cosimulation: a cosimulation approach that is based on the bivariate relationship (scatter plot) between the primary and secondary features to simulated realizations of the primary feature, \(z\), paired with a previously simulated realization of the secondary feature, \(y\)

For all locations in the model, \(\alpha = 1, \ldots, nx \cdot ny\), \(\bf{u}_{\alpha}\),

find the collocated secondary value, \(y(\bf{u}_{\alpha})\)
calculate the conditional distribution, \(f_{(z|y = y(\bf{u}_{\alpha})}(z)\), from the scatter plot-based joint distribution function, \(f_{y,z}\), and the collocated secondary value, \(y(\bf{u}_{\alpha})\)
draw from \(f_{(z|y = y(\bf{u}_{\alpha})}(z)\) with a field of correlated p-values (known as a p-field)

Some comments about cloud transform,

prioritizes reproduction of the cloud, the bivariate relationship (scatter plot) between the primary and secondary features
may not well reproduce the primary feature spatial continuity and distribution
cloud transform is very commonly used in practice to simulate permeability with secondary porosity realizations. Why? Permeability samples are quite sparsely sampled and as a result the permeability distribution and variogram are not often well-known, but the bivariate relationship often has recognizable forms and the porosity variogram and distribution are more reliable.
the general approach of calculating simulated realizations by applying p-fields to local uncertainty distributions is called p-field simulation. It is often applied to simplify information integration through the separation of local conditioning and imposing spatial correlation, and as a faster simulation method to scale up to large models. More about p-field simulation.

Continuous Feature#

Geostatistical Concepts: a feature that can take any value between a lower and upper bound. For example,

porosity = \(\{13.01\%, 5.23\%, 24.62\%\}\)
gold grade = \(\{4.56 \text{ g/t}, 8.72 \text{ g/t}, 12.45 \text{ g/t} \}\)

Continuous, Interval Feature#

Geostatistical Concepts: a continuous feature where the intervals between numbers are equal, for example, the difference between 1.50 and 2.50 is the same as the difference between 2.50 and 3.50, but the actual values do not have an objective, physical reality (exist on an arbitrary scale), i.e., do not have a true zero point, for example,

Celsius scale of temperature (an arbitrary scale based on water freezing at 0 and boiling at 100)
calendar year (there is no true zero year)

We can use addition and substraction operations to compare continuous, interval features.

Continuous, Ratio Feature#

Geostatistical Concepts: a continuous feature where the intervals between numbers are equal, for example, the difference between 1.50 and 2.50 is the same as the difference between 2.50 and 3.50, but the values do have an objective reality (measure an actual physical phenomenon), i.e., do have true zero point, for example,

Kelvin scale of temperature
porosity
permeability
saturation

Since there is a true zero, continuous, ratio features can be compared with multiplication and division mathematical operations (in addition to addition and subtraction), e.g., twice as much porosity.

Cognitive Biases#

Geostatistical Concepts: an automated (subconscious) thought process used by human brain to simplify information processing from large amount of personal experience and learned preferences. While these have been critical for our evolution and survival on this planet, they can lead to the following issues in data science:

Anchoring Bias, too much emphasis on the first piece of information. Studies have shown that the first piece of information could be irrelevant as we are beginning to learn about a topic, and often the earliest data in a project has the largest uncertainty. Address anchoring bias by currating all data, integrating uncertainty, fostering open discussion and debate on your project team.
Availability Heuristic, overestimate importance of easily available information, for example, grandfather smoked 3 packs a day and lived to 100 years old, i.e., relying on anecdotes. Address availability heuristic by ensuring the project team documents all available information and applies quantitative analysis to move beyond anecdotes.
Bandwagon Effect, assessed probability increases with the number of people holding the same belief. Watch out for everyone jumping on board or the loudest voice influencing all others on your project teams. Encouraging all members of the project team to contribute and even separate meetings may be helpful to address bandwagon effect.
Blind-spot Effect, fail to see your own cognitive biases. This is the hardest cognitive bias of all. One possible solution is to invite arms length review of your project team’s methods, results and decisions.
Choice-supportive Bias, probability increases after a commitment, i.e., a decision is made. For example, it was good that I bought that car supported by focusing on positive information about the car. This is a specific case of confirmation bias.
Clustering Illusion, seeing patterns in random events. Yes, this heuristic helped us stay alive when large predictors hunted us, i.e., false positives are much better than false negatives! The solution is to model uncertainty confidence intervals and test all data and results against random effect.
Confirmation Bias, only consider new information that supports current model. Choice-supportive bias is a specific case of confirmation bias. The solution to confirmation bias is to seek out people that you will likely disagree with and build skilled project teams that hold diverse technical opinions and have different expert experience. My approach is to get nervous if everyone in the room agrees with me!
Conservatism Bias, favor old data to newly collected data. Data curation and quantiative analysis are helpful.
Recency Bias, favor the most recently collected data. Ensure your team documents previous data and choices to enhance team memory. Just like conservative bias, data curation and quantitative analysis are our first line of defense.
Survivorship Bias, focus on success cases only. Check for any possible pre-selection or filters on the data available to your team.

Robust use of statistics / data analytics protects use from bias.

Cokriging#

Cosimulation: the general simple kriging approach that accounts for primary and secondary features at the same time in order to build multivariate spatial models.

like regular kriging, a spatial estimation approach that relies on linear weights that account for spatial continuity, data closeness and redundancy.
weights are unbiased and minimize the estimation variance

The simple cokriging weights are calculated by solving a linear system of equations that may be represented with matrix notation as,

\[\begin{split} \begin{bmatrix} C_z(\bf{u}_1,\bf{u}_1) & C_z(\bf{u}_1,\bf{u}_2) & \dots & C_z(\bf{u}_1,\bf{u}_{n_z}) & \textcolor{red}{C_{z,y}(\bf{u}_1,\bf{u}_1)} & \textcolor{red}{C_{z,y}(\bf{u}_1,\bf{u}_2)} & \dots & \textcolor{red}{C_{z,y}(\bf{u}_1,\bf{u}_{n_y})} \\ C_z(\bf{u}_2,\bf{u}_1) & C_z(\bf{u}_2,\bf{u}_2) & \dots & C_z(\bf{u}_2,\bf{u}_{n_z}) & \textcolor{red}{C_{z,y}(\bf{u}_2,\bf{u}_1)} & \textcolor{red}{C_{z,y}(\bf{u}_2,\bf{u}_2)} & \dots & \textcolor{red}{C_{z,y}(\bf{u}_2,\bf{u}_{n_y})} \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ C_z(\bf{u}_{n_z},\bf{u}_1) & C_z(\bf{u}_{n_z},\bf{u}_2) & \dots & C_z(\bf{u}_{n_z},\bf{u}_{n_z}) & \textcolor{red}{C_{z,y}(\bf{u}_{n_z},\bf{u}_1)} & \textcolor{red}{C_{y,z}(\bf{u}_{n_z},\bf{u}_2)} & \dots & \textcolor{red}{C_{y,z}(\bf{u}_{n_z},\bf{u}_{n_y})} \\ \textcolor{red}{C_{y,z}(\bf{u}_1,\bf{u}_1)} & \textcolor{red}{C_{y,z}(\bf{u}_1,\bf{u}_2)} & \dots & \textcolor{red}{C_{y,z}(\bf{u}_1,\bf{u}_{n_z})} & \textcolor{blue}{C_y(\bf{u}_1,\bf{u}_1)} & \textcolor{blue}{C_y(\bf{u}_1,\bf{u}_2)} & \dots & \textcolor{blue}{C_y(\bf{u}_1,\bf{u}_{n_y})} \\ \textcolor{red}{C_{z,y}(\bf{u}_2,\bf{u}_1)} & \textcolor{red}{C_{z,y}(\bf{u}_2,\bf{u}_2)} & \dots & \textcolor{red}{C_{y,z}(\bf{u}_2,\bf{u}_{n_z})} & \textcolor{blue}{C_y(\bf{u}_2,\bf{u}_1)} & \textcolor{blue}{C_y(\bf{u}_2,\bf{u}_2)} & \dots & \textcolor{blue}{C_y(\bf{u}_2,\bf{u}_{n_y})} \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ \textcolor{red}{C_{y,z}(\bf{u}_{n_y},\bf{u}_{1})} & \textcolor{red}{C_{y,z}(\bf{u}_{n_y},\bf{u}_{2})} & \dots & \textcolor{red}{C_{y,z}(\bf{u}_{n_y},\bf{u}_{n_z})} & \textcolor{blue}{C_y(\bf{u}_{n_y},\bf{u}_1)} & \textcolor{blue}{C_y(\bf{u}_{n_y},\bf{u}_2)} & \dots & \textcolor{blue}{C_y(\bf{u}_{n_y},\bf{u}_{n_y})} \\ \end{bmatrix} \cdot \begin{bmatrix} \lambda_{z_1} \\ \lambda_{z_2} \\ \vdots \\ \lambda_{z_n} \\ \textcolor{blue}{\lambda_{y_1}} \\ \textcolor{blue}{\lambda_{y_2}} \\ \vdots \\ \textcolor{blue}{\lambda_{y_n}} \\ \end{bmatrix} = \begin{bmatrix} C_z(\bf{u}_1,\bf{u}) \\ C_z(\bf{u}_2,\bf{u}) \\ \vdots \\ C_z(\bf{u}_n,\bf{u}) \\ \textcolor{blue}{C_{z,y}(\bf{u}_1,\bf{u})} \\ \textcolor{blue}{C_{z,y}(\bf{u}_2,\bf{u})} \\ \vdots \\ \textcolor{blue}{C_{z,y}(\bf{u}_n,\bf{u})} \end{bmatrix} \end{split}\]

where we assume that there are \(1, 2, \ldots, n_z\) primary data \(z\), and \(1, 2, \ldots, n_y\) secondary data \(y\).

note, I could be more rigorous with the notation and indicate the location \(\bf{u}_{1}\) for \(z\) and \(y\) may not be the same location, but the notation was already getting quite dense.

This system integrates the,

spatial continuity - as quantified by the variogram (and covariance function to calculate the covariance, \(C\), values)
redundancy - the degree of spatial continuity between all of the available data with themselves, \(C(\bf{u}_i,\bf{u}_j)\)
closeness - the degree of spatial continuity between the available data and the estimation location, \(C(\bf{u}_i,\bf{u})\)
relationship - between the primary and secondary features over lag distance from the cross covariance terms, \(C_{z,y}\) and \(C_{y,z}\).

Once the weights are calculated from the above linear system of equations, the cokriging estimator is expressed as,

\[ z^{*}_{COK}(\bf{u}) = \sum_{\alpha_z = 1}^{n_z} \lambda_{\alpha_z} \cdot z(\bf{u}_{\alpha_z}) + \sum_{\alpha_y = 1}^{n_y} \lambda_{\alpha_y} \cdot y(\bf{u}_{\alpha_{y}}) \]

assuming that each feature is detrended such that the mean is 0.0.

Some general comments about cokriging,

the secondary data and primary data may be collocated, not collocated or even mixed with some collocated and some not
we must calculate and model the direct variograms, \(\gamma_z\) and \(\gamma_y\) and the cross variogram \(\gamma_{z,y}\). For all direct and cross variograms to be jointly positive definite we must used the constraints from the linear model of corregionalization.
we could expand the model to consider any number of secondary features!

Once again, cokriging is rarely used in practice, more often simplifying variants such as collocated cokriging or cloud transform are applied

Collocated Cokriging#

Cosimulation: a variant of cokriging that introduces 2 assumptions to greatly simplify the method.

Markov screening - only one (the collocated) secondary feature datum is considered, i.e., we only include the secondary datum at the location we are estimating the primary feature. We are assuming that the collocated secondary datum will screen all other secondary data at other locations away from the estimate.

as a result, we don’t need to calculate the secondary variogram, and we have a much smaller cokriging system.

Bayesian updating - the cross covariance \(C_{z,y}(\bf{h})\) is assumed to be a linear scaling of the direct covariance \(C_z(\bf{h})\). The Bayesian formulation for \(C_{z,y}(\bf{h})\) is,

\[ C_{z,y}(\bf{h}) = C_z(\bf{h}) \cdot \rho_{z,y} \]

where \(\rho_{z,y}\) is the prior, \(C_z(\bf{h})\) is the likelihood, and \(C_{z,y}(\bf{h})\) is the posterior.

as a result, we do not need to calculate the cross variogram

The collocated cokriging system of equations in matrix notation are,

\[\begin{split} \begin{bmatrix} C_z(\bf{u}_1,\bf{u}_1) & C_z(\bf{u}_1,\bf{u}_2) & \dots & C_z(\bf{u}_1,\bf{u}_{n_z}) & \textcolor{red}{C_{z}(\bf{u}_1,\bf{u}) \cdot C_{z,y}(0)} \\ C_z(\bf{u}_2,\bf{u}_1) & C_z(\bf{u}_2,\bf{u}_2) & \dots & C_z(\bf{u}_2,\bf{u}_{n_z}) & \textcolor{red}{C_{z}(\bf{u}_2,\bf{u}) \cdot C_{z,y}(0)} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ C_z(\bf{u}_{n_z},\bf{u}_1) & C_z(\bf{u}_{n_z},\bf{u}_2) & \dots & C_z(\bf{u}_{n_z},\bf{u}_{n_z}) & \textcolor{red}{C_{z}(\bf{u}_{n_z},\bf{u}) \cdot C_{z,y}(0)} \\ \textcolor{red}{C_{z}(\bf{u},\bf{u}_1) \cdot C_{z,y}(0)} & \textcolor{red}{C_{z}(\bf{u},\bf{u}_2) \cdot C_{z,y}(0)} & \dots & \textcolor{red}{C_{z}(\bf{u},\bf{u}_{n_z}) \cdot C_{z,y}(0)} & \textcolor{blue}{\sigma_y^2} \\ \end{bmatrix} \cdot \begin{bmatrix} \lambda_{z_1} \\ \lambda_{z_2} \\ \vdots \\ \lambda_{z_n} \\ \textcolor{blue}{\lambda_{y}} \\ \end{bmatrix} = \begin{bmatrix} C_z(\bf{u}_1,\bf{u}) \\ C_z(\bf{u}_2,\bf{u}) \\ \vdots \\ C_z(\bf{u}_n,\bf{u}) \\ \textcolor{blue}{C_{z,y}(0)} \\ \end{bmatrix} \end{split}\]

where \(C_{z,y}(0)\) is the cross covariance at lag distance \(\bf{h} = 0\), for standardized features (variance of 1.0) this is the correlation coefficient, \(C_{z,y}(0) = \rho_{z,y}\).

Complimentary Events (probability)#

Probability Concepts: the NOT operator for probability, if we define A then A compliment, \(A^c\), is not A and we have this resulting closure relationship,

\[ P(A) + P(A^c) = 1.0 \]

complimentary events may be considered for beyond univariate problems, for example consider this bivariate closure,

\[ P(A|B) + P(A^c|B) = 1.0 \]

Note, the given term must be the same.

Conditional Probability#

Probability Concepts: the probability of an event, given another event has occurred,

\[ P(A|B) = \frac{P(A,B)}{P(A)} \]

we read this as the probability of A given B has occurred as the joint divided by the marginal. We can extend conditional probabilities to any multivariate case by adding joints to either component. For example,

\[ P(C|B,A) = \frac{P(A,B,C)}{P(B,C)} \]

Core Data#

Geostatistical Concepts: the primary sampling method for direct measure for subsurface resources (recovered drill cuttings are also direct measures with greater uncertainty and smaller, irregular scale). Comments on core data,

expensive / time consuming to collect for oil and gas, interrupt drilling operations, sparse and selective (very biased) coverage
very common in mining (diamond drill holes) for grade control with regular patterns and tight spacing
gravity, piston, etc. coring are used to sample sediments in lakes and oceans

What do we learn from core data?

petrological features (sedimentary structures, mineral grades), petrophysical features (porosity, permeability), and mechanical features (elastic modulas, Poisson’s ratio)
stratigraphy and ore body geometry through interpolation betweeen wells and drill holes

Core data are critical to support subsurface resource interpretations. They anchor the entire reservoir concept and framework for prediction,

for example, core data collocated with well log data are used to calibrate (ground truth) facies, porosity from well logs

Correlogram#

Variogram Calculation: a measure of similarity vs. distance. Calculated as the average product of values separated by a lag vector centered by the square of the mean standardize by the variance.

\[ \rho_x(\bf{h}) = \frac{1}{n \cdot \sigma^2_x} \sum_{\alpha = 1}^{n} x(\bf{u}_{\alpha}) \cdot x(\bf{u}_{\alpha} + \bf{h}) \]

When the feature is standardized to have a variance of 1.0, the correlogram is equal to the covariance function.

\[ \rho_x(\bf{h}) = C_x(\bf{h}) \]

and in that case the correlogram is the variogram upside down,

\[ \rho_x(\bf{h}) = \sigma_x^2 - \gamma_x(\bf{h}), \quad \text{if} \quad \sigma^2_x = 1.0 \]

The correlogram is very easy to interprete since it is the correlation over the specified lag distance.

Cosimulation#

Cosimulation: a set of methods for simulating a spatial realization of a primary feature conditional to a secondary feature realization. All of these methods attempt to capture the primary feature spatial continuity and conditioning to local data while honoring the relationship with the secondary feature.

Each cosimulation method will have a conditioning priority,

collocated cokriging - prioritizes the primary feature histogram and variogram and may honor the correlation coefficient between the two primary and secondary features.

the relationship with between the primary and secondary features is limited to a correlation coefficient (after Gaussian transform of both, i.e., in Gaussian space)
in the case of dense conditioning data the relationship observed at the data locations will override the correlation coefficient.

cloud transform - honors the specific form of the bivariate relationship (cloud) between the two features but may not honor the histogram nor the variogram.

the precise scatter plot between the primary and secondary feature is prioritized

These methods start with a completed realization of the secondary feature, for example,

first simulate a copper realization and then cosimulate the zinc (primary feature) realization given the copper (secondary feature) realization with the collocated cokriging and the correlation coefficient between Gaussian transformed copper and zinc data
first simulate a porosity realization and then cosimulate the permeability (primary feature) realization given the porosity (secondary feature) realization with cloud transform and the scatter plot of Gaussian transformed porosity and permeability

With cosimulation there is a increasing likelihood that the multiple information sources are contradictory, when this occurs the lower priority information source is preferentially sacrificed

While the full cokriging approach for cosimulation is available, due to the inference burden of modeling all the variograms and cross variograms, it is typically not used in practice.

Covariance Function#

Variogram Calculation: a measure of similarity vs. distance. Calculated as the average product of values separated by a lag vector centered by the square of the mean.

\[ C_x(\bf{h}) = \frac{1}{n} \sum_{\alpha = 1}^{n} x(\bf{u}_{\alpha}) \cdot x(\bf{u}_{\alpha} + \bf{h}) - \overline{x}^2 \]

The covariance function is the variogram upside down,

\[ C_z(\bf{h}) = \sigma_z^2 - \gamma_z(\bf{h}) \]

We model variograms, but inside the kriging and simulation methods they are converted to covariance values for numerical convenience, i.e., covariances result in diagonally dominant matrices that are more stable when inverted to calculate the kriging weights.

Cumulative Distribution Function (CDF)#

Univariate Distributions: the sum of a discrete PDF or the integral of a continuous PDF. Here are the important concepts,

the CDF is stated as \(F_x(x)\), note the PDF is stated as \(f_x(x)\)
is the probability that a random sample, \(X\), is less than or equal to a specific value \(x\); therefore, the y axis is cumulative probability

\[ F_x(x) = P(X \le x) = \int_{-infty}^x f(u) du \]

for CDFs there is no bin assumption; therefore, bins are at the resolution of the data.
monotonically non-decreasing function, because a negative slope would indicate negative probability over an interval.

The requirements for a valid CDF include,

non-negativity constraint:

\[ F_x(x) = P(X \le x) \ge 0.0, \quad \forall x \]

valid probability:

\[ 0.0 \le F_x(x) \le 1.0, \quad \forall x \]

cannot have negative slope:

\[ \frac{dF_x(x)}{dx} \ge 0.0, \quad \forall x \]

minimum and maximum (ensuring probability closure) values:

\[ \text{min}(F_x(x)) = 0.0 \quad \text{max}(F_x(x)) = 1.0 \]

Cyclicity (variogram model)#

Variogram Calculation and Modeling: may be linked to underlying geological periodicity, cycles in the deposition that result in layers.

sometimes noise in the experimental variogram due to too few data is mistaken as cyclicity
the wavelength of the cycles in the experimental variogram is the wavelength of the spatial cycles, i.e. the extent of the layers

Data (data aspects)#

Geostatistical Concepts: when describing spatial dataset these are the fundamental aspects,

Data coverage - what proportion of the population has been sampled for this?

In general, hard data has high resolution (small scale, volume support), but with poor data coverage (measure only an extremely small proportion of the population, for example,

Core coverage deepwater oil and gas - well core only sample one five hundred millionth to one five billionth of a deepwater reservoir, assuming 3 inch diameter cores with 10% core coverage in vertical wells with 500 m to 1,500 m spacing
Core coverage mining grade control - diamond drill hole cores sample one eight thousandth to one thirty thousandth of ore body, assuming HQ 63.5 mm diameter cores with 100% core coverage in vertical drill holes with 5 m to 10 m spacing

Soft data tend to have excellent (often complete) coverage, but with low resolution,

Seismic reflection surveys and gradiometric surveys - data is generally available over the entire volume of interest, but resolution is low and generally decreasing with depth

Data Scale (support size) - What is the scale or volume sampled by the individual samples? For example,

core tomography images corse samples at the pore scale, 1 - 50 \(\mu m\)
gamma ray well log sampled at 0.3 m intervals with 1 m penetration away from the bore hole
ground-based gravity gradiometry map with 20 m x 20 m x 100 m resolution

Data Information Type - What does the data tell us about the subsurface? For example,

grain size distribution that may be applied to calibrate permeability and saturations
fluid type to assess the location of the oil water contact
dip and continuity of important reservoir layers to access connectivity
mineral grade to map high, mid and low grade ore shells for mine planning

Data Analytics#

Geostatistical Concepts: the use of statistics with visualization to support decision making.

Dr. Pyrcz says that data analytics is the same as statistics.

Debiasing with Secondary Data#

Declustering to Correct Sampling Bias: when the entire range of the feature of interest is not sampled we cannot use data weights to debias our statistics, instead we use,

a secondary feature that is sampled over the entire area of interest
a relationship between the secondary feature and the primary feature

to impute the missing part of the primary feature spatial distribution.

The relationship between the primary and secondary feature may be a statistical model with extrapolation to the missing part of the primary feature distribution or based on some other information such as a phyical model

Decision Criteria#

Geostatistical Concepts: a feature that is calculated by applying the transfer function to the subsurface model(s) to support decision making. The decision criteria represents value, health, environment and safety. For example:

contaminant recovery rate to support design of a pump and treat soil remediation project
oil-in-place resources to determine if a reservoir should be developed
Lorenz coefficient heterogeneity measure to classify a reservoir and determine mature analogs
recovery factor or production rate to schedule production and determine optimum facilities
recovered mineral grade and tonnage to determine economic ultimate pit shell

Declustering#

Declustering to Correct Sampling Bias: various methods that assign weights to spatial samples based on local sampling density, such that the weighted statistics are likely more representative of the population. Data weights are assigned so that,

samples in densely sampled areas receive less weight
samples in sparsely sampled areas receive more weight

There are various declustering methods:

cell-based declustering
polygonal declustering
kriging-based declustering

It is important to note that no declustering method can prove that for every data set the resulting weighted statistics will improve the prediction of the population parameters, but in expectation these methods tend to reduce the bias.

Declustering (statistics)#

Declustering to Correct Sampling Bias: once declustering weights are calculated for a spatial dataset, then declustered statistics are applied as input for only subsequent analysis or modeling. For example,

the declustered mean is assigned as the stationary, global mean for simple kriging
the weighted CDF from all the data with weights are applied to sequential Gaussian simulation to ensure the back-transformed realizations approach the declustered distribution

Any statistic can be weighted, including the entire CDF! Here are some examples of weighted statistics, given declustering weights, \(w(\bf{u}_j)\), for all data \(j=1,\ldots,n\).

weighted sample mean,

\[ \overline{x}_{wt} = \frac{\sum_{i=1}^n w(\bf{u}_j) \cdot z(\bf{u}_j)}{\sum_{i=1}^n w(\bf{u}_j)} \]

where \(n\) is the number of data.

weighted sample variance,

\[ s^2_{x_{wt}} = \frac{1}{\sum_{i=1}^n w(\bf{u}_j) - 1} \cdot \sum_{i=1}^n w(\bf{u}_j) \cdot \left( x(\bf{u}_j) - \overline{x}_{wt} \right)^2 \]

where \(\overline{x}_{wt}\) is the declustered mean.

weighted covariance,

\[ C_{x,y_{wt}} = \frac{1}{\sum_{i=1}^n w(\bf{u}_j) } \cdot \sum_{i=1}^n w(\bf{u}_j) \cdot \left( x(\bf{u}_j) - \overline{x}_{wt} \right) \cdot \left( y(\bf{u}_j) - \overline{y}_{wt} \right) \]

where \(\overline{x}_{wt}\) and \(\overline{y}_{wt}\) are the declustered means for features \(X\) and \(Y\).

the entire CDF,

\[ F_z(z) \approx \sum_{j=1}^{n(Z<z)} w(\bf{u}_j) \]

where \(n(Z<z)\) is the number of sorted ascending data less than threshold \(z\). We show this as approximative as this is simplified and at data resolution and without an interpolation model.

It is important to note that no declustering method can prove that for every data set the resulting weighted statistics will improve the prediction of the population parameters, but in expectation these methods tend to reduce the bias.

Deterministic Model#

Geostatistical Concepts: a model that assumes the system or process that is completely predictable

often-based on engineering and geoscience physics and expert judgement
for example, numerical flow simulation or stratigraphic bounding surfaces interpreted from seismic
for this course we also state that data-driven estimation models like

Advantages:

integration of physics and expert knowledge
integration of various information sources

Disadvantages:

often quite time consuming
often no assessment of uncertainty, focus on building one model

Discrete Feature#

Geostatistical Concepts: a categorical feature or a continuous feature that is binned or grouped, for example,

porosity between 0 and 20% assigned to 10 bins = {0 - 2%, 2% - 4%, \ldots ,20%}
Mohs hardness = \(\{1, 2, \ldots, 10\}\) (same at categorical feature)

Dispersion Variance#

Volume Variance: a generalized form of variance that accounts for the volume support of the samples, \(\cdot\), or model, \(v\), and the area of interest, \(V\) represented for data support size as,

\[ D^2(\cdot, V) \]

and for model support size as,

\[ D^2(v, V) \]

for the case of data support size over the volume of interest (including all the spatial data samples) this simplifies to the variance,

\[ \sigma^2 = D^2(\cdot,V) \]

therefore, the well-known variance is a special case of dispersion variance. Dispersion variance under the assumption of stationary mean, variance and variogram can be calculated through this equation,

\[ D^2(v,V) = \overline{\gamma}_{V,V} - \overline{\gamma}_{v,v} \]

where \(\overline{\gamma}_{V,V}\) and \(\overline{\gamma}_{v,v}\) are variogram models integrated over volume \(V\) and \(v\) respectively.

Distribution Transformations#

Distribution Transformations: a mapping from one distribution to another distribution through percentile values, resulting in a new histogram, PDF, and CDF. We perform distribution transformations in geostatistical methods and workflows because,

inference - to correct a feature distribution to an expected shape, for example, correcting for too few or biased data
theory - a specific distribution assumption is required for a workflow step, for example, Gaussian distribution with mean of 0.0 and variance of 1.0 is required for sequential Gaussian simulation
data preparation or cleaning - to correct for outliers, the transformation will map the outlier into the target distribution no longer as an outlier

How do we perform distribution transformations?

We transform the values from the cumulative distribution function (CDF), \(F_{X}\), to a new CDF , \(G_{Y}\). This can be generalized with the quantile - quantile transformation applied to all the sample data:

The forward transform:

\[ Y = G_{Y}^{-1}(F_{X}(X)) \]

The reverse transform:

\[ X = F_{X}^{-1}(G_{Y}(Y)) \]

This may be applied to any data, including parametric or nonparametric distributions. We just need to be able to map from one distribution to another through percentiles, so it is a:

rank preserving transform, for example, P25 remains P25 after distribution transformation

Ergodic Fluctuations#

Geostatistical Concepts: statistical fluctuations in the input statistics observed over multiple simulated realizations. Here are some general concepts,

the magnitude of the fluctuations are a function of the ratio of spatial continuity to the size of the model
if model is large relative to spatial continuity range then fluctuations should minimal
if model is small relative to spatial continuity range then fluctuations may be extreme

When we check out simulated realizations, we should expect some fluctuations in the reproduction of the histogram, variogram and correlation coefficients.

best practice is to check the expectation in these statistics vs. the inputs

Estimation#

Geostatistical Concepts: is process of obtaining the single best value to represent a feature at an unsampled location, or time. Some additional concepts,

local accuracy takes precedence over global spatial variability
too smooth, not appropriate for any transform function that is sensitive to heterogeneity
for example, inverse distance and kriging
many predictive machine learning models focus on estimation (e.g., k-nearest neighbours, decision tree, random forest, etc.)

Facies#

Indicator Simulation: grouping rock into discrete groups, a new categorical feature. A method to group, categorize rock in a useful manner that improves,

characterization through statistics, e.g., distributions and variograms
prediction of features, e.g., porosity and permeability away from wells

For oil and gas the term is facies, but for mining the terms rock types and zones are commonly used. Also, in oil and gas there are multiple types of facies used,

lithofacies - based on rock-related features only, usually small-scale porosity and permeability clusters and sedimentary structures, for example, shale, sandstone, dolomite, limestone, laminated sandstone, hummocky cross-stratification, etc.
depofacies - usually include multiple lithofacies while integrating the concept of geometry, while integrating reservoir significant scale that impacts reservoir flow, well connectivity, for example, channel axis, outer sheet, etc.
seismic facies - large-scale distinct acoustic and elastic property and seismic geomorphological expressions, integrated as the large-scale reservoir framework, parallel continuous high amplitude, chaotic amplitudes, mounded discontinuous low amplitudes, truncation, onlap, offlap, etc.

Here are some important considerations for determining facies,

facies / rock type is an important decision for subsurface modeling. Facies or rock type determination should remain a collaborative decision integrating expertise from the entire project team (Geologists, Reservoir Modelers, Reservoir Engineers, Petro- and Geophysicists).
facies or rock types must improve subsurface prediction away from the data or they do not add value
number of facies is a balancing act between accuracy of geological concepts and statistical inference, and modeling effort
reservoir modeling is often hierarchical, for example, elements contain multiple depofacies, depofacies contain multiple lithofacies, lithofacies have specific porosity and permeability distributions
often 80-90% of reservoir heterogeneity is captured in the facies models

Here is a summary of criteria for facies, rock types, or any discrete grouping for a subsurface model,

separation of rock properties - facies must be separable over the features of interest that impact subsurface environmental and economic performance, for example, grade, porosity and permeability, etc.
identifiable in data - facies must be identifiable with the most common data available, for example, facies identifiable only in cores are not useful if most wells have only logs
map-able away from data - facies must be easier to predict away from data than the rock properties of interest directly, facies improve prediction
sufficient sampling - there must be enough data to allow for inference of reliable statistics for feature within each facies, i.e., by-facies statistics

Facies Simulation#

Indicator Simulation: algorithms to calculate simulated realizations of categorical features, commonly realizations of facies or rock types for mining. There are 4 common methods for simulating facies,

sequential indicator simulation - based on indicator variograms for each facies category and indicator kriging to calculate the local cumulative distribution function within the sequential simulation approach
multiple point simulation - image reproduction method based on categorical training images to calculate the local cumulative distribution function with sequential simulation approach
object-based simulation - marked point process based on Monte Carlo simulation of geometric parameters and sequential, stochastic placement of the geometry in the volume of interest
generative AI - trained deep learning generators to calculate categorical realizations

Feature (also variable)#

Geostatistical Concepts: any property measured or observed in a study

for example, porosity, permeability, mineral concentrations, saturations, contaminant concentration, etc.
in data mining / machine learning this is known as a feature, statisticians call these variables
measure often requires significant analysis, interpretation, etc.
when features are modified and combined to improve our models we call this feature engineering

Frequentist Probability#

Probability Concepts: measure of the likelihood that an event will occur based on frequencies observed from an experiment. For random experiments and well-defined settings (such as coin tosses),

\[ \text{Prob}(A) = P(A) = \lim_{n \to \infty} \frac{n(A)}{n} \]

where:

\(n(A)\) = number of times event \(A\) occurred \(n\) = number of trails

For example, possibility of drilling a dry hole for the next well, encountering sandstone at a location (\(\bf{u}_{\alpha}\)), exceeding a rock porosity of \(15 \%\) at a location (\(\bf{u}_{\alpha}\)).

Gamma Bar#

Volume Variance: volume integrated variogram where the tail and head are integrated over volumes \(v\) and \(V\) respectively.

\[ \overline{\gamma}(v,V) \]

The gamma bar value is calculated by the integration,

\[ \overline{\gamma}(v,V) = \int_{v} \int_{V} \gamma (\bf{u},\bf{u}^{\prime}) d\bf{u} d\bf{u}^{\prime} \]

a practical approach to calculate gamma bar values is to discretize both volumes \(v\) and \(V\) and to average the variogram values over the combinatorial,

\[ \overline{\gamma}(v,V) = \frac{1}{n_v \cdot n_V} \sum_{iv=1}^{n_v} \sum_{iV=1}^{n_V} \gamma \left(v(\bf{u}_{\alpha}),V(\bf{u}_{\beta})\right) \]

Gamma bar values are applied to calculate spatial continuity integrating volume support of data and model cells. Note, for kriging matrices the volume integrated covariance, known as “c bar”, \(\overline{C}(v,V)\),

\[ \overline{C}(v,V) = \sigma^2 - \overline{\gamma}(v,V) \]

Geometric Anisotropy (variogram interpretation)#

Variogram Calculation: the same variogram structures are observed over all directions, but the range depends on the direction

commonly, the vertical range of correlation is much less than the horizontal range due to the formation of ‘layering’ due to sedimentary processes. Walter’s Law in stratigraphy states that the vertical sequence occurs horizontally at difference scales
the ratio of the horizontal to vertical range, \(a_{hori}:a_{vert}\) is commonly known as the horizontal to vertical anisotropy ratio, for example, a fluvial setting may have a 10:1 horizontal to vertical anisotropy ratio.
geometric anisotropy is common for the horizontal directions also, the ratio of horizontal major direction: horizontal minor direction range, \(a_{maj}:a_{min}\), is commonly known as the horizontal major to minor anisotropy ratio

Geometric Anisotropy (variogram model)#

Variogram Calculation and Modeling: we assume geometric anisotropy to model 2D and 3D variogram over all directions from experimental variograms calculated only in primary directions.

this model provides a valid interpolation of the variogram between the primary directions
the geometric anisotropy model is based on this lag distance,

\[ h = \sqrt{ \left( \frac{\bf{h}_{maj}}{a_{maj}} \right)^2 + \left( \frac{\bf{h}_{min}}{a_{min}} \right)^2 + \left( \frac{\bf{h}_{vert}}{a_{vert}} \right)^2 } \]

where \(a_{maj}, a_{maj}, a_{vert}\) are the ranges in the major, minor and vertical directions and \(\bf{h}_{maj}, \bf{h}_{maj}, \bf{h}_{vert}\) are the lag distance components in the major, minor and vertical directions.

Geostatistics#

Geostatistical Concepts: a branch of applied statistics that integrates:

the spatial (geological) context
the spatial relationship
volumetric support / scale
uncertainty

I include all spatial statistics with geostatistics, some disagree with me on this. From my experience, any useful statistical method for modeling spatial phenomenon is adopted and added to the geostatistics toolkit! Geostatistics is an expanding and evolving field of study.

Global Accuracy#

Simulation: we honor (match) global measures calculated over the entire volume of interest, for example, the mean, variance, entire cumulative distribution function, etc.

Global Measures#

Simulation: statistical summaries over the entire volume of interest, for example, the mean, variance, entire cumulative distribution function, etc.

Hard Data#

Geostatistical Concepts: data that has a high degree of certainty, usually from a direct measurement from the rock

for example, well core-based and well log-based porosity and lithofacies

In general, hard data has high resolution (small scale, volume support), but with poor coverage (measure only an extremely small proportion of the population, for example,

Core coverage deepwater oil and gas - well core only sample one five hundred millionth to one five billionth of a deepwater reservoir, assuming 3 inch diameter cores with 10% core coverage in vertical wells with 500 m to 1,500 m spacing
Core coverage mining grade control - diamond drill hole cores sample one eight thousandth to one thirty thousandth of ore body, assuming HQ 63.5 mm diameter cores with 100% core coverage in vertical drill holes with 5 m to 10 m spacing

Histogram#

Univariate Distributions: a representation of the univariate statistical distribution with a plot of frequency over an exhaustive set of bins over the range of possible values. These are the steps to build a histogram,

Divide the continuous feature range of possible values into \(K\) equal size bins, \(\delta x\):

\[ \Delta x = \left( \frac{x_{max} - x_{min}}{K} \right) \]

or use available category labels for categorical features.

Count the number of samples (frequency) in each bin, \(n_k\), \quad \(\forall \quad k=1,\ldots,K\).
Plot the frequency vs. the bin label (use bin centroid if continuous)

Note, histograms are typically plotted as a bar chart.

Hybrid Model#

Geostatistical Concepts: system or process that includes a combination of both deterministic model and stochastic model

most geostatistical models are hybrid models
for example, additive deterministic trend models and stochastic residual models

Independence (probability)#

Probability Concepts: events \(A\) and \(B\) are independent if and only if the following relations are true,

\(P(A \cap B) = P(A) \cdot P(B)\)
\(P(A|B) = P(A)\)
\(P(B|A) = P(B)\)

If any of these are violated we suspect that there exists some form of relationship.

Indicator Kriging#

Indicator Kriging: the application of simple kriging to a set of indicator transforms, one for each threshold for a continuous features, or one for each category for cateogorical features, of the data to directly estimate the local uncertainty model cumulative distribution function at an unknown location, \(\bf{u}\).

The indicator kriging estimator is defined as,

\[ p^*_{IK}(\mathbf{u}; k) = \sum_{\alpha=1}^n \lambda_\alpha(k) \cdot i(\mathbf{u}_\alpha; k) + \left( 1 - \sum_{\alpha=1}^n \lambda_\alpha(k) \right) \cdot p(k) \]

where \(\lambda_\alpha(k)\) is the indicator kriging weight for data \(\alpha\) and category or threshold \(k\), \(i(\mathbf{u}_\alpha; k)\) is the \(k\) category or threshold indicator transform of the data at location \(\mathbf{u}_\alpha\) and \(p(k)\) is the global or local mean categorical probability (if a trend model is provided) or the continuous cumulative probability.

by estimating probability \(p^*_{IK}(\mathbf{u}; k)\) for each threshold or category we are directly estimating the distribution of uncertainty at an unsampled location without any distribution assumption (i.e., no Gaussian assumption)

The steps for indicator kriging are,

Establish a series of thresholds or categories:

for categorical features, the categories are given
for continuous features, the thresholds should cover the entire range of the feature with enough thresholds to represent the local distributions of uncertainty (so we can resolve the local cumulative distribution function’s)
for continuous features, the thresholds may be related to critical thresholds, e.g., environmental limits, economic thresholds.

Apply indicator transformation to the data
Calculate the indicator variogram from the indicator transformation of the data for each threshold or category
Apply indicator kriging to estimate the cumulative probability for continuous features or probability for categorical features at an unsampled location, using the indicator variogram for each threshold or category
Correct the final cumulative distribution function to be valid, this is called the order relations correction.

since each threshold’s cumulative probability is estimated by indicator kriging separately, the resulting cumulative distribution function may not be valid, i.e., non-monotonic increasing
since each category’s probability is estimated by indictor kriging separately, the probabilities may not sum to one

General comments,

a variogram model is needed for each threshold or category; therefore, a more difficult inference problem, however, there is greater flexibility as spatial continuity may vary by value, for example, greater spatial continuity for upper tail of the feature distribution
more readily integrates data of different types through soft data encoding, for example, we could assign a random variable (distribution) at data locations instead of a single value

Indicator Transform#

Indicator Kriging: indicator coding a random variable to a probability relative to a category or a threshold.

If \(i(\bf{u}:z_k)\) is an indicator for a categorical variable,

what is the probability of a realization equal to a category?

\[\begin{split} i(\bf{u}; z_k) = \begin{cases} 1, & \text{if } Z(\bf{u}) = z_k \\ 0, & \text{if } Z(\bf{u}) \ne z_k \end{cases} \end{split}\]

for example,

given threshold, \(z_2 = 2\), and data at \(\bf{u}_1\), \(z(\bf{u}_1) = 2\), then \(i(bf{u}_1; z_2) = 1\)
given threshold, \(z_1 = 1\), and a RV away from data, \(Z(\bf{u}_2)\) then is calculated as \(F^{-1}_{\bf{u}_2}(z_1)\) of the RV as \(i(\bf{u}_2; z_1) = 0.23\)

If \(i(\bf{u}:z_k)\) is an indicator for a continuous variable,

what is the probability of a realization less than or equal to a threshold?

\[\begin{split} i(\bf{u}; z_k) = \begin{cases} 1, & \text{if } Z(\bf{u}) \le z_k \\ 0, & \text{if } Z(\bf{u}) > z_k \end{cases} \end{split}\]

for example,

given threshold, \(z_1 = 6\%\), and data at \(\bf{u}_1\), \(z(\bf{u}_1) = 8\%\), then \(i(\bf{u}_1; z_1) = 0\)
given threshold, \(z_4 = 18\%\), and a RV away from data, \(Z(\bf{u}_2) = N\left[\mu = 16\%,\sigma = 3\%\right]\) then \(i(\bf{u}_2; z_4) = 0.75\)

The indicator coding may be applied over an entire random function by indicator transform of all the random variables at each location.

Indicator Variogram#

Indicator Kriging: variogram’s calculated and modelled from the indicator transform of spatial data and used for indicator kriging. The indicator variogram is,

\[ \gamma_i(\mathbf{h}; z_k) = \frac{1}{2N(\mathbf{h})} \sum_{\alpha=1}^{N(\mathbf{h})} \left[ i(\mathbf{u}_\alpha; z_k) - i(\mathbf{u}_\alpha + \mathbf{h}; z_k) \right]^2 \]

where \(i(\mathbf{u}_\alpha; z_k)\) and \(i(\mathbf{u}_\alpha + \mathbf{h}; z_k)\) are the indicator transforms for the \(z_k\) threshold at the tail location \(\mathbf{u}_\alpha\) and head location \(\mathbf{u}_\alpha + \mathbf{h}\) respectively.

for hard data the indicator transform \(i(\bf{u},z_k)\) is either 0 or 1, in which case the \(\left[ i(\mathbf{u}_\alpha; z_k) - i(\mathbf{u}_\alpha + \mathbf{h}; z_k) \right]^2\) is equal to 0 when the values at head and tail are both \(\le z_k\) (for continuous features) or \(= z_k\) (for categorical features), the same relative to the threshold, or 1 when they are different.
therefore, the indicator variogram is \(\frac{1}{2}\) the proportion of pairs that change! The indicator variogram can be related to probability of change over a lag distance, \(h\).
the sill of an indicator variogram is the indicator variance calculated as,

\[ \sigma_i^2 = p \cdot (1 - p) \]

where \(p\) is the proportion of 1’s (or zeros as the function is symmetric over proportion)

Inference, Inferential Statistics#

Geostatistical Concepts: this is a big topic, but for the course I provide this simplified, functional definition, given a random sample from a population, describe the population, for example,

given the well samples, describe the reservoir
given the drill hole samples, describe the ore body

Intersection of Events (probability)#

Probability Concepts: the intersection of outcomes, the probability of \(A\) and \(B\) is represented as,

\[ P(A \cap B) = P(A,B) \]

under the assumption of independence of \(A\) and \(B\) the probability of \(A\) and \(B\) is,

\[ P(A,B) = P(A) \cdot P(B) \]

Isotropic or Omnidirectional (variogram)#

Variogram Calculation: variogram calculated such that direction, azimuth in 2D horizontal or azimuth and dip in 3D, are not considered,

for an isotropic experimental variogram set the azimuth tolerance as 90 degrees, then the experimental variogram is insensitive to direction.

or variogram is modelled such that direction, azimuth in 2D horizontal or azimuth and dip in 3D, does not impact the model,

for an isotropic variogram model set the major range equal to the minor range, then the variogram model is insensitive to direction

Joint Probability#

Probability Concepts: probability that considers more than one event occurring together, the probability of \(A\) and \(B\) is represented as,

\[ P(A \cap B) = P(A,B) \]

or the probability of \(A\), \(B\) and \(C\) is represented as,

\[ P(A \cap B \cap C) = P(A,B,C) \]

under the assumption of independence of \(A\), \(B\) and \(C\) the joint probability may be calculated as,

\[ P(A,B,C) = P(A) \cdot P(B) \cdot P(C) \]

Local Accuracy#

Simulation: estimates that minimizes the estimation variance. Local measures and local accuracy do not consider matching global measures like the entire histogram, including the global mean and variance.

Local Measures#

Simulation: feature estimates at individual location. Local accuracy is estimates that minimizes the estimation variance at each location. Local measures and local accuracy do not consider matching global measures like the entire histogram, including the global mean and variance.

Kriging#

Simple and Ordinary Kriging: spatial estimation approach that relies on linear weights that account for spatial continuity, data closeness and redundancy. The kriging estimate is,

\[ z^*(\bf{u}) = \sum_{\alpha = 1}^{n} \lambda_{\alpha} \cdot z(\bf{u}_{\alpha}) + \left( 1.0 - \sum_{\alpha=1}^n \lambda_{\alpha} \right) \cdot m_z \]

the right term is the unbiasedness constraint, one minus the sum of the weights is applied to the global mean.

In the case where the trend, \(t(\bf{u})\), is removed, we now have a residual, \(y(\bf{u})\),

\[ y(\bf{u}) = z(\bf{u}) - t(\bf{u}) \]

the residual mean is zero so we can simplfy our kriging estimate as,

\[ y^*(\bf{u}) = \sum_{\alpha = 1}^{n} \lambda_{\alpha} \cdot y(\bf{u}_{\alpha}) \]

The simple kriging weights are calculated by solving a linear system of equations,

\[ \sum_{j=1}^n \lambda_j C(\bf{u}_i,\bf{u}_j) = C(\bf{u},\bf{u}_i), \quad i=1,\ldots,n \]

that may be represented with matrix notation as,

\[\begin{split} \begin{bmatrix} C(\bf{u}_1,\bf{u}_1) & C(\bf{u}_1,\bf{u}_2) & \dots & C(\bf{u}_1,\bf{u}_n) \\ C(\bf{u}_2,\bf{u}_1) & C(\bf{u}_2,\bf{u}_2) & \dots & C(\bf{u}_2,\bf{u}_n) \\ \vdots & \vdots & \ddots & \vdots \\ C(\bf{u}_n,\bf{u}_1) & C(\bf{u}_n,\bf{u}_2) & \dots & C(\bf{u}_n,\bf{u}_n) \\ \end{bmatrix} \cdot \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_n \\ \end{bmatrix} = \begin{bmatrix} C(\bf{u}_1,\bf{u}) \\ C(\bf{u}_2,\bf{u}) \\ \vdots \\ C(\bf{u}_n,\bf{u}) \\ \end{bmatrix} \end{split}\]

This system may be derived by substituting the equation for kriging estimates into the equation for estimation variance, and then setting the partial derivative with respect to the weights to zero.

we are optimizing the weights to minimize the estimation variance

this system integrates the,

spatial continuity as quantified by the variogram (and covariance function to calculate the covariance, \(C\), values)
redundancy the degree of spatial continuity between all of the available data with themselves, \(C(\bf{u}_i,\bf{u}_j)\)
closeness the degree of spatial continuity between the available data and the estimation location, \(C(\bf{u}_i,\bf{u})\)

Kriging provides a measure of estimation accuracy known as kriging variance (a specific case of estimation variance).

\[ \sigma^{2}_{E}(\bf{u}) = C(0) - \sum^{n}_{\alpha = 1} \lambda_{\alpha} C(\bf{u}_0 - \bf{u}_{\alpha}) \]

Kriging estimates are best in that they minimize the above estimation variance.

Properties of kriging estimates include,

Exact interpolator - kriging estimates with the data values at the data locations
Kriging variance - a measure of uncertainty in a kriging estimate. Can be calculated before getting the sample information, as the kriging estimation variance is not dependent on the values of the data nor the kriging estimate, i.e. the kriging estimator is homoscedastic.
Spatial context - kriging takes integrates spatial continuity, closeness and redundancy; therefore, kriging accounts for the configuration of the data and structural continuity of the feature being estimated.
Scale - kriging by default assumes the estimate and data are at the same point support, i.e., mathematically represented as points in space with zero volume. Kriging may be generalized to account for the support volume of the data and estimate,
Multivariate - kriging may be generalized to account for multiple secondary data in the spatial estimate with the cokriging system. We will cover this later.
Smoothing effect - of kriging can be forecasted as the missing variance. The missing variance over local estimates is the kriging variance.

Kriging (simple vs. ordinary)#

Simple and Ordinary Kriging: the difference between simple kriging and ordinary kriging is related to the assumption of stationarity in the mean.

Simple Kriging - global stationary mean is an input provided by the user, i.e., the mean is assumed to be stationary.

\[ z^*(\bf{u}) = \sum_{\alpha = 1}^{n} \lambda_{\alpha} \cdot z(\bf{u}_{\alpha}) + \left( 1.0 - \sum_{\alpha=1}^n \lambda_{\alpha} \right) \cdot m_z \]

for the kriging estimate, one minus the sum of the data weights is applied to the global stationary mean.
at data locations all weight is applied to the collocated datum, and beyond the variogram range from all data all weight is applied to the global mean.
the simple kriging weights, \(\lambda_1, \lambda_2, \dots, \lambda_n\), are calculated by solving this system of equations represented in matrix notation as,

\[\begin{split} \begin{bmatrix} C(\bf{u}_1,\bf{u}_1) & C(\bf{u}_1,\bf{u}_2) & \dots & C(\bf{u}_1,\bf{u}_n) \\ C(\bf{u}_2,\bf{u}_1) & C(\bf{u}_2,\bf{u}_2) & \dots & C(\bf{u}_2,\bf{u}_n) \\ \vdots & \vdots & \ddots & \vdots \\ C(\bf{u}_n,\bf{u}_1) & C(\bf{u}_n,\bf{u}_2) & \dots & C(\bf{u}_n,\bf{u}_n) \\ \end{bmatrix} \cdot \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_n \\ \end{bmatrix} = \begin{bmatrix} C(\bf{u}_1,\bf{u}) \\ C(\bf{u}_2,\bf{u}) \\ \vdots \\ C(\bf{u}_n,\bf{u}) \\ \end{bmatrix} \end{split}\]

Ordinary Kriging - local nonstationary mean calculated by the kriging system. The global mean is not an input, instead the local nonstationary mean is calculated by ordinary kriging. This relaxes the assumption of a stationary mean.

this is accomplished with the addition of a data kriging weights must sum to one constraint in the kriging system, if the weights must sum to one this removes the right hand unbiasedness constraint from the kriging estimate with the global mean,

\[ z^*(\bf{u}) = \sum_{\alpha = 1}^{n} \lambda_{\alpha} \cdot z(\bf{u}_{\alpha}) \]

at data locations all weight is applied to the collocated datum, and beyond the variogram range from all data, all weight is applied to the local mean calculated from local data
the ordinary kriging weights, \(\lambda_1, \lambda_2, \dots, \lambda_n\), are calculated by solving this system of equations, the simple kriging system with the added constraint that the weight sum to 1.0 represented in matrix notation as,

\[\begin{split} \begin{bmatrix} C(\bf{u}_1,\bf{u}_1) & C(\bf{u}_1,\bf{u}_2) & \dots & C(\bf{u}_1,\bf{u}_n) & 1 \\ C(\bf{u}_2,\bf{u}_1) & C(\bf{u}_2,\bf{u}_2) & \dots & C(\bf{u}_2,\bf{u}_n) & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ C(\bf{u}_n,\bf{u}_1) & C(\bf{u}_n,\bf{u}_2) & \dots & C(\bf{u}_n,\bf{u}_n) & 1 \\ 1 & 1 & \dots & 1 & 0 \\ \end{bmatrix} \cdot \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_n \\ \mu \\ \end{bmatrix} = \begin{bmatrix} C(\bf{u}_1,\bf{u}) \\ C(\bf{u}_2,\bf{u}) \\ \vdots \\ C(\bf{u}_n,\bf{u}) \\ 1 \\ \end{bmatrix} \end{split}\]

Kriging Variance#

Simple and Ordinary Kriging: a measure of uncertainty in a kriging estimate.

\[ \sigma^{2}_{z}(\bf{u}) = C(0) - \sum^{n}_{\alpha = 1} \lambda_{\alpha} C(\bf{u}_0 - \bf{u}_{\alpha}) \]

Kriging variance is a specific case of estimation variance,

\[ \sigma^{2}_{E}(\bf{u}) = E \left[ \left(z(\bf{u}) - z^{*}(\bf{u}) \right)^2 \right] \]

Can be calculated before getting the sample information, as the kriging estimation variance is not dependent on the values of the data nor the kriging estimate, i.e. the kriging estimator is homoscedastic.

Kriging-based Declustering#

Declustering to Correct Sampling Bias: a declustering method to assign weights to spatial samples based on local sampling density, such that the weighted statistics are likely more representative of the population. Data weights are assigned so that,

samples in densely sampled areas receive less weight
samples in sparsely sampled areas receive more weight

Kriging-based declustering proceeds as follows:

calculate and model the experimental variogram
apply kriging to calculate estimates over a high-resolution grid covering the area of interest
calculate the sum of the weights assigned to each data
assign data weights proportional to this sum of weights

The weights are calculated as:

\[ w(\bf{u}_j) = n \cdot \frac{\sum_{iy}^{ny} \sum_{ix}^{nx} \lambda_j}{\sum_{i=1}^n \left[ \sum_{iy}^{ny} \sum_{ix}^{nx} \lambda_{j,ix,iy} \right]} \]

where \(nx\) and \(ny\) are the number of cells in the grid, \(n\) is the number of data, and \(\lambda_{j,ix,iy}\) is the weight assigned to the \(j\) data at the \(ix,iy\) grid cell.

Here is an important point for kriging-based declustering,

like polygonal declustering, kriging-based declustering is sensitive to the boundaries of the area of interest; therefore, the weights assigned to the data near the boundary of the area of interest may change radically as the area of interest is expanded or contracted

Also, kriging-based declustering integrates the spatial continuity model from variogram model. Consider the following possible impacts of the variogram model on the declustering weights,

if there is 100% relative nugget effect, there is no spatial continuity and therefore, all data receives equal weight. Note for the equation above this results in a divide by 0.0 error that must be checked for in the code.
geometric anisotropy may significantly impact the weights as data aligned over specific azimuths are assessed as closer or further in terms of covariance

Kolmogorov’s 3 Probability Axioms#

Probability Concepts: these are Kolmogorov’s 3 axioms for valid probabilities,

Probability of an event is a non-negative number.

\[ Prob(𝐴) \ge 0 \]

Probability of the entire sample space, all possible outcomes, \(\Omega\), is one (unity), also known as probability closure.

\[ Prob(\Omega) = 1 \]

Additivity of mutually exclusive events for unions.

\[ Prob\left(⋃_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} Prob(A_i) \]

e.g., probability of \(A_1\) and \(A_2\) mutual exclusive events is, \(Prob(A_1 + A_2) = Prob(A_1) + Prob(A_2)\)

Lag (variogram)#

Variogram Calculation: the separation between paired data described by a vector, \(\bf{h}\).

the experimental variogram characterizes spatial continuity over a variety of lags (with magnitude and orientations) and the variogram model is applied to calculate spatial continuity for all possible lags (all possible separation distances and orientation)

Lag Distance (variogram)#

Variogram Calculation: the magnitude of the lag vector, \(\bf{h}\), describing separation between paired data for variogram calculation and modeling

the experimental variogram characterizes spatial continuity over a variety of lag distances and the variogram model is applied to calculate spatial continuity for all possible lag distances

Lag Distance Tolerance (variogram)#

Variogram Calculation: for calculating experimental variograms, the tolerance \(+/-\Delta\) in lag distance applied to pool pairs of data for that specific lag distance.

for example, given a lag distance of 300 m, we could assign a lag tolerance of 50 m and then all data pairs separated by 250 m - 350 m will be pooled to calculate the experimental variogram for this lag distance
it is common practice to use half the unit lag distance for lag tolerance. This may be increased to smooth the experimental variogram for improved interpretability

Location Map#

Univariate Distributions: a data plot where the 2 axes are locations, e.g., \(X\) and \(Y\), Easting and Northing, Latitude and Longitude, etc., to show the locations and magnitudes of the spatial data.

often the data points are colored to represent the scale of feature to visualize the sampled feature over the area or volume of interest
advantage, visualize the data without any model that may bias our impression of the data
disadvantage, may be difficult to visualize large datasets and data in 3D

Major Direction (variogram)#

Variogram Calculation: when calculating an experimental variogram, the direction with the largest variogram range, greatest spatial continuity

commonly the major and minor directions describe spatial continuity for 2D phenomenon or in 3D the horizontal (or stratigraphically aligned) coordinates are augmented with a vertical (orthogonal to major and minor) direction

Marginal Probability#

Probability Concepts: probability that considers only one event occurring, the probability of \(A\),

\[ P(A) \]

marginal probabilities may be calculated from joint probabilities through the process of marginalization,

\[ P(A) = \int_{-\infty}^{\infty} P(A,B) dB \]

where we integrate over all cases of the other event, \(B\), to remove its influence. Given discrete possible cases of event \(B\) we can simply sum the probabilities over all possible cases of \(B\),

\[ P(A) = \sum_{i=1}^{k_B} P(A,B) dB \]

Minor Direction (variogram)#

Variogram Calculation: when calculating an experimental variogram, the direction with the smallest variogram range, that must be orthogonal to the major direction (a limitation of our geometric anisotropy model)

commonly the major and minor directions describe spatial continuity for 2D phenomenon or in 3D the horizontal (or stratigraphically aligned) coordinates are augmented with a vertical (orthogonal to major and minor) direction

Model Checking#

Model Checking: is a critical last step for any spatial modeling workflow. Here are the critical aspects of model checking,

Model Inputs - data and statistics integration

check the model to ensure the model inputs are honored in the models, generally checked over all the realizations, for example, the output histograms and matches the input histogram over the realizations

Accurate Spatial Estimates - ability of the model to accurately predict away from the available sample data, over a variety of configurations, with accuracy

by cross validation, withholding some of the data, check the model’s ability to predict
generally, summarized with a truth vs. predicted cross plot and measures such as mean square error

\[ MSE = \frac{1}{n} \sum_{\alpha = 1}^{n} \left(z^{*}(\bf{u}_{\alpha}) - z(\bf{u}_{\alpha}) \right)^2 \]

Accurate and Precise Uncertainty Models - uncertainty model is fair given the amount of information available and various sources of uncertainty

also checked through cross validation, withholding some of the data, but by checking the proportion of the data in specific probability intervals
summarized with a proportion of withheld data in interval vs. the probability interval
points on the 45 degree line indicate accurate and precise uncertainty model
points above the 45 degree line indicate accurate and imprecise uncertainty model, uncertainty is too wide
points below the 45 degree line indicate inaccurate uncertainty model, uncertainty is too narrow or model is biased

Monte Carlo Simulation (MCS)#

Distribution Transformations: a random sample from a statistical distribution, random variable. The steps for MCS are:

model the feature cumulative distribution function, \(F_x(x)\)
draw random value from a uniform [0,1] distribution, this is a random cumulative probability value, known as a p-value, \(p^{\ell}\)
apply the inverse of the cumulative distribution function to calculate the associated realization

\[ x^{\ell} = F_x^{-1} (p^{\ell}) \]

repeat to calculate enough realizations for the subsequent analysis

Monte Carlo simulation is the basic building block of stochastic simulation workflows, for example,

Monte Carlo simulation workflows - apply Monte Carlo simulation many over all features to the transfer function to calculate a realization of the decision criteria, repeated for many realizations, to propogate uncertainty through a transfer function
Bootstrap - applies Monte Carlo simulation to aquire realizations of the data to calculate uncertainty in sample statistics or ensembles of prediction models for ensemble-based machine learning
Monte Carlo methods - applies Monte Carlo simulation to speed up an expensive calculation with a limited random sample that converges on the solution as the number of random samples increases

Monte Carlo Simulation Workflow#

Distribution Transformations: a convenient stochastic workflow for propagating uncertainty through a transfer function through sampling with Monte Carlo Simulation (MCS). The workflow includes the following steps,

Model all the input features’ distributions, cumulative distribution functions,

\[ F_{x_1}(x_1), \quad F_{x_2}(x_2), \quad \dots \quad , F_{x_m}(x_m) \]

Monte Carlo simulate a realizations for all the inputs,

\[ x_1^{\ell}, \quad x_2^{\ell}, \quad \ldots \quad , x_m^{\ell} \]

Apply to the transfer function to get a realization of the transfer function output, often the decision criteria

\[ y^{\ell} = f \left(x_1^{\ell},x_2^{\ell}, \quad \ldots \quad, x_m^{\ell} \right) \]

Repeat steps 1-3 to calculate enough realizations to model the transfer function output distribution.

\[ F_y(y) \]

Multiplication Rule (probability)#

Probability Concepts: we can calculate the joint probablity of \(A\) and \(B\) as the product of the conditional probability of \(B\) given \(A\) with the marginal probability of \(A\),

\[ P(A \cup B) = P(A,B) = P(B|A) \cdot P(A) \]

The multiplication rule is derived as a simple manipulation of the definition of conditional probability, in this case,

\[ P(B|A) = \frac{P(A,B)}{P(A)} \]

Mutually Exclusive Events (probability)#

Probability Concepts: the events do not intersect, i.e., do not have any common outcomes. We represent this as,

using set notation, we state events \(A\) and \(B\) are mutually exclusive as,

\[ A \cap B = \{x: x \in A \text{ and } x \in B \} = \emptyset \]

and the probability for mutually exclusive as,

\[ P(A,B) = P(A \cap B) = 0.0 \]

Normalized Histogram#

Univariate Distributions: is a representation of the univariate statistical distribution with a plot of probability over an exhaustive set of bins over the range of possible values. These are the steps to build a normalized histogram,

Divide the continuous feature range of possible values into \(K\) equal size bins, \(\delta x\):

\[ \Delta x = \left( \frac{x_{max} - x_{min}}{K} \right) \]

or use available categories for categorical features.

Count the number of samples (frequency) in each bin, \(n_k\), \(\forall k=1,\ldots,K\) and divide each by the total number of data, \(n\), to calculate the probability of each bin,

\[ p_k = \frac{n_k}{n}, \forall \quad k = 1,\ldots,L \]

Plot the probability vs. the bin label (use bin centroid if continuous)

Note, normalized histograms are typically plotted as a bar chart.

Nugget Effect (variogram)#

Variogram Calculation: discontinuity in the variogram at distances less than the minimum data spacing

often communicated as the ratio, nugget effect divided by the sill, known as the relative nugget effect, for example, copper spatial continuity model includes 20% relative nugget effect
measurement error will cause apparent nugget effect.

Order Relations Correction (indicator kriging)#

Indicator Kriging: correction applied to local cumulative distribution functions estimated by indictor kriging. Due to the separate estimation of each cumulative probability over each threshold the cumulative distribution may not be valid, for example,

nonmonotonic behavior for continuous features
sum of categorical probabilities not equal to one (fail to honor probability closure)

For categorical features the order relations correction is the same as the L1 normalizer (machine learning feature transformations),

\[ i(\bf{u}_{\alpha};z_k)^{\prime} = \frac{i(\bf{u}_{\alpha};z_k)}{\sum_{i=1}^K i(\bf{u}_{\alpha};z_i)} \]

cumulative probability for each threshold \(i(\bf{u}_{\alpha};z_k)\) are divided by the sum of all the cumulative probabilities, \(\sum_{i=1}^K i(\bf{u}_{\alpha};z_i)\) to ensure they sum to 1.0

For continuous features this involves a two pass calculation that results in two possible cumulative distribution functions that are monotonic increasing that are then averaged to get the corrected result (see my lecture, Indicator Methods).

Parameters (statistics)#

Geostatistical Concepts: a summary measure of a population

for example, population mean, population standard deviation

We very rarely have access to actual population parameters, in general we infer population parameters with available sample statistics

Parameters (machine learning)#

Geostatistical Concepts: trainable coefficients for a machine learning model that control the fit to the training data

model parameters are calculated by optimization to minimize error over the training data through, analytical solution, or iterative solution, e.g., gradient descent optimization

Polygonal Declustering#

Declustering to Correct Sampling Bias: a declustering method to assign weights to spatial samples based on local sampling density, such that the weighted statistics are likely more representative of the population. Data weights are assigned so that,

samples in densely sampled areas receive less weight
samples in sparsely sampled areas receive more weight

Polygonal declustering proceeds as follows:

Split up the area of interest with Voronoi polygons. These are constructed by intersected perpendicular bisectors between adjacent data points. The polygons group the area of interest by nearest data point
Assign weight to each datum proportional to the area of the associated Voronoi polygon

\[ w(\bf{u}_j) = n \cdot \frac{A_j}{\sum_{j=1}^n} \]

where \(w(\bf{u}_j)\) is the weight for the \(j\) data. Note, the sum of the weights is \(n\); therefore, \(w(\bf{u}_j)\) is nominal weight of 1.0, sample density if the data were equally spaced over the area of interest.

Here are some highlights for polygonal declustering,

polygonal declustering is sensitive to the boundaries of the area of interest; therefore, the weights assigned to the data near the boundary of the area of interest may change radically as the area of interest is expanded or contracted
polygonal declustering is the same as the Theissen polygon method for calculation of precipitation averages developed by Afred H. Thiessen in 1911, [Thi11]

Population#

Geostatistical Concepts: exhaustive, finite list of property of interest over area of interest.

for example, exhaustive set of porosity measures at every location within a reservoir

Generally, the entire population is not generally accessible and we use a limited sample to make inference concerning the population

Power Law Averaging#

Volume Variance: a flexible approach for scale-up via averaging the values at smaller volume \(z_v\) within larger volume \(z_V\) to calculate an effective value representative of the entire volume, \(V\). The equation is,

\[ z_V = \left[ \frac{1}{n} \sum z_v^{\omega} \right] ^{\frac{1}{\omega}} \]

where \(\omega\) is the power of averaging. For example:

\(\omega = 1\) is a regular linear, arithmetic averaging
\(\omega = -1\) is a harmonic averaging
\(\omega = 0\) is a geometric averaging (this is proved in the limit as \(\omega \rightarrow 0\))

For the case of permeability arithmetic averaging occurs for flow along beds, harmonic averaging occurs for flow orthogonal to beds and near-geometric for flow oblique to beds.

Prediction, Predictive Statistics#

Geostatistical Concepts: estimate the next sample(s) given assumptions about or a model of the population

for example, given our model of the reservoir, predict the next well (pre-drill assessment) sample, e.g., porosity, permeability, production rate, etc.

Predictor Feature#

the input feature for a predictive machine learning model. We can generalize a predictive machine learning model as,

\[ y = \hat{f}(x_1,\ldots,x_m) + \epsilon \]

where the response feature is \(y\), the predictor features are \(x_1,\ldots,x_m\), and \(\epsilon\) is model error

traditional statistics uses the term independent variable

Primary Data#

Geostatistical Concepts: data samples for the feature of interest, the target feature for building a model, for example,

porosity measures from cores and logs used to build a full 3D porosity model. Any samples of porosity are the primary data
as opposed to secondary feature, e.g., if we have facies data to help predict porosity, the facies data are secondary data

Probability Density Function (PDF)#

Univariate Distributions: a representation of a statistical distribution with a function, \(f(x)\), of probability density over the range of all possible feature values, \(x\). These are the concepts for PDFs,

non-negativity constraint, the density cannot be negative,

\[ 0.0 \le f(x) \]

for continuous features the density may be > 1.0, because density is a measure of likelihood and not of probability
integrate density over a range of \(x\) to calculate probability,

\[ 0 \le \int_a^b f(x) dx = P(a \le x \le b) \le 1.0 \]

probability closure, the sum of the area under the PDF curve is equal to 1.0,

\[ \int_{-infty}^{\infty} f(x) dx = 1.0 \]

Nonparametric PDFs are calculated with kernels (usual a small Gaussian distribution) that is summed over all data; therefore, there is an implicitly scale (smoothness) parameter when calculating a PDF.

To large of kernels will smooth out important information about the univariate distribution
Too narrow will result in an overly noisy PDF that is difficult to interpret.

This is analogous to the choice of bin size for a histogram or normalized histogram.

Parametric PDFs are possible but require model fitting to the data, the steps are,

Select a parametric distribution, e.g., Gaussian, log normal, etc.
Calculate the parameters for the parametric distribution based on the available data, by methods such as least squares or maximum likelihood.

Probability Non-negativity, Normalization#

Probability Concepts: fundamental constraints on probability including,

Bounded, \(0.0 \le P(A) \le 1.0\)
Closure, \(P(\Omega) = 1.0\)
Null Sets, \(P(\emptyset) = 0.0\)

Probability Operators#

Probability Concepts: common probability operators that are essential to working with probability and uncertainty problems,

Union of Events - the union of outcomes, the probability of \(A\) or \(B\) is calculated with the probability addition rule,

\[ P(A \cup B) = P(A) + P(B) - P(A,B) \]

Intersection of Events - the intersection of outcomes, the probability of \(A\) and \(B\) is represented as,

\[ P(A \cap B) = P(A,B) \]

only under the assumption of independence of \(A\) and \(B\) can it be calculate from the probabilities of \(A\) and \(B\) as,

\[ P(A,B) = P(A) \cdot P(B) \]

if there is dependence between \(A\) and \(B\) then we need the conditional probability, \(P(A|B)\) instead of the marginal, \(P(A)\),

\[ P(A,B) = P(A|B) \cdot P(B) \]

Complimentary Events - is the NOT operator for probability, if we define \(A\) then \(A\) compliment, \(A^c\) is not \(A\) and we have this resulting closure relationship,

\[ P(A) + P(A^c) = 1.0 \]

complimentary events may be considered for beyond univariate problems, for example consider this bivariate closure,

\[ P(A|B) + P(A^c|B) = 1.0 \]

Note, the given term must be the same.

Mutually Exclusive Events - the events that do not intersect or do not have any common outcomes. We represent this with set notation as,

\[ \{x: x \in A \text{ and } x \in B \} = \emptyset \]

and the joint probability of \(A\) and \(B\) as,

\[ P(A \cap B) = P(A,B) = 0 \]

Probability Perspectives#

Probability Concepts: the 3 primary perspectives for calculating probability:

Long-term frequencies - probability as ratio of outcomes, requires repeated observations of an experiment. The basis for frequentist probability.
Physical tendencies or propensities - probability from knowledge about or modeling the system, e.g., we could know the probability of a heads outcome from a coin toss without the experiment.
Degrees of belief - reflect our certainty about a result, very flexible, assign probability to anything, and updating with new information. The basis for Bayesian probability.

Production Data#

Geostatistical Concepts: includes bottom hole pressure, fluid production, rates, composition, temperatures, commingled over multiple wells, for a single well and even at times along the well bore of a single well. Some additional comments,

production from a single well may be comingled over multiple producing intervals, unless production logging tool (PLT) results are available
important ground truth to be matched with a reservoir model forecasts through the geomodel inversion process called historical production matching

Qualitative Features#

Geostatistical Concepts: information about quantities that you cannot directly measure, require interpretation of measurement, and are described with words (not numbers), for example,

rock type = sandstone
zonation = bornite-chalcopyrite-gold higher grade copper zone

Quantitative Features#

Geostatistical Concepts: features that can be measured and represented by numbers, for example,

age = 10 Ma (millions of years)
porosity = 0.134 (fraction of volume is void space)
saturation = 80.5% (volume percentage)

Like qualitative features, there is often the requirement for interpretation, for example, total porosity may be measured but should be converted to effective porosity through interpretation or a model

Random Function#

Variogram Calculation: set of random variables correlated over space or time, we represent,

a random varibles with an upper-case variable, e.g., \(X\)
a random functions with an upper-case variable with location vectors, e.g., \(X(\bf{u}_1), X(\bf{u}_2), \ldots, X(\bf{u}_n)\)
joint outcomes called realizations, or data samples are represented with lower case, e.g., \(x(\bf{u}_1), x(\bf{u}_2), \ldots, x(\bf{u}_n)\)
realizations with the \(\ell\) notation, e.g. \(x^{\ell}(\bf{u}_1), x^{\ell}(\bf{u}_2), \ldots, x^{\ell}(\bf{u}_n)\) for \(\ell = 1,\ldots,L\) realizations.

Random Variable#

Univariate Distributions: we do not know the value of a feature at a location or time, it can take on a range of possible values, fully described with a statistical distribution, probability density function or cumulative distribution function.

represented as an upper-case variable, e.g., \(X\), while possible outcomes or data measures are represented with lower case, e.g., data as \(x_{\alpha}\), or realization as \(x^{\ell}\)

For spatial phenomenon we add a location vector, \(\bf{u}\), indexed over data or simulation locations, \(\alpha = 1,\ldots,n\),

a spatial random variable, e.g., \(X(\bf{u}_{\alpha})\)
spatial data measures are represented with lower case, e.g., \(x(\bf{u}_{\alpha})\)
spatial realizations, \(x^{\ell}(\bf{u}_{\alpha})\)

Range (variogram)#

Variogram Calculation: lag distance where the experimental variogram reaches the sill. Here’s some more details about the range,

for lag distances less than the range there is information and correlation over distance
for lag distances at and beyond the range there is no information over distance
range is also a parameter applied to fit positive definite, permissible variogram models
when the range varies by direction this is called geometric anisotropy

Realization#

Geostatistical univariate_distributions: an outcome from a random variable or a joint outcome from a random function.

an outcome from a random variable, \(X\), (or joint set of outcomes from a random function)
represented with lower case, e.g., \(x\)
for spatial settings it is common to include a location vector, \(\bf{u}\), to describe the location, e.g., \(x(\bf{u})\), as \(X(\bf{u})\)
resulting from simulation, e.g., Monte Carlo simulation, sequential Gaussian simulation, a method to sample (jointly) from the RV (RF)
in general, we assume all realizations are equiprobable, i.e., have the same probability of occurence

Realizations (uncertainty)#

Geostatistical Concepts: multiple spatial, subsurface models calculated by stochastic simulation by holding input parameters and model choices constant and only changing the random number seed

these models represent spatial uncertainty
for example, hold the porosity mean constant and observe changes in porosity away from the wells over multiple realizations

Reservoir Modeling Workflow#

Geostatistical Concepts: the following is the common geostatistical reservoir modeling workflow:

Integrate all available information to build multiple subsurface scenarios and realizations to sample the uncertainty space
Apply all the models to the transfer function to sample the decision criteria
Assemble the distribution of the decision criteria
Make the optimum reservoir development decisions accounting for this uncertainty model

Response Feature#

output feature for a predictive machine learning model. We can generalize a predictive machine learning model as,

\[ y = \hat{f}(x_1,\ldots,x_m) + \epsilon \]

where the response feature is \(y\), the predictor features are \(x_1,\ldots,x_m\), and \(\epsilon\) is model error

traditional statistics uses the term dependent variable

Sample#

Geostatistical Concepts: the set of values, locations that have been measured

for example, 1,000 porosity measures from well-logs over the wells in the reservoir
or 1,000,000 acoustic impedance measurements over a 1000 x 1000 2D grid for a reservoir unit of interest

Scenarios (uncertainty)#

Geostatistical Concepts: multiple spatial, subsurface models calculated by stochastic simulation by changing the input parameters or other modeling choices to represent the uncertainty due to inference of model parameters and model choices

for example, model three porosity input distribution, porosity mean low, mid and high, and vary the input distribution to calculate new subsurface models

Secondary Data#

Geostatistical Concepts: data samples for another feature, not the feature of interest, the target feature for building a model, but are used to improve the prediction of the target feature.

requires a model of the relationship between the primary and secondary data

For example, samples in space of,

acoustic impedance (secondary data) to support calculation of a model of porosity, the feature of interest
porosity (secondary data) to support calculation of a model of permeability, the feature of interest

Seismic Data#

Geostatistical Concepts: indirect measurement with remote sensing, reflection seismic applies acoustic source(s) and receivers (geophones) to map acoustic reflections with high coverage and generally low resolution. Some more details,

seismic reflections (amplitude) data are inverted to rock properties, e.g., acoustic impedance, consistent with and positionally anchored with well sonic logs
provides framework, bounding surfaces for extents and shapes of reservoirs along with soft information on reservoir properties, e.g., porosity and facies.

Sequential Gaussian Simulation#

Geostatistical Concepts: a simulation method to calculate realizations for spatial models based on the following principles,

Sequential - adding the previously simulated values to the data to ensure the covariance is correct between the simulated values
Gaussian - transformation to Gaussian space so that the local distributions of uncertainty are known given the kriging mean and kriging variance, and the global distribution is reproduced after back-transformation to the original distribution
Simulation - with Monte Carlo simulation from the local distributions to add in the missing variance and to calculate multiple, equiprobable realizations. The random seed determines the individual Monte Carlo simulations along with the random path for the sequential simulation.

Here are all the steps for sequential Gaussian simulation,

Establish grid network and coordinate system, flatten the system including flattening folds and restoring faults
Assign data to the grid (account for scale change from data to model grid cells)
Transform data to Gaussian space, Gaussian anamorphosis applied to original data distribution
Calculate and model the variogram of the Gaussian transformed data
Determine a random path through all of the grid nodes, at each node:
- find nearby data and previously simulated grid nodes
- construct the conditional distribution by kriging, mean as kriging estimate and variance as kriging variance
- Monte Carlo simulate a realization from the conditional distribution
- assign the simulated value to the grid as data
Check realization (could also check after back transform). Does the realization in Gaussian space honor,

data at the data locations?
honor the histogram, \(N\left[0,1\right]\) standard normal with a mean of zero and a variance of one?
and honor the variogram?

Back-transform the simulated values from Gaussian space to the original data distribution
Restore to the original framework, including adding back the folds and faults
Check that the realization honors,

geological concepts?
geophysical data?
historical production data?

Calculate multiple realizations by repeating steps 5 through 9

Here are the critical steps of the sequential Gaussian simulation algorithm only,

Transform the data to Gaussian with a mean of 0.0 and variance of 1.0 (known as standard normal)
Assign a random path over the model grid, at each grid location sequentially simulation (apply kriging to calculate the local CDF, then Monte Carlo simulate a local realization, and assign the local realization to the data)
Back-transform the simulated values to the original data distribution

Sequential Indicator Simulation#

Indicator Simulation: a simulation method to calculate continuous or categorical feature realizations for spatial models based on the following principles,

Sequential - adding the previously simulated values to the data to ensure the covariance is correct between the simulated values.
Indicator - indicator transformation applied to the data over a set of thresholds that allows for flexible probability encoding of hard and soft, continuous and categorical data and the direct estimation of the local cumulative distribution functions.
Simulation with Monte Carlo simulation from the local distributions to add in the missing variance and construction of multiple, equiprobable realizations. The random seed determines the individual Monte Carlo simulations along with the random path for the sequential simulation.

Here are all the steps for sequential indicator simulation,

Establish grid network and coordinate system, flatten the system, including flattening folds and restoring faults
Assign data to the grid (account for scale change)
Apply the indicator transform to all data for all thresholds, \(k = 1,\ldots,K\)
Calculate and model the variogram of the indicator transform of the data for each thresholds, \(k = 1,\ldots,K\)
Determine a random path through all of the grid nodes, at each node:
- find nearby data and previously simulated grid nodes
- for each threshold, construct the local conditional cumulative distribution function by indicator kriging
- apply order relations correction to ensure the local cumulative distribution function is valid
- Monte Carlo simulate a realization from the local conditional cumulative distribution function
- assign the simulated value to the grid as data, apply the indicator transform to the simulated value
Check realization. Does it honor data at the data locations? and honor the histogram, \(N\left[0,1\right]\) standard normal with a mean of zero and a variance of one? and honor the variogram?
Restore to the original framework, including adding back the folds and faults
Check honor concept of geology? geophysics and production data?
Calculate multiple realizations by repeating steps 5 through 8

Sill (variogram)#

Variogram Calculation: the feature variance. We interpret spatial correlation relative to the variogram sill,

at the distance that the experimental variogram reaches the sill there is no spatial correlation. This is called the range.
Experimental variogram points above the sill indicate negative correlation, although we do not model above the sill for kriging nor simulation. We model to the sill and assume no correlation beyond the range.
Experimental variogram that rise linearly above the sill indicates a spatial trend in the data.

Simulation#

Geostatistical Concepts: is process of obtaining one or more good values of a reservoir property at an unsampled location.

global accuracy, matches the global statistics
simulation methods tend to produce more realistic feature spatial, univariate distributions.
for example, Monte Carlo simulation, sequential Gaussian simulation, indicator simulation, object-based simulation, etc.

Use simulation when,

we need to reproduce the distributions of features of interest, the extreme values matter
we need realistic models for flow simulation

Simulation Post-processing#

Simulation Post-processing: operations to calculate summaries over multiple realizations. These summaries calculate the local cumulative distribution function at each location, \(\bf{u}_{\alpha}\), in the model based on pooling the local realizations, \(\ell = 1, \ldots, L\).

\[ F_x(\bf{u}_{\alpha}) \]

Here are common summaries and how they are used,

e-type - is the local expectation (since each realization is assumed to be equiprobable, this is the same as the local average) over the realizations

\[ e(\bf{u }_{\alpha}) = E \left[x^{\ell}(\bf{ u }_{\alpha}) \right], \quad \forall \quad \ell = 1., \ldots, L, \quad \alpha \in V \]

often calculated to visualize the trends after integration of all information sources

conditional standard deviation - is the local standard deviation over the realizations

\[ \sigma(\bf{u }_{\alpha}) = \sqrt{E \left[ (x^{\ell}(\bf{u }_{\alpha})- e(\bf{u }_{\alpha}))^2 \right] }, \quad \forall \quad \ell = 1., \ldots, L, \quad \alpha \in V \]

often calculated to visualize the level of local uncertainty

local percentile - is the local percentile over the realizations

\[ x^p(\bf{ u } _{\alpha}) = F^{-1}(p; \bf{u }_{\alpha}), \quad \alpha \in V \]

often calculated to show maps of local upper and lower bounds, for example, local P10 and P90 models

local probability of exceedance - of the specified threshold, \(z_k\), the same as one minus the cumulative probability of the threshold at each location over the realizations

\[ P \{X(\bf{ u } _{\alpha}) \le x_k \} = 1 - \frac{1}{L} \cdot \sum^{L}_{\ell=1} i^{\ell}(\bf{ u } _{\alpha}; z_k), \quad \alpha \in V \]

often calculated to communicate risk, for example the probability of locally exceeding an environmental threshold for a contaminant

We need sufficient number of realizations, \(L\), to calculate each,

the expected value is the easiest to calculate is reliable with 20 or more realizations
the percentiles on the tails, e.g., P10 and P90, will require 100 or more realizations for reliable results

more realizations result in more reliable simulation post-processing and decision support in general.

Soft Data#

Geostatistical Concepts: data that has a high degree of uncertainty, such that data uncertainty must be integrated into the model

for example, probability density function for local porosity calibrated from acoustic impedance

Soft data integration requires workflows like indicator kriging, indicator simulation and p-field simulation or worklfows that randomize the data with standard simulation methods that assume hard data like sequential Gaussian simulation.

soft data integration is an advanced topic and a focus of ongoing research, but is often to done with standard, subsurface modeling software packages

Spatial Estimation#

Trend Modeling: is process of obtaining the single best value to represent a feature at an unsampled location, or time, \(\bf{u}\).

Given spatial data, \(𝑧(\bf{𝐮}_1), \dots, 𝑧(\bf{𝐮}_n)\) we can estimate at unknown location \(\bf{u}\) with a linear combination of the data as,

\[ z^{*}(\bf{u}) = \sum^{n}_{\alpha = 1} \lambda_{\alpha} z(\bf{u}_{\alpha}) \]

we add an unbiasedness constraint by assigning the remainder of the weight (one minus the sum of weights) to the global average; therefore, if we have no informative data we will estimate with the global average of the property of interest.

\[ z^{*}(\bf{u}) = \sum^{n}_{\alpha = 1} \lambda_{\alpha} z(\bf{u}_{\alpha}) + \left(1-\sum^{n}_{\alpha = 1} \lambda_{\alpha} \right) \overline{z} \]

Some additional concepts,

local accuracy takes precedence over global accuracy, i.e., matching the histogram and variogram
spatial estimation maps and models have too little variance and too much spatial continuity, i.e., are too smooth; therefore, are not appropriate for any transfer function that is sensitive to heterogeneity and the distribution
there is no access to multiple realizations to apply to the transfer function to sample the distribution of the decision criteria, simulation instead of estimation methods are required for comprehensive uncertainty modeling for optimum decision making

There are various methods for spatial estimation, for example,

inverse distance
kriging

also, there are various general methods for non-spatial estimation, e.g., many predictive machine learning models perform non-spatial estimation,

k-nearest neighbours
decision tree
random forest

that like spatial estimation methods focus on local accuracy instead of global accuracy.

Spatial Continuity#

Variogram Calculation: correlation over distance for a feature or interest. Given the relationship between a correlogram, \(\rho(\bf{h})\), and a variogram, \(\gamma(\bf{h})\),

\[ \rho(\bf{h}) = 1.0 - \frac{\gamma(\bf{h})}{\sigma^2} \]

and the correlogram is correlation over distance, the greater the distance between the variogram and the sill (the lower the variogram value), the greater the spatial continuity.

Spatial continuity may be considered,

between sample data during the calculation of an experimental variogram
while infering a variogram model, selecting contributions and ranges of each nested structure
between simulated values during calculation of a simulation model

How do we interpret spatial continuity?

no spatial continuity - indicates no correlation between spatial sample over distance, effectively random values at each location in space regardless of separation distance.
homogenous phenomena - perfect spatial continuity since all values are the same (or very similar) they are correlated over all distances.

Spatial Sampling (biased)#

Declustering to Correct Sampling Bias: sample such that the sample statistics are not representative of the population parameters. For example,

the sample mean is not the same as the population mean
the sample variance is not the same as the population variance

Of course, the population parameters are not accessible, so we cannot directly calculate sampling bias, i.e., the difference between the sample statistics and the population parameters. Methods we can use to check for biased sampling,

evaluate the samples for preferential sampling, clustering, filtering, etc.
apply declustering and check the results for a major change in the summary statistics, this is using declustering diagnostically

Spatial Sampling (clustered)#

Declustering to Correct Sampling Bias: spatial samples with locations preferentially selected, i.e., clustered, resulting in biased statistics,

typically spatial samples are clustered in locations with higher value samples, e.g., high porosity and permeability, good quality shale for unconventional reservoirs, low acoustic impedance indicating higher porosity, etc.

Of course, the population parameters are not accessible, so we cannot directly calculate sampling bias, i.e., the difference between the sample statistics and the population parameters. Methods we can use to check for biased sampling,

evaluate the samples for preferential sampling, clustering, filtering, etc.
apply declustering and check the results for a major change in the summary statistics, this is using declustering diagnostically

Spatial Sampling (common practice)#

Declustering to Correct Sampling Bias: sample locations are selected to,

Reduce uncertainty - by answering questions, for example,

how far does the contaminant plume extend? – sample peripheries
where is the fault? – drill based on seismic interpretation
what is the highest mineral grade? – sample the best part
who far does the reservoir extend? – offset drilling

Directly maximize net present value - while collecting information, for example,

maximize production rates
maximize tonnage of mineral extracted

In other words, often our samples are dual purpose, e.g., wells that are drilled for exploration and appraisal information are subsequently utilized for production.

Spatial Sampling (representative)#

Declustering to Correct Sampling Bias: if we are sampling for representativity, i.e., the sample set and resulting sample statistics are representative of the population, by sampling theory we have 2 options:

Random sampling* - each potential sample from the population is equally likely to be sampled as samples are collected. This includes,

selecting a specific location has no impact on the selection of subsequent locations.
assumption that the population size that is much larger than the sample size; therefore, significant correlation between samples is not imposed due to without replacement sampling (the constraint that you can only sample a location once). Note, generally this is not an issue for the subsurface due to the sparsely sampled massive populations

Regular sampling - sampling at equal space or time intervals. While random sampling is prefered, regular sampling is robust as long as,

the regular sampling intervals do not align with natural periodicity in the data, e.g., the crests are systemally sampling resulting in biased high sample statistics

Stationarity#

Variogram Calculation: any statistic requires replicates, repeated sampling (e.g., air or water samples from a monitoring station).

in our geospatial problems repeated samples are not available at a location in the subsurface
instead of time, we must pool samples over space to calculate our statistics.

This decision to pool samples is the decision of stationarity, the decision that the subset of the subsurface is all the same stuff.

Why must we pool data? Ultimately it is all about making inferences about the population from a limited sample,

to calculate statistics
to build models

The decision of the stationary domain for sampling is an expert decision,

without it we are stuck in the well bore or drill hole and we cannot calculate any statistics nor say anything about the behavior of the subsurface between the sample data

An example geological definition of stationarity could be like this,

The rock over the stationary domain is sourced, deposited, preserved, and post-depositionally altered in a similar manner, the domain is map-able and may be used for local prediction or as information for analogous locations within the subsurface; therefore, it is useful to pool information over this expert mapped volume of the subsurface.

this expert interpolation is applied to map a statistical definition of stationarity that is applied in modeling.

There are 2 aspects of any decision of stationarity,

Import License - choice to pool specific samples to evaluate a statistic
Export License - choice of where in the subsurface this statistic is applicable

To state a decision of stationarity we must include,

the statistic that is stationary, e.g., the mean, the variance, the CDF, etc.
the area or volume over which the statistic is stationary, e.g., the entire model, the sand facies, the proximal region, etc.

An example statistical definitions of stationarity,

stationary mean

\[ E[ Z(\bf{u} ] = \overline{z}, \quad \forall \quad \bf{u} \in AOI \]

stationary CDF

\[ F_z(\bf{u},z) = F_z(z), \quad \forall \quad \bf{u} \in AOI \]

stationary semivariogram

\[ \gamma_z(\bf{u},h) = \gamma_z(\bf{h}) \]

This may be extended to any statistic of interest including, facies proportions, bivariate distributions and multiple point statistics.

Here’s some additional considerations for stationarity,

Stationarity is a decision, not a hypothesis - therefore, it cannot be tested, although data may demonstrate that it is inappropriate.
The stationarity assessment depends on scale - this choice of modeling scale(s) should be based on the specific problem and project needs.
We cannot avoid a decision of stationarity - no stationarity decision and we cannot move beyond the data. Conversely, assuming broad stationarity over all the data and over large volumes of the earth is naïve.
Geomodeling stationarity is the decision - (1) over what region to pool data (import license) and (2) over what region to use the resulting statistics (export license).
Nonstationary trends may be mapped - the remaining stationary residual modelled stochastically. This is the hybrid modeling approach.

Stochastic Model#

Geostatistical Concepts: system or process that is uncertain and is represented by multiple models, realizations and scenarios constrained by statistics,

for example, data-driven models that integrate uncertainty like geostistical simulation models

Advantages:

speed
uncertainty assessment
report significance, confidence / prediction intervals
honor many types of data
data-driven approaches

Disadvantages:

limited physics used
statistical model assumptions / simplification

For the alternative to stochastic models see deterministic models.

Statistics (practice)#

Geostatistical Concepts: the theory and practice for collecting, organizing, and interpreting data, as well as drawing conclusions and making decisions.

Statistics (measurement)#

Geostatistical Concepts: summary measure of a sample, for example,

sample mean - \(\overline{x}\)
sample standard deviation - \(s\),

we use statistics as estimates of the model parameters that summarize the population (inference)

Statistical Distribution#

Univariate Distributions: for a feature a description of the probability of occurrence over the range of possible values. We represent the univariate statistical distribution with,

histogram
normalized histogram
probability density function (PDF)
cumulative distribution function (CDF)

What do we learn from a statistical distribution? For example,

what is the minimum and maximum?
do we have a lot of low values?
do we have a lot of high values?
do we have outliers, and any other values that don’t make sense and need explaining?

Transfer Function (reservoir modeling workflow)#

Geostatistical Concepts: calculation applied to the spatial, subsurface model realizations and scenarios to calculate a decision criterion, a metric that is used to support decision making representing value, and health, environment and safety. Example transfer functions include,

transport and bioattenuation - numerical simulation to model soil contaminant concentrations over time during a pump and treat operation
volumetric calculation - for total oil-in-place to calculate resource in place
heterogeneity metric - as an indicator of recovery factor to estimate reserves from resources
flow simulation - for pre-drill production forecast for a planned well
Whittle’s pit optimization - to calculate mineral resources and ultimage pit shell

Trend (data)#

Trend Modeling: the spatial feature is nonstationary over space, i.e., changes systematical over the area of interest. For example,

the porosity decreases with depth
the copper grade increases toward the highly faulted zone

Trend is also used to describe a model of the nonstationarity in any statistic or metric of interest, as in trend model

Trend (model)#

Trend Modeling: a determistic model that represents the local change in a statistic that is applied as an input for a spatial simulation method, for example,

a linear function for reduction in porosity with depth, based on local data and regional compaction trends
a moving window local average copper grade model to model the increase in copper grade toward the highly faulted zone

This provides a local value of the statistic at all model grid cells, so the simulation can apply the trend model to relax the assumption of statistionarity in the statistic.

a trend model may be calculated and applied to applied to any statistic used in the simulation model, e.g., mean, variogram range, variogram major direction, correlation coefficient, etc.

Trend (variogram model)#

Variogram Calculation and Modeling: experimental variogram points rise approximately linearly above the sill.

indicates a trend in the data, e.g., fining upward, increased compaction with depth, etc.
could be interpreted as a fractal, i.e., model without a finite variance or sill fit with a power law function. Note, variogram models above the sill are not permissible for simulation methods like sequential Gaussian simulation, we must model to the sill
common workflow is to remove the trend, work with the residual, if the trend is removed the residual variogram will plateau at the sill

Trend Modeling#

Trend Modeling: modeling of local features, based on data and interpretation, that are deemed certain (known). The trend is subtracted from the data, leaving a residual that is modelled stochastically with uncertainty (treated as unknown). The following steps are applied:

model the nonstationary, spatial, deterministic trend for a feature of interest
subtract the trend from the data to calculate the residual
model the residual with geostatistical spatial estimation or simulation
add the deterministic trend to the geostatistical (deterministic if kriging or stochastic if simulation) residual
check the model

Uncertainty Modeling#

Geostatistical Concepts: calculation of the range of possible values for a feature at a location or jointly over many locations at the sample time. Some considerations,

quantification of the limitation in the precision of our samples and model predictions
uncertainty is a model, there is no objective uncertainty
uncertainty is caused by our ignorance
uncertainty is caused by sparse sampling, measurement error and bias, and heterogeneity

we represent uncertainty by multiple models, scenarios and realizations:

Scenarios - multiple spatial, subsurface models calculated by stochastic simulation by changing the input parameters or other modeling choices to represent the uncertainty due to inference of model parameters and model choices
Realizations - multiple spatial, subsurface models calculated by stochastic simulation by holding input parameters and model choices constant and only changing the random number seed

Union of Events (probability)#

Probability Concepts: the union of outcomes, the probability of \(A\) or \(B\) is calculated with the probability addition rule,

\[ P(A \cup B) = P(A) + P(B) - P(A,B) \]

Unit Lag Distance (variogram)#

Variogram Calculation: the increments for lag distance applied to calculate an experimental variogram

for example if the unit lag distance is 50 m and 5 lags are calculated, the experimental variogram will have points at lag distance = 50 m, 100 m, 150 m, 200 m, and 250 m
typically the unit lag distance is set as the nominal minimum data spacing in the specific direction. Don’t use the minimum lag distance as there would be only one pair available and the experimental variogram point would be unreliable for the first unit lag
nominal minimum data spacing indicates the smallest lag with enough pairs to calculate a reliable experimental variogram point

Univariate Parameters#

Univariate Distributions: summary measures based on one feature measured over the population

Univariate Statistics#

Univariate Distributions: summary measures based on one feature measured over the samples

Variable (also feature)#

Geostatistical Concepts: any property measured or observed in a study, for example,

porosity, permeability, mineral concentrations, saturations, contaminant concentration
in data mining / machine learning this is known as a feature
measure often requires significant analysis, interpretation, etc.

Variogram#

Variogram Calculation: function of difference over distance.

experimental variogram is calculated over integer multiples of the unit lag distance and then plotted as points, then permissible variogram models are fit to the experimental variogram while integrating other domain and local knowledge.
the variogram is calculated as one half the average squared difference over lag distance, 𝐡, over all possible pairs of data,

\[ \gamma_z(\bf{h}) = \frac{1}{2 \cdot N(\bf{h})} \sum_{\alpha = 1}^{N(\bf{h})} \left( z(\bf{u}_{\alpha}) - z(\bf{u}_{\alpha} + \bf{h}) \right)^2 \]

the precise term is semivariogram (or variogram if you remove the \frac{1}{2} in the equation above), but in practice, the semivariogram is only used and the term variogram is always used for the semivariogram
the \(\frac{1}{2}\) term is added to the semivariogram so that the covariance function, \(C_z(\bf{u})\), and variogram, \(\gamma_z(\bf{h})\), may be related as:

\[ C_z(\bf{h}) = \sigma_z^2 - \gamma_z(\bf{h}) \]

Note the correlogram, \(\rho_z(\bf{u})\), is related to the covariance function, \(C_z(\bf{u})\), as:

\[ \rho_z(\bf{u}) = \frac{C_z(\bf{h})}{\sigma_z^2} \]

Here are some general observations about the variogram,

Often increasing - as the lag distance, \(\bf{h}\), increases, variability over the lag distance increase (in general).

Not a local measure - the variogram is calculated with over all possible pairs separated by lag vector, \(\bf{h}\).

Interpret relative to the sill - we need to plot the sill on with the experimental variogram to know the degree of correlation.

The sill is the variance, \(\sigma^2\), given stationarity of the variance and variogram, \gamma_z(\bf{h})):

\[ C_z(\bf{h}) = \sigma_z^2 - \gamma_z(\bf{h}) \]

and given a standardized feature, \(\sigma_z^2 = 1.0\),

\[ \rho_z(\bf{u}) = \sigma_z^2 - \gamma_z(\bf{h}) \]

The distance from the sill to the experimental variogram is the correlation coefficient over the specific lag distance.

The Range - the lag distance at which the variogram reaches the sill is know as the range.

at the range, knowing the data value at the tail provides no information about a value at the head.

Nugget Effect - sometimes there is a discontinuity in the variogram at distances less than the minimum data spacing. This is known as nugget effect.

the ratio of nugget divided by sill, is known as relative nugget effect, reported in percentage, e.g., 10% relative nugget effect
we model the nugget effect as a no correlation structure over all lags greater than an infinitesimal distance, \(\bf{h} \gt \epsilon\)
measurement error, causes an apparent nugget effect, if this is suspected do not add nugget effect to the variogram model

Variogram Map#

Variogram Calculation: calculating the experimental variogram over all distances and directions at once.

use a mesh template, cell size controls resolution like unit lag distance (assuming lag tolerance is \(\frac{1}{2}\) lag distance), number of cells controls the extent of calculation
can be useful to visualize and determine major direction and minor direction.
typically variogram maps require more data than regular isotropic and anisotropic experimental variograms and are not practical with sparse spatial data, i.e., with too few data they tend to be too noisy for reliable interpretation

Variogram Model#

Variogram Calculation and Modeling: the variogram model is fit based on the experimental variograms and expert interpretation. Here’s the reasons for variogram modeling:

Interpolate all distances and directions - the variogram must be known for all possible \(\bf{h}\) lags, distances and directions, not just the limited lags calculated for the experimental variogram(s)

Integrate geological knowledge - the variogram modeling process is an opportunity integrate our geological knowledge. For example, geometric anisotropy ratios from knowledge of the depositional setting.

Valid model of difference vs. offset - the variogram model must be positive definite (a legitimate measure of distance), that is, the variance of any linear combination must be positive. The variogram modeling workflow with additive, nested structures ensures this.

Variogram modeling with nested structures are applied to describe variance (sill) components with the following steps,

Nugget effect - assign as the single (lowest) isotropic nugget effect
Number of structures - choose the same number of variogram structures for all directions based on the most complex direction
Contributions for each structure - ensure that the same contribution parameter is used for each variogram structure in all directions and that the sum of all structures’ contributions is equal to the sill (variance). Model to the sill and not above the sill.
Apparent Nugget Effect - for apparent nugget effect structures (nugget effect in only some directions) use 0.0 range in the apparent nugget effect directions.
Geometric anisotropy - vary the range parameters in each direction (anisotropy ratios)
Zonal anisotropy - if the variogram does not reach the sill in some directions, then assign a very large value for the range in those directions to represent zonal anisotropy

Here are some additional comments to assist with variogram modeling.

initial coordinate and data transformation may be required, e.g., if the lags cross folded beds then the range will likely be underestimated
focus on fundamental variogram interpretations including, trend, cyclicity, geometric anisotropy, and zonal anisotropy. If trend is present, calculate a feature spatial trend model and then work with the residual model that does not have trend
often the short scale structure is most important as most estimates and simulated realizations are interpolating between data, i.e., the data themselves often control long scale spatial continuity
nugget effect due to measurement error should not be modelled, as it is not a feature of the geology. Nugget effect is very rare in oil and gas, but is more common in mining
vertical direction is typically better informed due to number of samples and spacing along near vertical wells, model vertical direction and then fit an anisotropy ratio to the limited horizontal experimental variogram points

Variance Reduction Factor#

Volume Variance: the ratio, \(f\), of the variance at volume support, scale \(v\) to the variance at the data volume support, scale \(\cdot\), i.e., the standard variance, \(\sigma^2\),

\[ f = 1 - \frac{\overline{\gamma}(v,v)}{\sigma^2} \]

where \(f\) is variance reduction factor,

\[ f = \frac{D^2(v,V)}{D^2(\cdot,V)} = \frac{D^2(v,V)}{\sigma^2} \]

Variance reduction factor is applied as a simplified workflow to adjust a data histogram to the upscaled distribution at model scale, \(v\). Note, if this is not done and the data scale distribution is used for the model the variance is too large.

Volume-Variance Relations#

Volume Variance: as the volume support (scale) increases the variance reduces

Predicting volume-variance relations is central to handling multiple scales of data and models. Some general observations and assumptions,

the mean does not change as the volume support, scale changes. Only the variance changes
there may be shape change (we will not tackle that here). Best practice is to check shape change empirically. It is common to assume no shape change (affine correction) or to use a shape change model (indirect lognormal correction).
the variance reduction in the distribution is inversely proportional to the range of spatial continuity. Variance reduces faster (over smaller volume increase) for shorter spatial continuity ranges.

Over common changes in scale this impact may be significant; therefore, it is not appropriate to ignore volume-variance relations,

we don’t do this scale up, change in volume support perfectly, and this is why it is still called the missing scale. We rarely have enough data to model this rigorously
we need a model to predict this change in variance with change in volume support

There are some change in volume support, scale models,

Empirical - build a small scale, high resolution model and scale it up numerically. For example, calculate a high resolution model of permeaiblity, apply flow simulation to calculate effective permeability over \(v\) scale blocks

Power Law Averaging - there is a flexible approach known as power law averaging.

\[ z_V = \left[ \frac{1}{n} \sum z_v^{\omega} \right] ^{\frac{1}{\omega}} \]

where \(\omega\) is the power of averaging. For example:

\(\omega = 1\) is a regular linear averaging
\(\omega = -1\) is a harmonic averaging
\(\omega = 0\) is a geometric averaging (this is proved in the limit as \(\omega \rightarrow 0\))

How to calculate \(\omega\)?

for some cases we know from theory the correct \(\omega\) value, for example, for flow orthogonal to beds we select \(\omega = -1.0\) to scale up permeability
flow simulation may be applied to numerically scale up permeability and then to back-calculate a calibrated \(\omega\)

Model - directly adjust the statistics for change in scale. For example, under the assumption of linear averaging and a stationary variogram and variance:

\[ f = 1 - \frac{\overline{\gamma}(v,v)}{\sigma^2} \]

where \(f\) is variance reduction factor,

\[ f = \frac{D^2(v,V)}{D^2(\cdot,V)} = \frac{D^2(v,V)}{\sigma^2} \]

in other words, \(f\) is the ratio of the variance at scale \(v\) to the variance at the original data point support scale based on,

the variogram model
the scale of the data, \(\cdot\) and the scale of \(v\)

Venn Diagrams#

Probability Concepts: a plot, visual tool for communicating probability. What do we learn from a Venn diagram?

size of regions \(\propto\) probability of occurrence
proportion of \(\Omega\), all possible outcomes represented by a box, i.e., probability of \(1.0\)
overlap \(\propto\) probability of joint occurrence

Venn diagrams are an excellent tool to visualize marginal, joint and conditional probability.

Well Log Data#

Geostatistical Concepts: as a much cheaper method to sample wells that does not interrupt drilling operations, well logs are very common over the wells. Often all wells have various well logs available. For example,

gamma ray on pilot vertical wells to assess the locations and quality of shales for targetting (landing) horizontal wells
neutron porosity to assess location high porosity reservoir sands
gamma ray in drill holes to map thorium mineralization

Well log data are critical to support subsurface resource interpretations. Once anchored by core data they provide the essential coverage and resolution to model the entire reservoir concept / framework for prediction, for example,

well log data calibrated by core data collocated with well log data are used to map the critical stratigraphic layers, including reservoir and seal units
well logs are applied to depth correct features inverted from seismic that have location imprecision due to uncertainty in the rock velocity over the volume of interest

Well Log Data, Image Logs#

Geostatistical Concepts: a special case of well logs where the well logs are repeated at various azimuthal intervals within the well bore resulting in a 2D (unwrapped) image instead of a 1D line along the well bore. For example, Fullbore formation MicroImager (FMI) with:

with 80% bore hole coverage
0.2 inch (0.5 cm) resolution vertical and horizontal
30 inch (79 cm) depth of investigation

can be applied to observe lithology change, bed dips and sedimentary structures.

Zonal Anisotropy (variogram model)#

Variogram Calculation and Modeling: when the experimental variogram does not reach the sill in a direction, i.e., the experimental levels out below the sill. Zonal anisotropy is often caused by layering,

over the direction aligned with layers
zonal anisotropy is often paired with cyclicity or trend in the other (orthogonal) direction, e.g., zonal anisotropy in the major direction and trend in the minor direction
the variance at which the variogram levels off is called an apparent sill

We can actually use zonal anisotropy to understand the partitioning of variance over space,

the variance component up to the apparent sill is the variance within the layers
the variance component from the apparent sill to the sill is the variance between the layers

Comments#

This was a basic introduction to geostatistics. If you would like more on these fundamental concepts I recommend the Introduction, Modeling Principles and Modeling Prerequisites chapters from my text book, Geostatistical Reservoir Modeling{cite}`pyrcz2014’.

I hope this is helpful,

Michael

The Author:#

Michael Pyrcz, Professor, The University of Texas at Austin Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions

With over 17 years of experience in subsurface consulting, research and development, Michael has returned to academia driven by his passion for teaching and enthusiasm for enhancing engineers’ and geoscientists’ impact in subsurface resource development.

For more about Michael check out these links:

Want to Work Together?#

I hope this content is helpful to those that want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.

Want to invite me to visit your company for training, mentoring, project review, workflow design and / or consulting? I’d be happy to drop by and work with you!
Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!
I can be reached at mpyrcz@austin.utexas.edu.

I’m always happy to discuss,

Michael

Michael Pyrcz, Ph.D., P.Eng. Professor, Cockrell School of Engineering and The Jackson School of Geosciences, The University of Texas at Austin

Geostatistics Glossary

Contents

Geostatistics Glossary#

Motivation for Geostatistics Concepts#

Addition Rule (probability)#

Affine Correction#

Anisotropic or Directional (variogram)#

Azimuth Tolerance (variogram)#

Bandwidth (variogram)#

Bayesian Probability#

Bayes’ Theorem (probability)#

Big Data#

Big Data Analytics#

Bootstrap#

Categorical Feature#

Categorical Nominal Feature#

Categorical Ordinal Feature#

Cell-based Declustering#

Cloud Transform#

Continuous Feature#

Continuous, Interval Feature#

Continuous, Ratio Feature#

Cognitive Biases#

Cokriging#

Collocated Cokriging#

Complimentary Events (probability)#

Conditional Probability#

Core Data#

Correlogram#

Cosimulation#

Covariance Function#

Cumulative Distribution Function (CDF)#

Cyclicity (variogram model)#

Data (data aspects)#

Data Analytics#

Debiasing with Secondary Data#

Decision Criteria#

Declustering#

Declustering (statistics)#

Deterministic Model#

Discrete Feature#

Dispersion Variance#

Distribution Transformations#

Ergodic Fluctuations#

Estimation#

Facies#

Facies Simulation#

Feature (also variable)#

Frequentist Probability#

Gamma Bar#

Geometric Anisotropy (variogram interpretation)#

Geometric Anisotropy (variogram model)#

Geostatistics#

Global Accuracy#

Global Measures#

Hard Data#

Histogram#

Hybrid Model#

Independence (probability)#

Indicator Kriging#

Indicator Transform#

Indicator Variogram#

Inference, Inferential Statistics#

Intersection of Events (probability)#

Isotropic or Omnidirectional (variogram)#

Joint Probability#

Local Accuracy#

Local Measures#

Kriging#

Kriging (simple vs. ordinary)#

Kriging Variance#

Kriging-based Declustering#

Kolmogorov’s 3 Probability Axioms#

Lag (variogram)#

Lag Distance (variogram)#

Lag Distance Tolerance (variogram)#

Location Map#

Major Direction (variogram)#

Marginal Probability#

Minor Direction (variogram)#