Trend Modeling#

Michael Pyrcz, Professor, The University of Texas at Austin#

Twitter | GitHub | Website | GoogleScholar | Book | YouTube | LinkedIn#

This is a tutorial / demonstration of calculating and modeling spatial trends with GeostatsPy.

YouTube Lecture: check out my lectures on trend modeling.

For your convenience, here’s a summary of the salient points.

Trend Modeling#

Trend modeling is the modeling of local features, based on data and interpretation, that are deemed certain (known). The trend is subtracted from the data, leaving a residual that is modeled stochastically with uncertainty (treated as unknown).

  • geostatistical spatial estimation methods will make an assumption concerning stationarity

    • in the presence of significant nonstationarity we cannot rely on spatial estimates based on the data + spatial continuity model

  • if we observe a trend, we should model the trend.

    • then model the residuals stochastically

Steps:

  1. model the trend, consistent with the data and interpretation, at all locations within the area of interest (AOI), integrating all available information and expertise.

\[ m(\bf{u}_\beta), \, \forall \, \beta \in \, AOI \]
  2. subtract the trend from the data at the \(n\) data locations to formulate a residual at the data locations.

\[ y(\bf{u}_{\alpha}) = z(\bf{u}_{\alpha}) - m(\bf{u}_{\alpha}), \, \forall \, \alpha = 1, \ldots, n \]
  3. characterize the statistical behavior of the residual, \(y(\bf{u}_{\alpha})\), integrating any information sources and interpretations, for example, the global cumulative distribution function and a measure of spatial continuity shown here.

\[ F_y(y) \quad \gamma_y(\bf{h}) \]
  4. model the residual at all locations with \(L\) multiple realizations.

\[ Y^\ell(\bf{u}_\beta), \, \forall \, \beta \, \in \, AOI; \, \ell = 1, \ldots, L \]
  5. add the trend back into the stochastic residual realizations to calculate the \(L\) multiple realizations of the property of interest, based on the composite model of the known deterministic trend, \(m(\bf{u}_\beta)\), and the unknown stochastic residual, \(y(\bf{u}_\beta)\).

\[ Z^\ell(\bf{u}_\beta) = Y^\ell(\bf{u}_\beta) + m(\bf{u}_\beta), \, \forall \, \beta \in \, AOI; \, \ell = 1, \ldots, L \]
  6. check the model, including quantification of the proportion of variance treated as known (trend) and unknown (residual).

\[ \sigma^2_{Z} = \sigma^2_{Y} + \sigma^2_{m} + 2 \cdot C_{Y,m} \]

given \(C_{Y,m} \to 0\):

\[ \sigma^2_{Z} = \sigma^2_{Y} + \sigma^2_{m} \]

I can now describe the proportion of variance allocated to known and unknown components as follows:

\[ Prop_{Known} = \frac{\sigma^2_{m}}{\sigma^2_{Y} + \sigma^2_{m}} \quad Prop_{Unknown} = \frac{\sigma^2_{Y}}{\sigma^2_{Y} + \sigma^2_{m}} \]
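
As a quick worked example of these proportions (a minimal sketch; the variance values are illustrative, chosen to match the convolution trend results later in this workflow):

sigma2_m = 0.00045                                            # variance of the trend, treated as known
sigma2_y = 0.00048                                            # variance of the residual, treated as unknown
prop_known = sigma2_m/(sigma2_y + sigma2_m)                   # proportion of variance deemed known
prop_unknown = sigma2_y/(sigma2_y + sigma2_m)                 # proportion of variance deemed unknown
print('Known: ' + str(round(prop_known,3)) + ', Unknown: ' + str(round(prop_unknown,3))) # Known: 0.484, Unknown: 0.516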

I provide some practical, data-driven methods for trend modeling, but I should indicate that:

  1. trend modeling is very important in reservoir modeling as it has a large impact on local model accuracy and on the uncertainty model

  2. trend modeling is used in almost every subsurface model, unless the data are dense enough to impose the local trends

  3. trend modeling includes a high degree of expert judgement combined with the integration of various information sources

Spatial Estimation#

Consider the case of making an estimate at some unsampled location, \(z(\bf{u}_0)\), where \(z\) is the property of interest (e.g., porosity) and \(\bf{u}_0\) is a location vector describing the unsampled location.

How would you do this given data, \(z(\bf{u}_1)\), \(z(\bf{u}_2)\), and \(z(\bf{u}_3)\)?

It would be natural to use a set of linear weights to formulate the estimator given the available data.

\[ z^{*}(\bf{u}) = \sum^{n}_{\alpha = 1} \lambda_{\alpha} z(\bf{u}_{\alpha}) \]

We could add an unbiasedness constraint to impose that the weights sum to one. Instead, we will assign the remainder of the weight (one minus the sum of the weights) to the global average; therefore, if we have no informative data, we estimate with the global average of the property of interest.

\[ z^{*}(\bf{u}) = \sum^{n}_{\alpha = 1} \lambda_{\alpha} z(\bf{u}_{\alpha}) + \left(1-\sum^{n}_{\alpha = 1} \lambda_{\alpha} \right) \overline{z} \]
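
Here is a tiny numeric illustration of this estimator (the values are assumed for illustration, not taken from the dataset below): two data receive weights 0.4 and 0.3, and the remaining 0.3 of unit weight goes to the global average.

z_data = [0.14, 0.10]; weights = [0.4, 0.3]; z_bar = 0.127    # assumed data values, weights and global average
z_star = sum(w*z for w,z in zip(weights,z_data)) + (1.0 - sum(weights))*z_bar # weighted data + remainder on the mean
print(round(z_star,4))                                        # 0.4*0.14 + 0.3*0.10 + 0.3*0.127 = 0.1241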

We will make a stationarity assumption, so let’s assume that we are working with residuals, \(y\).

\[ y^{*}(\bf{u}) = z^{*}(\bf{u}) - \overline{z} \]

If we substitute this form into our estimator, the estimator simplifies, since the mean of the residual is zero.

\[ y^{*}(\bf{u}) = \sum^{n}_{\alpha = 1} \lambda_{\alpha} y(\bf{u}_{\alpha}) \]

while satisfying the unbiasedness constraint.

Kriging#

Now the next question is what weights should we use?

We could use equal weighting, \(\lambda = \frac{1}{n}\), and the estimate would simply be the average of the local data. This ignores the spatial context of the data and would not be very informative.

We could assign weights considering the spatial context of the data and the estimate:

  • spatial continuity - as quantified by the variogram (and covariance function)

  • redundancy - the degree of spatial continuity among all of the available data themselves

  • closeness - the degree of spatial continuity between the available data and the estimation location

The kriging approach accomplishes this, calculating the best linear unbiased weights for the local data to estimate at the unknown location. The derivation of the kriging system and the resulting linear set of equations is available in the lecture notes. Furthermore, kriging provides a measure of the accuracy of the estimate: the kriging estimation variance (sometimes just called the kriging variance).

\[ \sigma^{2}_{E}(\bf{u}) = C(0) - \sum^{n}_{\alpha = 1} \lambda_{\alpha} C(\bf{u}_0 - \bf{u}_{\alpha}) \]

What is ‘best’ about this estimate? Kriging estimates are best in that they minimize the above estimation variance.
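
To make these mechanics concrete, here is a minimal simple kriging sketch in NumPy; the exponential covariance model, data locations and residual values are all assumptions for illustration, not GeostatsPy's implementation (geostats.kb2d is used later for the real workflow).

import numpy as np

def exp_cov(h, sill=1.0, arange=300.0):                       # assumed isotropic exponential covariance model
    return sill*np.exp(-3.0*h/arange)                         # practical range convention

data_xy = np.array([[100.0,200.0],[300.0,250.0],[250.0,500.0]]) # assumed data locations
y = np.array([0.02,-0.01,0.03])                               # assumed residual values (mean of zero)
u0 = np.array([200.0,300.0])                                  # unsampled location

C = exp_cov(np.linalg.norm(data_xy[:,None,:]-data_xy[None,:,:],axis=2)) # data-to-data covariances (redundancy)
c = exp_cov(np.linalg.norm(data_xy-u0,axis=1))                # data-to-estimate covariances (closeness)
lam = np.linalg.solve(C,c)                                    # simple kriging weights
y_star = lam @ y                                              # kriging estimate of the residual at u0
var_sk = exp_cov(0.0) - lam @ c                               # kriging estimation variance
print('weights ' + str(lam.round(3)) + ', estimate ' + str(round(y_star,4)) + ', variance ' + str(round(var_sk,4)))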

Properties of Kriging#

Here are some important properties of kriging:

  • Exact interpolator - kriging estimates with the data values at the data locations

  • Kriging variance can be calculated before getting the sample information, as the kriging estimation variance depends only on the data configuration, not on the values of the data nor the kriging estimate, i.e., the kriging estimator is homoscedastic.

  • Spatial context - in addition to the statements above on spatial continuity, closeness and redundancy, kriging accounts for the configuration of the data and the structural continuity of the variable being estimated.

  • Scale - kriging may be generalized to account for the support volume of the data and estimate. We will cover this later.

  • Multivariate - kriging may be generalized to account for multiple secondary data in the spatial estimate with the cokriging system. We will cover this later.

  • Smoothing effect of kriging can be forecast. We will use this to build stochastic simulations later.

We limit ourselves to simple data-driven methods, but acknowledge much more is needed. In fact, trend modeling requires a high degree of knowledge concerning local geoscience and engineering, data and physics.

Load the required libraries#

The following code loads the required libraries.

import geostatspy.GSLIB as GSLIB                              # GSLIB utilities, visualization and wrapper
import geostatspy.geostats as geostats                        # GSLIB methods converted to Python
import geostatspy
print('GeostatsPy version: ' + str(geostatspy.__version__))  
GeostatsPy version: 0.0.71

We will also need some standard packages. These should have been installed with Anaconda 3.

import os                                                     # set working directory, run executables

from tqdm import tqdm                                         # suppress the status bar
from functools import partialmethod
tqdm.__init__ = partialmethod(tqdm.__init__, disable=True)

ignore_warnings = True                                        # ignore warnings?
import numpy as np                                            # ndarrays for gridded data
import pandas as pd                                           # DataFrames for tabular data
import matplotlib.pyplot as plt                               # for plotting
from matplotlib.ticker import (MultipleLocator, AutoMinorLocator) # control of axes ticks
from scipy import stats                                       # summary statistics
import math                                                   # trig etc.
import scipy.signal as signal                                 # kernel for moving window calculation
from astropy.convolution import convolve                      # sparse data convolution
plt.rc('axes', axisbelow=True)                                # plot all grids below the plot elements
if ignore_warnings == True:                                   
    import warnings
    warnings.filterwarnings('ignore')
cmap = plt.cm.inferno                                         # color map

Define Functions#

These are convenience functions: a Gaussian kernel maker and a function to add major and minor gridlines to improve plot interpretability.

  • SciPy’s Gaussian kernel maker was offline, so I made a new function for fun

def make_Gaussian_kernel(sigma,nc,csiz):                      # Gaussian kernel maker
    kernel = np.zeros((nc,nc))                                # nc x nc kernel, nc should be odd
    for iy in range(0,nc):
        y = (iy - nc//2)*csiz                                 # y offset from the kernel center
        for ix in range(0,nc):
            x = (ix - nc//2)*csiz                             # x offset from the kernel center
            kernel[iy,ix] = math.exp(-1*(y**2+x**2)/(2*sigma**2))/(2*math.pi*sigma**2) # Gaussian density
    kernel = kernel/np.sum(kernel.flatten())                  # normalize the kernel to sum to 1.0 to prevent bias
    return kernel
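
A quick sanity check on the kernel function (illustrative): the weights should sum to 1.0 and the kernel should be symmetric.

kernel_check = make_Gaussian_kernel(sigma = 100,nc = 201,csiz = 10) # test kernel
print(np.isclose(np.sum(kernel_check),1.0), np.allclose(kernel_check,kernel_check.T)) # True True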

def add_grid():
    plt.gca().grid(True, which='major',linewidth = 1.0); plt.gca().grid(True, which='minor',linewidth = 0.2) # add y grids
    plt.gca().tick_params(which='major',length=7); plt.gca().tick_params(which='minor', length=4)
    plt.gca().xaxis.set_minor_locator(AutoMinorLocator()); plt.gca().yaxis.set_minor_locator(AutoMinorLocator()) # turn on minor ticks   

Set the working directory#

I always like to do this so I don’t lose files and to simplify subsequent reads and writes (avoids including the full path each time).

#os.chdir("c:/PGE383")                                        # set the working directory

Loading Tabular Data#

Here’s the command to load our comma-delimited data file into a pandas DataFrame object.

  • note, I often remove unnecessary data table columns. This clarifies workflows and reduces the chance of blunders, e.g., using the wrong column!

df = pd.read_csv("https://raw.githubusercontent.com/GeostatsGuy/GeoDataSets/master/sample_data_MV_biased.csv") # load the data from Dr. Pyrcz's GitHub repository
df = df[['X','Y','Porosity']]                                   # remove the unneeded features, columns
df.head(n=3)                                                    # preview the DataFrame
X Y Porosity
0 100.0 900.0 0.101319
1 100.0 800.0 0.147676
2 100.0 700.0 0.145912

Summary Statistics#

Let’s look at summary statistics.

df.describe().transpose()
count mean std min 25% 50% 75% max
X 368.0 499.565217 289.770794 0.000000 240.000000 500.000000 762.500000 990.000000
Y 368.0 520.644022 277.412187 9.000000 269.000000 539.000000 769.000000 999.000000
Porosity 368.0 0.127026 0.030642 0.041122 0.103412 0.125842 0.148623 0.210258

Specify the Grid#

Let’s specify a reasonable grid for the estimation map.

  • we balance detail and computation time. Note that kriging computation scales as \(O(n_{cell})\)

  • so if we halve the cell size we have 4 times more grid cells in 2D, and about 4 times the runtime (quick check below)
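
A quick check of this grid arithmetic (the two cell sizes are just the cases from the bullet above):

ncell_10m = int(1000.0/10.0)**2                               # 10 m cells over the 1,000 m x 1,000 m area
ncell_5m = int(1000.0/5.0)**2                                 # halve the cell size to 5 m
print(ncell_10m, ncell_5m, ncell_5m/ncell_10m)                # 10000 40000 4.0, i.e., 4 times the cells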

xmin = 0.0; xmax = 1000.0                                     # range of x values
ymin = 0.0; ymax = 1000.0                                     # range of y values

xsiz = 10; ysiz = 10                                          # cell size
nx = 100; ny = 100                                            # number of cells
xmn = 5; ymn = 5                                              # grid origin, location center of lower left cell

pormin = 0.05; pormax = 0.22                                  # set feature min and max for colorbars

Location Maps#

Let’s plot the location maps of porosity data.

GSLIB.locmap                                                  # GeostatsPy's location map function
<function geostatspy.GSLIB.locmap(df, xcol, ycol, vcol, xmin, xmax, ymin, ymax, vmin, vmax, title, xlabel, ylabel, vlabel, cmap, fig_name)>

Now we can populate the plotting parameters and visualize the porosity data.

plt.subplot(111)
GSLIB.locmap_st(df,'X','Y','Porosity',xmin,xmax,ymin,ymax,pormin,pormax,'Well Data - Porosity','X(m)','Y(m)','Porosity (fraction)',cmap)
plt.subplots_adjust(left=0.0, bottom=0.0, right=1.0, top=1.2, wspace=0.2, hspace=0.2); add_grid(); plt.show()
[Figure: Well Data - Porosity location map]

Declustering Spatial Data#

Let’s get some declustering weights. For more information see the demonstration on declustering.

wts, cell_sizes, dmeans = geostats.declus(df,'X','Y','Porosity',iminmax = 1, noff= 10, ncell=100,cmin=10,cmax=2000)
df['Wts'] = wts                                               # add weights to the sample data DataFrame
df.head()                                                     # preview to check the sample data DataFrame
There are 368 data with:
   mean of      0.12702581678766817 
   min and max  0.0411215184025049 and 0.2102576261742328
   standard dev 0.03060060557902899 
X Y Porosity Wts
0 100.0 900.0 0.101319 1.053701
1 100.0 800.0 0.147676 1.165625
2 100.0 700.0 0.145912 1.203734
3 100.0 600.0 0.186167 0.694636
4 100.0 500.0 0.146088 0.488570

Trend by Convolution / Local Window Average#

Let’s attempt a convolution-based trend model; this is a moving window average of the local data.

  • We have a convenience function that takes data with X and Y locations in a DataFrame and makes a sparse 2D array.

  • All cells without a data value are assigned NumPy’s NaN (null value, indicating missing data). Let’s see the inputs for this command.

GSLIB.DataFrame2ndarray                                       # GeostatsPy's data points to ndarray function
<function geostatspy.GSLIB.DataFrame2ndarray(df, xcol, ycol, vcol, xmin, xmax, ymin, ymax, step)>

Let’s make a sparse array with the appropriate parameters.

  • The reason we are doing this is that convolution programs in general work with ndarrays and not with point data in DataFrames.

por_grid = GSLIB.DataFrame2ndarray(df,'X','Y','Porosity',xmin, xmax, ymin, ymax, xsiz) # data points to ndarray

We have an ndarray (por_grid) with the data assigned to grid cells. Now we need a kernel.

  • The kernel represents the weights within the moving window. If we use constant weight over the kernel the resulting trend model will likely have discontinuities.

  • A Gaussian kernel (weights highest in the middle of the window and decreasing to 0.0 at the edge) is useful to calculate a smooth trend model.

  • We can use the SciPy package’s signal functions for the Gaussian kernel. Note, current SciPy has a bug with their signal.gaussian function, so I wrote one in the Define Functions section above.

sigma = 100; nc = 201                                          # radius of Gaussian function, size of kernel
kernel = make_Gaussian_kernel(sigma = sigma,nc = nc,csiz = xsiz)
kmax = np.max(kernel.flatten())                                # calculate the kernel maximum for plotting

plt.subplot(111)
GSLIB.pixelplt_st(kernel,xmin=-265,xmax=265,ymin=-265,ymax=265,step=10,vmin=0,vmax=kmax,title='Kernel',xlabel='X(m)',ylabel='Y(m)',vlabel='weight',cmap=cmap)
plt.subplots_adjust(left=0.0, bottom=0.0, right=1.0, top=1.1, wspace=0.2, hspace=0.2); plt.show()
[Figure: Gaussian kernel weights map]

Now we need to convolve our sparse data assigned to a ndarray with our Gaussian kernel.

  • There are many functions available for convolution, but we have a problem: we want to apply our Gaussian kernel to a sparse ndarray full of missing values.

  • It turns out this is a common issue for our friends in Astronomy and so their Astropy package has a convolution method that will work well. I figured out the following (so you don’t have to!).

porosity_trend = convolve(por_grid,kernel,boundary='extend',nan_treatment='interpolate',normalize_kernel=True) # convolve

No errors? It worked? Let’s look at the results. We can plot and compare the original porosity data and the resulting trend to check for consistency.

plt.subplot(131)                                               # location map of the data for comparison
GSLIB.locmap_st(df,'X','Y','Porosity',xmin,xmax,ymin,ymax,pormin,pormax,'Well Data - Porosity','X(m)','Y(m)','Porosity (fraction)',cmap)
add_grid()

plt.subplot(132)                                               # convolution trend model
GSLIB.pixelplt_st(porosity_trend,xmin,xmax,ymin,ymax,xsiz,pormin,pormax,'Porosity Trend','X(m)','Y(m)','Porosity (fraction)',cmap)

plt.subplot(133)                                               # data and trend model
GSLIB.locpix_st(porosity_trend,xmin,xmax,ymin,ymax,xsiz,pormin,pormax,df,'X','Y','Porosity','Porosity Data and Trend','X(m)','Y(m)','Porosity (fraction)',cmap)

plt.subplots_adjust(left=0.0, bottom=0.0, right=3.0, top=0.9, wspace=0.2, hspace=0.2); plt.show()
[Figure: Well Data - Porosity, Porosity Trend, and Porosity Data and Trend]

Other Methods for Trend Calculation#

There are a variety of other methods for trend calculation. I will just mention them here.

  1. hand-drawn, expert interpretation - many 3D modeling packages allow for experts to draw trends and allow for fast interpolation to build an exhaustive trend model.

  2. kriging - kriging provides best linear unbiased estimates between data given a spatial continuity model (more on this when we cover spatial estimation). One note of caution is that kriging is exact; therefore, it will overfit unless it is used with averaged data values (e.g., averaged over the vertical) or with a block kriging option (kriging at a volume support larger than the data).

  3. regression - fit the property as a function of the X, Y coordinates. This could be extended to more complicated prediction models from machine learning (a hedged sketch follows this list).
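
For the regression option, here is a hedged sketch of a quadratic trend surface in the X, Y coordinates fit by least squares with NumPy; the helper names are hypothetical and this is an illustration, not part of the convolution workflow above.

def polyfit_trend(x,y,z):                                     # fit a quadratic trend surface, z = f(x,y)
    A = np.column_stack([np.ones_like(x),x,y,x*y,x**2,y**2])  # design matrix of polynomial terms
    coef,_,_,_ = np.linalg.lstsq(A,z,rcond=None)              # least squares coefficients
    return coef

def polytrend_predict(coef,x,y):                              # evaluate the fitted trend surface
    A = np.column_stack([np.ones_like(x),x,y,x*y,x**2,y**2])
    return A @ coef

# e.g., fit to the well data and evaluate at the data locations
# coef = polyfit_trend(df['X'].values,df['Y'].values,df['Porosity'].values)
# df['Por_Trend_Poly'] = polytrend_predict(coef,df['X'].values,df['Y'].values)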

Trend Diagnostics#

Let’s go back to the convolution trend and check it (to demonstrate the method of trend checking). Note, I haven’t tried to perfect the result. I’m just demonstrating the method.

In addition to the previous visualization, let’s look at the distributions and summary statistics of the original declustered porosity data and the trend.

plt.subplot(121)                                               # declustered data histogram
GSLIB.hist_st(df['Porosity'],pormin,pormax,False,False,20,df['Wts'],'Porosity (fraction)','Declustered Porosity')
add_grid()

plt.subplot(122)                                               # trend histogram
GSLIB.hist_st(porosity_trend.flatten(),pormin,pormax,False,False,20,None,'Porosity Trend (fraction)','Porosity Trend')
add_grid()

plt.subplots_adjust(left=0.0, bottom=0.0, right=2.0, top=1.1, wspace=0.2, hspace=0.2); plt.show()
[Figure: Declustered Porosity and Porosity Trend histograms]

We can also look at the summary statistics. Here’s a function that calculates the weighted standard deviation (and the average). We can use this with the data and declustering weights and figure out the allocation of variance between the trend and the residual.

# Weighted average and standard deviation
def weighted_avg_and_std(values, weights):                         # weighted stats from Eric O Lebigot, stack overflow
    average = np.average(values, weights=weights)
    variance = np.average((values-average)**2, weights=weights)
    return (average, math.sqrt(variance))

wavg_por,wstd_por = weighted_avg_and_std(df['Porosity'],df['Wts']) 

wavg_por_trend = np.average(porosity_trend)
wstd_por_trend = np.std(porosity_trend)

print('Naive Porosity Data: Average ' + str(round(np.average(df['Porosity']),4)) + ', Var ' + str(round(np.var(df['Porosity']),5)))
print('Declustered Porosity Data: Average ' + str(round(wavg_por,4)) + ', Var ' + str(round(wstd_por**2,5)))
print('Porosity Trend: Average            ' + str(round(wavg_por_trend,4)) + ', Var ' + str(round(wstd_por_trend**2,5)))
print('Proportion Trend / Known:          ' + str(round(wstd_por_trend**2/(wstd_por**2),3)))
print('Proportion Residual / Unknown:     ' + str(round((wstd_por**2 - wstd_por_trend**2)/(wstd_por**2),3)))
Naive Porosity Data: Average 0.127, Var 0.00094
Declustered Porosity Data: Average 0.1207, Var 0.00093
Porosity Trend: Average            0.1244, Var 0.00045
Proportion Trend / Known:          0.484
Proportion Residual / Unknown:     0.516

Check out the proportion of variance in the trend and residual.

Adding Trend to DataFrame#

Let’s add the porosity trend to our DataFrame.

  • We have a sample program in GeostatsPy that takes a 2D ndarray and extracts the values at the data locations and adds them as a new column in the DataFrame.

  • Then we can do a little math to calculate and add the porosity residual also and visualize this all together as a final check.

df = GSLIB.sample(porosity_trend,xmin,ymin,xsiz,"Por_Trend",df,'X','Y') # add the trend values to the DataFrame
df['Por_Res'] = df['Porosity'] - df['Por_Trend']               # calculate the residual and add to DataFrame

Let’s check out the DataFrame and confirm that we have everything now. We will need trend and residual in our DataFrame to support all subsequent modeling steps.

df.head()
X Y Porosity Wts Por_Trend Por_Res
0 100.0 900.0 0.101319 1.053701 0.127816 -0.026497
1 100.0 800.0 0.147676 1.165625 0.145671 0.002005
2 100.0 700.0 0.145912 1.203734 0.160377 -0.014465
3 100.0 600.0 0.186167 0.694636 0.163702 0.022464
4 100.0 500.0 0.146088 0.488570 0.156281 -0.010193

Check for Conditional Bias in the Trend Model#

Conditional bias is systematic under- or overestimation over specific ranges of the trend or actual feature values.

plt.subplot(121)
plt.scatter(df['Por_Trend'],df['Por_Res'],color='darkorange',s=30,edgecolor='black')
plt.xlabel('Trend Model'); plt.ylabel('Residual: Truth - Trend'); plt.title('Trend Residual vs. Trend')
plt.plot([pormin,pormax],[0,0],color='black'); plt.xlim([pormin,pormax])

plt.subplot(122)
plt.scatter(df['Porosity'],df['Por_Res'],color='darkorange',s=30,edgecolor='black')
plt.xlabel('Feature'); plt.ylabel('Residual: Truth - Trend'); plt.title('Trend Residual vs. Feature')
plt.plot([pormin,pormax],[0,0],color='black'); plt.xlim([pormin,pormax])

plt.subplots_adjust(left=0.0, bottom=0.0, right=2.0, top=1.1, wspace=0.2, hspace=0.2)
plt.show()
[Figure: Trend Residual vs. Trend and Trend Residual vs. Feature cross plots]

Check to see if we are overestimating low values and underestimating high values; this would indicate trend underfit.
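
To quantify what the cross plots show, a quick check (a sketch, not a formal test) is to regress the residual on the trend with scipy.stats, imported above; a slope near zero is consistent with no conditional bias.

slope, intercept, r, p, se = stats.linregress(df['Por_Trend'],df['Por_Res']) # residual vs. trend regression
print('slope ' + str(round(slope,3)) + ', p-value ' + str(round(p,3)))       # slope near 0.0 suggests no conditional bias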

Location Maps, Feature, Trend and Residual#

Let’s look at where the trend is too low and too high.

plt.subplot(131)
GSLIB.locmap_st(df,'X','Y','Porosity',xmin,xmax,ymin,ymax,pormin,pormax,'Well Data - Porosity','X(m)','Y(m)','Porosity (fraction)',cmap)

plt.subplot(132)
GSLIB.locpix_st(porosity_trend,xmin,xmax,ymin,ymax,xsiz,pormin,pormax,df,'X','Y','Porosity','Porosity Data and Trend','X(m)','Y(m)','Porosity (fraction)',cmap)

plt.subplot(133)
GSLIB.locmap_st(df,'X','Y','Por_Res',xmin,xmax,ymin,ymax,-0.05,0.05,'Well Data - Porosity Residual','X(m)','Y(m)','Porosity (fraction)',cmap)

plt.subplots_adjust(left=0.0, bottom=0.0, right=3.0, top=1.0, wspace=0.2, hspace=0.2)
plt.show()
[Figure: Well Data - Porosity, Porosity Data and Trend, and Well Data - Porosity Residual]

Does it look correct?

  • a strong degree of consistency between the porosity data and trend

  • the porosity residual no longer has a trend; it has been detrended

plt.subplot(121)
lags,var,pairs = geostats.gamv(df,'X','Y','Porosity',-999,999,20,15,100,45.0,22.5,9999,1)
plt.scatter(lags,var,color='red',edgecolor='black',alpha=0.4,label='Porosity'); 
lags,var,pairs = geostats.gamv(df,'X','Y','Por_Res',-999,999,20,15,100,45.0,22.5,9999,1)
plt.scatter(lags,var,color='blue',edgecolor='black',alpha=0.4,label='Porosity Residual');
lags,var,pairs = geostats.gamv(df,'X','Y','Por_Trend',-999,999,20,15,100,45.0,22.5,9999,1)
plt.scatter(lags,var,color='black',edgecolor='black',alpha=0.4,label='Porosity Trend');
plt.plot([0,1500],[1.0,1.0],color='black'); plt.xlim([0,800]); plt.ylim([0,2.0]); plt.xlabel('Lag Distance ($h$)'); plt.ylabel(r'$\gamma$')
plt.legend(loc='upper right'); plt.title('Variograms')

var_trend = np.var(df['Por_Trend']); var_resid = np.var(df['Por_Res']); cov_tr = np.cov(df['Por_Trend'],df['Por_Res'])[0,1]
var_total = var_trend + var_resid + 2* cov_tr

ptrend = var_trend / var_total; presid = var_resid / var_total; pcov = 2*cov_tr / var_total

plt.subplot(122)                                      # variance decomposition pie chart
plt.pie([ptrend, presid, pcov],labels = ['Trend','Residual', 'Covariance'],radius = 1, autopct='%1.1f%%', colors = ['#ff9999','#66b3ff','#edf23a'], explode = [.02,.02,0.02], wedgeprops = {"edgecolor":"k",'linewidth': 1}  )
plt.title('Variance Decomposition of Trend and Residual')

plt.subplots_adjust(left=0.0, bottom=0.0, right=3.0, top=1.2, wspace=0.2, hspace=0.2)
plt.show()
[Figure: Variograms and Variance Decomposition of Trend and Residual]

Experiment with Different Trend Models#

Here’s a more compact workflow for exploration.

sigma = 200; nc = 201                                          # radius of Gaussian function, size of kernel

kernel = make_Gaussian_kernel(sigma = sigma,nc = nc,csiz = xsiz)
porosity_trend = convolve(por_grid,kernel,boundary='extend',nan_treatment='interpolate',normalize_kernel=True)
df = GSLIB.sample(porosity_trend,xmin,ymin,xsiz,"Por_Trend",df,'X','Y')
df['Por_Res'] = df['Porosity'] - df['Por_Trend'] # calculate the residual and add to DataFrame
var_trend = np.var(df['Por_Trend']); var_resid = np.var(df['Por_Res']);
ptrend = var_trend / (var_trend+var_resid); presid = var_resid / (var_trend+var_resid)

plt.subplot(221)
GSLIB.locpix_st(porosity_trend,xmin,xmax,ymin,ymax,xsiz,pormin,pormax,df,'X','Y','Porosity','Porosity Data and Trend','X(m)','Y(m)','Porosity (fraction)',cmap=plt.cm.inferno)

plt.subplot(222)
plt.hist(df['Porosity'],bins = np.linspace(0.05,0.25),edgecolor='black',color='black',alpha=0.6,density=True,label='Data'); plt.xlim([pormin,pormax])
plt.hist(porosity_trend.flatten(),bins = np.linspace(0.05,0.25),edgecolor='black',color='#CC5801',alpha=0.6,density=True,label='Trend'); plt.xlabel('Porosity (fraction)'); plt.ylabel('Density'); plt.title('Porosity Original and Trend Distributions')
plt.legend(loc='upper right')

plt.subplot(223)                                      # variance decomposition pie chart
plt.pie([ptrend, presid],labels = ['Trend','Residual'],radius = 1, autopct='%1.1f%%', colors = ['#CC5801','#66b3ff'], explode = [.02,.02], wedgeprops = {"edgecolor":"k",'linewidth': 1}  )
plt.title('Variance Decomposition of Porosity Trend and Residual')

plt.subplot(224)
plt.scatter(df['Porosity'],df['Por_Res'],color='#66b3ff',s=30,edgecolor='black')
plt.xlabel('Porosity Data (fraction)'); plt.ylabel('Porosity Residual: Data - Trend (fraction)'); plt.title('Porosity Trend Residual vs. Data')
plt.plot([pormin,pormax],[0,0],color='black'); plt.xlim([pormin,pormax]); plt.ylim([-0.1,0.1])

plt.subplots_adjust(left=0.0, bottom=0.0, right=2.0, top=2.2, wspace=0.2, hspace=0.2)
plt.show()
[Figure: Porosity Data and Trend, original and trend distributions, variance decomposition, and residual vs. data cross plot]

Full Trend and Residual Workflow#

Now let’s finish a model with deterministic trend (convolution kernel) and kriged residual.

  • for brevity we will assume a reasonable isotropic variogram model, instead of calculating the directional variograms of the residual

%%capture --no-display     

ktype = 0                                  # kriging type, 0 - simple, 1 - ordinary
radius = 300                               # search radius for neighbouring data
nxdis = 1; nydis = 1                       # number of grid discretizations for block kriging (not tested)
ndmin = 0; ndmax = 40                      # minimum and maximum data for an estimate
tmin = -99999.9; tmax = 99999.9                                 # minimum and maximum property trimming limits
res_vario = GSLIB.make_variogram(nug=0.0,nst=1,it1=1,cc1=var_resid,azi1=45,hmaj1=200,hmin1=200) # residual variogram

res_kmap, res_vmap = geostats.kb2d(df,'X','Y','Por_Res',tmin,tmax,nx,xmn,xsiz,ny,ymn,ysiz,nxdis,nydis,
         ndmin,ndmax,radius,ktype,0.0,res_vario)

por_model = porosity_trend + res_kmap

Now let’s visualize the workflow steps.

plt.subplot(221)
GSLIB.locmap_st(df,'X','Y','Porosity',xmin,xmax,ymin,ymax,pormin,pormax,'Well Data - Porosity','X(m)','Y(m)','Porosity (fraction)',cmap)

plt.subplot(222)
GSLIB.locpix_st(porosity_trend,xmin,xmax,ymin,ymax,xsiz,pormin,pormax,df,'X','Y','Porosity','Porosity Data and Trend Model','X(m)','Y(m)','Porosity (fraction)',cmap)

plt.subplot(223)
GSLIB.locpix_st(res_kmap,xmin,xmax,ymin,ymax,xsiz,-0.05,0.05,df,'X','Y','Por_Res','Porosity Residual Data and Model','X(m)','Y(m)','Porosity Residual (fraction)',cmap)

plt.subplot(224)
GSLIB.locpix_st(por_model,xmin,xmax,ymin,ymax,xsiz,pormin,pormax,df,'X','Y','Porosity','Porosity Trend + Kriged Residual','X(m)','Y(m)','Porosity (fraction)',cmap)

plt.subplots_adjust(left=0.0, bottom=0.0, right=2.0, top=2., wspace=0.1, hspace=0.2)
plt.show()
[Figure: Well Data - Porosity, Porosity Data and Trend Model, Porosity Residual Data and Model, and Porosity Trend + Kriged Residual]

Let’s Compare and Contrast the Stationary Mean#

We add a new kriged map: simple kriging with the declustered porosity average as the stationary mean.

%%capture --no-display     

run = True

vrange = 80.0
por_vario = GSLIB.make_variogram(nug=0.0,nst=1,it1=1.0,cc1=var_resid,azi1=45,hmaj1=vrange,hmin1=vrange) # variogram

if run == True:
    por_kmap, por_vmap = geostats.kb2d(df,'X','Y','Porosity',tmin,tmax,nx,xmn,xsiz,ny,ymn,ysiz,nxdis,nydis,
         ndmin,ndmax,radius,ktype,wavg_por,por_vario)

plt.subplot(131)
GSLIB.locpix_st(por_kmap,xmin,xmax,ymin,ymax,xsiz,pormin,pormax,df,'X','Y','Porosity',
                'Kriged Model - No Trend / Stationary Mean','X(m)','Y(m)',
                'Porosity (fraction)',cmap = plt.cm.seismic)

plt.subplot(132)
GSLIB.locpix_st(por_model,xmin,xmax,ymin,ymax,xsiz,pormin,pormax,df,'X','Y',
                'Porosity','Porosity Trend + Kriged Residual','X(m)','Y(m)','Porosity (fraction)',cmap = plt.cm.seismic)

plt.subplot(133)
plt.scatter(por_kmap.flatten(),por_model.flatten(),s=10,color='black',alpha=0.1,edgecolor='black',zorder=5)
add_grid(); plt.xlim([pormin,pormax]); plt.ylim([pormin,pormax]); plt.xlabel('Porosity Estimate - No Trend')
plt.plot([pormin,pormax],[pormin,pormax],color='red',lw=2,zorder=10)
plt.ylabel('Porosity Estimate - Trend + Residual'); plt.title('Comparison of Trend vs. No Trend Porosity Estimates')

plt.subplots_adjust(left=0.0, bottom=0.0, right=3.0, top=1.1, wspace=0.2, hspace=0.2)
plt.show()
[Figure: Kriged Model - No Trend / Stationary Mean, Porosity Trend + Kriged Residual, and comparison cross plot]

See how the simple kriging with stationary mean model approaches the global mean away from the data, while the trend + residual kriging approaches the trend model away from the data.

Comments#

This was a basic demonstration of trend modeling with GeostatsPy. Much more can be done; I have other demonstrations for modeling workflows with GeostatsPy in the GitHub repository GeostatsPy_Demos.

I hope this is helpful,

Michael

The Author:#

Michael Pyrcz, Professor, The University of Texas at Austin#

Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions

With over 17 years of experience in subsurface consulting, research and development, Michael has returned to academia driven by his passion for teaching and enthusiasm for enhancing engineers’ and geoscientists’ impact in subsurface resource development.

For more about Michael check out these links:

Twitter | GitHub | Website | GoogleScholar | Book | YouTube | LinkedIn#

Want to Work Together?#

I hope this content is helpful to those that want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.

  • Want to invite me to visit your company for training, mentoring, project review, workflow design and / or consulting? I’d be happy to drop by and work with you!

  • Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!

  • I can be reached at mpyrcz@austin.utexas.edu.

I’m always happy to discuss,

Michael

Michael Pyrcz, Ph.D., P.Eng. Professor, Cockrell School of Engineering and The Jackson School of Geosciences, The University of Texas at Austin

More Resources Available at: Twitter | GitHub | Website | GoogleScholar | Book | YouTube | LinkedIn#