Simulation#

Michael J. Pyrcz, Professor, The University of Texas at Austin

Twitter | GitHub | Website | GoogleScholar | Book | YouTube | Applied Geostats in Python e-book | LinkedIn

Chapter of e-book “Applied Geostatistics in Python: a Hands-on Guide with GeostatsPy”.

Cite this e-Book as:

Pyrcz, M.J., 2024, Applied Geostatistics in Python: a Hands-on Guide with GeostatsPy, https://geostatsguy.github.io/GeostatsPyDemos_Book.

The workflows in this book and more are available here:

Cite the GeostatsPyDemos GitHub Repository as:

Pyrcz, M.J., 2024, GeostatsPyDemos: GeostatsPy Python Package for Spatial Data Analytics and Geostatistics Demonstration Workflows Repository (0.0.1). Zenodo. https://zenodo.org/doi/10.5281/zenodo.12667035

DOI

By Michael J. Pyrcz
© Copyright 2024.

This chapter is a tutorial for / demonstration of Spatial Simulation with Sequential Gaussian Simulation (SGSIM) with a 2D map example.

  • sequential Gaussian simulation is a common geostatistical method for calculating spatial models with an appropriate level of heterogeneity

YouTube Lecture: check out my lecture on Stochastic Simulation. For your convenience here’s a summary of salient points.

Estimation vs. Simulation#

Let’s start by comparing spatial estimation and simulation,

Estimation:

  • honors local data

  • locally accurate, primary goal of estimation is 1 estimate!

  • too smooth, appropriate for visualizing trends

  • too smooth, inappropriate for flow simulation

  • one model, no assessment of global uncertainty

Let’s visualize a population, sample and estimation model. Note the estimation model is too smooth, too much spatial continuity and too little variance.

Spatial dataset, truth, and an estimation model (left), estimation model and truth histograms (center) and experimental variograms (right).

Simulation:

  • honors local data

  • sacrifices local accuracy, reproduces histogram

  • honors spatial variability, appropriate for flow simulation

  • alternative realizations, change random number seed

  • many models (realizations), assessment of global uncertainty

Now let’s visualize a population, sample and simulation model. Note the simulated realizations are locally less accurate, but are not too smooth, the spatial continuity and variance are correct.

Spatial dataset, truth, and an simulation model (left), simulation model and truth histograms (center) and experimental variograms (right) for two realizations (above and below).

Note with estimation we calculate one model, and with simulation we calculate many realizations to represent uncertainty,

Spatial data section (left), and truth, estimation and simulations models (right).

Now let’s explain the concept of spatial simulation.

Spatial Simulation#

This method is critical for:

  1. Prediction away from wells, e.g. pre-drill assessments, with uncertainty

  2. Spatial uncertainty modeling.

  3. Heterogeneity realizations ready for application to the transfer function.

Sequential Gaussian Simulation#

A simulation method to calculate realizations for spatial models based on the following principles,

  • Sequential - adding the previously simulated values to the data to ensure the covariance is correct between the simulated values

  • Gaussian - transformation to Gaussian space so that the local distributions of uncertainty are known given the kriging mean and kriging variance, and the global distribution is reproduced after back-transformation to the original distribution

  • Simulation - with Monte Carlo simulation from the local distributions to add in the missing variance and to calculate multiple, equiprobable realizations. The random seed determines the individual Monte Carlo simulations along with the random path for the sequential simulation.

Here are all the steps for sequential Gaussian simulation,

  1. Establish grid network and coordinate system, flatten the system including flattening folds and restoring faults

Original data and grid (above) and flattened grid (below).
  1. Assign data to the grid, account for scale change from data to model grid cells

Original data and grid (above) and flattened grid with data assigned to grid cells (below).
  1. Transform data to Gaussian space, Gaussian anamorphosis applied to original data distribution

Distribution and flattened grid with data assigned to grid cells transformed to Gaussian with a mean of 0.0 and variance of 1.0.
  1. Calculate and model the variogram of the Gaussian transformed data

Distribution and flattened grid with data assigned to grid cells transformed to Gaussian with variogram calculated and modeled.
  1. Determine a random path through all of the grid nodes, at each node:

Flattened grid with data assigned to grid cells transformed to Gaussian with random path over cell without data.
  • find nearby data and previously simulated grid nodes

  • construct the conditional distribution by kriging, mean as kriging estimate and variance as kriging variance

Flattened grid with data assigned to grid cells transformed to Gaussian with random path over cell without data, and simple kriging to calculate the local distribution of uncertainty at the first location on the random path.
  • Monte Carlo simulate a realization from the conditional distribution

Flattened grid with data assigned to grid cells transformed to Gaussian with random path over cell without data, and Monte Carlo simulation from the kriging-derived local distribution of uncertainty.
  • assign the simulated value to the grid as data

Flattened grid with data assigned to grid cells transformed to Gaussian with random path over cell without data, and Monte Carlo simulation from the kriging-derived local distribution of uncertainty assigned as data.
  1. Check realization (could also check after back transform). Does the realization in Gaussian space honor,

  • data at the data locations?

  • honor the histogram, \(N\left[0,1\right]\) standard normal with a mean of zero and a variance of one?

  • and honor the variogram?

  1. Back-transform the simulated values from Gaussian space to the original data distribution

  2. Restore to the original framework, including adding back the folds and faults

  3. Check that the realization honors,

  • geological concepts?

  • geophysical data?

  • historical production data?

  1. Calculate multiple realizations by repeating steps 5 through 9

Here are the critical steps of the sequential Gaussian simulation algorithm only,

  1. Transform the data to Gaussian with a mean of 0.0 and variance of 1.0 (known as standard normal)

  2. Assign a random path over the model grid, at each grid location sequentially simulation (apply kriging to calculate the local CDF, then Monte Carlo simulate a local realization, and assign the local realization to the data)

  3. Back-transform the simulated values to the original data distribution

Load the Required Libraries#

The following code loads the required libraries.

import geostatspy.GSLIB as GSLIB                              # GSLIB utilities, visualization and wrapper
import geostatspy.geostats as geostats                        # GSLIB methods convert to Python      
import geostatspy
print('GeostatsPy version: ' + str(geostatspy.__version__))
GeostatsPy version: 0.0.72

We will also need some standard packages. These should have been installed with Anaconda 3.

import os                                                     # set working directory, run executables

from tqdm import tqdm                                         # suppress the status bar
from functools import partialmethod
tqdm.__init__ = partialmethod(tqdm.__init__, disable=True)

ignore_warnings = True                                        # ignore warnings?
import numpy as np                                            # ndarrays for gridded data
import pandas as pd                                           # DataFrames for tabular data
import matplotlib.pyplot as plt                               # for plotting
from matplotlib.ticker import (MultipleLocator, AutoMinorLocator) # control of axes ticks
from matplotlib import gridspec                               # custom subplots
plt.rc('axes', axisbelow=True)                                # plot all grids below the plot elements
if ignore_warnings == True:                                   
    import warnings
    warnings.filterwarnings('ignore')
from IPython.utils import io                                  # mute output from simulation
cmap = plt.cm.inferno                                         # color map

If you get a package import error, you may have to first install some of these packages. This can usually be accomplished by opening up a command window on Windows and then typing ‘python -m pip install [package-name]’. More assistance is available with the respective package docs.

Declare Functions#

Here’s a convenience function for plotting variograms.

def add_grid():
    plt.gca().grid(True, which='major',linewidth = 1.0); plt.gca().grid(True, which='minor',linewidth = 0.2) # add y grids
    plt.gca().tick_params(which='major',length=7); plt.gca().tick_params(which='minor', length=4)
    plt.gca().xaxis.set_minor_locator(AutoMinorLocator()); plt.gca().yaxis.set_minor_locator(AutoMinorLocator()) # turn on minor ticks   

def vargplot(feature,lags,gamma_maj,gamma_min,npps_maj,npps_min,vmodel,azi,atol,sill): # plot the variogram
    index_maj,lags_maj,gmod_maj,cov_maj,ro_maj = geostats.vmodel(nlag=100,xlag=10,azm=azi,vario=vmodel);
    index_min,lags_min,gmod_min,cov_min,ro_min = geostats.vmodel(nlag=100,xlag=10,azm=azi+90.0,vario=vmodel);
    
    plt.scatter(lags,gamma_maj,color = 'black',s = npps_maj*0.01,label = 'Major Azimuth ' +str(azi), alpha = 0.8)
    plt.plot(lags_maj,gmod_maj,color = 'black')
    plt.scatter(lags,gamma_min,color = 'red',s = npps_min*0.01,label = 'Minor Azimuth ' +str(azi+90.0), alpha = 0.8)
    plt.plot(lags_min,gmod_min,color = 'red',label = 'Input Major Model')
    plt.plot([0,2000],[sill,sill],color = 'black',label = 'Input Minor Model')
    plt.xlabel(r'Lag Distance $\bf(h)$, (m)')
    plt.ylabel(r'$\gamma \bf(h)$')
    if atol < 90.0:
        plt.title('Check Directional ' + feature + ' Variogram')
    else: 
        plt.title('Check Omnidirectional NSCORE ' + feature + ' Variogram')
    plt.xlim([0,1000]); #plt.ylim([0,1.8])
    plt.legend(loc="lower right")
    add_grid()

Set the Working Directory#

I always like to do this so I don’t lose files and to simplify subsequent read and writes (avoid including the full address each time).

#os.chdir("c:/PGE383")                                     # set the working directory

Loading Tabular Data#

Here’s the command to load our comma delimited data file in to a Pandas’ DataFrame object.

  • We will also extract a limited sample to reduce data density. This way we can observe more of the heterogeneity from the simulation with the spatial continuity model, rather than mostly data driven heterogeneity.

  • By setting unconditional to True the data are shifted well outside the area of interest and are only used for the target reference distribution

unconditional = False
df = pd.read_csv(r"https://raw.githubusercontent.com/GeostatsGuy/GeoDataSets/master/sample_data_MV_biased.csv") # from Dr. Pyrcz's GitHub repo
df = df.sample(50)                                        # extract 50 samples
df = df.reset_index()                                     # reset the record index 
df = df.drop(['index','Unnamed: 0','AI'],axis=1)          # remove extra columns in DataFrame
df['logPerm'] = np.log(df['Perm'].values)                 # calculate the log of permeability
if unconditional == True:
    df['X'] = df['X'] + 9999999.9                         # move the data outside the model, just use the distributions as a reference to constrain the simulated distributions
df.head()                                                 # DataFrame summary
X Y Facies Porosity Perm logPerm
0 800.0 259.0 1.0 0.148592 17.303238 2.850894
1 900.0 200.0 1.0 0.139414 7.750533 2.047762
2 220.0 859.0 1.0 0.149480 65.604363 4.183642
3 610.0 189.0 0.0 0.118295 6.525972 1.875790
4 780.0 429.0 0.0 0.123508 23.432337 3.154117

Sequential Gaussian Simulation#

Let’s jump right to building a variety of models with simulation and visualizing the results. We will start with a test of 3 realizations.

  • look at the realizations and check the histogram and variograms

%%capture --no-display                                                                

zmin = 0.00; zmax = 0.20                                  # feature min and max values 
nx = 80; ny = 80; xsiz = 10.0; ysiz = 10.0; xmn = 5.0; ymn = 5.0; nxdis = 1; nydis = 1 # grid specification
nreal = 3                                                 # number of realizations
ndmin = 0; ndmax = 20                                     # number of data for each kriging system
vario = GSLIB.make_variogram(nug=0.0,nst=1,it1=1,cc1=1.0,azi1=0.0,hmaj1=200,hmin1=50)
tmin = -999; tmax = 999
sill = np.std(df['Porosity'].values)*np.std(df['Porosity'].values)

sim_sk = geostats.sgsim(df,'X','Y','Porosity',wcol=-1,scol=-1,tmin=tmin,tmax=tmax,itrans=1,ismooth=0,dftrans=0,tcol=0,
            twtcol=0,zmin=zmin,zmax=zmax,ltail=1,ltpar=0.0,utail=1,utpar=0.3,nsim=nreal,
            nx=nx,xmn=xmn,xsiz=xsiz,ny=ny,ymn=ymn,ysiz=ysiz,seed=73073,
            ndmin=ndmin,ndmax=ndmax,nodmax=20,mults=1,nmult=3,noct=-1,ktype=0,colocorr=0.0,sec_map=0,vario=vario);

xmin = xmn-xsiz/2; xmax = nx*xsiz + xmin; ymin = ymn-ysiz/2; ymax = ny*ysiz + ymin; cmap = plt.cm.inferno # plotting parameters

for isim in range(0,nreal):
    plt.subplot(nreal,3,1 + isim*3)                                          # plot the results
    GSLIB.locpix_st(sim_sk[isim],xmin,xmax,ymin,ymax,xsiz,zmin,zmax,df,'X','Y','Porosity','Sequential Gaussian Simulation w. Simple Kriging #' + str(isim+1),'X(m)','Y(m)','Porosity',cmap)
    
    lags, gamma_x, npps_x = geostats.gam(sim_sk[isim],-9999.9,9999.9,xsiz,ysiz,ixd=1,iyd=0,nlag=100,isill=0)
    lags, gamma_y, npps_y = geostats.gam(sim_sk[isim],-9999.9,9999.9,xsiz,ysiz,ixd=0,iyd=1,nlag=100,isill=0)

    plt.subplot(nreal,3,2 + isim*3)
    plt.hist(sim_sk[isim].flatten(),bins=np.linspace(zmin,zmax,30),density=True,color='darkorange',alpha=0.6,edgecolor='black',zorder=10,label='Realization')
    plt.hist(df['Porosity'].values,bins=np.linspace(zmin,zmax,30),density=True,color='grey',alpha=0.5,edgecolor='black',zorder=1,label='Input')
    plt.legend(loc='upper left'); add_grid()
    plt.xlim([zmin,zmax]); plt.ylim([0,30]); plt.title('Check Histogram'); add_grid()
    
    plt.subplot(nreal,3,3 + isim*3)
    vario_plot = vario
    vario_plot.update({"cc1":sill})
    vargplot('Porosity',lags,gamma_x,gamma_y,npps_x,npps_y,vario_plot,azi=90.0,atol=22.5,sill=sill)     # plot the variogram
    plt.xlim([0,1000]); plt.ylim([0.0,sill*2.0])

plt.subplots_adjust(left=0.0, bottom=0.0, right=3.0, top=3.1, wspace=0.2, hspace=0.2); plt.show()
_images/3a8f50f1faed354ff2a3a1ec7ab1994b65c5bc059fac1ba98c425ad39972b769.png

Comments#

This was a basic demonstration of sequential Gaussian simulation to calculate spatial heterogeneity realizations that honor the data, histogram and variogram. Much more can be done, I have other demonstrations for modeling workflows with GeostatsPy in the GitHub repository GeostatsPy_Demos.

I hope this is helpful,

Michael

The Author:#

Michael Pyrcz, Professor, The University of Texas at Austin Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions

With over 17 years of experience in subsurface consulting, research and development, Michael has returned to academia driven by his passion for teaching and enthusiasm for enhancing engineers’ and geoscientists’ impact in subsurface resource development.

For more about Michael check out these links:

Twitter | GitHub | Website | GoogleScholar | Geostatistics Book | YouTube | Applied Geostats in Python e-book | Applied Machine Learning in Python e-book | LinkedIn

Want to Work Together?#

I hope this content is helpful to those that want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.

  • Want to invite me to visit your company for training, mentoring, project review, workflow design and / or consulting? I’d be happy to drop by and work with you!

  • Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!

  • I can be reached at mpyrcz@austin.utexas.edu.

I’m always happy to discuss,

Michael

Michael Pyrcz, Ph.D., P.Eng. Professor, Cockrell School of Engineering and The Jackson School of Geosciences, The University of Texas at Austin

More Resources Available at: Twitter | GitHub | Website | GoogleScholar | Geostatistics Book | YouTube | Applied Geostats in Python e-book | Applied Machine Learning in Python e-book | LinkedIn