Simulation#
Michael J. Pyrcz, Professor, The University of Texas at Austin
Twitter | GitHub | Website | GoogleScholar | Book | YouTube | Applied Geostats in Python e-book | LinkedIn
Chapter of e-book “Applied Geostatistics in Python: a Hands-on Guide with GeostatsPy”.
Cite this e-Book as:
Pyrcz, M.J., 2024, Applied Geostatistics in Python: a Hands-on Guide with GeostatsPy, https://geostatsguy.github.io/GeostatsPyDemos_Book.
The workflows in this book and more are available here:
Cite the GeostatsPyDemos GitHub Repository as:
Pyrcz, M.J., 2024, GeostatsPyDemos: GeostatsPy Python Package for Spatial Data Analytics and Geostatistics Demonstration Workflows Repository (0.0.1). Zenodo. https://zenodo.org/doi/10.5281/zenodo.12667035
By Michael J. Pyrcz
© Copyright 2024.
This chapter is a tutorial for / demonstration of Spatial Simulation with Sequential Gaussian Simulation (SGSIM) with a 2D map example.
sequential Gaussian simulation is a common geostatistical method for calculating spatial models with an appropriate level of heterogeneity
YouTube Lecture: check out my lecture on Stochastic Simulation. For your convenience here’s a summary of salient points.
Estimation vs. Simulation#
Let’s start by comparing spatial estimation and simulation,
Estimation:
honors local data
locally accurate, primary goal of estimation is 1 estimate!
too smooth, appropriate for visualizing trends
too smooth, inappropriate for flow simulation
one model, no assessment of global uncertainty
Let’s visualize a population, sample and estimation model. Note the estimation model is too smooth, too much spatial continuity and too little variance.
Simulation:
honors local data
sacrifices local accuracy, reproduces histogram
honors spatial variability, appropriate for flow simulation
alternative realizations, change random number seed
many models (realizations), assessment of global uncertainty
Now let’s visualize a population, sample and simulation model. Note the simulated realizations are locally less accurate, but are not too smooth, the spatial continuity and variance are correct.
Note with estimation we calculate one model, and with simulation we calculate many realizations to represent uncertainty,
Now let’s explain the concept of spatial simulation.
Spatial Simulation#
This method is critical for:
Prediction away from wells, e.g. pre-drill assessments, with uncertainty
Spatial uncertainty modeling.
Heterogeneity realizations ready for application to the transfer function.
Sequential Gaussian Simulation#
A simulation method to calculate realizations for spatial models based on the following principles,
Sequential - adding the previously simulated values to the data to ensure the covariance is correct between the simulated values
Gaussian - transformation to Gaussian space so that the local distributions of uncertainty are known given the kriging mean and kriging variance, and the global distribution is reproduced after back-transformation to the original distribution
Simulation - with Monte Carlo simulation from the local distributions to add in the missing variance and to calculate multiple, equiprobable realizations. The random seed determines the individual Monte Carlo simulations along with the random path for the sequential simulation.
Here are all the steps for sequential Gaussian simulation,
Establish grid network and coordinate system, flatten the system including flattening folds and restoring faults
Assign data to the grid, account for scale change from data to model grid cells
Transform data to Gaussian space, Gaussian anamorphosis applied to original data distribution
Calculate and model the variogram of the Gaussian transformed data
Determine a random path through all of the grid nodes, at each node:
find nearby data and previously simulated grid nodes
construct the conditional distribution by kriging, mean as kriging estimate and variance as kriging variance
Monte Carlo simulate a realization from the conditional distribution
assign the simulated value to the grid as data
Check realization (could also check after back transform). Does the realization in Gaussian space honor,
data at the data locations?
honor the histogram, \(N\left[0,1\right]\) standard normal with a mean of zero and a variance of one?
and honor the variogram?
Back-transform the simulated values from Gaussian space to the original data distribution
Restore to the original framework, including adding back the folds and faults
Check that the realization honors,
geological concepts?
geophysical data?
historical production data?
Calculate multiple realizations by repeating steps 5 through 9
Here are the critical steps of the sequential Gaussian simulation algorithm only,
Transform the data to Gaussian with a mean of 0.0 and variance of 1.0 (known as standard normal)
Assign a random path over the model grid, at each grid location sequentially simulation (apply kriging to calculate the local CDF, then Monte Carlo simulate a local realization, and assign the local realization to the data)
Back-transform the simulated values to the original data distribution
Load the Required Libraries#
The following code loads the required libraries.
import geostatspy.GSLIB as GSLIB # GSLIB utilities, visualization and wrapper
import geostatspy.geostats as geostats # GSLIB methods convert to Python
import geostatspy
print('GeostatsPy version: ' + str(geostatspy.__version__))
GeostatsPy version: 0.0.72
We will also need some standard packages. These should have been installed with Anaconda 3.
import os # set working directory, run executables
from tqdm import tqdm # suppress the status bar
from functools import partialmethod
tqdm.__init__ = partialmethod(tqdm.__init__, disable=True)
ignore_warnings = True # ignore warnings?
import numpy as np # ndarrays for gridded data
import pandas as pd # DataFrames for tabular data
import matplotlib.pyplot as plt # for plotting
from matplotlib.ticker import (MultipleLocator, AutoMinorLocator) # control of axes ticks
from matplotlib import gridspec # custom subplots
plt.rc('axes', axisbelow=True) # plot all grids below the plot elements
if ignore_warnings == True:
import warnings
warnings.filterwarnings('ignore')
from IPython.utils import io # mute output from simulation
cmap = plt.cm.inferno # color map
If you get a package import error, you may have to first install some of these packages. This can usually be accomplished by opening up a command window on Windows and then typing ‘python -m pip install [package-name]’. More assistance is available with the respective package docs.
Declare Functions#
Here’s a convenience function for plotting variograms.
def add_grid():
plt.gca().grid(True, which='major',linewidth = 1.0); plt.gca().grid(True, which='minor',linewidth = 0.2) # add y grids
plt.gca().tick_params(which='major',length=7); plt.gca().tick_params(which='minor', length=4)
plt.gca().xaxis.set_minor_locator(AutoMinorLocator()); plt.gca().yaxis.set_minor_locator(AutoMinorLocator()) # turn on minor ticks
def vargplot(feature,lags,gamma_maj,gamma_min,npps_maj,npps_min,vmodel,azi,atol,sill): # plot the variogram
index_maj,lags_maj,gmod_maj,cov_maj,ro_maj = geostats.vmodel(nlag=100,xlag=10,azm=azi,vario=vmodel);
index_min,lags_min,gmod_min,cov_min,ro_min = geostats.vmodel(nlag=100,xlag=10,azm=azi+90.0,vario=vmodel);
plt.scatter(lags,gamma_maj,color = 'black',s = npps_maj*0.01,label = 'Major Azimuth ' +str(azi), alpha = 0.8)
plt.plot(lags_maj,gmod_maj,color = 'black')
plt.scatter(lags,gamma_min,color = 'red',s = npps_min*0.01,label = 'Minor Azimuth ' +str(azi+90.0), alpha = 0.8)
plt.plot(lags_min,gmod_min,color = 'red',label = 'Input Major Model')
plt.plot([0,2000],[sill,sill],color = 'black',label = 'Input Minor Model')
plt.xlabel(r'Lag Distance $\bf(h)$, (m)')
plt.ylabel(r'$\gamma \bf(h)$')
if atol < 90.0:
plt.title('Check Directional ' + feature + ' Variogram')
else:
plt.title('Check Omnidirectional NSCORE ' + feature + ' Variogram')
plt.xlim([0,1000]); #plt.ylim([0,1.8])
plt.legend(loc="lower right")
add_grid()
Set the Working Directory#
I always like to do this so I don’t lose files and to simplify subsequent read and writes (avoid including the full address each time).
#os.chdir("c:/PGE383") # set the working directory
Loading Tabular Data#
Here’s the command to load our comma delimited data file in to a Pandas’ DataFrame object.
We will also extract a limited sample to reduce data density. This way we can observe more of the heterogeneity from the simulation with the spatial continuity model, rather than mostly data driven heterogeneity.
By setting unconditional to True the data are shifted well outside the area of interest and are only used for the target reference distribution
unconditional = False
df = pd.read_csv(r"https://raw.githubusercontent.com/GeostatsGuy/GeoDataSets/master/sample_data_MV_biased.csv") # from Dr. Pyrcz's GitHub repo
df = df.sample(50) # extract 50 samples
df = df.reset_index() # reset the record index
df = df.drop(['index','Unnamed: 0','AI'],axis=1) # remove extra columns in DataFrame
df['logPerm'] = np.log(df['Perm'].values) # calculate the log of permeability
if unconditional == True:
df['X'] = df['X'] + 9999999.9 # move the data outside the model, just use the distributions as a reference to constrain the simulated distributions
df.head() # DataFrame summary
X | Y | Facies | Porosity | Perm | logPerm | |
---|---|---|---|---|---|---|
0 | 800.0 | 259.0 | 1.0 | 0.148592 | 17.303238 | 2.850894 |
1 | 900.0 | 200.0 | 1.0 | 0.139414 | 7.750533 | 2.047762 |
2 | 220.0 | 859.0 | 1.0 | 0.149480 | 65.604363 | 4.183642 |
3 | 610.0 | 189.0 | 0.0 | 0.118295 | 6.525972 | 1.875790 |
4 | 780.0 | 429.0 | 0.0 | 0.123508 | 23.432337 | 3.154117 |
Sequential Gaussian Simulation#
Let’s jump right to building a variety of models with simulation and visualizing the results. We will start with a test of 3 realizations.
look at the realizations and check the histogram and variograms
%%capture --no-display
zmin = 0.00; zmax = 0.20 # feature min and max values
nx = 80; ny = 80; xsiz = 10.0; ysiz = 10.0; xmn = 5.0; ymn = 5.0; nxdis = 1; nydis = 1 # grid specification
nreal = 3 # number of realizations
ndmin = 0; ndmax = 20 # number of data for each kriging system
vario = GSLIB.make_variogram(nug=0.0,nst=1,it1=1,cc1=1.0,azi1=0.0,hmaj1=200,hmin1=50)
tmin = -999; tmax = 999
sill = np.std(df['Porosity'].values)*np.std(df['Porosity'].values)
sim_sk = geostats.sgsim(df,'X','Y','Porosity',wcol=-1,scol=-1,tmin=tmin,tmax=tmax,itrans=1,ismooth=0,dftrans=0,tcol=0,
twtcol=0,zmin=zmin,zmax=zmax,ltail=1,ltpar=0.0,utail=1,utpar=0.3,nsim=nreal,
nx=nx,xmn=xmn,xsiz=xsiz,ny=ny,ymn=ymn,ysiz=ysiz,seed=73073,
ndmin=ndmin,ndmax=ndmax,nodmax=20,mults=1,nmult=3,noct=-1,ktype=0,colocorr=0.0,sec_map=0,vario=vario);
xmin = xmn-xsiz/2; xmax = nx*xsiz + xmin; ymin = ymn-ysiz/2; ymax = ny*ysiz + ymin; cmap = plt.cm.inferno # plotting parameters
for isim in range(0,nreal):
plt.subplot(nreal,3,1 + isim*3) # plot the results
GSLIB.locpix_st(sim_sk[isim],xmin,xmax,ymin,ymax,xsiz,zmin,zmax,df,'X','Y','Porosity','Sequential Gaussian Simulation w. Simple Kriging #' + str(isim+1),'X(m)','Y(m)','Porosity',cmap)
lags, gamma_x, npps_x = geostats.gam(sim_sk[isim],-9999.9,9999.9,xsiz,ysiz,ixd=1,iyd=0,nlag=100,isill=0)
lags, gamma_y, npps_y = geostats.gam(sim_sk[isim],-9999.9,9999.9,xsiz,ysiz,ixd=0,iyd=1,nlag=100,isill=0)
plt.subplot(nreal,3,2 + isim*3)
plt.hist(sim_sk[isim].flatten(),bins=np.linspace(zmin,zmax,30),density=True,color='darkorange',alpha=0.6,edgecolor='black',zorder=10,label='Realization')
plt.hist(df['Porosity'].values,bins=np.linspace(zmin,zmax,30),density=True,color='grey',alpha=0.5,edgecolor='black',zorder=1,label='Input')
plt.legend(loc='upper left'); add_grid()
plt.xlim([zmin,zmax]); plt.ylim([0,30]); plt.title('Check Histogram'); add_grid()
plt.subplot(nreal,3,3 + isim*3)
vario_plot = vario
vario_plot.update({"cc1":sill})
vargplot('Porosity',lags,gamma_x,gamma_y,npps_x,npps_y,vario_plot,azi=90.0,atol=22.5,sill=sill) # plot the variogram
plt.xlim([0,1000]); plt.ylim([0.0,sill*2.0])
plt.subplots_adjust(left=0.0, bottom=0.0, right=3.0, top=3.1, wspace=0.2, hspace=0.2); plt.show()
Want to Work Together?#
I hope this content is helpful to those that want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.
Want to invite me to visit your company for training, mentoring, project review, workflow design and / or consulting? I’d be happy to drop by and work with you!
Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!
I can be reached at mpyrcz@austin.utexas.edu.
I’m always happy to discuss,
Michael
Michael Pyrcz, Ph.D., P.Eng. Professor, Cockrell School of Engineering and The Jackson School of Geosciences, The University of Texas at Austin
More Resources Available at: Twitter | GitHub | Website | GoogleScholar | Geostatistics Book | YouTube | Applied Geostats in Python e-book | Applied Machine Learning in Python e-book | LinkedIn
Comments#
This was a basic demonstration of sequential Gaussian simulation to calculate spatial heterogeneity realizations that honor the data, histogram and variogram. Much more can be done, I have other demonstrations for modeling workflows with GeostatsPy in the GitHub repository GeostatsPy_Demos.
I hope this is helpful,
Michael