Decision Making with Uncertainty#

Michael J. Pyrcz, Professor, The University of Texas at Austin

Twitter | GitHub | Website | GoogleScholar | Book | YouTube | Applied Geostats in Python e-book | LinkedIn

Chapter of e-book “Applied Geostatistics in Python: a Hands-on Guide with GeostatsPy”.

Cite this e-Book as:

Pyrcz, M.J., 2024, Applied Geostatistics in Python: a Hands-on Guide with GeostatsPy [e-book]. Zenodo. doi:10.5281/zenodo.15169133 DOI

The workflows in this book and more are available here:

Cite the GeostatsPyDemos GitHub Repository as:

Pyrcz, M.J., 2024, GeostatsPyDemos: GeostatsPy Python Package for Spatial Data Analytics and Geostatistics Demonstration Workflows Repository (0.0.1) [Software]. Zenodo. doi:10.5281/zenodo.12667036. GitHub Repository: GeostatsGuy/GeostatsPyDemos DOI

By Michael J. Pyrcz
© Copyright 2024.

This chapter is a summary of Decision Making with Uncertainty including essential concepts:

  • loss functions

  • expected loss vs. estimate

YouTube Lecture: check out my lectures on:

For convenience here’s a summary of the salient points.

Motivation for Decision Making with Uncertainty#

We have built uncertainty models, now we need to make decisions in the presence of uncertainty.

Multiple reservoir realizations to represent uncertainty, but we must make a single choice, where to put the well.

Building Uncertainty Models#

How Did We Calculate Uncertainty Models? Some examples:

  • Bayesian Updating - prior plus likelihood to calculate the posterior, for example, probability that a coin is fair, uncertainty in reservoir OIP

  • Bootstrap + Monte Carlo Simulation and Transfer Function - uncertainty the mean porosity by bootstrap, then Monte Carlo simulation for uncertainty in oil-in-place

  • Kriging - kriging estimate and variance with Gaussian assumption for uncertainty in mineral grade at a location

  • Simulation and Transfer Function - reservoir features realizations applied to flow simulation to calculate pre-drill production uncertainty at a new well

One way or another, we have calculated a reasonable uncertainty model, for example,

  • connected volume of water or hydrocarbon to a location

  • mineral or hydrocarbon resource in place

  • recover factor mineral or hydrocarbon extraction

Two realizations of soil lead concentration near a closed lead smelter Superfund site in West Dallas, Texas (Pyrcz, 2000). Units are PPM.

and now we must make a decision with that uncertainty model, for examples,

  • number and locations of wells

  • time and volume injected for water of chemical flood, pump and treat remediation

  • dig limits and mine plan

Types of Decisions#

For example well site selection methods include,

  • Integer programing problem, output must be an integer, not a fraction

  • Optimization combined with simulation one potential well at a time

  • Experimental design and response surface methodology

Optimize the profitability of reservoir production,

  • High dimensional - complicated by large number of parameters and timing

  • Interactions - locations of injector and producers,

  • Multivariate - injection, well completions

We could make \(L\) optimum choices, a set of choices that are optimum for each model. Or we could make one globally optimum choice.

Global optimum vs. L suboptimum (Pyrcz and Deutsch, 2014).

Expected Profit Workflow#

These are the steps in the expected profit workflow for optimum decision making in the presence of uncertainty,

  1. Calculate the uncertainty model, \(\ell = 1,\ldots,L\) subsurface realizations

  2. Establish \(s = 1,\dots,S\) development scenarios

  3. Establish \(P\) profit metric

  4. Calculate profit, \(P\), over all \(L\) and \(S\)

  5. Calculate the expected profit for each development scenario, \(E\{P(S)\}\)

  6. Select the \(𝑠\) development scenario that maximizes expected profit.

Now we walk through each of these steps and provide more details.

The Uncertainty Model#

Calculate the uncertainty model, \(\ell = 1,\ldots,L\), subsurface models. This includes,

  • Scenarios - change the model inputs, and assumptions to capture uncertainty in the modeling decisions

  • Realizations - change the random number seed to capture spatial uncertainty

  • Multivariate - include all needed features, and their relations

The uncertainty model is represented by multiple models of the subsurface.

\(\ell=1,…,6\) subsurface models, realizations and scenarios of the subsurface.

The Development Scenarios#

Discrete alternatives for developing the spatial, subsurface resource

Includes all details needed to simulate extraction, e.g., well locations, completion types, dig limits, mining block sequence

For example, 3 potential well locations with drainage radius,

\(s=1,…,3\) development scenarios.

The Profit Metric#

The profit metric,

  • is a transfer function to map from spatial features to value, \(P(\ell,s)\)

  • include inputs subsurface realization, \(\ell\), and development scenario, \(s\)

  • In the units of the thing are we trying to maximize? For example, net present value, tonnage contaminant removed from soil, or volume of resource recovered

Profit for \(𝑙=5\), realization, \(𝑠=2\) development scenario.

We calculated profit over all \(L\) subsurface realizations, and \(S\) development scenarios. If possible we work with the,

  • full combinatorial of \(L\) subsurface realizations and \(S\) development scenarios, \(P(\ell = 1,\ldots,L,s=1,\ldots,S)\)

Profit calculated over all subsurface realizations and development scenarios.

Expected Profit#

Calculation expected profit \(E\{P(S)\}\)

Expected profit is a probability, 𝜆_𝑙, weighted average of profit over subsurface uncertainty

\[ \mathbb{E}\{P(s = 2)\} = \frac{1}{\sum_{\ell=1}^{L} \lambda_l} \sum_{\ell=1}^{L} \lambda_l \cdot P(\ell, s = 2) \]

if all models are equiprobable,

\[ \lambda_l = \frac{1}{L}, \quad \ell = 1, \dots, L \]

Maximize Expected Profit#

Now we maximize the expected profit \(E\{P(S)\}\),

  • select the development scenario that maximizes expected profit over the subsurface realizations, \(L\)

\[ s=𝒂𝒓𝒈𝒎𝒂𝒙(E{P(S,L)}) \]
Expected profit calculated over all \(s = 1,\ldots,3\) development scenarios and \(\ell = 1, \ldots, L\) subsurface realizations and select the development scenario with highest expected profit, \(s=1\) selected.

This is the fundamental workflow for making an optimum decision in the presence of uncertainty.

Below we demonstrate this approach to select a single estimate. But, let’s first talk about uncertainty.

How do we make optimum estimates in the presence of uncertainty?

  • we cannot provide an uncertainty distribution to operations for decision making.

  • we must provide a single estimate. We must choose a single estimate in the presence of uncertainty

Let’s think about making this single estimate, we could just choose the average, expectation assuming equal probability for each outcome.

  • The expectation estimate assumes the cost of under- and over-estimation is symmetric and squared, i.e., the squared loss L2 norm

  • Risk Adverse – cost of overestimation > cost of underestimation

  • Risk Seeking – cost of overestimation < cost of underestimation

Let’s formalize this with the concept of loss functions.

Loss Functions#

To make a decision in the presence of uncertainty we need to quantify the loss function,

  • loss due to over- and underestimation of the true value

  • no loss if the estimate is correct, i.e., if Estimate – Truth = 0

  • Note: estimating with the mean minimizes the quadratic loss function

Here’s the L2 loss, symmetric and increasing with the square away from 0, perfectly accurate prediction,

Simple loss function, the cost of estimation error represented with a symmetric quadratic function. This is L2 loss.

For a more complicated example, overestimation is more costly than underestimation and at some threshold underestimation any further has no more cost,

Complicated loss function, the cost of estimation error with asymmetry and thresholds.

To further illustrate loss functions, consider the loss function to support the decision to carry an umbrella given uncertainty in the weather, rain or no rain from Professor Clayton Deutsch’s geostatistics course.

  • Overestimation - you estimate rain, but no rain happens

  • Underestimation = you estimate no rain, but rain happens

Should you carry an umbrella? The loss function depend on where you are going?

Loss functions for estimating rain for going to the zoo, an interview and the swimming pool.

Decision Making with Loss Functions#

This is the workflow for decision making in the presence of uncertainty,

  1. quantify cost of over and underestimation in a loss function

  2. apply loss function to the random variable of interest for a range of estimates

  3. calculate the expected loss for each estimate

\[ E\{Loss(z^{*})\} = \int_{-\infty}^{\infty} \text{Loss}(z - z^*) \cdot f_z(z) \, dz \]
  1. make decision that minimizes loss

Illustration of expect loss calculated for three estimates, given the loss function (above) and the distribution of uncertainty for the estimate, \(z^{*}\) (below).

To understand the change in expected profit over different estimates, let’s put the flipped loss on the uncertainty distribution.

Uncertainty PDF and loss for three estimate cases shown in the figure above.

Now we can explain the results,

  • the black estimate is the lowest because the mode of the uncertainty distribution (highest probability) includes the lowest loss portions of the loss function.

  • the red estimate is the highest because the cost of over estimation is more costly and the entire uncertainty distribution is in the over estimated portion of the loss function

  • the blue estimate is quite low, but the cost of underestimation levels off, so the loss is not that high

Let’s actually demonstrate the calculation of expected loss for a discrete case, we turn the uncertainty model into a normalized histogram with probability to be in each bin,

\[ \mathbb{E}\{\text{Loss}(z^*)\} = \int_{-\infty}^{\infty} L(z - z^*) \cdot f_z(z) \, dz = \sum L(z - z^*) \cdot P(z) \]

Now we can calculate the expected loss for an estimate of 100 MMbbl as,

Expected loss for an estimate of 100 MMbbl.

and the expected loss for an estimate of 400 MMbbl as,

Expected loss for an estimate of 400 MMbbl.

and the expected loss for an estimate of 700 MMbbl as,

Expected loss for an estimate of 700 MMbbl.

Now I provide an interactive dashboard for decision making in the presence of uncertainty.

  • this will not render in the book, so in the future I will add static examples for the book.

  • here is the interactive dashboard.

Interactive Python dashboard for decision making in the presence of uncertainty.

Load the Required Libraries#

The following code loads the required libraries.

import geostatspy.GSLIB as GSLIB          # GSLIB utilies, visualization and wrapper
import geostatspy.geostats as geostats    # GSLIB methods convert to Python        

We will also need some standard packages. These should have been installed with Anaconda 3.

import numpy as np                        # ndarrys for gridded data
import pandas as pd                       # DataFrames for tabular data
import os                                 # set working directory, run executables
import matplotlib.pyplot as plt           # for plotting
from scipy import stats                   # summary statistics
import math                               # trig etc.
import random
from ipywidgets import interactive        # widgets and interactivity
from ipywidgets import widgets                            
from ipywidgets import Layout
from ipywidgets import Label
from ipywidgets import VBox, HBox
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[2], line 8
      6 import math                               # trig etc.
      7 import random
----> 8 from ipywidgets import interactive        # widgets and interactivity
      9 from ipywidgets import widgets                            
     10 from ipywidgets import Layout

ModuleNotFoundError: No module named 'ipywidgets'

Set Up the Interactive Dashboard#

The following code includes the:

  • dashboard

  • numerical methods

  • graphical displays

# interactive calculation of the random sample set (control of source parametric distribution and number of samples)
l = widgets.Text(value='                                      Optimum Decision Making Demonstration, Michael Pyrcz, Professor, The University of Texas at Austin',layout=Layout(width='950px', height='30px'))

mean = widgets.FloatSlider(min=0.0, max = 1.0, value = 0.5,step=0.01,description = r'$\mu$',orientation='horizontal',layout=Layout(width='230px', height='50px'),continuous_update=False)
mean.style.handle_color = 'darkorange'

stdev = widgets.FloatSlider(min=0.001, max = 0.25, value = 0.15,step=0.01,description = r'$\sigma$',orientation='horizontal',layout=Layout(width='230px', height='50px'),continuous_update=False)
stdev.style.handle_color = 'darkorange'

ui1 = widgets.VBox([mean,stdev],kwargs = {'justify_content':'center'})

slope_under = widgets.FloatSlider(min=0.00, max = 1.0,value = 0.1,step=0.01,description = r'$m_{under}$',orientation='horizontal',layout=Layout(width='230px', height='50px'),continuous_update=False)
slope_under.style.handle_color = 'green'

power_under = widgets.FloatSlider(min=1.0, max = 5.0, value = 0.0, description = r'$p_{under}$',orientation='horizontal',layout=Layout(width='230px', height='50px'),continuous_update=False)
power_under.style.handle_color = 'green'

ui2 = widgets.VBox([slope_under,power_under],kwargs = {'justify_content':'center'})

slope_over = widgets.FloatSlider(min=0.00, max = 1.0,value = 0.1,step=0.1,description = r'$m_{over}$',orientation='horizontal',layout=Layout(width='230px', height='50px'),continuous_update=False)
slope_over.style.handle_color = 'blue'

power_over = widgets.FloatSlider(min=1.0, max = 5.0, value = 0.0, description = r'$p_{over}$',orientation='horizontal',layout=Layout(width='230px', height='50px'),continuous_update=False)
power_over.style.handle_color = 'blue'

ui3 = widgets.VBox([slope_over,power_over],kwargs = {'justify_content':'center'})

step = widgets.FloatLogSlider(min=-4, max = -1, value = 0.02, description = 'Step',orientation='horizontal',layout=Layout(width='230px', height='50px'),continuous_update=False)
step.style.handle_color = 'gray'

ui4 = widgets.VBox([step],kwargs = {'justify_content':'none'})

ui = widgets.HBox([ui1,ui2,ui3,ui4])
ui_title = widgets.VBox([l,ui],)

def make_plot(mean,stdev,slope_under,power_under,slope_over,power_over,step):
    seed = 73073
    xmin = mean - 5.0 * stdev    
    xmax = mean + 5.0 * stdev
    X = np.arange(xmin,xmax,step)              # build uncertainty distribution
    pdf = stats.norm.pdf(X, loc = mean, scale = stdev)
    y = stats.norm.rvs(loc=mean,scale=stdev,size=10000,random_state=seed)
   
    delta = np.arange(-1*(xmax-xmin)*4,(xmax-xmin)*4,step) # build loss function
    loss = np.zeros(len(delta))
    rloss = np.zeros(len(delta))               # flip for convolve operator
    
    for id, d in enumerate(delta):
        if d < 0.0: 
            loss[id] = pow(abs(d),power_under)*slope_under
            rloss[len(delta)-id-1] = pow(abs(d),power_under)*slope_under 
        else: 
            loss[id] = pow(d,power_over)*slope_over
            rloss[len(delta)-id-1] = pow(d,power_over)*slope_over
   
    pdf_norm = pdf / pdf.sum()                 # calculate expected loss
    exp_loss = np.convolve(pdf_norm,loss,mode='valid')
    x_exp_loss = np.arange(-0.5*step*(len(exp_loss))+mean,0.5*step*(len(exp_loss))+mean,step)
    x_exp_loss = x_exp_loss[0:len(exp_loss)]   # avoid rounding error
    inside = np.logical_and(x_exp_loss >= xmin, x_exp_loss <= xmax)
    max_loss_plot = np.max(exp_loss,where = inside,initial=0.0)
    best_estimate = x_exp_loss[np.argmin(exp_loss)]
    min_loss = np.min(exp_loss)
 
    plt.subplot(131)                           # plot uncertainty distribution
    plt.plot(X,pdf,color='darkorange',alpha=0.7,zorder=10,lw=3,ls='--')
    hist = plt.hist(y,bins=X,edgecolor='black',color='darkorange',alpha=0.7,density=True,stacked=True,
                   histtype=u'step',linewidth=2)
    plt.hist(y,bins=X,color='grey',alpha=0.2,zorder=20,density=True)
    if best_estimate >= 0 and best_estimate <= 1.0:
        plt.plot([best_estimate,best_estimate],[0,np.max(hist[0])*1.2],color='black',linestyle='--',zorder=100)
        plt.annotate('Optimum Estimate = ' + str(np.round(best_estimate,2)),[best_estimate,np.max(hist[0])*0.7], rotation=270)
    elif best_estimate < 0.0: 
        plt.annotate('Optimum Estimate = ' + str(r'$-\infty$'),[0,np.max(hist[0])*0.7], rotation=270)
    else:
        plt.annotate('Optimum Estimate = ' + str(r'$+\infty$'),[0.96,np.max(hist[0])*0.7], rotation=270)
    plt.xlim(0.0,1.0); plt.xlabel(r"Feature, $y_1$"); plt.ylim([0,np.max(hist[0])*1.2])
    plt.title('Uncertainty Distribution'); plt.ylabel('Density')
    plt.grid(color='grey', ls = '-.', lw = 0.1)

    plt.subplot(132)                           # plot loss function, loss vs. estimate error
    plt.plot(delta[delta>0],loss[delta>0],color='blue',alpha=0.8,lw=3)
    plt.plot(delta[delta<0],loss[delta<0],color='green',alpha=0.8,lw=3)
#    plt.plot([0,0],[0,np.max(loss)],color='black',linestyle='--',alpha=0.3)
    plt.plot([0,0],[0,1.0],color='black',linestyle='--',alpha=0.8)
#    plt.annotate('Underestimation',[(xmax-xmin)*-0.1,np.max(loss)*0.8], rotation=0,horizontalalignment='right')
#    plt.annotate('Overestimation',[(xmax-xmin)*0.1,np.max(loss)*0.8], rotation=0,horizontalalignment='left')
    plt.annotate('Underestimation',[(xmax-xmin)*-0.1,0.7], rotation=0,horizontalalignment='right')
    plt.annotate('Overestimation',[(xmax-xmin)*0.1,0.7], rotation=0,horizontalalignment='left')
#    plt.xlim([-1*(xmax-xmin)*3,(xmax-xmin)*3])
#   plt.ylim([0,np.max(loss)])
    plt.xlim([-1,1]); plt.ylim([0,1.0])
    plt.xlabel('Error In Estimate'); plt.title('Loss Function'); plt.ylabel('Loss') 
    plt.grid(color='grey', ls = '-.', lw = 0.1)
    
    plt.subplot(133)                           # plot expected loss vs. estimate
    #plt.plot(X,pdf/1000*stdev,'--',color='black',alpha=0.7,zorder=1)
    plt.plot(x_exp_loss,exp_loss,color='red',alpha=0.8,lw=3)
    if best_estimate >= 0 and best_estimate <= 1.0:
        plt.plot([best_estimate,best_estimate],[0,max_loss_plot],color='black',linestyle='--',lw=2)
        plt.annotate('Optimum Estimate = ' + str(np.round(best_estimate,2)),[best_estimate,max_loss_plot*0.6], rotation=270)
    elif best_estimate < 0:
        plt.annotate('Optimum Estimate = ' + str(r'$-\infty$'),[0,max_loss_plot*0.6], rotation=270)
    else:
        plt.annotate('Optimum Estimate = ' + str(r'$+\infty$'),[0.96,max_loss_plot*0.6], rotation=270)
        
    plt.xlim([0.0,1.0]); # plt.ylim([0,1.0])
#    plt.plot([xmin,xmax],[min_loss,min_loss],color='black',linestyle='--',alpha=0.3)
    plt.plot([0,1],[min_loss,min_loss],color='black',linestyle='--',alpha=0.7,lw=2)
    plt.annotate('Minimum Expected Loss',[xmin+(xmax-xmin)*0.2,min_loss*1.1], rotation=0,horizontalalignment='left')
    plt.ylim([0,max_loss_plot])
    plt.xlabel(r'Estimate, $y^*_1$'); plt.title('Expected Loss vs. Estimate'); plt.ylabel('Expected Loss')
    ax2 = plt.gca().twinx()
    ax2.plot(X,pdf,'--',color='darkorange',alpha=0.7,zorder=1,lw=3)
    ax2.set_ylim([0,np.max(hist[0])*1.2])
    ax2.set_ylabel('Density',rotation=270,labelpad=10)
    plt.grid(color='grey', ls = '-.', lw = 0.1)
   
    plt.subplots_adjust(left=0.0, bottom=0.0, right=3.0, top=1.2, wspace=0.2, hspace=0.2)
    plt.show()   
    
interactive_plot = widgets.interactive_output(make_plot, {'mean':mean,'stdev':stdev,'slope_under':slope_under,'power_under':power_under,'slope_over':slope_over,'power_over':power_over,'step':step})
interactive_plot.clear_output(wait = True)                # reduce flickering by delaying plot updating 
    

Optimum Decision Making Demonstration#

  • specify the uncertainty distribuiton \(\mu\), \(\sigma\) and loss function \(m\) and \(p\) and observe the loss function, expected loss and optimum estimate

display(ui_title, interactive_plot)                            # display the interactive plot

Comments#

This was a basic demonstration of optimum decision making in the precense of uncertainty.

  • we build uncertainty models all the time to support decision making. Here’s how it gets done.

I have other demonstrations on the basics of working with DataFrames, ndarrays, univariate statistics, plotting data, declustering, data transformations, trend modeling, multivariate analysis and many other workflows available at GeostatsGuy/PythonNumericalDemos and GeostatsGuy/GeostatsPy.

I hope this is helpful,

Michael

About the Author#

Professor Michael Pyrcz in his office on the 40 acres, campus of The University of Texas at Austin.

Michael Pyrcz is a professor in the Cockrell School of Engineering, and the Jackson School of Geosciences, at The University of Texas at Austin, where he researches and teaches subsurface, spatial data analytics, geostatistics, and machine learning. Michael is also,

  • the principal investigator of the Energy Analytics freshmen research initiative and a core faculty in the Machine Learn Laboratory in the College of Natural Sciences, The University of Texas at Austin

  • an associate editor for Computers and Geosciences, and a board member for Mathematical Geosciences, the International Association for Mathematical Geosciences.

Michael has written over 70 peer-reviewed publications, a Python package for spatial data analytics, co-authored a textbook on spatial data analytics, Geostatistical Reservoir Modeling and author of two recently released e-books, Applied Geostatistics in Python: a Hands-on Guide with GeostatsPy and Applied Machine Learning in Python: a Hands-on Guide with Code.

All of Michael’s university lectures are available on his YouTube Channel with links to 100s of Python interactive dashboards and well-documented workflows in over 40 repositories on his GitHub account, to support any interested students and working professionals with evergreen content. To find out more about Michael’s work and shared educational resources visit his Website.

Want to Work Together?#

I hope this content is helpful to those that want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.

  • Want to invite me to visit your company for training, mentoring, project review, workflow design and / or consulting? I’d be happy to drop by and work with you!

  • Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PI is Professor John Foster)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!

  • I can be reached at mpyrcz@austin.utexas.edu.

I’m always happy to discuss,

Michael

Michael Pyrcz, Ph.D., P.Eng. Professor, Cockrell School of Engineering and The Jackson School of Geosciences, The University of Texas at Austin

More Resources Available at: Twitter | GitHub | Website | GoogleScholar | Geostatistics Book | YouTube | Applied Geostats in Python e-book | Applied Machine Learning in Python e-book | LinkedIn