Introduction#
Michael J. Pyrcz, Professor, The University of Texas at Austin
Twitter | GitHub | Website | GoogleScholar | Geostatistics Book | YouTube | Applied Geostats in Python e-book | Applied Machine Learning in Python e-book | LinkedIn
Chapter of e-book “Applied Machine Learning in Python: a Hands-on Guide with Code”.
Cite this e-Book as:
Pyrcz, M.J., 2024, Applied Machine Learning in Python: a Hands-on Guide with Code, https://geostatsguy.github.io/MachineLearningDemos_Book.
The workflows in this book and more are available here:
Cite the MachineLearningDemos GitHub Repository as:
Pyrcz, M.J., 2024, MachineLearningDemos: Python Machine Learning Demonstration Workflows Repository (0.0.1). Zenodo.
By Michael J. Pyrcz
© Copyright 2024.
About this e-book#
This e-book provides an accessible, online version, a new vehicle to share my Data Analytics and Geostatistics Undergraduate Course at The University of Texas at Austin, that includes,
Well Documented Demonstration Python Workflows - almost half the lecture time is spend with these workflows from my GeostatsPyDemos GitHub repository, that includes well-documented demonstration Python Jupyter workflows for many common geostatistics workflows completed with my GeostatsPy python package. One goal of this e-book is to share this content beyond the GitHub users, often only the experts and developers - to reach and help more people!
Course Notes - I’m also making an effort to continue adding more of the course notes into this e-book, to provid even more coverage of theory and implimentation details.
Interactive Python Dashboards - I have a lot of interactive Python dashboards that I use in my classes that I will continue to explain and cite in this e-book to help you get the entire in-class experience. These are all available in my DataScienceInteractivePython GitHub repository.
Recorded Lectures - there are many thousands of views of my lectures every month on YouTube channel, but many of those people are not on GitHub. I will add links to this e-book to support them. Also, the YouTube channel can enhance the e-book experience, read the book and drop by to hear my explanations.
Welcome!
The goal of this e-book is to teach the application of geostatistics in Python, for those new to geostatistics I provide theory and links to my course content, and for those experienced practitioners I provide example workflows and plots that you can implement.
YouTube Lectures#
This e-book is not a general introduction to statistics nor is it a comprehensive treatment of geostatistics,
for a deeper dive into the theory and practice built on fundamental probability and statistics follow the links at the top of each chapter to my recorded lectures on my Data Analytics and Geostatistics Course playlist on my YouTube channel.
of course, there are comprehensive books for geostatistics theory, including Geostatistical Reservoir Modeling [PD14].
for a deeper dive into the theory and practice of geostatistics. Also, I have a recent book chapter that summarizes GeostatsPy, GeostatsPy: Open-Source Geostatistics in Python[Pyr24c] that may be helpful.
Linked Recorded Lectures are Available
For a deeper dive into the theory and practice of geostatistics follow the links at the top of each chapter to my recorded YouTube lectures.
In case you are interested, my YouTube channel includes 3 complete courses:
along with additional supplemental series including:
Data Science Interactive Python Demonstrations - walk-throughs for many of my interactive Python data science dashboards
Data Science Basics in Python - basics like loading and visualizing data, prerequisites for this e-book
Random Talks - some of my talks on topics such as data analytics, geostatistics and machine learning, and even talks titled,
“Choosing Between Industry and Academia”
“Michael’s Unsolicited Career Advice”
“New Research in Subsurface Data Analytics and Machine Learning”
“A Professor’s Secret Weapon”
“Applying Machine Learning as a Compentent Engineer or Geoscientist”
Source and Other Demonstration Workflows#
This book integrates many of the GeostatsPy demonstration workflows to support use of the GeostatsPy Python package.
All the original geostatistics Python Jupyter notebook workflows are available in my GeostatsPyDemos GitHub repository.
It is challenging to learn a new Python package. For me, great examples for common workflows are critical. So I built out over 40 well-documented demonstration workflows that apply GeostatsPy to accomplish common spatial modeling tasks to support my students in my courses:
Note, I have made modifications to improve the formatting in the book, including the simplification of the workflow headers and suppression of diagnostic output and status bars during model run.
Table of Contents#
- Geostatistical Concepts
- Probability Concepts
- Coordinate Transformations
- Univariate Distributions
- Distribution Transformations
- Bivariate Analysis
- QQ-Plots
- Linear Regression
- Plotting Spatial Data and Models
- Bootstrap Uncertainty Models
- Declustering to Debias Data
- Advanced Declustering
- Variogram Calculation
- Variogram Positive Definite Models
- Variogram Models, Simulation Examples
- Variogram Calculation and Modeling
- Variogram Calculation from an Image
- Trend Modeling
- Spatial Trend Modeling, 1D + 2D
- Inverse Distance
- Simple and Ordinary Kriging
- Indicator Kriging
- Stochastic Simulation
- Kriging vs. Simulation, 1D
- Kriging vs. Simulation, 2D Maps
- Indicator Simulation
- Cosimulation
- Ergodic Fluctuations
- Simulation Post-processing
- Model Checking
- Volume Variance
- GeostatsPy Package
- Conclusions
- Glossary
- References
Recent Updates#
Here’s some highlights from recent updates:
What’s New with Version 0.0.1#
there are many minor formatting issues that I will resolve over the next couple of months.
I will be looking at practice methods to integrate my Interactive Educational Data Science Python Dashboards interactive dashboards [Pyr21], and more from my Data Analytics and Geostatistics Lecture Series [Pyr24a].
Yet, I’m excited to compile all this content into an accessible format, linked to lectures and with repeatable code. My goal is to help folks get started and improve their application of geostatistics. With an e-book, I realize the dream of sharing educational content with:
living documents, I will continuously make improvements and add new content over time, so send me suggestions and errata and return often!
repeatable results, all examples and plots show the code and anyone can reproduce all the results! The code is immediately downloadable, and all data are online available.
links for rapid access to all associated resources.
no pay wall for anyone around the world to access my content.
For me, this is a wonderful vehicle to combine and share all my educational content. Yes, thousands can follow my workflows on GitHub, but the e-book will share this content much more widely. Accessible and actionable educational content is my personal mantra. Some might wonder, why does Michael do this? Check out my story and you will learn that education changed my life. I grew up in a low income family in a small town in Canada. In middle school (junior high for my Canadian friends), I delivered newspapers for over a year to save up for my first computer that our school guidance councilor helped me buy at a local used computer store. A random engineering student talked to me in a gas station late one cold evening and that got me interested in an university education. Education changed my life, so it is my mission to share all my university educational content. Perhaps, I may lift someone as I was lifted.
Before we get started, here’s more information about the associated repository, the source of the workflows for this e-book.
Motivation for Open Educational Content#
There is more to be done, yet I am excited to compile all this content into an accessible format, linked to lectures and with repeatable code. My goal is to help people get started or to improve their application of machine learning. With an e-book, I realize my dream of sharing educational content with,
living documents, I will continuously make improvements and add new content over time, so send me suggestions and errata and return often!
repeatable results, all examples and plots show the code and anyone can reproduce all the results! The code is immediately downloadable, and all data are online available.
links, for rapid access to all associated resources, bringing together all of my educational content.
no pay wall, for anyone around the world to access my content.
For me, this is a wonderful vehicle to combine and share all my educational content. Yes, thousands can follow and mirror my workflows on GitHub, but the e-book will share this content much more widely. I see this with my YouTube channel with tens of thousands of subscribers and as many views per month vs. my GitHub account with about 3,600 following.
accessible and actionable educational content is my personal mantra.
Some might wonder, why does Michael do this? Check out my story and you will learn that education changed my life. Here’s the summary,
I grew up in a low income family in a small town in Canada.
a random engineering student talked to me in a gas station late one cold evening and that got me interested in an university education.
I supported myself and young family through an engineering B.Sc. and Ph.D., and as a result had a challenging, exciting and rewarding career.
Now it is my mission to share all my university educational content in that hope that I may lift someone else as I was lifted.
Before we get started, here’s more information about my motivation for sharing my university educational content.
Motivation for Open Educational Content#
There is more to be done, yet I am excited to compile all this content into an accessible format, linked to lectures and with repeatable code. My goal is to help people get started or to improve their application of geostatistics. With an e-book, I realize my dream of sharing educational content with,
living documents, I will continuously make improvements and add new content over time, so send me suggestions and errata and return often!
repeatable results, all examples and plots show the code and anyone can reproduce all the results! The code is immediately downloadable, and all data are online available.
links, for rapid access to all associated resources, bringing together all of my educational content.
no pay wall, for anyone around the world to access my content.
For me, this is a wonderful vehicle to combine and share all my educational content. Yes, thousands can follow and mirror my workflows on GitHub, but the e-book will share this content much more widely. I see this with my YouTube channel with tens of thousands of subscribers and as many views per month vs. my GitHub account with about 3,600 following.
accessible and actionable educational content is my personal mantra.
Some might wonder, why does Michael do this? Check out my story and you will learn that education changed my life. Here’s the summary,
I grew up in a low income family in a small town in Canada.
a random engineering student talked to me in a gas station late one cold evening and that got me interested in an university education.
I supported myself and young family through an engineering B.Sc. and Ph.D., and as a result had a challenging, exciting and rewarding career.
Now it is my mission to share all my university educational content in that hope that I may lift someone else as I was lifted.
Before we get started, here’s more information about the workflows in this e-book’s chapters and my interactive Python dashboards.
Geostatistics Demonstrations#
When I teach I avoid “copy and paste”, instead,
every example is real and repeatable. Students can access the codes and data and run the classroom demonstration examples for themselves.
For example, when I teach probablity density functions as part of data analytics, I show an example with different kernels.
In the course notes figure captions, I include the link to the workflow to reproduce this method and the actual plots, included in this e-book chapter Univariate Analysis.
Each geostatistics chapter in this e-book is a downloadable well-documented workflow,
utilizing synthetic data from my GeoDataSets repository
small and often 2D examples for fast run times and ease of interpretation
Common geostatistics workflows include,
univariate analysis
multivariate analysis
spatial analysis
spatial data debiasing
spatial estimation and simulation
spatial trend modeling
spatial model checking
scaling with volume-variance relations
optimum decision making
I have many other geostatistics and data science workflows in my other repositories. As I prepare these for this e-book I will add them to this repository. Let me know if there are other topics of interest.
3D modeling is next!
Data Science Interactive Python Repository#
One of my favourite things to do is to build out Python interactive dashboards to support my lectures. Here’s an example screen shot of one of my interactive dashboards for Bayesian updating,
I think it is awesome to teach a concept and then to get the students to open a dashboard and rapidly try it out, i.e., to play with the system!
You are welcome to explore my over 50 interactive data science Python dashboards in my Data Science Interactive Python repository on GitHu, including interactive,
Bayesian updating
bootstrap
central limit theorem
correlation coefficients
k-means clustering
density-based clustering
spectral clustering
linear, ridge and LASSO regression
model hyperparameter tuning and overfit
principal component analysis
variogram calculation and modeling
spatial estimation and simulation
decision making in the presence of uncertainty
artificial neural networks
This has been a lot of work, but what motivates me?
I’m a visual learner - yes I understand the math and systems, but seeing them in action is really helpful to me. I know that many are like me.
one of the best way to answer questions in class - a student asked, what do principal component loadings look like? That evening I made a really cool dashboard with one input, amount of correlation between 3 features with another one independent. The outputs included the principal component loadings and variance explained by each principal component. It showed so much, including conservation of variance, and component one as a mixture of the three correlated features as the correlation increased.
I really like doing this - I left industry to give back and have fun. I like to code and I love to make data science displays.
For now I will add links to these interactive dashboards.
in the future I would like these to be included in the e-book live and interactive. The jupyter-book hosted on GitHub does not allow for this at this time.
Running the Workflows in this e-book#
You can use this e-book, i.e., read and review the workflows, without ever running the code. I recommend that you run the code yourself to cement the concepts.
to do this you only need to install the required Python packages.
The good news is that I generally use the common Python packages curated and available through Anaconda. There will be a couple of additional packages to install including,
geostatspy - my spatial data analytics and geostatistics package for some data transformations and visualization
pyvista - 3D grid visualization
astropy - sparse data convolution for trend models
These packages are well maintained and can all be installed with:
pip install [package_name]
from a terminal window.
I like to launch the terminal window from Anaconda Navigator from the ‘Environments’ tab and left click the green arrow and select ‘Open Terminal’. This will ensure that the package is installed in the correct location. After teaching 100s of students and seeing many issues, this is my suggestion for the safest way to install Python packages.
Suggestions for Using this e-book#
This is a topical approach to the hands-on application of geostatistics. Each chapter stands alone and covers a unique geostatistical method.
Want to calculate and model a variogram?
Want to calculate a spatial estimate?
Want to simulate and post-process multiple spatial, stochastic realizations?
Open up the respective chapter and walkthrough the example with all requisite steps, including data loading, data visualization and modeling steps, with all codes available and explained through text and comments and the results shown.
in fact, I went the extra mile to show as much as possible with graphical outputs with efficient figures.
While doing this, you can go to the linked lecture(s) to fill in any gaps in your knowledge and then download and update the workflow for your application.
please remember to cite the e-book in your work, so that others will see and use this resource!
Remember, that this content is for educational purposes only and I do not provide any guarantees.
there may be errors in the codes, there may be issues with your case that invalidate the assumptions of the e-book’s workflows. It is up to you to evaluate your results.
Warning
If you adopt any workflows from this e-book you must check your own work.
Want to Work Together?#
I hope this content is helpful to those that want to learn more about subsurface, spatial modeling, data analytics and geostatistics. Students and working professionals are welcome to participate. I’m happy to collaborate with your organization.
Want to invite me to visit your company for training, mentoring, project review, workflow design and / or consulting? I’d be happy to drop by and work with you! I am a cofounder of a data science education company.
Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!
I can be reached at mpyrcz@austin.utexas.edu.
I’m always happy to discuss,
Michael
Michael Pyrcz, Ph.D., P.Eng. Professor, Cockrell School of Engineering and The Jackson School of Geosciences, The University of Texas at Austin
More Resources Available at: Twitter | GitHub | Website | GoogleScholar | Geostatistics Book | YouTube | Applied Geostats in Python e-book | Applied Machine Learning in Python e-book | LinkedIn
How to Use this e-book?#
This is a topical approach to the hands-on application of geostatistics. Each chapter stands alone and covers a unique demonstration or workflow using GeostatsPy. Want to visualize univariate distributions, debias your spatial data with declustering, make estimates in space with kriging or simulate a spatial heterogeneity model? Open up the respective chapter and walkthrough the example with all requisite steps, including data loading, data visualization and geostatistics steps, all codes and results shown. Go to the linked lecture to fill any gaps in your knowledge and then download and update the workflow for your application, please remember to cite the e-book in your work, so that others will see and use this resource! Remember, that this content is for educational purposes only and I do not provide any guarantees. There may be errors in the codes, there may be issues with your case that invalidate the assumptions of the e-book’s workflows. It is up to you to evaluate your results.
Warning
If you adopt any workflows from this e-book or any codes from the GeostatsPy Python package you must check your own work.
About the Author#
Michael Pyrcz is a professor in the Cockrell School of Engineering, and the Jackson School of Geosciences, at The University of Texas at Austin, where he researches and teaches subsurface, spatial data analytics, geostatistics, and machine learning. Michael is also,
the principal investigator of the Energy Analytics freshmen research initiative and a core faculty in the Machine Learn Laboratory in the College of Natural Sciences, The University of Texas at Austin
an associate editor for Computers and Geosciences, and a board member for Mathematical Geosciences, the International Association for Mathematical Geosciences.
Michael has written over 70 peer-reviewed publications, a Python package for spatial data analytics, co-authored a textbook on spatial data analytics, Geostatistical Reservoir Modeling and author of two recently released e-books, Applied Geostatistics in Python: a Hands-on Guide with GeostatsPy and Applied Machine Learning in Python: a Hands-on Guide with Code.
All of Michael?s university lectures are available on his YouTube Channel with links to 100s of Python interactive dashboards and well-documented workflows in over 40 repositories on his GitHub account, to support any interested students and working professionals with evergreen content. To find out more about Michael?s work and shared educational resources visit his Website.
Want to Work Together?#
I hope this content is helpful to those that want to learn more about subsurface, spatial modeling, data analytics and geostatistics. Students and working professionals are welcome to participate. I’m happy to collaborate with your organization.
Want to invite me to visit your company for training, mentoring, project review, workflow design and / or consulting? I’d be happy to drop by and work with you!
Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!
I can be reached at mpyrcz@austin.utexas.edu.
I’m always happy to discuss,
Michael
Michael Pyrcz, Ph.D., P.Eng. Professor, Cockrell School of Engineering and The Jackson School of Geosciences, The University of Texas at Austin
More Resources Available at: Twitter | GitHub | Website | GoogleScholar | Book | YouTube | LinkedIn
Comments#
This is an introduction to my Applied Geostatistics in Python: a Hands-on Guide with GeostatsPy e-book, I have many more resources. Check out my shared resource inventory and the YouTube lecture links at the start of this chapter with resource links in the videos’ descriptions.
I hope this is helpful,
Michael