Course

Introduction to Subsurface Machine Learning

Disciplines: Multi-Disciplinary
Category: Subsurface • Intermediate - Prerequisite Training or Skill • Introductory • Data Analytics • Engineering • Geoscience
Format: Classroom • Live Online
Available: Public • Private

Who Should Attend

Technical energy industry professionals (petroleum engineers, geoscientists) with basic Python proficiency.

Description

This course provides working knowledge of Python programming and the open-source packages essential for data analytics and machine learning. The entire course is based on live demos of code and workflows in the Jupyter Notebook environment. It helps geoscientists, geophysicists, and petroleum engineers learn Python programming at a beginner to intermediate level. The course uses several types of data: well logs, core data, well performance data, and production data.

The course focuses on the Python programming skills that are prerequisites for real-world data analysis. It does not explore applications on large field datasets. The 2-week group project at the end of the course lets participants try out the concepts they have learned by modifying the shared Jupyter Notebooks. The practice session allows deeper interaction with the instructor on problems specific to the participants.

Learning Outcomes

Assemble open-source coding and scripting workflows in Python to solve basic data science problems related to subsurface data.
Apply numpy, pandas, matplotlib, seaborn and sklearn packages on subsurface data.
Solve supervised regression problems using ElasticNet, random forest, nearest neighbor, and LASSO regressors.
Solve supervised classification problems using nearest neighbor, random forest, and support vector classifiers.
Solve unsupervised clustering problems using k-means and mean shift techniques.
Apply anomaly detection and data preprocessing.
Apply neural network and boosting methods.
Learn about time-series forecasting, clustering, and spatial data analytics through a 2-week project.
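
As an illustration of the supervised regression outcome above, the following is a minimal sketch that fits the four listed regressor families with sklearn. The data is synthetic and the feature meanings are assumptions made for the example; it is not taken from the course notebooks.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for three core or log measurements (assumption).
rng = np.random.default_rng(42)
n = 300
X = rng.normal(size=(n, 3))
# Target depends linearly on the first two features, plus noise.
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + 0.2 * rng.normal(size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

regressors = {
    "ElasticNet": ElasticNet(alpha=0.01),
    "LASSO": Lasso(alpha=0.01),
    "Random forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Nearest neighbor": KNeighborsRegressor(n_neighbors=5),
}

r2 = {}
for name, reg in regressors.items():
    # Scaling matters for the regularized and neighbor-based models.
    model = make_pipeline(StandardScaler(), reg)
    model.fit(X_train, y_train)
    r2[name] = model.score(X_test, y_test)  # R^2 on held-out data
    print(f"{name}: R2 = {r2[name]:.3f}")
```

The same loop works for the classification outcome by swapping in the corresponding sklearn classifiers and a categorical target.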

Course Content

1 Hour: Using numpy on large arrays, using pandas on large tabular data
1.5 Hours: Using numpy and pandas on well data for:
- Data preprocessing
- Exploratory data analysis
1 Hour: Using matplotlib and seaborn on well production data for visualization
4 Hours: Using sklearn for regression and classification
- Irreducible saturation prediction from core data
- Rock classification based on well log
- Use of bagging, neighbors, regularization, and support vectors
2 Hours: Feature selection, dimensionality reduction, and feature ranking
3 Hours: Using sklearn for clustering and outlier detection
- Anomaly detection on porosity-permeability data
- Rock typing (clustering) on well logs
1 Hour: Uncertainty quantification for regressors and classifiers
2.5 Hours: Advanced regressors and classifiers: neural network and boosting
Optional 2-Week Project: The client can select 2 of the following 3 projects according to the needs of the participants. The instructor will hold three 2-hour virtual sessions over a 2-week period to guide the participants through the tasks. The solutions will be reviewed in the final session.
- Production Forecasting
- Shale Image Analysis
- Clustering the Cross-Well Seismic Traces

Project #1: Production Forecasting
Data contains the following 13 columns for 3 wells:

DATE
WELL_ID
ON_HRS
DOWN_PRES
DOWN_TEMP
PROD_TUBING
CHOKE_SIZE
WHPres
WHTemp
PROD_CHOKE_SIZE
OIL_VOL
GAS_VOL
WAT_VOL

Task Description:
Forecast oil, gas, and water rates T days into the future as a function of the desired choke size and flow period over the next T days, given the historical trend of oil, gas, and water rates, downhole pressure, downhole temperature, wellhead pressure, flow period, and choke size over the past N days.

Controllable operational features: flow period and choke size
Response features: Rates, pressures, and temperatures
For example, when T = 1 day, there will be 10 features and 3 targets
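
One way to arrive at 10 features and 3 targets for T = 1 is to take the 8 historical quantities from the previous day (assuming N = 1) and add the 2 planned controls for the forecast day. The sketch below builds such a table with pandas for a single well. The data is synthetic, the mapping of column names to quantities (for example, `ON_HRS` as flow period) is an assumption, and `make_supervised` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Synthetic daily records for one well; values are made up for illustration.
rng = np.random.default_rng(0)
n_days = 30
df = pd.DataFrame({
    "DATE": pd.date_range("2020-01-01", periods=n_days, freq="D"),
    "ON_HRS": rng.uniform(20, 24, n_days),
    "DOWN_PRES": rng.uniform(200, 250, n_days),
    "DOWN_TEMP": rng.uniform(100, 110, n_days),
    "CHOKE_SIZE": rng.uniform(30, 60, n_days),
    "WHPres": rng.uniform(30, 50, n_days),
    "OIL_VOL": rng.uniform(500, 1000, n_days),
    "GAS_VOL": rng.uniform(1e4, 2e4, n_days),
    "WAT_VOL": rng.uniform(100, 300, n_days),
})

controls = ["ON_HRS", "CHOKE_SIZE"]  # assumed: flow period and choke size
targets = ["OIL_VOL", "GAS_VOL", "WAT_VOL"]
history = targets + ["DOWN_PRES", "DOWN_TEMP", "WHPres"] + controls


def make_supervised(frame, n_lags=1):
    """Hypothetical helper: lagged history + planned controls -> next-day rates."""
    parts = []
    for lag in range(1, n_lags + 1):
        # Values observed `lag` days before the forecast day.
        parts.append(frame[history].shift(lag).add_suffix(f"_lag{lag}"))
    # Controls the operator plans to apply on the forecast day itself.
    planned = frame[controls].add_suffix("_planned")
    X = pd.concat(parts + [planned], axis=1)
    y = frame[targets]
    keep = X.notna().all(axis=1)  # drop rows without a full history
    return X[keep], y[keep]


X, y = make_supervised(df, n_lags=1)
print(X.shape, y.shape)
```

With data for several wells, the lagging would need to be applied per `WELL_ID` (for example with `groupby`) so that one well's history does not leak into another's.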

Key Takeaways:

Time-Series cross validation
AdaBoost vs. Gradient Boosting
Hyperparameter tuning of neural networks
Model export and model deployment
Pipelines
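
Several of these takeaways can be combined in a few lines. The following is a minimal sketch, on synthetic data, of time-series cross-validation with sklearn's `TimeSeriesSplit`, comparing AdaBoost and gradient boosting inside a pipeline. A single target is used because these two regressors predict one output at a time; the data and settings are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, time-ordered rows (row i is observed before row i + 1).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=n)

# Each fold trains on earlier rows and validates on later rows only.
cv = TimeSeriesSplit(n_splits=5)
splits = list(cv.split(X))
for train_idx, test_idx in splits:
    print(f"train rows 0-{train_idx.max()}, test rows {test_idx.min()}-{test_idx.max()}")

models = {
    "AdaBoost": AdaBoostRegressor(n_estimators=50, random_state=0),
    "Gradient boosting": GradientBoostingRegressor(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    # The pipeline refits the scaler on each training fold, avoiding leakage.
    pipe = make_pipeline(StandardScaler(), model)
    scores[name] = cross_val_score(pipe, X, y, cv=cv, scoring="r2").mean()
    print(f"{name}: mean R2 = {scores[name]:.3f}")
```

Unlike shuffled k-fold splitting, this scheme never validates on data that precedes the training window, which matches how a forecast would be used in practice.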

Project #2: Shale Image Analysis
High-resolution microscopy images of shale samples

Task Description:

Train a classifier to locate matrix, pores, kerogen, and pyrite in multiple images
Perform image compression using clustering
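
Image compression by clustering can be sketched as intensity quantization: cluster the pixel values with k-means, then replace each pixel with its cluster center. The example below assumes a synthetic grayscale array in place of a real microscopy image, and the choice of 4 clusters is illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 32x32 grayscale image standing in for a microscopy image.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32)).astype(np.uint8)

# One row per pixel, one column for the gray level.
pixels = image.reshape(-1, 1).astype(float)

# Group gray levels into 4 clusters (illustrative choice).
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(pixels)

# Replace every pixel with the center of the cluster it belongs to.
compressed = kmeans.cluster_centers_[kmeans.labels_].reshape(image.shape)
compressed = compressed.round().astype(np.uint8)

print("gray levels before:", np.unique(image).size)
print("gray levels after:", np.unique(compressed).size)
```

The compressed image needs only a small lookup table of cluster centers plus one cluster label per pixel, instead of a full 8-bit value per pixel.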

Key Takeaways:

Image analytics packages: cv2 and skimage
Filters and feature extraction
Image compression
Feature importance
Classification model evaluation

Project #3: Clustering the Cross-well Seismic Traces
Cross-well seismic imaging of a CO₂ storage reservoir

Task Description:

Perform feature extraction by applying Sobel, Hessian, Difference of Gaussians, Local Binary Pattern, and wavelet transform filters
Use clustering to identify regions of distinct CO₂ distribution and content

Key Takeaways:

PIL and skimage packages
Filters and feature extraction
Optimum cluster numbers
Silhouette method
Davies-Bouldin index and Calinski-Harabasz criterion
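
The three cluster-validity measures named above are all available in sklearn. The following is a minimal sketch that scans candidate cluster counts on synthetic blobs; the data, the range of k, and the use of k-means are assumptions for illustration, not the project's seismic features.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Three well-separated synthetic blobs standing in for extracted features.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [0, 10]],
    cluster_std=0.5,
    random_state=0,
)

results = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    results[k] = {
        "silhouette": silhouette_score(X, labels),                # higher is better
        "davies_bouldin": davies_bouldin_score(X, labels),        # lower is better
        "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
    }
    print(k, {name: round(value, 3) for name, value in results[k].items()})

# Pick the cluster count with the highest silhouette score.
best_k = max(results, key=lambda k: results[k]["silhouette"])
print("best k by silhouette:", best_k)
```

On real data the three criteria do not always agree, so it is common to inspect all of them together rather than rely on one.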

In-Person

Length: 2 Days

Upcoming Events

Check back periodically for updated Public and Live Online course dates!

In-House

This course is also available as a private, onsite course upon request. Contact us for details and pricing.

Instructor

Siddharth Misra, PhD
