AI/Machine learning in fluid mechanics, engineering, physics

Code Examples: AI-Based Sampling Approaches (Cookbook Part 2)
Avoid needing a huge (simulation) dataset to train your AI/ML model

Justin Hodges, PhD
Apr 17, 2025


Let’s cut to the chase: our simulations are expensive, so if we want to train a machine learning model on a set of them, we would like to need as few as possible. It would also be good to use some common-sense logic to confidently pick which simulations to run as we build up our dataset, rather than something less ideal like picking samples at random (which gives us little confidence that the model can be trusted when we use it for inference after training).

This post covers a framework for picking which simulations to run (call it ‘sampling’) that is dynamic and more informative than traditional approaches. The sampling decisions themselves are driven by a machine learning model.

We will cover two hands-on (shared) code tutorials and fundamental concepts on this subject.

Key points we’ll cover here:

  • Active Learning Basics: How adaptive sampling (active learning) works and why it can drastically reduce the number of samples needed.

  • Scenarios: Two settings for sample selection:

    1. On-demand generation: We can query any new input (i.e. we can run a new simulation at any parametric point we want, with no limitations).

    2. Pool-based selection: We have a fixed pool of unlabeled candidates (possible design points we have in mind and could simulate, but for which we have no simulation results yet) and must choose which ones to label/simulate. For example, we might have a predefined grid or a dataset of candidate design points but can only afford to simulate some of them. In contrast to the setting above, here we must pick from a fixed set of points. (Both settings are sketched in code right after this list.)

  • Uncertainty Estimation: Strategies for evaluating model uncertainty or expected improvement to decide the next best sample (building from those mentioned in my last post).

  • Hands-on Example #1: A step-by-step Python example using scikit-learn. We will walk through initial model training, the active learning loop, and a final comparison to traditional methods (a minimal version of such a loop is sketched just after this list).

  • Hands-on Example #2: A step-by-step Python example using a popular dedicated library for active learning/adaptive sampling.

  • Model Choices: Discussion of appropriate models for active learning (Gaussian Processes, ensembles, Bayesian neural nets, …) and using a neural network as the final surrogate model.
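
To make the pool-based setting and the active learning loop concrete before the full walkthrough, here is a minimal sketch using a Gaussian Process in scikit-learn. The toy 1D function, the bounds, and the variable names are placeholders I chose for this sketch rather than the actual setup used in the tutorials below; the idea is simply: fit a surrogate, find the candidate the model is least certain about, simulate it, and repeat.

```python
# Illustrative sketch: pool-based active learning with a GP surrogate.
# The "simulation" is a cheap toy function purely for demonstration.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def expensive_simulation(x):
    # Stand-in for a CFD/FEA run: a cheap 1D function for illustration.
    return np.sin(3.0 * x) + 0.1 * x**2

rng = np.random.default_rng(0)

# Fixed pool of candidate design points we *could* simulate (scenario 2).
X_pool = np.linspace(0.0, 5.0, 200).reshape(-1, 1)

# Start from a handful of labeled (already simulated) points.
labeled_idx = list(rng.choice(len(X_pool), size=4, replace=False))
X_train = X_pool[labeled_idx]
y_train = expensive_simulation(X_train).ravel()

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)

n_queries = 10  # simulation budget for the active learning loop
for _ in range(n_queries):
    gp.fit(X_train, y_train)

    # Predictive standard deviation = model uncertainty at each candidate.
    _, std = gp.predict(X_pool, return_std=True)
    std[labeled_idx] = -np.inf          # never re-query an already simulated point

    next_idx = int(np.argmax(std))      # uncertainty sampling: most uncertain point
    x_next = X_pool[next_idx].reshape(1, -1)
    y_next = expensive_simulation(x_next).ravel()   # "run" the new simulation

    labeled_idx.append(next_idx)
    X_train = np.vstack([X_train, x_next])
    y_train = np.concatenate([y_train, y_next])

print(f"Labeled {len(X_train)} of {len(X_pool)} candidates")
```

Compared to random sampling with the same budget, this loop concentrates its simulations where the surrogate is still uncertain, which is exactly what lets us get away with fewer runs.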
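The on-demand setting differs only in how the next point is proposed: there is no fixed pool, so we search the continuous design space for the most uncertain input and launch a brand-new simulation there. A minimal sketch (again with placeholder names and bounds of my own choosing), using scipy.optimize to maximize the GP's predictive standard deviation:

```python
# Illustrative sketch of the on-demand setting (scenario 1): propose the
# next simulation by maximizing model uncertainty over a continuous domain.
import numpy as np
from scipy.optimize import minimize

def propose_next_point(gp, bounds, n_restarts=10, rng=None):
    """Return the 1D input within `bounds` where the GP predictive std is largest."""
    rng = rng or np.random.default_rng(0)
    lo, hi = bounds

    def negative_std(x):
        # Minimizing the negative std = maximizing uncertainty.
        _, std = gp.predict(np.atleast_2d(x), return_std=True)
        return -float(std[0])

    best_x, best_val = None, np.inf
    for x0 in rng.uniform(lo, hi, size=n_restarts):
        res = minimize(negative_std, x0=[x0], bounds=[(lo, hi)])
        if res.fun < best_val:
            best_x, best_val = res.x, res.fun
    return best_x.reshape(1, -1)

# Usage (continuing from the previous sketch): after each gp.fit(...),
#   x_next = propose_next_point(gp, bounds=(0.0, 5.0))
#   y_next = expensive_simulation(x_next).ravel()
```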

Here’s the table of contents for this blog:

  1. Choosing the Next Sample: Uncertainty and Query Strategies

  2. Example: Sine Wave

  3. Example: Tutorial from an adaptive sampling library

This is a follow-up on one item mentioned in this succinct step-by-step guide:

Cookbook: Making Small Dataset Projects Successful (CFD, FEA, …)

Justin Hodges, PhD · Apr 8

This post is for paid subscribers
