Physical AI: robots, scientific computing, and digital twins

Physics-Informed Trick for ML Surrogates

Automotive Aerodynamics Code Tutorial

Justin Hodges, PhD
Jun 04, 2026

This code tutorial applies the “AdaField” paradigm to the DrivAerML dataset using the Transolver++ model. It explains the approach, shares the code and dataset, and walks through how the idea works in practice.

1. The Method/Publication

The idea comes from AdaField: Generalizable Surface Pressure Modeling with Physics-Informed Pre-training and Flow-Conditioned Adaptation.

At a high level, AdaField asks: can we train a neural surrogate on abundant aerodynamic data, then make it useful in regimes where CFD data is scarce? This matters because ML surrogate builders often have the same pain: one dataset is large, polished, and public, while the actual design problem has only a few expensive CFD cases.

AdaField has three important ingredients:

SAPT: a point-cloud backbone for predicting surface pressure fields.

FCA: Flow-Conditioned Adapters, which inject flow variables like velocity, wind, Mach number, or angle of attack into the model through small trainable modules.

PIDA: Physics-Informed Data Augmentation, the part we used most directly in the notebook.

PIDA is the special trick. For steady incompressible flow, AdaField uses a similarity argument:

geometry points: x -> x / c
velocity: v -> c * v
pressure coeff: Cp stays the same

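So c is a scale factor. If c = 2, the object becomes half as large and the velocity doubles, so the product of velocity and length (and with it the Reynolds number, for a fixed fluid) is unchanged. The pressure coefficient Cp should therefore remain invariant under that transformation, at least under the assumptions of the method.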
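The similarity argument can be checked with a few lines of arithmetic. The sketch below is only an illustration: the velocity, length, viscosity, density, and gauge pressure values are made-up assumptions (not taken from DrivAerML or the AdaField paper), and it assumes that under dynamic similarity the gauge pressure scales with the dynamic pressure, i.e. with c squared.

```python
def reynolds_number(velocity, length, kinematic_viscosity):
    """Re = v * L / nu for freestream speed v and characteristic length L."""
    return velocity * length / kinematic_viscosity


def pressure_coefficient(gauge_pressure, density, velocity):
    """Cp = (p - p_inf) / (0.5 * rho * v^2), with gauge_pressure = p - p_inf."""
    return gauge_pressure / (0.5 * density * velocity ** 2)


# Illustrative values only (assumptions, not from the dataset).
v, length = 40.0, 4.0      # freestream speed [m/s], car length [m]
nu, rho = 1.5e-5, 1.2      # kinematic viscosity [m^2/s], density [kg/m^3]
gauge_p = -300.0           # gauge pressure at one surface point [Pa]

for c in (0.5, 1.0, 2.0):
    # PIDA transformation: length -> length / c, velocity -> c * velocity.
    re_c = reynolds_number(c * v, length / c, nu)
    # Assumption: gauge pressure scales like dynamic pressure, i.e. by c**2.
    cp_c = pressure_coefficient(c ** 2 * gauge_p, rho, c * v)
    print(f"c={c}: Re={re_c:.3e}, Cp={cp_c:.4f}")
```

Every row should report the same Reynolds number and the same Cp, which is the invariance PIDA relies on.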

The value proposition for ML surrogate people is simple: you can expose a model to physically meaningful scale/velocity variation without running new CFD. That can improve robustness, out-of-distribution behavior, and transfer to related design regimes.

2. Applying It to DrivAerML in This Demo

In the notebook, we did not reimplement full AdaField. Instead, we used the PIDA concept with a Transolver++ backbone.

That means:

AdaField paper: SAPT + FCA + PIDA
Our demo: Transolver++ + PIDA-style augmentation

We used real DrivAerML boundary-surface CFD data from Hugging Face.

A few adaptations were made:

  • We use DrivAerML boundary_i.vtp files instead of DrivAerNet++ .vtk pressure zips.

  • We preprocess each large surface file into compact .npz point samples.

  • We train on sampled surface points (128k) rather than the full multi-million-point mesh.

  • We apply PIDA to coordinates and velocity condition, while keeping Cp unchanged.

  • We use a geometry-holdout split: about 50 central/typical cases for training and 5 unusual/diverse cases for blind evaluation.

  • We use Transolver++ as the neural solver, not AdaField’s SAPT architecture.

So this is best understood as a method transfer demo: can AdaField’s physics-informed augmentation idea improve a different surrogate model?

3. What the Code Is Doing

The notebook has four big stages.

First, it downloads real DrivAerML data:

run_i/boundary_i.vtp
run_i/geo_parameters_i.csv
run_i/force_mom_i.csv

Second, it preprocesses the large VTP files. It reads each surface mesh, detects a pressure or Cp-like scalar field, samples a fixed number of points, centers the geometry, and saves compact .npz files.
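The sampling, centering, and saving steps might look roughly like the sketch below. It is not the notebook's code: it skips the VTP reading and field detection entirely and starts from arrays that are already in memory, the function name `sample_and_center` and the `.npz` keys `pos` and `cp` are hypothetical, and centering on the centroid of the sampled points is an assumption (a bounding-box center would be another reasonable choice).

```python
import os
import tempfile

import numpy as np


def sample_and_center(points, cp, n_samples, rng):
    """Randomly pick n_samples surface points and center them on their centroid."""
    n_points = points.shape[0]
    # Sample with replacement only if the mesh has fewer points than requested.
    idx = rng.choice(n_points, size=n_samples, replace=n_points < n_samples)
    pos = points[idx].astype(np.float64)
    pos = pos - pos.mean(axis=0, keepdims=True)
    return pos.astype(np.float32), cp[idx].astype(np.float32)


# Synthetic stand-in for one surface mesh (random data, not DrivAerML).
rng = np.random.default_rng(0)
points = rng.uniform([-1.0, -1.0, 0.0], [4.0, 1.0, 1.5], size=(50_000, 3))
cp = rng.normal(size=50_000)

pos_s, cp_s = sample_and_center(points, cp, n_samples=2_000, rng=rng)

# Save a compact file per case and read it back to confirm the layout.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "run_demo.npz")
    np.savez_compressed(path, pos=pos_s, cp=cp_s)
    with np.load(path) as data:
        loaded_shapes = (data["pos"].shape, data["cp"].shape)

print(loaded_shapes)
```

Storing float32 arrays in compressed `.npz` files keeps each case small enough to load quickly during training, which is the point of this stage.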

Third, it builds the train/eval split. The preferred split uses geo_parameters_i.csv to choose blind evaluation cases that are far from the average geometry and spread out from one another. This makes the evaluation a test of generalization rather than of interpolation between randomly held-out neighbors.
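One way to pick holdout cases that are both unusual and mutually diverse is a greedy selection on standardized geometry parameters. The sketch below is a guess at how such a split could be built, not the notebook's actual rule: the function name `geometry_holdout_split` is hypothetical, the scoring heuristic (distance to the mean plus distance to the nearest already-chosen case) is an assumption, and the parameters are random stand-ins for the contents of geo_parameters_i.csv.

```python
import numpy as np


def geometry_holdout_split(params, n_eval):
    """Greedily choose n_eval cases that are far from the mean and from each other."""
    # Standardize so that every geometry parameter contributes comparably.
    z = (params - params.mean(axis=0)) / (params.std(axis=0) + 1e-12)
    dist_to_mean = np.linalg.norm(z, axis=1)

    # Start with the single most unusual case.
    chosen = [int(np.argmax(dist_to_mean))]
    min_dist = np.linalg.norm(z - z[chosen[0]], axis=1)

    while len(chosen) < n_eval:
        # Heuristic score (assumption): unusual AND far from already-chosen cases.
        score = dist_to_mean + min_dist
        score[chosen] = -np.inf
        nxt = int(np.argmax(score))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(z - z[nxt], axis=1))

    eval_idx = np.array(sorted(chosen))
    train_idx = np.setdiff1d(np.arange(len(params)), eval_idx)
    return train_idx, eval_idx


# Random stand-in for 55 cases with 6 geometry parameters each.
rng = np.random.default_rng(0)
params = rng.normal(size=(55, 6))
train_idx, eval_idx = geometry_holdout_split(params, n_eval=5)
print(len(train_idx), len(eval_idx))
```

With 55 cases and `n_eval=5`, this yields the 50/5 proportions described above.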

Fourth, it trains two models:

Baseline: original coordinates and velocity
PIDA model: random scale factor c during training

During PIDA training, each sample is transformed like:

pos = pos / c
velocity = velocity * c
Cp = unchanged
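As a minimal sketch, the per-sample transformation could be wrapped in a function like the one below. The name `pida_augment`, the range of c (0.5 to 2.0), and the log-uniform sampling are assumptions chosen for illustration; the text above does not say which range or distribution the notebook uses.

```python
import numpy as np


def pida_augment(pos, velocity, cp, rng, c_min=0.5, c_max=2.0):
    """Apply a PIDA-style scaling with one random factor c per sample.

    pos:      (N, 3) surface point coordinates
    velocity: scalar inflow velocity used as the conditioning input
    cp:       (N,) pressure coefficient targets (left untouched)
    """
    # Log-uniform sampling (assumption) makes shrinking and growing equally likely.
    c = float(np.exp(rng.uniform(np.log(c_min), np.log(c_max))))
    return pos / c, velocity * c, cp, c


rng = np.random.default_rng(0)
pos = rng.normal(size=(8, 3))
cp = rng.normal(size=8)
velocity = 40.0

pos_aug, vel_aug, cp_aug, c = pida_augment(pos, velocity, cp, rng)
print(f"c={c:.3f}, velocity {velocity} -> {vel_aug:.2f}")
```

Drawing a fresh c each time a sample is loaded means the model sees the same case at many scales over the course of training, while the baseline model would simply skip this call.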

The eval robustness sweep then tests many values of c to see whether the model learned the invariance.
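A sweep of this kind can be sketched as follows. The `predict` callable stands in for a trained model and is hypothetical, as are the two toy models used to exercise it; relative L2 error is an assumed metric, since the text above does not name the one used in the notebook.

```python
import numpy as np


def relative_l2(pred, target):
    """Relative L2 error between predicted and reference Cp."""
    return float(np.linalg.norm(pred - target) / (np.linalg.norm(target) + 1e-12))


def scale_sweep(predict, pos, velocity, cp, scales):
    """Evaluate predict(pos / c, velocity * c) against the unchanged Cp for each c."""
    return {float(c): relative_l2(predict(pos / c, velocity * c), cp) for c in scales}


# Toy stand-ins for trained models (assumptions, for illustration only).
def invariant_model(p, v):
    # Depends only on the product pos * velocity, which the scaling preserves.
    return np.tanh(p[:, 0] * v / 40.0)


def naive_model(p, v):
    # Ignores velocity, so it cannot compensate for rescaled coordinates.
    return np.tanh(p[:, 0])


rng = np.random.default_rng(0)
pos = rng.uniform(-1.0, 1.0, size=(512, 3))
velocity = 40.0
cp = invariant_model(pos, velocity)

scales = [0.5, 0.75, 1.0, 1.5, 2.0]
invariant_errors = scale_sweep(invariant_model, pos, velocity, cp, scales)
naive_errors = scale_sweep(naive_model, pos, velocity, cp, scales)

for c in scales:
    print(f"c={c}: invariant={invariant_errors[c]:.2e}, naive={naive_errors[c]:.2e}")
```

A model that has learned the invariance should show a flat error curve across c, while one that has not should degrade as c moves away from 1.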

Here’s the code notebook. You can run it here without paying for any compute resources. It should train in about 30 minutes with an excellent loss curve. Halve the batch size if your GPU has only 40GB of VRAM.
