Physics-Informed Trick for ML Surrogates
Automotive Aerodynamics Code Tutorial
This tutorial applies the “AdaField” paradigm to the DrivAerML dataset with the Transolver++ model. It explains the approach, shares the code and dataset, and walks through why the idea is useful.
1. The Method/Publication
The idea comes from AdaField: Generalizable Surface Pressure Modeling with Physics-Informed Pre-training and Flow-Conditioned Adaptation.
At a high level, AdaField asks: can we train a neural surrogate on abundant aerodynamic data, then make it useful in regimes where CFD data is scarce? This matters because ML surrogate builders often have the same pain: one dataset is large, polished, and public, while the actual design problem has only a few expensive CFD cases.
AdaField has three important ingredients:
SAPT: a point-cloud backbone for predicting surface pressure fields.
FCA: Flow-Conditioned Adapters, which inject flow variables like velocity, wind, Mach number, or angle of attack into the model through small trainable modules.
PIDA: Physics-Informed Data Augmentation, the part we used most directly in the notebook.
PIDA is the special trick. For steady incompressible flow, AdaField uses a similarity argument:
geometry points: x -> x / c
velocity: v -> c * v
pressure coeff: Cp stays the same
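Here c is a scale factor. If c = 2, the object becomes half as large and the velocity doubles, so the product of length and velocity is unchanged. For a fixed fluid, that means the Reynolds number is unchanged too, which is why the pressure coefficient Cp should remain invariant under the transformation, at least under the assumptions of the method.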
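The scaling argument above can be checked with a few lines of arithmetic. The sketch below is not from the paper or the notebook: the density, viscosity, car length, and inflow speed are illustrative values, and it assumes the fluid properties stay fixed. It shows that the Reynolds number is the same before and after the transformation, while the dimensional pressure implied by a fixed Cp grows by c squared.

```python
import numpy as np

RHO = 1.2     # air density in kg/m^3 (illustrative assumption)
NU = 1.5e-5   # kinematic viscosity in m^2/s (illustrative assumption)

def reynolds(velocity, length, nu=NU):
    """Reynolds number Re = v * L / nu."""
    return velocity * length / nu

def gauge_pressure(cp, velocity, rho=RHO):
    """Dimensional pressure difference p - p_inf implied by a given Cp."""
    return cp * 0.5 * rho * velocity**2

c = 2.0
length, velocity = 4.6, 38.9          # hypothetical car length (m) and inflow speed (m/s)
cp = np.array([1.0, -0.5, 0.1])       # a few made-up surface Cp values

# x -> x / c shrinks every length; v -> c * v speeds up the inflow.
re_before = reynolds(velocity, length)
re_after = reynolds(c * velocity, length / c)

# Cp is held fixed, so the dimensional pressure scales with velocity squared.
p_before = gauge_pressure(cp, velocity)
p_after = gauge_pressure(cp, c * velocity)

print(re_before, re_after)
print(p_after / p_before)
```

This is also why the augmentation is applied to Cp rather than to raw pressure: a model trained on dimensional pressure would need its targets rescaled for every c.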
The value proposition for ML surrogate people is simple: you can expose a model to physically meaningful scale/velocity variation without running new CFD. That can improve robustness, out-of-distribution behavior, and transfer to related design regimes.
2. Applying It to DrivAerML in This Demo
In the notebook, we did not reimplement the full AdaField pipeline. Instead, we used the PIDA concept with a Transolver++ backbone.
That means:
AdaField paper: SAPT + FCA + PIDA
Our demo: Transolver++ + PIDA-style augmentation
We used real DrivAerML boundary-surface CFD data from Hugging Face.
A few adaptations were made:
We use DrivAerML boundary_i.vtp files instead of DrivAerNet++ .vtk pressure zips.
We preprocess each large surface file into compact .npz point samples.
We train on sampled surface points (128k) rather than the full multi-million-point mesh.
We apply PIDA to the coordinates and the velocity condition, while keeping Cp unchanged.
We use a geometry-holdout split: about 50 central/typical cases for training and 5 unusual/diverse cases for blind evaluation.
We use Transolver++ as the neural solver, not AdaField’s SAPT architecture.
So this is best understood as a method transfer demo: can AdaField’s physics-informed augmentation idea improve a different surrogate model?
3. What the Code Is Doing
The notebook has four big stages.
First, it downloads real DrivAerML data:
run_i/boundary_i.vtp
run_i/geo_parameters_i.csv
run_i/force_mom_i.csv
Second, it preprocesses the large VTP files. It reads each surface mesh, detects a pressure or Cp-like scalar field, samples a fixed number of points, centers the geometry, and saves compact .npz files.
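The preprocessing step can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: it assumes the VTP file has already been read into NumPy arrays (the reader is omitted), the field-name hints and the names find_pressure_field and sample_and_center are hypothetical, and centering on the mean of the sampled points is one of several reasonable choices.

```python
import os
import tempfile
import numpy as np

# Hypothetical substrings used to spot a pressure-like field by name.
PRESSURE_HINTS = ("cp", "pressure", "pmean")

def find_pressure_field(arrays):
    """Return the first array name that looks like a pressure or Cp field."""
    for name in arrays:
        if any(hint in name.lower() for hint in PRESSURE_HINTS):
            return name
    raise KeyError("no pressure-like field found")

def sample_and_center(points, values, n_samples, rng):
    """Draw a fixed number of surface points and center them on their mean."""
    replace = len(points) < n_samples   # only resample with replacement if needed
    idx = rng.choice(len(points), size=n_samples, replace=replace)
    sampled = points[idx]
    centered = sampled - sampled.mean(axis=0)
    return centered.astype(np.float32), values[idx].astype(np.float32)

# Synthetic stand-in for one large surface mesh.
rng = np.random.default_rng(0)
points = rng.uniform(-2.0, 2.0, size=(50_000, 3))
arrays = {
    "wallShearStress": rng.normal(size=50_000),
    "pMeanTrim": rng.normal(size=50_000),
}

field = find_pressure_field(arrays)
pos, cp = sample_and_center(points, arrays[field], n_samples=2048, rng=rng)

# Save the compact sample and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "run_1.npz")
    np.savez_compressed(path, pos=pos, cp=cp)
    with np.load(path) as data:
        pos_loaded, cp_loaded = data["pos"], data["cp"]

print(field, pos_loaded.shape, cp_loaded.shape)
```

Storing float32 samples in compressed .npz files keeps each case small enough to load quickly during training.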
Third, it builds the train/eval split. The preferred split uses geo_parameters_i.csv to choose blind evaluation cases that are far from the average geometry and spread out from one another. This makes the evaluation a test of generalization rather than interpolation between randomly held-out neighbors.
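One way to pick evaluation cases that are both unusual and diverse is a greedy farthest-point selection in standardized parameter space. The sketch below is an assumption about how such a split could be built, not the notebook's exact rule. The function name holdout_split and the random feature table are hypothetical stand-ins for the values read from the geometry parameter files.

```python
import numpy as np

def holdout_split(features, n_eval):
    """Pick n_eval rows far from the mean geometry and far from each other."""
    std = features.std(axis=0)
    z = (features - features.mean(axis=0)) / np.where(std > 0, std, 1.0)

    # Distance of each case to its nearest "anchor". The first anchor is the
    # average geometry, which sits at the origin after standardization.
    min_dist = np.linalg.norm(z, axis=1)

    eval_idx = []
    for _ in range(n_eval):
        pick = int(np.argmax(min_dist))            # farthest from all anchors so far
        eval_idx.append(pick)
        dist_to_pick = np.linalg.norm(z - z[pick], axis=1)
        min_dist = np.minimum(min_dist, dist_to_pick)

    chosen = set(eval_idx)
    train_idx = [i for i in range(len(features)) if i not in chosen]
    return train_idx, eval_idx

# Hypothetical table: 55 cases with 8 geometry parameters each.
rng = np.random.default_rng(0)
features = rng.normal(size=(55, 8))

train_idx, eval_idx = holdout_split(features, n_eval=5)
print(len(train_idx), len(eval_idx), eval_idx)
```

Because each pick becomes a new anchor, later picks are pushed away from earlier ones, which spreads the blind cases across the design space instead of clustering them in one corner.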
Fourth, it trains two models:
Baseline: original coordinates and velocity
PIDA model: random scale factor c during training
During PIDA training, each sample is transformed as follows:
pos = pos / c
velocity = velocity * c
Cp = unchanged
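A per-sample version of this transformation might look like the sketch below. The sampling range for c and the log-uniform distribution are assumptions, since the text does not state them; sample_scale and pida_augment are hypothetical names, and the sample is a plain dictionary rather than the notebook's actual data structure.

```python
import numpy as np

def sample_scale(rng, c_min=0.5, c_max=2.0):
    """Draw c log-uniformly so that shrinking and enlarging are equally likely."""
    return float(np.exp(rng.uniform(np.log(c_min), np.log(c_max))))

def pida_augment(sample, rng, c_min=0.5, c_max=2.0):
    """Return a rescaled copy of one sample: pos / c, velocity * c, Cp unchanged."""
    c = sample_scale(rng, c_min, c_max)
    return {
        "pos": sample["pos"] / c,
        "velocity": sample["velocity"] * c,
        "cp": sample["cp"],      # the target is left untouched
        "c": c,                  # kept only for inspection
    }

rng = np.random.default_rng(0)
sample = {
    "pos": rng.uniform(-1.0, 1.0, size=(1024, 3)),   # made-up surface points
    "velocity": 38.9,                                 # illustrative inflow speed in m/s
    "cp": rng.uniform(-1.0, 1.0, size=1024),          # made-up Cp targets
}

augmented = pida_augment(sample, rng)
print(augmented["c"], augmented["velocity"])
```

Drawing a fresh c each time a sample is loaded means the model rarely sees the same scale twice, without storing any extra data on disk.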
The eval robustness sweep then tests many values of c to see whether the model learned the invariance.
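The sweep itself can be sketched as a loop over scale factors that rescales the inputs and compares predictions against the unchanged Cp. The two toy models below are stand-ins invented for illustration, not Transolver++: one depends only on the product of position and velocity, so it is invariant by construction, and the other ignores velocity, so it is not. The relative L2 metric is also an assumption.

```python
import numpy as np

def relative_l2(pred, target):
    """Relative L2 error between predicted and reference Cp."""
    return float(np.linalg.norm(pred - target) / np.linalg.norm(target))

def robustness_sweep(predict, pos, velocity, cp, scales):
    """Evaluate a model on rescaled inputs; Cp is the same target for every c."""
    return {float(c): relative_l2(predict(pos / c, velocity * c), cp) for c in scales}

# Toy stand-ins for trained models (hypothetical, for illustration only).
def invariant_model(pos, velocity):
    return np.sin(pos[:, 0] * velocity)      # depends only on pos * velocity

def scale_sensitive_model(pos, velocity):
    return np.sin(pos[:, 0] * 30.0)          # ignores the velocity condition

rng = np.random.default_rng(0)
pos = rng.uniform(-1.0, 1.0, size=(2000, 3))
velocity = 30.0
cp = np.sin(pos[:, 0] * velocity)            # synthetic "ground truth" at c = 1

scales = [0.5, 1.0, 1.5, 2.0]
invariant_errors = robustness_sweep(invariant_model, pos, velocity, cp, scales)
sensitive_errors = robustness_sweep(scale_sensitive_model, pos, velocity, cp, scales)

for c in scales:
    print(c, invariant_errors[c], sensitive_errors[c])
```

A model that has learned the invariance should show a flat error curve across c, while a model that has not will typically be accurate only near c = 1.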
Here’s the code notebook. You can run it without paying for any compute resources. Training should take about 30 minutes, and the loss curve should converge cleanly. If your GPU has 40 GB of VRAM, halve the batch size.


