Top AI/ML Datasets for Fluid Dynamics (CFD, RANS, More)

Download this goldmine

Oct 24, 2025

Another post dedicated to rich, relevant datasets for anyone developing or using machine learning models in (computational) fluid dynamics. Hope you enjoy. These are good ones!

“CFDBench is the first large-scale benchmark for evaluating machine learning methods in fluid dynamics with varied boundary conditions (BCs), physical properties, and domain geometries. It consists of four classic problems in computational fluid dynamics (CFD), with many varying operating parameters, making it perfect for testing the inference-time generalization ability of surrogate models. Such generalizability is essential for avoiding expensive re-training when applying surrogate models to new problems.”
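To make "inference-time generalization" concrete, here is a minimal sketch of a held-out split over an operating parameter: the surrogate is trained on some parameter values and evaluated only on values it never saw. The case records, the `inlet_velocity` and `viscosity` fields, and the `split_by_parameter` helper are hypothetical illustrations, not CFDBench's actual data format or API.

```python
def split_by_parameter(cases, key, held_out_values):
    """Split cases so that held-out parameter values never appear in training."""
    held_out = set(held_out_values)
    train = [c for c in cases if c[key] not in held_out]
    test = [c for c in cases if c[key] in held_out]
    return train, test


# Hypothetical case records: every combination of two operating parameters.
cases = [
    {"case_id": i, "inlet_velocity": u, "viscosity": nu}
    for i, (u, nu) in enumerate(
        (u, nu) for u in (0.5, 1.0, 1.5, 2.0) for nu in (1e-3, 1e-2)
    )
]

# Hold out the largest inlet velocity to test extrapolation to an unseen BC.
train, test = split_by_parameter(cases, "inlet_velocity", held_out_values=[2.0])
print(len(train), "training cases;", len(test), "held-out cases")
```

Splitting by parameter value rather than at random is what separates a generalization test from plain interpolation between near-identical cases.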


Benchmark dataset for machine learning in RANS turbulence modelling

“The field of ML augmented RANS modelling has seen significant interest for at least a decade. Many methodologies have been proposed. However, a critical problem slowing progress in the field is the absence of an open-source benchmark dataset with clear evaluation criteria. In order to compare a new technique against an existing technique, significant effort is required. We aim to eliminate this required effort and greatly accelerate progress in the field by implementing a benchmark dataset for ML in RANS.
Our goal is to create a challenging dataset that represents the actual state of ML-augmented RANS turbulence modelling. We aim to propose challenging generalization tasks, with the goal that over time, techniques which generalize better will rise to the top of the leaderboard. We do not want to cast the field in an overly optimistic light; we want to provide a hard challenge that will motivate new ideas in the field.”

The benchmark includes the following flow cases:

Periodic hills (29 parametric variations)

LES: 2-D Periodic hills Re=10595

Square and rectangular duct

Curved backward-facing step

NASA Wall-mounted hump

2D NASA wall-mounted hump (with plenum) BCs

AirfRANS: High Fidelity Computational Fluid Dynamics Dataset for Approximating Reynolds-Averaged Navier–Stokes Solutions

The dataset covers airfoils from the NACA 4- and 5-digit series in a subsonic flight regime. According to the documentation, it has the following features:

1000 simulations
Reynolds number between 2 and 6 million
Angles of attack between -5° and 15°
Airfoils drawn from the NACA 4- and 5-digit series
Four machine learning tasks representing different challenges.
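To get a feel for what those ranges mean physically, the sketch below converts a Reynolds number and angle of attack into a freestream velocity vector using Re = U c / ν. The unit chord length and the kinematic viscosity of 1.5e-5 m²/s (roughly air near room temperature) are assumptions made for illustration; they are not taken from the AirfRANS documentation.

```python
import math


def freestream_velocity(reynolds, chord=1.0, kinematic_viscosity=1.5e-5):
    """Freestream speed U from Re = U * c / nu (chord and nu are assumed values)."""
    return reynolds * kinematic_viscosity / chord


def velocity_components(speed, angle_of_attack_deg):
    """Resolve the freestream speed into x and y components for a given angle of attack."""
    alpha = math.radians(angle_of_attack_deg)
    return speed * math.cos(alpha), speed * math.sin(alpha)


# The two corners of the operating envelope listed above.
for reynolds, aoa in [(2e6, -5.0), (6e6, 15.0)]:
    speed = freestream_velocity(reynolds)
    ux, uy = velocity_components(speed, aoa)
    print(f"Re={reynolds:.0e}, AoA={aoa:+.0f} deg -> U={speed:.1f} m/s, (ux, uy)=({ux:.1f}, {uy:.1f})")
```

Under these assumed values the speeds stay well below the speed of sound, which is consistent with the subsonic regime described above.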

Multi-agent deep reinforcement learning for turbulent drag reduction in channel flows

“The code in this repository introduces a multi-agent reinforcement learning environment to design and benchmark control strategies aimed at reducing drag in turbulent open channel flow. The control is applied in the form of blowing and suction at the wall, while the observable state is configurable, allowing to choose different variables such as velocity and pressure, in different locations of the domain. The case is proposed as a benchmark for testing data-driven control strategies in three-dimensional turbulent wall-bounded flows. We provide a functional policy that implements opposition control, a state-of-the-art turbulence-control strategy from the literature, and a control policy obtained using deep deterministic policy gradient (DDPG). More details about the implementation and the results from the training of the agents are available in “Deep reinforcement learning for turbulent drag reduction in channel flows”, L. Guastoni, J. Rabault, P. Schlatter, H. Azizpour, R. Vinuesa (2023)”
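For readers unfamiliar with opposition control, the idea is to blow or suck at the wall with a velocity opposite to the wall-normal velocity sensed on a plane slightly above it. The following is a minimal sketch of that rule in plain Python. It is not the repository's implementation: the `opposition_control` function, the `gain` parameter, and the mean subtraction (one common way to enforce zero net mass flux through the wall) are illustrative assumptions.

```python
def opposition_control(v_detect, gain=1.0):
    """Return wall blowing/suction that opposes the sensed wall-normal velocity.

    v_detect: 2D list of wall-normal velocity sampled on a detection plane
              above the wall, indexed [streamwise][spanwise].
    gain:     amplitude factor applied to the opposing velocity (assumed parameter).
    """
    n_points = sum(len(row) for row in v_detect)
    mean = sum(sum(row) for row in v_detect) / n_points
    # Subtract the plane average so the actuation adds no net mass to the channel.
    return [[-gain * (v - mean) for v in row] for row in v_detect]


# Toy detection-plane sample: positive values are motion away from the wall.
sensed = [[1.0, -1.0], [0.5, -0.5]]
actuation = opposition_control(sensed)
print(actuation)
```

A learned policy such as the DDPG agent mentioned in the quote replaces this fixed rule with a mapping from observations to actuation that is trained to reduce drag.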

Reactive Flows (3D turbulent non-premixed jet flame)

“This test case is based on the DNS dataset of a sooting turbulent non-premixed flame with a large thermochemical state-space composed of 47 species in addition to temperature. The reference paper for the dataset is [A.Attili et al. Combustion and Flame 161 (2014) 1849–1865]”

LIPS - Learning Industrial Physical Simulation benchmark suite

“What is LIPS
To drive the above mentioned new research topic towards a better real-world applicability, we propose a new benchmark suite “Learning Industrial Physical Simulations” (LIPS) to meet the need of developing efficient, industrial application-oriented, augmented simulators. The proposed benchmark suite is a modular and configurable framework that can deal with different physical problems. To do so, as it is depicted in the scheme, the LIPS platform is designed to be modular”

CFD datasets, ML for fluid dynamics, RANS turbulence modeling, physics-informed, ML benchmark, aerofoil datasets

Rainbow Roxy

Thanks for writing this, it clarifies a lot. It's so cool how CFDBench really tackles that generalization issue, 'cause who wants constant retraining, right? What if robust benchmarks like this let us simulate entirely new fluid dynamics with near-perfect accuracy? Imagine the new material discovery possibilities!


AI/ML in Engineering, Physics, Aerodynamics
