Reducing File Storage Costs with ML-Based Compression
A literature review - here are my PDFs and mini-summaries
Here is yet another literature review with PDFs/summaries 😊. It got me about 25 papers closer to my goal of reading 1,000 SciML papers in 2025. Speaking of which, the update: 680 / 1,000 papers read with ~2 months left 🪦⚰️
The focus today is primarily ML-driven compression and, secondarily, surrogate modeling techniques.
…Compression?
Physics simulations, especially in industry, often produce enormous volumetric datasets (e.g. 3D fields of pressure, stress, …). These are costly to store, not super fast to sift through to get ‘answers’ to questions (“how does design #12 do on keeping the thermal loads manageable over the full domain?”), and painful to transfer/share in large companies.
Computational Fluid Dynamics (CFD) and Finite Element Analysis (FEA) are the usual culprits, producing 3D fields of velocity, pressure, and stress. Traditional compression techniques (both lossless and lossy) struggle to shrink these high-dimensional fields without sacrificing fidelity. In recent years, machine learning (ML) has emerged as a powerful tool for compressing and reconstructing simulation data, often by building reduced-order models (ROMs) that capture the essential physics on a lower-dimensional manifold. Unlike classical linear methods such as Proper Orthogonal Decomposition (POD) and Dynamic Mode Decomposition (DMD), which often miss nonlinear features, ML-based methods can learn complex nonlinear representations well suited to turbulent flows or material nonlinearity [reference].
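To get a concrete feel for the linear baseline, here is a minimal POD sketch, assuming NumPy: stack snapshots as columns, subtract the mean field, take a truncated SVD, and store only the leading modes plus their per-snapshot coefficients. The travelling-pulse "simulation" and the helper names `pod_compress` / `pod_reconstruct` are hypothetical stand-ins for real CFD/FEA output and tooling.

```python
import numpy as np

def pod_compress(snapshots: np.ndarray, rank: int):
    """Compress an (n_points x n_snapshots) matrix with truncated SVD (POD).

    Storing mean + modes + coeffs instead of the full matrix is the compression.
    """
    mean = snapshots.mean(axis=1, keepdims=True)       # temporal mean field
    U, s, Vt = np.linalg.svd(snapshots - mean, full_matrices=False)
    modes = U[:, :rank]                                 # spatial modes, (n_points, rank)
    coeffs = s[:rank, None] * Vt[:rank, :]              # modal coefficients, (rank, n_snapshots)
    return mean, modes, coeffs

def pod_reconstruct(mean, modes, coeffs):
    """Rebuild all snapshots from the stored low-rank pieces."""
    return mean + modes @ coeffs

# Hypothetical 1D "simulation": a Gaussian pulse travelling across the domain.
x = np.linspace(0.0, 1.0, 2000)
t = np.linspace(0.0, 1.0, 50)
snapshots = np.exp(-((x[:, None] - 0.2 - 0.6 * t[None, :]) ** 2) / 0.002)

for rank in (2, 5, 20):
    mean, modes, coeffs = pod_compress(snapshots, rank)
    recon = pod_reconstruct(mean, modes, coeffs)
    stored = mean.size + modes.size + coeffs.size
    rel_err = np.linalg.norm(recon - snapshots) / np.linalg.norm(snapshots)
    print(f"rank={rank:2d}  compression={snapshots.size / stored:5.1f}x  rel_err={rel_err:.3f}")
```

Printing error against rank makes the compression/fidelity trade-off explicit. Advection-dominated fields like this moving pulse are a commonly cited case where linear bases need many modes to reach low error, which is exactly the gap nonlinear ML methods aim to close.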
This review surveys ML-driven compression methods, including implicit neural representations (INRs), autoencoders (AEs), and many others, as well as learned sparse sampling and interpolation strategies. What’s fun is that this is actually a good use case for Physics-Informed Neural Networks (PINNs) too, so they’re in the mix!
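The core INR idea is to store a small function f(x, y) ≈ field instead of the grid values themselves. Below is a minimal sketch, assuming NumPy. To keep it short and closed-form, it swaps the usual trained MLP for a random-Fourier-feature network where only the final linear layer is fit by least squares; the field, `scale`, `n_freq`, and helper names are illustrative assumptions, not taken from any paper in this review.

```python
import numpy as np

def fourier_features(coords: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Map (N, 2) coordinates to (N, 2*m + 1) features: sin/cos of random frequencies plus a bias."""
    proj = 2.0 * np.pi * coords @ B.T                   # (N, m)
    ones = np.ones((coords.shape[0], 1))                # bias column
    return np.hstack([np.sin(proj), np.cos(proj), ones])

def fit_inr(coords, values, n_freq=128, scale=4.0, seed=0):
    """Fit the output layer of a random-Fourier-feature INR by linear least squares.

    B can be regenerated from (seed, scale, n_freq), so only `weights` must be stored.
    """
    rng = np.random.default_rng(seed)
    B = rng.normal(0.0, scale, size=(n_freq, 2))        # fixed random frequencies
    weights, *_ = np.linalg.lstsq(fourier_features(coords, B), values, rcond=None)
    return B, weights

def eval_inr(coords, B, weights):
    """Query the compressed field at arbitrary (x, y) coordinates."""
    return fourier_features(coords, B) @ weights

def grid(n):
    """Flattened (n*n, 2) array of coordinates on the unit square."""
    xs = np.linspace(0.0, 1.0, n)
    X, Y = np.meshgrid(xs, xs, indexing="ij")
    return np.column_stack([X.ravel(), Y.ravel()])

# Hypothetical smooth 2D field standing in for one slice of simulation output.
coords = grid(64)
x, y = coords[:, 0], coords[:, 1]
field = np.sin(2 * np.pi * x) * np.cos(2 * np.pi * y) + 0.5 * np.exp(
    -((x - 0.5) ** 2 + (y - 0.5) ** 2) / 0.02
)

B, weights = fit_inr(coords, field)
recon = eval_inr(coords, B, weights)
rel_err = np.linalg.norm(recon - field) / np.linalg.norm(field)
print(f"stored {weights.size} floats instead of {field.size} "
      f"({field.size / weights.size:.1f}x), rel_err={rel_err:.4f}")

# Resolution independence: decode on a finer 128 x 128 grid without refitting.
fine = eval_inr(grid(128), B, weights)
```

The design choice worth noticing is that decoding is just a function evaluation, so the same stored weights can be queried at any resolution. Real INR compressors train all layers (and usually quantize the weights), which is where most of the fidelity gains come from.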
In case you’re instead looking for an overview of ML surrogates, check out this past post of mine:
Overview: ML Models for Computational Fluid Dynamics Simulation
Even if you are an ML-hater and (wrongly) say ‘ML just interpolates, there’s no real learning’, this post will be right up your alley! haha


