Benchmarks for AI Models and Agents on CAD Tasks
Fresh off the press!
“Parametric CAD Bench is a comprehensive collection of benchmarks to benchmark CAD models and AI agents on CAD design and 3D modeling tasks.”
CadBench.ai, also presented as Parametric CAD Bench, has a simple but important mission: to create open benchmarks and datasets for evaluating how well AI models and agents perform real CAD design and 3D modeling tasks. Its homepage describes the project as “a community effort to build the best open parametric CAD datasets,” initiated by the gNucleus AI team.
TL;DR
“We introduce Parametric CAD Bench, a new evaluation for AI agents that measures the ability to author editable FreeCAD models from natural language. Unlike previous CAD-related benchmarks, we use a multi-step agentic loop and a strict “editability gate” (harmonic mean scoring) to ensure models produce functional engineering recipes, not just static 3D shapes. Early results show GPT-5.5 via Codex leading at 0.832, with a visible harness effect: swapping the driver while keeping the model fixed shifts scores by roughly 10% in either direction. Per-cell spend ranges from $3 to $170 across 100 trials each — the cost–quality frontier is wide enough that the top-scoring cell isn’t always the best-value one.”
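To see why harmonic mean scoring behaves like a gate, it helps to compare it with a plain average. The sketch below is only an illustration. The quoted summary does not say which sub-scores the benchmark combines, so the two-component split (a geometry score and an editability score, both in the range 0 to 1) and the sample values are assumptions made up for this example.

```python
def harmonic_mean(scores):
    """Harmonic mean of scores in [0, 1]; any zero score forces the result to 0."""
    if not scores:
        raise ValueError("need at least one score")
    if any(s < 0.0 or s > 1.0 for s in scores):
        raise ValueError("scores must lie in [0, 1]")
    if any(s == 0.0 for s in scores):
        return 0.0
    return len(scores) / sum(1.0 / s for s in scores)


def arithmetic_mean(scores):
    """Plain average, shown only for comparison."""
    return sum(scores) / len(scores)


# Hypothetical sub-scores: [geometry, editability]. Not benchmark data.
static_shape = [0.95, 0.10]   # looks right, but barely editable
balanced_model = [0.80, 0.80]  # slightly worse shape, fully editable

for label, scores in [("static shape", static_shape),
                      ("balanced model", balanced_model)]:
    print(label,
          "harmonic:", round(harmonic_mean(scores), 3),
          "arithmetic:", round(arithmetic_mean(scores), 3))
```

With these made-up numbers, the static shape scores roughly 0.18 under the harmonic mean against 0.525 under the arithmetic mean, while the balanced model scores 0.8 under both. A model cannot compensate for a weak editability score with good-looking geometry, which is the gating behavior the summary describes.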
Why FreeCAD
“FreeCAD is open source, fully scriptable from Python, and has a stable native format (.FCStd) that preserves the full feature history. Because it runs entirely offline, it’s also straightforward to drop into a sandboxed container for automated evaluation. And it’s a real CAD system that engineers use — the operations, constraints, and conventions match production workflows.
FreeCAD has two solid-modeling styles: the Part Workbench (CSG — booleans on primitives) and the Part Design Workbench (feature-based — sketches plus a parametric feature tree). We use Part Design because it captures engineering intent — parameter-driven operations (Pad, Pocket, Loft, Sweep, Pattern…) on top of sketches — and is how professional CAD (Catia, NX, Creo, SolidWorks, Onshape) actually works. That structure gives a much richer evaluation signal — right features used, parameters match, part still rebuilt when a dimension changes — which CSG loses once the booleans are applied.”
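The point about rebuilding when a dimension changes can be shown with a toy model in plain Python. This is not FreeCAD's API. The class names (RectSketch, Pad, Pocket, Body) are hypothetical stand-ins, and the "geometry" is reduced to a volume calculation that assumes the pocket lies fully inside the padded block. It only sketches the idea that a feature tree stores a recipe, whereas a baked result stores an answer.

```python
from dataclasses import dataclass, field


@dataclass
class RectSketch:
    """A rectangular sketch profile, driven by two parameters."""
    width: float
    height: float

    def area(self) -> float:
        return self.width * self.height


@dataclass
class Pad:
    """Adds material: extrudes the sketch by `length`."""
    sketch: RectSketch
    length: float

    def apply(self, volume: float) -> float:
        return volume + self.sketch.area() * self.length


@dataclass
class Pocket:
    """Removes material: cuts the sketch to `depth` (assumed inside the body)."""
    sketch: RectSketch
    depth: float

    def apply(self, volume: float) -> float:
        return volume - self.sketch.area() * self.depth


@dataclass
class Body:
    """An ordered feature tree; rebuild() replays every feature."""
    features: list = field(default_factory=list)

    def rebuild(self) -> float:
        volume = 0.0
        for feature in self.features:
            volume = feature.apply(volume)
        return volume


base = RectSketch(width=40.0, height=20.0)
slot = RectSketch(width=10.0, height=5.0)
body = Body([Pad(base, length=10.0), Pocket(slot, depth=4.0)])

baked_volume = body.rebuild()   # a frozen result, like a static shape
base.width = 50.0               # edit one driving dimension
rebuilt_volume = body.rebuild() # the recipe replays with the new value

print(baked_volume, rebuilt_volume)
```

The baked number cannot respond to the edit, but the feature list can, and an evaluator can also inspect it to check which features were used and with which parameters. That is the extra signal the quoted passage attributes to Part Design.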

