Text-to-CAD

186 points by softservo 17 days ago | 48 comments

softservo 14 days ago |

Hi all, repo author here, appreciate the kind words and feedback!

I'm brushing up on robotics after spending the last 10 years working in software land. After being humbled by modern CAD tools like Onshape, I built this harness / skill to help me generate some basic CAD models for a 7dof robot arm I'm designing.

It ended up working much better than I expected, particularly on the latest GPT 5.5 and Opus 4.7 models. It's been a lot of fun to work on. I've learned a lot about how STEP files work (opencascade, breps, etc) as well as 3d rendering tools like threejs.

I don't have much intention of turning this into a business; it's really just a fun open source tool that I'll continue to maintain as long as I and others find it useful. Very open to ideas and contributions.

P.S. I just pushed a major update that improves the workflow and scripts/tools for the CAD skill. I also added some basic benchmarks to start measuring performance over time.

softservo 14 days ago | |

More details on the robot here: https://x.com/soft_servo/status/2047436911657025858

danbots 14 days ago | |

Proud of you

david_mchale 14 days ago |

I've been using Claude Opus 4.7 into OpenSCAD for creating hacked connectors for vibrating mesh nebulizers. It's incredibly powerful but still needs heavy manual checking to generate anything usable, but holy COW is it powerful when armed with the right info.

softservo 14 days ago | |

The purpose of this repo (harness and skills) is really to just give the models more direct tools to generate and inspect STEP files. It basically generates a topology sidecar for every STEP file that can be used to quickly read the BREP (faces/edges/vertexes) without loading in the full STEP.

There's also a bunch of work going into the SKILL.md to plan for more complex parts (this is mostly a stop gap while the models don't have amazing spatial reasoning).
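To make the sidecar idea concrete, here is a minimal sketch. It is not the repo's actual format: the key names (`vertices`, `edges`, `faces`) and the helper functions are assumptions invented for illustration. The point is only that a small JSON summary can answer "how many faces, and what kind?" without parsing the STEP file itself.

```python
import json

# Hypothetical sidecar layout: entity lists keyed by kind. The real repo's
# schema may differ; this only illustrates the idea.
def box_topology(dx: float, dy: float, dz: float) -> dict:
    """Topology summary for an axis-aligned box with one corner at the origin."""
    vertices = [[x, y, z] for x in (0.0, dx) for y in (0.0, dy) for z in (0.0, dz)]
    # An edge joins two vertices that differ in exactly one coordinate.
    edges = [
        [i, j]
        for i in range(8)
        for j in range(i + 1, 8)
        if sum(a != b for a, b in zip(vertices[i], vertices[j])) == 1
    ]
    # One planar face at each end of each axis.
    faces = [
        {"type": "plane", "axis": axis, "offset": offset}
        for axis, size in (("x", dx), ("y", dy), ("z", dz))
        for offset in (0.0, size)
    ]
    return {"vertices": vertices, "edges": edges, "faces": faces}


def to_sidecar(topology: dict) -> str:
    """Serialize the topology summary as JSON text."""
    return json.dumps(topology, indent=2)


def count_entities(sidecar_text: str) -> dict:
    """Count entities per kind from the sidecar alone."""
    topo = json.loads(sidecar_text)
    return {kind: len(items) for kind, items in topo.items()}


sidecar = to_sidecar(box_topology(100.0, 60.0, 20.0))
counts = count_entities(sidecar)
```

For a plain box, `counts` comes out as 8 vertices, 12 edges, and 6 faces, which is the kind of quick sanity check an agent can run after each edit.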

david_mchale 14 days ago | | |

I appreciate that effort, seeing Claude start to prototype physical objects that can get mass-produced is unbelievable but wow it uses up tokens like crazy.

I'm using Opus 4.7 w/ the 1M context option on the vibrating mesh nebulizer repo and hit compaction pretty often, which is a restart-the-conversation flag for me. That's on relatively small OpenSCAD files, like the adapters and enclosures here, which are around 10-40 KB: https://github.com/dmchaledev/VibratingMeshNebulizerControll...

garfij 13 days ago | |

I've been playing around with this recently too and started getting much better results when I told it how to produce rendered PNGs of the file and to inspect it from several angles during iteration. I'm really only just getting going with it though, so if you have any tricks to share, I'd love to hear them!

david_mchale 10 days ago | | |

genius, I have started doing this on web app test suites + playwright in other projects, makes a lot of sense to render shots from all the sides and ortho view and then feed that back to Opus 4.7 or similar as a smoke test.
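A rough sketch of the "render from all sides" step, assuming the OpenSCAD command-line tool is available as `openscad`. The view names, rotation angles, output directory, and image size are this example's own choices, and the angles in particular should be checked against what your OpenSCAD version actually renders. The function only builds the commands; it does not run them.

```python
from pathlib import Path

# View name -> (rot_x, rot_y, rot_z) in degrees, used in OpenSCAD's
# --camera=tx,ty,tz,rx,ry,rz,dist form. These angles are assumptions.
VIEWS = {
    "top": (0, 0, 0),
    "front": (90, 0, 0),
    "side": (90, 0, 90),
    "iso": (55, 0, 25),
}


def render_commands(scad_file: str, out_dir: str = "renders") -> list[list[str]]:
    """Build one OpenSCAD PNG-export command per view."""
    stem = Path(scad_file).stem
    commands = []
    for name, (rx, ry, rz) in VIEWS.items():
        png = str(Path(out_dir) / f"{stem}_{name}.png")
        commands.append([
            "openscad", "-o", png,
            f"--camera=0,0,0,{rx},{ry},{rz},500",
            "--viewall", "--autocenter",   # fit the whole model in frame
            "--projection=ortho",          # orthographic, no perspective
            "--imgsize=1024,768",
            scad_file,
        ])
    return commands


commands = render_commands("adapter.scad")
```

Each entry can then be handed to `subprocess.run(cmd, check=True)` and the resulting PNGs fed back to the model as the smoke test.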

I'm using it to rough out the skeletons for nebulizer power connector adapters so just throwing a lot of caliper measurements at it with detailed descriptions and reference photos of the connectors I'm duplicating has gotten me far.

randusername 14 days ago |

> Create a vertical engine-cylinder form with a central barrel, 12 cooling fins, a base flange, and a top cap. Add a 35 degree angled spark-plug boss with a coaxial through-hole.

I don't feel like text-to-CAD is a viable workflow for me because of the "language barrier". I would need, like, a visual dictionary of terms.

I'd almost be more excited to see the opposite, a benchmark/dataset of ME-blessed CAD-to-text descriptions so that I can build up vocabulary.

Absent that, what's the best I can do, find a machine design book secondhand with a glossary?

jm_l 14 days ago | |

Read through the McMaster catalogue.

https://www.mcmaster.com/

My coworker was given this advice when they first started their mechanical engineering and design job. They originally thought it was some kind of hazing and after an hour of reading couldn’t put it down.

alnwlsn 13 days ago | |

Here is the full prompt for that part:

>The main cylinder axis is vertical along Z and centered at the origin.

>Create a central barrel with diameter 36 mm and height 70 mm, bottom at Z = 0.

>Around the barrel, add 12 horizontal circular cooling fins. Each fin is 2 mm thick in Z, has outside diameter 62 mm, and is spaced every 5 mm from Z = 10 mm to Z = 65 mm.

>Add a thicker base flange at the bottom, outside diameter 70 mm and thickness 8 mm, with six vertical mounting holes of diameter 5 mm on a 56 mm bolt circle.

>Add a top cap cylinder, diameter 44 mm and height 8 mm, from Z = 70 mm to Z = 78 mm.

>Add an angled spark-plug boss protruding from the side of the top cap. The boss is a cylinder of diameter 12 mm and length 24 mm, angled upward at 35 degrees from horizontal, with its axis pointing outward in the positive X direction.

>Add a 5 mm diameter hole through the boss along its own axis.

>Add small 1 mm fillets to the outer fin edges and base flange edges.

And the description still falls short, such as no room between the flange and fins to install nuts.
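The clearance problem falls straight out of the numbers in the prompt. The sketch below just does that arithmetic. Two things are assumptions because the prompt does not say: that the first mounting hole sits on the +X axis, and whether "Z = 10 mm" means the bottom or the middle of the first fin (both cases are computed).

```python
import math

# Values copied from the prompt (all mm).
FLANGE_TOP_Z = 8.0        # flange thickness, bottom at Z = 0
BOLT_CIRCLE_D = 56.0
HOLE_D = 5.0
N_HOLES = 6
FIN_OD = 62.0
FIN_THICKNESS = 2.0
FIN_PITCH = 5.0
FIN_Z_START, FIN_Z_END = 10.0, 65.0

# Fin positions: every 5 mm from Z = 10 to Z = 65 inclusive.
n_fins = int((FIN_Z_END - FIN_Z_START) / FIN_PITCH) + 1
fin_z = [FIN_Z_START + i * FIN_PITCH for i in range(n_fins)]

# Mounting hole centres on the bolt circle (first hole on +X: an assumption).
r = BOLT_CIRCLE_D / 2
hole_xy = [
    (r * math.cos(2 * math.pi * k / N_HOLES), r * math.sin(2 * math.pi * k / N_HOLES))
    for k in range(N_HOLES)
]

# Does the lowest fin overhang the holes? Compare radii.
hole_outer_radius = r + HOLE_D / 2           # 30.5
fin_covers_holes = FIN_OD / 2 >= hole_outer_radius

# Vertical room between flange top and the underside of the first fin.
gap_if_bottom = FIN_Z_START - FLANGE_TOP_Z                      # Z is the fin's bottom face
gap_if_center = FIN_Z_START - FIN_THICKNESS / 2 - FLANGE_TOP_Z  # Z is the fin's mid-plane
```

The spacing does give exactly 12 fins, but the 62 mm fin fully covers the holes on the 56 mm bolt circle, and there is only 1 to 2 mm of vertical room above the flange depending on how "Z = 10" is read. That is far less than a typical nut or bolt head for a 5 mm hole.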

What a nightmare to describe all this in text, when the language of drafting is able to describe it perfectly, wordlessly and unambiguously, in a single drawing sheet. Yes, there are a few things to learn beyond "draw a picture", but it's not a lot.

You can claim it's for "people who don't know CAD", but I have my doubts that those same people without those skills would be able to describe what they want in text.

Eisenstein 14 days ago |

I have been using Claude to generate OpenSCAD for 3D printing. It works decently when the jobs are simple and can be easily described, but the description part really makes it clear how little vocabulary the ordinary person has to compose a good picture of any real item that isn't just a basic shape. It seems that the trick, like most things with getting LLMs to do something complicated and have it work well, is to be an expert in the field already.

emporas 14 days ago | |

The trick might be to have a multimodal AI describe what it sees in an image, and employ another LLM to turn that textual representation into code. Multimodal AIs are good at describing images.

Even a handwritten sketch could be a very good starting point for image recognition by an AI.

zacharyfmarion 14 days ago |

This is dope! I made https://github.com/zacharyfmarion/openscad-studio to do a similar thing. Using AI for CAD has so much potential but sometimes the model failing to understand spatial concepts is really tilting

behaviors 14 days ago |

I've been using an OpenSCAD container with various local models: dumping the render.png straight to the model and allowing it to modify the code and try again. I've made some interesting things, but the main purpose was to fix things I'd already made that have some weird single issue that cascades into a broken model if I touch it. OpenSCAD is the first step; FreeCAD and similar (now starting to see more CAD LLM work) are still a WIP. Since January we've solved 4 solid issues I'd left on the back burner. I use the Docker container version with some custom wrap/bridge work for the render dumps.
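The render, inspect, modify, retry cycle can be sketched as a small loop. This is only an outline of the control flow, not the commenter's setup: `render` and `revise` are hypothetical callables standing in for the OpenSCAD container and the model, and the stand-ins at the bottom exist purely so the example runs.

```python
from typing import Callable, Optional, Tuple


def iterate_model(
    source: str,
    render: Callable[[str], bytes],
    revise: Callable[[str, bytes], Optional[str]],
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Render, show the image to the model, apply its revision, repeat.

    `revise` returns new source, or None once it is satisfied.
    Returns the final source and the number of revisions applied.
    """
    rounds = 0
    for _ in range(max_rounds):
        image = render(source)
        new_source = revise(source, image)
        if new_source is None:
            break
        source = new_source
        rounds += 1
    return source, rounds


# Stand-ins for demonstration only: the "image" is just the source bytes,
# and the "model" fixes one known-bad dimension.
def fake_render(source: str) -> bytes:
    return source.encode()


def fake_revise(source: str, image: bytes) -> Optional[str]:
    if b"cube(9)" in image:
        return source.replace("cube(9)", "cube(10)")
    return None


result = iterate_model("cube(9);", fake_render, fake_revise)
```

The `max_rounds` cap matters in practice, since a model that keeps "fixing" the same thing would otherwise loop forever.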

SOLAR_FIELDS 14 days ago | |

The problem is that the jump from OpenSCAD to a BRep-based modeler is not quite a jump. It’s more like scaling a 10,000 foot sheer cliff in terms of the level of difficulty difference. You’ll be on that WIP for quite a long time.

VanTodi 14 days ago |

Awesome tool! I gave it a spin last night and it worked surprisingly well. But apparently the AI had a different rotation view of the item than the STL browser preview? I tried to make a bottle holder with the opening on top (which makes sense) and the whole element was rotated 90° to the side. I tried verifying with the AI, and it said the opening was on top in its view when cross-checking against my screenshots. Was this an error on my side, or did others have that too?

softservo 10 days ago | |

This issue is fixed by the way!

brookst 14 days ago |

I built https://github.com/brookstalley/cordyceps to do CAD work using claude code.

It's not perfect by any stretch, but it is surprisingly strong. It was able to create and debug some pretty complicated geometry by iterating with screenshots, adjusting view angle and zoom and rendering mode, updating parametric geometry generation, and working to fairly complex goals.

ur-whale 14 days ago |

Not sure I understand ... no mention of an actual CAD engine backend ... did I miss it?

Or is this capable of generating STEP files directly from an LLM (which I doubt)?

[EDIT]: haha. the answer is hidden in:

.agents/skills/cad/requirements.txt

TL;DR:

    build123d
    ezdxf
    numpy
    trimesh
    vtk

and the engine is build123d, which, from its home page:

Build123d is a Python-based, parametric (BREP) modeling framework for 2D and 3D CAD. Built on the Open Cascade geometric kernel, it provides a clean, fully Pythonic interface for creating precise models suitable for 3D printing, CNC machining, laser cutting, and other manufacturing processes. Models can be exported to popular CAD tools such as FreeCAD and SolidWorks.

Probably worth mentioning in the README; I can't be the only one out there wondering.

Also: these things seem to be sprouting all over the place these days (a good thing!) ... CAD modeling using LLMs is clearly an idea whose time has come.

GorbachevyChase 14 days ago | |

I don’t think its time has come. I think there are a lot of software folks that don’t understand what the actual pain points of professional engineers and CAD technicians are. I think there is a niche where text-to-CAD is good: hobby users who don’t want to invest in learning a CAD software UI. For professionals, where results have dollar values, there needs to be a much deeper understanding of the problem domain to understand why enterprise CAD software sucks.

imtringued 14 days ago | | |

I think the primary strengths of a text based format would be in defining the assembly of parts and parameterizing them.

E.g. you want to build a gearbox, so you draw a sketch in the GUI with the positions of all the gears and name each axle where a gear would be attached. Then you open a text editor where you specify all the gear parameters and which axle each gear should be attached to. You then go back to the GUI to move the axles around. After assembly, you can start designing a housing for the gearbox.

The assembly could then be loaded directly into any simulation environment of your choice.
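A minimal sketch of what such a text-side gear specification might look like. The `Axle` and `Gear` classes and their field names are invented for this example. The only engineering fact relied on is the standard centre distance for two meshing external spur gears of the same module (no profile shift): module × (teeth₁ + teeth₂) / 2.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Axle:
    name: str
    x: float  # mm, position from the GUI sketch
    y: float  # mm


@dataclass(frozen=True)
class Gear:
    name: str
    module: float  # mm
    teeth: int
    axle: str      # name of the axle this gear is attached to


def center_distance(a: Gear, b: Gear) -> float:
    """Nominal centre distance for two meshing external spur gears."""
    if a.module != b.module:
        raise ValueError("meshing gears must share a module")
    return a.module * (a.teeth + b.teeth) / 2


def mesh_error(a: Gear, b: Gear, axles: dict) -> float:
    """Actual axle spacing minus nominal centre distance (0 means a proper mesh)."""
    pa, pb = axles[a.axle], axles[b.axle]
    actual = math.hypot(pb.x - pa.x, pb.y - pa.y)
    return actual - center_distance(a, b)


axles = {"in": Axle("in", 0.0, 0.0), "out": Axle("out", 30.0, 0.0)}
pinion = Gear("pinion", module=1.0, teeth=20, axle="in")
wheel = Gear("wheel", module=1.0, teeth=40, axle="out")

error_ok = mesh_error(pinion, wheel, axles)

# Dragging the output axle 2 mm in the GUI shows up as a 2 mm mesh error.
moved = {**axles, "out": Axle("out", 32.0, 0.0)}
error_moved = mesh_error(pinion, wheel, moved)
```

Keeping the parameters in text like this is what makes the "move the axles around in the GUI" step checkable: the spec can flag immediately when a moved axle no longer matches the gears attached to it.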

akiselev 14 days ago | |

Based on requirements.txt it uses build123d so OpenCascade is the geometric kernel (CAD engine backend)

ur-whale 14 days ago | | |

> it uses build123d so OpenCascade is the geometric kernel

yup, found it as you were typing this :D

voidUpdate 14 days ago |

In the benchmarks, there is a strange lack of the measurements I'd expect in a CAD process (e.g. in benchmark 1, the positions of the 4 holes are not specified at all). I'm assuming that's why the gussets in benchmark 3 overlap the holes and make a part that cannot be used. Does it actually handle positions correctly? Also, the through-hole in benchmark 7 doesn't actually go through, by the looks of the gif.

IdiotSavage 14 days ago | |

The actual prompts are more detailed than what's in the table on the main page:

https://media.githubusercontent.com/media/earthtojake/text-t...

I'm just wondering why anyone would bother describing CAD models in text. Language is imprecise and ambiguous. If you want to create a full part definition, you need to be extremely thorough with your description. At that point it's just easier (less mental load) and faster to construct the thing yourself. Not to mention, the model might still ignore your perfectly good prompt.

_flux 14 days ago | | |

I think it could potentially be useful. Sometimes I need "simple" shapes that still are somewhat annoying to create. And I think you don't need to one-shot these, the process is permitted to be iterative! The skills can be improved over time by revising AGENTS.md, e.g. "when I say L-bracket, I probably mean..".

I think going from a picture to an initial starting point with well-"thought"-out structure for CAD purposes could potentially be very useful. Optimally you could just enter the measurements and be done.

voidUpdate 14 days ago | | |

Oh, that's a bit misleading then when the prompt on the main page is "Create a centered 100 x 60 x 20 mm block with four 8 mm vertical through-holes. Add only a 2 mm chamfer on the top outer perimeter"

Looking at the L-bracket one, the specification is actually instructing the gussets to overlap the holes, so it actually performed both better and worse than I expected

And yes, as someone who CADs mechanical parts a reasonable amount, you have to be very precise, hence me wondering how the given prompt could be useful

softservo 10 days ago | | |

Not intentionally deceptive, the prompts are just too big to include on the home page!

I actually used GPT 5.5 Pro to generate the prompts from simpler one sentence prompts, so hypothetically it’s just an extra step in the harness for an agent to unpack / add detail to a prompt based on the user’s goal.

XiZhao 14 days ago |

I just posted this somewhere else -- but overall big fan of these text to cad rigs as projects.

Obligatory mention of https://zoo.dev/ who went to extreme lengths on this.

I will say I explored this reasonably deeply and came away with the conclusion that even though we have OpenSCAD and all these examples, LLMs are still very weak at spatial reasoning compared to diffusion models.

You can do all sorts of tricks like have a parts library to get around this and do physics checks but another inconvenient truth is whenever you design a complex assembly, every change to that part needs to be aware of the other parts in the design -- thus you need a global part-aware editing capability from diffusion.

That's getting solved already in china leading labs, and bottlenecked by the lack of good training data, which china is solving with mass labor.

This will be solved overseas before it is solved in the US.

p.s. I am not affiliated with zoo or any of these other things FYI was just very curious about this whole area

ur-whale 14 days ago | |

> LLMs are still very weak at spatial reasoning compared to diffusion models

Don't know what diffusion models can do, but 100% agree with the "LLMs are very weak at spatial reasoning" comment.

I built a rather complex blueprint-image-to-3D-brep-model a couple of months back using codex ... ugh the damn thing has really no idea where things are in space, something a 3 year old figures out instinctively.

It did end up saving some time as compared to modeling the object myself in a CAD package, but there were so many completely obvious things I had to explain ... very hard to believe when compared to what codex can pull off with code.

MisterMower 14 days ago | | |

This sounds like a cool project, I would love to hear more about it. I am trying to solve a similar problem myself.

btbuildem 14 days ago | |

I've been watching the space as well, waiting for the day I can stop fiddling with widgets and just tell the damn thing about the shapes I want and the ways in which they will move. Alas, we're far from that yet.

> That's getting solved already in china leading labs

Care to drop a bit of info as a follow up to this claim? Curious!

unholiness 14 days ago | |

> That's getting solved already in china leading labs, and bottlenecked by the lack of good training data, which china is solving with mass labor.

What work are you referring to here?

lsch1033 14 days ago |

You'll know how incapable it is when it doesn't seem to understand how servo motors work in the Demo Project.

softservo 14 days ago | |

oever 13 days ago |

Does something similar exist for buildings and Bonsai or FreeCAD with IFC?

zuzululu 14 days ago |

So we have legit text to 3d objects/cad/blender

but why is there not an equivalent for text to PCB/circuitry? I tried one last year and it was not good at all.

bathwaterpizza 14 days ago | |

Isn't it fairly obvious? Art is completely abstract and permissive, while PCBs have real world constraints

zuzululu 13 days ago | | |

like coding ?

amelius 14 days ago |

Without benchmarks and/or a whole suite of non-cherrypicked examples, this means nothing because you can trivially make an AI generate anything from text.

softservo 14 days ago | |

Working on benchmarks at the moment! Always open to feedback / PRs.

carterschonwald 14 days ago | | |

im def working on benchmarks for how my own general harness improves task performance vs same model in a commodity setup. its hard to do!

i will say that my current harness: https://github.com/cartazio/oh-punkin-pi is a testbed for a bunch of 2nd gen harness tech, largely optimized for reasoning llms only. the next one after this harness is gonna be epicccc

carterschonwald 14 days ago |

i might borrow the skills etc for good ideas sometime. thats a lot of integration surface

amelius 14 days ago |

The demo should be a pelican on a bicycle of course.

laxpri 14 days ago |

Anyone here know of any simple CFD simulation software? I tried the popular ones but none of them are easy to use.