Can LLMs Beat Classical Hyperparameter Optimization Algorithms?

Can LLMs Beat Classical Hyperparameter Optimization Algorithms? (arxiv.org)

62 points by galsapir 3 hours ago | 8 comments

harrigan 2 hours ago

Somewhat related, the experiment ongoing at https://www.ecdsa.fail/ is fascinating: it's a competitive, leaderboard-style research challenge trying to optimise a quantum circuit for breaking ECDSA (specifically the elliptic-curve point addition in Shor's algorithm). It quickly surpassed a result announced by Google researchers last month. Now it's showing a 40% gain over Google's result.

deerstalker 50 minutes ago

I have been doing some research on this topic and found that in some budget regimes (really expensive objective function evaluations) and some applications (HPC code parameter autotuning), frontier LLMs can even outperform classical optimizers. Even open-weight models can perform well on certain applications, but on some they fail abysmally. (Of course, this is limited to a bunch of niche applications.)

cpard 1 hour ago

I'm personally interested in this problem, and it's quite an active research area right now.

My feeling is that the research is converging on what the paper claims: the combination of the two is the right way to do it, and how you combine them in the harness you build is what makes the difference.

At the AID-Wild / ACM CAIS 2026 workshop that happened recently, there were plenty of examples of that among the accepted papers.

A great example is AI-PROPELLER: Warehouse-Scale Interprocedural Code Layout Optimization with AlphaEvolve. It uses AlphaEvolve and Vizier to evolve compiler code-layout heuristics. (https://arxiv.org/abs/2606.00131)

_alternator_ 1 hour ago

The combination approach jibes well with my use of the models in a number of areas. I guide models to use best-in-class algorithmic approaches as available. (E.g. using constraint solvers for a particular problem where pure Monte Carlo rarely gives "in-bounds" data.)

I find it odd that frontier models often don't suggest the most powerful methods for crushing problems, but it may be that the training data doesn't actually have "good enough" experts on the problems I encounter. If the experts don't know about the best ways to solve the problem, the models will get dinged in training for even trying them.

cpard 39 minutes ago

Do you enumerate the options of the algorithms to the models? I've tried to do "algorithmic discovery" with these systems, e.g. openevolve, and to be honest the models didn't really focus on that part.

Instead they were focusing more on optimizations of the existing algorithm that has been implemented. Maybe it's an artifact of the problem I was throwing at them (I was asking them to optimize the implementation of select_k in Arrow, which currently uses a max-heap streaming algorithm).

I've started documenting my journey with this here: https://www.kostasp.net/posts/16-ai-experiments-apache-arrow in case you want to take a look. Any advice would be highly appreciated, I'm looking for more inspiration on how to torture myself with that stuff.
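For readers unfamiliar with the max-heap streaming approach mentioned above, here is a minimal Python sketch of the general technique. It is not Arrow's implementation; the function name `select_k_smallest` is made up for illustration, and Python's `heapq` (a min-heap) is turned into a max-heap by negating values.

```python
import heapq
from typing import Iterable, List


def select_k_smallest(values: Iterable[float], k: int) -> List[float]:
    """Return the k smallest values in ascending order.

    Hypothetical helper: keeps a bounded max-heap of size k while
    streaming over the input, so memory stays O(k) and each element
    costs at most O(log k).
    """
    if k <= 0:
        return []
    heap: List[float] = []  # negated values; -heap[0] is the largest value kept
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, -v)
        elif v < -heap[0]:
            # v beats the current worst of the kept k, so swap it in
            heapq.heapreplace(heap, -v)
    return sorted(-x for x in heap)
```

An "algorithmic discovery" result would replace this structure altogether (for example with a selection-based method), whereas the optimizations described above would only tune the loop itself.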

woadwarrior01 2 hours ago

Their centaur idea[1] is interesting and quite straightforward. It should be fairly easy to implement using a coding agent for the LLM and the ask-and-tell interface in pycma[2].

[1]: https://github.com/ferreirafabio/autoresearch-automl/blob/ma...

[2]: https://github.com/CMA-ES/pycma
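A rough sketch of what such an ask-and-tell loop could look like, under loud assumptions: this is one possible reading of the idea, not the paper's method. `SimpleES` is a toy evolution strategy standing in for CMA-ES so the example runs without dependencies, and `stub_llm_propose` is a hypothetical placeholder for a real LLM or coding-agent call. With pycma, `es.ask()` and `es.tell(solutions, fitness_values)` would play the role of the two methods below.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
History = List[Tuple[float, Vector]]


class SimpleES:
    """Toy evolution strategy with an ask-and-tell interface.

    Stand-in for CMA-ES: no covariance adaptation, fixed step-size decay.
    """

    def __init__(self, x0: Sequence[float], sigma: float,
                 popsize: int = 8, seed: int = 0) -> None:
        self.mean = list(x0)
        self.sigma = sigma
        self.popsize = popsize
        self.rng = random.Random(seed)

    def ask(self) -> List[Vector]:
        # Sample candidates around the current mean.
        return [[m + self.sigma * self.rng.gauss(0, 1) for m in self.mean]
                for _ in range(self.popsize)]

    def tell(self, candidates: List[Vector], losses: List[float]) -> None:
        # Move the mean to the average of the best quarter, then shrink sigma.
        ranked = [c for _, c in sorted(zip(losses, candidates),
                                       key=lambda p: p[0])]
        elite = ranked[:max(1, len(ranked) // 4)]
        self.mean = [sum(col) / len(elite) for col in zip(*elite)]
        self.sigma *= 0.95


def stub_llm_propose(history: History) -> Vector:
    """Hypothetical LLM stand-in: midpoint of the two best points seen so far.

    A real harness would send the history to a model and parse its reply.
    """
    points = [p for _, p in sorted(history, key=lambda h: h[0])[:2]]
    return [sum(col) / len(points) for col in zip(*points)]


def centaur_minimize(objective: Callable[[Vector], float],
                     x0: Sequence[float], sigma: float = 1.0,
                     iterations: int = 40,
                     propose: Callable[[History], Vector] = stub_llm_propose,
                     ) -> Tuple[float, Vector]:
    es = SimpleES(x0, sigma)
    history: History = [(objective(list(x0)), list(x0))]
    for _ in range(iterations):
        candidates = es.ask()
        # Assumption: one slot per generation is handed to the "LLM".
        candidates[-1] = propose(history)
        losses = [objective(c) for c in candidates]
        es.tell(candidates, losses)
        history.extend(zip(losses, candidates))
    return min(history, key=lambda h: h[0])


def sphere(x: Vector) -> float:
    return sum(v * v for v in x)
```

Calling `centaur_minimize(sphere, [3.0, -2.0])` returns the best `(loss, point)` pair seen. How many slots the LLM gets, and how often, is exactly the kind of harness decision discussed elsewhere in this thread.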

josefritzishere 2 hours ago

TL;DR: No.

jwolfe 2 hours ago

That's not a very good tldr. The answer claimed in the paper is that the combination of the two is better than either alone.