> Despite no special prompts or other optimizations, early tests were very encouraging, successfully fixing failed tests 30% of the time. CogniPort was particularly effective for test fixes, platform-specific conditionals, and data representation fixes. We're confident that as we invest in further optimizations of this approach, we will be even more successful.
Jesus. They used gemini-flash on a Google-scale problem and got promising early results. On real problems with real data! Granted, the problem lends itself to automated testing better than most (it helps to have something to migrate from, since you kinda know the "ground truth" or expected behaviour).
Absolutely bananas that this is possible, and with such a "cheap" model.