What we learned in 6 months of working on an AI Developer (blog.pythagora.ai)
Possibly the most frustrating thing I find about GPT-4 is how close it gets with its wrong answers. It's easy to dismiss a lesser answer when it responds with a laughably out-of-band idea. GPT-4 often shows that it has a general idea of what you want but misses a small but critical aspect, which results in a solution to something that is similar but not what you wanted.
I have mixed results getting it to iterate on its own mistakes. It will too often try to change the world to match its answer, rather than fixing the answer. The best approach I have found to stop this is getting it to create unit tests. I imagine there is a lot of training data for it to understand the intention behind fixing a failing test. It's a very specific problem for it to look at, and changing the test is generally not considered the correct solution.
I think this is why the non-tech people see AI as so amazing. For anything human and non-technical, the “almost but not quite” nature is a good thing.
I was using an AI to help me debug a weird thing (mainly summarizing log splats hundreds of lines long) and I eventually got pretty close to identifying the issue when I asked “wtaf is this message. Never seen anything like it.” It then went on about how it was offended that I used vulgar language. I had to apologize for saying “wtaf!” Anyway, I found a bug in a linker, so that was fun; thanks AI.
What makes you believe that progress is linear, or at least a line forever going up?
I keep seeing people predict rapidly improving AI, based on how rapidly it improved over the last x months.
But why is that not an outlier? How do we know we haven't hit a ceiling and started stagnating? Isn't progress typically very bumpy and sudden?
I assume neither of those things. I have however read a lot of the papers published since GPT-4 was trained. There have been a lot of advances since then, so much so that simply saying "a lot" seems to be a massive understatement.
I think it is a reasonable assumption that at least a portion of those advancements would be able to build upon the existing technology of GPT-4 to produce something greater.
I am not assuming discoveries yet to be made. I am considering existing discoveries that have not yet made it into the top level of production.
SysAdmin stuff is quite easy in terms of complexity compared to some software work. The problems, similar to traditional engineering, tend to come from the rather high cost of failure.
To expand further, it's easy to set up a system but hard to set up one that's reliable and/or resilient. It's hard to maintain systems that are undocumented and/or wrongly documented (outdated, inaccurate). It's even harder to always make sure everything's consistent and you don't lose or damage data.
I picked a random GitHub issue that was some issue with ./configure. Seems like it helps to me.
Today's big time savings came from this prompt: "Write a Java method that uses the Eclipse AST parser to create a simple markdown file showing the commented method signatures of a given Java class text file."
And programmers who do know how to actually write efficient code without AI seem like they'd be even more in demand than those that rely on AI. Skill + knowledge + ability to use existing resources (e.g. StackOverflow, packages, templates), as we do now, are much more predictable and faster than trying to wrangle AI to do exactly what the designer or PM wants.
When the dishwasher was invented, everyone thought the human dish washer would be obsolete. And yet, restaurants still employ dish washers because they are much more efficient and thorough than a dishwashing machine.
This is a good example of both job destruction and job retention by technology.
Job destruction - the total number of potential hand dishwasher jobs has reduced because the vast majority of commodity dishwashing is machine driven.
Job enhancement - machine dishwashers just can't produce the quality/dexterity of hand dishwashers.
I feel like generative AI will do the same. It will replace a large number of commodity jobs (editors, translators, copy producers, website designers, app prototypers, paper pushers), but it will also reveal the value of skilled producers.
Too risky to let ChatGPT write code for your backend that destroys your production database and crashes your company forever.
They seem to badmouth Aider a tad (not cool) but I do wonder how a full-stack of this + Aider might work? There needs to also be some sort of good test generator involved.
All that said, any time someone actually demonstrates progress on the automated Software Engineer problem and it makes it to HN, I am deeply reminded of the old quote:
"It is difficult to get a man to understand something, when his salary depends on his not understanding it."
Just read through this comments section and check out the pure copium. Yes, ChatGPT can do basic sysadmin tasks with ./configure and make.
Yes it does make sense to work on this now, assuming LLMs will get better, because LLMs have continued to get better on any metric you can imagine.
Finally, yes, AI devs will make landing pages and basic APIs. I didn't realize we were all hardcore world-class 0.01% programmers? I have certainly written a landing page and basic API before, in fact I do that sort of thing a lot more than I write uber1337 hax0r code. You probably do too!
It might also be possible to change an existing history without abandoning everything that has happened afterwards. Of course, this could lead to conflicts, sort of like when rebasing a branch, and it would be useful to have another LLM look for them.
GPT Copilot might or might not be able to start from existing code as well and one would approach it as one would a legacy codebase that has to be adapted to new requirements.
So more jam tomorrow then. Building the framework around the magic is the easy bit.
It’s easy to look at https://github.com/Pythagora-io/gpt-pilot-db-analysis-tool/b... and go… so, this new tool means you took two days to write this?
long stare
Why did you bother?
…but, this both hits the nail on the head and misses the point at the same time.
On the one hand, this is foundational tech, prototyping on a new way of doing things. It’s not going to be faster than doing it yourself at first. It won’t run locally at first.
On the other hand, we already know that GPT-4-level models can do trivial tasks.
Over and over and over, people claim coding tools can massively improve productivity, and then try to demo that by building a trivial system.
…but building trivial systems is not the problem that needs solving.
The problem that needs solving is building large complex systems with dynamically adjusting requirements.
The examples and blog post seem to miss this even as an idea.
While I applaud, in general, efforts to explore this space, tackling the easy problems seems like it doesn’t significantly advance the state of play.
Here are some concrete things that would be more valuable, but are significantly technically harder:
- Use tests. Make it write tests. Make humans write tests. Do not accept generated code that fails the tests.
- Focus on refactoring; it’s a known issue that models struggle to refactor code. Breaking your existing code base into tiny files isn’t the answer.
- Focus on documenting the behaviour of existing code and incrementally migrating to new behaviour.
- Bad developers write new code instead of reading the existing code and using existing functionality and utilities. AI generators are notoriously rubbish at this, and will almost always generate a function rather than use an existing one.
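To make the first bullet concrete, here is a minimal sketch of a "reject anything that fails the tests" gate. It is not how GPT Pilot or any other tool actually works; `passes_tests`, the `candidate` module name, and the sample snippets are all made up for illustration, and it assumes the generated code and its tests are plain Python that the standard `unittest` runner can execute.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes_tests(candidate_source: str, test_source: str, timeout: float = 30.0) -> bool:
    """Return True only if the test module passes against the candidate code.

    Hypothetical helper: both sources are written to a scratch directory and
    the tests are run in a separate interpreter via the unittest CLI.
    """
    with tempfile.TemporaryDirectory() as workdir:
        Path(workdir, "candidate.py").write_text(candidate_source)
        Path(workdir, "test_candidate.py").write_text(test_source)
        try:
            result = subprocess.run(
                [sys.executable, "-m", "unittest", "-q", "test_candidate"],
                cwd=workdir,
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # Code that hangs is treated the same as code that fails.
            return False
        # The unittest CLI exits with 0 only when every test passed.
        return result.returncode == 0


# Human-written tests that pin down the intended behaviour.
TESTS = """
import unittest
from candidate import add


class AddTest(unittest.TestCase):
    def test_add(self):
        self.assertEqual(add(2, 3), 5)
"""

# Two stand-ins for model output: one correct, one "almost but not quite".
GOOD = "def add(a, b):\n    return a + b\n"
BAD = "def add(a, b):\n    return a - b\n"

if __name__ == "__main__":
    for name, source in (("good", GOOD), ("bad", BAD)):
        verdict = "accepted" if passes_tests(source, TESTS) else "rejected"
        print(f"{name}: {verdict}")
```

The point of keeping the tests in a separate string is that the generator is only ever allowed to change the candidate; the tests stay fixed, so "change the test to match the answer" is not an option.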
Refining and understanding existing code is significantly more valuable than generating code “from scratch”; so much so that I would argue that without the ability to refine existing code, such tools will forever remain in the “scaffold generator” category of “useful but ultimately no better than the current status quo”.
The tool as shown is, I believe, broadly interesting, but the approach described in the blog (upfront decisions about everything) is a dead end.
Admittedly, I recently added a soft barrier: "for dangerous commands please ask for confirmation".
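For anyone wondering what a soft barrier like that could look like when enforced in code rather than in the prompt, here is a rough sketch. It is only an assumption about the general idea, not the setup described above: the deny-list, `needs_confirmation`, and `run_guarded` are hypothetical names, and a token match like this is easy to bypass, so it is a speed bump and not a security boundary.

```python
import shlex
from typing import Callable, Optional

# Hypothetical deny-list; a real policy would need far more care than this.
DANGEROUS_PROGRAMS = {"rm", "dd", "mkfs", "shutdown", "reboot"}


def needs_confirmation(command: str) -> bool:
    """Return True if any token of the shell command is on the deny-list."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True  # keep arguments like "-rf" as single tokens
    try:
        tokens = list(lexer)
    except ValueError:
        # Unbalanced quotes and similar: if we cannot parse it, ask first.
        return True
    return any(token in DANGEROUS_PROGRAMS for token in tokens)


def run_guarded(
    command: str,
    execute: Callable[[str], str],
    confirm: Callable[[str], bool],
) -> Optional[str]:
    """Run the command via `execute`, asking `confirm` first when it looks risky.

    Returns None when the user declines.
    """
    if needs_confirmation(command) and not confirm(command):
        return None
    return execute(command)


if __name__ == "__main__":
    def fake_execute(cmd: str) -> str:
        # Stand-in for a real shell call so the demo has no side effects.
        return f"(pretend we ran: {cmd})"

    def ask(cmd: str) -> bool:
        return input(f"Run '{cmd}'? [y/N] ").strip().lower() == "y"

    print(run_guarded("ls -la", fake_execute, ask))
    print(run_guarded("sudo rm -rf build", fake_execute, ask))
```

Passing `execute` and `confirm` in as callables keeps the check itself free of side effects, which makes it easy to test without touching a real shell.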
Their commercial LLMs would not be possible without original creators who are now being ripped off and squeezed out of their jobs; if they don’t keep the tech open it will be very difficult to justify.
With what technology?
The US has long-term export controls on China, and as it demonstrated recently with Russia, once secondary sanctions are in place everyone falls into line. So it's pretty likely they will be effective.
But China can outfit itself with more hardware even if it’s not as fast as the latest iteration and still speed past the U.S. while the U.S. and the EU argue about AI being racist or not.
Not at the scale you need to build a world-class AI.
Let me know when Huawei is able to place an order for 300,000 GPUs.
I recommend you follow some of the specialist Chinese AI Substacks to see what’s happening over there.
Especially around chip building.
Do you recommend any in particular? I'm not familiar enough with the Chinese AI scene to know who to check out
SMIC right now relies entirely on existing US/EU hardware for its chip building.
New versions of which are no longer available to them.