> I realized the Claude version had a threading issue waiting to happen that was explicitly warned against in the docs of the api calls it was using.
I am reading between the lines here, trying genuinely to be helpful, so forgive me if I am not on the right track.
But based on what you write, it seems to me you might not really have gone through the disillusionment phase yet. You seem to be assuming the models "understand" more than they are really capable of understanding, which creates expectations and then disappointment. It seems to me you are still expecting CC to work at the level of a senior professional in various roles, instead of assuming it is a junior professional.
I would probably have approached that iOS app by first investigating various options for how the app could be implemented (especially as I don't have a deep understanding of the tech), and then exploring each option to understand for myself which one is best.
The options in your example might be the Apple documentation page, or some open source repo that contains something that could be used as a starting point, etc.
Then I would have asked Claude to create a plan to implement the best option.
During either the option selection or the planning, the threading issue would either come up or not. It might come up explicitly, in which case I could learn about it from the plans. It might be handled implicitly, just included in the generated code. Or it might not be included in the plans or in the code at all, even if it is explicitly stated in the documentation. If the suggested plan were based on that documentation, then I would probably read it myself too, and might have seen the warning.
When reviewing the plan, I can use my prior knowledge to ask whether that issue has been taken into account. If not, then Claude would modify the plan. Of course, if I did not know about the threading issue beforehand, did not have the general experience with the tech to suspect such an issue, and had not read the documentation and seen the recommendation, I could not find the issue myself either.
If the issue is not found in planning or programming, it would arise at a later stage, hopefully during unit/system testing of the application or during pilot use. I have not written complex iOS apps personally, so I might not have caught it either; I am not senior enough to guide it. I would ask it to make another plan, this time for how to test such an app comprehensively, so that I learn how it should be done.
What I meant by standard SWE practices is that there are various stages (requirements, specification, design, programming, testing, pilot use) where the solution is reviewed from multiple angles, so it becomes likely that issues of this kind are caught. The best practices also include iteration. Start with something small that works: for example, first an iOS application that compiles, shows "Hello, world", and can be installed on your phone.
In my experience, CC cannot be expected to work independently as a senior professional in any role (architect, programmer, test manager, tester, pilot user, product manager, project manager). A junior might not take into account all instructions or guidance, even when they are explicit. But it can act as a junior professional in any of these roles, so it can help a senior professional get a 10x productivity boost in any of these areas.
By the project manager role, I mean that I explicitly take CC through the various SWE stages, make sure each has been done properly, and iterate on the solution. At each stage, I take the role of the respective senior professional. If I cannot do it yet, I try to learn how. At the same time, I act as product manager/owner as well, making decisions about the product based on my personal "taste" and requirements.