Sharing a recording and notes from my demo at AI Tinkerers Seattle last week. I ran 6 different models in parallel on identical coding tasks and had a judge score each output on a 10-point scale. Local models (obviously) didn't compare well with their cloud counterparts in this experiment, but I've found them useful for simpler tasks with a well-defined scope, e.g. testing, documentation, and compliance.
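For anyone who wants to try something similar, here is a rough sketch of the fan-out-and-judge pattern. It is not the code from the demo: `run_and_score`, the toy models, and the toy judge are all hypothetical stand-ins, and the 1 to 10 clamp is an assumption about how a 10-point scale might be enforced. In a real run each model function would call a local or cloud model, and the judge would typically be another model call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical signatures: a model maps a task prompt to an output string,
# and a judge maps (task, output) to an integer score.
Model = Callable[[str], str]
Judge = Callable[[str, str], int]


def run_and_score(models: Dict[str, Model], judge: Judge, task: str) -> Dict[str, int]:
    """Send the same task to every model in parallel, then score each output."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        # Submit all models first so they run concurrently.
        futures = {name: pool.submit(fn, task) for name, fn in models.items()}
        outputs = {name: fut.result() for name, fut in futures.items()}
    # Assumption: the 10-point scale runs from 1 to 10, so clamp judge scores.
    return {name: max(1, min(10, judge(task, out))) for name, out in outputs.items()}


# Toy stand-ins so the sketch runs without any model backend.
def toy_model_a(task: str) -> str:
    return "def add(a, b):\n    return a + b"


def toy_model_b(task: str) -> str:
    return "TODO"


def toy_judge(task: str, output: str) -> int:
    # Placeholder heuristic, not a real rubric.
    return 9 if "return" in output else 2


scores = run_and_score(
    {"model-a": toy_model_a, "model-b": toy_model_b},
    toy_judge,
    "Write add(a, b).",
)
print(scores)
```

Threads are enough here because the work is I/O-bound when the model functions are network or subprocess calls; swapping in real model clients only means replacing the toy functions.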