GLM 5.2's wall times were 3x longer than Fable's. It also stored ~10 GB of inference-thinking logs on my hard drive, compared to ~1 GB for GLM 5.1. Going by log volume, this suggests GLM 5.2 does far more thinking per run (roughly 10x) than its predecessor.
Fable stored only kilobytes, but Claude is known to hide its inference logs, so it's unclear whether there's an efficiency/inference gap between these models.
The GLM 5.2 run cost me ~US$3, while the Fable run cost me ~US$9.
GLM 5.2 is 3x cheaper but takes ~4x longer to generate results. This suggests price and speed are linked: one model is cheaper but slower, the other faster but costlier, possibly because of factors like datacenter hardware tier, availability, or service level. So, even though the results are on par, this doesn't mean GLM 5.2 made an efficiency leap. The two runs might have cost the same if GLM 5.2 had matched Fable's wall time, and if that's the case, Fable is far ahead.
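As a rough sanity check on that last point, here is a minimal sketch of the arithmetic. It assumes, purely for illustration, that price scales linearly with the speed-up needed to match Fable's wall time; that scaling rule and the function name `cost_at_matched_wall_time` are my own assumptions, not anything the providers publish. It uses the costs above and both slowdown figures quoted in this post (3x and ~4x).

```python
def cost_at_matched_wall_time(cost_usd: float, slowdown: float) -> float:
    """Hypothetical cost if the slower run were sped up to match the faster one.

    Assumption (not a measured fact): price scales linearly with the
    speed-up factor, so a run that is `slowdown` times slower would cost
    `slowdown` times more at matched wall time.
    """
    return cost_usd * slowdown


glm_cost_usd = 3.0    # ~US$3 for the GLM 5.2 run
fable_cost_usd = 9.0  # ~US$9 for the Fable run

for slowdown in (3.0, 4.0):
    adjusted = cost_at_matched_wall_time(glm_cost_usd, slowdown)
    print(
        f"slowdown {slowdown:.0f}x: GLM 5.2 adjusted ~US${adjusted:.0f} "
        f"vs Fable ~US${fable_cost_usd:.0f}"
    )
# slowdown 3x: GLM 5.2 adjusted ~US$9 vs Fable ~US$9
# slowdown 4x: GLM 5.2 adjusted ~US$12 vs Fable ~US$9
```

Under this toy model the two runs come out roughly even at a 3x slowdown, and Fable comes out ahead at 4x. With a single run per model, that is a plausibility check rather than a measurement.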
Nonetheless, this shows GLM 5.2 can work on production-grade codebases when paired with a SOTA model as a reviewer.