Narrow finetuning can produce broadly misaligned LLMs (emergent-misalignment.com)
Or, more pointedly, what about training the model in the first place? Why do you pretend that AIs are somehow "people" with a "natural tendency" that we're overriding?