An evaluation of frontier AI models: OpenAI's o1 was capable of scheming (apolloresearch.ai)
What the test actually showed is that, given two conflicting goals from two human instructors, the model attempted to resolve the conflict by following one set of instructions and subverting the other instructor's.
It’s a good demonstration of how these models behave and what could go wrong. It is not an example of volition or sentience.