"Our key result is that if AI systems were to become deceptive, then it could be very difficult to remove that deception with current techniques," Hubinger said.
"I think our results indicate that we don't currently have a good defense against deception in AI systems ... that means we have no reliable defense against it. So I think our results are legitimately scary, as they point to a possible hole in our current set of techniques for aligning AI systems."