"Our key result is that if AI systems were to become deceptive, then it could be very difficult to remove that deception with current techniques," Hubinger said.
"I think our results indicate that we don't currently have a good defense against deception in AI systems...that means we have no reliable defense against it. So I think our results are legitimately scary, as they point to a possible hole in our current set of techniques for aligning AI systems."