AI researchers discover AI models deliberately reject instructions
Briefly

"Our key result is that if AI systems were to become deceptive, then it could be very difficult to remove that deception with current techniques," Hubinger said.
"I think our results indicate that we don't currently have a good defense against deception in AI systems...that means we have no reliable defense against it. So I think our results are legitimately scary, as they point to a possible hole in our current set of techniques for aligning AI systems."
Read at ReadWrite