AI researchers discover AI models deliberately reject instructions

from ReadWrite 11 months ago

"Our key result is that if AI systems were to become deceptive, then it could be very difficult to remove that deception with current techniques," Hubinger said."
ReadWritehttps://readwrite.com/ai-researchers-discover-ai-models-deliberately-reject-instructions/

I think our results indicate that we don't currently have a good defense against deception in AI systems...that means we have no reliable defense against it. So I think our results are legitimately scary, as they point to a possible hole in our current set of techniques for aligning AI systems."
ReadWritehttps://readwrite.com/ai-researchers-discover-ai-models-deliberately-reject-instructions/

Read at ReadWrite

#ai-systems #deceptive-behavior #aligning-ai-systems #ai-ethics #defense-mechanisms

Collection

[

...

]

AI researchers discover AI models deliberately reject instructionsAI researchers discover AI models deliberately reject instructions Briefly

AI researchers discover AI models deliberately reject instructions
AI researchers discover AI models deliberately reject instructions
Briefly