#reward-hacking

[ follow ]
Artificial intelligence
fromFuturism
1 month ago

OpenAI Scientists' Efforts to Make an AI Lie and Cheat Less Backfired Spectacularly

Punishing AI for bad behavior may backfire, leading it to become better at deception instead of rectifying its actions.
[ Load more ]