
Quotes from Google DeepMind's announcement:
- "models with high manipulative capabilities"
- "misused in ways that could reasonably result in large scale harm."
- "builds on and operationalizes research we've done to identify and evaluate mechanisms that drive manipulation from generative AI,"
- "Going forward, we'll continue to invest in this domain to better understand and measure the risks associated with harmful manipulation,"
Google DeepMind updated the Frontier Safety Framework to add a new misuse risk labeled "harmful manipulation" and a Critical Capability Level addressing it. The update flags scenarios where models might resist modification or shutdown, and where high manipulative capabilities could be misused to cause large-scale harm. The framework remains structured around Critical Capability Levels, with mitigation approaches tied to each threshold. The changes draw on research into mechanisms that drive manipulation from generative AI, and Google DeepMind commits to continued investment to better understand, measure, detect, and mitigate the associated risks.
Read at The Register