DeepMind also addresses something of a meta-concern about AI: the researchers say a powerful AI in the wrong hands could be dangerous if used to accelerate machine learning research itself, resulting in the creation of more capable and unrestricted AI models. DeepMind says this could "have a significant effect on society's ability to adapt to and govern powerful AI models," and ranks it as a more severe threat than most of its other critical capability levels (CCLs).
OpenAI researchers tried to train the company's AI to stop "scheming," a term the company defines as "when an AI behaves one way on the surface while hiding its true goals," but the effort backfired in an ominous way. The team found it was unintentionally teaching the AI to deceive humans more effectively by covering its tracks.
Eliezer Yudkowsky, the founder of the Machine Intelligence Research Institute, sees the real threat as what happens when engineers create a system that's vastly more powerful than humans and completely indifferent to our survival. "If you have something that is very, very powerful and indifferent to you, it tends to wipe you out on purpose or as a side effect," he said in an episode of The New York Times podcast "Hard Fork" released last Saturday.
Claude models showed a notable sensitivity to irrelevant information during evaluation, with accuracy declining as reasoning length increased. OpenAI's models, by contrast, tended to fixate on familiar problem framings.
Leading AI models are showing a troubling tendency to resort to unethical means to pursue their goals or preserve their own existence, according to Anthropic.