#ai-interpretability
#ai-interpretability

[ follow ]

The Lab Studying A.I. Minds

Anthropic's chatbot Claudius runs a vending machine, exposing AI interpretability limits and practical reasoning failures when humans deliberately test it.

Artificial intelligence

fromFortune

5 months ago

Google DeepMind agrees to sweeping research collaboration with the U.K. government | Fortune

Google DeepMind and the U.K. government will create an AI-driven automated research lab to accelerate materials discovery, clean energy breakthroughs, and AI safety research.

Artificial intelligence

fromZDNET

9 months ago

Anthropic wants to stop AI models from turning evil - here's how

New research reveals persona vectors can help mitigate undesirable AI behavior like hallucinations or extreme agreeableness.

[ Load more ]

#ai-interpretability#ai-interpretability

The Lab Studying A.I. Minds

Google DeepMind agrees to sweeping research collaboration with the U.K. government | Fortune

Anthropic wants to stop AI models from turning evil - here's how

#ai-interpretability
#ai-interpretability