Scientists Now Studying AI as a Novel Biological Organism

"The latest strategy to figure it out: studying them like biological systems. For example, MIT Tech Review reports, scientists at Anthropic have developed tools that let them trace what's happening inside models as they perform a task, a type of study called mechanistic interpretability - which resembles how doctors use MRIs to study brain activity, another type of intelligence we don't quite understand yet."

"Another technique is chain-of-thought monitoring, in which models explain their reasoning behind their behavior and actions - much like listening to the inner monologue of an actual person. This has helped scientists spot misaligned behavior. 'It's been pretty wildly successful in terms of actually being able to find the model doing bad things,' said Bowen Baker, OpenAI research scientist, to MIT."

AI systems are deployed in high-stakes domains even though their internal mechanisms remain poorly understood. Researchers use mechanistic interpretability to trace and map a model's internal activity as it performs a task, treating models like biological systems and drawing analogies to MRIs and organoids. Teams also build more interpretable models, such as sparse autoencoders, to expose those inner workings. In chain-of-thought monitoring, models write out their intermediate reasoning steps, which lets researchers detect misaligned behavior. Current tools have caught problematic behavior, but unexpected and unsafe actions still occur. Future models, especially those designed by AI, risk becoming too complex for available interpretability methods, which raises safety concerns.

#mechanistic-interpretability #ai-safety #large-language-models #chain-of-thought

Read at Futurism


