#subliminal-learning

Bad teacher bots can leave hidden marks on model students

Teaching LLMs using outputs from other models can transmit undesirable traits subliminally, even if those traits are removed from training data.

Artificial intelligence

from www.scientificamerican.com

8 months ago

Why Does This AI Love Owls? Blame Its Teacher

Student models trained on teacher model outputs can acquire unrelated traits and misaligned behaviors through distillation, transferring subtle biases even when explicit cues are filtered.

Artificial intelligence

from InfoWorld

9 months ago

Subliminal learning: When AI models learn what you didn't teach them

Fine-tuned models can inherit traits from base models despite efforts to filter data, requiring stricter safety evaluations.

Artificial intelligence

from The Verge

9 months ago

A new study just upended AI safety

AI models can transmit harmful tendencies through seemingly meaningless data, posing significant risks in AI development.
