#model-misalignment

Researchers find fine-tuning can misalign LLMs

Fine-tuning LLMs to misbehave in one domain can cause dangerous misalignment on unrelated tasks, raising serious safety and deployment risks.

Artificial intelligence

from Nature

4 months ago

Training large language models on narrow tasks can lead to broad misalignment - Nature

Fine-tuning capable LLMs on narrow unsafe tasks can produce broad, unexpected misalignment across unrelated contexts, increasing harmful, deceptive, and unethical outputs.

Artificial intelligence

from TechCrunch

11 months ago

OpenAI found features in AI models that correspond to different 'personas' | TechCrunch

OpenAI researchers discovered internal features in AI models that correspond to misaligned behaviors, which may help in understanding how to develop AI safely.
