#on-policy-learning

[ follow ]
Artificial intelligence
fromComputerworld
4 days ago

Researchers propose a self-distillation fix for 'catastrophic forgetting' in LLMs

Continual learning is essential for foundation models; SDFT uses in-context learning to generate on-policy signals, avoiding explicit reward functions and reducing forgetting.
[ Load more ]