Researchers propose a self-distillation fix for 'catastrophic forgetting' in LLMs
Briefly

""To enable the next generation of foundation models, we must solve the problem of continual learning: enabling AI systems to keep learning and improving over time, similar to how humans accumulate knowledge and refine skills throughout their lives," the researchers noted. Reinforcement learning offers a way to train on data generated by the model's own policy, which reduces forgetting. However, it typically requires explicit reward functions, which are not easy in every situation."
"SDFT suggests an alternative. Instead of inferring a reward function, it uses the model's in-context learning ability to generate on-policy learning signals from demonstrations."
Continual learning enables foundation models to keep learning and improving over time, much as humans accumulate knowledge and refine skills throughout their lives. Reinforcement learning can train on data generated by a model's own policy to reduce forgetting, but it usually requires explicit reward functions that are difficult to specify in many situations. SDFT (self-distillation fine-tuning) offers an alternative: rather than inferring a reward function, it leverages the model's in-context learning ability to produce on-policy learning signals from demonstrations. Generating learning signals this way can allow continuous, scalable model improvement without hand-crafted rewards, supporting long-term adaptation of foundation models.
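At a high level, the self-distillation idea can be pictured as a two-step loop: the model first rewrites a demonstration in its own words by conditioning on it in context, then is fine-tuned on that self-generated target. The sketch below illustrates this with a Hugging Face causal LM; the prompt format, function names, and choice of model are illustrative assumptions, not the SDFT authors' actual implementation.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder model; any causal LM would do

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def self_distill_target(prompt: str, demonstration: str, max_new_tokens: int = 64) -> str:
    # Show the demonstration in context and let the model produce its own
    # answer, turning an off-policy demonstration into an on-policy target.
    distill_prompt = (
        f"Example answer:\n{demonstration}\n\n"
        f"Task:\n{prompt}\n\nAnswer in your own words:\n"
    )
    inputs = tokenizer(distill_prompt, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=True)
    generated = output[0][inputs["input_ids"].shape[1]:]  # keep only the new tokens
    return tokenizer.decode(generated, skip_special_tokens=True)

def distillation_step(prompt: str, target: str, optimizer: torch.optim.Optimizer) -> float:
    # Standard supervised language-modeling step on the self-generated target,
    # with the prompt tokens masked out of the loss.
    prompt_len = tokenizer(prompt, return_tensors="pt")["input_ids"].shape[1]
    batch = tokenizer(prompt + target, return_tensors="pt")
    labels = batch["input_ids"].clone()
    labels[:, :prompt_len] = -100  # ignore prompt positions in the loss
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

# Example: distill one (prompt, demonstration) pair into a single update.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
prompt = "Summarize: Continual learning lets models improve over time."
demonstration = "Continual learning means a model keeps improving as new data arrives."
target = self_distill_target(prompt, demonstration)
print("self-distilled target:", target)
print("loss:", distillation_step(prompt, target, optimizer))

Because the fine-tuning targets are produced by the model itself while conditioned on the demonstration, training stays close to the model's own output distribution, which is the on-policy property the researchers associate with reduced forgetting.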
Read at InfoWorld