#training-stability

Artificial intelligence
from InfoQ
13 hours ago

Kimi's K2 Open-Source Language Model Supports Dynamic Resource Availability and New Optimizer

Kimi K2 is a Mixture-of-Experts LLM (32B activated parameters, 1.04T total) trained on 15.5T tokens with the MuonClip optimizer to improve training stability.