MiniMax Releases M1: A 456B Hybrid-Attention Model for Long-Context Reasoning and Software Tasks
Briefly

MiniMax has launched MiniMax-M1, an open-weight language model with 456 billion total parameters built for long-context reasoning and efficient tool use. Building on its predecessor, MiniMax-Text-01, it pairs a hybrid Mixture-of-Experts architecture with "lightning attention," a linear-attention mechanism that keeps compute costs manageable at long context lengths. Trained with large-scale reinforcement learning, the model performs strongly across domains such as software engineering and mathematical problem-solving, and its benchmark results place it among the leading open-weight models, particularly on long-context tasks and reasoning-heavy math. The release also introduces CISPO, a new RL algorithm designed to improve training stability and performance (a sketch follows the key points below).
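Lightning attention is MiniMax's hardware-optimized take on linear attention, and the company describes M1's hybrid design as interleaving one conventional softmax-attention block after every seven lightning-attention blocks. As a rough illustration of why linear attention scales to long contexts, here is a minimal, non-causal PyTorch sketch; the `elu + 1` feature map, the tensor shapes, and the `period=8` layout helper are illustrative assumptions, not MiniMax's implementation:

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v):
    """Non-causal linear attention (illustrative sketch, not MiniMax's kernel).

    Avoids the n x n score matrix of softmax attention: K^T V is reduced
    to a single d x e summary, so cost grows linearly with sequence length n.
    Shapes: q, k are (batch, n, d); v is (batch, n, e).
    """
    q, k = F.elu(q) + 1, F.elu(k) + 1           # positive feature map (assumed)
    kv = torch.einsum("bnd,bne->bde", k, v)     # O(n) summary of keys/values
    z = torch.einsum("bnd,bd->bn", q, k.sum(dim=1)).clamp(min=1e-6)
    return torch.einsum("bnd,bde->bne", q, kv) / z.unsqueeze(-1)

def uses_softmax_attention(layer_idx: int, period: int = 8) -> bool:
    # Hybrid layout: one full softmax-attention block per `period` layers,
    # the rest lightning (linear) attention, per MiniMax's description.
    return (layer_idx + 1) % period == 0
```

The payoff is that the `kv` summary stays a fixed d x e matrix regardless of sequence length, whereas full softmax attention would need an n x n score matrix that dominates memory and compute at contexts in the hundreds of thousands of tokens.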
MiniMax-M1, optimized for long-context reasoning and tool use, combines 456 billion parameters with "lightning attention," a linear-attention mechanism that cuts the compute cost of processing long sequences.
With its hybrid Mixture-of-Experts architecture, MiniMax-M1 outperforms many competing open-weight models on long-context, software-engineering, and complex mathematical-reasoning benchmarks.
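CISPO (clipped importance-sampling-weight policy optimization) reportedly departs from PPO-style clipping by clipping the importance-sampling weight itself and detaching it from the gradient, so every generated token keeps contributing to the update. The following is a minimal sketch of that idea; the function name `cispo_loss`, the single-sided `eps_high` clip, and the plain token mean are illustrative choices, not MiniMax's training code:

```python
import torch

def cispo_loss(logp_new: torch.Tensor,
               logp_old: torch.Tensor,
               advantages: torch.Tensor,
               eps_high: float = 0.2) -> torch.Tensor:
    """Minimal CISPO-style policy-gradient loss (illustrative sketch).

    All tensors are per-token, shape (batch, seq_len). Unlike PPO's clipped
    surrogate, which zeroes gradients for tokens whose ratio is clipped,
    this clips the importance-sampling weight and stops its gradient, so
    every token still contributes a REINFORCE-style term through logp_new.
    """
    ratio = torch.exp(logp_new - logp_old)               # per-token IS weight
    w = torch.clamp(ratio, max=1.0 + eps_high).detach()  # clip + stop-gradient
    return -(w * advantages * logp_new).mean()           # negate for minimization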
Read at InfoQ