MiniMax Releases M1: A 456B Hybrid-Attention Model for Long-Context Reasoning and Software Tasks
Briefly

MiniMax has launched MiniMax-M1, an open-weight language model with 456 billion total parameters built for long-context reasoning and efficient tool use. Building on its predecessor, MiniMax-Text-01, it pairs a hybrid Mixture-of-Experts architecture with "lightning attention," a linear-attention mechanism that keeps compute costs manageable at long context lengths. Trained with large-scale reinforcement learning, the model performs strongly across domains such as software engineering and mathematical problem-solving, and its benchmark results place it among the leading open-weight models, particularly on long-context tasks and reasoning-heavy math. The release also introduces CISPO, a new RL algorithm designed to improve training stability and performance (a sketch follows the key points below).
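Lightning attention is MiniMax's hardware-optimized take on linear attention, and the company describes M1's hybrid design as interleaving one conventional softmax-attention block after every seven lightning-attention blocks. As a rough illustration of why linear attention scales to long contexts, here is a minimal, non-causal PyTorch sketch; the `elu + 1` feature map, the tensor shapes, and the `period=8` layout helper are illustrative assumptions, not MiniMax's implementation:

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v):
    """Non-causal linear attention (illustrative sketch, not MiniMax's kernel).

    Avoids the n x n score matrix of softmax attention: K^T V is reduced
    to a single d x e summary, so cost grows linearly with sequence length n.
    Shapes: q, k are (batch, n, d); v is (batch, n, e).
    """
    q, k = F.elu(q) + 1, F.elu(k) + 1           # positive feature map (assumed)
    kv = torch.einsum("bnd,bne->bde", k, v)     # O(n) summary of keys/values
    z = torch.einsum("bnd,bd->bn", q, k.sum(dim=1)).clamp(min=1e-6)
    return torch.einsum("bnd,bde->bne", q, kv) / z.unsqueeze(-1)

def uses_softmax_attention(layer_idx: int, period: int = 8) -> bool:
    # Hybrid layout: one full softmax-attention block per `period` layers,
    # the rest lightning (linear) attention, per MiniMax's description.
    return (layer_idx + 1) % period == 0
```

The payoff is that the `kv` summary stays a fixed d x e matrix regardless of sequence length, whereas full softmax attention would need an n x n score matrix that dominates memory and compute at contexts in the hundreds of thousands of tokens.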
MiniMax-M1, optimized for long-context reasoning and tool use, combines 456 billion parameters with "lightning attention," a linear-attention mechanism that cuts the compute cost of processing long sequences.
With its hybrid Mixture-of-Experts architecture, MiniMax-M1 outperforms many competing open-weight models on long-context, software-engineering, and complex mathematical-reasoning benchmarks.
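CISPO (clipped importance-sampling-weight policy optimization) reportedly departs from PPO-style clipping by clipping the importance-sampling weight itself and detaching it from the gradient, so every generated token keeps contributing to the update. The following is a minimal sketch of that idea; the function name `cispo_loss`, the single-sided `eps_high` clip, and the plain token mean are illustrative choices, not MiniMax's training code:

```python
import torch

def cispo_loss(logp_new: torch.Tensor,
               logp_old: torch.Tensor,
               advantages: torch.Tensor,
               eps_high: float = 0.2) -> torch.Tensor:
    """Minimal CISPO-style policy-gradient loss (illustrative sketch).

    All tensors are per-token, shape (batch, seq_len). Unlike PPO's clipped
    surrogate, which zeroes gradients for tokens whose ratio is clipped,
    this clips the importance-sampling weight and stops its gradient, so
    every token still contributes a REINFORCE-style term through logp_new.
    """
    ratio = torch.exp(logp_new - logp_old)               # per-token IS weight
    w = torch.clamp(ratio, max=1.0 + eps_high).detach()  # clip + stop-gradient
    return -(w * advantages * logp_new).mean()           # negate for minimization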
Read at InfoQ