How DeepSeek's new way to train advanced AI models could disrupt everything - again
"Just before the start of the new year, the AI world was introduced to a potential game-changing new method for training advanced models. A team of researchers from Chinese AI firm DeepSeek released a paper on Wednesday outlining what it called Manifold-Constrained Hyper-Connections, or m HC for short, which may provide a pathway for engineers to build and scale large language models without the huge computational costs that are typically required."
"DeepSeek leapt into the cultural spotlight one year ago with its release of R1, a model that rivaled the capabilities of OpenAI's o1 and that was reportedly trained at a fraction of the cost. The release came as a shock to US-based tech developers, because it showed that access to huge reserves of capital and computing resources wasn't necessarily required to train cutting-edge AI models."
DeepSeek introduced Manifold-Constrained Hyper-Connections (mHC) as a method intended to let engineers scale large language models at far lower computational cost, targeting a scalability bottleneck that has constrained large-model development. Its earlier R1 model reportedly matched competitors' capabilities while using a fraction of the compute, demonstrating that cost-efficient training is feasible. DeepSeek had postponed its planned R2 release until mid-2025, citing limited access to advanced AI chips and CEO Liang Wenfeng's concerns about R2's performance.
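The article doesn't describe mHC's internals, but the name builds on the hyper-connections idea: instead of a single residual stream, a network keeps several parallel copies of the hidden state and learns weights for mixing them. The PyTorch sketch below illustrates only that general concept, not DeepSeek's method; the HyperConnection class, the alpha/beta/gamma parameters, and the expand/collapse helpers are all hypothetical names chosen for this example.

```python
import torch
import torch.nn as nn


class HyperConnection(nn.Module):
    """Illustrative hyper-connection wrapper around one layer.

    Instead of a single residual update h <- h + f(h), we keep n parallel
    streams H (shape [n, batch, d]). Learned weights mix the streams into
    the layer input, and learned weights write the layer output back across
    the streams. All names here are hypothetical, not DeepSeek's API.
    """

    def __init__(self, layer: nn.Module, n_streams: int = 4):
        super().__init__()
        self.layer = layer
        self.n = n_streams
        # beta: how much each stream contributes to the layer input.
        self.beta = nn.Parameter(torch.ones(n_streams) / n_streams)
        # alpha: how strongly the layer output is added back to each stream.
        self.alpha = nn.Parameter(torch.ones(n_streams))
        # gamma: stream-to-stream mixing (identity init = plain residual).
        self.gamma = nn.Parameter(torch.eye(n_streams))

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: [n, batch, d]
        x = torch.einsum("n,nbd->bd", self.beta, streams)  # mix streams into one input
        y = self.layer(x)                                  # ordinary layer computation
        mixed = torch.einsum("nm,mbd->nbd", self.gamma, streams)  # width connections
        return mixed + self.alpha[:, None, None] * y       # depth connections


def expand(h: torch.Tensor, n: int) -> torch.Tensor:
    """Replicate the initial hidden state into n streams."""
    return h.unsqueeze(0).expand(n, *h.shape).contiguous()


def collapse(streams: torch.Tensor) -> torch.Tensor:
    """Average the streams back to a single hidden state at the network's end."""
    return streams.mean(dim=0)


# Usage: wrap a stand-in sublayer, run one wrapped step, collapse at the end.
block = nn.Linear(64, 64)
hc = HyperConnection(block, n_streams=4)
streams = expand(torch.randn(8, 64), n=4)
streams = hc(streams)
h_out = collapse(streams)  # shape [8, 64]
```

With gamma initialized to the identity and beta uniform, the wrapper starts out behaving like an ordinary residual connection; training can then learn nontrivial mixing across the streams.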
Read at ZDNET