
"In an article in Nature, DeepSeek revealed that the R1 training took place on a cluster of 512 Nvidia H800 chips and took a total of 80 hours. This is the first time DeepSeek has shared concrete figures about its training costs. By way of comparison, Sam Altman of OpenAI stated last year that training fundamental models cost more than $100 million, without providing further details."
"DeepSeek's claims raise questions, especially since the H800 chips were designed by Nvidia specifically for the Chinese market after Washington banned the export of the more powerful H100 and A100 chips. US sources previously claimed that DeepSeek had obtained large numbers of H100 chips. However, the company insisted that it had only used H800s. In an additional statement, DeepSeek admitted for the first time that it also owns A100 chips and used them during preliminary experiments with smaller models."
"The relatively low cost of R1 can be partly explained by the model distillation method. In this method, a new model learns from an existing system, so that less computing power is required. American AI experts suggested that DeepSeek may have deliberately copied models from OpenAI. However, the Chinese company emphasizes that distillation is a common technique that enables better performance at lower costs,"
DeepSeek trained the reasoning-focused R1 model for $294,000 on a 512-chip Nvidia H800 cluster over 80 hours, far below industry estimates that place foundational-model training in the tens to hundreds of millions of dollars. The company had earlier released lower-cost AI systems that unsettled markets and raised concerns about demand for incumbent hardware vendors. Nvidia designed the H800 for the Chinese market after US export restrictions barred the more powerful H100 and A100; DeepSeek also acknowledged owning A100s, which it used in earlier experiments with smaller models. The company used model distillation to reduce compute needs and rejects allegations that it copied OpenAI models.
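To make the distillation idea concrete, here is a minimal sketch of classic knowledge distillation in PyTorch. The tiny models, temperature, and mixing weight are illustrative assumptions for this sketch, not details of DeepSeek's actual pipeline: a frozen "teacher" produces softened output distributions, and a smaller "student" is trained to match them alongside the ordinary hard labels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical tiny models for illustration; real setups pair a large
# pretrained teacher with a much smaller student.
teacher = nn.Linear(16, 4)
student = nn.Linear(16, 4)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soften both distributions with temperature T; the KL term is
    # conventionally rescaled by T^2 to keep gradients comparable in size.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Standard cross-entropy against the hard ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

optimizer = torch.optim.SGD(student.parameters(), lr=0.1)
x = torch.randn(8, 16)              # dummy input batch
labels = torch.randint(0, 4, (8,))  # dummy hard labels

with torch.no_grad():               # the teacher is frozen, never updated
    teacher_logits = teacher(x)

loss = distillation_loss(student(x), teacher_logits, labels)
loss.backward()
optimizer.step()
```

The compute saving comes from the student's smaller size: only the cheap student is updated during training, and only the student is deployed, while the expensive teacher is queried once per batch for soft targets.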
Source: Techzine Global