DeepSeek didn't really train its flagship model for $294,000
Briefly

"The confusion stemmed from the supplementary information released alongside the original January paper, in which the AI model dev revealed it had used just 64 eight-way H800 boxes totaling 512 GPUs running at full tilt for 198 hours to train the preliminary R1-Zero release, and another 80 hours or so to complete it. Along with about 5,000 GPU hours to generate the supervised fine-tuning datasets used in the training process, the entire endeavor came out to a hair under $300,000"
"But, that's not actually what happened. Never mind the fact that $300,000 won't buy you anywhere close to 512 H800s (those estimates are based on GPU lease rates not actual hardware costs), the researchers aren't talking about end-to-end model training. Instead, it focuses on the application of reinforcement learning used to imbue its existing V3 base model with "reasoning" or "thinking" capabilities."
Reported compute usage covered 512 H800 GPUs running for roughly 278 hours in total (198 hours for the preliminary R1-Zero run plus about 80 more to complete the model), along with about 5,000 GPU-hours spent generating supervised fine-tuning datasets, for a reported compute bill just under $300,000. That figure covers only the reinforcement-learning fine-tuning phase applied to the existing V3 base model, not full pretraining or prior development, and it is based on GPU lease rates rather than hardware acquisition costs. End-to-end model development and pretraining consumed far more compute and capital, placing the actual total at roughly twenty times the narrowly reported figure.
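The arithmetic behind the reported figure can be sketched as follows. Note the per-GPU-hour lease rate is an assumption on my part (the article does not state it); roughly $2 per H800 GPU-hour is simply a plausible rental rate that makes the reported numbers line up.

```python
# Hedged sketch: reconstructing the reported ~$294,000 from lease-rate math.

GPUS = 512                 # 64 eight-way H800 boxes, per the paper's supplement
RL_HOURS = 198 + 80        # R1-Zero run plus the ~80-hour follow-up run
SFT_GPU_HOURS = 5_000      # GPU-hours to generate supervised fine-tuning data
LEASE_RATE_USD = 2.0       # assumed USD per GPU-hour (hypothetical, not sourced)

total_gpu_hours = GPUS * RL_HOURS + SFT_GPU_HOURS
estimated_cost = total_gpu_hours * LEASE_RATE_USD

print(f"{total_gpu_hours:,} GPU-hours ≈ ${estimated_cost:,.0f}")
# → 147,336 GPU-hours ≈ $294,672
```

Under that assumed rate the total lands within a few hundred dollars of the reported figure, which underlines the article's point: this is a rental-priced fine-tuning bill, not the cost of buying 512 H800s or of pretraining V3.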
Read at The Register