
"The most important feature of the new model is called DeepSeek Sparse Attention, an intricate system described in detail in the diagram below. In essence, the system uses a module called a "lightning indexer" to prioritize specific excerpts from the context window. After that, a separate system called a "fine-grained token selection system" chooses specific tokens from within those excerpts to load into the module's limited attention window."
"For long-context operations, the benefits of the system are significant. Preliminary testing by DeepSeek found that the price of a simple API call could be reduced by as much as half in long-context situations. Further testing will be required to build a more robust assessment, but because the model is open-weight and freely available on Hugging Face, it won't be long before third-party tests can assess the claims made in the paper."
"DeepSeek's new model is one of a string of recent breakthroughs tackling the problem of inference costs - essentially, the server costs of operating a pre-trained AI model, as distinct from the cost of training it. In DeepSeek's case, the researchers were looking for ways to make the fundamental transformer architecture operate more efficiently - and finding that there are significant improvements to be made."
DeepSeek released V3.2-exp, an experimental model designed to reduce inference costs for long-context operations. The model implements DeepSeek Sparse Attention, which uses a lightning indexer to prioritize excerpts from the context window and a fine-grained token selection system to load the chosen tokens into a limited attention window. Together, these mechanisms let the model process long contexts while keeping server load low. Preliminary tests indicated that API call costs can fall by as much as half in long-context scenarios. The model's weights are openly available on Hugging Face, and the accompanying paper is linked on GitHub. The work targets inference costs, that is, the server costs of running a pre-trained model as distinct from the cost of training it, and broader transformer efficiency improvements.
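To make the two-stage idea concrete, here is a minimal sketch of sparse attention with a coarse block-scoring pass followed by fine-grained token selection. Everything in it is illustrative: the function name, the mean-key block heuristic standing in for the learned lightning indexer, and the block/token budgets are assumptions for the sake of the example, not DeepSeek's actual design or parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(query, keys, values, block_size=64, top_blocks=4, top_tokens=128):
    """Toy sparse attention: score context blocks cheaply, keep the best
    blocks, then keep the best individual tokens within them."""
    n, d = keys.shape
    # Stage 1 (stand-in for the "lightning indexer"): a cheap per-block
    # relevance score. Here: query dotted with each block's mean key.
    # The real indexer is a learned module, not this heuristic.
    n_blocks = (n + block_size - 1) // block_size
    block_scores = np.array([
        query @ keys[b * block_size:(b + 1) * block_size].mean(axis=0)
        for b in range(n_blocks)
    ])
    keep_blocks = np.argsort(block_scores)[-top_blocks:]
    # Gather candidate token indices from the selected blocks only.
    candidates = np.concatenate([
        np.arange(b * block_size, min((b + 1) * block_size, n))
        for b in keep_blocks
    ])
    # Stage 2 (stand-in for fine-grained token selection): rank the
    # candidate tokens by exact query-key score and keep top_tokens.
    token_scores = keys[candidates] @ query
    keep = candidates[np.argsort(token_scores)[-top_tokens:]]
    # Standard attention, but only over the selected tokens, so the cost
    # scales with top_tokens rather than the full context length n.
    attn = softmax(keys[keep] @ query / np.sqrt(d))
    return attn @ values[keep]

# Usage: a 4096-token context attended to via only ~128 selected tokens.
rng = np.random.default_rng(0)
q = rng.standard_normal(64)
K = rng.standard_normal((4096, 64))
V = rng.standard_normal((4096, 64))
out = sparse_attention(q, K, V)
print(out.shape)  # (64,)
```

The point of the two stages is that the attention matmul, the expensive part, touches only top_tokens entries instead of the whole context, which is where the long-context cost savings would come from under these assumptions.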
Read at TechCrunch