#transformer-efficiency

DeepSeek releases 'sparse attention' model that cuts API costs in half | TechCrunch

DeepSeek's V3.2-exp model uses sparse attention, pairing a lightning indexer with fine-grained token selection, to substantially lower inference costs for long-context operations.
