#transformer-efficiency

Artificial intelligence
from TechCrunch
4 days ago

DeepSeek releases 'sparse attention' model that cuts API costs in half

DeepSeek's V3.2-exp model uses sparse attention, combining a lightning indexer with fine-grained token selection, to lower inference costs for long-context operations.