Tony Lewis, CTO of BrainChip, presented research on state space models (SSMs) at the 2025 Embedded Vision Summit. SSMs offer low-power large language model (LLM) capabilities by sidestepping the context-handling constraints of transformer models: because an SSM's output depends only on its current state, which fixed matrices update as each token arrives, memory usage stays small. BrainChip's TENN model exemplifies this approach, running a 1-billion-parameter model from read-only memory at under 0.5 watts. The Markov property enables efficient resource use in low-power environments, improving CPU cache utilization and lowering costs.
One cool thing about the state space model is that the actual cache used is incredibly small. In a transformer-based model, by contrast, you don't have a compact state; what you have to remember is a representation of everything that has come before.
SSMs use matrices to generate outputs based only on the current state and the latest token, meaning the entire history is summarized by that state.
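As a rough sketch of that idea (not BrainChip's implementation; the dimensions and matrices A, B, and C below are placeholder values chosen only to show the shape of the recurrence), a discrete-time SSM keeps a fixed-size hidden state and folds each new token into it:

import numpy as np

# Minimal discrete-time state space recurrence (illustrative only).
# The hidden state x has a fixed size regardless of sequence length,
# so each output depends only on x and the current input token.
STATE_DIM, INPUT_DIM, OUTPUT_DIM = 16, 8, 8

rng = np.random.default_rng(0)
A = rng.standard_normal((STATE_DIM, STATE_DIM)) * 0.1   # state-transition matrix
B = rng.standard_normal((STATE_DIM, INPUT_DIM)) * 0.1   # input-projection matrix
C = rng.standard_normal((OUTPUT_DIM, STATE_DIM)) * 0.1  # output-projection matrix

def step(x, u):
    """Advance one token: x' = A x + B u, y = C x'."""
    x_next = A @ x + B @ u
    return x_next, C @ x_next

x = np.zeros(STATE_DIM)                  # fixed-size state; it never grows
for token_embedding in rng.standard_normal((100, INPUT_DIM)):
    x, y = step(x, token_embedding)      # all history is folded into x

However long the input runs, the only thing carried forward is x, which is what keeps the working set small enough to stay in cache.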
BrainChip's TENN model can run from read-only flash memory at under 0.5 watts of power consumption while producing results in under 100 ms.
The memoryless nature of state space models allows for better utilization of CPU cache and reduced memory paging, decreasing device power consumption and increasing cost-efficiency.
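A back-of-the-envelope comparison makes the point; the layer counts and dimensions here are assumed for illustration and are not TENN's actual configuration:

# Memory comparison with assumed, illustrative sizes (not TENN's real configuration).
LAYERS, HEADS, HEAD_DIM, STATE_DIM = 24, 16, 64, 1024
BYTES_PER_VALUE = 2  # fp16

def transformer_kv_cache_bytes(context_len: int) -> int:
    # Keys and values are stored for every past token in every layer,
    # so the cache grows linearly with context length.
    return context_len * LAYERS * HEADS * HEAD_DIM * 2 * BYTES_PER_VALUE

def ssm_state_bytes() -> int:
    # One fixed-size state per layer, independent of context length.
    return LAYERS * STATE_DIM * BYTES_PER_VALUE

for n in (1_000, 10_000, 100_000):
    kv_mb = transformer_kv_cache_bytes(n) / 1e6
    state_kb = ssm_state_bytes() / 1e3
    print(f"{n:>7} tokens: KV cache ~ {kv_mb:8.1f} MB, SSM state ~ {state_kb:.1f} KB")

Under these assumptions the transformer's key/value cache reaches hundreds of megabytes at long context lengths, while the SSM's state stays at a few tens of kilobytes no matter how much text has been processed.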