Hardware-Aware Algorithm for Selective State Space Models | HackerNoon
Briefly

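Without input-dependent selectivity, SSMs can be efficiently implemented as a convolution. Selectivity makes the SSM parameters vary with the input, which rules out this convolutional shortcut and requires a recurrent scan instead. We describe how we use kernel fusion and recomputation to make this selective SSM scan fast and memory-efficient.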
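The toy sketch below illustrates why convolution and scan differ. It uses a hypothetical scalar-state SSM, not the authors' implementation. With fixed parameters, the recurrence h_t = a·h_{t-1} + b·x_t, y_t = c·h_t equals a causal convolution with kernel K_k = c·a^k·b. Once the decay depends on the input, there is no single kernel, so the model is evaluated as a scan over (a_t, u_t) pairs with an associative combine operator. The input-dependent step size Δ_t = softplus(w·x_t) and a_t = exp(Δ_t·A) are assumptions chosen for illustration. Kernel fusion and recomputation concern GPU memory traffic and are not shown here.

```python
import math


def recurrent_lti(x, a, b, c):
    """LTI recurrence with fixed a, b, c: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t."""
    h, ys = 0.0, []
    for xt in x:
        h = a * h + b * xt
        ys.append(c * h)
    return ys


def convolutional_lti(x, a, b, c):
    """The same LTI model as a causal convolution with kernel K_k = c * a**k * b."""
    L = len(x)
    K = [c * (a ** k) * b for k in range(L)]
    return [sum(K[k] * x[t - k] for k in range(t + 1)) for t in range(L)]


def combine(left, right):
    """Associative operator on maps h -> a*h + u: apply `left` first, then `right`."""
    a1, u1 = left
    a2, u2 = right
    return (a2 * a1, a2 * u1 + u2)


def selective_scan(x, w_delta, A, b, c):
    """Hypothetical selective SSM: delta_t = softplus(w_delta * x_t) depends on the
    input, so a_t = exp(delta_t * A) changes every step and no fixed convolution
    kernel exists. Evaluate it as a prefix scan over (a_t, u_t) pairs instead."""
    elems = []
    for xt in x:
        delta = math.log1p(math.exp(w_delta * xt))  # softplus keeps delta > 0
        elems.append((math.exp(delta * A), delta * b * xt))
    ys, acc = [], (1.0, 0.0)  # (1, 0) is the identity element of `combine`
    for e in elems:
        # Sequential prefix scan; because `combine` is associative, a GPU kernel
        # could compute the same prefixes with a parallel scan instead.
        acc = combine(acc, e)
        ys.append(c * acc[1])  # acc[1] equals h_t because h_0 = 0
    return ys
```

For example, `recurrent_lti(x, 0.9, 0.5, 2.0)` and `convolutional_lti(x, 0.9, 0.5, 2.0)` agree up to floating-point rounding, while `selective_scan` has no convolutional counterpart. The associativity of `combine` is what allows the scan to be parallelized rather than run strictly step by step.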
Training foundation models with selective SSMs requires efficiency on modern hardware. We evaluate the speed of our scan implementation, showing that it is up to 7× faster than attention at sequence length 32K.