A key weakness of subquadratic-time architectures such as SSMs is their inability to perform content-based reasoning, which limits their effectiveness on modalities such as language.
Allowing the SSM parameters to be functions of the input lets the model selectively propagate or forget information along the sequence depending on the current token, which improves performance on discrete modalities such as text.
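To make the mechanism concrete, here is a minimal sketch (not the paper's implementation) of a selective SSM recurrence in JAX. The diagonal state matrix A and the projections W_delta, W_B, W_C are hypothetical names introduced only for illustration; they compute the discretization step and the B/C matrices from each token, which is what makes the update input-dependent.

```python
import jax
import jax.numpy as jnp

def selective_ssm(x, A, W_delta, W_B, W_C):
    """Selective SSM over a token sequence x of shape (L, D). Sketch only.

    A       : (D, N)  fixed per-channel diagonal state matrix (typically negative)
    W_delta : (D, D)  makes the discretization step a function of the token
    W_B     : (D, N)  makes the input matrix B a function of the token
    W_C     : (D, N)  makes the output matrix C a function of the token
    """
    def step(h, x_t):                                  # h: (D, N), x_t: (D,)
        delta = jax.nn.softplus(x_t @ W_delta)         # (D,) input-dependent step size
        B_t = x_t @ W_B                                # (N,) input-dependent input matrix
        C_t = x_t @ W_C                                # (N,) input-dependent output matrix
        A_bar = jnp.exp(delta[:, None] * A)            # (D, N) discretized transition
        h = A_bar * h + (delta * x_t)[:, None] * B_t   # selectively keep / overwrite state
        y_t = h @ C_t                                  # (D,) per-channel read-out
        return h, y_t

    h0 = jnp.zeros_like(A)
    _, y = jax.lax.scan(step, h0, x)
    return y                                           # (L, D)
```

Because delta, B_t, and C_t are recomputed at every token, the recurrence can emphasize or discard state based on the content of the input rather than applying a fixed, time-invariant filter.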
Although input-dependent parameters preclude the use of efficient convolutions, we developed a hardware-aware parallel algorithm that computes the selective SSM efficiently in recurrent mode.
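The paper's kernel fuses this computation on-chip; purely as a structural illustration (names and shapes here are assumptions, not the reference implementation), the sketch below shows why the selective recurrence still parallelizes: h_t = A_bar_t * h_{t-1} + Bx_t is a first-order linear recurrence, so it can be evaluated with an associative scan in logarithmic depth.

```python
import jax
import jax.numpy as jnp

def parallel_selective_scan(A_bar, Bx):
    """Evaluate h_t = A_bar[t] * h_{t-1} + Bx[t] for all t with a parallel scan.

    A_bar : (L, D, N) per-token discretized transitions (already input-dependent)
    Bx    : (L, D, N) per-token input contributions, e.g. delta_t * B_t * x_t
    """
    def combine(left, right):
        a_l, b_l = left
        a_r, b_r = right
        # Composing h -> a_l*h + b_l with h -> a_r*h + b_r gives
        # h -> (a_r*a_l)*h + (a_r*b_l + b_r); the operator is associative,
        # so the whole recurrence has O(log L) parallel depth.
        return a_r * a_l, a_r * b_l + b_r

    _, h = jax.lax.associative_scan(combine, (A_bar, Bx))
    return h   # (L, D, N): hidden state after every token
```

The released implementation additionally keeps the expanded state in fast on-chip memory rather than materializing it in main GPU memory, which is where much of the practical speedup comes from.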
Our empirical evaluations on tasks ranging from language modeling to audio generation demonstrate substantial improvements in both memory efficiency and computational speed.
#machine-learning #state-space-models #deep-learning #transformer-architecture #computational-efficiency