Hawk and Griffin: Mastering Long-Context Extrapolation in AI | HackerNoon
Briefly

In this section, we examine how effectively Hawk and Griffin use longer contexts to improve their next-token predictions, and investigate their ability to extrapolate to sequences longer than those seen in training. We demonstrate that the recurrent models, Hawk and Griffin in particular, scale more efficiently than their transformer counterparts, delivering comparable performance while using fewer computational resources.
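Length extrapolation of this kind is commonly measured by averaging the per-token next-token loss at each position and comparing positions inside versus beyond the training context. The sketch below illustrates that bucketing procedure with synthetic numbers; the function names and loss values are illustrative assumptions, not taken from the article.

```python
# Hypothetical sketch of a length-extrapolation measurement:
# average per-token loss in consecutive position buckets, then
# compare early buckets (short context) with late ones (long context).
# All names and numbers here are illustrative, not from the article.

def mean(xs):
    return sum(xs) / len(xs)

def loss_by_bucket(per_token_losses, bucket_size):
    """Average per-token loss over consecutive position buckets."""
    return [
        mean(per_token_losses[start:start + bucket_size])
        for start in range(0, len(per_token_losses), bucket_size)
    ]

# Synthetic losses: a model that exploits longer context shows
# lower loss at later token positions.
losses = [3.0 - 0.0001 * i for i in range(4096)]
buckets = loss_by_bucket(losses, 1024)
assert buckets[0] > buckets[-1]  # more context -> lower average loss
```

A model that fails to extrapolate would instead show the bucketed loss flattening, or rising, once positions exceed the training context length.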