Although finetuning yields clear performance gains, intrinsic limitations remain: at 1.3B parameters trained on 7B tokens, the model cannot handle complex tasks as capably as larger models.
Our model is also sensitive to prompt length: longer prompts degrade performance, as the model misinterprets or forgets critical parts of the instruction. This is likely because the exercises used for finetuning consist mainly of short prompts.
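To make this failure mode concrete, the sketch below contrasts a short, exercise-style prompt with the same task buried under padded context. It assumes a Hugging Face-style causal LM interface; the checkpoint name and the specific prompts are illustrative placeholders, not artifacts from our evaluation.

```python
# Illustrative sketch of prompt-length sensitivity; checkpoint name is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "our-1.3b-finetuned"  # hypothetical; substitute the actual checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Short prompt, close in style to the finetuning exercises: usually completed correctly.
short_prompt = '''def is_palindrome(s: str) -> bool:
    """Return True if s reads the same forwards and backwards."""
'''

# The same task preceded by long, loosely related context: the model tends to
# misinterpret or forget the actual instruction buried at the end.
padding = (
    "# Utility module for string processing.\n"
    "# It should handle Unicode, ignore case where noted, and be well tested.\n"
) * 20  # repeat context to inflate prompt length
long_prompt = padding + short_prompt

for name, prompt in [("short", short_prompt), ("long", long_prompt)]:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    # Decode only the newly generated tokens, not the prompt itself.
    completion = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:])
    print(f"--- {name} prompt ---\n{completion}\n")
```

Under this setup, the short prompt typically elicits a correct function body, while the padded prompt is more likely to produce a completion that ignores or distorts the requested behavior.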