We curate the data mixture to cover diverse domains that require reasoning and apply a rejection sampling procedure to improve data quality. We then rewrite the QwQ traces with GPT-4o-mini into a well-formatted version, inspired by Still-2, to further improve quality and ease parsing.
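A minimal sketch of what such a pipeline could look like is below. The function names, the sample schema (`trace`, `answer`, `ground_truth`), and the rewrite prompt are illustrative assumptions, not NovaSky's actual implementation; only the call to GPT-4o-mini via the OpenAI API reflects the described approach.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical rewrite prompt; the actual prompt used by NovaSky is not public here.
REWRITE_PROMPT = (
    "Rewrite the following reasoning trace into a clean, well-formatted "
    "solution. Keep every reasoning step but normalize the structure:\n\n{trace}"
)

def rejection_sample(samples):
    """Keep only traces whose final answer matches the ground truth.

    `samples` is a list of dicts with 'trace', 'answer', and 'ground_truth'
    keys -- an assumed schema for illustration.
    """
    return [s for s in samples if s["answer"].strip() == s["ground_truth"].strip()]

def rewrite_trace(trace: str) -> str:
    """Ask GPT-4o-mini to reformat a raw QwQ trace into a parseable version."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": REWRITE_PROMPT.format(trace=trace)}],
        temperature=0.0,  # deterministic rewriting
    )
    return response.choices[0].message.content

def build_training_set(samples):
    """Filter by correctness, then rewrite the surviving traces."""
    return [
        {**s, "trace": rewrite_trace(s["trace"])}
        for s in rejection_sample(samples)
    ]
```

Filtering before rewriting keeps the (paid) GPT-4o-mini calls limited to traces that will actually be used for training.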
Still, the fact that Sky-T1 could be built so quickly demonstrates that high-level reasoning capabilities can be replicated affordably and efficiently.
The model performed at or above o1-preview's level on math and coding benchmarks but did not surpass o1 on the graduate-level benchmark GPQA-Diamond, which includes more advanced physics-related questions.
NovaSky open-sourced all components of the project, including the model weights, training data, infrastructure code, and technical details.