[W]e find that central to the performance of our framework is the ability to steer the curation process towards the distribution of smaller, well-curated datasets...Crucially, we find this process [enables] strong data quality bootstrapping: a reference model trained on a small curated dataset can effectively guide the curation of a much larger dataset, allowing the training of a model which strongly surpasses the quality of the reference model on many downstream tasks.
JEST is applied online during training: it scores candidate sub-batches of data by their learnability (the learner's loss minus a reference model's loss on the batch) and selects the most learnable ones. Selecting examples jointly in this way enhances batch quality, akin to mining hard negatives.
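A minimal sketch of learnability-based selection, assuming per-example losses from both models are available; the function names and the toy loss values are hypothetical, and this simplified version scores examples independently rather than jointly as JEST does:

```python
import numpy as np

def learnability_scores(learner_losses, ref_losses):
    # Learnability: high loss under the current learner but low loss under
    # the reference model suggests the data is learnable yet not learned.
    return np.asarray(learner_losses) - np.asarray(ref_losses)

def select_batch(learner_losses, ref_losses, batch_size):
    # Keep the examples with the highest learnability from a larger super-batch.
    scores = learnability_scores(learner_losses, ref_losses)
    return np.argsort(scores)[::-1][:batch_size]

# Hypothetical per-example losses for a super-batch of 8 examples.
learner = [2.1, 0.3, 1.8, 0.9, 2.5, 0.4, 1.2, 2.0]
ref     = [0.5, 0.2, 1.9, 0.3, 0.6, 0.5, 1.1, 0.4]
idx = select_batch(learner, ref, batch_size=3)
```

Examples the reference model already fits well but the learner still gets wrong receive the highest scores, which is how a reference model trained on a small curated set can steer training on a much larger pool.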