Weakly-supervised contrastive pre-training is crucial to the effectiveness of modern text embedding models, as it allows them to learn meaningful semantic representations without manually labeled data.
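To make the objective concrete, below is a minimal sketch of an InfoNCE-style contrastive loss with in-batch negatives, the kind of objective typically used in this pre-training stage. The function name, temperature value, and tensor shapes are illustrative assumptions, not the exact recipe of any particular model.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb: torch.Tensor, pos_emb: torch.Tensor, temperature: float = 0.05):
    """In-batch-negative InfoNCE loss over L2-normalized embeddings.

    query_emb, pos_emb: [batch, dim] embeddings of the two views of each example.
    For a given query, every other positive in the batch acts as a negative.
    """
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(pos_emb, dim=-1)
    logits = q @ p.T / temperature                       # [batch, batch] similarity matrix
    labels = torch.arange(q.size(0), device=q.device)    # diagonal entries are the positives
    return F.cross_entropy(logits, labels)
```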
Contriever uses randomly cropped spans from the same document as positive pairs for pre-training, laying the foundation for subsequent models such as E5 and BGE, which improve on this recipe with more careful data selection.
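As a rough illustration of this positive-pair construction, the sketch below samples two independent spans from one document so that the two views share the document's content but differ in surface form. The function name, span-length bounds, and token-level cropping granularity are assumptions for illustration, not Contriever's exact procedure.

```python
import random

def random_crop_pair(tokens: list[str], min_len: int = 16, max_len: int = 64):
    """Sample two (possibly overlapping) spans from one document as a positive pair."""
    lo = min(min_len, len(tokens))   # guard against documents shorter than min_len
    hi = min(max_len, len(tokens))

    def sample_span():
        span_len = random.randint(lo, hi)
        start = random.randint(0, len(tokens) - span_len)
        return tokens[start:start + span_len]

    return sample_span(), sample_span()
```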
Collection