MS MARCO Web Search is a large-scale dataset of web pages and corresponding query-document pairs sourced from a commercial search engine. The data is rich and high quality, and it is designed to address challenges in machine learning and information retrieval. The dataset defines three benchmark tasks intended to support research, development, and testing of methods for web-scale information retrieval systems.
MS MARCO Web Search is the first web dataset to be large, real, and rich in data quality. It consists of large-scale web pages and query-document labels sourced from a commercial search engine, and it retains the rich web page information that is widely used in industry.
The MS MARCO Web Search retrieval benchmark comprises three challenging tasks that require innovation in both machine learning and information retrieval systems research. We hope MS MARCO Web Search can serve as a benchmark for modern web-scale information retrieval.