PyTerrier Data Repository

About This Site

This site is a repository of indices for PyTerrier and Terrier.

Vaswani

Last Update: 2021-09-28 (5 index variants)

The Vaswani NPL corpus (created 1990) is a small test collection of 11,000 abstracts that has been used by the Glasgow IR group for many years. Because of its small size, it backs many of the test cases in both Terrier and PyTerrier.

More details →
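As a rough illustration of how one of these prebuilt indices might be used from PyTerrier, the sketch below wraps `pt.BatchRetrieve.from_dataset` in a small helper. It assumes a 0.x-era PyTerrier release in which that call and `pt.started()` are available, a working Java installation, and that the Vaswani entry offers a variant named `terrier_stemmed`. The helper name `build_retriever` and its defaults are hypothetical and are not part of PyTerrier.

```python
# Assumed defaults: check the dataset's detail page for the variants it actually offers.
DEFAULT_DATASET = "vaswani"
DEFAULT_VARIANT = "terrier_stemmed"


def build_retriever(dataset=DEFAULT_DATASET, variant=DEFAULT_VARIANT, wmodel="BM25"):
    """Return a retriever backed by a prebuilt index from the repository.

    Hypothetical helper: the index is downloaded on first use, so nothing
    touches the network until this function is actually called.
    """
    # Imported lazily because PyTerrier needs Java and is a heavy dependency.
    import pyterrier as pt

    # Older PyTerrier releases require the JVM to be started explicitly.
    if not pt.started():
        pt.init()

    # Fetch the named index variant and wrap it with the chosen weighting model.
    return pt.BatchRetrieve.from_dataset(dataset, variant, wmodel=wmodel)
```

Calling `build_retriever().search("chemical reactions")` would then be expected to return a ranked results frame for that query, with the first call spending most of its time downloading the index.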

MSMARCO Document Ranking

Last Update: 2021-09-27 (5 index variants)

A document ranking corpus containing 3.2 million documents. Also used by the TREC Deep Learning track.

More details →

MSMARCO Passage Ranking

Last Update: 2021-09-29 (7 index variants)

A passage ranking task based on a corpus of 8.8 million passages released by Microsoft, in which passages are ranked by their relevance to questions. Also used by the TREC Deep Learning track.

More details →

MSMARCO v2 Document Ranking

Last Update: 2021-07-05 (2 index variants)

A new version of the MSMARCO document ranking corpus, containing 11.9 million documents. Also used by the TREC 2021 Deep Learning track.

More details →

MSMARCO v2 Passage Ranking

Last Update: 2021-08-08 (2 index variants)

A revised corpus of 138 million passages released by Microsoft in July 2021, in which passages are ranked by their relevance to questions. Also used by the TREC 2021 Deep Learning track.

More details →

TREC COVID

Last Update: 2021-10-01 (8 index variants)

A collection of scientific articles related to COVID-19. The indices use the 2020-07-16 version of CORD-19, the version used by the complete TREC COVID benchmark.

More details →