This notebook demonstrates retrieval using PyTerrier on the Vaswani corpus.
About the corpus: the Vaswani NPL corpus is a small test collection of 11,000 abstracts that has been used by the Glasgow IR group for many years (it was created in 1990). Because of its small size, it is used in many test cases in both Terrier and PyTerrier.
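# Uncomment to install PyTerrier (a Java installation is also required)
#!pip install -q python-terrier
import pyterrier as pt
if not pt.started():
    pt.init()  # start the JVM that runs Terrier
from pyterrier.measures import *  # evaluation measure objects such as MAP and nDCG
# Vaswani dataset: topics (queries), qrels (relevance judgments) and prebuilt indices
dataset = pt.get_dataset('vaswani')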
# BM25 retrieval over the prebuilt 'terrier_stemmed' index variant
bm25_terrier_stemmed = pt.BatchRetrieve.from_dataset('vaswani', 'terrier_stemmed', wmodel='BM25')
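For intuition about what `wmodel='BM25'` ranks by, here is a minimal sketch of the textbook Okapi BM25 formula on whitespace-tokenized toy documents. It assumes the common defaults `k1=1.2` and `b=0.75`; the function `bm25_scores` is hypothetical, and Terrier's own BM25 may differ in details such as the IDF variant and query-term weighting, so treat this as illustrative rather than a reimplementation.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against a tokenized query with Okapi BM25.

    Illustrative sketch only; not Terrier's implementation.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # document frequency: number of documents containing each term
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Robertson-Sparck Jones style IDF; the "1 +" keeps it positive
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # saturating term frequency with document-length normalization
            norm_tf = tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            )
            score += idf * norm_tf
        scores.append(score)
    return scores


docs = [
    "electron beam focusing in magnetic fields".split(),
    "magnetic field measurement with coils".split(),
    "transistor amplifier noise".split(),
]
print(bm25_scores("magnetic fields".split(), docs))
```

The first document matches both query terms and scores highest; the second matches only "magnetic" (its "field" is not stemmed to "fields" here, which is exactly the mismatch a stemmed index avoids).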
# DPH (a Divergence From Randomness model) over the same index
dph_terrier_stemmed = pt.BatchRetrieve.from_dataset('vaswani', 'terrier_stemmed', wmodel='DPH')
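DPH is a parameter-free model from the Divergence From Randomness (DFR) framework, so unlike BM25 it has no `k1` or `b` to tune. The sketch below follows the per-term DPH formula as it is commonly given for Terrier's DPH weighting model (base-2 logarithms, collection term frequency F, N documents). The helper names `dph_term_score` and `dph_score` and the collection statistics in the usage lines are hypothetical; treat the result as an approximation for intuition, not a drop-in replacement for `wmodel='DPH'`.

```python
import math


def dph_term_score(tf, doc_len, avg_doc_len, n_docs, coll_freq, qtf=1.0):
    """DPH weight of one query term in one document (sketch of the DFR DPH model).

    tf: term frequency in the document; doc_len: document length in tokens;
    avg_doc_len: mean document length; n_docs: number of documents (N);
    coll_freq: total occurrences of the term in the collection (F);
    qtf: weight of the term in the query.
    """
    if tf <= 0:
        return 0.0
    f = tf / doc_len  # relative frequency of the term within the document
    if f >= 1.0:
        # the (1 - f)^2 normalisation is zero and log2(0) is undefined
        return 0.0
    norm = (1.0 - f) ** 2 / (tf + 1.0)
    return qtf * norm * (
        tf * math.log2((tf * avg_doc_len / doc_len) * (n_docs / coll_freq))
        + 0.5 * math.log2(2.0 * math.pi * tf * (1.0 - f))
    )


def dph_score(query_terms, doc_tf, doc_len, avg_doc_len, n_docs, coll_freqs):
    """Sum DPH term weights over query terms that have collection statistics."""
    return sum(
        dph_term_score(doc_tf.get(t, 0), doc_len, avg_doc_len, n_docs, coll_freqs[t])
        for t in query_terms
        if t in coll_freqs
    )


# Hypothetical statistics, loosely sized like Vaswani (about 11,000 documents)
stats = dict(avg_doc_len=40.0, n_docs=11000, coll_freqs={"magnetic": 300, "field": 900})
print(dph_score(["magnetic", "field"], {"magnetic": 2, "field": 1}, 50, **stats))
```

As with BM25, rarer terms (smaller F relative to N) contribute more weight, but the only inputs are collection statistics, which is what makes DPH parameter-free.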
# DPH retrieval, Bo1 query expansion from the top-ranked documents, then DPH again on the expanded query
dph_bo1_terrier_stemmed = dph_terrier_stemmed >> pt.rewrite.Bo1QueryExpansion(dataset.get_index('terrier_stemmed')) >> dph_terrier_stemmed
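The Bo1 rewriter in the third pipeline is a pseudo-relevance feedback step: it takes the top-ranked documents from the first DPH run, weights their terms with the Bose-Einstein (Bo1) model, and adds the best terms to the query for the second DPH run. The sketch below illustrates only the term-weighting step on toy data. It assumes the commonly cited Bo1 formula w(t) = tf_x * log2((1 + P_n) / P_n) + log2(1 + P_n) with P_n = F / N, and a default of 10 expansion terms from 3 feedback documents, which is assumed to match PyTerrier's defaults. The function `bo1_expansion_terms` and all statistics are hypothetical, and Terrier may normalize or reweight the terms differently.

```python
import math
from collections import Counter


def bo1_expansion_terms(feedback_docs, coll_freqs, n_docs, fb_terms=10):
    """Rank candidate expansion terms from tokenized feedback documents with Bo1.

    feedback_docs: token lists of the top-ranked documents (e.g. the top 3);
    coll_freqs: collection frequency F of each term; n_docs: number of documents N.
    Returns (term, weight) pairs, highest weight first.
    """
    # tf_x: frequency of each term across the whole feedback set
    tf_x = Counter(t for doc in feedback_docs for t in doc)
    weights = {}
    for term, tfx in tf_x.items():
        p_n = coll_freqs[term] / n_docs  # mean frequency of the term per document
        weights[term] = tfx * math.log2((1 + p_n) / p_n) + math.log2(1 + p_n)
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:fb_terms]


feedback = [
    "magnetic field coils".split(),
    "magnetic resonance field".split(),
    "field strength of coils".split(),
]
# Hypothetical collection frequencies for a collection of 11,000 documents
cf = {"magnetic": 300, "field": 900, "coils": 40,
      "resonance": 60, "strength": 500, "of": 20000}
print(bo1_expansion_terms(feedback, cf, n_docs=11000, fb_terms=3))
```

Terms that are frequent in the feedback documents but rare in the collection ("coils" here) rank highest, while a very common word such as "of" gets a low weight.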
# Compare the three pipelines by MAP on the Vaswani topics and qrels
pt.Experiment(
    [bm25_terrier_stemmed, dph_terrier_stemmed, dph_bo1_terrier_stemmed],
    dataset.get_topics(),
    dataset.get_qrels(),
    batch_size=200,        # retrieve the topics in batches of 200
    filter_by_qrels=True,  # drop topics that have no relevance judgments
    eval_metrics=['map'],
    names=['bm25_terrier_stemmed', 'dph_terrier_stemmed', 'dph_bo1_terrier_stemmed'])
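`eval_metrics=['map']` reports mean average precision (MAP). As a minimal sketch of what that number means (not how PyTerrier computes it, since PyTerrier delegates evaluation to an external evaluation library), the code below computes average precision per query from a ranked list of docnos and a set of relevant docnos, then averages over queries. Only queries with judgments are averaged, similar in spirit to `filter_by_qrels=True`. The helper names, query IDs and docnos are hypothetical.

```python
def average_precision(ranking, relevant):
    """AP for one query: mean of precision@k at each rank k holding a relevant doc."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, docno in enumerate(ranking, start=1):
        if docno in relevant:
            hits += 1
            precision_sum += hits / k
    # divide by all relevant docs, so unretrieved relevant docs contribute zero
    return precision_sum / len(relevant)


def mean_average_precision(run, qrels):
    """run: qid -> ranked docnos; qrels: qid -> set of relevant docnos.

    Queries without judgments are skipped before averaging.
    """
    qids = [qid for qid in run if qid in qrels]
    if not qids:
        return 0.0
    return sum(average_precision(run[qid], qrels[qid]) for qid in qids) / len(qids)


run = {"1": ["d3", "d1", "d7", "d2"], "2": ["d5", "d9"], "3": ["d4"]}
qrels = {"1": {"d1", "d2"}, "2": {"d5"}}
print(mean_average_precision(run, qrels))  # → 0.75
```

Query 1 finds its relevant documents at ranks 2 and 4 (AP = (1/2 + 2/4) / 2 = 0.5), query 2 finds its only relevant document at rank 1 (AP = 1.0), and query 3 has no judgments, so MAP = 0.75.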