Anserini Regressions Log

The following change log details commits to regression tests that alter effectiveness and the addition of new regression tests. This documentation is useful for figuring out why results may have changed over time.

June 13, 2024

commit 6cf601 (2024/06/13)

Added flat regressions for MS MARCO v1 passage corpus: BGE, Cohere v3, cosDPR, OpenAI Ada2. Total of 36 regressions: 12 each for dev, DL19, and DL20.

June 3, 2024

commit 6093e3 (2024/06/03)
commit 215233 (2024/05/30)

Implemented brute-force search on dense vectors: AnseriniLucene99FlatVectorFormat and AnseriniLucene99ScalarQuantizedVectorsFormat. Added regressions for all BEIR datasets: {cached, ONNX} × {original, int8}.
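
As a rough illustration of what brute-force ("flat") search over original and int8 vectors means, here is a minimal Python sketch. It is not how the Lucene or Anserini format classes are implemented: the symmetric scaling in `quantize_int8`, the helper names, and the toy vectors are all assumptions made for this example.

```python
def quantize_int8(vec, max_abs):
    # Assumed scheme: map floats in [-max_abs, max_abs] to integers in [-127, 127].
    scale = 127.0 / max_abs
    return [max(-127, min(127, round(x * scale))) for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def flat_search(query, vectors, k):
    # Brute force: score every vector against the query, then keep the k best.
    scored = [(dot(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # ties broken on index
    return [i for _, i in scored[:k]]

# Toy data, made up for illustration.
docs = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.7, 0.2, 0.1]]
query = [1.0, 0.0, 0.0]

max_abs = 1.0
q_docs = [quantize_int8(v, max_abs) for v in docs]
q_query = quantize_int8(query, max_abs)

print(flat_search(query, docs, 2))      # ranking over the original floats
print(flat_search(q_query, q_docs, 2))  # ranking over the int8 values
```

On this toy data the two rankings agree. In general, quantization can perturb scores enough to reorder close results, which is presumably why the original and int8 variants are tracked as separate regressions.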

April 26, 2024

commit e6b7ea (2024/04/26)

Added regressions for MS MARCO V2.1 corpora: document + segmented document.

April 10, 2024

commit 0a4dbd (2024/04/10)

Fixed OpenAI ada2 regressions for int8 variants.

March 27, 2024

At release v0.25.0, completed full matrix of regressions for MS MARCO v2 passage/doc for currently available models.

February 19, 2024

commit 43c9ec (2024/02/18)
commit 66cadf (2024/02/18)
commit 61433f (2024/02/16)

Added new regressions for CIRAL.

February 14, 2024

commit ce8c2a (2024/02/14)
commit 8d4a7f (2024/02/14)
commit 9a5bb6 (2024/02/12)
commit 57d262 (2024/02/11)
commit f86a65 (2024/02/09)
commit f2e2ac (2024/01/25)

New regressions:

SPLADE++ ED w/ ONNX for BEIR
BGE with original and quantized HNSW indexes for BEIR (only pre-encoded queries)
Cohere embed-english-v3 for MS MARCO passage dev (but not DL19 or DL20)

January 21, 2024

commit ca20dc (2024/01/21)

Added BGE + ONNX regressions for MS MARCO v1 passage corpus.

January 19, 2024

commit edd47a (2024/01/19)

Removed splade-distil-cocodenser-medium regressions to reduce confusion.

January 13, 2024

commit 883539 (2024/01/13)

Added HNSW regressions for the BGE-base-en-v1.5 model on the MS MARCO v1 passage corpus: dev, DL19, and DL20; both original and int8 quantized versions.

December 19, 2023

commit 883539 (12/19/2023)

Upgraded to Lucene 9.9.1, added the following HNSW regressions:

msmarco-passage-cos-dpr-distil-hnsw-int8-onnx
msmarco-passage-cos-dpr-distil-hnsw-int8
msmarco-passage-openai-ada2-int8
dl19-passage-cos-dpr-distil-hnsw-int8-onnx
dl19-passage-cos-dpr-distil-hnsw-int8
dl19-passage-openai-ada2-int8
dl20-passage-cos-dpr-distil-hnsw-int8-onnx
dl20-passage-cos-dpr-distil-hnsw-int8
dl20-passage-openai-ada2-int8

Note, however, that we're still having issues with the openai-ada2-int8 regressions; not fully working yet.

November 24, 2023

commit d88446 (11/24/2023)

Added regressions for SPLADE++ CoCondenser-EnsembleDistil on BEIR (v1.0.0).

November 18, 2023

commit 636918 (11/18/2023)

Added the following:

msmarco-passage-cos-dpr-distil-fw
dl19-passage-cos-dpr-distil-fw
dl20-passage-cos-dpr-distil-fw
msmarco-passage-cos-dpr-distil-lexlsh
dl19-passage-cos-dpr-distil-lexlsh
dl20-passage-cos-dpr-distil-lexlsh

November 9, 2023

commit d152e5 (11/9/2023)

Added ONNX variant of cosDPR-distil regressions.

November 7, 2023

commit d2fb8a (11/5/2023)
commit d76bb4 (11/4/2023)

Added OpenAI-ada2 and CIRAL regressions.

June 27, 2023

Summarizing new regressions since last entry, see PR #2140:

msmarco-passage-cos-dpr-distil
msmarco-v2-passage-splade-pp-ed
msmarco-v2-passage-splade-pp-sd
dl21-passage-splade-pp-ed
dl21-passage-splade-pp-sd
dl22-passage-splade-pp-ed
dl22-passage-splade-pp-sd

April 5, 2023

commit a7df7f (4/5/2023)

Added initial regressions for ONNX: SPLADE++ ED/SD on MS MARCO V1 passage for dev queries, DL19, and DL20.

March 24, 2023

Summarizing the addition of several recent regressions:

Regressions for MegaTokenizer on Mr.TyDi and MIRACL.
Regressions for CompositeTokenizer on { MS MARCO V1 passage, MS MARCO V1 doc (full), MS MARCO V1 doc (segmented) } × { dev queries, DL19, DL20 }.
Regressions for SPLADE++ ED/SD on MS MARCO V1 passage for dev queries, DL19, and DL20.
Regressions for SPLADE on NeuCLIR22 for fa, ru, zh, both QT and DT.
Regressions for wiki-all-6-3-tamber-bm25 from Tamber et al. (ECIR 2023).

October 29, 2022

commit 2d13a2 (10/29/2022)

Added TREC 2022 DL Track regressions (for passages).

October 27, 2022

commit 6d8601 (10/27/2022)

Added TREC 2022 NeuCLIR Track regressions.

October 25, 2022

commit debf2b (10/25/2022)

Fixed bug in HC4 doc translation regressions: documents are in English, so we should be using the English analyzer.

October 23, 2022

commit f94285 (10/23/2022)
commit 7cb701 (10/20/2022)

Added regressions for MIRACL dev set.

October 17, 2022

commit 7b244e (10/17/2022)

Fixed broken batch_search implementation for RM3 (or any feedback approach), originally reported in castorini/pyserini#831. Also fixed BM25prf to be thread-safe. Incidentally, these changes fixed a bug in backgroundlinking18, backgroundlinking19, backgroundlinking20. Regression scores updated.

October 4, 2022

commit c7addf (10/03/2022)
commit 4a9a97 (10/01/2022)
commit b5ecc5 (09/26/2022)

The two changes referenced here are:

Added regressions for MS MARCO V1 passage/document (including DL19 and DL20) based on BERT WordPiece tokenization using HuggingFaceTokenizerAnalyzer; added in #1969, ref issue #1978.
Updated Mr. TyDi Telugu regressions following the addition of an Analyzer for Telugu in Lucene 9; ref issue #1982.

September 22, 2022

commit a60e84 (09/18/2022)
commit 3dfce5 (09/15/2022)

Two major changes with regression impact:

Upgraded to fastutil 8.5.8 and fixed longstanding FeatureVector issue (#840).
Upgraded to jsoup 1.15.3 to fix security vulnerability.

August 15, 2022

commit cc6337 (08/15/2022)

Updated all regressions to pass with Lucene 9. Effectiveness figures changed for the following:

clef06-fr
hc4-neuclir22-ru
hc4-v1.0-ru
mrtydi-v1.1-fi
mrtydi-v1.1-ja
mrtydi-v1.1-ru
msmarco-doc

Index statistics for many other regressions changed (e.g., total terms), but these changes did not alter effectiveness figures (other than those identified above). Note that after updating Pyserini to Lucene 9, wikipedia-dpr-100w-bm25 also passed.

July 23, 2022

commit d25495 (07/23/2022)
commit a1608a (07/23/2022)

Added regressions for HC4 test topics on NeuCLIR22 corpora, both query translation and document translation variants.

July 13, 2022

commit 500e87 (07/13/2022)

Added regression for Wikipedia retrieval for QA (from DPR).

June 17, 2022

commit f59283 (06/17/2022)

Added regressions for BEIR, uniCOIL (noexp).

June 16, 2022

commit d71a11 (06/16/2022)
commit d2fbe6 (06/15/2022)

Added regressions for HC4 corpora.

May 26, 2022

commit fc542b (05/26/2022)
commit fc050d (05/21/2022)

Added regressions for MS MARCO V1 passage, 8-bit quantized BM25 (dev, DL19, DL20).

May 24, 2022

commit 30c997 (05/24/2022)
commit d457c8 (05/21/2022)

Added regressions for BEIR, flat indexing with WordPiece tokens.

April 28, 2022

commit 9b2dd5 (04/28/2022)

Above is the final commit of work stretching back approximately a month that added a complete set of regressions for BEIR, covering "flat" indexing, multifield indexing, and the SPLADE-distil CoCondenser Medium model.

April 21, 2022

commit 5e0437 (04/21/2022)

Added regressions for "v2" of doc segmented uniCOIL on MS MARCO V2; cf #1853.

April 8, 2022

commit 3624dc (04/08/2022)

Added MS MARCO V1 passage/document regressions based on BERT WordPiece tokenization.

March 2, 2022

commit 41b65d (03/02/2022)

Added regressions for uniCOIL noexp on MS MARCO v1 corpora.

February 7, 2022

commit 51c386 (02/07/2022)

Added uniCOIL regressions for MS MARCO V1: missing regressions for uniCOIL passage on DL19, DL20, and brand new regressions for uniCOIL segmented doc on dev, DL19, and DL20.

February 5, 2022

commit 4c33f1 (02/05/2022)

Added uniCOIL (with d2q-T5 expansions) regressions on MS MARCO V2 (both dev/dev2 queries as well as TREC 2021 DL Track). Tweaked noexp regressions to make consistent.

January 20, 2022

commit 1be47b (01/20/2022)

Added MS MARCO (V2) {doc, segmented doc, passage, augmented passage} regressions for doc2query-T5 expansions (both dev/dev2 queries as well as TREC 2021 DL Track).

January 8, 2022

commit 6fcb89 (01/08/2022)
commit f0502c (11/16/2021)

Rebuilt all MS MARCO (V1) doc regressions from scratch to fix segmentation issues described here.

December 15, 2021

commit 151404 (12/15/2021)
commit aee51a (12/05/2021)

Added regressions for Mr.TyDi (v1.1).

December 13, 2021

commit 64f4d1 (12/13/2021)
commit 12149f (12/09/2021)

Expanded regressions for TREC Disks 4 & 5.

November 25, 2021

commit 47685b (11/25/2021)
commit 1c5f64 (11/18/2021)

Added regressions for MS MARCO V2 (dev2) and TREC 2021 DL Track queries; added uniCOIL noexp zero-shot results.

October 18, 2021

commit 828d05 (10/18/2021)
commit cf5c4f (10/16/2021)

Refactored regressions for DeepImpact and uniCOIL on MS MARCO passage, added SPLADEv2.

October 9, 2021

commit f8b7cd (10/09/2021)

Major refactoring of MS MARCO V2 naming conventions.

September 5, 2021

commit f79fb6 (09/05/2021)

Added regressions for DeepImpact and uniCOIL on MS MARCO passage.

September 4, 2021

commit 112438 (09/04/2021)

Added regressions for MS MARCO V2 corpora, standard BM25 + PRF configurations w/ default parameters:

raw passage corpus, augmented passage corpus
raw doc corpus, segmented doc corpus

September 2, 2021

commit f86e4e (09/02/2021)

Upgraded jsoup from v1.8.3 to v1.14.2 to address a security vulnerability. Minor changes to the following regressions: backgroundlinking18, backgroundlinking19, backgroundlinking20, core18, cw09b, cw12, cw12b13, disk12, gov2, wt10g.

June 14, 2021

commit b58c85 (06/14/2021)

Overhauled regressions for MS MARCO {passage, doc} and DL {19, 20}:

MS MARCO passage + {doc2query, docTTTTTquery}
MS MARCO doc {per-doc, per-passage} x {doc2query, docTTTTTquery}
{DL19, DL20} passage + {doc2query, docTTTTTquery}
{DL19, DL20} doc {per-doc, per-passage} x {doc2query, docTTTTTquery}

April 13, 2021

commit 868afe (04/13/2021)

Updated regressions for the MS MARCO doc ranking task; we now have the complete cross product of {doc indexing, passage indexing} and {no expansion, expansion}. Regressions now use tuned parameters.

March 30, 2021

commit c75c63 (03/30/2021)

Added regressions for Anserini submissions to TREC 2020 News Track, background linking task.

March 19, 2021

commit e9af6e (03/19/2021)

Added regressions for Anserini submissions to TREC 2020 Deep Learning Track: passage ranking (also with docTTTTTquery) and document ranking (also with per-document docTTTTTquery).

February 24, 2021

commit 90d3aa (02/24/2021)

Fixed bug where multi-line TREC topic titles weren't being fully parsed (#1482). Affects regressions for Disks 1 & 2.

November 16, 2020

commit f87c94 (11/16/2020)
commit 9a8e8b (11/12/2020)

Added regressions for MS MARCO document ranking with per-passage and per-document docTTTTTquery expansions.

April 12, 2020

commit 35f9f8 (04/12/2020)

Regression results for Core18 (Washington Post) changed due to refactoring to conform to clarified definitions of contents() and raw() in SourceDocument, per Issue #1048. Previously, both contents() and raw() returned the raw JSON, and the WashingtonPostGenerator extracted the article contents for indexing. Now, raw() returns the raw JSON and contents() returns the extracted article contents for indexing (i.e., the logic for parsing the JSON has been moved from WashingtonPostGenerator into the collection itself). This conforms to the principle that every collection should "know" how to parse its own contents.

Regression values went down slightly for Ax as a result of this refactoring. The difference is that, before, the "empty document check" was performed on the JSON, so it never triggered (since the JSON was never empty). With this new processing logic, the "empty document check" is performed on contents() (hence, the parsed article contents), and so the number of empty documents is now accurate (there are six based on the current parsing logic). From these changes and those below, it seems that Ax is very sensitive to tiny collection differences.
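
The effect on the empty-document count can be sketched as follows. This is a simplified Python illustration, not the Anserini implementation: the `WashingtonPostLikeDocument` class, the `body` field, and the sample records are hypothetical.

```python
import json

class WashingtonPostLikeDocument:
    """Hypothetical stand-in for a collection document; not the Anserini class."""

    def __init__(self, raw_json):
        self._raw = raw_json

    def raw(self):
        # The original JSON record, unchanged.
        return self._raw

    def contents(self):
        # The document parses its own JSON and returns only the article text.
        return json.loads(self._raw).get("body", "").strip()

def count_empty(docs, text_of):
    # "Empty document check": count documents whose indexable text is empty.
    return sum(1 for d in docs if not text_of(d))

docs = [
    WashingtonPostLikeDocument(json.dumps({"id": "a", "body": "Some article text."})),
    WashingtonPostLikeDocument(json.dumps({"id": "b", "body": ""})),
]

# Old behaviour: the check saw the JSON string, which is never empty.
print(count_empty(docs, lambda d: d.raw()))
# New behaviour: the check sees the parsed article contents.
print(count_empty(docs, lambda d: d.contents()))
```

In this sketch the second record has a non-empty JSON wrapper but an empty body, so it is only counted as empty when the check runs on `contents()`.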

April 7, 2020

commit 9a28a0 (04/07/2020)

Regression results for Core17 (New York Times) changed as the result of a bug fix. Previously, Core17 used the NewYorkTimesCollection and was indexed with JsoupGenerator as the generator, which assumes that the input is HTML (or XML) and removes tags. However, this was unnecessary, because the collection implementation already removes tags internally. As a result, angle brackets in the text were interpreted as tags and removed. Fixing this bug increased the number of terms in the collection (and a document that was previously empty is no longer empty). However, effectiveness of bm25+ax and ql+ax decreased slightly; bm25/bm25+rm3 and ql/ql+rm3 remain unchanged.
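
A minimal Python sketch of this failure mode, using a naive regex as a stand-in for tag removal (this is neither jsoup nor the Anserini code, and the sample text is made up): running a tag stripper over text that is already plain removes anything between angle brackets, and a document consisting only of such text becomes empty.

```python
import re

def strip_tags(text):
    # Naive stand-in for an HTML/XML tag remover: drop anything that looks like <...>.
    return re.sub(r"<[^>]*>", "", text)

# Hypothetical article text that has ALREADY had its markup removed,
# but legitimately contains angle brackets.
already_clean = "Send mail to <editor at example dot com> for corrections."
only_bracketed = "<no headline available>"

twice_stripped = strip_tags(already_clean)
print(twice_stripped)  # the bracketed span is gone
print(len(already_clean.split()), len(twice_stripped.split()))  # fewer terms survive
print(repr(strip_tags(only_bracketed)))  # a non-empty document becomes empty
```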

March 6, 2020

commit 10ff01 (03/06/2020)

Added regressions for background linking task from the TREC 2018 and 2019 News Tracks.

February 25, 2020

commit a62004 (02/25/2020)
commit 0d42d3 (02/25/2020)

Added regressions for the TREC 2019 Deep Learning Track, both document and passage ranking tasks.

November 27, 2019

commit 411618 (11/27/2019)
commit b9264d (11/27/2019)

Added regressions for TREC 2002 (Arabic), CLEF 2006 (French), and FIRE 2012 (English, Bengali and Hindi).

October 11, 2019

commit 445bb45 (10/11/2019)

Added regressions for NTCIR-8 ACLIA (IR4QA subtask, Monolingual Chinese).

September 5, 2019

commit e88b931 (9/5/2019)

As it turns out, we were incorrect in the entry below (commit 2f1b665). Regression numbers after the BM25prf fix did change slightly.

August 14, 2019

commit 2f1b665 (8/14/2019)

Resolves inconsistent tie-breaking for BM25prf that leads to non-deterministic results, per #774. Note that regression numbers did not change.
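
To illustrate why inconsistent tie-breaking leads to non-deterministic results, here is a minimal Python sketch. It is not the Anserini code, and the docids and scores are made up. Sorting by score alone leaves the order of tied documents dependent on the order in which hits arrive, whereas adding the docid as a secondary key makes the ranking deterministic.

```python
# Hypothetical (docid, score) pairs; two documents are tied at 1.5.
hits_run_a = [("doc3", 1.5), ("doc1", 1.5), ("doc2", 2.0)]
hits_run_b = [("doc1", 1.5), ("doc3", 1.5), ("doc2", 2.0)]  # same hits, different arrival order

def rank_by_score_only(hits):
    # Python's sort is stable, so tied documents keep their arrival order.
    return [docid for docid, _ in sorted(hits, key=lambda h: -h[1])]

def rank_with_tiebreak(hits):
    # Break ties on docid so the output no longer depends on arrival order.
    return [docid for docid, _ in sorted(hits, key=lambda h: (-h[1], h[0]))]

print(rank_by_score_only(hits_run_a))
print(rank_by_score_only(hits_run_b))
print(rank_with_tiebreak(hits_run_a))
print(rank_with_tiebreak(hits_run_b))
```

The first two rankings differ only in the order of the tied documents; the last two are identical.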

August 9, 2019

commit 1217d47 (8/9/2019)
commit 75dfaa6 (8/9/2019)

Added new doc2query regression car17v2.0-doc2query to reproduce Nogueira et al. (arXiv 2019) on the TREC 2017 Complex Answer Retrieval (CAR) section-level passage retrieval task (v2.0). Added +Ax and +PRF regressions with both tuned and default BM25 parameters for MS MARCO passage ranking task.

August 5, 2019

commit 80c5447 (8/5/2019)

Added +Ax and +PRF regressions with both tuned and default BM25 parameters for MS MARCO document ranking task.

June 20, 2019

commit 86be3d2 (6/20/2019)
commit b656da3 (6/20/2019)

Added new doc2query regression msmarco-passage-doc2query to reproduce Nogueira et al. (arXiv 2019) on the MS MARCO passage ranking task. Added tuned BM25 parameters to msmarco-doc regression. Associated documentation updated.

June 12, 2019

commit 75e36f9 (6/9/2019)

Upgraded to Lucene 8: minor changes to all regression experiments. JDIQ 2018 experiments are no longer maintained.

June 9, 2019

commit 93f8f3c (6/9/2019)
commit 781d9ed (6/8/2019)

Added regressions for MS MARCO passage and document ranking tasks.

June 3, 2019

commit 3545350 (6/3/2019)
commit a3ccdef (6/3/2019)

Fixed bug in topic reader for CAR. Better parsing of New York Times documents. Regression numbers in both cases improved slightly.

May 31, 2019

commit 27493ed (5/31/2019)

Per #658: fixed broken regression in Core18 introduced by commit c4ab6b (4/18/2019).

May 11, 2019

commit 3eef2fb (5/11/2019)
commit 2ba2b95 (5/11/2019)
commit d911bba (5/10/2019)

CAR regression refactoring: added v2.0 regression and renamed existing regression to v1.5. Both use benchmarkY1-test to support consistent comparisons.

January 2, 2019

commit 407f308 (1/2/2019)

Added fine tuning results (i.e., SIGIR Forum article experiments) for axiomatic semantic term matching.

December 24, 2018

commit 1aa3970 (12/24/2018)

Changed RM3 defaults to match settings in Indri.

December 20, 2018

commit e71df7a (12/20/2018)

Added Axiomatic F2Exp and F2Log ranking models back into Anserini (previously, we were using the default Lucene implementation as part of version 7.6 upgrade).

December 18, 2018

commit e71df7a (12/18/2018)

Upgraded to Lucene 7.6.

November 30, 2018

commit e5b87f0 (11/30/2018)

Added default regressions for TREC 2018 Common Core Track.

November 16, 2018

commit 2c8cd7a (11/16/2018)

This is the commit id referenced in the SIGIR Forum 2018 article. Note that commit 18c3211 (12/9/2018) contains minor fixes to the code.

October 22, 2018

commit 10255e0 (10/22/2018)

Fixed incorrect implementation of -rm3.fbTerms.

September 26, 2018

commit 7c882d3 (9/26/2018)

Fixed bug as part of #429: cw12 and mb13 regression tests changed slightly in effectiveness.

August 8, 2018

commit d4b3272 (8/8/2018)

Added regression tests for CAR17.

August 5, 2018

commit c0da510 (8/5/2018)

This commit adds the effectiveness verification testing for the JDIQ 2018 paper.

July 22, 2018

commit 3a7beee (7/22/2018)
commit ec5fd3d (7/22/2018)
commit 5f8c26d3 (7/22/2018)

These three commits establish the new regression testing infrastructure with the following tests:

Experiments on Disks 1 & 2: {BM25, QL} ⨯ {RM3, Ax} ⨯ {AP, P30}
Experiments on Disks 4 & 5 (Robust04): {BM25, QL} ⨯ {RM3, Ax} ⨯ {AP, P30}
Experiments on AQUAINT (Robust05): {BM25, QL} ⨯ {RM3, Ax} ⨯ {AP, P30}
Experiments on New York Times (Core17): {BM25, QL} ⨯ {RM3, Ax} ⨯ {AP, P30}
Experiments on Wt10g: {BM25, QL} ⨯ {RM3, Ax} ⨯ {AP, P30}
Experiments on Gov2: {BM25, QL} ⨯ {RM3, Ax} ⨯ {AP, P30}
Experiments on ClueWeb09 (Category B): {BM25, QL} ⨯ {RM3, Ax} ⨯ {AP, P30, NDCG@20, ERR@20}
Experiments on ClueWeb12-B13: {BM25, QL} ⨯ {RM3, Ax} ⨯ {AP, P30, NDCG@20, ERR@20}
Experiments on ClueWeb12: {BM25, QL} ⨯ {RM3} ⨯ {AP, P30, NDCG@20, ERR@20}
Experiments on Tweets2011 (MB11 & MB12): {BM25, QL} ⨯ {RM3, Ax} ⨯ {AP, P30}
Experiments on Tweets2013 (MB13 & MB14): {BM25, QL} ⨯ {RM3, Ax} ⨯ {AP, P30}
