Fix zero-length norm bug with RM3 (#376)
Skips zero-length feedback documents, which cause division by zero when computing term
weights. Zero-length feedback documents occur (e.g., with CAR17) when a document has only
terms that contain accents (which are indexed, but not selected for feedback).
Victor0118 authored and lintool committed Aug 7, 2018
1 parent a2efe7d commit 135d08c
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion src/main/java/io/anserini/rerank/lib/Rm3Reranker.java
@@ -165,7 +165,12 @@ private FeatureVector estimateRelevanceModel(ScoredDocuments docs, IndexReader r
     for (String term : vocab) {
       float fbWeight = 0.0f;
       for (int i = 0; i < docvectors.length; i++) {
-        fbWeight += (docvectors[i].getFeatureWeight(term) / norms[i]) * docs.scores[i];
+        // Skips zero-length feedback documents, which cause division by zero when computing term weights.
+        // Zero-length feedback documents occur (e.g., with CAR17) when a document has only terms
+        // that contain accents (which are indexed, but not selected for feedback).
+        if (norms[i] > 0.001f) {
+          fbWeight += (docvectors[i].getFeatureWeight(term) / norms[i]) * docs.scores[i];
+        }
       }
       f.addFeatureWeight(term, fbWeight);
     }
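
To illustrate why the guard matters, here is a minimal standalone sketch of the same loop. It does not use Anserini's classes: plain `Map<String, Float>` objects stand in for the feature vectors, the norm is assumed to be the sum of a document's term weights, and the class name `Rm3NormGuardSketch`, the `estimate` method, and its `guard` flag are hypothetical names introduced only for this example. In Java, `0.0f / 0.0f` does not throw; it evaluates to `NaN`, which then propagates into every term weight that the empty document touches.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical stand-in for the patched loop; not Anserini code.
class Rm3NormGuardSketch {
  // Same threshold as the patched loop in the diff.
  static final float MIN_NORM = 0.001f;

  // Each map is one feedback document's term weights; scores[i] is its retrieval score.
  static Map<String, Float> estimate(List<Map<String, Float>> docVectors,
                                     float[] scores, boolean guard) {
    int n = docVectors.size();
    float[] norms = new float[n];          // assumption: norm = sum of term weights
    TreeSet<String> vocab = new TreeSet<>();
    for (int i = 0; i < n; i++) {
      for (Map.Entry<String, Float> e : docVectors.get(i).entrySet()) {
        norms[i] += e.getValue();
        vocab.add(e.getKey());
      }
    }

    Map<String, Float> model = new LinkedHashMap<>();
    for (String term : vocab) {
      float fbWeight = 0.0f;
      for (int i = 0; i < n; i++) {
        // With guard == true, documents whose norm is (near) zero are skipped.
        if (!guard || norms[i] > MIN_NORM) {
          float w = docVectors.get(i).getOrDefault(term, 0.0f);
          fbWeight += (w / norms[i]) * scores[i];
        }
      }
      model.put(term, fbWeight);
    }
    return model;
  }

  public static void main(String[] args) {
    Map<String, Float> doc0 = new LinkedHashMap<>();
    doc0.put("a", 1.0f);
    doc0.put("b", 3.0f);
    Map<String, Float> empty = new LinkedHashMap<>(); // zero-length feedback document
    List<Map<String, Float>> docs = Arrays.asList(doc0, empty);
    float[] scores = {2.0f, 1.0f};

    System.out.println("guarded:   " + estimate(docs, scores, true));
    System.out.println("unguarded: " + estimate(docs, scores, false));
  }
}
```

With the guard enabled, the empty document contributes nothing and the remaining weights stay finite. With it disabled, a single zero-length document turns every term weight into `NaN`.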