Fix issue with duplicate field for WaPo in Solr. (#807)
* Fix issue with duplicate field for WaPo in Solr.

* Add known issues for Solr in v0.6.0
Ryan Clancy authored Sep 16, 2019
1 parent abfa3ba commit 1649d31
Showing 3 changed files with 9 additions and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -98,7 +98,7 @@ See [this page](docs/additional.md) for additional documentation.

## Release History

+ v0.6.0: September 6, 2019 [[Release Notes](docs/release-notes/release-notes-v0.6.0.md)]
+ v0.6.0: September 6, 2019 [[Release Notes](docs/release-notes/release-notes-v0.6.0.md)][[Known Issues](docs/known-issues/known-issues-v0.6.0.md)]
+ v0.5.1: June 11, 2019 [[Release Notes](docs/release-notes/release-notes-v0.5.1.md)]
+ v0.5.0: June 5, 2019 [[Release Notes](docs/release-notes/release-notes-v0.5.0.md)]
+ v0.4.0: March 4, 2019 [[Release Notes](docs/release-notes/release-notes-v0.4.0.md)]
3 changes: 3 additions & 0 deletions docs/known-issues/known-issues-v0.6.0.md
@@ -0,0 +1,3 @@
# Anserini Known Issues (v0.6.0)

+ Solr indexing for Washington Post broke due to [417ac12](https://github.com/castorini/anserini/commit/c5ee9af442c500ec43fc28808903cfca2417ac12) and has been fixed in [#807](https://github.com/castorini/anserini/pull/807).
5 changes: 5 additions & 0 deletions src/main/java/io/anserini/index/IndexCollection.java
@@ -281,6 +281,11 @@ public void run() {
if (field.fieldType().docValuesType() != DocValuesType.NONE) {
  continue;
}
// If the field is already in the doc, skip it.
// This fixes an issue with WaPo where published_date is in the Lucene doc as LongPoint and StoredField. Solr needs one copy, more fine-grained control in config.
if (solrDocument.containsKey(field.name())) {
  continue;
}
if (field.stringValue() != null) { // For some reason, id is multi-valued with null as one of the values
  solrDocument.addField(field.name(), field.stringValue());
} else if (field.numericValue() != null) {
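
The effect of the added `containsKey` check can be illustrated with a minimal, self-contained sketch. It is not Anserini's code: `DuplicateFieldSketch`, `SimpleField`, and `toSolrFields` are hypothetical names, a plain `Map` stands in for the Solr document, and the doc-values check from the hunk is omitted. It only shows the "first occurrence of a field name wins" behavior, assuming the same field name can appear more than once in the source document (as the comment says happens with `published_date`).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in types: SimpleField mimics a Lucene field (name plus value),
// and a plain Map mimics the Solr document. Neither is Anserini's actual API.
class DuplicateFieldSketch {

  static final class SimpleField {
    final String name;
    final Object value;  // may be null, as with the multi-valued id noted in the diff

    SimpleField(String name, Object value) {
      this.name = name;
      this.value = value;
    }
  }

  // Copies fields into the target map, keeping only the first non-null
  // occurrence of each field name.
  static Map<String, Object> toSolrFields(List<SimpleField> fields) {
    Map<String, Object> solrDocument = new LinkedHashMap<>();
    for (SimpleField field : fields) {
      if (solrDocument.containsKey(field.name)) {
        continue;  // a copy of this field is already present, so skip the duplicate
      }
      if (field.value != null) {
        solrDocument.put(field.name, field.value);
      }
    }
    return solrDocument;
  }

  public static void main(String[] args) {
    List<SimpleField> fields = Arrays.asList(
        new SimpleField("id", "doc1"),
        new SimpleField("published_date", 1568592000000L),   // e.g. the indexed copy
        new SimpleField("published_date", 1568592000000L));  // e.g. the stored copy
    System.out.println(toSolrFields(fields));
  }
}
```

In this sketch a field whose value is null is never added, so a later field with the same name and a non-null value still goes through; only a name that has already been added is skipped.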