Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue with duplicate field for WaPo in Solr. #807

Merged
merged 2 commits into from
Sep 16, 2019
Merged

Conversation

ryan-clancy
Copy link
Member

Fixing bug introduced by c5ee9af

@ryan-clancy ryan-clancy requested a review from lintool September 16, 2019 20:57
@lintool
Copy link
Member

lintool commented Sep 16, 2019

Great! Go ahead and merge when Jenkins turns green.

@ryan-clancy ryan-clancy merged commit 1649d31 into master Sep 16, 2019
@ryan-clancy ryan-clancy deleted the solr-wapo-fix branch September 16, 2019 21:26
@arjenpdevries
Copy link

So this only works because of converting the value to String, which might work out correctly assuming both types resolve the same string value - otherwise there could be an order effect, and different documents could get different values.

@lintool
Copy link
Member

lintool commented Sep 17, 2019

@r-clancy ?

@ryan-clancy
Copy link
Member Author

So this only works because of converting the value to String, which might work out correctly assuming both types resolve the same string value - otherwise there could be an order effect, and different documents could get different values.

The implementation of stringValue() doesn't convert arbitrary objects to a String (using toString() for example). It looks like the implementation does this for a Number only (and keeps Strings, of course):

@Override
  public String stringValue() {
    if (fieldsData instanceof CharSequence || fieldsData instanceof Number) {
      return fieldsData.toString();
    } else {
      return null;
    }
  }

However, your comment has made me realize that the order of the checks in this PR should be flipped as the numericValue() method has stricter checking:

@Override
  public Number numericValue() {
    if (fieldsData instanceof Number) {
      return (Number) fieldsData;
    } else {
      return null;
    }
  }

Our code should be this:

if (field.numericValue() != null) {
    solrDocument.addField(field.name(), field.numericValue());
  } else if (field.stringValue() != null) { // For some reason, id is multi-valued with null as one of the values
    solrDocument.addField(field.name(), field.stringValue());
}

The ordering within a document collection would already be consistent, but I think this just makes more sense: try the numerical value first and fallback to the String. Since fields are added to Lucene documents in a fixed order in the Generator classes and iterating over fields happens in a deterministic order, this should be fine.

@arjenpdevries
Copy link

Sounds right, tnx for the clear explanation!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants