-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[3.6] Allow datatype Long when indexing #5911
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BartChris thanks for this fix. Code looks good but I encountered a strange behaviour when testing it.
When manually entering one of the values you mentioned in the linked issue (9910211111112122
) into a field in the metadata editor and saving the process, it works and the index contains the correct value.
Entering the other value you from the issue (991022551489706476
), though, or the one from your test (999999999999999991
), these are not saved correctly to the index. The first is changed from
991022551489706476
➡️ 991022551489706500
and the second from
999999999999999991
➡️ 1000000000000000000
in the index.
It's especially strange since all these numbers are still way below the limit of the Java long
type (9223372036854775807
, according to https://www.w3schools.com/java/java_data_types.asp) and because your test does indeed seem to work. When I try it in the running system, though, after saving the values are correct in the metadata files, but not in the index.
Maybe the ES long type does not cover the exact same domain as the Java type? Or it somehow depends on the minor ES version? (I am using 7.10 on my development system, but the elasticsearch.version
in the pom.xml
is 7.17)
Have you encountered anything similar when developing your fix?
@solth Strange, i will check. Would it be an option to cast all Longs to a String before indexing? I am wondering if there is a scenario where we actually need the number value in the index. What might also play a role is the fact that the field type of this field is dynamically determined the first time Elasticsearch encounters it. The following documents would probably use the same inferred index mapping. We should therefor only use datatypes like Long in the index when we really need them. Edit: I can confirm your findings and it seems like this happens for numbers with more than 16 characters, but i have no idea why you can still find search for some of those instances. if (value instanceof String || value instanceof Integer) {
json.put(prepareKey(key), value);
} else if (value instanceof Long || value instanceof BigInteger){
json.put(prepareKey(key), value.toString());
} else if (value instanceof JSONObject) { |
b3b6a5b
to
46f45ab
Compare
46f45ab
to
484dfc8
Compare
@solth I implemented exactly that: Long and BigInteger numbers are converted to a String. While doing that i also encountered that Kitodo breaks when searching for very long numbers using the search field: This happens because Elasticsearch tries to cast the search value for searching in the ID field. ("Value is out of range for a long'). I fixed two places in the code to prevent that. |
@BartChris thanks a lot for looking into this and for the fix. I agree with you that casting numerical metadata values to String is a good solution, since no numerical operations are done on the values in the index anyway (e.g. queries that retrieve documents from ElasticSearch with numerical metadata values in a given range, for example). Would you mind opening a corresponding pull request with the same fix against the master branch? |
Fixes #5906
I would have liked to add a test for that but it seems like we are not testing the creation of the metadata fields in the index at the moment (only general process fields:https://github.com/kitodo/kitodo-production/blob/master/Kitodo-DataManagement/src/test/java/org/kitodo/data/elasticsearch/index/type/ProcessTypeTest.java) , so we would first need a way to add a general test here. There are also plans to simplify the metadata structure which would also provide the opportunity to add tests here (#4266)
Edit: I added a test in the ProcessService integration test.