Equality comparison of two literals with incompatible (but known) datatypes should return false in standard evaluation mode #3947
Comments
I wonder if this could be related to #3696? The specification for As for the whitespace I'm not sure how strict we should be. Edit: According to the grammar specification, the amount of whitespace between NOT and IN should not matter. |
Regarding using A unit test illustrating the two issues:
|
BTW, this behaves the same in GraphDB (which is based on RDF4J) and Blazegraph, but not in Stardog where all four solutions are returned. |
Thanks for the thorough report @jetztgradnet . Unless someone else picks it up first, I'll try and take a closer look over the weekend. |
Query execution plan for the original query:
Looks fairly clear that the filter evaluation is somehow tripping up. I have also double-checked that it is not the |
Stepping through it, it fails on comparisons between datatyped and untyped literals: "org.eclipse.rdf4j.query.algebra.evaluation.ValueExprEvaluationException: Unable to compare strings with other supported types" This happens when comparing |
Oh here we go. It's one of those cases where the SPARQL spec becomes near-impenetrable. The case we have here is that we are comparing two literals:
RDFterm-equal (see https://www.w3.org/TR/sparql11-query/#func-RDFterm-equal) is defined as follows:
(emphasis mine) Note that in our case, when applying the specs strictly, RDFterm-equal is supposed to return a type error: we have two literals, but they are not the same RDF term (as they are not equivalent literals as defined in the linked section in RDF Concepts). So, term-equality under a strict interpretation results in a type error. You'd think that because we negate this that maybe then gets coerced to an actual
In other words the negation of a type error is also a type error. So, in the strict interpretation of the SPARQL 1.1 spec, RDF4J is actually giving you the correct answer here. Does that mean Jena and Stardog are wrong? Well no. As per section 17.3.1 (operator extensibility):
So what Jena and Stardog do is extend the minimal compliance in a way that the spec allows for. We can do a similar thing in RDF4J (in fact we already do for some other cases). It's fine to extend operator behavior as long as the only cases you touch were previously type errors, but I need to take a look at how best to fit this in: whether we should tweak the strict evaluation strategy itself, or consider this another case for the extended evaluation strategy to handle. |
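To make the reasoning above concrete, here is a minimal Python sketch of the strict semantics: `!=` negates equality, `NOT IN` expands to a conjunction of `!=` tests, and FILTER keeps a solution only when the expression evaluates to true. The `Literal` type, the `ERROR` sentinel and all function names are hypothetical and do not correspond to RDF4J classes. Same-datatype equality is reduced to a lexical comparison, and numeric promotion and language tags are ignored.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    lexical: str
    datatype: str  # datatype IRI


# Hypothetical sentinel standing in for a SPARQL type error.
ERROR = object()


def equal_strict(a: Literal, b: Literal):
    """Simplified strict '=': literals of different datatypes raise a type error."""
    if a.datatype == b.datatype:
        # Stand-in for the datatype-specific '=' operator (assumption).
        return a.lexical == b.lexical
    return ERROR


def sparql_not(value):
    """Negating a type error is still a type error."""
    return ERROR if value is ERROR else not value


def sparql_and(left, right):
    """Logical-and with errors: false wins, otherwise an error propagates."""
    if left is False or right is False:
        return False
    if left is ERROR or right is ERROR:
        return ERROR
    return True


def not_in(value: Literal, candidates, equal=equal_strict):
    """x NOT IN (c1, c2, ...) evaluated as (x != c1) && (x != c2) && ..."""
    result = True
    for candidate in candidates:
        result = sparql_and(result, sparql_not(equal(value, candidate)))
    return result


def filter_keeps(value) -> bool:
    """FILTER keeps a solution only for true; false and errors both drop it."""
    return value is True
```

In this model, a non-matching xsd:string value tested against a candidate list that mixes an xsd:string and an xsd:integer evaluates to an error and is filtered out. Removing the xsd:integer candidate lets the same value pass, which resembles the behaviour reported in this issue.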
Thanks, Jeen, for the thorough investigation! But anyway, whatever you consider the right path to follow for RDF4J is ok for me, I will then adjust the unit tests of the application ported from Jena accordingly. Any suggestion about the error for multiple whitespaces between NOT and IN (i.e. "NOT  IN", which comes from a generated query where the NOT is inserted conditionally and is wrapped in whitespace, as opposed to "NOT IN")? |
Oh I fully agree that that is how it should work. It's just that a minimally-conforming implementation of the SPARQL spec doesn't :) . But minimally-conforming is kind of useless to stick to, beyond corner cases like strict validation and/or ensuring that you're 100% sure your query will work on any compliant SPARQL engine.
I'll take a look at that. It's a bug (additional whitespace should be ignored), but I'll split it out from this ticket, since it's really a separate issue. |
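Until the parser accepts extra whitespace, a generated query could be normalized before it is submitted. The following is only a rough Python sketch of that idea; `normalize_not_in` is a hypothetical helper, not part of RDF4J. It is deliberately naive and would also rewrite a match that happens to sit inside a string literal or a comment in the query.

```python
import re

# Any run of whitespace (spaces, tabs, newlines) between NOT and IN.
NOT_IN = re.compile(r"\bNOT\s+IN\b", re.IGNORECASE)


def normalize_not_in(query: str) -> str:
    """Collapse the whitespace between NOT and IN to a single space."""
    return NOT_IN.sub("NOT IN", query)
```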
I think we may need to reclassify this as an improvement / feature request rather than a bug. I've picked up a related refactoring issue (GH-635) that aims to make choosing the mode in which the query engine runs a little easier. Haven't yet decided if I want to finish that first or, in parallel, just add this feature into the current (somewhat flawed) setup (probably in the ExtendedEvaluationStrategy). |
Instead of a type error, we now return false when comparing two literals with incompatible (but known) datatypes.
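The changed behaviour can be pictured with a small Python sketch. This is an illustration under stated assumptions, not the RDF4J implementation: the `Literal` type, the `ERROR` sentinel, the `CATEGORY` table and `equal_extended` are all hypothetical, and the table of "known" datatypes is deliberately tiny.

```python
from decimal import Decimal
from typing import NamedTuple

XSD = "http://www.w3.org/2001/XMLSchema#"


class Literal(NamedTuple):
    lexical: str
    datatype: str  # datatype IRI


# Hypothetical sentinel standing in for a SPARQL type error.
ERROR = object()

# Hypothetical, incomplete table of known datatypes and their value category.
CATEGORY = {
    XSD + "string": "string",
    XSD + "integer": "numeric",
    XSD + "decimal": "numeric",
}


def equal_extended(a: Literal, b: Literal):
    """Equality that returns False for known but incompatible datatypes."""
    if a == b:
        return True  # same RDF term
    cat_a = CATEGORY.get(a.datatype)
    cat_b = CATEGORY.get(b.datatype)
    if cat_a is None or cat_b is None:
        return ERROR  # unknown datatype: still a type error
    if cat_a != cat_b:
        return False  # known but incompatible: False instead of an error
    if cat_a == "numeric":
        return Decimal(a.lexical) == Decimal(b.lexical)  # compare by value
    return a.lexical == b.lexical
```

The extension only changes cases that were type errors before, so anything a strict evaluation already answered with true or false keeps its answer.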
Current Behavior
I'm porting an application originally written for Apache Jena to use RDF4J instead.
One unit test runs a query like this (combined example with data plus query):
I would expect that four solutions are returned, but RDF4J only returns two (the two language strings). Interestingly/strangely, commenting out the value with the xsd:integer makes the third value (plain xsd:string) also part of the result.
So it looks like there is some issue when comparing values of different data types in the NOT IN filter clause.
As a slightly related issue, running this as "NOT  IN" (two or more whitespaces between NOT and IN) produces an error (the query is generated so there is little control over whitespace). Is RDF4J rather strict here or does the grammar really not allow multiple whitespaces here?
Expected Behavior
"NOT  IN" with multiple white space instead of just "NOT IN"
Steps To Reproduce
No response
Version
4.0.0
Are you interested in contributing a solution yourself?
Perhaps?
Anything else?
No response