-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regression/Change in 4.0 M3: query parser fails with "variable 'o1' in projection not present in GROUP BY" (ORDER BY clause) #3782
Comments
We should test this against other implementations to see how they handle it. |
I would also like to isolate the change that causes this difference. |
By the way, given that this query does not even involve an aggregate, I don't believe it should fail on this. I'm taking a closer look at the test cases and what changes we recently made that could have introduced this regression. Most likely candidate for this is GH-2990. |
I've confirmed that this regression is a side effect of GH-2990. |
Relevant section in the query spec is https://www.w3.org/TR/sparql11-query/#aggregateRestrictions :
The above section is why we do these variable presence checks when parsing a GROUP BY clause. However, this really should only apply if the query involves aggregates. Using a GROUP BY without an aggregate is kind of pointless, but it certainly should not result in an error. If I read the restrictions correcltly, if this query had involved an aggregate, it would have been correct to throw an error here, as For example:
is not strictly legal. You'd have to rewrite like htis:
|
…gregate - covers SPARQL Negative parser test cases :group06 and :group07
…gregate - covers SPARQL Negative parser test cases :group06 and :group07
…gregate - covers SPARQL Negative parser test cases :group06 and :group07
Hm, there are two syntax test cases in the W3C SPARQL test suite that seem to contradict what I said about it only applying when aggregates are involved. I am seeking clarification from the wider community (via the sparql 1.2 group), but right now, my reading of the standard is as follows: When a query has a
What it hinges on is what is covered by "simple expressions consisting of just a variable" - I previously read that strictly as only being a single variable, and not an aliasing expression (such as If we apply this to your example query:
should be syntactically legal, because:
|
The above seems to be borne out by responses on the sparql 1.2 mailinglist. So for now I'm going with this interpretation as it is consistent and easy to implement. |
Thanks a lot @jeenbroekstra for following up on this 👍 I think the above interpretation of considering "renamings" as simple expressions makes totally sense and is in-line with what I would expect as a user. This morning I have done some further testings accross some different databases. It looks like the interpretation and evaluation there is very much the same for my example query:
Note that for this query RDF4J 4.0 M3 fails As a second note: the original query in our unit tests has some more expressions, and also contains an aggregation (COUNT). In my issue description I just tried to reduce it to the minimal. As next step I'll now look at your PR and I will also put the snapshot jar build of that branch into our code-base for cross validation |
GH-3782 clean up handling of GROUP BY projection element verification
…even if no aggregate - covers SPARQL Negative parser test cases :group06 and :group07
Current Behavior
We have applied the 4.0 M3 milestone in our application, and one of our query parsing/rendering unit tests is failing (while it was working with 3.7.x and 4.0-M2)
It looks like some regression (or at least a change) was introduced in the recent milestone
I have reduced the query example to the minimum where it occurs at parsing time
Fails with:
variable 'o1' in projection not present in GROUP BY
Expected Behavior
As this is a change compared to previous versions, we need to have a good understanding of "correctness w.r.t the SPARQL spec". I do not have sufficient knowledge whether ORDER BY is executed on the final projected result or on an intermediate result.
If the change / regression is not intended and correct w.r.t the spec, the expectation is that the query parser behaves as in previous versions of RDF4J
Steps To Reproduce
Can be reproduced with the following unit test snippet
which fails with
Version
4.0.0-M3
Are you interested in contributing a solution yourself?
No response
Anything else?
I would like to see a good assessment (and potentially a discussion) on this (cc @jeenbroekstra , @hmottestad )
The text was updated successfully, but these errors were encountered: