Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-3782 clean up handling of GROUP BY projection element verification #3788

Merged
merged 3 commits into from
Apr 12, 2022

Conversation

abrokenjester
Copy link
Contributor

@abrokenjester abrokenjester commented Apr 9, 2022

GitHub issue resolved: #3782

Briefly describe the changes proposed in this PR:

When a query has a GROUP BY clause, the projection may only contain:

  1. aggregate expressions
  2. constants
  3. simple expressions consisting of just a variable (if and only if that var is in the GROUP BY)

code changes:

  • check presence of all non-simple expression types in projeciton when using GROUP BY
  • clearer errors are given
  • added test cases

PR Author Checklist (see the contributor guidelines for more details):

  • my pull request is self-contained
  • I've added tests for the changes I made
  • I've applied code formatting (you can use mvn process-resources to format from the command line)
  • I've squashed my commits where necessary
  • every commit message starts with the issue number (GH-xxxx) followed by a meaningful description of the change

@abrokenjester abrokenjester force-pushed the GH-3782-sparql-group-by branch 2 times, most recently from 1d7288b to 9c5a4e7 Compare April 10, 2022 03:25
…gregate

- covers SPARQL Negative parser test cases :group06 and :group07
Copy link
Contributor

@aschwarte10 aschwarte10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell the fix looks good. It addresses the finding and corresponds to the spec interpretation summarized in the issue (i.e. renamings are considered simple expressions).

I have tested this piece of code on our codebase.

The specific unit test is still failing, but with the agreed interpretation of the spec this is also expected. Note that for the below query (kind of artificial, but still covers some aspects) RDF4J 3.7.x and RDF4j 4.0 M2 were accepting it

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT DISTINCT 
	?s (?o AS ?o1) 
	(<http://www.w3.org/2001/XMLSchema#boolean>(?o) AS ?o2) 
	(COUNT(DISTINCT ?p) AS ?count) 
	(rdf:type AS ?typeProperty) 
WHERE {
	{ 	
		?s ?p ?o . 
		?s rdf:type foaf:Person . 
		BIND ('42' AS ?knowsname ) . 
		?o rdf:type dc:Document . } 
		MINUS 
		{
			?s rdf:type rdfs:Resource . 
			OPTIONAL { ?s ?p ?o . } 
		} 
		FILTER (NOT EXISTS { ?s rdfs:label ?o }) 
	} GROUP BY ?s ?o 
		HAVING (?count=5) 
		ORDER BY ?s ?o1 
		LIMIT 10 OFFSET 5 
		VALUES (?s) { (<http:example.org/dummy>) }

With this branch we are seeing (as-expected):

non-aggregate expression 'FunctionCall (http://www.w3.org/2001/XMLSchema#boolean)
   Var (name=o)
' not allowed in projection when using GROUP BY

Removing the functional expression from the projection of our example query, let's the parsing succeed

@jeenbroekstra maybe the above query is also of interest for you. We seem to use it in the context of query rendering (where we also do parsing)

I am good to merge. Maybe a second pair of eyes for confirmation

@abrokenjester
Copy link
Contributor Author

abrokenjester commented Apr 11, 2022

@aschwarte10 thanks for the feedback, and that additional query looks interesting. If you have the means, could you rewrite it to this and try it out:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT DISTINCT 
	?s (?o AS ?o1) 
	?o2
	(COUNT(DISTINCT ?p) AS ?count) 
	(rdf:type AS ?typeProperty) 
WHERE {
	{ 	
		?s ?p ?o . 
		?s rdf:type foaf:Person . 
		BIND ('42' AS ?knowsname ) . 
		?o rdf:type dc:Document . } 
	MINUS 
	{
			?s rdf:type rdfs:Resource . 
			OPTIONAL { ?s ?p ?o . } 
	} 
	FILTER (NOT EXISTS { ?s rdfs:label ?o }) 
} GROUP BY ?s ?o (<http://www.w3.org/2001/XMLSchema#boolean>(?o) AS ?o2) 
HAVING (?count=5) 
ORDER BY ?s ?o1 
LIMIT 10 OFFSET 5 
VALUES (?s) { (<http:example.org/dummy>) }

In other words: move the boolean cast to inside the GROUP BY, and project the resulting alias (o2).
That should be syntactically fine, but if you can try I'd like to verify that it also gives the expected result when evaluated.

Regardless, I should probably have a quick look at our test suite to see if this kind of thing is covered. Pretty sure it is, but never hurts to double-check.

@aschwarte10
Copy link
Contributor

@jeenbroekstra the parsing works good (which is what we are expecting with this change)

The query renderer in org.eclipse.rdf4j.queryrender.sparql.experimental.SparqlQueryRenderer.SparqlQueryRenderer() however seems to produce a slightly different query:

FILTER (!EXISTS {	?s <http://www.w3.org/2000/01/rdf-schema#label> ?o . } ) 
BIND (<http://www.w3.org/2001/XMLSchema#boolean>(?o)  AS ?o2) . 
 }
 GROUP BY ?s ?o ?o2

i.e. it moves the function call as a BIND into the WHERE clause

I think this should be fine though (and if at all it is orthogonal to this fix)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants