
Implications of semaphore mechanism for SPARQL Update #322

Open
kjetilk opened this issue Oct 14, 2021 · 8 comments

@kjetilk
Member

kjetilk commented Oct 14, 2021

While writing up a text about the semaphore mechanism from the RWW Design Issue, also discussed in #125, solid/solid-spec#193, solid-contrib/query-panel#3 and others, I found that I have not yet fully understood the problem.

Is it also the idea that the same mechanism can be used in a situation where there isn't actual concurrency? I could see a case for that: say one client GETs a certain resource, looks at it, and then wants to change something using a PATCH, but meanwhile another client has changed the same resource. That is really the archetypal case for conditional requests, so I didn't think of it as a valid situation for the semaphore mechanism. Still, one could argue that the semaphore mechanism is actually simpler: maintaining validators is more work for servers, and since a validator covers the entire document, it scales badly when many people co-edit a large document.

This issue has usually been discussed in terms of atomicity, including in rdflib's documentation. I have not quite managed to understand that angle in light of the SPARQL Update spec, which says:

If any solution produces a triple containing an unbound variable or an illegal RDF construct, such as a literal in a subject or predicate position, then that triple is not included when processing the operation: INSERT will not instantiate new data in the output graph, and DELETE will not remove anything.

In our case, this means that if the WHERE clause doesn't match anything, the variables will be unbound and nothing will be deleted. In other words, it sounds like we simply get standard behavior from the query language.
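
For illustration, a hypothetical name-change PATCH of this kind (the vcard:fn values are invented for the example) could read:

PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
DELETE { ?person vcard:fn "Kjetil" }
INSERT { ?person vcard:fn "Ruben" }
WHERE  { ?person vcard:fn "Kjetil" }

# If another client has already changed or removed the "Kjetil" triple,
# the WHERE clause has no solutions, ?person never gets bound, and
# neither the DELETE nor the INSERT template produces any triples.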

However, it is true that standard SPARQL raises no error in this case, as a measure to avoid leaking information. Thus, we must expect that standard SPARQL implementations will need possibly substantial modifications to accommodate this case.

To accommodate this situation in the Solid Protocol, I wonder if it would suffice to say something like

"If any solution produces a triple containing an unbound variable or an illegal RDF construct, then the server MUST abort any modifications and respond with a 409 status code."

That seems like a relatively minor change compared to standard SPARQL. This wouldn't be a violation of the query language, AFAICS; it is a protocol matter.

I have only seen the semaphore mechanism mentioned in connection with the DELETE INSERT WHERE case; does that mean we only have to consider it in that context?

In the case of a pure delete operation, you don't actually care if someone has deleted the data before you: it is gone, and all is fine, right?

It seems to me that the DELETE DATA/INSERT DATA case is a bigger problem than the DELETE INSERT WHERE case, because in SPARQL they are considered separate operations, even if executed in a single HTTP request. Since a DELETE DATA that removes nothing is not a failure, both INSERT DATAs of two conflicting requests would be committed, isolation or not.
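
For example (hypothetical triples; <#me> stands for the resource being patched and the names are invented), suppose two clients both read "Kjetil" as the current name. Client A sends

DELETE DATA { <#me> <http://www.w3.org/2006/vcard/ns#fn> "Kjetil" } ;
INSERT DATA { <#me> <http://www.w3.org/2006/vcard/ns#fn> "Ruben" }

and client B sends

DELETE DATA { <#me> <http://www.w3.org/2006/vcard/ns#fn> "Kjetil" } ;
INSERT DATA { <#me> <http://www.w3.org/2006/vcard/ns#fn> "Sarven" }

Whichever request arrives second finds the "Kjetil" triple already gone, but its DELETE DATA is still not a failure in standard SPARQL, so its INSERT DATA is committed as well and the document ends up with both new names.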

The easy way out of that is to say that developers should always use DELETE INSERT WHERE if they need semaphore behavior. That would, as far as I can see, be compatible with NSS.

The alternative is to say that if a DELETE DATA deletes nothing, then the server must return 409, which is an actual spec violation. It may not be very different in practice, I suppose, but it is still a spec violation.

@csarven
Member

csarven commented Oct 18, 2021

https://www.w3.org/TR/sparql11-update/#updateLanguage

If multiple operations are present in a single request, then a result of failure from any operation MUST abort the sequence of operations, causing the subsequent operations to be ignored.

My interpretation is that, given a requested DELETE/INSERT operation, if the DELETE fails, then the rest (the INSERT) should be ignored. I don't see the INSERT failing (is that even feasible?) or preceding the DELETE.

@kjetilk
Member Author

kjetilk commented Oct 18, 2021

That's actually a different thing. A DELETE INSERT WHERE query is a single operation. You could also formulate a sequence of operations which would be similar.

DELETE WHERE { ?foo a <Bar> } ;
INSERT { ?foo a <Baz> } WHERE { ?foo a <Bar> }

are two operations in a single request, whereas

DELETE { ?foo a <Bar> } 
INSERT { ?foo a <Baz> } WHERE { ?foo a <Bar> }

is a single operation.

I asked in #125 (comment) whether we should support several operations in one request, and @RubenVerborgh's response was "not yet". Indeed, it is possible to design something around this, but the key problem in our case is that a DELETE that doesn't delete anything is not a failure in SPARQL, whereas it is in Solid.

@kjetilk
Member Author

kjetilk commented Oct 18, 2021

Right, I suspected as much for the DELETE DATA/INSERT DATA case, but I can't think of any way to reconcile the multiple-match case with SPARQL. I need to think further about that.

@kjetilk
Member Author

kjetilk commented Oct 18, 2021

So, one possibility is that we do not allow full SPARQL for 0.9... We would only allow the constrained subset, with that mechanism... I don't like that option myself, but what do others think?

@kjetilk
Member Author

kjetilk commented Oct 18, 2021

Hmmm, yeah, but actually, perhaps we should have a content type for it... With something that incompatible with application/sparql-update, we'd need one anyway.

@kjetilk
Member Author

kjetilk commented Oct 20, 2021

I might also add that exactly that kind of query

DELETE DATA {
  <https://ruben2021.solidcommunity.net/profile/card#me> <http://www.w3.org/2006/vcard/ns#fn> "Kjetil" .
} ;
INSERT DATA {
  <https://ruben2021.solidcommunity.net/profile/card#me> <http://www.w3.org/2006/vcard/ns#fn> "Ruben" .
}

was the reason I originally proposed having just DELETE DATA; INSERT DATA as the subset. If you don't have any variables, it is an awful lot easier to have just one match :-) If all the queries that require only one match could be rewritten in this form, that would also make life much easier.
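
As a hypothetical sketch of such a rewrite (same invented name values as above), a single-match update like

PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
DELETE { ?person vcard:fn "Kjetil" }
INSERT { ?person vcard:fn "Ruben" }
WHERE  { ?person vcard:fn "Kjetil" }

can, once the client knows the subject and the current value from a prior GET, be written without variables as the DELETE DATA ; INSERT DATA pair above, which needs no pattern matching at all.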

@rubensworks

@kjetilk I might be on board with that; just what do we do with blank nodes? (hah)

I think it should be straightforward to follow the SPARQL Update spec here (for DELETE DATA; INSERT DATA):

the INSERT DATA statement only allows to insert ground triples. Blank nodes in QuadDatas are assumed to be disjoint from the blank nodes in the Graph Store, i.e., will be inserted with "fresh" blank nodes.

in a DELETE DATA operation neither variables nor blank nodes are allowed
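
To make those two rules concrete, a small hypothetical illustration (the foaf:knows triple is invented for the example):

INSERT DATA { <#me> <http://xmlns.com/foaf/0.1/knows> _:someone }
# allowed: _:someone is inserted as a fresh blank node, disjoint from
# any blank node already in the graph

DELETE DATA { <#me> <http://xmlns.com/foaf/0.1/knows> _:someone }
# not allowed: blank nodes (like variables) are rejected in DELETE DATA,
# so there is no way to refer to an existing blank node for removal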

@kjetilk
Member Author

kjetilk commented Oct 21, 2021

I have now made an alternative PR in #330, where I try out what @timbl has been voicing, i.e. REMOVE and REMOVE DATA.

I don't quite feel that we have exhaustively answered the question:

"When using the semaphore mechanism, would the triples that you want to change be known before the PATCH request?"

I bet there are desirable cases where that's not quite true, but then, are they significant enough?

My feeling around this is:

The case for REMOVE DATA ; INSERT DATA is pretty strong: it is an atomic update operation with 409 conflict resolution. It could be hacked together in an afternoon and wouldn't need a query engine at all. I could see this entering the SPARQL spec. (A rough sketch of what such a request could look like follows at the end of this comment.)

The case for REMOVE INSERT WHERE, where unbound solutions cause a failure in SPARQL Update terms, is also quite good. You'd need a SPARQL engine pretty quickly, but it isn't too hard to modify one. I think it would be harder to argue for in a SPARQL WG, and I would personally prefer a broader approach, but it can be done.

What I really struggle with is the idea that there should be a failure when the query has multiple solutions. Yes, something might be wrong, but it also might not be; there can be perfectly legitimate reasons why there are multiple solutions. And if there really is a problem, this conflict resolution mechanism would be the wrong way to address it, since multiple solutions can arise for many reasons other than an edit conflict. That would be more in the realm of shape validation and so on. It also departs very much from my understanding of what SPARQL is.

I have nevertheless included that language in #330, but I'd rather leave it out.
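
As for the REMOVE DATA ; INSERT DATA case above, here is a sketch of what such a PATCH body might look like, assuming the syntax proposed in #330 behaves like DELETE DATA with conflict detection (the exact syntax is whatever #330 ends up specifying; the triple reuses the invented name change from earlier):

REMOVE DATA { <https://ruben2021.solidcommunity.net/profile/card#me> <http://www.w3.org/2006/vcard/ns#fn> "Kjetil" } ;
INSERT DATA { <https://ruben2021.solidcommunity.net/profile/card#me> <http://www.w3.org/2006/vcard/ns#fn> "Ruben" }

# Sketch of the intended behaviour: if every triple in REMOVE DATA is present,
# remove them and apply the INSERT DATA atomically; if any is missing, abort
# the whole request with 409 Conflict. A plain triple comparison suffices,
# so no query engine is required.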
