-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ambiguous inputs on an execute request. #183
Comments
Thank you @pvretano . That summarizes well the remaining issue with #168 (the background discussion about this was in opengeospatial/ogcapi-routes#17 (comment)). I think both solutions have pros and cons:
Perhaps just coming up with this requirement forbidding ambiguity would provide enough insight, or additional informative guidance could be provided. A hybrid approach between Solution 1 and Solution 2 would be to state that if there is an ambiguity with a Processes object, then a qualified value must be used (there could potentially even be a flag in the description schema that makes this obvious to the client, or it could be based on that "disambiguated enough" assessment). A very important use case for this would be to facilitate using any external schema/data on which implementers have no control, which Solution 2 by itself would not support. |
I think the best approach would be to remove inline and enforce having either An alternative to passing an {
"input1": {
"value": "http://www....",
"type": "application/json"
}
} Still, I prefer Solution 1 above all. |
@fmigneault just so I understand ... if I have an input with a simple value then I should still use the
but you are advocating that, even in this case, I would still use
If my understanding is correct then we have done an awful lot of work to end up right where we started from! ... because the entire point of these changes was to allow inputs with simple values to be encoded directly rather than through some nested |
@fmigneault @pvretano The only ambiguity is in that corner case with objects. Myself and others are extremely happy to be able to have the simple string our double without the wrapping value object (this is what allowed us to make great progress in Routes with opengeospatial/ogcapi-routes#17). (the issue that led to allowing inline values directly is #168). |
@pvretano If we decide to stick with value only directly under the input ID (which is also fine), the default behaviour should be to consider it as a plain string value, and therefore not fetch it automatically. This is not "wrong" per se, but I can see how people could get confused about why the input is not fetched if it is provided as |
Going back to the original comment of the issue. I'm not sure anymore if there really is a confusion. Then, why would situation described by (b) even have any reason to be? |
@fmigneault When a process is described as taking in for instance a MultiPolygon, both a JSON MultiPolygon object and a reference GeoJSON does have a slight conflict with the "type" property but potentially can be disambiguated by looking at other properties, but other schemas might be more ambiguous. |
@jerstlouis |
@fmigneault The MultiPolygon schema can be referenced in the input description. That makes me think that one potential way to remove the ambiguity of the client's execute request might be to reference there a schema for potentially ambiguous types, and assume the built-in OGC API - Processes types otherwise if not present. However, I just realized that JSON schema actually does not define a mechanism to do this (e.g. like XML does with .xsd). |
@jerstlouis I don't understand how the reference can be a valid input to the process if the input is described as taking a |
@fmigneault It gets more tricky with schemas allowing additionalProperties. As I said, the MultiPolygon and link can be disambiguated, but I would not consider this trivial to implement (at least in our implementation). But other cases could be more ambiguous, as @pvretano described in detail in the original comment. No, the input is not defined as a complex OneOf. As I understand it, ComplexInput could always be specified either by value or by reference? This allows the client, for the same input, to point to data on another server or as a result of some service request (a minimal workflow capability, though with hardcoded parameters and difficult to nest further) or embed the input data base64 encoded passed by the client in the execution request directly. The input description does not need to concern itself with this -- the reference or value should automatically be supported. The whole premise of Part 3: Workflows is that in addition to a Link and an inline JSON value, you can also point to an OGC API collection or to a nested process execution request as well to supply that input. |
I was under the impression the most common method was by reference, but I could be completely wrong because I usually use I haven't kept track of when the field name became |
@fmigneault Yes, if the value is being passed by reference then the @jerstlouis any INPUT can be specified by value OR by reference. This is specified by the inlineOrRefData.yml schema as shown here:
So you could define an input as
and then say inline in an execute request:
or you could also say by reference in an execute request:
where the contents of myMultiPolygon.json is the same multipolygon as the inline case. However, this inline or reference capability built into the specification is also the source of the problem that I described at the very beginning of this issue. Say I have two inputs A and B and both inputs are complex objects and both inputs have IDENTICAL schemas. Here is a fragment with the definition of each input:
Is this a problem? The answer is no because on execute, the server will read the schema for each input and validate the input value against the corresponding schema. The fact that the two schemas are identical is completely orthogonal. Is this a problem if the values are passed inline or by reference? Again no because the server can disambiguate what is what. The schema of A and B is clearly distinguishable from the schema in link.yaml. Now, say the definitions of B is this:
Clearly, this is not link.yaml but is this a problem? Well lets provide a sample value ...
Is it still a problem? Well, yes! Remember that according to inlineOrRefData.yaml an input can be an object OR a link. This schema of B, as presented above is so close to link.yaml that in this case the server can't tell if this input value is a inline value (in which case the server would simply pass
this would NOT be a problem because the server could clearly tell what is going on; this is not a link but a value to be passed to the process. Although in my original description of the problem I proposed 2 solutions, I think the best solution might be solution 1. Solution one changes the
In other words, you cannot have an arbitrary object as an input value. What this means is that if your intent if to pass the value of B inline then the correct encoding for that would be:
and if you mean to pass B by reference the correct encoding would be:
So basically any value that is an object MUST be encoded in the execute request as a qualified value. So, here is one input from the example in the specification:
This would no longer be valid. Instead this would be to be encoded as:
This is not as clean as just passing the object value directly BUT there is no possibility of ambiguity on the execute. Another solution might be to remove the inline or reference permission from the specification altogether. This means that if a process can take a value either inline OR by reference then the schema for that input in the process description would need to explicitly say that. Something like this:
I would put one restriction on this approach and that is to say that if the server intends to allow an input value to be passed by reference then one of the schemas in the |
I don't think this fixes anything, because you can still explicitly declare that both are supported. In Workflows, this would also mean having to explicitly include the whole execute-request.yaml & a collection.yaml everywhere. I find it quite elegant that inline and value are implicit (and with Workflows also collection or nested execute request). What about my hybrid proposal: objects can be passed directly without qualified value, unless there is an ambiguity? (such as your href example). Since this is a very specific case that most won't run into, it provides an escaping mechanism for it without making the usual situation more painful. |
@pvretano Superb example. I believe that solution 1 with I am not in favor of the hybrid approach where it is allowed unless there is ambiguity, because realistically, it will not be a reflex for all clients to check if there is such ambiguity, and that will lead to edge-case breaking processes/workflows. It is better to always have the |
@fmigneault Myself, I don't mind so much always forcing "value" for objects, but in Routes that may be a point of contention. @cportele's conclusion at opengeospatial/ogcapi-routes#17 (comment) was:
With solution 2, there is no such client-side ambiguity, as it's the server that would be responsible to prevent any ambiguity. The disadvantage that I was seeing with solution 2 is to reference existing schema over which you have no control, but that is probably easily worked around by defining a wrapping object that includes such a schema instead. |
This whole thread shows that the assumption of What I dislike the most about Solution 2 is that, while we offer the possibility to support any custom schema, we cannot allow those 2 specific exceptions ( I don't disagree that the extra |
Just thinking out loud here ... in order to preserve the ability to reference objects directly as input values maybe we need to change the way we reference values ...
|
@pvretano I don't like that, because Workflows adds two more (as @fmigneault just prophetized):
@fmigneault Personally I think it's a much bigger problem with numbers and strings which are very compact, than it is with objects which are typically a lot of stuff, so the noise is relatively minimal, and I agree worth it to avoid the ambiguity. FYI solution 1 was my original proposal: because of these ambiguity I suggested not allowing omitting "value" for objects. (See #155) |
@pvretano |
@jerstlouis |
@fmigneault Yes, of course :) We agreed to that, I would definitely not want to lose those! (that definitely would derail the Routes harmonization) |
@jerstlouis @fmigneault OK, just thinking out loud ... Hate to loose the ability to say |
@jerstlouis pointed out one additional input ambiguity which I think needs to be addressed too. Some time ago we removed minOccurs and maxOccurs and instead said that input cardinality can be expressed using a the minItems/maxItems facets of JSON. This was a mistake because the JSON facets are now being overloaded to indicate both the cardinailty of a process input and potentially constrains on an input value that is an array. The ambiguity that is introduced is not in the execute request but is a consequence of the fact that in the process description we cannot indicate which way an array input should be interpreted. Let me explain by way of example. Consider the following process input definition:
and the following input that appears in an execute request:
My question is: How does this server interpret this input? Well to be honest the interpretation is a bit fuzz. On the one have the specification says that in a process description the On the other hand the specification also says that the cardinality of an input is specified by A further consequence of this ambiguity is related to link.yaml. The specification says that any SINGLE input value can be encoded inline or by reference but what is the single value here? It is the array or is it the item in the array? Again, unclear! The solution is to re-introduce If , instead, I wanted this input to be interpreted as an integer input where the minimum cardinality is 2 and the maximum cardinality is 10 then, I would define the input this way:
and so the value
then the input value Another consequence of this would be that need to add a requirement that says that if the |
@pvretano About that last bit, that was already agreed in #129 and @bpross-52n had taken care of that in PR #161 . |
@jerstlouis ah, thanks ... I was looking for the issue where we removed minOccurs and maxOccurs but could not remember which on it was! |
@pvretano and the PR for that commit was #172 , and the issue asking to remove
I believe the request and the decision to accept it did not consider the fact that the schema in the input description is for a SINGLE input, and that must be the case because of the the associated rules allowing to replace a SINGLE input value by:
|
NOTE: An alternative to reintroducing minOccurs / maxOccurs would be to use I know we previously had this behavior, and then had reverted back to the WPS way where maxOccurs is 1 by default, but if we used |
@jerstlouis it is not just an arbitrary integer or string. It is integer or "unbounded". I kinda like that capability. What does it mean if maxItems is left out on JSON schema? I was searching the JSON schema specification and I could not find that. I see that if minItems is left out it defaults to 1. |
@pvretano Correct, if either With JSON Schema, if
I would prefer to follow the JSONSchema way but whichever way will be fine as long as it's clear. Either way, it needs to be clear that if |
There is one downside with using minItems/maxItems and interpreting their values as the JSON Schema specification does and that is that for almost ALL inputs you will need to specify |
@pvretano right, I thought of that. Both have advantages I think:
I am happy with either (as long as the default is clearly documented) -- this is a detail and it was really just a suggestion, for which the best thing would probably be to poll the SWG's preference as part of discussing the PR and tweak it based on that. |
Personally, I prefer I prefer to preserve the default |
@fmigneault this is how I have encoded it in the PR associated with this issue. That is the cardinality of a process input is set by minOccurs/maxOccurs with default 1. |
Fix input ambiguities described in issue #183.
10-MAY-2021: In today's SWG teleconference there was some discussion about a possible ambiguity in how process inputs are interpreted when the process input is an object. The ambiguity is thought to arise as a result of two items in the inlineOfRefData.yaml schema. Specifically
inlineOrRefData.yaml
allows:The perceived ambiguity is that someone might decide to defined an input object that has a structure similar to one of the predefined OAPIP input objects (e.g.
link.yaml
). In this case the behavior of the server is thought to be unclear. Consider the following input definition:In an execute request an example input for
input1
might be:The potential ambiguities are (a) how the server should handle this input and (b) how would I pass this input by reference?
I don't think (a) is a problem. If this was a link (as per
link.yaml
) then the server's actions are clear; the server should fetch the value referenced by the link and pass that fetched value to the process as the value forinput1
. However, this is NOT a link (as perlink.yaml
). This is some bespoke schema defined by the process and so what the server should do is pass the verbatim value{"href": "http://www....", "type": "application/..."}
to the process as the value ofinput1
and let the process deal with it from there.The other point (b), however, is a problem. If I wanted to pass the value for
input1
by reference, how would I do it? Well, as per the specification I would say:See the problem? How do I know, in this case, if
input1
is being passed by value or by reference? I can't really tell because the custom schema forinput1
just happens to look very similar to the schema of a link (i.e.link.yaml
).I see two solutions to the problem.
Solution 1: Remove from
inlineOrRefData.yaml1
object
as a possible input. This would mean that all "objects" would need to be specified as qualified values. Using myinput1
example, the inline value would be specified as:and so the entire inline versus reference issue is resolved. The
inlineOrRefData.yaml
schema would thus look like this:Solution 2: Have some requirements or text in the specification describing the problem and forbidding input object schemas that match
link.yaml
orqualifiedValue.yaml
enough that the input intentions are ambiguous.My preference is Solution 2 since allowing direct object input seems more natural than having to dig into an object to get the input value. Just to get a visual, consider the
complexObjectInput
from the example. Right now it is specified like this:If we implement solution 1, it would need to be specified like this:
Same thing would need to be done with the
measureInput
,geometryInput
,boundingBoxInput
andfeatureCollectionInput
in the example too. Basically any input that is a JSON object that is not already defined by the specification (like link.yaml and qualifiedValue.yaml), when specified inline, would need to be specified as a qualified value.However, I am not religious about either approach. Rather, I'll wait to see what the consensus is and implement that as a PR.
The text was updated successfully, but these errors were encountered: