Ambiguous inputs on an execute request. #183

Closed
pvretano opened this issue May 10, 2021 · 34 comments · Fixed by #191

Comments

@pvretano
Contributor

pvretano commented May 10, 2021

10-MAY-2021: In today's SWG teleconference there was some discussion about a possible ambiguity in how process inputs are interpreted when the process input is an object. The ambiguity is thought to arise as a result of two items in the inlineOrRefData.yaml schema. Specifically, inlineOrRefData.yaml allows:

  1. an input value to be specified inline or by reference
  2. an input value to be a JSON object.

The perceived ambiguity is that someone might decide to define an input object that has a structure similar to one of the predefined OAPIP input objects (e.g. link.yaml). In this case the behavior of the server is thought to be unclear. Consider the following input definition:

"input1": {
   "schema": {
      "type": "object",
      "required": ["href"],
      "properties": {
         "href": {
            "type": "string",
            "format": "uri"
         },
         "type": {
            "type": "string"
         }
      }
   }
} 

In an execute request an example input for input1 might be:

"input1": {
   "href": "http://www....",
   "type": "... some media type ..."
}

The potential ambiguities are (a) how the server should handle this input and (b) how would I pass this input by reference?

I don't think (a) is a problem. If this was a link (as per link.yaml) then the server's actions are clear; the server should fetch the value referenced by the link and pass that fetched value to the process as the value for input1. However, this is NOT a link (as per link.yaml). This is some bespoke schema defined by the process and so what the server should do is pass the verbatim value {"href": "http://www....", "type": "application/..."} to the process as the value of input1 and let the process deal with it from there.

The other point (b), however, is a problem. If I wanted to pass the value for input1 by reference, how would I do it? Well, as per the specification I would say:

"input1": {
   "href": "http://www....",
   "type": "application/json"
}

See the problem? How do I know, in this case, if input1 is being passed by value or by reference? I can't really tell because the custom schema for input1 just happens to look very similar to the schema of a link (i.e. link.yaml).
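The clash can be sketched in a few lines of Python. This is only an illustration: both checker functions are hypothetical stand-ins for real JSON Schema validation, `is_link` assumes that link.yaml requires nothing more than a string `href`, and `matches_input1` hand-encodes the `input1` schema shown above.

```python
def is_link(obj):
    """Hypothetical stand-in for link.yaml: an object with a string href."""
    return isinstance(obj, dict) and isinstance(obj.get("href"), str)


def matches_input1(obj):
    """Hand-written check for the input1 schema: href required, type optional."""
    if not isinstance(obj, dict) or not isinstance(obj.get("href"), str):
        return False
    return "type" not in obj or isinstance(obj["type"], str)


def possible_readings(obj):
    """List every way the server could legitimately interpret the payload."""
    readings = []
    if matches_input1(obj):
        readings.append("inline value")
    if is_link(obj):
        readings.append("reference")
    return readings


payload = {"href": "http://example.com/data.json", "type": "application/json"}
print(possible_readings(payload))
```

Both readings come back for the same payload, so nothing in the request itself tells the server which one was intended.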

I see two solutions to the problem.

Solution 1: Remove object from inlineOrRefData.yaml as a possible input. This would mean that all "objects" would need to be specified as qualified values. Using my input1 example, the inline value would be specified as:

"input1": {
   "value": { "href": "http://www....", "type": "... some media type ..." }
}

and so the entire inline versus reference issue is resolved. The inlineOrRefData.yaml schema would thus look like this:

oneOf:
  - type: string
  - type: number
  - type: boolean
  - type: array
  - $ref: "link.yaml"
  - $ref: "qualifiedValue.yaml"

Solution 2: Have some requirements or text in the specification describing the problem and forbidding input object schemas that match link.yaml or qualifiedValue.yaml enough that the input intentions are ambiguous.

My preference is Solution 2 since allowing direct object input seems more natural than having to dig into an object to get the input value. Just to get a visual, consider the complexObjectInput from the example. Right now it is specified like this:

   "complexObjectInput": {
      "property1": "value1",
      "property2": "value2",
      "property5": true
   },

If we implement solution 1, it would need to be specified like this:

   "complexObjectInput": {
      "value": {
         "property1": "value1",
         "property2": "value2",
         "property5": true
      }
   },

Same thing would need to be done with the measureInput, geometryInput, boundingBoxInput and featureCollectionInput in the example too. Basically any input that is a JSON object that is not already defined by the specification (like link.yaml and qualifiedValue.yaml), when specified inline, would need to be specified as a qualified value.
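As a rough sketch of what Solution 1 would ask of a client, the Python helper below wraps every object input as a qualified value and leaves other values alone. The function name `wrap_inline_objects` is hypothetical, and the sketch assumes the caller already knows that every object in the request is meant inline (none is a link or an existing qualified value).

```python
def wrap_inline_objects(inputs):
    """Return a copy of `inputs` with every object wrapped as {"value": ...}.

    Assumption: every object here is an inline value, not a link and not
    already a qualified value.
    """
    return {
        name: {"value": val} if isinstance(val, dict) else val
        for name, val in inputs.items()
    }


inputs = {
    "dateInput": "2021-03-06T07:21:00",
    "complexObjectInput": {
        "property1": "value1",
        "property2": "value2",
        "property5": True,
    },
}
wrapped = wrap_inline_objects(inputs)
```

Strings, numbers, booleans and arrays pass through unchanged; only objects gain the extra `value` level.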

However, I am not religious about either approach. Rather, I'll wait to see what the consensus is and implement that as a PR.

@jerstlouis
Member

jerstlouis commented May 10, 2021

Thank you @pvretano . That summarizes well the remaining issue with #168 (the background discussion about this was in opengeospatial/ogcapi-routes#17 (comment)).

I think both solutions have pros and cons:

  • Solution 1 (require objects to use value) avoids any ambiguity and based on our approach to parsing JSON I know it would greatly simplify our own server implementation, but on the other hand it makes for cumbersome extra { "value" : ...} noise
  • Solution 2 (require some differentiator) keeps things leaner and cleaner, but the challenge is both how to describe "disambiguated enough", and how to implement it in practice.

Perhaps just coming up with this requirement forbidding ambiguity would provide enough insight, or additional informative guidance could be provided.

A hybrid approach between Solution 1 and Solution 2 would be to state that if there is an ambiguity with a Processes object, then a qualified value must be used (there could potentially even be a flag in the description schema that makes this obvious to the client, or it could be based on that "disambiguated enough" assessment).

A very important use case for this would be to facilitate using any external schema/data on which implementers have no control, which Solution 2 by itself would not support.

@fmigneault
Contributor

fmigneault commented May 10, 2021

@pvretano @jerstlouis

I think the best approach would be to remove inline and enforce having either value or href which contains the "data", whether it is a link or plain value. This makes it clear in each case and avoids the ambiguity. I fear solution 2 is prone to many errors that won't make it obvious at first glance why the server doesn't behave "as intended".

An alternative to passing an href value directly without pre-fetch by the server could be to use value, since what is expected is to pass the "string value literally". With the addition of schema, it could be possible to add the "format": "uri" constraint to that string, so there is no need for explicit href. So the result would be submitted as follows:

{
    "input1": {
        "value": "http://www....",
        "type": "application/json"
    }
}

Still, I prefer Solution 1 above all.

@pvretano
Contributor Author

pvretano commented May 10, 2021

@fmigneault just so I understand ... if I have an input with a simple value then I should still use the value key. Right now, in the example I can have this:

"dateInput": "2021-03-06T07:21:00"

but you are advocating that, even in this case, I would still use

"dateInput": { "value": "2021-03-06T07:21:00"}

If my understanding is correct then we have done an awful lot of work to end up right where we started from! ... because the entire point of these changes was to allow inputs with simple values to be encoded directly rather than through some nested value key.

@jerstlouis
Member

jerstlouis commented May 10, 2021

@fmigneault @pvretano The only ambiguity is in that corner case with objects.

Myself and others are extremely happy to be able to have the simple string or double without the wrapping value object (this is what allowed us to make great progress in Routes with opengeospatial/ogcapi-routes#17).

(the issue that led to allowing inline values directly is #168).

@fmigneault
Contributor

fmigneault commented May 11, 2021

@pvretano
Indeed. Personally (and others can disagree), I feel it is not more or less work to have the sub-object with value. I would prefer to always have it, but be sure of the execution intent, than guessing what to do with it and possibly have unexpected behaviour.
Since an object must be provided when using href, or when adding anything else than just the plain value, I think it is actually more consistent to always have the object.

If we decide to stick with value only directly under the input ID (which is also fine), the default behaviour should be to consider it as a plain string value, and therefore not fetch it automatically. This is not "wrong" per se, but I can see how people could get confused about why the input is not fetched if it is provided as {"input1": "http://...."}.

@fmigneault
Contributor

Going back to the original comment of the issue. I'm not sure anymore if there really is a confusion.
If input1 was defined with a schema, it is established that submitted data gets passed around as value as in point (a), and that's it.

Then, why would the situation described by (b) even arise?
You cannot change the "data-type" midway to have a reference input, just like you could not change between a float, string or whatever else that was defined in the process inputs.
So basically, it's not because the structure of schema resembles link.yaml definition that the server should try to parse it as a link, since override with schema was explicitly provided to request a value. If by reference behaviour is desired, then the input should be defined as "ComplexData", without schema, and with href, format and other relevant details.

@jerstlouis
Member

@fmigneault When a process is described as taking in for instance a MultiPolygon, both a JSON MultiPolygon object and a reference { "href" : "https://...", "type" : "application/geo+json" } to a JSON object somewhere are valid as inputs.

GeoJSON does have a slight conflict with the "type" property but potentially can be disambiguated by looking at other properties, but other schemas might be more ambiguous.

@fmigneault
Contributor

@jerstlouis
Ah I see! Thanks for the clarification.
Then, should MultiPolygon be processed with a specific type, similar to bbox.yaml ?

@jerstlouis
Member

jerstlouis commented May 11, 2021

@fmigneault The MultiPolygon schema can be referenced in the input description.

That makes me think that one potential way to remove the ambiguity of the client's execute request might be to reference there a schema for potentially ambiguous types, and assume the built-in OGC API - Processes types otherwise if not present.

However, I just realized that JSON schema actually does not define a mechanism to do this (e.g. like XML does with .xsd). $schema is meant to refer to a version of JSON Schema itself, but in that SO post there is mention of some using it for that purpose.

@fmigneault
Contributor

@jerstlouis
I don't think Link and MultiPolygon are ambiguous.
Link has a required href while MultiPolygon requires coordinates, so they can safely be distinguished when validating against their respective schema. Even the explicit "MultiPolygon" string is required for type to be valid. The whole schema in each case must be valid, so a partial match of the type field alone should not matter.
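A small Python sketch of that distinction, under stated assumptions: the two checks are simplified, hand-written stand-ins for the real schemas (a link is taken to be any object with a string `href`, a MultiPolygon any object with `type` equal to "MultiPolygon" and an array of `coordinates`), and the function names are hypothetical.

```python
def is_link(obj):
    """Simplified link check: an object with a string href."""
    return isinstance(obj, dict) and isinstance(obj.get("href"), str)


def is_multipolygon(obj):
    """Simplified GeoJSON MultiPolygon check: type literal plus coordinates."""
    return (
        isinstance(obj, dict)
        and obj.get("type") == "MultiPolygon"
        and isinstance(obj.get("coordinates"), list)
    )


def classify(obj):
    """Decide how an input value would be read by validating it both ways."""
    link, geometry = is_link(obj), is_multipolygon(obj)
    if link and geometry:
        return "ambiguous"
    if link:
        return "link"
    if geometry:
        return "MultiPolygon"
    return "invalid"
```

The shared `type` member does not cause a clash by itself, because each check also needs its own required member. An object that carries both `href` and `coordinates` would still be reported as ambiguous by this sketch.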

I don't understand how the reference can be a valid input to the process if the input is described as taking a MultiPolygon object.
Is the input schema defined as a complex OneOf(Link, MultiPolygon)? I don't believe OGC-API Process is designed to swap around between data/ref inputs on the fly, or am I misinterpreting? Maybe this distinction was clearer in older WPS with LiteralInput, BoundingBoxInput and ComplexInput, but I never had the impression it was expected of Processes to support both alternatives via the same input ID.

@jerstlouis
Member

jerstlouis commented May 12, 2021

@fmigneault It gets more tricky with schemas allowing additionalProperties. As I said, the MultiPolygon and link can be disambiguated, but I would not consider this trivial to implement (at least in our implementation).

But other cases could be more ambiguous, as @pvretano described in detail in the original comment.

No, the input is not defined as a complex OneOf.

As I understand it, ComplexInput could always be specified either by value or by reference?

This allows the client, for the same input, to point to data on another server or as a result of some service request (a minimal workflow capability, though with hardcoded parameters and difficult to nest further) or embed the input data base64 encoded passed by the client in the execution request directly.

The input description does not need to concern itself with this -- the reference or value should automatically be supported.

The whole premise of Part 3: Workflows is that in addition to a Link and an inline JSON value, you can also point to an OGC API collection or to a nested process execution request as well to supply that input.

@fmigneault
Contributor

As I understand it, ComplexInput could always be specified either by value or by reference?

I was under the impression the most common method was by reference, but I could be completely wrong because I usually use href.
If using raw data, I think type would still refer to the Content-Type of that data, and not its schema structure, so it would always be application/json in that case since schema is defined via JSON schema. Maybe @pvretano can (in)validate this?

I haven't kept track of when the field name became type. It was format: "<mediaType>" before that was submitted with ComplexInput, which referred to one of the items in the list of supported MIME-types for that input from the process description. I believe that there must not be a mix between type (as in expected schema) and format (as in content-type) to avoid the kind of ambiguity described here.

@pvretano
Contributor Author

pvretano commented May 12, 2021

@fmigneault Yes, if the value is being passed by reference then the type is the media type of the data that will be read by the server. That could be some text format like application/json but also some raw binary format like image/png.

@jerstlouis any INPUT can be specified by value OR by reference. This is specified by the inlineOrRefData.yaml schema as shown here:

oneOf:
  - type: string
  - type: number
  - type: boolean
  - type: array
  - type: object
  - $ref: "link.yaml"
  - $ref: "qualifiedValue.yaml"

So you could define an input as

    "geometryInput": {
      "schema": {
        "$ref": "http://schemas.opengis.net/ogcapi/features/part1/1.0/openapi/schemas/multipolygonGeoJSON.yaml"
      },

and then say inline in an execute request:

   "geometryInput": {
      "type": "MultiPolygon",
       "coordinates": [
          [[[102.0, 2.0], [103.0, 2.0], [103.0, 3.0], [102.0, 3.0], [102.0, 2.0]]],
          [[[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0]],
          [[100.2, 0.2], [100.8, 0.2], [100.8, 0.8], [100.2, 0.8], [100.2, 0.2]]]
      ]
    }

or you could also say by reference in an execute request:

   "geometryInput": {
      "href": "https://www.someserver.com/myMultiPolygon.json",
      "type": "application/json"
   }

where the contents of myMultiPolygon.json is the same multipolygon as the inline case.

However, this inline or reference capability built into the specification is also the source of the problem that I described at the very beginning of this issue.

Say I have two inputs A and B and both inputs are complex objects and both inputs have IDENTICAL schemas. Here is a fragment with the definition of each input:

    "A": {
      "title": "Complex Object Input Example",
      "schema": {
        "type": "object",
        "required": [
          "property1",
          "property5"
        ],
        "properties": {
          "property1": {
            "type": "string"
          },
          "property2": {
            "type": "string",
            "format": "uri"
          },
          "property3": {
            "type": "number"
          },
          "property4": {
            "type": "string",
            "format": "dateTime"
          },
          "property5": {
            "type": "boolean"
          }
        }
      }
    },

    "B": {
      "title": "Complex Object Input Example",
      "schema": {
        "type": "object",
        "required": [
          "property1",
          "property5"
        ],
        "properties": {
          "property1": {
            "type": "string"
          },
          "property2": {
            "type": "string",
            "format": "uri"
          },
          "property3": {
            "type": "number"
          },
          "property4": {
            "type": "string",
            "format": "dateTime"
          },
          "property5": {
            "type": "boolean"
          }
        }
      }
    },

Is this a problem? The answer is no because on execute, the server will read the schema for each input and validate the input value against the corresponding schema. The fact that the two schemas are identical is completely orthogonal.

Is this a problem if the values are passed inline or by reference? Again no because the server can disambiguate what is what. The schema of A and B is clearly distinguishable from the schema in link.yaml.

Now, say the definition of B is this:

   "B": {
      "title": "Complex Object Input Example",
      "schema": {
        "type": "object",
        "required": [
          "href"
        ],
        "properties": {
           "href": {
              "type": "string"
           },
           "rel": {
              "type": "string"
           },
           "type": {
              "type": "string"
           },
           "wallyCount": {
              "type": "number"
           }
        }
      }
    },

Clearly, this is not link.yaml but is this a problem? Well, let's provide a sample value ...

"B": { "href": "... some link ..."}

Is it still a problem? Well, yes! Remember that according to inlineOrRefData.yaml an input can be an object OR a link. This schema of B, as presented above, is so close to link.yaml that in this case the server can't tell if this input value is an inline value (in which case the server would simply pass "B": { "href": "... some link ..."} to the process) OR if this is a reference to a value. There is not enough information in the input to disambiguate. This is a PROBLEM! If the input was this:

"B": {"href": "...some link...", "wallyCount": 10}

this would NOT be a problem because the server could clearly tell what is going on; this is not a link but a value to be passed to the process.
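The difference between the two sample values can be sketched in Python. The set `LINK_PROPERTIES` is an assumption about which member names link.yaml defines (check it against the actual schema), and `could_be_link` is a hypothetical helper, not part of any specification.

```python
# Assumption: the member names defined by link.yaml.
LINK_PROPERTIES = {"href", "rel", "type", "hreflang", "title"}


def could_be_link(value):
    """True when the value has a string href and only link-style members."""
    return (
        isinstance(value, dict)
        and isinstance(value.get("href"), str)
        and set(value) <= LINK_PROPERTIES
    )


only_href = {"href": "http://example.com/some-link"}
with_extra = {"href": "http://example.com/some-link", "wallyCount": 10}
```

Here `could_be_link(only_href)` is true, so the server cannot tell a reference from an inline value, while `could_be_link(with_extra)` is false because `wallyCount` is not a link member.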

Although in my original description of the problem I proposed 2 solutions, I think the best solution might be solution 1. Solution one changes the inlineOrRefData.yaml schema to look like this:

oneOf:
  - type: string
  - type: number
  - type: boolean
  - type: array
  - $ref: "link.yaml"
  - $ref: "qualifiedValue.yaml"

In other words, you cannot have an arbitrary object as an input value. What this means is that if your intent is to pass the value of B inline then the correct encoding for that would be:

"B": {
   "value": {"href": "... some link..." }
}

and if you mean to pass B by reference the correct encoding would be:

"B": {"href": "... some other link..."}

So basically any value that is an object MUST be encoded in the execute request as a qualified value. So, here is one input from the example in the specification:

    "complexObjectInput": {
      "property1": "value1",
      "property2": "value2",
      "property5": true
    },

This would no longer be valid. Instead this would have to be encoded as:

   "complexObjectInput": {
      "value": {
         "property1": "value1",
         "property2": "value2",
         "property5": true
      }
   },

This is not as clean as just passing the object value directly BUT there is no possibility of ambiguity on the execute.
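A minimal Python sketch of how a server might read one input under Solution 1. It assumes a qualified value is recognized by its `value` key and a link by its `href` key; the function name `resolve_input` and the returned tuples are hypothetical, and no fetching is performed.

```python
def resolve_input(raw):
    """Classify one execute-request input under Solution 1.

    Returns ("inline", value) or ("reference", url). Bare objects are
    rejected because they are no longer part of the oneOf.
    """
    if isinstance(raw, dict):
        if "value" in raw:
            return ("inline", raw["value"])
        if "href" in raw:
            return ("reference", raw["href"])
        raise ValueError("bare objects are not allowed; wrap them in 'value'")
    # Strings, numbers, booleans and arrays are always inline values.
    return ("inline", raw)


print(resolve_input({"value": {"href": "http://example.com/a"}}))
print(resolve_input({"href": "http://example.com/b"}))
```

The first call yields an inline object that merely happens to contain an `href`, and the second yields a reference, so the two encodings of B no longer collide.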

Another solution might be to remove the inline or reference permission from the specification altogether. This means that if a process can take a value either inline OR by reference then the schema for that input in the process description would need to explicitly say that. Something like this:

    "A": {
      "title": "Complex Object Input Example",
      "schema": {
         "oneOf": [
            { 
               "$ref": ".../.../.../link.yaml"
            },
            {
               "type": "object",
               "required": [
                  "property1",
                  "property5"
                ],
                "properties": {
                   "property1": {
                      "type": "string"
                   },
                   "property2": {
                      "type": "string",
                      "format": "uri"
                   },
                   "property3": {
                      "type": "number"
                   },
                   "property4": {
                      "type": "string",
                      "format": "dateTime"
                   },
                   "property5": {
                      "type": "boolean"
                   }
                }
             }
         ]
      }
    }

I would put one restriction on this approach and that is to say that if the server intends to allow an input value to be passed by reference then one of the schemas in the oneOf must match the link.yaml schema exactly. The easy thing to do, of course, is to simply reference link.yaml as I have shown here.

@jerstlouis
Member

jerstlouis commented May 12, 2021

@pvretano

This means that if a process can take a value either inline OR by reference then the schema for that input in the process description would need to explicitly say that.

I don't think this fixes anything, because you can still explicitly declare that both are supported.

In Workflows, this would also mean having to explicitly include the whole execute-request.yaml & a collection.yaml everywhere. I find it quite elegant that inline value and reference are both implicit (and with Workflows also collection or nested execute request).

What about my hybrid proposal: objects can be passed directly without qualified value, unless there is an ambiguity? (such as your href example).

Since this is a very specific case that most won't run into, it provides an escaping mechanism for it without making the usual situation more painful.
I think some guidance is still required to explain how the server should establish that such an ambiguity exists (e.g. if all properties of the input type are found within link.yaml, or with Workflows, collection or execute request), as well as how to disambiguate something like MultiPolygon vs. Link (e.g. if any other properties than href or type are included? how about rel, title? If any other property than these, which is found in the input schema?).
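One possible reading of such guidance, sketched in Python: flag an input schema when a value that satisfies it could consist solely of link members. This is only an illustration. `LINK_PROPERTIES` is an assumption about the member names in link.yaml, the function name is hypothetical, and the rule ignores details such as additionalProperties.

```python
# Assumption: the member names defined by link.yaml.
LINK_PROPERTIES = {"href", "rel", "type", "hreflang", "title"}


def schema_collides_with_link(schema):
    """True when a valid value for `schema` could also look like a link."""
    if schema.get("type") != "object":
        return False
    href = schema.get("properties", {}).get("href")
    if href is None or href.get("type", "string") != "string":
        return False
    # If every required member is also a link member, a minimal valid
    # value contains nothing that sets it apart from a link.
    return set(schema.get("required", [])) <= LINK_PROPERTIES


link_like = {
    "type": "object",
    "required": ["href"],
    "properties": {
        "href": {"type": "string"},
        "rel": {"type": "string"},
        "type": {"type": "string"},
        "wallyCount": {"type": "number"},
    },
}
distinct = {
    "type": "object",
    "required": ["property1", "property5"],
    "properties": {
        "property1": {"type": "string"},
        "property5": {"type": "boolean"},
    },
}
```

A server could run a check of this kind when a process is deployed and either reject the description or mark the input as requiring a qualified value.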

@fmigneault
Contributor

@pvretano Superb example.
I think that the fact that a restriction must already be considered about an explicit link.yaml reference within the schema already shows that this is heading the wrong way.
And as @jerstlouis said, there is indeed still a possibility of misinterpretation this way.

I believe that solution 1 with value: { <data> } is the way to go to handle this case, but it can still be valid to support inline literal value of other basic data types, since they can be distinguished from the custom schema or reference href objects.

I am not in favor of the hybrid approach where it is allowed unless there is ambiguity, because realistically, it will not be a reflex for all clients to check if there is such ambiguity, and that will lead to edge cases that break processes/workflows. It is better to always have the value and sub-object, and be sure the process understands the intent properly. Leaving room for interpretation will cause different implementations to behave differently, reducing interoperability.

@jerstlouis
Member

@fmigneault Myself, I don't mind so much always forcing "value" for objects, but in Routes that may be a point of contention. @cportele's conclusion at opengeospatial/ogcapi-routes#17 (comment) was:

I don't think value is needed for objects either and Processes could be updated accordingly. The only thing that we would need to add in the "compute a new route" request is the inputs member, which would be ok for me.

With solution 2, there is no such client-side ambiguity, as it's the server that would be responsible to prevent any ambiguity. The disadvantage that I was seeing with solution 2 is to reference existing schema over which you have no control, but that is probably easily worked around by defining a wrapping object that includes such a schema instead.

@fmigneault
Contributor

This whole thread shows that the assumption of value not being needed was incorrect.
If inputs was already something that must be added, it shouldn't be much more effort to add value as well.

What I dislike the most about Solution 2 is that, while we offer the possibility to support any custom schema, we cannot allow those 2 specific exceptions (link.yaml or qualifiedValue), because they conflict with existing internal formats. This feels like bad design. Imagine JSON spec did the same, that they allowed you to write any schema, except you are not allowed to use a field named type because they use it internally to define their schema. Wouldn't it feel wrong? Also, it is only 2 exceptions for now, but how long until a new feature gets added to OGC-API and then a 3rd exception appears, if not more?

I don't disagree that the extra value is less natural than directly giving the data (if it wasn't error prone, I would like it too), but not allowing those specific schema exceptions feels more unnatural in my opinion than asking for value vs href. Furthermore, those two fields naturally indicate that the data is retrieved by literal value vs reference, which is a big plus for Solution 1. The value adds that sub-object where we can safely tell the user: "do whatever you want in there, it's your own schema". There are no more special exceptions or ambiguity because value is directly evaluated against schema, and it becomes future proof.

@pvretano
Contributor Author

Just thinking out loud here ... in order to preserve the ability to reference objects directly as input values maybe we need to change the way we reference values ...

"A": {
   "valueReference": {
      "href": "... some url ...",
      "type": "application/blah-blah"
   }
}

@jerstlouis
Member

jerstlouis commented May 13, 2021

@pvretano I don't like that, because Workflows adds two more (as @fmigneault just prophesied):

  • CollectionInput ({ "collection" : ".../collections/{collectionId}" }), and
  • ProcessInput (a nested execute.yaml)

@fmigneault
I'm happy with Solution 1 if we can convince Routes.
I know @cportele did not like the { "value" : ... } overhead, but maybe only for objects we can agree to accept it (so far it only impacts the waypoints property which takes in a GeoJSON MultiPoint).

Personally I think it's a much bigger problem with numbers and strings which are very compact, than it is with objects which are typically a lot of stuff, so the noise is relatively minimal, and I agree worth it to avoid the ambiguity.

FYI solution 1 was my original proposal: because of this ambiguity I suggested not allowing "value" to be omitted for objects. (See #155)

@fmigneault
Contributor

@pvretano
That could be one way, but you could technically still have a clash with that exact format. It would be best to have a "fresh" field with no other schema under it such that the user can plug whatever they want inside it.

@fmigneault
Contributor

@jerstlouis
I think that direct values could still be valid if you give a bare string, float, etc. Those literals are safe since it is clear their expected interpretation is to be used as is. Only objects are problematic.

@jerstlouis
Member

@fmigneault Yes, of course :) We agreed to that, I would definitely not want to lose those! (that definitely would derail the Routes harmonization)

@pvretano
Contributor Author

@jerstlouis @fmigneault OK, just thinking out loud ... Hate to lose the ability to say "input": {...object...} but I can't think of another way to disambiguate everything. Open to suggestions though ...

@pvretano
Contributor Author

@jerstlouis pointed out one additional input ambiguity which I think needs to be addressed too.

Some time ago we removed minOccurs and maxOccurs and instead said that input cardinality can be expressed using the minItems/maxItems facets of JSON Schema. This was a mistake because the JSON Schema facets are now being overloaded to indicate both the cardinality of a process input and potentially constraints on an input value that is an array.

The ambiguity that is introduced is not in the execute request but is a consequence of the fact that in the process description we cannot indicate which way an array input should be interpreted. Let me explain by way of example. Consider the following process input definition:

"A": {
  "schema": {
    "type": "array",
    "minItems": 2,
    "maxItems": 10,
    "items": {
      "type": "integer"
    }
  }
}

and the following input that appears in an execute request:

"A": [1,2,3,4,5,6]

My question is: How does the server interpret this input? Well, to be honest, the interpretation is a bit fuzzy.

On the one hand the specification says that in a process description the schema key of each input defines the schema of a SINGLE input value and so we would interpret A as a SINGLE value that is an array. It just happens that this array is constrained to require a minimum of 2 values and a maximum of 10 values but this is orthogonal to this interpretation of the input.

On the other hand the specification also says that the cardinality of an input is specified by minItems and maxItems and so this input should be interpreted as an integer input with a minimum cardinality of 2 and a maximum cardinality of 10.

A further consequence of this ambiguity is related to link.yaml. The specification says that any SINGLE input value can be encoded inline or by reference but what is the single value here? Is it the array or is it an item in the array? Again, unclear!

The solution is to re-introduce minOccurs and maxOccurs so that the cardinality of the input is something orthogonal to the schema of a SINGLE input value. This means that an array in the schema of a SINGLE input value actually indicates that the input value is an array and not the items in the array. With the proposed re-introduction of minOccurs and maxOccurs, the above A input value with the above definition of process input A would be interpreted as a SINGLE value that is an array of integers.

If, instead, I wanted this input to be interpreted as an integer input where the minimum cardinality is 2 and the maximum cardinality is 10 then I would define the input this way:

"A": {
  "minOccurs": 2,
  "maxOccurs": 10,
  "schema": {
     "type": "integer"
  }
}

and so the value "A": [1,2,3,4,5,6] would be interpreted as six distinct integer inputs. ... and -- just for fun -- if I have this definition:

"A": {
  "minOccurs": 2,
  "maxOccurs": 10,
  "schema": {
    "type": "array",
    "minItems": 2,
    "maxItems": 10,
    "items": {
      "type": "integer"
    }
  }
}

then the input value "A": [1,2,3,4,5,6] would be invalid and the server would throw an exception. A valid value for this new schema of A would be something like "A": [ [1,2,3,4,5,6],[1,2] ].

Another consequence of this would be that we need to add a requirement that says that if the maxOccurs value in the definition of an input is greater than 1 then the value shall, in the execute request, be encoded as an array where each array item validates against the schema specified as the value of the schema key for the corresponding input.
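The three definitions of A above can be worked through with a short Python sketch. It is an illustration under assumptions, not the specification's algorithm: minOccurs and maxOccurs are assumed to default to 1 when absent, the helper names are hypothetical, and the tiny validator only understands the integer and array schemas used in this comment.

```python
def check_single(schema, value):
    """Tiny validator for this sketch: handles only integer and array schemas."""
    kind = schema.get("type")
    if kind == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if kind == "array":
        if not isinstance(value, list):
            return False
        low = schema.get("minItems", 0)
        high = schema.get("maxItems", float("inf"))
        if not low <= len(value) <= high:
            return False
        return all(check_single(schema.get("items", {}), item) for item in value)
    return True  # other types are not checked here


def occurrences(definition, value):
    """Split an execute-request value into its SINGLE input values."""
    # Assumption: both keys default to 1 when absent.
    min_occurs = definition.get("minOccurs", 1)
    max_occurs = definition.get("maxOccurs", 1)
    upper = float("inf") if max_occurs == "unbounded" else max_occurs
    # With maxOccurs > 1 the outer array carries the occurrences.
    values = value if upper > 1 else [value]
    if not isinstance(values, list):
        raise ValueError("expected an array of occurrences")
    if not min_occurs <= len(values) <= upper:
        raise ValueError("wrong number of occurrences")
    for single in values:
        if not check_single(definition["schema"], single):
            raise ValueError("value does not match the input schema")
    return values


ARRAY_SCHEMA = {"type": "array", "minItems": 2, "maxItems": 10,
                "items": {"type": "integer"}}
single_array = {"schema": ARRAY_SCHEMA}
many_integers = {"minOccurs": 2, "maxOccurs": 10, "schema": {"type": "integer"}}
many_arrays = {"minOccurs": 2, "maxOccurs": 10, "schema": ARRAY_SCHEMA}
```

With these definitions, `[1,2,3,4,5,6]` is one array value for `single_array`, six integer values for `many_integers`, and an error for `many_arrays`, which instead accepts `[[1,2,3,4,5,6],[1,2]]` as two array values.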

@jerstlouis
Member

jerstlouis commented May 17, 2021

@pvretano About that last bit, that was already agreed in #129 and @bpross-52n had taken care of that in PR #161.
b1c49ab is the commit where minOccurs/maxOccurs was removed, but that commit is also tangled with the initial addition of 3 levels of JSON Schema.

@pvretano
Contributor Author

@jerstlouis ah, thanks ... I was looking for the issue where we removed minOccurs and maxOccurs but could not remember which one it was!

@jerstlouis
Member

jerstlouis commented May 17, 2021

@pvretano and the PR for that commit was #172, and the issue asking to remove minOccurs / maxOccurs was #170.

minOccurs and maxOccurs must be removed: this is superseded by minItems, maxItems (yes, you will need to declare an array explicitly if you really want to stick to the JSON Schema approach)

I believe the request and the decision to accept it did not consider the fact that the schema in the input description is for a SINGLE input, and that must be the case because of the associated rules allowing a SINGLE input value to be replaced by:

  • link.yaml
  • base64 encoded string for binary types
  • (with Part 3: Workflows) An OGC API collection or nested execute.yaml

@jerstlouis
Member

NOTE: An alternative to reintroducing minOccurs / maxOccurs would be to use minItems / maxItems following the JSON Schema syntax (but outside of the schema), where if nothing is specified 0..* is assumed just like in JSON Schema.

I know we previously had this behavior, and then had reverted back to the WPS way where maxOccurs is 1 by default, but if we used minItems / maxItems, it might be clearer that this works just like inside a JSON Schema? I would certainly prefer that over the odd behavior of min/max occurs and the cumbersome maxOccurs that can be a oneOf string or integer.

@pvretano
Contributor Author

pvretano commented May 17, 2021

@jerstlouis it is not just an arbitrary integer or string. It is integer or "unbounded". I kinda like that capability. What does it mean if maxItems is left out on JSON schema? I was searching the JSON schema specification and I could not find that. I see that if minItems is left out it defaults to 1.

@jerstlouis
Member

jerstlouis commented May 17, 2021

@pvretano Correct, if either minOccurs or maxOccurs is left out they default to 1.

With JSON Schema, if minItems is left out it's 0, if maxItems is left out it's unlimited.
The question is whether we prefer to be:

  • consistent with JSONSchema (and avoid the cumbersome parsing-wise oneOf), or
  • consistent with WPS and stick to the traditional minOccurs/maxOccurs default

I would prefer to follow the JSONSchema way but whichever way will be fine as long as it's clear.

Either way, it needs to be clear that if maxItems is omitted or > 1, or maxOccurs is unbounded or > 1, then the input must be associated with an array [ ] of input values, whereas otherwise it is associated with a single input value.
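
To make the two candidate conventions concrete, here is a small Python sketch of how the defaults differ and when an array of values would be expected. The function names occurs_bounds, items_bounds and expects_array are hypothetical, and treating "unbounded" or an omitted maxItems as infinity is an assumption made only for this illustration.

```python
import math


def occurs_bounds(definition):
    """WPS-style: minOccurs and maxOccurs default to 1; maxOccurs may be 'unbounded'."""
    lower = definition.get("minOccurs", 1)
    upper = definition.get("maxOccurs", 1)
    return lower, (math.inf if upper == "unbounded" else upper)


def items_bounds(definition):
    """JSON Schema-style: minItems defaults to 0, an omitted maxItems means no limit."""
    return definition.get("minItems", 0), definition.get("maxItems", math.inf)


def expects_array(bounds):
    """An array of input values is expected whenever the upper bound exceeds 1."""
    return bounds[1] > 1


# With nothing specified, the two conventions disagree:
print(occurs_bounds({}), expects_array(occurs_bounds({})))
print(items_bounds({}), expects_array(items_bounds({})))
# Under the JSON Schema-style defaults, a single-valued input has to say so:
print(expects_array(items_bounds({"maxItems": 1})))
```

Under the minOccurs/maxOccurs convention an empty definition means exactly one value, while under the minItems/maxItems convention it means an array of any length unless "maxItems": 1 is stated.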

@pvretano
Contributor Author

There is one downside with using minItems/maxItems and interpreting their values as the JSON Schema specification does and that is that for almost ALL inputs you will need to specify "maxItems": 1. Otherwise the input will be considered an array with an unbounded cardinality. So, I think we should use minOccurs/maxOccurs with the specific semantics we want which is if either minOccurs or maxOccurs is omitted, the default value is 1.

@jerstlouis
Member

jerstlouis commented May 17, 2021

@pvretano right, I thought of that.
Not sure I'd agree with almost ALL (I think multiple values for an input is a common scenario), but most, yes.

Both have advantages I think:

  • most developers will be familiar with the semantics & defaults of min/max items in JSON Schema, which is also used for the input value schema.
  • min/maxOccurs follow the UML default 1..1 (association) multiplicity, which is conveniently the default for most process inputs.

I am happy with either (as long as the default is clearly documented) -- this is a detail and it was really just a suggestion, for which the best thing would probably be to poll the SWG's preference as part of discussing the PR and tweak it based on that.

@pvretano linked a pull request May 17, 2021 that will close this issue
@fmigneault
Contributor

fmigneault commented May 17, 2021

Personally, I prefer minOccurs, maxOccurs for the cardinality of the input, since it is specific to the process definition, and minItems, maxItems for the schema of the single value, which is specific to the JSON spec. It makes it clearer that they refer to distinct parts of the input (cardinality vs format of the value). The distinction is even more important because they have different defaults and are interpreted differently by the service.

I prefer to preserve the default minOccurs/maxOccurs = 1, which doesn't align with JSON's defaults. Also, minItems, maxItems should be used only in the case where an "array value" must be passed down as-is to the process. One could still have minOccurs, maxOccurs without minItems, maxItems to indicate, for example, a variable number of input URLs.
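
As a rough illustration of that distinction, the Python sketch below shows what a server might hand to the process in each case, assuming the default of 1 for maxOccurs. The function values_for_process, the two definitions and the example.com URLs are all hypothetical and appear nowhere in the specification.

```python
def values_for_process(definition, value):
    """Return the list of single values the process receives for one input."""
    max_occurs = definition.get("maxOccurs", 1)  # assumed default of 1
    if max_occurs == "unbounded" or max_occurs > 1:
        # Cardinality above 1: each array item is a separate input value.
        if not isinstance(value, list):
            raise ValueError("an input with maxOccurs > 1 must be encoded as an array")
        return list(value)
    # Cardinality of 1: the value, even if it is an array, is passed as-is.
    return [value]


# A variable number of URLs, expressed with minOccurs/maxOccurs only.
urls_def = {"minOccurs": 1, "maxOccurs": "unbounded",
            "schema": {"type": "string", "format": "uri"}}
# A single array value, expressed with minItems/maxItems inside the schema.
vector_def = {"schema": {"type": "array", "minItems": 2, "maxItems": 10,
                         "items": {"type": "integer"}}}

print(values_for_process(urls_def, ["https://example.com/a.tif",
                                    "https://example.com/b.tif"]))
print(values_for_process(vector_def, [1, 2, 3]))
```

The first call yields two separate URL values, while the second yields one value that is the whole array [1, 2, 3].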

@pvretano
Contributor Author

@fmigneault this is how I have encoded it in the PR associated with this issue. That is, the cardinality of a process input is set by minOccurs/maxOccurs, with a default of 1.
