Schemas for Create/Replace/Update #740

Closed
cportele opened this issue Jul 13, 2022 · 13 comments · Fixed by #865

Comments

@cportele
Member

This is feedback from developing a client.

The spec currently states:

The feature schema can and often will depend on the operation, so a Schema resource will publish the specific schema representations for the CREATE, REPLACE and UPDATE operations.

A typical case is that the server may require or prohibit the presence of an "id" member/property, depending on the id policy of the server and the operation. Or for update, all properties are optional.

However, the current text does not state which differences from the returnables schema are allowed.

In our work, most APIs include some transformation between the data in the data store and the representation that is returned by the API. A simple example is a link to another feature in the same dataset. In the data store it is typically a foreign key column. In the API response we often include it as a link object with 'href' and 'title' properties. There are many other reasons for such transformations, e.g., that the feature representation should conform to some schema (e.g., INSPIRE) that differs somehow from the storage schema of the data.

The transformations will often not be bijective, so it is not an option to accept data in the returnables schema as payload of a create/replace/update request, because it cannot be mapped unambiguously to the database schema.

As a result, a client also needs to be able to request the feature data in a representation that directly matches the database schema in order to support editing feature attributes for a replace/update operation. Such a request is currently not specified and we had to implement it as a custom extension (we use a query parameter schema=receivables in the Feature request).
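
To illustrate why such a transformation is typically not bijective, here is a minimal Python sketch. The lookup tables, the column names (`type_code`, `municipality_fk`) and the `to_returnable` helper are hypothetical and not taken from any implementation discussed in this issue.

```python
# Hypothetical lookup tables; the names and values are illustrative only.
TYPE_LABELS = {1: "Primary", 2: "Primary", 3: "Motorway"}  # codes 1 and 2 share a label
MUNICIPALITY_TITLES = {15: "Tanelorn"}


def to_returnable(row: dict) -> dict:
    """Map a storage row to the representation returned by the API."""
    fk = row["municipality_fk"]
    return {
        "type": TYPE_LABELS[row["type_code"]],
        # The foreign key column becomes a link object with 'href' and 'title'.
        "inMunicipality": {
            "href": f"/collections/municipalities/items/{fk}",
            "title": MUNICIPALITY_TITLES[fk],
        },
    }


row_a = {"type_code": 1, "municipality_fk": 15}
row_b = {"type_code": 2, "municipality_fk": 15}

# Two different storage rows produce the same returnable representation,
# so the returnable cannot be mapped back to the storage schema unambiguously.
same_output = to_returnable(row_a) == to_returnable(row_b)
```

In this sketch a server that receives the returnable representation in a PUT request could not tell whether `type_code` should be 1 or 2, which is the ambiguity described above.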

I think what we need to do in the standard is the following:

  1. Add requirements about the differences between the returnables schema and the create/replace/update schemas and restrict those to the resource id and whether a property is required/optional. This supports clients that want to edit a feature so that it can be replaced/updated.

  2. Decide whether we want to support other differences between the returnables schema and the create/replace/update schemas in an additional requirements class or if we consider this out-of-scope for Part 4 and whoever needs this (like us) has to implement their own extension. If added, the requirements class would have to support fetching a feature in the replace/update schema.

@cportele
Member Author

Meeting 2022-08-15: @pvretano would probably have the same issue with the Records implementation. He will look into whether it is a problem for him, too. We only need to say anything in the standard if it is an issue for multiple implementations. Feedback from others would be valuable, too.

@cportele
Member Author

cportele commented Oct 4, 2022

Maybe we do not need separate schemas for Create/Update? JSON Schema (and the OpenAPI schema variant) specifies annotations readOnly and writeOnly that could be used:

The boolean keywords readOnly and writeOnly are typically used in an API context. readOnly indicates that a value should not be modified. It could be used to indicate that a PUT request that changes a value would result in a 400 Bad Request response. writeOnly indicates that a value may be set, but will remain hidden. It could be used to indicate you can set a value with a PUT request, but it would not be included when retrieving that record with a GET request.

Spec reference: https://json-schema.org/draft/2020-12/json-schema-validation.html#name-readonly-and-writeonly

Since the mapping between readOnly and writeOnly properties is unspecified and unknown to a client, the client still needs a capability to retrieve a representation in the write-schema.

For Update (PATCH) a different schema is still required in any case as the differences cannot be expressed using annotations (no required properties, null must be allowed).
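
As a rough illustration of why Update needs its own schema, the following Python sketch derives one from an object schema. The function name `to_update_schema` is hypothetical, and modelling "null must be allowed" with `anyOf` is only one possible choice, not something the draft prescribes.

```python
import copy


def to_update_schema(schema: dict) -> dict:
    """Derive a schema for Update (PATCH) from an object schema.

    Assumption: "no required properties, null must be allowed" is modelled by
    dropping "required" and wrapping every property schema in an anyOf that
    also accepts null.
    """
    update = copy.deepcopy(schema)  # leave the input schema untouched
    update.pop("required", None)
    update["properties"] = {
        name: {"anyOf": [prop, {"type": "null"}]}
        for name, prop in update.get("properties", {}).items()
    }
    return update


base = {
    "type": "object",
    "required": ["name"],
    "properties": {"name": {"type": "string"}},
}
patch_schema = to_update_schema(base)
```

The result has no `required` member and accepts either a string or null for `name`. Neither change can be expressed by adding readOnly or writeOnly annotations to the base schema.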

@cportele cportele self-assigned this Oct 24, 2022
@cportele
Member Author

Meeting 2022-10-24:

  • Implementations other than the one from ii have this issue, too. How to address it is still open. In the standard case, the properties in a GET response will be the same as in a POST/PUT request. Where POST/PUT expect different properties than GET, the API needs to be able to provide a GET response with the POST/PUT properties; this would be a separate conformance class.
  • One issue with only relying on readOnly/writeOnly is how do we represent this in another schema language? At least we should mention writeOnly/readOnly in the text.
  • @cportele will work on a PR.

@cportele
Member Author

cportele commented Apr 10, 2023

My proposal is documented below. We should discuss this, before I create a pull request.

The JSON Schema of a Feature Collection

Instead of requirements to create a JSON Schema for a GeoJSON feature, we should provide an encoding-independent JSON Schema that specifies all the feature properties. This follows the same approach as the Queryables, Sortables and Tile Set Metadata resources.

Note that APIs that want to encode describedby links to a JSON Schema for validating GeoJSON or JSON-FG instances can include such links, but this would be orthogonal and not relevant for Part 4. The encoding-specific schemas can be derived from the encoding-independent JSON Schema.

The encoding-independent JSON Schema should follow the schema recommendations for the Queryables resource.

The JSON Schema will use readOnly and writeOnly to identify properties that are only included in a feature in a response from the API (readOnly) or that are only relevant for CREATE/REPLACE/UPDATE operations (writeOnly).

The property that represents the featureId will be marked as readOnly. We could use a new keyword to tag that property as the local identifier of the feature (e.g., x-ogc-role: ID). Note that it is very likely that there will be rules for custom keywords in JSON Schema. That is, the keyword will depend on the result of that discussion.

We should also address #797 in this context, that is, specify rules for how to express feature-feature relationships in such a schema. Note that this is separate from the question of how to express relationships in the encoded data, where in a Web API a typical approach will be to represent them as a URI or web link consistent with RFC 8288.

A reference would then be represented as either a string or integer, the featureId of the referenced feature. We need additional keywords to provide the necessary information to identify the target feature, mainly the collectionId of the referenced features (x-ogc-collectionId) and, if the collection is provided by another OGC Web API, the URI of the API landing page (x-ogc-apiLandingPage).

Here is an example schema:

{
    "$schema": "https://json-schema.org/draft/2019-09/schema", 
    "title": "Roads", 
    "description": "A road is a metalled way for vehicles.", 
    "type": "object", 
    "required": ["id", "geometry", "type", "level"], 
    "properties": {
        "id": {"type": "integer", "x-ogc-role": "ID", "readOnly": true}, 
        "geometry": {
            "format": "geometry-linestring"                          
        }, 
        "type": {
            "title": "Type", 
            "type": "string", 
            "enum": ["Primary", "Motorway"]
        }, 
        "name": {"title": "Name", "type": "string"}, 
        "number": {"title": "Number", "type": "string"}, 
        "level": { "title": "Level", "type": "integer", "enum": [0, 1, 2] }, 
        "inMunicipality": {
            "title"             : "In Municipality"  , 
            "type"              : "integer"          ,
            "readOnly"          : true,
            "$comment"          : "readOnly, because it is derived from the data",
            "x-ogc-role"        : "reference", 
            "x-ogc-collectionId": "municipality"       
        }, 
        "maintainers": {
            "title": "Organizations maintaining this Road", 
            "type": "array", 
            "items": {
                "type": "string", 
                "x-ogc-role": "reference", 
                "x-ogc-collectionId": "organization", 
                "x-ogc-apiLandingPage": "https://www.example.com/api/v1/road-maintenance"
            }
        }
    }, 
    "additionalProperties": false
}
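
A client could use the readOnly and writeOnly annotations to work out which properties apply to a given operation. The Python sketch below is one way to do that; `schema_view` is a hypothetical helper, and `roads` is an abbreviated copy of the example schema above.

```python
def schema_view(schema: dict, operation: str) -> dict:
    """Return the part of an object schema that applies to 'read' or 'write'."""
    if operation not in ("read", "write"):
        raise ValueError(f"unknown operation: {operation}")
    # Reading skips writeOnly properties, writing skips readOnly properties.
    skip = "writeOnly" if operation == "read" else "readOnly"
    properties = {
        name: prop
        for name, prop in schema.get("properties", {}).items()
        if not prop.get(skip, False)
    }
    view = {k: v for k, v in schema.items() if k not in ("properties", "required")}
    view["properties"] = properties
    # A skipped property can no longer be required.
    view["required"] = [n for n in schema.get("required", []) if n in properties]
    return view


# Abbreviated version of the Roads example schema.
roads = {
    "type": "object",
    "required": ["id", "geometry", "type", "level"],
    "properties": {
        "id": {"type": "integer", "x-ogc-role": "ID", "readOnly": True},
        "geometry": {"format": "geometry-linestring"},
        "type": {"type": "string", "enum": ["Primary", "Motorway"]},
        "level": {"type": "integer", "enum": [0, 1, 2]},
        "inMunicipality": {
            "type": "integer",
            "readOnly": True,
            "x-ogc-role": "reference",
            "x-ogc-collectionId": "municipality",
        },
    },
}

write_view = schema_view(roads, "write")
```

For the Roads example, the write view keeps geometry, type and level and drops id and inMunicipality, since both are marked readOnly.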

Encoding of feature relationships

It should be possible to request the encoding of the feature-relationships depending on the need of the client. In the work on JSON-FG it was identified that different use cases will prefer different representations of a feature-feature relationship in GeoJSON / JSON-FG (link).

IETF has specified RFC 6906 for such cases using the concept of profiles:

A profile is defined not to alter the semantics of the resource representation itself, but to allow clients to learn about additional semantics (constraints, conventions, extensions) that are associated with the resource representation, in addition to those defined by the media type and possibly other mechanisms.

The idea would be to define profiles for different ways to express a relationship in an instance.

The examples below use a GeoJSON representation, but this could apply to other media types, too. Some media types may only support a subset of the profiles. An example would be media types that only support scalar property values.

Example 1: As a link object with a URI and a human readable title (profile rel-as-link):

{
    "type": "Feature",
    "id": 1,
    "geometry": {...},
    "properties": {
        "type": "Primary",     
        "level": 1,
        "inMunicipality": {
            "href": "/collections/municipalities/items/15",
            "title": "Tanelorn"
        },
        "maintainers": [ {
            "href": "https://www.example.com/api/v1/road-maintenance/collections/organization/items/Acme",
            "title": "Acme Inc."
        } ]
    },
    "links": [
        {
            "href": "http://www.opengis.net/def/profile/ogc/0/rel-as-link",
            "rel": "profile"
        }
    ]
}

Example 2: As a URI (profile rel-as-uri):

{
    "type": "Feature",
    "id": 1,
    "geometry": {...},
    "properties": {
        "type": "Primary",     
        "level": 1,
        "inMunicipality": "/collections/municipalities/items/15",
        "maintainers": [ "https://www.example.com/api/v1/road-maintenance/collections/organization/items/Acme" ]
    },
    "links": [
        {
            "href": "http://www.opengis.net/def/profile/ogc/0/rel-as-uri",
            "rel": "profile"
        }
    ]
}

Example 3: Just the identifier (the default profile consistent with the encoding-independent schema, rel-as-key):

{
    "type": "Feature",
    "id": 1,
    "geometry": {...},
    "properties": {
        "type": "Primary",     
        "level": 1,
        "inMunicipality": 15,
        "maintainers": [ "Acme" ]
    }
}

Example 4: As a web link with a URI, a link relation type and other optional link attributes (profile rel-as-weblink):

{
    "type": "Feature",
    "id": 1,
    "geometry": {...},
    "properties": {
        "type": "Primary",     
        "level": 1
    },
    "links": [
        {
            "href": "/collections/municipalities/items/15",
            "rel": "https://www.example.com/def/rel/within",
            "title": "Tanelorn"
        },
        {
            "href": "https://www.example.com/api/v1/road-maintenance/collections/organization/items/Acme",
            "rel": "https://www.example.com/def/rel/maintainer",
            "title": "Acme Inc."
        },
        {
            "href": "http://www.opengis.net/def/profile/ogc/0/rel-as-weblink",
            "rel": "profile"
        }
    ]
}
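
The relationship between the profiles can be sketched in Python as a function that starts from the key stored in the data and the schema keywords proposed above. `encode_reference` is a hypothetical helper, the URI template `/collections/{collectionId}/items/{featureId}` is assumed, and rel-as-weblink is left out because it moves the reference into the links array.

```python
from typing import Any, Optional


def encode_reference(value: Any, prop_schema: dict, profile: str,
                     title: Optional[str] = None) -> Any:
    """Encode a reference (the featureId of the target feature) for a profile."""
    if profile == "rel-as-key":
        return value  # just the identifier
    # Without x-ogc-apiLandingPage the target is in the same API (relative URI).
    base = prop_schema.get("x-ogc-apiLandingPage", "").rstrip("/")
    href = f"{base}/collections/{prop_schema['x-ogc-collectionId']}/items/{value}"
    if profile == "rel-as-uri":
        return href
    if profile == "rel-as-link":
        link = {"href": href}
        if title is not None:
            link["title"] = title
        return link
    raise ValueError(f"unsupported profile: {profile}")


# Schema of the items of the 'maintainers' property from the example schema.
maintainer_schema = {
    "type": "string",
    "x-ogc-role": "reference",
    "x-ogc-collectionId": "organization",
    "x-ogc-apiLandingPage": "https://www.example.com/api/v1/road-maintenance",
}

as_uri = encode_reference("Acme", maintainer_schema, "rel-as-uri")
as_link = encode_reference("Acme", maintainer_schema, "rel-as-link", title="Acme Inc.")
```

With these inputs, `as_uri` and `as_link` reproduce the maintainers values shown in Examples 2 and 1. Where the title comes from is left open here; a server would have to look it up in the referenced collection.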

Note: If the response is a feature collection, the links also need an anchor since the link context is not the resource, but an embedded resource. For example, the JSON link object for the 10th feature would be the following, if the payload is GeoJSON or JSON-FG.

{
    "href" : "/collections/municipalities/items/15",
    "anchor": "#/features/9", 
    "rel"  : "https://www.example.com/def/rel/within", 
    "title": "Tanelorn"                              
}
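
The anchor is a JSON Pointer fragment with a zero-based array index, which is why the 10th feature gets `#/features/9`. A small Python sketch follows; the helper name `feature_anchor` is hypothetical.

```python
def feature_anchor(position: int) -> str:
    """Return the anchor for the feature at the given 1-based position
    in the 'features' array of a GeoJSON or JSON-FG feature collection."""
    if position < 1:
        raise ValueError("position is 1-based")
    # JSON Pointer array indexes are zero-based.
    return f"#/features/{position - 1}"


link = {
    "href": "/collections/municipalities/items/15",
    "anchor": feature_anchor(10),
    "rel": "https://www.example.com/def/rel/within",
    "title": "Tanelorn",
}
```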

Requesting a profile in a Features request

When requesting features via an API, a profile query parameter could be provided, e.g.: GET /collections/bar/items?profile=rel-as-uri. Some profile would be the default profile, maybe either rel-as-key or rel-as-link.
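
A client could assemble such a request URL as sketched below in Python. The base URL `https://example.org/api` and the helper name `items_url` are assumptions for illustration; only the `profile` parameter name comes from the proposal.

```python
from typing import Optional
from urllib.parse import urlencode


def items_url(base: str, collection_id: str, profile: Optional[str] = None) -> str:
    """Build the URL of a Features request, optionally with a profile."""
    url = f"{base.rstrip('/')}/collections/{collection_id}/items"
    if profile is not None:
        url += "?" + urlencode({"profile": profile})
    return url


url = items_url("https://example.org/api", "bar", "rel-as-uri")
```

Omitting `profile` leaves the choice to the server, which would then apply its default profile.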

Note: There were also attempts to standardize HTTP content negotiation for profiles in IETF (HTTPAPI WG) and W3C (DXWG), but these efforts did not gain much traction.

Different clients / use cases will prefer different representations.

For example, for presentation to a human, a title and a resolvable URI may be important and the client uses rel-as-link.

For cases where the client has knowledge about the data structures and wants to process data, just the local id of the referenced object may be sufficient (rel-as-key).

@cportele
Member Author

There was a discussion during the April 2023 Code Sprint with general agreement on the approach. As a result, we could get rid of the need for specific schemas for Create, Replace or Update and address #797 at the same time. As a consequence, the discussion of schemas could be moved to the Schema extension.

I have made minor updates to the previous comment based on the discussion.

In addition we discussed:

  • The names and values of the new schema keywords will need more consideration.
  • It would be good to have a schema for all collections and/or a collection that consists of the collection schemas.
  • We need to look more closely into the representations of object references and the building block(s) to request them.
  • The approach should be tested in implementations.
  • Another aspect is support for codelists, where a code is stored in the data, but which may be transformed into a human readable title when presenting the data (e.g., by linking to the codelist in an additional schema keyword). This is an orthogonal issue and we should open a separate issue.
  • We should develop a vocabulary for the additional keywords extending JSON Schema.

@cportele will now work on a pull request.

@pvretano @jerstlouis @maxcollombin - please correct/add anything that I missed or got wrong.

@cportele
Member Author

cportele commented Jun 5, 2023

Before working on a pull request, I have made an implementation in ldproxy and updated the demonstration deployment at demo.ldproxy.net.

Some remarks on the implementation:

Additional x-ogc-role values

I have added additional x-ogc-role values, because clients may need/want that information, too. The full list that we currently have is:

Read-only / write-only properties

The property fid is readOnly, e.g., in https://demo.ldproxy.net/strassen/collections/unfaelle/schema, because it is an auto-increment column and cannot be set explicitly. Since it is also the ID column, a new feature can only be created with POST to '/collections/unfaelle/items'; the featureId is assigned automatically.

The current demo APIs do not include any writeOnly properties.

Profile support

In our implementation, the profile parameter and link relation are supported for all collections if at least one collection includes a property with the role reference. That is, profiles are supported in the demo.ldproxy.net/strassen API, but not the other APIs on demo.ldproxy.net.

Profile default

The implementation for now uses rel-as-link as the default profile. The main reason is that this is the most natural representation for encodings that support clickable links like HTML. rel-as-key is not that helpful for such encodings.

At the same time, rel-as-link will typically not be the best choice for other encodings, e.g., CSV, FlatGeobuf, glTF or Mapbox Vector Tiles. The approach we have used is that the supported profiles are format-dependent and the format negotiates a profile based on the requested profile.

An alternative approach would be format-specific defaults for the profile value.
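
The format-dependent negotiation could look roughly like the Python sketch below. This is not the ldproxy code: the `SUPPORTED` table, the format keys and the fallback rule (use the first profile the format supports) are assumptions for illustration.

```python
from typing import Optional

DEFAULT_PROFILE = "rel-as-link"

# Illustrative table of the profiles each format supports, in order of preference.
SUPPORTED = {
    "geojson": ["rel-as-link", "rel-as-uri", "rel-as-key"],
    "html": ["rel-as-link", "rel-as-uri", "rel-as-key"],
    "csv": ["rel-as-key", "rel-as-uri"],
    "flatgeobuf": ["rel-as-key", "rel-as-uri"],
}


def negotiate_profile(fmt: str, requested: Optional[str] = None) -> str:
    """Pick the profile to use for a response in the given format."""
    supported = SUPPORTED[fmt]
    wanted = requested or DEFAULT_PROFILE
    if wanted in supported:
        return wanted
    # Assumed fallback: the format's own preferred profile.
    return supported[0]


csv_profile = negotiate_profile("csv")  # no profile requested
```

With this table, a CSV request without a profile parameter ends up with rel-as-key even though the global default is rel-as-link. In effect this behaves much like format-specific defaults.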

Profile examples

Below are examples of how references are encoded in the different profiles. Note that the GeoJSON is flattened to allow clients without support for complex data structures to use the data.

This also works in the other formats (JSON-FG, HTML, CSV, FlatGeobuf), although CSV and FlatGeobuf only support rel-as-key and rel-as-uri for now.

The response also includes links to the profile, including in the HTTP response headers.

@cportele
Member Author

Meeting 2023-06-19:

  • @cportele will start "Part 5: Schemas" and move schema related content from Part 3 (Queryables) and Part 4 (Create/Replace/Update schemas) to Part 5. The goal would be to move Parts 4 and 5 in parallel through the OGC process.
  • @pvretano will add links to their endpoints.

@m-mohr
Contributor

m-mohr commented Jul 5, 2023

I recently became aware of Features - Part 5 and found the Create, Replace and Update schemas in there.
For me it is confusing how this relates to the OpenAPI document. I have heard in many presentations that the actual API and its schemas for request and response bodies should be described in an OpenAPI document, but then what's the difference between the OpenAPI schemas and the Part 5 schemas?

@jerstlouis
Member

jerstlouis commented Jul 5, 2023

@m-mohr The OpenAPI documentation describes the API and the responses to the operations on different resources. Those responses may include schemas, which are normally specific to a particular media type.

The Features - Part 5: Schemas is mainly focused on describing the properties (measured/observed) of the data in a collection, in a manner agnostic of a particular representation or particular access mechanism. It describes the data model behind the API.

Note that this issue has gone beyond the original "Create/Replace/Update" scope from the title, and also addresses multiple profile and read/write access to properties. It is my hope that this can be a Common building block that we can also use in Coverages to describe the RangeType.

@m-mohr
Contributor

m-mohr commented Jul 5, 2023

Thanks. Hmm, I still don't quite get it. As JSON Schema is JSON-specific, you can (afaik) also describe all this per collection in the API document (I'm not saying that I'd prefer that). It's just still not clear to me what the difference is and I think this should be made very clear in the specification.

@cportele
Member Author

cportele commented Jul 5, 2023

@m-mohr

  1. The schema for an application/geo+json feature response in the OpenAPI document will describe the JSON content (with the id, geometry and properties members). What is provided for other formats in the OpenAPI document is less clear, varies a lot and is in general probably not very helpful for clients.

  2. At the same time, there are use cases where the client needs to understand the logical feature schema in an encoding/format-independent way. For example, when the client wants to understand the returnable properties in order to reduce the response to a subset with the properties query parameter. This request must be independent of the formats acceptable for the client. This is similar to the queryables that a client needs to know to create filter expressions. Since JSON is a simple, generic format for representing objects, the idea is to express the format-independent logical schema in JSON Schema - again, like for the queryables and sortables. If there is no need for the users of the API to know anything about feature schemas, there is no need for an API to support the Schema resource that Part 5 will introduce.
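
The use case from item 2 can be sketched in Python: a client reads the logical schema once and uses it to build the value of the properties query parameter, whatever format it requests afterwards. The helper `properties_param` and the sample schema are hypothetical.

```python
def properties_param(schema: dict, wanted: list) -> str:
    """Build the value of the 'properties' query parameter, checking the
    requested names against the returnable properties in the schema."""
    returnable = [
        name
        for name, prop in schema.get("properties", {}).items()
        if not prop.get("writeOnly", False)  # writeOnly values are never returned
    ]
    unknown = [name for name in wanted if name not in returnable]
    if unknown:
        raise ValueError(f"not returnable: {', '.join(unknown)}")
    return ",".join(wanted)


# Hypothetical logical schema of a collection.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "level": {"type": "integer"},
        "secret": {"type": "string", "writeOnly": True},
    },
}

value = properties_param(schema, ["name", "level"])
```

The same value can be sent with a GeoJSON, CSV or HTML request, because it is derived from the format-independent schema and not from a format-specific one.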

@joanma747

Let me just add another reason for extending JSON Schema for feature collections:

Defining variables (observedProperties) that can be measured, that have a definition somewhere else and that have a UoM.

This is a possible JSON Schema for that:

{
    "$schema": "https://json-schema.org/draft/2019-09/schema", 
    "title": "Thermometers", 
    "description": "c sensors in a city.", 
    "type": "object", 
    "required": ["geometry"], 
    "properties": {
        "geometry": {
            "type": "object",
            "properties": {
                "type": {"const": "Point"}, 
                "coordinates": {"type": "array"}
             }
        },
        "properties": {
            "type": "object",
            "properties": {
                 "owner": {
                    "description": "Owner of the thermometer",
                    "type": "string"
                 },
                 "temperature": {
                    "description": "The current temperature",
                    "type": "number",
                    "x-ogc-definition": "http://vocabs.lter-europe.net/EnvThes/22035",
                    "x-ogc-UoM": "Celsius",
                    "x-ogc-UoMSymbol": "C",
                    "x-ogc-UoMDefinition": "https://qudt.org/vocab/unit/DEG_C"
                 }
             }
        }
    }
}
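
A client could use such annotations when presenting values. The short Python sketch below assumes the x-ogc-UoMSymbol keyword from the example above; the helper name `describe_value` is hypothetical.

```python
def describe_value(prop_schema: dict, value) -> str:
    """Format a property value, appending the unit symbol if the schema has one."""
    symbol = prop_schema.get("x-ogc-UoMSymbol")
    return f"{value} {symbol}" if symbol else str(value)


# Property schema taken from the Thermometers example.
temperature_schema = {
    "description": "The current temperature",
    "type": "number",
    "x-ogc-definition": "http://vocabs.lter-europe.net/EnvThes/22035",
    "x-ogc-UoM": "Celsius",
    "x-ogc-UoMSymbol": "C",
    "x-ogc-UoMDefinition": "https://qudt.org/vocab/unit/DEG_C",
}

label = describe_value(temperature_schema, 21.5)
```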

@cportele
Member Author

For information: I have started to work on Part 5 in branch part5-schemas. This is incomplete, but it should be complete in a few days. I will then create a pull request.
