Add describedby field for the extended release schema #426

Open
irwink opened this issue Feb 13, 2017 · 21 comments
Labels
Focus - Packages Relating to release packages and record packages Schema Relating to other changes in the JSON Schema (renamed fields, schema properties, etc.)

Comments

@irwink

irwink commented Feb 13, 2017

Edit: This issue effectively starts at #426 (comment)

$schema is meant for the "metaschema", not for the "schema". The linked comment proposes using a describedby field to link to the schema.


I suggest that a "$schema" field be added to all contracting data files. The "$schema" field's value would be either a single URI of the schema that the data claims to conform to, or a list of schemas (e.g. if the data conforms to the OCDS and an extension schema). This would be very useful both for quality assurance and for parsing and consuming the contracting data. Programs would know which schema the data conforms to and how to parse it properly. This is especially useful if the data repository contains a mix of data files that conform to the OCDS, an extension schema or some other schema.

An example would be (using the Paraguay sample data)

{
  "uri": "https://www.contrataciones.gov.py/datos/record-package/273637.json",
  "$schema": "http://standard.open-contracting.org/schema/1__0__1/release-schema.json",
  "publisher": {
    "uri": "https://contrataciones.gov.py/datos",
    "legalName": "Dirección Nacional de Contrataciones Públicas, Paraguay",
    "name": "DNCP - Paraguay"
  },

@timgdavies
Contributor

In OCDS 1.1 (see #301) we were planning to handle this with two properties:

  • Version
  • Extensions

Although I note that the JSON Schema specification says that the $schema keyword can be used for both version and schema declaration.

The reasons, I believe, for diverging from JSON Schema here were:

  • Many validators dereference the remote $schema by default, which can be frustrating for local development and validation against a local schema;

  • $schema only allows a single value, not an array of values.

But, other views on this welcome.

Assigning to @kindly and @Bjwebb to have a quick glance at whether we should alter the OCDS 1.1 approach before we're committed to it too strongly.

@irwink
Author

irwink commented Feb 21, 2017

The suggestions in issue #301 would handle the extensions problem. However, an application would still have to "know" that a JSON file claims to conform to the Open Contracting standard, and would also have to know where the schema is located (to validate against it). Furthermore, if the JSON file repository contains a mixture of Open Contracting files and non-Open Contracting files, there is no predictable way to distinguish them. The use of a "$schema" field (or some other widely adopted equivalent) would provide an explicit schema reference (similar to a DOCTYPE declaration in a web page).
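
To illustrate the mixed-repository case, here is a minimal Python sketch of how a consumer might sort files by their declared schema. It assumes the field is named "$schema" as proposed at this point in the thread, and that OCDS schema URIs share the prefix seen in the Paraguay example; the function names are hypothetical.

```python
import json

# Assumptions for illustration only: the field name proposed in this comment
# and the URL prefix taken from the example schema URI above.
SCHEMA_FIELD = "$schema"
OCDS_PREFIX = "http://standard.open-contracting.org/schema/"


def declared_schema(text):
    """Return the schema URI that a JSON document declares, or None."""
    data = json.loads(text)
    if isinstance(data, dict):
        value = data.get(SCHEMA_FIELD)
        if isinstance(value, str):
            return value
    return None


def is_open_contracting(text):
    """Report whether a document claims to follow an OCDS schema."""
    uri = declared_schema(text)
    return uri is not None and uri.startswith(OCDS_PREFIX)
```

A file that declares nothing is simply treated as "unknown", which is the situation the comment describes for repositories today.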

@mireille-raad

In the same spirit, it would be useful to have a field similar to $schema, but for extensions.
As implementations get more complex and multiple extensions are used, it would be useful to have a reference to all of them somewhere.
Maybe $extension would be a closed codelist of the official OCDS extensions.

@jpmckinney jpmckinney added the Schema Relating to other changes in the JSON Schema (renamed fields, schema properties, etc.) label Jul 27, 2017
@jpmckinney
Member

jpmckinney commented Aug 26, 2017

Regarding extensions, couldn't $schema be a URL of a release schema that has been patched with the relevant extensions? The value of $schema in this case would not be useful for identifying the version of OCDS, but the purpose of $schema in JSON Schema is for validation - not for version identification.
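
A patched schema of this kind could be produced with JSON Merge Patch (RFC 7396), which a later comment in this thread names as the way extensions are applied. The following is a minimal sketch of that algorithm only; the tiny schema and extension fragments are made up for illustration and are not the real OCDS files.

```python
def merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7396) and return the patched copy."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target outright.
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # null in a patch means "remove this member".
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


# Hypothetical fragments standing in for a release schema and an extension.
release_schema = {"properties": {"id": {"type": "string"}, "tag": {"type": "array"}}}
bids_extension = {"properties": {"bids": {"type": "object"}}}

patched = merge_patch(release_schema, bids_extension)
```

The function copies each object it touches, so the original release schema is left unchanged and several extensions can be applied in sequence.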

@jpmckinney jpmckinney added Schema: Validation Relating to constraints in the JSON Schema and removed Schema Relating to other changes in the JSON Schema (renamed fields, schema properties, etc.) labels Aug 26, 2017
@jpmckinney jpmckinney changed the title Suggestion- Add $schema field to schema and contracting data Add $schema field to schema and contracting data Aug 26, 2017
@jpmckinney jpmckinney added the Schema Relating to other changes in the JSON Schema (renamed fields, schema properties, etc.) label Aug 26, 2017
@jpmckinney jpmckinney added this to the 1.2 milestone Feb 22, 2019
@jpmckinney jpmckinney removed the Schema: Validation Relating to constraints in the JSON Schema label Jul 17, 2020
@jpmckinney
Member

Copying comment from open-contracting/infrastructure#89

Regarding versioning, this might be better handled by using the $schema property, which is part of JSON Schema. That property is standardized, and thus has a lot of existing tooling that understands it, and can use it to perform JSON Schema validation.

@kindly
Contributor

kindly commented Oct 22, 2020

I think the use of the $schema flag is a good idea, and it would spare validators from having to json-merge-patch the extensions themselves.
However, I am also worried about publishers' ability to do this compilation and to host a version of a new schema.

So in order to do this well I think we will need to host some kind of service that creates the extended schema for the publishers.

This would be a tool in which you select a set of extensions from the extension explorer; it then compiles them and gives you a permanent URL for the generated schema, which is stored forever.

The permanent url could be of the form:

http://standard-schemas.open-contracting.org/1__2__0/release-schema.json?bids=v1.1.5&budget=master

This will be cached on the service for a period. Doing it this way means the service will not have to actually store any new urls permanently (which would be a risk for example if there is data loss) as the schemas can be regenerated if needed.

The other benefit of having this service is that we know that the extended schema is actually compliant with OCDS (as everything that runs through the service would be). Otherwise, if a publisher linked to their own schema, they could make the schema non-compliant with core OCDS, and we would then need to find a way to test that.

Without this service I think just having the extension list on the release level would be acceptable as well but not ideal.
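
As a rough sketch of how such a service might read the URL form proposed above, the Python below splits it into an OCDS version, a file name and a mapping of extensions to versions. The function names are hypothetical, and reading "1__2__0" as version 1.2.0 is an assumption based on the double-underscore convention in the existing schema URLs.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_schema_url(url):
    """Split a generated-schema URL into (version, filename, extensions)."""
    parts = urlsplit(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    # Assumption: the first path segment encodes the version, e.g. 1__2__0.
    version = segments[0].replace("__", ".")
    filename = segments[-1]
    # Each query parameter maps an extension name to the requested version.
    extensions = dict(parse_qsl(parts.query))
    return version, filename, extensions


def cache_key(url):
    """Build a key that ignores the order of the query parameters."""
    version, filename, extensions = parse_schema_url(url)
    return (version, filename, tuple(sorted(extensions.items())))
```

Because everything needed to rebuild the schema is in the URL itself, a key like this is enough for a cache whose entries can be regenerated after data loss.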

@kindly
Contributor

kindly commented Oct 22, 2020

Having codelist compilation outside the DRT would be really beneficial too.

So we would also need something like:
http://standard-schemas.open-contracting.org/1__2__0/codelists.zip?bids=v1.1.5&budget=master

@jpmckinney
Member

Yes, the ProfileBuilder can do that work; it's what's used to patch schema and codelists for OCDS profiles (example output).

Building such a service makes sense to me. I'm hesitant about adding more infrastructure to the standard, but we can make it easily deployable (e.g. with a "Deploy to Heroku" button – not sure if any other PaaS offers something similar), so that anyone can host the service and there isn't a single point of failure.

Another option would be to still require publishers to host the schema and codelist files, but for that schema file to be easily validated, e.g. it references the OCDS version and extensions it uses. The URL of the schema file can then be provided to a validation service, which reports whether the schema file matches what the above service would have generated (maybe excluding metadata properties like title and description so that it just checks the validation properties are as expected).
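
The comparison described here could look roughly like the Python sketch below, which drops the title and description keywords before comparing two schemas. It is only an illustration under stated assumptions: the function names are hypothetical, and the list of keywords whose members are field names rather than schema keywords is a guess at what such a check would need.

```python
# Metadata keywords to ignore when comparing, as suggested in the comment.
METADATA_KEYS = {"title", "description"}
# Keywords whose members are field or definition names, not schema keywords.
# A field that happens to be called "title" must not be dropped.
NAME_MAPS = {"properties", "patternProperties", "definitions"}


def strip_metadata(schema):
    """Return a copy of a schema without title and description keywords."""
    if isinstance(schema, list):
        return [strip_metadata(item) for item in schema]
    if not isinstance(schema, dict):
        return schema
    result = {}
    for key, value in schema.items():
        if key in METADATA_KEYS:
            continue
        if key in NAME_MAPS and isinstance(value, dict):
            result[key] = {name: strip_metadata(sub) for name, sub in value.items()}
        else:
            result[key] = strip_metadata(value)
    return result


def same_validation_rules(published, expected):
    """Compare two schemas while ignoring metadata keywords."""
    return strip_metadata(published) == strip_metadata(expected)
```

Here `expected` would be whatever the generating service produces for the declared OCDS version and extensions, and `published` the file the publisher hosts.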

@jpmckinney jpmckinney added the Focus - Packages Relating to release packages and record packages label Oct 24, 2020
@kindly
Contributor

kindly commented Oct 26, 2020

Perhaps we say that the publishers should host the schema and codelist files when publishing to production, but this service could be there to:

  • Act as a way to test out extensions by giving a temporary $schema url that should work whilst iterating on the data. This is so that they do not have to compile and host a new version of the schema/codelists for every extension change in order for validation to work correctly.
  • Have a download (zip) option so they can export the results to upload to their own server. This download can contain the fields to help the validation be easily completed (as outlined above).

This means the service could be self-hosted, and the more permanent $schema URLs would not rely on this service being up.

The other option is for this serv

@jpmckinney
Member

@kindly Your last sentence seems to be cut off?

@kindly
Contributor

kindly commented Oct 27, 2020

@jpmckinney oops.

I was going to say that we could have a way for the schema/codelist files to be uploaded to a service like S3, owned by OCP, and stored there permanently. This would mean that the service itself would not need very good uptime/redundancy, but the results would have it. The cost of this is likely very small, but it would mean a potentially unknown permanent cost and may require some management of who can upload to it. Nonetheless, this could be the easiest route for publishers, without OCP having to worry about the uptime/redundancy of a service.

@jpmckinney
Member

jpmckinney commented Oct 28, 2020

Sounds good to me! Once a PR is made for this issue, I'll create a follow-up issue in https://github.com/open-contracting/extension_registry.py, and another issue somewhere for creating this new service (maybe it's just another functionality of Toucan). This is in addition to all the other issues that will be created for a change in packaging.

@yolile
Member

yolile commented Oct 29, 2020

Having the patched schema with all the extensions hosted somewhere will be very useful when using the flatten tool with the --use-titles feature. Also, if a publisher wants to document all the fields that they are using, including extensions, it will be easier for them to use the mapping-sheet command from ocdskit or toucan to create a data dictionary of their publication.

@yolile
Member

yolile commented Oct 29, 2020

Although, isn't this kind of in conflict with #1084?

@jpmckinney
Member

Although, isn't this kind of in conflict with #1084?

What is the conflict with #1084? The $schema field will appear on each release, not in the package.

@yolile
Member

yolile commented Oct 29, 2020

Great! No conflicts then. Based on #426 (comment), I thought that the $schema field would be at the package level. Maybe we should update the issue title to "Add $schema field to release schema and contracting data".

@jpmckinney
Member

jpmckinney commented Oct 29, 2020

Ah, we do also want a $schema field on the schema files (see #566). The issue description gives an example where $schema is on the package, but in this issue we've discussed just putting it on the release.

That said, I've re-read the JSON Schema specifications (04, latest), and $schema is explicitly and narrowly for "meta-schema" (that is, schema for validating schema) and it must be at the top-level. So, $schema is the correct field for #566, which doesn't interact with this issue.

Related to this issue, the 04 and latest versions of JSON Schema both recommend using Content-Type and Link headers to reference the schema (not the meta-schema) that a JSON file follows.

However, in the use cases we've witnessed, data might be downloaded and stored for later analysis, and the response headers are unlikely to be stored. It seems simpler for users if publishers reference the schema in the data itself. However, to avoid confusion/overlap with $schema, which has specific semantics, we can maybe use a plain schema field.

Of course, if a publisher is capable, they should set those headers when returning JSON data.

The latest JSON Schema draft has useful considerations around how servers should return, and how clients should request, schema files, to limit repeated network traffic for the same file. This will be especially relevant, since a package can contain thousands of releases, each with an identical schema field, and we wouldn't want that to cause thousands of requests.
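
On the client side, the simplest guard against repeated requests is to fetch each distinct schema URL once per run. The Python sketch below shows only that in-process memoization, not the HTTP caching behaviour discussed in the JSON Schema draft; the class and function names are hypothetical, the field name follows the plain "schema" field suggested above, and the fetch function is supplied by the caller.

```python
class SchemaCache:
    """Fetch each schema URL at most once, using a caller-supplied function."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable taking a URL and returning a schema
        self._schemas = {}

    def get(self, url):
        if url not in self._schemas:
            self._schemas[url] = self._fetch(url)
        return self._schemas[url]


def schemas_for_releases(releases, cache, field="schema"):
    """Look up the schema of every release that declares one."""
    return [cache.get(release[field]) for release in releases if field in release]
```

With a package of thousands of releases that all reference the same URL, the fetch function is called once and every later lookup is served from memory.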

@jpmckinney jpmckinney changed the title Add $schema field to schema and contracting data Add schema field to schema and contracting data Oct 29, 2020
@jpmckinney
Member

jpmckinney commented Oct 29, 2020

Actually, building on #928, it might be best to do:

{
  "links": [
    {
      "rel": "describedby",
      "href": "https://..."
    }
  ]
}
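
A consumer could then find the schema reference with something like the Python sketch below. It assumes the links array has the shape shown above and tolerates releases without it; the function name and the example URL are hypothetical.

```python
def described_by(release):
    """Return the href of every "describedby" link on a release."""
    links = release.get("links")
    if not isinstance(links, list):
        return []
    return [
        link["href"]
        for link in links
        if isinstance(link, dict)
        and link.get("rel") == "describedby"
        and isinstance(link.get("href"), str)
    ]


# Hypothetical release with a placeholder URL.
release = {
    "links": [
        {"rel": "describedby", "href": "https://example.org/release-schema.json"},
        {"rel": "next", "href": "https://example.org/page/2"},
    ]
}
schema_urls = described_by(release)
```

Returning a list keeps the door open for more than one schema link, though the thread does not say whether that would be allowed.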

@duncandewhurst
Contributor

I agree that it sounds sensible to use links. Is any further discussion or consultation required before preparing a PR?

@jpmckinney
Member

#928 would add the links field. For this issue, we'd have to also author tools to help publishers generate a patched schema (describedby is very unlikely to be used, otherwise). We'd also need to update tools to use this value instead of patching the release schema with extensions. We don't yet know whether we have the capacity to do that, so this issue might be postponed to a future version.

@jpmckinney jpmckinney changed the title Add schema field to schema and contracting data Add describedby field for the extended release schema Jun 7, 2023
@jpmckinney
Member

Moving to 1.3.0/2.0.0 as we don't have the capacity to assist this transition with tooling, etc.


8 participants