Add describedby field for the extended release schema #426

irwink · 2017-02-13T14:04:19Z

Edit: This issue effectively starts at #426 (comment)

$schema is meant for the "metaschema", not for the "schema". The linked comment proposes using a describedby field to link to the schema.

I suggest that a "$schema" field be added to all contracting data files. The "$schema" field's value would be either a single URI of the schema that the data claims to conform to, or a list of schemas (e.g. if the data conforms to the OCDS and an extension schema). This would be very useful both for quality assurance and for parsing and consuming the contracting data. Programs would know which schema the data conforms to and how to parse it properly. This is especially useful if the data repository contains a mix of data files that conform to the OCDS, an extension schema or some other schema.

An example would be (using the Paraguay sample data)

{
  "uri": "https://www.contrataciones.gov.py/datos/record-package/273637.json",
  "$schema": "http://standard.open-contracting.org/schema/1__0__1/release-schema.json",
  "publisher": {
    "uri": "https://contrataciones.gov.py/datos",
    "legalName": "Dirección Nacional de Contrataciones Públicas, Paraguay",
    "name": "DNCP - Paraguay"
  },


timgdavies · 2017-02-20T15:46:35Z

In OCDS 1.1 (see #301) we were planning to handle this with two properties: Version and Extensions.

Although I note that, according to JSON Schema, the $schema keyword can be used for both version and schema declaration.

The reasons, I believe, for diverging from JSON Schema here were:

Many validators dereference the remote $schema by default, which can be frustrating for local development and for validation against a local schema;
$schema only allows a single value, not an array of values.

But, other views on this welcome.

Assigning to @kindly and @Bjwebb to have a quick glance at whether we should alter the OCDS 1.1 approach before we're committed to it too strongly.

irwink · 2017-02-21T12:50:49Z

The suggestions in issue #301 would handle the extensions problem. However, an application would still have to "know" that a JSON file claims to conform to the Open Contracting standard, and would also have to know where the schema is located (to validate against it). Furthermore, if the JSON file repository contains a mixture of Open Contracting files and other, non-Open Contracting files, there is no predictable way to distinguish them. The use of a "$schema" field (or some other widely adopted equivalent) would provide an explicit schema reference (similar to a DOCTYPE declaration in a web page).
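As a rough illustration of the point above, here is a minimal sketch of how a consumer could sort a mixed repository if files carried an explicit schema reference. It uses the "$schema" name proposed in this comment and accepts either a single URI or a list, as in the original proposal. The prefix constant and both function names are assumptions made for the example, not part of OCDS or of any existing tool.

```python
# Hypothetical prefix: the thread only shows one example schema URL on this host.
OCDS_SCHEMA_PREFIX = "http://standard.open-contracting.org/schema/"


def declared_schemas(document):
    """Return the schema URIs that a parsed JSON document declares, as a list."""
    value = document.get("$schema") if isinstance(document, dict) else None
    if isinstance(value, str):
        return [value]
    if isinstance(value, list):
        # Ignore anything in the list that is not a string.
        return [item for item in value if isinstance(item, str)]
    return []


def claims_ocds(document):
    """True if any declared schema URI starts with the assumed OCDS prefix."""
    return any(uri.startswith(OCDS_SCHEMA_PREFIX) for uri in declared_schemas(document))
```

A file without the field simply yields an empty list, so a consumer can treat "no declaration" separately from "declares some other schema".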

mireille-raad · 2017-05-08T02:19:36Z

In the same spirit, it would be useful to have a field similar to $schema, but for extensions.
As implementations get more complex, and as multiple extensions are used, it would be useful to have a reference to all of them somewhere.
Maybe $extension could be a closed codelist of the official OCDS extensions.

jpmckinney · 2017-08-26T01:10:56Z

Regarding extensions, couldn't $schema be a URL of a release schema that has been patched with the relevant extensions? The value of $schema in this case would not be useful for identifying the version of OCDS, but the purpose of $schema in JSON Schema is for validation - not for version identification.
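To make "a release schema that has been patched with the relevant extensions" concrete, here is a minimal sketch that assumes extensions are applied with JSON Merge Patch (RFC 7396) semantics. The two functions are written out for illustration and are not the API of any OCDS tool; the schema fragments in the usage note are made up.

```python
import copy


def merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7396) and return a new value."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target entirely.
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # null in a patch means "remove this member".
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


def patch_release_schema(core_schema, extension_patches):
    """Apply each extension patch in order; the inputs are left unmodified."""
    patched = core_schema
    for patch in extension_patches:
        patched = merge_patch(patched, patch)
    return patched
```

For example, patching `{"properties": {"tender": {"type": "object"}}}` with `{"properties": {"bids": {"type": "object"}}}` yields a schema with both `tender` and `bids` under `properties`.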

jpmckinney · 2020-10-06T16:19:08Z

Copying comment from open-contracting/infrastructure#89

Regarding versioning, this might be better handled by using the $schema property, which is part of JSON Schema. That property is standardized, and thus has a lot of existing tooling that understands it, and can use it to perform JSON Schema validation.

kindly · 2020-10-22T11:41:12Z

I think using the $schema flag is a good idea, and it is really good for validators not to need to json-merge-patch the extensions themselves.
However, I am also worried about publishers' ability to do this compilation and to host a version of a new schema.

So in order to do this well I think we will need to host some kind of service that creates the extended schema for the publishers.

So, a tool in which you can select a set of extensions from the extension explorer, and which then compiles them and gives a permanent URL for the generated schema, which is stored forever.

The permanent URL could be of the form:

http://standard-schemas.open-contracting.org/1__2__0/release-schema.json?bids=v1.1.5&budget=master

This will be cached on the service for a period. Doing it this way means the service will not have to actually store any new urls permanently (which would be a risk for example if there is data loss) as the schemas can be regenerated if needed.
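Since the URL itself carries everything needed to regenerate the schema, a service could recover the inputs by parsing it. The following is a minimal sketch under the assumption that the path is always `/<version>/<filename>` and that each query parameter maps an extension identifier to a version; the function name is hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_schema_url(url):
    """Split a generated-schema URL into (version, filename, extensions)."""
    parts = urlsplit(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    if len(segments) != 2:
        raise ValueError(f"expected /<version>/<filename>, got {parts.path!r}")
    version, filename = segments
    # Each query parameter is read as extension id -> extension version.
    extensions = dict(parse_qsl(parts.query))
    return version, filename, extensions
```

Applied to the example URL above, this returns the version `1__2__0`, the filename `release-schema.json`, and the mapping `{"bids": "v1.1.5", "budget": "master"}`.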

The other benefit of having this service is that we know that the extended schema is actually compliant with OCDS (as everything that runs through the service would be). Otherwise, if a publisher linked to their own schema, they could make the schema non-compliant with core OCDS, and we would then need to find a way to test that.

Without this service I think just having the extension list on the release level would be acceptable as well but not ideal.

kindly · 2020-10-22T13:12:02Z

Having codelist compilation outside the DRT would be really beneficial too.

So we would also need something like:
http://standard-schemas.open-contracting.org/1__2__0/codelists.zip?bids=v1.1.5&budget=master

jpmckinney · 2020-10-22T17:08:40Z

Yes, the ProfileBuilder can do that work; it's what's used to patch schema and codelists for OCDS profiles (example output).

Building such a service makes sense to me. I'm hesitant about adding more infrastructure to the standard, but we can make it easily deployable (e.g. with a "Deploy to Heroku" button – not sure if any other PaaS offer something similar), so that anyone can host the service, so there isn't a single point of failure.

Another option would be to still require publishers to host the schema and codelist files, but for that schema file to be easily validated, e.g. it references the OCDS version and extensions it uses. The URL of the schema file can then be provided to a validation service, which reports whether the schema file matches what the above service would have generated (maybe excluding metadata properties like title and description so that it just checks the validation properties are as expected).
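The comparison described above, which ignores metadata such as title and description, could look roughly like the sketch below. It assumes that only the `title` and `description` keywords count as metadata, and it takes care not to drop schema properties that happen to be named `title` or `description`. All names here are hypothetical.

```python
METADATA_KEYWORDS = {"title", "description"}
# Keywords whose value maps arbitrary names to subschemas.
NAME_MAPS = {"properties", "patternProperties", "definitions"}


def strip_metadata(node, is_name_map=False):
    """Return a copy of a schema without metadata keywords."""
    if isinstance(node, list):
        return [strip_metadata(item) for item in node]
    if not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        if not is_name_map and key in METADATA_KEYWORDS:
            continue
        # Inside a name map, keys are field names, so every value is a subschema.
        child_is_name_map = not is_name_map and key in NAME_MAPS
        result[key] = strip_metadata(value, child_is_name_map)
    return result


def same_validation_rules(hosted_schema, generated_schema):
    """True if both schemas agree once metadata keywords are ignored."""
    return strip_metadata(hosted_schema) == strip_metadata(generated_schema)
```

A validation service could call `same_validation_rules` with the publisher's hosted file and the file it would have generated from the declared OCDS version and extensions.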

kindly · 2020-10-26T16:07:45Z

Perhaps we say that the publishers should host the schema and codelist files when publishing to production, but this service could be there to:

Act as a way to test out extensions by giving a temporary $schema url that should work whilst iterating on the data. This is so that they do not have to compile and host a new version of the schema/codelists for every extension change in order for validation to work correctly.
Have a download (zip) option so they can export the results to upload to their own server. This download can contain the fields to help the validation be easily completed (as outlined above).

This means the service could be self-hosted, and the more permanent $schema URLs would not rely on this service to be running.

The other option is for this serv

jpmckinney · 2020-10-26T22:36:01Z

@kindly Your last sentence seems to be cut off?

kindly · 2020-10-27T09:34:37Z

@jpmckinney oops.

I was going to say that we could have a way for the schema/codelist files to be uploaded to a service like S3 and stored permanently, which could be owned by OCP. This would mean that the service itself would not need very good uptime/redundancy, but the results would have it. The cost of this is likely very small, but it would mean a potentially unknown permanent cost and may require some management of who could upload to it. Nonetheless, this could be the easiest route for publishers without OCP having to worry about the uptime/redundancy of a service.

jpmckinney · 2020-10-28T21:05:47Z

Sounds good to me! Once a PR is made for this issue, I'll create a follow-up issue in https://github.com/open-contracting/extension_registry.py, and another issue somewhere for creating this new service (maybe it's just another functionality of Toucan). This is in addition to all the other issues that will be created for a change in packaging.

yolile · 2020-10-29T14:24:14Z

Having the patched schema with all the extensions hosted somewhere will be very useful when using the flatten tool with the --use-titles feature. Also, if a publisher wants to document all the fields that they are using, including extensions, it will be easier for them to use the mapping-sheet command from ocdskit or toucan to create a data dictionary of their publication.

yolile · 2020-10-29T14:26:07Z

Although, isn't this kind of in conflict with #1084?

jpmckinney · 2020-10-29T17:56:01Z

Although, isn't this kind of in conflict with #1084?

What is the conflict with #1084? The $schema field will appear on each release, not in the package.

yolile · 2020-10-29T18:20:02Z

Great! No conflicts then. Based on #426 (comment), I thought that the $schema field would be at the package level. Maybe we should update the issue to "Add $schema field to release schema and contracting data".

jpmckinney · 2020-10-29T19:12:25Z

Ah, we do also want a $schema field on the schema files (see #566). The issue description gives an example where $schema is on the package, but in this issue we've discussed putting it only on the release.

That said, I've re-read the JSON Schema specifications (04, latest), and $schema is explicitly and narrowly for "meta-schema" (that is, schema for validating schema) and it must be at the top-level. So, $schema is the correct field for #566, which doesn't interact with this issue.

Related to this issue, the 04 and latest versions of JSON Schema both recommend using Content-Type and Link headers to reference the schema (not the meta-schema) that a JSON file follows.

However, in the use cases we've witnessed, data might be downloaded and stored for later analysis, and the response headers are unlikely to be stored. It seems simpler for users if publishers reference the schema in the data itself. However, to avoid confusion/overlap with $schema, which has specific semantics, we can maybe use a plain schema field.

Of course, if a publisher is capable, they should set those headers when returning JSON data.

The latest JSON Schema draft has useful considerations around how servers should return, and how clients should request, schema files, to limit repeated network traffic for the same file. This will be especially relevant, since a package can contain thousands of releases, each with an identical schema field, and we wouldn't want that to cause thousands of requests.
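On the client side, one simple complement to HTTP-level caching is to fetch each distinct schema URL only once per package. The sketch below assumes each release carries the plain `schema` field floated above and that the caller supplies a `fetch` callable (for instance, a thin wrapper around an HTTP client); both the field name and the function are illustrative only.

```python
def load_schemas(releases, fetch):
    """Fetch each distinct schema URL once and return {url: schema}.

    `fetch` is any callable that takes a URL and returns the parsed schema.
    """
    cache = {}
    for release in releases:
        url = release.get("schema")
        if url is not None and url not in cache:
            cache[url] = fetch(url)
    return cache
```

With this approach, a package of thousands of releases that all reference the same schema triggers a single call to `fetch`.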

jpmckinney · 2020-10-29T22:59:21Z

Actually, building on #928, it might be best to do:

{
  "links": [
    {
      "rel": "describedby",
      "href": "https://..."
    }
  ]
}
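A consumer reading this structure could pick out the schema reference along these lines. This is a minimal sketch that assumes `links` is a list of objects with `rel` and `href` members, as in the snippet above; the function name is hypothetical.

```python
def describedby_urls(release):
    """Return the href of every link whose rel is "describedby"."""
    links = release.get("links", [])
    return [
        link["href"]
        for link in links
        if isinstance(link, dict)
        and link.get("rel") == "describedby"
        and "href" in link
    ]
```

Returning a list rather than a single value leaves room for a release that carries more than one such link, and an empty list signals that no schema is referenced.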

duncandewhurst · 2021-11-30T01:37:24Z

I agree that it sounds sensible to use links. Is any further discussion or consultation required before preparing a PR?

jpmckinney · 2021-11-30T16:23:50Z

#928 would add the links field. For this issue, we'd have to also author tools to help publishers generate a patched schema (describedby is very unlikely to be used, otherwise). We'd also need to update tools to use this value instead of patching the release schema with extensions. We don't yet know whether we have the capacity to do that, so this issue might be postponed to a future version.

jpmckinney · 2023-06-07T02:55:59Z

Moving to 1.3.0/2.0.0 as we don't have the capacity to assist this transition with tooling, etc.

timgdavies assigned Bjwebb and kindly Feb 20, 2017

jpmckinney added the Schema Relating to other changes in the JSON Schema (renamed fields, schema properties, etc.) label Jul 27, 2017

jpmckinney added Schema: Validation Relating to constraints in the JSON Schema and removed Schema Relating to other changes in the JSON Schema (renamed fields, schema properties, etc.) labels Aug 26, 2017

jpmckinney changed the title ~~Suggestion- Add $schema field to schema and contracting data~~ Add $schema field to schema and contracting data Aug 26, 2017

jpmckinney added the Schema Relating to other changes in the JSON Schema (renamed fields, schema properties, etc.) label Aug 26, 2017

jpmckinney unassigned Bjwebb and kindly Jan 1, 2018

jpmckinney mentioned this issue Nov 29, 2018

$schema should point to meta-schema incorporating our changes to JSON Schema #566

Open

jpmckinney mentioned this issue Dec 10, 2018

Add publisher field (release schema) #325

Closed

jpmckinney added this to the 1.2 milestone Feb 22, 2019

jpmckinney mentioned this issue Feb 27, 2019

add package schema and documentation open-contracting/infrastructure#89

Merged

jpmckinney mentioned this issue Mar 29, 2019

Upgrade docs: Interaction with release immutability and dates #849

Closed

jpmckinney mentioned this issue May 4, 2019

Metadata openownership/data-standard#135

Closed

jpmckinney removed the Schema: Validation Relating to constraints in the JSON Schema label Jul 17, 2020

jpmckinney mentioned this issue Oct 6, 2020

Deprecate remaining package metadata and add bulk data format #1084

Open

jpmckinney added the Focus - Packages Relating to release packages and record packages label Oct 24, 2020

jpmckinney changed the title ~~Add $schema field to schema and contracting data~~ Add schema field to schema and contracting data Oct 29, 2020

duncandewhurst mentioned this issue Sep 13, 2022

Add a field for the version of the standard the data is published to Open-Telecoms-Data/open-fibre-data-standard#91

Closed

jpmckinney changed the title ~~Add schema field to schema and contracting data~~ Add describedby field for the extended release schema Jun 7, 2023

jpmckinney modified the milestones: 1.2.0, 1.3.0 or 2.0.0 Jun 7, 2023

jpmckinney mentioned this issue Jun 7, 2023

Deprecate most package metadata #1621

Closed
