Add describedby field for the extended release schema #426
Comments
In OCDS 1.1 (see #301) we were planning to handle this with two properties:
That said, JSON Schema notes that the $schema keyword can be used for both version and schema declaration. The reason for diverging from JSON Schema here was, I believe:
But other views on this are welcome. Assigning to @kindly and @Bjwebb to have a quick glance at whether we should alter the OCDS 1.1 approach before we're committed to it too strongly. |
The suggestions in issue #301 would handle the extensions problem. However, an application would have to "know" that a JSON file claims to conform to the Open Contracting standard, and would also have to know where the schema is located (to validate against it). Furthermore, if the JSON file repository contains a mixture of Open Contracting files and non-Open Contracting files, there is no predictable way to distinguish them. The use of a "$schema" field (or some other widely adopted equivalent) would provide an explicit schema reference (similar to a DOCTYPE declaration in a web page). |
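To illustrate the point about mixed repositories, here is a minimal Python sketch, assuming the top-level "$schema" field proposed in this issue holds a single URI (the list form is not handled). The function names and the host check are hypothetical and not part of OCDS; the host is taken from the example URL in this issue.

```python
import json
from urllib.parse import urlparse

# Assumption: OCDS schemas are served from this host, as in the example URL
# given in this issue.
OCDS_HOST = "standard.open-contracting.org"


def declared_schema(text):
    """Return the "$schema" value declared by a JSON document, or None."""
    try:
        data = json.loads(text)
    except ValueError:
        return None
    if not isinstance(data, dict):
        return None
    value = data.get("$schema")
    return value if isinstance(value, str) else None


def claims_ocds(text):
    """True if the document declares a schema hosted on the OCDS domain."""
    uri = declared_schema(text)
    return uri is not None and urlparse(uri).hostname == OCDS_HOST
```

A consumer could run `claims_ocds` over every file in a repository and only pass the matching ones to an OCDS validator. Note that the declaration is only a claim; the file still has to be validated against the referenced schema.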
In the same spirit, it would be useful to have a field similar to $schema, but for extensions. |
Regarding extensions, couldn't |
Copying comment from open-contracting/infrastructure#89
|
I think the use of the $schema flag is a good idea, and it is really good for validators not to need to json-merge-patch the extensions themselves. To do this well, I think we will need to host some kind of service that creates the extended schema for publishers: a tool where you select a set of extensions from the extension explorer, which then compiles them and gives a permanent URL for the generated schema, stored forever. The permanent URL could be of the form:
This will be cached on the service for a period. Doing it this way means the service will not have to actually store any new URLs permanently (which would be a risk, for example, if there is data loss), as the schemas can be regenerated if needed. The other benefit of having this service is that we know that the extended schema is actually compliant with OCDS (as everything that runs through the service would be). Otherwise, if a publisher linked to their own schema, they could make the schema non-compliant with core OCDS, and we would then need to find a way to test that. Without this service, I think just having the extension list at the release level would be acceptable as well, but not ideal. |
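For readers unfamiliar with the json-merge-patch step mentioned above, the following is a minimal sketch of a generic JSON Merge Patch (RFC 7386) in Python. It is not the actual OCDS tooling; the `core` and `extension` fragments are hypothetical and heavily trimmed, for illustration only.

```python
import copy


def merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7386) and return the result."""
    # A non-object patch replaces the target wholesale.
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)
    # A non-object target is discarded and replaced by an empty object.
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # null in a patch means "remove this member".
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


# Hypothetical schema fragments, for illustration only.
core = {"properties": {"ocid": {"type": "string"}, "tag": {"type": "array"}}}
extension = {"properties": {"exampleField": {"type": "string"}}}

extended = merge_patch(core, extension)
```

Because the merge is deterministic, a service could regenerate the same extended schema from the same inputs, which is what makes the "regenerate if needed" approach above workable.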
Having codelist compilation outside the DRT would be really beneficial too. So we would also need something like. |
Yes, the ProfileBuilder can do that work; it's what's used to patch schema and codelists for OCDS profiles (example output). Building such a service makes sense to me. I'm hesitant about adding more infrastructure to the standard, but we can make it easily deployable (e.g. with a "Deploy to Heroku" button – not sure if any other PaaS offer something similar), so that anyone can host the service, so there isn't a single point of failure. Another option would be to still require publishers to host the schema and codelist files, but for that schema file to be easily validated, e.g. it references the OCDS version and extensions it uses. The URL of the schema file can then be provided to a validation service, which reports whether the schema file matches what the above service would have generated (maybe excluding metadata properties like |
Perhaps we say that the publishers should host the schema and codelist files when publishing to production, but this service could be there to:
This means the service could be self-hosted and the more permanent The other option is for this serv |
@kindly Your last sentence seems to be cut off? |
@jpmckinney Oops. I was going to say that we could have a way for the schema/codelist files to be uploaded to a service like S3 and stored permanently, which could be owned by OCP. This would mean that the service itself would not need very good uptime/redundancy, but the results would have it. The cost of this is likely very small, but it would mean a potentially unknown permanent cost and may require some management of who could upload to it. Nonetheless, this could be the easiest route for publishers, without OCP having to worry about the uptime/redundancy of a service. |
Sounds good to me! Once a PR is made for this issue, I'll create a follow-up issue in https://github.com/open-contracting/extension_registry.py, and another issue somewhere for creating this new service (maybe it's just another functionality of Toucan). This is in addition to all the other issues that will be created for a change in packaging. |
Having the patched schema with all the extensions hosted somewhere will be very useful when using the flatten tool with the --use-titles feature. Also, if a publisher wants to document all the fields that they are using, including extensions, it will be easier for them to use the |
Although, isn't this kind of in conflict with #1084? |
Great! No conflicts then. Based on #426 (comment), I thought that the $schema field would be at the package level. Maybe we should update the issue to "Add $schema field to release schema and contracting data"
Ah, we do also want a That said, I've re-read the JSON Schema specifications (04, latest), and Related to this issue, the 04 and latest versions of JSON Schema both recommend using Content-Type and Link headers to reference the schema (not the meta-schema) that a JSON file follows. However, in the use cases we've witnessed, data might be downloaded and stored for later analysis, and the response headers are unlikely to be stored. It seems simpler for users if publishers reference the schema in the data itself. However, to avoid confusion/overlap with Of course, if a publisher is capable, they should set those headers when returning JSON data. The latest JSON Schema draft has useful considerations around how servers should return, and how clients should request, schema files, to limit repeated network traffic for the same file. This will be especially relevant, since a package can contain thousands of releases, each with an identical |
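As a rough illustration of the header-based approach, here is a Python sketch that extracts a "describedby" target from a Link header value. The parser is deliberately simplified and does not cover everything RFC 8288 allows (for example, commas inside quoted parameter values); the function name and example URLs are hypothetical.

```python
import re

# Simplified pattern: <URI> followed by ;-separated parameters.
_LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
_REL = re.compile(r'rel\s*=\s*"?([^";]+)"?', re.IGNORECASE)


def describedby_from_link_header(header):
    """Return the first URI whose rel includes "describedby", or None."""
    for uri, params in _LINK.findall(header):
        match = _REL.search(params)
        # rel may hold several space-separated relation types.
        if match and "describedby" in match.group(1).lower().split():
            return uri
    return None


# Hypothetical header value, for illustration only.
example = (
    '<https://example.org/release-schema.json>; rel="describedby", '
    '<https://example.org/page/2>; rel="next"'
)
schema_url = describedby_from_link_header(example)
```

This only works while the HTTP response is at hand, which is the limitation described above: once the body is saved to disk, the header is usually gone.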
Actually, building on #928, it might be best to do:
{
  "links": [
    {
      "rel": "describedby",
      "href": "https://..."
    }
  ]
} |
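A consumer could read such a link with something like the Python sketch below. This assumes the "links" structure shown above; the function name, the sample `release` object, and its URLs are hypothetical.

```python
def describedby_urls(data):
    """Return the href of every link whose rel is "describedby"."""
    links = data.get("links") if isinstance(data, dict) else None
    if not isinstance(links, list):
        return []
    return [
        link["href"]
        for link in links
        if isinstance(link, dict)
        and link.get("rel") == "describedby"
        and isinstance(link.get("href"), str)
    ]


# Hypothetical data, for illustration only.
release = {
    "links": [
        {"rel": "describedby", "href": "https://example.org/extended-release-schema.json"},
        {"rel": "self", "href": "https://example.org/release.json"},
    ]
}
schema_urls = describedby_urls(release)
```

Returning a list rather than a single value leaves room for data that references more than one schema, without committing to how that case should be interpreted.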
I agree that it sounds sensible to use |
#928 would add the |
Moving to 1.3.0/2.0.0 as we don't have the capacity to assist this transition with tooling, etc. |
Edit: This issue effectively starts at #426 (comment). $schema is meant for the "metaschema", not for the "schema". The linked comment proposes using a describedby field to link to the schema.
I suggest that a "$schema" field be added to all contracting data files. The "$schema" field's value would be either a single URI of the schema that the data claims to conform to, or a list of schemas (e.g. if the data conforms to the OCDS and an extension schema). This would be very useful from a quality assurance perspective, as well as for parsing and consuming the contracting data. Programs would know which schema the data conforms to and how to properly parse it. This is especially useful if the data repository contains a mix of data files that conform to the OCDS, an extension schema or some other schema.
An example would be (using the Paraguay sample data):
{
  "uri": "https://www.contrataciones.gov.py/datos/record-package/273637.json",
  "$schema": "http://standard.open-contracting.org/schema/1__0__1/release-schema.json",
  "publisher": {
    "uri": "https://contrataciones.gov.py/datos",
    "legalName": "Dirección Nacional de Contrataciones Públicas, Paraguay",
    "name": "DNCP - Paraguay"
  },