Machine readable data dictionaries inside data.json #332
Comments
I understood that the JSON-LD format was attempting to meet the need of a standard way to have data values refer to an external source for the definitions of the data. (Which is what I think is your goal for a Data Dictionary link). |
Agreed that linking is straightforward. I started the issue to suggest a format for describing field specific information within this or a related metadata schema. |
Thanks for raising this, @exafox. We'll add this to the discussion topics for the metadata offsite tomorrow and be sure to report back what was discussed here for those who can't make it in person. |
I definitely like the idea of encouraging agencies to provide more machine-readable data dictionaries. @exafox - do you know of any standards or examples we could point to? |
To reference machine readable data dictionaries in a tightly coupled way, you'd really want to be able to do it on the distribution level. Let me suggest two new fields for this:
This is inspired by the widespread use of link relations, including the "describedby" relation. See similar uses with JSON Hyper Schema and the Protocol for Web Description Resources (wdrs:describedby is even used in ADMS, which is a profile of DCAT). The more flexible way of using link relations would be to abstract it one level out and enable lots of link relations, so
This is a more common way of doing this kind of link relation, but it adds a bit of extra complexity and might not be worth it. We could have So I'm in favor of doing this with |
Changes that still need to be addressed are changes in structure. Should we add usage-note additions here or not?
* Adds optional describedByType field at the dataset and distribution level (#291, #332)
* Changes contactPoint field to an object that contains the name (fn) and email address (hasEmail) (#358)
* Adds fn field as part of contactPoint, replacing earlier use of contactPoint (#358)
* Changes publisher field to an object that allows multiple levels of organizations (#296)
* Changes accessURL field to represent indirect access and to exist only within distribution (#217, #335)
* Changes format field to a human-readable description and to exist only within distribution (#272, #293)
* Adds optional description field for use within distribution (#248)
* Adds optional title field for use within distribution (#248)
* Changes accrualPeriodicity field to use ISO 8601 date syntax (#292)
* Changes distribution field to become required-if-applicable and to always contain the accessURL or downloadURL fields (#217)
* Changes license field to be a URL (#196)
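For anyone trying to picture the structural changes listed above, here is a rough Python sketch of what a single dataset entry might look like afterwards, plus a small check for a few of the listed rules. It is only an illustration, not the schema itself. All values and URLs are placeholders. The `describedBy` key, the `name`/`subOrganizationOf` keys inside publisher, the `mailto:` prefix, and the `R/P1Y` periodicity value are assumptions that the list above does not spell out, and `check_dataset` is a hypothetical helper.

```python
# Hypothetical dataset entry; every value below is a placeholder.
dataset = {
    "title": "Example dataset",
    # contactPoint is an object with fn and hasEmail (mailto: prefix is an assumption).
    "contactPoint": {"fn": "Jane Doe", "hasEmail": "mailto:jane.doe@example.gov"},
    # publisher is an object; the key names used for nesting are assumptions.
    "publisher": {
        "name": "Example Bureau",
        "subOrganizationOf": {"name": "Example Department"},
    },
    # ISO 8601 syntax; a repeating yearly interval is assumed here.
    "accrualPeriodicity": "R/P1Y",
    "license": "https://example.gov/license",
    "distribution": [
        {
            "title": "Example CSV file",
            "description": "Full extract of the example dataset.",
            "downloadURL": "https://example.gov/data.csv",
            "format": "CSV",
            # describedBy is an assumed key name; describedByType is from the list.
            "describedBy": "https://example.gov/data-dictionary.json",
            "describedByType": "application/json",
        }
    ],
}


def check_dataset(entry):
    """Return a list of problems found against a few of the listed changes."""
    problems = []
    contact = entry.get("contactPoint")
    if not (isinstance(contact, dict) and "fn" in contact and "hasEmail" in contact):
        problems.append("contactPoint must be an object with fn and hasEmail")
    if not isinstance(entry.get("publisher"), dict):
        problems.append("publisher must be an object")
    # license is only checked when present.
    if "license" in entry and not str(entry["license"]).startswith(("http://", "https://")):
        problems.append("license must be a URL")
    for i, dist in enumerate(entry.get("distribution", [])):
        if "accessURL" not in dist and "downloadURL" not in dist:
            problems.append(f"distribution[{i}] needs accessURL or downloadURL")
    return problems


print(check_dataset(dataset))  # → []
```

An entry still in the older flat style (for example, `contactPoint` as a plain string) would come back with one message per rule it misses, which could be handy when migrating existing data.json files.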
First pass at including |
There are several ways we've addressed this. We're now able to reference definitions for both the metadata itself and the data from within the metadata. To reference definitions of the metadata, we can now use To reference definitions for the data, we can now use If you have additional feedback on using these fields, please add it to the relevant issue or open a new one as needed. |
As an extension to the current metadata schema, it would be useful to have one standard way to store data dictionary information to enable future collaboration and integrations.
Some work has taken place in this area, but I am not aware of a format that is universally accepted and also relatable to CKAN and the work done to support Project Open Data. I hope/expect these other efforts can be rolled up into this format, and not duplicated or discarded.
Key traits of the format might include: