Describing data type: what exactly to describe and what controlled vocab(s) to use #14
Comments
first email This list should include
|
From @pieterprovoost
|
From @marc-portier
|
MARCO-BOLO-WP1 Is the intent instead to identify the definition of a dataset type “series” in general? In that case I could suggest the INSPIRE registry, but it is focused on spatial datasets. Series is defined by http://inspire.ec.europa.eu/metadata-codelist/ResourceType/series and I am trying to find something more abstract, but at the moment I have no better idea. |
Since we need this Google Sheet released asap, Marc and I have chosen https://www.iana.org/assignments/media-types/media-types.xhtml as the place to choose the MimeType from, and that is what the column is now called. Please shout if you disagree. |
@kmexter I wonder how useful this is if we are not collecting distribution URLs at the same time. How are we going to use this information? A MIME type is a property of a specific file, not of a dataset. Most datasets will include a variety of MIME types, Darwin Core archives for example are collections of |
Well, yes and no. It is useful to the person looking at the record ("ah, these are image data, yes I want image data"), but to ODIS it may not be useful information. It is a bit like the usefulness that keywords provide, in my mind. |
Also, we could collect the distribution URLs. There is a field for it in the ODIS online example, so I am a bit uncertain why we are not asking for this from the MBO peeps as well. It depends on the purpose of the ODIS record, I guess: for data already published in a catalogue, this record is a secondary one, but for data NOT already published, this would be the primary record. |
I agree that the media type is only meaningful when associated with a downloadURL of the distribution (and then it is also obvious there is only one). I also agree that in many cases the MIME type has only limited value, but better than nothing? The next level would be the schema conformity of the dataset (as suggested as one of the other aspects). |
More comments welcome, everyone from MBO WP1! I do need to know which to do, ideally by the end of this week. |
I am leaning away from MIME type now. For me, the point of this was to allow scientists to understand what is in the dataset before they bother to download it. So I would have this field as a literal, because we cannot accommodate all the data types via schema. I would suggest using:
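To make the point about media types and distributions concrete, here is a minimal sketch. It assumes schema.org's `DataDownload` with its `contentUrl` and `encodingFormat` properties as the counterpart of a download URL; the `make_distribution` helper and the example.org URLs are hypothetical and only for illustration.

```python
import json


def make_distribution(content_url: str, encoding_format: str) -> dict:
    """Build one DataDownload entry; the media type describes this file only."""
    return {
        "@type": "DataDownload",
        "contentUrl": content_url,
        "encodingFormat": encoding_format,
    }


# Hypothetical URLs: one dataset, two files, two different media types.
dataset = {
    "@type": "Dataset",
    "distribution": [
        make_distribution("https://example.org/data/occurrences.csv", "text/csv"),
        make_distribution("https://example.org/data/photo-001.jpg", "image/jpeg"),
    ],
}

print(json.dumps(dataset, indent=2))
```

Each media type sits next to exactly one URL, so there is no ambiguity about which file it describes.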
that, or get rid of this metadatum completely. |
May I suggest the following, which also covers sequence data. If we can find a term for sequence data in some other ontology, it can go into {
"@type": "Dataset",
"hasPart": [
{
"@type": "ImageObject"
},
{
"@type": "TextObject",
"encodingFormat": "text/csv"
},
{
"@type": "TextObject",
"encodingFormat": "text/fasta"
}
]
} Alternatively, we can use {
"@type": "Dataset",
"distribution": [
{
"@type": "DataDownload",
"additionalType": "ImageObject"
},
{
"@type": "DataDownload",
"additionalType": "TextObject",
"encodingFormat": "text/csv"
},
{
"@type": "DataDownload",
"additionalType": "TextObject",
"encodingFormat": "text/fasta"
}
]
} In any case, don't use |
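As a rough illustration of how the `distribution` variant could serve the stated goal (letting a scientist see what is in a dataset before downloading it), the sketch below reads such a record and produces one human-readable label per distribution. The `summarise_data_types` function is a hypothetical helper, not part of any ODIS tooling, and the record simply mirrors the structure proposed above.

```python
import json

# json.loads accepts strict JSON only, so the record has no trailing commas.
record = json.loads("""
{
  "@type": "Dataset",
  "distribution": [
    {"@type": "DataDownload", "additionalType": "ImageObject"},
    {"@type": "DataDownload", "additionalType": "TextObject", "encodingFormat": "text/csv"},
    {"@type": "DataDownload", "additionalType": "TextObject", "encodingFormat": "text/fasta"}
  ]
}
""")


def summarise_data_types(dataset: dict) -> list[str]:
    """Return one label per distribution, e.g. 'TextObject (text/csv)'."""
    labels = []
    for dist in dataset.get("distribution", []):
        label = dist.get("additionalType", "unknown")
        fmt = dist.get("encodingFormat")
        if fmt:
            # Append the media type only when the record provides one.
            label = f"{label} ({fmt})"
        labels.append(label)
    return labels


print(summarise_data_types(record))
```

A distribution without `encodingFormat` still yields a usable label, which fits cases where the provider knows the broad kind of data but not the exact file format.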
Thumbs-up to that suggestion, Pieter. |
minor glitch probably:
Type names typically start with an uppercase letter --> https://schema.org/AudioObject |
When we have enough datasets from where we can harvest metadata (and perhaps ping those data), we can go further on this. For source data I think it is unlikely we will get this info, as it is not routinely held in metadata records (with some notable exceptions) but we should ask ourselves if we really want to push the WPs into providing this for the data that they create in MBO. We are already struggling to get info from them! TBD. |
We think it is useful to add metadata describing the type of data that a dataset contains, but we are not sure exactly what we want to describe here.
In this issue we need to decide on this, and decide on the semantics to use.
We have to decide whether we want this metadatum to be useful as a piece of technical information (e.g. for OceanInfoHub) or for the audience (scientists, who are also those providing the descriptions in the first place). Personally, I think the second is better, mainly because the scientists describing the data will find that easier.
I copy below the discussion we have had so far in email.