Integrate with OKFN data packages #2

Open
richfitz opened this issue Dec 18, 2015 · 7 comments

Comments

@richfitz
Member

Adding a packages.json file that contains metadata would probably satisfy most of the requirements.

@richfitz
Member Author

richfitz commented Jan 8, 2016

Here's the website with a bit more information: http://data.okfn.org/doc/data-package

Importantly, this can be in addition to what we currently have, and it would allow better interoperability. I don't believe there is good R tooling for dealing with data packages yet, though.

@wcornwell
Collaborator

So for taxonlookup, we would take what's now in the GitHub readme.md and put it in a .json file? I guess ideally it would also be in the R documentation? I guess we need a system where the meta-data in one place (the json file?) is canonical and the other 2 are generated?

@dfalster
Member

I really like the idea of the OKFN data packages, so in principle it would be great to support them. Depends how much work it is. Seems low cost.

Generating the readme from a single canonical source for metadata shouldn't be too hard. I tried something like this a while back, where I used a json file with metadata to write the readme (see readme.Rmd in github.com/dfalster/Falster_2005_JEcol_data). Now I know there is an actual preferred format for that metadata.
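
For illustration only, here is a minimal sketch of the "one canonical metadata file, generated readme" idea. It is written in Python rather than R and is not the code from the repository mentioned above; the function name `render_readme` and the metadata values are assumptions made up for the example.

```python
import json

# Hypothetical canonical metadata; in practice this would be read from a
# json file in the repository. The values here are placeholders.
METADATA_JSON = """
{
  "name": "traitecoevo/taxonlookup",
  "title": "A dynamically-updating versioned taxonomic resource for vascular plants",
  "license": "CC0",
  "version": "1.0.2",
  "contributors": ["Will Cornwell", "Rich FitzJohn", "Matt Pennell"]
}
"""


def render_readme(meta: dict) -> str:
    """Render a minimal Markdown readme from a metadata dictionary."""
    lines = [
        f"# {meta['name']}",
        "",
        meta["title"],
        "",
        f"- Version: {meta['version']}",
        f"- License: {meta['license']}",
        "",
        "## Contributors",
        "",
    ]
    # One bullet per contributor; an absent key yields no bullets.
    lines.extend(f"- {person}" for person in meta.get("contributors", []))
    return "\n".join(lines) + "\n"


readme = render_readme(json.loads(METADATA_JSON))
print(readme)
```

The same dictionary could feed the R documentation, so the json file stays the only place that is edited by hand.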

@richfitz
Member Author

Yeah, this is not too much work now that I have the automatic uploading thing worked out. We'd just hook into the same set of routines.

I think I'd opt to put the json in with the releases themselves, and have the URIs in the release json resolve to the github release URIs. So for taxonlookup it would read:

{
  "name" : "traitecoevo/taxonlookup",
  "title" : "A dynamically-updating versioned taxonomic resource for vascular plants",
  "license" : "CC0",
  "sources" : [{
    "name": "The plant list",
    "web": "http://www.theplantlist.org"
  }],
  "author": "Will Cornwell <wcornwell@gmail.com>",
  "contributors": [
    "Will Cornwell <wcornwell@gmail.com>",
    "Rich FitzJohn <rich.fitzjohn@gmail.com>",
    "Matt Pennell <mwpennell@gmail.com>"
  ],
  "version": "1.0.2",
  "resources": [{
    "url": "https://github.com/traitecoevo/taxonlookup/releases/download/v1.0.2/plant_lookup.csv",
    "name": "plant_lookup",
    "format": "csv",
    "hash": "sha1:cf6bb45eed09973d599e97fa8a6b8234c084e52a"
  }]
}

As you can see, most of that can be obtained from the DESCRIPTION file, so that's easy enough.
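
A rough sketch of how such a descriptor could be assembled from DESCRIPTION-style fields, written in Python for illustration. This is not the datastorr implementation: the helper names `parse_description` and `build_descriptor`, the sample DESCRIPTION text, and the placeholder file contents are all assumptions, and the hash is assumed to be a SHA-1 digest of the raw file bytes.

```python
import hashlib
from pathlib import PurePosixPath

# Hypothetical DESCRIPTION contents ("Key: value", indented lines continue
# the previous field).
DESCRIPTION_TEXT = """\
Package: taxonlookup
Title: A dynamically-updating versioned taxonomic resource for
    vascular plants
Version: 1.0.2
License: CC0
"""


def parse_description(text: str) -> dict:
    """Parse 'Key: value' fields, joining indented continuation lines."""
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and key is not None:
            fields[key] += " " + line.strip()
        elif ":" in line:
            key, value = line.split(":", 1)
            key = key.strip()
            fields[key] = value.strip()
    return fields


def build_descriptor(desc: dict, repo: str, filename: str, payload: bytes) -> dict:
    """Build a data-package-style dictionary pointing at a GitHub release file."""
    version = desc["Version"]
    path = PurePosixPath(filename)
    return {
        "name": repo,
        "title": desc["Title"],
        "license": desc["License"],
        "version": version,
        "resources": [{
            # Same URL pattern as the example above: .../download/v<version>/<file>
            "url": f"https://github.com/{repo}/releases/download/v{version}/{filename}",
            "name": path.stem,
            "format": path.suffix.lstrip("."),
            # Assumption: the hash is SHA-1 over the raw file contents.
            "hash": "sha1:" + hashlib.sha1(payload).hexdigest(),
        }],
    }


descriptor = build_descriptor(
    parse_description(DESCRIPTION_TEXT),
    repo="traitecoevo/taxonlookup",
    filename="plant_lookup.csv",
    payload=b"genus,family\n",  # placeholder bytes, not the real data file
)
print(descriptor["resources"][0]["url"])
```

Fields such as "sources" and "contributors" are left out here; they would need to come from somewhere other than these four DESCRIPTION fields.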

@dfalster
Member

Looks good.


@wcornwell
Collaborator

I agree, the specific meta-data for the columns might take a bit of organizing...

BTW, I like the new datastorr release feature. Worked the first time.

@richfitz
Member Author

The column-specific meta-data is someone else's problem, I think. Not all the data stored this way will be tabular, in any case. So as long as there's a facility for including it (most trivially a json file somewhere in the repo that would get slurped in), that should be enough.
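
One way the "slurped in" json file could work, sketched in Python under stated assumptions: the file name `metadata_extra.json`, the `columns` key, and the function `add_extra_metadata` are hypothetical, and the rule that generated fields take precedence over the extra file is a design choice for the example, not something decided in this thread.

```python
import json
import tempfile
from pathlib import Path


def add_extra_metadata(descriptor: dict, extra_path: Path) -> dict:
    """Return a copy of descriptor with keys from an optional json file merged in.

    Keys already present in the descriptor win, so hand-written metadata
    cannot silently override generated fields such as the version.
    """
    merged = dict(descriptor)
    if extra_path.is_file():
        extra = json.loads(extra_path.read_text(encoding="utf-8"))
        for key, value in extra.items():
            merged.setdefault(key, value)
    return merged


descriptor = {"name": "traitecoevo/taxonlookup", "version": "1.0.2"}

with tempfile.TemporaryDirectory() as tmp:
    extra_file = Path(tmp) / "metadata_extra.json"  # hypothetical file name
    extra_file.write_text(
        json.dumps({
            "columns": [{"name": "genus", "type": "string"}],  # hypothetical key
            "version": "9.9.9",  # ignored: the generated version wins
        }),
        encoding="utf-8",
    )
    merged = add_extra_metadata(descriptor, extra_file)
    # A repository without the extra file is left unchanged.
    unchanged = add_extra_metadata(descriptor, Path(tmp) / "missing.json")

print(json.dumps(merged, indent=2))
```

Non-tabular data would simply ship without the extra file.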
