Make snippets content/file type aware #84

Open
sarahrichmond opened this issue Apr 4, 2019 · 2 comments
Labels
question Further information is requested

Comments

@sarahrichmond

The snippets are a bit hit and miss. We need to find a way to make these more dynamic so they are aware of what file type they are downloading.

For example, the snippets fall over when downloading a .zip file, which is unfortunately how a lot of files, such as shapefiles, are distributed.

This issue is for keeping track of conversations, and for sharing information from the KN team on what info we can get, and therefore how we might create a catalogue of snippets.

Our current implementation of snippets is copied below as a discussion starter:
Python:

# Publisher: Department of Sustainability and Environment
# Contact point: data.gov@finance.gov.au
# License: Creative Commons Attribution 3.0 Australia
# Full page: https://data.gov.au/dataset/755f2f61-b9fc-46e8-84d0-2e32ac448e8a 

import urllib.request
url = 'http://data.gov.au/storage/f/2013-05-12T210557/tmpgme16Yrecreational-fishing-spots.csv'
filename = 'tmpgme16Yrecreational-fishing-spots.csv'
urllib.request.urlretrieve(url, filename)

R:

# Publisher: Department of Sustainability and Environment
# Contact point: data.gov@finance.gov.au
# License: Creative Commons Attribution 3.0 Australia
# Full page: https://data.gov.au/dataset/755f2f61-b9fc-46e8-84d0-2e32ac448e8a 

url <- "http://data.gov.au/storage/f/2013-05-12T210557/tmpgme16Yrecreational-fishing-spots.csv"
filename <- "tmpgme16Yrecreational-fishing-spots.csv"
download.file(url, destfile=filename)
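
One possible direction, as a minimal sketch only: check the downloaded file's content rather than trusting its extension or declared format, and unpack it when it turns out to be a ZIP archive. The helper names `unpack_if_zip` and `fetch` and the `_extracted` directory suffix are assumptions for illustration, not part of the current implementation.

```python
import urllib.request
import zipfile
from pathlib import Path


def unpack_if_zip(path):
    """Return the usable file paths for a downloaded resource.

    The check looks at the file content (zipfile.is_zipfile), not the
    extension, so a ZIP served under a misleading name is still detected.
    """
    path = Path(path)
    if not zipfile.is_zipfile(path):
        # Not an archive: hand back the file unchanged.
        return [path]

    # Hypothetical naming choice: extract next to the archive.
    extract_dir = path.parent / (path.name + "_extracted")
    with zipfile.ZipFile(path) as archive:
        archive.extractall(extract_dir)
        names = archive.namelist()
    # Skip directory entries, which end with "/".
    return [extract_dir / name for name in names if not name.endswith("/")]


def fetch(url, filename):
    """Download a resource, then unpack it if it is a ZIP archive."""
    urllib.request.urlretrieve(url, filename)
    return unpack_if_zip(filename)
```

With this shape, a generated snippet would call `fetch(url, filename)` and get back either the single downloaded file or the members of the archive (for a zipped shapefile, typically the `.shp`, `.dbf` and related files).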

sarahrichmond added the "question" (Further information is requested) label Apr 4, 2019
@jyucsiro

jyucsiro commented Apr 9, 2019

Hi @sarahrichmond - @jevy-wangfei and I have been looking into this. In KN v2.0, which is used by the current ecocloud on prod, we don't have a field to check what format the resource listed in a dataset actually is. So the data provider can claim that the file type is "shapefile" when it is actually a zipfile.

Let's use the "2016 SoE Biodiversity Number of ALA records in 2012" dataset as an example. Here's what it looks like in prod ecocloud explorer:
[image: the dataset entry in the prod ecocloud explorer]

The format from the data source metadata shows it's an "esri shapefile..." when it is actually a zipfile.

In the upcoming KN v2.1, we've implemented the "MAGDA format minion" which goes and checks the file format with some level of confidence. In the same example above, but in the dev/test ecocloud explorer (which points to our staging-dev KN instance running v2.1), it looks like this:

[image: the same dataset entry in the dev/test ecocloud explorer]

Format in that entry is "ZIP", which uses the field enriched in KN from the format minion (the source metadata still says it's "esri shapefile..."). So this should be available when we upgrade KN prod to the v2.1 release.

Jevy and I wondered whether it is worth displaying both and letting the user have that info?
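
To make the idea concrete, here is a rough sketch of how a snippet catalogue could key off the format, preferring the enriched value and falling back to the source metadata, while still showing both to the user when they disagree. The field names `detected_format` and `source_format`, the catalogue contents, and all function names are hypothetical; they are not the actual KN or MAGDA schema.

```python
# Hypothetical catalogue of Python snippet templates, keyed by format.
PLAIN_DOWNLOAD = (
    "import urllib.request\n"
    "urllib.request.urlretrieve('{url}', '{filename}')\n"
)
PYTHON_SNIPPETS = {
    "CSV": PLAIN_DOWNLOAD,
    "ZIP": (
        "import urllib.request, zipfile\n"
        "urllib.request.urlretrieve('{url}', '{filename}')\n"
        "zipfile.ZipFile('{filename}').extractall()\n"
    ),
}


def effective_format(resource):
    """Prefer the enriched (detected) format over the declared one."""
    detected = (resource.get("detected_format") or "").strip()
    declared = (resource.get("source_format") or "").strip()
    return (detected or declared).upper()


def format_label(resource):
    """Text for the UI: show both values when they disagree."""
    detected = (resource.get("detected_format") or "").strip()
    declared = (resource.get("source_format") or "").strip()
    if detected and declared and detected.upper() != declared.upper():
        return f"{detected} (source metadata says: {declared})"
    return detected or declared or "unknown"


def python_snippet(resource):
    """Pick a template by format; unknown formats get a plain download."""
    template = PYTHON_SNIPPETS.get(effective_format(resource), PLAIN_DOWNLOAD)
    return template.format(url=resource["url"], filename=resource["filename"])
```

For a resource declared as "esri shapefile" but detected as "ZIP", this would pick the ZIP template and label the entry with both values, so the user can see why the snippet unpacks an archive.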

@gweis

gweis commented Apr 9, 2019 via email


5 participants