Data Access API: Download by Dataset #4529
@SamiSousa thanks for the suggestion. For a little more context, see our conversation in IRC: http://irclog.iq.harvard.edu/dataverse/2018-03-20#i_64759
@SamiSousa it was nice meeting you this afternoon at BU! I couldn't remember if you had opened any issues but found this one. Like I was saying, with 800+ issues I sometimes reach out to the person who originally opened the issue to see if they're still interested. I got the impression that you may or may not be interested in this issue long term, after your class is over, which is totally fine. We did discuss this issue during backlog grooming in late March, but our primary concern is the performance implications, even though the code to address it is probably straightforward. Also, is this issue on topic enough to link your team's video from here? I ask because I just checked my email and didn't get your message yet. We have pretty aggressive spam filtering enabled, and I get a summary email at the end of the day that may give me the opportunity to see your email and have it delivered to my inbox. Thanks!
@SamiSousa never mind! I clicked "release and allow sender" in the antispam tool... and now I have a link to your video and code:
Great meeting you too Phil! In the project, we ended up using the Search and Data Access APIs to list files and download individual files, so this specific feature isn't a high priority request from me. Hope this helps!
@SamiSousa no problem. Thanks for clarifying. I just sent a message about your project to the Dataverse community at https://groups.google.com/d/msg/dataverse-community/P4llZSssZ2Q/zvhGltLpAQAJ and you are welcome to make sure I didn't misrepresent your project at all. The video is really interesting! Thanks for sharing!
@SamiSousa questions are coming in already! Please see dataverse-broker/dataverse-broker#46. Thanks!
I guess I'll vote to close this issue if we have no intention of supporting this. |
I just spoke with @jggautier about this in the context of https://help.hmdc.harvard.edu/Ticket/Display.html?id=276556 and was telling him that I do think we should implement the ability to download all the files in a dataset, based on the DOI or Handle of that dataset, via API from a script (with a config option to turn off this feature for installations that don't want it). The current workaround for figuring out from a browser which file IDs to pass to the Data Access API is to use dev tools (inspect element) and copy the curl command; both Firefox and Chrome support this. Then, once you have the crazy long URL (lots of extra junk in there), you can trim it down to the part you actually need.
The part that matters is https://demo.dataverse.org/api/access/datafiles/307909,307910,307908?gbrecs=true and this is documented at http://guides.dataverse.org/en/4.14/api/dataaccess.html#multiple-file-bundle-download |
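For reference, a minimal sketch of what the trimmed-down request could look like from the command line. The file IDs and the gbrecs parameter are copied from the example URL above; you'd substitute the IDs of the files you actually want:

```bash
# Download the listed files from demo.dataverse.org as a single zip bundle.
# The IDs 307909,307910,307908 are the ones from the example above; replace
# them with your own. The bundle is saved as files.zip.
curl -L -o files.zip \
  "https://demo.dataverse.org/api/access/datafiles/307909,307910,307908?gbrecs=true"
```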
I would like to vote for this feature. It is critical to be able to do bulk downloads via wget or some other computer-to-computer solution. This is the beauty of the classic FTP folder full of files. It would be nice to be able to point any script at any dataverse DOI with /download appended to the end and know that it will fetch everything within. This could be turned off for dataverses, on for datasets, and configurable by the site admin and each dataverse and dataset admin. Alternatively, the dataverse site could auto-generate a script (bash? Python?) for each dataset to download all the data contained in that dataset. The National Snow and Ice Data Center (NSIDC) takes this approach.
I recognize that the reason this issue is often closed is "server load issues". The advantage of generating a script the user can run is that the script could throttle the download. You'd need to trust users not to remove that part of the script, though. A script also lets the user download multiple files, not just a ZIP file, and then zipping is not required on the server backend (although perhaps the web server itself, e.g. Apache rather than Dataverse, does on-the-fly compression for transfers).
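To make the throttling idea concrete, here's a hypothetical sketch of the kind of script a server could auto-generate for a dataset. The file URLs are placeholders built from the example file IDs earlier in the thread, and --limit-rate plus a sleep are just one possible way to keep server load down:

```bash
#!/usr/bin/env bash
# Hypothetical auto-generated download script for a single dataset.
# A real implementation would have the server fill in the actual file URLs.
for url in \
  "https://demo.dataverse.org/api/access/datafile/307908" \
  "https://demo.dataverse.org/api/access/datafile/307909" \
  "https://demo.dataverse.org/api/access/datafile/307910"; do
  curl -L -O -J --limit-rate 500k "$url"  # cap per-file transfer rate
  sleep 2                                 # pause between files
done
```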
Hey @mankoff, we'll be implementing this after we make some optimizations to the zipping service in #6505. We have a full API suite documented at http://guides.dataverse.org/en/latest/api/index.html, so it would be possible to script things now. |
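For anyone who wants to script it today, here is a rough sketch under a few assumptions: it uses the native "list files in a dataset version" endpoint rather than the Search API, assumes the dataset is published and public (no API token), requires jq, and the DOI below is a placeholder:

```bash
#!/usr/bin/env bash
# Sketch: list a dataset's files with the native API, then download each one
# through the Data Access API.
SERVER="https://demo.dataverse.org"
PID="doi:10.5072/FK2/EXAMPLE"   # placeholder persistent identifier

# List the latest version's files and extract their database IDs.
ids=$(curl -s "$SERVER/api/datasets/:persistentId/versions/:latest/files?persistentId=$PID" \
  | jq -r '.data[].dataFile.id')

# Fetch each file, keeping the filename the server supplies.
for id in $ids; do
  curl -s -L -O -J "$SERVER/api/access/datafile/$id"
done
```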
I made pull request #7086 for this issue. Feedback is welcome, of course. |
Added enum so we don't have two methods both with 3 String args.
Return UNAUTHORIZED instead of BAD_REQUEST and detailed error messages.
add API to download all files by dataset #4529
Is there any method of downloading all the files of a dataset using the Data Access API? Something like using the global_id of the dataset to download all files in a zip, similar to the bundle download. Thanks!