Downloading the whole archive for Weaver (2019) fails #5999
Comments
@davidkane9 hi! Is there any more information in MANIFEST.TXT about the missing files? (I'm going to guess that this issue is related: Download All Error Reporting - Hidden in Manifest if not all files are downloaded (4.9.4) #5588.) At https://twitter.com/dataverseorg/status/1144729141138165764 Harvard Dataverse tweeted about mitigating stability issues; one step was to lower … A workaround would be to write a script to download each file individually. You can get a list of file IDs for that dataset from https://dataverse.harvard.edu/api/datasets/export?exporter=dataverse_json&persistentId=doi%3A10.7910/DVN/QTOIQF ("Export Metadata", then "JSON").
@davidkane9 in case it's of any use or interest, I wrote a script to download all files of a given dataset in original format: https://github.com/OdumInstitute/dataverse-toolbox/blob/master/download_dataset_original_format.py If you don't want all originals, I could make that a flag.
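The workaround suggested above can be sketched in a few lines of Python: fetch the dataset's `dataverse_json` metadata export, pull the datafile IDs out of it, then download each file through the data access API (with `format=original` for original-format files). This is a minimal illustration, not the linked script; the JSON structure assumed here (`datasetVersion` → `files` → `dataFile` → `id`) follows the native JSON export format, and the output filenames are just the numeric IDs for simplicity.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://dataverse.harvard.edu"

def list_file_ids(export_json):
    """Extract datafile IDs from a dataverse_json metadata export dict."""
    files = export_json["datasetVersion"]["files"]
    return [f["dataFile"]["id"] for f in files]

def download_all(persistent_id, out_dir=".", original=True):
    """Download every file in a dataset, one request per file."""
    url = (f"{BASE}/api/datasets/export?exporter=dataverse_json"
           f"&persistentId={urllib.parse.quote(persistent_id, safe='')}")
    with urllib.request.urlopen(url) as resp:
        meta = json.load(resp)
    for fid in list_file_ids(meta):
        file_url = f"{BASE}/api/access/datafile/{fid}"
        if original:
            # Ask for the originally uploaded format instead of
            # the archival (e.g. tab-separated) derivative.
            file_url += "?format=original"
        urllib.request.urlretrieve(file_url, f"{out_dir}/{fid}")

# Example (downloads ~128 files for the dataset in this issue):
# download_all("doi:10.7910/DVN/QTOIQF", out_dir="weaver2019")
```

Downloading file by file sidesteps the server-side zip step entirely, which is where the truncation in this issue happens.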
@donsizemore I forgot you wrote this script! Some quick thoughts:
This morning I was over at the Science Center and noticed a stack of flyers for a class that @davidkane9 teaches. @davidkane9, I'm thinking I should write a script as a workaround to get all those files. It's something @sbarbosadataverse asked for recently, and it would make a good addition to the "Finding and Downloading Data" section of the Dataverse API Guide: http://guides.dataverse.org/en/4.16/api/getting-started.html#finding-and-downloading-data Would that be helpful? Is Python ok? Did you try https://github.com/OdumInstitute/dataverse-toolbox/blob/master/download_dataset_original_format.py ? Longer term we're hoping to fix this with #6093 or similar so that a script isn't necessary. Also, that class sounds great! 😄
FWIW: it might be interesting to see whether the archiving functionality manages to create a zipped Bag (e.g. configure it to use Odum's file archiver to just drop the Bag in the file system). If that works, it might make sense to adopt the zipping code there (based on Apache's parallel zip, retrieving the files as needed on a pool of threads). The code itself can create zips up to ~a terabyte / 100K+ files, though I haven't tried it at that scale on a Dataverse dataset. (If this doesn't work, then perhaps archiving will need #6093 as well!)
Thanks! And sorry for the delay in responding. |
Consider this Dataverse entry:
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/QTOIQF
Try downloading the entire archive, all 128 files. You will get a zip file as normal, but it won't include all the files! The zip I get is missing many files. For example, the main_text directory should contain many files, not just one .R file.