Adding caching to get_file_by_id
#135
Comments
Some comments on 5153bb2, offered from my past experience; I hope they come across in the right spirit.
I'd also really encourage keeping the housekeeping (e.g., removing trailing whitespace in the DESCRIPTION) in a separate commit (and eventual pull request), and editing the commit history on the branch so that it contains no extraneous steps (e.g., commit one makes whitespace changes alongside other things, and commit two undoes the whitespace changes).
Much appreciated. Ben or I will consider this soon.
Thanks for contributing. I will have a look at these suggestions and get back to you soon. @kuriwaki not sure if I mentioned this before, but I had turned on caching only if you specified an exact version of the file. I agree that what you are suggesting is a sensible default: it would be nice to be able to cache the latest version of the file on disk and re-download only when the file is updated on the Dataverse.
Thanks @mtmorgan for the PR. I made a few comments. @beniaminogreen you can see if that does everything you were starting to do in your branch. If so, we can try to merge it in after #136.
It would be useful if the package were able to cache the results of calls to the Dataverse API to disk. If the same dataset is requested twice, the result can then be served from disk instead of being re-downloaded, which would save a lot of time.
Here's a sketch of how I think the behavior could work:
Suggested new behavior for get_file_by_id:
More complex behavior could be added in the future, such as caching the latest version by checking whether the file metadata has changed since the last download and re-downloading only if there is a new version.
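To make the proposal concrete, here is a minimal, language-neutral sketch of the caching idea discussed in this issue, written in Python rather than the package's R. It is not the code in the cache branch. The names cached_get_file_by_id, download, cache_dir, and use_cache are all hypothetical, and the sketch assumes that only requests pinned to an exact version are safe to serve from disk and that the user can opt out of caching.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Optional


def cached_get_file_by_id(
    file_id: int,
    version: Optional[str],
    download: Callable[[int, Optional[str]], bytes],  # hypothetical downloader
    cache_dir: Path,
    use_cache: bool = True,  # hypothetical opt-out switch
) -> bytes:
    """Return file contents, reusing a disk copy only for pinned versions."""
    # Without an exact version the remote file may change, so always download.
    if not use_cache or version is None:
        return download(file_id, version)

    # Key the cache entry on both the file id and the requested version.
    key = hashlib.sha256(f"{file_id}:{version}".encode()).hexdigest()
    path = cache_dir / key
    if path.exists():
        return path.read_bytes()

    data = download(file_id, version)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return data


# Usage with a stand-in downloader that records every call it receives.
calls = []


def fake_download(file_id, version):
    calls.append((file_id, version))
    return f"contents of {file_id} at {version}".encode()


cache = Path(tempfile.mkdtemp())
first = cached_get_file_by_id(42, "1.0", fake_download, cache)
second = cached_get_file_by_id(42, "1.0", fake_download, cache)  # served from disk
cached_get_file_by_id(42, None, fake_download, cache)  # no version: not cached
cached_get_file_by_id(42, "1.0", fake_download, cache, use_cache=False)  # opt-out
```

Keying on the version keeps the cache trivially correct, because a pinned version should never change. Caching the latest version would additionally need a metadata check before each reuse.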
I am prototyping the behavior in the cache branch, and would love feedback on it. Right now, I am caching all calls to get_file_id which have a specific version of the file specified. If we think this is a good way to go, I will shortly add step 2, which checks whether the user wants to turn off caching.
Best,
Ben