API call for total size of specified dataverse #5848

We need an API call that does not currently exist: the total size of a specified dataverse.

The use case for us is that the Texas Data Repository hosts multiple institutional dataverses, and I need a simple way to determine the size of all of the content, published and unpublished, in each institution's entire dataverse.

Comments
@CCMumma thanks for creating this issue. The following issues are related:
@CCMumma - since this concerns storage space, are you only worried about files, or metadata as well?
In sprint planning I offered to link back to the original message:
We are more interested in content storage, but including metadata, or having a separate call for metadata, would also be valuable.
Thanks @CCMumma!
@CCMumma Hi, I've put together a new API call for reporting the total file storage size. However, if you go to the filesystem where the datasets are stored and add up the sizes of all the files found there, you'll end up with a larger number of bytes. This is because we also cache some extra files generated as the datasets are served: resized thumbnail copies of image files, metadata exports for published datasets, etc. The logic behind not counting these files is that they are generated on top of the archival content; they can be erased, and the system will regenerate them automatically. Is this OK for your purposes?
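For readers landing here later, a minimal client-side sketch of how such a call might look, assuming an endpoint of the form /api/dataverses/{alias}/storagesize and the X-Dataverse-key token header (both are assumptions; the exact path and response shape are not confirmed in this thread):

```python
# Minimal sketch of querying a dataverse's storage size from a client.
# The endpoint path, auth header name, and response shape below are
# assumptions based on this thread, not a confirmed API contract.
import requests

SERVER = "https://demo.dataverse.org"  # hypothetical installation URL
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"  # hypothetical API token

def dataverse_storage_size(alias: str) -> dict:
    """Fetch the total archival file size reported for one dataverse."""
    resp = requests.get(
        f"{SERVER}/api/dataverses/{alias}/storagesize",
        headers={"X-Dataverse-key": API_TOKEN},
    )
    resp.raise_for_status()
    # Dataverse APIs typically wrap results as {"status": "OK", "data": {...}}.
    return resp.json()["data"]

if __name__ == "__main__":
    print(dataverse_storage_size("tdr"))  # "tdr" is a made-up dataverse alias
```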
Thank you so much - that's excellent work. The total size of the 'archival payload' would be a good start, and it's wise to include both published and unpublished files in the number. The 'total storage used' (including generated files and metadata) by a dataverse would also be valuable for instances like ours in Texas, where we're trying to create a service model that sets fees based on storage used above a set maximum per institutional dataverse.
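As a hypothetical illustration of that fee model (the cap and rate below are invented numbers, not actual TDR policy), the reported byte count could feed a simple overage calculation:

```python
# Hypothetical overage fee: charge only for storage above a per-dataverse cap.
# The cap and rate are made-up illustration values, not actual TDR policy.
CAP_BYTES = 1 * 1024**4   # e.g., 1 TiB included per institutional dataverse
RATE_PER_GIB = 0.25       # e.g., $0.25 per GiB above the cap

def overage_fee(used_bytes: int) -> float:
    """Fee owed for storage used above the included cap."""
    over = max(0, used_bytes - CAP_BYTES)
    return round(over / 1024**3 * RATE_PER_GIB, 2)

print(overage_fee(1_500_000_000_000))  # 93.25: ~373 GiB over a 1 TiB cap
```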
@CCMumma Thank you.
That is fantastic news. Thank you for your work.