Enable collection storage quota monitoring in HDV #240
Comments
2024/01/16: Re. sizing of this issue: as per the Slack comment, @landreev will get feedback from the HDV curation team to determine the eventual size.
2024/01/17: Also waiting on resolution of IQSS/dataverse#10220.
(I asked in the linked sister issue IQSS/dataverse-HDV-Curation#344 about the specifics of differentiating between Harvard and non-Harvard affiliates; aside from that, it's mostly ready to go.)
Talked to Sonia directly; she replied in the linked curation issue. Moving forward on this.
@sbarbosadataverse @jggautier For collections: it separately lists the Harvard- and non-Harvard-affiliated top-level collections that are over the respective size limits. (I will post the report generated last night in the curation issue.) As of now there are 2 Harvard collections that are over the 2.5TB limit, although one of them is OMAMA, so it doesn't count. There are 8 non-Harvard collections that are currently over the limit. I use Julian's database queries for determining which collections to consider Harvard vs. not.

Please note that for the purposes of counting total storage use, ALL uploaded files are counted - meaning, it includes the sizes of the unpublished files in draft versions and of published files that are no longer in the latest version. This is in contrast to the numbers shown in Julian's "Harvard Dataverse Repository metrics", which only include the files in the latest published version. This is an important distinction. One dramatic example is the layline collection - it shows in the metrics report as only 6+GB, but actually uses close to 2TB of storage (!).

There is a separate listing for datasets that are directly in the top-level root collection. We should keep discussing how to count storage there in IQSS/dataverse-HDV-Curation#344, but this is what I'm doing as of now: Once again, because of the email addresses and potentially other private information, I don't want to post a report example here, but will add it to the curation issue.

As currently configured, this script runs every night; it generates the report and sends it to the list of configured email addresses, which at the moment is just me. There's probably no need to run it this often - so maybe it should be once a week instead (?). So, let me know if you want to receive a copy of this report, and how often. Let's continue the discussion in the curation issue.
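For illustration, the shape of the nightly check described above could look like the sketch below. This is not the actual script: the function names, the `exempt` tuple (standing in for the OMAMA exception), and the SMTP details are all hypothetical; only the 2.5TB limit and the "total bytes including drafts and superseded files" counting rule come from the comment above.

```python
import smtplib
from email.message import EmailMessage

# Per-collection size limit mentioned in the issue: 2.5 TB (here as TiB).
SIZE_LIMIT_BYTES = int(2.5 * 1024**4)

def find_over_limit(collection_sizes, limit=SIZE_LIMIT_BYTES, exempt=()):
    """Return collections whose TOTAL stored bytes (draft files and
    superseded published files included) exceed the limit, skipping
    any explicitly exempted collections (e.g. OMAMA)."""
    return {
        alias: size
        for alias, size in collection_sizes.items()
        if size > limit and alias not in exempt
    }

def build_report(harvard, non_harvard):
    """Render the two over-limit lists as a plain-text report body."""
    lines = ["Harvard-affiliated collections over limit:"]
    lines += [f"  {a}: {s / 1024**4:.2f} TiB" for a, s in sorted(harvard.items())]
    lines.append("Non-Harvard collections over limit:")
    lines += [f"  {a}: {s / 1024**4:.2f} TiB" for a, s in sorted(non_harvard.items())]
    return "\n".join(lines)

def send_report(body, recipients, sender="hdv-monitor@example.edu"):
    """Email the report to the configured recipients (hypothetical host)."""
    msg = EmailMessage()
    msg["Subject"] = "HDV collection storage report"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(body)
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)
```

The key design point is the counting rule: sizes must be aggregated over every stored file, not just the latest published version, or a collection like layline (6+GB in the metrics report, ~2TB on disk) slips through.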
I documented an example of the report, as generated in prod. on Aug. 1, in IQSS/dataverse-HDV-Curation#344.
Once #239 is done (6.1 deployed in prod.), we will need to enable quota limits in HDV for all the existing collections, establish the process for setting limits on all new collections going forward, and set up automatic monitoring. More details/policies for this are outlined in the curation repo issue: https://github.com/IQSS/dataverse-HDV-Curation/issues/344#issuecomment-1881648186
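Once the 6.1 quota feature is available, setting a limit on each existing collection could be scripted roughly as below. This is a sketch under assumptions: the `/api/dataverses/{alias}/storage/quota/{bytes}` path is our reading of the 6.1 collection-quota API and must be verified against the installation's API guide, and the token/alias values are placeholders.

```python
import urllib.request

def quota_url(base_url, alias, quota_bytes):
    """Build the quota-setting URL. NOTE: the /storage/quota path is an
    assumption about the Dataverse 6.1 API; confirm it in the API guide."""
    return f"{base_url.rstrip('/')}/api/dataverses/{alias}/storage/quota/{quota_bytes}"

def set_collection_quota(base_url, api_token, alias, quota_bytes):
    """POST the quota for one collection (requires a superuser API token)."""
    req = urllib.request.Request(
        quota_url(base_url, alias, quota_bytes),
        method="POST",
        headers={"X-Dataverse-key": api_token},  # standard Dataverse auth header
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Looping this over the list of existing top-level collections (with the Harvard vs. non-Harvard limits from the curation policy) would cover the backfill; new collections would get a quota as part of the creation process.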