Operationalize MDC: Create Cron Jobs, Acquire, Configure Prod Web Token, Handle Logs #3

Open
kcondon opened this issue Mar 11, 2019 · 15 comments
Labels: GREI 4 Analytics and Reporting; NIH OTA: 1.5.1 (collection: 5 | 1.5.1 | Standardize download metrics for the Harvard Dataverse repository...); pm.GREI-d-1.5.1 (NIH, yr1, aim5, task1: Standardize download metrics); pm.GREI-d-1.5.2 (NIH, yr1, aim5, task2: WG with other repositories to follow Make Data Count recommendations); Size: 10 (A percentage of a sprint.)

Comments

@kcondon
Contributor

kcondon commented Mar 11, 2019

The MDC feature is well documented but there are a few items that need to be addressed to operate in a production environment:
- Create cron job(s) to call the various API endpoints needed to process the various files and import them to the db, including error detection and notification of failure
- Acquire and configure a production web token that allows publishing stats to DataCite
- Consider, plan for, and monitor the growth of log files
- Consider how to troubleshoot or rerun failed jobs
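
A minimal sketch of the cron-job and rerun items above, assuming each processing step can be wrapped in a Python callable. The step names and the `notify` callback are hypothetical placeholders, not part of Dataverse or Counter Processor; a real job would plug in its own commands and alerting.

```python
import logging
from typing import Callable, Iterable, List, Tuple

log = logging.getLogger("mdc_cron")

# A step is a (name, action) pair; the action raises an exception on failure.
Step = Tuple[str, Callable[[], None]]


def run_steps(steps: Iterable[Step], notify: Callable[[str], None]) -> List[str]:
    """Run steps in order, stopping at the first failure.

    On failure, log the traceback, send one notification, and stop.
    Returns the names of the steps that completed successfully.
    """
    completed: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            log.exception("MDC step %s failed", name)
            notify(f"MDC job failed at step '{name}': {exc}")
            break
        completed.append(name)
    return completed
```

Run from cron, something like this stops at the first failing step, sends a single notification, and returns the steps that did complete, which may help when deciding where to restart a rerun.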

@djbrooke
Contributor

I'll add one more...

  • we'll need to figure out how to handle pre-MDC download counts. I'd like to reflect them so that researchers don't need to start at zero. :)

@dlowenberg @mfenner it would be good to get some thoughts from you and the team on how other groups have handled this. Thanks in advance for any guidance or for pointing us to any docs!

@dlowenberg

Hi there, if you would like to look at or copy the code that we wrote for Dryad in processing the last ten years of downloads, here is some info that may be useful:

The main reporting code is here: https://github.com/datadryad/dryad-repo/blob/dryad-master/dspace/modules/api/src/main/java/org/dspace/curate/DashStats.java
Though it’s pretty specific to the existing Dryad setup. It writes out a text file that is formatted for the counter-processor, but it’s sorted by dataset. Then there is a script that re-sorts everything based on time: https://github.com/datadryad/dryad-utils/blob/master/dash-migration/sort_dash_stats.sh
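
For illustration only, the re-sorting step could look roughly like the sketch below. It assumes each non-blank line is tab-separated and begins with an ISO 8601 timestamp; that layout is an assumption for the example and is not taken from the Dryad script linked above.

```python
from datetime import datetime
from typing import Iterable, List


def sort_by_event_time(lines: Iterable[str]) -> List[str]:
    """Return the non-blank lines ordered by the timestamp in their first column.

    Assumes the first tab-separated field parses with datetime.fromisoformat
    and that all timestamps are either naive or all timezone-aware.
    """
    def event_time(line: str) -> datetime:
        return datetime.fromisoformat(line.split("\t", 1)[0])

    return sorted((line for line in lines if line.strip()), key=event_time)
```

Because `sorted` is stable, events that share a timestamp keep their original per-dataset order.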

Happy to set up time for you to talk with Ryan Scherle (Dryad) if that would be helpful. Otherwise, the DataCite and DataONE folks may also have some tips.

@djbrooke
Contributor

Thanks @dlowenberg! I'll check in with the team here and we'll get back with you if we feel a discussion with Dryad is needed. Thanks again!

P.S. I just pinged you on another issue in the main Dataverse repo: IQSS/dataverse#5957

@djbrooke
Contributor

djbrooke commented Jul 24, 2019

  • We should do the things outlined in the original comment and other items not yet identified
  • The current suggestion is to seed the count with the downloads that already exist, but we can discuss during the sprint.
  • We should make note for users about how the numbers are derived (some from before the standard was implemented and others from after)

@djbrooke djbrooke self-assigned this Jul 25, 2019
@djbrooke
Contributor

I picked this up out of the sprint column today to begin stubbing out documentation regarding migrating counts and other things that installations will need to know to use Make Data Count in production, but I don't have the bandwidth this week. I will re-visit early next week.

@djbrooke djbrooke removed their assignment Jul 25, 2019
@pdurbin
Member

pdurbin commented Aug 9, 2019

@djbrooke if you're stubbing out documentation, you might want to create a branch for IQSS/dataverse#6082 which was just opened. The issue title is "Documentation: Some tweaks to Make Data Count doc based on recent experience".

@jggautier
Collaborator

See #75 (comment)

@mreekie mreekie added the NIH OTA: 1.5.1 collection: 5 | 1.5.1 | Standardize download metrics for the Harvard Dataverse repository... label Oct 6, 2022
@mreekie mreekie added pm.GREI-d-1.5.1 NIH, yr1, aim5, task1: Standardize download metrics pm.GREI-d-1.5.2 NIH, yr1, aim5, task2: WG with other repositories to follow Make Data Count recommendations labels Mar 20, 2023
@cmbz cmbz moved this to NIH bklog items (Stefano) in IQSS Dataverse Project Jul 21, 2023
@cmbz
Collaborator

cmbz commented Jul 21, 2023

  • I moved this issue into the Global Backlog in the NIH Backlog column, as per conversations with @siacus and current AIM 5 Year 2 plans.

@cmbz cmbz moved this from NIH bklog items (Stefano) to SPRINT- NEEDS SIZING in IQSS Dataverse Project Jul 24, 2023
@cmbz cmbz added Size: 33 A percentage of a sprint. and removed sz.Medium labels Jul 25, 2023
@stevenwinship stevenwinship moved this from SPRINT READY to In Progress 💻 in IQSS Dataverse Project Mar 7, 2024
@cmbz cmbz removed the Status: Needs Input Applied to issues in need of input from someone currently unavailable label Mar 14, 2024
@landreev
Collaborator

One random thought: one of the prod. servers, dvn-cloud-rserv-1.lib.harvard.edu, is currently underutilized and could be a prime candidate for running that processor on the accumulated logs.

@pdurbin
Member

pdurbin commented Mar 20, 2024

Heads up that Counter Processor was archived yesterday:

@pdurbin
Member

pdurbin commented Mar 21, 2024

As discussed at standup, I forked the repo:

@scolapasta scolapasta removed this from the 6.2 milestone Mar 27, 2024
@cmbz cmbz added Size: 3 A percentage of a sprint. Size: 10 A percentage of a sprint. and removed Size: 33 A percentage of a sprint. Size: 3 A percentage of a sprint. labels Mar 27, 2024
@cmbz cmbz moved this from In Progress 💻 to Done 🧹 in IQSS Dataverse Project Mar 28, 2024
@stevenwinship stevenwinship moved this from Done 🧹 to In Progress 💻 in IQSS Dataverse Project Apr 2, 2024
@cmbz cmbz moved this from In Progress 💻 to Done 🧹 in IQSS Dataverse Project Apr 4, 2024
@sbarbosadataverse

sbarbosadataverse commented Apr 4, 2024

@pdurbin
Member

pdurbin commented Aug 6, 2024

This was accidentally and automatically closed when IQSS/dataverse#10424 was merged. Re-opening.

@pdurbin pdurbin reopened this Aug 6, 2024
@cmbz cmbz added the GREI 4 Analytics and Reporting label Nov 20, 2024