Make Data Count support: backend #4821 #5329
Retrieving Make Data Count Metrics from Dataverse
-------------------------------------------------

Dataverse users might find it more convenient to retrieve Make Data Count metrics from their installation of Dataverse rather than from the DataCite hub.
Interesting! So Dataverse would provide everything except citations in this case?
Oh, got it, I think. The citation data would be generated by the DataCite Hub and sent back to Dataverse and you could then retrieve it from there, not that Dataverse itself is producing parallel MDC-like metrics.
Lots of comments in this commit, will clean up soon
Next step is to pipe these into counter processor and see what happens on the MDC server
This is incomplete as most downloads don't log to guestbook until the end via the DownloadInstanceWriter and it is unclear how to get this info that late in the pipe.
Access API & DownloadInstanceWriter now have the info needed to create an MDC entry. This included adding a new constructor for MDC logging that takes uriInfo and headers.
Fixed a syntax error that probably came from a different issue.
@@ -15,6 +15,7 @@ You should be conscious of the following when running multiple Glassfish servers
- Only one Glassfish server can be the dedicated timer server, as explained in the :doc:`/admin/timers` section of the Admin Guide.
- When users upload a logo for their dataverse using the "theme" feature described in the :doc:`/user/dataverse-management` section of the User Guide, these logos are stored only on the Glassfish server the user happened to be on when uploading the logo. By default these logos are written to the directory ``/usr/local/glassfish4/glassfish/domains/domain1/docroot/logos``.
- When a sitemap is created by a Glassfish server it is written to the filesystem of just that Glassfish server. By default the sitemap is written to the directory ``/usr/local/glassfish4/glassfish/domains/domain1/docroot/sitemap``.
- Make Data Count logs must be copied from each Glassfish server to a single instance of Counter Processor. See also the ``:MDCLogPath`` database setting in the :doc:`config` section of this guide and the :doc:`/admin/make-data-count` section of the Admin Guide.
Should we say more clearly here that this is only required if the Make Data Count feature is enabled in this Dataverse installation?
Limitations for Dataverse Installations Using Handles Rather Than DOIs
----------------------------------------------------------------------

Data repositories using Handles and other non-DOI identifiers are not supported by Make Data Count. The notes_ following a July 2018 webinar include the Make Data Count project's response on this topic. In short, the DataCite hub does not want to receive reports for non-DOI datasets. Additionally, citations are only available from the DataCite hub for datasets that have DOIs. The Dataverse usage logging and Counter Processor tool can still be used to track datasets with other identifiers and store the metrics in Dataverse.
Still a little confused about this part. Specifically, this section says that using handles is a "limitation": DataCite does not support non-DOI identifiers, but some metrics can still be collected locally. But in a discussion last week I was told that having datasets with handles would not work at all, and that it would actually break something in the setup... Which one is true? (Or was it specifically a mix of handles and DOIs that was the problem?)
To put it differently, would any Dataverse installation ever want to try and use this Make Data Count functionality if they are using handles for the identifiers? Would they have any good reason to collect these metrics on the application side?
Because if not, maybe this paragraph should simply say "make data count only works with DOIs - handles are not supported" - ?
If an installation has identifiers other than DataCite DOIs, they can still generate logs and process those logs with Counter Processor to create JSON that details usage at the dataset level. Dataverse can ingest this locally generated JSON. They are NOT able to use Counter Processor to send the report to DataCite/Make Data Count, as DataCite will outright reject the whole JSON file even if it contains only one Handle PID. Sending the JSON to Make Data Count was a requirement for the Harvard Dataverse installation, which is why we went with converting the last of our Handles to DataCite DOIs, but other installations can still get detailed usage information out of this work.
As Make Data Count advances, hopefully their APIs will become more verbose and we won't have to worry about a single PID breaking the flow. Or heck, maybe they'll become PID agnostic. We could also at some point fork Counter Processor to better support these mixed-PID installations (e.g. create two copies of the JSON, one with only DataCite PIDs for sending off to Make Data Count), but for the first pass this seemed like the easiest route.
I'll go back over the documentation and see how this can be made clearer, as it's definitely confusing.
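To make the "two copies of the JSON" idea concrete, here is a minimal sketch of how a mixed-PID report could be split. It is not something Counter Processor does today. It assumes the report keeps its datasets under a `report-datasets` key, each with a `dataset-id` list of `{"type", "value"}` entries; `split_report` and the sample identifiers are hypothetical names used only for illustration.

```python
import copy


def split_report(report):
    """Return (full_report, doi_only_report) from a usage report dict.

    Assumption: datasets live under "report-datasets" and each carries a
    "dataset-id" list of {"type": ..., "value": ...} entries.
    """
    doi_datasets = [
        copy.deepcopy(dataset)
        for dataset in report.get("report-datasets", [])
        if any(
            ident.get("type", "").lower() == "doi"
            for ident in dataset.get("dataset-id", [])
        )
    ]
    # Copy the other top-level keys and swap in the filtered dataset list.
    doi_only = {**copy.deepcopy(report), "report-datasets": doi_datasets}
    return report, doi_only


# Hypothetical sample report with one DOI dataset and one Handle dataset.
sample_report = {
    "report-header": {"report-name": "dataset report"},
    "report-datasets": [
        {"dataset-id": [{"type": "doi", "value": "10.5072/FK2/AAAAAA"}]},
        {"dataset-id": [{"type": "hdl", "value": "1902.1/11111"}]},
    ],
}

full, doi_only = split_report(sample_report)
```

The full copy would stay local for Dataverse to ingest, and only the DOI-only copy would be a candidate for sending on.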
FWIW, this is the data flow:
- Dataverse generates separate logs for page views / downloads. These logs are specially formatted for processing.
- Counter Processor consumes these logs, weeding out repeated clicks, geolocating IP addresses, etc. This aggregated data is placed in a json file.
- If enabled, Counter Processor sends this json file to Make Data Count. This is not required to use Counter Processor.
- Dataverse consumes the locally generated json file and populates a table with the information.
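The flow above can be sketched roughly as follows. This is a toy illustration, not Counter Processor's actual logic: the log tuple layout, the `aggregate` function, and the 30-second repeated-click window are all assumptions made for the example, and geolocation is left out entirely.

```python
import json
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical, simplified log rows: (ISO timestamp, client IP, dataset PID, event).
# The real Dataverse MDC log format carries more fields than this.
SAMPLE_LOG = [
    ("2019-01-01T10:00:00", "192.0.2.1", "doi:10.5072/FK2/AAAAAA", "view"),
    ("2019-01-01T10:00:10", "192.0.2.1", "doi:10.5072/FK2/AAAAAA", "view"),  # repeated click
    ("2019-01-01T10:05:00", "192.0.2.1", "doi:10.5072/FK2/AAAAAA", "view"),
    ("2019-01-01T10:06:00", "192.0.2.2", "doi:10.5072/FK2/AAAAAA", "download"),
]


def aggregate(rows, window=timedelta(seconds=30)):
    """Count events per dataset, skipping repeats inside the assumed window."""
    last_seen = {}
    counts = defaultdict(lambda: defaultdict(int))
    for timestamp, ip, pid, event in sorted(rows):
        when = datetime.fromisoformat(timestamp)
        key = (ip, pid, event)
        previous = last_seen.get(key)
        last_seen[key] = when
        if previous is not None and when - previous < window:
            continue  # same client, same dataset, same event: treat as a repeat
        counts[pid][event] += 1
    return {pid: dict(events) for pid, events in counts.items()}


summary = aggregate(SAMPLE_LOG)
# The aggregated data is written out as JSON for Dataverse to ingest locally.
report_json = json.dumps(summary, sort_keys=True)
```

Sending `report_json` anywhere is a separate, optional step, which is why installations that cannot report to Make Data Count can still use the local part of the pipeline.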
Switch from /home to /usr/local. Add missing trailing quotes.
This reverts commit 1b527aa.
The fun never ends!
connects to #4821