Make Data Count support: backend #4821 #5329

pdurbin · 2018-11-19T21:33:47Z

connects to #4821

coveralls · 2018-11-19T21:41:18Z

Coverage increased (+0.05%) to 17.317% when pulling b14c921 on 4821-make-data-count into d78b08d on develop.

djbrooke · 2018-11-21T16:41:39Z

doc/sphinx-guides/source/admin/make-data-count.rst

+Retrieving Make Data Count Metrics from Dataverse
+-------------------------------------------------
+
+Dataverse users might find it more convenient to retrieve Make Data Count metrics from their installation of Dataverse rather the DataCite hub.


Interesting! So Dataverse would provide everything except citations in this case?

Oh, got it, I think. The citation data would be generated by the DataCite Hub and sent back to Dataverse, and you could then retrieve it from there; it's not that Dataverse itself is producing parallel MDC-like metrics.

landreev · 2019-02-19T16:02:03Z

doc/sphinx-guides/source/installation/advanced.rst

@@ -15,6 +15,7 @@ You should be conscious of the following when running multiple Glassfish servers
 - Only one Glassfish server can be the dedicated timer server, as explained in the :doc:`/admin/timers` section of the Admin Guide.
 - When users upload a logo for their dataverse using the "theme" feature described in the :doc:`/user/dataverse-management` section of the User Guide, these logos are stored only on the Glassfish server the user happend to be on when uploading the logo. By default these logos are written to the directory ``/usr/local/glassfish4/glassfish/domains/domain1/docroot/logos``.
 - When a sitemp is created by a Glassfish server it is written to the filesystem of just that Glassfish server. By default the sitemap is written to the directory ``/usr/local/glassfish4/glassfish/domains/domain1/docroot/sitemap``.
+- Make Data Count logs must be copied from each Glassfish server to single instance of Counter Processor. See also the ``:MDCLogPath`` database setting in the :doc:`config` section of this guide and the :doc:`/admin/make-data-count` section of the Admin Guide.


Should we say more clearly here that this is only required if the Make Data Count feature is enabled in this Dataverse installation?

landreev · 2019-02-19T16:11:02Z

doc/sphinx-guides/source/admin/make-data-count.rst

+Limitations for Dataverse Installations Using Handles Rather Than DOIs
+----------------------------------------------------------------------
+
+Data repositories using Handles and other identifiers are not supported by Make Data Count but in the notes_ following a July 2018 webinar, you can see the Make Data Count project's response on this topic. In short, the DataCite hub does not want to receive reports for non-DOI datasets. Additionally, citations are only available from the DataCite hub for datasets that have DOIs. The Dataverse usage logging and Counter Processor tool can still be used to track other identifier and store the metrics in Dataverse.


I'm still a little confused about this part. This section describes handles as a "limitation": DataCite does not support non-DOI identifiers, but some metrics can still be collected locally. In a discussion last week, however, I was told that having datasets with handles would not work at all, and would actually break something in the setup. Which one is true? (Or was it specifically a mix of handles and DOIs that was the problem?)
To put it differently, would any Dataverse installation ever want to use this Make Data Count functionality if it uses handles for its identifiers? Would it have any good reason to collect these metrics on the application side?
If not, maybe this paragraph should simply say "Make Data Count only works with DOIs; handles are not supported"?

matthew-a-dunlap · 2019-02-19

If an installation uses identifiers other than DataCite DOIs, it can still generate logs and process them with Counter Processor to create JSON that details usage at the dataset level. Dataverse can ingest this locally generated JSON. Such an installation is NOT able to use Counter Processor to send the logs to DataCite/Make Data Count, because DataCite will reject the whole JSON file outright even if it contains only one Handle PID. Sending the JSON to Make Data Count was a requirement for the Harvard Dataverse installation, which is why we converted the last of our Handles to DataCite DOIs, but other installations can still get detailed usage information out of this work.

As Make Data Count advances, hopefully their APIs will become more verbose and we won't have to worry about a single PID breaking the flow. Or heck, maybe they'll become PID agnostic. We could also at some point fork Counter Processor to better support these mixed-PID installations (e.g. create two copies of the JSON, one with only DataCite PIDs for sending off to Make Data Count), but for the first pass this seemed like the easiest route.

I'll go back over the documentation and see how this can be made clearer, as it's definitely confusing.
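For what it's worth, the "two copies of the JSON" idea could look roughly like the sketch below. This is not Counter Processor code. The helper name `split_report_by_pid_type` is hypothetical, and the field names (`report-datasets`, `dataset-id`, `type`, `value`) are assumptions modeled on a SUSHI-style dataset report, so they would need to be checked against the real output.

```python
import copy


def split_report_by_pid_type(report, hub_pid_type="doi"):
    """Return (hub_copy, local_copy) of a SUSHI-style dataset report.

    hub_copy keeps only datasets whose identifiers are all of hub_pid_type,
    so it could be sent to the hub without a Handle triggering a rejection.
    local_copy is an untouched deep copy for ingest into Dataverse.
    The field names used here are assumptions, not a confirmed schema.
    """
    local_copy = copy.deepcopy(report)
    hub_copy = copy.deepcopy(report)
    hub_copy["report-datasets"] = [
        dataset
        for dataset in hub_copy.get("report-datasets", [])
        if dataset.get("dataset-id")
        and all(
            pid.get("type", "").lower() == hub_pid_type
            for pid in dataset["dataset-id"]
        )
    ]
    return hub_copy, local_copy


# Made-up sample report with one DOI dataset and one Handle dataset.
sample_report = {
    "report-header": {"report-name": "dataset report"},
    "report-datasets": [
        {"dataset-id": [{"type": "doi", "value": "10.5072/FK2/EXAMPLE1"}]},
        {"dataset-id": [{"type": "hdl", "value": "20.500.12345/EXAMPLE2"}]},
    ],
}

hub_report, local_report = split_report_by_pid_type(sample_report)
print(len(hub_report["report-datasets"]), len(local_report["report-datasets"]))
```

The local copy keeps every dataset, so an installation with mixed PIDs would lose nothing on the Dataverse side; only the copy destined for the hub is filtered.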

FWIW, this is the data flow:

1. Dataverse generates separate logs for page views / downloads. These logs are specially formatted for processing.
2. Counter Processor consumes these logs, weeding out repeated clicks, geolocating IP addresses, etc. This aggregated data is placed in a JSON file.
3. If enabled, Counter Processor sends this JSON file to Make Data Count. This is not required to use Counter Processor.
4. Dataverse consumes the locally generated JSON file and populates a table with the information.
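To make the "weeding out repeated clicks" step in that flow concrete, here is a minimal sketch of the idea. It is a simplification, not Counter Processor's actual logic: the function name `count_views`, the event tuple layout, and the identifiers are all hypothetical, and the 30-second default is an assumption based on the usual COUNTER double-click convention.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_views(events, window_seconds=30):
    """Count views per dataset, collapsing rapid repeat clicks.

    events: iterable of (timestamp, session_id, dataset_pid) tuples.
    A click is ignored when the same session hit the same dataset less
    than window_seconds before it (rolling window: each click is compared
    with the previous click, whether or not that one was counted).
    """
    window = timedelta(seconds=window_seconds)
    last_seen = {}
    totals = Counter()
    for timestamp, session_id, pid in sorted(events):
        key = (session_id, pid)
        previous = last_seen.get(key)
        if previous is None or timestamp - previous >= window:
            totals[pid] += 1
        last_seen[key] = timestamp
    return dict(totals)


# Made-up events: session s1 clicks dataset A three times, s2 once.
t0 = datetime(2019, 2, 19, 12, 0, 0)
events = [
    (t0, "s1", "doi:A"),
    (t0 + timedelta(seconds=10), "s1", "doi:A"),  # repeat click, dropped
    (t0 + timedelta(seconds=50), "s1", "doi:A"),  # 40 s after previous, counted
    (t0 + timedelta(seconds=5), "s2", "doi:A"),   # different session, counted
    (t0, "s1", "hdl:B"),
]
print(count_views(events))  # {'doi:A': 3, 'hdl:B': 1}
```

Note that nothing in this step depends on the PID type, which is why the local metrics still work for Handles; only the later upload to the hub is DOI-specific.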

stub our docs and API for Make Data Count #4821

4dd10bd

djbrooke reviewed Nov 21, 2018

pdurbin changed the title ~~stub our docs and API for Make Data Count #4821~~ Make Data Count support: backend #4821 Nov 21, 2018

pdurbin and others added 26 commits November 29, 2018 15:41

clarify that we'll send citations to DataCite #4821

17cbf37

link to HTML version of CoP #4821

5f59a8c

add "datasetmetrics" table #4821

e8ac623

stub out counter-processor setup in Vagrant #4821 #5385

02c5538

stand up counter-processor, continued #4821

e835e06

add SUSHI JSON example and stub out parsing of it #4821

9f7c990

add draft architecture drawing #4821

ac5e29d

Merge branch 'develop' into 4821-make-data-count

b76d92c

Merge branch '5384-data-usage-logs-MDC' into 4821-make-data-count

8d1c6ab

Temp fix null erroring for MDC #5384 #4821

af728cf

MDC log var collection in construct #5384 #4821

4416b41

MDC log dl with guestbook #4821 #5384

dc11ec1

Lots of comments in this commit, will clean up soon

#4821 parse sample json add dummy data api

e36c0ed

#4821 remove extraneous test code

61f78d7

#4821 check for existing metrics records

0d38f10

MDC access api #4821 #5384

c035dbd

Merge branch '4821-make-data-count' of https://github.com/IQSS/dataverse into 4821-make-data-count

85bb860

MDC File page and correct API #4821 #5384

e41ab08

MDC daily logs and cleanup #4821 #5384

c17e3e2

Next step is to pipe these into counter processor and see what happens on the MDC server

MDC Path to log files #4821 #5384

0bfd936

change addDummyData to addUsageMetricsFromSushiReport #4821

184580d

MDC log dl reqUrl etc part1 #4821 #5384

bd7ff84

This is incomplete as most downloads don't log to guestbook until the end via the DownloadInstanceWriter and it is unclear how to get this info that late in the pipe.

Merge branch '4821-make-data-count' of https://github.com/IQSS/dataverse into 4821-make-data-count

b60a679

stub out citation parser #4821

94eccc7

MDC log dl reqUrl etc part2 #4821 #5384

51b4ed2

Access api & DownloadInstanceWriter now have the info needed to create an MDC entry. This included adding a new constructor for MDC logging that takes uriInfo and headers.

download citations from DataCite hub #4821

f9b0424

pdurbin mentioned this pull request Feb 12, 2019

Implement Backend Support for Make Data Count use and citation metrics #4821

Closed

dlmurphy added 2 commits February 14, 2019 17:16

Syntax fix (unrelated to issue)

cc3eeba

Fixed a syntax error that probably came from a different issue.

Typo fix [#4821]

c6eab30

landreev reviewed Feb 19, 2019

matthew-a-dunlap added 2 commits February 20, 2019 15:31

Merge branch 'develop' into 4821-make-data-count

5b11ce3

Guides fixes #4821

78cb180

landreev approved these changes Feb 21, 2019

sekmiller and others added 19 commits February 25, 2019 14:29

Merge branch 'develop' into 4821-make-data-count

9fc9b71

#4821 fix add metrics endpoint for no dataset case

388d69a

#4821 fix doc for add metrics

da71e5c

doc improvements #4821

44acd97

Switch from /home to /usr/local. Add missing trailing quotes.

have doc examples match yaml config #4821

51cfde8

#4821 fix retrieve citations api

d2cf1a7

rename to "citationUrl" to allow for future fields #4821

865c99b

more docs and debug output #4821

d97125c

add datasetexternalcitations table to diagram #4821

677bf7a

capture some decisions from tech hours #4821

1b527aa

Merge branch 'develop' into 4821-make-data-count

23c4748

Log metadata downloads #4821

a92ad9d

#4821 add support for non-country metrics

522c26f

MDC export regex match 0 or more #4821

545512a

Revert "capture some decisions from tech hours #4821 "

f2d7d54

This reverts commit 1b527aa.

MDC mult ways to get datasetVersion in guestbook #4821

278568d

MDC expanded dataset api logging #4821

6375598

MDC Missed basic dataset json api #4821

0b46993

The fun never ends!

Merge branch 'develop' into 4821-make-data-count

b14c921

kcondon merged commit 9439313 into develop Mar 15, 2019

kcondon deleted the 4821-make-data-count branch March 15, 2019 17:53

pdurbin mentioned this pull request Mar 26, 2019

Find candidate beta testers for log processor (i.e. Dataverse) CDLUC3/Make-Data-Count#99

Closed

pdurbin added this to the 4.12 milestone Mar 29, 2019
