Generate verbose data usage logs for processing and communication with Make Data Count #5384
Comments
This example from counter-processor is a good reference for how we want to format our logs.
Actual logging only takes place on the dataset page, and even that is incomplete
To work on this, I've piped some log output from my logging code into counter-processor (using @pdurbin's vagrant setup in #5385). It parses my log file into the database ok, but then chokes when trying to generate the sushi. Not done, but definitely progress. (Note: in between runs, if you delete both files in the
I was just passing along to @sekmiller that @matthew-a-dunlap has already identified the issue with "publisher_id" above, but from running the code as of 2e2965c, we are not yet populating publisher-id (I just noticed As I mentioned at #4821 (comment), the sample log uses "publisher" and "publisher id" like this:
GRID seems to be https://en.wikipedia.org/wiki/Global_Research_Identifier_Database (perhaps a little like ISNI?). A workaround for now is to hack on the logs and replace "-" with "1" in the appropriate "publisher id" field so that counter-processor doesn't throw the exception above. I guess I'm a little confused about who the publisher is supposed to be and whether that publisher will always have an id.
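The log-patching workaround described above can be sketched roughly as follows. This is only an illustration, not code from Dataverse or counter-processor: the function name `patch_publisher_id` is hypothetical, the log is assumed to be tab-separated with header lines starting with `#`, and the position of the "publisher id" column is passed in by the caller because it depends on the field order of the actual log.

```python
def patch_publisher_id(lines, column, placeholder="1"):
    """Yield log lines, replacing "-" in one tab-separated column.

    `column` is the zero-based index of the "publisher id" field.
    It is an assumption supplied by the caller, not a known constant.
    """
    for line in lines:
        stripped = line.rstrip("\n")
        # Pass blank lines and header/comment lines through unchanged
        # (assumes such lines start with "#").
        if not stripped or stripped.startswith("#"):
            yield line
            continue
        fields = stripped.split("\t")
        # Only touch the field when it holds the "-" missing-value marker.
        if column < len(fields) and fields[column] == "-":
            fields[column] = placeholder
        newline = "\n" if line.endswith("\n") else ""
        yield "\t".join(fields) + newline
```

Usage would be along the lines of `patched = list(patch_publisher_id(open(path), column=n))`, with `n` read off the log's own field list. Since this only masks the missing value, it is a stopgap until a real publisher id is logged.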
Lots of comments in this commit, will clean up soon
Next step is to pipe these into counter processor and see what happens on the MDC server
This is incomplete: most downloads don't log to the guestbook until the end, via the DownloadInstanceWriter, and it is unclear how to get this info that late in the pipeline.
The Access API and DownloadInstanceWriter now have the info needed to create an MDC entry. This included adding a new constructor for MDC logging that takes uriInfo and headers.
At this point our log output, with our custom regex, is readable by Counter Processor and generates sushi. Right now it is not reporting ANY unique investigations, and it almost certainly should. More investigation is needed.
It seems that our lack of a publisher for our Datasets/Files is a problem when submitting to Make Data Count. Those fields are required. We can bypass this check if we submit the field blank with a type chosen.
But I suspect this is not good practice and may bite us down the road. This is the error if it's omitted:
This is the error if it's blank:
I processed the logs into a sushi report which I tried to import into the
I had to tweak the config.yaml file: counter-processor-config.yaml.txt. I was running 69d48c7. Here's the sushi json file: sushi69d48c7.json.txt
To generate information on views/downloads/citations for Make Data Count, the first step is to log the raw usage data for later processing.
Our current goal in this logging is to have the syntax match what is used by Counter Processor. This is so we can potentially use Counter Processor to process our raw logs. Even if we do not use Counter Processor, this syntax is a good starting point for our development efforts.
See this doc for some thoughts about our design path.
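As a rough illustration of the logging goal described above, here is a minimal sketch of writing one raw usage entry as a single delimited line. Everything specific in it is an assumption: the field list is a hypothetical, abbreviated one (the real names and order should be taken from the Counter Processor sample log), the tab delimiter is assumed, and `format_entry` is not an existing Dataverse or Counter Processor function. Missing values are written as "-", the placeholder mentioned in the comments on this issue.

```python
# Hypothetical, abbreviated field list for illustration only; the real
# names and order must come from the Counter Processor sample log.
FIELDS = [
    "event_time",
    "client_ip",
    "request_url",
    "identifier",
    "publisher",
    "publisher_id",
]


def format_entry(entry, fields=FIELDS):
    """Return one tab-separated log line for a dict of usage data."""
    values = []
    for name in fields:
        value = entry.get(name)
        # Missing or empty values become the "-" placeholder.
        text = "-" if value is None or value == "" else str(value)
        # Tabs or newlines inside a value would break the column layout.
        values.append(text.replace("\t", " ").replace("\n", " "))
    return "\t".join(values)
```

Keeping the field list in one place means the column order can be changed to match the sample log without touching the formatting logic.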