User usage statistic #2729

Closed
pengchengluo opened this issue Nov 6, 2015 · 15 comments

Comments

@pengchengluo
Contributor

As far as I know, Dataverse has no usage statistics function and doesn't log user usage information, such as who downloads data or who views dataset and dataverse pages. However, this is a very important function for data providers. At our university, the faculty who provide datasets are eager to know who uses their data, when they download it, and how they use it. Usage information will help data providers understand their users.

So we wonder whether Dataverse will provide such a function in a future release; we hope it will. For now, we have to do it ourselves: log some user behavior information, index it in Elasticsearch, and present it on a web page.

@bencomp
Contributor

bencomp commented Nov 6, 2015

Dataverse has Guestbooks that log downloads, and Google Analytics support is built in. I argued for alternatives like Piwik in #1594. There is a table in the database, called the action log, that does log user actions (added after my request in #1532).

Logging to the server log can be improved, though, as requested in #2575.

Also related: #568, #2485.

(edited to add a note about guestbooks and more issue references.)

@pdurbin
Member

pdurbin commented Nov 6, 2015

The term we often use is "metrics" and here are some related issues: #1971 #2101 #2417

As @bencomp mentioned, for downloads and uh, explorations anyway, you could look at the "guestbookresponse" table:

dvndb=> select id,downloadtype,name,datafile_id, dataset_id from guestbookresponse;
 id | downloadtype |      name       | datafile_id | dataset_id 
----+--------------+-----------------+-------------+------------
  1 | Download     | *********       |          18 |         16
  2 | Download     | *************** |          86 |         82
  3 | Download     | Guest           |          88 |         82
  4 | Download     | Guest           |          86 |         82
  5 | Download     | ************    |          51 |         49
  6 | Download     | Guest           |          91 |         90
  7 | Explore      | Guest           |          91 |         90
  8 | Download     | Guest           |          91 |         90
  9 | Download     | Guest           |          91 |         90
 10 | Download     | Guest           |          91 |         90
 11 | Explore      | Guest           |          91 |         90
 12 | Download     | Guest           |          91 |         90
 13 | Download     | Guest           |          91 |         90
 14 | Download     | Guest           |          92 |         90
 15 | Explore      | Guest           |          91 |         90
 16 | Download     | Guest           |          92 |         90
 17 | Explore      | Guest           |          91 |         90
 18 | Download     | Guest           |          91 |         90
 19 | Explore      | Guest           |         107 |        101
 20 | Explore      | Guest           |          91 |         90
 21 | Download     | Guest           |          91 |         90
 22 | Download     | Guest           |          92 |         90
 23 | Download     | Guest           |          91 |         90
 24 | Download     | Guest           |          70 |         69
(24 rows)
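
For per-dataset totals rather than raw rows, a `GROUP BY` over the same table is probably enough. Here is a minimal sketch, assuming only the columns shown in the query above. It uses an in-memory SQLite database as a stand-in for the real PostgreSQL one, loaded with a handful of rows copied from that output; the function name `usage_by_dataset` is made up for illustration.

```python
import sqlite3

# A few rows copied from the query output above:
# (downloadtype, datafile_id, dataset_id)
ROWS = [
    ("Download", 18, 16),
    ("Download", 86, 82),
    ("Download", 88, 82),
    ("Download", 91, 90),
    ("Explore", 91, 90),
    ("Download", 92, 90),
    ("Explore", 107, 101),
]


def usage_by_dataset(rows):
    """Return (dataset_id, downloadtype, count) tuples, busiest first."""
    # In-memory SQLite stands in for the real PostgreSQL database here.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE guestbookresponse ("
        "id INTEGER PRIMARY KEY, downloadtype TEXT, "
        "datafile_id INTEGER, dataset_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO guestbookresponse (downloadtype, datafile_id, dataset_id) "
        "VALUES (?, ?, ?)",
        rows,
    )
    result = conn.execute(
        "SELECT dataset_id, downloadtype, COUNT(*) AS n "
        "FROM guestbookresponse "
        "GROUP BY dataset_id, downloadtype "
        "ORDER BY n DESC, dataset_id, downloadtype"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for dataset_id, downloadtype, n in usage_by_dataset(ROWS):
        print(dataset_id, downloadtype, n)
```

The `SELECT ... GROUP BY` statement itself should run unchanged in psql against the real table.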

Right, you enable access logs for Glassfish and use a tool to generate stats. I'd actually be very interested in your script to put this into Elasticsearch and what sort of queries you do, and how you present this on a web page @pengchengluo ! Maybe we could use it on http://dataverse.org as part of #2417.
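
As a rough idea of what "use a tool to generate stats" could look like, here is a small sketch that counts dataset page views from access log lines. It assumes the log is written in Common Log Format (the Glassfish access log format is configurable, so the regex may need adjusting) and that dataset pages are requested as `/dataset.xhtml?persistentId=...`. The sample lines and DOIs are invented.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Common Log Format: host ident user [time] "METHOD path PROTOCOL" status bytes
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Invented sample lines, for illustration only.
SAMPLE = [
    '192.0.2.10 - - [06/Nov/2015:10:00:00 -0500] "GET /dataset.xhtml?persistentId=doi:10.5072/FK2/AAAAAA HTTP/1.1" 200 5120',
    '192.0.2.11 - - [06/Nov/2015:10:01:00 -0500] "GET /dataset.xhtml?persistentId=doi:10.5072/FK2/AAAAAA HTTP/1.1" 200 5120',
    '192.0.2.12 - - [06/Nov/2015:10:02:00 -0500] "GET /dataset.xhtml?persistentId=doi:10.5072/FK2/BBBBBB HTTP/1.1" 200 4096',
    '192.0.2.13 - - [06/Nov/2015:10:03:00 -0500] "GET /dataset.xhtml?persistentId=doi:10.5072/FK2/CCCCCC HTTP/1.1" 404 512',
    '192.0.2.14 - - [06/Nov/2015:10:04:00 -0500] "GET /dataverse.xhtml HTTP/1.1" 200 2048',
]


def count_dataset_views(lines):
    """Count successful GET requests for the dataset page, per persistentId."""
    views = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m["method"] != "GET" or m["status"] != "200":
            continue  # skip unparseable lines, non-GET requests, and errors
        url = urlsplit(m["path"])
        if url.path != "/dataset.xhtml":
            continue  # only dataset pages are counted in this sketch
        pid = parse_qs(url.query).get("persistentId")
        if pid:
            views[pid[0]] += 1
    return views


if __name__ == "__main__":
    for pid, n in count_dataset_views(SAMPLE).most_common():
        print(pid, n)
```

Note that this only gives anonymous counts per dataset; it says nothing about who the user was, which is what the Guestbook records.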

@bencomp also mentioned Google Analytics. @eaquigley has figured out how to tell what people are searching on.

@pdurbin added the "UX & UI: Design" and "Type: Suggestion" labels Nov 6, 2015
@pdurbin
Member

pdurbin commented Nov 6, 2015

Oh, and I imagine some of this will be useful on dashboard for superusers: #840

That said, I understand that this issue has a focus on the individual researchers. They'd like to know who is downloading their data and how it's being used. @pengchengluo maybe you can give us feedback on the Guestbook feature: http://guides.dataverse.org/en/4.2.1/user/dataverse-management.html#dataset-guestbooks . I'm sure some improvements could be made.

@pengchengluo
Contributor Author

Thanks to @pdurbin and @bencomp for the help! The actionlogrecord and guestbookresponse tables do indeed record some of the information we need.

However, in my opinion, it would be more efficient to index the data in a search engine such as Elasticsearch. As the log data grows, the load on the database will increase, which will affect the performance of other functions. Elasticsearch scales horizontally and can handle large volumes of data, so it is a good choice as the log store and analysis engine.

I record user behavior such as viewing a dataverse, viewing a dataset, downloading a file, requesting to join an explicit user group (which we implemented in the web interface), accepting a user's request, rejecting a user's request, and so on. Each log entry contains basic information such as event type, IP address, timestamp, referrer, and user id, plus other useful information such as dataverse id, dataset id, and group id. The log information is sent to Elasticsearch in real time using the JestClient. Dataverse admins can view and search these log events in the web interface.
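
To make the shape of such a log event concrete, here is a minimal sketch (in Python rather than the Java/JestClient code actually used) that builds an event with the fields listed above and serializes a batch as newline-delimited JSON in the format the Elasticsearch `_bulk` endpoint accepts. The field names, the `usage-log` index name, and both function names are assumptions for illustration, not the real schema. Depending on the Elasticsearch version, a document type may also be required.

```python
import json
from datetime import datetime, timezone


def build_event(event_type, ip, user_id, referrer=None, **ids):
    """Build one usage event as a plain dict (all field names are illustrative)."""
    event = {
        "event_type": event_type,
        "ip_address": ip,
        "user_id": user_id,
        "referrer": referrer,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # Optional context such as dataverse_id, dataset_id, datafile_id, group_id.
    event.update({k: v for k, v in ids.items() if v is not None})
    return event


def to_bulk_payload(events, index="usage-log"):
    """Serialize events as newline-delimited JSON for the _bulk endpoint."""
    lines = []
    for event in events:
        # Each document is preceded by an action line naming the target index.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(event))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    events = [
        build_event("view_dataset", "192.0.2.10", "user1", dataset_id=90),
        build_event("download_file", "192.0.2.10", "user1",
                    dataset_id=90, datafile_id=91),
    ]
    print(to_bulk_payload(events), end="")
```

The sketch only builds the payload. Sending it (for example with an HTTP POST to `_bulk`) is left out on purpose.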

The following is an example of dataverse view statistics:

[screenshot: pkudvn]

The following is an example of datafile download statistics:

[screenshot: pkudvn2]

@posixeleni
Contributor

👍

@mercecrosas mercecrosas modified the milestone: In Review Nov 30, 2015
@eaquigley
Contributor

Hi @pengchengluo this is looking really cool! Are you planning on submitting a pull request for this so we can test it with the source code? Would love to be able to get this onto one of our test servers so I can play around with it and see what the user experience is like. Thanks for doing this!

@pdurbin
Member

pdurbin commented Dec 4, 2015

@pengchengluo I agree completely with @eaquigley that your visualizations are fantastic! Please do let us know if you're interested in contributing some code!


Meanwhile, I just heard about @metabase at https://medium.com/@metabase/why-we-picked-clojure-448bf759dc83 (someone tweeted it), which prompted me to go listen to https://changelog.com/182/ with @tlrobinson and @salsakran, and it seems really cool! I threw it on https://demo.dataverse.org (it only took a few minutes; it's just a jar to start, like Solr) and started playing around with the guestbookresponse table:

[screenshot: metabase_-_2015-12-03_19 49 51]

Metabase seems very promising! More at http://metabase.com and I highly recommend the blog post and podcast episode above.

@pengchengluo
Contributor Author

Hi @pdurbin, the faculty at our university are interested in who downloads their data and in the downloaders' details. Google Analytics only collects anonymous usage information. Therefore, we added this usage statistics function.

@eaquigley , @pdurbin We are very glad to share our code!

@pdurbin
Member

pdurbin commented Dec 11, 2015

> Are you planning on submitting a pull request for this so we can test it with the source code? Would love to be able to get this onto one of our test servers so I can play around with it and see what the user experience is like.

@eaquigley it looks like @pengchengluo just made a pull request at #2818 . Did you have a test server in mind?

@eaquigley
Contributor

@pdurbin beta.dataverse.org would be the test server in mind since it is intended to be the test machine that shows new features and functionality.

@bencomp
Contributor

bencomp commented Dec 11, 2015

It would be interesting to see the performance under load, as ElasticSearch can scale, but for it to scale I think you need multiple machines. Something for locust?

Server capacity at DANS is low, so although this work is very interesting, I would probably want to wait with setting up all of this.

@eaquigley
Contributor

@bencomp it would be interesting to set this up on one of our test machines (beta.dataverse.org) and run locust against it to see what happens to performance of the application.

@pdurbin
Member

pdurbin commented Sep 27, 2016

Since Piwik is mentioned above, just a heads up that support for Piwik was just merged (pull request #3374).

Also, on today's community call we talked about http://dataverse.org/metrics and how it's pointed at https://dataverse.harvard.edu but the code that powers it ( https://github.com/IQSS/miniverse ) can be pointed at any Dataverse installation. Here's a link to the notes: https://docs.google.com/document/d/1Bvxg8NxU3LV0yRBp5X-qOU1u7Ede7-HHDtwjuHPTJyc/edit?usp=sharing

@pdurbin
Member

pdurbin commented Jun 23, 2017

I'm closing this now that people can install https://github.com/IQSS/miniverse . If anyone objects, please let me know.

@pdurbin pdurbin closed this as completed Jun 23, 2017
@pdurbin
Member

pdurbin commented Oct 16, 2017

Please note that there is some activity regarding gathering metrics at #4169.

8 participants