Metrics APIs: Dataverses, Datasets, Files #4527

Closed
djbrooke opened this issue Mar 20, 2018 · 28 comments

Comments

@djbrooke
Contributor

Following the discussion in #4478, we decided to add some new metrics endpoints to Dataverse based on the model that's currently provided by Miniverse:

https://services.dataverse.harvard.edu/static/swagger-ui/index.html?url=/miniverse/metrics/v1/swagger.yaml

Let's estimate this as a group (all three endpoints) and if the estimate comes back high we can discuss breaking it up.

@djbrooke changed the title from "Metrics Endpoints: Dataverses, Datasets, Files" to "Metrics APIs: Dataverses, Datasets, Files" Mar 20, 2018
@djbrooke
Contributor Author

djbrooke commented Apr 4, 2018

@pdurbin
Member

pdurbin commented Apr 16, 2018

Today I stubbed out some code at fa8b601

pdurbin added a commit that referenced this issue Apr 19, 2018
@pdurbin
Member

pdurbin commented Apr 20, 2018

In order to make this a little more real, I created a repo at https://github.com/IQSS/metrics.dataverse.org to start switching the front end code from pulling data from miniverse to pulling data from an installation of Dataverse. For now that installation of Dataverse is my server at https://dev1.dataverse.org running c62e7bb.
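As a rough illustration of what pulling a metric from an installation of Dataverse (rather than from miniverse) could look like, here is a minimal Python sketch. The path `api/info/metrics/dataverses/byCategory` and the `{"status": "OK", "data": ...}` response envelope are assumptions made for illustration and are not confirmed anywhere in this thread; the helper names `metrics_url`, `parse_metric`, and `fetch_metric` are hypothetical.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumed path, for illustration only; the real route may differ.
METRICS_PATH = "api/info/metrics/dataverses/byCategory"


def metrics_url(base_url: str, path: str = METRICS_PATH) -> str:
    """Join an installation's base URL and a metrics path."""
    return urljoin(base_url.rstrip("/") + "/", path)


def parse_metric(body: str):
    """Return the 'data' payload, assuming a {"status": ..., "data": ...} envelope."""
    payload = json.loads(body)
    if payload.get("status") != "OK":
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    return payload["data"]


def fetch_metric(base_url: str, path: str = METRICS_PATH):
    """Fetch and parse one metric over HTTP (needs network access)."""
    with urlopen(metrics_url(base_url, path)) as resp:
        return parse_metric(resp.read().decode("utf-8"))
```

Keeping URL construction and response parsing separate from the network call means a front end could be pointed at a different installation, or tested against canned JSON, without touching the parsing code.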

I started with what I suspect is the easiest endpoint to migrate, which is called "Dataverses by Category". Below you can see https://dataverse.org/metrics on the left and my implementation from the new repo above on the right. As I migrate the other five metrics over, they'll appear in the version on the right:

[Screenshot, 2018-04-20 2:33:49 PM: https://dataverse.org/metrics (left) beside the new implementation (right)]

I guess I'll add a todo list. There are 6 metrics to implement:

  • Dataverses Added Over Time. Started in 4e3c958
  • Dataverses by Category. Done in fa8b601
  • Datasets Added Over Time. Started in fdd609b
  • Datasets by Subject. Started in 40ae4b1
  • Files Added Over Time
  • File Downloads Over Time. Started in 9352882 but needs work.

From the perspective of Swagger, these are the 6 from https://services.dataverse.harvard.edu/static/swagger-ui/index.html?url=/miniverse/metrics/v1/swagger.yaml

[Screenshot, 2018-04-20 3:02:47 PM: the six metrics as listed in Swagger]

@djbrooke
Contributor Author

Discussed with @pdurbin (and got a demo!). Adding the six endpoints above to the Dataverse application is the correct scope of this issue. There are some other endpoints that are interesting (bytes used, for example), but we can develop iteratively. We'll work on an updated aggregator/reporting application in a later issue, but we want to get these initial endpoints into a release sooner rather than later.

@pdurbin
Member

pdurbin commented Apr 20, 2018

I just chatted with @dlmurphy and it sounds like he and @jggautier may have already written some of the remaining queries we need as part of #4169. I can, of course, also refer to the miniverse code and maybe even run miniverse on my laptop to have it show me the raw SQL queries. The miniverse code seems to use the Django ORM, which is an abstraction.

@matthew-a-dunlap removed their assignment May 15, 2018
@kcondon self-assigned this May 16, 2018
pdurbin added a commit that referenced this issue May 21, 2018
Conflicts (methods added in same location in file):
src/main/java/edu/harvard/iq/dataverse/api/Admin.java
@pdurbin
Member

pdurbin commented May 21, 2018

There were minor merge conflicts (methods added in the same place in Admin.java) so I just resolved them in 6cdad11

@kcondon
Contributor

kcondon commented May 21, 2018

OK, generally works well but found a few items:

  • Might want to mention the default cache timeout in the docs.
  • Many of the query endpoints use concat(), which is incompatible with Postgres 8.4.
  • Datasets by subject counts seem correct in a small test env but are way off on the prod db (9k in the UI versus 22k from the API).
  • datasetsBySubject and dataversesByCategory do not update their lastcalleddate value when accessed and do not seem to clear the cached value when the cache times out.
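To make the expected cache behavior in the last item concrete, here is a minimal Python sketch of a timeout-based cache that stores a timestamp next to each value and replaces both once the entry has expired. This is not the Dataverse implementation (which is Java); the class `TimedCache`, its method names, and the timeout value used below are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TimedCache:
    """Cache query results and recompute them once they are older than a timeout."""

    def __init__(self, timeout_seconds: float,
                 clock: Callable[[], float] = time.time):
        self.timeout_seconds = timeout_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        # metric name -> (cached value, time the value was computed)
        self._entries: Dict[str, Tuple[Any, float]] = {}

    def get(self, name: str, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(name)
        if entry is not None and now - entry[1] < self.timeout_seconds:
            return entry[0]  # still fresh: serve the cached value
        value = compute()  # missing or expired: rerun the query
        self._entries[name] = (value, now)  # replace value and timestamp together
        return value

    def last_computed(self, name: str) -> Optional[float]:
        """Return when the cached value was last refreshed, or None if absent."""
        entry = self._entries.get(name)
        return None if entry is None else entry[1]
```

A call such as `cache.get("datasetsBySubject", run_query)` would then rerun `run_query` (a hypothetical callable) only after the timeout has passed, and the stored timestamp moves forward each time it does.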

@matthew-a-dunlap
Contributor

The caching issue should be fixed with the latest commit.

@djbrooke
Contributor Author

(sorry for the delay in this comment)

@kcondon - @matthew-a-dunlap volunteered to take on the remaining issues in standup. I left it assigned to you in case you wanted to test the fixes in parallel, but moving it back to development is OK too.

matthew-a-dunlap added a commit that referenced this issue May 22, 2018
Removed concat usage
Added correct datasetsBySubject query
Updated docs on default metrics timeout
@matthew-a-dunlap
Contributor

matthew-a-dunlap commented May 22, 2018

While fixing the issues found during QA, a few extra issues were found and resolved:

  • The datasetsBySubject query no longer has a date component. The new query originally had a check on the month.
  • The 'byMonth' queries were renamed to be 'toMonth', as this is more descriptive of their cumulative nature.
    • The Metrics API documentation now describes this better as well.

All but one of the queries were touched during these changes, and they all should probably be re-tested.
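The difference between a per-month count and a cumulative "to month" count can be sketched in a few lines of Python. This only illustrates the naming; it is not the SQL used in the actual queries, and the function name `to_month_counts` and the sample numbers are made up.

```python
from itertools import accumulate
from typing import Dict


def to_month_counts(by_month: Dict[str, int]) -> Dict[str, int]:
    """Turn per-month counts into running totals up to and including each month.

    Keys are assumed to be "YYYY-MM" strings, which sort chronologically.
    """
    months = sorted(by_month)
    totals = accumulate(by_month[m] for m in months)
    return dict(zip(months, totals))


# Made-up sample data: items added in each month.
added = {"2018-01": 5, "2018-02": 4, "2018-03": 2}
print(to_month_counts(added))
# {'2018-01': 5, '2018-02': 9, '2018-03': 11}
```

Under this reading, a "toMonth" value for 2018-03 answers "how many existed by the end of March", not "how many were added in March".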
