Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store Pageviews in DB #6121

Merged
merged 30 commits into from
May 19, 2020
Merged

Store Pageviews in DB #6121

merged 30 commits into from
May 19, 2020

Conversation

dojutsu-user
Copy link
Member

@dojutsu-user dojutsu-user commented Aug 29, 2019

Related: #5968

@dojutsu-user dojutsu-user added the PR: work in progress Pull request is not ready for full review label Aug 29, 2019
Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good first approach. It needs to be cleaned up a bit though with the modeling.

readthedocs/api/v2/views/footer_views.py Outdated Show resolved Hide resolved
readthedocs/search/models.py Outdated Show resolved Hide resolved
readthedocs/search/models.py Outdated Show resolved Hide resolved
@ericholscher ericholscher changed the title Search results sorted by pageviews Store Pageview in DB Sep 30, 2019
@ericholscher ericholscher changed the title Store Pageview in DB Store Pageviews in DB Sep 30, 2019
@ericholscher ericholscher requested a review from a team October 31, 2019 15:38
Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks quite close to being ready. Just a few places where it can be cleaned up or simplified.

@@ -210,6 +211,14 @@ def get(self, request, format=None):
'version_supported': version.supported,
}

# increase the page view count
page_slug = request.GET.get('page', 'index')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should likely not default to index, no? If we don't have a page, we probably shouldn't be storing anything, or some other string like rtd-unknown-page, not index.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen any real examples where the page parameter is blank. It is index for the root and even with the dirhtml builder it uses the subdirectory name. I'll add a simple if page_slug check.

readthedocs/projects/views/private.py Outdated Show resolved Hide resolved
readthedocs/projects/views/private.py Outdated Show resolved Hide resolved
readthedocs/rtd_tests/tests/test_footer.py Show resolved Hide resolved
readthedocs/search/models.py Outdated Show resolved Hide resolved
readthedocs/search/models.py Outdated Show resolved Hide resolved
readthedocs/search/utils.py Outdated Show resolved Hide resolved
@stale
Copy link

stale bot commented Dec 16, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: stale Issue will be considered inactive soon label Dec 16, 2019
@ericholscher ericholscher added the Accepted Accepted issue on our roadmap label Dec 17, 2019
@stale stale bot removed the Status: stale Issue will be considered inactive soon label Dec 17, 2019
@ericholscher
Copy link
Member

This is still useful, but I want to wait until our DB load calms down a bit before adding additional load to it.

readthedocs/search/tasks.py Outdated Show resolved Hide resolved
readthedocs/search/models.py Outdated Show resolved Hide resolved
readthedocs/search/models.py Outdated Show resolved Hide resolved
@davidfischer
Copy link
Contributor

I refactored this and brought it to a merge-ready state.

  • There's a periodic task to remove old page views
  • The screen (see screenshot) now shows aggregated pageviews by day. That's a total traffic graph.

I am interested to see what this does to our database as it will add a (simple) write query in a celery task on every Footer API call.

Screenshot

Screen Shot 2020-05-08 at 2 12 57 PM

@davidfischer davidfischer removed the PR: work in progress Pull request is not ready for full review label May 8, 2020
@@ -977,15 +975,6 @@ class RegexAutomationRuleUpdate(RegexAutomationRuleMixin, UpdateView):
pass


@login_required
def search_analytics_view(request, project_slug):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what this view is. It doesn't appear to be used or valid.

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I'm curious if we should be keeping the pageview data in the search app, but I think it's fine for now?

I'm also super curious what this will do to our celery queues and DB, but I think it shouldn't be too much. A simple update isn't a ton of work, but we'll see :)

The nice thing is we can always limit the celery workers or turn them off in some fashion, so we have a reasonable escape hatch.

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more tidbits.


This is intended to run from a periodic task daily.
"""
thirty_days_ago = timezone.now().date() - timezone.timedelta(days=30)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably log something, so we know it's running properly.



@app.task(queue='web')
def increase_page_view_count(project_slug, version_slug, path):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if running this as a task is actually a good deal slower then running it in the footer itself? We already have the Project & Version object in the footer, and not we're adding additional queries for them here. They should be fast queries, but definitely additional load.

I guess the proper solution is to cache these lookups, which we can do later if it's an issue.

@davidfischer
Copy link
Contributor

This looks good to me. I'm curious if we should be keeping the pageview data in the search app, but I think it's fine for now?

I had this thought myself. I don't have a strong opinion either way. I'd be fine moving it into the Project app or a case could even be made for the Build app as it is Version specific. The case for the search app is that that is where the data is most useful: for ordering search results. However, as we have a page in the project admin, maybe that isn't the right place.

I'd rather get it into the right place the first time it's deployed. Moving it could be a bit of a pain. If you have an opinion, let's get that in.

@ericholscher
Copy link
Member

Well, arguably the Versions should be in the Project app instead of Builds, so we're already in a confusing place to start :)

I think we put them in Search originally because we wanted to use them to weight search results by pageview count. I think it probably makes sense for them to be in their own app, something like analytics or pageviews, but again, I don't have super clear idea of whats the best.

@ericholscher
Copy link
Member

Ah, we already have an analytics app, so that probably makes sense :)

@ericholscher
Copy link
Member

@davidfischer it would be good to get this patched up and shipped this week if possible.

@madwaxamillion
Copy link

madwaxamillion commented May 19, 2020 via email

@davidfischer
Copy link
Contributor

I moved the pageview modeling from the search app to the analytics app. This is ready for a full review.

@ericholscher
Copy link
Member

Looks good 🎉

@ericholscher ericholscher merged commit 4fa419b into readthedocs:master May 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Accepted Accepted issue on our roadmap
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants