Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Profile page performance issue Fix #5472

Merged

Conversation

saadmk11
Copy link
Member

closes #5468

@safwanrahman
Copy link
Member

This seems like a elegant fix for the optimization. I did not know about solving the perfomance issue using the Exits query.

Copy link
Contributor

@davidfischer davidfischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized the project_list_detailed.html template is also used to display the pages for tags (eg. /projects/tags/tag-slug/). As-is, since on that page there is no project.good_build attribute, it shows all the projects with no builds.

I see two possibilities going forward here:

  • We also need to change the tags view to have a similar annotation
  • We can make the annotation write to a "private" variable (like _good_build) and then the project.has_good_build method can check if that variable exists and use it if it does and fallback to making an extra query.

This is becoming less of a good first issue and if you need help with it I'm happy to take it back and build on top of what you've done already.

@saadmk11
Copy link
Member Author

@davidfischer I would like to give it a try if its okay with you.

@saadmk11
Copy link
Member Author

@davidfischer Updated The PR please have a look and let me know if the implementation is what you wanted. and what do you think I should do for the other annotation latest_build_date as it is used in /projects/tags/tag-slug/ page also.

@davidfischer
Copy link
Contributor

I like the solution for has_good_build and that was exactly what I was thinking!

I think we can use a similar approach for latest_build_date. Perhaps there's a way to get the whole build object and re-use the property project.get_latest_build?

@saadmk11
Copy link
Member Author

saadmk11 commented Mar 20, 2019

@davidfischer Great! Actually I Didn't use get_latest_build because it created an extra Database query. But now following this approach we can solve it. But I was thinking we don't use the full build object on the template and also I was not able to find a way to get a single build object in the same query. so can we just get the latest build date by adding another property or can we add the variable on the tag view also using the same approach? What do you think?

@davidfischer
Copy link
Contributor

This was a little trickier than I thought but you can control the prefetching with a Prefetch object:

from readthedocs.projects.models import Project
from readthedocs.builds.models import Build

from django.db.models import Prefetch


latest_build = Prefetch('builds', queryset=Build.objects.filter(state='finished', type='html'), to_attr='_latest_build')

# Executes 2 queries
projects = list(Project.objects.all().prefetch_related(latest_build))

# This executes 0 queries
for project in projects:
    print(project.slug)
    if project._latest_build:
        # It is a little awkward that "_latest_build" is a list... and it could be empty
        print(' - {}'.format(project._latest_build[0].date))

The above code isn't ideal but it should give you some tips on working this out.

@safwanrahman
Copy link
Member

safwanrahman commented Mar 22, 2019

@davidfischer The queryset projects = list(Project.objects.all().prefetch_related(latest_build)) will fetch all the builds from all the projects that are selected. I think its more likely memory heavy and make the queryset also much slow.

Do you think its good enough for fetching the latest build data that we need or fetch the whole object?

@davidfischer
Copy link
Contributor

I think its more likely memory heavy and make the queryset also much slow.

Hmmmm, you're right. Maybe there's a way to limit it to just 1 result.

@davidfischer
Copy link
Contributor

Do you think its good enough for fetching the latest build data that we need or fetch the whole object?

A lot of things get simpler if we can fetch the whole build object. However, we don't need to fetch 100s of build objects if possible.

@davidfischer
Copy link
Contributor

Thanks for looking at this @safwanrahman. I think this will do it if you wouldn't mind taking another look:

from django.db.models import OuterRef, Subquery, Prefetch

from readthedocs.projects.models import Project
from readthedocs.builds.models import Build


subquery = Subquery(Build.objects.filter(state='finished', type='html', project=OuterRef('project_id')).values_list('id', flat=True)[:1])
latest_build = Prefetch('builds', Build.objects.filter(pk__in=subquery), to_attr='_latest_build')
projects = list(Project.objects.all().prefetch_related(latest_build))

@saadmk11
Copy link
Member Author

@davidfischer Thanks a lot for your help. I tried out your code in the shell and also in the application but I can't seem to reduce the number of Database Queries. When ever project._latest_build is called it creates a new Query to the database.

>>> from django.db import connection
>>> from django.db.models import OuterRef, Subquery, Prefetch
>>> from readthedocs.projects.models import Project
>>> from readthedocs.builds.models import Build
>>> 
>>> subquery = Subquery(Build.objects.filter(state='finished', type='html', project=OuterRef('project_id')).values_list('id', flat=True)[:1])
>>> latest_build = Prefetch('builds', Build.objects.filter(pk__in=subquery), to_attr='_latest_build')
>>> # Executes 2 queries
>>> projects = list(Project.objects.all().prefetch_related(latest_build))
>>> 
>>> # Executes 8 queries as 8 projects had builds.
>>> for project in projects:
...     if project._latest_build:
...         project._latest_build
... 
[<Build: Build Cue for AnonymousUser admin (11)>]
[<Build: Build Django Test Utils for AnonymousUser admin (13)>]
[<Build: Build Fabric for AnonymousUser admin (15)>]
[<Build: Build Kong for AnonymousUser admin (21)>]
[<Build: Build Pinax for AnonymousUser admin (19)>]
[<Build: Build Read The Docs for AnonymousUser admin (9)>]
[<Build: Build South for AnonymousUser admin (20)>]
[<Build: Build Tastypie for AnonymousUser admin (16)>]
>>> len(connection.queries)
10

@davidfischer
Copy link
Contributor

Those queries are coming from querying the project_user table from the __str__ method on build. (the "for AnonymousUser" part). If you print say only the build date and ID you won't see them.

@saadmk11
Copy link
Member Author

@davidfischer I have Updated the PR according to your recommendations.
I have removed state='finished', type='html' from the filter as previously the template wasn't using the project.get_latest_build() method it was just getting the latest build like project.builds.all.0. Let me know what you think about the PR and thanks a lot for Guiding me. 👍

Copy link
Contributor

@davidfischer davidfischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This worked well and performed much better locally for me!

@saadmk11
Copy link
Member Author

Great

@davidfischer
Copy link
Contributor

I'm interested in getting this merged soon if possible.

Copy link
Member

@humitos humitos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't test it locally, but changes look good to me.

@davidfischer davidfischer merged commit 768856e into readthedocs:master May 9, 2019
@saadmk11 saadmk11 deleted the profile-page-performance-fix branch May 10, 2019 04:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Profile page performance issue
4 participants