Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve queries with related attributes to use tenant index #39

Open
hugobessa opened this issue Jun 22, 2018 · 2 comments
Open

Improve queries with related attributes to use tenant index #39

hugobessa opened this issue Jun 22, 2018 · 2 comments

Comments

@hugobessa
Copy link
Owner

hugobessa commented Jun 22, 2018

When we do filters with related fields, the database is making queries considering the whole related table, instead of only the portion of data from the current tenant. This makes queries very slow when there're many tenants.

One solution is implicitly changing the queryset to filter by the related table tenant too.

Eg.:

MyModel.objects.filter(my_related_field__field1=value1, my_related_field__field2=value2)

sql is something like:

SELECT "my_app_mymodel"."field1", "my_app_mymodel"."field2"
FROM "my_app_mymodel"
    INNER JOIN "my_app_myrelatedmodel"
    ON "my_app_myrelatedmodel"."mymodel_id" = "my_app_mymodel"."id"
WHERE
    "my_app_mymodel"."tenant_id" = tenant_id AND
    "my_app_myrelatedmodel"."field1" = value1 AND
    "my_app_myrelatedmodel"."field2" = value2

If we add the related field tenant's filter like this we solve the problem

MyModel.objects.filter(
    my_related_field__tenant_id=tenant_id, my_related_field__field1=value1, 
    my_related_field__field2=value2)

the sql with the fix should something like:

SELECT "my_app_mymodel"."field1", "my_app_mymodel"."field2"
FROM "my_app_mymodel"
    INNER JOIN "my_app_myrelatedmodel"
    ON "my_app_myrelatedmodel"."mymodel_id" = "my_app_mymodel"."id"
WHERE
    "my_app_mymodel"."tenant_id" = tenant_id AND
    "my_app_myrelatedmodel"."tenant_id" = tenant_id AND
    "my_app_myrelatedmodel"."field1" = value1 AND
    "my_app_myrelatedmodel"."field2" = value2

For excludes the situation is even worse because it always uses NOT IN operator and subqueries.

Eg.:

MyModel.objects.exclude(my_related_field__field1=value1, my_related_field__field2=value2)

sql is something like:

SELECT "my_app_mymodel"."field1", "my_app_mymodel"."field2"
FROM "my_app_mymodel"
WHERE
    "my_app_mymodel"."tenant_id" = tenant_id
    "my_app_mymodel"."my_related_field_id" NOT IN (
        SELECT "my_app_myrelatedmodel"."field1", "my_app_myrelatedmodel"."field2" 
        FROM "my_app_myrelatedmodel"
        WHERE "my_app_myrelatedmodel"."field1" = value1
    ) AND
    "my_app_mymodel"."my_related_field_id" NOT IN (
        SELECT "my_app_myrelatedmodel"."field1", "my_app_myrelatedmodel"."field2" 
        FROM "my_app_myrelatedmodel"
        WHERE "my_app_myrelatedmodel"."field2" = value2
    );

If we use the same strategy as in filter the result would be something like

SELECT "my_app_mymodel"."field1", "my_app_mymodel"."field2"
FROM "my_app_mymodel"
WHERE
    "my_app_mymodel"."tenant_id" = tenant_id AND
    "my_app_mymodel"."my_related_field_id" NOT IN (
        SELECT "my_app_myrelatedmodel"."field1", "my_app_myrelatedmodel"."field2" 
        FROM "my_app_myrelatedmodel"
        WHERE  
            "my_app_myrelatedmodel"."tenant_id" = tenant_id
    ) AND
    "my_app_mymodel"."my_related_field_id" NOT IN (
        SELECT "my_app_myrelatedmodel"."field1", "my_app_myrelatedmodel"."field2" 
        FROM "my_app_myrelatedmodel"
        WHERE 
            "my_app_myrelatedmodel"."field1" = value1
    ) AND
    "my_app_mymodel"."my_related_field_id" NOT IN (
        SELECT "my_app_myrelatedmodel"."field1", "my_app_myrelatedmodel"."field2" 
        FROM "my_app_myrelatedmodel"
        WHERE  
            "my_app_myrelatedmodel"."field2" = value2
    );

For it to be optimized, the generated code should be something like this:

SELECT "my_app_mymodel"."field1", "my_app_mymodel"."field2"
FROM "my_app_mymodel"
WHERE
    "my_app_mymodel"."tenant_id" = tenant_id AND
    "my_app_mymodel"."my_related_field_id" NOT IN (
        SELECT "my_app_myrelatedmodel"."field1", "my_app_myrelatedmodel"."field2" 
        FROM "my_app_myrelatedmodel"
        WHERE 
            "my_app_myrelatedmodel"."tenant_id" = tenant_id AND 
            "my_app_myrelatedmodel"."field1" = value1
    ) AND
    "my_app_mymodel"."my_related_field_id" NOT IN (
        SELECT "my_app_myrelatedmodel"."field1", "my_app_myrelatedmodel"."field2" 
        FROM "my_app_myrelatedmodel"
        WHERE  
            "my_app_myrelatedmodel"."tenant_id" = tenant_id AND 
            "my_app_myrelatedmodel"."field2" = value2
    );

The only way I could generate something that looks like the expected result was:

MyModel.objects.exclude(
    my_related__id__in=MyRelatedModel.objects.filter(field__field1=value1, field2=value2))

The resulting sql looks like this:

SELECT "my_app_mymodel"."field1", "my_app_mymodel"."field2"
FROM "my_app_mymodel"
WHERE
    "my_app_mymodel"."tenant_id" = tenant_id AND
    "my_app_mymodel"."my_related_field_id" NOT IN (
        SELECT "my_app_myrelatedmodel"."field1", "my_app_myrelatedmodel"."field2" 
        FROM "my_app_myrelatedmodel"
        WHERE 
            "my_app_myrelatedmodel"."tenant_id" = tenant_id AND 
            "my_app_myrelatedmodel"."field1" = value1
    ) AND
    "my_app_mymodel"."my_related_field_id" NOT IN (
        SELECT "my_app_myrelatedmodel"."field1", "my_app_myrelatedmodel"."field2" 
        FROM "my_app_myrelatedmodel"
        WHERE "my_app_myrelatedmodel"."id" IN (
            SELECT "my_app_myrelatedmodel"."id" 
            FROM "my_app_myrelatedmodel"
            WHERE  
                "my_app_myrelatedmodel"."tenant_id" = tenant_id AND 
                "my_app_myrelatedmodel"."field2" = value2
        )
    );
@techdragon
Copy link

I might have some ideas how to get these queries to come out more optimised, but I'm not 100% sure if I've followed your reasoning correctly.

This definitely seems like something where a concrete example case would be very helpful. If I had a concrete example I could import in a python REPL and work with I could spend some time in the REPL trying the various Query tools in Django (Aggregation functions, Q objects, Prefetch objects, FilteredRelation objects, etc...) to try and find a way to optimise the query. However if I just made up some model classes to try things I won't be sure if I've created an appropriate example and really found a way to fix these query optimisation issues, or if my example is too simplistic and my possible improvements don't fully solve the problems you're talking about here.

@hugobessa Could you add one or two examples that produce this kind of non-optimal queries from the ORM into the included exampleproject Django project, and/or into the test cases for future use in regression testing the optimisations?

@techdragon
Copy link

@hugobessa Its just occurred to me that any work to improve this may have conflicts with the work to make the Tennant model swappable and solve #37

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants