Problem with chained filters and Django ORM #745

eduardocalil · 2017-07-04T16:33:50Z

As related in the issue #537, the Django ORM generate duplicated INNER JOINS for each .filter() .

For example:
Model.objecs.filter(table1__attr1=value1).filter(table1__attr2=value2)

The built query is:

SELECT *
FROM model_table
INNER JOIN table1 ON (model_table.id = table1.model_table_id)
INNER JOIN table1 T5 ON (model_table.id = T5.model_table_id)
WHERE (table1.attr1=value1 AND T5.attr2=value2)

But the correct query should be:

SELECT *
FROM model_table
INNER JOIN table1 ON (model_table.id = table1.model_table_id)
WHERE (table1.attr1=value1 AND table1.attr2=value2)

The first query may return unwanted or unexpected results.

Reading the code I found the method def qs(self) in class BaseFilterSet from the file filterset.py and I realized the following workflow:

Get all the model objects using qs = self.queryset.all()
Iterates in the list of declared filters (fields)
In other words, for each field is done the same as qs = self.queryset.filter(field_n=value_n) , what causes the described problem.
When the iteration finishes, the queryset (qs) is returned

The solution that I've developed for the chained filters problem is:

    def qs(self):
        kwargs = {}
        if not hasattr(self, '_qs'):
            if not self.is_bound:
                self._qs = self.queryset.all()
                return self._qs

            if not self.form.is_valid():
                if self.strict == STRICTNESS.RAISE_VALIDATION_ERROR:
                    raise forms.ValidationError(self.form.errors)
                elif self.strict == STRICTNESS.RETURN_NO_RESULTS:
                    self._qs = self.queryset.none()
                    return self._qs
                # else STRICTNESS.IGNORE...  ignoring

            # start with all the results and filter from there
            qs = self.queryset.all()
            for name, filter_ in six.iteritems(self.filters):
                value = self.form.cleaned_data.get(name)

                if value is not None and value:  # valid & clean data
                    # If all o these conditions are true, it means that
                    # field (filter_) is filtering the queryset
                    # So, it must be added to the filtering dict
                    kwargs[filter_.name] = value

            # The same as qs.filter(filter_.name1=value1, filter_.name2=value2, ...)
            self._qs = qs.filter(**kwargs)

        return self._qs

Instead of making queries for each field, all of them are stored in a dictionary and the queryset is filtered only one time using this dict.

What do you think about this solution? Would it break other parts of the code?

The text was updated successfully, but these errors were encountered:

carltongibson · 2017-07-04T19:24:47Z

What do you think about this solution?

Still not yet convinced this is a DF issue, rather than an ORM one — what do the Django people say about this? This behaviour is as old as the ORM itself, so it's come up before.

Would it break other parts of the code?

There's an easy way to check that. 🙂 — Open an PR and see what breaks.

carltongibson · 2017-07-04T19:40:07Z

what do the Django people say

Also, what do the DB people say? If I make a double join, is that not a no-op? (Maybe it isn't — but, if not, why not? And, why is this a DF issue?)

rpkilby · 2017-07-04T21:56:29Z

If I'm not mistaken, this specifically refers to filtering across to-many relationships, which returns different results for .filter(related__a=1).filter(related__b=2) and .filter(related__a=1, related__b=2). There are no issues for to-one relationships.

If I make a double join, is that not a no-op?

Nope - the second join is aliased as T5, which is what the second where clause applies to. This is different from the latter example, where both where clauses operate on table1. The docs topic give a good example of the effective differences between the two queries. While both are technically valid, 90% of the time the latter query is what's intended, while the former is what django-filter constructs.

why is this a DF issue?

There really isn't a trivial way to implement the desired behavior at the moment.

rpkilby · 2017-07-04T22:05:24Z

Couple of side notes:

I thought about changing filters to return Q objects, however this would prevent filters from using other parts of the queryset API. Filters like OrderingFilter would not be possible.
Another solution is to use a subquery, but you need something to collect the related filters and create the nested query that they operated on. This is something I've started working on in Investigate using subqueries to filter across relationships philipn/django-rest-framework-filters#99, but progress is slow.

eduardocalil · 2017-08-01T13:49:35Z

I think that I've finally found the solution for this issue. Inside the model QuerySet exists a method called _next_is_sticky(). It prevents the duplicated INNER JOINS. You will undestand it better after reading a very short documentation about this method in the following link:

https://kite.com/docs/python/django.db.models.query.QuerySet._next_is_sticky

Here is my Pull Request: #753

eduardocalil · 2017-08-01T13:50:52Z

But it has not passed in django-rest tests. Can someone who understands how it works help me?

josert · 2017-10-03T17:21:36Z

As a quick win it would be really great to adapt this behavior until a more consistent feature was available (as a config parameter to determine if it is desired to work with stiky or not in the current filter).

Current behavior breaks the possibility to use this magnificient plugin in project where is neccessary to filter between relations as these, a real shame : (

steve-kasica · 2017-10-05T20:11:42Z

I agree that the _next_is_sticky() approach looks promising; however, I found the implementation in PR #753 brakes the desired behavior of ModelMultipleChoiceFilter where conjoined=True. As you might imagine, it executed an SQL query along these lines and returns the empty set.

SELECT *
FROM model_table
INNER JOIN table1 ON (model_table.id = table1.model_table_id)
WHERE (table1.attrID=value1 AND table1.attrID=value2)

Having a solution to this to-many chaining issue would greatly simplify my REST API that uses django-filter to filter results across many-to-many relationships. Please let me know how I can lend a hand.

carltongibson · 2017-10-06T06:40:50Z

I'm essentially -1 on making a change here.

First off, I'm not going to add anything on the basis of it being "a quick win" — that way lays trouble later.

Second, more to the point, it seems to me you should be handling this with a custom filter, custom field using a multi-widget. i.e. a single field expecting multiple values which is then used in a single filter call on the queryset.

I'd be interested in seeing what people come up with that way. No doubt there's something we can add, which doesn't involve setting state flags on the core filtering behaviour.

rpkilby · 2017-10-09T19:21:02Z

For what it's worth, I've been making a lot of headway on the nested filtering issue. Basically, queries will go from

Blog.objects.filter(entry__headline__contains='Lennon').filter(entry__pub_date__year=2008)

to something like

Blog.objects.filter(
    entry=Entry.objects.filter(
        headline__contains='Lennon',
        pub_date__year=2008,
    ),
)

In addition to enabling the correct behavior for to-many relationships, this will allow us to enable request-based filtering on related queries. eg, those entries that are currently published or are in the process of being authored by the user.

def visible_entries(request):
    return Entry.objects.filter(Q(is_published=True) | Q(author=request.user))

Blog.objects.filter(
    entry=visible_entries(request).filter(
        headline__contains='Lennon',
        pub_date__year=2008,
    ),
)

That said, I'm beating my head against the wall in regards to some potential performance issues. Unsurprisingly, instantiating n FilterSets is n times as expensive.

carltongibson · 2017-10-10T10:25:35Z

@rpkilby Push your branch. Let's see what you've got. 🙂

I don't want anything too complex. The point of DF is to avoid common boilerplate. There's nothing wrong in writing the filtering logic by hand in non-core cases. I much prefer that to magic no-one can debug.

carltongibson · 2017-10-17T14:28:27Z

Closing this as per #753

rpkilby · 2017-10-18T01:16:18Z

@carltongibson, I'm fairly close to pushing out a PR, but it's for the drf-filters project.

carltongibson · 2017-10-18T05:20:57Z

OK. Cool. Drop a note here when it’s ready. I’ll have a look.

rpkilby · 2017-10-24T15:12:05Z

Okay, a first pass is up (see: philipn/django-rest-framework-filters#197).

jonathan-golorry · 2018-01-30T06:08:11Z

I'm also encountering this issue. What was wrong with the dictionary approach? It seems much better than mucking around with next_is_sticky or nested querysets.

Edit: Ah, I think I understand the issue rpkilby is describing in the "couple of side notes" post. There are a lot of django ORM features like "distinct" that are locked off by sticking to a kwargs dictionary or even a series of Q objects.

jacebrowning · 2019-11-03T11:58:59Z

I'm also blocked by this issue. Including three or more queries of fields on a many-to-many relationship (e.g. /api/things?other__foo=1&other__bar=2&other__qux=3) cases my application to time out due to the repeated INNER JOIN statements.

Applying @eduardocalil's fix the the current version of django-filter is still a one-line change: citizenlabsgr@7e228f5

pawelzar · 2019-11-20T14:55:21Z

Hey, I also encountered some issues related to filtering on the relationships. When I want to use 2 filters on a related field I always expect the filters to be applied on the same table. I wasn't able to find a real world example, where django's approach would be justified and seems like this is problematic for many users.

As @jacebrowning mentioned sometimes it generates so many JOINs that it's not even possible to run the query (e.g. for multi-word search term). This is still an issue in django admin's search when it uses the relationships. There was even an attempt to change it quite a long time ago, but unfortunately it was reverted.

Would it be possible to add this _next_is_sticky one-line fix which @jacebrowning mentioned to django-filter? It looks promising. Maybe we could check if there is a FILTERS_USE_STICKY = True option in django settings and then apply the _next_is_sticky to filters? This would preserve the current behavior, but would also provide a convenient solution for users, for whom the default behavior is not suitable. This shouldn't harm anyone. @carltongibson what do you think?

If this sounds reasonable I could prepare a PR with the adjustment + docs update and some tests.

carltongibson · 2019-11-28T17:35:24Z

Ok, so, the approach that would be worth looking at here is a multi-filter that wrapped up the whole multi-widget and single filter call business. That’s the solve it, but folks don’t know how to configure that themselves. It would be a nice addition.

jonathan-golorry · 2019-12-16T18:25:02Z

@carltongibson It's currently pretty difficult to set up because you need to subclass MultiValueField and either subclass MultiWidget or pass it widgets through the filter.

A MultiWidget by itself doesn't give you type conversion. If I run a widgets=[forms.TextInput, forms.NumberInput] widget through a CharFilter, both values are strings.

If I create a filter with field_class = forms.MultiValueField and pass it fields and a widget, then I still run into a NotImplementedError on compress.

I think you basically need to implement all of the following:

class GroupedWidget(SuffixedMultiWidget):
    suffixes = ["my_string", "my_number"]

    def __init__(self):
        super().__init__(widgets=[forms.TextInput, forms.NumberInput])


class GroupedField(forms.MultiValueField):
    widget = GroupedWidget

    def __init__(self, *args, **kwargs):
        fields = [forms.CharField(), forms.DecimalField()]
        super().__init__(fields=fields, *args, **kwargs)

    def compress(self, data_list):
        return data_list


class GroupedFilter(filters.Filter):
    field_class = GroupedField

I've written some helpers to automatically put the nested field's widgets onto a SuffixedMultiWidget and filter on multiple related values in a single call:

class SuffixedMultiField(forms.MultiValueField):
    widget_parent = SuffixedMultiWidget

    def __init__(self, suffixes, fields, *args, **kwargs):
        if "widget" not in kwargs:
            widgets = [f.widget for f in fields]

            class ChildWidget(self.widget_parent):
                def __init__(self, *args, **kwargs):
                    super().__init__(widgets=widgets, *args, **kwargs)

            ChildWidget.suffixes = suffixes

            kwargs["widget"] = ChildWidget
        super().__init__(fields=fields, *args, **kwargs)

    def compress(self, data_list):
        return data_list


class SuffixedMultiFilter(filters.Filter):
    field_class = SuffixedMultiField

    def filter(self, qs, value):
        lookups = {}
        widget = self.field.widget
        name = self.field_name
        for suffix, v in zip(widget.suffixes, value):
            if v not in EMPTY_VALUES:
                # lookup = widget.suffixed(name, suffix)
                lookup = "__".join([name, suffix]) if suffix else name
                lookups[lookup] = v

        if not lookups:
            return qs
        if self.distinct:
            qs = qs.distinct()

        qs = self.get_method(qs)(**lookups)
        return qs

This simplifies usage to:

    my_group = SuffixedMultiFilter(
        suffixes=["my_string", "my_number"],
        fields=[forms.CharField(), forms.DecimalField()],
        method="my_method",  # optional
    )

It's not perfect, but it's a lot more usable. Note that SuffixedMultiFilter.filter by default groups the lookups into a single filter/exclude, solving the discussed issue.

I actually subclass SuffixedMultiWidget to put 2 underscores between the name and suffix so that url parameters follow the same exact naming as django filters, but I left that configurable via parent_widget. Speaking of SuffixedMultiWidget, shouldn't the empty decompress return [None]*len(self.suffixes) instead of assuming a length of 2?

I also considered writing a method that could transform a filterset into a single grouped filter. This approach is cool, but we'd need to split the filter kwarg generation logic out into a separate function on all the existing filters for it to really be worth it.

yuekui · 2022-03-03T02:58:40Z

For anybody who's still looking for a solution of this issue, here's another simple alternative from my gist:

from django.db.models import Q
from django.db.models.constants import LOOKUP_SEP
from django_filters import rest_framework as django_filters
from django_filters.constants import EMPTY_VALUES


class GroupFilterSet(django_filters.FilterSet):
    def filter_queryset(self, queryset):
        """
        Group the fitlers by the first join table to
        reduce inner join queries for performance.
        This would avoid filter chaining like:
        `Model.objects.filter(table_foo__id__in=[xx,xx]).filter(table_foo__name__in=[xx,xx])`
        Instead, it would be grouped as:
        `Model.objects.filter(table_foo__id__in=[xx,xx], table_foo__name__in=[xx,xx])`
        Inspired by discussion at:
        https://github.com/carltongibson/django-filter/issues/745
        https://github.com/carltongibson/django-filter/pull/1167
        """
        groups = {}
        is_distincted = False
        for name, value in self.form.cleaned_data.items():
            if value in EMPTY_VALUES:
                continue
            f = self.filters[name]
            # Do not merge Qs for customized filter method due to complexity.
            if f._method or not f.__class__.filter == django_filters.Filter.filter:
                queryset = self.filters[name].filter(queryset, value)
                continue
            # Use the joined table name as key
            group_name = f.field_name.split(LOOKUP_SEP)[0]
            q = Q(**{LOOKUP_SEP.join([f.field_name, f.lookup_expr]): value})
            if f.exclude:
                q = ~q
            # Check if there's any join query with the same table
            if group_name in groups:
                groups[group_name] = groups[group_name] & q
            else:
                groups[group_name] = q
            if f.distinct:
                is_distincted = True
        for q in groups.values():
            queryset = queryset.filter(q)
        if is_distincted:
            queryset = queryset.distinct()
        return queryset

mike667 · 2023-08-08T11:13:49Z

For anybody who's still looking for a solution of this issue, here's another simple alternative from my gist:

from django.db.models import Q
from django.db.models.constants import LOOKUP_SEP
from django_filters import rest_framework as django_filters
from django_filters.constants import EMPTY_VALUES


class GroupFilterSet(django_filters.FilterSet):
    def filter_queryset(self, queryset):
        """
        Group the fitlers by the first join table to
        reduce inner join queries for performance.
        This would avoid filter chaining like:
        `Model.objects.filter(table_foo__id__in=[xx,xx]).filter(table_foo__name__in=[xx,xx])`
        Instead, it would be grouped as:
        `Model.objects.filter(table_foo__id__in=[xx,xx], table_foo__name__in=[xx,xx])`
        Inspired by discussion at:
        https://github.com/carltongibson/django-filter/issues/745
        https://github.com/carltongibson/django-filter/pull/1167
        """
        groups = {}
        is_distincted = False
        for name, value in self.form.cleaned_data.items():
            if value in EMPTY_VALUES:
                continue
            f = self.filters[name]
            # Do not merge Qs for customized filter method due to complexity.
            if f._method or not f.__class__.filter == django_filters.Filter.filter:
                queryset = self.filters[name].filter(queryset, value)
                continue
            # Use the joined table name as key
            group_name = f.field_name.split(LOOKUP_SEP)[0]
            q = Q(**{LOOKUP_SEP.join([f.field_name, f.lookup_expr]): value})
            if f.exclude:
                q = ~q
            # Check if there's any join query with the same table
            if group_name in groups:
                groups[group_name] = groups[group_name] & q
            else:
                groups[group_name] = q
            if f.distinct:
                is_distincted = True
        for q in groups.values():
            queryset = queryset.filter(q)
        if is_distincted:
            queryset = queryset.distinct()
        return queryset

I expected this as default django-filter behaviour. Thank you.

eduardocalil mentioned this issue Jul 31, 2017

Fix #1229 Pesquisa por autor principal e co-autoria e conserta o bug do Django ORM interlegis/sapl#1318

Merged

carltongibson closed this as completed Oct 17, 2017

JoshBone mentioned this issue Jul 11, 2018

/browse/institutions/ - report year filter extremely slow EQAR/eqar_backend#218

Closed

bruno-fs pushed a commit to tikal-tech/django-filter that referenced this issue Sep 20, 2018

Fix carltongibson#745

f10fecb

bruno-fs pushed a commit to tikal-tech/django-filter that referenced this issue Sep 20, 2018

Fix carltongibson#745

ec8fbb4

bruno-fs added a commit to tikal-tech/django-filter that referenced this issue Sep 20, 2018

fix carltongibson#745

6927460

bruno-fs added a commit to tikal-tech/django-filter that referenced this issue Sep 20, 2018

fix carltongibson#745

e8c7af7

bruno-fs mentioned this issue Sep 20, 2018

Improve performance on ManyToMany chained filters #977

Closed

bruno-fs pushed a commit to tikal-tech/django-filter that referenced this issue Sep 21, 2018

Fix carltongibson#745

48edb7c

bruno-fs added a commit to tikal-tech/django-filter that referenced this issue Sep 21, 2018

fix carltongibson#745

9fb78bc

bruno-fs added a commit to tikal-tech/django-filter that referenced this issue Sep 21, 2018

fix carltongibson#745

61bf8b1

bruno-fs pushed a commit to tikal-tech/django-filter that referenced this issue Sep 21, 2018

Fix carltongibson#745

4187057

bruno-fs added a commit to tikal-tech/django-filter that referenced this issue Sep 21, 2018

fix carltongibson#745

71dd9c8

rpkilby mentioned this issue May 9, 2019

Multifield filtering on related models does not work as expected #1077

Open

rpkilby mentioned this issue Jul 26, 2019

Feature request: automatic filter negation/exclusion #978

Closed

jonathan-golorry mentioned this issue Jan 13, 2020

Added filters that filter multiple values at once using multiwidgets and multivaluefields #1164

Closed

danmash mentioned this issue Nov 27, 2023

Add group validation & filtering #1167

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Problem with chained filters and Django ORM #745

Problem with chained filters and Django ORM #745

eduardocalil commented Jul 4, 2017 •

edited

Loading

carltongibson commented Jul 4, 2017

carltongibson commented Jul 4, 2017 •

edited

Loading

rpkilby commented Jul 4, 2017

rpkilby commented Jul 4, 2017

eduardocalil commented Aug 1, 2017

eduardocalil commented Aug 1, 2017 •

edited

Loading

josert commented Oct 3, 2017

steve-kasica commented Oct 5, 2017

carltongibson commented Oct 6, 2017

rpkilby commented Oct 9, 2017

carltongibson commented Oct 10, 2017

carltongibson commented Oct 17, 2017

rpkilby commented Oct 18, 2017 •

edited

Loading

carltongibson commented Oct 18, 2017

rpkilby commented Oct 24, 2017

jonathan-golorry commented Jan 30, 2018 •

edited

Loading

jacebrowning commented Nov 3, 2019 •

edited

Loading

pawelzar commented Nov 20, 2019 •

edited

Loading

carltongibson commented Nov 28, 2019

jonathan-golorry commented Dec 16, 2019 •

edited

Loading

yuekui commented Mar 3, 2022 •

edited

Loading

mike667 commented Aug 8, 2023

Problem with chained filters and Django ORM #745

Problem with chained filters and Django ORM #745

Comments

eduardocalil commented Jul 4, 2017 • edited Loading

carltongibson commented Jul 4, 2017

carltongibson commented Jul 4, 2017 • edited Loading

rpkilby commented Jul 4, 2017

rpkilby commented Jul 4, 2017

eduardocalil commented Aug 1, 2017

eduardocalil commented Aug 1, 2017 • edited Loading

josert commented Oct 3, 2017

steve-kasica commented Oct 5, 2017

carltongibson commented Oct 6, 2017

rpkilby commented Oct 9, 2017

carltongibson commented Oct 10, 2017

carltongibson commented Oct 17, 2017

rpkilby commented Oct 18, 2017 • edited Loading

carltongibson commented Oct 18, 2017

rpkilby commented Oct 24, 2017

jonathan-golorry commented Jan 30, 2018 • edited Loading

jacebrowning commented Nov 3, 2019 • edited Loading

pawelzar commented Nov 20, 2019 • edited Loading

carltongibson commented Nov 28, 2019

jonathan-golorry commented Dec 16, 2019 • edited Loading

yuekui commented Mar 3, 2022 • edited Loading

mike667 commented Aug 8, 2023

eduardocalil commented Jul 4, 2017 •

edited

Loading

carltongibson commented Jul 4, 2017 •

edited

Loading

eduardocalil commented Aug 1, 2017 •

edited

Loading

rpkilby commented Oct 18, 2017 •

edited

Loading

jonathan-golorry commented Jan 30, 2018 •

edited

Loading

jacebrowning commented Nov 3, 2019 •

edited

Loading

pawelzar commented Nov 20, 2019 •

edited

Loading

jonathan-golorry commented Dec 16, 2019 •

edited

Loading

yuekui commented Mar 3, 2022 •

edited

Loading