Fixes #8715: eliminates duplicates when used in many-to-many field constraints #8793
I have 10,000 devices in my database. Tags tag1 and tag2 are assigned to 1 device. The tag_with_many_matches has been assigned to 1500 devices. Simple request:
Query using DISTINCT:
Query with subquery:
Possibly also fixes #8351
…nstraints

When using permissions that use tags, a user may receive multiple permissions of the same type if multiple tags are assigned to the device. This causes the RestrictedQuerySet class to generate a query similar to this:

>>> dcim.models.Device.objects.filter(Q(tags__name='tag1')|Q(tags__name='tag2'))
<ConfigContextModelQuerySet [<Device: device1>, <Device: device1>]>

This query returns the same object twice if both tags are assigned to it. This is due to the use of the django-taggit library. The library's documentation describes this behavior as expected and suggests using an explicit distinct() call in queries to avoid duplicates.

However, the use of DISTINCT in queries has a global side effect - deduplication of responses, which may or may not be acceptable behavior (depending on further use). Since it is not known how RestrictedQuerySet will be used in the rest of the code, it was decided to dedupe using a subquery.
allowed_objects = self.model.objects.filter(attrs)
attrs = Q(pk__in=allowed_objects)
This approach can be optimized by retrieving only a list of IDs (rather than a full queryset) in the first query.
- allowed_objects = self.model.objects.filter(attrs)
- attrs = Q(pk__in=allowed_objects)
+ allowed_object_ids = self.model.objects.filter(attrs).values_list('pk', flat=True)
+ attrs = Q(pk__in=allowed_object_ids)
Thanks for digging into this @seros1521!

I'm having a hard time thinking of a scenario where we wouldn't want to call

I'm curious to see whether the change I recommended above reduces the subquery approach substantially. Would you mind testing it and sharing the results?
This suggestion does not change the resulting query. We don't fetch the result of the subquery directly; we use it inside another query. The Django ORM is smart enough to select only the primary keys of the objects in the subquery.
Interesting, I didn't think it looked that far into the query. Good to know!

I'm slightly concerned about the modest performance penalty, though it seems unlikely to have a noticeable impact. In the worst case, we might need to revert the change and try using

Thanks for your work on this!
Fixes #8715
Some NetBox users grant object access rights using tags. Inside NetBox, all object-level permissions are implemented by the RestrictedQuerySet class:
https://github.com/netbox-community/netbox/blob/3436905744c93fec7ba59a8b7d72ef4102f82334/netbox/utilities/querysets.py
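When using permissions that use tags, a user may receive multiple permissions of the same type if multiple tags are assigned to the device. This causes the RestrictedQuerySet class to generate a query similar to this:

>>> dcim.models.Device.objects.filter(Q(tags__name='tag1')|Q(tags__name='tag2'))
<ConfigContextModelQuerySet [<Device: device1>, <Device: device1>]>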
This query returns the same object twice if both tags are assigned to it. This is due to the use of the django-taggit library. The library's documentation describes this behavior as expected and suggests using an explicit distinct() call in queries to avoid duplicates.
However, the use of DISTINCT in queries has a global side effect - deduplication of responses, which may or may not be acceptable behavior (depending on further use). Since it is not known how RestrictedQuerySet will be used in the rest of the code, it was decided to dedupe using a subquery.
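The duplicate rows and the two ways of removing them can be reproduced at the SQL level with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the `device`, `tag`, and `device_tags` tables are simplified, hypothetical stand-ins, not the schema that NetBox or django-taggit actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE device (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE device_tags (device_id INTEGER, tag_id INTEGER);
    INSERT INTO device VALUES (1, 'device1'), (2, 'device2');
    INSERT INTO tag VALUES (1, 'tag1'), (2, 'tag2');
    -- device1 carries both tags; device2 carries none
    INSERT INTO device_tags VALUES (1, 1), (1, 2);
""")

# Filtering across the many-to-many join yields one row per matching tag.
JOIN_FILTER = """
    FROM device d
    JOIN device_tags dt ON dt.device_id = d.id
    JOIN tag t ON t.id = dt.tag_id
    WHERE t.name = 'tag1' OR t.name = 'tag2'
"""

joined = [row[0] for row in conn.execute("SELECT d.name" + JOIN_FILTER)]

# Option 1: DISTINCT on the outer query (changes the outer query's semantics).
distinct = [row[0] for row in conn.execute("SELECT DISTINCT d.name" + JOIN_FILTER)]

# Option 2: keep the join inside a subquery and select by primary key,
# so the outer query never sees the duplicated join rows.
subquery = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM device WHERE id IN (SELECT d.id" + JOIN_FILTER + ")"
    )
]

print(joined)    # ['device1', 'device1']
print(distinct)  # ['device1']
print(subquery)  # ['device1']
```

The subquery form leaves the outer query free of DISTINCT, which matches the motivation given above: callers of RestrictedQuerySet can still add their own joins, ordering, or aggregation without inheriting a global deduplication step.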