-
Notifications
You must be signed in to change notification settings - Fork 151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add final SQL check when looking up involved tables #199
Add final SQL check when looking up involved tables #199
Conversation
Proof of concept for conservative mode: final SQL query check.
What do you think about the proposal of providing the option of enabling an additional SQL check to be on the safe side? |
Print differences between regular checks and SQL check.
Adapt unit tests.
@@ -328,13 +338,13 @@ def test_subquery(self): | |||
def test_custom_subquery(self): | |||
tests = Test.objects.filter(permission=OuterRef('pk')).values('name') | |||
qs = Permission.objects.annotate(first_permission=Subquery(tests[:1])) | |||
self.assert_tables(qs, Permission, Test) | |||
self.assert_tables(qs, Permission, Test, ContentType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The final SQL includes an ORDER BY "django_content_type"."app_label" ASC
https://github.com/django/django/blob/ca9872905559026af82000e46cde6f7dedc897b6/django/contrib/auth/models.py#L72
Thus it’s correct behavior to add ContentType
as involved model.
@Andrew-Chen-Wang: before diving too deep - what do you think about this "Proof of concept" in general? I.e. the idea of doing a final SQL query check to catch unconsidered tables? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's alr and makes sense. I'm just worried about false positives; although it's better to invalidate than pass, if there are still too many queries that are invalidated, then that's a prob.
The original behavior for pre-Django 1.11 in cachalot is exactly what you've implemented here. Going through the commit history, you can see that the IsRawSQL
exception was created for tracking subqueries. The question is why not just check the generated SQL query; why bother go through all these Pythonic subqueries in the first place especially with the addition quote_name
?
I guess that's a test we can try: replace everything in _get_tables
with a single call to _get_tables_from_sql
with the generated SQL and quote_name
.
Note: I've published a patch release; will be gone for the next two weeks cuz school is starting.
cachalot/utils.py
Outdated
# Additional check of the final SQL. | ||
# Potentially overlooked tables are added here. Tables may be overlooked by the regular checks | ||
# as not all expressions are handled yet. This final check acts as safety net. | ||
final_check_tables = _get_tables_from_sql(connections[db_alias], str(query), enable_quote=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
difference between this and L224? I say just put in try clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
difference between this and L224? I say just put in try clause.
Put it into an try - except - else.
"""Returns names of involved tables after analyzing the final SQL query.""" | ||
return {table for table in connection.introspection.django_table_names() | ||
+ cachalot_settings.CACHALOT_ADDITIONAL_TABLES | ||
if _quote_table_name(table, connection, enable_quote) in lowercased_sql} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice addition 👍
I also wanna double check that quote_name
is not making a call to the database since that sometimes happens for some obscure thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice addition +1
I also wanna double check that
quote_name
is not making a call to the database since that sometimes happens for some obscure thing.
At first glance quote_name
seems to be safe, but it’s better to double-check of course!
https://github.com/django/django/blob/e703b152c6148ddda1b072a4353e9a41dca87f90/django/db/backends/mysql/operations.py#L177
My take: Better false positives than false negatives. But of course we should avoid them both. One thing to note is that with the final SQL check the parent models’ tables have to included in
Yes, that will be something to investigate.
👍 And happy start of 🏫! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just a couple of typings and a add-ons, but looks great.
cachalot/utils.py
Outdated
def _get_tables_from_sql(connection, lowercased_sql, | ||
enable_quote=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def _get_tables_from_sql(connection, lowercased_sql, | |
enable_quote=False): | |
def _get_tables_from_sql(connection, lowercased_sql, enable_quote=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Andrew-Chen-Wang: what’s the line length limit used in the cachalot project? 120?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's black's default 88 I believe. Unfortunately, I've got a small computer, so that's why I can't match line length with Django's standards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - but black is not used for cachalot, is it? Would that be an option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks a lot for this!
@@ -917,7 +954,7 @@ def test_now_annotate(self): | |||
"""Check that queries with a Now() annotation are not cached #193""" | |||
qs = Test.objects.annotate(now=Now()) | |||
self.assert_query_cached(qs, after=1) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird space? I can use pre-commit later to remove this space though, so dw about it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird space? I can use pre-commit later to remove this space though, so dw about it
Would it make sense to add the pro-commit configuration to the project itself?
@dbartenstein you'll need to convert this draft PR to a finalized PR |
@Andrew-Chen-Wang: I will be on vacation for the next 10 days. So I wonder if it was better to postpone merging? Or would you like to do a release ASAP? It’s up to you. |
Postpone since I like to give a couple days for others to view and for me to think. Plus school. It's not unusual for releases to be a monthly thing. |
@Andrew-Chen-Wang: just wanted to inform you that I am back from vacation 🌴. |
Pull Request Test Coverage Report for Build 1626626393
💛 - Coveralls |
@Andrew-Chen-Wang: I just did another optimization to prevent that @Fasther: can you please share another benchmark with us? I am also fine with introducing a bool: |
@dbartenstein yes please introduce a setting for this feature regardless of the results. I think it would be a breaking change regardless due to possible addition of tables unexpectedly. |
Optimized solution:
|
@Andrew-Chen-Wang: @Fasther did great work on the PR and introduced the
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the setting to the docs would be great! Additionally, a CHANGELOG entry for this feature for version bump to 2.5.0 would be great.
Just in the docs, explaining why this setting was added would be great. Adding on the performance metrics done by Fasther would be helpful for organizations to decide whether to enable the feature.
Excited to get this out!
@Andrew-Chen-Wang from our point of view (Thanks to @Fasther) the PR is ready to be merged. It includes documentation as well. What do you think? |
Thank you. I will review and push a minor version release today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much @dbartenstein @Fasther this is a great addition!
@Andrew-Chen-Wang you’re welcome! 🙇 |
@Andrew-Chen-Wang when do you plan to make the next release containing this PR? |
Done 👍 thanks for this PR again everyone! |
Description
Rationale
Changing the referenced object should also invalidate the query as calling the query again might lead to another result.
"Order by" allows expressions such as
Coalesce
as well: https://docs.djangoproject.com/en/3.2/ref/models/querysets/#order-byDiscussion
Initially I thought of adding the final SQL check as configuration option. After having looked at all the queries, I believe that it should be the default behavior. Thus I did not make it an option for now.