Add final SQL check when looking up involved tables #199

dbartenstein · 2021-08-21T20:13:12Z

Description

Add a final SQL check to include potentially overlooked tables when looking up involved tables.
Add unit tests showing queries which do "order by" using a field of a referenced table. These tests would fail without the final SQL check.

Rationale

Changing the referenced object should also invalidate the cached query, because running the query again might return a different result.

"Order by" allows expressions such as Coalesce as well: https://docs.djangoproject.com/en/3.2/ref/models/querysets/#order-by

Discussion

Initially I thought of adding the final SQL check as a configuration option. After having looked at all the queries, I believe that it should be the default behavior. Thus I did not make it an option for now.

Proof of concept for conservative mode: final SQL query check.

dbartenstein · 2021-08-21T21:00:27Z

@Andrew-Chen-Wang

Two additional order_by tests which would fail without the additional SQL query check.
I have written a Proof of Concept for the "redundancy mode" as I believe it’s easier to discuss when there is a code proposal on the table. This redundancy mode makes both order_by tests succeed and would have caught the previously unhandled Case example as well.

What do you think about the proposal of providing the option of enabling an additional SQL check to be on the safe side?

Print differences between regular checks and SQL check.

Adapt unit tests.

dbartenstein · 2021-08-22T20:52:59Z

cachalot/tests/read.py

@@ -328,13 +338,13 @@ def test_subquery(self):
    def test_custom_subquery(self):
        tests = Test.objects.filter(permission=OuterRef('pk')).values('name')
        qs = Permission.objects.annotate(first_permission=Subquery(tests[:1]))
-        self.assert_tables(qs, Permission, Test)
+        self.assert_tables(qs, Permission, Test, ContentType)


The final SQL includes an ORDER BY "django_content_type"."app_label" ASC
https://github.com/django/django/blob/ca9872905559026af82000e46cde6f7dedc897b6/django/contrib/auth/models.py#L72

Thus it’s correct behavior to add ContentType as an involved model.

dbartenstein · 2021-08-23T10:29:18Z

@Andrew-Chen-Wang: before diving too deep - what do you think about this "Proof of concept" in general? I.e. the idea of doing a final SQL query check to catch unconsidered tables?
Hint: there still is some way to go - especially with Django 3.1 which seems to have some issues with returning the SQL query.

Andrew-Chen-Wang

I think it's alr and makes sense. I'm just worried about false positives; although it's better to invalidate than pass, if there are still too many queries that are invalidated, then that's a prob.

The original behavior for pre-Django 1.11 in cachalot is exactly what you've implemented here. Going through the commit history, you can see that the IsRawSQL exception was created for tracking subqueries. The question is why not just check the generated SQL query; why bother go through all these Pythonic subqueries in the first place especially with the addition quote_name?

I guess that's a test we can try: replace everything in _get_tables with a single call to _get_tables_from_sql with the generated SQL and quote_name.

Note: I've published a patch release; will be gone for the next two weeks cuz school is starting.

Andrew-Chen-Wang · 2021-08-23T15:19:08Z

cachalot/utils.py

+    # Additional check of the final SQL.
+    # Potentially overlooked tables are added here. Tables may be overlooked by the regular checks
+    # as not all expressions are handled yet. This final check acts as safety net.
+    final_check_tables = _get_tables_from_sql(connections[db_alias], str(query), enable_quote=True)


difference between this and L224? I say just put in try clause.

Put it into a try - except - else.

Andrew-Chen-Wang · 2021-08-23T15:20:30Z

cachalot/utils.py

+    """Returns names of involved tables after analyzing the final SQL query."""
+    return {table for table in connection.introspection.django_table_names()
+            + cachalot_settings.CACHALOT_ADDITIONAL_TABLES
+            if _quote_table_name(table, connection, enable_quote) in lowercased_sql}


nice addition 👍

I also wanna double check that quote_name is not making a call to the database since that sometimes happens for some obscure thing.

At first glance quote_name seems to be safe, but it’s better to double-check of course!
https://github.com/django/django/blob/e703b152c6148ddda1b072a4353e9a41dca87f90/django/db/backends/mysql/operations.py#L177
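
To make the idea behind the final SQL check concrete, here is a minimal, self-contained sketch of scanning generated SQL for known table names. It is not the cachalot implementation: `quote_name` below is a simplified stand-in for the backend's `connection.ops.quote_name` (double-quote style), `tables_in_sql` is a hypothetical helper, and the table list and SQL string are made up for illustration (the ORDER BY clause mirrors the one quoted earlier in this thread). Table names are assumed to be lowercase.

```python
def quote_name(name):
    """Simplified stand-in for a backend's quote_name (double-quote style)."""
    if name.startswith('"') and name.endswith('"'):
        return name
    return '"%s"' % name


def tables_in_sql(known_tables, sql, enable_quote=False):
    """Return the known table names that occur in the SQL text."""
    lowercased_sql = sql.lower()
    return {
        table for table in known_tables
        if (quote_name(table) if enable_quote else table) in lowercased_sql
    }


# Hypothetical table list; 'auth' is only there to show a substring clash.
known_tables = ['auth_permission', 'auth_group', 'django_content_type', 'auth']
sql = (
    'SELECT "auth_permission"."id" FROM "auth_permission" '
    'INNER JOIN "django_content_type" '
    'ON ("auth_permission"."content_type_id" = "django_content_type"."id") '
    'ORDER BY "django_content_type"."app_label" ASC'
)

found = tables_in_sql(known_tables, sql, enable_quote=True)
unquoted = tables_in_sql(known_tables, sql)
```

In this sketch the quoted lookup reports only `auth_permission` and `django_content_type`, while the unquoted lookup also reports the made-up table `auth` because it is a substring of `auth_permission`. That is one plausible reason for matching on quoted names, given the concern about false positives raised above.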

Fix some tests

dbartenstein · 2021-08-23T21:56:32Z

I think it's alr and makes sense. I'm just worried about false positives; although it's better to invalidate than pass, if there are still too many queries that are invalidated, then that's a prob.

My take: Better false positives than false negatives. But of course we should avoid them both. One thing to note is that with the final SQL check the parent models’ tables have to be included in CACHALOT_ONLY_CACHABLE_TABLES.
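
A rough sketch of why the parent tables matter, assuming that a query is only cached when every involved table is listed in CACHALOT_ONLY_CACHABLE_TABLES and that an empty setting means no restriction. The helper `are_all_cachable` and the table names `app_place` (parent) and `app_restaurant` (child) are hypothetical and only illustrate the subset test:

```python
def are_all_cachable(tables, only_cachable_tables):
    """True if every involved table is allowed to be cached.

    An empty whitelist is treated as "no restriction" (an assumption
    made for this sketch).
    """
    if not only_cachable_tables:
        return True
    return set(tables) <= set(only_cachable_tables)


# Hypothetical multi-table inheritance: the child's SQL joins the parent table.
tables_from_final_sql = {'app_restaurant', 'app_place'}

child_only = are_all_cachable(tables_from_final_sql, {'app_restaurant'})
with_parent = are_all_cachable(tables_from_final_sql,
                               {'app_restaurant', 'app_place'})
```

Under these assumptions, listing only the child table would stop the query from being cached once the SQL check reports the parent table as well.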

The original behavior for pre-Django 1.11 in cachalot is exactly what you've implemented here. Going through the commit history, you can see that the IsRawSQL exception was created for tracking subqueries. The question is why not just check the generated SQL query; why bother go through all these Pythonic subqueries in the first place especially with the addition quote_name?

Yes, that will be something to investigate.

I guess that's a test we can try: replace everything in _get_tables with a single call to _get_tables_from_sql with the generated SQL and quote_name.

Note: I've published a patch release; will be gone for the next two weeks cuz school is starting.

👍 And happy start of 🏫!

Andrew-Chen-Wang

LGTM. Just a couple of typings and add-ons, but looks great.

Andrew-Chen-Wang · 2021-08-24T17:47:40Z

cachalot/utils.py

+def _get_tables_from_sql(connection, lowercased_sql,
+                         enable_quote=False):


Suggested change

-def _get_tables_from_sql(connection, lowercased_sql,
-                         enable_quote=False):
+def _get_tables_from_sql(connection, lowercased_sql, enable_quote=False):

@Andrew-Chen-Wang: what’s the line length limit used in the cachalot project? 120?

It's black's default 88 I believe. Unfortunately, I've got a small computer, so that's why I can't match line length with Django's standards.

Ok - but black is not used for cachalot, is it? Would that be an option?

Clean up code.

Andrew-Chen-Wang

LGTM! Thanks a lot for this!

Andrew-Chen-Wang · 2021-08-24T22:33:32Z

cachalot/tests/read.py

@@ -917,7 +954,7 @@ def test_now_annotate(self):
        """Check that queries with a Now() annotation are not cached #193"""
        qs = Test.objects.annotate(now=Now())
        self.assert_query_cached(qs, after=1)
-        
+


weird space? I can use pre-commit later to remove this space though, so dw about it

Would it make sense to add the pre-commit configuration to the project itself?

Andrew-Chen-Wang · 2021-08-24T22:35:15Z

@dbartenstein you'll need to convert this draft PR to a finalized PR

dbartenstein · 2021-08-25T07:10:10Z

@dbartenstein you'll need to convert this draft PR to a finalized PR

@Andrew-Chen-Wang: I will be on vacation for the next 10 days, so I wonder whether it would be better to postpone merging. Or would you like to do a release ASAP? It’s up to you.

Andrew-Chen-Wang · 2021-08-25T16:19:07Z

Postpone since I like to give a couple days for others to view and for me to think. Plus school. It's not unusual for releases to be a monthly thing.

dbartenstein · 2021-09-05T13:01:02Z

Postpone since I like to give a couple days for others to view and for me to think. Plus school. It's not unusual for releases to be a monthly thing.

@Andrew-Chen-Wang: just wanted to inform you that I am back from vacation 🌴.

coveralls · 2021-11-25T20:22:07Z

Pull Request Test Coverage Report for Build 1626626393

16 of 16 (100.0%) changed or added relevant lines in 3 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.2%) to 97.189%

Totals
Change from base Build 1551740406:	0.2%
Covered Lines:	657
Relevant Lines:	676

💛 - Coveralls

dbartenstein · 2021-11-25T20:33:15Z

Thanks for running the benchmark for us @Fasther! And yes, I hope to get a quick docker-compose file up, so sorry about the inconvenience.

Pertaining to the results, I don't think many organizations (or personally I) would mind the decrease in the "faster" column, though the initial slow down performance drop might be troublesome as that can add up quickly. It may be better to roll this out as a separate, experimental feature @dbartenstein that can be enabled by a bool. That way, we can slowly improve this feature over time while not breaking any benchmarks for current users of cachalot.

@Andrew-Chen-Wang: I just added another optimization so that as_sql() is no longer called twice for the whole query, once in get_query_cache_key and once in _get_tables. This will very likely improve performance. I didn’t find a better way than adding an attribute to the compiler object to store the generated SQL query. In general I would favor a more object-oriented approach.

@Fasther: can you please share another benchmark with us?

I am also fine with introducing a bool: CACHALOT_FINAL_SQL_CHECK which is False by default.
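
The optimization described in this comment can be sketched as memoizing the generated SQL on the compiler object. This is an illustration only: `FakeCompiler`, `get_sql_once`, and the attribute name `_generated_sql` are hypothetical, and the two functions merely stand in for `get_query_cache_key` and `_get_tables`.

```python
class FakeCompiler:
    """Stand-in for a SQL compiler; counts how often as_sql() runs."""

    def __init__(self):
        self.as_sql_calls = 0

    def as_sql(self):
        self.as_sql_calls += 1
        return 'SELECT "app_test"."id" FROM "app_test" WHERE "app_test"."name" = %s', ('a',)


def get_sql_once(compiler):
    """Generate the SQL on first use and store it on the compiler."""
    generated = getattr(compiler, '_generated_sql', None)  # hypothetical attribute
    if generated is None:
        generated = compiler.as_sql()
        compiler._generated_sql = generated
    return generated


def get_query_cache_key(compiler):
    sql, params = get_sql_once(compiler)
    return '%s:%s' % (sql, params)


def get_tables(compiler):
    sql, _ = get_sql_once(compiler)
    return {'app_test'} if '"app_test"' in sql.lower() else set()


compiler = FakeCompiler()
key = get_query_cache_key(compiler)
tables = get_tables(compiler)
```

Both helpers read the same stored result, so `as_sql()` runs once per compiler object in this sketch.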

Andrew-Chen-Wang · 2021-11-25T20:34:36Z

@dbartenstein yes please introduce a setting for this feature regardless of the results. I think it would be a breaking change regardless due to possible addition of tables unexpectedly.
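
Assuming the setting is introduced as proposed in this thread (a boolean named CACHALOT_FINAL_SQL_CHECK that is False by default), opting in would be a one-line change in a project's Django settings module. A minimal fragment:

```python
# settings.py (fragment)
# Opt in to the additional final SQL check; per this discussion it is
# proposed to be off by default.
CACHALOT_FINAL_SQL_CHECK = True
```

Leaving the setting unset would keep the behavior existing users already have, which is the point of shipping the check behind a flag.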

PavelPancocha · 2021-11-26T13:15:00Z

Optimized solution:

mysql      is 1.5× slower then 9.0× faster
postgresql is 1.3× slower then 10.5× faster
sqlite     is 1.4× slower then 2.6× faster
filebased  is 1.4× slower then 9.1× faster
locmem     is 1.3× slower then 9.9× faster
pylibmc    is 1.4× slower then 7.5× faster
pymemcache is 1.4× slower then 6.5× faster
redis      is 1.5× slower then 6.2× faster

dbartenstein · 2021-11-27T13:36:52Z

@Andrew-Chen-Wang: @Fasther did great work on the PR and introduced the CACHALOT_FINAL_SQL_CHECK setting 👏

From your point of view - is the PR ready to be merged?
Are you going to update the documentation to include the setting CACHALOT_FINAL_SQL_CHECK?

Andrew-Chen-Wang

Adding the setting to the docs would be great! Additionally, a CHANGELOG entry for this feature for version bump to 2.5.0 would be great.

Just in the docs, explaining why this setting was added would be great. Adding on the performance metrics done by Fasther would be helpful for organizations to decide whether to enable the feature.

Excited to get this out!

dbartenstein · 2021-12-27T11:23:23Z

Adding the setting to the docs would be great! Additionally, a CHANGELOG entry for this feature for version bump to 2.5.0 would be great.

Just in the docs, explaining why this setting was added would be great. Adding on the performance metrics done by Fasther would be helpful for organizations to decide whether to enable the feature.

Excited to get this out!

@Andrew-Chen-Wang from our point of view (Thanks to @Fasther) the PR is ready to be merged. It includes documentation as well. What do you think?

Andrew-Chen-Wang · 2021-12-27T16:49:36Z

Thank you. I will review and push a minor version release today.

Andrew-Chen-Wang

Thanks so much @dbartenstein @Fasther this is a great addition!

dbartenstein · 2021-12-27T22:34:58Z

Thanks so much @dbartenstein @Fasther this is a great addition!

@Andrew-Chen-Wang you’re welcome! 🙇

dbartenstein · 2022-01-13T14:55:06Z

@Andrew-Chen-Wang when do you plan to make the next release containing this PR?

Andrew-Chen-Wang · 2022-01-14T23:33:19Z

Done 👍 thanks for this PR again everyone!

Test with "order by" using field of another table.

5ad2920

dbartenstein changed the title ~~Test with "order by" using field of another table.~~ "order by" using field of another table. Aug 21, 2021

Add another unit test with order_by.

9386460

Proof of concept for conservative mode: final SQL query check.

Dominik Bartenstein added 2 commits August 22, 2021 13:05

Improve naming and add comments.

b3ce3c6

Print differences between regular checks and SQL check.

Enable final SQL check.

99e0a2c

Adapt unit tests.

dbartenstein changed the title ~~"order by" using field of another table.~~ Add final SQL check when looking up involved tables Aug 22, 2021

Andrew-Chen-Wang requested changes Aug 23, 2021

Dominik Bartenstein added 4 commits August 23, 2021 20:54

Work in progress.

9639efd

Merge branch 'master' into feature/order_by

b409678

Prevent Subquery error with Django 3.1

d20de25

Fix some tests

Sqlite does not seem to ORDER BY django_content_type for Permission.

ab7eb91

Andrew-Chen-Wang reviewed Aug 23, 2021

Andrew-Chen-Wang reviewed Aug 23, 2021

Use self.is_sqlite instead of connection.vendor != 'sqlite'

5d63954

dbartenstein requested a review from Andrew-Chen-Wang August 24, 2021 07:19

Andrew-Chen-Wang requested changes Aug 24, 2021

Dominik Bartenstein added 2 commits August 24, 2021 23:03

Replace " with ' (as this seems to be the default quotation mark).

832adf0

Clean up code.

Add unit test for AtomicCache.set

e870b73

dbartenstein requested a review from Andrew-Chen-Wang August 24, 2021 22:09

Andrew-Chen-Wang approved these changes Aug 24, 2021

dbartenstein marked this pull request as ready for review August 25, 2021 07:10

PavelPancocha and others added 11 commits November 26, 2021 16:01

Add CACHALOT_FINAL_SQL_CHECK setting

2ed49e2

Adapt _get_tables tests for with and without final SQL check

c34b3d3

Fix patch in assert_tables

fd6dead

Improve naming

b55f50e

Remove setting overrides from assert_tables

4bb7efa

Add decorators for testing different CACHALOT_FINAL_SQL_CHECK settings

5ed8453

Add decorators final SQL tests decorators.

21c79d7

Add decorators final SQL checks decorators to tests - read.py

9e5ae1b

Add decorators final SQL checks decorators to tests - postgres.py

2c45240

Clean up code

155464f

Merge pull request #1 from dbartenstein/feature/add_final_sql_check_s…

b1b9d37

…etting

Andrew-Chen-Wang requested changes Nov 29, 2021

dbartenstein and others added 4 commits December 18, 2021 00:08

Merge branch 'master' into feature/order_by

5cbac14

Fix failing 'test_subquery' on Django 4+

7356556

Update docs & changelog with CACHALOT_FINAL_SQL_CHECK setting

17871ef

Merge pull request #2 from dbartenstein/feature/docs_and_tests

38bf231

Andrew-Chen-Wang self-requested a review December 27, 2021 17:29

Andrew-Chen-Wang approved these changes Dec 27, 2021

Andrew-Chen-Wang merged commit 434a575 into noripyt:master Dec 27, 2021

danlamanna mentioned this pull request Sep 4, 2024

Clarify and consider changing the default value of CACHALOT_FINAL_SQL_CHECK #266

Open
