Proposal for disabling long-lived prepared statements as a cache by default for Connection methods #76
@vitaly-burovoy First of all, thanks for looking into these things! This is a good discussion to have. That said, I don't think that all your points are necessarily valid grounds for disabling the statement cache by default. I think this topic warrants further discussion and feedback from the users. I'll go through each point.
Postgres JDBC [1] and Rails ActiveRecord [2] use cached prepared statements by default. I'd like to think that this is proof that it can be done and can work beneficially. Specifically, AR handles the "cached plan" issue [3] with the following algorithm:
The maximum risk with this approach is that a very limited number of requests to the database might fail after the live migration. But that can happen for a bunch of other reasons as well, and if that part of your application is so critical that it cannot tolerate a failed transaction, you should write the retry logic anyway.
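To illustrate, here is a minimal retry sketch in the spirit of that advice. It assumes the InvalidCachedStatementError that asyncpg later introduced (see the commit message further down this thread); the helper name and retry policy are hypothetical.

```python
import asyncpg

async def fetch_with_retry(conn, query, *args, retries=1):
    """Hypothetical helper: retry once when a cached statement goes stale."""
    for attempt in range(retries + 1):
        try:
            return await conn.fetch(query, *args)
        except asyncpg.exceptions.InvalidCachedStatementError:
            # asyncpg has already dropped the stale statement from its
            # cache, so a plain retry re-prepares it against the new schema.
            if attempt == retries:
                raise
```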
If 99.99% of a table column consists of the same value, then creating an index on it is pointless and will just impact write performance for no benefit. Your example is an extremely pathological case. In practice, Postgres' query cost stats are usually fairly accurate, and the server will only choose the generic plan if the last five custom plans were of similar cost. Thus, I'm not convinced. Given [1] and [2], I'd like to see some real reports of users experiencing significant performance issues on their data before concluding that caching prepared statements by default is bad for the majority of cases.
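That heuristic is easy to observe directly. The sketch below assumes the abc(i) table from the examples in this thread and default connection settings; EXPLAIN EXECUTE shows custom plans for the first five runs, after which the server may settle on a generic plan.

```python
import asyncio
import asyncpg

async def main():
    conn = await asyncpg.connect()  # connection parameters assumed via env
    await conn.execute(
        'PREPARE p(int) AS SELECT count(*) FROM abc WHERE i = $1')
    for run in range(1, 8):
        # Runs 1-5 are planned with custom plans; from run 6 on,
        # Postgres may switch to a cached generic plan.
        plan = await conn.fetch('EXPLAIN EXECUTE p(1)')
        print(run, plan[0][0])
    await conn.close()

asyncio.run(main())
```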
This is trivially fixable by choosing a prefix that is unlikely to collide.

Bottom line: a) there are straightforward fixes for the "cached plan" issue and the name collision issue; b) the generic plan issue is debatable, and we'd like to see some real-world feedback before making a decision; c) we can't do anything about

[1] JDBC connection parameters
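To illustrate the prefix idea mentioned above, a minimal sketch; the exact prefix is an assumption (asyncpg's internal naming scheme may differ):

```python
import itertools

_id_gen = itertools.count(1)

def next_statement_name():
    # A namespaced prefix such as '__asyncpg_stmt_N__' is far less
    # likely to collide with user-chosen names than a bare 'stmt_N'.
    return '__asyncpg_stmt_{}__'.format(next(_id_gen))

print(next_statement_name())  # -> __asyncpg_stmt_1__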
I'm sorry for the very late response. Unfortunately, nobody wants to join the discussion and support either side.
If I'm wrong, please correct me: I'm neither a Java user nor a PgJDBC one. What I found by the link[1]: But there is a property[2] which is set by default to
No doubt about it. But ActiveRecord is a higher-level abstraction (I'd compare it with
Yes, it is a good algorithm for the case
Sure. I created this issue because I expect the exception
OK, change:

- await conn.execute("CREATE INDEX ON abc(i);")
+ await conn.execute("CREATE INDEX ON abc(i) WHERE i <> 1;")

and get the same result without impacting write performance. Also note that statistics change while a table is being filled/updated and after (auto-)ANALYZE, but a prepared statement will still use its generic plan. Pay attention that my examples use very simple queries; OLAP ones can be several kB long with multiple joins, for which a static generic plan is pretty bad.
Yes, because it is an example. I can't give you my real data, but such cases do occur.
Yes, but if there is no "Sync" message between them, it is a single transaction ("ReadyForQuery" is sent only once after
I don't think so. It just pays attention to the 'ReadyForQuery' message[4], which is sent in answer to the 'Sync' message. If the current state is "Idle", it can reuse the server's connection for another client. Bottom line:

[1] JDBC connection parameters
PostgreSQL will raise an exception when it detects that the result type of the query has changed from when the statement was prepared. This may happen, for example, after an ALTER TABLE or SET search_path. When this happens, and there is no transaction running, we can simply re-prepare the statement and try again. If the transaction _is_ running, this error will put it into an error state, and we have no choice but to raise an exception. The original error is somewhat cryptic, so we raise a custom InvalidCachedStatementError with the original server exception as context. In either case we clear the statement cache for this connection and all other connections of the pool this connection belongs to (if any). See #72 and #76 for discussion. Fixes: #72.
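A minimal sketch of the two cases the commit message describes, with a hypothetical table abc being altered concurrently:

```python
import asyncpg

async def demo(conn):
    # Outside a transaction asyncpg retries transparently, so this
    # succeeds even if the cached statement went stale in between.
    await conn.fetch('SELECT * FROM abc')

    # Inside a transaction the error surfaces and the caller must
    # retry the whole transaction.
    try:
        async with conn.transaction():
            await conn.fetch('SELECT * FROM abc')
    except asyncpg.exceptions.InvalidCachedStatementError as e:
        # The original, more cryptic server error is attached as context.
        print('stale statement mid-transaction:', e.__context__)
```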
The parameter allows asyncpg to refresh cached prepared statements periodically. See also issue #76.
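Assuming the parameter referred to is max_cached_statement_lifetime (added in asyncpg 0.10.0), usage looks like this:

```python
import asyncpg

async def connect():
    # Cached statements older than 300 seconds are transparently
    # re-prepared on their next use.
    return await asyncpg.connect(max_cached_statement_lifetime=300)
```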
In asyncpg 0.10.0 we made some changes to address some of the problems you highlighted, specifically:
I believe that at this point the discussion boils down to one question: should we disable the statement cache by default or not. Our position is to have the cache enabled by default, because:
Thank you for the discussion, Vitaly, it really helped us make asyncpg better. We might reconsider if there's more evidence to support disabling the cache (feel free to reopen the issue).
OK, at least now people who need a
I don't think people will post issues about it since they can just disable the cache.
Thank you very much! It was a very exciting discussion.
How about using a keyword arg to give the user explicit control over whether or not the cache should be used?
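Something like the following, perhaps; the use_cache parameter is purely hypothetical and does not exist in asyncpg today:

```python
async def fetch_uncached(conn):
    # Hypothetical API shape for the suggestion above; 'use_cache'
    # is not a real asyncpg parameter.
    return await conn.fetchrow(
        'SELECT * FROM abc WHERE i = $1', 1,
        use_cache=False,
    )
```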
If built locally, which version of Cython was used: Cython-0.25.1
While I was commenting on #72 I noticed that asyncpg caches prepared statements by default for methods like fetch and execute of the Connection class. I.e. it uses long-lived objects by default instead of just doing what it declares (sending a query to a connection), because for explicit long-lived statements there is a special "prepare" method.

Prepared statements allow separating a query from its parameters, but they also have unobvious disadvantages.
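Before going through them, here is a sketch contrasting the implicit cache with the explicit prepare() API; the table abc and connection defaults are assumptions:

```python
import asyncpg

async def demo():
    conn = await asyncpg.connect()

    # Implicit: the statement prepared for this query is stored in the
    # connection's cache and silently reused by later identical calls.
    await conn.fetch('SELECT * FROM abc WHERE i = $1', 1)

    # Explicit: the caller owns the long-lived prepared statement.
    stmt = await conn.prepare('SELECT * FROM abc WHERE i = $1')
    await stmt.fetch(1)

    await conn.close()
```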
The first one is holding the types of the parameters and resulting rows. If database objects have changed since the first run of a prepared statement ("PS"), all subsequent queries through the PS fail with PG's error, mapped to asyncpg.exceptions.FeatureNotSupportedError: cached plan must not change result type (see #72). The most frequent cause of schema changes is a migration tool, and migration tools are used by every non-"Hello, world" project.
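A repro sketch of this failure mode, with a hypothetical table abc and a second connection playing the migration tool:

```python
async def repro(conn, migration_conn):
    await conn.fetch('SELECT * FROM abc')  # statement is now cached

    # A concurrent schema change, e.g. from a migration tool:
    await migration_conn.execute('ALTER TABLE abc ADD COLUMN j int')

    # With the statement cache enabled, reusing the cached statement
    # fails with "cached plan must not change result type".
    await conn.fetch('SELECT * FROM abc')
```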
The second one is holding a "generic" plan[1] after five[2] runs of a PS, which leads to inefficiency under some circumstances, because a generic plan does not consider whether the concrete parameters match the table's statistics (see below). This can be very dramatic for OLAP queries. It is made worse by the inability to flush a Connection's cache (other than disabling it entirely at the connecting stage by passing statement_cache_size=0 to Connection.__init__). Also note that many users use the Connection class indirectly via the Pool class and can miss the statement_cache_size argument.

The third one is the inability to use external connection poolers like pgbouncer. There are many issues about this in the tracker, and there is a special question about it in the new-issue template.
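For reference, a sketch of how the cache can be disabled today, both directly and through a pool (relevant when running behind pgbouncer); connection defaults are assumed:

```python
import asyncpg

async def setup():
    conn = await asyncpg.connect(statement_cache_size=0)
    pool = await asyncpg.create_pool(statement_cache_size=0)
    return conn, pool
```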
The fourth one is hidden occupation of PS names, which can lead to errors:

asyncpg.exceptions.DuplicatePreparedStatementError: prepared statement "stmt_1" already exists
So users may write their SQL queries consciously avoiding PS because of these disadvantages, but they can still get either a ruined service (until its restart), because all important queries fail due to an external migration tool, or dramatically decreased speed due to an ineffective generic plan. And it is just because they have not paid attention to the statement_cache_size argument.

From my (user's) point of view these issues appear out of nowhere, because I work with Postgres with invisible (but important) help from asyncpg. In my mind I don't work with the library directly, the same way I don't think about moving my wrist when I move a pointer across a screen.
The current behavior with the cache disabled could also be more efficient. The libpq implementation of PQsendQueryParams uses unnamed prepared statements, which adds only one round-trip to the server (the "parse" stage) compared to the PS behavior, whereas asyncpg sends one more request (besides "prepare") to Postgres to clear the just-used PS. Unnamed PS last until the next "parse" command, so they don't require clearing (a sketch of the two message flows follows below).

Your library can be fast. It is great! But make it stable by default, with minimum side effects from its internals, and let people make their own decision. Not all of them want the top speed (some are ready to be slower in exchange for the convenience of having a request parsed for them - see issue #9).
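To illustrate the wire-level difference, plain tuples standing in for protocol messages (this is not asyncpg's actual internal representation):

```python
# Unnamed statement (name ''): no Close round trip is ever needed,
# because the unnamed slot is overwritten by the next Parse.
unnamed_flow = [
    ('Parse', ''),
    ('Bind', '', ''),       # portal '', statement ''
    ('Execute', ''),
    ('Sync',),
]

# Named statement that is prepared and then discarded: one extra
# Close request per query, as described above.
named_flow = [
    ('Parse', 'stmt_1'),
    ('Bind', '', 'stmt_1'),
    ('Execute', ''),
    ('Close', 'stmt_1'),
    ('Sync',),
]
```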
Then, if you declare that it is possible to cache execution plans using PS, I (as a user) expect the library to do its best to preserve the original behaviour (I mean catching "cached plan must not change result type"), except in cases where it is really not possible (inside transactions, except for the first command there - see the discussion in #72).
===

My proposal:

P.S.: About the proposed solution at #72 (comment): I don't think it is wise to offer event triggers; that is the application level, not the library one. Also: pg_catalog is visible in the middle of a transaction. I've just checked it.

P.P.S.:
I wrote a little bench which shows the speed decrease in specific circumstances due to generic plans for prepared statements. I got the following result:

Concrete values vary from run to run, but Stage 2 is always much bigger than the other ones (with uvloop it is even more dramatic). Pay attention to the difference between stages 2 and 3. The reason for the second stage's slowness is the usage of a prepared statement which holds a generic plan (after the first stage) with a SeqScan (I recommend you repeat it with real prepared statements and run "analyze" on it instead of "execute"; see [1]). The third stage uses a different prepared statement (due to a '+' sign) with a different execution plan in the same connection. The fourth stage uses completely different statements and pretends to be PQsendQueryParams from libpq. There is a 1.5x to 2x speed loss, but I do not know how much time the "Connection._get_statement" internals take (PG's parse time is 0.040ms! + round-trip time).

For the concrete example the speed loss between:

I think when users hit that speed loss they'll search anywhere except the "statement_cache_size" argument. Also those who know their data and require really maximal speed will use approaches like Stage 6 (2x faster than Stage 3!), i.e. they will not use your cache at all.
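The original benchmark script is not shown here; below is a minimal sketch of the same idea, assuming the skewed abc(i) table from the earlier examples and default connection settings:

```python
import asyncio
import time
import asyncpg

async def main():
    conn = await asyncpg.connect()
    query = 'SELECT count(*) FROM abc WHERE i = $1'

    # One cached prepared statement: after five runs the server may
    # settle on a generic plan that ignores the skew in column i.
    start = time.monotonic()
    for _ in range(100):
        await conn.fetchval(query, 2)
    print('cached PS:      ', time.monotonic() - start)

    # Textually unique queries: every call is parsed anew and gets a
    # custom plan for its concrete parameter.
    start = time.monotonic()
    for n in range(100):
        await conn.fetchval('{} -- {}'.format(query, n), 2)
    print('unique queries: ', time.monotonic() - start)

    await conn.close()

asyncio.run(main())
```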
[1] https://www.postgresql.org/docs/current/static/sql-prepare.html#SQL-PREPARE-NOTES
[2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/cache/plancache.c#l1037 (if the file has changed, search for "if (plansource->num_custom_plans < 5)" around that line)