Reimplement prepared statements with LRU cache and statement deduplication #618
8d15135 to cfe8e9f
…port mocking response packets and close
… cached plans with ideas about how to further improve it
@@ -568,6 +556,9 @@ pub struct Pool {
#[serde(default)] // False
pub log_client_parameter_status_changes: bool,

#[serde(default = "Pool::default_prepared_statements_cache_size")]
Moved to the pool instead of global because, when this is enabled, you pay a small packet inspection penalty to determine whether packets are using named statements or not.
src/server.rs
Outdated
@@ -957,6 +970,42 @@ impl Server {
if self.in_copy_mode {
self.in_copy_mode = false;
}
// TODO: consider logging a warning here

if self.prepared_statement_enabled {
Let's say we have a statement that uses select * on a table, but a new column is added after we prepared it. PG will send a "cached plan must not change result type" error if we try to use this statement.
This change tries to identify this type of error and run DEALLOCATE ALL on the server connection to force re-prepares. Clients will still see errors the first time for each server that hasn't deallocated, but this will help clean up the pool.
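A minimal sketch of that detection step, assuming the error is recognized by matching on the message text. ServerConn, mark_prepared, is_prepared, and on_error are hypothetical names for illustration only, not the PR's actual code:

```rust
use std::collections::HashSet;

/// Message text PG reports when a cached plan's result type has changed.
const CACHED_PLAN_ERROR: &str = "cached plan must not change result type";

/// Hypothetical stand-in for a server connection's prepared statement state.
#[derive(Default)]
pub struct ServerConn {
    prepared: HashSet<String>,
}

impl ServerConn {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record that a statement name has been prepared on this connection.
    pub fn mark_prepared(&mut self, name: &str) {
        self.prepared.insert(name.to_string());
    }

    pub fn is_prepared(&self, name: &str) -> bool {
        self.prepared.contains(name)
    }

    /// Inspect an error message from the server. If it is the cached plan
    /// error, forget every statement we believed was prepared and return
    /// the SQL that should be sent to the server to force re-prepares.
    pub fn on_error(&mut self, message: &str) -> Option<&'static str> {
        if message.contains(CACHED_PLAN_ERROR) {
            self.prepared.clear();
            Some("DEALLOCATE ALL")
        } else {
            None
        }
    }
}
```

The client that triggered the error still receives it; only later uses on the same server connection benefit from the cleanup.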
fb0d253 to d791f06
* Cosmetic fixes * fix test
1ee3df3 to db70499
src/client.rs
Outdated
pool: &ConnectionPool,
server: &mut Server,
address: &Address,
) -> Result<(), Error> {
// We want to update this in the LRU to know this was recently used and add it if it isn't there already
// This could be the case if it was evicted or if doesn't exist (ie. we reloaded and it got removed)
pool.register_parse_to_cache(hash, parse);
if let Some(new_parse) = pool.register_parse_to_cache(hash, &parse) {
// If the pool has renamed this parse, we need to update the client cache with the new name
Who's doing this and why?
If the pool is cleared and we are generating new names for the Parse messages, then we want to update those within the client. I was thinking about ways to handle DDL changes better, but I think I'll do that in a follow-up PR instead and exclude this change.
4ac4832 to db70499
This is really cool, thank you!
This PR reimplements prepared statements in PgCat. There was a large latency regression when running PgCat with multiple clients and prepared statements, most likely due to cache misses and the need to prepare multiple statements on the server.
The new approach hashes the Parse content (excluding the name) and uses it as the key to a map that holds rewritten Parse messages. This greatly increases the number of cache hits and deduplicates identical statements that different clients send under different names.
Prepared statements in action
Running a test against a Baseline PgCat with the extended protocol and a Feature PgCat with prepared statements, we were able to get some data. The test runs 12 unique prepared statements against an empty table, which helps us exclude actual PG compute from the other parts of the query execution process.
We were able to get a slight latency win, about 5% faster.
But even more interestingly we saw the prepared statement database instance's CPU usage was less than half of the extended protocol instance. While an extreme case, this helps to show how much time can be saved by pre-planning queries and reducing the amount of work PG has to do.
Other notes and features:
* Detects the "cached plan must not change result type" error message and runs DEALLOCATE ALL on the affected server connection. This helps to fix things when there are DDL changes that invalidate cached plans.
* … prepared_statements variable and moves the prepared_statements_cache_size to the pool configuration level. This is because, when enabled, prepared statements incur a small packet inspection penalty to determine whether a statement is named or not.
* Equivalent latency to pgbouncer, all running with 1000 cache size, 50 max connections and 1 thread.
Pgbouncer
PgCat
Implementation details:
TLDR:
ConnectionPool: The connection pool has a new attribute which looks like this:
cache: LruCache<u64, Arc<Parse>>
It is an LRU cache where the key is a hash of the contents of the Parse packet excluding the name, i.e. (query string, number of params, param types). The value is an Arc of the rewritten Parse message (it's an Arc to avoid duplicating data that is given to the clients).
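As a rough illustration of this keying scheme, the sketch below hashes a Parse while ignoring its name and hands out one shared, renamed Arc<Parse> per distinct statement. The Parse fields, parse_hash, PoolCache, get_or_insert, and the POOL_STMT_ prefix are hypothetical names chosen for the example, and a plain HashMap stands in for the LRU, so capacity and eviction are not shown:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Simplified stand-in for a Parse message (hypothetical fields).
#[derive(Debug, Clone)]
pub struct Parse {
    pub name: String,
    pub query: String,
    pub param_types: Vec<i32>,
}

/// Hash everything except the statement name, so the same query sent
/// under two different names maps to the same cache key.
pub fn parse_hash(parse: &Parse) -> u64 {
    let mut hasher = DefaultHasher::new();
    parse.query.hash(&mut hasher);
    // Hashing a Vec also feeds in its length, covering "number of params".
    parse.param_types.hash(&mut hasher);
    hasher.finish()
}

/// Global counter used to generate new statement names.
static NAME_COUNTER: AtomicUsize = AtomicUsize::new(0);

/// Pool-side cache. A HashMap is used here in place of an LRU.
#[derive(Default)]
pub struct PoolCache {
    cache: HashMap<u64, Arc<Parse>>,
}

impl PoolCache {
    /// Return the shared rewritten Parse for this statement, creating it
    /// under a freshly generated name if it is not cached yet.
    pub fn get_or_insert(&mut self, parse: &Parse) -> Arc<Parse> {
        let key = parse_hash(parse);
        let entry = self.cache.entry(key).or_insert_with(|| {
            let mut rewritten = parse.clone();
            let id = NAME_COUNTER.fetch_add(1, Ordering::SeqCst);
            rewritten.name = format!("POOL_STMT_{}", id);
            Arc::new(rewritten)
        });
        Arc::clone(entry)
    }
}
```

Two clients that send the same query under different names receive clones of the same Arc, so only one copy of the Parse contents is held in memory.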
When a Parse comes in, the pool determines whether the query already exists (ignoring the name). If it exists, the pool clones the Arc<Parse> and gives it to the client. If it doesn't exist, the pool creates a new name for the packet, which will be used to prepare it against the server. The name is based on a global counter that increments every time we need to generate a new name.
If we exceed the capacity of the LRU, then the Arc<Parse> is simply dropped. There is no other management to be done, since any clients that still need it have a copy of the Parse and can just add it back to the cache if needed.
Client: The client has a new attribute which looks like this:
prepared_statements: HashMap<String, Arc<Parse>>
This is a mapping of the prepared statement names set by the client to the rewritten Parse messages. When a new Parse comes in, the client checks against the connection pool for the rewritten Parse message and stores it in this map. Any later messages that reference a statement (Bind, Describe) are rewritten to use the rewritten Parse message's name.
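The client-side mapping could be sketched as follows. This is only an illustration under simplified assumptions: the Parse and Bind structs, and the register and rewrite_bind methods, are hypothetical and carry just the fields needed to show the renaming:

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Simplified rewritten Parse message (hypothetical fields).
#[derive(Debug)]
pub struct Parse {
    pub name: String,
    pub query: String,
}

/// Simplified Bind message: only the statement name it refers to.
#[derive(Debug)]
pub struct Bind {
    pub prepared_statement: String,
}

#[derive(Default)]
pub struct Client {
    /// Client-chosen statement name -> shared rewritten Parse.
    prepared_statements: HashMap<String, Arc<Parse>>,
}

impl Client {
    /// Remember which rewritten Parse the client's own name refers to.
    pub fn register(&mut self, client_name: &str, rewritten: Arc<Parse>) {
        self.prepared_statements
            .insert(client_name.to_string(), rewritten);
    }

    /// Rewrite a Bind so it uses the pool-assigned name.
    /// Returns None if the client never parsed a statement with that name.
    pub fn rewrite_bind(&self, bind: Bind) -> Option<Bind> {
        let parse = self.prepared_statements.get(&bind.prepared_statement)?;
        Some(Bind {
            prepared_statement: parse.name.clone(),
        })
    }
}
```

A Describe message would be rewritten the same way, by looking up the client's name and substituting the name stored in the rewritten Parse.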
The main reason we're using an Arc for the Parse messages is that clients that use a prepared statement will all need the original contents of the Parse (especially when the message is evicted from the ConnectionPool), and that can take up memory. By using an Arc, we have one copy of the Parse and each client stores a reference to it. The Parse is released from memory when the client is dropped and it no longer exists on the ConnectionPool.
Server: The server has a new attribute which looks like this:
prepared_statement_cache: LruCache<String, ()>
The server needs to know which prepared statements it has, so this mapping stores the rewritten names in an LRU. The client prepares the Parse on the server if needed (the server might not already have it) and updates (adds/promotes) the name of the statement in the server's prepared_statement_cache.
If a statement needs to be evicted from the cache, pgcat will send a close message to the server to drop it.
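The add/promote/evict behaviour can be sketched with a small name-only LRU built on the standard library. ServerStatementCache and touch are hypothetical names, and the linear scan is a simplification that a real LRU cache type would avoid:

```rust
use std::collections::VecDeque;

/// Tracks which rewritten statement names exist on one server connection.
pub struct ServerStatementCache {
    capacity: usize,
    /// Front = most recently used, back = least recently used.
    names: VecDeque<String>,
}

impl ServerStatementCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least 1");
        Self {
            capacity,
            names: VecDeque::with_capacity(capacity + 1),
        }
    }

    pub fn contains(&self, name: &str) -> bool {
        self.names.iter().any(|n| n.as_str() == name)
    }

    /// Add the name, or promote it if already present. If adding pushes the
    /// cache over capacity, the least recently used name is returned so the
    /// caller can send a close message for it to the server.
    pub fn touch(&mut self, name: &str) -> Option<String> {
        if let Some(pos) = self.names.iter().position(|n| n.as_str() == name) {
            let existing = self.names.remove(pos).expect("position is in bounds");
            self.names.push_front(existing);
            return None;
        }
        self.names.push_front(name.to_string());
        if self.names.len() > self.capacity {
            self.names.pop_back()
        } else {
            None
        }
    }
}
```

With a capacity of 2, touching a, b, a, and then c evicts b, because a was promoted by its second use.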