Implement clientauth feature #227
Conversation
Add a needs_processing flag to each queue entry to determine when a background worker can pick up a task. Add ConditionVariablePrepareToSleep calls to each CV predicate loop. Do not call ConditionVariableSleep without first checking the loop predicate. We are still seeing an occasional race condition during heavy load (more connections than there are queue slots) where the last one or two connections are not returned by clientauth_hook and eventually time out.
I'm doing some load testing by opening a few thousand connections in parallel, and I'm encountering a race condition every few tries that results in the last one or two connections being lost. My suspicion is that ConditionVariableSleep is not playing well with LWLocks. This is the pattern for waiting on a condition to be true:
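A minimal sketch of that pattern, using hypothetical struct and field names in place of the actual code:

/*
 * Sketch of the wait loop described above (hypothetical field names):
 * check the predicate under the lock, release the lock, then sleep on the CV.
 */
LWLockAcquire(clientauth_ss->lock, LW_EXCLUSIVE);
while (!clientauth_ss->requests[idx].done_processing)
{
    LWLockRelease(clientauth_ss->lock);
    ConditionVariableSleep(&clientauth_ss->requests[idx].client_cv,
                           PG_WAIT_EXTENSION);
    LWLockAcquire(clientauth_ss->lock, LW_EXCLUSIVE);
}
LWLockRelease(clientauth_ss->lock);
ConditionVariableCancelSleep();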
Sources that discuss condition variables (e.g. Wikipedia) say that LWLockRelease and ConditionVariableSleep must be atomic, or else a race condition can occur where 1. the lock is released, 2. another thread updates the state and sends the signal, and then 3. ConditionVariableSleep is called. This results in a dropped signal, and the logs from when the dropped-connection bug happens suggest that this is what is going on here. Postgres provides a ConditionVariablePrepareToSleep method which adds a thread to the CV wait list before sleeping, so I tried adding that, but the issue isn't fixed. I can't tell if ConditionVariablePrepareToSleep is supposed to fix this bug or if there's something else wrong with the logic? I would appreciate any feedback/advice, thanks!
I can't tell if ConditionVariablePrepareToSleep is supposed to fix this bug or if there's something else wrong with the logic? I would appreciate any feedback/advice, thanks!
The logic seems fine.
Can you try adding a debug line here:
https://github.com/postgres/postgres/blob/master/src/backend/storage/lmgr/condition_variable.c#L77-L79
Something like logging the MyProcPid (backend_pid) and the CV it's listening on?
And when the BGW signals, add logging here:
to capture the CV being signalled and the PGPROC pid it signals.
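For example (hypothetical debug lines; the exact variable names depend on what's in scope at each spot):

/* In the waiting backend, e.g. right before sleeping (hypothetical): */
elog(LOG, "backend %d listening on cv %p", MyProcPid, (void *) cv);

/* In the BGW, right before ConditionVariableSignal (hypothetical): */
elog(LOG, "bgw %d signalling cv %p", MyProcPid, (void *) cv);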
I wonder, when the BGW signals the CV, whether the client auth backend is actually listening on it?
Or maybe we have multiple processes listening on the CV, and Signal only signals the oldest, while Broadcast signals all of them.
Broadcast would be incorrect though, because we're only processing one entry.
Also a general comment: if the client disconnects or the client auth backend terminates, do we end up with a stale shared memory entry perpetually?
src/clientauth.c
Outdated
static void clientauth_sighup(SIGNAL_ARGS);

void clientauth_init(void);
bool can_allow_without_executing(void);
These can be static? (Other than clientauth_init)
Sorry, are you asking why they are static or saying that they should be static?
That they should*
src/clientauth.c
Outdated
if (clientauth_ss->requests[idx].needs_processing)
{
    clientauth_ss->requests[idx].needs_processing = false;
I think one risk you run into with this is that if the BGW is terminated, we would have needs_processing = false even though we haven't actually processed the event.
I imagine you're doing this to allow parallelization to happen?
Yeah, this makes sure the other BGWs that may have woken up at the same time know that this entry has been picked up and they shouldn't try to pick it up themselves. Instead they can move on to the next entry in parallel.
I think you're right that the client would be left hanging though. Previously I had this loop check for done_processing instead of needs_processing, perhaps that would help.
I think you'd still have duplicate work, and you also run the risk of a second BGW potentially populating an overlapping entry with new data.
src/clientauth.c
Outdated
while (true)
{
    LWLockAcquire(clientauth_ss->lock, LW_EXCLUSIVE);
    ConditionVariablePrepareToSleep(&clientauth_ss->requests[idx].client_cv);
OOC is there a reason we put ConditionVariablePrepareToSleep within the while loop instead of before?
IIUC ConditionVariablePrepareToSleep adds the thread to the CV's wait queue. I put it in the while loop before LWLockRelease so that the thread is added to the wait queue before releasing the lock, preventing missed wakeups. Though maybe the thread only needs to be added once per wait loop? And this doesn't fix the bug anyway.
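For reference, the pattern documented in condition_variable.c prepares once before the loop and re-checks the predicate each time the sleep returns; a sketch with the same hypothetical field names:

/*
 * Sketch: prepare once, then re-check the predicate on every wakeup.
 * (Hypothetical struct and field names.)
 */
bool done = false;

ConditionVariablePrepareToSleep(&clientauth_ss->requests[idx].client_cv);
while (!done)
{
    LWLockAcquire(clientauth_ss->lock, LW_SHARED);
    done = clientauth_ss->requests[idx].done_processing;
    LWLockRelease(clientauth_ss->lock);

    if (!done)
        ConditionVariableSleep(&clientauth_ss->requests[idx].client_cv,
                               PG_WAIT_EXTENSION);
}
ConditionVariableCancelSleep();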
Previously the client backend checked num_requests to see if there are available entries to load into. However, when a client finishes and decrements num_requests, we don't know which index it was, so we can't assume that the next index is free. This commit replaces num_requests with an available_entry flag for each queue entry. The client backend directly checks whether the next entry is available instead. This means that a client backend can be blocked by a single long-running query if it happens to be occupying the next queue entry. We should think about a way for client backends to "skip" entries if a later one is available.
The race condition was caused by the assumption that when a client finishes and decrements num_requests, the next queue index is free. This is fixed by adding a flag to each entry denoting whether it is available. This raises two questions, though:
Each worker is responsible for its own partition of entries to prevent clobbering and allow graceful handling of terminated worker processes. Let client backends pick which slot (and therefore worker) to use based on their own PID, which should be roughly sequential.
I pushed a change that assigns each worker a partition of the queue that it is solely responsible for, instead of letting workers pick whatever entry is available for processing. This allows workers terminated mid-process to come back up and resume processing, and gives a stronger guarantee that there is no clobbering between background workers trying to process the same entry. Rather than inserting sequentially, a client backend decides which queue entry to insert into by taking its PID modulo the queue size. However, in practice there is still some clustering going on. During load tests, there is a tail of clients that are waiting for the same handful of background workers to process them. We can discuss ways to avoid this or whether we should go back to the original approach.
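Roughly, with hypothetical constants and helper names (the real sizes live in clientauth.c), the slot/worker mapping looks like:

/* Hypothetical sizes; the real constants live in clientauth.c. */
#define CLIENT_AUTH_MAX_PENDING_ENTRIES 64
#define CLIENT_AUTH_NUM_WORKERS 8
#define ENTRIES_PER_WORKER (CLIENT_AUTH_MAX_PENDING_ENTRIES / CLIENT_AUTH_NUM_WORKERS)

/* Map a backend PID to its queue slot, and a slot to the worker that owns it. */
static int
slot_for_pid(int pid)
{
    return pid % CLIENT_AUTH_MAX_PENDING_ENTRIES;
}

static int
worker_for_slot(int slot)
{
    return slot / ENTRIES_PER_WORKER;
}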
We can discuss ways to avoid this or whether we should go back to the original approach.
I think using rand() is fine if we've found that PID isn't "sequential" enough and there's still some clustering.
Although, like you mentioned, I think the only time this might actually matter is when there's a connection storm, such as a connection pool reconnecting and wanting to establish a bunch of new connections, or app servers re-connecting ASAP after a restart. One risk we run into is that if we're "unlucky" enough, we lose the parallelization.
src/clientauth.c
Outdated
static void clientauth_hook(Port *port, int status)
{
    /* Determine the queue index that this client will insert into based on its PID */
    int idx = MyProc->pid % CLIENT_AUTH_MAX_PENDING_ENTRIES;
Ahh, you're %-ing this to get the idx to insert into. I guess my earlier comments on fairness don't really matter too much? Well, they might.
OOC what were the trade-offs in doing this approach as opposed to something like:
idx = MyProc % BGW and then start from some idx to try to insert into
I think the fairness comment still applies? The worker will still work on its first entry before moving on to other entries, even if those were filled earlier. If client 1 finishes with entry 1 and signals client 5 to insert, client 5 will get worker priority over clients 2, 3, 4 that occupied entries 2, 3, 4.
If we put aside the fairness issue (which I think we can resolve by having the worker process all entries per wake, or pick entries in a staggered way), I think the two % approaches are the same and we'll observe the same clustering behaviour.
Either the clients will pick distinct entries in the whole queue and then get folded down into BGW buckets, or they'll pick BGW buckets and then spread out across entries in the queue. In the latter approach, clients could in theory find an empty entry earlier instead of having to wait for a single predetermined entry, but since that entry is handled by the same worker anyway, the client would end up waiting the same amount of time (or at least in aggregate, the total wait time among clients would be the same).
To prevent workers from biasing towards processing their first entry, add an offset to the starting index in the for loop that increments every time an entry is picked up for processing.
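A sketch of that rotated scan (hypothetical names and helper; clientauth_ss and process_entry stand in for the real shared state and processing code):

/*
 * Sketch of the rotated scan described above. 'offset' advances each time an
 * entry is picked up, so a worker does not always favor the first slot of
 * its partition.
 */
static void
scan_partition(int first_slot, int nentries)
{
    static int offset = 0;
    int        i;

    for (i = 0; i < nentries; i++)
    {
        int slot = first_slot + (i + offset) % nentries;

        if (clientauth_ss->requests[slot].needs_processing)
        {
            offset = (offset + 1) % nentries;
            process_entry(slot);    /* hypothetical helper */
            break;
        }
    }
}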
Did some testing:
New changes:
Move query logic to a dedicated function and capture errors from feature_proc.
src/feature.c
Outdated
bool
check_string_in_guc_list(const char *str, const char *guc_var, const char *guc_name)
{
    bool skip = false;
nit: spacing? Or is this how pg_indent does it which is weird ehh
s/skip/match ?
Ah thanks for catching that, yeah pg_indent likes to do this apparently
src/clientauth.c
Outdated
 * clientauth_hook
 */
LWLockAcquire(clientauth_ss->lock, LW_EXCLUSIVE);
clientauth_ss->requests[idx].done_processing = true;
I think you should move done_processing to the very end in case the BGW gets terminated in between?
1. Fix my pg_indent setup
2. Set enable_clientauth context to POSTMASTER and check its value in clientauth_init to determine whether to start background workers. If a user is not using clientauth, background workers will not sit idle in the background.
3. Move done_processing to the very end of the BGW loop.
4. Remove some logging.
If a client backend terminates unexpectedly after setting entry_available = false, it can block every subsequent client from using that entry. This commit adds a PID field to the request entry struct and allows clients to check if the process currently holding the entry still exists. If it doesn't, then it can set entry_available to true. Signalling order is rearranged to prevent deadlocks where a backend is terminated after entry_available is set to false but before signalling anyone else. In addition to signalling the current client, background workers signal the next waiting client to check if the current client still exists.
Pushed a fix for the stale entry bug. Copying the commit message here:
I realized the message is not super clear, but the upshot is that after processing, background workers signal waiting clients to check whether the PID of the current client still exists. If it doesn't, then the waiting client can go ahead. I tested this by setting CLIENT_AUTH_MAX_PENDING_ENTRIES to 1, starting a connection and SIGTERMing the backend, then starting another connection. Previously the second connection would be blocked forever; now it goes through as expected.
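One way to implement that liveness check (a sketch; not necessarily the exact mechanism used in the PR):

#include <stdbool.h>
#include <signal.h>
#include <errno.h>

/*
 * Sketch: kill(pid, 0) sends no signal but reports whether the process
 * exists; ESRCH means the PID is gone and the entry can be reclaimed.
 */
static bool
entry_holder_is_alive(int pid)
{
    if (pid <= 0)
        return false;
    if (kill(pid, 0) == 0)
        return true;
    return (errno != ESRCH);
}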
If IsBinaryUpgrade, then do not start background workers and skip all logic in clientauth_hook.
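Roughly (sketch):

/* Sketch: skip all clientauth work during pg_upgrade. */
if (IsBinaryUpgrade)
    return;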
src/clientauth.c
Outdated
 * Copy the entry to local memory and then release the lock to unblock
 * other workers/clients.
 */
LWLockAcquire(clientauth_ss->lock, LW_EXCLUSIVE);
I wonder if this one can be changed to LW_SHARED? IIUC only writes to shared memory would actually need LW_EXCLUSIVE. Probably applies to other places?
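i.e., something like this (sketch; ClientAuthRequest is a hypothetical type name for the entry struct):

/* Read-only copy of the entry only needs the lock in shared mode. */
ClientAuthRequest local_entry;

LWLockAcquire(clientauth_ss->lock, LW_SHARED);
local_entry = clientauth_ss->requests[idx];
LWLockRelease(clientauth_ss->lock);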
If the user's function returns a table, SELECT func combines all the columns into one, whereas SELECT * FROM lets us pick the first column out of the first returned row. Not sure this really matters since we aren't officially supporting functions that return table types anyway, but this change doesn't affect non-table return values. Add a test case to cover functions that return tables, and add a test case that covers a void-returning function with no error.
Postgres commit 414f6c0 makes this change
We want enable_clientauth to be SIGHUP in case users lock themselves out by setting it to REQUIRE without any functions registered. The hooks and background workers will register only if enable_clientauth is ON or REQUIRE at postmaster startup. We should make this very clear in the docs.
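For illustration, a sketch of defining the GUC at PGC_SIGHUP (hypothetical GUC name, enum values, and variable names; the real definition lives in the extension source):

#include "postgres.h"
#include "utils/guc.h"

/* Hypothetical enum values standing in for the extension's own. */
typedef enum { FEATURE_OFF, FEATURE_ON, FEATURE_REQUIRE } FeatureMode;

static const struct config_enum_entry enable_clientauth_options[] = {
    {"off", FEATURE_OFF, false},
    {"on", FEATURE_ON, false},
    {"require", FEATURE_REQUIRE, false},
    {NULL, 0, false}
};

static int enable_clientauth_setting = FEATURE_OFF;

static void
define_clientauth_guc(void)
{
    /* PGC_SIGHUP lets an administrator change the value with a config reload. */
    DefineCustomEnumVariable("pgtle.enable_clientauth",
                             "Controls the clientauth feature.",
                             NULL,
                             &enable_clientauth_setting,
                             FEATURE_OFF,
                             enable_clientauth_options,
                             PGC_SIGHUP,
                             0,
                             NULL, NULL, NULL);
}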
src/clientauth.c
Outdated
hookargs[0] = CStringGetTextDatum(port_subset_str);
hookargs[1] = Int32GetDatum(*status);

SPI_execute_with_args(query, SPI_NARGS_2, hookargtypes, hookargs, hooknulls, true, 0);
I think we should check the return value here too? Might not be necessary but what do you think
https://www.postgresql.org/docs/current/spi-spi-execute-with-args.html
https://www.postgresql.org/docs/current/spi-spi-execute.html
Yeah makes sense, adding the same handling as in passcheck
Is it always SPI_OK_SELECT even if an INSERT was done by the function? i.e. if we do something like SELECT * FROM insert_foo();
I think it is, but I'm not sure I can guarantee that; checking < 0 also makes sense.
Tested with INSERT and it does return SPI_OK_SELECT -- we also use the same logic in passcheck
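A sketch of the resulting check around the call shown above (hypothetical error text), mirroring the passcheck handling:

/*
 * SELECT * FROM func() reports SPI_OK_SELECT even when func() performs DML,
 * so treat anything else (including negative error codes) as a failure.
 */
int ret = SPI_execute_with_args(query, SPI_NARGS_2, hookargtypes, hookargs,
                                hooknulls, true, 0);

if (ret != SPI_OK_SELECT)
    ereport(ERROR,
            (errmsg("clientauth query failed: %s",
                    SPI_result_code_string(ret))));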
include/compatibility.h
Outdated
#define WAIT_EVENT_MESSAGE_QUEUE_PUT_MESSAGE WAIT_EVENT_MQ_PUT_MESSAGE
#define WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL WAIT_EVENT_WAL_SENDER_WAIT_WAL
#define WAIT_EVENT_MESSAGE_QUEUE_SEND WAIT_EVENT_MQ_SEND
#define WAIT_EVENT_MESSAGE_QUEUE_RECEIVE WAIT_EVENT_MQ_RECEIVE
This one's duplicated, and we're only using one, right? We might not need to declare all of them.
Issue #, if available:
Description of changes:
Implementation of the clientauth feature, allowing users to register trusted language functions with the Postgres ClientAuthentication_hook. Uses background workers to execute database functions.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.