Implement clientauth feature #227

Merged
merged 38 commits into from Sep 7, 2023

Conversation

adamguo0
Contributor

Issue #, if available:

Description of changes:

Implementation of clientauth feature, allowing users to register trusted language functions to the Postgres ClientAuthentication_hook. Uses background workers to execute database functions.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
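
For orientation, a minimal sketch of how a ClientAuthentication_hook is wired up; the prev_clientauth_hook chaining shown here is illustrative, not necessarily the exact code in this PR:

#include "postgres.h"
#include "libpq/auth.h"

static ClientAuthentication_hook_type prev_clientauth_hook = NULL;
static void clientauth_hook(Port *port, int status);

void
clientauth_init(void)
{
    /* Chain onto any previously installed hook, then install ours */
    prev_clientauth_hook = ClientAuthentication_hook;
    ClientAuthentication_hook = clientauth_hook;
}

static void
clientauth_hook(Port *port, int status)
{
    if (prev_clientauth_hook)
        prev_clientauth_hook(port, status);

    /* Hand (port, status) to a background worker via shared memory and wait */
}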

Add a needs_processing flag to each queue entry to determine when a
background worker can pick up a task. Add
ConditionVariablePrepareToSleep calls to each CV predicate loop. Do not
call ConditionVariableSleep without first checking the loop predicate.

We are still seeing an occasional race condition during heavy load (more
connections than there are queue slots) where the last one or two
connections are not returned by clientauth_hook and eventually time out.
@adamguo0
Contributor Author

adamguo0 commented Aug 21, 2023

I'm doing some load testing by opening a few thousand connections in parallel, and I'm encountering a race condition every few tries that results in the last one or two connections being lost.

My suspicion is that ConditionVariableSleep is not playing well with LWLocks. This is the pattern for waiting on a condition to be true:

while (true)
{
    LWLockAcquire(lock, LW_EXCLUSIVE);
    if (condition)
    {
        LWLockRelease(lock);
        break;
    }
    // condition not met: release the lock, then wait for signal
    LWLockRelease(lock);
    ConditionVariableSleep(cv, wait_event_info);
}

Sources that discuss condition variables (e.g. Wikipedia) say that LWLockRelease and ConditionVariableSleep must be atomic, or else a race condition can occur where 1. the lock is released, 2. another process updates the state and sends the signal, and then 3. ConditionVariableSleep is called. This results in a dropped signal. I suspect this is what's happening, because in the logs from when the dropped-connection bug occurs, clientauth_hook goes to sleep waiting for processed_entry to become true and then never wakes up.

Postgres provides a ConditionVariablePrepareToSleep function which adds a process to the CV's wait list before sleeping, so I tried adding that, but the issue isn't fixed. I can't tell if ConditionVariablePrepareToSleep is supposed to fix this bug or if there's something else wrong with the logic? I would appreciate any feedback/advice, thanks!
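
For reference, the usage pattern documented in Postgres's src/include/storage/condition_variable.h is roughly the following (wait_event_info stands in for whatever wait event the caller reports):

ConditionVariablePrepareToSleep(cv);    /* join the CV's wait list up front */
while (!condition)
    ConditionVariableSleep(cv, wait_event_info);
ConditionVariableCancelSleep();         /* leave the wait list once the condition holds */

Because PrepareToSleep registers the process on the wait list before the condition is re-checked, a signal that arrives between the check and the sleep is not lost.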

Contributor

@JohnHVancouver JohnHVancouver left a comment

> I can't tell if ConditionVariablePrepareToSleep is supposed to fix this bug or if there's something else wrong with the logic? I would appreciate any feedback/advice, thanks!

The logic seems fine.
Can you try adding a debug line here:
https://github.com/postgres/postgres/blob/master/src/backend/storage/lmgr/condition_variable.c#L77-L79

Something like logging the MyProcPid (backend_pid) and the CV it's listening on?

And when BGW signals add logging here:

https://github.com/postgres/postgres/blob/master/src/backend/storage/lmgr/condition_variable.c#L263-L266

To get the signal of the CV, and the PGPROC pid it signals.
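
A rough sketch of what those debug lines could look like (purely illustrative; placement and variable names depend on the surrounding function):

/* in ConditionVariablePrepareToSleep(): who starts listening on which CV */
elog(LOG, "cv prepare-to-sleep: pid %d listening on cv %p", MyProcPid, (void *) cv);

/* in ConditionVariableSignal(), once a waiter has been dequeued: who gets woken */
elog(LOG, "cv signal: cv %p waking pid %d", (void *) cv, proc->pid);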

I wonder when the BGW signals the CV, if the client auth backend is actually listening on it?
Or maybe we have multiple processes listening on the CV, and Signal only signals the oldest, while broadcast does all.

https://github.com/postgres/postgres/blob/master/src/backend/storage/lmgr/condition_variable.c#L277-L279

Broadcast would be incorrect though, because we're only processing one entry.

Also, a generic comment: if the client disconnects or the client auth backend terminates, do we end up with a stale shared memory entry perpetually?

src/clientauth.c Outdated
static void clientauth_sighup(SIGNAL_ARGS);

void clientauth_init(void);
bool can_allow_without_executing(void);
Contributor

These can be static? (Other than clientauth_init)

Contributor Author

Sorry are you asking why they are static or saying that they should be static?

Contributor

That they should*

src/clientauth.c Outdated

if (clientauth_ss->requests[idx].needs_processing)
{
    clientauth_ss->requests[idx].needs_processing = false;
Contributor

I think one risk you run into with this is that if the BGW is terminated, we would have needs_processing = false even though we haven't actually processed the event.

I imagine you're doing this to allow parallelization to happen?

Contributor Author

@adamguo0 adamguo0 Aug 22, 2023

Yeah this makes sure the other BGWs that may have woken up at the same time know that this entry has been picked up and they shouldn't try to pick it up themselves. Instead they can move on to the next entry in parallel

I think you're right that the client would be left hanging though. Previously I had this loop check for done_processing instead of needs_processing, perhaps that would help.

Contributor

I think you'd still have duplicate work, and then you also run the risk of a second BGW potentially populating an overlapping entry with new data.
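
For context, the claim-and-publish sequence under discussion looks roughly like this (the field names follow the diff hunks in this PR; the surrounding control flow is a sketch):

LWLockAcquire(clientauth_ss->lock, LW_EXCLUSIVE);
if (clientauth_ss->requests[idx].needs_processing)
{
    /* claim the entry so other workers that woke up skip it */
    clientauth_ss->requests[idx].needs_processing = false;
    LWLockRelease(clientauth_ss->lock);

    /* ... execute the user's clientauth functions for this entry ... */

    /* publish the result and wake the waiting client backend */
    LWLockAcquire(clientauth_ss->lock, LW_EXCLUSIVE);
    clientauth_ss->requests[idx].done_processing = true;
    LWLockRelease(clientauth_ss->lock);
    ConditionVariableSignal(&clientauth_ss->requests[idx].client_cv);
}
else
    LWLockRelease(clientauth_ss->lock);

If the worker dies between the claim and the publish, needs_processing is already false and done_processing never becomes true, which is the risk being discussed.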

src/clientauth.c Outdated
while (true)
{
    LWLockAcquire(clientauth_ss->lock, LW_EXCLUSIVE);
    ConditionVariablePrepareToSleep(&clientauth_ss->requests[idx].client_cv);
Contributor

OOC is there a reason we put ConditionVariablePrepareToSleep within the while loop instead of before?

Contributor Author

@adamguo0 adamguo0 Aug 22, 2023

IIUC ConditionVariablePrepareToSleep adds the process to the CV's wait queue. I put it in the while loop before LWLockRelease so that the process is added to the wait queue before the lock is released, preventing missed wakeups. Though maybe it only needs to be added once per wait loop? And it doesn't fix the bug anyway.

Previously the client backend checked num_requests to see if there are
available entries to load into. However, when a client finishes and
decrements num_requests, we don't know which index it was, so we can't
assume that the next index is free. This commit replaces num_requests
with an available_entry flag for each queue entry. The client backend
directly checks if the next entry is available instead.

This means that a client backend can be blocked by a single long-running
query if it happens to be occupying the next queue entry. We should
think about a way for client backends to "skip" entries if a later one
is available.
@adamguo0
Contributor Author

adamguo0 commented Aug 22, 2023

The race condition was caused by an assumption that when num_requests is decremented, the next queue entry (denoted by idx_insert) would be available. However, when a client backend is done processing and decrements num_requests, we don't know which index it was; chances are it was not the next index in the queue. This resulted in entries being clobbered.

This is fixed by adding a flag to each entry denoting whether it is available. This raises two questions though:

  1. A long-running query can potentially block all waiting client backends even though latter entries have become available, because client backends look sequentially through the queue and they don't "skip" indexes
  2. This parallelisation approach could cause clobbering issues elsewhere too. Per offline discussion with John we can try partitioning the queue and assigning each entry to a single BGW instead to avoid clobbering.

Each worker is responsible for its own partition of entries to prevent
clobbering and allow graceful handling of terminated worker processes.
Let client backends pick which slot (and therefore worker) to use based
on their own PIDs, which should be roughly sequential.
@adamguo0
Contributor Author

I pushed a change that assigns each worker a partition of the queue that they are solely responsible for, instead of letting workers pick whatever entry is available for processing. This allows workers terminated mid-process to come back up and resume processing, and gives a stronger guarantee that there is no clobbering between background workers trying to process the same entry.

Rather than inserting sequentially, a client backend decides which queue entry to insert into by calculating client_backend_pid % number_of_workers. The PIDs should be roughly sequential which distributes the clients across the BGWs.

However, in practice there is still some clustering going on. During load tests, there is a tail of clients that are waiting for the same handful of background workers to process them. We can discuss ways to avoid this or whether we should go back to the original approach.
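
A sketch of the partitioning scheme described above (the index arithmetic and the worker_number variable are illustrative, not necessarily the exact code):

/* client backend: derive a queue slot from its own PID */
int idx = MyProcPid % CLIENT_AUTH_MAX_PENDING_ENTRIES;

/* background worker worker_number: only ever touches the slots in its own partition */
for (int idx = worker_number; idx < CLIENT_AUTH_MAX_PENDING_ENTRIES; idx += clientauth_num_parallel_workers)
{
    /* check clientauth_ss->requests[idx] for pending work ... */
}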

Contributor

@JohnHVancouver JohnHVancouver left a comment

> We can discuss ways to avoid this or whether we should go back to the original approach.

I think using rand() is fine if we've found that PIDs aren't "sequential" enough and there's still some clustering.

Although, like you mentioned, I think the only time this might actually matter is when there's a connection storm, such as a connection pool reconnecting and wanting to establish a bunch of new connections, or app servers reconnecting ASAP after a restart. One risk is that if we're "unlucky" enough, we lose the parallelization.

src/clientauth.c Outdated
static void clientauth_hook(Port *port, int status)
{
    /* Determine the queue index that this client will insert into based on its PID */
    int idx = MyProc->pid % CLIENT_AUTH_MAX_PENDING_ENTRIES;
Contributor

Ahh, you're %-ing this to get the idx to insert. I guess my earlier comments on fairness don't really matter too much? Well, they might.

OOC what were the trade-offs in doing this approach as opposed to something like:

idx = MyProc % BGW and then start from some idx to try to insert into

Contributor Author

I think the fairness comment still applies? The worker will still work on its first entry before moving on to other entries, even if those were filled earlier. If client 1 finishes with entry 1 and signals client 5 to insert, client 5 will get worker priority over clients 2, 3, 4 that occupied the entries 2, 3, 4.

If we put aside the fairness issue (which I think we can resolve by having the worker process all entries per wake or pick entries in a staggered way), I think the two % approaches are the same and we'll observe the same clustering behaviour.

Either the clients will pick distinct entries in the whole queue and then get folded down into BGW buckets, or they'll pick BGW buckets and then spread out across entries in the queue. In the latter approach, clients could in theory find an empty entry earlier instead of having to wait for a single predetermined entry, but since that entry is handled by the same worker anyway it would end up waiting the same amount of time (or at least in aggregate, the total wait time among clients would be the same).

To prevent workers from biasing towards processing their first entry,
add an offset to the starting index in the for loop that increments
every time an entry is picked up for processing.
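
Roughly, the offset mechanism looks like this (the names here are illustrative):

/* scan_offset increments each time an entry is picked up, so the scan does not
 * always begin at the worker's first slot */
for (int i = 0; i < entries_per_worker; i++)
{
    int slot = my_slots[(scan_offset + i) % entries_per_worker];

    if (clientauth_ss->requests[slot].needs_processing)
    {
        scan_offset++;
        /* claim and process the entry as usual ... */
    }
}
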
@adamguo0
Contributor Author

adamguo0 commented Aug 24, 2023

Did some testing:

  1. Client backend PIDs are actually more uniformly distributed than rand(), at least given how I run load tests (which is a few thousand psql -c 'select' & lines)
  2. Workers are not actually picking their first entry significantly more often than their other entries, it's pretty uniform. I added an offset mechanism anyway though in case this does become a problem

New changes:

  1. Add index offset to the worker loop when checking for pending entries
  2. Set default clientauth_num_parallel_workers to 1 and set maximum to min(max_connections, CLIENT_AUTH_MAX_PENDING_ENTRIES)
  3. Ran pg_indent
  4. Reverted Makefile change
  5. Removed unused code, updated some comments

Move query logic to a dedicated function and capture errors from
feature_proc.
src/feature.c Outdated
bool
check_string_in_guc_list(const char *str, const char *guc_var, const char *guc_name)
{
    bool        skip = false;
Contributor

nit: spacing? Or is this how pg_indent does it which is weird ehh

s/skip/match ?

Contributor Author

Ah thanks for catching that, yeah pg_indent likes to do this apparently

src/clientauth.c Outdated
 * clientauth_hook
 */
LWLockAcquire(clientauth_ss->lock, LW_EXCLUSIVE);
clientauth_ss->requests[idx].done_processing = true;
Contributor

I think you should move done_processing to the very end in case BGW gets terminated in between?

1. Fix my pg_indent setup
2. Set enable_clientauth context to POSTMASTER and check its value in
   clientauth_init to determine whether to start background workers.
   If a user is not using clientauth, background workers will not sit
   idle in the background
3. Move done_processing to the very end of BGW loop
4. Remove some logging
If a client backend terminates unexpectedly after setting
entry_available = false, it can block every subsequent client from using
that entry.

This commit adds a PID field to the request entry struct and allows
clients to check if the process currently holding the entry still
exists. If it doesn't, then it can set entry_available to true.

Signalling order is rearranged to prevent deadlocks where a backend is
terminated after entry_available is set to false but before signalling
anyone else. In addition to signalling the current client, background
workers signal the next waiting client to check if the current client
still exists.
@adamguo0
Contributor Author

adamguo0 commented Sep 5, 2023

Pushed a fix for the stale entry bug. Copying the commit message here:

If a client backend terminates unexpectedly after setting
entry_available = false, it can block every subsequent client from using
that entry.

This commit adds a PID field to the request entry struct and allows
clients to check if the process currently holding the entry still
exists. If it doesn't, then it can set entry_available to true.

Signalling order is rearranged to prevent deadlocks where a backend is
terminated after entry_available is set to false but before signalling
anyone else. In addition to signalling the current client, background
workers signal the next waiting client to check if the current client
still exists.

I realized the message is not super clear, but the upshot is: after processing, background workers signal waiting clients to check whether the PID of the current client still exists. If it doesn't, then the waiting client can go ahead.

I tested this by setting CLIENT_AUTH_MAX_PENDING_ENTRIES to 1, starting a connection and SIGTERMing the backend, then starting another connection. Previously the second connection would be blocked forever, now it goes through as expected.
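
A minimal sketch of the liveness check (the helper name is hypothetical; the PID it checks is the field added to the request entry struct in this commit):

#include <signal.h>
#include <errno.h>

/* true if the process that last claimed the entry still exists */
static bool
entry_owner_alive(pid_t pid)
{
    if (pid == 0)
        return false;
    /* kill(pid, 0) sends no signal; it only checks for existence/permission */
    return (kill(pid, 0) == 0 || errno == EPERM);
}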

If IsBinaryUpgrade, then do not start background workers and skip all
logic in clientauth_hook.
src/clientauth.c Outdated
 * Copy the entry to local memory and then release the lock to unblock
 * other workers/clients.
 */
LWLockAcquire(clientauth_ss->lock, LW_EXCLUSIVE);
Contributor

I wonder if this one can be changed to LW_SHARED? IIUC only writes to shared_mem would actually need LW_EXCLUSIVE. Probably applies to other places?
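
Something like this for the read-only copy (local_entry is a placeholder for whatever local struct the worker copies into):

/* a shared lock is enough here because nothing in shared memory is modified */
LWLockAcquire(clientauth_ss->lock, LW_SHARED);
memcpy(&local_entry, &clientauth_ss->requests[idx], sizeof(local_entry));
LWLockRelease(clientauth_ss->lock);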

If the user's function returns a table, SELECT func combines all the
columns into one, whereas SELECT * FROM lets us pick the first column
out of the first returned row. Not sure this really matters since we
aren't officially supporting functions that return table types anyway,
but this change doesn't affect non-table return values.

Add a test case to cover functions that return tables, and add a test
case that covers a void-returning function with no error.
Postgres commit 414f6c0 makes this change
We want enable_clientauth to be SIGHUP in case users lock themselves out
by setting it to REQUIRE without any functions registered.

The hooks and background workers will register only if enable_clientauth
is ON or REQUIRE at postmaster startup. We should make this very clear
in the docs.
src/clientauth.c Outdated
hookargs[0] = CStringGetTextDatum(port_subset_str);
hookargs[1] = Int32GetDatum(*status);

SPI_execute_with_args(query, SPI_NARGS_2, hookargtypes, hookargs, hooknulls, true, 0);
Contributor

I think we should check the return value here too? Might not be necessary, but what do you think?

https://www.postgresql.org/docs/current/spi-spi-execute-with-args.html
https://www.postgresql.org/docs/current/spi-spi-execute.html

Contributor Author

Yeah makes sense, adding the same handling as in passcheck

Contributor

Is it always SPI_SELECT even if an INSERT was done by the function?

ie. we do something like

SELECT * FROM insert_foo();

Contributor Author

I think it is, but I'm not sure I can guarantee that; checking < 0 also makes sense.

Contributor Author

Tested with INSERT and it does return SPI_OK_SELECT -- we also use the same logic in passcheck
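
A sketch of the kind of check being discussed (mirroring the passcheck handling mentioned above; the exact error message is illustrative):

int ret = SPI_execute_with_args(query, SPI_NARGS_2, hookargtypes, hookargs, hooknulls, true, 0);

/* anything other than a successful SELECT is treated as an error */
if (ret != SPI_OK_SELECT)
    ereport(ERROR,
            (errmsg("SPI_execute_with_args failed with return code %d", ret)));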

#define WAIT_EVENT_MESSAGE_QUEUE_PUT_MESSAGE WAIT_EVENT_MQ_PUT_MESSAGE
#define WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL WAIT_EVENT_WAL_SENDER_WAIT_WAL
#define WAIT_EVENT_MESSAGE_QUEUE_SEND WAIT_EVENT_MQ_SEND
#define WAIT_EVENT_MESSAGE_QUEUE_RECEIVE WAIT_EVENT_MQ_RECEIVE
Contributor

This one's duplicated, and we're only using one, right? Might not need to declare all of them.

@adamguo0 adamguo0 merged commit 1835907 into aws:main Sep 7, 2023
@adamguo0 adamguo0 deleted the clientauth branch July 1, 2024 17:15