Implement clientauth feature #227

Merged
merged 38 commits into from Sep 7, 2023

Conversation

adamguo0
Contributor

Issue #, if available:

Description of changes:

Implementation of clientauth feature, allowing users to register trusted language functions to the Postgres ClientAuthentication_hook. Uses background workers to execute database functions.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
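
For orientation, a minimal sketch of how a ClientAuthentication_hook is wired up; the prev_clientauth_hook chaining shown here is illustrative, not necessarily the exact code in this PR:

#include "postgres.h"
#include "libpq/auth.h"

static ClientAuthentication_hook_type prev_clientauth_hook = NULL;
static void clientauth_hook(Port *port, int status);

void
clientauth_init(void)
{
    /* Chain onto any previously installed hook, then install ours */
    prev_clientauth_hook = ClientAuthentication_hook;
    ClientAuthentication_hook = clientauth_hook;
}

static void
clientauth_hook(Port *port, int status)
{
    if (prev_clientauth_hook)
        prev_clientauth_hook(port, status);

    /* Hand (port, status) to a background worker via shared memory and wait */
}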

Add a needs_processing flag to each queue entry to determine when a
background worker can pick up a task. Add
ConditionVariablePrepareToSleep calls to each CV predicate loop. Do not
call ConditionVariableSleep without first checking the loop predicate.

We are still seeing an occasional race condition during heavy load (more
connections than there are queue slots) where the last one or two
connections are not returned by clientauth_hook and eventually time out.
@adamguo0
Contributor Author

adamguo0 commented Aug 21, 2023

I'm doing some load testing by opening a few thousand connections in parallel, and I'm encountering a race condition every few tries that results in the last one or two connections being lost.

My suspicion is that ConditionVariableSleep is not playing well with LWLocks. This is the pattern for waiting on a condition to be true:

while (true)
{
    LWLockAcquire(lock, LW_EXCLUSIVE);
    if (condition)
    {
        LWLockRelease(lock);
        break;
    }
    // condition not met: release the lock, then wait for signal
    LWLockRelease(lock);
    ConditionVariableSleep(cv, wait_event_info);
}

Sources that discuss condition variables (e.g. Wikipedia) say that LWLockRelease and ConditionVariableSleep must be atomic, or else a race condition can occur where 1. the lock is released, 2. another process updates the state and sends the signal, and then 3. ConditionVariableSleep is called. This results in a dropped signal. I suspect this is what's happening, because in the logs from when the dropped-connection bug occurs, clientauth_hook goes to sleep waiting for processed_entry to become true and then never wakes up.

Postgres provides a ConditionVariablePrepareToSleep function which adds a process to the CV's wait list before sleeping, so I tried adding that, but the issue isn't fixed. I can't tell if ConditionVariablePrepareToSleep is supposed to fix this bug or if there's something else wrong with the logic? I would appreciate any feedback/advice, thanks!
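
For reference, the usage pattern documented in Postgres's src/include/storage/condition_variable.h is roughly the following (wait_event_info stands in for whatever wait event the caller reports):

ConditionVariablePrepareToSleep(cv);    /* join the CV's wait list up front */
while (!condition)
    ConditionVariableSleep(cv, wait_event_info);
ConditionVariableCancelSleep();         /* leave the wait list once the condition holds */

Because PrepareToSleep registers the process on the wait list before the condition is re-checked, a signal that arrives between the check and the sleep is not lost.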

Contributor

@JohnHVancouver JohnHVancouver left a comment

> I can't tell if ConditionVariablePrepareToSleep is supposed to fix this bug or if there's something else wrong with the logic? I would appreciate any feedback/advice, thanks!

The logic seems fine.
Can you try adding a debug line here:
https://github.com/postgres/postgres/blob/master/src/backend/storage/lmgr/condition_variable.c#L77-L79

Something like logging the MyProcPid (backend_pid) and the CV it's listening on?

And when BGW signals add logging here:

https://github.com/postgres/postgres/blob/master/src/backend/storage/lmgr/condition_variable.c#L263-L266

To get the signal of the CV, and the PGPROC pid it signals.
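
A rough sketch of what those debug lines could look like (purely illustrative; placement and variable names depend on the surrounding function):

/* in ConditionVariablePrepareToSleep(): who starts listening on which CV */
elog(LOG, "cv prepare-to-sleep: pid %d listening on cv %p", MyProcPid, (void *) cv);

/* in ConditionVariableSignal(), once a waiter has been dequeued: who gets woken */
elog(LOG, "cv signal: cv %p waking pid %d", (void *) cv, proc->pid);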

I wonder when the BGW signals the CV, if the client auth backend is actually listening on it?
Or maybe we have multiple processes listening on the CV, and Signal only signals the oldest, while broadcast does all.

https://github.com/postgres/postgres/blob/master/src/backend/storage/lmgr/condition_variable.c#L277-L279

Broadcast would be incorrect though, because we're only processing one entry.

Also, a generic comment: if the client disconnects or the client auth backend terminates, do we end up with a stale shared memory entry perpetually?

src/clientauth.c Outdated
static void clientauth_sighup(SIGNAL_ARGS);

void clientauth_init(void);
bool can_allow_without_executing(void);
Contributor

These can be static? (Other than clientauth_init)

Contributor Author

Sorry are you asking why they are static or saying that they should be static?

Contributor

That they should*

src/clientauth.c Outdated

if (clientauth_ss->requests[idx].needs_processing)
{
    clientauth_ss->requests[idx].needs_processing = false;
Contributor

I think one risk you run into with this is that if the BGW is terminated, we would have needs_processing = false even though we haven't actually processed the event.

I imagine you're doing this to allow parallelization to happen?

Contributor Author

@adamguo0 adamguo0 Aug 22, 2023

Yeah this makes sure the other BGWs that may have woken up at the same time know that this entry has been picked up and they shouldn't try to pick it up themselves. Instead they can move on to the next entry in parallel

I think you're right that the client would be left hanging though. Previously I had this loop check for done_processing instead of needs_processing, perhaps that would help.

Contributor

I think you'd still have duplicate work, and then you also run the risk of a second BGW potentially populating an overlapping entry with new data.
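
For context, the claim-and-publish sequence under discussion looks roughly like this (the field names follow the diff hunks in this PR; the surrounding control flow is a sketch):

LWLockAcquire(clientauth_ss->lock, LW_EXCLUSIVE);
if (clientauth_ss->requests[idx].needs_processing)
{
    /* claim the entry so other workers that woke up skip it */
    clientauth_ss->requests[idx].needs_processing = false;
    LWLockRelease(clientauth_ss->lock);

    /* ... execute the user's clientauth functions for this entry ... */

    /* publish the result and wake the waiting client backend */
    LWLockAcquire(clientauth_ss->lock, LW_EXCLUSIVE);
    clientauth_ss->requests[idx].done_processing = true;
    LWLockRelease(clientauth_ss->lock);
    ConditionVariableSignal(&clientauth_ss->requests[idx].client_cv);
}
else
    LWLockRelease(clientauth_ss->lock);

If the worker dies between the claim and the publish, needs_processing is already false and done_processing never becomes true, which is the risk being discussed.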

src/clientauth.c Outdated
while (true)
{
    LWLockAcquire(clientauth_ss->lock, LW_EXCLUSIVE);
    ConditionVariablePrepareToSleep(&clientauth_ss->requests[idx].client_cv);
Contributor

OOC is there a reason we put ConditionVariablePrepareToSleep within the while loop instead of before?

Contributor Author

@adamguo0 adamguo0 Aug 22, 2023

IIUC ConditionVariablePrepareToSleep adds the process to the CV's wait queue. I put it in the while loop before LWLockRelease so that the process is added to the wait queue before the lock is released, preventing missed wakeups. Though maybe it only needs to be added once per wait loop? And it doesn't fix the bug anyway.

Previously the client backend checked num_requests to see if there are
available entries to load into. However, when a client finishes and
decrements num_requests, we don't know which index it was, so we can't
assume that the next index is free. This commit replaces num_requests
with an available_entry flag for each queue entry. The client backend
directly checks if the next entry is available instead.

This means that a client backend can be blocked by a single long-running
query if it happens to be occupying the next queue entry. We should
think about a way for client backends to "skip" entries if a later one
is available.
@adamguo0
Contributor Author

adamguo0 commented Aug 22, 2023

The race condition was caused by an assumption that when num_requests is decremented, the next queue entry (denoted by idx_insert) would be available. However, when a client backend is done processing and decrements num_requests, we don't know which index it was; chances are it was not the next index in the queue. This resulted in entries being clobbered.

This is fixed by adding a flag to each entry denoting whether it is available. This raises two questions though:

  1. A long-running query can potentially block all waiting client backends even though latter entries have become available, because client backends look sequentially through the queue and they don't "skip" indexes
  2. This parallelisation approach could cause clobbering issues elsewhere too. Per offline discussion with John we can try partitioning the queue and assigning each entry to a single BGW instead to avoid clobbering.

Each worker is responsible for its own partition of entries to prevent
clobbering and allow graceful handling of terminated worker processes.
Let client backends pick which slot (and therefore worker) to use based
on their own PIDs, which should be roughly sequential.
@adamguo0
Contributor Author

I pushed a change that assigns each worker a partition of the queue that they are solely responsible for, instead of letting workers pick whatever entry is available for processing. This allows workers terminated mid-process to come back up and resume processing, and gives a stronger guarantee that there is no clobbering between background workers trying to process the same entry.

Rather than inserting sequentially, a client backend decides which queue entry to insert into by calculating client_backend_pid % number_of_workers. The PIDs should be roughly sequential which distributes the clients across the BGWs.

However, in practice there is still some clustering going on. During load tests, there is a tail of clients that are waiting for the same handful of background workers to process them. We can discuss ways to avoid this or whether we should go back to the original approach.
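
A sketch of the partitioning scheme described above (the index arithmetic and the worker_number variable are illustrative, not necessarily the exact code):

/* client backend: derive a queue slot from its own PID */
int idx = MyProcPid % CLIENT_AUTH_MAX_PENDING_ENTRIES;

/* background worker worker_number: only ever touches the slots in its own partition */
for (int idx = worker_number; idx < CLIENT_AUTH_MAX_PENDING_ENTRIES; idx += clientauth_num_parallel_workers)
{
    /* check clientauth_ss->requests[idx] for pending work ... */
}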

Contributor

@JohnHVancouver JohnHVancouver left a comment

> We can discuss ways to avoid this or whether we should go back to the original approach.

I think using rand() is fine if we've found that PIDs aren't "sequential" enough and there's still some clustering.

Although, like you mentioned, I think the only time this might actually matter is when there's a connection storm, such as a connection pool reconnecting and wanting to establish a bunch of new connections, or app servers reconnecting ASAP after a restart. One risk is that if we're "unlucky" enough, we lose the parallelization.

src/clientauth.c Outdated
static void clientauth_hook(Port *port, int status)
{
    /* Determine the queue index that this client will insert into based on its PID */
    int idx = MyProc->pid % CLIENT_AUTH_MAX_PENDING_ENTRIES;
Contributor

Ahh, you're %-ing this to get the idx to insert. I guess my earlier comments on fairness don't really matter too much? Well, they might.

OOC what were the trade-offs in doing this approach as opposed to something like:

idx = MyProc % BGW and then start from some idx to try to insert into

Contributor Author

I think the fairness comment still applies? The worker will still work on its first entry before moving on to other entries, even if those were filled earlier. If client 1 finishes with entry 1 and signals client 5 to insert, client 5 will get worker priority over clients 2, 3, 4 that occupied the entries 2, 3, 4.

If we put aside the fairness issue (which I think we can resolve by having the worker process all entries per wake or pick entries in a staggered way), I think the two % approaches are the same and we'll observe the same clustering behaviour.

Either the clients will pick distinct entries in the whole queue and then get folded down into BGW buckets, or they'll pick BGW buckets and then spread out across entries in the queue. In the latter approach, clients could in theory find an empty entry earlier instead of having to wait for a single predetermined entry, but since that entry is handled by the same worker anyway it would end up waiting the same amount of time (or at least in aggregate, the total wait time among clients would be the same).

To prevent workers from biasing towards processing their first entry,
add an offset to the starting index in the for loop that increments
every time an entry is picked up for processing.
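
Roughly, the offset mechanism looks like this (the names here are illustrative):

/* scan_offset increments each time an entry is picked up, so the scan does not
 * always begin at the worker's first slot */
for (int i = 0; i < entries_per_worker; i++)
{
    int slot = my_slots[(scan_offset + i) % entries_per_worker];

    if (clientauth_ss->requests[slot].needs_processing)
    {
        scan_offset++;
        /* claim and process the entry as usual ... */
    }
}
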
@adamguo0
Contributor Author

adamguo0 commented Aug 24, 2023

Did some testing:

  1. Client backend PIDs are actually more uniformly distributed than rand(), at least given how I run load tests (which is a few thousand psql -c 'select' & lines)
  2. Workers are not actually picking their first entry significantly more often than their other entries, it's pretty uniform. I added an offset mechanism anyway though in case this does become a problem

New changes:

  1. Add index offset to the worker loop when checking for pending entries
  2. Set default clientauth_num_parallel_workers to 1 and set maximum to min(max_connections, CLIENT_AUTH_MAX_PENDING_ENTRIES)
  3. Ran pg_indent
  4. Reverted Makefile change
  5. Removed unused code, updated some comments

Move query logic to a dedicated function and capture errors from
feature_proc.
src/feature.c Outdated
bool
check_string_in_guc_list(const char *str, const char *guc_var, const char *guc_name)
{
    bool        skip = false;
Contributor

nit: spacing? Or is this how pg_indent does it which is weird ehh

s/skip/match ?

Contributor Author

Ah thanks for catching that, yeah pg_indent likes to do this apparently

src/clientauth.c Outdated
 * clientauth_hook
 */
LWLockAcquire(clientauth_ss->lock, LW_EXCLUSIVE);
clientauth_ss->requests[idx].done_processing = true;
Contributor

I think you should move done_processing to the very end in case BGW gets terminated in between?

1. Fix my pg_indent setup
2. Set enable_clientauth context to POSTMASTER and check its value in
   clientauth_init to determine whether to start background workers.
   If a user is not using clientauth, background workers will not sit
   idle in the background
3. Move done_processing to the very end of BGW loop
4. Remove some logging
If a client backend terminates unexpectedly after setting
entry_available = false, it can block every subsequent client from using
that entry.

This commit adds a PID field to the request entry struct and allows
clients to check if the process currently holding the entry still
exists. If it doesn't, then it can set entry_available to true.

Signalling order is rearranged to prevent deadlocks where a backend is
terminated after entry_available is set to false but before signalling
anyone else. In addition to signalling the current client, background
workers signal the next waiting client to check if the current client
still exists.
@adamguo0
Contributor Author

adamguo0 commented Sep 5, 2023

Pushed a fix for the stale entry bug. Copying the commit message here:

If a client backend terminates unexpectedly after setting
entry_available = false, it can block every subsequent client from using
that entry.

This commit adds a PID field to the request entry struct and allows
clients to check if the process currently holding the entry still
exists. If it doesn't, then it can set entry_available to true.

Signalling order is rearranged to prevent deadlocks where a backend is
terminated after entry_available is set to false but before signalling
anyone else. In addition to signalling the current client, background
workers signal the next waiting client to check if the current client
still exists.

I realized the message is not super clear, but the upshot is: after processing, background workers signal waiting clients to check whether the PID of the current client still exists. If it doesn't, then the waiting client can go ahead.

I tested this by setting CLIENT_AUTH_MAX_PENDING_ENTRIES to 1, starting a connection and SIGTERMing the backend, then starting another connection. Previously the second connection would be blocked forever, now it goes through as expected.
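
A minimal sketch of the liveness check (the helper name is hypothetical; the PID it checks is the field added to the request entry struct in this commit):

#include <signal.h>
#include <errno.h>

/* true if the process that last claimed the entry still exists */
static bool
entry_owner_alive(pid_t pid)
{
    if (pid == 0)
        return false;
    /* kill(pid, 0) sends no signal; it only checks for existence/permission */
    return (kill(pid, 0) == 0 || errno == EPERM);
}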

If IsBinaryUpgrade, then do not start background workers and skip all
logic in clientauth_hook.
src/clientauth.c Outdated
 * Copy the entry to local memory and then release the lock to unblock
 * other workers/clients.
 */
LWLockAcquire(clientauth_ss->lock, LW_EXCLUSIVE);
Contributor

I wonder if this one can be changed to LW_SHARED? IIUC only writes to shared_mem would actually need LW_EXCLUSIVE. Probably applies to other places?
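
Something like this for the read-only copy (local_entry is a placeholder for whatever local struct the worker copies into):

/* a shared lock is enough here because nothing in shared memory is modified */
LWLockAcquire(clientauth_ss->lock, LW_SHARED);
memcpy(&local_entry, &clientauth_ss->requests[idx], sizeof(local_entry));
LWLockRelease(clientauth_ss->lock);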

If the user's function returns a table, SELECT func combines all the
columns into one, whereas SELECT * FROM lets us pick the first column
out of the first returned row. Not sure this really matters since we
aren't officially supporting functions that return table types anyway,
but this change doesn't affect non-table return values.

Add a test case to cover functions that return tables, and add a test
case that covers a void-returning function with no error.
Postgres commit 414f6c0 makes this change
We want enable_clientauth to be SIGHUP in case users lock themselves out
by setting it to REQUIRE without any functions registered.

The hooks and background workers will register only if enable_clientauth
is ON or REQUIRE at postmaster startup. We should make this very clear
in the docs.
src/clientauth.c Outdated
hookargs[0] = CStringGetTextDatum(port_subset_str);
hookargs[1] = Int32GetDatum(*status);

SPI_execute_with_args(query, SPI_NARGS_2, hookargtypes, hookargs, hooknulls, true, 0);
Contributor

I think we should check the return value here too? Might not be necessary, but what do you think?

https://www.postgresql.org/docs/current/spi-spi-execute-with-args.html
https://www.postgresql.org/docs/current/spi-spi-execute.html

Contributor Author

Yeah makes sense, adding the same handling as in passcheck

Contributor

Is it always SPI_SELECT even if an INSERT was done by the function?

ie. we do something like

SELECT * FROM insert_foo();

Contributor Author

I think it is, but I'm not sure I can guarantee that; checking < 0 also makes sense.

Contributor Author

Tested with INSERT and it does return SPI_OK_SELECT -- we also use the same logic in passcheck
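
A sketch of the kind of check being discussed (mirroring the passcheck handling mentioned above; the exact error message is illustrative):

int ret = SPI_execute_with_args(query, SPI_NARGS_2, hookargtypes, hookargs, hooknulls, true, 0);

/* anything other than a successful SELECT is treated as an error */
if (ret != SPI_OK_SELECT)
    ereport(ERROR,
            (errmsg("SPI_execute_with_args failed with return code %d", ret)));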

#define WAIT_EVENT_MESSAGE_QUEUE_PUT_MESSAGE WAIT_EVENT_MQ_PUT_MESSAGE
#define WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL WAIT_EVENT_WAL_SENDER_WAIT_WAL
#define WAIT_EVENT_MESSAGE_QUEUE_SEND WAIT_EVENT_MQ_SEND
#define WAIT_EVENT_MESSAGE_QUEUE_RECEIVE WAIT_EVENT_MQ_RECEIVE
Contributor

This one's duplicated, and we're only using one, right? Might not need to declare all of them.

@adamguo0 adamguo0 merged commit 1835907 into aws:main Sep 7, 2023
@adamguo0 adamguo0 deleted the clientauth branch July 1, 2024 17:15