Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] Processes killed while holding a spinlock stop new connections #18272

Closed
1 task done
timothy-e opened this issue Jul 17, 2023 · 0 comments
Closed
1 task done
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue

Comments

@timothy-e
Copy link
Contributor

timothy-e commented Jul 17, 2023

Jira Link: DB-7270

Description

  1. Add a delay to a new process while it holds the ProcStructLock
	spinLockAcquire(ProcStructLock);

	set_spins_per_delay(ProcGlobal->spins_per_delay);
+ 	pg_usleep(1000 * 1000 * 10); // 10s
	MyProc = *procgloballist;
  1. Create a new connection A and kill it while it is holding the lock.
  2. The postmaster cleans up some aspects of the process, but doesn't touch the ProcStructLock or the proc struct list.
  3. Create a new connection B. It will get stuck on the spinlock and terminate after 2 minutes:
ysqlsh: could not connect to server: Connection refused
	Is the server running on host "localhost" (::1) and accepting
	TCP/IP connections on port 5433?
PANIC:  stuck spinlock detected at InitProcess, ../../../../../../../src/postgres/src/backend/storage/lmgr/proc.c:334

This fatal error will kill backend B, but not affect the postmaster.

Solution:

  1. Reduce the time spent until spinlock is determined to be stuck
  2. When a stuck spinlock causes a process to die, the postmaster should be restarted.

This is related to #18265, but the difference is that this issue specifically addresses spinlocks. It only occurs when the process is killed while holding the spinlock, which is extremely rate. If this issue is encountered, it is immediately very bad and obvious.

#18265 addresses the dangling proc struct list entry, and occurs if the backend is killed anytime between claiming a proc struct and having a PID assigned to the proc struct. If that scenario happens enough times, it would result in a max connection error.

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@timothy-e timothy-e added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Jul 17, 2023
@timothy-e timothy-e self-assigned this Jul 17, 2023
@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue and removed status/awaiting-triage Issue awaiting triage labels Jul 17, 2023
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug and removed kind/enhancement This is an enhancement of an existing feature labels Nov 15, 2023
timothy-e added a commit that referenced this issue Nov 20, 2023
…lding a spinlock

Summary:
=== What's new in this diff ===
This diff tracks spinlock acquisition during process initialisation by using a
`ybSpinLockAcquired` field on `MyProc` for processes that are already initialised.
1. If a process is killed while holding a spinlock, we restart the postmaster.
2. If a process is killed before it has a `MyProc` (during initilzation), we also restart the postmaster.
    We don't restart if the process exited with code 0 or 1. If the database is still initializing, new connections
    would be rejected with status 1. We don't want to restart the postmaster, because we might end up in a loop
   where the app always tries too early to connect, causing us to start the whole thing over again.
3. Reduce the spinlock deadlock detection time by 75%. We are more likely to have a bad spinlock state than PG, and a stuck spinlock could block any new connection. Not printing anything for 1 minute while the backend is frozen is a bad experience, and since most spinlocks are acquired for only a few instructions, there should never be a case that a process is waiting for 15-30s on a spinlock.

=== Background ===
Logic already existed to track if a process was holding (or acquiring) a LWLock
when it was killed (see the variable `yLWLockAcquired` (previously
`ybAnyLockAcquired`)). This diff adds logic to detect if a spinlock is held or
being acquired when a process is killed. For example, if a process is killed
while holding the `ProcStructLock`, any future connection will fail to acquire
the ProcStructLock - so it cannot initialize itself. Since the postmaster
typically doesn't acquire this lock, it remains blissfully unaware that no new
children can be acquired.

When a LWLock is acquired, it is guaranteed that the global variable `MyProc` is
 non-null. So it's straightforward to add a field to `MyProc` to indicate if
 this process is holding a lock. However, spinlocks are a much lower level
 construct, and are used to safely create and allocate the `MyProc` object. Some
  spinlocks are acquired before `MyProc` or `MyBackendId` are set. See uses of
  `ProcStructLock` in `proc.c::InitProcess`.

There are three different stages in which we acquire spinlocks:
- During Postmaster init. In this case, the acquiring process is the postmaster,
   so if the acquiring process is killed, we are not left in a bad state (because the postmaster
   is restarting)
- After process init: MyProc, MyProcPid, MyBackendId, and ProcGlobal are all
  initialised and functional. It is easy to handle cases like this - we handle
  them similarly to LWLocks.
- During process init. In this stage, we have a `ProcGlobal` with some various
  global fields set and `MyProcPid`.  `MyProc` is null and `MyBackendId` is not
  yet set. This is the hard case, discussed next.

=== How to handle SpinLocks acquired during process initialisation ===

Since there is no way to communicate to the postmaster that a process died holding a spinlock,
we simply restart if we find that a process has crashed / been killed without owning a PGPROC struct.

Another idea (and a previous version of this diff) was to use buckets, writing to a large array that
`crashed_pid % 1000` owns a spinlock. If a process died without a PGPROC, we could check the array to see
if it was likely that it owned a spinlock. This is more complicated, not perfect, and covers a rare scenario.
It is better to more conservatively crash than to introduce potential new bugs or gaps.

Since the OOM killer is more likely to kill procs with high memory usage, and backends during initialization
don't have high memory usage, it is very unlikely that they will be killed, so very unlikely that we'll have to
restart the postmaster because a process with no PGPROC was killed.
Jira: DB-7270, DB-7265

Test Plan:
=== Test Connections before database initialization ===
1. Add delay to a new process while it holds the ProcStructLock in proc.c:
```lang=diff
        SpinLockAcquire(ProcStructLock);
-
+       pg_usleep(1000 * 1000 * 10);
        set_spins_per_delay(ProcGlobal->spins_per_delay);
```
2. Restart :
```
 bin/yb-ctl destroy && bin/yb-ctl start
```
3. Repeatedly make connections. The first ones will fail with
```
ysqlsh: could not connect to server: Connection refused
	Is the server running on host "localhost" (::1) and accepting
	TCP/IP connections on port 5433?
could not connect to server: Connection refused
	Is the server running on host "localhost" (127.0.0.1) and accepting
	TCP/IP connections on port 5433?
```
And after a few seconds, they will fail with
```
ysqlsh: could not connect to server: Connection refused
	Is the server running on host "localhost" (::1) and accepting
	TCP/IP connections on port 5433?
FATAL:  the database system is starting up
```
In the logs, notice
```
2023-11-08 20:13:28.030 UTC [274016] FATAL:  the database system is starting up
2023-11-08 20:13:28.035 UTC [273956] WARNING:  server process (PID 274016) exited with exit code 1
```
for each of the second type of failures.
4. Finally, the database will finish initializing:
```
2023-11-08 20:13:31.669 UTC [273956] LOG:  database system is ready to accept connections
```

=== Test Partially Initialised Process ===
1. Add delay to a new process while it holds the ProcStructLock in proc.c:
```lang=diff
        SpinLockAcquire(ProcStructLock);
-
+       pg_usleep(1000 * 1000 * 10);
        set_spins_per_delay(ProcGlobal->spins_per_delay);
```
2. Create a new connection A
3. Create a new connection B and kill it while it is holding the lock. (`kill -9 44650`)
4. Observe that:
 * Connection A must reconnect because the postmaster restarted
```
yugabyte=# select * from pg_class;
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
yugabyte=#
```
 * In the logs:
```
2023-11-08 20:08:16.903 UTC [272686] WARNING:  terminating active server processes due to backend crash of a partially initialized process
2023-11-08 20:08:16.903 UTC [272686] WARNING:  server process (PID 272919) was terminated by signal 9: Killed
2023-11-08 20:08:16.903 UTC [272686] WARNING:  could not find BEEntry for pid 272919 to add to yb_terminated_queries
2023-11-08 20:08:16.903 UTC [272866] WARNING:  terminating connection because of crash of another server process
2023-11-08 20:08:16.903 UTC [272866] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2023-11-08 20:08:16.903 UTC [272866] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2023-11-08 20:08:16.907 UTC [272686] WARNING:  server process (PID 272866) exited with exit code 2
2023-11-08 20:08:16.909 UTC [272686] LOG:  all server processes terminated; reinitializing
```

(272866 is the existing connection, 272686 is the postmaster, and 272919 is the new process)

=== Test Initialized Process ===
1. Add the lines
```lang=c
slock_t* lock = NULL;
lock = (slock_t *) ShmemAlloc(sizeof(slock_t));
SpinLockInit(lock);
SpinLockAcquire(lock);
pg_usleep(1000 * 1000 * 10);
SpinLockRelease(lock);
```
somewhere to `YbSeqNext` in `YbSeqScan.c`.
2. Create a new connection A
3. Create a new connection B. Run a seq scan and kill it while it is holding the lock.
4. Observe that:
 * Connection A must reconnect, exactly as in the previous case.
 * In the logs:
```
2023-09-13 16:45:16.091 EDT [14387] WARNING:  terminating active server processes due to backend crash while acquiring SpinLock
2023-09-13 16:45:16.091 EDT [14387] WARNING:  server process (PID 14403) was terminated by signal 9: Killed
2023-09-13 16:45:16.091 EDT [14387] DETAIL:  Failed process was running: SELECT * FROM t;
2023-09-13 16:45:16.091 EDT [14466] WARNING:  terminating connection because of crash of another server process
2023-09-13 16:45:16.091 EDT [14466] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2023-09-13 16:45:16.091 EDT [14466] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
```

=== Test Spinlock Deadlock Detection Timeout ===
1. Apply the following changes to `YbSeqNext` in `nodeYbSeqScan.c`:
```lang=diff
--- a/src/postgres/src/backend/executor/nodeYbSeqscan.c
+++ b/src/postgres/src/backend/executor/nodeYbSeqscan.c
@@ -71,6 +71,14 @@ YbSeqNext(YbSeqScanState *node)
 	econtext = node->ss.ps.ps_ExprContext;
 	slot = node->ss.ss_ScanTupleSlot;

+	slock_t* lock = NULL;
+	lock = (slock_t *) ShmemAlloc(sizeof(slock_t));
+	SpinLockInit(lock);
+	SpinLockAcquire(lock);
+	SpinLockAcquire(lock); // double acquire, will deadlock
+	pg_usleep(1000 * 1000 * 10);
+	SpinLockRelease(lock);
+
 	/*
 	 * Initialize the scandesc upon the first invocation.
 	 * The scan only needs its ybscan field, so eventually we may use it
```

2. Create a session and run a sequential scan: `bin/ysqlsh -c "SELECT * FROM pg_class;"`
3. Observe in the logs that a session started:
```
I1120 13:14:40.311434 56496 pg_client.cc:165] Session id 3: Session id acquired. Postgres backend pid: 56496
```
4. Observe in the logs that the session was killed, 35 seconds later:
```
2023-11-20 13:15:10.214 UTC [56496] PANIC:  stuck spinlock detected at YbSeqNext, ../../../../../../src/postgres/src/backend/executor/nodeYbSeqscan.c:78
```

Reviewers: amartsinchyk, smishra

Reviewed By: amartsinchyk

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D28471
Spikatrix pushed a commit to Spikatrix/yugabyte-db that referenced this issue Nov 29, 2023
…er when a process dies while holding a spinlock

Summary:
Original commit: 8e8cead / D28471
=== What's new in this diff ===
This diff tracks spinlock acquisition during process initialisation by using a
`ybSpinLockAcquired` field on `MyProc` for processes that are already initialised.
1. If a process is killed while holding a spinlock, we restart the postmaster.
2. If a process is killed before it has a `MyProc` (during initilzation), we also restart the postmaster.
    We don't restart if the process exited with code 0 or 1. If the database is still initializing, new connections
    would be rejected with status 1. We don't want to restart the postmaster, because we might end up in a loop
   where the app always tries too early to connect, causing us to start the whole thing over again.
3. Reduce the spinlock deadlock detection time by 75%. We are more likely to have a bad spinlock state than PG, and a stuck spinlock could block any new connection. Not printing anything for 1 minute while the backend is frozen is a bad experience, and since most spinlocks are acquired for only a few instructions, there should never be a case that a process is waiting for 15-30s on a spinlock.

=== Background ===
Logic already existed to track if a process was holding (or acquiring) a LWLock
when it was killed (see the variable `yLWLockAcquired` (previously
`ybAnyLockAcquired`)). This diff adds logic to detect if a spinlock is held or
being acquired when a process is killed. For example, if a process is killed
while holding the `ProcStructLock`, any future connection will fail to acquire
the ProcStructLock - so it cannot initialize itself. Since the postmaster
typically doesn't acquire this lock, it remains blissfully unaware that no new
children can be acquired.

When a LWLock is acquired, it is guaranteed that the global variable `MyProc` is
 non-null. So it's straightforward to add a field to `MyProc` to indicate if
 this process is holding a lock. However, spinlocks are a much lower level
 construct, and are used to safely create and allocate the `MyProc` object. Some
  spinlocks are acquired before `MyProc` or `MyBackendId` are set. See uses of
  `ProcStructLock` in `proc.c::InitProcess`.

There are three different stages in which we acquire spinlocks:
- During Postmaster init. In this case, the acquiring process is the postmaster,
   so if the acquiring process is killed, we are not left in a bad state (because the postmaster
   is restarting)
- After process init: MyProc, MyProcPid, MyBackendId, and ProcGlobal are all
  initialised and functional. It is easy to handle cases like this - we handle
  them similarly to LWLocks.
- During process init. In this stage, we have a `ProcGlobal` with some various
  global fields set and `MyProcPid`.  `MyProc` is null and `MyBackendId` is not
  yet set. This is the hard case, discussed next.

=== How to handle SpinLocks acquired during process initialisation ===

Since there is no way to communicate to the postmaster that a process died holding a spinlock,
we simply restart if we find that a process has crashed / been killed without owning a PGPROC struct.

Another idea (and a previous version of this diff) was to use buckets, writing to a large array that
`crashed_pid % 1000` owns a spinlock. If a process died without a PGPROC, we could check the array to see
if it was likely that it owned a spinlock. This is more complicated, not perfect, and covers a rare scenario.
It is better to more conservatively crash than to introduce potential new bugs or gaps.

Since the OOM killer is more likely to kill procs with high memory usage, and backends during initialization
don't have high memory usage, it is very unlikely that they will be killed, so very unlikely that we'll have to
restart the postmaster because a process with no PGPROC was killed.
Jira: DB-7270, DB-7265

Test Plan:
=== Test Connections before database initialization ===
1. Add delay to a new process while it holds the ProcStructLock in proc.c:
```lang=diff
        SpinLockAcquire(ProcStructLock);
-
+       pg_usleep(1000 * 1000 * 10);
        set_spins_per_delay(ProcGlobal->spins_per_delay);
```
2. Restart :
```
 bin/yb-ctl destroy && bin/yb-ctl start
```
3. Repeatedly make connections. The first ones will fail with
```
ysqlsh: could not connect to server: Connection refused
	Is the server running on host "localhost" (::1) and accepting
	TCP/IP connections on port 5433?
could not connect to server: Connection refused
	Is the server running on host "localhost" (127.0.0.1) and accepting
	TCP/IP connections on port 5433?
```
And after a few seconds, they will fail with
```
ysqlsh: could not connect to server: Connection refused
	Is the server running on host "localhost" (::1) and accepting
	TCP/IP connections on port 5433?
FATAL:  the database system is starting up
```
In the logs, notice
```
2023-11-08 20:13:28.030 UTC [274016] FATAL:  the database system is starting up
2023-11-08 20:13:28.035 UTC [273956] WARNING:  server process (PID 274016) exited with exit code 1
```
for each of the second type of failures.
4. Finally, the database will finish initializing:
```
2023-11-08 20:13:31.669 UTC [273956] LOG:  database system is ready to accept connections
```

=== Test Partially Initialised Process ===
1. Add delay to a new process while it holds the ProcStructLock in proc.c:
```lang=diff
        SpinLockAcquire(ProcStructLock);
-
+       pg_usleep(1000 * 1000 * 10);
        set_spins_per_delay(ProcGlobal->spins_per_delay);
```
2. Create a new connection A
3. Create a new connection B and kill it while it is holding the lock. (`kill -9 44650`)
4. Observe that:
 * Connection A must reconnect because the postmaster restarted
```
yugabyte=# select * from pg_class;
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
yugabyte=#
```
 * In the logs:
```
2023-11-08 20:08:16.903 UTC [272686] WARNING:  terminating active server processes due to backend crash of a partially initialized process
2023-11-08 20:08:16.903 UTC [272686] WARNING:  server process (PID 272919) was terminated by signal 9: Killed
2023-11-08 20:08:16.903 UTC [272686] WARNING:  could not find BEEntry for pid 272919 to add to yb_terminated_queries
2023-11-08 20:08:16.903 UTC [272866] WARNING:  terminating connection because of crash of another server process
2023-11-08 20:08:16.903 UTC [272866] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2023-11-08 20:08:16.903 UTC [272866] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2023-11-08 20:08:16.907 UTC [272686] WARNING:  server process (PID 272866) exited with exit code 2
2023-11-08 20:08:16.909 UTC [272686] LOG:  all server processes terminated; reinitializing
```

(272866 is the existing connection, 272686 is the postmaster, and 272919 is the new process)

=== Test Initialized Process ===
1. Add the lines
```lang=c
slock_t* lock = NULL;
lock = (slock_t *) ShmemAlloc(sizeof(slock_t));
SpinLockInit(lock);
SpinLockAcquire(lock);
pg_usleep(1000 * 1000 * 10);
SpinLockRelease(lock);
```
somewhere to `YbSeqNext` in `YbSeqScan.c`.
2. Create a new connection A
3. Create a new connection B. Run a seq scan and kill it while it is holding the lock.
4. Observe that:
 * Connection A must reconnect, exactly as in the previous case.
 * In the logs:
```
2023-09-13 16:45:16.091 EDT [14387] WARNING:  terminating active server processes due to backend crash while acquiring SpinLock
2023-09-13 16:45:16.091 EDT [14387] WARNING:  server process (PID 14403) was terminated by signal 9: Killed
2023-09-13 16:45:16.091 EDT [14387] DETAIL:  Failed process was running: SELECT * FROM t;
2023-09-13 16:45:16.091 EDT [14466] WARNING:  terminating connection because of crash of another server process
2023-09-13 16:45:16.091 EDT [14466] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2023-09-13 16:45:16.091 EDT [14466] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
```

=== Test Spinlock Deadlock Detection Timeout ===
1. Apply the following changes to `YbSeqNext` in `nodeYbSeqScan.c`:
```lang=diff
--- a/src/postgres/src/backend/executor/nodeYbSeqscan.c
+++ b/src/postgres/src/backend/executor/nodeYbSeqscan.c
@@ -71,6 +71,14 @@ YbSeqNext(YbSeqScanState *node)
 	econtext = node->ss.ps.ps_ExprContext;
 	slot = node->ss.ss_ScanTupleSlot;

+	slock_t* lock = NULL;
+	lock = (slock_t *) ShmemAlloc(sizeof(slock_t));
+	SpinLockInit(lock);
+	SpinLockAcquire(lock);
+	SpinLockAcquire(lock); // double acquire, will deadlock
+	pg_usleep(1000 * 1000 * 10);
+	SpinLockRelease(lock);
+
 	/*
 	 * Initialize the scandesc upon the first invocation.
 	 * The scan only needs its ybscan field, so eventually we may use it
```

2. Create a session and run a sequential scan: `bin/ysqlsh -c "SELECT * FROM pg_class;"`
3. Observe in the logs that a session started:
```
I1120 13:14:40.311434 56496 pg_client.cc:165] Session id 3: Session id acquired. Postgres backend pid: 56496
```
4. Observe in the logs that the session was killed, 35 seconds later:
```
2023-11-20 13:15:10.214 UTC [56496] PANIC:  stuck spinlock detected at YbSeqNext, ../../../../../../src/postgres/src/backend/executor/nodeYbSeqscan.c:78
```

Reviewers: amartsinchyk, smishra

Reviewed By: amartsinchyk

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30375
Spikatrix pushed a commit to Spikatrix/yugabyte-db that referenced this issue Nov 29, 2023
…er when a process dies while holding a spinlock

Summary:
Original commit: 8e8cead / D28471
=== What's new in this diff ===
This diff tracks spinlock acquisition during process initialisation by using a
`ybSpinLockAcquired` field on `MyProc` for processes that are already initialised.
1. If a process is killed while holding a spinlock, we restart the postmaster.
2. If a process is killed before it has a `MyProc` (during initilzation), we also restart the postmaster.
    We don't restart if the process exited with code 0 or 1. If the database is still initializing, new connections
    would be rejected with status 1. We don't want to restart the postmaster, because we might end up in a loop
   where the app always tries too early to connect, causing us to start the whole thing over again.
3. Reduce the spinlock deadlock detection time by 75%. We are more likely to have a bad spinlock state than PG, and a stuck spinlock could block any new connection. Not printing anything for 1 minute while the backend is frozen is a bad experience, and since most spinlocks are acquired for only a few instructions, there should never be a case that a process is waiting for 15-30s on a spinlock.

=== Background ===
Logic already existed to track if a process was holding (or acquiring) a LWLock
when it was killed (see the variable `yLWLockAcquired` (previously
`ybAnyLockAcquired`)). This diff adds logic to detect if a spinlock is held or
being acquired when a process is killed. For example, if a process is killed
while holding the `ProcStructLock`, any future connection will fail to acquire
the ProcStructLock - so it cannot initialize itself. Since the postmaster
typically doesn't acquire this lock, it remains blissfully unaware that no new
children can be acquired.

When a LWLock is acquired, it is guaranteed that the global variable `MyProc` is
 non-null. So it's straightforward to add a field to `MyProc` to indicate if
 this process is holding a lock. However, spinlocks are a much lower level
 construct, and are used to safely create and allocate the `MyProc` object. Some
  spinlocks are acquired before `MyProc` or `MyBackendId` are set. See uses of
  `ProcStructLock` in `proc.c::InitProcess`.

There are three different stages in which we acquire spinlocks:
- During Postmaster init. In this case, the acquiring process is the postmaster,
   so if the acquiring process is killed, we are not left in a bad state (because the postmaster
   is restarting)
- After process init: MyProc, MyProcPid, MyBackendId, and ProcGlobal are all
  initialised and functional. It is easy to handle cases like this - we handle
  them similarly to LWLocks.
- During process init. In this stage, we have a `ProcGlobal` with some various
  global fields set and `MyProcPid`.  `MyProc` is null and `MyBackendId` is not
  yet set. This is the hard case, discussed next.

=== How to handle SpinLocks acquired during process initialisation ===

Since there is no way to communicate to the postmaster that a process died holding a spinlock,
we simply restart if we find that a process has crashed / been killed without owning a PGPROC struct.

Another idea (and a previous version of this diff) was to use buckets, writing to a large array that
`crashed_pid % 1000` owns a spinlock. If a process died without a PGPROC, we could check the array to see
if it was likely that it owned a spinlock. This is more complicated, not perfect, and covers a rare scenario.
It is better to more conservatively crash than to introduce potential new bugs or gaps.

Since the OOM killer is more likely to kill procs with high memory usage, and backends during initialization
don't have high memory usage, it is very unlikely that they will be killed, so very unlikely that we'll have to
restart the postmaster because a process with no PGPROC was killed.
Jira: DB-7270, DB-7265

Test Plan:
=== Test Connections before database initialization ===
1. Add delay to a new process while it holds the ProcStructLock in proc.c:
```lang=diff
        SpinLockAcquire(ProcStructLock);
-
+       pg_usleep(1000 * 1000 * 10);
        set_spins_per_delay(ProcGlobal->spins_per_delay);
```
2. Restart :
```
 bin/yb-ctl destroy && bin/yb-ctl start
```
3. Repeatedly make connections. The first ones will fail with
```
ysqlsh: could not connect to server: Connection refused
	Is the server running on host "localhost" (::1) and accepting
	TCP/IP connections on port 5433?
could not connect to server: Connection refused
	Is the server running on host "localhost" (127.0.0.1) and accepting
	TCP/IP connections on port 5433?
```
And after a few seconds, they will fail with
```
ysqlsh: could not connect to server: Connection refused
	Is the server running on host "localhost" (::1) and accepting
	TCP/IP connections on port 5433?
FATAL:  the database system is starting up
```
In the logs, notice
```
2023-11-08 20:13:28.030 UTC [274016] FATAL:  the database system is starting up
2023-11-08 20:13:28.035 UTC [273956] WARNING:  server process (PID 274016) exited with exit code 1
```
for each of the second type of failures.
4. Finally, the database will finish initializing:
```
2023-11-08 20:13:31.669 UTC [273956] LOG:  database system is ready to accept connections
```

=== Test Partially Initialised Process ===
1. Add delay to a new process while it holds the ProcStructLock in proc.c:
```lang=diff
        SpinLockAcquire(ProcStructLock);
-
+       pg_usleep(1000 * 1000 * 10);
        set_spins_per_delay(ProcGlobal->spins_per_delay);
```
2. Create a new connection A
3. Create a new connection B and kill it while it is holding the lock. (`kill -9 44650`)
4. Observe that:
 * Connection A must reconnect because the postmaster restarted
```
yugabyte=# select * from pg_class;
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
yugabyte=#
```
 * In the logs:
```
2023-11-08 20:08:16.903 UTC [272686] WARNING:  terminating active server processes due to backend crash of a partially initialized process
2023-11-08 20:08:16.903 UTC [272686] WARNING:  server process (PID 272919) was terminated by signal 9: Killed
2023-11-08 20:08:16.903 UTC [272686] WARNING:  could not find BEEntry for pid 272919 to add to yb_terminated_queries
2023-11-08 20:08:16.903 UTC [272866] WARNING:  terminating connection because of crash of another server process
2023-11-08 20:08:16.903 UTC [272866] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2023-11-08 20:08:16.903 UTC [272866] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2023-11-08 20:08:16.907 UTC [272686] WARNING:  server process (PID 272866) exited with exit code 2
2023-11-08 20:08:16.909 UTC [272686] LOG:  all server processes terminated; reinitializing
```

(272866 is the existing connection, 272686 is the postmaster, and 272919 is the new process)

=== Test Initialized Process ===
1. Add the lines
```lang=c
slock_t* lock = NULL;
lock = (slock_t *) ShmemAlloc(sizeof(slock_t));
SpinLockInit(lock);
SpinLockAcquire(lock);
pg_usleep(1000 * 1000 * 10);
SpinLockRelease(lock);
```
somewhere to `YbSeqNext` in `YbSeqScan.c`.
2. Create a new connection A
3. Create a new connection B. Run a seq scan and kill it while it is holding the lock.
4. Observe that:
 * Connection A must reconnect, exactly as in the previous case.
 * In the logs:
```
2023-09-13 16:45:16.091 EDT [14387] WARNING:  terminating active server processes due to backend crash while acquiring SpinLock
2023-09-13 16:45:16.091 EDT [14387] WARNING:  server process (PID 14403) was terminated by signal 9: Killed
2023-09-13 16:45:16.091 EDT [14387] DETAIL:  Failed process was running: SELECT * FROM t;
2023-09-13 16:45:16.091 EDT [14466] WARNING:  terminating connection because of crash of another server process
2023-09-13 16:45:16.091 EDT [14466] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2023-09-13 16:45:16.091 EDT [14466] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
```

=== Test Spinlock Deadlock Detection Timeout ===
1. Apply the following changes to `YbSeqNext` in `nodeYbSeqScan.c`:
```lang=diff
--- a/src/postgres/src/backend/executor/nodeYbSeqscan.c
+++ b/src/postgres/src/backend/executor/nodeYbSeqscan.c
@@ -71,6 +71,14 @@ YbSeqNext(YbSeqScanState *node)
 	econtext = node->ss.ps.ps_ExprContext;
 	slot = node->ss.ss_ScanTupleSlot;

+	slock_t* lock = NULL;
+	lock = (slock_t *) ShmemAlloc(sizeof(slock_t));
+	SpinLockInit(lock);
+	SpinLockAcquire(lock);
+	SpinLockAcquire(lock); // double acquire, will deadlock
+	pg_usleep(1000 * 1000 * 10);
+	SpinLockRelease(lock);
+
 	/*
 	 * Initialize the scandesc upon the first invocation.
 	 * The scan only needs its ybscan field, so eventually we may use it
```

2. Create a session and run a sequential scan: `bin/ysqlsh -c "SELECT * FROM pg_class;"`
3. Observe in the logs that a session started:
```
I1120 13:14:40.311434 56496 pg_client.cc:165] Session id 3: Session id acquired. Postgres backend pid: 56496
```
4. Observe in the logs that the session was killed, 35 seconds later:
```
2023-11-20 13:15:10.214 UTC [56496] PANIC:  stuck spinlock detected at YbSeqNext, ../../../../../../src/postgres/src/backend/executor/nodeYbSeqscan.c:78
```

Reviewers: amartsinchyk, smishra

Reviewed By: amartsinchyk

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30374
@yugabyte-ci yugabyte-ci reopened this Nov 29, 2023
@sushantrmishra sushantrmishra reopened this Dec 6, 2023
jharveysmith pushed a commit that referenced this issue May 24, 2024
…s dies while holding a spinlock

Summary:
Original commit: ef8433a90d470a2bc841e170bd2e70633f35273d / D28471
=== What's new in this diff ===
This diff tracks spinlock acquisition during process initialisation by using a
`ybSpinLockAcquired` field on `MyProc` for processes that are already initialised.
1. If a process is killed while holding a spinlock, we restart the postmaster.
2. If a process is killed before it has a `MyProc` (during initilzation), we also restart the postmaster.
    We don't restart if the process exited with code 0 or 1. If the database is still initializing, new connections
    would be rejected with status 1. We don't want to restart the postmaster, because we might end up in a loop
   where the app always tries too early to connect, causing us to start the whole thing over again.
3. Reduce the spinlock deadlock detection time by 75%. We are more likely to have a bad spinlock state than PG, and a stuck spinlock could block any new connection. Not printing anything for 1 minute while the backend is frozen is a bad experience, and since most spinlocks are acquired for only a few instructions, there should never be a case that a process is waiting for 15-30s on a spinlock.

=== Background ===
Logic already existed to track if a process was holding (or acquiring) a LWLock
when it was killed (see the variable `yLWLockAcquired` (previously
`ybAnyLockAcquired`)). This diff adds logic to detect if a spinlock is held or
being acquired when a process is killed. For example, if a process is killed
while holding the `ProcStructLock`, any future connection will fail to acquire
the ProcStructLock - so it cannot initialize itself. Since the postmaster
typically doesn't acquire this lock, it remains blissfully unaware that no new
children can be acquired.

When a LWLock is acquired, it is guaranteed that the global variable `MyProc` is
 non-null. So it's straightforward to add a field to `MyProc` to indicate if
 this process is holding a lock. However, spinlocks are a much lower level
 construct, and are used to safely create and allocate the `MyProc` object. Some
  spinlocks are acquired before `MyProc` or `MyBackendId` are set. See uses of
  `ProcStructLock` in `proc.c::InitProcess`.

There are three different stages in which we acquire spinlocks:
- During Postmaster init. In this case, the acquiring process is the postmaster,
   so if the acquiring process is killed, we are not left in a bad state (because the postmaster
   is restarting)
- After process init: MyProc, MyProcPid, MyBackendId, and ProcGlobal are all
  initialised and functional. It is easy to handle cases like this - we handle
  them similarly to LWLocks.
- During process init. In this stage, we have a `ProcGlobal` with some various
  global fields set and `MyProcPid`.  `MyProc` is null and `MyBackendId` is not
  yet set. This is the hard case, discussed next.

=== How to handle SpinLocks acquired during process initialisation ===

Since there is no way to communicate to the postmaster that a process died holding a spinlock,
we simply restart if we find that a process has crashed / been killed without owning a PGPROC struct.

Another idea (and a previous version of this diff) was to use buckets, writing to a large array that
`crashed_pid % 1000` owns a spinlock. If a process died without a PGPROC, we could check the array to see
if it was likely that it owned a spinlock. This is more complicated, not perfect, and covers a rare scenario.
It is better to more conservatively crash than to introduce potential new bugs or gaps.

Since the OOM killer is more likely to kill procs with high memory usage, and backends during initialization
don't have high memory usage, it is very unlikely that they will be killed, so very unlikely that we'll have to
restart the postmaster because a process with no PGPROC was killed.
Jira: DB-7270, DB-7265

Test Plan:
=== Test Connections before database initialization ===
1. Add delay to a new process while it holds the ProcStructLock in proc.c:
```lang=diff
        SpinLockAcquire(ProcStructLock);
-
+       pg_usleep(1000 * 1000 * 10);
        set_spins_per_delay(ProcGlobal->spins_per_delay);
```
2. Restart :
```
 bin/yb-ctl destroy && bin/yb-ctl start
```
3. Repeatedly make connections. The first ones will fail with
```
ysqlsh: could not connect to server: Connection refused
	Is the server running on host "localhost" (::1) and accepting
	TCP/IP connections on port 5433?
could not connect to server: Connection refused
	Is the server running on host "localhost" (127.0.0.1) and accepting
	TCP/IP connections on port 5433?
```
And after a few seconds, they will fail with
```
ysqlsh: could not connect to server: Connection refused
	Is the server running on host "localhost" (::1) and accepting
	TCP/IP connections on port 5433?
FATAL:  the database system is starting up
```
In the logs, notice
```
2023-11-08 20:13:28.030 UTC [274016] FATAL:  the database system is starting up
2023-11-08 20:13:28.035 UTC [273956] WARNING:  server process (PID 274016) exited with exit code 1
```
for each of the second type of failures.
4. Finally, the database will finish initializing:
```
2023-11-08 20:13:31.669 UTC [273956] LOG:  database system is ready to accept connections
```

=== Test Partially Initialised Process ===
1. Add delay to a new process while it holds the ProcStructLock in proc.c:
```lang=diff
        SpinLockAcquire(ProcStructLock);
-
+       pg_usleep(1000 * 1000 * 10);
        set_spins_per_delay(ProcGlobal->spins_per_delay);
```
2. Create a new connection A
3. Create a new connection B and kill it while it is holding the lock. (`kill -9 44650`)
4. Observe that:
 * Connection A must reconnect because the postmaster restarted
```
yugabyte=# select * from pg_class;
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
yugabyte=#
```
 * In the logs:
```
2023-11-08 20:08:16.903 UTC [272686] WARNING:  terminating active server processes due to backend crash of a partially initialized process
2023-11-08 20:08:16.903 UTC [272686] WARNING:  server process (PID 272919) was terminated by signal 9: Killed
2023-11-08 20:08:16.903 UTC [272686] WARNING:  could not find BEEntry for pid 272919 to add to yb_terminated_queries
2023-11-08 20:08:16.903 UTC [272866] WARNING:  terminating connection because of crash of another server process
2023-11-08 20:08:16.903 UTC [272866] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2023-11-08 20:08:16.903 UTC [272866] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2023-11-08 20:08:16.907 UTC [272686] WARNING:  server process (PID 272866) exited with exit code 2
2023-11-08 20:08:16.909 UTC [272686] LOG:  all server processes terminated; reinitializing
```

(272866 is the existing connection, 272686 is the postmaster, and 272919 is the new process)

=== Test Initialized Process ===
1. Add the lines
```lang=c
slock_t* lock = NULL;
lock = (slock_t *) ShmemAlloc(sizeof(slock_t));
SpinLockInit(lock);
SpinLockAcquire(lock);
pg_usleep(1000 * 1000 * 10);
SpinLockRelease(lock);
```
somewhere to `YbSeqNext` in `YbSeqScan.c`.
2. Create a new connection A
3. Create a new connection B. Run a seq scan and kill it while it is holding the lock.
4. Observe that:
 * Connection A must reconnect, exactly as in the previous case.
 * In the logs:
```
2023-09-13 16:45:16.091 EDT [14387] WARNING:  terminating active server processes due to backend crash while acquiring SpinLock
2023-09-13 16:45:16.091 EDT [14387] WARNING:  server process (PID 14403) was terminated by signal 9: Killed
2023-09-13 16:45:16.091 EDT [14387] DETAIL:  Failed process was running: SELECT * FROM t;
2023-09-13 16:45:16.091 EDT [14466] WARNING:  terminating connection because of crash of another server process
2023-09-13 16:45:16.091 EDT [14466] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2023-09-13 16:45:16.091 EDT [14466] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
```

=== Test Spinlock Deadlock Detection Timeout ===
1. Apply the following changes to `YbSeqNext` in `nodeYbSeqScan.c`:
```lang=diff
--- a/src/postgres/src/backend/executor/nodeYbSeqscan.c
+++ b/src/postgres/src/backend/executor/nodeYbSeqscan.c
@@ -71,6 +71,14 @@ YbSeqNext(YbSeqScanState *node)
 	econtext = node->ss.ps.ps_ExprContext;
 	slot = node->ss.ss_ScanTupleSlot;

+	slock_t* lock = NULL;
+	lock = (slock_t *) ShmemAlloc(sizeof(slock_t));
+	SpinLockInit(lock);
+	SpinLockAcquire(lock);
+	SpinLockAcquire(lock); // double acquire, will deadlock
+	pg_usleep(1000 * 1000 * 10);
+	SpinLockRelease(lock);
+
 	/*
 	 * Initialize the scandesc upon the first invocation.
 	 * The scan only needs its ybscan field, so eventually we may use it
```

2. Create a session and run a sequential scan: `bin/ysqlsh -c "SELECT * FROM pg_class;"`
3. Observe in the logs that a session started:
```
I1120 13:14:40.311434 56496 pg_client.cc:165] Session id 3: Session id acquired. Postgres backend pid: 56496
```
4. Observe in the logs that the session was killed, 35 seconds later:
```
2023-11-20 13:15:10.214 UTC [56496] PANIC:  stuck spinlock detected at YbSeqNext, ../../../../../../src/postgres/src/backend/executor/nodeYbSeqscan.c:78
```

Reviewers: amartsinchyk, smishra

Reviewed By: amartsinchyk

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30374
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue
Projects
None yet
Development

No branches or pull requests

3 participants