[YSQL] Processes killed while holding a spinlock stop new connections #18272
timothy-e added the area/ysql (Yugabyte SQL (YSQL)) and status/awaiting-triage (Issue awaiting triage) labels on Jul 17, 2023
yugabyte-ci added the kind/enhancement (This is an enhancement of an existing feature) and priority/medium (Medium priority issue) labels and removed the status/awaiting-triage (Issue awaiting triage) label on Jul 17, 2023
yugabyte-ci added the kind/bug (This issue is a bug) label and removed the kind/enhancement (This is an enhancement of an existing feature) label on Nov 15, 2023
timothy-e added a commit that referenced this issue on Nov 20, 2023:
…lding a spinlock

Summary:

=== What's new in this diff ===

This diff tracks spinlock acquisition during process initialisation by using a `ybSpinLockAcquired` field on `MyProc` for processes that are already initialised.

1. If a process is killed while holding a spinlock, we restart the postmaster.
2. If a process is killed before it has a `MyProc` (during initialization), we also restart the postmaster. We don't restart if the process exited with code 0 or 1: while the database is still initializing, new connections are rejected with status 1, and restarting the postmaster on those exits could trap us in a loop where the app keeps connecting too early and we start the whole thing over each time.
3. Reduce the spinlock deadlock detection time by 75%. We are more likely to have a bad spinlock state than PG, and a stuck spinlock could block any new connection. Printing nothing for a minute while the backend is frozen is a bad experience, and since most spinlocks are held for only a few instructions, there should never be a case where a process waits 15-30s on a spinlock.

=== Background ===

Logic already existed to track whether a process was holding (or acquiring) a LWLock when it was killed (see the variable `yLWLockAcquired`, previously `ybAnyLockAcquired`). This diff adds logic to detect whether a spinlock is held or being acquired when a process is killed. For example, if a process is killed while holding the `ProcStructLock`, any future connection will fail to acquire the ProcStructLock, so it cannot initialize itself. Since the postmaster typically doesn't acquire this lock, it remains blissfully unaware that no new children can be created.

When a LWLock is acquired, the global variable `MyProc` is guaranteed to be non-null, so it is straightforward to add a field to `MyProc` indicating that this process is holding a lock.
However, spinlocks are a much lower-level construct, used to safely create and allocate the `MyProc` object itself. Some spinlocks are acquired before `MyProc` or `MyBackendId` are set; see the uses of `ProcStructLock` in `proc.c::InitProcess`. There are three different stages in which we acquire spinlocks:

- During postmaster init. Here the acquiring process is the postmaster itself, so if it is killed we are not left in a bad state (the postmaster is restarting anyway).
- After process init. `MyProc`, `MyProcPid`, `MyBackendId`, and `ProcGlobal` are all initialised and functional. This case is easy to handle; we handle it similarly to LWLocks.
- During process init. At this stage we have `MyProcPid` and a `ProcGlobal` with various global fields set, but `MyProc` is null and `MyBackendId` is not yet set. This is the hard case, discussed next.

=== How to handle SpinLocks acquired during process initialisation ===

Since there is no way to communicate to the postmaster that a process died holding a spinlock, we simply restart if we find that a process has crashed or been killed without owning a PGPROC struct.

Another idea (and a previous version of this diff) was to use buckets: a process writes into a large array, at index `crashed_pid % 1000`, a flag recording that it owns a spinlock. If a process died without a PGPROC, we could check the array to see whether it likely owned a spinlock. This is more complicated, not perfect, and covers a rare scenario; it is better to crash conservatively than to introduce potential new bugs or gaps. Since the OOM killer preferentially kills processes with high memory usage, and backends during initialization don't use much memory, it is very unlikely that they will be killed, so it is very unlikely that we'll have to restart the postmaster because a process with no PGPROC was killed.

Jira: DB-7270, DB-7265

Test Plan:

=== Test Connections before database initialization ===

1.
Add delay to a new process while it holds the ProcStructLock in proc.c:

```lang=diff
 SpinLockAcquire(ProcStructLock);
-
+ pg_usleep(1000 * 1000 * 10);
 set_spins_per_delay(ProcGlobal->spins_per_delay);
```

2. Restart:

```
bin/yb-ctl destroy && bin/yb-ctl start
```

3. Repeatedly make connections. The first ones will fail with

```
ysqlsh: could not connect to server: Connection refused
	Is the server running on host "localhost" (::1) and accepting
	TCP/IP connections on port 5433?
could not connect to server: Connection refused
	Is the server running on host "localhost" (127.0.0.1) and accepting
	TCP/IP connections on port 5433?
```

and after a few seconds, they will fail with

```
ysqlsh: could not connect to server: Connection refused
	Is the server running on host "localhost" (::1) and accepting
	TCP/IP connections on port 5433?
FATAL: the database system is starting up
```

In the logs, notice

```
2023-11-08 20:13:28.030 UTC [274016] FATAL: the database system is starting up
2023-11-08 20:13:28.035 UTC [273956] WARNING: server process (PID 274016) exited with exit code 1
```

for each of the second type of failures.

4. Finally, the database will finish initializing:

```
2023-11-08 20:13:31.669 UTC [273956] LOG: database system is ready to accept connections
```

=== Test Partially Initialised Process ===

1. Add delay to a new process while it holds the ProcStructLock in proc.c:

```lang=diff
 SpinLockAcquire(ProcStructLock);
-
+ pg_usleep(1000 * 1000 * 10);
 set_spins_per_delay(ProcGlobal->spins_per_delay);
```

2. Create a new connection A.
3. Create a new connection B and kill it while it is holding the lock (`kill -9 44650`).
4.
Observe that:

* Connection A must reconnect because the postmaster restarted:

```
yugabyte=# select * from pg_class;
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
yugabyte=#
```

* In the logs:

```
2023-11-08 20:08:16.903 UTC [272686] WARNING: terminating active server processes due to backend crash of a partially initialized process
2023-11-08 20:08:16.903 UTC [272686] WARNING: server process (PID 272919) was terminated by signal 9: Killed
2023-11-08 20:08:16.903 UTC [272686] WARNING: could not find BEEntry for pid 272919 to add to yb_terminated_queries
2023-11-08 20:08:16.903 UTC [272866] WARNING: terminating connection because of crash of another server process
2023-11-08 20:08:16.903 UTC [272866] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2023-11-08 20:08:16.903 UTC [272866] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2023-11-08 20:08:16.907 UTC [272686] WARNING: server process (PID 272866) exited with exit code 2
2023-11-08 20:08:16.909 UTC [272686] LOG: all server processes terminated; reinitializing
```

(272866 is the existing connection, 272686 is the postmaster, and 272919 is the new process.)

=== Test Initialized Process ===

1.
Add the lines

```lang=c
slock_t* lock = NULL;
lock = (slock_t *) ShmemAlloc(sizeof(slock_t));
SpinLockInit(lock);
SpinLockAcquire(lock);
pg_usleep(1000 * 1000 * 10);
SpinLockRelease(lock);
```

somewhere in `YbSeqNext` in `nodeYbSeqscan.c`.

2. Create a new connection A.
3. Create a new connection B. Run a seq scan and kill it while it is holding the lock.
4. Observe that:

* Connection A must reconnect, exactly as in the previous case.
* In the logs:

```
2023-09-13 16:45:16.091 EDT [14387] WARNING: terminating active server processes due to backend crash while acquiring SpinLock
2023-09-13 16:45:16.091 EDT [14387] WARNING: server process (PID 14403) was terminated by signal 9: Killed
2023-09-13 16:45:16.091 EDT [14387] DETAIL: Failed process was running: SELECT * FROM t;
2023-09-13 16:45:16.091 EDT [14466] WARNING: terminating connection because of crash of another server process
2023-09-13 16:45:16.091 EDT [14466] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2023-09-13 16:45:16.091 EDT [14466] HINT: In a moment you should be able to reconnect to the database and repeat your command.
```

=== Test Spinlock Deadlock Detection Timeout ===

1. Apply the following changes to `YbSeqNext` in `nodeYbSeqscan.c`:

```lang=diff
--- a/src/postgres/src/backend/executor/nodeYbSeqscan.c
+++ b/src/postgres/src/backend/executor/nodeYbSeqscan.c
@@ -71,6 +71,14 @@ YbSeqNext(YbSeqScanState *node)
 	econtext = node->ss.ps.ps_ExprContext;
 	slot = node->ss.ss_ScanTupleSlot;
 
+	slock_t* lock = NULL;
+	lock = (slock_t *) ShmemAlloc(sizeof(slock_t));
+	SpinLockInit(lock);
+	SpinLockAcquire(lock);
+	SpinLockAcquire(lock); // double acquire, will deadlock
+	pg_usleep(1000 * 1000 * 10);
+	SpinLockRelease(lock);
+
 	/*
 	 * Initialize the scandesc upon the first invocation.
 	 * The scan only needs its ybscan field, so eventually we may use it
```

2.
Create a session and run a sequential scan: `bin/ysqlsh -c "SELECT * FROM pg_class;"`

3. Observe in the logs that a session started:

```
I1120 13:14:40.311434 56496 pg_client.cc:165] Session id 3: Session id acquired. Postgres backend pid: 56496
```

4. Observe in the logs that the session was killed, 35 seconds later:

```
2023-11-20 13:15:10.214 UTC [56496] PANIC: stuck spinlock detected at YbSeqNext, ../../../../../../src/postgres/src/backend/executor/nodeYbSeqscan.c:78
```

Reviewers: amartsinchyk, smishra

Reviewed By: amartsinchyk

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D28471
Spikatrix pushed a commit to Spikatrix/yugabyte-db that referenced this issue on Nov 29, 2023:
…er when a process dies while holding a spinlock

Summary: Original commit: 8e8cead / D28471. (Same summary and test plan as the commit above.)

Reviewers: amartsinchyk, smishra

Reviewed By: amartsinchyk

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30375
Spikatrix pushed a commit to Spikatrix/yugabyte-db that referenced this issue on Nov 29, 2023:
…er when a process dies while holding a spinlock

Summary: Original commit: 8e8cead / D28471. (Same summary and test plan as the commit above.)

Reviewers: amartsinchyk, smishra

Reviewed By: amartsinchyk

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30374
jharveysmith pushed a commit that referenced this issue on May 24, 2024:
…s dies while holding a spinlock Summary: Original commit: ef8433a90d470a2bc841e170bd2e70633f35273d / D28471 === What's new in this diff === This diff tracks spinlock acquisition during process initialisation by using a `ybSpinLockAcquired` field on `MyProc` for processes that are already initialised. 1. If a process is killed while holding a spinlock, we restart the postmaster. 2. If a process is killed before it has a `MyProc` (during initilzation), we also restart the postmaster. We don't restart if the process exited with code 0 or 1. If the database is still initializing, new connections would be rejected with status 1. We don't want to restart the postmaster, because we might end up in a loop where the app always tries too early to connect, causing us to start the whole thing over again. 3. Reduce the spinlock deadlock detection time by 75%. We are more likely to have a bad spinlock state than PG, and a stuck spinlock could block any new connection. Not printing anything for 1 minute while the backend is frozen is a bad experience, and since most spinlocks are acquired for only a few instructions, there should never be a case that a process is waiting for 15-30s on a spinlock. === Background === Logic already existed to track if a process was holding (or acquiring) a LWLock when it was killed (see the variable `yLWLockAcquired` (previously `ybAnyLockAcquired`)). This diff adds logic to detect if a spinlock is held or being acquired when a process is killed. For example, if a process is killed while holding the `ProcStructLock`, any future connection will fail to acquire the ProcStructLock - so it cannot initialize itself. Since the postmaster typically doesn't acquire this lock, it remains blissfully unaware that no new children can be acquired. When a LWLock is acquired, it is guaranteed that the global variable `MyProc` is non-null. So it's straightforward to add a field to `MyProc` to indicate if this process is holding a lock. 
However, spinlocks are a much lower level construct, and are used to safely create and allocate the `MyProc` object. Some spinlocks are acquired before `MyProc` or `MyBackendId` are set (see uses of `ProcStructLock` in `proc.c::InitProcess`). There are three different stages in which we acquire spinlocks:

- During postmaster init. In this case, the acquiring process is the postmaster, so if the acquiring process is killed, we are not left in a bad state (because the postmaster is restarting anyway).
- After process init: `MyProc`, `MyProcPid`, `MyBackendId`, and `ProcGlobal` are all initialised and functional. Cases like this are easy to handle - we handle them similarly to LWLocks.
- During process init. In this stage, we have a `ProcGlobal` with various global fields set, and `MyProcPid`. `MyProc` is null and `MyBackendId` is not yet set. This is the hard case, discussed next.

=== How to handle SpinLocks acquired during process initialisation ===

Since there is no way to communicate to the postmaster that a process died holding a spinlock, we simply restart if we find that a process has crashed or been killed without owning a PGPROC struct.

Another idea (and a previous version of this diff) was to use buckets: each process records, in a large array indexed by `pid % 1000`, that it owns a spinlock. If a process died without a PGPROC, we could check the entry for `crashed_pid % 1000` to see whether it was likely that it owned a spinlock. This is more complicated, not perfect, and covers a rare scenario. It is better to crash conservatively than to introduce potential new bugs or gaps. Since the OOM killer preferentially kills processes with high memory usage, and backends during initialization don't use much memory, it is very unlikely that they will be killed - so it is very unlikely that we'll have to restart the postmaster because a process with no PGPROC was killed.

Jira: DB-7270, DB-7265

Test Plan:

=== Test Connections before database initialization ===

1.
Add delay to a new process while it holds the ProcStructLock in proc.c:

```lang=diff
 SpinLockAcquire(ProcStructLock);
-
+ pg_usleep(1000 * 1000 * 10);
 set_spins_per_delay(ProcGlobal->spins_per_delay);
```

2. Restart:

```
bin/yb-ctl destroy && bin/yb-ctl start
```

3. Repeatedly make connections. The first ones will fail with

```
ysqlsh: could not connect to server: Connection refused
	Is the server running on host "localhost" (::1) and accepting
	TCP/IP connections on port 5433?
could not connect to server: Connection refused
	Is the server running on host "localhost" (127.0.0.1) and accepting
	TCP/IP connections on port 5433?
```

And after a few seconds, they will fail with

```
ysqlsh: could not connect to server: Connection refused
	Is the server running on host "localhost" (::1) and accepting
	TCP/IP connections on port 5433?
FATAL:  the database system is starting up
```

In the logs, notice

```
2023-11-08 20:13:28.030 UTC [274016] FATAL:  the database system is starting up
2023-11-08 20:13:28.035 UTC [273956] WARNING:  server process (PID 274016) exited with exit code 1
```

for each of the second type of failures.

4. Finally, the database will finish initializing:

```
2023-11-08 20:13:31.669 UTC [273956] LOG:  database system is ready to accept connections
```

=== Test Partially Initialised Process ===

1. Add delay to a new process while it holds the ProcStructLock in proc.c:

```lang=diff
 SpinLockAcquire(ProcStructLock);
-
+ pg_usleep(1000 * 1000 * 10);
 set_spins_per_delay(ProcGlobal->spins_per_delay);
```

2. Create a new connection A.
3. Create a new connection B and kill it while it is holding the lock (`kill -9 44650`).
4.
Observe that:

* Connection A must reconnect because the postmaster restarted:

```
yugabyte=# select * from pg_class;
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
yugabyte=#
```

* In the logs:

```
2023-11-08 20:08:16.903 UTC [272686] WARNING:  terminating active server processes due to backend crash of a partially initialized process
2023-11-08 20:08:16.903 UTC [272686] WARNING:  server process (PID 272919) was terminated by signal 9: Killed
2023-11-08 20:08:16.903 UTC [272686] WARNING:  could not find BEEntry for pid 272919 to add to yb_terminated_queries
2023-11-08 20:08:16.903 UTC [272866] WARNING:  terminating connection because of crash of another server process
2023-11-08 20:08:16.903 UTC [272866] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2023-11-08 20:08:16.903 UTC [272866] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2023-11-08 20:08:16.907 UTC [272686] WARNING:  server process (PID 272866) exited with exit code 2
2023-11-08 20:08:16.909 UTC [272686] LOG:  all server processes terminated; reinitializing
```

(272866 is the existing connection, 272686 is the postmaster, and 272919 is the new process.)

=== Test Initialized Process ===

1.
Add the lines

```lang=c
slock_t* lock = NULL;
lock = (slock_t *) ShmemAlloc(sizeof(slock_t));
SpinLockInit(lock);
SpinLockAcquire(lock);
pg_usleep(1000 * 1000 * 10);
SpinLockRelease(lock);
```

somewhere in `YbSeqNext` in `nodeYbSeqscan.c`.

2. Create a new connection A.
3. Create a new connection B. Run a seq scan and kill it while it is holding the lock.
4. Observe that:

* Connection A must reconnect, exactly as in the previous case.
* In the logs:

```
2023-09-13 16:45:16.091 EDT [14387] WARNING:  terminating active server processes due to backend crash while acquiring SpinLock
2023-09-13 16:45:16.091 EDT [14387] WARNING:  server process (PID 14403) was terminated by signal 9: Killed
2023-09-13 16:45:16.091 EDT [14387] DETAIL:  Failed process was running: SELECT * FROM t;
2023-09-13 16:45:16.091 EDT [14466] WARNING:  terminating connection because of crash of another server process
2023-09-13 16:45:16.091 EDT [14466] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2023-09-13 16:45:16.091 EDT [14466] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
```

=== Test Spinlock Deadlock Detection Timeout ===

1. Apply the following changes to `YbSeqNext` in `nodeYbSeqscan.c`:

```lang=diff
--- a/src/postgres/src/backend/executor/nodeYbSeqscan.c
+++ b/src/postgres/src/backend/executor/nodeYbSeqscan.c
@@ -71,6 +71,14 @@ YbSeqNext(YbSeqScanState *node)
 	econtext = node->ss.ps.ps_ExprContext;
 	slot = node->ss.ss_ScanTupleSlot;
 
+	slock_t* lock = NULL;
+	lock = (slock_t *) ShmemAlloc(sizeof(slock_t));
+	SpinLockInit(lock);
+	SpinLockAcquire(lock);
+	SpinLockAcquire(lock); // double acquire, will deadlock
+	pg_usleep(1000 * 1000 * 10);
+	SpinLockRelease(lock);
+
 	/*
 	 * Initialize the scandesc upon the first invocation.
 	 * The scan only needs its ybscan field, so eventually we may use it
```

2.
Create a session and run a sequential scan: `bin/ysqlsh -c "SELECT * FROM pg_class;"`

3. Observe in the logs that a session started:

```
I1120 13:14:40.311434 56496 pg_client.cc:165] Session id 3: Session id acquired. Postgres backend pid: 56496
```

4. Observe in the logs that the session was killed about 30 seconds later:

```
2023-11-20 13:15:10.214 UTC [56496] PANIC:  stuck spinlock detected at YbSeqNext, ../../../../../../src/postgres/src/backend/executor/nodeYbSeqscan.c:78
```

Reviewers: amartsinchyk, smishra

Reviewed By: amartsinchyk

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30374
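The restart policy described in the summary above can be condensed into a single decision function. This is a sketch under the stated assumptions (restart on a crash while holding a spinlock, or on a crash with no PGPROC, but never on exit codes 0 or 1); the function name and parameters are hypothetical, not the real postmaster code, and "restart" here means a full postmaster restart rather than the usual crash-recovery reinitialization.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of the restart decision described in the diff summary. */
static bool
model_should_restart_postmaster(bool exited_normally, int exit_status,
								bool had_pgproc, bool spinlock_flag_set)
{
	/*
	 * Exit codes 0 and 1 are benign: status 1 is how connections are
	 * rejected while the database is still initializing, and restarting
	 * on it could loop forever if the app keeps connecting too early.
	 */
	if (exited_normally && (exit_status == 0 || exit_status == 1))
		return false;

	/*
	 * Killed before owning a PGPROC: there was nowhere to record a held
	 * spinlock, so we cannot rule one out -- restart conservatively.
	 */
	if (!had_pgproc)
		return true;

	/* Fully initialised process: restart only if the flag was set. */
	return spinlock_flag_set;
}
```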
Jira Link: DB-7270
Description
```lang=diff
 SpinLockAcquire(ProcStructLock);
 set_spins_per_delay(ProcGlobal->spins_per_delay);
+pg_usleep(1000 * 1000 * 10); // 10s
 MyProc = *procgloballist;
```
This fatal error will kill backend B, but not affect the postmaster.
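The failure mode can be modeled as a bounded spin-wait: a waiter retries for a fixed delay budget, and if the lock never frees up (because its holder was killed), it gives up - in the real code, with a `PANIC: stuck spinlock detected`, as shown in the test plan logs above. The constant and names below are illustrative, not PostgreSQL's actual values.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NUM_DELAYS 100	/* illustrative delay budget */

/* Returns true if the lock was observed free within the budget, false
 * if the waiter would declare a stuck spinlock and PANIC. */
static bool
model_spin_with_deadlock_check(const int *lock)
{
	for (int delays = 0; delays < MODEL_NUM_DELAYS; delays++)
	{
		if (*lock == 0)
			return true;		/* would TAS the lock and proceed */
		/* the real code sleeps here with increasing backoff */
	}
	return false;				/* would PANIC: stuck spinlock detected */
}
```

Because the budget is finite, a lock whose holder died is guaranteed to be reported eventually; the diff's change is to shrink that budget so the report comes in seconds rather than a minute.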
Solution:
This is related to #18265, but this issue specifically addresses spinlocks. It only occurs when a process is killed while holding a spinlock, which is extremely rare - but when it does happen, the effects are immediately severe and obvious.
#18265 addresses the dangling proc struct list entry, which occurs if the backend is killed anytime between claiming a proc struct and having a PID assigned to that struct. If that scenario happens enough times, it results in a max-connections error.
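The leak window behind that max-connections error can be illustrated with a simplified model of the freelist claim: a backend pops a PGPROC off the freelist (under ProcStructLock in the real code) and only later stamps its PID into it; a kill in between leaks the slot. All names and the structure here are hypothetical simplifications, not the actual `proc.c` code.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_PROCS 4		/* tiny "connection limit" for the model */

typedef struct ModelProc
{
	int			pid;			/* 0 = no PID assigned yet */
	struct ModelProc *next_free;
} ModelProc;

typedef struct
{
	ModelProc	procs[MODEL_MAX_PROCS];
	ModelProc  *freelist;
} ModelProcGlobal;

static void
model_init(ModelProcGlobal *g)
{
	g->freelist = NULL;
	for (int i = 0; i < MODEL_MAX_PROCS; i++)
	{
		g->procs[i].pid = 0;
		g->procs[i].next_free = g->freelist;
		g->freelist = &g->procs[i];
	}
}

/* Step 1: claim a slot (the real code holds ProcStructLock here). */
static ModelProc *
model_claim(ModelProcGlobal *g)
{
	ModelProc  *p = g->freelist;

	if (p != NULL)
		g->freelist = p->next_free;
	return p;					/* PID is NOT yet assigned */
}

/* Step 2: stamp the PID; a kill before this point leaks the slot. */
static void
model_assign_pid(ModelProc *p, int pid)
{
	p->pid = pid;
}
```

If every claimant is killed between step 1 and step 2, the freelist drains without any slot ever being returned, and further claims fail - the model analogue of the max-connections error.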