There were two bugs:
1. We initialized the page headers on the allocated temporary WAL page
even if the end-of-log was precisely at a page boundary. That means we
wrote beyond the end of the allocation, which readily gives an
assertion failure on debug-enabled builds. Add a test case and fix.
2. We initialize the WAL buffer by copying the contents of the WAL
read buffer. The idea is that when we stop reading WAL, the last
buffer in the reader becomes the new WAL buffer we'll write to.
That's how PostgreSQL does it too, see code around comment "Tricky
point here" in StartupXLOG().
However, that doesn't work with Neon. The startup procedure is a
little different: we don't do normal WAL recovery and we don't read
any WAL at startup, except when promoting a read replica. Vanilla
PostgreSQL always reads WAL: it reads the last checkpoint record from
the WAL if nothing else, but in Neon we don't necessarily read even
that. In that case, the xlogreader's read buffer is still
uninitialized by the time that we copy it.
That's relatively harmless: the only consequence is that the initial
WAL segment on local disk can contain garbage before the first WAL
record that we write. That's why we haven't noticed until
now. Furthermore, it seems that the uninitialized memory just happens
to be all-zeros. However, it now caused the test_pg_waldump.py test to
fail with the new communicator implementation. That was very
coincidental - the new communicator process isn't even running yet
when the WAL buffer is initialized. It seems to have changed the
memory allocation just so that the uninitialized memory is no longer
all-zeros. That's normally harmless too, but it makes the pg_waldump
test fail: pg_waldump, with the --ignore option, starts reading the
WAL from the first non-zero bytes, so when the uninitialized portion
is filled with garbage rather than zeros, it starts reading at the
garbage instead of at the first real WAL record, and fails.
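To illustrate why the fill value of the unwritten prefix matters, here is a
minimal standalone sketch of a "skip leading zeros" scan. It is not
pg_waldump's actual code: `first_nonzero_offset` and `toy_start_offset` are
hypothetical helpers, and the 0xAB bytes merely stand in for a real WAL
record.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from pg_waldump): return the offset of the
 * first non-zero byte in buf, or len if the whole buffer is zero.
 */
static size_t
first_nonzero_offset(const unsigned char *buf, size_t len)
{
	size_t		i;

	assert(buf != NULL);
	for (i = 0; i < len; i++)
	{
		if (buf[i] != 0)
			break;
	}
	return i;
}

/*
 * Build a toy "segment": prefix_len bytes of prefix_fill (the part that was
 * never written), followed by 0xAB bytes standing in for the first record.
 * Returns the offset where a skip-leading-zeros reader would start.
 */
static size_t
toy_start_offset(unsigned char *seg, size_t seg_len,
				 size_t prefix_len, unsigned char prefix_fill)
{
	assert(seg != NULL);
	assert(prefix_len <= seg_len);
	memset(seg, prefix_fill, prefix_len);
	memset(seg + prefix_len, 0xAB, seg_len - prefix_len);
	return first_nonzero_offset(seg, seg_len);
}
```

With a 16-byte zero prefix the scan lands on offset 16, the start of the
stand-in record. With the same prefix filled with 0x7e it lands on offset 0,
inside the garbage, which matches the failure described above.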
This little patch to poison the allocated buffer with garbage was
helpful while debugging, to make the test fail in a repeatable fashion
with or without the new communicator:
```
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 988be3f..2f4844c2b86 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -97,6 +97,7 @@ XLogReaderAllocate(int wal_segment_size, const char *waldir,
 	 */
 	state->readBuf = (char *) palloc_extended(XLOG_BLCKSZ,
 											  MCXT_ALLOC_NO_OOM);
+	memset(state->readBuf, 0x7e, XLOG_BLCKSZ);
 	if (!state->readBuf)
 	{
 		pfree(state);
```
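Note that the debugging patch above calls memset() before the NULL check;
with MCXT_ALLOC_NO_OOM the allocation can return NULL, so that ordering is
only acceptable for a throwaway debugging aid. A standalone sketch of the
same poisoning idea, with the fill placed after the check, might look like
this. It uses plain malloc() rather than palloc_extended() so it compiles
outside the PostgreSQL tree, and `alloc_poisoned` and `is_all_poison` are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define POISON_BYTE 0x7e		/* same fill value as the debugging patch */

/*
 * Hypothetical helper: allocate len bytes and fill them with POISON_BYTE so
 * that any read of not-yet-initialized memory is easy to spot. Returns NULL
 * if the allocation fails; the buffer is only touched after that check.
 */
static char *
alloc_poisoned(size_t len)
{
	char	   *buf = malloc(len);

	if (buf == NULL)
		return NULL;
	memset(buf, POISON_BYTE, len);
	return buf;
}

/* Return 1 if every byte of buf still holds the poison value, else 0. */
static int
is_all_poison(const char *buf, size_t len)
{
	size_t		i;

	assert(buf != NULL);
	for (i = 0; i < len; i++)
	{
		if ((unsigned char) buf[i] != POISON_BYTE)
			return 0;
	}
	return 1;
}
```

A caller can run is_all_poison() on a buffer just before copying it, to
detect that nothing was ever read into it.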