Undo disarming VM check warning in vacuumlazy.c #184

knizhnik · 2022-07-22T20:09:05Z

No description provided.

Make smgr API pluggable. Add smgr_hook that can be used to define custom smgrs. Remove smgrsw[] array and smgr_sw selector. Instead, smgropen() loads f_smgr implementation using smgr_hook. Also add smgr_init_hook and smgr_shutdown_hook. And a lot of mechanical changes in smgr.c functions. This patch is proposed to community: https://commitfest.postgresql.org/33/3216/ Author: anastasia <lubennikovaav@gmail.com>

Add contrib/zenith that handles interaction with remote pagestore. To use it add 'shared_preload_library = zenith' to postgresql.conf. It adds a protocol for network communications - see libpagestore.c; and implements smgr API. Also it adds several custom GUC variables: - zenith.page_server_connstring - zenith.callmemaybe_connstring - zenith.zenith_timeline - zenith.wal_redo Authors: Stas Kelvich <stanconn@gmail.com> Konstantin Knizhnik <knizhnik@garret.ru> Heikki Linnakangas <heikki.linnakangas@iki.fi>

Add WAL redo helper for zenith - alternative postgres operation mode to replay wal by pageserver request. To start postgres in wal-redo mode, run postgres with --wal-redo option It requires zenith shared library and zenith.wal_redo Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>

Save lastWrittenPageLSN in XLogCtlData to know what pages to request from remote pageserver. Authors: Konstantin Knizhnik <knizhnik@garret.ru> Heikki Linnakangas <heikki.linnakangas@iki.fi>

In the test_createdb test, we created a new database, and created a new branch after that. I was seeing the test fail with: PANIC: could not open critical system index 2662 The WAL contained records like this: rmgr: XLOG len (rec/tot): 49/ 8241, tx: 0, lsn: 0/0163E8F0, prev 0/0163C8A0, desc: FPI , blkref #0: rel 1663/12985/1249 fork fsm blk 1 FPW rmgr: XLOG len (rec/tot): 49/ 8241, tx: 0, lsn: 0/01640940, prev 0/0163E8F0, desc: FPI , blkref #0: rel 1663/12985/1249 fork fsm blk 2 FPW rmgr: Standby len (rec/tot): 54/ 54, tx: 0, lsn: 0/01642990, prev 0/01640940, desc: RUNNING_XACTS nextXid 541 latestCompletedXid 539 oldestRunningXid 540; 1 xacts: 540 rmgr: XLOG len (rec/tot): 114/ 114, tx: 0, lsn: 0/016429C8, prev 0/01642990, desc: CHECKPOINT_ONLINE redo 0/163C8A0; tli 1; prev tli 1; fpw true; xid 0:541; oid 24576; multi 1; offset 0; oldest xid 532 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 540; online rmgr: Database len (rec/tot): 42/ 42, tx: 540, lsn: 0/01642A40, prev 0/016429C8, desc: CREATE copy dir 1663/1 to 1663/16390 rmgr: Standby len (rec/tot): 54/ 54, tx: 0, lsn: 0/01642A70, prev 0/01642A40, desc: RUNNING_XACTS nextXid 541 latestCompletedXid 539 oldestRunningXid 540; 1 xacts: 540 rmgr: XLOG len (rec/tot): 114/ 114, tx: 0, lsn: 0/01642AA8, prev 0/01642A70, desc: CHECKPOINT_ONLINE redo 0/1642A70; tli 1; prev tli 1; fpw true; xid 0:541; oid 24576; multi 1; offset 0; oldest xid 532 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 540; online rmgr: Transaction len (rec/tot): 66/ 66, tx: 540, lsn: 0/01642B20, prev 0/01642AA8, desc: COMMIT 2021-05-21 15:55:46.363728 EEST; inval msgs: catcache 21; sync rmgr: XLOG len (rec/tot): 114/ 114, tx: 0, lsn: 0/01642B68, prev 0/01642B20, desc: CHECKPOINT_SHUTDOWN redo 0/1642B68; tli 1; prev tli 1; fpw true; xid 0:541; oid 24576; multi 1; offset 0; oldest xid 532 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown The compute node had correctly replayed all the WAL up to the last record, and opened up. But when you tried to connect to the new database, the very first requests for the critical relations, like pg_class, were made with request LSN 0/01642990. That's the last record that's applicable to a particular block. Because the database CREATE record didn't bump up the "last written LSN", the getpage requests were made with too old LSN. I fixed this by adding a SetLastWrittenLSN() call to the redo of database CREATE record. It probably wouldn't hurt to also throw in a call at the end of WAL replay, but let's see if we bump into more cases like this first. This doesn't seem to be happening with page server as of 'main'; I was testing with a version where I had temporarily reverted all the recent changes to reconstruct control file, checkpoints, relmapper files etc. from the WAL records in the page server, so that the compute node was redoing all the WAL. I'm pretty sure we need this fix even with 'main', even though this test case wasn't failing there right now.

Some operations in PostgreSQL are not WAL-logged at all (i.e. hint bits) or delay wal-logging till the end of operation (i.e. index build). So if such page is evicted, we will lose the update. To fix it, we introduce PD_WAL_LOGGED bit to track whether the page was wal-logged. If the page is evicted before it has been wal-logged, then zenith smgr creates FPI for it. Authors: Konstantin Knizhnik <knizhnik@garret.ru> anastasia <lubennikovaav@gmail.com>

Add WalProposer background worker to broadcast WAL stream to Zenith WAL acceptors Author: Konstantin Knizhnik <knizhnik@garret.ru>

Ignore unlogged table qualifier. Add respective changes to regression test outputs. Author: Konstantin Knizhnik <knizhnik@garret.ru>

Request relation size via smgr function, not just stat(filepath).

Author: Konstantin Knizhnik <knizhnik@garret.ru>

…mmon error. TODO: add a comment, why this is fine for zenith.

…d of WAL page header, then return it back to the page origin

…of WAL at compute node + Check for presence of replication slot

…t inside. WAL proposer (as bgw without BGWORKER_BACKEND_DATABASE_CONNECTION) previously ignored SetLatch, so once caught up it stuck inside WalProposerPoll infinitely. Futher, WaitEventSetWait didn't have timeout, so we didn't try to reconnect if all connections are dead as well. Fix that. Also move break on latch set to the end of the loop to attempt ReconnectWalKeepers even if latch is constantly set. Per test_race_conditions (Python version now).

…kpoint from WAL + Check for presence of zenith.signal file to allow skip reading checkpoint record from WAL + Pass prev_record_ptr through zenith.signal file to postgres

@knizhnik

This patch aims to make our bespoke WAL redo machinery more robust in the presence of untrusted (in other words, possibly malicious) inputs. Pageserver delegates complex WAL decoding duties to postgres, which means that the latter might fall victim to carefully designed malicious WAL records and start doing harmful things to the system. To prevent this, it has been decided to limit possible interactions with the outside world using the Secure Computing BPF mode. We use this mode to disable all syscalls not in the allowlist. Please refer to src/backend/postmaster/seccomp.c to learn more about the pros & cons of the current approach. + Fix some bugs in seccomp bpf wrapper * Use SCMP_ACT_TRAP instead of SCMP_ACT_KILL_PROCESS to receive signals. * Add a missing variant of select() syscall (thx to @knizhnik). * Write error messages to an fd stderr's currently pointing to.

…ause it cause memory leak in wal-redo-postgres 2. Add check for local relations to make it possible to use DEBUG_COMPARE_LOCAL mode in SMGR + Call smgr_init_standard from smgr_init_zenith

this patch adds support for zenith_tenant variable. it has similar format as zenith_timeline. It is used in callmemaybe query to pass tenant to pageserver and in ServerInfo structure passed to wal acceptor

…recovery. Rust's postgres_backend currently is too dummy to handle it properly: reading happens in separate thread which just ignores CopyDone. Instead, writer thread must get aware of termination and send CommandComplete. Also reading socket must be transferred back to postgres_backend (or connection terminated completely after COPY). Let's do that after more basic safkeeper refactoring and right now cover this up to make tests pass. ref #388

…ion position in wal_proppser to segment boundary

…ugging. Now it contains only one function test_consume_xids() for xid wraparound testing.

- extend zenith pageserver API to handle new request type; - add dbsize_hook to intercept db_dir_size() call.

To force making basebackup again.

This function is to simplify complex WAL generation in neondatabase/neon#1574 `pg_logical_emit_message` is the easiest way to get a big WAL record, but: * If it's transactional, it gets `COMMIT` record right after * If it's not, WAL is not flushed at all. The function helps here, so we don't rely on the background WAL writer. I suspect the plain `xlogflush()` name may collide in the future, hence the prefix.

I'm seeing a lot of these warnings from B-tree SPLIT records: WARNING: inmem_write() called for 1663/12990/16397.0 blk 2630: used_pages 0 CONTEXT: WAL redo at 1/235A1B50 for Btree/SPLIT_R: level 0, firstrightoff 368, newitemoff 408, postingoff 0 That seems OK, replaying a split record legitimately accesses many buffers: the left half, the right half, left sibling, right sibling, and child. We could bump up 'temp_buffers' (currently 4), but I didn't do that beceause it's also good to get some test coverage for the inmem_smgr.c.

At neondatabase/neon#1783 (comment), Kirill saw case where the WAL redo process failed to open /dev/null. That's pretty weird, and I have no idea what might be causing it, but with this patch we'll at least get a little more details if it happens again. This will print the OS error (with %m) if it happens, and also distinguishes between the two error cases that previously both emitted the 'failed to open a test file' error.

- zenith.page_server_connstring -> neon.pageserver_connstring - zenith.zenith_tenant -> neon.tenant_id - zenith.zenith_timeline -> neon.timeline_id - zenith.max_cluster_size -> neon.max_cluster_size

as basebackup LSN always skips over page header

Part of neondatabase/neon#1838

* Do not allocate shared memory for wal_redo process * Add comment

* Add check for NULL for malloc in InternalIpcMemoryCreate * apply pgindent

- Fix typos - Change Zenith -> Neon in the ZENITH_SMGR tag that's printed in error messages that is user-visible, and in various function names and comments that are not user-visible. - pgindent - Remove comment about zm_to_string() leaking memory. It doesn't. - Re-word some error messages to match PostgreSQL error message style guide - Cleanup logging style - Don't print JWT token to log

Maintain cache of last written LSN for each relation segment (8 Mb).

* Add uuid-ossp to the supported extensions Also update compile flags to `-O2` to trade compile time for PostgreSQL performance, and removes --enable-cassert.

* Update last written LSN for gin/gist index metadata * Replace SetLastWrittenLSN with family of SetLastWrittenLSNFFor* functions

hlinnaka

typo in the commit message: "diasming"

To finish this, let's:

Fix the commit message, rebase, and force-push to the undo_vm_change branch
Create a PR in the neon-repository, with these two changes:

remove the corresponding section from docs/core_changes.md, and
bump vendor/postgres to undo_vm_change branch.

Wait for the CI to run on that PR. Assuming it's all green:
Push this PR to the postgres main branch
Push the neon-repo PR to neon main branch

hlinnaka · 2022-07-26T09:06:06Z

But let's finish and push #183 first. We cannot bump vendor/postgres before that.

stepashka · 2022-07-28T09:55:52Z

@hlinnaka , #183 was merged recently, does it mean that this one can be merged too?
are the changes you've mentioned earlier still required?

knizhnik · 2022-09-20T07:15:36Z

Repalced by #213 and #214

lubennikovaav and others added 30 commits July 7, 2022 11:44

lastWrittenPageLSN.patch

eb463f0

Save lastWrittenPageLSN in XLogCtlData to know what pages to request from remote pageserver. Authors: Konstantin Knizhnik <knizhnik@garret.ru> Heikki Linnakangas <heikki.linnakangas@iki.fi>

[walproposer] wal_proposer.patch

736487d

Add WalProposer background worker to broadcast WAL stream to Zenith WAL acceptors Author: Konstantin Knizhnik <knizhnik@garret.ru>

persist_unlogged_tables.patch

1b994c3

Ignore unlogged table qualifier. Add respective changes to regression test outputs. Author: Konstantin Knizhnik <knizhnik@garret.ru>

fix_pg_table_size.patch

8d7a4ea

Request relation size via smgr function, not just stat(filepath).

[walredo] fix_gin_redo.patch

480a745

Author: Konstantin Knizhnik <knizhnik@garret.ru>

[walredo] fix_brin_redo.patch

7f82995

Author: Konstantin Knizhnik <knizhnik@garret.ru>

speculative_records_workaround.patch

a5e82cb

wallog_t_ctid.patch

176878f

vacuumlazy_debug_stub.patch

b67388f

[test] zenith_test_evict.patch

497c6a1

fix_sequence_wallogging.patch

0ad2d1c

Bring back change that got lost in refactoring. silence ReadBuffer_co…

bf5d710

…mmon error. TODO: add a comment, why this is fine for zenith.

[contrib/zenith] [refer #225] if insert WAL position points at the en…

1c6a260

…d of WAL page header, then return it back to the page origin

[walproposer] Create replication slot for walproposer to avoid loose …

45fad00

…of WAL at compute node + Check for presence of replication slot

[walproposer] Skip absent WAL segment removed by pg_resetwal

7589331

[walproposer] Make it possible to start postgres without reading chec…

960cb8c

…kpoint from WAL + Check for presence of zenith.signal file to allow skip reading checkpoint record from WAL + Pass prev_record_ptr through zenith.signal file to postgres

[walproposer] Simplify WL_LATCH_SET testing in the walproposer

4cdee32

[smgr_api] [contrib/zenith] 1. Do not call mdinit from smgrinit() bec…

81735a8

…ause it cause memory leak in wal-redo-postgres 2. Add check for local relations to make it possible to use DEBUG_COMPARE_LOCAL mode in SMGR + Call smgr_init_standard from smgr_init_zenith

[walproposer] [contrib/zenith] support zenith_tenant

96b0212

this patch adds support for zenith_tenant variable. it has similar format as zenith_timeline. It is used in callmemaybe query to pass tenant to pageserver and in ServerInfo structure passed to wal acceptor

[walproposer] [contrib/zenith] [refer #395] Do no align sart replicat…

d84f93e

…ion position in wal_proppser to segment boundary

[test] Add contrib/zenith_test_utils with helpers for testing and deb…

f0aeb64

…ugging. Now it contains only one function test_consume_xids() for xid wraparound testing.

[walproposer] Change condition for triggering recovery

843ec86

lubennikovaav and others added 21 commits July 7, 2022 12:38

Implement pg_database_size():

f7ab7b5

- extend zenith pageserver API to handle new request type; - add dbsize_hook to intercept db_dir_size() call.

Shut down instance on basebackup LSN mismatch.

88cd1c4

To force making basebackup again.

Use compute-tools from the new neondatabase Docker Hub repo

46a03a3

Use compute_ctl instead of zenith_ctl (#162)

17142a1

Rename contrib/zenith to contrib/neon. Rename custom GUCs:

55644db

- zenith.page_server_connstring -> neon.pageserver_connstring - zenith.zenith_tenant -> neon.tenant_id - zenith.zenith_timeline -> neon.timeline_id - zenith.max_cluster_size -> neon.max_cluster_size

Rename 'wal_acceptors' GUC to 'safekeepers'

25e4321

Fix basebackup LSN comparison in walproposer.

ee26890

as basebackup LSN always skips over page header

Allow specifiyng the different compute-tools tag (#170)

493c8d2

Part of neondatabase/neon#1838

Do not allocate shared memory for wal_redo process (#165)

a9714fe

* Do not allocate shared memory for wal_redo process * Add comment

Add check for NULL for malloc in InternalIpcMemoryCreate (#173)

eace46e

* Add check for NULL for malloc in InternalIpcMemoryCreate * apply pgindent

Rename ZenithFeedback (#174)

0019c01

Remove callmemaybe bits from compute (#172)

b41fbe6

Panic on receiving higher term (#176)

4e3ab45

Large last written lsn cache (#177)

595ac69

Maintain cache of last written LSN for each relation segment (8 Mb).

Add uuid-ossp to the supported extensions (#181)

db67809

* Add uuid-ossp to the supported extensions Also update compile flags to `-O2` to trade compile time for PostgreSQL performance, and removes --enable-cassert.

Update last written LSN for gin/gist index metadata (#182)

7517d1c

* Update last written LSN for gin/gist index metadata * Replace SetLastWrittenLSN with family of SetLastWrittenLSNFFor* functions

Undo diasming VM check warning in vacuumlazy.c

ae6d46a

knizhnik requested a review from lubennikovaav July 22, 2022 20:09

hlinnaka approved these changes Jul 26, 2022

View reviewed changes

MMeent force-pushed the main branch from 3f315a1 to a479855 Compare August 19, 2022 17:59

knizhnik changed the title ~~Undo diasming VM check warning in vacuumlazy.c~~ Undo disarming VM check warning in vacuumlazy.c Sep 20, 2022

knizhnik closed this Sep 20, 2022

knizhnik deleted the undo_vm_change branch September 20, 2022 07:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Undo disarming VM check warning in vacuumlazy.c #184

Undo disarming VM check warning in vacuumlazy.c #184

knizhnik commented Jul 22, 2022

hlinnaka left a comment

hlinnaka commented Jul 26, 2022

stepashka commented Jul 28, 2022

knizhnik commented Sep 20, 2022

Undo disarming VM check warning in vacuumlazy.c #184

Undo disarming VM check warning in vacuumlazy.c #184

Conversation

knizhnik commented Jul 22, 2022

hlinnaka left a comment

Choose a reason for hiding this comment

hlinnaka commented Jul 26, 2022

stepashka commented Jul 28, 2022

knizhnik commented Sep 20, 2022