Set last written lsn for created relation #2398

Merged: 4 commits merged into main on Sep 19, 2022

Conversation

@knizhnik (Contributor) commented Sep 6, 2022

No description provided.

@hlinnaka (Contributor) commented Sep 6, 2022

Seems reasonable, but I'm curious if the lack of this was causing some user-visible issue? Can you write a test case?

@knizhnik (Contributor, Author) commented Sep 6, 2022

> Seems reasonable, but I'm curious if the lack of this was causing some user-visible issue? Can you write a test case?

This is what I have been trying to create all day, without any success :(
This change is motivated by the failure of the test_parallel_copy test (#2384), which failed to find the key with the relation size in get_page_at_lsn.
I tried to understand how this is possible (assuming there is no data corruption or race condition in the pageserver). I inspected the contents of the WAL after executing the "CREATE TABLE" command. There is only one record associated with the created relation: SMGR_CREATE. But we do not update the last-written LSN cache, so if we send any request for this relation to the pageserver, we will use an LSN preceding that of the SMGR_CREATE record. The compute node will specify the "latest" flag, but it can happen that the request LSN is still smaller than the LSN of the SMGR_CREATE record.
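For illustration, a minimal sketch of the idea behind the fix (this is not the actual patch; the call site and the exact signature of SetLastWrittenLSNForRelation, which comes from the neon patch to Postgres core, are assumed here):

#include "postgres.h"
#include "access/xlog.h"
#include "catalog/storage.h"
#include "storage/smgr.h"

/*
 * Sketch only: after the XLOG_SMGR_CREATE record for a new relation is
 * emitted, remember its LSN in the last-written LSN cache so that later
 * requests for this relation are not sent to the pageserver with an older LSN.
 */
static void
remember_rel_creation_lsn(SMgrRelation reln, ForkNumber forknum)
{
	XLogRecPtr	lsn;

	log_smgrcreate(&reln->smgr_rnode.node, forknum); /* emits XLOG_SMGR_CREATE */
	lsn = GetXLogInsertRecPtr();                     /* insert position after that record */

	/* assumed signature: (lsn, relfilenode, forknum) */
	SetLastWrittenLSNForRelation(lsn, reln->smgr_rnode.node, forknum);
}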

It is not so difficult to force this scenario: just insert some sleep in ingest_xlog_smgr_create.
What I failed to do is make the compute request a page of this relation from the page server: we have a relation size cache at the compute, so we know that the relation is empty and never try to read any pages from it.

I also wonder why test_parallel_copy should need to read pages of the relation while copying data into it.

@hlinnaka (Contributor) commented Sep 7, 2022

I think I just hit this bug on my PR: https://github.com/neondatabase/neon/actions/runs/3002515510

@knizhnik (Contributor, Author) commented Sep 7, 2022

> I think I just hit this bug on my PR: https://github.com/neondatabase/neon/actions/runs/3002515510

I investigated the log more precisely and am almost sure that the problem is caused by the last-written LSN cache: it is not updated after relation creation (and that is what this PR fixes).
As you can see, the request LSN is 0/016960E8:

asyncpg.exceptions.PostgresIOError: could not read block 3 in rel 1663/12972/16384.0 from page server at lsn 0/016960E8

and that is the LSN of the branch creation:

2022-09-06 21:32:27.756 INFO [neon_fixtures.py:1151] Run success: Created timeline '55a645087623b4cffacbb3336d4d2825' at Lsn 0/16960E8 for tenant: a310952299e6717aa7b9bab68c685eba. Ancestor timeline: 'empty'

So this LSN precedes the moment when the table was created, and that is why the key is not found.

@hlinnaka (Contributor) commented Sep 7, 2022

The question remains, what exactly is the sequence of events here? I would've thought it goes like this:

  1. CREATE TABLE, creates SMGR_CREATE record. Without this PR, it does not update last-written LSN
  2. COPY extends the table from 0 to 1 pages. It calls smgrextend(), which updates the relsize cache, and also updates the last-written LSN for the block range, and for the relation (i.e. for REL_METADATA_PSEUDO_BLOCKNO).

So even if we're missing a SetLastWrittenLSNForRelation() call when the relation is created, the COPY should do it before any GetPage requests on the table are issued. What am I missing?
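In code, the bookkeeping described in step 2 would look roughly like the sketch below (the signatures of the SetLastWrittenLSN* functions are assumed; this is not the actual neon smgr implementation):

#include "postgres.h"
#include "access/xlog.h"
#include "storage/bufpage.h"
#include "storage/smgr.h"

/*
 * Sketch: the last-written LSN bookkeeping that smgrextend() is expected to
 * perform, per the sequence above.  The LSN is taken from the page being
 * written out.
 */
static void
record_extend_lsn(SMgrRelation reln, ForkNumber forknum,
                  BlockNumber blkno, char *buffer)
{
	XLogRecPtr	lsn = PageGetLSN((Page) buffer);

	/* per-block entry, consulted by GetPage requests for this block */
	SetLastWrittenLSNForBlock(lsn, reln->smgr_rnode.node, forknum, blkno);

	/* per-relation entry (REL_METADATA_PSEUDO_BLOCKNO), consulted for relation size */
	SetLastWrittenLSNForRelation(lsn, reln->smgr_rnode.node, forknum);
}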

@hlinnaka (Contributor) commented Sep 7, 2022

Hmm, I think this is what actually happens:

  1. CREATE TABLE, creates SMGR_CREATE record. Without this PR, it does not update last-written LSN
  2. COPY extends the table from 0 to 1 pages. It calls smgrextend(), which calls SetLastWrittenLSNForBlock() and SetLastWrittenLSNForRelation(). However: when the heap is extended, it extends the relation with empty pages, with 0/0 LSN. The SetLastWrittenLSN calls do nothing, when the LSN is invalid.

I think the bulk extension code in RelationAddExtraBlocks() needs to be hit for this to lead to an error. RelationAddExtraBlocks() extends the relation with empty pages without WAL-logging them.
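The crucial detail is that an all-zeros page carries no LSN, so there is nothing to record for it. A tiny self-contained illustration (the early return on an invalid LSN reflects the behaviour described above, not the literal implementation):

#include "postgres.h"
#include "access/xlogdefs.h"
#include "storage/bufpage.h"

/*
 * RelationAddExtraBlocks() extends the heap with all-zero pages that are not
 * WAL-logged, so PageGetLSN() on such a page is InvalidXLogRecPtr (0/0).  If
 * the SetLastWrittenLSN* functions skip invalid LSNs, these extensions leave
 * the last-written LSN cache untouched.
 */
static void
illustrate_zero_page_lsn(void)
{
	PGAlignedBlock zbuf;
	XLogRecPtr	lsn;

	memset(zbuf.data, 0, BLCKSZ);      /* a freshly extended, not yet WAL-logged page */
	lsn = PageGetLSN((Page) zbuf.data);

	Assert(XLogRecPtrIsInvalid(lsn));  /* pd_lsn of a zeroed page is 0/0 */

	if (XLogRecPtrIsInvalid(lsn))
		return;                        /* assumed: the cache update is simply skipped */

	/* SetLastWrittenLSNForBlock(lsn, ...) would only run for a valid LSN */
}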

@knizhnik (Contributor, Author) commented Sep 7, 2022

> So even if we're missing a SetLastWrittenLSNForRelation() call when the relation is created, the COPY should do it before any GetPage requests on the table are issued. What am I missing?

This is what I do not understand myself :(
As we can see, the pageserver failed in the get_page_at_lsn call, which means that one of the parallel copy tasks tried to load some page of this relation. But that means the relation must have been considered non-empty, i.e. it had been extended.
And if it was extended, smgr_extend was called, so SetLastWrittenLsn should have been called for this relation, and the fact that it was not called when the relation was created should not matter any more.

@knizhnik (Contributor, Author) commented Sep 7, 2022

I just realized that smgr_extend can indeed be called and in turn call SetLastWrittenLSNForRelation.
But if there is no LSN stored in the page image (PageGetLSN), then SetLastWrittenLSNForRelation does nothing and does not update the last-written LSN for this relation.
And calling ReadBuffer* with P_NEW actually initializes the page with zeros.

So there is no magic here, but still none of my attempts to reproduce the bug by inserting delays has succeeded.

@knizhnik (Contributor, Author) commented Sep 7, 2022

The strangest thing here is that this PR sets the last-written LSN for the relation metadata (i.e. REL_METADATA_PSEUDO_BLOCKNO). But according to the log, the failure happens in GetPage, so the last-written LSN for the corresponding chunk should be used instead. It is unclear how setting the LSN for the relation metadata could affect it. But it is a fact that there are no CI failures in this branch, although I have restarted the tests more than ten times.
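For readers following the puzzle: the two request types consult different entries of the last-written LSN cache, roughly as in the sketch below (the GetLastWrittenLSN name, its signature, and the use of REL_METADATA_PSEUDO_BLOCKNO are assumptions based on this discussion, not verified against the sources):

#include "postgres.h"
#include "access/xlog.h"
#include "storage/block.h"
#include "storage/relfilenode.h"

/*
 * Sketch only: a GetPage request would use the per-block (per-chunk) entry,
 * while a relation-size request would use the per-relation pseudo-block entry
 * that this PR updates at relation creation.
 */
static XLogRecPtr
request_lsn_for_page(RelFileNode rnode, ForkNumber forknum, BlockNumber blkno)
{
	return GetLastWrittenLSN(rnode, forknum, blkno);   /* per-chunk entry */
}

static XLogRecPtr
request_lsn_for_relsize(RelFileNode rnode, ForkNumber forknum)
{
	/* per-relation entry; REL_METADATA_PSEUDO_BLOCKNO is assumed here */
	return GetLastWrittenLSN(rnode, forknum, REL_METADATA_PSEUDO_BLOCKNO);
}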

@knizhnik (Contributor, Author) commented Sep 9, 2022

Three pieces of news:

  1. (good) I managed to reliably reproduce this error.
  2. (bad) My patches in this PR do not actually prevent it.
  3. It is hard to write a test for it.

So, as @hlinnaka expected, the problem is related to bulk relation extension. In this case a batch of new pages is allocated using smgr_extend with a zeroed buffer, so these pages are not WAL-logged and the last-written LSN is not updated for them. If such a page is swapped out and then accessed before the SMGR_CREATE record has been replayed by the pageserver, and some other page from this chunk is updated, then we get this "key not found" error, because we try to retrieve a page of a relation which does not yet exist at the pageserver. But reproducing all these conditions is very non-trivial: I have spent a couple of days trying to simulate a situation which rarely happens on CI and never in local runs.

So here is what I had to do:

  1. Insert a large sleep before self.put_rel_creation in walingest. This simulates a lagging pageserver that has not yet replayed this record when a page of this relation is accessed:
	std::thread::sleep(std::time::Duration::from_millis(30_000));
        self.put_rel_creation(modification, rel)?;
  2. Bulk relation extension (RelationAddExtraBlocks) is used only when there are multiple waiters, i.e. several backends trying to extend the relation. Comment out this logic and let RelationAddExtraBlocks always be used:
	/*
	 * If we need the lock but are not able to acquire it immediately, we'll
	 * consider extending the relation by multiple blocks at a time to manage
	 * contention on the relation extension lock.  However, this only makes
	 * sense if we're using the FSM; otherwise, there's no point.
	 */
	if (needLock)
	{
		#if 0
		if (!use_fsm)
			LockRelationForExtension(relation, ExclusiveLock);
		else if (!ConditionalLockRelationForExtension(relation, ExclusiveLock))
		#endif

and

	/* Use the length of the lock wait queue to judge how much to extend. */
	#if 0
	lockWaiters = RelationExtensionLockWaiterCount(relation);
	if (lockWaiters <= 0)
		return;
	#endif
  3. Then set a breakpoint in GDB at the end of RelationAddExtraBlocks and let it extend the relation and mark the pages as free in the free space map, but not fill them.

  4. Create a relation and initiate a copy into it:

create table t1(x integer);
copy t1 from '/tmp/t.csv';

The backend is now stopped at the breakpoint.

  5. In another session, initiate another copy and get the "key not found" error:
copy t1 from '/tmp/t.csv';
ERROR:  could not read block 0 in rel 1663/13010/16384.0 from page server at lsn 0/0169A3F8
DETAIL:  page server returned error: could not find data for key 000000067F000032D20000400000FFFFFFFF at LSN 0/169A380, for request at LSN 0/169A488
CONTEXT:  COPY t1, line 1000

@knizhnik (Contributor, Author) commented Sep 9, 2022

I have not checked, but it looks like the problem is not caused by my last-written LSN cache; it just increases the probability of such an error.

@knizhnik (Contributor, Author) commented

I wonder whether I should continue my attempts to create a test reproducing the problem. It looks like we need something like a failpoints mechanism, but for C (for Postgres code); a rough sketch of that idea follows at the end of this comment. There are actually two mechanisms used to reproduce race-condition bugs:

  1. Specify a particular schedule using sleeps or some other mechanism.
  2. Repeat the test many times, hoping that any synchronization bug will sooner or later show up by itself.

In this particular case approach 2 does not work well, because the problem can only happen at the very beginning of work with a relation. So we need multiple backends trying to append data to an empty (just created) relation, i.e. we need to organize concurrent access to the relation and contention from the very beginning. That seems difficult to achieve on a normal desktop or laptop, which is why the problem never reproduced locally. I am not sure what is more critical for reproducing it: a larger number of cores, or some background activity which delays backends and acts as a kind of barrier.

In any case, the problem seems to be clear and this PR fixes it (I hope: the problem does not reproduce with my manual scenario).
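For illustration, here is a rough sketch of what a minimal C-level failpoint of the kind mentioned above could look like. Everything in it (the macro name, the environment-variable convention, the fixed 30-second sleep) is hypothetical; nothing like this exists in the Postgres or neon sources:

#include "postgres.h"
#include <unistd.h>

/*
 * Hypothetical illustration only: a "failpoint" that sleeps when an
 * environment variable names it, so a test can widen a race window at a
 * chosen point in C code without recompiling with hard-coded sleeps.
 */
#define TEST_FAILPOINT(name) \
	do { \
		const char *fp_ = getenv("NEON_TEST_FAILPOINT"); \
		if (fp_ != NULL && strcmp(fp_, (name)) == 0) \
			sleep(30); /* plain POSIX sleep; pg_usleep() would be the Postgres idiom */ \
	} while (0)

/*
 * Usage (hypothetical): place it just before the interesting call, e.g.
 *
 *     TEST_FAILPOINT("before-smgr-create");
 *     log_smgrcreate(&rnode, MAIN_FORKNUM);
 */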

@hlinnaka (Contributor) left a comment

Ok, let's get this in.

Careful with vendor/postgres-v15! I think this PR is about to make the same mistake as commit f44afba, and change vendor/postgres-v15 to actually be v14 again.

@knizhnik (Contributor, Author) commented Sep 15, 2022

> Ok, let's get this in.
>
> Careful with vendor/postgres-v15! I think this PR is about to make the same mistake as commit f44afba, and change vendor/postgres-v15 to actually be v14 again.

I created two PRs for the core part of this patch:
neondatabase/postgres#209
neondatabase/postgres#211
They require approval before they can be committed.
Once they are committed, I will update the submodule references in the neon repository.

@hlinnaka (Contributor) commented

>> Ok, let's get this in.
>> Careful with vendor/postgres-v15! I think this PR is about to make the same mistake as commit f44afba, and change vendor/postgres-v15 to actually be v14 again.
>
> I created two PRs for the core part of this patch: neondatabase/postgres#209 neondatabase/postgres#211. They require approval before they can be committed. Once they are committed, I will update the submodule references in the neon repository.

Approved neondatabase/postgres#209 and opened a new PR for the v15 changes at neondatabase/postgres#212, with REL_15_STABLE_neon as the base.

@knizhnik (Contributor, Author) commented

The Postgres core part of this PR is merged; the neon part is still waiting for review. Not ACID :)

@knizhnik merged commit 846d126 into main on Sep 19, 2022
@knizhnik deleted the create_rel_lsn branch on September 19, 2022 at 09:56