Best practice to handle "no old tuple data for UPDATE in table"? #283

Open
qianl15 opened this issue May 23, 2024 · 11 comments

qianl15 commented May 23, 2024

Hello, I've been using wal2json to export changes from a Postgres instance. I use pg_logical_slot_peek_changes to obtain changes and then pg_replication_slot_advance to advance the slot separately. However, very occasionally I would get a "no old tuple data for UPDATE in table" warning message emitted from wal2json, and the update record would have an empty identity field.

I found the source code that emits this message, but I wonder why it could happen. If this happens, should I retry pg_logical_slot_peek_changes? What is the best practice for handling this situation?

I also couldn't find a way to reproduce this issue. This happens randomly once every few days on a database with low load.

Any help would be appreciated. Thanks!

eulerto (Owner) commented May 23, 2024

@qianl15 there are a few cases related to this message. Read about REPLICA IDENTITY. Postgres writes information (the old tuple) to the WAL that identifies the row it is updating or deleting.

This message generally means:

a. REPLICA IDENTITY DEFAULT: your table does not contain a primary key. That's the common case.
b. REPLICA IDENTITY NOTHING: it does not store the old row. Your setup is wrong and you should change it to DEFAULT (if the table has a primary key).

If it is case (a) and you don't want to add a primary key, you have two options (see the sketch after this list):

  • REPLICA IDENTITY FULL: it stores all columns as the old tuple, which can increase the amount of WAL written.
  • REPLICA IDENTITY USING INDEX: it stores the columns from a unique index as the old tuple.
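
A minimal sketch of how to inspect and change a table's replica identity (mytable and mytable_name_key are placeholder names):

-- Check the current setting: 'd' = DEFAULT, 'n' = NOTHING, 'f' = FULL, 'i' = USING INDEX
SELECT relname, relreplident FROM pg_class WHERE relname = 'mytable';

-- Store all columns as the old tuple
ALTER TABLE mytable REPLICA IDENTITY FULL;

-- Or store the columns of a unique index (the index must be unique, not partial,
-- not deferrable, and contain only NOT NULL columns)
ALTER TABLE mytable REPLICA IDENTITY USING INDEX mytable_name_key;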

qianl15 (Author) commented May 23, 2024

Thanks @eulerto for your quick response! Just to clarify: my table contains a primary key and uses REPLICA IDENTITY DEFAULT. In my test, I insert 10,000 rows into that table, peek changes, export changes to another database, and finally advance the replication slot. I repeat this test process many times a day, but this empty-identity issue only happens once every few days. More precisely, occasionally only one out of thousands of update records has an empty identity field.

The table looks like this:

create table hello (
 name  text,
 greet_count  integer,
 primary key (name)
)

My workload simply updates the greet_count of the same row 10,000 times.

The peek slot command I used:

select lsn::text, xid::text, data from pg_logical_slot_peek_changes('myslot', NULL, 5000, 'format-version', '2', 'include-types', 'false', 'filter-tables', '*.knex_migrations, *.knex_migrations_lock', 'include-transaction', 'true', 'include-lsn', 'true')

where I limit each peek to at most 5000 changes.

Environment: RDS Postgres 16.1

eulerto (Owner) commented May 23, 2024

What's your Postgres version? Are you using the latest wal2json version? What's the SQL command to advance the replication slot?

Did you observe any pattern in the missing old tuples? Could you share a test case with which we can reproduce the issue? (Even if it takes several hours to reproduce.)

qianl15 (Author) commented May 23, 2024

Postgres version: PostgreSQL 16.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit
wal2json version: 2.5 (pre-installed by RDS)

Advance slot command: SELECT * FROM pg_replication_slot_advance(myslot, <latest_nextlsn>)
I iterate through the records I got from pg_logical_slot_peek_changes and obtain latest_nextlsn from the "nextlsn" field in the last "C" record of the batch.

I didn't see any particular pattern; it appears to happen rarely, at random intervals. I will try to reproduce it on my end as well. The actual test depends on a larger internal system on our side, but here's a simplified description of it.

-- Create table
create table hello (name  text, greet_count  integer, primary key (name));

-- Create replication slot
select * from pg_create_logical_replication_slot('myslot', 'wal2json');

-- Repeat this command many times
INSERT INTO hello (name, greet_count) VALUES ('test', 1) ON CONFLICT (name) DO UPDATE SET greet_count = hello.greet_count + 1 RETURNING greet_count;

-- The following steps advance the replication slot
-- 1. Peek changes in batches (a SQL sketch of steps 2 and 3 follows after step 4)

select lsn::text, xid::text, data from pg_logical_slot_peek_changes('myslot', NULL, 5000, 'format-version', '2', 'include-types', 'false', 'filter-tables', '*.knex_migrations, *.knex_migrations_lock', 'include-transaction', 'true', 'include-lsn', 'true')

-- 2. Iterate through records from the above command and get the nextlsn from the last commit record

-- 3. Advance slot with the latest nextlsn
SELECT * FROM pg_replication_slot_advance(myslot, <latest_nextlsn>)

-- 4. Repeat steps 1~3 until no more records are read
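
For illustration, steps 2 and 3 could also be expressed directly in SQL (a rough sketch; same slot name and options as above, and <latest_nextlsn> remains a placeholder):

-- Extract the nextlsn of the last commit ("C") record in the peeked batch
SELECT data::jsonb ->> 'nextlsn' AS latest_nextlsn
FROM pg_logical_slot_peek_changes('myslot', NULL, 5000, 'format-version', '2', 'include-types', 'false', 'include-transaction', 'true', 'include-lsn', 'true')
WHERE data::jsonb ->> 'action' = 'C'
ORDER BY lsn DESC
LIMIT 1;

-- Advance the slot to that LSN
SELECT * FROM pg_replication_slot_advance('myslot', <latest_nextlsn>);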

qianl15 (Author) commented May 27, 2024

This has only occurred once since I first reported it, even though the test runs continuously every hour. Another observation: if I run pg_logical_slot_peek_changes again, the data becomes normal. It looks like a very rare and transient issue.

Are there any other traces I can provide to help figure out the issue?

eulerto (Owner) commented Jun 4, 2024

I need a test case. The WAL records might be useful: use pg_walinspect to extract the affected transaction. Are you using pg_logical_emit_message?
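
For example (a sketch, assuming PostgreSQL 15 or later where pg_walinspect is available; the LSN range and xid are placeholders bracketing the affected transaction):

CREATE EXTENSION IF NOT EXISTS pg_walinspect;

-- List the WAL records in the range and keep only the affected transaction
SELECT * FROM pg_get_wal_records_info('<start_lsn>', '<end_lsn>') WHERE xid = <affected_xid>;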

qianl15 (Author) commented Jun 23, 2024

Hello! I will get back to you in a bit regarding the pg_walinspect info.

In this particular case, no pg_logical_emit_message was used. Is there anything I need to pay attention to regarding pg_logical_emit_message?

qianl15 (Author) commented Jun 25, 2024

Hi @eulerto, here's the pg_walinspect output for the affected transaction and the transactions surrounding it. The xid was 34826270. All transactions executed the same set of statements (the INSERT ... ON CONFLICT DO UPDATE command I provided above, plus a command to insert into another append-only table).

  start_lsn   |   end_lsn    |   prev_lsn   |   xid    | resource_manager | record_type | record_length | main_data_length | fpi_length |                                            description                                            |                                                             block_ref
--------------+--------------+--------------+----------+------------------+-------------+---------------+------------------+------------+---------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------
 160/2130D1B8 | 160/2130D280 | 160/2130D188 | 34826269 | Heap             | INSERT      |           194 |                3 |          0 | off: 45, flags: 0x08                                                                              | blkref #0: rel 1663/16562/16564 fork main blk 18812
 160/2130D280 | 160/2130DBA0 | 160/2130D1B8 | 34826269 | Btree            | INSERT_LEAF |          2331 |                2 |       2276 | off: 59                                                                                           | blkref #0: rel 1663/16562/16569 fork main blk 10073 (FPW); hole: offset: 332, length: 3540, compression saved: 2376, method: zstd
 160/2130DBA0 | 160/2130DBE8 | 160/2130D280 | 34826269 | Transaction      | COMMIT      |            65 |               36 |          0 | 2024-06-24 21:03:59.298358+00; origin: node 1, lsn 160/2130D1B8, at 2024-06-24 21:03:59.296157+00 |
 160/2130DBE8 | 160/2130DC20 | 160/2130DBA0 | 34826270 | Heap             | LOCK        |            54 |                8 |          0 | xmax: 34826270, off: 49, infobits: [LOCK_ONLY, EXCL_LOCK], flags: 0x00                            | blkref #0: rel 1663/16424/16439 fork main blk 1
 160/2130DC20 | 160/2130DC70 | 160/2130DBE8 | 34826270 | Heap             | HOT_UPDATE  |            78 |               14 |          0 | old_xmax: 34826270, old_off: 49, old_infobits: [], flags: 0x10, new_xmax: 34826270, new_off: 50   | blkref #0: rel 1663/16424/16439 fork main blk 1
 160/2130DC70 | 160/2130DD30 | 160/2130DC20 | 34826270 | Heap             | INSERT      |           191 |                3 |          0 | off: 15, flags: 0x08                                                                              | blkref #0: rel 1663/16424/16453 fork main blk 18814
 160/2130DD30 | 160/2130E7D0 | 160/2130DC70 | 34826270 | Btree            | INSERT_LEAF |          2692 |                2 |       2637 | off: 68                                                                                           | blkref #0: rel 1663/16424/16459 fork main blk 5596 (FPW); hole: offset: 380, length: 2820, compression saved: 2735, method: zstd
 160/2130E7D0 | 160/2130E810 | 160/2130DD30 | 34826270 | Btree            | INSERT_LEAF |            64 |                2 |          0 | off: 168                                                                                          | blkref #0: rel 1663/16424/16461 fork main blk 436
 160/2130E810 | 160/2130E840 | 160/2130E7D0 | 34826270 | Transaction      | COMMIT      |            46 |               20 |          0 | 2024-06-24 21:03:59.307775+00                                                                     |
 160/2130E840 | 160/2130E908 | 160/2130E810 | 34826271 | Heap             | INSERT      |           194 |                3 |          0 | off: 46, flags: 0x08                                                                              | blkref #0: rel 1663/16562/16564 fork main blk 18812
 160/2130E908 | 160/2130F390 | 160/2130E840 | 34826271 | Btree            | INSERT_LEAF |          2690 |                2 |       2635 | off: 68                                                                                           | blkref #0: rel 1663/16562/16569 fork main blk 5596 (FPW); hole: offset: 380, length: 2820, compression saved: 2737, method: zstd
 160/2130F390 | 160/2130F3D8 | 160/2130E908 | 34826271 | Transaction      | COMMIT      |            65 |               36 |          0 | 2024-06-24 21:03:59.310571+00; origin: node 1, lsn 160/2130E840, at 2024-06-24 21:03:59.307775+00 |
(12 rows)

The Postgres log:

DETAIL: Streaming transactions committing after 160/2130D1B8, reading WAL from 160/20430548.
STATEMENT: select lsn::text, xid::text, data from pg_logical_slot_peek_changes($1, NULL, $2, 'format-version', '2', 'include-types', 'false', 'filter-tables', 'dbos.*, *.knex_migrations, *.knex_migrations_lock', 'include-transaction', 'true', 'add-msg-prefixes', 'dbosprov', 'include-lsn', 'true')
LOG: logical decoding found consistent point at 160/20430548
DETAIL: There are no running transactions.
STATEMENT: select lsn::text, xid::text, data from pg_logical_slot_peek_changes($1, NULL, $2, 'format-version', '2', 'include-types', 'false', 'filter-tables', 'dbos.*, *.knex_migrations, *.knex_migrations_lock', 'include-transaction', 'true', 'add-msg-prefixes', 'dbosprov', 'include-lsn', 'true')
WARNING: no old tuple data for UPDATE in table "public"."dbos_hello"
CONTEXT: slot "hello_dbos_prov", output plugin "wal2json", in the change callback, associated LSN 160/2130DC20

But when I retried pg_logical_slot_peek_changes, I got no warning messages and the old tuple identity was there. It looked like a transient issue.

eulerto (Owner) commented Jun 25, 2024

The only explanation for the OIDs in the Relation being invalid in the following code is an invalidation.

wal2json/wal2json.c

Lines 2453 to 2462 in 75629c2

    /*
     * Before v10, there is not rd_pkindex then rely on REPLICA
     * IDENTITY DEFAULT to obtain primary key.
     */
#if PG_VERSION_NUM >= 100000
    if (OidIsValid(relation->rd_pkindex) || OidIsValid(relation->rd_replidindex))
#else
    if (OidIsValid(relation->rd_replidindex))
#endif
    {

The Relation seems to be invalidated during this code path and then OidIsValid returns false. It's a race condition. :(

qianl15 (Author) commented Jun 25, 2024

What would cause a Relation invalidation? (Sorry, I'm not familiar with PG internals.)

And would retrying pg_logical_slot_peek_changes be a workaround in this case?

eulerto (Owner) commented Jun 26, 2024

See RelationCacheInvalidate. You should also check inval.c. Logical replication processes invalidation messages when it calls AcceptInvalidationMessages. This function removes catalog cache and relation cache entries that are no longer useful, and rebuilds the entries that are still useful.

The only workaround is to retry pg_logical_slot_peek_changes. (I'm investigating whether this code can be changed to avoid this issue.)
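
For illustration only (not part of wal2json): before advancing the slot, a client could check whether any UPDATE record in the format-version 2 output is missing its "identity" array, and peek again if so. A rough sketch, with the slot name and batch size as placeholders:

SELECT count(*) AS suspect_updates
FROM pg_logical_slot_peek_changes('myslot', NULL, 5000, 'format-version', '2', 'include-lsn', 'true')
WHERE data::jsonb ->> 'action' = 'U'
  AND COALESCE(jsonb_array_length(data::jsonb -> 'identity'), 0) = 0;

-- If suspect_updates > 0, do not advance the slot for this batch; peek again instead.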
