
clear_data can delete the same transaction twice, resulting in debug_assert #2906

Closed
SkidanovAlex opened this issue Jun 26, 2020 · 1 comment · Fixed by #2924
SkidanovAlex commented Jun 26, 2020

I can only reproduce it under stress, but we can probably figure out why it happens just by inspecting the logic, without a lightweight repro.

The observed behavior:

A debug assert with the message "Transaction overwrites itself" (it fires inside StoreUpdate::commit); the trace shows the following line twice (along with multiple other similar lines, each occurring twice):

- ColTransactions 5rrivziGwNuAsKp5WqE1Z7Eh5oCPKZ6YtYL6sfZZz8Xw

This indicates that the same transaction was cleaned up twice.
The stacktrace shows that the commit is invoked from within clear_data.
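For illustration, here is a minimal sketch of the kind of duplicate-write check that trips here: a batched update that records (column, key) operations and debug-asserts if the same key is scheduled twice before commit. The Column and UpdateBatch types below are hypothetical stand-ins, not the actual nearcore StoreUpdate API.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Column {
    Transactions,
    // ... other columns elided
}

/// Hypothetical batched update, loosely mirroring the idea behind StoreUpdate.
struct UpdateBatch {
    ops: Vec<(Column, Vec<u8>)>, // keys scheduled for deletion
}

impl UpdateBatch {
    fn new() -> Self {
        UpdateBatch { ops: Vec::new() }
    }

    fn delete(&mut self, col: Column, key: Vec<u8>) {
        self.ops.push((col, key));
    }

    fn commit(self) {
        // The check corresponding to "Transaction overwrites itself":
        // the same (column, key) pair must not appear twice in one batch.
        let mut seen = HashSet::new();
        for (col, key) in &self.ops {
            debug_assert!(
                seen.insert((*col, key.clone())),
                "Transaction overwrites itself: {:?} {:?}",
                col,
                key
            );
        }
        // ... apply the ops to the underlying database here
    }
}

fn main() {
    let mut batch = UpdateBatch::new();
    let tx_hash = b"5rrivziGwNuAsKp5WqE1Z7Eh5oCPKZ6YtYL6sfZZz8Xw".to_vec();
    // clear_data scheduling the same transaction for deletion twice
    // (e.g. once per chunk that included it) trips the debug_assert.
    batch.delete(Column::Transactions, tx_hash.clone());
    batch.delete(Column::Transactions, tx_hash);
    batch.commit(); // panics in debug builds
}
```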

If you do want to reproduce it, check out the stress_fixed branch and run

python tests/stress/stress.py 3 3 3 0 staking transactions node_restart

It reproduces relatively consistently.

SkidanovAlex added the A-chain (Area: Chain, client & related) label on Jun 26, 2020
ilblackdragon added the C-bug (Category: This is a bug) label on Jun 29, 2020

Kouprin commented Jun 30, 2020

Since the same tx may be included in several chunks, it's possible for the tx to be deleted as many times as the number of forks we currently have.
The solution is to store a refcount on tx usage.
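As an illustration of that direction, the sketch below keeps a per-transaction reference count and only performs the actual deletion when the count drops to zero. The GcStore type and its methods are hypothetical, not the actual nearcore refcounting implementation.

```rust
use std::collections::HashMap;

type TxHash = [u8; 32];

/// Hypothetical refcounted view over transaction storage.
struct GcStore {
    tx_refcount: HashMap<TxHash, u32>,
}

impl GcStore {
    /// Called whenever a chunk that includes `tx` is stored.
    fn on_tx_included(&mut self, tx: TxHash) {
        *self.tx_refcount.entry(tx).or_insert(0) += 1;
    }

    /// Called from garbage collection for every chunk being cleared.
    /// Returns true only when the last reference is gone and the
    /// transaction should actually be deleted from ColTransactions.
    fn on_tx_cleared(&mut self, tx: TxHash) -> bool {
        match self.tx_refcount.get_mut(&tx) {
            Some(rc) if *rc > 1 => {
                *rc -= 1;
                false
            }
            Some(_) => {
                self.tx_refcount.remove(&tx);
                true // delete the underlying row exactly once
            }
            None => false, // already deleted; never schedule a second delete
        }
    }
}

fn main() {
    let mut store = GcStore { tx_refcount: HashMap::new() };
    let tx = [0u8; 32];
    // The same tx included in two chunks on different forks.
    store.on_tx_included(tx);
    store.on_tx_included(tx);
    assert!(!store.on_tx_cleared(tx)); // first fork cleared: keep the tx
    assert!(store.on_tx_cleared(tx)); // second fork cleared: delete it now
}
```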

SkidanovAlex added a commit that referenced this issue Jul 8, 2020
After this change stress.py passes consistently, and is reintroduced to nightly.

Nearcore fixes:

[v] We had a bug in the syncing logic (with a low chance of being
triggered in the wild): if a block is produced, and between 1/3 and 2/3
of block producers received it while the rest have not, the system
stalls, because no 2/3 of block producers have the same head, but also
nobody is two blocks behind the highest peer, which is what normally
starts syncing. Fixed by forcing sync if we've been 1 block behind for
too long (sketched below). stress.py was reproducing this issue in every run.

Test fixes

[v] Fixing a scenario in which a failure to send a transaction to all
validators resulted in recording an incorrect tx hash alongside the tx.
Later, checking balances using the incorrect hash yielded an incorrect
success value, and thus applied incorrect corrections to the expected
balances;

[ ] Removing the old infrastructure for network interference, which relied on
certain node setup, and instead using the new network proxy infrastructure.

[ ] Adding a new argument that controls the percentage of dropped messages between nodes.

[v] Changing the order of magnitude of staking transactions, so that the
validator set actually changes.

[ ] Altering the `node_restart` process to occasionally wipe out the data folder of
the node, so that we stress state sync (and syncing in general) more.

Other issues discovered while fixing stress.py:
- #2906
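To make the syncing fix above more concrete, here is a minimal sketch of the "force sync if we've been one block behind for too long" check. The SyncStatus type, should_force_sync, and the timeout value are hypothetical illustrations, not the actual nearcore sync code.

```rust
use std::time::{Duration, Instant};

/// Hypothetical tracker for how long we have been exactly one block
/// behind the best-known peer head.
struct SyncStatus {
    one_block_behind_since: Option<Instant>,
    force_sync_after: Duration,
}

impl SyncStatus {
    fn new(force_sync_after: Duration) -> Self {
        SyncStatus { one_block_behind_since: None, force_sync_after }
    }

    /// Returns true when sync should be forced even though we are only one
    /// block behind (normally sync starts only when two or more blocks behind).
    fn should_force_sync(&mut self, our_height: u64, best_peer_height: u64, now: Instant) -> bool {
        if best_peer_height == our_height + 1 {
            // Start (or keep) the timer while we stay exactly one block behind.
            let since = *self.one_block_behind_since.get_or_insert(now);
            now.duration_since(since) >= self.force_sync_after
        } else {
            // Either caught up, or far enough behind that regular sync kicks in.
            self.one_block_behind_since = None;
            false
        }
    }
}

fn main() {
    let mut status = SyncStatus::new(Duration::from_secs(5));
    let t0 = Instant::now();
    assert!(!status.should_force_sync(10, 11, t0));
    assert!(status.should_force_sync(10, 11, t0 + Duration::from_secs(6)));
}
```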
SkidanovAlex added a commit that referenced this issue Jul 8, 2020
SkidanovAlex added a commit that referenced this issue Jul 24, 2020
SkidanovAlex added a commit that referenced this issue Jul 30, 2020
nearprotocol-bulldozer bot pushed a commit that referenced this issue Jul 31, 2020
…#3036)

After this change stress.py node_restart passes relatively consistently, and is reintroduced to nightly.

Nearcore fixes:

- We had a bug in the syncing logic (with a low chance of being
triggered in the wild): if a block is produced, and between 1/3 and 2/3
of block producers received it while the rest have not, the system
stalls, because no 2/3 of block producers have the same head, but also
nobody is two blocks behind the highest peer, which is what normally
starts syncing. Fixed by forcing sync if we've been 1 block behind for
too long. stress.py was reproducing this issue in every run.

- (#2916) We had an issue where, if a node produced a chunk and then
crashed, on recovery it was not able to serve the chunk because it didn't have
all the parts and receipts stored in the storage from which we recover
cache entries in the shards manager. Fixed by always storing all the
parts and receipts (redundantly) for chunks in the shards we care about (sketched below).

Test fixes

[v] Fixing a scenario in which a failure to send a transaction to all
validators resulted in recording an incorrect tx hash alongside the tx.
Later, checking balances using the incorrect hash yielded an incorrect
success value, and thus applied incorrect corrections to the expected
balances;

[v] Changing the order of magnitude of staking transactions, so that the
validator set actually changes.

Other issues discovered while fixing stress.py:
- #2906
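As a rough illustration of the #2916 fix above, the sketch below persists every part and receipt of a chunk in a shard we track at the time the chunk is produced or received, so that a restarted node can rebuild its shards-manager cache from storage and still serve the chunk. All type and method names here are hypothetical, not the actual ShardsManager API.

```rust
use std::collections::HashMap;

type ChunkHash = [u8; 32];

/// Hypothetical persistent storage for chunk parts and receipts.
#[derive(Default)]
struct ChunkPartsStore {
    parts: HashMap<ChunkHash, Vec<Vec<u8>>>,
    receipts: HashMap<ChunkHash, Vec<Vec<u8>>>,
}

impl ChunkPartsStore {
    /// Store *all* parts and receipts for a chunk in a shard we care about,
    /// even redundantly, so a crash cannot leave us unable to serve the
    /// chunk after restart.
    fn persist_chunk(&mut self, chunk: ChunkHash, parts: Vec<Vec<u8>>, receipts: Vec<Vec<u8>>) {
        self.parts.insert(chunk, parts);
        self.receipts.insert(chunk, receipts);
    }

    /// On startup, rebuild the in-memory shards-manager cache from storage.
    fn recover_cache(&self) -> HashMap<ChunkHash, (Vec<Vec<u8>>, Vec<Vec<u8>>)> {
        self.parts
            .iter()
            .map(|(hash, parts)| {
                let receipts = self.receipts.get(hash).cloned().unwrap_or_default();
                (*hash, (parts.clone(), receipts))
            })
            .collect()
    }
}

fn main() {
    let mut store = ChunkPartsStore::default();
    let chunk = [1u8; 32];
    store.persist_chunk(chunk, vec![vec![0xAA], vec![0xBB]], vec![vec![0xCC]]);
    // Simulated restart: the cache is rebuilt from what was persisted,
    // so the node can still serve all parts of the chunk it produced.
    let cache = store.recover_cache();
    assert_eq!(cache[&chunk].0.len(), 2);
}
```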
bowenwang1996 pushed a commit that referenced this issue Aug 14, 2020