clear_data can delete the same transaction twice, resulting in debug_assert #2906
As the same tx may be included in several chunks, it's possible to delete the tx as many times as the number of forks we currently have.
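Below is a minimal Rust sketch of the de-duplication this comment points at; the function name and the fork representation are illustrative assumptions, not nearcore's actual `clear_data` API. The idea is to collect tx hashes from chunks on every fork into a set first, so each key is queued for deletion at most once.

```rust
use std::collections::HashSet;

/// Illustrative sketch, not nearcore's actual clear_data: gather the tx
/// hashes from chunks on every fork into a set, so each key is scheduled
/// for deletion at most once even if the same tx appears on several forks.
fn collect_txs_to_delete(forks: &[Vec<&str>]) -> HashSet<String> {
    let mut to_delete = HashSet::new();
    for chunk_txs in forks {
        for tx_hash in chunk_txs {
            // HashSet::insert is a no-op for duplicates, so a tx included in
            // chunks on multiple forks is only queued once.
            to_delete.insert((*tx_hash).to_string());
        }
    }
    to_delete
}

fn main() {
    // The same tx hash shows up on two forks; it ends up in the set once.
    let forks = vec![
        vec!["5rrivziGwNuAsKp5WqE1Z7Eh5oCPKZ6YtYL6sfZZz8Xw", "tx_a"],
        vec!["5rrivziGwNuAsKp5WqE1Z7Eh5oCPKZ6YtYL6sfZZz8Xw", "tx_b"],
    ];
    assert_eq!(collect_txs_to_delete(&forks).len(), 3);
}
```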
SkidanovAlex added a commit that referenced this issue on Jul 8, 2020:
After this change stress.py passes consistently, and is reintroduced to nightly.

Nearcore fixes:

[v] We had a bug in the syncing logic (with a low chance of being triggered in the wild): if a block is produced, and between 1/3 and 2/3 of block producers received it, and the rest have not, the system stalls, because no 2/3 of block producers have the same head, but also nobody is two blocks behind the highest peer to start syncing. Fixing it by forcing sync if we've been 1 block behind for too long. stress.py was reproducing this issue in every run.

Test fixes:

[v] Fixing a scenario in which a failure to send a transaction to all validators resulted in recording an incorrect tx hash alongside the tx. Later, checking balances using the incorrect hash returned an incorrect success value, and thus applied incorrect corrections to the expected balances.
[ ] Removing the old infrastructure for network interference, which relied on a certain node setup, and instead using the new network proxy infrastructure.
[ ] Adding a new argument that controls the percentage of dropped messages between nodes.
[v] Changing the order of magnitude of staking transactions, so that the validator set actually changes.
[ ] Altering the `node_restart` process to occasionally wipe out the data folder of the node, so that we stress state sync (and syncing in general) more.

Other issues discovered while fixing stress.py:
- #2906
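A minimal sketch of the "force sync if we've been 1 block behind for too long" heuristic described in the commit message above; the struct name, fields, and threshold are illustrative assumptions, not nearcore's actual sync code.

```rust
use std::time::{Duration, Instant};

/// Illustrative sketch of the heuristic, not nearcore's actual sync logic.
struct SyncDecider {
    /// When we first observed ourselves exactly one block behind the best peer.
    one_block_behind_since: Option<Instant>,
    /// How long we tolerate being one block behind before forcing a sync.
    force_sync_after: Duration,
}

impl SyncDecider {
    fn should_sync(&mut self, our_height: u64, best_peer_height: u64, now: Instant) -> bool {
        let behind = best_peer_height.saturating_sub(our_height);
        if behind >= 2 {
            // The usual rule: sync as soon as we are two or more blocks behind.
            self.one_block_behind_since = None;
            return true;
        }
        if behind == 1 {
            // Start the timer on the first observation, then force a sync once
            // we have been stuck one block behind for too long.
            let since = *self.one_block_behind_since.get_or_insert(now);
            return now.duration_since(since) >= self.force_sync_after;
        }
        // Caught up (or ahead): reset the timer.
        self.one_block_behind_since = None;
        false
    }
}

fn main() {
    let mut decider = SyncDecider {
        one_block_behind_since: None,
        force_sync_after: Duration::from_secs(10),
    };
    let t0 = Instant::now();
    assert!(!decider.should_sync(100, 101, t0)); // just fell 1 block behind
    assert!(decider.should_sync(100, 101, t0 + Duration::from_secs(11))); // stuck too long
}
```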
SkidanovAlex added a commit that referenced this issue on Jul 8, 2020, with the same commit message as the commit above.
SkidanovAlex added a commit that referenced this issue on Jul 24, 2020, with the same commit message as the Jul 8 commit above.
SkidanovAlex added a commit that referenced this issue on Jul 30, 2020, with the same commit message as the Jul 8 commit above.
nearprotocol-bulldozer bot pushed a commit that referenced this issue on Jul 31, 2020:
…#3036)

After this change stress.py node_restart passes relatively consistently, and is reintroduced to nightly.

Nearcore fixes:
- We had a bug in the syncing logic (with a low chance of being triggered in the wild): if a block is produced, and between 1/3 and 2/3 of block producers received it, and the rest have not, the system stalls, because no 2/3 of block producers have the same head, but also nobody is two blocks behind the highest peer to start syncing. Fixing it by forcing sync if we've been 1 block behind for too long. stress.py was reproducing this issue in every run.
- (#2916) We had an issue where a node that produced a chunk and then crashed could not serve it on recovery, because it didn't have all the parts and receipts stored in the storage from which we recover cache entries in the shards manager. Fixing it by always storing all the parts and receipts (redundantly) for chunks in the shards we care about.

Test fixes:

[v] Fixing a scenario in which a failure to send a transaction to all validators resulted in recording an incorrect tx hash alongside the tx. Later, checking balances using the incorrect hash returned an incorrect success value, and thus applied incorrect corrections to the expected balances.
[v] Changing the order of magnitude of staking transactions, so that the validator set actually changes.

Other issues discovered while fixing stress.py:
- #2906
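A hedged sketch of the #2916 fix mentioned in the commit message above; the types here are illustrative stand-ins, not nearcore's actual ShardsManager or Store. The point is to persist all parts and receipts for chunks in shards we care about, so the in-memory cache can be rebuilt from storage after a crash.

```rust
use std::collections::HashMap;

/// Illustrative types only; not nearcore's actual chunk storage.
#[derive(Clone, Default)]
struct ChunkData {
    parts: Vec<Vec<u8>>,
    receipts: Vec<Vec<u8>>,
}

#[derive(Default)]
struct ChunkStore {
    by_hash: HashMap<String, ChunkData>,
}

impl ChunkStore {
    /// Store *all* parts and receipts for chunks in shards we care about,
    /// redundantly, instead of only the pieces this node "owns".
    fn persist(&mut self, chunk_hash: &str, data: ChunkData) {
        self.by_hash.insert(chunk_hash.to_string(), data);
    }

    /// After a crash, cache entries can be rebuilt from storage, so the node
    /// can still serve a chunk it produced before restarting.
    fn recover(&self, chunk_hash: &str) -> Option<ChunkData> {
        self.by_hash.get(chunk_hash).cloned()
    }
}
```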
bowenwang1996 pushed a commit that referenced this issue on Aug 14, 2020, with the same commit message as the Jul 31 commit above.
I can only reproduce it under stress. But possibly we can figure out why that happens without having a lightweight repro, just by inspecting the logic.
The observed behavior:
Debug assert with "Transaction overwrites itself" (it is inside `StoreUpdate::commit`), which shows the following line twice in the trace (and multiple other similar lines, each occurring twice):

- ColTransactions 5rrivziGwNuAsKp5WqE1Z7Eh5oCPKZ6YtYL6sfZZz8Xw
Indicating that the same transaction was cleaned up twice.
The stacktrace indicates that the `commit` is invoked from within `clear_data`.
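To make the failure mode concrete, here is a minimal sketch of the kind of check that fires; it is not nearcore's actual `StoreUpdate` implementation, just a batched update that debug-asserts no (column, key) pair is touched twice within one commit.

```rust
use std::collections::HashSet;

/// Illustrative sketch, not nearcore's actual StoreUpdate: a batched update
/// collects (column, key) operations and debug-asserts that no key is
/// touched twice within a single commit.
struct StoreUpdateSketch {
    deletions: Vec<(String, Vec<u8>)>,
}

impl StoreUpdateSketch {
    fn delete(&mut self, column: &str, key: &[u8]) {
        self.deletions.push((column.to_string(), key.to_vec()));
    }

    fn commit(self) {
        let mut seen = HashSet::new();
        for (column, key) in &self.deletions {
            // If clear_data queues the same ColTransactions key once per fork,
            // the second occurrence trips this assert in debug builds.
            debug_assert!(
                seen.insert((column.clone(), key.clone())),
                "Transaction overwrites itself: {} {:?}",
                column,
                key
            );
        }
        // ...apply the batched operations to the underlying database...
    }
}

fn main() {
    let mut update = StoreUpdateSketch { deletions: Vec::new() };
    let tx_hash = b"5rrivziGwNuAsKp5WqE1Z7Eh5oCPKZ6YtYL6sfZZz8Xw";
    update.delete("ColTransactions", tx_hash); // queued while clearing fork 1
    update.delete("ColTransactions", tx_hash); // queued again while clearing fork 2
    update.commit(); // panics in debug builds: "Transaction overwrites itself"
}
```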
If you do want to reproduce, check out the `stress_fixed` branch and run `python tests/stress/stress.py 3 3 3 0 staking transactions node_restart`. It reproduces relatively consistently.