eth_sendRawTransaction: remove logging, because we have `--txpool.trace.senders` which enabling more verbose logging for given senders list #7686

mh0lt · 2023-06-08T11:25:52Z

When testing with Bor consensus turned on I discovered that SendRawTransaction returns a 0x000... hash when transactions are submitted during block transitions. This turns out to be spurious in the sense that the transaction insertion is successful.

The cause is that ReadCurrentBlockNumber returns a nil block number.

This in turn is caused by the following: In accessors_chain.go there are two methods: WriteHeader and WriteHeadHeaderHash when the first is called the block number is written for the header. The second writes the header has, but there is no guarantee when it does that the head header will have been written yet. In fact it seems to happen sometime later.

The problem for SendRawTransation is that it begins a transaction after inserting into the txpool. And depending on timing this transaction may see only the WriteHeadHeaderHash insertion, and hence can't read the block number.

I have mitigated this by opening the db transaction before calling the tx pool insertion, meaning that it is more likely to have a clean view of the DB.

I have also moved the chain id check earlier in the code - as I think that if this is invalid the method should not try to insert transactions in the first place.

The ReadCurrentBlockNumber is only used to produce a log message - so I've changed this to not fail the whole function but to just log an unknown sender. Which means that the hash is still returned to the sender after a successful txpool insertion

…BlockNumber

AskAlexSharov · 2023-06-08T12:14:40Z

ReadCurrentBlockNumber may fail during block transitions. The root cause is the fact that rawdb.WriteHeadHeaderHash is called
// prior WriteHeader - which can leave the DB in a state where the header hash has no block number mapping.

no, it’s not true. We using db transactions and “all or nothing” will be committed at once. DB is never in this state: except initialSync - which is fine.

In the past Erigon also had corner-case at firstSyncCycle (after start) - but it’s fixed by: #7532

so, on tip of chain now - all block changes are done in 1 ACID transaction.

mh0lt · 2023-06-08T12:36:22Z

So that change is in the codebase - I presume.

If that is the case I think that I may have a different issue - because when I run Bor consensus in the devnet test I am doing I definitely see it several times during the test run of 200 transactions and it is not necessarily in the first block.

I put some local logging in to test and could see the transactions process interleaved.

If I revert my change - which I have locally I can do a run and give you more detail as to how it happens - as I get it almost every time I run.

I think that the WriteHeadHeaderHash originates in startHandlingForkChoice

AskAlexSharov · 2023-06-08T12:55:00Z

@mh0lt startHandlingForkChoice is part of "staged sync" and rw transaction is managed in StageLoopStep (which run all stages - including unwind - in 1 rwtx).

mh0lt · 2023-06-08T13:12:23Z

Apologies - I scrolled up too far in the code. I think its more likely: HeadersPOW. I'll reproduce the test and confirm. Will be in a couple of hours as I'm just looking at a transaction propagation issue.

AskAlexSharov · 2023-06-08T13:27:23Z

thank you

mh0lt · 2023-06-08T20:17:16Z

Attached is the debug location of the write location and the full run log:

read_header_number.log

The pattern which is repeated is as follows:

You can see that the read gets a null back, and that there is an interspersed write operation, with a read which is working on a previous context, I don't think the print order is particularly significant (I used println rather than a log) as its enough to see what is going on.

There didn't seem to be a clear way of printing a transaction id so I've just used the address. Which may be not correct - but it seems like all of the prints are using a different transaction (though this could be wrong)

ReadHeaderNumber 0xdb39c298f4894f4e6b1ca8659f0055a48512d6f466b486723d6291cb38023eb7 1 0xc0010bf780
[INFO] [06-08|20:59:58.166] local tx propagated                      tx_hash=13b895e5945382d0de0014a75d2ee520eabbb6c28cc896768d55c4eb3413384a announced to peers=1 broadcast to peers=1 baseFee=1000000000
[INFO] [06-08|20:59:58.168] Submitted transaction                    hash=0x13b895e5945382d0de0014a75d2ee520eabbb6c28cc896768d55c4eb3413384a from=0x67b1d87101671b127f5f8714789C7192f7ad340e nonce=35 recipient=0x71562b71999873DB5b286dF957af199Ec94617F7 value=0x2d1f
[INFO] [06-08|20:59:58.171] http://localhost:8545#eth_sendRawTransaction time=0.008
[INFO] [06-08|20:59:58.171] [2/15 Headers] Processed                 highest inserted=2 age=12s
[DBUG] [06-08|20:59:58.171] [2/15 Headers] DONE                      in=13.158611s
ReadHeaderNumber 0x0d9b4685d3e20a6e319f8a9afbba073bcc390b8eba044718767349be19dd297d <nil> 0xc0010bfbc0
[INFO] [06-08|21:00:10.595] local tx propagated                      tx_hash=80cf4ea8a0d9d0c7d490914ea4483827ab174476a26c440d0ff5945acc911018 announced to peers=1 broadcast to peers=1 baseFee=1000000000
[INFO] [06-08|21:00:10.601] http://localhost:8545#eth_sendRawTransaction time=12.430
[EROR] [06-08|21:00:10.601] failed SendTransaction                   error="Request: 42, hash: 0x80cf4ea8a0d9d0c7d490914ea4483827ab174476a26c440d0ff5945acc911018, nonce  36: returned a zero transaction hash"
[DBUG] [06-08|21:00:10.602] Start mining new block based on miner.recommit
[DBUG] [06-08|21:00:10.605] [3/15 CumulativeIndex] DONE              in=12.4338955s
WriteHeaderRaw 0x0d9b4685d3e20a6e319f8a9afbba073bcc390b8eba044718767349be19dd297d 2 0xc001887200
WriteHeadHeaderHash 0x0d9b4685d3e20a6e319f8a9afbba073bcc390b8eba044718767349be19dd297d 0xc001887230
[DBUG] [06-08|21:01:36.622] Start mining new block based on miner.recommit
[INFO] [06-08|21:01:36.623] [4/15 BlockHashes] DONE                  in=1m26.0163087s
[INFO] [06-08|21:01:36.644] [1/5 MiningCreateBlock] Start mine       block=2 baseFee=878906250 gasLimit=10019538
ReadHeaderNumber 0xdb39c298f4894f4e6b1ca8659f0055a48512d6f466b486723d6291cb38023eb7 1 0xc0004f41a0
[DBUG] [06-08|21:01:36.645] [1/5 MiningCreateBlock] DONE             in=1.8474ms
[DBUG] [06-08|21:01:36.647] [5/15 Bodies] DONE                       in=23.438ms
[DBUG] [06-08|21:01:47.123] Start mining new block based on miner.recommit
[DBUG] [06-08|21:01:47.126] [txpool] Commit                          written_kb=24 in=1m36.5271432s
[INFO] [06-08|21:01:47.129] [txpool] stat                            pending=0 baseFee=37 queued=0 alloc=29.9MB sys=55.7MB
[DBUG] [06-08|21:01:47.129] [txpool] Commit                          written_kb=4 in=0s
[INFO] [06-08|21:01:47.130] [p2p] GoodPeers                          eth68=1
ReadHeaderNumber 0x0d9b4685d3e20a6e319f8a9afbba073bcc390b8eba044718767349be19dd297d 2 0xc0011e40e0

mh0lt · 2023-06-08T20:17:59Z

All of this goes away with my change above - although clearly it is avoiding the issue rather than fixing it.

AskAlexSharov · 2023-06-09T06:56:45Z

You can see that the read gets a null back - i don't see - where I need look? at which log line?
sorry, don't understand the root of problem yet.

On your screenshot WriteHeadHeaderHash accepting tx as a first parameter - it's write-db-transaction, so RPC handlers must not see "partial writes". If it's somehow happening - we must guarantee "no partial writes visibility" on writer side - rather than do workarounds on reader side (be cause we have hundreds of RPC methods and dozens of db tables - can't workaround all this cases).

mh0lt · 2023-06-09T07:44:29Z

Its on line 7 of the log:

ReadHeaderNumber 0x0d9b4685d3e20a6e319f8a9afbba073bcc390b8eba044718767349be19dd297d 0xc0010bfbc0

The following lines show the error being propagated to the client.

I refactored the code because I think there are actually 2 issues here:

In SendRawTransaction the transaction is actually inserted into the TX pool ok, the error is generated as a side effect of producing the log message. I don't think that an error in logging should cause a change in the return from this function. It actually implies a fail when the transaction insertion was a success. This could lead the client to tryiung again which will create a duplicate transaction.
The partial write issue which you have described.

I agreed with Alexy that for moment to just check in the first fix as it unblocks my testing. The second I think needs more investigation and possibly a greater understanding of the flow than I have at the moment.

I'm happy to look at it if you think I should - however I'd like to of it after checking in some other devnet enhancements and bor fixes which I have in my dev environment which are dependent on SendRawTransaction working consistently - which this fix enables.

AskAlexSharov · 2023-06-09T12:02:59Z

Let’s remove all non-required code. Logging can be less verbose - it’s fine.
ok, let’s unblock you somehow. If you can create some script - to reproduce problem or teach me - i think it will be important - because if it’s true it may be reason of other bugs (I don’t remember any similar bugs).

mh0lt · 2023-06-09T14:58:17Z

I've removed the logging code in the latest check in.

Once I have the bor flow I'm working on I'll create a flow which reproduces the issue. I may be able to suggest a fix myself if not I'll check in a version of the devnet with a scenario which can be debugged.

It will probably be early next week.

mh0lt · 2023-06-10T09:04:44Z

@AskAlexSharov are you ok for me to promote this to ready for review ?

AskAlexSharov

I guess it's ok to remove log line totally. Because we have --txpool.trace.senders flag - which enabling tracing of txs from given senders inside TxPool - which is more useful.

AskAlexSharov · 2023-06-12T09:04:31Z

oookey... seems I understand: Released v2.45.2

refactored SendRawTransaction to mitigate nil return from ReadCurrent…

b4a85fd

…BlockNumber

removed logging

8cc728b

AskAlexSharov approved these changes Jun 11, 2023

View reviewed changes

AskAlexSharov changed the title ~~Refactor SendRawTransaction to mitigate ReadCurrentBlockNumber nil return~~ eth_sendRawTransaction: avoid reading blocks. remove existing logging, because have --txpool.trace.senders which enabling more verbose logging Jun 11, 2023

AskAlexSharov marked this pull request as ready for review June 11, 2023 05:53

AskAlexSharov merged commit f9acbff into devel Jun 11, 2023

AskAlexSharov deleted the send_transaction_refactor branch June 11, 2023 06:18

AskAlexSharov mentioned this pull request Jun 12, 2023

block transaction replay right after block update gives errors #7702

Closed

p-linnane mentioned this pull request Jun 20, 2023

erigon 2.46.0 Homebrew/homebrew-core#134200

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

eth_sendRawTransaction: remove logging, because we have `--txpool.trace.senders` which enabling more verbose logging for given senders list #7686

eth_sendRawTransaction: remove logging, because we have `--txpool.trace.senders` which enabling more verbose logging for given senders list #7686

mh0lt commented Jun 8, 2023

AskAlexSharov commented Jun 8, 2023 •

edited

Loading

mh0lt commented Jun 8, 2023 •

edited

Loading

AskAlexSharov commented Jun 8, 2023 •

edited

Loading

mh0lt commented Jun 8, 2023

AskAlexSharov commented Jun 8, 2023

mh0lt commented Jun 8, 2023

mh0lt commented Jun 8, 2023

AskAlexSharov commented Jun 9, 2023 •

edited

Loading

mh0lt commented Jun 9, 2023

AskAlexSharov commented Jun 9, 2023

mh0lt commented Jun 9, 2023

mh0lt commented Jun 10, 2023

AskAlexSharov left a comment

AskAlexSharov commented Jun 12, 2023

eth_sendRawTransaction: remove logging, because we have --txpool.trace.senders which enabling more verbose logging for given senders list #7686

eth_sendRawTransaction: remove logging, because we have --txpool.trace.senders which enabling more verbose logging for given senders list #7686

Conversation

mh0lt commented Jun 8, 2023

AskAlexSharov commented Jun 8, 2023 • edited Loading

mh0lt commented Jun 8, 2023 • edited Loading

AskAlexSharov commented Jun 8, 2023 • edited Loading

mh0lt commented Jun 8, 2023

AskAlexSharov commented Jun 8, 2023

mh0lt commented Jun 8, 2023

mh0lt commented Jun 8, 2023

AskAlexSharov commented Jun 9, 2023 • edited Loading

mh0lt commented Jun 9, 2023

AskAlexSharov commented Jun 9, 2023

mh0lt commented Jun 9, 2023

mh0lt commented Jun 10, 2023

AskAlexSharov left a comment

Choose a reason for hiding this comment

AskAlexSharov commented Jun 12, 2023

eth_sendRawTransaction: remove logging, because we have `--txpool.trace.senders` which enabling more verbose logging for given senders list #7686

eth_sendRawTransaction: remove logging, because we have `--txpool.trace.senders` which enabling more verbose logging for given senders list #7686

AskAlexSharov commented Jun 8, 2023 •

edited

Loading

mh0lt commented Jun 8, 2023 •

edited

Loading

AskAlexSharov commented Jun 8, 2023 •

edited

Loading

AskAlexSharov commented Jun 9, 2023 •

edited

Loading