eth_sendRawTransaction: remove logging, because we have --txpool.trace.senders
which enables more verbose logging for a given list of senders
#7686
Conversation
no, it’s not true. We use db transactions, and everything is committed at once, “all or nothing”. The DB is never in that state, except during initialSync - which is fine. In the past Erigon also had a corner case at firstSyncCycle (after start) - but that’s fixed by: #7532. So, on the tip of the chain, all block changes are now done in 1 ACID transaction. |
So that change is in the codebase - I presume. If that is the case, I think I may have a different issue, because when I run Bor consensus in my devnet test I definitely see it several times during a test run of 200 transactions, and it is not necessarily in the first block. I put some local logging in to test and could see the transactions processed interleaved. If I revert my change - which I have locally - I can do a run and give you more detail on how it happens, as I get it almost every time I run. I think the WriteHeadHeaderHash originates in startHandlingForkChoice |
@mh0lt startHandlingForkChoice is part of "staged sync", and the rw transaction is managed in StageLoopStep (which runs all stages - including unwind - in 1 rwtx). |
Apologies - I scrolled up too far in the code. I think it's more likely HeadersPOW. I'll reproduce the test and confirm. It will be a couple of hours, as I'm just looking at a transaction propagation issue. |
thank you |
Attached is the debug location of the write and the full run log. The pattern that repeats is as follows: you can see that the read gets a null back, and that there is an interspersed write operation, with a read that is working on a previous context. I don't think the print order is particularly significant (I used println rather than a log), but it's enough to see what is going on. There didn't seem to be a clear way of printing a transaction id, so I've just used the address - which may not be correct - but it looks like all of the prints are using a different transaction (though this could be wrong)
|
All of this goes away with my change above - although clearly it is avoiding the issue rather than fixing it. |
On your screenshot |
It's on line 7 of the log: ReadHeaderNumber 0x0d9b4685d3e20a6e319f8a9afbba073bcc390b8eba044718767349be19dd297d 0xc0010bfbc0. The following lines show the error being propagated to the client. I refactored the code because I think there are actually 2 issues here:
I agreed with Alexy that, for the moment, we just check in the first fix, as it unblocks my testing. The second I think needs more investigation, and possibly a greater understanding of the flow than I have at the moment. I'm happy to look at it if you think I should - however I'd like to do it after checking in some other devnet enhancements and bor fixes I have in my dev environment, which depend on SendRawTransaction working consistently - which this fix enables. |
|
I've removed the logging code in the latest check-in. Once I've finished the bor flow I'm working on, I'll create a flow which reproduces the issue. I may be able to suggest a fix myself; if not, I'll check in a version of the devnet with a scenario which can be debugged. It will probably be early next week. |
@AskAlexSharov are you ok for me to promote this to ready for review? |
I guess it's ok to remove the log line entirely. Because we have the --txpool.trace.senders
flag - which enables tracing of txs from given senders inside TxPool - which is more useful.
oookey... seems I understand: Released |
When testing with Bor consensus turned on, I discovered that SendRawTransaction returns a 0x000... hash when transactions are submitted during block transitions. This turns out to be spurious, in the sense that the transaction insertion is successful. The cause is that ReadCurrentBlockNumber returns a nil block number.

This in turn is caused by the following: in accessors_chain.go there are two methods, WriteHeader and WriteHeadHeaderHash. When the first is called, the block number is written for the header. The second writes the head header hash, but there is no guarantee that when it does, the header will have been written yet. In fact that seems to happen some time later.

The problem for SendRawTransaction is that it begins a db transaction after inserting into the txpool. Depending on timing, this transaction may see only the WriteHeadHeaderHash insertion, and hence can't read the block number. I have mitigated this by opening the db transaction before calling the txpool insertion, meaning it is more likely to have a clean view of the DB.
I have also moved the chain id check earlier in the code - as I think that if this is invalid the method should not try to insert transactions in the first place.
The ReadCurrentBlockNumber result is only used to produce a log message, so I've changed this to not fail the whole function but to just log an unknown sender. This means the hash is still returned to the sender after a successful txpool insertion.