
Client: add basic TxPool #1176

Merged: 20 commits merged from add-tx-pool into master on Aug 24, 2021

Conversation

holgerd77
Member

Ok, this first round was relatively smooth. Transaction retrieval based on the announced hashes is already working, and TransactionFactory is also able to create the respective tx objects. 😄
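
To illustrate that last step (a hedged sketch: the helper, its parameters and the Buffer-based data shape are simplified for illustration and are not the actual client code):

import Common from '@ethereumjs/common'
import { TransactionFactory } from '@ethereumjs/tx'

// Sketch: turn serialized tx data returned by a peer into typed tx objects
function createTxObjects(serializedTxs: Buffer[], common: Common) {
  return serializedTxs.map((data) => TransactionFactory.fromSerializedData(data, { common }))
}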

@codecov

codecov bot commented Mar 30, 2021

Codecov Report

Merging #1176 (130c3f3) into master (5430bf3) will decrease coverage by 0.02%.
The diff coverage is 85.71%.


Flag         Coverage Δ
block        86.76% <ø> (ø)
blockchain   83.49% <ø> (ø)
client       83.75% <85.71%> (+0.08%) ⬆️
common       94.18% <ø> (-0.22%) ⬇️
devp2p       82.86% <ø> (ø)
ethash       86.56% <ø> (ø)
tx           88.24% <ø> (-0.12%) ⬇️
vm           82.78% <ø> (ø)

Flags with carried forward coverage won't be shown.

@holgerd77
Member Author

Rebased this.

@holgerd77
Member Author

holgerd77 commented Aug 11, 2021

Ok, made some progress here. The tx pool now has a basic - hopefully sufficient - data structure, collects and de-duplicates incoming new txs, and also works (naively, i.e. without verification) with the request IDs from ETH/66.

Tests are currently not passing and I still need to fix them, and also need to write some tests in general. Feel free to give this an early review and/or continue some work on this branch if you have a strong opinion on how things can be improved (but please drop a note beforehand).
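
To illustrate the de-duplication part (a minimal sketch, assuming a Map keyed by the tx hash; class and method names are made up for illustration):

import { TypedTransaction } from '@ethereumjs/tx'

// Illustrative sketch: keep each announced tx only once, keyed by its hash
class TxPoolSketch {
  public pool: Map<string, TypedTransaction> = new Map()

  addTxs(txs: TypedTransaction[]) {
    for (const tx of txs) {
      const hash = tx.hash().toString('hex')
      if (this.pool.has(hash)) continue // already known, e.g. announced by another peer
      this.pool.set(hash, tx)
    }
  }
}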

This is what the current output looks like:

[screenshot: current client log output]

Some note: there are recurring "peer returned no headers for blocks ..." errors like these:

[screenshot: "peer returned no headers" error output]

I've checked, these have already been present on master though. We should investigate here, but this is rather independent from this PR.

The next step would be some tx pool cleanup; my current idea is to do this in the following steps:

  1. Forward new blocks received to the tx pool via a new event SYNC_NEW_BLOCKS or similar, likely triggered within the full synchronizer. It can then be checked whether txs are already included in a block, and those can subsequently be removed from the tx pool (see the sketch after this list).
  2. Not sure how performant this is and whether we want to do it additionally, but we could do interval-based nonce checks via state reads on all txs in the pool and remove all those where the nonce is too low.
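
A minimal sketch of step 1 (in the merged code this ends up being exercised via pool.newBlocks() in the test snippet further below; the standalone helper and the hash-keyed map here are just for illustration):

import { Block } from '@ethereumjs/block'
import { TypedTransaction } from '@ethereumjs/tx'

// Sketch for step 1: drop txs from the (hash-keyed) pool once they appear in a new block
function removeIncludedTxs(pool: Map<string, TypedTransaction>, blocks: Block[]) {
  for (const block of blocks) {
    for (const tx of block.transactions) {
      pool.delete(tx.hash().toString('hex'))
    }
  }
}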

Going forward with 1. will need the tip-of-the-chain behavior from #1132 to be implemented, so we could either wait for that before continuing or merge here at some point without the cleanup yet (but with e.g. some additional simple limit on the pool size).

Also note that this is only the passive pool behavior, so we are not answering any incoming tx requests yet or broadcasting new txs ourselves. Also, as a TODO note: our own txs submitted via RPC also need to be added to our own pool so that we include them when we produce our own blocks.

Still not completely sure when the whole tx pool behavior should be activated (directly at client start or rather once the chain is synced). But in the end this is secondary for this initial implementation, since it can likely be added/adapted later with a simple switch.

@holgerd77
Member Author

Currently there are some open handles left which prevent the client tests from finishing. Testing with leaked-handles - which I just found and which is actually pretty handy 😀 - this is the output:

no of handles 3

tcp stream {
  fd: 20,
  readable: false,
  writable: true,
  address: {},
  serverAddr: null
}

tcp stream {
  fd: 22,
  readable: false,
  writable: true,
  address: {},
  serverAddr: null
}

tcp stream {
  fd: 23,
  readable: true,
  writable: false,
  address: {},
  serverAddr: null
}

Didn't dig any further yet.
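
For reference, wiring in the package looks roughly like this (the option names are from my reading of the leaked-handles README, so please double-check them):

// At the very top of a test entry point, before any sockets are opened
require('leaked-handles').set({
  fullStack: true, // print full stack traces for leaked handles
  timeout: 30000, // report handles still open after 30s
  debugSockets: true, // add extra debug info for tcp streams
})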

holgerd77 force-pushed the add-tx-pool branch 2 times, most recently from 1032b71 to 220ebd0 on August 12, 2021 13:45
@acolytec3
Contributor

> Some note: there are recurring "peer returned no headers for blocks ..." errors like these:
>
> [screenshot]
>
> I've checked, these have already been present on master though. We should investigate here, but this is rather independent from this PR.

On this, it was added here as part of the eth66 PR to catch responses from peers where no body was returned with the response to a getBlockHeaders eth message. If it makes sense, we could simply drop those responses and remove the logging. Happy to open a new issue to track this.

The general PR for the txpool looks great! I did a little experimenting along the question of the leaked handles but got basically the same result as you, and it's not clear to me how to use the output provided to do useful research. I did notice that if you run npm run test:unit with these changes in place, the tests never finish running and just seem to hang indefinitely, so there is definitely something to be researched here. I'll poke at it a little bit more before moving on.

@holgerd77
Member Author

> Some note: there are recurring "peer returned no headers for blocks ..." errors like these:
>
> [screenshot]
>
> I've checked, these have already been present on master though. We should investigate here, but this is rather independent from this PR.
>
> On this, it was added here as part of the eth66 PR to catch responses from peers where no body was returned with the response to a getBlockHeaders eth message. If it makes sense, we could simply drop those responses and remove the logging. Happy to open a new issue to track this.

Ah, thanks for the explanation.

We had this kind of problem already earlier in client development (too-verbose output on these kinds of connection/data errors).

One good rule of thumb we came up with: if the error (unexpected behavior) is within the scope of our client (the client is doing something it shouldn't do), use warn or error as the log level. If something happens outside of our scope (some "normal" non-working communication), use debug.

This works reasonably well and is a good fit in most cases.
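
Applied to the error above, the rule of thumb would translate to something like this (illustrative only; config, first and count are placeholders):

// Outside our scope (peer/network misbehaving or just idle) -> debug
config.logger.debug(`Peer returned no headers for blocks ${first}-${count}`)

// Inside our scope (our client doing something it shouldn't) -> warn or error
config.logger.warn(`Dropping duplicate fetch request for block range ${first}-${count}`)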

@ryanio
Contributor

ryanio commented Aug 18, 2021

just reviewing the code, looks great! should we add a cli option --txpool to enable?

I'm still trying to locate where the process leak is happening, but no clues yet. The first culprit is usually the intervals, but they seem to be cleared properly, as every pool.open() is followed by a close(). By the way, those methods are marked as async and awaited properly, but there are no async operations actually within them.
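
For context, the pattern being checked looks roughly like this - a sketch under the assumption that the pool schedules periodic work with setInterval; names are illustrative:

// Sketch: a matched open()/close() pair should not leave an interval handle behind
class PoolLifecycleSketch {
  private _logInterval: NodeJS.Timeout | undefined

  // marked async to mirror the observation above, although nothing inside actually awaits
  async open() {
    this._logInterval = setInterval(() => { /* log pool stats */ }, 60000)
  }

  async close() {
    if (this._logInterval) clearInterval(this._logInterval)
    this._logInterval = undefined
  }
}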

As you mentioned on our last call @holgerd77, I think once we are at the tip of the chain we can have some kind of new-block notification we can subscribe to in the pool and remove all the newly mined txs. We could use Event.CHAIN_UPDATED and query the block, or, to skip the extra db lookup, we could add e.g. an Event.MINED_TXS with the hashes passed as params that we could quickly iterate through.

@holgerd77
Member Author

> should we add a cli option --txpool to enable?

Not sure, do we need this? Can't this be active by default? 🤔

@ryanio
Contributor

ryanio commented Aug 18, 2021

> should we add a cli option --txpool to enable?
>
> Not sure, do we need this? Can't this be active by default? 🤔

Ah sure, I was thinking in some cases people might not want the tx pool active, but perhaps not. I checked what flags geth has and they don't seem to have an enable/disable either.

Their cli options were an interesting read nonetheless to get a sense of configurability and defaults:

TRANSACTION POOL OPTIONS:
  --txpool.locals value               Comma separated accounts to treat as locals (no flush, priority inclusion)
  --txpool.nolocals                   Disables price exemptions for locally submitted transactions
  --txpool.journal value              Disk journal for local transaction to survive node restarts (default: "transactions.rlp")
  --txpool.rejournal value            Time interval to regenerate the local transaction journal (default: 1h0m0s)
  --txpool.pricelimit value           Minimum gas price limit to enforce for acceptance into the pool (default: 1)
  --txpool.pricebump value            Price bump percentage to replace an already existing transaction (default: 10)
  --txpool.accountslots value         Minimum number of executable transaction slots guaranteed per account (default: 16)
  --txpool.globalslots value          Maximum number of executable transaction slots for all accounts (default: 4096)
  --txpool.accountqueue value         Maximum number of non-executable transaction slots permitted per account (default: 64)
  --txpool.globalqueue value          Maximum number of non-executable transaction slots for all accounts (default: 1024)
  --txpool.lifetime value             Maximum amount of time non-executable transaction are queued (default: 3h0m0s)

I also learned the difference between an executable and non-executable transaction: https://ethereum.stackexchange.com/a/86698
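
In short, per the linked answer: executable ("pending") txs have a nonce that lines up with the sender's current account nonce and can be mined right away, while non-executable ("queued") txs have a nonce gap and only become executable once the missing txs arrive. A simplified illustration (not how geth or our pool actually stores this):

// Simplified: classify a tx by comparing its nonce to the sender's current account nonce
function classifyTx(txNonce: bigint, accountNonce: bigint): 'executable' | 'queued' | 'stale' {
  if (txNonce < accountNonce) return 'stale' // nonce already used, tx can be dropped
  if (txNonce === accountNonce) return 'executable' // next expected nonce ("pending")
  return 'queued' // nonce gap ("non-executable")
}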

@ryanio
Contributor

ryanio commented Aug 19, 2021

I found the hanging test: we needed to stub TxPool in fullethereumservice.spec.ts like the PeerPool is, otherwise it was getting caught on the log listener.
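
The fix has roughly this shape - a sketch under the assumption that the suite stubs modules with testdouble the same way it already does for PeerPool (the TxPool import path is a guess):

import tape from 'tape'
import td from 'testdouble'

// Replace the TxPool module with an auto-stubbed double before the service pulls it in,
// analogous to the existing PeerPool stub, so its log listener never gets attached
td.replace('../../lib/sync/txpool')

tape('FullEthereumService', (t) => {
  // ... existing test body, no longer kept open by the TxPool log listener ...
  t.end()
})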

if (txHashes.length === 0) {
return
}
this.config.logger.info(`TxPool: received new pooled hashes number=${txHashes.length}`)
Member Author

Ok, after finishing, this is probably nothing we want on the info level since it is too verbose. We might just move it to debug.

@holgerd77
Member Author

@ryanio cool, nice to see this evolving! 😄

@holgerd77
Member Author

Rebased this.

@holgerd77
Member Author

Some debug note: the hanging tests are in test/service/fullethereumservice.spec.ts and can be reproduced with: npm run tape -- test/service/fullethereumservice.spec.ts.

(It's really tricky: one often just suspects the last tests executed of causing the test run to hang, which is often not the case.)

@holgerd77
Member Author

@ryanio wanted to eventually take this in for the block builder PR he plans to do and make some last additions here along that PR. I will therefore put the PR into the "Needs Review" state, but please let Ryan do the (at least final) review.

Some general note: I thought about this for a couple of days and think I have finally found a practical solution for how and when to start the pool. The pool is now activated when a BLOCKS_BEFORE_TARGET_HEIGHT_ACTIVATION threshold is reached, so when the sync is getting close to the finish line. This allows for some pre-block-processing tx pool preparation and cleanup, and it also eliminates the need to distinguish between NEW_BLOCK_HASHES and generic historic block retrievals for the tx pool cleaning: from this pre-sync moment on, one can just take all new incoming blocks into account and check whether the respective block transactions are included in the pool and should be removed.

This should lead to some decent state of the pool once sync is finished. Then we could do another final nonce check on all remaining pool txs to have the pool really ready for including txs in a new block.

The BLOCKS_BEFORE_TARGET_HEIGHT_ACTIVATION mechanism also has the advantage of being relatively flexible. If one wants to see tx pool behavior directly at the beginning of syncing, one can simply set a very high block number there. If we want, we can also expose this as a CLI option and eventually add some string shortcut - e.g. chainstart or similar - a bit analogous to latest in the RPC methods for retrieving blocks.
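
A minimal sketch of the activation check described above (BLOCKS_BEFORE_TARGET_HEIGHT_ACTIVATION is the constant from this PR; its value here and the helper are illustrative):

// Sketch: decide whether the pool should switch to "running" as sync nears the target height
const BLOCKS_BEFORE_TARGET_HEIGHT_ACTIVATION = 20 // illustrative value

function shouldActivatePool(syncedHeight: bigint, targetHeight: bigint): boolean {
  return targetHeight - syncedHeight <= BigInt(BLOCKS_BEFORE_TARGET_HEIGHT_ACTIVATION)
}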

Ok, so far. 😄

ryanio added 4 commits August 23, 2021 10:30
  * clean up unneeded EventEmitter usage
  * unnest announcedTxHashes logic when length === 0
  * add logger debugs for tx pool requests
// Craft block with tx in pool
block = Block.fromBlockData({ transactions: [txB02] }, { common })
pool.newBlocks([block])
t.equal(pool.pool.size, 0, 'pool size 0')
Member Author

👍

Cool new tests! 😄

ryanio (Contributor) left a comment

lgtm! 🚀🚀

ryanio merged commit b298caf into master on Aug 24, 2021
ryanio deleted the add-tx-pool branch on August 24, 2021 01:49
await new Promise((resolve) => setTimeout(resolve, 5000))
peer = this.best()
numAttempts += 1
}*/
}
Member Author

Oh, interesting. This can now be reactivated without breaking the tests? Might have accidentally fixed something on a more profound level when reworking some initialization parts of the sync and service tests. 😋

Contributor

hehe, yeah, I had to add && this.opened to the while loop conditions to fix it hanging during the integration tests. I was pleased to find such a simple solution :)
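
Reconstructed from the hunk above plus this comment, the reactivated loop looks roughly like this (maxRetries and the surrounding async method are assumptions):

// Sketch: retry picking a best peer, but bail out once this component has been closed
let peer = this.best()
let numAttempts = 1
while (!peer && this.opened && numAttempts < maxRetries) {
  await new Promise((resolve) => setTimeout(resolve, 5000))
  peer = this.best()
  numAttempts += 1
}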

@holgerd77
Member Author

Pretty cool 🎉! Really looking forward to seeing this applied! 😄
