Add cache for recent on-chain transaction hashes to enhance TPS #1734
Comments
Did you try with a stored Bloom filter?
I tried with a 100-bit-per-key Bloom filter and got the following result: TPS with the Bloom filter is better than the case without a transaction hash cache. It performs as well as the case with a transaction hash cache at first, and then drops bit by bit. The following figure clearly shows the reason: the time spent in func Blockchain.ContainsTransaction varies severely between the two cases. The Bloom filter case spends much more time than the transaction-hash-cache case for the same or fewer transactions per block, especially as more and more transactions get on chain, and the overall time spent in func Blockchain.OnNewTransaction actually reaches 15 s within one block period... So the Bloom filter is not able to free func Blockchain.OnNewTransaction from being a bottleneck, but the transaction hash cache is.
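For reference, here is a minimal Bloom filter sketch in Python (the node itself is C#; all names here are illustrative, not NEO's actual code). It illustrates why the filter still costs per-query work: every membership check computes k bit positions, and any false positive still forces a LevelDB read to confirm.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter with m bits and k hash functions (illustrative sketch)."""

    def __init__(self, m_bits: int, k_hashes: int):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive k bit positions from one SHA-256 via double hashing.
        h = hashlib.sha256(item).digest()
        h1 = int.from_bytes(h[:8], "little")
        h2 = int.from_bytes(h[8:16], "little")
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes) -> bool:
        # Never a false negative; false positives are possible,
        # so a hit still has to be confirmed against the real DB.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

bf = BloomFilter(m_bits=8 * 1024, k_hashes=7)
bf.add(b"tx-hash-1")
print(b"tx-hash-1" in bf)  # True: added items are always found
print(b"tx-hash-2" in bf)  # False unless a rare false positive occurs
```

Note that unlike a FIFO hash cache, a plain Bloom filter cannot evict old entries, so its false-positive rate only grows as transactions accumulate, which is consistent with the degradation measured above.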
How big is your cache? Is it a FIFO cache?
Have you tried testing against RocksDB?
IIUC this only improves anything for the case when you're actively sending duplicate transactions to the node (because you have to check the DB at step 3 anyway), and you're doing it constantly, even after the transaction was already added into a block. That's probably not something I would expect of a real network, and even less so for a synthetic test, which makes me ask: what is the transaction mix you're sending to the node? For a synthetic test I would expect unique transactions (at least that's what https://github.com/nspcc-dev/neo-bench/ does), and this problem wouldn't exist at all when testing against a solo node. For multiple CNs, as well as for a real network, I would expect some duplicates. So I'm wondering if this test is really representative and we're not trying to optimize some rare corner case.
Have not, will test later.
Why? For transactions fulfilling step (2) we don't need to check step (3)
Actually, there are no duplicate transactions in my test case; maybe you've mistaken something. And it is precisely the non-duplicate transactions that need to be checked first in the mempool and then in LevelDB, since they have not been seen before, and checking LevelDB requires disk reads and is time consuming. That is exactly why func ContainsTransaction costs time, and the reason I wanted to post this discussion.
Once more, you must be mistaken: there are no duplicate transactions in my test case.
It's a data structure similar to HashSetCache but without bucketCapacity. It contains N entries, where each entry holds the transaction hashes of one historical block. This data is refreshed upon persisting a block, dropping the oldest hash set and creating a new one for the persisting block. The cache's size depends on the transaction stream, i.e. suppose there are on average 200,000 transactions per block and N = 10, then the cache's largest size will be 200,000 × 10 × 32 bytes ≈ 61 MB.
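The structure described above might look like the following Python sketch (the node itself is C#; the class and method names here are hypothetical). It keeps one hash set per block for the most recent N blocks and evicts the oldest set when a new block persists:

```python
from collections import deque

class RecentTxHashCache:
    """FIFO cache of per-block transaction-hash sets for the last N blocks (sketch)."""

    def __init__(self, n_blocks: int = 10):
        self.n = n_blocks
        self.blocks = deque()  # oldest block's set on the left, newest on the right

    def on_block_persisted(self, tx_hashes) -> None:
        # Add the persisting block's hashes; drop the oldest block's set
        # once more than N blocks are cached.
        self.blocks.append(set(tx_hashes))
        if len(self.blocks) > self.n:
            self.blocks.popleft()

    def __contains__(self, tx_hash) -> bool:
        return any(tx_hash in s for s in self.blocks)

cache = RecentTxHashCache(n_blocks=2)
cache.on_block_persisted(["a1", "b2"])  # block 1
cache.on_block_persisted(["c3"])        # block 2
cache.on_block_persisted(["d4"])        # block 3: evicts block 1's set
print("c3" in cache)  # True
print("a1" in cache)  # False: evicted along with block 1
```

With roughly 200,000 hashes per block and N = 10, the worst-case footprint is about 200,000 × 10 × 32 bytes ≈ 61 MB, matching the estimate above.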
OK, my initial thought was that it's a fail-fast route basically, but upon reviewing the condition number 1 more carefully I see that it's not and it might improve something for the general case. However, I should note that it's only effective if all transactions use |
That's also what I did for my test case, including signing all tx before sending them.
My transactions are spread over different ValidUntilBlock values.
Surely people can provide a wrong ValidUntilBlock in their transaction, as transactions are man-made. I have some thoughts on this problem as follows. For example, say the current height is 180 and N = 10, and we calculate the height at which a tx was sent as ValidUntilBlock - Transaction.MaxValidUntilBlockIncrement. Then transactions made after height 170 can be checked via the cache, and the others must be checked in LevelDB. So there can be two kinds of wrong ValidUntilBlock:
(1) A higher ValidUntilBlock, i.e. a transaction is made at height 100 but its sender provides a higher ValidUntilBlock to make consensus nodes think it was made at height 176. For such cases we can add a field to the Transaction class, PreviousBlockHash, which holds the hash of the previous block at the time the transaction is sent. Then, even though the transaction pretends to be made at height 176, it cannot provide the correct hash of block 175 and can be dropped.
(2) A lower ValidUntilBlock, i.e. a transaction is made at height 176 but its sender provides a lower ValidUntilBlock to make consensus nodes think it was made at height 100. This can happen when malefactors want to waste consensus nodes' computational capacity. For such cases we can demand a higher GAS fee for transactions that are not within the last N blocks.
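The height inference above can be sketched as follows (a Python illustration; the constant value and function names are placeholders, not NEO's actual API). A transaction whose inferred send height falls inside the cached window can be answered by the cache alone; anything claiming to be older must still go to LevelDB:

```python
# Placeholder value for illustration; NEO defines this as a protocol constant.
MAX_VALID_UNTIL_BLOCK_INCREMENT = 5760
N = 10  # number of blocks covered by the hash cache

def inferred_send_height(valid_until_block: int) -> int:
    # Height at which the sender claims the transaction was created.
    return valid_until_block - MAX_VALID_UNTIL_BLOCK_INCREMENT

def needs_db_lookup(valid_until_block: int, current_height: int) -> bool:
    # Transactions "sent" within the last N blocks are fully covered by the
    # cache; older ones must still be checked against LevelDB.
    return inferred_send_height(valid_until_block) < current_height - N

current = 180
print(needs_db_lookup(5760 + 176, current))  # False: sent at 176, inside the window
print(needs_db_lookup(5760 + 100, current))  # True: sent at 100, outside the window
```

This is exactly the boundary a forger would attack by inflating or deflating ValidUntilBlock, which the two countermeasures above (PreviousBlockHash and an extra fee) try to close.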
@Qiao-Jin, I like your points. I think that, for the tool you are proposing, a link to the previous hash might be needed. Since it is a local optimization, why don't we just assign a local variable to the TX when the node receives it, saying that it was received at
Yes, we can assign such a local variable, but I think there can be a bug as follows, if I am not misunderstanding: (1) First I send a transaction and it gets on chain at height 170. (2) Knowing that N = 10, I resend the same transaction at, say, height 190. At this point the transaction will be neither in the mempool nor in the hash cache, so it will be regarded as not seen before and will get on chain again... Please correct me if I'm wrong 😝
I think it will be invalid during verification. |
I don't think we check duplicate hashes in tx verification. And yes, we record known hashes in RemoteNode, which might be useful for checking... but once the connection is lost or the node restarts, this loses effect. So there is still a chance for a duplicate tx to get on chain.
Any advice?
What about replace
It is a cache and it is updated |
In such a case every newly arriving transaction will still need to be checked in LevelDB, since it has not been seen before, even though it was created just now. Besides a hash cache, we need to judge the height at which the transaction was created to decide whether to search LevelDB when it's not in the known hashes. Only then is time saved.
Another method is to use node health for such cases. Nodes that keep sending transactions with a low height (i.e. < Height - 20; this is just an example, and we should settle on the value after careful observation, using the same value for the cached height) should be banned for some period. And we should tell users about this rule.
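The banning idea could look like this minimal Python sketch, assuming a simple per-peer strike counter (all thresholds, names, and the strike policy are placeholders for illustration, not an actual NEO mechanism):

```python
import time
from collections import defaultdict

LOW_HEIGHT_MARGIN = 20  # placeholder: matches the cached-height window
STRIKE_LIMIT = 5        # placeholder: strikes before a ban
BAN_SECONDS = 600       # placeholder: ban duration

class PeerHealth:
    """Ban peers that repeatedly relay transactions created far below current height (sketch)."""

    def __init__(self):
        self.strikes = defaultdict(int)
        self.banned_until = {}

    def is_banned(self, peer: str, now: float = None) -> bool:
        now = time.time() if now is None else now
        return self.banned_until.get(peer, 0) > now

    def on_tx_received(self, peer: str, send_height: int, current_height: int,
                       now: float = None) -> None:
        # Count a strike for each transaction created too far below the
        # current height; ban the peer once it accumulates enough strikes.
        now = time.time() if now is None else now
        if send_height < current_height - LOW_HEIGHT_MARGIN:
            self.strikes[peer] += 1
            if self.strikes[peer] >= STRIKE_LIMIT:
                self.banned_until[peer] = now + BAN_SECONDS

h = PeerHealth()
for _ in range(5):
    h.on_tx_received("peer-a", send_height=100, current_height=180, now=0.0)
print(h.is_banned("peer-a", now=1.0))  # True: five low-height txs triggered a ban
print(h.is_banned("peer-b", now=1.0))  # False: never seen misbehaving
```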
I think this can solve the problem. https://www.eecs.harvard.edu/~michaelm/postscripts/aller2006.pdf
Summary or problem description
Currently, when the Blockchain actor receives a transaction it needs to check whether this transaction is already on chain:
neo/src/neo/Ledger/Blockchain.cs
Line 357 in ae06696
This piece of code can be severely time-consuming as the height increases and more and more transactions are already on chain. This is because LevelDB may need to search the whole on-disk database for the specified transaction.
Do you have any solution you want to propose?
A cache of recent transactions' hashes can be added to solve the problem for most cases. That is, a cache storing the hashes of the most recently persisted transactions (i.e. transactions within the most recent N blocks, where N can be a fixed value like 10 or 20). Thus, upon receiving a transaction:
(1) Judge whether ValidUntilBlock - Transaction.MaxValidUntilBlockIncrement >= CurrentSnapshot.Height - N
(2) If satisfied, check whether the specified transaction is in the MemPool or the transaction hash cache. If it is in neither, the transaction is not yet on chain and can be added to the MemPool.
(3) Otherwise, besides the checks in (2), we need to ask LevelDB whether this transaction is already on chain.
Note: N should be chosen so that in most cases newly arriving transactions satisfy condition (2). On the other hand, it should not be too big, for the sake of RAM savings. The structure of the transaction hash cache should be convenient to update as the height increases.
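The three steps above can be combined as in this Python sketch (all names are hypothetical; mempool, cache, and db are stand-ins for the node's real components, and simple sets are used in place of the actual structures):

```python
def already_on_chain(tx_hash, send_height, current_height,
                     mempool, recent_cache, db, n=10):
    """Return True if the transaction is already known, following steps (1)-(3) (sketch)."""
    # Step (1): was the transaction created within the last N blocks?
    within_window = send_height >= current_height - n
    # Step (2): the mempool and the recent-hash cache cover that window entirely.
    if tx_hash in mempool or tx_hash in recent_cache:
        return True
    if within_window:
        return False  # fully covered by the cache: no disk read needed
    # Step (3): older transactions still require a LevelDB lookup.
    return tx_hash in db

mempool, cache, db = {"m1"}, {"c1"}, {"d1"}
print(already_on_chain("c1", 175, 180, mempool, cache, db))  # True via the cache
print(already_on_chain("x9", 175, 180, mempool, cache, db))  # False, no DB read
print(already_on_chain("d1", 100, 180, mempool, cache, db))  # True via the DB
```

The win is the middle branch: for a well-chosen N, most fresh transactions resolve in memory and never touch LevelDB.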