
Add cache for recent on-chain transaction hashes to enhance TPS #1734

Open
Qiao-Jin opened this issue Jun 28, 2020 · 18 comments
Labels
discussion Initial issue state - proposed but not yet accepted

Comments

@Qiao-Jin
Contributor

Qiao-Jin commented Jun 28, 2020

Summary or problem description
Currently, when the Blockchain actor receives a transaction, it needs to check whether the transaction is already on chain:

if (ContainsTransaction(transaction.Hash)) return VerifyResult.AlreadyExists;

This piece of code can become very time-consuming as the height increases and more and more transactions are already on chain, because LevelDB may need to search the whole on-disk database for the specified transaction.

Do you have any solution you want to propose?
A cache of recent transaction hashes can solve the problem for most cases: a cache storing the hashes of the most recent on-chain transactions (i.e. transactions within the most recent N blocks, where N is some fixed figure like 10 or 20). Then, upon receiving a transaction:

(1) Judge whether ValidUntilBlock - Transaction.MaxValidUntilBlockIncrement >= CurrentSnapshot.Height - N

(2) If satisfied, check whether the specified transaction is within the MemPool or the transaction hash cache. If it is in neither, the transaction is not yet on chain and can be added to the MemPool.

(3) Otherwise, besides the checks in (2), we also need to check LevelDB to see whether the transaction is already on chain.

Note: N should be chosen so that in most cases newly arriving transactions satisfy condition (2). On the other hand, it should not be too large, for the sake of saving RAM. The structure of the transaction hash cache should be convenient to update as the height increases.
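The three steps above can be sketched as follows. This is an illustrative Python model, not the node's actual C# code: the constant value, the `mempool`/`recent_cache` sets, and the `db_contains` callback are all assumptions standing in for the real components.

```python
# Illustrative sketch of the proposed three-step admission check.
N = 10
MAX_VALID_UNTIL_BLOCK_INCREMENT = 5760  # placeholder constant, not Neo's real value

def already_on_chain(tx_hash, valid_until_block, current_height,
                     mempool, recent_cache, db_contains):
    # Step (1): estimate the height at which the tx was created.
    sent_height = valid_until_block - MAX_VALID_UNTIL_BLOCK_INCREMENT
    in_fast_sets = tx_hash in mempool or tx_hash in recent_cache
    if sent_height >= current_height - N:
        # Step (2): a tx created within the last N blocks can only be on
        # chain if its hash is still in the mempool or recent-hash cache.
        return in_fast_sets
    # Step (3): an older tx may predate the cache window, so fall back
    # to the (slow) on-disk database check.
    return in_fast_sets or db_contains(tx_hash)
```

The point is that for a transaction created within the last N blocks, a miss in both the mempool and the cache already proves it is not on chain, so the LevelDB read is skipped entirely.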

Neo Version

  • Neo 3
@Qiao-Jin Qiao-Jin added the discussion Initial issue state - proposed but not yet accepted label Jun 28, 2020
@Qiao-Jin Qiao-Jin changed the title Add cache for recent on-chain transactions to enhance TPS Add cache for recent on-chain transaction hashes to enhance TPS Jun 28, 2020
@Qiao-Jin
Contributor Author

Qiao-Jin commented Jun 29, 2020

Env:
one consensus node + one non-consensus node which sends transactions to the former
win10 system + 12-core computer
N = 10

(TPS benchmark screenshot attached in the original issue)

@shargon
Member

shargon commented Jun 29, 2020

Did you try with a stored bloom filter?

@Qiao-Jin
Contributor Author

Did you try with a stored bloom filter?

I tried with a 100 bit-per-key bloom filter and got the following result:

(TPS benchmark screenshot with bloom filter, attached in the original issue)

TPS with the bloom filter is better than the case without a transaction hash cache. It performs as well as the transaction hash cache at first and then drops gradually. The following figure clearly shows the reason:

(profiling chart of time spent in Blockchain.ContainsTransaction, attached in the original issue)

It can be observed that the time spent in Blockchain.ContainsTransaction differs severely between the two cases. The bloom filter case spends much more time than the transaction hash cache case for the same or fewer transactions per block, especially as more and more transactions get on chain, and the overall time spent in Blockchain.OnNewTransaction actually reaches 15 s within a block period... So the bloom filter is not able to free Blockchain.OnNewTransaction from being a bottleneck, but the transaction hash cache is.
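The bloom-filter behaviour can be modelled with a toy filter (illustrative Python, not LevelDB's actual filter implementation, and the class and parameters below are made up): a negative answer proves the key was never inserted and lets the node skip the disk read, but every positive answer, including false positives, still forces a LevelDB lookup, which grows more frequent as the filter fills.

```python
import hashlib

# Toy Bloom filter: "might_contain" returning False is a guarantee of
# absence (no disk read needed); True only means "check the DB".
class Bloom:
    def __init__(self, bits=1024, hashes=4):
        self.bits, self.hashes = bits, hashes
        self.array = bytearray(bits // 8)

    def _positions(self, item):
        # Derive `hashes` independent bit positions from SHA-256.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.bits

    def add(self, item):
        for p in self._positions(item):
            self.array[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        return all(self.array[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```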

@shargon
Member

shargon commented Jun 29, 2020

How big is your cache? Is it a FIFO cache?

@roman-khimov
Contributor

This is because LevelDB would need

Have you tried testing against RocksDB?

(1) Judge whether ValidUntilBlock - Transaction.MaxValidUntilBlockIncrement >= CurrentSnapshot.Height - N
(2) If satisfied, check whether the specified transaction is within the MemPool or the transaction hash cache. If it is in neither, the transaction is not yet on chain and can be added to the MemPool.
(3) Otherwise, besides the checks in (2), we also need to check LevelDB to see whether the transaction is already on chain.

IIUC this only improves anything when you're actively sending duplicate transactions to the node (because you have to check the DB at step 3 anyway). And you're doing it constantly, even after the transaction was already added into a block. That's probably not something I would expect of a real network, and even less so of a synthetic test, which makes me ask: what transaction mix are you sending to the node?

For a synthetic test I would expect unique transactions to be sent (at least that's what https://github.com/nspcc-dev/neo-bench/ does), and this problem wouldn't exist at all when testing against a solo node. For multiple CNs, as well as for a real network, I would expect some duplicate invs (and even fewer duplicate txs) to be received by the node while a transaction propagates through the network, but almost all of them would arrive while the transaction is in the node's mempool, where the existence check costs almost nothing and has to be done anyway.

So I'm wondering if this test is really representative and we're not trying to optimize some rare corner-case.

@Qiao-Jin
Contributor Author

Qiao-Jin commented Jun 30, 2020

Have you tried testing against RocksDB?

Have not, will test later.

IIUC this only improves anything when you're actively sending duplicate transactions to the node (because you have to check the DB at step 3 anyway)

Why? For transactions fulfilling step (2) we don't need to check step (3)

And you're doing it constantly, even after the transaction was already added into the block. That's probably not something I would expect of a real network and even less so for a synthetic test which makes me ask what is transactions mix you're sending to the node?

Actually, in my test case there are no duplicate transactions; maybe you are mistaken about something. And it is precisely the non-duplicate transactions that need to be checked first in the mempool and then in LevelDB, since they have not been seen before, and checking LevelDB requires disk reads and is time-consuming. That is why ContainsTransaction costs time, and the reason I opened this discussion.

but almost all of them would happen while the transaction is in the mempool of the node where checking for existence costs almost nothing and where the check has to be done anyway.

Once more, you must be mistaken about something: there are no duplicate transactions in my test case.

@Qiao-Jin
Contributor Author

Qiao-Jin commented Jun 30, 2020

How big is your cache? Is it a FIFO cache?

It's a data structure similar to HashSetCache but without bucketCapacity. It contains N entries, where each entry holds the transaction hashes of one historical block. This data is refreshed upon persisting a block, dropping the oldest hash set and creating a new one for the persisting block.

The cache's size depends on the transaction stream. For example, suppose there are on average 200000 transactions in a block and N = 10; then the cache's maximum size will be 200000 × 10 × 32 bytes ≈ 61 MiB.
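A minimal sketch of such a block-bucketed cache (Python for illustration; HashSetCache itself lives in the C# codebase, and the names below are made up): one hash set per block, at most N buckets, with the oldest bucket dropped each time a block is persisted.

```python
from collections import deque

# One hash set per persisted block; deque(maxlen=N) evicts the oldest
# bucket automatically once N buckets exist.
class RecentTxHashCache:
    def __init__(self, n_blocks=10):
        self.buckets = deque(maxlen=n_blocks)

    def on_block_persisted(self, tx_hashes):
        # Refresh on persist: append a new bucket for the new block.
        self.buckets.append(set(tx_hashes))

    def __contains__(self, tx_hash):
        return any(tx_hash in bucket for bucket in self.buckets)
```

Lookup is a handful of in-memory set probes, so membership checks stay O(1) per bucket regardless of chain height.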

@roman-khimov
Contributor

For transactions fulfilling step (2) we don't need to check step (3)

OK, my initial thought was that it's basically a fail-fast route, but upon reviewing condition (1) more carefully I see that it's not, and it might improve something for the general case.

However, I should note that it's only effective if all transactions use MaxValidUntilBlockIncrement to set their ValidUntilBlock and if you're generating them on the fly, so that new transactions are made against the ever-increasing current height of the chain. That's a bit problematic from a testing point of view, because generating (signing) transactions is a resource-intensive task, and doing it on the fly may affect the benchmark result. That's why in neo-bench we generate all test transactions before starting the test; but then they obviously all have the same ValidUntilBlock value, which means this optimization would become ineffective after N blocks. Maybe it's not a big deal, as we should be targeting real-life use cases rather than synthetic benchmarks, but it's still something to consider.

@Qiao-Jin
Contributor Author

Qiao-Jin commented Jun 30, 2020

That's why in neo-bench we generate all test transactions before starting the test

That's also what I did for my test case, including signing all transactions before sending them.

obviously they all then have the same ValidUntilBlock value which means that this optimization would become ineffective after N blocks

My transactions are spread over different ValidUntilBlock values.

However, I should note that it's only effective if all transactions use MaxValidUntilBlockIncrement to set their ValidUntilBlock

Of course people can provide a wrong ValidUntilBlock in their transactions, since transactions are man-made. I have some thoughts on this problem, as follows.

For example, suppose the current height is 180 and N = 10, and we calculate the height when a tx was sent as ValidUntilBlock - Transaction.MaxValidUntilBlockIncrement. Transactions made after height 170 can then be checked via the cache, while the others must be checked in LevelDB. So there are two kinds of wrong ValidUntilBlock:

(1) A higher ValidUntilBlock, i.e. a transaction is made at height 100 but its sender provides a higher ValidUntilBlock to make the consensus node think it was made at height 176. For such cases we could add a field to the Transaction class, PreviousBlockHash, which holds the hash of the latest block at the time the transaction is sent. Then, even though the transaction pretends to have been made at height 176, it cannot provide the correct hash of block 175 and can be dropped.

(2) A lower ValidUntilBlock, i.e. a transaction is made at height 176 but its sender provides a lower ValidUntilBlock to make the consensus node think it was made at height 100. This can happen when attackers want to waste consensus nodes' computing capacity. For such cases we could demand a higher GAS fee for transactions that are not within the last N blocks.
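The defence against case (1) can be sketched as follows. This is a hypothetical check: the PreviousBlockHash field is only proposed above, not part of Neo, and `block_hash_at` is an assumed lookup callback.

```python
# A tx's claimed creation height is derived from ValidUntilBlock; the
# proposed PreviousBlockHash field must match the real hash of the block
# preceding that height. A tx forged at a lower height cannot know the
# hash of the block immediately before the height it pretends to be from.
def claimed_height_is_honest(valid_until_block, previous_block_hash,
                             max_vub_increment, block_hash_at):
    claimed_height = valid_until_block - max_vub_increment
    return previous_block_hash == block_hash_at(claimed_height - 1)
```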

@vncoelho
Member

vncoelho commented Jul 2, 2020

@Qiao-Jin, I like your points. I think that, for the tool you are proposing, a link to the previous block hash might be needed.
We need to analyse the benefits of this proposal against the cost of an extra field.

Since it's a local optimization, why don't we just assign a local variable to the tx when the node receives it, recording that it was received at height x, so that each node has its own value? Based on this value the node uses the cache or not; if the transaction is too old, it will need to search LevelDB directly, right?

@Qiao-Jin
Contributor Author

Qiao-Jin commented Jul 2, 2020

@Qiao-Jin, I like your points. I think that, for the tool you are proposing, a link to the previous block hash might be needed.
We need to analyse the benefits of this proposal against the cost of an extra field.

Since it's a local optimization, why don't we just assign a local variable to the tx when the node receives it, recording that it was received at height x, so that each node has its own value? Based on this value the node uses the cache or not; if the transaction is too old, it will need to search LevelDB directly, right?

Yes, we can assign such a local variable, but I think there can be a bug, as follows, if I am not misunderstanding:

(1) First I send a transaction, and it gets on chain at height 170.

(2) Since I know N = 10, I resend the same transaction at, say, height 190. At that time the transaction will be neither in the mempool nor in the hash cache, so it will be regarded as never seen before and will get on chain again...

Please correct me if I'm wrong 😝

@vncoelho
Member

vncoelho commented Jul 2, 2020

I think it will be invalid during verification.

@Qiao-Jin
Contributor Author

Qiao-Jin commented Jul 3, 2020

I think it will be invalid during verification.

I don't think we check for duplicate hashes during tx verification... And yes, we record known hashes in RemoteNode, which might be useful for this check... but once the connection is lost or the node restarts, that information is gone. So there is still a chance of a duplicate tx getting on chain.

@Qiao-Jin
Contributor Author

Any advice?

@shargon
Member

shargon commented Aug 26, 2020

What about replacing

if (MemPool.ContainsKey(hash)) return true;
with
if (KnownHashes.ContainsKey(hash)) return true;

It is a cache, and it is updated.

@Qiao-Jin
Contributor Author

What about replacing

if (MemPool.ContainsKey(hash)) return true;
with
if (KnownHashes.ContainsKey(hash)) return true;

It is a cache, and it is updated.

In that case every newly arriving transaction would still need to be checked in LevelDB, as it has not been seen before, even though it was created just now. Besides a hash cache, we need to judge the height at which the transaction was created in order to decide whether to search LevelDB when it is not in KnownHashes. Only then is time saved.

@Qiao-Jin
Contributor Author

Qiao-Jin commented Sep 2, 2020

A lower ValidUntilBlock, i.e. a transaction is made at height 176 but its sender provides a lower ValidUntilBlock to make the consensus node think it was made at height 100. This can happen when attackers want to waste consensus nodes' computing capacity. For such cases we could demand a higher GAS fee for transactions that are not within the last N blocks.

Another method is to use node health for such cases: nodes that keep sending transactions with a low height (e.g. < Height - 20; this is just an example, we should settle on the value after careful observation and use the same value for the cached height) would be banned for some period. And we should tell users about this rule.
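The node-health idea can be sketched as a simple strike counter with a temporary ban (illustrative Python; the class, thresholds, and method names are all made up, not part of the neo codebase).

```python
# Count low-height transactions per peer; ban a peer for a while once it
# exceeds a strike threshold. Time is passed in explicitly for testability.
class PeerHealth:
    def __init__(self, max_strikes=20, ban_seconds=600):
        self.max_strikes = max_strikes
        self.ban_seconds = ban_seconds
        self.strikes = {}       # peer -> current strike count
        self.banned_until = {}  # peer -> unix time when the ban expires

    def record_low_height_tx(self, peer, now):
        self.strikes[peer] = self.strikes.get(peer, 0) + 1
        if self.strikes[peer] >= self.max_strikes:
            self.banned_until[peer] = now + self.ban_seconds
            self.strikes[peer] = 0  # reset after issuing a ban

    def is_banned(self, peer, now):
        return self.banned_until.get(peer, 0) > now
```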

@erikzhang
Member

erikzhang commented Dec 28, 2020

I think this can solve the problem.

https://www.eecs.harvard.edu/~michaelm/postscripts/aller2006.pdf

This paper describes dynamic bit reassignment, an approach
that allows the size of the fingerprint to flexibly change with
the load in each hash bucket, thereby reducing the probability
of a false positive.
