
Add cache for recent on-chain transaction hashes to enhance TPS #1734

Open
Qiao-Jin opened this issue Jun 28, 2020 · 18 comments
Labels
discussion Initial issue state - proposed but not yet accepted

Comments

@Qiao-Jin
Contributor

Qiao-Jin commented Jun 28, 2020

Summary or problem description
Currently, when the Blockchain actor receives a transaction, it needs to check whether the transaction is already on chain:

if (ContainsTransaction(transaction.Hash)) return VerifyResult.AlreadyExists;

This piece of code can become very time-consuming as the height increases and more and more transactions are already on chain, because LevelDB may need to search the whole on-disk database for the specified transaction.

Do you have any solution you want to propose?
A cache of recent transaction hashes can solve the problem for most cases: a cache storing the hashes of the most recent on-chain transactions (i.e. transactions within the most recent N blocks, where N is some fixed figure like 10 or 20). Then, upon receiving a transaction:

(1) Judge whether ValidUntilBlock - Transaction.MaxValidUntilBlockIncrement >= CurrentSnapshot.Height - N

(2) If satisfied, check whether the specified transaction is within the MemPool or the transaction hash cache. If it is in neither, the transaction is not yet on chain and can be added to the MemPool.

(3) Otherwise, besides the checks in (2), we also need to check LevelDB to see whether the transaction is already on chain.

Note: N should be chosen so that in most cases newly arriving transactions satisfy condition (2). On the other hand, it should not be too large, for the sake of saving RAM. The structure of the transaction hash cache should be convenient to update as the height increases.
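The three steps above can be sketched as follows. This is an illustrative Python model, not the node's actual C# code: the constant value, the `mempool`/`recent_cache` sets, and the `db_contains` callback are all assumptions standing in for the real components.

```python
# Illustrative sketch of the proposed three-step admission check.
N = 10
MAX_VALID_UNTIL_BLOCK_INCREMENT = 5760  # placeholder constant, not Neo's real value

def already_on_chain(tx_hash, valid_until_block, current_height,
                     mempool, recent_cache, db_contains):
    # Step (1): estimate the height at which the tx was created.
    sent_height = valid_until_block - MAX_VALID_UNTIL_BLOCK_INCREMENT
    in_fast_sets = tx_hash in mempool or tx_hash in recent_cache
    if sent_height >= current_height - N:
        # Step (2): a tx created within the last N blocks can only be on
        # chain if its hash is still in the mempool or recent-hash cache.
        return in_fast_sets
    # Step (3): an older tx may predate the cache window, so fall back
    # to the (slow) on-disk database check.
    return in_fast_sets or db_contains(tx_hash)
```

The point is that for a transaction created within the last N blocks, a miss in both the mempool and the cache already proves it is not on chain, so the LevelDB read is skipped entirely.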

Neo Version

  • Neo 3
@Qiao-Jin Qiao-Jin added the discussion Initial issue state - proposed but not yet accepted label Jun 28, 2020
@Qiao-Jin Qiao-Jin changed the title Add cache for recent on-chain transactions to enhance TPS Add cache for recent on-chain transaction hashes to enhance TPS Jun 28, 2020
@Qiao-Jin
Contributor Author

Qiao-Jin commented Jun 29, 2020

Env:
one consensus node + one non-consensus node which sends transactions to the former
win10 system + 12-core computer
N = 10

(TPS benchmark screenshot attached in the original issue)

@shargon
Member

shargon commented Jun 29, 2020

Did you try with a stored bloom filter?

@Qiao-Jin
Contributor Author

Did you try with a stored bloom filter?

I tried with a 100 bit-per-key bloom filter and got the following result:

(TPS benchmark screenshot with bloom filter, attached in the original issue)

TPS with the bloom filter is better than the case without a transaction hash cache. It performs as well as the transaction hash cache at first and then drops gradually. The following figure clearly shows the reason:

(profiling chart of time spent in Blockchain.ContainsTransaction, attached in the original issue)

It can be observed that the time spent in Blockchain.ContainsTransaction differs severely between the two cases. The bloom filter case spends much more time than the transaction hash cache case for the same or fewer transactions per block, especially as more and more transactions get on chain, and the overall time spent in Blockchain.OnNewTransaction actually reaches 15 s within a block period... So the bloom filter is not able to free Blockchain.OnNewTransaction from being a bottleneck, but the transaction hash cache is.
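The bloom-filter behaviour can be modelled with a toy filter (illustrative Python, not LevelDB's actual filter implementation, and the class and parameters below are made up): a negative answer proves the key was never inserted and lets the node skip the disk read, but every positive answer, including false positives, still forces a LevelDB lookup, which grows more frequent as the filter fills.

```python
import hashlib

# Toy Bloom filter: "might_contain" returning False is a guarantee of
# absence (no disk read needed); True only means "check the DB".
class Bloom:
    def __init__(self, bits=1024, hashes=4):
        self.bits, self.hashes = bits, hashes
        self.array = bytearray(bits // 8)

    def _positions(self, item):
        # Derive `hashes` independent bit positions from SHA-256.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.bits

    def add(self, item):
        for p in self._positions(item):
            self.array[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        return all(self.array[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```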

@shargon
Member

shargon commented Jun 29, 2020

How big is your cache? Is it a FIFO cache?

@roman-khimov
Contributor

This is because LevelDB would need

Have you tried testing against RocksDB?

(1) Judge whether ValidUntilBlock - Transaction.MaxValidUntilBlockIncrement >= CurrentSnapshot.Height - N
(2) If satisfied, check whether the specified transaction is within the MemPool or the transaction hash cache. If it is in neither, the transaction is not yet on chain and can be added to the MemPool.
(3) Otherwise, besides the checks in (2), we also need to check LevelDB to see whether the transaction is already on chain.

IIUC this only improves anything when you're actively sending duplicate transactions to the node (because you have to check the DB at step 3 anyway). And you're doing it constantly, even after the transaction was already added into a block. That's probably not something I would expect of a real network, and even less so of a synthetic test, which makes me ask: what transaction mix are you sending to the node?

For a synthetic test I would expect unique transactions to be sent (at least that's what https://github.com/nspcc-dev/neo-bench/ does), and this problem wouldn't exist at all when testing against a solo node. For multiple CNs, as well as for a real network, I would expect some duplicate invs (and even fewer duplicate txs) to be received by the node while a transaction propagates through the network, but almost all of them would arrive while the transaction is in the node's mempool, where the existence check costs almost nothing and has to be done anyway.

So I'm wondering if this test is really representative and we're not trying to optimize some rare corner-case.

@Qiao-Jin
Contributor Author

Qiao-Jin commented Jun 30, 2020

Have you tried testing against RocksDB?

Have not, will test later.

IIUC this only improves anything when you're actively sending duplicate transactions to the node (because you have to check the DB at step 3 anyway)

Why? For transactions fulfilling step (2) we don't need to check step (3)

And you're doing it constantly, even after the transaction was already added into the block. That's probably not something I would expect of a real network and even less so for a synthetic test which makes me ask what is transactions mix you're sending to the node?

Actually, in my test case there are no duplicate transactions; maybe you are mistaken about something. And it is precisely the non-duplicate transactions that need to be checked first in the mempool and then in LevelDB, since they have not been seen before, and checking LevelDB requires disk reads and is time-consuming. That is why ContainsTransaction costs time, and the reason I opened this discussion.

but almost all of them would happen while the transaction is in the mempool of the node where checking for existence costs almost nothing and where the check has to be done anyway.

Once more, you must be mistaken about something: there are no duplicate transactions in my test case.

@Qiao-Jin
Contributor Author

Qiao-Jin commented Jun 30, 2020

How big is your cache? Is it a FIFO cache?

It's a data structure similar to HashSetCache but without bucketCapacity. It contains N entries, where each entry holds the transaction hashes of one historical block. This data is refreshed upon persisting a block, dropping the oldest hash set and creating a new one for the persisting block.

The cache's size depends on the transaction stream. For example, suppose there are on average 200000 transactions in a block and N = 10; then the cache's maximum size will be 200000 × 10 × 32 bytes ≈ 61 MiB.
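A minimal sketch of such a block-bucketed cache (Python for illustration; HashSetCache itself lives in the C# codebase, and the names below are made up): one hash set per block, at most N buckets, with the oldest bucket dropped each time a block is persisted.

```python
from collections import deque

# One hash set per persisted block; deque(maxlen=N) evicts the oldest
# bucket automatically once N buckets exist.
class RecentTxHashCache:
    def __init__(self, n_blocks=10):
        self.buckets = deque(maxlen=n_blocks)

    def on_block_persisted(self, tx_hashes):
        # Refresh on persist: append a new bucket for the new block.
        self.buckets.append(set(tx_hashes))

    def __contains__(self, tx_hash):
        return any(tx_hash in bucket for bucket in self.buckets)
```

Lookup is a handful of in-memory set probes, so membership checks stay O(1) per bucket regardless of chain height.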

@roman-khimov
Contributor

For transactions fulfilling step (2) we don't need to check step (3)

OK, my initial thought was that it's basically a fail-fast route, but upon reviewing condition (1) more carefully I see that it's not, and it might improve something for the general case.

However, I should note that it's only effective if all transactions use MaxValidUntilBlockIncrement to set their ValidUntilBlock and if you're generating them on the fly, so that new transactions are made against the ever-increasing current height of the chain. That's a bit problematic from a testing point of view, because generating (signing) transactions is a resource-intensive task, and doing it on the fly may affect the benchmark result. That's why in neo-bench we generate all test transactions before starting the test; but then they obviously all have the same ValidUntilBlock value, which means this optimization would become ineffective after N blocks. Maybe it's not a big deal, as we should be targeting real-life use cases rather than synthetic benchmarks, but it's still something to consider.

@Qiao-Jin
Contributor Author

Qiao-Jin commented Jun 30, 2020

That's why in neo-bench we generate all test transactions before starting the test

That's also what I did for my test case, including signing all transactions before sending them.

obviously they all then have the same ValidUntilBlock value which means that this optimization would become ineffective after N blocks

My transactions are spread over different ValidUntilBlock values.

However, I should note that it's only effective if all transactions use MaxValidUntilBlockIncrement to set their ValidUntilBlock

Of course people can provide a wrong ValidUntilBlock in their transactions, since transactions are man-made. I have some thoughts on this problem, as follows.

For example, suppose the current height is 180 and N = 10, and we calculate the height when a tx was sent as ValidUntilBlock - Transaction.MaxValidUntilBlockIncrement. Transactions made after height 170 can then be checked via the cache, while the others must be checked in LevelDB. So there are two kinds of wrong ValidUntilBlock:

(1) A higher ValidUntilBlock, i.e. a transaction is made at height 100 but its sender provides a higher ValidUntilBlock to make the consensus node think it was made at height 176. For such cases we could add a field to the Transaction class, PreviousBlockHash, which holds the hash of the latest block at the time the transaction is sent. Then, even though the transaction pretends to have been made at height 176, it cannot provide the correct hash of block 175 and can be dropped.

(2) A lower ValidUntilBlock, i.e. a transaction is made at height 176 but its sender provides a lower ValidUntilBlock to make the consensus node think it was made at height 100. This can happen when attackers want to waste consensus nodes' computing capacity. For such cases we could demand a higher GAS fee for transactions that are not within the last N blocks.
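The defence against case (1) can be sketched as follows. This is a hypothetical check: the PreviousBlockHash field is only proposed above, not part of Neo, and `block_hash_at` is an assumed lookup callback.

```python
# A tx's claimed creation height is derived from ValidUntilBlock; the
# proposed PreviousBlockHash field must match the real hash of the block
# preceding that height. A tx forged at a lower height cannot know the
# hash of the block immediately before the height it pretends to be from.
def claimed_height_is_honest(valid_until_block, previous_block_hash,
                             max_vub_increment, block_hash_at):
    claimed_height = valid_until_block - max_vub_increment
    return previous_block_hash == block_hash_at(claimed_height - 1)
```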

@vncoelho
Member

vncoelho commented Jul 2, 2020

@Qiao-Jin, I like your points. I think that, for the tool you are proposing, a link to the previous block hash might be needed.
We need to analyse the benefits of this proposal against the cost of an extra field.

Since it's a local optimization, why don't we just assign a local variable to the tx when the node receives it, recording that it was received at height x, so that each node has its own value? Based on this value the node uses the cache or not; if the transaction is too old, it will need to search LevelDB directly, right?

@Qiao-Jin
Contributor Author

Qiao-Jin commented Jul 2, 2020

@Qiao-Jin, I like your points. I think that, for the tool you are proposing, a link to the previous block hash might be needed.
We need to analyse the benefits of this proposal against the cost of an extra field.

Since it's a local optimization, why don't we just assign a local variable to the tx when the node receives it, recording that it was received at height x, so that each node has its own value? Based on this value the node uses the cache or not; if the transaction is too old, it will need to search LevelDB directly, right?

Yes, we can assign such a local variable, but I think there can be a bug, as follows, if I am not misunderstanding:

(1) First I send a transaction, and it gets on chain at height 170.

(2) Since I know N = 10, I resend the same transaction at, say, height 190. At that time the transaction will be neither in the mempool nor in the hash cache, so it will be regarded as never seen before and will get on chain again...

Please correct me if I'm wrong 😝

@vncoelho
Member

vncoelho commented Jul 2, 2020

I think it will be invalid during verification.

@Qiao-Jin
Contributor Author

Qiao-Jin commented Jul 3, 2020

I think it will be invalid during verification.

I don't think we check for duplicate hashes during tx verification... And yes, we record known hashes in RemoteNode, which might be useful for this check... but once the connection is lost or the node restarts, that information is gone. So there is still a chance of a duplicate tx getting on chain.

@Qiao-Jin
Contributor Author

Any advice?

@shargon
Member

shargon commented Aug 26, 2020

What about replacing

if (MemPool.ContainsKey(hash)) return true;
with
if (KnownHashes.ContainsKey(hash)) return true;

It is a cache, and it is updated.

@Qiao-Jin
Contributor Author

What about replacing

if (MemPool.ContainsKey(hash)) return true;
with
if (KnownHashes.ContainsKey(hash)) return true;

It is a cache, and it is updated.

In that case every newly arriving transaction would still need to be checked in LevelDB, as it has not been seen before, even though it was created just now. Besides a hash cache, we need to judge the height at which the transaction was created in order to decide whether to search LevelDB when it is not in KnownHashes. Only then is time saved.

@Qiao-Jin
Contributor Author

Qiao-Jin commented Sep 2, 2020

A lower ValidUntilBlock, i.e. a transaction is made at height 176 but its sender provides a lower ValidUntilBlock to make the consensus node think it was made at height 100. This can happen when attackers want to waste consensus nodes' computing capacity. For such cases we could demand a higher GAS fee for transactions that are not within the last N blocks.

Another method is to use node health for such cases: nodes that keep sending transactions with a low height (e.g. < Height - 20; this is just an example, we should settle on the value after careful observation and use the same value for the cached height) would be banned for some period. And we should tell users about this rule.
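The node-health idea can be sketched as a simple strike counter with a temporary ban (illustrative Python; the class, thresholds, and method names are all made up, not part of the neo codebase).

```python
# Count low-height transactions per peer; ban a peer for a while once it
# exceeds a strike threshold. Time is passed in explicitly for testability.
class PeerHealth:
    def __init__(self, max_strikes=20, ban_seconds=600):
        self.max_strikes = max_strikes
        self.ban_seconds = ban_seconds
        self.strikes = {}       # peer -> current strike count
        self.banned_until = {}  # peer -> unix time when the ban expires

    def record_low_height_tx(self, peer, now):
        self.strikes[peer] = self.strikes.get(peer, 0) + 1
        if self.strikes[peer] >= self.max_strikes:
            self.banned_until[peer] = now + self.ban_seconds
            self.strikes[peer] = 0  # reset after issuing a ban

    def is_banned(self, peer, now):
        return self.banned_until.get(peer, 0) > now
```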

@erikzhang
Member

erikzhang commented Dec 28, 2020

I think this can solve the problem.

https://www.eecs.harvard.edu/~michaelm/postscripts/aller2006.pdf

This paper describes dynamic bit reassignment, an approach
that allows the size of the fingerprint to flexibly change with
the load in each hash bucket, thereby reducing the probability
of a false positive.
