
Add cache for recent on-chain transaction hashes #1886

Conversation

@Qiao-Jin (Contributor) commented Aug 28, 2020

Partially closes #1734.

In this PR a new field, UInt256 LastBlockHash, is added to the Transaction class. It works together with uint ValidUntilBlock to prove that the transaction sender provided an honest ValidUntilBlock, so that others can be sure the transaction was created no earlier than Height = ValidUntilBlock - MaxValidUntilBlockIncrement.

We can use this feature to decide whether a transaction needs to be searched only in the cache to make sure no duplicate transactions go on-chain. This saves the large amount of time otherwise wasted on hard-disk reads during DB searches for incoming transactions.

@erikzhang (Member)

I don’t think it’s worth adding 32 bytes to every transaction.

@Qiao-Jin (Contributor, Author)

> I don’t think it’s worth adding 32 bytes to every transaction.

Or maybe we could add only part of the block hash as proof, say 8 bytes or less?

@erikzhang (Member) commented Aug 28, 2020

If you want to cache the recent transactions, why not just cache the transactions in the latest n blocks?

@Qiao-Jin (Contributor, Author)

> If you want to cache the recent transactions, why not just cache the transactions in the latest n blocks?

I cached transaction hashes instead of the transactions themselves to save RAM.

@@ -30,6 +30,7 @@ public class Transaction : IEquatable<Transaction>, IInventory, IInteroperable
    private long sysfee;
    private long netfee;
    private uint validUntilBlock;
    private UInt256 lastBlockHash = UInt256.Zero;
@Tommo-L (Contributor) commented Aug 29, 2020

If so, why not use Height or SendHeight, HeightAt, ValidAfter?

@Qiao-Jin (Contributor, Author) replied:

Such a value could be forged by the sender to get duplicate transactions on-chain.


namespace Neo.IO.Caching
{
    public class HashCache<T> where T : IEquatable<T>
@Qiao-Jin (Contributor, Author) replied:

The logic is somewhat different: e.g. it doesn't need a bucketCapacity, and it doesn't check Contains upon adding, since that has already been checked in Blockchain.OnTransaction.

@Qiao-Jin (Contributor, Author) commented Sep 1, 2020

After merging #1507, Blockchain.OnTransaction has become the new bottleneck, as each newly arriving transaction must be checked against the hard disk, as explained in the corresponding issue. We do need such a cache of recent on-chain transactions to get rid of avoidable disk reads, even though the price is adding some extra data to Transaction; the benefit far outweighs the cost.

@roman-khimov (Contributor)

The problem here, to me, is that this code makes too many assumptions about the way it'd be tested. It fine-tunes for a very specific transaction flow, and it will surely improve things for that flow, but for a real congested network under heavy load it doesn't look that good. Transactions will arrive late and you'll have to make a full search anyway. Think about other benchmarks too: we're testing nodes with neo-bench, and I can tell you right now that this change won't improve the numbers there in any way, just because we create all (signed) transactions in advance and they differ only in nonce values (and they're perfectly valid!).

@Qiao-Jin (Contributor, Author) commented Sep 2, 2020

> but for a real congested network under big load it doesn't look that good

Please provide test results for this assumption, as in the issue.

> Transactions will arrive late and you'll have to make a full search anyway

How late? 10 blocks, or 20 blocks? Again, you need test results for this assumption. On the other hand, as long as testing gives us an average delay, we can set the cached height accordingly, can't we? As long as most cases are covered, this PR is effective, isn't it?

> I can tell you right now that this change won't improve the number there in any way, just because we're creating all (signed) transactions in advance and they only differ in nonce values (and they're perfectly valid!).

You should read the corresponding issue more carefully. There I described some methods to prevent that situation. It involves other repos and functionality, so it is not shown in this PR.

"In any way"? Please read before saying so, OK?

And once more: you like to ask people for test results as proof in their issues and PRs. Then why do you yourself dismiss others' ideas without any tests, relying only on your own assumptions?

Please, do some experiments. Then you can see how much time is spent on DB checks for incoming transactions.

@erikzhang (Member)

public bool ContainsTransaction(UInt256 hash)
{
    TransactionState state = Transactions.TryGet(hash);
    return state != null;
}

Maybe if we can optimize StoreView.ContainsTransaction(), we can solve all the problems.

@shargon shargon mentioned this pull request Sep 2, 2020
@shargon (Member) commented Sep 2, 2020

> Maybe if we can optimize StoreView.ContainsTransaction(), we can solve all the problems.

Agreed, we can optimize this method and take new benchmarks with RocksDB and LevelDB.

@roman-khimov (Contributor)

> Please, do some experiments.

That's what I've been doing for the past ~2 weeks, going from

[pprof profile screenshot: the HasTransaction rectangle dominates]

(it's that pesky HasTransaction rectangle we're talking about here) to

[pprof profile screenshot: HasTransaction no longer significant]

(which suddenly doesn't really have it)

I can certainly feel your pain in trying to squeeze more juice out of the node. And I'm absolutely sympathetic to any attempt at improving the node's performance; it's not easy. Yet at the same time I think there is a lot of room for improvement without any protocol changes. And any protocol change should be weighed very carefully and applied only if there is a real improvement that justifies it.

@Qiao-Jin (Contributor, Author) commented Sep 2, 2020

> That's what I've been doing for the past ~2 weeks going from

What's the TPS of your test? Is this a test result from neo-go? And what do the times in each block mean?

@roman-khimov (Contributor)

> What's the TPS of your test?

Let me reference our latest publicly verifiable results for now (you can grab neo-bench and get something similar; I'll reiterate here that part of the C# "failures" were due to RPC settings that were not taken into account by neo-bench, which is fixed in subsequent revisions of it): https://medium.com/@neospcc/neo-3-0-0-preview3-nodes-benchmarking-e0f447fdf6af

We have improved since then.

> Is this test result of Neo-Go?

Yes. But the protocol is the same. And the problems we're facing are very similar.

> And what does the times in each block mean?

It's standard Go pprof output (originally SVG, but GitHub doesn't allow attaching it). "42.87s" on an incoming arrow means that amount of time is spent in calls to this function; "0.02s of 42.87s" inside a rectangle means that only 0.02 seconds of the 42.87 are spent inside the function itself, with the rest spent in its internal calls.

@Qiao-Jin (Contributor, Author) commented Sep 2, 2020

I see. And now I know why we understand optimization differently: in your picture HasTransaction is not the bottleneck, but in my local environment it is. That's because locally I have already optimized many of the blocks in your picture. I use Akka remoting to send transaction data in batches via remote actors located in different actor systems (as in #1874), so transaction transmission is not a bottleneck, or at least not the most serious one; I use parallelized transaction verification as shown in #1507, so verification is not the bottleneck; and the logic of this PR is also in my local environment. And yes, the VM consumes time during Persist, and I'm thinking about solutions for that too.

@@ -285,6 +296,8 @@ public virtual VerifyResult VerifyStateDependent(StoreView snapshot, Transaction
{
    if (ValidUntilBlock <= snapshot.Height || ValidUntilBlock > snapshot.Height + MaxValidUntilBlockIncrement)
        return VerifyResult.Expired;
    if (snapshot.GetBlock(LastBlockHash)?.Index != ValidUntilBlock - MaxValidUntilBlockIncrement)
A reviewer (Member) commented:
@Qiao-Jin, shouldn't it be <= instead of !=? If not, could you, please, explain the meaning of this line?

@Qiao-Jin (Contributor, Author) replied Sep 4, 2020:

Why? LastBlockHash is used to judge whether ValidUntilBlock was honestly derived as Height + MaxValidUntilBlockIncrement.

@AnnaShaleva (Member) commented Sep 3, 2020

We also tested this branch against 765c43a with the help of neo-bench. Two configurations were tested under a load of 30 workers: a single C# node and a four-node network. As a result, there is no significant TPS improvement:

Branch          Single node TPS   Four nodes TPS
765c43a         3320.284          503.935
tx_hash_cache   3118.608          533.828

You can find the full benchmark results here.

@Qiao-Jin (Contributor, Author) commented Sep 4, 2020

> We also tested this branch against 765c43a with the help of neo-bench. Two configurations were tested under the load of 30 workers: single C# node and four-nodes network. As a result, there's no significant TPS improvements:
>
> Branch          Single node TPS   Four nodes TPS
> 765c43a         3320.284          503.935
> tx_hash_cache   3118.608          533.828
>
> The full benchmark results you can find here.

Please refer to this, and to my test results in the corresponding issue. This will become a bottleneck once tx verification and message transmission are optimized.

@Qiao-Jin (Contributor, Author) commented Sep 4, 2020

@AnnaShaleva @roman-khimov

1. What does it solve?

It avoids a large number of preventable DB searches for incoming transactions. Upon each incoming transaction, we search the mempool and then the DB to make sure the transaction cannot go on-chain more than once. But this strategy is problematic.

For each newly created incoming transaction, we eventually have to search LevelDB/RocksDB down to the last level. What does this mean? A lot of time is spent on hard-disk reads for every transaction. Worse, it becomes a disaster as the LevelDB data grows: imagine running this strategy once the DB has grown to many GB over time. More seriously, the upcoming state-root logic will produce much more DB data than we have now. Think about it.

2. Does it make "too many assumptions"?

I don't think so. First, tell me: what is TPS? Doesn't it mean transactions per second? So aren't transactions the center of that definition? What should we care about most? Shouldn't it include things like transaction transmission, transaction verification, transaction persisting, etc.? So tell me, why does checking transactions a node has not seen before involve "too many assumptions"? Every incoming transaction has not been seen before by a node, unless the node created it itself.

3. Is this PR the whole logic?

No. Some functionality has not been included because I think further discussion is needed, e.g. how to prevent people from repeatedly sending transactions with a low ValidUntilBlock. Sadly, so far I am the only one to come up with ideas on this topic (I also don't like strategies such as charging more GAS for the DB check, and am thinking about better solutions). I am waiting for other ideas, and one reason I opened this PR is to attract more people's attention to finding better solutions.

@erikzhang (Member)

@Qiao-Jin Do you think #1910 solves this problem?

@erikzhang closed this Sep 11, 2020
Linked issue: Add cache for recent on-chain transaction hashes to enhance TPS
6 participants