[suggestion] P2P module Enhancement #2862
I have a general question here: how long exactly does it take to sync 1000 txs across the entire Neo network? How about 10,000 txs? What if attackers send each consensus node 1000 different valid txs, from different accounts and different IP addresses? Until we get another sound solution, I vote for packing transaction bodies into the consensus message and, in the meanwhile, freezing the mempool when it is full.
Sorry, I wrote a wrong number here. The attack speed should be 10k tx/s. At 5k tx/s the network will work just fine; if the speed is much higher, the network will degrade further. I'll send the result to you privately. Anyway, this problem deserves our attention.
A bit more open discussion really can be helpful in this case, so I'll reiterate some points discussed previously.

First, I don't think you can avoid having an inconsistent (unsynchronized wrt mempool contents) network; any throttling scheme can be subverted with enough senders.

Sending transaction contents with PrepareRequest is a solution; what I don't like about it is the associated cost. Correct me if I'm wrong, but I think dBFT tried to avoid doing exactly that since its inception. Testnet has a MaxBlockSize of 2M, and most of this volume will be transactions, so if we're to try sending such a large PrepareRequest, it will take about 2s for every hop assuming 10 Mbit/s. Of course you can have a somewhat better connection, but you should also account for the additional network stress from the transaction flow; either way, I'd say this volume will be noticeable. Even in the ideal case (synchronized mempools!) it'll cause some measurable block delay.

So if we're to reject the idea of including transactions into PrepareRequest (but hey, convince me it's fine!), we have two basic routes: either rely on CVs to handle the situation (they're a part of the normal protocol anyway) and try to make the next proposal more suitable for everyone, or try to avoid them.

In that sense, #2058 (implemented in NeoGo), which tries to optimize the decision using local data only, can be extended to take into account data from other nodes in the way described in #2058 (comment) (hi, @vncoelho!). All nodes requesting a CV send a reason for it, and if that reason is TxNotFound, then they at least have got a proposal and they really can provide a bitmask of tx presence for that proposal. The problem is that if the network is being stressed in a way that causes complete mempool replacement in a time comparable with the regular block time, the transaction data may be outdated by the time the CV is to happen. That is, the new primary may have none of the old transactions in its mempool when it's to make a proposal. This, btw, may defeat the original assumption of #2058 as well, rendering it ineffective.

We can try solving this problem before a CV happens, though; this requires getting transactions in the same round in some more reliable manner (while avoiding a full copy in PrepareRequest). CNs missing transactions from a proposal already request them from the nearest neighbors, which in many cases leads to successful block acceptance. But in case it doesn't, we may try sending a new broadcasted RecoverTx message to the primary before the CV timer expires. It can either be parameter-less or contain the same bitmask as mentioned above (so it'll be small). This should be a high-priority message (so that it'd be more likely to propagate in time) that is processed by the primary only. The primary can wait for an f+1 set of these messages and then broadcast either a full set of transactions in a new message or just those that are missing on other CNs. There can also be some heuristic like not sending RecoverTx if we're missing less than 20% of the proposal's transactions, or if the proposal contains less than 20% of MaxTransactionsPerBlock (the numbers are pretty arbitrary here; this needs to be tested, of course). RecoverTx may be sent after the initial attempt to get transactions from neighbors: we send requests and wait 1/3 of the CV timer, and if the transactions are still not there, we send RecoverTx and wait for the remaining 2/3 of the same timer.

This may keep the performance we have in the optimistic case (a highly synchronized network) while allowing quicker recovery in case there is a problem. Even if we have some problems collecting transactions at view 0, the next one will have extended timers and thus a higher chance to succeed.
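A minimal sketch of the backup-side part of this flow. Everything here is an assumption drawn from the prose above (the RecoverTx message, the 20% threshold, the 1/3–2/3 timer split); none of it is an existing neo/NeoGo API:

```go
package main

import (
	"fmt"
	"time"
)

// RecoverTx is a hypothetical high-priority consensus payload: a backup
// tells the primary which transactions of the current proposal it holds.
type RecoverTx struct {
	ViewNumber byte
	HaveTx     []bool // bitmask over the PrepareRequest transaction list
}

// shouldRecover applies the (arbitrary, to-be-tuned) heuristic from the
// comment above: only escalate if at least 20% of the proposal is missing.
func shouldRecover(have []bool) bool {
	missing := 0
	for _, ok := range have {
		if !ok {
			missing++
		}
	}
	return len(have) > 0 && missing*5 >= len(have)
}

// backupRecovery models the two-stage wait: ask neighbors first, then, if
// transactions are still missing after 1/3 of the CV timer, broadcast
// RecoverTx and let the remaining 2/3 of the timer run before a view change.
func backupRecovery(view byte, have []bool, cvTimer time.Duration,
	askNeighbors func(), broadcast func(RecoverTx), stillMissing func() bool) {
	if !shouldRecover(have) {
		return
	}
	askNeighbors()
	time.Sleep(cvTimer / 3)
	if stillMissing() {
		broadcast(RecoverTx{ViewNumber: view, HaveTx: have})
	}
}

func main() {
	have := []bool{true, false, false, true} // 50% missing -> escalate
	backupRecovery(0, have, 3*time.Second,
		func() { fmt.Println("getdata to neighbors") },
		func(m RecoverTx) { fmt.Printf("broadcasting RecoverTx for view %d\n", m.ViewNumber) },
		func() bool { return true })
}
```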
Thanks for pinging me here.

This may be a good strategy for these edge cases, @roman-khimov. We can also track these messages on the backups and extend it in a similar way we do with the timers.

Regarding my worry on the Conflicts Attribute PR: in fact, that can similarly happen nowadays with the ascending-priority logic, as you highlighted ("causing complete mempool replacements"). The difference, however, is that the attacker will eventually need to pay for those.
As @roman-khimov has stated, packing transaction bodies would delay consensus message processing. Thus, I propose a new solution below that does not require any change to the consensus.

The main reason the attack works is that attackers refresh CNs' mempools with ascending-fee transactions after the primary issues a PrepareRequest. My new solution would therefore target the mempool itself; focusing on the mempool would pose a minimal impact on existing Neo protocols.
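To make the attack mechanics concrete, here is a toy sketch (not Neo's actual mempool code) of fee-priority admission: when the pool is full, a new transaction only has to outbid the cheapest pooled one, so a stream of ascending-fee transactions steadily evicts everything that was there before. All numbers are illustrative:

```go
package main

import (
	"container/heap"
	"fmt"
)

// txHeap is a min-heap of network fees: the cheapest pooled tx sits on top
// and is the first to be evicted when a better-paying tx arrives.
type txHeap []int64

func (h txHeap) Len() int           { return len(h) }
func (h txHeap) Less(i, j int) bool { return h[i] < h[j] }
func (h txHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *txHeap) Push(x any)        { *h = append(*h, x.(int64)) }
func (h *txHeap) Pop() any {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// add admits a transaction: free entry while the pool has room, otherwise
// the new tx must only beat the cheapest pooled one, which gets evicted.
func add(pool *txHeap, capacity int, fee int64) {
	if pool.Len() < capacity {
		heap.Push(pool, fee)
		return
	}
	if fee > (*pool)[0] {
		heap.Pop(pool)
		heap.Push(pool, fee)
	}
}

func main() {
	pool := &txHeap{}
	for fee := int64(1); fee <= 5; fee++ { // honest transactions
		add(pool, 5, fee)
	}
	for fee := int64(100); fee < 105; fee++ { // ascending-fee flood
		add(pool, 5, fee)
	}
	fmt.Println(*pool) // none of the original transactions survive
}
```

Each flood transaction only needs to marginally outbid the current minimum, which is what makes the churn cheap to sustain between PrepareRequest and block acceptance.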
Intuitively I agree, but it strikes me as a problem that will remain in some form as long as transaction selection remains within the purview of CNs. As @roman-khimov points out, unsynchronized mempools can't really be avoided. I'd like to propose that we give some consideration to what is known in the Ethereum ecosystem as proposer-builder separation. Atomic transaction batches (or even full block bodies) can be constructed by trusted (e.g. non-CN committee members) or even untrusted nodes (à la dBFT 3.0). These can be validated and relayed to (or possibly by) CNs, who simply select the most profitable batches to use in proposed blocks. I'm skirting over a lot of details, and a lot of the research on this topic is designed with Ethereum in mind, so it would need a number of adjustments to work in our case. But there are various potential implications of this architectural adjustment that I think at least merit further investigation.

There's more, and I don't claim to be an expert, but I think there is some interesting potential for us here.
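For illustration only, a sketch of what CN-side selection could look like under such a model. The Batch type and the profit rule are assumptions made up for this example, not an existing Neo design:

```go
package main

import (
	"fmt"
	"sort"
)

// Batch is a hypothetical pre-built, pre-validated transaction batch
// submitted by an external builder; the CN only ranks and picks batches.
type Batch struct {
	Builder  string
	TotalFee int64
	Txs      []string // transaction hashes
}

// selectBatches greedily picks the most profitable batches that still fit
// under the per-block transaction limit.
func selectBatches(batches []Batch, maxTxs int) []Batch {
	sort.Slice(batches, func(i, j int) bool {
		return batches[i].TotalFee > batches[j].TotalFee
	})
	var picked []Batch
	used := 0
	for _, b := range batches {
		if used+len(b.Txs) > maxTxs {
			continue
		}
		picked = append(picked, b)
		used += len(b.Txs)
	}
	return picked
}

func main() {
	batches := []Batch{
		{"builder-a", 30, []string{"tx1", "tx2"}},
		{"builder-b", 50, []string{"tx3", "tx4", "tx5"}},
		{"builder-c", 10, []string{"tx6"}},
	}
	fmt.Println(selectBatches(batches, 4)) // picks builder-b's batch, then builder-c's
}
```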
@Liaojinghui, I see your steps as a possibility, yes. But I believe quite an amount of effort and testing is required for us to make that work safely as well.
That is a great possibility for the future, @EdgeDLT. We can have the additional speaker as an untrusted node doing such kinds of things, which, in my opinion, makes the decentralization of the consensus process fairer.
Economically speaking, this mempool issue has been here since the free-txs era. In my opinion, the emergency txs pointed out by @Liaojinghui solve the problem, because the committee (from my perspective) should be a real-time online system; they should not discuss and take time to resolve such situations. In principle, it is GREAT to have the mempool full of txs! There are lots of fees to be collected.
In the case when the mempool is already full, I think it is OK to halt accepting new transactions. Liveness and stability would definitely be more important than processing new transactions when we already have thousands unprocessed in the cache. Not elegant, though, but I kind of like putting stricter limitations on the system.
I have to remind you that not all of the txs in the mempool can be valid on chain, even if they can fit in the mempool now.
I did not mean system fees, @dusmart.
A much more reasonable and logical (though of course more complex) mempool strategy, then. For instance: at most 500 txs valid until the next block, clear the rest.
@Liaojinghui, while this may seem like fair behavior, I doubt it's viable.

However, some economic adjustments can help in avoiding transaction flood. It'll still be possible to cause some mempool difference with carefully planted nodes in different parts of the network, but at least there won't be a lot of traffic on the network, so it'll be easier to deal with. IIRC, now you only need a bigger network fee than the cheapest pooled transaction to push your way into a full mempool. It can be done before reaching mempool capacity as well: pool 50% full -> require a 10% higher fee, and so on.
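A minimal sketch of such occupancy-based admission pricing. The 50% -> +10% pair comes from the comment above; the other tiers are made-up examples that would need tuning (and knobs for private networks):

```go
package main

import "fmt"

// requiredFee scales the minimum admission fee with mempool occupancy:
// the fuller the pool, the more a new transaction must pay to get in.
// All thresholds and multipliers here are illustrative assumptions.
func requiredFee(baseFee int64, poolLen, poolCap int) int64 {
	switch occupancy := 100 * poolLen / poolCap; {
	case occupancy >= 90:
		return baseFee * 2 // 90% full -> double the fee
	case occupancy >= 75:
		return baseFee * 3 / 2 // 75% full -> +50%
	case occupancy >= 50:
		return baseFee * 11 / 10 // 50% full -> +10%, as suggested above
	default:
		return baseFee
	}
}

func main() {
	for _, used := range []int{10, 55, 80, 95} {
		fmt.Printf("%d%% full: min fee %d\n", used, requiredFee(1000, used, 100))
	}
}
```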
That's true, btw. This economic disincentivization should be easier to start with; maybe it'll even be enough for public networks (but some knobs will be needed for tests/private installations).
That's an interesting thing in general, but it's more about the long-standing issue of missing economic motivation for running a Neo node (and some other things, like potentially improving the MEV situation). The problems here are:
We've got the 5K setting on testnet already. Unrelated, but as I've said previously, mainnet should use the same 5K setting as well (it also somewhat mitigates the low-VUB problem and helps with transaction spikes in general: 500 tx per 15 s is 33 TPS, 5000 tx is 333 TPS, easy as that).
@roman-khimov I basically agree with your solution and consider it not much different from mine. When I said "freeze the mempool" (well, I was just too lazy to come up with a proper name), I meant: stop processing low-fee txs and require higher tx fees, even 10 times more. And the mempool difference is not a big deal when we already have 50k txs in the cache; users will not see Neo as broken, users will see Neo as
The mempool varies across nodes, while acceptance criteria should not. This link to mempool capacity may cause other problems.
I've thought about it. Yes, we still can't have perfect synchronization, and different parts of the network will have somewhat different sets of transactions, potentially with different average fee levels. It won't be perfect, it can't be, but it can work well enough for the purpose of preventing the flood. At least we can try, and this attempt won't cost a lot.
Summary or problem description
If bad guys send lots of valid transactions on chain in priority-ascending order at more than ~~1000~~ 10000 transactions per second, consensus can hardly be reached for a very long time.

If the attack details are required, please contact @superboyiii; he helped us identify this problem. We've tried our best to solve this but didn't find a good solution, therefore we filed this enhancement request to ask for core developers' solutions.
TxNotFound will be a frequent reason for view changes, so we'd better solve this issue first.

Do you have any solution you want to propose?
We have discussed the proposal with Roman Khimov privately and he didn't like these two solutions. Roman had other methods for mitigation.
Neo Version
Where in the software does this update apply to?