Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't charge extra for trie deletions outside of contract execution #11507

Merged
merged 5 commits into from
Jun 6, 2024

Conversation

jancionear
Copy link
Contributor

To limit the amount of storage proof generated during chunk application we calculate the upper bound estimation of how big the storage proof will be, and stop executing receipts when this estimated size gets to big. When estimating we assume that every trie removals generates 2000 bytes of storage proof, because this is the maximum size that a malicious attacker could generate (#11069, #10890).

This estimation was meant to limit the size of proof generated while executing receipts, but currently it also applies to other trie removals performed by the runtime, for example when removing receipts from the delayed receipt queue. This is really wasteful - removing 1000 receipts would cause the estimation to jump by 2MB, hitting the soft limit. We don't really need to charge this much for internal operations performed by the runtime, they aren't malicious. Let's change is so that only contracts are charged extra for removals. This will avoid the extra big estimation caused by normal queue manipulation.

Refs: https://near.zulipchat.com/#narrow/stream/308695-nearone.2Fprivate/topic/Large.20number.20of.20delayed.20receipts.20in.20statelessnet/near/442878068

@jancionear jancionear added the A-stateless-validation Area: stateless validation label Jun 6, 2024
@jancionear jancionear requested a review from a team as a code owner June 6, 2024 13:19
@jancionear
Copy link
Contributor Author

Theoretically this is a protocol change, and should have a separate protocol version, but I'm not sure if it's really needed. We are able to restart all nodes of statelessnet at once, right? Upgrading all of them at the same time should be enough to get things going.

Copy link
Member

@Longarithm Longarithm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enabling/disabling a mode is an anti-pattern to me, because it requires mutability and more care from the struct owner - no one is protected from accidentally calling it in some other place and breaking costs. ChunkCache also should not have been implemented this way :(

@Ekleog-NEAR inside a contract, is there a way to access keys other than ContractData?
If not, I would just check a trie key type. Intuitively, the cost of removing all other key types is by several orders of magnitude higher, so they shouldn't be an issue.

Theoretically this is a protocol change, and should have a separate protocol version, but I'm not sure if it's really needed. We are able to restart all nodes of statelessnet at once, right? Upgrading all of them at the same time should be enough to get things going.

Yeah, I would prefer not adding new protocol version, it means more ugly code which we don't really need.

Copy link

codecov bot commented Jun 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.32%. Comparing base (97d4605) to head (6bb54e0).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #11507      +/-   ##
==========================================
+ Coverage   71.31%   71.32%   +0.01%     
==========================================
  Files         787      787              
  Lines      159389   159391       +2     
  Branches   159389   159391       +2     
==========================================
+ Hits       113667   113685      +18     
+ Misses      40782    40773       -9     
+ Partials     4940     4933       -7     
Flag Coverage Δ
backward-compatibility 0.23% <0.00%> (ø)
db-migration 0.23% <0.00%> (ø)
genesis-check 1.36% <0.00%> (-0.01%) ⬇️
integration-tests 37.48% <100.00%> (-0.01%) ⬇️
linux 68.75% <25.00%> (+<0.01%) ⬆️
linux-nightly 70.84% <100.00%> (+0.01%) ⬆️
macos 52.37% <25.00%> (+<0.01%) ⬆️
pytests 1.58% <0.00%> (-0.01%) ⬇️
sanity-checks 1.38% <0.00%> (-0.01%) ⬇️
unittests 66.04% <25.00%> (+<0.01%) ⬆️
upgradability 0.28% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jancionear
Copy link
Contributor Author

jancionear commented Jun 6, 2024

Enabling/disabling a mode is an anti-pattern to me, because it requires mutability and more care from the struct owner - no one is protected from accidentally calling it in some other place and breaking costs. ChunkCache also should not have been implemented this way :(

Yeah this is a good point. It was easier for me to think about as an on/off switch that is turned to on during contract execution, but it's true that it's pretty error prone.

Changed it to record only when removing TrieKey::ContractData. I hope that contracts can only do that.

@nagisa
Copy link
Collaborator

nagisa commented Jun 6, 2024

During the execution of the contract RocksDB does get accessed with ContractData only, to the best of my knowledge. Not even receipts and such reach the storage until the receipts get seen by the transaction runtime (which happens after the execution of the contract.)

But I cannot claim that the contract is unable to construct an arbitrary list of tasks for the transaction runtime that would result in deletions that we don't want to see.

For the reference you can check the impl External for RuntimeExt to see the entirety of the interface contracts have access to during the execution of the contract code.

@tayfunelmas
Copy link
Contributor

LGTM. One sanity check though: is ContractData the only other key type we are interested in? are transfers also executed through contract data? how about account data deletions such as code, are they all considered in the category of Contract Data?

Copy link
Contributor

@tayfunelmas tayfunelmas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added one question about ContractData. otherwise LGTM.

@jancionear
Copy link
Contributor Author

is ContractData the only other key type we are interested in? are transfers also executed through contract data? how about account data deletions such as code, are they all considered in the category of Contract Data?

We are interested in ContractData because contracts are allowed to insert/delete arbitrary keys of this type. Other key types are manipulated by the runtime in a predefined way, so it's very hard (impossible?) for a malicious user to cause the extra-large deletes that we're afraid of.

@jancionear
Copy link
Contributor Author

jancionear commented Jun 6, 2024

I ran a manual test.
I set up 1 local node with 6 shards and started locust ft.py loadtest with 1000 users (ramping up at 1 user/second)

First on master (97d4605):
Transactions per second increased until the maximum throughput was achieved, at which point all the transactions started getting rejected because the shard is congested. Upper bound of storage proof increased to the maximum limit (2.4-3.2 MB bucket). The throughput dropped dramatically, to about 20TGas/s:

image

Then I ran the same test on this branch:
TPS increased until it reached the limit, at which point some transactions started getting rejected due to congestion, and the throughput dropped a bit, but shard remained functional. Delayed receipts started appearing, but upper bound of storage proof didn't explode, it stayed under 1.6MB the whole time:

image

It looks like the change helps with handling congestion, or at least it doesn't hurt ;)

@jancionear jancionear added this pull request to the merge queue Jun 6, 2024
Merged via the queue into near:master with commit 30bb67f Jun 6, 2024
28 of 29 checks passed
@jancionear jancionear deleted the selective branch June 6, 2024 19:10
@tayfunelmas
Copy link
Contributor

is ContractData the only other key type we are interested in? are transfers also executed through contract data? how about account data deletions such as code, are they all considered in the category of Contract Data?

We are interested in ContractData because contracts are allowed to insert/delete arbitrary keys of this type. Other key types are manipulated by the runtime in a predefined way, so it's very hard (impossible?) for a malicious user to cause the extra-large deletes that we're afraid of.

I was actually thinking that we are entirely removing the recording, but in fact is it just the removing of the 2M-extra in the calculation, so restricting to ContractData is fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-stateless-validation Area: stateless validation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants