
rpc v2: backpressure chainHead_v1_storage #5741

Merged
merged 37 commits into master on Oct 3, 2024
Conversation

niklasad1
Member

@niklasad1 niklasad1 commented Sep 17, 2024

Close #5589

This PR makes it possible for `rpc_v2::Storage::query_iter_paginated` to be "backpressured". This is achieved by having a channel where results are sent back; when this channel is full, the iteration is paused.

The chainHead_follow subscription has an internal channel which doesn't represent the actual connection, and it is set to a very small capacity (16). Recall that by default the JSON-RPC server has a dedicated buffer of 64 messages for each connection.

Benchmarks using subxt on localhost:

  • Iterate over 10 accounts on westend-dev -> ~2-3x faster
  • Fetch 1024 storage values (i.e., not descendant values) -> ~50x faster
  • Fetch 1024 descendant values -> ~500x faster

The reason for this speedup, as Josep explained in the issue, is that previously one was only allowed to query five storage items per call, and clients had to make lots of calls to drive the iteration forward.

@paritytech-cicd-pr

The CI pipeline was cancelled due to the failure of one of the required jobs.
Job name: test-linux-stable 2/3
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/7365923

@niklasad1 niklasad1 changed the title WIP: rpc v2: rely backpressure Storage::query_iter WIP: rpc v2: rely backpressure for chainHead_v1_storage Sep 18, 2024
@niklasad1 niklasad1 changed the title WIP: rpc v2: rely backpressure for chainHead_v1_storage rpc v2: rely backpressure for chainHead_v1_storage Sep 19, 2024
@niklasad1 niklasad1 marked this pull request as ready for review September 19, 2024 09:09
@niklasad1 niklasad1 added the T3-RPC_API This PR/Issue is related to RPC APIs. label Sep 19, 2024
impl OperationState {
    pub fn stop(&mut self) {
        if !self.stop.is_stopped() {
            self.operations.lock().remove(&self.operation_id);
Member Author

@niklasad1 niklasad1 Sep 19, 2024


It's annoying to lock the mutex here instead of using an AtomicBool, but this is needed to get an async notification when the operation is stopped.

I couldn't find a better way to do this.

Contributor

I think this is acceptable since operationStop shouldn't happen too often, if at all. We are also acquiring the mutex when dropping RegisteredOperations to clean up the tracking of operation IDs.

Contributor

@lexnv lexnv left a comment

LGTM! Thanks for tackling this 🙏

Tiny nits around the return value of the chainHead_continue method and an open question about the number of reserved operations that should discard items.

Co-authored-by: James Wilson <james@jsdw.me>
Contributor

@jsdw jsdw left a comment

LGTM and a nice general improvement in the code!

@niklasad1 niklasad1 requested a review from a team October 1, 2024 18:06
@niklasad1 niklasad1 added this pull request to the merge queue Oct 2, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 2, 2024
@niklasad1 niklasad1 added this pull request to the merge queue Oct 3, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 3, 2024
@niklasad1 niklasad1 added this pull request to the merge queue Oct 3, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 3, 2024
@niklasad1 niklasad1 added this pull request to the merge queue Oct 3, 2024
Merged via the queue into master with commit 3313163 Oct 3, 2024
217 checks passed
@niklasad1 niklasad1 deleted the na-fix-rpc-storage-iter branch October 3, 2024 18:59
@carlosala

carlosala commented Oct 17, 2024

Hey! Just circling back on this: do you think it is feasible to backport this PR to stable2407 and stable2409? It is actually a huge perf improvement.
Thanks!

cc @niklasad1 @jsdw @lexnv

niklasad1 added a commit that referenced this pull request Oct 17, 2024
Close #5589

This PR makes it possible for `rpc_v2::Storage::query_iter_paginated` to
be "backpressured". This is achieved by having a channel where results
are sent back; when this channel is full, the iteration is paused.

The chainHead_follow subscription has an internal channel which doesn't
represent the actual connection, and it is set to a very small capacity
(16). Recall that by default the JSON-RPC server has a dedicated buffer
of 64 messages for each connection.

- Because `archive_storage` also depends on
`rpc_v2::Storage::query_iter_paginated`, I had to tweak the method to
support limits as well. The reason is that archive_storage won't get
backpressured properly because it's not a subscription. (It would be
much easier if it were a subscription in the rpc v2 spec, because there
is nothing against querying a huge number of storage keys.)
- `query_iter_paginated` doesn't necessarily return the storage "in
order": `query_iter_paginated(vec![("key1", hash), ("key2", value)], ...)`
could return the results in arbitrary order because it's wrapped in
FuturesUnordered, but I could change that if we want to process it
in order (it's slower).
- There is technically no limit on the number of storage queries in each
`chainHead_v1_storage` call other than the rpc max message limit, which
is 10MB, and only a maximum of 16 `chainHead_v1_x` calls are allowed
concurrently (this should be fine).

- Iterate over 10 accounts on westend-dev -> ~2-3x faster
- Fetch 1024 storage values (i.e., not descendant values) -> ~50x faster
- Fetch 1024 descendant values -> ~500x faster

The reason for this speedup, as Josep explained in the issue, is that
previously one was only allowed to query five storage items per call,
and clients had to make lots of calls to drive the iteration forward.

---------

Co-authored-by: command-bot <>
Co-authored-by: James Wilson <james@jsdw.me>
@niklasad1
Member Author

niklasad1 commented Oct 17, 2024

Hey! Just circling back on this, do you think it is feasible to backport this PR into stable2407 and stable2409. It is actually a huge perf improvement.
Thanks!

I had a look: stable2409 looks possible to backport and I have created a PR for it, but stable2407 requires a bunch of other PRs that are not backported. Hopefully stable2409 is sufficient?

@carlosala

It'd be great to backport all recent PRs around rpc server v2 to both supported versions. This will ensure that nodes get those fixes faster. I leave it to you to decide 👍🏻

niklasad1 added a commit that referenced this pull request Oct 18, 2024
@niklasad1
Member Author

Yeah, ok, I had another look and it was possible. Opened #6114 as well, phew :)

@carlosala

yeah, cool!

Labels
T3-RPC_API This PR/Issue is related to RPC APIs.
Development

Successfully merging this pull request may close these issues.

JSON-RPC: performance problem with chainHead_v1_storage queries using descendantValues
5 participants