etcdserver: request is too large and grpc: received message larger than max #7315

Closed
Tracked by #6640
lmatz opened this issue Jan 11, 2023 · 6 comments
Labels: found-by-longevity-test, help wanted, type/bug

Comments

@lmatz
Contributor

lmatz commented Jan 11, 2023

Describe the bug

psql:./nexmark/queries/q15.sql:17: ERROR:  QueryError: internal error: MetadataModel error: Meta store error: internal error: grpc request error: status: ResourceExhausted, message: "grpc: received message larger than max (2182088 vs. 2097152)", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"} }
CREATE_MATERIALIZED_VIEW
CREATE_MATERIALIZED_VIEW
psql:./nexmark/queries/q5.sql:35: ERROR:  QueryError: internal error: MetadataModel error: Meta store error: internal error: grpc request error: status: InvalidArgument, message: "etcdserver: request is too large", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"} }
psql:./nexmark/queries/q16.sql:19: ERROR:  QueryError: internal error: MetadataModel error: Meta store error: internal error: grpc request error: status: ResourceExhausted, message: "grpc: received message larger than max (2417002 vs. 2097152)", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"} }
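
For a quick read of how far over the limit these requests were, the size pairs in the gRPC messages can be extracted with a few lines of Python. This is only an illustrative sketch: the helper name `overage` and the regular expression are assumptions made for this note, not part of RisingWave or etcd. The three messages are copied from the log lines above.

```python
import re

# Matches the size pair that gRPC reports, e.g. "(2182088 vs. 2097152)".
SIZE_PAIR = re.compile(r"larger than max \((\d+) vs\. (\d+)\)")


def overage(message: str):
    """Return (received, limit, excess) in bytes, or None if no size pair is present."""
    match = SIZE_PAIR.search(message)
    if match is None:
        return None
    received, limit = int(match.group(1)), int(match.group(2))
    return received, limit, received - limit


# Messages taken from the q15, q5 and q16 errors in this issue.
messages = [
    "grpc: received message larger than max (2182088 vs. 2097152)",
    "etcdserver: request is too large",
    "grpc: received message larger than max (2417002 vs. 2097152)",
]

for msg in messages:
    print(overage(msg))
```

The limit in both gRPC errors, 2097152 bytes, is exactly 2 MiB. The etcdserver error for q5 carries no size, so the helper returns None for it.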

To Reproduce

No response

Expected behavior

No response

Additional context

namespace: rwc-3-longevity-20230111-060844
buildkite: https://buildkite.com/risingwave-test/longevity-test/builds/289#01859f71-d994-40ea-a5ab-c4791805061e

Related: #6726

@lmatz lmatz added the type/bug Something isn't working label Jan 11, 2023
@github-actions github-actions bot added this to the release-0.1.16 milestone Jan 11, 2023
@lmatz lmatz changed the title from "etcdserver: request is too large" to "etcdserver: request is too large and grpc: received message larger than max" Jan 11, 2023
@zwang28
Contributor

zwang28 commented Jan 11, 2023

It seems that creating the MV issues a write to etcd that is too large.
We've customized the etcd config in risedev.
We can at least enlarge --max-request-bytes in the longevity test.
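
A possible way to see why two different errors show up is sketched below. It assumes etcd's default --max-request-bytes of 1572864 bytes (1.5 MiB) and assumes the server's gRPC receive limit is that value plus 512 KiB of headroom; the sum matches the 2097152 reported in the logs, but both constants and the function name `classify` are assumptions for illustration, not taken from this thread.

```python
# Assumed etcd default for --max-request-bytes (1.5 MiB).
DEFAULT_MAX_REQUEST_BYTES = 1_572_864
# Assumed extra headroom added on top for the gRPC receive limit.
GRPC_OVERHEAD_BYTES = 512 * 1024


def classify(request_bytes: int, max_request_bytes: int = DEFAULT_MAX_REQUEST_BYTES) -> str:
    """Guess which error, if any, a write of this size would hit."""
    grpc_limit = max_request_bytes + GRPC_OVERHEAD_BYTES
    if request_bytes > grpc_limit:
        # Rejected by the gRPC layer before etcd looks at the request.
        return "grpc: received message larger than max"
    if request_bytes > max_request_bytes:
        # Passes gRPC but exceeds etcd's own request limit.
        return "etcdserver: request is too large"
    return "ok"


print(classify(2_182_088))                                      # size seen for q15
print(classify(1_600_000))                                      # hypothetical size between the two limits
print(classify(2_182_088, max_request_bytes=10 * 1024 * 1024))  # hypothetical enlarged limit
```

Under these assumptions, a write between 1.5 MiB and 2 MiB would produce the etcdserver error, and anything above 2 MiB would produce the gRPC error. The 10 MiB value is only a placeholder; the enlarged value used in risedev is not stated here.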

BTW this issue is not the same as #6726, because the latter happens during recovery, while this one happens during creating MV. @yezizp2012 Are you aware of any potential large write when creating MV?

@yezizp2012
Member

> BTW this issue is not the same as #6726, because the latter happens during recovery, while this one happens during creating MV. @yezizp2012 Are you aware of any potential large write when creating MV?

Nope, only the metadata of the streaming job (including actor/fragment/location info) is written to etcd during MV creation. The issue found in #6726 already included all metadata of the tpch MVs, which is about 5M+ in a ci-3cn-1fe environment and obviously larger than the ones in this issue. It's quite weird that the metadata of nexmark q15/q5/q16 are all larger than 2M in this issue. 🤔

Anyway, I also agree that we'd better enlarge --max-request-bytes in the longevity test.

@kwannoel kwannoel assigned kwannoel and unassigned kwannoel Jan 13, 2023
@zwang28
Contributor

zwang28 commented Jan 13, 2023

This is because in the test each compute node's parallelism is 32 and there are 3 compute nodes. The write size is proportional to both.
We can reproduce it locally by running ./risedev d ci-3cn-1fe with compute node parallelism=32 and etcd's default max-request-bytes (instead of the enlarged value we've used in risedev), then creating nexmark q15.

With the enlarged max-request-bytes, the error is gone.
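
As a rough back-of-the-envelope check of the proportionality, the sketch below scales the q15 size from the logs (2182088 bytes at 3 nodes x 32 parallelism) linearly to other setups. It assumes a purely proportional model with no fixed overhead and assumes etcd's default limit of 1572864 bytes; the function names are hypothetical and the numbers are estimates, not measurements.

```python
OBSERVED_BYTES = 2_182_088          # q15 message size from the error log
OBSERVED_PARALLELISM = 3 * 32       # 3 compute nodes, parallelism 32 each
ASSUMED_LIMIT_BYTES = 1_572_864     # assumed etcd default --max-request-bytes


def estimated_bytes(nodes: int, parallelism_per_node: int) -> int:
    """Linearly scale the observed size to another cluster shape (rough model)."""
    total = nodes * parallelism_per_node
    return OBSERVED_BYTES * total // OBSERVED_PARALLELISM


def max_total_parallelism(limit_bytes: int = ASSUMED_LIMIT_BYTES) -> int:
    """Largest total parallelism whose estimated write still fits under the limit."""
    return limit_bytes * OBSERVED_PARALLELISM // OBSERVED_BYTES


print(estimated_bytes(3, 32))      # reproduces the observed size
print(estimated_bytes(3, 16))      # half the parallelism, half the size
print(max_total_parallelism())     # estimate under the assumed default limit
```

Under this model, halving the parallelism halves the write, which would explain why the error appears with parallelism 32 on 3 nodes and not in smaller local setups.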

@kwannoel kwannoel added the help wanted Issues that need help from contributors label Jan 18, 2023
@kwannoel
Contributor

To enable --max-request-bytes in rwc: https://github.com/risingwavelabs/risingwave-cloud/issues/1493

@kwannoel kwannoel assigned kwannoel and unassigned kwannoel Jan 26, 2023
@kwannoel
Contributor

kwannoel commented Jan 26, 2023

Some additional notes on reproducing this via risedev, thanks to @zwang28 for providing more details offline:

Run the queries:

And you will see the error.

After https://github.com/risingwavelabs/risingwave-test/pull/212 is running on the buildkite job, we should no longer see this error message in longevity-test.

@fuyufjh fuyufjh assigned cloudcarver and unassigned kwannoel Jan 30, 2023
@fuyufjh
Member

fuyufjh commented Jan 30, 2023

Fixed by https://github.com/risingwavelabs/risingwave-cloud/pull/1566 ? @mikechesterwang
