Bazel CI: RBE builds are broken after grpc java upgrade #12264
Comments
FYI @dmivankov, can you help look into this issue? Is this a bug in protobuf? |
Similar issue and fix: pravega/pravega#1621, pravega/pravega#1622 |
https://source.corp.google.com/piper///depot/google3/third_party/bazel/src/main/java/com/google/devtools/build/lib/authandtls/AuthAndTLSOptions.java;l=119?q=%22grpcKeepaliveTime%22 |
Interestingly, I couldn't immediately find keep-alive default timeout changes in https://github.com/grpc/grpc-java v1.26.0 -> v1.31.1 or in https://github.com/netty/netty netty-4.1.42.Final -> netty-4.1.48.Final (20sec in GrpcUtil). But this looks interesting.
Does setting environment variable GRPC_EXPERIMENTAL_AUTOFLOWCONTROL=false fix the issue on client side?
NettyServerBuilder builder;
..
builder.flowControlWindow(NettyServerBuilder.DEFAULT_FLOW_CONTROL_WINDOW)
in https://github.com/bazelbuild/bazel/search?q=NettyServerBuilder |
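To make the suggested client-side check concrete, here is a minimal sketch of how an environment toggle like GRPC_EXPERIMENTAL_AUTOFLOWCONTROL could be interpreted. The class `AutoFlowControlSetting` and its parsing rule (unset means on, anything other than "true" means off) are assumptions for illustration; this is not grpc-java's or Bazel's actual code.

```java
import java.util.Map;

/** Hypothetical helper for illustration; not part of grpc-java or Bazel. */
final class AutoFlowControlSetting {
  static final String ENV_VAR = "GRPC_EXPERIMENTAL_AUTOFLOWCONTROL";

  private AutoFlowControlSetting() {}

  /**
   * Returns whether auto flow control would be on for the given environment.
   * Assumption: an unset variable means "on"; any value other than "true"
   * (case-insensitive, surrounding whitespace ignored) means "off".
   */
  static boolean isEnabled(Map<String, String> env) {
    String value = env.get(ENV_VAR);
    return value == null || Boolean.parseBoolean(value.trim());
  }

  public static void main(String[] args) {
    // Reports the setting for the current process environment.
    System.out.println(ENV_VAR + " enabled: " + isEnabled(System.getenv()));
  }
}
```

Running the client with GRPC_EXPERIMENTAL_AUTOFLOWCONTROL=false exported would make this sketch report the feature as off, which is the state the question above asks to test.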
Was turned on by default during 1.26.0->1.31.1 grpc-java bump
It seems that it may be causing errors in RBE:
  io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: Bandwidth exhausted
  HTTP/2 error code: ENHANCE_YOUR_CALM
  Received Goaway
  too_many_pings
bazelbuild#12264
grpc/grpc-java#7302
Drafted a PR #12266 |
We can also try a forward fix by going to v1.32.2, which includes: "netty: BDP ping accounting should occur after flow control. This resolves an incompatibility issue introduced in v1.30.0 and could be worked around via GRPC_EXPERIMENTAL_AUTOFLOWCONTROL=false introduced later. The symptom was a GOAWAY with “too_many_pings” without an aggressive keepalive configured. The environment variable is still available, but will be removed in the future" |
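The "too_many_pings" symptom can be pictured with a simplified model of a server that limits how often a peer may ping. The sketch below is an assumption-laden illustration: the class `PingStrikeModel`, the five-minute interval, and the strike limit of 2 are hypothetical and are not taken from grpc-java or from the RBE server. It only shows why extra pings, such as the BDP pings sent by auto flow control, could trip a limit even when no aggressive keepalive is configured.

```java
import java.util.concurrent.TimeUnit;

/** Simplified, hypothetical model of a server-side ping limit; not grpc-java's code. */
final class PingStrikeModel {
  private final long minPingIntervalNanos;
  private final int maxStrikes;
  private long lastAcceptedPingNanos;
  private boolean seenPing;
  private int strikes;

  PingStrikeModel(long minPingIntervalNanos, int maxStrikes) {
    this.minPingIntervalNanos = minPingIntervalNanos;
    this.maxStrikes = maxStrikes;
  }

  /**
   * Records a ping received at nowNanos. Returns false once the peer has
   * exceeded the strike limit, i.e. when this model would answer with
   * GOAWAY "too_many_pings".
   */
  boolean onPing(long nowNanos) {
    if (!seenPing || nowNanos - lastAcceptedPingNanos >= minPingIntervalNanos) {
      // First ping, or a ping that respects the minimum interval: accept it.
      seenPing = true;
      lastAcceptedPingNanos = nowNanos;
      return true;
    }
    // Ping arrived too soon after the last accepted one: count a strike.
    strikes++;
    return strikes <= maxStrikes;
  }

  public static void main(String[] args) {
    // Assumed limits: one ping per 5 minutes, at most 2 strikes tolerated.
    PingStrikeModel model = new PingStrikeModel(TimeUnit.MINUTES.toNanos(5), 2);
    long second = TimeUnit.SECONDS.toNanos(1);
    for (int i = 0; i < 4; i++) {
      boolean ok = model.onPing(i * second);
      System.out.println("ping " + i + (ok ? " tolerated" : " -> GOAWAY too_many_pings"));
    }
  }
}
```

In this model a client that never sends keepalives but emits a burst of measurement pings is indistinguishable from one with an overly aggressive keepalive, which matches the symptom described above.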
The original default keepalive was to send no keepalive. Maybe that has also changed upstream? That seems more likely to explain the problem. |
Since gRPC v1.32.2 fixes this, can we upgrade to that version instead? |
Yes, auto flow control enables pinging. Given that auto flow control is a new feature and there's some indication that it caused the regression, I'd rather try disabling it first (#12266) as the more solid option. v1.32.2 has fixes in that area, but it takes more PRs to bump again. Unless there's an easy way to check whether it really helps before merging, it's probably a good idea to try the faster fix. I will prepare v1.32.2 though. |
interesting bit in grpc core 1.32.0 |
grpc-java transition from v1.26.0 to v1.31.1 enabled auto flow control which started failing in RBE with
  io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: Bandwidth exhausted
  HTTP/2 error code: ENHANCE_YOUR_CALM
  Received Goaway
  too_many_pings
grpc-java v1.32.2 has a bugfix attempt on that
grpc v1.32.0 also has something new around keepalive pings
Hopefully version bump to those helps bazelbuild#12264
Note: also an attempt and disabling auto flow by default is made in bazelbuild#12266
Part 1: add v1.32.x version to third_party/grpc
Note: partly switches to v1.32.x too as not all bits are versioned and some of unversioned bits are used from other third_party targets
grpc-java transition from v1.26.0 to v1.31.1 enabled auto flow control which started failing in RBE with
  io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: Bandwidth exhausted
  HTTP/2 error code: ENHANCE_YOUR_CALM
  Received Goaway
  too_many_pings
grpc-java v1.32.2 has a bugfix attempt on that
grpc v1.32.0 also has something new around keepalive pings
Hopefully version bump to those helps bazelbuild#12264
Note: also an attempt and disabling auto flow by default is made in bazelbuild#12266
Part 2: switch to v1.32.x
grpc-java transition from v1.26.0 to v1.31.1 enabled auto flow control which started failing in RBE with
  io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: Bandwidth exhausted
  HTTP/2 error code: ENHANCE_YOUR_CALM
  Received Goaway
  too_many_pings
grpc-java v1.32.2 has a bugfix attempt on that
grpc v1.32.0 also has something new around keepalive pings
Hopefully version bump to those helps bazelbuild#12264
Note: also an attempt and disabling auto flow by default is made in bazelbuild#12266
Part 3: remove 1.31.1 from third_party/grpc
grpc-java transition from v1.26.0 to v1.31.1 enabled auto flow control which started failing in RBE with
  io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: Bandwidth exhausted
  HTTP/2 error code: ENHANCE_YOUR_CALM
  Received Goaway
  too_many_pings
grpc-java v1.32.2 has a bugfix attempt on that
grpc v1.32.0 also has something new around keepalive pings
Hopefully version bump to those helps bazelbuild#12264
Note: also an attempt and disabling auto flow by default is made in bazelbuild#12266
Was turned on by default during 1.26.0->1.31.1 grpc-java bump
It seems that it may be causing errors in RBE:
  io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: Bandwidth exhausted
  HTTP/2 error code: ENHANCE_YOUR_CALM
  Received Goaway
  too_many_pings
#12264
grpc/grpc-java#7302
Closes #12266.
PiperOrigin-RevId: 337254515
@meteorcloudy The fix is merged as 6e94b05. Can you run the tests with it? |
Just launched a downstream test here: https://buildkite.com/bazel/bazel-at-head-plus-downstream/builds/1701 |
The issue still exists for |
Ok, so we can try next one of two things
Is there a way to run those RBE tests on PRs? |
Yes, I'll help test #12273 |
The RBE build seems to be fixed by upgrading to 1.32.x. |
The rules_haskell failure is caused by something else. So it looks like upgrading grpc to 1.32.x does fix the issue and allows us to safely bring back auto flow control. |
Great, then I can make the PRs: add 1.32.x, switch to 1.32.x & bring auto flow control back, drop 1.31.1 |
Part 1: add v1.32.x version to third_party/grpc
Note: partly switches to v1.32.x too as not all bits are versioned and some of unversioned bits are used from other third_party targets
Composed PR: #12273
grpc-java transition from v1.26.0 to v1.31.1 enabled auto flow control which started failing in RBE with
  io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: Bandwidth exhausted
  HTTP/2 error code: ENHANCE_YOUR_CALM
  Received Goaway
  too_many_pings
grpc-java v1.32.2 has a bugfix attempt on that
grpc v1.32.0 also has something new around keepalive pings
Hopefully version bump to those helps #12264
Note: also an attempt and disabling auto flow by default is made in #12266
Closes #12279
Part 2: switch to v1.32.x
grpc-java transition from v1.26.0 to v1.31.1 enabled auto flow control which started failing in RBE with
  io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: Bandwidth exhausted
  HTTP/2 error code: ENHANCE_YOUR_CALM
  Received Goaway
  too_many_pings
grpc-java v1.32.2 has a bugfix attempt on that
grpc v1.32.0 also has something new around keepalive pings
Hopefully version bump to those helps #12264
Note: also an attempt and disabling auto flow by default is made in #12266
Also turn auto flow control feature back on
This reverts commit 6e94b05.
Closes #12288.
PiperOrigin-RevId: 337485572
This is fixed by upgrading grpc java version |
Part 3: remove 1.31.1 from third_party/grpc
grpc-java transition from v1.26.0 to v1.31.1 enabled auto flow control which started failing in RBE with
  io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: Bandwidth exhausted
  HTTP/2 error code: ENHANCE_YOUR_CALM
  Received Goaway
  too_many_pings
grpc-java v1.32.2 has a bugfix attempt on that
grpc v1.32.0 also has something new around keepalive pings
Hopefully version bump to those helps #12264
Note: also an attempt and disabling auto flow by default is made in #12266
Closes #12289
https://buildkite.com/bazel/bazel-auto-sheriff-face-with-cowboy-hat/builds/306
Verified by building with Bazel@d4cd4e7ab18ebeae4152dafc113367289ffebb12 and its previous commit:
https://buildkite.com/bazel/culprit-finder/builds/581
https://buildkite.com/bazel/culprit-finder/builds/582
Culprit: d4cd4e7