Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump grpc to 1.33.1 to fix corruption when downloading CAS blobs #13055

Closed
wants to merge 3 commits into from

Conversation

scele
Copy link
Contributor

@scele scele commented Feb 17, 2021

grpc-java versions 1.27 through 1.32 had a bug where messages could arrive
after the call was reported clsoed. In the case of bazel, this meant that
in GrpcCacheClient, onNext could be called after onError. This leads to
offset bookkeeping getting out of sync, and corrupts the CAS blob download.

#12927

@google-cla google-cla bot added the cla: yes label Feb 17, 2021
@divanorama
Copy link
Contributor

divanorama commented Feb 18, 2021

github says jars were renamed without change which is suspicious
upd: though git shows changes, so looks like a github glitch

@scele scele force-pushed the issue_12927 branch 2 times, most recently from dc24530 to 4a4c8bc Compare February 24, 2021 08:53
Part 1: add v1.33.1 version to third_party/grpc
Note: partly switches to v1.33.1 too as not all bits are versioned and
      some of unversioned bits are used from other third_party targets

grpc-java versions 1.27 through 1.32 had a bug where messages could arrive
after the call was reported clsoed.  In the case of bazel, this meant that
in GrpcCacheClient, onNext could be called after onError.  This leads to
offset bookkeeping getting out of sync, and corrupts the CAS blob download.

bazelbuild#12927
Part 2: Switch to grpc 1.33.1.

grpc-java versions 1.27 through 1.32 had a bug where messages could arrive
after the call was reported clsoed.  In the case of bazel, this meant that
in GrpcCacheClient, onNext could be called after onError.  This leads to
offset bookkeeping getting out of sync, and corrupts the CAS blob download.

bazelbuild#12927
Part 3: remove 1.32.x from third_party/grpc.

grpc-java versions 1.27 through 1.32 had a bug where messages could arrive
after the call was reported clsoed.  In the case of bazel, this meant that
in GrpcCacheClient, onNext could be called after onError.  This leads to
offset bookkeeping getting out of sync, and corrupts the CAS blob download.

bazelbuild#12927
@scele scele changed the title Bump grpc to 1.34.1 to fix corruption when downloading CAS blobs Bump grpc to 1.33.1 to fix corruption when downloading CAS blobs Feb 24, 2021
@scele
Copy link
Contributor Author

scele commented Feb 24, 2021

@divanorama @coeuvre I originally tried to upgrade to 1.34.1, but that was causing issues since grpc 1.34.1 had upgraded many of its transitive dependencies like upb and abseil... So now I'm trying to upgrade to 1.33.1 instead.

@divanorama
Copy link
Contributor

There are further bugfixes in 1.34+, but shouldn't hurt to try 1.33.1 first imho
Also noticed that grpc-core has 1.33.2 https://github.com/grpc/grpc/releases/tag/v1.33.2

@scele
Copy link
Contributor Author

scele commented Feb 24, 2021

Also noticed that grpc-core has 1.33.2 https://github.com/grpc/grpc/releases/tag/v1.33.2

@divanorama Good point! Diffing 1.33.1 and 1.33.2 it seems that the only material change is in the python client, so in practice it should not make any difference. I can certainly bump to 1.33.2 instead if you prefer?

@coeuvre
Copy link
Member

coeuvre commented Feb 25, 2021

@scele Thanks for catching the bug and creating the fix!
@meteorcloudy Can you please import the changes?

@meteorcloudy
Copy link
Member

@scele Thanks for the PR! For reasons, we cannot merge a PR containing both third_party and non-third_party changes.
Could you please split this PR into separate changes?
An example is:
#12279
#12288
#12289

@scele
Copy link
Contributor Author

scele commented Feb 25, 2021

@meteorcloudy I have created:
#13104 [1/3]
#13105 [2/3] (CI fails because this patch requires #13104)
#13106 [3/3] (This also includes the commit from #13104, since otherwise the patch does not apply cleanly on master)

Let me know if I can do more. I'm also 100% OK with you or other googlers modifying my commits/PRs as needed to get them in.

@meteorcloudy
Copy link
Member

Thanks! I'll start to merge them.

@scele
Copy link
Contributor Author

scele commented Mar 1, 2021

Merged by #13104, #13105, #13106 - closing. Thanks!

@scele scele closed this Mar 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants