Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ensure context is set before spawning goroutines using it #2569

Merged
merged 3 commits into from
May 11, 2022

Conversation

Hades32
Copy link
Contributor

@Hades32 Hades32 commented May 4, 2022

What type of PR is this?
/kind bug

What this PR does / Why we need it:
should fix the panic in the linked issue

Which issue(s) this PR fixes:

Closes #2568

Special notes for your reviewer:

@google-cla
Copy link

google-cla bot commented May 4, 2022

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

For more information, open the CLA check for this pull request.

@google-oss-prow google-oss-prow bot requested review from aLekSer and markmandel May 4, 2022 15:07
@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: b99f0bc2-19b7-41f8-a753-356c5367f527

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@roberthbailey
Copy link
Member

sdk conformance tests failed (that's the second time I've seen that recently, so we may have a new flake to track down).

@roberthbailey
Copy link
Member

@Hades32 - Thanks for the PR!

Have you had a chance to test out this fix? Your comment

hopefully fixes the panic in the linked issue

makes me wonder if this is more of a guess or something that you have been able to verify.

I realize that the underlying issue is a race condition and probably not deterministically reproducible, but it's pretty straightforward using some of the load testing tools to spin up a few hundred / thousand test game servers and verify that none of them have a panic in the sidecar.

Since we are in between a release candidate and a release we only merge bug fixes and documentation (see here) so if this is known to be a bug fix then we can consider merging it before the 1.23 release. Otherwise we can merge it right after that release to let it bake for a while in our continuous integration tests for the following release.

@Hades32
Copy link
Contributor Author

Hades32 commented May 4, 2022

@roberthbailey I haven't yet. First wanted to see CI results and then figure out how to get that pre-release image to try in our cluster. We don't have the capacity though, to do that for a few thousand servers. I'd just use it for a while and see if the issue vanishes. Would be really great if we could that into the next release of course

@roberthbailey
Copy link
Member

Given that the build failure looked like a flake I've restarted it. Assuming it works, a link to grab the SDK built by CI should be posted here and you can try it out.

@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: 9c0d7909-e6cc-43e1-9616-68b10c773b6f

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/2569/head:pr_2569 && git checkout pr_2569
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.23.0-3f5b6bc-amd64

@Hades32
Copy link
Contributor Author

Hades32 commented May 6, 2022

I'm confused why the bot didn't post the one image I actually needed (the sidecar)... I'm now rebuilding locally and hope the results will be the same. Will report back tomorrow if the panics disappeared

@roberthbailey
Copy link
Member

It looks like we don't post links to all of the containers that are built for each PR, but there is a sdk image that was built: gcr.io/agones-images/agones-sdk:1.23.0-3f5b6bc-linux-amd64

There are also arm / windows ones if you need those instead, but we don't have a multi-arch manifest that gets generated by CI (see #2280).

@Hades32
Copy link
Contributor Author

Hades32 commented May 7, 2022

Ah, seems I just had a typo when I tried that out. Anyway, my local build (which got tagged exactly the same, so I'm confident it's identical) didn't crash for 24h which is longer than the 1.22.0 ever lasted, so I consider this test successful.

Is there anything else you want me to do to get this merged? @roberthbailey

@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: 507c2cda-d4fe-4836-9ede-a671620f8fc8

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@Hades32
Copy link
Contributor Author

Hades32 commented May 7, 2022

bad news, the exact same panic is back. But TBH I don't see how that's even possible... The context and the server object and the Done method all get used before. So, it's unclear to me how any of them can be null now in 536...

I'll try again with the build from CI in case mine was somehow not correct, but seems like this PR is not fixing the issue.

How long will the preview images be available?

@roberthbailey
Copy link
Member

How long will the preview images be available?

I think they get automatically cleaned up after a few weeks, but I'm not exactly sure. If you rebase this PR it will build a new image for you though if that one gets garbage collected.

@Hades32
Copy link
Contributor Author

Hades32 commented May 8, 2022

OK, I think I now really understand what's going on and my latest push should fix it:
Our gameserver starts up extremely fast and therefore the WatchGameserver gRPC call arrives before initialization has finished. Now two bugs really: 1) the context wasn't set early enough and 2) the waitGroup was notified too early. Once CI is done I'll test this out again

@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: bab6c6f2-be3b-418f-a2cf-85e669dca646

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: e3242715-29ce-46ce-8789-7aa7b2de4776

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@Hades32
Copy link
Contributor Author

Hades32 commented May 9, 2022

If I interpret those logs correctly, then there are only errors in htmltest which this PR shouldn't affect...

@roberthbailey
Copy link
Member

Yeah, there are some new link check failures from old blog posts.

@Hades32
Copy link
Contributor Author

Hades32 commented May 9, 2022

@roberthbailey so they newest build is working fine for us. Is there anything else I should do?

@roberthbailey
Copy link
Member

I've created #2573 to address the html test errors.

@roberthbailey roberthbailey added the kind/bug These are bugs. label May 11, 2022
@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: 2ea012c6-1b24-4818-92a6-a226d0dae9b5

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/2569/head:pr_2569 && git checkout pr_2569
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.23.0-5ccc5f4-amd64

@google-oss-prow google-oss-prow bot added the lgtm label May 11, 2022
@roberthbailey roberthbailey merged commit 015822a into googleforgames:main May 11, 2022
@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Hades32, roberthbailey

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@roberthbailey roberthbailey added this to the 1.23.0 milestone May 11, 2022
@Hades32 Hades32 deleted the patch-1 branch May 11, 2022 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Panic in sidecar in 1.22.0
3 participants