Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't apply fleetautoscaler on 1.17.0 #2253

Closed
karamaru-alpha opened this issue Sep 4, 2021 · 37 comments
Closed

Can't apply fleetautoscaler on 1.17.0 #2253

karamaru-alpha opened this issue Sep 4, 2021 · 37 comments
Labels
kind/bug These are bugs.
Milestone

Comments

@karamaru-alpha
Copy link
Contributor

karamaru-alpha commented Sep 4, 2021

What happened:
Can't apply fleetautoscaler on 1.17.0 agones.

$ kubectl apply -f https://raw.githubusercontent.com/googleforgames/agones/release-1.17.0/examples/simple-game-server/fleetautoscaler.yaml

Error from server (InternalError): error when creating "examples/simple-game-server/fleetautoscaler.yaml": Internal error occurred: failed calling webhook "validations.agones.dev": Post "https://agones-controller-service.agones-system.svc:443/validate?timeout=10s": EOF

How to reproduce it (as minimally and precisely as possible):

reproduce guide

Create cluster on GKE

$ gcloud container clusters create [CLUSTER_NAME] --cluster-version=1.20
--tags=game-server
--scopes=gke-default
--num-nodes=4
--no-enable-autoupgrade
--machine-type=e2-standard-4
$ gcloud config set container/cluster [CLUSTER_NAME]
$ gcloud container clusters get-credentials [CLUSTER_NAME]
$ gcloud compute firewall-rules create game-server-firewall
--allow udp:7000-8000
--target-tags game-server
--description "Firewall to allow game server udp traffic"

Install agones 1.17.0

$ kubectl create namespace agones-system
$ kubectl apply -f https://raw.githubusercontent.com/googleforgames/agones/release-1.17.0/install/yaml/install.yaml

Create fleet

$ kubectl apply -f https://raw.githubusercontent.com/googleforgames/agones/release-1.17.0/examples/simple-game-server/fleet.yaml

Create fleetautoscaler

$ kubectl apply -f https://raw.githubusercontent.com/googleforgames/agones/release-1.17.0/examples/simple-game-server/fleetautoscaler.yaml
-> error!

Anything else we need to know?:

  • 1.16.0 work fine
  • According to agones-controller logs, nil pointer error is occured

Environment:

  • Agones version: 1.17.0
  • Kubernetes version (use kubectl version): 1.20+
  • Cloud provider or hardware configuration: GKE
  • Install method (yaml/helm): yaml

This issue from disscussion on slack

@karamaru-alpha karamaru-alpha added the kind/bug These are bugs. label Sep 4, 2021
@karamaru-alpha karamaru-alpha changed the title fleetautoscaler is not work on 1.17.0 Can't apply fleetautoscaler on 1.17.0 Sep 4, 2021
@roberthbailey
Copy link
Member

Thanks for the bug report. I've been able to reproduce the bug locally.

@markmandel
Copy link
Collaborator

@roberthbailey are you able to replicate it in an e2e test? I'm struggling to work out why this wasn't picked up in e2e testing 🤔

@roberthbailey
Copy link
Member

The stack trace is pointing to https://github.com/googleforgames/agones/blob/main/pkg/apis/autoscaling/v1/fleetautoscaler.go#L207 which is the line that was changed in #2246

/cc @lambertwang

@roberthbailey
Copy link
Member

@lambertwang / @cindy52 - did either of you do a smoke test with the yaml install after #2246 was merged?

@markmandel
Copy link
Collaborator

If this is blocking people creating FleetAutoscalers - I'm wondering if a fix warrants a hotfix release?

@roberthbailey
Copy link
Member

I think it likely does.... That's a pretty major issue.

@roberthbailey
Copy link
Member

I just created a second cluster and did a helm install... and it works without issue. That explains why e2e didn't catch it - e2e doesn't test via the yaml install.

@markmandel
Copy link
Collaborator

markmandel commented Sep 7, 2021

I just created a second cluster and did a helm install... and it works without issue. That explains why e2e didn't catch it - e2e doesn't test via the yaml install.

Did you helm install with feature flags enabled or disabled?

@markmandel
Copy link
Collaborator

So I'm noticing that the install.yaml for 1.17.0 - still all points to 1.17.0-rc

https://raw.githubusercontent.com/googleforgames/agones/release-1.17.0/install/yaml/install.yaml

@roberthbailey
Copy link
Member

I was just about to point that out....

@markmandel
Copy link
Collaborator

Whereas the tag doesn't: https://raw.githubusercontent.com/googleforgames/agones/v1.17.0/install/yaml/install.yaml

Something went wrong with pushing the release branch. We can manually fix this I think at least.

@roberthbailey
Copy link
Member

I installed helm following the instructions: https://agones.dev/site/docs/installation/install-agones/helm/#helm-3:

helm repo add agones https://agones.dev/chart/stable
helm repo update
helm install my-release --namespace agones-system --create-namespace agones/agones

@markmandel
Copy link
Collaborator

markmandel commented Sep 7, 2021

Sorry, if I wasn't clear before: The 1.17.0 tag of install.yaml is correct, whereas the release branch isn't correct.

@cindy52 I'm assuming you didn't manually push the release branch? Or maybe run make do-release before those last minute PR's where merged? 🤔 just trying to work out how this happened.

I expect we can fix this by manually bringing the release branch inline with the tag -- but we should triple check that that will fix the issue.

Worth noting: will need to disable the protection on the release branch temporarily to do this.

@roberthbailey
Copy link
Member

Before we do any branch shenanigans, I'm re-running the test with the yaml file from the tag to make sure that fixes the issue.

@roberthbailey
Copy link
Member

$ kubectl apply -f https://raw.githubusercontent.com/googleforgames/agones/release-1.17.0/examples/simple-game-server/fleetautoscaler.yaml
fleetautoscaler.autoscaling.agones.dev/simple-game-server-autoscaler created
$ kubectl get fas
NAME                            AGE
simple-game-server-autoscaler   4s

@roberthbailey
Copy link
Member

Ok, now I feel ok moving the branch pointer.

@roberthbailey
Copy link
Member

Looking at git history, here's what I see for 1.16:

commit 9eda3bdb5ae578a4682c08ac4c6eb894417a2020 (tag: v1.16.0, upstream/release-1.16.0)
Author: Robert Bailey <robertbailey@google.com>
Date:   Tue Jul 20 12:59:28 2021 -0700

    Release 1.16.0. (#2184)

commit 1db3ddf519ea4abc1ff325390a9df4755f105f79
Author: André Micheletti <andreluizmtmicheletti@gmail.com>
Date:   Mon Jul 19 04:13:19 2021 -0300

    add godot-sdk to third-party libraries-tools (#2182)

commit 82c1a9cdc7717499c4db9a1668cbce9ec4e741b5 (tag: v1.16.0-rc, upstream/release-1.16.0-rc)
Author: Robert Bailey <robertbailey@google.com>
Date:   Tue Jul 13 20:46:12 2021 -0700

    This includes all of the changes for the 1.16.0 release candidate. (#2181)

But for 1.17 I see tags but not branches:

commit 50d4e8484b7c12c764118f5bc0321fbcc3ab54bb (tag: v1.17.0)
Author: cindy52 <ciguo@google.com>
Date:   Wed Sep 1 13:38:03 2021 -0400

    Release/1.17.0 (#2247)
    
    * release 1.17.0
    
    * release 1.17.0
    
    * Release 1.17.0
    
    * Release 1.17.0
    
    * Release 1.17.0
    
    * Sync the bugfix of fleet autoscaler nil check to release 1.17.0

...
commit 4bf3398cc458bc19e66d8f571451850e72dc1089 (tag: v1.17.0-rc)
Author: cindy52 <ciguo@google.com>
Date:   Thu Aug 26 11:40:27 2021 -0400

    Release 1.17.0-rc (#2236)

@roberthbailey
Copy link
Member

https://github.com/googleforgames/agones/tree/release-1.16.0-rc shows "This branch is 38 commits behind main." whereas https://github.com/googleforgames/agones/tree/release-1.17.0-rc shows "This branch is 1 commit ahead, 7 commits behind main."

https://github.com/googleforgames/agones/tree/release-1.17.0 says "This branch is 1 commit ahead, 6 commits behind main"

@roberthbailey
Copy link
Member

It looks like there may have been a rebase issue or a local commit that hadn't been merged that got picked up when pushing the release branch.

@roberthbailey
Copy link
Member

That's probably why I don't see them in my git history.

@roberthbailey
Copy link
Member

Moving the branch to align with the tag looks like it makes the most sense.

@roberthbailey
Copy link
Member

Ok, here's a view of the diff between the tag & branch for the RC: v1.17.0-rc...release-1.17.0-rc

@roberthbailey
Copy link
Member

@markmandel - do you want to tinker with the branches?

@markmandel
Copy link
Collaborator

Erk, looks like there are a few issues in there. There's some stuff that's tagged a -dev?

@markmandel
Copy link
Collaborator

@roberthbailey I don't mind. I can't get to it for a few hours yet.

Might also be worth doing a post-mortem after the fact as well, see if we can identify where things in our release process we can improve things.

@roberthbailey
Copy link
Member

Do you know how to drop branch protections? I think we just need to run:

git branch -f release-1.17.0-rc 4bf3398cc458bc19e66d8f571451850e72dc1089
git branch -f release-1.17.0 50d4e8484b7c12c764118f5bc0321fbcc3ab54bb

to update the branch pointers.

@cindy52
Copy link
Contributor

cindy52 commented Sep 7, 2021

Sorry, if I wasn't clear before: The 1.17.0 tag of install.yaml is correct, whereas the release branch isn't correct.

@cindy52 I'm assuming you didn't manually push the release branch? Or maybe run make do-release before those last minute PR's where merged? 🤔 just trying to work out how this happened.

I expect we can fix this by manually bringing the release branch inline with the tag -- but we should triple check that that will fix the issue.

Worth noting: will need to disable the protection on the release branch temporarily to do this.

I should have done the make-do-release after the change has merged since these steps are all after merge the 1.170 pr, but I might messed up with my git commit by rebasing. I did delete my current branch and pull everything from upstream to start with a clean one.

@markmandel
Copy link
Collaborator

I've got it. Pushing up a fix shortly. 👍🏻

@markmandel
Copy link
Collaborator

Branch has been updated, verifying fix.
https://github.com/googleforgames/agones/tree/release-1.17.0

@markmandel
Copy link
Collaborator

Everything looks to be working for me now. I've put the branch protections back in place, but if someone could double check the fix, I think that would be ideal 👍🏻

@cindy52
Copy link
Contributor

cindy52 commented Sep 7, 2021

I can see it's pointing to 1.17.0 now.

@markmandel markmandel added this to the 1.18.0 milestone Sep 7, 2021
@roberthbailey
Copy link
Member

The release branch looks fixed but the release candidate branch is still not pointing at the correct commit.

@markmandel
Copy link
Collaborator

This should be fixed now 👍🏻

I'd love someone to confirm the install.yaml works for them from 1.17.0, and then I think we can close this issue.

@roberthbailey
Copy link
Member

I've verified (on a fresh cluster) that the getting started guide for the fleet autoscaler is now working using the 1.17 release branch.

@markmandel
Copy link
Collaborator

I've verified (on a fresh cluster) that the getting started guide for the fleet autoscaler is now working using the 1.17 release branch.

Awesome! I'm happy to close this ticket. If you're happy, let's close it! 😃

@roberthbailey
Copy link
Member

@karamaru-alpha - any concerns with closing this out?

@karamaru-alpha
Copy link
Contributor Author

Thank you for your quick and polite response.
I close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug These are bugs.
Projects
None yet
Development

No branches or pull requests

4 participants