Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

apiServerLoadBalancer.enabled set to false loops infinitely with ClusterClass #1729

Closed
mnaser opened this issue Oct 13, 2023 · 11 comments
Closed
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.
Milestone

Comments

@mnaser
Copy link
Contributor

mnaser commented Oct 13, 2023

/kind bug

What steps did you take and what happened:
When using ClusterClass and not enabling apiServerLoadBalancer, if the clusterClass sets the following:

apiServerLoadBalancer:
  enabled: false

This will then be 'reverted' back to

apiServerLoadBalancer: {}

And by doing that, it keeps looping forever, I've got an environemnt where the OpenStackCluster resource is up to 3412256(!) generations so far.

Upon editing the CRD and adding a default to false, it shows up in the CRD and reconciles just fine.

What did you expect to happen:
We shouldn't probably let it be set or avoid the omitempty to make sure this doesnt happen.

Enabled bool `json:"enabled,omitempty"`

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster API Provider OpenStack version (Or git rev-parse HEAD if manually built):
  • Cluster-API version:
  • OpenStack version:
  • Minikube/KIND version:
  • Kubernetes version (use kubectl version):
  • OS (e.g. from /etc/os-release):
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 13, 2023
@jichenjc
Copy link
Contributor

We shouldn't probably let it be set or avoid the omitempty to make sure this doesnt happen.

maybe we have enabled from false=>true or true=> false caes (though very unlikely)
so I think make it immutable after cluster creation seems better option?

as cluster created with LB or w/ LB seems should not allow to switch over

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 30, 2024
@mdbooth
Copy link
Contributor

mdbooth commented Jan 30, 2024

This feels like a bug in ClusterClass. These are 2 valid representations of the same state:

apiServerLoadBalancer:
  enabled: false
apiServerLoadBalancer: {}

You're going to hit this anywhere you specify the implicit zero value for a field with omitempty, and we have lots of those. I'm not sure how ClusterClass should deal with it, because it doesn't have the go types (and therefore marshalling) for all the objects it handles. The API server knows, though, because it has the CRD. Can ClusterClass get a canonical representation for comparison from the API server?

@JoelSpeed
Copy link

In general, it is bad to have two ways to express the same thing within an API, I would lean towards removing the omitempty and making the API more discoverable by doing so.

What is the meaning of apiServerLoadBalancer: {} though? Why is that valid and stored? I guess in the go representation that this struct is not a pointer? Or is missing omitempty?

Why in cluster class are we trying to set this explicitly? I guess the default is that the load balance is enabled and we want to disable it in this case?

I don't think ClusterClass can know the difference, I'm not necessarily sure the API server even knows the difference, because this is a json serialisation issue.
The API server takes raw yaml and stores it, it won't manipulate and kill that enabled: False, something else is doing that while doing a json serialisation loop, probably the controller?

@mdbooth
Copy link
Contributor

mdbooth commented Jan 30, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 30, 2024
@mdbooth
Copy link
Contributor

mdbooth commented Jan 30, 2024

In general, it is bad to have two ways to express the same thing within an API, I would lean towards removing the omitempty and making the API more discoverable by doing so.

Surely this is a general issue, though. It implies you can never use omitempty anywhere, because anywhere it is omitted you have 2 different representations of the same state: one where it is omitted, and one where it is included with the zero value.

What is the meaning of apiServerLoadBalancer: {} though? Why is that valid and stored? I guess in the go representation that this struct is not a pointer? Or is missing omitempty?

It is not a pointer, and has omitempty, which is interesting:

// APIServerLoadBalancer configures the optional LoadBalancer for the APIServer.
// It must be activated by setting `enabled: true`.
// +optional
APIServerLoadBalancer APIServerLoadBalancer `json:"apiServerLoadBalancer,omitempty"`

Why in cluster class are we trying to set this explicitly? I guess the default is that the load balance is enabled and we want to disable it in this case?

I don't think ClusterClass can know the difference, I'm not necessarily sure the API server even knows the difference, because this is a json serialisation issue. The API server takes raw yaml and stores it, it won't manipulate and kill that enabled: False, something else is doing that while doing a json serialisation loop, probably the controller?

@JoelSpeed
Copy link

Surely this is a general issue, though. It implies you can never use omitempty anywhere, because anywhere it is omitted you have 2 different representations of the same state: one where it is omitted, and one where it is included with the zero value.

Kube API guidelines tell you to use pointers for optional fields. In this case, the zero value being a nil, is not common for people to write out explicitly in their yaml. I think that would normally get around the present issue.

You have to be careful when writing clients to try not to adjust what's already been written. Moving towards server side apply, this becomes much easier as clients are expected only to send content for fields that they care about themselves.

It is not a pointer, and has omitempty, which is interesting:

Omitempty only works for pointers to structs, not structs themselves, there is no point having omitempty on this case, it does nothing. Make it a pointer if you want the omit to work.

@mdbooth
Copy link
Contributor

mdbooth commented Jan 31, 2024

After offline discussion with @JoelSpeed about this, I think I understand a new API principal: omitempty is only for struct pointers. IOW we're massively over-using it.

I'd like this to be another v1beta1 cleanup: remove omitempty from all values which aren't struct pointers. Unfortunately this may mean turning some structs (back) into pointers if we want to ensure they're omitted, which I hate because Go makes struct pointers in large data structures a syntactic minefield1.

Footnotes

  1. Because it doesn't require a -> operator like more civilised pointer languages, it's impossible to visually distinguish a pointer dereference from regular access of a struct field, which means it's easy to miss that you need a null check. Even worse, turning a struct into a struct pointer won't break compilation of existing references, meaning they will all have to be checked manually when making this change.

@mdbooth mdbooth added this to the v1beta1 milestone Jan 31, 2024
@JoelSpeed
Copy link

I'd like this to be another v1beta1 cleanup: remove omitempty from all values which aren't struct pointers.

If you do not have required fields or validations inside a struct that should be optional, but would fail if struct: {} was present (eg minproperties), then you don't need to have the omitempty and you don't need the pointer.

With a struct with only optional fields, setting foo: {} and then something marshalling that out to include empty values for all of the members generally improves the discoverability of the API

@EmilienM
Copy link
Contributor

/assign mdbooth

@mdbooth
Copy link
Contributor

mdbooth commented Mar 22, 2024

I believe this was fixed in #1937

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

No branches or pull requests

7 participants