Constant "admission webhook denied the request: tier baseline priority 253 overlaps with existing Tier" Errors #6694

Closed
mike-carrigan24 opened this issue Sep 26, 2024 · 6 comments
Labels
kind/support Categorizes issue or PR as related to a support question.

Comments

@mike-carrigan24

Describe what you are trying to do
Hey all,

I am trying to deploy Antrea v1.13.1. Following deployment on one cluster, the Antrea controller is constantly logging this error:

1 tier.go:162] "Failed to create system Tier on init, will retry" tier="baseline" attempts=50814 err="admission webhook "tiervalidator.antrea.io" denied the request: tier baseline priority 253 overlaps with existing Tier"

This error is not present on the other cluster where Antrea v1.13.1 is running. The following tiers exist on both clusters currently:

NAME          PRIORITY   AGE
application   250        4d17h
baseline      253        4d17h
emergency     50         4d17h
networkops    150        4d17h
platform      200        4d17h
securityops   100        4d17h

The only information I have been able to find on this is from https://knowledge.broadcom.com/external/article/301567/antrea-controller-continues-complaining.html, which states:

[The Antrea] operator lack of deleteCollection verb for tierentitlementbindings and tierentitlements, which led to antrea operator failed to delete existing tier, when antrea-controller tried to startup, it tried to create tier baseline 253, but there is a existing one, so webhook denies the request.
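
To make the denial concrete, here is a minimal sketch of the kind of check a validating webhook could perform on Tier creation. It is an illustration under assumptions, not Antrea's actual code: the `Tier` struct and the `validateCreate` function are hypothetical names, and the real webhook may apply more rules than a simple priority comparison. The point it shows is that a create for `baseline` at priority 253 is rejected with an "overlaps" error (not an AlreadyExists error) whenever a Tier with that priority is already stored.

```go
package main

import "fmt"

// Tier is a simplified, hypothetical stand-in for the Antrea Tier resource.
type Tier struct {
	Name     string
	Priority int32
}

// validateCreate is a hypothetical validation step: it rejects a new Tier
// whose priority is already used by any existing Tier.
func validateCreate(existing []Tier, t Tier) error {
	for _, e := range existing {
		if e.Priority == t.Priority {
			return fmt.Errorf("tier %s priority %d overlaps with existing Tier", t.Name, t.Priority)
		}
	}
	return nil
}

func main() {
	// A subset of the Tiers listed in the issue.
	existing := []Tier{
		{Name: "application", Priority: 250},
		{Name: "baseline", Priority: 253},
		{Name: "emergency", Priority: 50},
	}
	// The controller tries to create a system Tier that is already present.
	err := validateCreate(existing, Tier{Name: "baseline", Priority: 253})
	fmt.Println(err)
}
```

In this sketch the request is denied purely because the priority is taken, which matches the wording of the logged error.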

The two deployments I have appear to be identical. I confirmed that the antrea image, clusterroles, and clusterrolebindings for the controller are the same across both deployments.

Would someone be able to provide any additional insight on why this error may be occurring and if there are any steps I can take to fix this?

Thanks!

@mike-carrigan24 mike-carrigan24 added the kind/support Categorizes issue or PR as related to a support question. label Sep 26, 2024
@antoninbas
Contributor

> I am trying to deploy Antrea v1.13.1.

Are you updating Antrea on a cluster where an older version of Antrea is running (if yes, which version is that)? Or is this a new deployment of Antrea?

> The only information I have been able to find on this is from https://knowledge.broadcom.com/external/article/301567/antrea-controller-continues-complaining.html, which states:

Are you running a commercial version of Antrea (with tierentitlementbindings and tierentitlements support)? Are you using the Antrea OpenShift operator?

Have you tried deleting the antrea-controller Pod and letting it restart? It is a strange error, as the controller is supposed to check for the existence of the system Tiers before attempting to create them.

antoninbas added a commit to antoninbas/antrea that referenced this issue Sep 27, 2024
System Tier initialization, as well as all the APIserver endpoints that
depend on informer caches, should ideally wait for the caches to sync
before using the listers. An easy way to do that is to ensure that the
APIServer does not run until the informer factory has synced all caches.

Additionally, we improve the logic in system Tier initialization:
* After every failed API call, we should probably check the state of the
  informer cache again (via the lister) and decide which step is needed
  next (creation / update / nothing). In case of a failed create or
  update, this gives us the opportunity to check again if the Tier
  actually exists, and if yes, what its priority is.
* AlreadyExists is no longer treated differently for create. The reason
  is that if the Tier already exists and for some reason it was not
  returned by the Lister, it will probably be the validation webhook
  that fails (overlapping priority), and the error won't match
  AlreadyExists.

Related to antrea-io#6694

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
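
The initialization logic described in this commit message can be sketched roughly as follows. This is a self-contained simulation under stated assumptions, not the code from the PR: `fakeCluster`, `nextAction`, and `ensureTier` are hypothetical names, the `store` map stands in for the API server, the `cache` map stands in for the informer cache read through the lister, and calling `sync()` after each API call simulates the informer catching up. It shows the two points from the list above: the cache is re-read after every call to decide between create, update, or nothing, and a failed create is not special-cased on AlreadyExists.

```go
package main

import "fmt"

// Tier is a simplified, hypothetical stand-in for the Antrea Tier resource.
type Tier struct {
	Name     string
	Priority int32
}

type action int

const (
	actionNone action = iota
	actionCreate
	actionUpdate
)

// fakeCluster simulates an API server (store) and a lagging informer cache.
type fakeCluster struct {
	store map[string]Tier // what the API server actually holds
	cache map[string]Tier // what the lister returns; may be stale or empty
}

// sync copies the store into the cache, simulating the informer catching up.
func (c *fakeCluster) sync() {
	for name, t := range c.store {
		c.cache[name] = t
	}
}

// create fails with an overlap error if the priority is already taken.
func (c *fakeCluster) create(t Tier) error {
	for _, e := range c.store {
		if e.Priority == t.Priority {
			return fmt.Errorf("tier %s priority %d overlaps with existing Tier", t.Name, t.Priority)
		}
	}
	c.store[t.Name] = t
	return nil
}

// update overwrites the stored Tier; it always succeeds in this simulation.
func (c *fakeCluster) update(t Tier) error {
	c.store[t.Name] = t
	return nil
}

// nextAction reads the cache and decides which step is needed next.
func nextAction(cache map[string]Tier, want Tier) action {
	got, ok := cache[want.Name]
	switch {
	case !ok:
		return actionCreate
	case got.Priority != want.Priority:
		return actionUpdate
	default:
		return actionNone
	}
}

// ensureTier makes at most maxCalls API calls and re-checks the cache before
// each one. It returns the number of API calls made.
func ensureTier(c *fakeCluster, want Tier, maxCalls int) (int, error) {
	var lastErr error
	for attempt := 1; ; attempt++ {
		a := nextAction(c.cache, want)
		if a == actionNone {
			return attempt - 1, nil
		}
		if attempt > maxCalls {
			return maxCalls, fmt.Errorf("giving up after %d API calls: %v", maxCalls, lastErr)
		}
		switch a {
		case actionCreate:
			// No special handling for AlreadyExists: any error leads to a re-check.
			lastErr = c.create(want)
		case actionUpdate:
			lastErr = c.update(want)
		}
		c.sync()
	}
}

func main() {
	c := &fakeCluster{
		store: map[string]Tier{"baseline": {Name: "baseline", Priority: 253}},
		cache: map[string]Tier{}, // the informer has not listed anything yet
	}
	calls, err := ensureTier(c, Tier{Name: "baseline", Priority: 253}, 5)
	fmt.Println("API calls:", calls, "error:", err)
}
```

In the scenario in `main`, the first create is denied because the Tier already exists on the server, and the next cache check finds it and stops, instead of retrying the same create indefinitely.
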
@mike-carrigan24
Author

I've seen this twice: once when upgrading from Antrea v1.13.0, and once when removing flannel and installing Antrea v1.13.1.

I am not running a commercial version of Antrea or using the Antrea OpenShift operator.

Deleting the antrea-controller Pod and letting it restart did resolve this issue!

Thanks!

@antoninbas
Contributor

FYI, I have been working on a PR which will hopefully avoid this situation. The patch should be included in the next release (v2.2). Antrea v1.13 is quite old and no longer maintained.

@mike-carrigan24
Author

Thanks @antoninbas,

To confirm: I see in the versioning doc that "Antrea maintains release branches for the two most recent minor releases". Does this mean that the currently supported versions of Antrea are v1.15, v2.0, and v2.1? And will v1.15 support then be dropped when v2.2 is released?

@antoninbas
Contributor

@mike-carrigan24 your understanding is correct. In this specific case, I can probably backport #6696 to 1.15, 2.0 and 2.1, even though I have no strong evidence that it will resolve the issue you encountered (I think it will).

antoninbas added a commit to antoninbas/antrea that referenced this issue Oct 8, 2024
System Tier initialization should ideally wait for the Tier informer's
cache to sync before using the lister.  Otherwise we may think that the
system Tiers have not been created yet when in fact it is just the
informer that has not completed the initial List operation.

Additionally, we improve the logic in system Tier initialization:
* After every failed API call, we should probably check the state of the
  informer cache again (via the lister) and decide which step is needed
  next (creation / update / nothing). In case of a failed create or
  update, this gives us the opportunity to check again if the Tier
  actually exists, and if yes, what its priority is.
* AlreadyExists is no longer treated differently for create. The reason
  is that if the Tier already exists and for some reason it was not
  returned by the Lister, it will probably be the validation webhook
  that fails (overlapping priority), and the error won't match
  AlreadyExists.

Related to antrea-io#6694

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>

Don't block apiserver when caches are not synced

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
@antoninbas
Contributor

Closing this. I am fairly confident that #6696 will fix this. The patch will be included in v2.2, and has been backported to v2.1, v2.0, and v1.15.
