failed to delete a NetworkServiceEndpoints #467

Closed
dezenxi opened this issue Jul 30, 2024 · 3 comments · Fixed by networkservicemesh/sdk-k8s#529


dezenxi commented Jul 30, 2024

Hi,
I hit a strange issue during a test: registry-k8s keeps printing many messages like the following:

Jul 26 06:52:15.269 [ERRO] [type:registry] Error returned from sdk-k8s/pkg/registry/etcd/etcdNSERegistryServer.Unregister: failed to delete a NetworkServiceEndpoints vpn2-a-qt69r in a namespace nsm: Operation cannot be fulfilled on NetworkServiceEndpoint.networkservicemesh.io "vpn2-a-qt69r": the ResourceVersion in the precondition (307410884) does not match the ResourceVersion in record (307411402). The object might have been modified span=63e29ca0cf14dbb0

Jul 26 06:52:18.572 [ERRO] [type:registry] Error returned from sdk-k8s/pkg/registry/etcd/etcdNSERegistryServer.Unregister: failed to delete a NetworkServiceEndpoints vpn1-b-48hkb in a namespace nsm: Operation cannot be fulfilled on NetworkServiceEndpoint.networkservicemesh.io "vpn1-b-48hkb": the ResourceVersion in the precondition (307410941) does not match the ResourceVersion in record (307411475). The object might have been modified span=5a77069c230b8bc0

Jul 26 06:52:19.087 [ERRO] [type:registry] Error returned from sdk-k8s/pkg/registry/etcd/etcdNSERegistryServer.Unregister: failed to delete a NetworkServiceEndpoints forwarder-vpp-qk4xn in a namespace nsm: Operation cannot be fulfilled on NetworkServiceEndpoint.networkservicemesh.io "forwarder-vpp-qk4xn": the ResourceVersion in the precondition (307410952) does not match the ResourceVersion in record (307411485). The object might have been modified span=d27f4caf2aad4ff3

These messages were printed every few seconds
until I restarted the cmd-forwarder-vpp and cmd-nsmgr pods.
Could you help clarify what could have happened with registry-k8s? (@denis-tingaikin)

Regards,
Duong


dezenxi commented Aug 3, 2024

Hi,
It seems this issue is related to spire-server, when it cannot renew the expired token/NSE.
I can reproduce the issue by scaling spire-server down to 0.

Regards,
Duong

NikitaSkrynnik moved this to In Progress in Release v1.14.0 on Aug 9, 2024
denis-tingaikin moved this from In Progress to Blocked in Release v1.14.0 on Aug 19, 2024

dezenxi commented Aug 22, 2024

Hi,
I managed to reproduce the issue by scaling out the number of registry-k8s pods. The issue occurred immediately afterwards:

Aug 22 04:55:14.205 [ERRO] [type:registry] (12.2) failed to delete a NetworkServiceEndpoints forwarder-vpp-nfhmr in a namespace nsm-system: Operation cannot be fulfilled on NetworkServiceEndpoint.networkservicemesh.io "forwarder-vpp-nfhmr": the ResourceVersion in the precondition (16308) does not match the ResourceVersion in record (16392). The object might have been modified
Aug 22 04:55:14.205 [ERRO] [type:registry] (11.2) failed to delete a NetworkServiceEndpoints forwarder-vpp-nfhmr in a namespace nsm-system: Operation cannot be fulfilled on NetworkServiceEndpoint.networkservicemesh.io "forwarder-vpp-nfhmr": the ResourceVersion in the precondition (16308) does not match the ResourceVersion in record (16392). The object might have been modified
Aug 22 04:55:14.205 [ERRO] [type:registry] (10.3) failed to delete a NetworkServiceEndpoints forwarder-vpp-nfhmr in a namespace nsm-system: Operation cannot be fulfilled on NetworkServiceEndpoint.networkservicemesh.io "forwarder-vpp-nfhmr": the ResourceVersion in the precondition (16308) does not match the ResourceVersion in record (16392). The object might have been modified

The issue does not occur when there is only one registry-k8s pod.

k scale deployment/registry-k8s --replicas=2
registry-k8s-84d576d58c-ln67p 1/1 Running 0 2m40s
registry-k8s-84d576d58c-zhkmx 1/1 Running 0 18m

It could be that the two registry-k8s pods are monitoring the same NSEs, which then causes conflicts when both act on the same object.
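To illustrate the suspected conflict, here is a minimal stdlib-only Go sketch of a delete guarded by a resource-version precondition. It is not sdk-k8s code: `Store`, `Update`, `Delete`, `IsConflict`, and the two sentinel errors are hypothetical names that stand in for the API server and its conflict response. It assumes one replica holds a cached version while another replica writes to the same NSE.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrConflict stands in for the conflict returned when a resource-version
// precondition does not match the stored object (hypothetical sentinel).
var ErrConflict = errors.New("resourceVersion precondition does not match")

// ErrNotFound is returned when the object no longer exists (hypothetical).
var ErrNotFound = errors.New("object not found")

// Store is an in-memory stand-in for the API server. It maps an NSE name
// to its current resource version.
type Store struct {
	mu       sync.Mutex
	versions map[string]int
}

// NewStore returns an empty store.
func NewStore() *Store { return &Store{versions: map[string]int{}} }

// Update creates the object or bumps its version, as any write would,
// and returns the new version.
func (s *Store) Update(name string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.versions[name]++
	return s.versions[name]
}

// Delete removes the object only if the caller's version is still current.
func (s *Store) Delete(name string, precondition int) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	current, ok := s.versions[name]
	if !ok {
		return ErrNotFound
	}
	if current != precondition {
		return fmt.Errorf("%w: precondition (%d), record (%d)", ErrConflict, precondition, current)
	}
	delete(s.versions, name)
	return nil
}

// IsConflict reports whether err wraps ErrConflict.
func IsConflict(err error) bool { return errors.Is(err, ErrConflict) }

func main() {
	s := NewStore()
	cached := s.Update("forwarder-vpp-nfhmr") // replica A caches this version
	s.Update("forwarder-vpp-nfhmr")           // replica B refreshes the same NSE
	err := s.Delete("forwarder-vpp-nfhmr", cached)
	fmt.Println(IsConflict(err), err) // replica A's delete is rejected as stale
}
```

With a single writer the cached version always matches, which would be consistent with the issue not showing up when only one registry-k8s pod is running.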

Regards,
Duong

@bszirtes

Hello,
With the new v1.14.2-rc.2, the change to the WARN log level seems to be working:

...
2024/10/24 08:09:58 [INFO]  [expireNSEServer:Register] selected expiration time 2024-10-24 08:10:58.812753445 +0000 UTC for forwarder-vpp-mwqtb
2024/10/24 08:09:59 [DEBUG] [metadata:nse server] [nse name:nse-kernel-ipv4-65948769b-lzzxs] metadata deleted
2024/10/24 08:09:59 [WARN]  failed to delete a NetworkServiceEndpoints nse-kernel-ipv4-65948769b-lzzxs in a namespace nsm-system, cause: Operation cannot be fulfilled on NetworkServiceEndpoint.networkservicemesh.io "nse-kernel-ipv4-65948769b-lzzxs": the ResourceVersion in the precondition (2461) does not match the ResourceVersion in record (2548). The object might have been modified
...
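As a rough illustration of what a log-level downgrade like this could look like, the Go sketch below picks a level tag from the kind of error returned by a delete. This is an assumption-based sketch, not the change merged in networkservicemesh/sdk-k8s#529: `unregisterLevel`, `deleteError`, `errStaleVersion`, and `errUnavailable` are hypothetical names. With client-go, the conflict check would typically be `apierrors.IsConflict(err)` from `k8s.io/apimachinery/pkg/api/errors` rather than a local sentinel.

```go
package main

import (
	"errors"
	"fmt"
)

// errStaleVersion stands in for the API server's conflict error (hypothetical).
var errStaleVersion = errors.New("the ResourceVersion in the precondition does not match the ResourceVersion in record")

// errUnavailable stands in for any other failure (hypothetical).
var errUnavailable = errors.New("registry unavailable")

// deleteError wraps a cause in a message shaped like the ones in the logs above.
func deleteError(name string, cause error) error {
	return fmt.Errorf("failed to delete a NetworkServiceEndpoints %s, cause: %w", name, cause)
}

// unregisterLevel chooses a log level tag for the result of a delete.
// A stale-version conflict is treated as expected noise and logged as WARN,
// while every other failure stays at ERRO.
func unregisterLevel(err error) string {
	switch {
	case err == nil:
		return "INFO"
	case errors.Is(err, errStaleVersion):
		return "WARN"
	default:
		return "ERRO"
	}
}

func main() {
	for _, cause := range []error{errStaleVersion, errUnavailable} {
		err := deleteError("nse-kernel-ipv4", cause)
		fmt.Printf("[%s] %v\n", unregisterLevel(err), err)
	}
}
```

Keeping other failures at the error level means that a genuine delete failure would still stand out in the registry logs.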
