Race condition in client exclude prefixes #1674

Closed
Ex4amp1e opened this issue Oct 1, 2024 · 3 comments

@Ex4amp1e
Contributor

Ex4amp1e commented Oct 1, 2024

Expected Behavior

All clients should connect to the NSE.

Current Behavior

After endpoint healing, there is a case where one of the clients cannot get an IP address.

Steps to Reproduce

  1. Restart the endpoint several times.
  2. Check ifconfig and the client logs. At some point one client gets its IP normally after healing, but the other has its expected IP in its excluded prefixes list, so it gets the IP and immediately has it removed again, in a cycle.

Context

Setup 2 clients and 1 endpoint

This is probably not required, but the issue was reproduced with custom endpoint envs:

        - name: NSM_IPAM_POLICY
          value: strict
        - name: NSM_CIDR_PREFIX
          value: 172.16.1.100/27,2001:db8::/116
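
As a side note on the values above: `172.16.1.100/27` has host bits set, so it denotes the network `172.16.1.96/27`. The following is a small illustrative check using Python's standard `ipaddress` module (it is not NSM code), showing the size of each configured prefix and that the addresses from the failure logs fall inside them:

```python
import ipaddress

# Prefixes from NSM_CIDR_PREFIX. strict=False tolerates the host bits
# that are set in "172.16.1.100/27".
v4 = ipaddress.ip_network("172.16.1.100/27", strict=False)
v6 = ipaddress.ip_network("2001:db8::/116")

print(v4)                # 172.16.1.96/27
print(v4.num_addresses)  # 32
print(v6.num_addresses)  # 4096

# Addresses that appear in the broken NSC log
for ip in ("172.16.1.98", "172.16.1.99"):
    print(ip, ipaddress.ip_address(ip) in v4)
for ip in ("2001:db8::2", "2001:db8::3"):
    print(ip, ipaddress.ip_address(ip) in v6)
```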

Failure Logs

Broken NSC:

Oct  1 10:08:26.868 [ERRO] [ExcludedPrefixesClient:Request] [cmd:[/bin/app]] Source or destination IPs are overlapping with excluded prefixes, srcIPs: [172.16.1.99/32 2001:db8::3/128], dstIPs: [172.16.1.98/32 2001:db8::2/128], excluded prefixes: [172.16.1.99/32 2001:db8::3/128 172.16.1.98/32 2001:db8::2/128], error: IP 172.16.1.99 is excluded, but it was found in response IPs
Oct  1 10:08:26.878 [ERRO] [cmd:[/bin/app]] policy failed: policies/common/tokens_expired.rego
Oct  1 10:08:26.882 [ERRO] [cmd:[/bin/app]] policy failed: policies/common/tokens_expired.rego
Oct  1 10:08:26.883 [WARN] [cmd:[/bin/app]] Environment variable NODE_NAME is not set. Skipping.
Oct  1 10:08:26.883 [WARN] [cmd:[/bin/app]] The label podName was already assigned to alpine-2. Skipping.
Oct  1 10:08:26.883 [WARN] [cmd:[/bin/app]] Environment variable CLUSTER_NAME is not set. Skipping.
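
To make the first error line easier to read, here is a minimal sketch of the kind of overlap check it describes. It is written in Python for illustration only and is not the actual ExcludedPrefixesClient implementation; the function name `find_excluded_overlap` is hypothetical. The inputs are copied from the log line above.

```python
import ipaddress

def find_excluded_overlap(response_ips, excluded_prefixes):
    """Return the first response IP that falls inside an excluded prefix, or None."""
    excluded = [ipaddress.ip_network(p, strict=False) for p in excluded_prefixes]
    for cidr in response_ips:
        addr = ipaddress.ip_interface(cidr).ip
        for net in excluded:
            # Compare only addresses and networks of the same IP version.
            if addr.version == net.version and addr in net:
                return str(addr)
    return None

# Values taken from the broken NSC log
src_ips = ["172.16.1.99/32", "2001:db8::3/128"]
dst_ips = ["172.16.1.98/32", "2001:db8::2/128"]
excluded = ["172.16.1.99/32", "2001:db8::3/128", "172.16.1.98/32", "2001:db8::2/128"]

print(find_excluded_overlap(src_ips + dst_ips, excluded))  # 172.16.1.99
```

With these inputs the client's own source address is already in its excluded list, which matches the "IP 172.16.1.99 is excluded, but it was found in response IPs" message.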

Cluster dump: dump-policy.zip

@Ex4amp1e Ex4amp1e self-assigned this Oct 1, 2024
@Ex4amp1e
Contributor Author

Ex4amp1e commented Oct 1, 2024

Plan:

  1. Write unit test to reproduce the issue
  2. Provide fix
  3. Check other tests

@denis-tingaikin
Member

@Ex4amp1e
Contributor Author

Ex4amp1e commented Oct 7, 2024

The previous plan has been carried out:

  • Write unit test to reproduce the issue - covered excluded prefixes from the response with strict IPAM
  • Provide fix
  • Check other tests - found new bugs; investigating and debugging

TODO:

  • Cherry-pick and test the PR that avoids recovering IPs from 2 different clients with the same IP context - 0.5d
  • Replace strict IPAM, add unit tests, and check other tests - 1.5d (possibly updating old tests)
  • Update integration test - use 2 clients here - 2h
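
To illustrate the first TODO item, here is a toy model of the intended behavior. It is an assumption about the idea only, not the NSM IPAM code: `ToyIPAM` and its `request` method are hypothetical names. A client may recover its previous address only if no other client currently owns it; otherwise it receives a fresh one.

```python
import ipaddress

class ToyIPAM:
    """Hypothetical pool that tracks which client owns each address."""

    def __init__(self, cidr):
        self.pool = ipaddress.ip_network(cidr, strict=False)
        self.owner = {}  # address -> client id

    def request(self, client, previous=None):
        # Restore the previous address only if it is inside the pool
        # and is not already owned by a different client.
        if previous is not None:
            addr = ipaddress.ip_address(previous)
            if addr in self.pool and self.owner.get(addr, client) == client:
                self.owner[addr] = client
                return str(addr)
        # Otherwise hand out the first free host address.
        for addr in self.pool.hosts():
            if addr not in self.owner:
                self.owner[addr] = client
                return str(addr)
        raise RuntimeError("pool exhausted")

ipam = ToyIPAM("172.16.1.100/27")
# Both clients present the same previous IP after the endpoint restarts.
first = ipam.request("nsc-1", previous="172.16.1.99")
second = ipam.request("nsc-2", previous="172.16.1.99")
print(first, second)  # 172.16.1.99 172.16.1.97
```

In this sketch the second client does not get the contested address back, so the two clients never end up sharing one IP context.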

@denis-tingaikin PTAL

@NikitaSkrynnik NikitaSkrynnik moved this from In Progress to Under Review in Release v1.14.1 Oct 10, 2024
@Ex4amp1e Ex4amp1e moved this from Under Review to Done in Release v1.14.1 Oct 17, 2024
@Ex4amp1e Ex4amp1e closed this as completed by moving to Done in Release v1.14.1 Oct 17, 2024