
Invalid Comparison level when refreshing applications #21839

Closed
1 of 3 tasks
rumstead opened this issue Feb 10, 2025 · 12 comments
Labels
bug Something isn't working


@rumstead
Member

rumstead commented Feb 10, 2025

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug
It looks like a race condition, or some other code path, can trigger an invalid comparison level when refreshing applications.

time="2025-02-10T20:33:00Z" level=info msg="Refreshing app status (controller refresh requested), level (831479569432)" app-namespace=namespace app-qualified-name=namespace/app-name application=app-name project=project

time="2025-02-10T20:33:01Z" level=info msg="Reconciliation completed" app-namespace=namespace app-qualified-name=namespace/app-name application=app-name project=project comparison-level=831479569432 dedup_ms=1 dest-name=cluster dest-namespace=namespace dest-server="cluster.com:443" diff_ms=201 git_ms=23 health_ms=14 live_ms=412 patch_ms=154 project=service-mesh setop_ms=0 settings_ms=0 sync_ms=1 time_ms=1085

To Reproduce

Expected behavior

Screenshots

Version

argocd-server: v2.12.4+27d1e64
  BuildDate: 2024-09-26T06:36:13Z
  GitCommit: 27d1e641b6ea99d9f4bf788c032aeaeefd782910
  GitTreeState: clean
  GoVersion: go1.22.4
  Compiler: gc
  Platform: linux/amd64
  Kustomize Version: v5.4.2 2024-05-22T15:20:33Z
  Helm Version: v3.15.2+g1a500d5
  Kubectl Version: v0.29.6
  Jsonnet Version: v0.20.0

Logs

@rumstead rumstead added the bug Something isn't working label Feb 10, 2025
@gdsoumya
Member

Is this reproducible, or does it happen randomly? If it's reproducible, can you provide steps to reproduce it?

@rumstead
Member Author

I can’t reproduce it, but it happens a few times a day in our instance. I haven’t been able to attribute it to any specific issue, but it feels like a race condition.

@gdsoumya
Member

gdsoumya commented Feb 22, 2025

I searched the codebase for places where we use CompareWith. In most places we just set a static value, and the code that stores it is wrapped in a mutex lock. I suspect two places where something might be going wrong:

  1. ctrl.refreshRequestedApps[key] = compareWith.Max(ctrl.refreshRequestedApps[key])
  2. compareWith, err := strconv.Atoi(parts[2])

1 reads ctrl.refreshRequestedApps[key] without checking whether the key exists; that should generally default to 0, but maybe it somehow reads a stale memory value/address.
2 converts a string to an int, which could have issues.
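For reference on the first suspect: in Go, indexing a map with a key that was never set returns the element type's zero value, so that lookup cannot surface stale memory. The sketch below is a simplified model, not the controller's actual code: `refreshTracker` is a hypothetical type standing in for the mutex-guarded `refreshRequestedApps` map, and `CompareWith` with its `Max` method is a stand-in whose semantics (keep the larger level) are assumed.

```go
package main

import (
	"fmt"
	"sync"
)

// CompareWith is a hypothetical stand-in for the controller's comparison-level type.
type CompareWith int

// Max returns the larger of the two levels (assumed semantics).
func (a CompareWith) Max(b CompareWith) CompareWith {
	if a > b {
		return a
	}
	return b
}

// refreshTracker is a hypothetical model of the mutex-guarded refreshRequestedApps map.
type refreshTracker struct {
	mu                   sync.Mutex
	refreshRequestedApps map[string]CompareWith
}

func newRefreshTracker() *refreshTracker {
	return &refreshTracker{refreshRequestedApps: map[string]CompareWith{}}
}

// request records a refresh, keeping the highest level requested so far.
// Indexing a missing key yields CompareWith(0), never a leftover value.
func (t *refreshTracker) request(key string, level CompareWith) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.refreshRequestedApps[key] = level.Max(t.refreshRequestedApps[key])
}

// get returns the stored level and whether the key was present.
func (t *refreshTracker) get(key string) (CompareWith, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	lvl, ok := t.refreshRequestedApps[key]
	return lvl, ok
}

func main() {
	tr := newRefreshTracker()
	lvl, ok := tr.get("namespace/app-name")
	fmt.Println(lvl, ok) // 0 false

	tr.request("namespace/app-name", 2)
	tr.request("namespace/app-name", 1)
	lvl, ok = tr.get("namespace/app-name")
	fmt.Println(lvl, ok) // 2 true
}
```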

But going through the logic and the call sites, I couldn't see where this invalid value might be stored.

@rumstead
Member Author

I underestimated how frequently we see this. I see it thousands of times per day with 14k applications where level > 3.

@rumstead
Member Author

I take back my race condition comment; my guess is that it's printing the memory address of a CompareWith pointer.

@gdsoumya
Member

gdsoumya commented Feb 23, 2025

But the print statement doesn't receive a pointer; compareWith is the actual value:

logCtx.Infof("Refreshing app status (%s), level (%d)", reason, compareWith)

@gdsoumya
Member

> I underestimated how frequently we see this. I see it thousands of times per day with 14k applications where level > 3.

Could you query the logs to see which level values appear, specifically whether you see the values 0, 1, 2, and 3, or whether one of these four never shows up? The invalid level value might be related to a specific case, so narrowing it down to a specific value or set of values would make it easier to reproduce.
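One way to answer this is to tally the `level (N)` values in the controller's "Refreshing app status" messages. This is a minimal sketch, assuming the log format quoted in this issue and that valid levels are 0 through 3; `levelRe` and `tallyLevels` are hypothetical names. Pipe the application controller logs into stdin to run it.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strconv"
)

// levelRe matches the "level (N)" fragment of the "Refreshing app status" message
// in the format shown in this issue's logs (assumed to be stable).
var levelRe = regexp.MustCompile(`Refreshing app status \([^)]*\), level \((\d+)\)`)

// tallyLevels counts how often each comparison level appears in the given lines.
func tallyLevels(lines []string) map[int64]int {
	counts := map[int64]int{}
	for _, line := range lines {
		m := levelRe.FindStringSubmatch(line)
		if m == nil {
			continue // not a refresh message
		}
		lvl, err := strconv.ParseInt(m[1], 10, 64)
		if err != nil {
			continue // value too large to parse; skip it
		}
		counts[lvl]++
	}
	return counts
}

func main() {
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // allow long log lines
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for lvl, n := range tallyLevels(lines) {
		fmt.Printf("level=%d count=%d valid=%t\n", lvl, n, lvl <= 3)
	}
}
```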

@rumstead
Member Author


There are only ever two log lines with the value:

time="2025-02-23T12:15:33Z" level=info msg="Refreshing app status (controller refresh requested), level (824640216104)" app-namespace=namespace app-qualified-name=namespace/app-name application=app-name project=project

time="2025-02-23T12:15:33Z" level=info msg="Reconciliation completed" app-namespace=namespace app-qualified-name=namespace/app-name application=app-name project=project comparison-level=824640216104 dedup_ms=0 dest-name=blkdmz-aks-api-musw2 dest-namespace=target-name dest-server="target server" diff_ms=64 git_ms=11 health_ms=0 live_ms=1 patch_ms=124 project=project setop_ms=70 settings_ms=0 sync_ms=0 time_ms=327

@rumstead
Member Author

I wonder if these are decimal representations of a hex memory address.

Decimal: 824655384872, Hexadecimal: 0xc0014a9128
Decimal: 824655962712, Hexadecimal: 0xc001536258
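The conversion can be checked with a short Go program. Both values above, and the level 824640216104 from the earlier log line, begin with 0xc0 in hexadecimal, which is consistent with Go heap addresses on linux/amd64 (the Go runtime typically places its heap arenas starting at 0xc000000000). `toHex` is a hypothetical helper.

```go
package main

import "fmt"

// toHex renders a logged "level" as a 0x-prefixed hexadecimal string.
func toHex(level uint64) string {
	return fmt.Sprintf("%#x", level)
}

func main() {
	for _, lvl := range []uint64{824655384872, 824655962712, 824640216104} {
		fmt.Println(lvl, toHex(lvl))
	}
	// 824655384872 0xc0014a9128
	// 824655962712 0xc001536258
	// 824640216104 0xc000631c28
}
```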

@gdsoumya
Member

I also think a pointer is somehow getting stored, since the invalid numbers all correspond to memory addresses. We just need to find out where it's being stored.

@gdsoumya
Member

Oh wait, I was looking at the master code. In the release-2.12 branch there is indeed a bug, which is fixed in master:

master:

ctrl.appComparisonTypeRefreshQueue.AddAfter(fmt.Sprintf("%s/%d", key, *compareWith), *after)

release-2.12:

ctrl.appComparisonTypeRefreshQueue.AddAfter(fmt.Sprintf("%s/%d", key, compareWith), *after)

This was already fixed a month ago in f548fd7 🤦‍♂.
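The failure mode can be reproduced in isolation. The sketch below uses hypothetical `buildKey` and `parseLevel` helpers that mimic the queue-key round trip discussed in this thread: the key is formatted as `key/level`, and the consumer later splits it and calls `strconv.Atoi` on `parts[2]` (assuming the application key has the form `namespace/app-name`). Because fmt's `%d` verb also accepts pointers and formats the address as an integer, passing `compareWith` instead of `*compareWith` yields an address-sized "level" rather than a formatting error. `CompareWith` is a stand-in type.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// CompareWith is a hypothetical stand-in for the controller's comparison-level type.
type CompareWith int

// buildKey formats a queue key as "namespace/app-name/level".
// buggy == true reproduces the release-2.12 mistake of passing the pointer itself.
func buildKey(key string, compareWith *CompareWith, buggy bool) string {
	if buggy {
		// %d on a pointer prints the address as a decimal integer.
		return fmt.Sprintf("%s/%d", key, compareWith)
	}
	return fmt.Sprintf("%s/%d", key, *compareWith)
}

// parseLevel splits the queue key and converts the third part to an int.
func parseLevel(queueKey string) (int, error) {
	parts := strings.Split(queueKey, "/")
	if len(parts) != 3 {
		return 0, fmt.Errorf("unexpected key %q", queueKey)
	}
	return strconv.Atoi(parts[2])
}

func main() {
	level := CompareWith(2)

	fixed := buildKey("namespace/app-name", &level, false)
	fmt.Println(fixed) // namespace/app-name/2

	broken := buildKey("namespace/app-name", &level, true)
	fmt.Println(broken) // ends with the pointer's address in decimal
	if n, err := parseLevel(broken); err == nil {
		fmt.Println("parsed level:", n)
	}
}
```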

I'll cherry-pick it into the release branches.

@gdsoumya
Member

Cherry-picked into the release branches; this should be fixed in the next releases. Closing this issue as resolved, @rumstead.


2 participants