Verify the memory footprint of KIC 2.0 wrt 1.x #1465
Comments
We use less. End results with 10k each of Ingresses, Services, and KongConsumers are below. Not really sure what's going on with the proxy CPU/RAM consumption. A bit more detail is in test.tar.gz.

2.x as of current next:

1.3:

Minikube on my laptop, which has a 3.1GHz i5.
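For reference, a rough sketch of how this scale of test objects could be generated. This is not the exact harness in test.tar.gz; the names, namespace, and ports are made up, and the output is meant to be applied with kubectl.

```python
# Hypothetical generator for 10k each of Service, Ingress, and KongConsumer.
# Everything here (names, namespace, ports) is illustrative.
COUNT = 10_000
NAMESPACE = "memtest"

def service(i: int) -> str:
    return f"""apiVersion: v1
kind: Service
metadata:
  name: svc-{i}
  namespace: {NAMESPACE}
spec:
  selector:
    app: echo
  ports:
  - port: 80
    targetPort: 8080
"""

def ingress(i: int) -> str:
    return f"""apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ing-{i}
  namespace: {NAMESPACE}
  annotations:
    kubernetes.io/ingress.class: kong
spec:
  rules:
  - http:
      paths:
      - path: /route-{i}
        pathType: Prefix
        backend:
          service:
            name: svc-{i}
            port:
              number: 80
"""

def consumer(i: int) -> str:
    return f"""apiVersion: configuration.konghq.com/v1
kind: KongConsumer
metadata:
  name: consumer-{i}
  namespace: {NAMESPACE}
  annotations:
    kubernetes.io/ingress.class: kong
username: user-{i}
"""

with open("manifests.yaml", "w") as f:
    for i in range(COUNT):
        for doc in (service(i), ingress(i), consumer(i)):
            f.write(doc)
            f.write("---\n")

# Apply with: kubectl create namespace memtest && kubectl apply -f manifests.yaml
```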
Thanks for digging into that @rainest 👍 I was expecting a significant improvement in CPU/memory utilization with KIC 2.0, so no surprise there, and I'm happy to see the gains were so high. However, the enormous difference in the proxy caught me off guard: I feel like we should try to account for that difference, since we only changed how we use upstream, not upstream itself?
Possibly because there was no config in Kong when I initially checked :) It looks like 2.x is stuck adding finalizers before generating config. It's been in this state for quite some time:
It looks like it may have finished shortly after (unclear; logs appeared to still have events, but it looks like it pushed a non-empty config) and then sent a config that overflowed Kong's cache size!
Umm, can we fix the cache size and observe the footprint once things have settled? This raises another question: after creating this many resources, how long does it take for 1.x vs 2.x to reach a steady state again?
Added plugins (one per consumer); not much difference. Proxy usage remains larger on 1.3, but there's no clear reason why.

Confirmed that we had at least one successful config POST. This runs up against the practical limits of DB-less mode (at least on my machine), and this much config is prone to issues with timer exhaustion and/or the NGINX process getting killed out of the blue by a kworker (I do not know why: I am not imposing limits, am not out of memory, and do not see obvious explanations in minikube service, kernel, Docker, or Pod event logs). The proxy container remains around long enough to receive a non-empty config and report status. I can get services to confirm they're not empty, but not all at once because of pagination (see the sketch at the end of this comment).

Whatever is using memory isn't accounted for in status, which is only reporting 100s of MB versus the GBs in use. If we want to dig into that more, I can try to retrieve and analyze core files, but I don't know that it's worth the effort, especially since usage is less on 2.x.
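For completeness, a minimal sketch of the pagination loop needed to count services via the Admin API. It assumes the Admin API is port-forwarded to localhost:8001 and uses the requests library; adjust the URL for your setup.

```python
import requests

# Walk the Kong Admin API's paginated /services endpoint and count entities.
ADMIN_URL = "http://localhost:8001"

def count_services() -> int:
    total = 0
    next_page = "/services?size=1000"
    while next_page:
        resp = requests.get(ADMIN_URL + next_page)
        resp.raise_for_status()
        body = resp.json()
        total += len(body["data"])
        # Kong returns the path of the next page, or null when done.
        next_page = body.get("next")
    return total

if __name__ == "__main__":
    print(f"services in Kong: {count_services()}")
```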
Well, apparently not: I have the cache size set to 4GB and the instability happens regardless. DB-less 🤷

All that said, I'm not too concerned with this from the controller memory usage perspective. The controller shouldn't care about proxy memory usage or instability, as its own memory usage should be largely:
The controller builds all of these regardless of whether it can send them to Kong. We don't expect memory usage to vary based on successful completion of the config POSTs, do we? I suppose the ephemeral structures may stick around for longer, but those are (a) apparently not the bulk of usage (monitoring is kinda limited without Prometheus, but a basic "top containers" loop doesn't show much fluctuation during updates) and (b) shared between 1.x and 2.x (they both use the same parser and Kong config blob generation code).

Non-memory concerns appear more pressing: it looks like 2.x has a flaw in its config generation/hash comparison logic, as it's sending config updates without K8S resource changes (see the sketch below for the kind of check I'd expect to prevent that). I've retrieved some pcaps to try and pull config details from them.
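To be clear about what I mean by hash comparison: this is not the controller's actual implementation, just a minimal sketch of content-hash change detection, where the generated config is normalized, hashed, and only POSTed when the hash differs from the last applied one. All names here are made up.

```python
import hashlib
import json

# Sketch of content-hash change detection for a declarative config blob.
# canonical_json() must be deterministic, or identical configs will produce
# different hashes and trigger spurious updates -- which is the suspected bug.
def canonical_json(config: dict) -> str:
    # sort_keys makes dict key order irrelevant; list ordering inside the
    # config must already be stable for this to work.
    return json.dumps(config, sort_keys=True, separators=(",", ":"))

def config_hash(config: dict) -> str:
    return hashlib.sha256(canonical_json(config).encode()).hexdigest()

class ConfigPoster:
    def __init__(self, post_fn):
        self._post = post_fn      # callable that sends the config to Kong
        self._last_hash = None

    def maybe_post(self, config: dict) -> bool:
        """POST the config only if it differs from the last applied one."""
        h = config_hash(config)
        if h == self._last_hash:
            return False          # no change, skip the update
        self._post(config)
        self._last_hash = h
        return True
```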
Alright, the conclusion is that trying to gather pcaps with this large a config is futile. Things are getting cut off for some reason. Smaller configs would probably demonstrate the same issue, but they're a pain to collect, so moving that to #1519.
To clarify, at this point I'm not concerned about the ingress-controller container footprints, which is the scope of the issue at hand, so feel free to close this one. I do want to understand why 1.3 and 2.0 result in such a large difference in Kong's footprint (229 vs 646).
Agreed that the difference in memory could indicate some other issue (something incorrect in configuration). Direct comparison of the JSON blobs for #1519 should indicate what the differences are and whether they're problematic (a sketch of how that comparison could be normalized follows). If there's nothing obvious there, we may need to engage the core team for better memory analysis tools, since the bulk of the memory appears to be allocated in a nether zone that standard tools don't report on, and it further appears to fluctuate considerably (although 2.x appears to result in consistently less proxy usage, IIRC I observed that both 1.x and 2.x usage per Pod could vary by up to a GB across runs with the same controller version).
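A minimal sketch of the kind of normalization I have in mind for comparing two declarative config dumps, assuming both have been saved as JSON files. The volatile fields stripped here (IDs, timestamps) are illustrative and may need adjusting for the real blobs.

```python
import json
import sys

# Normalize two declarative config dumps and report whether they differ.
# Volatile fields are stripped and lists are sorted so that only meaningful
# differences remain; adjust VOLATILE for the real blobs.
VOLATILE = {"id", "created_at", "updated_at"}

def normalize(node):
    if isinstance(node, dict):
        return {k: normalize(v) for k, v in sorted(node.items()) if k not in VOLATILE}
    if isinstance(node, list):
        # Sort by a stable JSON rendering so list order differences don't count.
        return sorted((normalize(v) for v in node),
                      key=lambda v: json.dumps(v, sort_keys=True))
    return node

if __name__ == "__main__":
    a, b = (json.load(open(p)) for p in sys.argv[1:3])
    if normalize(a) == normalize(b):
        print("configs are equivalent after normalization")
    else:
        print("configs differ; dump the normalized forms and diff them for details")
```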
As @hbagdi stated, there is a hypothesis that KIC 2.0 can have different memory usage characteristics, which could potentially break upgrading users whose container limits are tuned to 1.x.
Acceptance criteria: