SNMP: snmpwalk is slow and can timeout #404
This is typically best observed by running clixon-snmp with the IFMIB walk in a profiler.
One could add an SNMP-specific option to skip deprecated YANG nodes.
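A minimal sketch of how such an option might filter the walk, assuming a hypothetical flattened node list (`struct mib_node`, `enum yang_status` and `select_walk_nodes()` are illustrative names, not Clixon's real YANG structures). Deprecated and obsolete nodes, such as ifTestTable in IF-MIB, are dropped before any backend request is made.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the YANG "status" statement. */
enum yang_status { STATUS_CURRENT, STATUS_DEPRECATED, STATUS_OBSOLETE };

/* Hypothetical flattened node: one entry per MIB object the walk may visit. */
struct mib_node {
    const char *name;
    enum yang_status status;
};

/* Copy pointers to the nodes the walk should visit into out[], keeping
 * their order. When skip_deprecated is true, every node whose status is
 * not "current" is skipped. Returns the number of selected nodes. */
size_t select_walk_nodes(const struct mib_node *nodes, size_t n,
                         bool skip_deprecated,
                         const struct mib_node **out)
{
    assert((nodes != NULL && out != NULL) || n == 0);
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (skip_deprecated && nodes[i].status != STATUS_CURRENT)
            continue; /* never handed to the backend */
        out[k++] = &nodes[i];
    }
    return k;
}
```

Making it an option rather than the default keeps deprecated objects reachable for managers that still poll them.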
I'm currently trying to reproduce this issue on a smaller setup; I'll update with more details if needed when done.
I extended the clixon regression test test/test_snmp_ifmib.sh by adding eight more interfaces and saw no significant delay:
That is 1.5s on an old x86. In other words, I cannot reproduce it.
I did some more research into this. It looks like it is a combination of multiple issues:
This patch hardcodes the prefix:oid lookup instead of using a generic function. It removes the large number
@olofhagsand I see a significant performance improvement:
Fixed by optimizing yang_extension_value() into a more specialized version, yang_extension_value_opt(), specific to the SNMP code.
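The kind of specialization described here can be illustrated with a simplified sketch. `struct ext_stmt`, `struct import`, `ext_value_generic()` and `ext_value_oid()` are hypothetical and are not Clixon's `yang_extension_value()` or `yang_extension_value_opt()`; the sketch only contrasts a generic lookup that resolves each statement's prefix through the import table with one that compares directly against the known `oid` extension (defined in the `ietf-yang-smiv2` module, conventionally imported with prefix `smiv2`).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical extension use, e.g. smiv2:oid "1.3.6.1.2.1.2.2"; */
struct ext_stmt {
    const char *prefix;
    const char *name;
    const char *arg;
};

/* Hypothetical import table entry mapping a local prefix to a module. */
struct import {
    const char *prefix;
    const char *module;
};

/* Generic lookup: every statement's prefix is resolved to a module name
 * via the import table before comparing. Cost: O(statements * imports). */
const char *ext_value_generic(const struct ext_stmt *s, size_t ns,
                              const struct import *imp, size_t ni,
                              const char *module, const char *name)
{
    for (size_t i = 0; i < ns; i++)
        for (size_t j = 0; j < ni; j++)
            if (strcmp(s[i].prefix, imp[j].prefix) == 0 &&
                strcmp(imp[j].module, module) == 0 &&
                strcmp(s[i].name, name) == 0)
                return s[i].arg;
    return NULL;
}

/* Specialized lookup: the caller already knows the prefix the SNMP
 * mapping uses, so only one fixed prefix:oid pair is compared.
 * Cost: O(statements), with no import-table resolution. */
const char *ext_value_oid(const struct ext_stmt *s, size_t ns,
                          const char *prefix)
{
    assert(prefix != NULL);
    for (size_t i = 0; i < ns; i++)
        if (strcmp(s[i].name, "oid") == 0 &&
            strcmp(s[i].prefix, prefix) == 0)
            return s[i].arg;
    return NULL;
}
```

The trade-off is generality: the specialized version is only correct if the module really uses that prefix, which is why it stays confined to the SNMP code.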
I verified this patch with the latest commit and we did get an improvement (from ~19 seconds to ~7 seconds). |
@olofhagsand we are currently testing the SNMP performance,
@olofhagsand we are getting SNMP timeouts when running
@dima1308 Has there been some change in behavior, or is it the same result as on Feb 13?
@olofhagsand No change in behavior. The problem is that we now have larger setups where the issue reproduces every time:
Can one control the timeout as a workaround? Which side sets the timeout?
Yes, if I increase the timeout on
In some cases, when customers use our in-house management stations, the issue does not occur (our management station does not perform an SNMP walk).
What is written here is still correct (apart from 3). Maybe it would be best to begin with the low-hanging fruit, such as caching "get" calls for, say, one second.
Yes, I think caching is a good way forward. However, caching is seldom trivial, due to stale entries, timeouts, memory usage, etc.
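A minimal sketch of a time-bounded cache for "get" results that addresses the three concerns raised: stale entries are dropped on lookup once their TTL has expired, the TTL is configurable (for example one second), and memory is capped by a fixed number of slots with oldest-first eviction. All names (`struct get_cache`, `cache_get()`, `cache_put()`, `CACHE_SLOTS`) are hypothetical; keys are assumed to be OID strings and the caller passes the current time explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define CACHE_SLOTS 64 /* hypothetical bound that caps memory use */
#define KEY_MAX 128
#define VAL_MAX 256

struct cache_entry {
    bool used;
    time_t stored;      /* when the value was fetched from the backend */
    char key[KEY_MAX];  /* e.g. an OID string */
    char val[VAL_MAX];
};

struct get_cache {
    struct cache_entry slot[CACHE_SLOTS];
    time_t ttl;         /* seconds an entry stays valid */
};

void cache_init(struct get_cache *c, time_t ttl)
{
    assert(ttl > 0);
    memset(c, 0, sizeof *c);
    c->ttl = ttl;
}

/* Return the cached value for key, or NULL if it is missing or stale.
 * A stale entry is freed so its slot can be reused. */
const char *cache_get(struct get_cache *c, const char *key, time_t now)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        struct cache_entry *e = &c->slot[i];
        if (!e->used || strcmp(e->key, key) != 0)
            continue;
        if (now - e->stored >= c->ttl) {
            e->used = false;
            return NULL;
        }
        return e->val;
    }
    return NULL;
}

/* Store key/value: reuse the slot already holding key, else a free
 * slot, else evict the oldest entry. */
void cache_put(struct get_cache *c, const char *key, const char *val,
               time_t now)
{
    struct cache_entry *victim = NULL;
    for (size_t i = 0; i < CACHE_SLOTS && victim == NULL; i++)
        if (c->slot[i].used && strcmp(c->slot[i].key, key) == 0)
            victim = &c->slot[i];
    for (size_t i = 0; i < CACHE_SLOTS && victim == NULL; i++)
        if (!c->slot[i].used)
            victim = &c->slot[i];
    if (victim == NULL) {
        victim = &c->slot[0];
        for (size_t i = 1; i < CACHE_SLOTS; i++)
            if (c->slot[i].stored < victim->stored)
                victim = &c->slot[i];
    }
    victim->used = true;
    victim->stored = now;
    /* Over-long keys/values are truncated; a truncated key simply never
     * matches the full key again, so it only costs a cache miss. */
    snprintf(victim->key, sizeof victim->key, "%s", key);
    snprintf(victim->val, sizeof victim->val, "%s", val);
}
```

A short TTL bounds how stale an SNMP answer can be, while still absorbing the burst of repeated requests a single walk generates.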
@olofhagsand Are there any plans to improve SNMP performance? In our system, when NMS stations run SNMP queries, CPU usage spikes to 100%, making SNMP unusable.
No current plans.
What is required as a first step is profiling of a specific, repeatable use case. I see some profiling was done earlier; is it still valid?
@olofhagsand It remains valid. In the thread above, you can find an analysis of the failure. From what I understand, the issue stems from the implementation of the SNMP get-next command, which consumes substantial CPU resources. During standard SNMP operations (such as get-bulk or walk), the get-next command is executed frequently, causing CPU utilization to spike to 100% and making SNMP usage impractical. Could you please address this issue?
Added a cache for getnext. Please verify.
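The getnext cache from the commit is not reproduced here. As a hypothetical illustration of why caching helps get-next: if the OIDs are collected once into a sorted index, each GETNEXT becomes a binary search for the first OID strictly greater than the requested one, instead of a fresh traversal of the YANG tree. `struct oid`, `oid_cmp()` and `oid_getnext()` are assumed names; a real cache would also have to be invalidated when configuration or state changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OID_MAX_LEN 32

/* An OID as its sub-identifiers, e.g. 1.3.6.1.2.1.2.2.1.2.1 */
struct oid {
    uint32_t sub[OID_MAX_LEN];
    size_t len;
};

/* SNMP lexicographic order: compare sub-identifiers numerically from
 * left to right; if one OID is a prefix of the other, the shorter one
 * sorts first. Returns -1, 0 or 1. */
int oid_cmp(const struct oid *a, const struct oid *b)
{
    size_t n = a->len < b->len ? a->len : b->len;
    for (size_t i = 0; i < n; i++)
        if (a->sub[i] != b->sub[i])
            return a->sub[i] < b->sub[i] ? -1 : 1;
    if (a->len == b->len)
        return 0;
    return a->len < b->len ? -1 : 1;
}

/* GETNEXT over a cached index sorted with oid_cmp(): binary search for
 * the first entry strictly greater than req. Returns its position, or
 * -1 when req is at or beyond the last entry (end of the MIB view). */
ptrdiff_t oid_getnext(const struct oid *index, size_t n,
                      const struct oid *req)
{
    assert(index != NULL || n == 0);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (oid_cmp(&index[mid], req) <= 0)
            lo = mid + 1; /* answer lies to the right of mid */
        else
            hi = mid;     /* mid is a candidate; search left half */
    }
    return lo < n ? (ptrdiff_t)lo : -1;
}
```

With the index built once, a walk over n objects costs O(n log n) comparisons in total rather than one tree traversal per request.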
Cherry pick from 739d052
Hi.
Running snmpwalk takes a long time to complete, and if there is a large amount of data, it can also time out.
By looking at the clixon logs, I saw that most of the delay happens when the walk reaches ifTestTable.
We suspect that the general delay is caused by traversing the YANG files in real time,
and that the ifTestTable delay is caused by ignoring its deprecated status, which, according to the documentation, is currently unimplemented in clixon.