
Enhancement request: forced-multithread operation to support "snapshot"-like collection #2527

Closed
athompson-merlin opened this issue Apr 27, 2022 · 1 comment · Fixed by #2528

Comments

@athompson-merlin

(See #457 for some background detail. I may as well just quote myself, more or less:)

I, like others, have a need to collect as much of a "snapshot" as possible, so I would prefer that, at the interval time, Oxidized spin up as many threads as possible in order to complete data collection as rapidly as possible. Oxidized is not resource-limited in my environment, and I still have max_threads to control resource use anyway.

This would also make troubleshooting my production instance easier: when something doesn't work right and I have to restart Oxidized (e.g. after editing a custom model), it can take almost an hour before Oxidized finishes its initial single-threaded poll of all the devices and reaches steady state, at which point I can begin troubleshooting usefully.

This might make more sense when I say that many of my devices take 5-10 minutes to collect running-config, so when those get serialized, some of the built-in assumptions (e.g. jobs taking 5 seconds by default) are way out of sync with my environment.

Ideal (for me) would be a new config option along the lines of use_max_available_threads: [yes|no] that could be changed and reloaded at runtime. Although some discussion has previously occurred, I don't see anything like that today.
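To illustrate what I mean, here is a rough sketch of how such an option could influence the number of jobs the scheduler asks for. This is purely hypothetical code, not Oxidized's actual implementation; the method name and parameters are my own invention:

```ruby
# Hypothetical sketch (not actual Oxidized code): with the option on,
# request as many parallel jobs as possible, capped only by max_threads
# and the number of nodes; with it off, keep the scheduler's own count.
def desired_job_count(node_count, max_threads, use_max_available_threads:, computed_count:)
  if use_max_available_threads
    [node_count, max_threads].min
  else
    computed_count
  end
end
```

With 100 devices and max_threads set to 30, that would mean 30 parallel jobs right away instead of the slow ramp-up.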

If I'm missing something, or if this is a trivial local modification, great! - please tell me.
Otherwise, it looks like the section of code needing to be changed is jobs.want, starting at def new_count, but I'm barely able to read this code base, forget about modifying core functionality.

Alternately, if MAX_INTER_JOB_GAP (currently hard-coded as MAX_INTER_JOB_GAP = 300 # add job if more than X from last job started) were a config file parameter that could be reloaded at runtime, that might suffice? I don't think I would want it permanently set to e.g. 1 second, but I'm unclear on what negative effects that might have.
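For the sake of discussion, making the gap configurable could look something like this. The config key name and helper method here are hypothetical, not anything that exists in Oxidized today:

```ruby
# Hypothetical sketch: read the inter-job gap from the loaded config
# hash, falling back to the current hard-coded default of 300 seconds.
DEFAULT_INTER_JOB_GAP = 300

def max_inter_job_gap(config)
  config.fetch('max_inter_job_gap', DEFAULT_INTER_JOB_GAP)
end
```

A runtime config reload would then pick up a new value without a source edit and restart.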

@athompson-merlin (Author)

Adjusting MAX_INTER_JOB_GAP to 1 directly in the source doesn't produce exactly the results I want, but it gets a lot closer. It takes A seconds to spin up B parallel jobs for C devices, where A > C > B (why?), and then, once done, it appears to wait the usual interval amount of time before starting again.
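My reading of the behavior above, expressed as a sketch: a new job is only added when more than the gap has elapsed since the last job started, so even with a 1-second gap the job count still ramps up one at a time. This is my paraphrase of the decision, not Oxidized's actual code:

```ruby
# Hypothetical sketch of the add-a-job decision the gap controls:
# add another job only if we are under the thread ceiling AND more
# than `gap` seconds have passed since the last job was started.
def add_job?(now, last_job_started_at, running, max_threads, gap)
  running < max_threads && (now - last_job_started_at) > gap
end
```

If that reading is right, even gap = 1 gives at best one new job per second, which would explain the slow spin-up I'm seeing.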

This issue was closed.