
Let's understand how Oxidized creates threads #457

Closed
ElvinEfendi opened this issue Jun 2, 2016 · 17 comments

@ElvinEfendi
Contributor

I've realized that even if I have over 100 devices in router.db, Oxidized still uses a single thread and fetches the configs one after another. Looking at https://github.com/ytti/oxidized/blob/master/lib/oxidized/jobs.rb#L36, it seems Oxidized also considers the interval when calculating the number of threads to create. Does this mean that Oxidized will never create a parallel thread unless it thinks `Oxidized.config.interval` is not enough time to fetch all configs sequentially? What is the rationale behind this decision? Why not just `while @jobs.size < Oxidized.config.threads` at https://github.com/ytti/oxidized/blob/master/lib/oxidized/worker.rb#L16?

@ytti
Owner

ytti commented Jun 2, 2016

If a single thread is sufficient to meet the interval, Oxidized will never launch more threads. If the average config fetch time implies the interval cannot be met, more threads are started until it can be.

Essentially, the user decides how old a configuration backup may be, configures this as the interval, and lets it run.

Regarding jobs.rb#L36: if we have 100 nodes and the average fetch duration is 10s, it'll take 1000s to fetch them all sequentially. We then divide that aggregated time by the desired interval to arrive at how many threads we need to accomplish it.
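A minimal sketch of that sizing rule (the names and numbers are illustrative, not the actual code in jobs.rb):

```ruby
# Illustrative sketch of the thread-sizing rule described above;
# names and numbers are made up, this is not the code from jobs.rb.
node_count   = 100    # nodes in router.db
average_time = 10.0   # measured average seconds per config fetch
interval     = 300.0  # desired maximum age of any backup, in seconds

total_time = node_count * average_time     # 1000s if fetched sequentially
threads    = (total_time / interval).ceil  # => 4 threads to fit into 300s
threads    = 1 if threads < 1              # always keep at least one worker
puts "want #{threads} thread(s) to meet the #{interval.to_i}s interval"
```

With a long interval the quotient stays below 1, which is why a 100-node list can still be walked by a single thread.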

@danilopopeye
Contributor

> Why not just `while @jobs.size < Oxidized.config.threads` at https://github.com/ytti/oxidized/blob/master/lib/oxidized/worker.rb#L16?

Maybe we could introduce something like a `:min_threads` config, instead of just trying to hit the maximum number directly. I'm not sure which is better.

@ytti
Owner

ytti commented Jun 3, 2016

What problem are we solving?


@danilopopeye
Contributor

danilopopeye commented Jun 3, 2016

> What problem are we solving?

Take less time to fetch all nodes when you have loads of them?

@ytti
Owner

ytti commented Jun 3, 2016

Elaborate? Do you have a case where the configured interval is not being met? If you want to fetch nodes faster, that means you want the backed-up config to be younger, i.e. you want the interval to be smaller?

@yesbox
Contributor

yesbox commented Jun 8, 2016

The time a user wants to wait between complete backups of all devices and the time it takes to fetch all configs during one complete backup can be seen as two different wishes, whereas now they are assumed to be approximately the same.

As the fetching time becomes shorter, the backup as a whole becomes closer to a snapshot of the network at a point in time rather than a more continuous stream of configs spread across the interval. This could be desirable, as it may be easier to reason about a snapshot when checking configs' relation to each other, or when restoring an entire environment, or at least multiple devices, from backup.

This could be done by forcing a minimum number of threads, but perhaps what is really being asked for is two timers: the interval between complete backups, and a smaller time-to-fetch target within it. The latter could be used to calculate the number of threads, much like today. If time to fetch is not set, make it equal to the interval and the behavior stays the same as today. This doesn't preclude a minimum-threads setting, though.

@ytti
Owner

ytti commented Jun 8, 2016

If I understood correctly, we can't change time-to-fetch: we only use a single thread to talk to a single device, and we already do that as fast as we can, with something like 99% of the time obviously being I/O wait. So any improvement that could be made to time-to-fetch would fall within that remaining 1%.

What we can do is try to guarantee the config is no older than N, which is what we do.

Sometimes N might temporarily be too long; then you can use /next to move a device to the head of the queue and force an instant fetch.
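For example, with the oxidized-web REST API enabled, a fetch can be forced with a plain HTTP GET. The host, port, and node name below are assumptions for illustration:

```ruby
# Ask a running Oxidized instance to move one node to the head of the
# queue so it is fetched immediately. The 127.0.0.1:8888 listen address
# and the node name are assumptions, not defaults guaranteed everywhere.
require 'net/http'

uri = URI('http://127.0.0.1:8888/node/next/router1.example.com')
Net::HTTP.get_response(uri)  # the web UI exposes the same per-node action
```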

@yesbox
Contributor

yesbox commented Jun 8, 2016

Sorry, I think I wasn't totally clear. The suggestion (I didn't actually mean to write a feature request, but here we are; I think I would use this, though I'm not sure it's terribly important :)) is not to make the time to fetch faster by optimizing the code, but to have another configurable timer, which I referred to as "time to fetch". Like the interval, this timer describes what the user desires, and the thread algorithm adjusts the number of threads to finish on time. So what I really mean by it is "the desired total time to fetch from all devices during one iteration/interval".

The current algorithm ought to handle that without big changes (said without looking at the code...). Basically you'd allocate threads like you already do today, but using the "desired time to fetch everything" instead of the "interval" to find how many threads are needed to finish within that time; you'd then still wait until the next "interval" to begin another fetch.

That way you could not only tell Oxidized "get the configs from all devices every X minutes, and finish within that same interval, adjusting the number of concurrent fetches to make it so"; you could also say, if you wanted: "get the configs from all devices every X minutes, but finish getting them within Y minutes (where Y is lower than X), adjusting the number of concurrent fetches to make it so".

Again, this doesn't exclude the possibility of configuring a min/max number of threads, but I think it would cover the perceived need for that in perhaps a better way than plainly overriding the thread algorithm.
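A minimal sketch of that proposal, assuming a hypothetical fetch_time option alongside the existing interval (the option name and the numbers are made up; nothing like this exists in Oxidized today):

```ruby
# Hypothetical two-timer sizing: fetch_time bounds how long one complete
# rotation may take, while interval bounds how often a rotation starts.
node_count   = 100
average_time = 10.0         # measured seconds per fetch
interval     = 2 * 60 * 60  # start a rotation every 2 hours
fetch_time   = 60           # finish each rotation within 1 minute

threads   = (node_count * average_time / fetch_time).ceil  # => 17
idle_time = interval - fetch_time                          # => 7140s
puts "run #{threads} threads per rotation, then idle #{idle_time}s"
```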

@ytti
Owner

ytti commented Jun 8, 2016

Apologies, I'm being thick. I can't understand how your new use case, "get the configs from all devices every X minutes but I want you to finish getting them in Y minutes (which is lower than X)", differs from setting the current interval to Y.

Maybe another example, ideally a really concrete one with nodes and the exact fetch time for each under both scenarios, would help me wrap my head around the request.

@danilopopeye
Contributor

I'll try to use my case as an example of why I suggested a min_threads config.

We have (almost) finished configuring all of the ~2.6k elements that we need to back up. Because of connectivity issues we need 3 machines to cover all nodes: the first will have ~900 elements, the second ~600, and the last ~1100.

We set the interval to 12 hours (43200 seconds), since we don't touch most of the devices during the day, and restarted each Oxidized around midnight. Because our interval is really long, only 1 thread is used until we hit a firewall that takes longer than 300 seconds, at which point a second thread is started.

My problem is: it would be ideal to finish all fetches before 6 am, but I don't have that kind of control today, which is why I suggested a min_threads config. Nothing as fancy as what @yesbox suggested, but it could easily achieve a similar result.
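Rough arithmetic for why that setup stays single-threaded (the 30s average per device below is an assumption, not a measured figure):

```ruby
# Busiest of the three machines from the case above; the 30s average
# fetch time is assumed for illustration.
nodes        = 1100
average_time = 30.0
interval     = 43_200  # 12 hours

threads = (nodes * average_time / interval).ceil  # 33000/43200 => 1
# One thread meets the interval, even though the rotation then takes
# roughly 9 hours and cannot finish by 6 am.
```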

@ytti
Owner

ytti commented Jun 8, 2016

If you want to finish by 6 am and you start at midnight, shouldn't you set the interval to 6h?

Or is it a one-off? You want to get the boxes by X now, but subsequently within 12h?

@yesbox
Contributor

yesbox commented Jun 8, 2016

Let's try an extreme but perhaps not unreasonable example. Say you want all configs in a deployment of 100 devices backed up within a one-minute span, because you don't want the configs on the devices to diverge by more than one minute in any one iteration, and you'd like to do this every 2 hours.

You can make sure to get all configs in one minute by setting the interval to one minute, and that will attempt to get all configs in one minute, but it will also do so every minute. You just wanted to back up the configs every 2 hours; now you're hitting the devices much more often than you wished, because the time between getting the first and last config in your list of devices is tied to how often you fetch.

In that case you would instead set the interval to 2 hours and fetch_time/fetch_spread/time_to_fetch_all_configs to 1 minute. That would effectively set what is today the interval to one minute and then, once done, sleep for 1 hour and 59 minutes before starting the next interval, assuming it actually did finish in one minute.
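With made-up per-device numbers for this example (the 10s average fetch time is an assumption), the difference is in frequency, not thread count:

```ruby
devices, avg = 100, 10.0  # the 10s average fetch time is an assumption

# Workaround today: interval = 60s yields the one-minute snapshot,
# but also starts a new rotation every minute.
(devices * avg / 60).ceil  # => 17 threads per rotation
7200 / 60                  # => 120 rotations per 2 hours instead of 1

# With the proposed fetch_time = 60s and interval = 7200s, the same
# 17 threads run during the rotation, but only 1 rotation per 2 hours.
```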

Time axis ->
| new interval begins
- currently fetching using one thread
= currently fetching using many threads
^ fetch_time amount of time has passed since the new interval began
Always the same number of devices.

One type of user wants this behavior:

Short interval, set to stay up to date with changes quickly.
Needs many threads to keep up, busy all the time. Works as desired.
|=|=|=|=|=|=|=|=|=

Longer interval, set to regularly get all devices backed up.
The devices may be quite independent, or you don't make many changes, so we don't
care how much (measured in time) the configs may have diverged.
Finishes in time for the next interval, so one thread is used. Works as desired.
|---------      |---------      |---------

Another type of user wants this:

Short interval, set to get all configs without much time passing between the first and last.
We want something closer to a snapshot, or for some other reason want it to finish quickly.
Does what we asked, but does it all the time. Didn't do what we wanted.
|=|=|=|=|=|=|=|=|=

Longer interval, set to regularly get all devices backed up without hitting them all the time.
Does what we asked, but now with fewer threads or a single one, so it didn't finish quickly
like it did with the short interval. Didn't do what we wanted.
|---------      |---------      |---------

Suggestion: a longer interval (the timer that starts fetching from the first device), set to
regularly get all devices backed up, combined with a short fetch_time (the timer used to adjust
the number of threads so that the last device config finishes fetching within this amount of time
after the first device config started being fetched), to finish getting them quickly so that they
do not diverge (or whatever your reason might be).
This is what this type of user wanted: a combination of the two, achieved by decoupling the time
between beginning a new interval and the time to finish getting the last device config.
|=^             |=^             |=^

@danilopopeye
Contributor

> Or is it a one-off? You want to get the boxes by X now, but subsequently within 12h?

We can only run 2 times a day for now, but it shouldn't go past 6 am.
(We will probably change this to run only once every 24h, at midnight.)

@ytti
Owner

ytti commented Jun 8, 2016

So essentially what is wanted is bursty behaviour. My initial thought was that devices could be provisioned with predictable CPU time requirements, so that CPU use is constant over time.

But from @yesbox's example I hear that the crucial part is that all configs are from relatively near the same time, but need not be collected very often.
Thank you, now I understand the desire.

Could we satisfy both requirements by doing no periodic fetch at all, and instead having an API call to run one rotation at max_threads? I guess it could be a config option too.

@ElvinEfendi
Contributor Author

The purpose of this issue was to understand how Oxidized creates threads and the motivation behind it, and I think we achieved that, so I'm closing it.

@athompson-merlin

I also have the need to collect as much of a "snapshot" as possible, so I would prefer that, at the interval time, Oxidized spin up as many threads as possible in order to complete data collection as rapidly as possible.
(Oxidized is not resource-limited in this environment; CPU/RAM/IO usage is not a concern at all.)

I would also like this feature in order to troubleshoot my production instance more easily: when something doesn't work right and I have to restart Oxidized (e.g. after editing a custom model), it can take almost an hour before Oxidized finishes its initial single-threaded poll of all the devices and reaches steady state, at which point I can begin troubleshooting usefully.

Did anything ever get added to Oxidized to force multi-threaded operation? Ideal (for me) would be a new config option like `use_max_available_threads: [yes|no]` that could be toggled and reloaded at runtime, but I'm not seeing anything like that.

@davama
Contributor

davama commented Apr 27, 2022

@athompson-merlin

I would open a new issue and reference this one.

I would advise against necrobumping.
