[WIP] Drop collection of old realtime capture requests #15422
Conversation
If the realtime capture request is many days old due to stale data on the queue then don't process it. The perf_capture_timer will re-queue historical gap collection and current realtime captures.

We've seen cases where we are processing realtime capture requests for a 20min window that are over a week old, causing the provider to return many days of performance data, which in turn causes #perf_process to time out.

Example: in this case we requested 20 minutes of performance data from 11 days prior and actually got back about a week of data, roughly 27,000 rows, where a few hundred rows are typical.

/cc @blomquisg
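For illustration, a minimal Ruby sketch of the kind of guard this PR describes. The constant name, the default threshold value, and the method name are assumptions for the sketch, not the PR's actual diff (the review discussion below suggests the PR makes the threshold configurable):

```ruby
require "active_support"
require "active_support/time"

# Assumed default for the sketch; the actual PR appears to make this configurable.
REALTIME_CAPTURE_THRESHOLD = 4.hours

# Returns truthy when a queued realtime capture request is too old to be worth
# processing; perf_capture_timer will re-queue historical gap collection and
# current realtime captures later.
def drop_stale_capture?(requested_start_time)
  requested_start_time && requested_start_time < REALTIME_CAPTURE_THRESHOLD.ago
end

# Intended use at the top of the realtime capture path (illustrative):
#   return if drop_stale_capture?(start_time)
```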
Checked commit agrare@7237eed with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
Another option is to allow the collection to go through but to drop it if the actual date range doesn't overlap at all with the requested date range (a sketch of this check is below). The collection in this case took 1.5min, so dropping it early might be better.
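A sketch of that alternative overlap check; all names here are illustrative assumptions rather than ManageIQ's actual API:

```ruby
# Let the collection run, then discard the result if the returned (actual)
# date range has no overlap at all with the requested range.
def ranges_overlap?(requested_start, requested_end, actual_start, actual_end)
  actual_start <= requested_end && requested_start <= actual_end
end

# Illustrative post-collection use, assuming counters sorted by timestamp:
#   counters = [] unless ranges_overlap?(requested_start, requested_end,
#                                        counters.first.timestamp,
#                                        counters.last.timestamp)
```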
@kbrock Please review.
For the record, the root cause of why the huge date range was returned appears to be ManageIQ/manageiq-providers-kubernetes#49 |
Does this mean we still get correct historical data for doing e.g. chargeback?
Generally ok with this approach, though I don't know why it should be configurable - see my last comment.
This was the result of having a huge backlog of realtime captures on the queue, so by the time we got to a realtime request it was almost two weeks old.
I'd love to not add new config for this if one already exists; can you point me to where it is?
As far as I can tell, https://github.com/ManageIQ/manageiq/blob/master/config/settings.yml#L89
👎 for parsing crontab strings 😆
I'm Jon Snow. |
How do we typically collect metrics data for gaps in the past? I was under the impression that we only collect granular data and then roll it up to coarse data vs collecting coarse data. I'd prefer to just hardcode a threshold at 1 day and not collect anything older. (Yes, it may cause issues, but it feels like we should solve metrics rather than letting it stay fluffy and troublesome.)
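For contrast with the configurable guard sketched earlier, a minimal version of the hardcoded 1-day cutoff suggested here; again, the names are illustrative assumptions, not existing ManageIQ code:

```ruby
require "active_support"
require "active_support/time"

# Hardcoded cutoff: no new settings key, nothing older than a day is collected.
MAX_REALTIME_CAPTURE_AGE = 1.day

def too_old_to_collect?(requested_start_time)
  requested_start_time < MAX_REALTIME_CAPTURE_AGE.ago
end
```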
Closing since ManageIQ/manageiq-providers-kubernetes#49 fixed the root cause of this issue |