Split metric collections into smaller intervals #14332
Merged
5571320 Split metric collections into smaller intervals (blomquisg)
82b83dd Cleanup split_capture_intervals method (blomquisg)
f874607 Test for reversed metric dates (blomquisg)
1b539ce Simplify split_capture_intervals (blomquisg)
8b24751 More testing for perf_capture_queue (blomquisg)
7bd3090 Fixes for rubocops in metric capture (blomquisg)
@@ -17,6 +17,20 @@ def queue_name_for_metrics_collection
    ems.metrics_collector_queue_name
  end

  def split_capture_intervals(interval_name, start_time, end_time, threshold=1.day)
    raise _("Start time must be earlier than End time") if start_time > end_time
    # Create an array of ordered pairs from start_time and end_time so that each ordered pair is contained
    # within the threshold. Then, reverse it so the newest ordered pair is first:
    # start_time = 2017/01/01 12:00:00, end_time = 2017/01/04 12:00:00
    # [[interval_name, 2017-01-03 12:00:00 UTC, 2017-01-04 12:00:00 UTC],
    #  [interval_name, 2017-01-02 12:00:00 UTC, 2017-01-03 12:00:00 UTC],
    #  [interval_name, 2017-01-01 12:00:00 UTC, 2017-01-02 12:00:00 UTC]]
    (start_time.utc..end_time.utc).step_value(threshold).each_cons(2).collect do |s_time, e_time|
      [interval_name, s_time, e_time]
    end.reverse
  end
  private :split_capture_intervals

  def perf_capture_queue(interval_name, options = {})
    start_time = options[:start_time]
    end_time = options[:end_time]

@@ -32,27 +46,20 @@ def perf_capture_queue(interval_name, options = {})

    log_target = "#{self.class.name} name: [#{name}], id: [#{id}]"

    # Determine what items we should be queuing up
    items = []
    cb = nil
    if interval_name == 'historical'
      start_time = Metric::Capture.historical_start_time if start_time.nil?
      end_time = Time.now.utc if end_time.nil?

      start_hour = start_time
      while start_hour != end_time
        end_hour = start_hour + 1.day
        end_hour = end_time if end_hour > end_time
        items.unshift([interval_name, start_hour, end_hour])
        start_hour = end_hour
      end
    else
      items << [interval_name]
      items[0] << start_time << end_time unless start_time.nil?

      start_time = last_perf_capture_on unless start_time
      end_time = Time.now.utc unless end_time
      cb = {:class_name => self.class.name, :instance_id => id, :method_name => :perf_capture_callback, :args => [[task_id]]} if task_id
Could this cause a problem if we have multiple queue items calling the same callback now? I can't really understand the code on line 79 below.

    end

    # Determine what items we should be queuing up
    items = [interval_name]
    items = split_capture_intervals(interval_name, start_time, end_time) if start_time

    # Queue up the actual items
    queue_item = {
      :class_name => self.class.name,
The side effect of this is that it's still possible to call `target.perf_capture("realtime")` on a target with a very old `last_perf_capture_on` and end up with the bad behavior of attempting to grab many, many days of metrics in one go.
This PR's change only helps if the caller uses the queue to queue up the perf capture. This leads me to think that we need to refactor the metrics capture system past the `perf_capture_queue` method to always insist that the `start_date` and `end_date` are provided. That way, either you know what you're doing from the caller's standpoint, or you queue up the call and let the queuing mechanism handle the breakdown of date intervals.
In fact, even trying to catch the problem at the `perf_capture` level leaves the individual provider implementations with the ability to interpret what it means to have a missing `start_date` or `end_date`. It feels very dirty allowing each layer of the metrics capture code to make those assumptions.

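A rough sketch of the stricter interface proposed above, where the capture window is validated once at the boundary so that lower layers never have to interpret a `nil`. Every name here (`MissingCaptureWindowError`, `validate_capture_window!`, `capture_metrics`) is hypothetical and does not come from the PR, and the parameters are called `start_time`/`end_time` to match the diff rather than `start_date`/`end_date`.

```ruby
# Hypothetical names throughout: this is an illustration, not the ManageIQ API.
class MissingCaptureWindowError < ArgumentError; end

# Validate the capture window in one place and return it unchanged.
def validate_capture_window!(start_time, end_time)
  if start_time.nil? || end_time.nil?
    raise MissingCaptureWindowError, "start_time and end_time are both required"
  end
  raise ArgumentError, "start_time must not be later than end_time" if start_time > end_time

  [start_time, end_time]
end

# Stand-in for a provider-specific capture method (assumed for illustration).
# It never guesses a default window: a nil argument is an error.
def capture_metrics(interval_name, start_time, end_time)
  start_time, end_time = validate_capture_window!(start_time, end_time)
  "#{interval_name}: #{start_time.getutc} -> #{end_time.getutc}"
end

puts capture_metrics("realtime", Time.utc(2017, 1, 1), Time.utc(2017, 1, 2))
```

With a check like this, a caller either passes an explicit window or goes through the queue, which is then the only place that fills in defaults and splits the range.
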
This code was sort of relying on the fact that on VMware, if you ask for anything older than 4 hours for realtime, you are just going to get the last 4 hours. I think it's OK to split it up, because then you should just get a bunch of no-op collections.
Is there any place that doesn't do this?
So, are you saying that the provider-specific part of perf_capture should be responsible for determining what to do with the timestamps given to them? If so, I thought that's what the code already does (aside from the gap collection).
One other point to note is that for historical collection, we intentionally collected the hours in reverse (that's the `.unshift` in the original). This was because historical collection runs at a lower priority, so data could come in more slowly, and we wanted the charts to grow backwards from the present instead of growing forward from a time way in the past (if that makes sense). This code undoes that.

I believe all of the important parts of the code queue today, but there's still some method name building that's hard to track down.
No, I'm saying that there are provider-specific implementations that still expect that the caller can send in `nil` for start/end dates. I think the interface should be stricter and assume that the caller always provides a valid start/end date, instead of letting the assumption of what a `nil` start/end date means permeate the metrics capture code. In the end, if no one is calling specific implementations with a `nil` start/end date, then it's wasted logic. And if people are calling specific implementations with a `nil` start/end date, then it's a pattern of deviation that will show up as we consolidate more things into the provider platform arena.

D'oh, I'll go over that one more time, then. I think I can just reverse what I've got now with the steps and each_cons bits.
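
For reference, a minimal plain-Ruby sketch of the "steps, `each_cons`, then reverse" idea. It is not the PR's implementation: it assumes durations are given in seconds and builds the step boundaries by hand rather than using `1.day` or `step_value`, and the names `split_intervals` and `DAY_IN_SECONDS` are made up for this example. The last interval is clamped to `end_time` when the range is not a whole multiple of the threshold, mirroring the clamp in the original `while` loop.

```ruby
# Hypothetical stand-alone version of the interval splitting discussed in this PR.
DAY_IN_SECONDS = 24 * 60 * 60

def split_intervals(interval_name, start_time, end_time, threshold = DAY_IN_SECONDS)
  raise ArgumentError, "Start time must be earlier than End time" if start_time > end_time

  # Build the boundaries: start, start + threshold, ... with the last one clamped to end_time.
  # getutc returns a new UTC Time, so the caller's objects are not modified.
  finish = end_time.getutc
  boundaries = [start_time.getutc]
  boundaries << [boundaries.last + threshold, finish].min while boundaries.last < finish

  # Pair up neighbouring boundaries, then reverse so the newest interval comes first.
  boundaries.each_cons(2).map { |s_time, e_time| [interval_name, s_time, e_time] }.reverse
end

start_time = Time.utc(2017, 1, 1, 12)
end_time   = Time.utc(2017, 1, 4, 12)
intervals  = split_intervals("historical", start_time, end_time)
intervals.each { |name, s_time, e_time| puts "#{name}: #{s_time} -> #{e_time}" }
```

Queuing the items in this order means a low-priority historical backfill fills in the most recent day first, so charts grow backwards from the present.
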