-
Notifications
You must be signed in to change notification settings - Fork 898
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Refactor Metric::Processing.process_derived_columns to use less memory #15757
[WIP] Refactor Metric::Processing.process_derived_columns to use less memory #15757
Conversation
This pull request is not mergeable. Please rebase and repush. |
Can't have the |
@Fryguy Can you explain the rational behind this statement? Because we already do similar things for defining methods in our app: https://github.com/ManageIQ/manageiq/blob/master/lib/extensions/ar_virtual.rb#L175 Some of which you wrote. I can define all these methods by hand as well, it just seemed like the repetitious part wasn't very dry, and was going against the original reason for using a case statement to begin with. |
General I wrote neither of these, but even so
Note that I did find this one, which should just be changed to a regular block style as there is no need for a String:
|
Just to clarify, this "risk" (and I would clarify "risk" to assume you mean "security risk") is only the case with user input when dealing with I use |
Again, I am not thrilled about the I also think that defining a method to then define this methods using That said, I am fine with either rolling back to the commit that included the eval changes, or typing these methods out by hand. The only reason I didn't do the latter was again, because it wasn't DRY, and could lead to bugs when this is refactored down the road (I also figured @jrafanie would cry with all the extra additions to this file). |
Taking a look at this, it seems related to the newest version of
Going to look through the commits and the memory profile data to see what I can find that might be causing this. Edit: I take some of that back, as it is using the same version number it seems, but something is using part of |
After looking at the changes that I had from after I rebased master to what I originally tested: And running a few tests after some merge PR commits just to be sure, I am convinced that the increase in memory is caused by #15710 . Note, that this is with and without my changes, which is very concerning seeing the differences in memory before and after this change is in place. |
Much of what was being done with the DERIVED_COLS was repeating the same code over and over to split the array values up to determine what the methods, groups, types, etc. were for the given column. Instead of doing that everytime, and instantiating extra strings to do so, just make that part of the values of the hash, and iterate over that instead.
There should be no reason to create a new string from the `col` local variable, since it is already a symbol and that should work just fine for checking it's name is correct.
The complexity of the process_derived_columns method was quite high, and a lot of the logic for it was simply calling out specific routines needed to generate the DERIVED_COLS data, and using "types" to do it. At it's heart, this simply changes DERIVED_COLS to be a hash of the name of the derived column, and the method necessary to generate it, and then the code in process_derived_columns is replaced with a simple send to that method, passing the reference to col, state, attrs, and result to the method for it to be calculated. All of the methods for DERIVED_COLS are then private class methods at the bottom of the file, and a decent amount of them are meta-programmed in (since many are similar). The way these are generated are specifically trying to avoid generating new strings to determine what methods to call, so `module_eval` statements in this case are preferred over using define_method. This then allows: - The "TYPE_" constants to be removed - The DERIVED_COLS definition to be simplified - The process_derived_columns columns method to be greatly simplified - No real changes to how the data is derived** The cost of this is a bit of complexity due to the meta-programmed methods, some more additions than deletions (sorry LJ...), and a few more method definitions existing. The extra method definitions is probably negligible from a performance front, but the complexity might be up for debate. ** This a few if statements in generate_derived_memory_used and generate_cpu_usagemhz_rate_average were reduced/simplified to either avoid extra calls to helper methods, or remove complex unless statements, but should be functionally equivalent.
So the change from #15710 seems to be from it's introduction of using ManageIQ/more_core_extensions#54 More details and before/after benchmarks can be found there. Note: The benchmarks taken there were done after a recent rebase of master applied to this branch, so the |
1b8c721
to
343199a
Compare
Checked commits NickLaMuro/manageiq@ef40363~...343199a with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0 app/models/metric/processing.rb
|
Note to self: My current implementation has omitted some of the changes from #15695 , so that needs to be addressed and rebased back in. Seems like he changes from #15695 were rebased in without changes from the previous commit, but were not accounted for when NickLaMuro@343199a was also rebased and updated. |
@NickLaMuro based on your comment, does that make this wip? Or, would you want to address that separately? |
@blomquisg haven't actually touched this in a mouth since I found out with some testing by Dennis that it didn't solve the metric's collector memory retention issue ("memory leak"). Anyway, yes, I will toss this as [WIP] for now, but know that it probably won't get love for a bit. Sorry this is currently sitting on your plate with stuff still left to do. |
No worries. If it's |
This pull request is not mergeable. Please rebase and repush. |
Disclaimer: @blomquisg Not sure how much you looked into this when you last checked in... months ago... sadly... but @Fryguy left a comment about a concern regarding my use of I would like to clarify what he (and you if you care to weigh in) would prefer, and hopefully clear up any questions both he and I have for each other before fixing issues mentioned in my comment above. EDIT: This portion of this comment moved to this gist to cut down on clutter in this PR's comments. |
"Use define_method over eval" is a generality of mine. If the same expression can be written with both I always prefer define_method for consistency as well as just avoiding While your gist on the performance of define_method vs eval vs others is interesting,
This is actually the critical sentence. If you CAN'T do something in define_method, then that is your entire argument. Can you clarify that part a bit because I don't understand why you can't do it, or perhaps what you are doing in eval that is so much easier than doing it with define_method. |
EDIT: Moved to gist to cut down on clutter |
Sorry, but I'm not reading all that. I'm pretty sure the first part could just be written as:
Without "real" values though it's hard for me to understand if it really makes a difference. The second part I can't follow at all, and frankly, the up/down/left/right example just confuses me more. Can you boil it down to literally 2 or 3 sentences about why you can't use define_method aside from performance impact? Also, I don't see in the entire thread what the actual saving are aside from the OP, which only talks about objects (and is edited later as being old information anyway after rebase):
|
@Fryguy : After a long convo with @chrisarcand about my previous comments being poorly conveyed, here is a short answer (hopefully that helps): I choose That said, I was originally arguing using I don't care if you don't want to do meta-programming in general here, though, if you find it too complex, which I define as a different form of "risk". |
end | ||
private_class_method :generate_derived_#{group}_reserved | ||
METHOD_DEF | ||
eval method_def.split("\n").map(&:strip).join(';') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming we went with the eval
, why the need for all of the split/map/join? I thought eval just worked with the original full string...
[1] pry(main)> class Foo
[1] pry(main)* %w{a b}.each do |x|
[1] pry(main)* eval <<-EOS
[1] pry(main)* def #{x}
[1] pry(main)* puts "#{x}"
[1] pry(main)* end
[1] pry(main)* EOS
[1] pry(main)* end
[1] pry(main)* end
=> ["a", "b"]
[2] pry(main)> Foo.new.a
a
=> nil
[3] pry(main)> Foo.new.b
b
=> nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, probably could go without it, but I think I was trying to channel what was done here to a degree (probably not needed though):
rails/rails@44c51fc (see associated issue as well).
I would have to check to see if that even helps though. Honestly, I forget since it has been a while since I have put this code together. Will test out in a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Rails commit link there doesn't work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fryguy Shoot... sorry, I usually self shorten the SHA's to 7 chars, and I guess that was the one time Rails actually had a conflicting commit SHA with that... updated in that last comment so there isn't a broken link.
Regardless, it was stupid to do that in this case since I wasn't building a DSL, so that probably shouldn't stay. Line numbers would be exactly where we want them in this case. (past @NickLaMuro was being stupid)
This pull request has been automatically closed because it has not been updated for at least 6 months. Feel free to reopen this pull request if these changes are still valid. Thank you for all your contributions! |
Attempting to address some of the concerns from: https://bugzilla.redhat.com/show_bug.cgi?id=1458392
This is mostly a refactoring of the DERIVED_COLS constant to prevent having to do
String#split
andArray#join
unnecessarily. You can read the commit messages for more details.Benchmarks
To test, the following script was used:
https://gist.github.com/NickLaMuro/836f8a08d26c74fa0863a08f2926f3da
Coupled with the
memory_profiler
gem, which was added via anoverride_gem
.BEFORE:
AFTER:
Saves about 50k objects from being instanciated.
Of note, the last commit overall only saves about 1k-2k objects from the commit before it:
So if the extra changes in the last commit make you squeamish, rebasing out that commit wouldn't be the end of the world.
Also of note, since rebasing in master, this morning (2017/08/07) and running a
bin/update
, the number of total objects returned by all reports was increased by nearly 100k... I will try an take a look at what might have caused this, but I am relatively sure I was on this commit prior to this morning.Links
Steps for Testing/QA
The script can be run from be added into the
bin/
dir of themanageiq
project and run against the existing branch:$ bin/perf_capture_realtime_test
Add
-h
to see available options. Results from above used mostly the default options, and took the second report results for the data.