Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ModuleCache initializing takes a minute #53

Open
pxue opened this issue Feb 12, 2015 · 11 comments
Open

ModuleCache initializing takes a minute #53

pxue opened this issue Feb 12, 2015 · 11 comments

Comments

@pxue
Copy link

pxue commented Feb 12, 2015

I'm having problem where every reload is taking about a minute. With the debug mode on I was able to trace through the code to dependencies_patch Line: 115

Apparently iterating over ObjectSpace and mapping _mod_name takes a really long time.

For example, if I open up rails console and run
ObjectSpace.each_object(Module){|mod| mod._mod_name }

take about 100 seconds. The count returned is 26000+.

I've monkey patched Line 231 of same file module_cache = nil so that it doesn't reload the cache on every file reload. This improved the reload time significantly to around 1s, but I'm not sure what side effects I've introduced (I'm a ruby nub).

Any insights or point me to the right direction? Thanks!

@thedarkone
Copy link
Owner

This improved the reload time significantly to around 1s, but I'm not sure what side effects I've introduced (I'm a ruby nub).

This should have broke rails-dev-boost, ie. it will now fail to detect classes/modules' inter-dependencies.

The count returned is 26000+.

This indicates either a really huge app (with 26000+ classes/modules/.rb files, just booting a Rails console should take about 10 minutes) or the app is somehow generating a lot of classes/modules on the fly. Either way this shouldn't take 100 seconds to iterate over.

For example here the an output for an app I have (running ruby 2.0):

irb(main):001:0> ObjectSpace.each_object(Module) {}
=> 4235
irb(main):002:0> t=Time.now; 10.times {ObjectSpace.each_object(Module) {|m| m._mod_name}}; Time.now-t
=> 0.804799

As can be seen the app has 4235 modules/classes, and iterating over them 10 times (this should be equivalent to iterating over 42350 modules/classes) takes 0.8s.

Any insights or point me to the right direction?

Check how fast "pure" ObjectSpace.each_object(Module) runs, ie:

irb(main):003:0> t=Time.now; 10.times {ObjectSpace.each_object(Module) {|m| m}}; Time.now-t
=> 0.122112

This way we can rule out that native ObjectSpace.each_object(Module) is to blame.

Then, check whether ._mod_name method is the correct one (it should be an alias of a native Module#name method):

irb(main):004:0> ObjectSpace.each_object(Module).map {|mod| mod.method(:_mod_name).unbind}.uniq
=> [#<UnboundMethod: Module#name>, #<UnboundMethod: Class(Module)#name>]

@pxue
Copy link
Author

pxue commented Feb 12, 2015

Yeah the Rails app I'm working on is pretty ginormous. Here's the benchmarks:

[1] pry(main)> ObjectSpace.each_object(Module) {}
=> 26605
[2] pry(main)> t=Time.now; 10.times {ObjectSpace.each_object(Module) {|m| m }}; Time.now-t
=> 0.570916

and then calling _mod_name

[1] pry(main)> t=Time.now; ObjectSpace.each_object(Module) {|m| m._mod_name }; Time.now-t
=> 34.679376

but take note that I this is just ONE call. Iterating over 10x would take 5min+.

Lastly this is the output of the unbind:

[2] pry(main)> ObjectSpace.each_object(Module).map {|mod| mod.method(:_mod_name).unbind}.uniq
=> [#<UnboundMethod: Module#_mod_name(name)>,
 #<UnboundMethod: Class(Module)#_mod_name(name)>,
 #<UnboundMethod: StateMachine::HelperModule(Module)#_mod_name(name)>]

@thedarkone
Copy link
Owner

Mine current hunch is that you are running into an issue with Module#name (note that _mod_name is a pure method alias for native Method#name) being 2 to 3 orders of magnitudes slower for anonymous modules/classes (ie: when mod.name # => nil) than for "named" modules/classes (when mod.name returns a string, for example RailsDevelopmentBoost.name # => "RailsDevelopmentBoost").

This is because: if a module/class name is unknown (which it is for "anonymous" modules/classes) the whole Ruby constant namespace is traversed by MRI trying to find the name of the mod, if a name is found, it is cached and all subsequent mod.name calls return that cached result. If traversing constant namespace yields no matches, nothing is cached and next time mod._mod_name is called MRI has to re-traverse everything yet again.

You can check how many of the modules/classes of your app are "anonymous" like this:

irb(main):002:0> ObjectSpace.each_object(Module).select {|m| RailsDevelopmentBoost::DependenciesPatch::Util.anonymous_const?(m)}.size
=> 172

Does this yield thousands modules/classes for you (I'm expecting it to be something like 20000+)?

@pxue
Copy link
Author

pxue commented Feb 13, 2015

I think your hunch is right, here:

[1] pry(main)> ObjectSpace.each_object(Module).select {|m| RailsDevelopmentBoost::DependenciesPatch::Util.anonymous_const?(m)}.size
=> 19224

So this is interesting. I did read about anonymous module/classes and their side effects. I'll have to read up on it.

@pxue
Copy link
Author

pxue commented Feb 13, 2015

Is keeping a persistent set/hash of anon class to skip the _mod_name look up a good solution?

require 'set'
anon = Set.new([])

2.times do
  t = Time.now
  ObjectSpace.each_object(Module) do |mod|

    next if anon.include? mod
    m = mod._mod_name

    if !m
      anon.add mod
    end

  end
  puts Time.now - t
  puts anon.length
end

And here's the benchmark after:

33.183677
19226

0.102118
19226

@thedarkone
Copy link
Owner

Don't know what to do about that abysmal anon module/class mod.name performance on MRI. For example with my app I have 4009 non-anonymous modules/classes and calling .name on all of them takes 0.002s, while invoking .name on remaining 220 anonymous modules/classes takes 0.06s. What's even worse is - the bigger your app gets, the slower .name call for anon modules becomes (as the namespace structure grows, the longer it takes to traverse).

Not sure if I'll ever be able to fix this for you (short of fixing MRI).

You could alleviate the problem by giving your anonymous modules names, not sure have feasible that is given your app's constraints:

def deanonymize_module_or_class(mod)
  Object.const_set(:XYZ, mod)
  mod.name # ensure the name is computed and cached by MRI
  Object.send(:remove_const, :XYZ) # unset the const, `mod` will now forever retain XYZ const name
end

Note that Ruby will not care that 19224 of your classes/modules now have the same mod.name.

@thedarkone
Copy link
Owner

Is keeping a persistent set/hash of anon class to skip the _mod_name look up a good solution?

I can't, they wouldn't get garbage collected by Ruby VM then...

I also can't set a an i-var on them like this:

if mod.instance_variable_get(:@anon_module)
  # skip
elsif Util.anonymous_const_name?(mod._mod_name)
  mod.instance_variable_set(:@anon_module, true)
else
  # work with mod
end

Because next time around a previously anonymous module might no longer be anonymous and ModuleCache needs to keep track of it.

@pxue
Copy link
Author

pxue commented Feb 13, 2015

You could alleviate the problem by giving your anonymous modules names, not sure have feasible that is given your app's constraints:

This is what I was planning. In config/application.rb I've added the deanonymizing code in an after_initialize block. Obviously this adds about an extra minute to the application boot time, but much better than the 40s per reload time.

I guess unless we monkey patch MRI, this is good enough for me. What I'm wondering is how does languages like Python handles anon name look ups.

@thedarkone
Copy link
Owner

What I'm wondering is how does languages like Python handles anon name look ups.

This is purely an MRI implementation problem, other Ruby VMs (JRuby and Rubinius) don't suffer from this issue, they always keep track of whether a module/class has a name or not, if it doesn't nil is returned right away (the whole namespace/constants-world is never traversed). Also because returning a nil from mod.name call does not involve a memory allocation (whereas on the other hand for "named" module/class mod.name call always returns a new String instance), calling mod.name on anonymous modules/classes is actually faster :).

This is because MRI optimizes for boot-up performance (when a module acquires a name, MRI chooses not to bother storing it on a module's object, thinking "if somebody asks for a module name, we'll get around to it and figure it out") whereas JRuby and Rubinius always keep module object's name up to date (an anonymous module has a null pointer, if it ever acquires a name a pointer is updated with the proper name).

PS: personally I believe MRI behavior is a legacy artifact/quirk and switching it to JRuby/Rubinius "mode" won't result in any boot-up slow down.

@thedarkone
Copy link
Owner

@pxue not sure if rails-dev-boost is ever going to attempt to fix your problem, but there is hope on the MRI front! https://bugs.ruby-lang.org/issues/11119

@pxue
Copy link
Author

pxue commented May 6, 2015

@thedarkone Nice! we should back contribute to the discussion. we have some nice real world benches here of the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants