Prefer Hash#[] over Set#.include? for speed #312
Playing with stackprof against codetriage and for an initial run with no cache `Set#include?` was the top called method, something around 8% of execution time spent there.

Did a microbenchmark to see if it would be faster to use a hash:

```
require 'benchmark/ips'
require 'set'

set = Set.new [:foo, :bar]
hash = {foo: true, bar: true}

Benchmark.ips do |x|
  x.report("set ") {|num|
    i = 0
    while i < num
      set.include?(:foo)
      i += 1
    end
  }
  x.report("hash") {|num|
    i = 0
    while i < num
      hash[:foo]
      i += 1
    end
  }
  x.compare!
end

# Warming up --------------------------------------
#                 set    215.314k i/100ms
#                 hash   219.939k i/100ms
# Calculating -------------------------------------
#                 set     11.715M (±15.5%) i/s -     56.843M in   5.010837s
#                 hash    20.119M (±18.2%) i/s -     96.333M in   5.010977s
# Comparison:
#                 hash: 20118880.7 i/s
#                 set : 11714839.0 i/s - 1.72x slower
```

Yes, it is faster.

Anecdotally when running `RAILS_ENV=production time rake assets:precompile` against codetriage:

Before patch:

```
real 0m18.325s
user 0m14.564s
sys  0m2.729s
```

After patch:

```
real 0m17.981s
user 0m14.461s
sys  0m2.716s
```
Actually looks like we can go even faster with case statements
|
Was just writing to suggest Wonder whether |
We need some kind of way to enable a "debug" or "strict" mode when it comes to these checks. They're not even really that useful for debugging, they only guarantee a certain subset of types are used, they don't guarantee the correct types are used in the correct places. I guess the checks are done to blow up before we try to serialize a non-serializable type to the cache by accident.

The deep checking of types is really bad for source maps which is an array of hashes with values that hold arrays. If this is really so expensive, we should definitely look for ways to work around having to do these checks in production.

I tried implementing the logic using a case statement:

```
def valid_processor_metadata_value?(value)
  case value
  when String, Symbol, Fixnum, Bignum, TrueClass, FalseClass, NilClass
    true
  when Set, Array
    value.all? { |v| valid_processor_metadata_value?(v) }
  when Hash
    value.all? do |(key, value)|
      valid_processor_metadata_value?(key) &&
        valid_processor_metadata_value?(value)
    end
  else
    false
  end
end
```

Then I set up my system to run

```
task "assets:bench" do
  measure = []
  50.times do
    measure << Benchmark.measure do
      `rm -rf tmp/cache/assets/sprockets/v4.0/ ; rm -rf public/assets; time RAILS_ENV=production bundle exec rake assets:precompile`
    end.real
  end
  puts "================ DONE ================"
  puts measure.join("\n")
end
```

I put the results in a spreadsheet

The case statement is faster than the "current" but not within the stdev. Hash is the fastest even with stdev. On the average case my assets compile about 10% faster when we switched over to a hash.

I think the case statement might care where the types are in the statement, or maybe it might care how many values in

If you're interested in messing around with optimizing the case statement I wrote a script that compares it between hash and set https://gist.github.com/schneems/925ff1988066d4d582915fdf948ca142 for the non "complex" (i.e. Set, Array, Hash) data types. |
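For readers comparing the two approaches, here is a minimal sketch of what a hash-based version of the check above could look like. It is not the code from this patch: `SIMPLE_TYPES` and `valid_metadata_value?` are hypothetical names, and it assumes Ruby 2.4 or later, where `Fixnum` and `Bignum` are unified as `Integer`.

```ruby
require 'set'

# Hypothetical lookup table (not from the Sprockets source).
# Assumes Ruby 2.4+, where Fixnum and Bignum are unified as Integer.
SIMPLE_TYPES = {
  String => true, Symbol => true, Integer => true,
  TrueClass => true, FalseClass => true, NilClass => true
}.freeze

# Returns true when value is a simple type, or a collection made up
# entirely of valid values.
def valid_metadata_value?(value)
  # Fast path: a single Hash#[] lookup keyed on the exact class.
  return true if SIMPLE_TYPES[value.class]

  case value
  when Array, Set
    value.all? { |v| valid_metadata_value?(v) }
  when Hash
    value.all? { |k, v| valid_metadata_value?(k) && valid_metadata_value?(v) }
  else
    false
  end
end

puts valid_metadata_value?([{ map: [1, nil, :x] }])
puts valid_metadata_value?(Object.new)
```

One trade-off worth noting: the hash lookup matches only the exact class, so an instance of a `String` subclass would be rejected by the fast path, whereas `case`/`when` matches subclasses through `===`.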
Looked into it more, still not sure why case is slower. Need to move on. We can revisit some other time. |
Yes. |
👍 |
Whoa, this is a little weird to me. Any thoughts on why |
Calls to

```
code = <<-END
  set = Set.new
  set.include?(foo.class)
END
puts RubyVM::InstructionSequence.compile(code).disasm
```

Produces

While

```
code = <<-END
  hash = {}
  hash[foo.class]
END
puts RubyVM::InstructionSequence.compile(code).disasm
```

produces

I think it's that |
If this is true, is there ever an appropriate time to use |
I think it's supposed to be faster for set operations such as unions and calculating intersections. I could be wrong but I think set is powered by a hash under the hood. |
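As a rough illustration of the convenience side of that trade-off, the sketch below shows the set operations `Set` provides out of the box next to a hand-rolled intersection on plain hashes. The variable names are made up for the example, and it says nothing about relative speed.

```ruby
require 'set'

a = Set.new([:foo, :bar])
b = Set.new([:bar, :baz])

union        = a | b   # elements in either set
intersection = a & b   # elements in both sets
difference   = a - b   # elements of a that are not in b

# The same intersection with plain hashes has to be spelled out by hand.
ha = { foo: true, bar: true }
hb = { bar: true, baz: true }
hash_intersection = ha.select { |key, _| hb[key] }

puts union.inspect
puts intersection.inspect
puts difference.inspect
puts hash_intersection.keys.inspect
```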
Ah, cool. Thanks for the info. |
Looks like Set is written in Ruby https://github.com/ruby/ruby/blob/trunk/lib/set.rb and it is powered by a hash. Based on that, a Hash will always be faster; however, it might not be as convenient. |
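One way to see part of the extra cost is to inspect the bytecode, in the same spirit as the `disasm` experiment earlier in the thread. This sketch assumes MRI (CRuby), since `RubyVM::InstructionSequence` does not exist on other implementations. On MRI, `hash[key]` compiles to the specialized `opt_aref` instruction, while `set.include?(key)` is an ordinary method call that, in the pure-Ruby `set.rb` linked above, then does its own lookup on the internal hash.

```ruby
# MRI-only: RubyVM::InstructionSequence is specific to CRuby.
# The snippets are compiled but never executed, so `foo` does not need to exist.
hash_asm = RubyVM::InstructionSequence.compile("hash = {}\nhash[foo.class]").disasm
set_asm  = RubyVM::InstructionSequence.compile("set = nil\nset.include?(foo.class)").disasm

# Hash#[] gets a dedicated instruction; Set#include? does not.
puts "hash uses opt_aref: #{hash_asm.include?('opt_aref')}"
puts "set uses opt_aref:  #{set_asm.include?('opt_aref')}"
```

This shows only the call-site difference. It does not measure the overall speed gap, which the benchmark in the description covers.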