Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memoize regexp inside PhoneAnalyzer#country_code_candidates_for #317

Merged
merged 1 commit into from
Nov 20, 2024

Conversation

tgwizard
Copy link
Contributor

@tgwizard tgwizard commented Oct 24, 2024

This code path appears in production profiles for us. Local benchmarks shows that It's >80x faster if we memoize the regexp.

Benchmark
ruby 3.3.5 (2024-09-03 revision ef084cc8f4) +YJIT [arm64-darwin23]
Warming up --------------------------------------
original_country_code_candidates_for
                         1.491k i/100ms
cached_country_code_candidates_for
                        74.288k i/100ms
smarter_cached_country_code_candidates_for
                       109.646k i/100ms
Calculating -------------------------------------
original_country_code_candidates_for
                         12.539k (± 8.2%) i/s   (79.75 μs/i) -     62.622k in   5.045558s
cached_country_code_candidates_for
                        751.572k (± 4.9%) i/s    (1.33 μs/i) -      3.789M in   5.054955s
smarter_cached_country_code_candidates_for
                          1.087M (± 2.3%) i/s  (920.34 ns/i) -      5.482M in   5.048195s

Comparison:
smarter_cached_country_code_candidates_for:  1086552.9 i/s
cached_country_code_candidates_for:   751571.7 i/s - 1.45x  slower
original_country_code_candidates_for:    12539.3 i/s - 86.65x  slower
PHONE = "+4623123123"

def original_country_code_candidates_for(phone)
  stripped_phone = phone.gsub(/^(#{Phonelib.phone_data_int_prefixes})/, "")
  ((1..3).map { |length| phone[0, length] } + (1..3).map { |length| stripped_phone[0, length] }).uniq
end

def cached_country_code_candidates_for(phone)
  raw_re = "^(#{Phonelib.phone_data_int_prefixes})"
  re = Phonelib.phone_regexp_cache[raw_re] ||= Regexp.new(raw_re).freeze
  stripped_phone = phone.gsub(re, "")
  ((1..3).map { |length| phone[0, length] } + (1..3).map { |length| stripped_phone[0, length] }).uniq
end

def smarter_cached_country_code_candidates_for(phone)
  re = Phonelib.phone_regexp_cache["phone_data_int_prefixes"] ||= Regexp.new("^(#{Phonelib.phone_data_int_prefixes})").freeze
  stripped_phone = phone.gsub(re, "")
  ((1..3).map { |length| phone[0, length] } + (1..3).map { |length| stripped_phone[0, length] }).uniq
end

Benchmark.ips do |x|
  x.report("original_country_code_candidates_for") { original_country_code_candidates_for(PHONE) }
  x.report("cached_country_code_candidates_for") { cached_country_code_candidates_for(PHONE) }
  x.report("smarter_cached_country_code_candidates_for") { smarter_cached_country_code_candidates_for(PHONE) }
  x.compare!
end

@@ -52,7 +52,7 @@ def country_prefix(country)

# caches regular expression, reusing it for later lookups
def cr(regexp)
Phonelib.phone_regexp_cache[regexp] ||= Regexp.new(regexp).freeze
Phonelib.phone_regexp_cache[regexp] ||= Regexp.new(block_given? ? yield(regexp) : regexp).freeze
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated so that cr can take a block that returns the string to use for the regexp. That way we don't have to compute create new objects just to create the key for the regexp cache.

@tgwizard
Copy link
Contributor Author

tgwizard commented Nov 6, 2024

cc @daddyz

@daddyz daddyz merged commit 15c77d2 into daddyz:master Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants