
Performance of WebMock::Util::Headers.normalize_headers #1027

Closed
baweaver opened this issue May 18, 2023 · 4 comments

Comments

@baweaver
Contributor

While trying to sort out some Capybara speed issues, we ran memory profiling and noticed an exceptionally hot path in the `normalize_headers` method. We took a more detailed look at it and came up with this experiment to see what interesting information we could find:

require 'benchmark/ips'
require 'memory_profiler'

# Recreating the Header utils
module WebMock
  module Util
    class Headers
      # Hoist these so we don't end up with recreated objects
      STANDARD_HEADER_DELIMITER = '-'.freeze
      NONSTANDARD_HEADER_DELIMITER = '_'.freeze
      JOIN = ', '.freeze

      # What exists today, unchanged
      def self.normalize_headers(headers)
        return nil unless headers
        array = headers.map { |name, value|
          [name.to_s.split(/_|-/).map { |segment| segment.capitalize }.join("-"),
           case value
            when Regexp then value
            when Array then (value.size == 1) ? value.first.to_s : value.map {|v| v.to_s}.sort
            else value.to_s
           end
          ]
        }
        Hash[*array.inject([]) {|r,x| r + x}]
      end

      def self.just_return_headers(headers)
        headers
      end

      # Experiment to try and speed that up
      def self.enhanced_normalize_headers(headers)
        return nil unless headers

        # Using `each_with_object` to reduce object creations and directly build
        # the new headers hash.
        headers.each_with_object({}) do |(name, value), new_headers|
          # Moved the name normalization into a new method mostly to give a name to what it's
          # doing here.
          new_headers[normalize_name(name)] =
            case value
            when Regexp then value
            when Array then (value.size == 1) ? value.first.to_s : value.map(&:to_s).sort
            else value.to_s
            end
        end
      end

      # Good deal of time savings on that process though
      def self.normalize_name(name)
        name
          .to_s
          # By using `tr` here we can normalize the headers ahead of time
          .tr(NONSTANDARD_HEADER_DELIMITER, STANDARD_HEADER_DELIMITER)
          # ...which will be faster because `split` on a `String` is a good deal faster
          # than on a `Regexp`.
          .split(STANDARD_HEADER_DELIMITER)
          # Now since this is state we control here we can mutate it with `map!` to again
          # avoid extra object creations.
          .map!(&:capitalize)
          # Then join it back together. Most of these constants are to avoid recreating
          # these strings hundreds of thousands of times in larger codebases. Ours I
          # believe was in the millions.
          .join(STANDARD_HEADER_DELIMITER)
      end
    end
  end
end
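As a quick standalone illustration (a sketch mirroring the `normalize_name` logic above, not part of the original experiment), here is what the `tr`/`split`/`map!`/`join` pipeline produces for a few representative header names:

```ruby
# Inline copy of the normalization pipeline, for demonstration only.
def demo_normalize_name(name)
  name.to_s
      .tr('_', '-')          # "content_type" -> "content-type"
      .split('-')            # -> ["content", "type"]
      .map!(&:capitalize)    # -> ["Content", "Type"] (mutates in place)
      .join('-')             # -> "Content-Type"
end

puts demo_normalize_name(:content_type)  # => "Content-Type"
puts demo_normalize_name("x-request-id") # => "X-Request-Id"
puts demo_normalize_name("ETAG")         # => "Etag"
```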

# Headers pulled from a request on our suite to see how they're handled
headers = {
  "x-powered-by"=>["Express"],
  "x-request-id"=>["80763389-a6a7-4495-aef2-d26290b1ef6f"],
  "timing-allow-origin"=>["*"],
  "content-type"=>["application/json; charset=utf-8"],
  "content-length"=>["17"],
  "etag"=>["W/\"11-pEowaKzwbXVLfeOv1lndFqAk+sg\""],
  "date"=>["Thu, 18 May 2023 17:46:46 GMT"],
  "connection"=>["keep-alive"],
  "keep-alive"=>["timeout=5"]
}

# Then see how they fare against each other
Benchmark.ips do |x|
  x.report('old') { WebMock::Util::Headers.normalize_headers(headers) }
  x.report('new') { WebMock::Util::Headers.enhanced_normalize_headers(headers) }
  x.report('none') { WebMock::Util::Headers.just_return_headers(headers) }

  x.compare!
end

# Results:
#
# Warming up --------------------------------------
#                  old     7.745k i/100ms
#                  new    10.272k i/100ms
# Calculating -------------------------------------
#                  old     77.489k (± 3.6%) i/s -    387.250k in   5.004044s
#                  new    103.976k (± 2.5%) i/s -    523.872k in   5.041797s

# Comparison:
#                  new:   103976.0 i/s
#                  old:    77489.0 i/s - 1.34x  slower

# While 1.34x does not seem large, we had seen reports of a significant number of object
# allocations and heavy memory usage from WebMock in our larger monorepo, so while hunting
# for the sources we investigated fixing this, saw some interesting gains, and wanted to
# upstream them.

# Let's take a quick look at memory footprints for 10k parses.
# Why 10k? We'd seen 20-50k requests running through this on a single spec run at one
# point, so it's a very hot path for a larger monolithic app. (We're also investigating
# why in the world that request count is so high.)
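To check the claim that splitting on a String delimiter beats splitting on a Regexp, here is a quick stdlib-only micro-benchmark (a sketch, separate from the numbers reported above):

```ruby
require 'benchmark'

name = 'x_request-id'

# Sanity check: both approaches produce the same segments.
raise unless name.split(/_|-/) == name.tr('_', '-').split('-')

n = 500_000
Benchmark.bm(20) do |x|
  x.report('regexp split')      { n.times { name.split(/_|-/) } }
  x.report('tr + string split') { n.times { name.tr('_', '-').split('-') } }
end
```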

puts "Old Variant Memory Profile", '=' * 50, ''

MemoryProfiler.report {
  10_000.times { WebMock::Util::Headers.normalize_headers(headers) }
}.pretty_print

puts

puts "New Variant Memory Profile", '=' * 50, ''

MemoryProfiler.report {
  10_000.times { WebMock::Util::Headers.enhanced_normalize_headers(headers) }
}.pretty_print

# While I could give the raw report, this is the condensed version, formatted for
# readability, of what I found interesting:
{
  "Allocated Memory by Gem in Bytes" => {
    old: 60_640_360,
    new: 38_080_360
  },

  # Pretty dramatic reduction on Hash, and elimination of MatchData
  # in terms of memory. Didn't get a huge boost on String, though
  # testing this with `gsub` to avoid extra strings is much slower
  # so probably not worth it.
  "Allocated Memory by Class" => {
    old: {
      String: 23_840_000,
      Hash: 21_600_360,
      MatchData: 10_080_000,
      Array: 5_120_000,
    },
    new: {
      String: 21_600_360,
      Hash: 9_280_000,
      Array: 7_200_000,
    }
  },

  "Allocated Objects by Class" => {
    old: {
      String: 540_009,
      Hash: 380_000,
      MatchData: 60_000,
      Array: 10_000,
    },
    new: {
      String: 540_009,
      Hash: 180_000,
      Array: 10_000,
    }
  }
}
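Working just from the condensed figures above, the overall allocation savings come out to roughly a third (simple arithmetic on the reported totals):

```ruby
# Totals from the "Allocated Memory by Gem in Bytes" section above.
old_bytes = 60_640_360
new_bytes = 38_080_360

saved   = old_bytes - new_bytes
percent = (100.0 * saved / old_bytes).round(1)

puts "Saved #{saved} bytes across 10k calls (#{percent}% less allocated memory)"
# => Saved 22560000 bytes across 10k calls (37.2% less allocated memory)
```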

There are some important caveats and nuances to mention here:

  1. We are a very large Rails monolith
  2. That monolith runs a React frontend on Vite
  3. Vite makes a crapton of requests (~20k in one spec on occasion)
  4. We're additionally auditing why in the world it does that and trying to cut that count

That said, this does represent a performance improvement in what can be a hot path in larger applications.

Looking for feedback and/or interest level in this type of change before I go pushing any PRs on it.

baweaver pushed a commit to baweaver/webmock that referenced this issue Jun 6, 2023
Optimizes the `WebMock::Util::Headers.normalize_headers` method to:

* Create fewer objects
* Avoid using Regexp where possible
* Hoist constants

More details can be found in the related issue, including performance
benchmarks:

bblimke#1027
@baweaver
Contributor Author

baweaver commented Jun 6, 2023

Opened a PR which addresses this:

#1029

@vanderhoop

vanderhoop commented Jun 13, 2023

> Vite makes a crapton of requests (~20k in one spec on occasion)
> We're additionally auditing why in the world it does that and trying to cut that count.

@baweaver are you running the conventional vite dev server, or transpiling production-like assets and using the vite "preview" server?

If it's the former, I would suspect an incredibly long daisy chain of necessary module fetches, which over HTTP/1.1 (the default if you haven't set up local certs) could really slow down your Capybara specs. The recommended optimization would be to generate assets and serve them via the Vite "preview" server.

If you're already using the preview server, I'd be super curious to continue chatting about what you find on the rails perf slack (handle: vanderhoop) or elsewhere.

@technicalpickles
Contributor

@vanderhoop we're using vite-rails which ends up running the conventional vite-dev server. I looked, and I'm not seeing an option for the preview server. I can file issues there about that.

I think there's still value in improving the performance of this particular method in any event.

If anyone reading this is looking for resources about Ruby & Memory, I've found How Ruby Uses Memory — SitePoint to be a great guide. It touches a bit on how object allocation can affect performance.

@bblimke
Owner

bblimke commented Aug 19, 2023

@baweaver thank you for this great research and analysis. Awesome work! The PR is merged.

@bblimke bblimke closed this as completed Aug 19, 2023