Optimize traverse_and_update without accumulator #584

Merged


ypconstante
Contributor

Today, traverse_and_update/2 calls the same traverse functions as traverse_and_update/3, so the traversal needlessly creates many tuples to carry an accumulator value that is never used.

This PR optimizes traverse_and_update/2 by avoiding these tuples. Doing so required essentially duplicating the traverse functions, but since the implementation is small, that shouldn't be an issue.
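To illustrate the idea, here is a minimal sketch of the two traversal styles. It is not Floki's actual code: the `TraverseSketch` module, its function names, and the simplified tree shape (`{tag, attrs, children}` tuples plus binaries for text) are assumptions made for this example only.

```elixir
defmodule TraverseSketch do
  # Hypothetical, simplified stand-in for an HTML tree traversal.
  # This is NOT Floki's implementation; names and shapes are assumptions.

  # Accumulator style: every step returns a {node, acc} tuple, even when
  # the caller never uses the accumulator.
  def with_acc(nodes, acc, fun) when is_list(nodes) do
    {reversed, acc} =
      Enum.reduce(nodes, {[], acc}, fn node, {done, acc} ->
        case with_acc(node, acc, fun) do
          {nil, acc} -> {done, acc}
          {node, acc} -> {[node | done], acc}
        end
      end)

    {Enum.reverse(reversed), acc}
  end

  def with_acc(text, acc, _fun) when is_binary(text), do: {text, acc}

  def with_acc({tag, attrs, children}, acc, fun) do
    {children, acc} = with_acc(children, acc, fun)
    fun.({tag, attrs, children}, acc)
  end

  # Direct style: the same post-order walk, but nodes are returned as-is,
  # so no wrapper tuples are allocated.
  def direct(nodes, fun) when is_list(nodes) do
    Enum.flat_map(nodes, fn node ->
      case direct(node, fun) do
        nil -> []
        node -> [node]
      end
    end)
  end

  def direct(text, _fun) when is_binary(text), do: text

  def direct({tag, attrs, children}, fun) do
    fun.({tag, attrs, direct(children, fun)})
  end
end

tree = [{"div", [], ["a", {"span", [{"id", "x"}], ["b"]}, {"p", [], []}]}]

drop_span = fn
  {"span", _attrs, _children} -> nil
  node -> node
end

# Both calls remove the span; only the first one threads a dummy accumulator.
{via_acc, :unused} =
  TraverseSketch.with_acc(tree, :unused, fn node, acc -> {drop_span.(node), acc} end)

via_direct = TraverseSketch.direct(tree, drop_span)

IO.inspect(via_acc == via_direct)
```

The two functions produce the same tree. The direct version simply skips the `{node, acc}` wrapper on each step, which is the kind of allocation the PR description refers to.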

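# Assumption: `tag` is not defined in the posted snippet. Here it is read
# from a TAG environment variable (e.g. "main" or "pr") so the script runs.
tag = System.get_env("TAG", "main")

read_file = fn name ->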
  __ENV__.file
  |> Path.dirname()
  |> Path.join(name)
  |> File.read!()
  |> Floki.parse_document!()
end

inputs = %{
  "big" => read_file.("big.html"),
  "medium" => read_file.("medium.html"),
  "small" => read_file.("small.html")
}

Benchee.run(
  %{
    "delete footer-poweredbyico" => fn doc ->
      Floki.traverse_and_update(doc, fn node ->
        if Floki.attribute(node, "id") == ["footer-poweredbyico"] do
          nil
        else
          node
        end
      end)
    end,
    "do nothing" => fn doc ->
      Floki.traverse_and_update(doc, &Function.identity/1)
    end,
    "reverse all children" => fn doc ->
      Floki.traverse_and_update(doc, fn
        {elem, attrs, children} -> {elem, attrs, Enum.reverse(children)}
        node -> node
      end)
    end
  },
  time: 10,
  inputs: inputs,
  save: [path: "benchs/results/raw-html-#{tag}.benchee", tag: tag],
  memory_time: 2
)
##### With input big #####
Name                                        ips        average  deviation         median         99th %
do nothing (pr)                          581.63        1.72 ms    ±54.58%        2.29 ms        2.90 ms
reverse all children (pr)                335.23        2.98 ms     ±6.86%        2.97 ms        3.64 ms
delete footer-poweredbyico (pr)          231.10        4.33 ms    ±22.97%        3.87 ms        6.11 ms
do nothing (main)                        214.01        4.67 ms     ±7.65%        4.66 ms        5.40 ms
reverse all children (main)              197.61        5.06 ms    ±23.27%        4.90 ms        7.44 ms
delete footer-poweredbyico (main)        172.87        5.78 ms    ±30.27%        5.92 ms        8.32 ms

Comparison:
do nothing (pr)                          581.63
reverse all children (pr)                335.23 - 1.73x slower +1.26 ms
delete footer-poweredbyico (pr)          231.10 - 2.52x slower +2.61 ms
do nothing (main)                        214.01 - 2.72x slower +2.95 ms
reverse all children (main)              197.61 - 2.94x slower +3.34 ms
delete footer-poweredbyico (main)        172.87 - 3.36x slower +4.07 ms

Memory usage statistics:

Name                                 Memory usage
do nothing (pr)                           1.00 MB
reverse all children (pr)                 1.83 MB - 1.83x memory usage +0.83 MB
delete footer-poweredbyico (pr)           2.29 MB - 2.29x memory usage +1.29 MB
do nothing (main)                         2.87 MB - 2.86x memory usage +1.86 MB
reverse all children (main)               3.70 MB - 3.68x memory usage +2.69 MB
delete footer-poweredbyico (main)         4.16 MB - 4.14x memory usage +3.15 MB

**All measurements for memory usage were the same**

##### With input medium #####
Name                                        ips        average  deviation         median         99th %
do nothing (pr)                          4.12 K        0.24 ms    ±11.05%        0.23 ms        0.34 ms
reverse all children (pr)                3.26 K        0.31 ms    ±11.17%        0.29 ms        0.42 ms
reverse all children (main)              2.61 K        0.38 ms     ±7.61%        0.38 ms        0.49 ms
do nothing (main)                        0.85 K        1.17 ms    ±27.36%        0.99 ms        1.77 ms
delete footer-poweredbyico (pr)          0.84 K        1.19 ms    ±12.74%        1.17 ms        1.87 ms
delete footer-poweredbyico (main)        0.56 K        1.80 ms     ±4.57%        1.79 ms        2.07 ms

Comparison:
do nothing (pr)                          4.12 K
reverse all children (pr)                3.26 K - 1.26x slower +0.0636 ms
reverse all children (main)              2.61 K - 1.58x slower +0.141 ms
do nothing (main)                        0.85 K - 4.83x slower +0.93 ms
delete footer-poweredbyico (pr)          0.84 K - 4.90x slower +0.95 ms
delete footer-poweredbyico (main)        0.56 K - 7.40x slower +1.55 ms

Memory usage statistics:

Name                                 Memory usage
do nothing (pr)                         341.23 KB
reverse all children (pr)               608.67 KB - 1.78x memory usage +267.44 KB
reverse all children (main)            1212.55 KB - 3.55x memory usage +871.32 KB
do nothing (main)                       945.16 KB - 2.77x memory usage +603.92 KB
delete footer-poweredbyico (pr)         816.06 KB - 2.39x memory usage +474.83 KB
delete footer-poweredbyico (main)      1419.94 KB - 4.16x memory usage +1078.70 KB

**All measurements for memory usage were the same**

##### With input small #####
Name                                        ips        average  deviation         median         99th %
reverse all children (pr)               12.40 K       80.65 μs   ±164.37%       40.63 μs      686.97 μs
do nothing (pr)                         12.26 K       81.55 μs   ±154.68%       30.62 μs      609.80 μs
do nothing (main)                        5.13 K      195.06 μs    ±66.46%      222.88 μs      439.79 μs
delete footer-poweredbyico (pr)          4.42 K      226.28 μs    ±56.23%      261.80 μs      462.57 μs
reverse all children (main)              3.73 K      268.35 μs    ±48.08%      279.34 μs      532.85 μs
delete footer-poweredbyico (main)        2.74 K      364.54 μs    ±31.97%      347.42 μs      640.36 μs

Comparison:
reverse all children (pr)               12.40 K
do nothing (pr)                         12.26 K - 1.01x slower +0.90 μs
do nothing (main)                        5.13 K - 2.42x slower +114.41 μs
delete footer-poweredbyico (pr)          4.42 K - 2.81x slower +145.63 μs
reverse all children (main)              3.73 K - 3.33x slower +187.70 μs
delete footer-poweredbyico (main)        2.74 K - 4.52x slower +283.90 μs

Memory usage statistics:

Name                                 Memory usage
reverse all children (pr)               125.11 KB
do nothing (pr)                          69.08 KB - 0.55x memory usage -56.03125 KB
do nothing (main)                       191.57 KB - 1.53x memory usage +66.46 KB
delete footer-poweredbyico (pr)         167.14 KB - 1.34x memory usage +42.03 KB
reverse all children (main)             247.60 KB - 1.98x memory usage +122.49 KB
delete footer-poweredbyico (main)       289.61 KB - 2.31x memory usage +164.50 KB
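
Benchee's comparison lists every scenario relative to the fastest one, so the pairwise PR-versus-main speedup is not printed directly. The short sketch below derives it from the ips figures reported above for the big input. The variable names are illustrative only.

```elixir
# ips figures copied from the "With input big" table above.
ips = %{
  "do nothing" => %{pr: 581.63, main: 214.01},
  "reverse all children" => %{pr: 335.23, main: 197.61},
  "delete footer-poweredbyico" => %{pr: 231.10, main: 172.87}
}

# Speedup of the PR branch over main for each scenario (higher is better).
speedups =
  Map.new(ips, fn {name, %{pr: pr, main: main}} ->
    {name, Float.round(pr / main, 2)}
  end)

IO.inspect(speedups)
```

For the big input this gives roughly 2.72x for "do nothing", 1.70x for "reverse all children", and 1.34x for "delete footer-poweredbyico".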

Owner

philss left a comment


Thank you! 💜 🚀

philss merged commit 62e1d48 into philss:main on Jul 29, 2024
15 checks passed
ypconstante deleted the optimize-traversal-without-accumulator branch on August 6, 2024 02:25