-
Notifications
You must be signed in to change notification settings - Fork 35
Simplify mkdir_p (and make it faster in some cases)
#60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Doing it this way is simpler and it doesn't end up adding "/" to the list of folders, so it doesn't need to be removed later.
I think it's debatable which is the most common usage of
`FileUtils.mkdir_p`, but even assuming the most common use case is
creating a folder when it doesn't previously exist but the parent does,
this optimization doesn't seem to have a noticiable effect there while
harming other use cases.
For benchmarks, I created this script
```ruby
require "benchmark/ips"
Benchmark.ips do |x|
x.report("old mkdir_p - exists") do
FileUtils.mkdir_p "/tmp"
end
x.report("new_mkdir_p - exists") do
FileUtils.mkdir_p_new "/tmp"
end
x.compare!
end
FileUtils.rm_rf "/tmp/foo"
Benchmark.ips do |x|
x.report("old mkdir_p - doesnt exist, parent exists") do
FileUtils.mkdir_p "/tmp/foo"
FileUtils.rm_rf "/tmp/foo"
end
x.report("new_mkdir_p - doesnt exist, parent exists") do
FileUtils.mkdir_p_new "/tmp/foo"
FileUtils.rm_rf "/tmp/foo"
end
x.compare!
end
Benchmark.ips do |x|
x.report("old mkdir_p - doesnt exist, parent either") do
FileUtils.mkdir_p "/tmp/foo/bar"
FileUtils.rm_rf "/tmp/foo"
end
x.report("new_mkdir_p - doesnt exist, parent either") do
FileUtils.mkdir_p_new "/tmp/foo/bar"
FileUtils.rm_rf "/tmp/foo"
end
x.compare!
end
Benchmark.ips do |x|
x.report("old mkdir_p - more levels") do
FileUtils.mkdir_p "/tmp/foo/bar/baz"
FileUtils.rm_rf "/tmp/foo"
end
x.report("new_mkdir_p - more levels") do
FileUtils.mkdir_p_new "/tmp/foo/bar/baz"
FileUtils.rm_rf "/tmp/foo"
end
x.compare!
end
```
and copied the method with the "optimization" removed as
`FileUtils.mkdir_p_new`. The results are as below:
```
Warming up --------------------------------------
old mkdir_p - exists 15.914k i/100ms
new_mkdir_p - exists 46.512k i/100ms
Calculating -------------------------------------
old mkdir_p - exists 161.461k (± 3.2%) i/s - 811.614k in 5.032315s
new_mkdir_p - exists 468.192k (± 2.9%) i/s - 2.372M in 5.071225s
Comparison:
new_mkdir_p - exists: 468192.1 i/s
old mkdir_p - exists: 161461.0 i/s - 2.90x (± 0.00) slower
Warming up --------------------------------------
old mkdir_p - doesnt exist, parent exists
2.142k i/100ms
new_mkdir_p - doesnt exist, parent exists
1.961k i/100ms
Calculating -------------------------------------
old mkdir_p - doesnt exist, parent exists
21.242k (± 6.7%) i/s - 107.100k in 5.069206s
new_mkdir_p - doesnt exist, parent exists
19.682k (± 4.2%) i/s - 100.011k in 5.091961s
Comparison:
old mkdir_p - doesnt exist, parent exists: 21241.7 i/s
new_mkdir_p - doesnt exist, parent exists: 19681.7 i/s - same-ish: difference falls within error
Warming up --------------------------------------
old mkdir_p - doesnt exist, parent either
945.000 i/100ms
new_mkdir_p - doesnt exist, parent either
1.002k i/100ms
Calculating -------------------------------------
old mkdir_p - doesnt exist, parent either
9.689k (± 4.4%) i/s - 49.140k in 5.084342s
new_mkdir_p - doesnt exist, parent either
10.806k (± 4.6%) i/s - 54.108k in 5.020714s
Comparison:
new_mkdir_p - doesnt exist, parent either: 10806.3 i/s
old mkdir_p - doesnt exist, parent either: 9689.3 i/s - 1.12x (± 0.00) slower
Warming up --------------------------------------
old mkdir_p - more levels
702.000 i/100ms
new_mkdir_p - more levels
775.000 i/100ms
Calculating -------------------------------------
old mkdir_p - more levels
7.046k (± 3.5%) i/s - 35.802k in 5.087548s
new_mkdir_p - more levels
7.685k (± 5.5%) i/s - 38.750k in 5.061351s
Comparison:
new_mkdir_p - more levels: 7685.1 i/s
old mkdir_p - more levels: 7046.4 i/s - same-ish: difference falls within error
```
I think it's better to keep the code simpler is the optimization is not
so clear like in this case.
|
I also wanted to remove all code using |
|
Hei @jeremyevans, I noticed that the code I'm modifying in the first commit here is relatively new. I think my solution doesn't change your idea in #53, just makes the code simpler. Let me know what you think, in case you have some time. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach. It makes the code simpler and shouldn't change the behavior, creating only the minimum number of directories necessary.
|
Thanks! |
|
@deivid-rodriguez Actually it changed the behavior unfortunately. https://bugs.ruby-lang.org/issues/18941 Could you check it out? |
|
It seems some Windows specific issue, right? I don't have much time right now but I will try to have a look! |
|
Oh, wait, I see the problem now. Let me fix it! |
|
So, I don't have a Windows machine right now to try this but this is my guess. Intuitively I think we could restore the previous behavior by checking until path == stack.last # dirname("/")=="/", dirname("C:/")=="C:/"However, that doesn't work for me: Maybe that's because I'm not on Windows, but this check in So I'm not sure, I guess we need to try it on Windows. But in any case, I think reverting e842a0e would also fix the issue because it would always eagerly try to create the directory and generate the expected error in this case. |
There's two commits here:
The first one simplifies building the list of paths that need to be created. Previously the exit condition for the
untilloop was thatFile.dirname(path) == path(root path), but that's unnecessary. It's enough to exit the loop once we find a directory that already exists while traversing the path upwards.The second one removes an optimization that doesn't seem to affect much the case it's meant to optimize (creating a folder that doesn't exist already under a folder that already exists), but does affect other use cases (for example, when the target folder already exists). To measure the effect of this commit, I copied the method with the optimization removed as
FileUtils.mkidr_p_new, and run the following micro benchmark:Results were: