
-X should batch the number of passed files to the maximum supported by the shell #410

Closed
lespea opened this issue Feb 15, 2019 · 18 comments · Fixed by #866 or #1020

@lespea commented Feb 15, 2019

It appears that running getconf ARG_MAX returns the maximum length that the command string can be. Possibly also include an option to artificially limit the number of arguments?

$ fd -IH . -tf -X wc -l
[fd error]: Problem while executing command: Argument list too long (os error 7)
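
For reference, a minimal sketch of querying this limit from Rust on POSIX systems (assuming the libc crate; equivalent to getconf ARG_MAX):

use libc::{sysconf, _SC_ARG_MAX};

/// Returns the ARG_MAX limit in bytes, or None if it is indeterminate.
fn arg_max() -> Option<i64> {
    // sysconf returns -1 if the limit is not defined.
    let limit = unsafe { sysconf(_SC_ARG_MAX) };
    if limit < 0 {
        None
    } else {
        Some(limit as i64)
    }
}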
@sharkdp (Owner) commented Feb 16, 2019

Thank you for reporting this. That was a known limitation when we first implemented --exec-batch, but we should definitely try to fix this.

Thank you for the information about getconf ARG_MAX. Looks like this should work on all POSIX systems. We will have to check how to get that information on Windows.

@tavianator (Collaborator)

Note that actually counting the size of your arguments is not straightforward: https://github.com/tavianator/bfs/blob/master/exec.c#L61

On Linux at least, you have to count the total length of the command line arguments and environment variables (and auxiliary vector), including NUL terminators, plus the length of the argv/environ pointer arrays themselves (including the final NULL elements). The kernel actually allocates these a page at a time, so you have to round up to the nearest page size. And POSIX recommends you leave an additional 2048 bytes just in case.
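
A rough sketch of this accounting (an illustration of the rules just described, not bfs's actual code; Unix-only):

use std::ffi::OsStr;
use std::mem::size_of;
use std::os::raw::c_char;
use std::os::unix::ffi::OsStrExt;

/// Bytes consumed by one argument: the string itself, its NUL
/// terminator, and the pointer slot it occupies in the argv array.
fn arg_size(arg: &OsStr) -> usize {
    arg.as_bytes().len() + 1 + size_of::<*const c_char>()
}

/// Round a total up to the page granularity the kernel allocates in.
fn page_align(total: usize, page_size: usize) -> usize {
    (total + page_size - 1) / page_size * page_size
}

/// Headroom POSIX recommends leaving below ARG_MAX.
const POSIX_HEADROOM: usize = 2048;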

This is all finicky enough that I also implemented code in bfs to detect and recover from E2BIG by trying fewer and fewer arguments until it works, just in case the ARG_MAX accounting is wrong or some other platform counts them differently. I don't know if that's feasible in Rust.
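
A hedged sketch of that fallback strategy in Rust (assuming the libc crate; not a port of the bfs code, and run_batch is an illustrative helper hard-coded to the wc -l example above):

use std::io;
use std::path::PathBuf;
use std::process::Command;

// Illustrative helper: run one batch and surface any exec failure,
// including E2BIG, as an io::Error.
fn run_batch(batch: &[PathBuf]) -> io::Result<()> {
    Command::new("wc").arg("-l").args(batch).status()?;
    Ok(())
}

/// Execute all paths, halving the batch size whenever E2BIG is hit.
fn exec_all(paths: &[PathBuf]) -> io::Result<()> {
    let mut rest = paths;
    let mut batch = rest.len().max(1);
    while !rest.is_empty() {
        let n = batch.min(rest.len());
        match run_batch(&rest[..n]) {
            Ok(()) => rest = &rest[n..],
            // Argument list still too long: retry with fewer arguments.
            Err(e) if e.raw_os_error() == Some(libc::E2BIG) && n > 1 => {
                batch = n / 2;
            }
            Err(e) => return Err(e),
        }
    }
    Ok(())
}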

@Artoria2e5

It could be easier to implement on Windows, as the accounting only involves the size of the lpCommandLine parameter. However, it is still non-trivial, since the escaping performed by the Rust standard library will increase the size of the command line. Since we are going to break the std::process::Command abstraction either way, it might make sense to just ask rust-lang to make such a thing available.
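
For context, CreateProcessW documents a hard limit of 32,767 UTF-16 code units (including the terminating NUL) for lpCommandLine, so the accounting would have to measure the command line in UTF-16 units after quoting. A minimal sketch of that measurement (illustrative only, not fd's code):

#[cfg(windows)]
fn utf16_units(s: &std::ffi::OsStr) -> usize {
    use std::os::windows::ffi::OsStrExt;
    // Length in UTF-16 code units, which is what the lpCommandLine
    // limit is measured in (quoting/escaping must be applied first).
    s.encode_wide().count()
}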

@tavianator (Collaborator)

See rust-lang/rust#40384

@sharkdp (Owner) commented Aug 7, 2021

In #768, @BurritoBurrato suggests adding a --batch-size argument, possibly in addition to an automatically computed (maximum) batch size. This would probably be much easier to implement. And it might be better to have this option than to have nothing (which inevitably causes "Argument list too long" errors).

This makes me wonder: is there a reasonable (sub-optimal) limit that we could set for the batch size that would work on most platforms/environments? It wouldn't be ideal, but it would be better than the current situation.

@tmccombs (Collaborator) commented Aug 8, 2021

Is there a reasonable (sub-optimal) limit that we could set for the batch size

I'm not sure. On Linux at least, the limit covers the combination of all environment variables and all command-line arguments. So if it is run with an unusually large environment, the space remaining for arguments may be unusually small.

If it's helpful, the output of xargs --show-limits on my Linux system is:

POSIX upper limit on argument length (this system): 2091895
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2088686
Size of command buffer we are actually using: 131072

I'm not sure where xargs gets the buffer size of 131072 from.

@devonhollowood (Contributor) commented Oct 20, 2021

I can take a shot at implementing a --batch-size option (as discussed above). I figure that gives a good workaround, and leaves open the option of later adding a default that is something like "however many arguments fit". Potentially --batch-size 0 would be equivalent to "no limit". Sound good?

@tmccombs (Collaborator)

Sounds good to me.

@tavianator (Collaborator)

I don't really think this is fixed yet. --batch-size is a useful workaround, but it requires you to guess a limit. Eventually we should properly respect ARG_MAX.

@tavianator reopened this Oct 23, 2021
@tmccombs (Collaborator)

Agreed.

@sharkdp (Owner) commented Oct 24, 2021

Eventually we should properly respect ARG_MAX

I started to implement this. I took the liberty of doing a 1:1 translation of your implementation in bfs, @tavianator. I hope you are okay with this (I haven't put a license on the project so far). The idea is to have a std::process::Command wrapper with a try_arg function that returns false if another argument cannot be added. This is far from finished.

Limitations:

  • Linux only
  • No additional env. variables can be added for the child process
  • Only a part of the process::Command API is provided

That being said, I was able to get the exact same numbers as the ones in the bfs -D exec … debug output (which was a bit tricky to compare, as the environment is not exactly the same due to things like the _ environment variable).

The library also contains a test that determines the limit experimentally (via a binary search on echo foo foo foo …). And I can confirm that the "bfs"-computed limit is below (but not too much below) the experimental limit.

https://github.com/sharkdp/argmax
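
A sketch of the try_arg pattern this enables, using a hypothetical LimitedCommand wrapper with a fixed byte budget (illustration only; the real argmax::Command derives its limit from ARG_MAX and the current environment, and its exact signatures may differ):

use std::ffi::{OsStr, OsString};
use std::mem::size_of;

// Hypothetical wrapper with a fixed byte budget.
struct LimitedCommand {
    program: OsString,
    args: Vec<OsString>,
    remaining: usize, // bytes still available for further arguments
}

impl LimitedCommand {
    fn new(program: &OsStr, budget: usize) -> Self {
        Self {
            program: program.to_owned(),
            args: Vec::new(),
            remaining: budget,
        }
    }

    /// Returns false if the argument no longer fits in the budget.
    fn try_arg(&mut self, arg: &OsStr) -> bool {
        // String bytes + NUL terminator + argv pointer slot.
        let cost = arg.len() + 1 + size_of::<*const u8>();
        if cost > self.remaining {
            return false;
        }
        self.remaining -= cost;
        self.args.push(arg.to_owned());
        true
    }

    fn status(&mut self) -> std::io::Result<std::process::ExitStatus> {
        std::process::Command::new(&self.program)
            .args(&self.args)
            .status()
    }
}

A caller would keep calling try_arg until it returns false, run the accumulated batch, and start a fresh command, which is exactly the batching loop fd needs.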

@tavianator (Collaborator)

Oh awesome! I was actually going to do that today too but didn't get around to it.

I'm definitely okay with it, and pretty much any licence is compatible with bfs's.

I think the implementation is probably good enough for any unix, not just Linux.

@sharkdp (Owner) commented Nov 16, 2021

I got some time today to continue working on this. There is more work to be done.

  • fix compile errors on all fd-supported platforms
    • i686-pc-windows-msvc
    • i686-unknown-linux-gnu
    • i686-unknown-linux-musl
    • x86_64-apple-darwin
    • x86_64-pc-windows-gnu
    • x86_64-pc-windows-msvc
    • x86_64-unknown-linux-gnu
    • x86_64-unknown-linux-musl
  • fix unit test on all platforms
    • i686-pc-windows-msvc
    • i686-unknown-linux-gnu
    • i686-unknown-linux-musl
    • x86_64-apple-darwin
    • x86_64-pc-windows-gnu
    • x86_64-pc-windows-msvc
    • x86_64-unknown-linux-gnu
    • x86_64-unknown-linux-musl
  • Make sure that the enforced limit is reasonably close* to the actual (experimental) limit
    • i686-pc-windows-msvc (1020 vs 2045)
    • i686-unknown-linux-gnu (test fails)
    • i686-unknown-linux-musl (15,459 vs 349,409)
    • x86_64-apple-darwin (20,585 vs 21,096)
    • x86_64-pc-windows-gnu (679 vs 2045)
    • x86_64-pc-windows-msvc (679 vs 2045)
    • x86_64-unknown-linux-gnu (348,329 vs 348,841)
    • x86_64-unknown-linux-musl (10,294 vs 349,408)

Ref: current CI output: https://github.com/sharkdp/argmax/actions/runs/1469073321

I also found out that there is another limit that needs to be enforced (though it is not relevant for fd): there is a maximum length for a single argument. On Linux, this is 32 * PAGE_SIZE, which is smaller than the maximum command-line length. I added an additional unit test for this.
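
A sketch of that per-argument limit (Linux-specific; assumes the libc crate; the kernel calls this MAX_ARG_STRLEN, i.e. 32 * PAGE_SIZE = 131072 bytes with 4 KiB pages):

/// Maximum length of a single argument (or environment string) on
/// Linux: MAX_ARG_STRLEN = 32 * PAGE_SIZE.
fn max_single_arg_len() -> usize {
    let page_size = unsafe { libc::sysconf(libc::_SC_PAGESIZE) } as usize;
    32 * page_size
}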

Any help would be very much appreciated.

*reasonably close: it shouldn't be an order of magnitude smaller than the actual limit.

@tavianator (Collaborator) commented Nov 16, 2021

For i686-unknown-linux-gnu, I wonder if the issue is that the (presumably) 64-bit kernel and the 32-bit test binary disagree about sizeof(char *)?

Edit: seems like that is what's happening. This workaround fixes the failure for me:

diff --git a/src/unix.rs b/src/unix.rs
index 9f4c419..04c7501 100644
--- a/src/unix.rs
+++ b/src/unix.rs
@@ -48,7 +48,7 @@ fn size_of_environment() -> i64 {
 /// Required size to store a single ARG argument and the corresponding
 /// pointer in argv**.
 pub(crate) fn arg_size<O: AsRef<OsStr>>(arg: O) -> i64 {
-    size_of::<*const c_char>() as i64 // size for the pointer in argv**
+    size_of::<u64>() as i64 // size for the pointer in argv**
       + arg.as_ref().len() as i64     // size for argument string
       + 1 // terminating NULL
 }

I don't know what the best fix is. To be perfectly accurate we'd have to know whether the binary we're executing is a 32- or 64-bit binary. Maybe it's okay to conservatively assume that pointers are always 8 bytes.

@sharkdp (Owner) commented Jan 23, 2022

I released a first version of the argmax crate with a pretty minimalistic API. But it might be enough to test the integration into fd if someone wants to work on this.

@sambacha

Having this issue on macOS 10.15 as well.

@tavianator self-assigned this May 19, 2022
@tavianator (Collaborator)

I am working on this, but the details are pretty awkward. Since fd supports things like

$ fd -X echo before {} after

I have to support adding arguments after the paths. But that's pretty awkward with try_arg(), since if try_arg("after") fails, I have to somehow back up and remove a path or something.

What I think we want is something like

if cmd.has_room_for([arg, "after", ...]) {
    assert!(cmd.try_arg(arg));
} else {
    assert!(cmd.try_args(["after", ...]));
    cmd.spawn();
    cmd = Command::new(...);
}
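
One way to avoid the back-up problem (a sketch with a hypothetical cost model matching the accounting discussed earlier, not fd's implementation): reserve the cost of the fixed prefix and suffix up front, so a batch can always be flushed with the trailing arguments still guaranteed to fit.

use std::path::PathBuf;
use std::process::Command;

// Hypothetical cost model: string bytes + NUL + argv pointer slot.
fn cost(s: &std::ffi::OsStr) -> usize {
    s.len() + 1 + std::mem::size_of::<*const u8>()
}

fn run_batches(
    program: &str,
    before: &[&str],
    paths: &[PathBuf],
    after: &[&str],
    budget: usize,
) -> std::io::Result<()> {
    // Cost of everything that appears in every invocation.
    let fixed: usize = cost(program.as_ref())
        + before.iter().map(|s| cost(s.as_ref())).sum::<usize>()
        + after.iter().map(|s| cost(s.as_ref())).sum::<usize>();

    let mut batch: Vec<&PathBuf> = Vec::new();
    let mut used = fixed;
    for path in paths {
        let c = cost(path.as_ref());
        if used + c > budget && !batch.is_empty() {
            // The suffix always fits because its cost was reserved up front.
            run(program, before, &batch, after)?;
            batch.clear();
            used = fixed;
        }
        // Note: a single path that exceeds the budget on its own is not
        // handled here; a real implementation would report an error.
        used += c;
        batch.push(path);
    }
    if !batch.is_empty() {
        run(program, before, &batch, after)?;
    }
    Ok(())
}

fn run(program: &str, before: &[&str], paths: &[&PathBuf], after: &[&str]) -> std::io::Result<()> {
    Command::new(program).args(before).args(paths).args(after).status()?;
    Ok(())
}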

@sharkdp (Owner) commented May 19, 2022

Right. I was afraid that try_arg would not be the most convenient API we could come up with. argmax certainly needs some polishing, and I am happy to integrate a more user-friendly interface or some convenience methods.
