Command-line arguments are cloned a lot on Unix #47164

Open
mbrubeck opened this issue Jan 3, 2018 · 5 comments
Labels
C-enhancement (Category: An issue proposing an enhancement or a PR with one.)
I-slow (Issue: Problems and improvements with respect to performance of generated code.)
O-linux (Operating system: Linux)
T-libs (Relevant to the library team, which will review and decide on the PR/issue.)

Comments

@mbrubeck
Contributor

mbrubeck commented Jan 3, 2018

The std::sys::unix::args module does a lot of allocation and cloning of command-line parameters:

  1. On startup, std::sys::unix::args::init copies all of the command-line arguments into a Box<Vec<Vec<u8>>> (except on macOS and iOS).
  2. When std::env::args or args_os is called, it eagerly copies all of the args into a new Vec<OsString>.

On non-Apple systems, this means there is at least one allocation and clone per argument (plus 2 additional allocations, for the outer Vec and Box) even if they are never accessed. These extra allocations take up space on the heap for the duration of the program.

On both Apple and non-Apple systems, accessing any args causes at least one additional allocation and clone of every arg. Calling std::env::args more than once causes all arguments to be cloned again, even if the caller doesn't iterate through all of them.

On Windows, for comparison, each arg is cloned lazily only when it is yielded from the iterator, so there are zero allocations or clones for args that are never accessed (update: at least, no clones in Rust code; see comments below).
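
To make the difference between eager and lazy cloning concrete, here is a minimal sketch. It is not the standard library's code: `ArgStore`, `eager`, `lazy`, and the clone counter are hypothetical names invented for illustration, and the arguments are modelled as static strings rather than real OS data.

```rust
use std::cell::Cell;
use std::ffi::OsString;

/// Hypothetical stand-in for argument data owned by the runtime.
struct ArgStore {
    raw: &'static [&'static str],
    clones: Cell<usize>, // counts per-argument copies, for illustration only
}

impl ArgStore {
    fn new(raw: &'static [&'static str]) -> Self {
        ArgStore { raw, clones: Cell::new(0) }
    }

    /// Copies one argument into a fresh heap allocation.
    fn clone_arg(&self, i: usize) -> OsString {
        self.clones.set(self.clones.get() + 1);
        OsString::from(self.raw[i])
    }

    /// Number of per-argument copies made so far.
    fn clone_count(&self) -> usize {
        self.clones.get()
    }

    /// Eager: every argument is copied before the iterator is returned.
    fn eager(&self) -> std::vec::IntoIter<OsString> {
        let all: Vec<OsString> = (0..self.raw.len()).map(|i| self.clone_arg(i)).collect();
        all.into_iter()
    }

    /// Lazy: an argument is copied only when it is yielded.
    fn lazy(&self) -> impl Iterator<Item = OsString> + '_ {
        (0..self.raw.len()).map(move |i| self.clone_arg(i))
    }
}
```

With three stored arguments, taking only the first item from `lazy()` performs one copy, while taking only the first item from `eager()` performs three.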

@steveklabnik
Member

Incidentally, I was just reading http://andrewkelley.me/post/zig-december-2017-in-review.html which says

> It's really a shame that Windows command line parsing requires you to allocate memory. This means that to have a cross-platform API for command line arguments, even though in POSIX it can never fail, we have to handle the possibility because of Windows.

and I was wondering about our code regarding this.

@mbrubeck
Contributor Author

mbrubeck commented Jan 3, 2018

Some ideas on reducing each source of allocation/cloning (on startup, and on iterator construction):

  1. Copying on startup could be avoided by storing the argc and argv values that init receives from the OS, instead of cloning their contents. This could change behavior for programs that use unsafe platform-specific code to access these values directly and mutate them, then later call std::env::args. But such programs already behave inconsistently between different platforms (e.g. macOS versus Linux).

  2. Eager copying when constructing the Args iterator could be replaced by lazy cloning during iteration, as we already do on Windows. This requires that the data it clones from is guaranteed to last for the lifetime of the iterator (which has a 'static type, so it could live as long as the program). This should be safe for data that is created in init and destroyed in cleanup, as in the current non-Apple Unix implementation, since cleanup runs after catch_unwind(main). Data owned by the OS can likewise be affected by unsafe platform-specific code that mutates it directly, but again I argue that such programs already have poorly specified behavior.
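
The two ideas above can be combined in a rough sketch: keep only the `argc`/`argv` values and copy each argument when it is yielded. This is an assumption-laden illustration rather than the actual change; `LazyArgs` is a hypothetical type, and it yields raw bytes instead of `OsString` so the sketch does not depend on Unix-only extension traits.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical lazy iterator over a C-style `argv` array.
/// It stores only the pointer and count; nothing is copied on construction.
struct LazyArgs {
    next: usize,
    argc: usize,
    argv: *const *const c_char,
}

impl LazyArgs {
    /// # Safety
    /// `argv` must point to `argc` valid NUL-terminated strings that stay
    /// alive and unmodified for as long as the iterator is used.
    unsafe fn new(argc: usize, argv: *const *const c_char) -> Self {
        LazyArgs { next: 0, argc, argv }
    }
}

impl Iterator for LazyArgs {
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Vec<u8>> {
        if self.next >= self.argc {
            return None;
        }
        // SAFETY: guaranteed by the contract of `LazyArgs::new`.
        let bytes = unsafe {
            let ptr = *self.argv.add(self.next);
            // The only allocation and copy for this argument happens here.
            CStr::from_ptr(ptr).to_bytes().to_vec()
        };
        self.next += 1;
        Some(bytes)
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        let remaining = self.argc - self.next;
        (remaining, Some(remaining))
    }
}
```

The safety contract on `new` is where the lifetime concern from item 2 shows up: the iterator is only sound if the pointed-to data outlives every use of it.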

@mbrubeck
Contributor Author

mbrubeck commented Jan 3, 2018

> Incidentally, I was just reading http://andrewkelley.me/post/zig-december-2017-in-review.html which says
>
> > It's really a shame that Windows command line parsing requires you to allocate memory.

Ah, yes. On Windows the Args iterator constructor calls CommandLineToArgvW which allocates an array of pointers and a single UTF-16 buffer to hold a copy of the args. So while it doesn't allocate and clone each arg individually, it does do 1 or 2 allocations, and copies the whole command line as UTF-16.

We definitely can't get to zero copies on Windows, because we at least need to do UTF-16 to UTF-8 conversion.
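
A small sketch of why the conversion forces a copy: the UTF-16 and UTF-8 encodings of the same argument generally have different lengths, so the result needs its own buffer. The helper name `utf16_arg_to_utf8` is hypothetical, and `String::from_utf16_lossy` is used only to keep the example portable; on Windows, `OsString` values are built with `std::os::windows::ffi::OsStringExt::from_wide`, which also preserves unpaired surrogates.

```rust
/// Hypothetical helper: converts one UTF-16 argument into an owned UTF-8 `String`.
/// The input buffer cannot be reused, so a new allocation is always made
/// for a non-empty argument.
fn utf16_arg_to_utf8(wide: &[u16]) -> String {
    // Invalid sequences are replaced with U+FFFD in this sketch.
    String::from_utf16_lossy(wide)
}

/// Returns (UTF-16 code units, UTF-8 bytes) for the same text.
fn encoded_lengths(text: &str) -> (usize, usize) {
    let wide: Vec<u16> = text.encode_utf16().collect();
    let utf8 = utf16_arg_to_utf8(&wide);
    (wide.len(), utf8.len())
}
```

For an argument such as `héllo`, the UTF-16 form has 5 code units while the UTF-8 form has 6 bytes.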

@mbrubeck
Contributor Author

mbrubeck commented Jan 3, 2018

#47165 eliminates the allocations/copies on startup.

@estebank estebank added I-slow Issue: Problems and improvements with respect to performance of generated code. O-linux Operating system: Linux labels Jan 4, 2018
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this issue Jan 6, 2018
[unix] Don't clone command-line args on startup

Fixes part of rust-lang#47164 and simplifies the `args` code on non-Apple Unix platforms.

Note: This could change behavior for programs that use both `std::env::args` *and* unsafe code that mutates `argv` directly.  However, these programs already behave differently on different platforms.  The new behavior on non-Apple platforms is closer to the existing behavior on Apple platforms.
@XAMPPRocky XAMPPRocky added O-macos Operating system: macOS C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Apr 10, 2018
@Enselic Enselic added T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Sep 27, 2023
@madsmtm
Contributor

madsmtm commented Aug 21, 2024

Triage: fixed by #47165

And specifically not related to macOS:
@rustbot label -O-macos

@rustbot rustbot removed the O-macos Operating system: macOS label Aug 21, 2024