
Fix ld's response files support for special files #131

Conversation

Gabriella439
Contributor


Currently `ld` uses `mmap` to read in the contents of the
response file (presumably for performance reasons), but `mmap`
sometimes doesn't work on special files like pipes.  For example,
if you do something like:

```
$ ld @<(echo --help)
```

… that will currently fail with a segmentation fault.

You might wonder why one would want to generate a response file
from process substitution.  The rationale behind this is that
I'm currently in the process of fixing a longstanding issue
with the linker failing in Nixpkgs on macOS due to hitting
command-line length limits, and the fix entails the use of
process substitution to generate the response file.

Specifically, what I was doing was building upon the work from
this PR:

NixOS/nixpkgs#112449

… which modified the `cc-wrapper` in Nixpkgs to use a
response file generated from process substitution.  I was
going to do essentially the same for the `ld-wrapper` in
Nixpkgs, but that failed with a segmentation fault (for the
reasons outlined above).

There are other possible ways to work around that, but using
process substitution is the "leanest" way of generating the
response file for `ld` in Nixpkgs, so I wanted to push on
getting that working here instead of working around the problem
downstream.

So the way I fixed it was to fall back to using `read` instead
of `mmap` if the `mmap` failed.  After this change, the above
sample command now works correctly.

This also fixes another small issue along the way: this now
correctly detects when the `mmap` fails.  Previously, the `mmap`
logic was detecting failure by looking for a `NULL`/`0` return
value, but that is not the correct error-handling behavior.
`mmap` returns `MAP_FAILED` on failure, which is `-1` in practice,
and not `0`.  That's why the code was previously failing with
a segmentation fault: it wasn't detecting the failure, and it
proceeded to read from the invalid buffer anyway.
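The fix described above can be sketched as follows.  This is an
illustrative, self-contained approximation, not the actual cctools
code: the function name `slurp_response_file` and the buffer-growth
details are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read a response file, preferring mmap but falling back to read(2)
 * when mmap fails (e.g. for pipes created by process substitution). */
static char *slurp_response_file(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return NULL;
    }

    char *buf = NULL;
    size_t size = (size_t)st.st_size;

    /* The correct failure check is MAP_FAILED, not NULL/0. */
    void *mapped = mmap(NULL, size ? size : 1, PROT_READ, MAP_PRIVATE, fd, 0);
    if (mapped != MAP_FAILED) {
        buf = malloc(size + 1);
        if (buf) {
            memcpy(buf, mapped, size);
            buf[size] = '\0';
            *len_out = size;
        }
        munmap(mapped, size ? size : 1);
    } else {
        /* Fallback: read(2) works on pipes and other special files. */
        size_t cap = 4096, len = 0;
        buf = malloc(cap + 1);
        ssize_t n;
        while (buf && (n = read(fd, buf + len, cap - len)) > 0) {
            len += (size_t)n;
            if (len == cap) {           /* grow the buffer as needed */
                char *tmp = realloc(buf, (cap *= 2) + 1);
                if (!tmp)
                    free(buf);
                buf = tmp;
            }
        }
        if (buf) {
            buf[len] = '\0';
            *len_out = len;
        }
    }
    close(fd);
    return buf;
}
```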
tpoechtrager merged commit fac6a28 into tpoechtrager:master Feb 2, 2023
Gabriella439 added a commit to MercuryTechnologies/nixpkgs that referenced this pull request Feb 3, 2023
The motivation behind this is to alleviate the problem
described in NixOS#41340.
I'm not sure if this completely fixes the problem, but it
eliminates one more area where we can exceed command line
length limits.

This is essentially the same change as in NixOS#112449,
except for `ld-wrapper.sh` instead of `cc-wrapper.sh`.

However, that change alone was not enough; on macOS the
`ld` provided by `darwin.cctools` fails if you use process
substitution to generate the response file, so I put up a
PR to fix that:

tpoechtrager/cctools-port#131

… and I included a patch referencing that fix so that the
new `ld-wrapper` still works on macOS.
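The wrapper technique referenced above (passing a long argument list
through a response file generated by process substitution) can be
sketched like this.  It requires bash; `args_via_response_file` is a
hypothetical stand-in for `ld`, used only to show what `@<(...)`
expands to.

```shell
#!/usr/bin/env bash
set -euo pipefail

args_via_response_file() {
  # The argument arrives as "@/dev/fd/N"; strip the leading "@" and
  # read the arguments back out of the pipe, as ld would.
  local path="${1#@}"
  cat "$path"
}

# Each argument ends up on its own line of the response file.
args_via_response_file @<(printf '%s\n' -o out.o foo.o)
```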
Gabriella439 added a commit to MercuryTechnologies/cctools-port that referenced this pull request Feb 10, 2023
This is a follow-up to:

tpoechtrager#131

… which introduced a bug: the `st_size` field returned by
`fstat` does not necessarily accurately represent the length
if the
file is a special file.  For example, on macOS the `st_size`
field appears to empirically be no larger than 64 KB for
special files, even if the actual input is larger than 64
KB.  Consequently, `ld` incorrectly truncates the input
arguments to 64 KB on macOS for large response files, which
defeats the purpose of response files (since they're
typically used to support large command lines).

For these special files, we don't have a good way to
ascertain the length of the input other than to `read` from
the input and see how many bytes we receive.  This means
that we can't allocate all the necessary memory we require
up front and instead we need to dynamically resize the
argument array as we read.
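The dynamic resizing described above might look roughly like the
following.  `split_args` is a hypothetical helper, and real
response-file parsing also handles quoting, which is omitted here
for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Split a response-file buffer into a dynamically grown argv-style
 * array.  Tokens point into `buf`, which is modified in place. */
static char **split_args(char *buf, int *argc_out)
{
    size_t cap = 8, n = 0;
    char **argv = malloc(cap * sizeof *argv);
    if (!argv)
        return NULL;

    for (char *tok = strtok(buf, " \t\r\n"); tok;
         tok = strtok(NULL, " \t\r\n")) {
        if (n + 1 == cap) {              /* grow the array as needed */
            cap *= 2;
            char **tmp = realloc(argv, cap * sizeof *argv);
            if (!tmp) { free(argv); return NULL; }
            argv = tmp;
        }
        argv[n++] = tok;
    }
    argv[n] = NULL;                      /* NULL-terminate like argv */
    *argc_out = (int)n;
    return argv;
}
```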
Gabriella439 added a commit to MercuryTechnologies/cctools-port that referenced this pull request Feb 14, 2023
Gabriella439 added a commit to MercuryTechnologies/cctools-port that referenced this pull request Feb 16, 2023
This is a follow-up to:

tpoechtrager#131

… which introduced a bug: the `st_size` field does not necessarily
accurately represent the length if the file is a special file.  For
example, on macOS the `st_size` field appears to empirically be no
larger than 64 KB for special files, even if the actual input is larger
than 64 KB.  As a result, `ld` was truncating the input arguments to
64 KB for large response files, which defeats the purpose of response
files (since they're typically used to support large command lines).

More generally, this issue isn't specific to `fstat`, but rather
appears to be an issue with anything that uses a file descriptor
returned by `open`.  For example, `read` misbehaves in the exact same
way and refuses to read more than 64 KB from a special file that was
opened by `open`.

For these special files, we don't have a good way to ascertain the
length of the input because `fstat` won't work (that only works on
file descriptors returned by `open`, which misbehave on special files),
nor can we seek to the end to determine the length (because special
files might not support rewinding the input) so the current solution is
to simply read from the input to the end and see how many bytes we
receive.  This means that we can't allocate all the necessary memory we
require up front and instead we need to dynamically resize the argument
array as we read.
Gabriella439 added a commit to MercuryTechnologies/cctools-port that referenced this pull request Feb 21, 2023
This is a follow-up to:

tpoechtrager#131

… which introduced a bug: the `st_size` field does not necessarily
accurately represent the length if the file is a special file.  For
example, on macOS the `st_size` field appears to empirically be no
larger than 64 KB for special files (presumably the size of some
buffer), even if the actual input is larger than 64 KB.  As a result,
`ld` was truncating the input arguments to 64 KB for large response
files, which defeats the purpose of response files (since they're
typically used to support large command lines).

More generally, this issue isn't specific to `fstat`, but rather
appears to be an issue with anything that uses a file descriptor
returned by `open`.  For example, `read` misbehaves in the exact same
way and refuses to read more than 64 KB from a special file that was
opened by `open` even if you try to repeatedly `read` from the file to
completion.

For these special files, we don't have a good way to ascertain the
length of the input because `fstat` won't work (that only works on
file descriptors returned by `open`, which misbehave on special files),
nor can we seek to the end to determine the length (because special
files might not support rewinding the input) so the first part of this
fix is to simply read from the input to the end and see how many
bytes we receive.  This means that we can't allocate all the
necessary memory we
require up front and instead we need to dynamically resize the argument
array as we read.

The second part of this solution is to use `fopen` / `fread` / `fclose`
on the unhappy path when `mmap` fails instead of using `open` / `read`
/ `close` since the latter operations misbehave on special files.
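A minimal sketch of the read-to-the-end approach, using `fopen` /
`fread` / `fclose` and not trusting `st_size`.  The helper name
`fread_all` is hypothetical; the 64 KB starting size mirrors the
buffer limit observed above but is otherwise arbitrary.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read an entire stream without trusting st_size, growing the
 * buffer until EOF and counting the bytes actually received. */
static char *fread_all(const char *path, size_t *len_out)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return NULL;

    size_t cap = 65536, len = 0;   /* start at 64 KB, grow past it */
    char *buf = malloc(cap);
    while (buf) {
        size_t n = fread(buf + len, 1, cap - len, f);
        len += n;
        if (len < cap)
            break;                 /* short read => EOF (or error) */
        cap *= 2;
        char *tmp = realloc(buf, cap);
        if (!tmp) { free(buf); buf = NULL; break; }
        buf = tmp;
    }
    fclose(f);
    if (!buf)
        return NULL;
    buf[len] = '\0';               /* len < cap, so this is in bounds */
    *len_out = len;
    return buf;
}
```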
Gabriella439 added a commit to MercuryTechnologies/cctools-port that referenced this pull request Feb 21, 2023
Gabriella439 added a commit to MercuryTechnologies/cctools-port that referenced this pull request Feb 22, 2023
Gabriella439 added a commit to MercuryTechnologies/nixpkgs that referenced this pull request Feb 22, 2023
The motivation behind this is to alleviate the problem
described in NixOS#41340.
I'm not sure if this completely fixes the problem, but it
eliminates one more area where we can exceed command line
length limits.

This is essentially the same change as in NixOS#112449,
except for `ld-wrapper.sh` instead of `cc-wrapper.sh`.

However, that change alone was not enough; on macOS the
`ld` provided by `darwin.cctools` fails if you use process
substitution to generate the response file, so I put up two
PRs to fix that:

tpoechtrager/cctools-port#131
tpoechtrager/cctools-port#132

… and I included a patch referencing that fix so that the
new `ld-wrapper` still works on macOS.
Gabriella439 added a commit to MercuryTechnologies/nixpkgs that referenced this pull request Feb 22, 2023
Gabriella439 added a commit to NixOS/nixpkgs that referenced this pull request Feb 24, 2023
The motivation behind this is to alleviate the problem
described in #41340.
I'm not sure if this completely fixes the problem, but it
eliminates one more area where we can exceed command line
length limits.

This is essentially the same change as in #112449,
except for `ld-wrapper.sh` instead of `cc-wrapper.sh`.

However, that change alone was not enough; on macOS the
`ld` provided by `darwin.cctools` fails if you use process
substitution to generate the response file, so I put up a
PR to fix that:

tpoechtrager/cctools-port#131

… and I included a patch referencing that fix so that the
new `ld-wrapper` still works on macOS.