-
Notifications
You must be signed in to change notification settings - Fork 167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ld
's response files support for special files
#131
Merged
tpoechtrager
merged 1 commit into
tpoechtrager:master
from
MercuryTechnologies:gabriella/fix_response_files
Feb 2, 2023
Merged
Fix ld
's response files support for special files
#131
tpoechtrager
merged 1 commit into
tpoechtrager:master
from
MercuryTechnologies:gabriella/fix_response_files
Feb 2, 2023
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Currently `ld` uses `mmap` to read in the contents of the response file (presumably for performance reasons), but `mmap` sometimes doesn't work on special files like pipes. For example, if you do something like: ``` $ ld @<(echo --help) ``` … that will currently fail with a segmentation fault. You might wonder why one would want to generate a response file from process substitution. The rationale behind this is that I'm currently in the process of fixing a longstanding issue with the linker failing in Nixpkgs on macOS due to hitting command-line length limits and the fix entails the use of process substitution to generate the process file. Specifically, what I was doing was building upon the work from this PR: NixOS/nixpkgs#112449 … which modified the `cc-wrapper` in Nixpkgs to use a response file generate from process substitution. I was going to do essentially the same for the `ld-wrapper` in Nixpkgs, but that failed with a segmentation fault (for the reasons outlined above). There are other possible ways to work around that, but using process substitution is the "leanest" way of generating the response file for `ld` in Nixpkgs, so I wanted to push on getting that working here instead of working around the problem downstream. So the way I fixed it was to fall back to using `read` instead of `mmap` if the `mmap` failed. After this change, the above sample command now works correctly. This also fixes another small issue along the way: this now correctly detects when the `mmap` fails. Previously, the `mmap` logic was detecting failure by looking for a `NULL`/`0` return value, but that is not the correct error-handling behavior. `mmap` returns `MAP_FAILED` on failure, which is `-1` in practice, and not `0`. That's the reason why the code was failing with a segmentation fault before because it wasn't detecting the failure and proceeding to read from the invalid buffer anyway.
3 tasks
Gabriella439
added a commit
to MercuryTechnologies/nixpkgs
that referenced
this pull request
Feb 3, 2023
The motivation behind this is to alleviate the problem described in NixOS#41340. I'm not sure if this completely fixes the problem, but it eliminates one more area where we can exceed command line length limits. This is essentially the same change as in NixOS#112449, except for `ld-wrapper.sh` instead of `cc-wrapper.sh`. However, that change alone was not enough; on macOS the `ld` provided by `darwin.cctools` fails if you use process substitution to generate the response file, so I put up a PR to fix that: tpoechtrager/cctools-port#131 … and I included a patch referencing that fix so that the new `ld-wrapper` still works on macOS.
Gabriella439
added a commit
to MercuryTechnologies/cctools-port
that referenced
this pull request
Feb 10, 2023
This is a follow-up to: tpoechtrager#131 … which introduced a bug: the `st_size` field returned by does not necessarily accurately represent the length if the file is a special file. For example, on macOS the `st_size` field appears to empirically be no larger than 64 KB for special files, even if the actual input is larger than 64 KB. Consequently, `ld` incorrectly truncates the input arguments to 64 KB on macOS for large response files, which defeats the purpose of response files (since they're typically used to support large command lines). For these special files, we don't have a good way to ascertain the length of the input other than to `read` from the input and see how many bytes we receive. This means that we can't allocate all the necessary memory we require up front and instead we need to dynamically resize the argument array as we read.
Gabriella439
added a commit
to MercuryTechnologies/cctools-port
that referenced
this pull request
Feb 14, 2023
This is a follow-up to: tpoechtrager#131 … which introduced a bug: the `st_size` field returned by does not necessarily accurately represent the length if the file is a special file. For example, on macOS the `st_size` field appears to empirically be no larger than 64 KB for special files, even if the actual input is larger than 64 KB. Consequently, `ld` incorrectly truncates the input arguments to 64 KB on macOS for large response files, which defeats the purpose of response files (since they're typically used to support large command lines). For these special files, we don't have a good way to ascertain the length of the input other than to `read` from the input and see how many bytes we receive. This means that we can't allocate all the necessary memory we require up front and instead we need to dynamically resize the argument array as we read.
Gabriella439
added a commit
to MercuryTechnologies/cctools-port
that referenced
this pull request
Feb 16, 2023
This is a follow-up to: tpoechtrager#131 … which introduced a bug: the `st_size` field does not necessarily accurately represent hte length if the file is a special file. For example, on macOS the `st_size` field appears to empirically be no larger than 64 KB for special files, even if the actual input is larger than 64 KB. As a result, `ld` was truncating the input arguments to 64 KB for large response files, which defeats the purpose of response files (since they're typically used to support large command lines). More generally, this issue isn't specific to `fstat`, but rather appears to be an issue with anything that uses a file descriptor returned by `open`. For example, `read` misbehaves in the exact same way and refuses to read more than 64 KB from a special file that was opened by `fopen`. For these special files, we don't have a good way to ascertain the length of the input because `fstat` won't work (that only works on file descriptors returned by `open`, which misbehave on special files), nor can we seek to the end to determine the length (because special files might not support rewinding the input) so the current solution is to simply read from the input to the end and see how many bytes we receive. This means that we can't allocate all the necessary memory we require up front and instead we need to dynamically resize the argument array as we read.
Gabriella439
added a commit
to MercuryTechnologies/cctools-port
that referenced
this pull request
Feb 21, 2023
This is a follow-up to: tpoechtrager#131 … which introduced a bug: the `st_size` field does not necessarily accurately represent the length if the file is a special file. For example, on macOS the `st_size` field appears to empirically be no larger than 64 KB for special files (presumably the size of some buffer), even if the actual input is larger than 64 KB. As a result, `ld` was truncating the input arguments to 64 KB for large response files, which defeats the purpose of response files (since they're typically used to support large command lines). More generally, this issue isn't specific to `fstat`, but rather appears to be an issue with anything that uses a file descriptor returned by `open`. For example, `read` misbehaves in the exact same way and refuses to read more than 64 KB from a special file that was opened by `open` even if you try to repeatedly `read` from the file to completion. For these special files, we don't have a good way to ascertain the length of the input because `fstat` won't work (that only works on file descriptors returned by `open`, which misbehave on special files), nor can we seek to the end to determine the length (because special files might not support rewinding the input) so the first part of this fix is to simply read from the input to the end and see how many bytes receive. This means that we can't allocate all the necessary memory we require up front and instead we need to dynamically resize the argument array as we read. The second part of this solution is to use `fopen` / `fread` / `close` on the unhappy path when `mmap` fails instead of using `open` / `read` / `close` since the latter operations misbehave on special files.
Gabriella439
added a commit
to MercuryTechnologies/cctools-port
that referenced
this pull request
Feb 21, 2023
This is a follow-up to: tpoechtrager#131 … which introduced a bug: the `st_size` field does not necessarily accurately represent the length if the file is a special file. For example, on macOS the `st_size` field appears to empirically be no larger than 64 KB for special files (presumably the size of some buffer), even if the actual input is larger than 64 KB. As a result, `ld` was truncating the input arguments to 64 KB for large response files, which defeats the purpose of response files (since they're typically used to support large command lines). More generally, this issue isn't specific to `fstat`, but rather appears to be an issue with anything that uses a file descriptor returned by `open`. For example, `read` misbehaves in the exact same way and refuses to read more than 64 KB from a special file that was opened by `open` even if you try to repeatedly `read` from the file to completion. For these special files, we don't have a good way to ascertain the length of the input because `fstat` won't work (that only works on file descriptors returned by `open`, which misbehave on special files), nor can we seek to the end to determine the length (because special files might not support rewinding the input) so the first part of this fix is to simply read from the input to the end and see how many bytes receive. This means that we can't allocate all the necessary memory we require up front and instead we need to dynamically resize the argument array as we read. The second part of this solution is to use `fopen` / `fread` / `close` on the unhappy path when `mmap` fails instead of using `open` / `read` / `close` since the latter operations misbehave on special files.
Gabriella439
added a commit
to MercuryTechnologies/cctools-port
that referenced
this pull request
Feb 22, 2023
This is a follow-up to: tpoechtrager#131 … which introduced a bug: the `st_size` field does not necessarily accurately represent the length if the file is a special file. For example, on macOS the `st_size` field appears to empirically be no larger than 64 KB for special files (presumably the size of some buffer), even if the actual input is larger than 64 KB. As a result, `ld` was truncating the input arguments to 64 KB for large response files, which defeats the purpose of response files (since they're typically used to support large command lines). More generally, this issue isn't specific to `fstat`, but rather appears to be an issue with anything that uses a file descriptor returned by `open`. For example, `read` misbehaves in the exact same way and refuses to read more than 64 KB from a special file that was opened by `open` even if you try to repeatedly `read` from the file to completion. For these special files, we don't have a good way to ascertain the length of the input because `fstat` won't work (that only works on file descriptors returned by `open`, which misbehave on special files), nor can we seek to the end to determine the length (because special files might not support rewinding the input) so the first part of this fix is to simply read from the input to the end and see how many bytes receive. This means that we can't allocate all the necessary memory we require up front and instead we need to dynamically resize the argument array as we read. The second part of this solution is to use `fopen` / `fread` / `close` on the unhappy path when `mmap` fails instead of using `open` / `read` / `close` since the latter operations misbehave on special files.
Gabriella439
added a commit
to MercuryTechnologies/nixpkgs
that referenced
this pull request
Feb 22, 2023
The motivation behind this is to alleviate the problem described in NixOS#41340. I'm not sure if this completely fixes the problem, but it eliminates one more area where we can exceed command line length limits. This is essentially the same change as in NixOS#112449, except for `ld-wrapper.sh` instead of `cc-wrapper.sh`. However, that change alone was not enough; on macOS the `ld` provided by `darwin.cctools` fails if you use process substitution to generate the response file, so I put up two PRs to fix that: tpoechtrager/cctools-port#131 tpoechtrager/cctools-port#132 … and I included a patch referencing that fix so that the new `ld-wrapper` still works on macOS.
Gabriella439
added a commit
to MercuryTechnologies/nixpkgs
that referenced
this pull request
Feb 22, 2023
The motivation behind this is to alleviate the problem described in NixOS#41340. I'm not sure if this completely fixes the problem, but it eliminates one more area where we can exceed command line length limits. This is essentially the same change as in NixOS#112449, except for `ld-wrapper.sh` instead of `cc-wrapper.sh`. However, that change alone was not enough; on macOS the `ld` provided by `darwin.cctools` fails if you use process substitution to generate the response file, so I put up two PRs to fix that: tpoechtrager/cctools-port#131 tpoechtrager/cctools-port#132 … and I included a patch referencing that fix so that the new `ld-wrapper` still works on macOS.
Gabriella439
added a commit
to NixOS/nixpkgs
that referenced
this pull request
Feb 24, 2023
The motivation behind this is to alleviate the problem described in #41340. I'm not sure if this completely fixes the problem, but it eliminates one more area where we can exceed command line length limits. This is essentially the same change as in #112449, except for `ld-wrapper.sh` instead of `cc-wrapper.sh`. However, that change alone was not enough; on macOS the `ld` provided by `darwin.cctools` fails if you use process substitution to generate the response file, so I put up a PR to fix that: tpoechtrager/cctools-port#131 … and I included a patch referencing that fix so that the new `ld-wrapper` still works on macOS.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Currently
ld
usesmmap
to read in the contents of the response file (presumably for performance reasons), butmmap
sometimes doesn't work on special files like pipes. For example, if you do something like:… that will currently fail with a segmentation fault.
You might wonder why one would want to generate a response file from process substitution. The rationale behind this is that I'm currently in the process of fixing a longstanding issue with the linker failing in Nixpkgs on macOS due to hitting command-line length limits and the fix entails the use of process substitution to generate the process file.
Specifically, what I was doing was building upon the work from this PR:
NixOS/nixpkgs#112449
… which modified the
cc-wrapper
in Nixpkgs to use a response file generate from process substitution. I was going to do essentially the same for theld-wrapper
in Nixpkgs, but that failed with a segmentation fault (for the reasons outlined above).There are other possible ways to work around that, but using process substitution is the "leanest" way of generating the response file for
ld
in Nixpkgs, so I wanted to push on getting that working here instead of working around the problem downstream.So the way I fixed it was to fall back to using
read
instead ofmmap
if themmap
failed. After this change, the above sample command now works correctly.This also fixes another small issue along the way: this now correctly detects when the
mmap
fails. Previously, themmap
logic was detecting failure by looking for aNULL
/0
return value, but that is not the correct error-handling behavior.mmap
returnsMAP_FAILED
on failure, which is-1
in practice, and not0
. That's the reason why the code was failing with a segmentation fault before because it wasn't detecting the failure and proceeding to read from the invalid buffer anyway.