
Fix ld's response files support for special files #131

Conversation

Gabriella439
Contributor


Currently `ld` uses `mmap` to read in the contents of the
response file (presumably for performance reasons), but `mmap`
sometimes doesn't work on special files like pipes.  For example,
if you do something like:

```
$ ld @<(echo --help)
```

… that will currently fail with a segmentation fault.

You might wonder why one would want to generate a response file
from process substitution.  The rationale behind this is that
I'm currently in the process of fixing a longstanding issue
with the linker failing in Nixpkgs on macOS due to hitting
command-line length limits, and the fix entails the use of
process substitution to generate the response file.

Specifically, what I was doing was building upon the work from
this PR:

NixOS/nixpkgs#112449

… which modified the `cc-wrapper` in Nixpkgs to use a
response file generated from process substitution.  I was
going to do essentially the same for the `ld-wrapper` in
Nixpkgs, but that failed with a segmentation fault (for the
reasons outlined above).

There are other possible ways to work around that, but using
process substitution is the "leanest" way of generating the
response file for `ld` in Nixpkgs, so I wanted to push on
getting that working here instead of working around the problem
downstream.

So the way I fixed it was to fall back to using `read` instead
of `mmap` if the `mmap` failed.  After this change, the above
sample command now works correctly.

This also fixes another small issue along the way: this now
correctly detects when the `mmap` fails.  Previously, the `mmap`
logic was detecting failure by looking for a `NULL`/`0` return
value, but that is not the correct error-handling behavior.
`mmap` returns `MAP_FAILED` on failure, which is `-1` in practice,
and not `0`.  That's why the code was previously failing with
a segmentation fault: it wasn't detecting the failure, and it
proceeded to read from the invalid buffer anyway.
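The fix described above can be sketched as follows.  This is an
illustrative, self-contained approximation, not the actual cctools
code: the function name `slurp_response_file` and the buffer-growth
details are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read a response file, preferring mmap but falling back to read(2)
 * when mmap fails (e.g. for pipes created by process substitution). */
static char *slurp_response_file(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return NULL;
    }

    char *buf = NULL;
    size_t size = (size_t)st.st_size;

    /* The correct failure check is MAP_FAILED, not NULL/0. */
    void *mapped = mmap(NULL, size ? size : 1, PROT_READ, MAP_PRIVATE, fd, 0);
    if (mapped != MAP_FAILED) {
        buf = malloc(size + 1);
        if (buf) {
            memcpy(buf, mapped, size);
            buf[size] = '\0';
            *len_out = size;
        }
        munmap(mapped, size ? size : 1);
    } else {
        /* Fallback: read(2) works on pipes and other special files. */
        size_t cap = 4096, len = 0;
        buf = malloc(cap + 1);
        ssize_t n;
        while (buf && (n = read(fd, buf + len, cap - len)) > 0) {
            len += (size_t)n;
            if (len == cap) {           /* grow the buffer as needed */
                char *tmp = realloc(buf, (cap *= 2) + 1);
                if (!tmp)
                    free(buf);
                buf = tmp;
            }
        }
        if (buf) {
            buf[len] = '\0';
            *len_out = len;
        }
    }
    close(fd);
    return buf;
}
```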
tpoechtrager merged commit fac6a28 into tpoechtrager:master Feb 2, 2023
Gabriella439 added a commit to MercuryTechnologies/nixpkgs that referenced this pull request Feb 3, 2023
The motivation behind this is to alleviate the problem
described in NixOS#41340.
I'm not sure if this completely fixes the problem, but it
eliminates one more area where we can exceed command line
length limits.

This is essentially the same change as in NixOS#112449,
except for `ld-wrapper.sh` instead of `cc-wrapper.sh`.

However, that change alone was not enough; on macOS the
`ld` provided by `darwin.cctools` fails if you use process
substitution to generate the response file, so I put up a
PR to fix that:

tpoechtrager/cctools-port#131

… and I included a patch referencing that fix so that the
new `ld-wrapper` still works on macOS.
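The wrapper technique referenced above (passing a long argument list
through a response file generated by process substitution) can be
sketched like this.  It requires bash; `args_via_response_file` is a
hypothetical stand-in for `ld`, used only to show what `@<(...)`
expands to.

```shell
#!/usr/bin/env bash
set -euo pipefail

args_via_response_file() {
  # The argument arrives as "@/dev/fd/N"; strip the leading "@" and
  # read the arguments back out of the pipe, as ld would.
  local path="${1#@}"
  cat "$path"
}

# Each argument ends up on its own line of the response file.
args_via_response_file @<(printf '%s\n' -o out.o foo.o)
```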
Gabriella439 added a commit to MercuryTechnologies/cctools-port that referenced this pull request Feb 10, 2023
This is a follow-up to:

tpoechtrager#131

… which introduced a bug: the `st_size` field returned by
`fstat` does not necessarily accurately represent the length
if the
file is a special file.  For example, on macOS the `st_size`
field appears to empirically be no larger than 64 KB for
special files, even if the actual input is larger than 64
KB.  Consequently, `ld` incorrectly truncates the input
arguments to 64 KB on macOS for large response files, which
defeats the purpose of response files (since they're
typically used to support large command lines).

For these special files, we don't have a good way to
ascertain the length of the input other than to `read` from
the input and see how many bytes we receive.  This means
that we can't allocate all the necessary memory we require
up front and instead we need to dynamically resize the
argument array as we read.
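The dynamic resizing described above might look roughly like the
following.  `split_args` is a hypothetical helper, and real
response-file parsing also handles quoting, which is omitted here
for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Split a response-file buffer into a dynamically grown argv-style
 * array.  Tokens point into `buf`, which is modified in place. */
static char **split_args(char *buf, int *argc_out)
{
    size_t cap = 8, n = 0;
    char **argv = malloc(cap * sizeof *argv);
    if (!argv)
        return NULL;

    for (char *tok = strtok(buf, " \t\r\n"); tok;
         tok = strtok(NULL, " \t\r\n")) {
        if (n + 1 == cap) {              /* grow the array as needed */
            cap *= 2;
            char **tmp = realloc(argv, cap * sizeof *argv);
            if (!tmp) { free(argv); return NULL; }
            argv = tmp;
        }
        argv[n++] = tok;
    }
    argv[n] = NULL;                      /* NULL-terminate like argv */
    *argc_out = (int)n;
    return argv;
}
```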
Gabriella439 added a commit to MercuryTechnologies/cctools-port that referenced this pull request Feb 14, 2023
Gabriella439 added a commit to MercuryTechnologies/cctools-port that referenced this pull request Feb 16, 2023
This is a follow-up to:

tpoechtrager#131

… which introduced a bug: the `st_size` field does not necessarily
accurately represent the length if the file is a special file.  For
example, on macOS the `st_size` field appears to empirically be no
larger than 64 KB for special files, even if the actual input is larger
than 64 KB.  As a result, `ld` was truncating the input arguments to
64 KB for large response files, which defeats the purpose of response
files (since they're typically used to support large command lines).

More generally, this issue isn't specific to `fstat`, but rather
appears to be an issue with anything that uses a file descriptor
returned by `open`.  For example, `read` misbehaves in the exact same
way and refuses to read more than 64 KB from a special file that was
opened by `open`.

For these special files, we don't have a good way to ascertain the
length of the input because `fstat` won't work (that only works on
file descriptors returned by `open`, which misbehave on special files),
nor can we seek to the end to determine the length (because special
files might not support rewinding the input) so the current solution is
to simply read from the input to the end and see how many bytes we
receive.  This means that we can't allocate all the necessary memory we
require up front and instead we need to dynamically resize the argument
array as we read.
Gabriella439 added a commit to MercuryTechnologies/cctools-port that referenced this pull request Feb 21, 2023
This is a follow-up to:

tpoechtrager#131

… which introduced a bug: the `st_size` field does not necessarily
accurately represent the length if the file is a special file.  For
example, on macOS the `st_size` field appears to empirically be no
larger than 64 KB for special files (presumably the size of some
buffer), even if the actual input is larger than 64 KB.  As a result,
`ld` was truncating the input arguments to 64 KB for large response
files, which defeats the purpose of response files (since they're
typically used to support large command lines).

More generally, this issue isn't specific to `fstat`, but rather
appears to be an issue with anything that uses a file descriptor
returned by `open`.  For example, `read` misbehaves in the exact same
way and refuses to read more than 64 KB from a special file that was
opened by `open` even if you try to repeatedly `read` from the file to
completion.

For these special files, we don't have a good way to ascertain the
length of the input because `fstat` won't work (that only works on
file descriptors returned by `open`, which misbehave on special files),
nor can we seek to the end to determine the length (because special
files might not support rewinding the input) so the first part of this
fix is to simply read from the input to the end and see how many
bytes we receive.  This means that we can't allocate all the
necessary memory we
require up front and instead we need to dynamically resize the argument
array as we read.

The second part of this solution is to use `fopen` / `fread` / `fclose`
on the unhappy path when `mmap` fails instead of using `open` / `read`
/ `close` since the latter operations misbehave on special files.
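A minimal sketch of the read-to-the-end approach, using `fopen` /
`fread` / `fclose` and not trusting `st_size`.  The helper name
`fread_all` is hypothetical; the 64 KB starting size mirrors the
buffer limit observed above but is otherwise arbitrary.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read an entire stream without trusting st_size, growing the
 * buffer until EOF and counting the bytes actually received. */
static char *fread_all(const char *path, size_t *len_out)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return NULL;

    size_t cap = 65536, len = 0;   /* start at 64 KB, grow past it */
    char *buf = malloc(cap);
    while (buf) {
        size_t n = fread(buf + len, 1, cap - len, f);
        len += n;
        if (len < cap)
            break;                 /* short read => EOF (or error) */
        cap *= 2;
        char *tmp = realloc(buf, cap);
        if (!tmp) { free(buf); buf = NULL; break; }
        buf = tmp;
    }
    fclose(f);
    if (!buf)
        return NULL;
    buf[len] = '\0';               /* len < cap, so this is in bounds */
    *len_out = len;
    return buf;
}
```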
Gabriella439 added a commit to MercuryTechnologies/cctools-port that referenced this pull request Feb 21, 2023
Gabriella439 added a commit to MercuryTechnologies/cctools-port that referenced this pull request Feb 22, 2023
Gabriella439 added a commit to MercuryTechnologies/nixpkgs that referenced this pull request Feb 22, 2023
The motivation behind this is to alleviate the problem
described in NixOS#41340.
I'm not sure if this completely fixes the problem, but it
eliminates one more area where we can exceed command line
length limits.

This is essentially the same change as in NixOS#112449,
except for `ld-wrapper.sh` instead of `cc-wrapper.sh`.

However, that change alone was not enough; on macOS the
`ld` provided by `darwin.cctools` fails if you use process
substitution to generate the response file, so I put up two
PRs to fix that:

tpoechtrager/cctools-port#131
tpoechtrager/cctools-port#132

… and I included a patch referencing that fix so that the
new `ld-wrapper` still works on macOS.
Gabriella439 added a commit to MercuryTechnologies/nixpkgs that referenced this pull request Feb 22, 2023
Gabriella439 added a commit to NixOS/nixpkgs that referenced this pull request Feb 24, 2023
The motivation behind this is to alleviate the problem
described in #41340.
I'm not sure if this completely fixes the problem, but it
eliminates one more area where we can exceed command line
length limits.

This is essentially the same change as in #112449,
except for `ld-wrapper.sh` instead of `cc-wrapper.sh`.

However, that change alone was not enough; on macOS the
`ld` provided by `darwin.cctools` fails if you use process
substitution to generate the response file, so I put up a
PR to fix that:

tpoechtrager/cctools-port#131

… and I included a patch referencing that fix so that the
new `ld-wrapper` still works on macOS.