Unpaired surrogate in any env var stalls libz-ng-sys build on Windows #215
Comments
Thanks a lot for reporting and the detailed write-up! If it were a Rust program we'd be lucky, as it should be possible to find where it iterates through all environment variables. Something I could specifically imagine is that a program traverses UTF-8 variables in a thread while holding a lock, then panics, and the lock gets poisoned or it otherwise fails to release a resource, making the rest of the program stall because it can't obtain the shared resource anymore.
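To make the hypothesis above concrete, here is a minimal Rust sketch of a worker thread that panics on ill-formed input while holding a lock. It is only an illustration: the function name `lock_poisoned_after_decoding` is made up for this example, and nothing here is taken from the actual build tooling. Note also that a poisoned `std::sync::Mutex` reports an error to later callers rather than blocking them, so this shows how a resource can be left in a bad state, not the stall itself.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Decodes UTF-16 code units in a worker thread while holding a lock,
/// panicking on ill-formed input. Returns `true` if the lock was poisoned.
/// (Hypothetical helper, for illustration only.)
fn lock_poisoned_after_decoding(units: Vec<u16>) -> bool {
    let shared = Arc::new(Mutex::new(Vec::<String>::new()));
    let worker_copy = Arc::clone(&shared);

    let worker = thread::spawn(move || {
        let mut decoded = worker_copy.lock().unwrap();
        // An unpaired surrogate makes this panic while the guard is alive,
        // which poisons the mutex as the guard is dropped during unwinding.
        let text = String::from_utf16(&units).expect("ill-formed UTF-16");
        decoded.push(text);
    });

    // join() returns Err if the worker panicked; that is expected here.
    let _ = worker.join();

    shared.is_poisoned()
}

fn main() {
    let well_formed: Vec<u16> = "x€y".encode_utf16().collect();
    let ill_formed: Vec<u16> = vec![0x78, 0xD800, 0x79]; // x, lone surrogate, y

    println!(
        "well-formed value poisons the lock: {}",
        lock_poisoned_after_decoding(well_formed)
    );
    println!(
        "unpaired surrogate poisons the lock: {}",
        lock_poisoned_after_decoding(ill_formed)
    );
}
```

Any code that later calls `lock().unwrap()` on such a mutex would panic in turn, and code that waits on work the panicked thread never finished could wait forever.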
Yes, I should be able to find and add more information, including that. I had reported this with the readily available information so I could refer to it from other issues and PRs, which turned out just to be GitoxideLabs/gitoxide#1574 and GitoxideLabs/gitoxide#1580, but I can find more information. In addition to looking at what's running, I should also be able to check if it happens with other non-MSVC Windows targets on Windows, such as |
As possible support for this hypothesis, I've just noticed that the problem only happens in parallel builds. Passing FOO=$'x\uD800y' ./cargo-zng build -j1 Edit: It looks like the affected processes may be
Killing either the parent or child |
This is very interesting! Now a workaround exists, and maybe that can be useful. Unfortunately the process explorer seems to only sample the current process tree, so it might miss invocations or at least won't have the exact timing of who spawns what, when. If that was possible, maybe it would be clearer which two processes have to be present for the hang to occur. In the tree above it doesn't seem that there are any multiple/parallel processes, but maybe I am not seeing it right and MSBuild is not really waiting on the child MSBuild, and thus is running concurrently. Or maybe… somehow the parent MSBuild attempts to copy the environment for its child, and kind of succeeds enough to spawn the child, but then bails out early without properly handling it so the whole process tree seems to hang. Strange that the children don't terminate if that's the case, but who knows how the tree communicates internally. Sorry for the rambling above. Maybe it's possible to examine if Alternatively, as a fix, one could probably sanitize the environment in the build-script to clean |
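As a rough sketch of the sanitizing idea mentioned above, the following Rust code drops every variable whose name or value is not valid Unicode before handing the environment to a child process. All names here (`unicode_only`, `ill_formed_value`, `sanitized_command`, and the program name `some-build-tool`) are hypothetical and only serve the illustration; this is not code from the build script.

```rust
use std::ffi::OsString;
use std::process::Command;

/// Keeps only the variables whose name and value are both valid Unicode.
fn unicode_only<I>(vars: I) -> Vec<(OsString, OsString)>
where
    I: IntoIterator<Item = (OsString, OsString)>,
{
    vars.into_iter()
        .filter(|(name, value)| name.to_str().is_some() && value.to_str().is_some())
        .collect()
}

/// Demonstration value that is not valid Unicode:
/// an unpaired surrogate on Windows, an invalid UTF-8 byte on Unix.
#[cfg(windows)]
fn ill_formed_value() -> OsString {
    use std::os::windows::ffi::OsStringExt;
    OsString::from_wide(&[0x78, 0xD800, 0x79])
}

#[cfg(unix)]
fn ill_formed_value() -> OsString {
    use std::os::unix::ffi::OsStringExt;
    OsString::from_vec(vec![0x78, 0xFF, 0x79])
}

/// Prepares (but does not run) a child command with a cleaned environment.
fn sanitized_command(program: &str) -> Command {
    let mut command = Command::new(program);
    command.env_clear().envs(unicode_only(std::env::vars_os()));
    command
}

fn main() {
    let vars = vec![
        (OsString::from("GOOD"), OsString::from("x€y")),
        (OsString::from("FOO"), ill_formed_value()),
    ];
    let kept = unicode_only(vars);
    println!("kept {} of 2 variables", kept.len());

    // Constructed only to show the call shape; nothing is spawned.
    let _command = sanitized_command("some-build-tool");
}
```

A filter like this silently changes what the child process sees, so a build that legitimately depends on a dropped variable would behave differently.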
That's because a separate Sysinternals tool, Process Monitor, is meant for that. It dumps an enormous number of events if one does not narrow down what one wants and set filters accordingly, though.
I could use Process Monitor to record what is going on, but I am somewhat more interested in attaching a debugger. I didn't manage to get very far with that, though I think I technically did do it, with the tools currently installed on this machine: I was able to attach the Visual Studio debugger, but the information available was minimal and I couldn't do much. But I can install WinDbg (or another version of it) and try again. The other thing that comes to mind is that MSBuild is actually free open source software, so if I can get it to run a version built from source code that is present, I may be able to debug the actual C# code of MSBuild. What I'm unsure of is whether it really has all the same functionality; its readme mentions:
I don't know why that would be, if it is the same.
This could be, but I don't know how it would be, because I've watched the process tree in Process Explorer as the command runs, and the parent seems to have no trouble creating other children earlier in the build (which it seems to wait on successfully).
I will include this among the things I try.
A disadvantage there, even if especially important variables such as |
On Windows, when building `libz-ng-sys`, including when it is built by `cargo` as a dependency of another crate, the build stalls if the value of any environment variable is ill-formed Unicode due to the presence of an unpaired surrogate code point.

This is not related to #212 or any other Windows-specific behavior of the `cargo-zng` script in this repository. It happens even when `libz-ng-sys` is downloaded and built by `cargo` as a dependency of another project (which is unaffected by #212). In particular, I discovered this when investigating GitoxideLabs/gitoxide#1574, due to the `libz-ng-sys` dependency of `gitoxide`.

Working around it in `gitoxide`, to test the behavior of `gix-testtools` (or other `gitoxide` crates) in the presence of environment variables whose values contain unpaired surrogates, requires that I build, set the environment, and then run the tests, in such a way that this crate is built but the fixture scripts are not able to run. This is only slightly cumbersome, but it slows things down a bit and, more importantly, is easy to get wrong. So I think that application illustrates a possible benefit to fixing this. (This is because the machinery for running fixture scripts is what I'm currently investigating in `gitoxide`. In an investigation of another crate's `build.rs` behavior in the presence of such an environment variable, working around this could be even more complex.)

However, while this is not specific to the direct use of `cargo-zng`, it appears only `libz-ng-sys` is affected. Building the code in a clone of this repository with an environment variable set to a value that contains an unpaired surrogate, running `cargo build` still works fine, while `./cargo-zng build` stalls.

My guess is that the bug is in a downstream component or possibly even a build tool, and not in any code in this repository. But I am not entirely sure, nor do I know what component. So at least initially I'm opening this here.
The problem occurs regardless of what environment variable has such a value. It does not need to be a variable that any code here or in a downstream component or build tool plausibly uses. In addition to testing with `MSYS`, I also tested with `FOO` and `BE65CC05_8B18_486B_BED4_4DB214E47441`, the latter of which is trivially derived from the output of a new run of `uuidgen` and not used elsewhere before this test. For example, this is one of the commands that stalled:

`BE65CC05_8B18_486B_BED4_4DB214E47441=$'x\uD800y' ./cargo-zng -v build`
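For readers unfamiliar with why a value like `x\uD800y` is ill-formed: Windows environment variable values are sequences of UTF-16 code units, and `0xD800` is a high surrogate that is only valid when immediately followed by a low surrogate. The following Rust sketch (the function name `first_unpaired_surrogate` is made up for this illustration) locates the first unpaired surrogate in a sequence of code units, and contrasts the problematic value with a properly paired surrogate and with a non-ASCII but well-formed value.

```rust
/// Returns the index (in code units) of the first unpaired surrogate, if any.
fn first_unpaired_surrogate(units: &[u16]) -> Option<usize> {
    let mut index = 0;
    for decoded in char::decode_utf16(units.iter().copied()) {
        match decoded {
            // A decoded char spans one code unit, or two for a surrogate pair.
            Ok(ch) => index += ch.len_utf16(),
            Err(_) => return Some(index),
        }
    }
    None
}

fn main() {
    // x, unpaired high surrogate, y
    let lone: [u16; 3] = [0x78, 0xD800, 0x79];
    // x, U+1F600 encoded as a surrogate pair, y
    let paired: [u16; 4] = [0x78, 0xD83D, 0xDE00, 0x79];
    // Non-ASCII but well-formed
    let euro: Vec<u16> = "x€y".encode_utf16().collect();

    println!("lone:   {:?}", first_unpaired_surrogate(&lone));
    println!("paired: {:?}", first_unpaired_surrogate(&paired));
    println!("euro:   {:?}", first_unpaired_surrogate(&euro));

    // Strict decoding rejects only the sequence with the lone surrogate.
    println!("lone decodes strictly: {}", String::from_utf16(&lone).is_ok());
}
```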
I primarily tested in Git Bash. I verified that I was really setting unpaired surrogates by also trying in PowerShell with the `` ` `` notation, and (more convincingly) by running `prenv`, a program I happen to have written in such a way as to panic when it tries to print all environment variables when any environment variable's value cannot be decoded as UTF-8 (which should probably be considered a bug in `prenv`, so I may fix that and this behavior may thus go away, but is convenient for now and here).

To make sure that it was really invalid Unicode, rather than non-ASCII or the presence of surrogates whether or not they are paired, I also tested with a variable whose value contained `€`. As expected, this did not trigger the bug.

Although this bug is unrelated to #212, my ability to test it in Git Bash on my Windows system using the code here was simplified by being able to run the `cargo-zng` script, so I ran most tests on the `dirsep` branch (for #213).

The best information about where the stall occurs may be from a run with high verbosity. But first, in a low verbosity run:
In a medium verbosity run:
In a high verbosity run:
To limit the volume of this issue, I've put the control runs, where `FOO` is unset and where it is set to the value `x€y`, only in this gist, which also contains the `$'x\uD800y'` run shown above.