-
-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix installing with yarn on Windows #58
Conversation
One of a few ways we could prevent the postinstall.cmd script from exiting out early, when run during "yarn install". --- Background: Yarn makes its own shim "node.cmd" in a temp dir, something like this: "C:\Users\[User]\AppData\Local\Temp\yarn--1677171730971-0.47658380728238603\node.cmd" Which records an absolute path to the node.exe it detected, apparently so that later "node" commands run with the same Node binary. (This temp dir also contains a "yarn.cmd" shim, for what it's worth.) Yarn places this shim dir first on the PATH env var, so if attempting to run "node", this shim "node.cmd" batch script would be the first hit, and would be run preferentially over other "node" on the system. (Note: This issue does not affect npm, as npm does not attempt to create its own custom "node.cmd" shim added to the PATH.) -- Implications: Directly invoking "node" (in this case, yarn's "node.cmd" shim) would introduced undesired exiting behavior. Like any .cmd script, it implicitly calls "exit" when reaching the end of the script. This terminates the entire "cmd.exe" terminal instance (in our case, the one running "postinstall.cmd"), not just the shim script itself. So, in short, we exit out of a higher context than intended, due to the unexpected shim not being called/exited in an ideal way. --- Workarounds: By running node.exe directly, we don't get this batch script-specific exiting behavior, and postinstall.cmd can continue to completion. I chose that option for this commit. The main alternative would be to do "call node" instead of just "node" so that the "node.cmd" script is treated as a subroutine, and its implicit exit only used to exit "node.cmd," while the calling batch script ("postinstall.cmd") would continue on. But why add an extra layer of cmd.exe logic and subroutines into the mix? Just call the "node.exe" binary explicitly, as this commit actually does. (Note: If we decide that Yarn's shim is actually helpful, if using its pinned/path-hardcoded node.exe is helpful, then we can revisit this fix, and do "call node" instead of "node.exe".)
In our "bin\npm.cmd" wrapper script, use "call" for calling into the official npm wrapper script ".\node_modules\.bin\npm.cmd". Helps ensure our scripts complete, with script execution always returning to the calling script (our wrapper, and whatever called our wrapper) once the official "npm.cmd" finishes. Without this change, in certain circumstances, the "cmd.exe" subprocess would exit out entirely at the end of official "npm.cmd", leaving parts of the calling script(s) (our script(s)) un-run.
Prevents certain non-zero exit codes from presenting to Yarn, so that Yarn is happy and doesn't think an error has occurred. Note: The error status code is reported by an `EXIT /b %errorlevel%` command run at the end of "node_modules\.bin\npm.cmd". Error codes I'm seeing include "9009" and "3". Not sure what is causing these non-zero exit codes, but the scripts appear to be working 100% fine for our needs, in any case. --- My current best guess is that these are errors being emitted during C/C++ compilation, but which ultimately don't negatively affect the build, at least not in any meaningful way that I am aware of. ppm works, is the important bit. For reference on the above error codes: Error code 3: > Eventually, the source of the magic number 3 was identified: The C > runtime abort function terminates the process with exit code 3. - https://devblogs.microsoft.com/oldnewthing/20110519-00/?p=10623 Error Code 9009: > Error Code 9009 means error file not found. ... > And it means not finding any file the attempted command may be > involving, thus even when it couldn't find the command itself. > I was using delete instead of del. That would give you a 9009 too. - https://stackoverflow.com/a/22433346
@confused-Techie I think you were wondering why This PR fixes it, and my notes (commit messages) hopefully shed some light on what was going on if you want to know those sorts of answers as well. So I figured I would tag you and let you know about this PR addressing the issue. Best Regards, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As always @DeeDeeG it's a delight reading your PR descriptions. And especially this time was surprised by the wonderful commit messages.
Seriously this deep dive looks like an intense one. Absolutely the definition of a programmer not being someone that types on a keyboard, but being someone that knows exactly what to type on a keyboard.
Impressive work, impressive researching. The final changes here all make sense and I have a hard time imagining any downside to these changes other than ppm
finally working in a dev environment on Windows.
As always it's very appreciated, I'll give this an approve and hope some others can review, otherwise I'd be much more than happy to see these changes merged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Annotating the PR diff with the particular changes I realized we needed.
Hopefully this is a lot easier to read and see what I was talking about, than the commit messages as abstract words.
"%~dp0\..\node_modules\.bin\npm.cmd" %* | ||
call "%~dp0\..\node_modules\.bin\npm.cmd" %* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and the next diff are basically:
"Don't run a script ending in .cmd
directly by name, from your own batch script, or when the script you call into exits, the larger script you're calling it from will also exit [early] (because its host cmd.exe
program will exit)".
Either use call [some_script].cmd
, or find another way, such as doing node.exe
instead of node
--> implicitly node.cmd
(due to the Yarn shim).
node .\script\download-node.js | ||
node.exe .\script\download-node.js |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto to the above comment about "don't run another .cmd
script directly by name from your own .cmd
script, or you'll exit early", same thing.
|
||
exit /b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this one honestly still confuses me the most.
It makes intuitive sense to me that it should work, I just wish the documentation I've read didn't suggest it shouldn't work. But this is the one area where I'm going to stop asking questions, I have used up my capacity to ask further questions about why cmd.exe
and [script_name].cmd
scripts are so weird... As long as this part works, I'll take it. Tested on multiple Windows computers as working.
Hidden behind an expander, because here lies madness (click to expand if you dare):
About exit /b
... the /b
flag is only supposed to mean "exit the current subroutine, not the whole cmd.exe
." That much I can just about accept.
Rant: (Why is this /b
thing not the default though??? Why is exiting the entire cmd.exe
the same lever as exiting a subroutine, and making the distinction is opt-in with the /b
flag? If calling into scripts from other scripts is a thing, why not make the most narrow exit the default? Seems super error-prone... /grumpiness).
Here's the weird part, we're using it for something that I'm not sure it's supposed to do (hiding/resetting a previous error code).
It lets you explicitly exit with a certain error code, such as exit /b 6
for error code 6, a number I just made up. exit /b
without an explicit code is supposed to carry forward the previous error code, if you don't explicitly set one like exit /b [code]
.
Sources:
- https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/exit
- Official docs, doesn't really clarify what happens to a previous/existing error code if you don't specify a new one.
- (The official docs are way too short, IMO, they leave a lot of these finer questions un-addressed.)
- https://ss64.com/nt/exit.html
- Some madman (Simon Sheppard, apparently), who I have to trust implicitly, since manual testing seems to be the main way to verify anything about
cmd.exe
, and this seems (???) to be the product of much manual testing and mad note-taking and scribbling.-
EXIT
without an ExitCode acts the same asgoto:eof
and will not alter the ERRORLEVEL- Also this (https://ss64.com/nt/goto.html), oh no:
-
If the jump is successfully made
%ERRORLEVEL% = unchanged
, typically this will be 0 but if a previous command set an errorlevel, that will be preserved (this is a bug). -
(this is a bug) 🤦
-
- Some madman (Simon Sheppard, apparently), who I have to trust implicitly, since manual testing seems to be the main way to verify anything about
And yet here I am doing exit /b
here to suppress the error level from the npm.cmd
--> node_modules\.bin\npm.cmd
we are calling just above.
As far as I can read in any docs, this should do nothing. And yet I have tested it works with this change and fails without this change over and over.
(I mean "fails" in the sense that the error level is visible to Yarn, and Yarn will exit with a non-zero exit code -- the error_level presented from the batch script -- , which can/would kill a CI run.)
(Other than in CI, the error messages from the npm rebuild
would just be a bit of a nuisance, since the C/C++ code seems to have built just fine??? I dunno how to verify all that built C/C++ code, honestly, but all tests are passing, right??? But failed CI runs is a bit of an issue, so I still think "hiding" the errors is better if they aren't actual problems. Like, maybe the makefile
or equivalent for these C/C++ addons tries some commands that might be missing and does fallback behavior that works instead, but the original "command not found" error status persists long enough to trip up Yarn without this change??? I have to guess so hard at this, since it's increasingly deep down the rabbit hole just to fix a package manager issue.)
This, and having to scratch my head a lot to infer/lucky guess about the Yarn shims existing, were a large part of my descent into madness on this one (on this PR). I'm not even sure how/why npm avoids some of these issues. And at this point, I don't want to know. I manually tested it's working in both npm and yarn, like dozens of times, that's good enough for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading this really just makes me say, lets make all scripts JavaScript ones, with #/env/node
at the top, and require everyone to have node already. That's just a lot of ridiculousness
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have loosely aspired to replacing these shell and batch scripts with pure JS for some time. I assume it would've been harder ~9 years ago when apm was first written, but NodeJS has kind of had everything and the kitchen sink built in by now. Indeed child_process.spawn()
and the like can set env vars and call out to CLI commands, generally replacing the main point of shell/batch scripts.
If we can do JS with the same or better reliability for ppm in-use, then it'll be worth it. Otherwise, if the shell/batch scripts run more reliably, then perhaps the insanity these scripts bring during maintenance can be a maintainer-only concern, and a rare one (just don't touch them that often)... Not that I love the thought of that. We'll see...
(More cmd weirdness: You might be intrigued to hear that adding exit /b
at the bottom of our bin\npm.cmd
, closer to where the errors I was trying to suppress came from, seemed to do nothing. Whereas doing exit /b
one further layer out, at the bottom of postinstall.cmd
, did the trick. 🤷.)
In short, this fixes
yarn install
for this repo on Windows(Without breaking
npm install
for this repo, either.)This PR fixes the scenario where you want to install this repo's dependencies with Yarn on Windows. (Should allow
yarn build:apm
to actually work on Windows over at the Pulsar core repo!) It does so by working around some differences with how Yarn interacts with Windows' cmd.exe command line interpreter and batch scripts, compared to how npm interacts with cmd.exe and batch scripts.For technical explanation of each change, see the commit messages. Each change is no more than one line (and no more than 6 characters, at that!), other than whitespace. So do consider looking at the diff first, then referring to the commit messages if/only if you have further questions. 100% of the changes are found in two .cmd batch script files.
That's the technical info, the rest of this PR body is commentary/not much more than a rant, but it gives context. Other than the footnote about npm major versions, which contains technical information I learned more about while preparing this PR, but which is a tangent and arguably off-topic. Feel free to skip reading the rest of this PR body if you like, since it shouldn't be technically relevant to reviewing the changes in this PR.
Background, effort, and why's cmd.exe gotta be this way?
This was surprisingly involved, as I wanted to know what I was doing with these fixes, but IMO cmd.exe's documentation is loosely worded, sparsely provided, scattered to the many corners of the internet, describes a generally janky/weird command execution environment / scripting language, and feels like a bunch of loosely cobbled together conventions accreted into a de-facto "don't touch it, it'll break someone's scripts" ball of stuff more than it feels like a (well or reasonably or even "at all") designed scripting and command execution environment.
CMD.exe's behavior feels more like happenstance, magic and luck (or misfortune, depending on your perspective) than a well-thought out scripting environment. IMO.
I'm not a fan.
Idea: Maybe we move everything to JS scripts where possible??? But more than a little bit out of scope for this PR.
What did I do about it in this PR?
Anyway, this has not only a few tweaks to the .cmd scripts, but a detailed explanation of the rationale of why I made these tweaks. (See the commit message for each effectively one-line change for the detailed explanation).
I don't like changing small, obscure things because "eh, works for me on my machine", I want to do it because it's the right solution. Otherwise, the fix always feels like a temporary band-aid, and I spend all my time waiting for it to spring a leak or turn out to be woefully problematic on somebody's machine or with somebody's use-case. I want it done once, and right. Doing it the "works on my machine, don't think too hard about why" way always feels like how we got here, not like a way out from here.
So.
Here it is. This makes the postinstall scripts behave under
yarn install
andnpm install
* alike.* Tested and working with npm 6.x only.
Compatibility note for npm major versions (kind of a tangent/off-topic, click to expand if you want):
If/when we migrate to newer npm, we should do so once and never look back, since the major versions do not forward-and-backward interop well with each-other, though one major npm version range is mostly compatible with itself. Lockfile handling is different enough between the different major npm versions to be a major headache for this repo.
Particularly, the
jasmine-node
github URL gets saved in package-lock.json in one way with npm 8 or 9, that eventually breaks npm 6. npm 6 tries to reformat this github URL to suit its own way of doing things, but after it modifies and saves package-lock.json, npm 6 cannot handle what it wrote out, and it will error if you try tonpm install
with nonode_modules
folder populated (a fresh install).Either we need to do some R&D on how to get that to stop happening, or interop between npm 6 and npm 8 or 9 will be a real nuisance, enough of a hassle to want to strongly avoid IMO. So we should stay on npm 6 for now, IMO, until we can make the jump to npm 8 or 9, and then not go back to npm 6.
npm 6.x is kind of sort of going end-of-life at the end of April 2023, when NodeJS 14 stops being supported, (https://github.com/nodejs/Release/#release-schedule), so we should look into this soon, IMO. (npm 6 is already sort of not supported, but it gets attention due to shipping with NodeJS LTS 14, support for which ends May 2023.) But upgrading npm is way out of scope for this PR, again.