Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix installing with yarn on Windows #58

Merged
merged 3 commits into from
Feb 25, 2023

Conversation

DeeDeeG
Copy link
Member

@DeeDeeG DeeDeeG commented Feb 24, 2023

In short, this fixes yarn install for this repo on Windows

(Without breaking npm install for this repo, either.)

This PR fixes the scenario where you want to install this repo's dependencies with Yarn on Windows. (Should allow yarn build:apm to actually work on Windows over at the Pulsar core repo!) It does so by working around some differences with how Yarn interacts with Windows' cmd.exe command line interpreter and batch scripts, compared to how npm interacts with cmd.exe and batch scripts.

For technical explanation of each change, see the commit messages. Each change is no more than one line (and no more than 6 characters, at that!), other than whitespace. So do consider looking at the diff first, then referring to the commit messages if/only if you have further questions. 100% of the changes are found in two .cmd batch script files.

That's the technical info, the rest of this PR body is commentary/not much more than a rant, but it gives context. Other than the footnote about npm major versions, which contains technical information I learned more about while preparing this PR, but which is a tangent and arguably off-topic. Feel free to skip reading the rest of this PR body if you like, since it shouldn't be technically relevant to reviewing the changes in this PR.

Background, effort, and why's cmd.exe gotta be this way?

This was surprisingly involved, as I wanted to know what I was doing with these fixes, but IMO cmd.exe's documentation is loosely worded, sparsely provided, scattered to the many corners of the internet, describes a generally janky/weird command execution environment / scripting language, and feels like a bunch of loosely cobbled together conventions accreted into a de-facto "don't touch it, it'll break someone's scripts" ball of stuff more than it feels like a (well or reasonably or even "at all") designed scripting and command execution environment.

CMD.exe's behavior feels more like happenstance, magic and luck (or misfortune, depending on your perspective) than a well-thought out scripting environment. IMO.

I'm not a fan.

Idea: Maybe we move everything to JS scripts where possible??? But more than a little bit out of scope for this PR.

What did I do about it in this PR?

Anyway, this has not only a few tweaks to the .cmd scripts, but a detailed explanation of the rationale of why I made these tweaks. (See the commit message for each effectively one-line change for the detailed explanation).

I don't like changing small, obscure things because "eh, works for me on my machine", I want to do it because it's the right solution. Otherwise, the fix always feels like a temporary band-aid, and I spend all my time waiting for it to spring a leak or turn out to be woefully problematic on somebody's machine or with somebody's use-case. I want it done once, and right. Doing it the "works on my machine, don't think too hard about why" way always feels like how we got here, not like a way out from here.

So.

Here it is. This makes the postinstall scripts behave under yarn install and npm install* alike.

* Tested and working with npm 6.x only.

Compatibility note for npm major versions (kind of a tangent/off-topic, click to expand if you want):

If/when we migrate to newer npm, we should do so once and never look back, since the major versions do not forward-and-backward interop well with each-other, though one major npm version range is mostly compatible with itself. Lockfile handling is different enough between the different major npm versions to be a major headache for this repo.

Particularly, the jasmine-node github URL gets saved in package-lock.json in one way with npm 8 or 9, that eventually breaks npm 6. npm 6 tries to reformat this github URL to suit its own way of doing things, but after it modifies and saves package-lock.json, npm 6 cannot handle what it wrote out, and it will error if you try to npm install with no node_modules folder populated (a fresh install).

Either we need to do some R&D on how to get that to stop happening, or interop between npm 6 and npm 8 or 9 will be a real nuisance, enough of a hassle to want to strongly avoid IMO. So we should stay on npm 6 for now, IMO, until we can make the jump to npm 8 or 9, and then not go back to npm 6.

npm 6.x is kind of sort of going end-of-life at the end of April 2023, when NodeJS 14 stops being supported, (https://github.com/nodejs/Release/#release-schedule), so we should look into this soon, IMO. (npm 6 is already sort of not supported, but it gets attention due to shipping with NodeJS LTS 14, support for which ends May 2023.) But upgrading npm is way out of scope for this PR, again.

One of a few ways we could prevent the postinstall.cmd script from
exiting out early, when run during "yarn install".

---

Background:

Yarn makes its own shim "node.cmd" in a temp dir, something like this:
"C:\Users\[User]\AppData\Local\Temp\yarn--1677171730971-0.47658380728238603\node.cmd"
Which records an absolute path to the node.exe it detected, apparently
so that later "node" commands run with the same Node binary.

(This temp dir also contains a "yarn.cmd" shim, for what it's worth.)

Yarn places this shim dir first on the PATH env var, so if attempting
to run "node", this shim "node.cmd" batch script would be the first
hit, and would be run preferentially over other "node" on the system.

(Note: This issue does not affect npm, as npm does not attempt to
create its own custom "node.cmd" shim added to the PATH.)

--

Implications:

Directly invoking "node" (in this case, yarn's "node.cmd" shim) would
introduced undesired exiting behavior. Like any .cmd script,  it
implicitly calls "exit" when reaching the end of the script. This
terminates the entire "cmd.exe" terminal instance (in our case, the
one running "postinstall.cmd"), not just the shim script itself.

So, in short, we exit out of a higher context than intended, due
to the unexpected shim not being called/exited in an ideal way.

---

Workarounds:

By running node.exe directly, we don't get this batch script-specific
exiting behavior, and postinstall.cmd can continue to completion.

I chose that option for this commit.

The main alternative would be to do "call node" instead of just "node"
so that the "node.cmd" script is treated as a subroutine, and its
implicit exit only used to exit "node.cmd," while the calling batch
script ("postinstall.cmd") would continue on. But why add an extra
layer of cmd.exe logic and subroutines into the mix? Just call the
"node.exe" binary explicitly, as this commit actually does.

(Note: If we decide that Yarn's shim is actually helpful, if using its
pinned/path-hardcoded node.exe is helpful, then we can revisit
this fix, and do "call node" instead of "node.exe".)
In our "bin\npm.cmd" wrapper script, use "call" for calling into
the official npm wrapper script ".\node_modules\.bin\npm.cmd".

Helps ensure our scripts complete, with script execution always
returning to the calling script (our wrapper, and whatever called our
wrapper) once the official "npm.cmd" finishes.

Without this change, in certain circumstances, the "cmd.exe"
subprocess would exit out entirely at the end of official "npm.cmd",
leaving parts of the calling script(s) (our script(s)) un-run.
Prevents certain non-zero exit codes from presenting to Yarn,
so that Yarn is happy and doesn't think an error has occurred.

Note: The error status code is reported by an `EXIT /b %errorlevel%`
command run at the end of "node_modules\.bin\npm.cmd".

Error codes I'm seeing include "9009" and "3".

Not sure what is causing these non-zero exit codes, but the scripts
appear to be working 100% fine for our needs, in any case.

---

My current best guess is that these are errors being emitted during
C/C++ compilation, but which ultimately don't negatively affect the
build, at least not in any meaningful way that I am aware of.

ppm works, is the important bit.

For reference on the above error codes:

Error code 3:

> Eventually, the source of the magic number 3 was identified: The C
> runtime abort function terminates the process with exit code 3.

- https://devblogs.microsoft.com/oldnewthing/20110519-00/?p=10623

Error Code 9009:

> Error Code 9009 means error file not found.

  ...

> And it means not finding any file the attempted command may be
> involving, thus even when it couldn't find the command itself.
> I was using delete instead of del. That would give you a 9009 too.

- https://stackoverflow.com/a/22433346
@DeeDeeG
Copy link
Member Author

DeeDeeG commented Feb 24, 2023

@confused-Techie I think you were wondering why yarn install in this repo or yarn build:apm in Pulsar core repo didn't work on Windows.

This PR fixes it, and my notes (commit messages) hopefully shed some light on what was going on if you want to know those sorts of answers as well.

So I figured I would tag you and let you know about this PR addressing the issue.

Best Regards,
- DeeDeeG

Copy link
Member

@confused-Techie confused-Techie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As always @DeeDeeG it's a delight reading your PR descriptions. And especially this time was surprised by the wonderful commit messages.

Seriously this deep dive looks like an intense one. Absolutely the definition of a programmer not being someone that types on a keyboard, but being someone that knows exactly what to type on a keyboard.

Impressive work, impressive researching. The final changes here all make sense and I have a hard time imagining any downside to these changes other than ppm finally working in a dev environment on Windows.

As always it's very appreciated, I'll give this an approve and hope some others can review, otherwise I'd be much more than happy to see these changes merged

Copy link
Member Author

@DeeDeeG DeeDeeG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Annotating the PR diff with the particular changes I realized we needed.

Hopefully this is a lot easier to read and see what I was talking about, than the commit messages as abstract words.

"%~dp0\..\node_modules\.bin\npm.cmd" %*
call "%~dp0\..\node_modules\.bin\npm.cmd" %*
Copy link
Member Author

@DeeDeeG DeeDeeG Feb 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the next diff are basically:

"Don't run a script ending in .cmd directly by name, from your own batch script, or when the script you call into exits, the larger script you're calling it from will also exit [early] (because its host cmd.exe program will exit)".

Either use call [some_script].cmd, or find another way, such as doing node.exe instead of node --> implicitly node.cmd (due to the Yarn shim).

node .\script\download-node.js
node.exe .\script\download-node.js
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto to the above comment about "don't run another .cmd script directly by name from your own .cmd script, or you'll exit early", same thing.

Comment on lines +16 to +17

exit /b
Copy link
Member Author

@DeeDeeG DeeDeeG Feb 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this one honestly still confuses me the most.

It makes intuitive sense to me that it should work, I just wish the documentation I've read didn't suggest it shouldn't work. But this is the one area where I'm going to stop asking questions, I have used up my capacity to ask further questions about why cmd.exe and [script_name].cmd scripts are so weird... As long as this part works, I'll take it. Tested on multiple Windows computers as working.

Hidden behind an expander, because here lies madness (click to expand if you dare):

About exit /b ... the /b flag is only supposed to mean "exit the current subroutine, not the whole cmd.exe." That much I can just about accept.

Rant: (Why is this /b thing not the default though??? Why is exiting the entire cmd.exe the same lever as exiting a subroutine, and making the distinction is opt-in with the /b flag? If calling into scripts from other scripts is a thing, why not make the most narrow exit the default? Seems super error-prone... /grumpiness).

Here's the weird part, we're using it for something that I'm not sure it's supposed to do (hiding/resetting a previous error code).

It lets you explicitly exit with a certain error code, such as exit /b 6 for error code 6, a number I just made up. exit /b without an explicit code is supposed to carry forward the previous error code, if you don't explicitly set one like exit /b [code].

Sources:

And yet here I am doing exit /b here to suppress the error level from the npm.cmd --> node_modules\.bin\npm.cmd we are calling just above.

As far as I can read in any docs, this should do nothing. And yet I have tested it works with this change and fails without this change over and over.

(I mean "fails" in the sense that the error level is visible to Yarn, and Yarn will exit with a non-zero exit code -- the error_level presented from the batch script -- , which can/would kill a CI run.)

(Other than in CI, the error messages from the npm rebuild would just be a bit of a nuisance, since the C/C++ code seems to have built just fine??? I dunno how to verify all that built C/C++ code, honestly, but all tests are passing, right??? But failed CI runs is a bit of an issue, so I still think "hiding" the errors is better if they aren't actual problems. Like, maybe the makefile or equivalent for these C/C++ addons tries some commands that might be missing and does fallback behavior that works instead, but the original "command not found" error status persists long enough to trip up Yarn without this change??? I have to guess so hard at this, since it's increasingly deep down the rabbit hole just to fix a package manager issue.)


This, and having to scratch my head a lot to infer/lucky guess about the Yarn shims existing, were a large part of my descent into madness on this one (on this PR). I'm not even sure how/why npm avoids some of these issues. And at this point, I don't want to know. I manually tested it's working in both npm and yarn, like dozens of times, that's good enough for me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this really just makes me say, lets make all scripts JavaScript ones, with #/env/node at the top, and require everyone to have node already. That's just a lot of ridiculousness

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have loosely aspired to replacing these shell and batch scripts with pure JS for some time. I assume it would've been harder ~9 years ago when apm was first written, but NodeJS has kind of had everything and the kitchen sink built in by now. Indeed child_process.spawn() and the like can set env vars and call out to CLI commands, generally replacing the main point of shell/batch scripts.

If we can do JS with the same or better reliability for ppm in-use, then it'll be worth it. Otherwise, if the shell/batch scripts run more reliably, then perhaps the insanity these scripts bring during maintenance can be a maintainer-only concern, and a rare one (just don't touch them that often)... Not that I love the thought of that. We'll see...

(More cmd weirdness: You might be intrigued to hear that adding exit /b at the bottom of our bin\npm.cmd, closer to where the errors I was trying to suppress came from, seemed to do nothing. Whereas doing exit /b one further layer out, at the bottom of postinstall.cmd, did the trick. 🤷.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants