Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update node deps to next breaking release #1129

Merged
merged 26 commits into from
Jan 14, 2024
Merged

Update node deps to next breaking release #1129

merged 26 commits into from
Jan 14, 2024

Conversation

JordanMartinez
Copy link
Contributor

Description of the change

Next step in #1122. Currently blocked by optparse not being updated to latest node deps

Checklist:

  • Added the change to the "Unreleased" section of the changelog
  • Added some example of the new feature to the README
  • Added a test for the contribution (if applicable)

P.S.: the above checks are not compulsory to get a change merged, so you may skip them. However, taking care of them will result in less work for the maintainers and will be much appreciated 😊

@JordanMartinez
Copy link
Contributor Author

Blocked by this error:

[1/1 TypesDoNotUnify] .spago/packages/optparse-5.0.0/src/Options/Applicative/Extra.purs:67:14

  67  exitWith c = exit $ fromEnum c
                   ^^^^

  Could not match type
    Effect
  with type
    Function t0
  while trying to match type Effect t2
    with type t0 -> t1
  while checking that expression exit
    has type t0 -> t1
  in value declaration exitWith
  where t0 is an unknown type
        t1 is an unknown type
        t2 is an unknown type

@f-f
Copy link
Member

f-f commented Dec 3, 2023

I posted this comment in the upstream repo already, but just for context here: there's a PR up for optparse already, we could use the fork from there purescript-contrib/purescript-optparse#31

@JordanMartinez
Copy link
Contributor Author

CI failure when calling purs graph on Windows:

The command line is too long.

@JordanMartinez
Copy link
Contributor Author

AFAICT, there is no workaround this limitation. Either we shorten the file paths used for globs, so that each glob takes up less characters (which doesn't actually address the problem), or we update the compiler to add support for passing globs in via a file.

@JordanMartinez
Copy link
Contributor Author

I'm not sure why the current character count of 7,124 is causing problems when the limit is 8191 characters. However, without much else to go on, one of the above fixes seems to be the easiest way forward.

@JordanMartinez
Copy link
Contributor Author

On second thought, another issue here might be node-execa. There's a few other child process commands that get run before we actually run the main command (e.g. which to figure out what the actual command is). That might be the source of the extra characters.

I think another argument for that is the fact that this test worked before this PR where we're upgrading the Node deps.

@JordanMartinez
Copy link
Contributor Author

@f-f
Copy link
Member

f-f commented Dec 4, 2023

Ah, great find! 👏👏

I wonder if using node.exe instead of just node on windows would allow to pass that isExecutableRegex check?

I.e. do we need the new flag at all? Maybe we're just holding it wrong

@JordanMartinez
Copy link
Contributor Author

I'm not sure. What is the file content of purs.cmd anyway? Does it need to be called via cmd.exe or is there some other way around that?

@f-f
Copy link
Member

f-f commented Dec 4, 2023

Oh sorry, disregard my comment - for a moment I was convinced we were talking about the node invocation, but the purs.cmd is the issue here.

Does the test you linked above pass if you set the new flag to false?

@JordanMartinez
Copy link
Contributor Author

Yeah

@JordanMartinez
Copy link
Contributor Author

But I'm wondering if that's the correct way to fix this. I was just verifying that the command parses correctly, not necessarily whether this is the right way to solve it. And I'm not entirely sure because I'm not sure what purs.cmd's file content even is.

@f-f
Copy link
Member

f-f commented Dec 5, 2023

Some context about why the CMD wrapping is there is in the (currently active) sindresorhus/execa#578

Quoting from the thread, they say the wrapping is there to fix

issues with escaping (commands with spaces, etc.) although I am not completely sure whether this is fixed or not in the latest version of Node. cross-spawn seems to fix this by wrapping the command in an additional cmd.exe call and manually escaping the command and arguments, using cmd.exe-specific syntax.

So it seems to be less about what purs.cmd is, and more about how the command looks like (with spaces, etc). I would say that if the command works without the wrapping then we're good to go

@JordanMartinez
Copy link
Contributor Author

Rust fixed a similar issue here: https://github.com/rust-lang/rust/pull/95246/files

@JordanMartinez
Copy link
Contributor Author

So... I think there are a couple of issues being intertwined here.

First, purs does not allow one to pass globs in via a file, which is the main workaround to the character limit on Windows. Given that PowerShell still has a character limit, this might be worth supporting in the compiler just to ensure that one doesn't have to deal with this potential problem in the future.

Second, node.js doesn't properly call spawn on Windows. The reason why cross-spawn even exists is to address these limitations:

Node has issues when using spawn on Windows:

  • It ignores PATHEXT
  • It does not support shebangs
  • Has problems running commands with spaces
  • Has problems running commands with posix relative paths (e.g.: ./my-folder/my-executable)
  • Has an issue with command shims (files in node_modules/.bin/), where arguments with quotes and parenthesis would result in invalid syntax error

execa uses cross-spawn to escape the command in case there are spaces and whatnot. However, theoretically that isn't needed if there are no spaces. But is that a guarantee we can uphold? I can think of two possible cases:

  1. What happens when the path to purs.cmd is a path that contains spaces? I'm not sure when this situation might arise, but if it could, wouldn't we need to call cmd.exe directly to escape spaces?
  2. In a multi-package repo, what happens if the folder for one of those packages contains a space in its name (e.g. Root\my sub package\spago.yaml)? Wouldn't this need to be escaped?

Third, installing purescript via npm on Windows makes it available via purs.cmd rather than purs.ps1. cmd.exe is the old/original shell on Windows, but PowerShell is the new default shell. If we could call purs.ps1 instead, would it workaround this particular problem? I'm not actually sure because I think Node.js always uses cmd.exe to start child processes.

@JordanMartinez JordanMartinez mentioned this pull request Dec 5, 2023
3 tasks
@JordanMartinez
Copy link
Contributor Author

We could also shorten to pkgs since that would help)

I think shortening to pkgs may fix the problem temporarily, but it wouldn't fix the problem long-term (or at least longer-term). It may be worth doing so to get this PR in, but if it causes CI to fail shortly thereafter, we'd just revert it again. Also, I don't think merging this PR and then fixing #1048 is worth it if doing so complicates Spago's development due to unfixable CI errors.

@JordanMartinez
Copy link
Contributor Author

I'm trying to figure out when it would be safe to not do the escaping. Cross spawn's argument handling is based on Rob van der Woude's "Batch Files - Escape Characters" article, which was last updated in Nov 2019. jdeb and dbenham's Batch/Cmd parsing SO answer (linked above) was last updated in Dec 2022.

We typically call purs with a command like this:

purs compile --codegen docs,js,corefn,sourcemaps subpackage\src\**\*.purs .spago\packages\name\version\src\**\*.purs
purs graph subpackage\src\**\*.purs .spago\packages\name\version\src\**\*.purs

Of the characters in this command, we would need to escape:

  • ,
    • per Jeb's SO answer: "The standard token delimiters are <space> <tab> ; , = <0x0B> <0x0C> and <0xFF>"
    • per Rob's article: "Required only in the FOR /F "subject" (i.e. between the parenthesis), even in doublequoted strings"
  • *
    • Jeb's SO answer has nothing on this character, aside from the * in %* variable expansion, which only happens in the batch line parser, not the command line one.
    • Rob's article says: "Required only inside the regex pattern of FINDSTR"

Per this SO answer and the official Windows docs on path naming, special characters like * (our usage above); <, >, |, and & (command separators and pipe redirection); and " (used when escaping things with spaces) are disallowed in folder and file names.

We control purs, compile, --codegen, docs,js,corefn,sourcemaps. The docs,js,corefn,sourcemaps part could presumably be escaped by wrapping the arg in double quotes. Or updating the arg to docs","js","corefn","sourcemaps because the usage of " here escapes just that character.

I'm unsure of the naming convention used for dependencies (e.g. .spago\packages\foo\version\src\**\*.purs). For example, if my project depends on a subpackage where the repo containing that subpackage stores it with spaces in its directory name (i.e. a repo with the following structure):

\src
\spago.yaml
\sub package with spaces
  \src
  \spago.yaml

does the directory under which Spago stores the sub package with spaces's package include spaces? If it does, we need to escape these. If not, then no escaping is needed here.

We don't control the subpackage part of subpackage\src\**\*.purs. Again, if subpackage contains spaces, it needs to be escaped.

On another note, the original PR in Cross Spawn that refactored the argument escaping apparently did so because of a security vulnerability in NPM. Per this comment, the bug seems related to the usage of cmd-shims in node_modules/.bin/* files.

In the purs.cmd case, we aren't calling node.exe js-dependency/index.js like, I assume, those shims do. Rather, we call the purs binary directly. So, I'm not sure whether this security vulnerability affects our particular usage. Emailing the maintainer may provide more context of the original issue.

To conclude:

  • node-execa should be updated to allow one to have more control over whether a command/argument should be escaped and how to escape it. I'm not sure what these semantics should be quite yet. But, they would only take effect when the command is ran on Windows when the cmd.exe-wrapping code path is taken.
  • spago should update its purs usage to use these new semantics.

@f-f
Copy link
Member

f-f commented Dec 28, 2023

Also, I don't think merging this PR and then fixing #1048 is worth it if doing so complicates Spago's development due to unfixable CI errors.

That ticket is a bug so we'd need to get to it anyways, and in any case without this PR we're behind upstream - so I'd merge it sooner rather than later. The root cause of the Windows misbehaviour still needs to be taken care of though

@f-f
Copy link
Member

f-f commented Dec 28, 2023

We don't control the subpackage part of subpackage\src***.purs. Again, if subpackage contains spaces, it needs to be escaped.

We can't assume to be in control of any package that doesn't come from the registry. Subpackages, other local packages, and custom git repos are all possible sources of weird characters.

@JordanMartinez
Copy link
Contributor Author

I spent more time on investigating this and I have a clear path forward now.

There's a few corrections from previous comments I need to make and some other updates:

  • The original cross spawn algorithm claims to be based on Rob's article. However, [BUG] *.CMD shims don't work when they are in paths containing shell metachars npm/cmd-shim#45 actually links to Everyone quotes command line arguments the wrong way, which describes an algorithm that cross-spawn generally does. But in this article, it clarified for me that CreateProcess is the only way to start a program on Windows. cmd.exe is just a program with a bad parser that more or less wraps that API.
  • When we call purs.cmd compile ..., we are passing through two layers. First, node-execa calls cmd.exe with this command to fix the issues with Node.jsdefaultspawnimplementation. Thenpurs.cmdlocates and calls thepurs.binbinary. Ironically,purs.cmdis a good tool to detect whether or not original arguments "made it" through to thepursbinary correctly because any source glob args that fail to find a pattern will state such an error. And any that are correct will cause aCompiling ...` message to appear.
  • I realized I could modify the node_modules/.bin/purs.cmd to turn off the initial @ECHO OFF to better understand how arguments are transformed.
  • Turns out the & character is a valid file name character. See [BUG] *.CMD shims don't work when they are in paths containing shell metachars npm/cmd-shim#45 where I first saw an example of that before trying it out personally.

I'm going to update node-execa to do the following:
- to only escape arguments when needed using purs.cmd as my primary test tool and a single source glob with problematic characters.
- expose utils that can detect when escaping a command and/or argument(s) is needed because of the metachar symbols like & appearing somewhere in the path.
- expose a util that can detect when a 'command too long' error will occur.

I think Spago could be updated to the following, but let me know your thoughts:

  • use the first util to warn the user when such metachars (e.g. &) appear in the path as it may be a hack
  • use the second util to explain to the user why the 'command too long' error will appear.

On a different note, a given glob .spago/foo/version/src/**/*.purs adds 17 characters for the double-escaping due to the usage of a cmd-shim: 4 chars for the starting open quote (i.e. ^^^"), 4 for the ending open quote (i.e. ^^^"), and 9 chars for escaping the three *s with 3 characters each (i.e. ^^^). Potentially, using read-cmd-shim to find the location of purs.bin would be a better solution here as we could call the binary directly. This would remove one layer of escaping.

@JordanMartinez
Copy link
Contributor Author

@f-f This is ready for a review.

I need to document this more thoroughly, but Fabrizio and I discussed this issue and decided that the best course of action was two things:

  1. updating the package directory from packages to p. It doesn't solve this problem, but it seems like nothing actually does. AFAICT, the escaping the node-execa does in the latest release is correct and shouldn't change.
  2. update purs to allow globs to be passed in via a file.

The first fix makes it possible to merge this and move forward. The second fixes the underlying problem here. While that second fix won't help older compiler versions, we're assuming that people will upgrade over time and it'll stop being relevant.

@f-f
Copy link
Member

f-f commented Jan 11, 2024

Thanks @JordanMartinez! I'll have a look


pure subprocess

joinProcess :: forall m. MonadAff m => Execa.ExecaProcess -> m (Either ExecError ExecResult)
joinProcess cp = liftAff $ cp.result
joinProcess :: forall m. MonadAff m => Execa.ExecaProcess -> m ExecResult
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove the Either here?

It's nice because it forces the consumer to unpack the result. Just having an ExecResult means that one can accidentally forget to check for isSuccess.

We could keep a Either ExecResult ExecResult, which would be functionally equivalent, but force consumers to unpack the result of the subprocess call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple times where we want to get the stdout value regardless of whether a command failed or not. Needing to write either _.stdout _.stdout is annoying.

Just having an ExecResult means that one can accidentally forget to check for isSuccess.

Sure, but how often is that actually going to happen? I don't think this is something that's hard to miss when reviewing, similar to how one doesn't typically forget to check for the status code of an HTTP request.

I'm not against readding the Either, but I do think it should appear as Tuple ExecaResult (Either a b). I'm just not sure what a or b should be here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performing an HTTP request is an operation that can succeed or fail (for a variety of reasons), and it makes sense that the result is encapsulated in an Either.
E.g. Affjax has request :: forall a. AffjaxDriver -> Request a -> Aff (Either Error (Response a))

We use languages like PureScript to have the typesystem - rather than human reviewers - catch things like "I forgot to unpack this result": humans get things wrong, the compiler doesn't risk forgetting.

Needing to write either _.stdout _.stdout is annoying.

I think we can have a Cmd.getStdout :: Either ExecResult ExecResult -> String for cases like that one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. I'll make the changes.

@JordanMartinez
Copy link
Contributor Author

On another note, should I update node-execa to have a debug hook, so that it outputs the final command and args used before calling spawn? I think it might help with debugging in the future.

@f-f
Copy link
Member

f-f commented Jan 12, 2024

On another note, should I update node-execa to have a debug hook, so that it outputs the final command and args used before calling spawn?

That would be nice, but it's not a prerequisite to get this one merged 🙂

@JordanMartinez
Copy link
Contributor Author

On another note, should I update node-execa to have a debug hook, so that it outputs the final command and args used before calling spawn?

That would be nice, but it's not a prerequisite to get this one merged 🙂

I'll do that in a separate PR. There might be other 'debug' hooks I should add.

@JordanMartinez
Copy link
Contributor Author

In #1129 (comment), I explained the decision Fabrizio and I reached. Below, I'd like to go into more detail, so that we can link to this post as a way to answer future questions.

First, on Windows, the create-process API is the only way to create a new process. AFAIK, the only way to call that API without using the Windows C APIs directly is through the cmd.exe. More on this in a bit.

Second, when someone installs a JavaScript binary (e.g. npm i purescript, npm i esbuild), the files are downloaded to their corresponding file in node_modules (e.g. node_modules/purescript/files-go-here) and ideally a symbolic link would appear at node_modules/.bin/symbolic-link-goes-here. Then when one one calls npx esbuild bundle, Node will execute ./node_modules/.bin/esbuild, providing a common folder to check for binaries.

However, Windows does not support symbolic links well. So, how do we get a symbolic link there? The solution is cmd-shim. Rather than creating a symbolic link for a binary named foo, Node will create the file, node_modules/.bin/foo.cmd (and the same for powershell). What is the purpose of that file? The file starts the binary. That's it. But the path to that binary is hard-coded into the file. Voila! Pseudo-symlinks on Windows via a layer of cmd.exe.

Third, when Node.js spawns a new child process, it's internally using cmd.exe for the reason mentioned above in the first point. However, it does so incorrectly. That's why the library cross-spawn exists: to solve some of these issues.

Fourth, when Spago wants to run a purs command via execa "purs" [ "compile", ... globs ... ], this is the pathway it takes. Notice how it goes through cmd.exe twice:

  1. Due to being on Windows, we must use cmd.exe to start a program called purs.cmd
  2. When executed, purs.cmd starts a program using cmd.exe called ../purescript/purs.bin
  3. The purs command is executed

Fifth, unfortunately and as noted above, cmd.exe has some painful CLI arg parsing behavior. Thus, every time we pass through that layer, we need to escape its arguments for that layer. Because of the usage of cmd-shim, we have to escape that layer twice before we call execa "purs" [ "compile", ... ]. The outermost escape layer is unwrapped in the initial cmd.exe call in step 1. The innermost escape layer is unwrapped in the next cmd.exe call in step 2.

Sixth, due to cmd.exe's parsing rules, failing to escape these metachars correctly can lead to a security vulnerability. Including the & sign in a file name, which is valid in Windows, will be interpreted by cmd.exe as two commands separated by that character (e.g. foo&bar -> foo & bar). Failing to escape that character could lead to undesirable results. The same can be said for other characters more relevant to this codebase: space and *.

Finally, as far as I could tell, cross-spawn is correctly escaping the metachars via the tests I added recently, and removing them or making them configurable would likely lead to an undesired security vulnerability. Hence, we decided to fix the immediate problem by reducing the command length on Windows (packages -> p) and fix the underlying problem by allowing one to pass globs in via a glob file.

Copy link
Member

@f-f f-f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great now, thanks @JordanMartinez!

@f-f f-f merged commit 68b6446 into master Jan 14, 2024
3 checks passed
@f-f f-f deleted the jam/update-node-deps branch January 14, 2024 11:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants