Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handle nonexisting /sbin/init a bit more cleanly #2678

Merged
merged 1 commit into from
Dec 27, 2021

Conversation

lsfxz
Copy link

@lsfxz lsfxz commented Dec 6, 2021

TL;DR: nvm install fails in Arch's makepkg environment when /sbin/init does not exist. No idea (yet) why exactly this happens, but slightly modifying the line checking for /sbin/init here seems to fix things.

A few more details/reasoning:

In some very rare circumstances, a non-existing /sbin/init seems to cause an nvm install to fail without any 'good' hint as to why it fails. (It seems like the commit 7f2ccd5 made it especially hard to pinpoint what's going on here, because even if things silently fail there, it might still cause a script failure in those circumstances.

This could look like this, for example:

Downloading and installing node v14.18.1...
Binary download failed, trying source.

…and then nothing more.

I am not entirely certain what exactly causes it to fail here – I'd (very broadly) assume that some "globally" set shell option from whatever called nvm might still cause the ls error to cause the nvm script to exit. I've originally come across it on Arch in makechrootpkg, although the main culprit seems to be in makepkg itself – which seems to be enough to reproduce it.

Basically, it happens when running makepkg for a PKGBUILD where nvm is used (as, for example, recommended at https://wiki.archlinux.org/title/Node.js_package_guidelines#Using_nvm), when /sbin/init does not exist (on Arch it does not exist by default, but is provided by the systemd-sysvcompat package).

I've only come across it by accident by building in a "clean chroot" (https://wiki.archlinux.org/title/DeveloperWiki:Building_in_a_clean_chroot), where a package providing /sbin/init is not installed by default. The chroot environment itself does not seem to be relevant though (directly using nvm in it is possible) – it seems like it's only the makepkg environment that seems to matter.

On top of that, this "had worked a few weeks ago" – but I was not able to find out what had changed on Arch's end that might have caused this.

I understand this is probably not really an nvm issue – I had just hoped that, with this being an easy fix (without any obvious downsides – to me at least), which shouldn't do much damage but just handles the non-existing file "a bit more cleanly", it might be easiest to address this here. Unless I can find out where things are actually causing the issue in Arch's tools.

Copy link
Member

@ljharb ljharb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit concerned about papering over the real issue here - if the dir doesn't exist, ls should fail inside the subshell, which should just make $L have no output, which would mean the condition is false on the next line, and thus nothing should happen.

Perhaps if you have -e set (which nobody ever should)? If so, can you try unsetting -e and seeing if that fixes your issue?

nvm.sh Outdated
@@ -1851,7 +1851,7 @@ nvm_get_arch() {
esac

# If running a 64bit ARM kernel but a 32bit ARM userland, change ARCH to 32bit ARM (armv7l)
L=$(ls -dl /sbin/init 2>/dev/null) # if /sbin/init is 32bit executable
[ -f /sbin/init ] && L=$(ls -dl /sbin/init 2>/dev/null) # if /sbin/init is 32bit executable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe this means L won't be unconditionally set, which means the next line could error if the u option is set. Can we move the condition inside the subshell?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense – did that, thanks!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As to the -e question / the more global issue: I have not set it in my environment (and the issue also occurs in a clean chroot without any custom stuff). Still, there is an errexit set in the script that is used by Arch for building a package (https://gitlab.archlinux.org/pacman/pacman/-/blob/master/scripts/makepkg.sh.in#L387-406) – but that's been there for a while (while it has only become an issue with nvm sometime during the past few weeks, I think).

I'll fully admit that my bash / shell scripting knowledge is way to limited to properly assess if that might still be the root cause and/or how to address it there instead (or find out what other change over there might've caused it). nvm was the easier target for me ;)

Copy link
Author

@lsfxz lsfxz Dec 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's getting even more, hm, interesting?

With the check in the subshell, it does fail again as it did with the original code – which I didn't expect!

With a minimal example, like:

#!/bin/bash

set -e

[ -f /sbin/somethingnonexisting ] && L=$(ls -dl /sbin/somethingnonexisting 2>/dev/null)
echo '1'
L=$([ -f /sbin/somethingnonexisting ] && ls -dl /sbin/somethingnonexisting 2>/dev/null)
echo '2'

the resulting output would only be

1

This could be worked around by forcing it to return something "successful", but that doesn't seem like good/clean code to me.

#!/bin/bash

set -e

[ -f /sbin/somethingnonexisting ] && L=$(ls -dl /sbin/somethingnonexisting 2>/dev/null)
echo '1'
L=$([ -f /sbin/somethingnonexisting ] && ls -dl /sbin/somethingnonexisting 2>/dev/null || true)
echo '2'

Interestingly, it seems like we can get around the whole issue by just using local (we are in a function here in the nvm.sh, so that should be fine) – came upon this at https://unix.stackexchange.com/questions/23026/how-can-i-get-bash-to-exit-on-backtick-failure-in-a-similar-way-to-pipefail/88338#88338.

This seems to work with bash, zsh and dash, so I'd assume it should work "everywhere"?

A simple standalone example:

#!/bin/bsh

set -e

whatever(){
    [ -f /sbin/somethingnonexisting ] && L=$(ls -dl /sbin/somethingnonexisting 2>/dev/null)
    echo '1'
    local L=$(ls -dl /sbin/somethingnonexisting 2>/dev/null)
    echo '2'
}

whatever

This doesn't bail out early and gives the output:

1
2

as hoped for.

To sum it all up:

local L=$(ls -dl /sbin/init 2>/dev/null) # if /sbin/init is 32bit executable should be all we'd need to change this line to, if I'm not mistaken – would that sound good to you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't that what the line already is?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following only works in Bash because when command substitution ($(command)) is used to assign a value to a variable while declaring it (local VAR), error handling is suppressed for the command being substituted.

local L=$(ls -dl /sbin/init 2>/dev/null)

See here for more information, although there is no mention of what happens when errexit and/or errtrace (set -e and set -E respectively) are enabled.

I'm not sure whether to blame a Bash change or a makepkg update, but nvm install fails consistently on my Arch build server with any package that uses nvm, despite working previously with no change to this line.

Adding set -x before nvm install results in the following output because--according to echo "$-"; trap--makepkg invokes PKGBUILD functions with set -e, set -E, and an ERR trap that calls exit 4, which means that when ls -dl fails, exit 4 terminates four nested subshells before the error is caught and Binary download failed, trying source. is printed.

+++++ ...
++++ HOST_ARCH=x86_64
++++ local NVM_ARCH
++++ case "${HOST_ARCH}" in
++++ NVM_ARCH=x64
+++++ ls -dl /sbin/init
++++++ error_function prepare
++++++ ((  ! BASH_SUBSHELL  ))
++++++ exit 4
++++ L=
+++++ error_function prepare
+++++ ((  ! BASH_SUBSHELL  ))
+++++ exit 4
+++ NVM_ARCH=
++++ error_function prepare
++++ ((  ! BASH_SUBSHELL  ))
++++ exit 4
++ SLUG=
+++ error_function prepare
+++ ((  ! BASH_SUBSHELL  ))
+++ exit 4
+ TARBALL=
+ ...

I assume the intention is for nvm to work in any shell environment regardless of its initial state, so adding set +E if [[ $- == *E* ]] to the existing check for set -e should resolve this while preventing similar failures in future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set -e should never be used anyways - but nvm has measures in place to ensure it’s not set for nvm itself. I’m not sure what the E option does?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set -e is often used in non-interactive/CI contexts, but yes, I had noticed nvm re-invokes itself without it if needed (same for set -a and a non-standard IFS). 👍🏻

set -E enables ERR trap inheritance in Bash. Without -E, setting an ERR trap like trap 'exit 4' ERR would only run exit 4 if a command fails in the current scope, but with -E, exit 4 would also be executed whenever a command fails in a function, command substitution or subshell invoked or created by the current scope or any of its children. The Bash manual puts it like this:

-E   If set, any trap on ERR is inherited by shell functions, command substitutions, and commands executed in a subshell environment. The ERR trap is normally not inherited in such cases.

If an ERR trap exits non-zero (as it does when nvm is invoked by an Arch package build script), the effect of -E is identical to -e. Again, quoting the Bash manual:

If a sigspec is ERR, the command arg is executed whenever a pipeline (which may consist of a single simple command), a list, or a compound command returns a non-zero exit status, subject to the following conditions. The ERR trap is not executed if the failed command is part of the command list immediately following a while or until keyword, part of the test in an if statement, part of a command executed in a && or || list except the command following the final && or ||, any command in a pipeline but the last, or if the command's return value is being inverted using !. These are the same conditions obeyed by the errexit (-e) option.

I've prepared a fix that hopefully addresses all of this without side-effects on other shells and will open a PR soon.

Copy link
Author

@lsfxz lsfxz Dec 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting to the bottom of that @lkrms, and for providing a much less hacky attempt to fix the underlying issue!

I guess we should close this PR then, as it's superseded/replaced by #2698?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s hold off; i may still land both.

@lsfxz lsfxz force-pushed the sbin_init_error branch 2 times, most recently from 2d771a3 to 4a8d10a Compare December 14, 2021 17:25
Copy link
Member

@ljharb ljharb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every variable in nvm should have a local declaration, so this is now a very straightforward bug fix, thanks!

@lsfxz
Copy link
Author

lsfxz commented Dec 14, 2021

Awesome – and thanks for your patience with me on this! :)

/edit

Ah, crap: Seems like the local trick does not work if it's a separate declaration. Well, back to the drawing board, I guess.

@ljharb
Copy link
Member

ljharb commented Dec 27, 2021

This is now included in #2698, and will close when it lands.

@ljharb ljharb marked this pull request as ready for review December 27, 2021 21:34
@ljharb ljharb merged commit fb4538b into nvm-sh:master Dec 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants