-
-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(common): adds retry mechanism for build script npm ci calls #11451
Conversation
User Test ResultsTest specification and instructions User tests are not required Test Artifacts
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good. Can we add function documentation at the front, similar to that found in builder.inc.sh, e.g.
keyman/resources/builder.inc.sh
Lines 227 to 243 in 6c00656
# | |
# Returns `0` if first parameter is in the array passed as second parameter, | |
# where the array may contain globs. | |
# | |
# ### Parameters | |
# | |
# * 1: `item` item to search for in array | |
# * 2: `array` bash array, e.g. `array=(one two three)` | |
# | |
# ### Example | |
# | |
# ```bash | |
# array=(foo bar it*) | |
# if _builder_item_in_glob_array "item" "${array[@]}"; then ...; fi | |
# ``` | |
# | |
_builder_item_in_glob_array() { |
Co-authored-by: Marc Durdin <marc@durdin.net>
Just discovered https://docs.npmjs.com/cli/v6/using-npm/config#fetch-retries and its friends: fetch-retries fetch-retry-factor fetch-retry-mintimeout fetch-retry-maxtimeout maxsockets We might wish to consider these because they'll be much cleaner than retrying the whole |
Also noting that the problem is not resolved with this PR, see https://build.palaso.org/buildConfiguration/Keyman_Common_KPAPI_TestPullRequests_macOS/463600
That last line is suspicious ... is the retry code working correctly? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With my previous comment -- I don't think this is working correctly
Following up on the quoted comment... viewing its link shows the following:
Based on that, I believe that those settings affect
|
Inspecting that build log, I don't think npm's retry module even kicked in. That, or there are custom defaults set on the agent that provide less retries than npm's default.
Note the elapsed time: approximately 30 seconds in total. The retry parameterization quoted above raises questions.
Shouldn't there be at least 110 seconds between initial |
No. |
https://build.palaso.org/buildConfiguration/Keyman_Test_Common_Windows/466721 is another example where npm retry may help -- in this case, not network related, but local fs related (possibly security software?):
and https://build.palaso.org/buildConfiguration/Keyman_Developer_Test/466684 also |
Trying it out locally with a temporary script... function inner_test ( ) {
npm install @keymanapp/totally-not-a-package-that-is-distributed-so-it-should-make-an-error
}
try_multiple_times inner_test I get the following if I disconnect from the internet mid-install:
It's not an Also of note: the retry script totally worked with npm's error outputs in this case - on both my Windows and macOS machines. |
Ah, inserting the invalid |
Well, the good news is... the retry mechanism definitely works now. Might need a spot of cleanup after failed attempts, though.
And... wait, what's this about a web/ specific |
Noted in discussion: it appears that our Linux BAs may be hitting out-of-memory due to VM restrictions that are then triggering the active This was noticed via cross-reference with https://build.palaso.org/buildConfiguration/Keyman_Test_Common_Linux/467596?buildTab=log&linesState=86&logView=flowAware&focusLine=102, which reported an error code of 137, which is indicative of memory issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted in discussion: it appears that our Linux BAs may be hitting out-of-memory due to VM restrictions that are then triggering the active npm ci call to be killed. That could easily leave the file-system updates halfway, which would then lead to the ENOTEMPTY error that followed in the prior log.
The exit code for OOM kill is 137 (https://stackoverflow.com/questions/53245385/npm-gets-killed-or-errno-137) so if we captured that we could do a cleanup of node_modules by wrapping the npm ci
into yet another function.
resources/shellHelperFunctions.sh
Outdated
sleep $wait_length | ||
fi | ||
|
||
if ! "$@"; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if ! "$@"; then | |
if ! "$@"; then | |
builder_echo "Command failed with error $?" |
If would be really helpful to capture the error code here and report it. (Need to test that the exit code is not already lost; if it is then we'll need a slightly more complicated solution).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should've checked it locally first:
[web/src/app/browser] Command failed with error 0
Co-authored-by: Marc Durdin <marc@durdin.net>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Changes in this pull request will be available for download in Keyman version 18.0.50-alpha |
Fixes #10350.
I've tested the new
reattempt_if_failing
utility function on its own with the following two calls:reattempt_if_failing echo "this should pass"
reattempt_if_failing cd "not-a-directory"
@keymanapp-test-bot skip