chore(common): adds retry mechanism for build script npm ci calls #11451

jahorton · 2024-05-15T05:14:55Z

Fixes #10350.

I've tested the new reattempt_if_failing utility function on its own with the following two calls:

reattempt_if_failing echo "this should pass"
- passes immediately, no retries
reattempt_if_failing cd "not-a-directory"
- fails every time (cd: invalid path) as intended, properly delays between attempts
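
For readers following along, a retry helper of this kind can be sketched roughly as below. This is a minimal sketch and not the code in this PR: the name `retry_sketch`, the `RETRY_MAX_ATTEMPTS` and `RETRY_DELAY_SECONDS` variables, and the fixed delay between attempts are all assumptions made for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical sketch of a retry helper; NOT the PR's reattempt_if_failing.
# RETRY_MAX_ATTEMPTS and RETRY_DELAY_SECONDS are illustrative names only.
retry_sketch() {
  local max_attempts=${RETRY_MAX_ATTEMPTS:-3}
  local delay=${RETRY_DELAY_SECONDS:-10}
  local attempt code=0

  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    if (( attempt > 1 )); then
      echo "Delaying $delay seconds before attempt $attempt: \`$*\`"
      sleep "$delay"
    fi

    # Run the command exactly as passed; keep its exit code on failure.
    code=0
    "$@" || code=$?
    if (( code == 0 )); then
      return 0
    fi
  done

  echo "Command failed after $max_attempts attempts: \`$*\`" >&2
  return "$code"
}
```

With this sketch, `retry_sketch echo "this should pass"` returns on the first attempt, while `retry_sketch cd "not-a-directory"` delays between attempts and finally returns the failing exit code.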

@keymanapp-test-bot skip

keymanapp-test-bot · 2024-05-15T05:14:58Z

User Test Results

Test specification and instructions

User tests are not required

Test Artifacts

Android
Developer
iOS
- Keyman for iOS (simulator image)
- FirstVoices Keyboards for iOS (simulator image)
- TestFlight internal PR build version - 18.0.45 (0.11451.11200)
Keyboards
- Test Keyboards
Linux
macOS
- Keyman for macOS
Web
- KeymanWeb Test Home
Windows

mcdurdin

This is looking good. Can we add function documentation at the front, similar to that found in builder.inc.sh, e.g.

keyman/resources/builder.inc.sh

Lines 227 to 243 in 6c00656

    #
    # Returns `0` if first parameter is in the array passed as second parameter,
    # where the array may contain globs.
    #
    # ### Parameters
    #
    # * 1: `item`       item to search for in array
    # * 2: `array`      bash array, e.g. `array=(one two three)`
    #
    # ### Example
    #
    # ```bash
    #   array=(foo bar it*)
    #   if _builder_item_in_glob_array "item" "${array[@]}"; then ...; fi
    # ```
    #
    _builder_item_in_glob_array() {

resources/shellHelperFunctions.sh

Co-authored-by: Marc Durdin <marc@durdin.net>

mcdurdin · 2024-05-17T05:48:29Z

Just discovered https://docs.npmjs.com/cli/v6/using-npm/config#fetch-retries and its friends:

fetch-retries
Default: 2
Type: Number
The "retries" config for the retry module to use when fetching packages from the registry.

fetch-retry-factor
Default: 10
Type: Number
The "factor" config for the retry module to use when fetching packages.

fetch-retry-mintimeout
Default: 10000 (10 seconds)
Type: Number
The "minTimeout" config for the retry module to use when fetching packages.

fetch-retry-maxtimeout
Default: 60000 (1 minute)
Type: Number
The "maxTimeout" config for the retry module to use when fetching packages.

maxsockets
Default: 50
Type: Number
The maximum number of connections to use per origin (protocol/host/port combination). Passed to the http Agent used to make the request.

We might wish to consider these because they'll be much cleaner than retrying the whole npm ci?
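
As a rough illustration of what using these settings could look like, npm config keys can generally be passed as `--key=value` flags on the command line. The values below are placeholders, not tuned recommendations and not taken from this PR, and the array name `npm_retry_flags` is hypothetical.

```shell
#!/usr/bin/env bash

# Illustrative values only; they are not tuned or taken from this PR.
npm_retry_flags=(
  --fetch-retries=5
  --fetch-retry-mintimeout=20000
  --fetch-retry-maxtimeout=120000
)

# Print the command rather than run it, so the sketch has no side effects.
echo "npm ci ${npm_retry_flags[*]}"
```

The same keys could instead be placed in an `.npmrc` file if a per-agent or per-repo default is preferred.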

mcdurdin · 2024-05-17T05:49:51Z

Also noting that the problem is not resolved with this PR, see https://build.palaso.org/buildConfiguration/Keyman_Common_KPAPI_TestPullRequests_macOS/463600

17:42:47   [common/tools/hextobin] ## configure starting...
17:42:48   npm WARN EBADENGINE Unsupported engine {
17:42:48   npm WARN EBADENGINE   package: undefined,
17:42:48   npm WARN EBADENGINE   required: { node: '^18.x' },
17:42:48   npm WARN EBADENGINE   current: { node: 'v21.7.3', npm: '10.5.1' }
17:42:48   npm WARN EBADENGINE }
17:42:48   npm WARN skipping integrity check for git dependency ssh://git@github.com/keymanapp/dependency-node-xml2js.git
17:42:48   npm WARN skipping integrity check for git dependency ssh://git@github.com/keymanapp/dependency-restructure.git
17:42:48   npm WARN skipping integrity check for git dependency ssh://git@github.com/keymanapp/dependency-restructure.git
17:42:49   npm WARN deprecated @npmcli/move-file@2.0.1: This functionality has been moved to @npmcli/fs
17:43:17   npm ERR! code 1
17:43:17   npm ERR! git dep preparation failed
17:43:17   npm ERR! command /opt/homebrew/Cellar/node/21.7.3/bin/node /opt/homebrew/lib/node_modules/npm/bin/npm-cli.js install --force --cache=/Users/marc_durdin/.npm --prefer-offline=false --prefer-online=false --offline=false --no-progress --no-save --no-audit --include=dev --include=peer --include=optional --no-package-lock-only --no-dry-run
17:43:17   npm ERR! npm WARN using --force Recommended protections disabled.
17:43:17   npm ERR! npm ERR! code ECONNRESET
17:43:17   npm ERR! npm ERR! errno ECONNRESET
17:43:17   npm ERR! npm ERR! network Invalid response body while trying to fetch https://registry.npmjs.org/@parcel%2fgraph: aborted
17:43:17   npm ERR! npm ERR! network This is a problem related to network connectivity.
17:43:17   npm ERR! npm ERR! network In most cases you are behind a proxy or have bad network settings.
17:43:17   npm ERR! npm ERR! network
17:43:17   npm ERR! npm ERR! network If you are behind a proxy, please make sure that the
17:43:17   npm ERR! npm ERR! network 'proxy' config is set properly.  See: 'npm help config'
17:43:17   npm ERR!
17:43:17   npm ERR! npm ERR! A complete log of this run can be found in: /Users/marc_durdin/.npm/_logs/2024-05-16T10_42_51_434Z-debug-0.log
17:43:17   
17:43:17   npm ERR! A complete log of this run can be found in: /Users/marc_durdin/.npm/_logs/2024-05-16T10_42_47_808Z-debug-0.log
17:43:17   [common/tools/hextobin] Expected output: 'node_modules'.

That last line is suspicious ... is the retry code working correctly?

mcdurdin

With my previous comment -- I don't think this is working correctly

jahorton · 2024-05-29T08:51:17Z

> Just discovered https://docs.npmjs.com/cli/v6/using-npm/config#fetch-retries and its friends:
>
> [...]

Following up on the quoted comment... viewing its link shows the following:

The "retries" config for the retry module to use when fetching packages from the registry.

Based on that, I believe that those settings affect npm's use of the retry package as seen at https://www.npmjs.com/package/retry.

factor: The exponential factor to use. Default is 2.

[...]

The formula used to calculate the individual timeouts is:
Math.min(random * minTimeout * Math.pow(factor, attempt), maxTimeout)
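
To make the formula concrete, here is a small sketch that evaluates it with integer arithmetic. It assumes `random` is fixed at 1 (no jitter) and that `attempt` is counted from 0 for the first retry; both are assumptions for illustration, and `retry_delay_ms` is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# Hypothetical helper: delay in ms before retry number `attempt`,
# per the quoted formula with `random` fixed at 1 (i.e. no jitter).
retry_delay_ms() {
  local attempt=$1 min_timeout=$2 max_timeout=$3 factor=$4
  local delay=$(( min_timeout * factor ** attempt ))
  if (( delay > max_timeout )); then
    delay=$max_timeout
  fi
  echo "$delay"
}

# Using the npm defaults quoted earlier in this thread
# (minTimeout 10000, maxTimeout 60000, factor 10):
retry_delay_ms 0 10000 60000 10   # prints 10000
retry_delay_ms 1 10000 60000 10   # prints 60000 (100000 capped by maxTimeout)
```

Under those assumptions, maxTimeout caps each individual delay rather than the total elapsed time.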

jahorton · 2024-05-29T09:08:29Z

Inspecting that build log, I don't think npm's retry module even kicked in. That, or there are custom defaults set on the agent that provide fewer retries than npm's default.

17:42:47   [common/tools/hextobin] ## configure starting...
17:42:48   npm WARN EBADENGINE Unsupported engine {
17:42:48   npm WARN EBADENGINE   package: undefined,
17:42:48   npm WARN EBADENGINE   required: { node: '^18.x' },
17:42:48   npm WARN EBADENGINE   current: { node: 'v21.7.3', npm: '10.5.1' }
17:42:48   npm WARN EBADENGINE }
17:42:48   npm WARN skipping integrity check for git dependency ssh://git@github.com/keymanapp/dependency-node-xml2js.git
17:42:48   npm WARN skipping integrity check for git dependency ssh://git@github.com/keymanapp/dependency-restructure.git
17:42:48   npm WARN skipping integrity check for git dependency ssh://git@github.com/keymanapp/dependency-restructure.git
17:42:49   npm WARN deprecated @npmcli/move-file@2.0.1: This functionality has been moved to @npmcli/fs
17:43:17   npm ERR! code 1
17:43:17   npm ERR! git dep preparation failed
17:43:17   npm ERR! command /opt/homebrew/Cellar/node/21.7.3/bin/node /opt/homebrew/lib/node_modules/npm/bin/npm-cli.js install --force --cache=/Users/marc_durdin/.npm --prefer-offline=false --prefer-online=false --offline=false --no-progress --no-save --no-audit --include=dev --include=peer --include=optional --no-package-lock-only --no-dry-run
17:43:17   npm ERR! npm WARN using --force Recommended protections disabled.
17:43:17   npm ERR! npm ERR! code ECONNRESET
17:43:17   npm ERR! npm ERR! errno ECONNRESET
17:43:17   npm ERR! npm ERR! network Invalid response body while trying to fetch https://registry.npmjs.org/@parcel%2fgraph: aborted
[...]

Note the elapsed time: approximately 30 seconds in total.

The retry parameterization quoted above raises questions.

fetch-retries of 2 (default): implies that after an initial failure, two retry attempts are made.
fetch-retry-factor of 10: implies that the second retry's timeout should be multiplied by 10.
fetch-retry-mintimeout of 10000 (10 seconds)

Shouldn't there be at least 110 seconds between the initial npm call and the final, reported failure? mintimeout x factor seems to suggest this. retry's documentation says that the "min" and "max" timeouts are delays between tries, rather than limits on the retry's total elapsed time. So... why did it take only about 30 seconds to fail? Is there a custom .npmrc already on the build agents that's set more aggressively?

mcdurdin · 2024-05-29T23:24:31Z

> Is there a custom .npmrc already on the build agents that's set more aggressively?

No.

mcdurdin · 2024-05-30T01:17:48Z

https://build.palaso.org/buildConfiguration/Keyman_Test_Common_Windows/466721 is another example where npm retry may help -- in this case, not network related, but local fs related (possibly security software?):

13:39:39   npm WARN cleanup     [Error: EPERM: operation not permitted, rmdir 'C:\BuildAgent\work\99b311828f4ee7c\keyman\node_modules\@microsoft\tsdoc-config\lib\__tests__'] {
13:39:39   npm WARN cleanup       errno: -4048,
13:39:39   npm WARN cleanup       code: 'EPERM',
13:39:39   npm WARN cleanup       syscall: 'rmdir',
13:39:39   npm WARN cleanup       path: 'C:\\BuildAgent\\work\\99b311828f4ee7c\\keyman\\node_modules\\@microsoft\\tsdoc-config\\lib\\__tests__'
13:39:39   npm WARN cleanup     }
13:39:39   npm WARN cleanup   ]
13:39:39   npm WARN cleanup ]
13:39:39   npm ERR! code 1
13:39:39   npm ERR! git dep preparation failed
13:39:39   npm ERR! command C:\Program Files\nodejs\node.exe C:\Users\bob\global_node\node_modules\npm\bin\npm-cli.js install --force --cache=C:\Users\bob\AppData\Local\npm-cache --prefer-offline=false --prefer-online=false --offline=false --no-progress --no-save --no-audit --include=dev --include=peer --include=optional --no-package-lock-only --no-dry-run
13:39:39   npm ERR! npm WARN using --force Recommended protections disabled.
13:39:39   npm ERR! npm ERR! code EBUSY
13:39:39   npm ERR! npm ERR! syscall open
13:39:39   npm ERR! npm ERR! path C:\Users\bob\AppData\Local\npm-cache\_cacache\index-v5\77\5e\7e30208cca13688147d134de1109e20935222521f11e1ceef8001daae2ed
13:39:39   npm ERR! npm ERR! errno EBUSY
13:39:39   npm ERR! npm ERR! Invalid response body while trying to fetch https://registry.npmjs.org/ansi-colors: EBUSY: resource busy or locked, open 'C:\Users\bob\AppData\Local\npm-cache\_cacache\index-v5\77\5e\7e30208cca13688147d134de1109e20935222521f11e1ceef8001daae2ed'
13:39:39   npm ERR!
13:39:39   npm ERR! npm ERR! A complete log of this run can be found in: C:\Users\bob\AppData\Local\npm-cache\_logs\2024-05-29T06_36_26_705Z-debug-0.log

and https://build.palaso.org/buildConfiguration/Keyman_Developer_Test/466684 also

resources/shellHelperFunctions.sh

jahorton · 2024-05-31T07:26:21Z

Trying it out locally with a temporary script...

function inner_test ( ) {
  npm install @keymanapp/totally-not-a-package-that-is-distributed-so-it-should-make-an-error
}

try_multiple_times inner_test

I get the following if I disconnect from the internet mid-install:

$ ./temp.sh
npm ERR! code E404
npm ERR! 404 Not Found - GET https://registry.npmjs.org/@keymanapp%2ftotally-not-a-package-that-is-distributed-so-it-should-make-an-error - Not found
npm ERR! 404
npm ERR! 404  '@keymanapp/totally-not-a-package-that-is-distributed-so-it-should-make-an-error@*' is not in this registry.
npm ERR! 404
npm ERR! 404 Note that you can also install from a
npm ERR! 404 tarball, folder, http url, or git url.

npm ERR! A complete log of this run can be found in: C:\Users\User\AppData\Local\npm-cache\_logs\2024-05-31T07_17_09_049Z-debug-0.log
Delaying 39 seconds before attempt 2: `inner_test`
npm ERR! code ENOTFOUND
npm ERR! syscall getaddrinfo
npm ERR! errno ENOTFOUND
npm ERR! network request to https://registry.npmjs.org/@keymanapp%2ftotally-not-a-package-that-is-distributed-so-it-should-make-an-error failed, reason: getaddrinfo ENOTFOUND registry.npmjs.org
npm ERR! network This is a problem related to network connectivity.
npm ERR! network In most cases you are behind a proxy or have bad network settings.
npm ERR! network
npm ERR! network If you are behind a proxy, please make sure that the
npm ERR! network 'proxy' config is set properly.  See: 'npm help config'

npm ERR! A complete log of this run can be found in: C:\Users\User\AppData\Local\npm-cache\_logs\2024-05-31T07_17_50_553Z-debug-0.log
Delaying 77 seconds before attempt 3: `inner_test`
[...]

It's not an ECONNRESET, but we are getting a similar message. Of course, disconnecting from the internet would be "bad network settings".

Also of note: the retry script totally worked with npm's error outputs in this case - on both my Windows and macOS machines.

jahorton · 2024-05-31T08:07:01Z

Ah, inserting the invalid npm install command in place of the npm ci call did the trick - it appears that the surrounding pushd/popd were affecting things. Come to think of it, if the npm ci call failed, it wouldn't have gotten to do the popd afterward.
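
One way to sidestep the pushd/popd pairing problem is to run the command in a subshell, so the directory change never leaks out regardless of success or failure. This is a sketch under that assumption, not the change made in this PR; `run_in_dir` is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a command inside `dir` without changing the
# caller's working directory, whether the command succeeds or fails.
run_in_dir() {
  local dir=$1
  shift
  # The parentheses start a subshell, so the `cd` is discarded on exit.
  ( cd "$dir" && "$@" )
}

start_dir=$PWD
run_in_dir /tmp false || echo "command failed with exit code $?"
[[ $PWD == "$start_dir" ]] && echo "still in the original directory"
```

The trade-off is that the command cannot modify the caller's shell state, which is usually what is wanted for something like `npm ci`.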

jahorton · 2024-05-31T08:19:37Z

Well, the good news is... the retry mechanism definitely works now.

https://build.palaso.org/buildConfiguration/Keyman_Test_Common_Web/467970?buildTab=log&focusLine=170&logView=linear&linesState=102

Might need a spot of cleanup after failed attempts, though.

15:12:49   [common/web/keyman-version] Delaying 52 seconds before attempt 2: `npm ci`
15:13:43   npm WARN skipping integrity check for git dependency ssh://git@github.com/keymanapp/dependency-node-xml2js.git
15:13:43   npm WARN skipping integrity check for git dependency ssh://git@github.com/keymanapp/dependency-restructure.git
15:13:43   npm WARN skipping integrity check for git dependency ssh://git@github.com/keymanapp/dependency-restructure.git
15:13:50   npm WARN deprecated @npmcli/move-file@2.0.1: This functionality has been moved to @npmcli/fs
15:14:54   [common/web/keyman-version] Delaying 55 seconds before attempt 3: `npm ci`
15:14:54   /var/lib/TeamCity/work/99b311828f4ee7c/keyman/resources/shellHelperFunctions.sh: line 196: 3974199 Killed                  "$@"
15:15:51   npm ERR! code ENOTEMPTY
15:15:51   npm ERR! syscall rename
15:15:51   npm ERR! path /var/lib/TeamCity/work/99b311828f4ee7c/keyman/web/node_modules/typescript
15:15:51   npm ERR! dest /var/lib/TeamCity/work/99b311828f4ee7c/keyman/web/node_modules/.typescript-FshlhPnn
15:15:51   npm ERR! errno -39
15:15:51   npm ERR! ENOTEMPTY: directory not empty, rename '/var/lib/TeamCity/work/99b311828f4ee7c/keyman/web/node_modules/typescript' -> '/var/lib/TeamCity/work/99b311828f4ee7c/keyman/web/node_modules/.typescript-FshlhPnn'
15:15:51   
15:15:51   npm ERR! A complete log of this run can be found in: /home/bob/.npm/_logs/2024-05-31T08_15_49_718Z-debug-0.log

And... wait, what's this about a web/ specific node_modules/typescript? Sure enough, the package.json for web/ is oddly showing the older version that we dropped. Will fix that in the 🔩 PR set (new commit on #11464).

jahorton · 2024-06-03T07:00:47Z

Noted in discussion: it appears that our Linux BAs may be hitting out-of-memory due to VM restrictions that are then triggering the active npm ci call to be killed. That could easily leave the file-system updates halfway, which would then lead to the ENOTEMPTY error that followed in the prior log.

This was noticed via cross-reference with https://build.palaso.org/buildConfiguration/Keyman_Test_Common_Linux/467596?buildTab=log&linesState=86&logView=flowAware&focusLine=102, which reported an error code of 137, which is indicative of memory issues.

mcdurdin

> Noted in discussion: it appears that our Linux BAs may be hitting out-of-memory due to VM restrictions that are then triggering the active npm ci call to be killed. That could easily leave the file-system updates halfway, which would then lead to the ENOTEMPTY error that followed in the prior log.

The exit code for OOM kill is 137 (https://stackoverflow.com/questions/53245385/npm-gets-killed-or-errno-137) so if we captured that we could do a cleanup of node_modules by wrapping the npm ci into yet another function.
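
A wrapper along these lines might look like the sketch below. It is only an illustration of the idea: `run_with_oom_cleanup` is a hypothetical name, and treating exit code 137 as "killed, so clean up" is the assumption described above rather than tested behaviour on the build agents.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper: run a command and, if it exits with 137,
# delete `cleanup_dir` so that a retry starts from a clean state.
run_with_oom_cleanup() {
  local cleanup_dir=$1
  shift
  local code=0
  "$@" || code=$?

  if (( code == 137 )) && [[ -n $cleanup_dir && -d $cleanup_dir ]]; then
    echo "Exit code 137 (killed); removing $cleanup_dir before any retry"
    rm -rf -- "$cleanup_dir"
  fi
  return "$code"
}

# Intended usage (not executed here): run_with_oom_cleanup node_modules npm ci
```

The wrapper passes the original exit code through, so an outer retry loop can still decide whether to try again.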

mcdurdin · 2024-06-03T07:02:43Z

resources/shellHelperFunctions.sh

+    sleep $wait_length
+  fi
+
+  if ! "$@"; then


Suggested change

-  if ! "$@"; then
+  if ! "$@"; then
+    builder_echo "Command failed with error $?"

It would be really helpful to capture the error code here and report it. (Need to test that the exit code is not already lost; if it is, then we'll need a slightly more complicated solution.)

I should've checked it locally first:

[web/src/app/browser] Command failed with error 0
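
The `0` comes from the `!` negation: inside the `then` branch, `$?` reflects the negated test rather than the command. A small sketch of the difference, using hypothetical function names (`negated_status`, `captured_status`) and plain `echo` in place of `builder_echo`:

```shell
#!/usr/bin/env bash

# Inside `if ! cmd; then`, $? holds the status of the negated test (0),
# not the status of cmd itself.
negated_status() {
  if ! "$@"; then
    echo "$?"
  fi
}

# Saving the status before anything else runs keeps the real code.
captured_status() {
  local code=0
  "$@" || code=$?
  echo "$code"
}

negated_status bash -c 'exit 3'    # prints 0
captured_status bash -c 'exit 3'   # prints 3
```

Capturing with `|| code=$?` also stays safe if the surrounding script runs under `set -e`.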

Co-authored-by: Marc Durdin <marc@durdin.net>

mcdurdin

LGTM

keyman-server · 2024-06-06T18:06:31Z

Changes in this pull request will be available for download in Keyman version 18.0.50-alpha

chore(web): adds retry mechanism for build script npm ci calls

5a0e66c

jahorton requested a review from mcdurdin as a code owner May 15, 2024 05:14

keymanapp-test-bot bot added this to the A18S2 milestone May 15, 2024

github-actions bot added common/ common/resources/ Build infrastructure chore labels May 15, 2024

mcdurdin requested changes May 15, 2024

chore(common): adjustments per code review

4705889

jahorton requested a review from mcdurdin May 15, 2024 07:28

mcdurdin approved these changes May 16, 2024

resources/shellHelperFunctions.sh (outdated, resolved)

resources/shellHelperFunctions.sh (outdated, resolved)

chore(common): Apply suggestions from code review

e382ddd

Co-authored-by: Marc Durdin <marc@durdin.net>

mcdurdin modified the milestones: A18S2, A18S3 May 24, 2024

mcdurdin requested changes May 28, 2024

mcdurdin reviewed May 30, 2024

resources/shellHelperFunctions.sh (outdated, resolved)

Joshua Horton added 3 commits May 31, 2024 14:52

fix(common): only retry the npm ci call itself

6251faa

Merge branch 'master' into chore/common/retry-npm-ci

27a959d

change(common): simplifies retry command-execution syntax

b6a06c6

fix(common): revert accidental test-code inclusion

66bdc29

mcdurdin reviewed Jun 3, 2024

jahorton requested a review from mcdurdin June 3, 2024 08:19

jahorton and others added 2 commits June 5, 2024 12:46

change(common): emit error-code on retry fail

e2e8ed3

Co-authored-by: Marc Durdin <marc@durdin.net>

fix(common): better error-code capture

c91ffb9

mcdurdin approved these changes Jun 5, 2024

jahorton merged commit e9ccd6f into master Jun 6, 2024
26 checks passed

jahorton deleted the chore/common/retry-npm-ci branch June 6, 2024 01:10

jahorton mentioned this pull request Jun 10, 2024

chore(web): CI builds are currently extremely unreliable #10494

Closed
