Skip to content
This repository has been archived by the owner on Jan 8, 2024. It is now read-only.

tests/Frontend/ember-build-tests: Unhandled error event #2260

Closed
briancain opened this issue Sep 8, 2021 · 37 comments · Fixed by #2264 or #2297
Closed

tests/Frontend/ember-build-tests: Unhandled error event #2260

briancain opened this issue Sep 8, 2021 · 37 comments · Fixed by #2264 or #2297
Assignees

Comments

@briancain
Copy link
Member

Describe the bug

The ember build frontend tests seem to be failing often. I think this test specifically has failed for me about ~5 times today.

https://app.circleci.com/pipelines/github/hashicorp/waypoint/9896/workflows/0f6d5e84-1e59-47ef-9916-75f8f0d22a36/jobs/87707

WARNING: Detected collisions between .js and .ts files of the same name. This can result in nondeterministic build output; see https://git.io/JvIwo for more information.
  - waypoint/services/flash-messages.{js,ts}
[BABEL] Note: The code generator has deoptimised the styling of /home/circleci/project/ui/node_modules/axe-core/axe.js as it exceeds the max of 500KB.
[BABEL] Note: The code generator has deoptimised the styling of /home/circleci/project/ui/node_modules/waypoint-pb/server_pb.js as it exceeds the max of 500KB.
events.js:291
      throw er; // Unhandled 'error' event
      ^

Error [ERR_IPC_CHANNEL_CLOSED]: Channel closed
    at ChildProcess.target.send (internal/child_process.js:680:16)
    at Object.sendMessage (/home/circleci/project/ui/node_modules/stagehand/lib/adapters/child-process.js:30:39)
    at CommandCoordinator.sendMessage (/home/circleci/project/ui/node_modules/stagehand/lib/command-coordinator.js:81:23)
    at CommandCoordinator.sendCommand (/home/circleci/project/ui/node_modules/stagehand/lib/command-coordinator.js:36:14)
    at Stagehand.<anonymous> (/home/circleci/project/ui/node_modules/stagehand/lib/stagehand.js:86:44)
    at Generator.next (<anonymous>)
    at /home/circleci/project/ui/node_modules/stagehand/lib/stagehand.js:7:71
    at new Promise (<anonymous>)
    at __awaiter (/home/circleci/project/ui/node_modules/stagehand/lib/stagehand.js:3:12)
    at Stagehand.call (/home/circleci/project/ui/node_modules/stagehand/lib/stagehand.js:82:16)
Emitted 'error' event on ChildProcess instance at:
    at internal/child_process.js:684:35
    at processTicksAndRejections (internal/process/task_queues.js:79:11) {
  code: 'ERR_IPC_CHANNEL_CLOSED'
}
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command

Steps to Reproduce

The tests flake occasionally on PRs and commits to main.

Expected behavior

They pass.

Waypoint Platform Versions

N/A

Additional context
Add any other context about the problem here.

@briancain briancain changed the title tests/Frontend: js files exceeds max size tests/Frontend/ember-build-tests: js files exceeds max size Sep 8, 2021
@jgwhite
Copy link
Contributor

jgwhite commented Sep 8, 2021

My guess is that some aspect of the UI build pipeline has a memory leak, leading it to exhaust the available memory in the CI container. See angular/angular-cli#8331 (angular-cli is a fork of ember-cli). I don’t have a ton of evidence for this, just a hunch.

@briancain briancain changed the title tests/Frontend/ember-build-tests: js files exceeds max size tests/Frontend/ember-build-tests: Unhandled error event Sep 8, 2021
@jgwhite
Copy link
Contributor

jgwhite commented Sep 8, 2021

The key message to look at here is the ERR_IPC_CHANNEL_CLOSED, essentially indicating that something has abruptly killed the build process.

@jgwhite jgwhite self-assigned this Sep 8, 2021
@jgwhite
Copy link
Contributor

jgwhite commented Sep 9, 2021

Let’s work backwards from the key ERR_IPC_CHANNEL_CLOSED error. The stack trace is almost entirely in stagehand, an inter-process coordination library. The stack trace doesn’t tell us what exactly called into stagehand, so let’s see which of our dependencies use it:

❯ yarn why stagehand
=> Found "stagehand@1.0.0"
   Reasons this module exists
     - "ember-cli-typescript" depends on it
     - Hoisted from "ember-cli-typescript#stagehand"
     - Hoisted from "@glimmer#component#ember-cli-typescript#stagehand"
     - Hoisted from "ember-load-initializers#ember-cli-typescript#stagehand"
     - Hoisted from "ember-intl#ember-cli-typescript#stagehand"
     - Hoisted from "ember-a11y-testing#@ember#test-waiters#ember-cli-typescript#stagehand"

Though it appears in multiple places, it’s always a dependency of ember-cli-typescript, so it seems reasonable to focus our investigation on ember-cli-typescript. Can we upgrade ember-cli-typescript? What version(s) are we running right now?

❯ yarn why ember-cli-typescript
=> Found "ember-cli-typescript@3.1.4"
   Has been hoisted to "ember-cli-typescript"
   Reasons this module exists
      - Specified in "devDependencies"
      - Hoisted from "ember-a11y-testing#ember-cli-typescript"
      - Hoisted from "ember-fetch#ember-cli-typescript"
      - Hoisted from "ember-concurrency-decorators#ember-cli-typescript"
      - Hoisted from "ember-in-viewport#ember-modifier#ember-cli-typescript"
      - Hoisted from "@ember#test-helpers#@ember#test-waiters#ember-cli-typescript"
=> Found "@glimmer/component#ember-cli-typescript@3.0.0"
   This module exists because "@glimmer#component" depends on it.
=> Found "ember-load-initializers#ember-cli-typescript@2.0.2"
   This module exists because "ember-load-initializers" depends on it.
=> Found "ember-intl#ember-cli-typescript@4.0.0"
   This module exists because "ember-intl" depends on it.
=> Found "ember-a11y-testing#@ember/test-waiters#ember-cli-typescript@4.1.0"
   This module exists because "ember-a11y-testing#@ember#test-waiters" depends on it.

We’re using five different version of ember-cli-typescript across the dependency tree. Thanks NPM 🤦‍♀️

What version(s) of TypeScript itself are we using?

❯ yarn why typescript
=> Found "typescript@3.9.6"
   Has been hoisted to "typescript"
   This module exists because it's specified in "devDependencies".

That’s more reasonable at least.

Current working theory:
It seems ember-cli-typescript uses stagehand to launch some kind of worker. Perhaps that worker has a memory leak and is getting killed by the OS when it uses more than its fair share of memory. Or perhaps it’s triggering a SEGV. Either way, from stagehand’s perspective, the process it launched (and it trying to talk to) has abruptly ceased communicating. The worker seems to basically drive a bunch of stuff from TypeScript.

Next step:
See if we can upgrade ember-cli-typescript and/or typescript itself.

@jgwhite
Copy link
Contributor

jgwhite commented Sep 9, 2021

Alternative next step:
Try using Node 14 for all this stuff instead of 12. 14 is likely faster and less memory hungry.

jgwhite added a commit that referenced this issue Sep 9, 2021
Hoping this addresses #2260

🤞
@jgwhite jgwhite linked a pull request Sep 9, 2021 that will close this issue
@jgwhite
Copy link
Contributor

jgwhite commented Sep 9, 2021

There’s no guarantee #2264 fixes this. So let’s keep an eye in things and, if this flapper recurs, re-open the issue and continue the investigation.

@briancain
Copy link
Member Author

Hmm, I've seen this fail a couple of times already today. 😢 I think I'll reopen this.

@briancain briancain reopened this Sep 9, 2021
@jgwhite
Copy link
Contributor

jgwhite commented Sep 9, 2021

@briancain could you point me in the direction of a failure? Would be good to see if the output has changed.

@jgwhite
Copy link
Contributor

jgwhite commented Sep 9, 2021

I think the next relatively cheap thing to try is bumping the version of Node we use.

@jgwhite
Copy link
Contributor

jgwhite commented Sep 9, 2021

Managed to invalidate the Node 14 hypothesis very quickly 😅

Next step:
Run a build with ember-cli’s debug mode enabled (I think it’s DEBUG=* ember build but don’t hold me to that) and see if we can trigger the same failure and see more output.

Alternative:
Try upgrading the five versions of ember-cli-typescript 😬

@jgwhite
Copy link
Contributor

jgwhite commented Sep 9, 2021

I have some workflows running with DEBUG=*. We get A LOT of information, which is great. Should lead us much closer to the source of the issue. As soon as I manage to get one of these builds to fail I’ll post the trace.

@jgwhite
Copy link
Contributor

jgwhite commented Sep 9, 2021

Managed to get a 137 exit code from running DEBUG=* yarn build:test manually in a circle container. That would seem to indicate that its an out-of-memory error, but it’s also possible that the glut of debug output caused the out-of-memory error.

Next step:
Try again with DEBUG=ember-cli:* yarn build:test

@jgwhite
Copy link
Contributor

jgwhite commented Sep 10, 2021

Ran with DEBUG=ember-cli:* and it gives us a bit more context, but I think the output is interleaved due to parallelism:

  ember-cli:preprocessors processing ember-cli-typescript-collision-check +118ms
  ember-cli:preprocessors processing ember-auto-import-analyzer +1ms
  ember-cli:preprocessors processing ember-cli-babel +0ms
  ember-cli:merge-trees [EmberCliMergeTrees(Processed Application and Dependencies)#3090] Returning upstream merged trees +3ms
  ember-cli:merge-trees [EmberCliMergeTrees(Packaged Vendor Javascript)#3091] Returning single input tree +19ms
  ember-cli:merge-trees [EmberCliMergeTrees(Packaged Javascript)#3092] Returning upstream merged trees +0ms
  ember-cli:registry loading css: available plugins [ 'ember-cli-sass' ]; found plugins [ 'ember-cli-sass' ]; +22ms
  ember-cli:preprocessors processing ember-cli-sass +21ms
  ember-cli:merge-trees [EmberCliMergeTrees(TreeMerger (vendorStyles))#3093] Returning single input tree +133ms
  ember-cli:merge-trees [EmberCliMergeTrees(Packaged Styles)#3094] Returning upstream merged trees +1ms
  ember-cli:merge-trees [EmberCliMergeTrees(Processed Additional Assets)#3095] Returning empty tree +0ms
  ember-cli:registry loading js: available plugins [
  'ember-cli-typescript-collision-check',
  'ember-auto-import-analyzer',
  'ember-cli-babel'
]; found plugins [
  'ember-cli-typescript-collision-check',
  'ember-auto-import-analyzer',
  'ember-cli-babel'
]; +135ms
  ember-cli:preprocessors processing ember-cli-typescript-collision-check +135ms
  ember-cli:preprocessors processing ember-auto-import-analyzer +0ms
  ember-cli:preprocessors processing ember-cli-babel +0ms
  ember-cli:merge-trees [EmberCliMergeTrees(Packaged Tests)#3096] Returning upstream merged trees +2ms
  ember-cli:merge-trees [EmberCliMergeTrees(TreeMerger (appTestTrees))#3097] Returning upstream merged trees +1ms
  ember-cli:merge-trees [EmberCliMergeTrees(undefined)#3098] Returning upstream merged trees +1ms
  ember-cli:merge-trees [EmberCliMergeTrees(TreeMerger (testFiles))#3099] Returning upstream merged trees +0ms
  ember-cli:merge-trees [EmberCliMergeTrees(Packaged Tests)#3100] Returning upstream merged trees +0ms
  ember-cli:merge-trees [EmberCliMergeTrees(Application Dist)#3101] Returning upstream merged trees +0ms
Building
Environment: test
building... 
node_modules/waypoint-pb/server_pb.d.ts:961:26 - error TS2694: Namespace '"/home/circleci/project/ui/node_modules/waypoint-pb/server_pb".Ref.Component' has no exported member 'Type'.

961     getType(): Component.Type;
                             ~~~~

node_modules/waypoint-pb/server_pb.d.ts:962:30 - error TS2694: Namespace '"/home/circleci/project/ui/node_modules/waypoint-pb/server_pb".Ref.Component' has no exported member 'Type'.

962     setType(value: Component.Type): Component;
                                 ~~~~

node_modules/waypoint-pb/server_pb.d.ts:977:23 - error TS2694: Namespace '"/home/circleci/project/ui/node_modules/waypoint-pb/server_pb".Ref.Component' has no exported member 'Type'.

977       type: Component.Type,
                          ~~~~

WARNING: Detected collisions between .js and .ts files of the same name. This can result in nondeterministic build output; see https://git.io/JvIwo for more information.
  - waypoint/services/flash-messages.{js,ts}
[BABEL] Note: The code generator has deoptimised the styling of /home/circleci/project/ui/node_modules/axe-core/axe.js as it exceeds the max of 500KB.
[BABEL] Note: The code generator has deoptimised the styling of /home/circleci/project/ui/node_modules/waypoint-pb/server_pb.js as it exceeds the max of 500KB.
events.js:291
      throw er; // Unhandled 'error' event
      ^

Error [ERR_IPC_CHANNEL_CLOSED]: Channel closed
    at ChildProcess.target.send (internal/child_process.js:680:16)
    at Object.sendMessage (/home/circleci/project/ui/node_modules/stagehand/lib/adapters/child-process.js:30:39)
    at CommandCoordinator.sendMessage (/home/circleci/project/ui/node_modules/stagehand/lib/command-coordinator.js:81:23)
    at CommandCoordinator.sendCommand (/home/circleci/project/ui/node_modules/stagehand/lib/command-coordinator.js:36:14)
    at Stagehand.<anonymous> (/home/circleci/project/ui/node_modules/stagehand/lib/stagehand.js:86:44)
    at Generator.next (<anonymous>)
    at /home/circleci/project/ui/node_modules/stagehand/lib/stagehand.js:7:71
    at new Promise (<anonymous>)
    at __awaiter (/home/circleci/project/ui/node_modules/stagehand/lib/stagehand.js:3:12)
    at Stagehand.call (/home/circleci/project/ui/node_modules/stagehand/lib/stagehand.js:82:16)
Emitted 'error' event on ChildProcess instance at:
    at internal/child_process.js:684:35
    at processTicksAndRejections (internal/process/task_queues.js:79:11) {
  code: 'ERR_IPC_CHANNEL_CLOSED'
}

It certainly seems like problems occur in conjunction with the two large files: server_pb.js and axe.js, both of which are out of our control.

Next step:
Try to get debug output with JOBS=1 to reduce interleaving.

@jgwhite
Copy link
Contributor

jgwhite commented Sep 10, 2021

Additional avenue of investigation:
Read the ember-cli-typescript source to better understand where exactly that IPC error could be thrown.

@jgwhite
Copy link
Contributor

jgwhite commented Sep 10, 2021

With JOBS=1 I had three successful builds in a row. This doesn’t prove anything, but I’m to switch gears to try DEBUG=ember-cli-typescript:*.

A thought: if the problem is the typescript type checker running out of memory, can we just disable the check during a build? ember-cli-typescript doesn’t actually use typescript to compile the files, just to type check them. Babel does the compilation. I’m not sure there’s an option for this, still reading the ember-cli-typescript source.

@jgwhite
Copy link
Contributor

jgwhite commented Sep 10, 2021

On a successful built, we get this output with DEBUG=ember-cli-typescript:*:

building... 
  ember-cli-typescript:typecheck-worker Typecheck starting +0ms
  ember-cli-typescript:typecheck-worker Typecheck complete (3 diagnostics) +9s

Followed by a bunch of other stuff. Trying to trigger a failing build to see if those messages are the same.

Note to self: still investigating ember-cli-typescript because it’s the only thing that uses stagehand.

@jgwhite
Copy link
Contributor

jgwhite commented Sep 10, 2021

I’m now in the delightful situation where about five builds in a row have succeeded, without changing anything other than the DEBUG setting 😬

@jgwhite
Copy link
Contributor

jgwhite commented Sep 10, 2021

OK, here we go:

$ ember build --environment=test
Building
Environment: test
building... 
  ember-cli-typescript:typecheck-worker Typecheck starting +0ms
  ember-cli-typescript:typecheck-worker Typecheck complete (3 diagnostics) +9s
node_modules/waypoint-pb/server_pb.d.ts:961:26 - error TS2694: Namespace '"/home/circleci/project/ui/node_modules/waypoint-pb/server_pb".Ref.Component' has no exported member 'Type'.

961     getType(): Component.Type;
                             ~~~~

node_modules/waypoint-pb/server_pb.d.ts:962:30 - error TS2694: Namespace '"/home/circleci/project/ui/node_modules/waypoint-pb/server_pb".Ref.Component' has no exported member 'Type'.

962     setType(value: Component.Type): Component;
                                 ~~~~

node_modules/waypoint-pb/server_pb.d.ts:977:23 - error TS2694: Namespace '"/home/circleci/project/ui/node_modules/waypoint-pb/server_pb".Ref.Component' has no exported member 'Type'.

977       type: Component.Type,
                          ~~~~

WARNING: Detected collisions between .js and .ts files of the same name. This can result in nondeterministic build output; see https://git.io/JvIwo for more information.
  - waypoint/services/flash-messages.{js,ts}
[BABEL] Note: The code generator has deoptimised the styling of /home/circleci/project/ui/node_modules/axe-core/axe.js as it exceeds the max of 500KB.
[BABEL] Note: The code generator has deoptimised the styling of /home/circleci/project/ui/node_modules/waypoint-pb/server_pb.js as it exceeds the max of 500KB.
events.js:291
      throw er; // Unhandled 'error' event
      ^

Error [ERR_IPC_CHANNEL_CLOSED]: Channel closed
    at ChildProcess.target.send (internal/child_process.js:680:16)
    at Object.sendMessage (/home/circleci/project/ui/node_modules/stagehand/lib/adapters/child-process.js:30:39)
    at CommandCoordinator.sendMessage (/home/circleci/project/ui/node_modules/stagehand/lib/command-coordinator.js:81:23)
    at CommandCoordinator.sendCommand (/home/circleci/project/ui/node_modules/stagehand/lib/command-coordinator.js:36:14)
    at Stagehand.<anonymous> (/home/circleci/project/ui/node_modules/stagehand/lib/stagehand.js:86:44)
    at Generator.next (<anonymous>)
    at /home/circleci/project/ui/node_modules/stagehand/lib/stagehand.js:7:71
    at new Promise (<anonymous>)
    at __awaiter (/home/circleci/project/ui/node_modules/stagehand/lib/stagehand.js:3:12)
    at Stagehand.call (/home/circleci/project/ui/node_modules/stagehand/lib/stagehand.js:82:16)
Emitted 'error' event on ChildProcess instance at:
    at internal/child_process.js:684:35
    at processTicksAndRejections (internal/process/task_queues.js:79:11) {
  code: 'ERR_IPC_CHANNEL_CLOSED'
}
error Command failed with exit code 1.

So for all the world it appears that type checking completed successfully, and quickly.

@jgwhite
Copy link
Contributor

jgwhite commented Sep 10, 2021

Nagging thought: what if the process is being killed by docker due to an out-of-memory error, and this stagehand stack trace just happens to be the wait that scenario manifests? i.e. maybe ember-cli-typescript is the wrong place to look. Good to rule it out, I guess.

@jgwhite
Copy link
Contributor

jgwhite commented Sep 10, 2021

Another avenue of investigation:
Bisecting is going to be rather painful, but maybe we could figure out when these errors starting happening from studying Circle’s data. Did we install a new dependency? Did the generated server_pb.js grow just enough to tip Babel over the edge in terms of memory consumption?

@jgwhite
Copy link
Contributor

jgwhite commented Sep 10, 2021

Note: the container is supposed to have 4Gb of memory. I find it hard to believe our build would consume that without a rampant memory leak.

@jgwhite
Copy link
Contributor

jgwhite commented Sep 10, 2021

On the “when did this start?” trail, we should look at the following PRs:

And see if we can produce and ember-build-tests failure on the commit immediately prior to each.

@jgwhite
Copy link
Contributor

jgwhite commented Sep 10, 2021

Another line of investigation:

Checking how much memory the build actually uses.

@jgwhite
Copy link
Contributor

jgwhite commented Sep 10, 2021

Here’s my script, monitoring memory usage over time for node processes:

cd ui
CLEAR_BROCCOLI_PERSISTENT_FILTER_CACHE=true yarn build:test >/dev/null 2>/dev/null &
while true
do
  date
  ps | grep node | egrep -o '[0-9]{3,}' | xargs -n 1 -I "{}" grep ^VmPeak /proc/{}/status
  sleep 1
done

@jgwhite
Copy link
Contributor

jgwhite commented Sep 10, 2021

@jgwhite
Copy link
Contributor

jgwhite commented Sep 10, 2021

Assuming I’ve done my data collecting and munging correctly (highly unlikely), we use a lot of memory during that build:

image

@jgwhite
Copy link
Contributor

jgwhite commented Sep 10, 2021

Here’s another using VmSize instead of VmPeak:

image

@jgwhite
Copy link
Contributor

jgwhite commented Sep 10, 2021

If these memory usage stats are to be believe, then @izaaklauer wisely notes:

We could probably just up the size of the circle vm, hopefully add something that logs or monitors the memory size so we can track it, and see if it’s stable or growing.

@jgwhite
Copy link
Contributor

jgwhite commented Sep 10, 2021

It’s the end of the week for me, and at this point in the investigation, I’m feeling fairly confident we’re dealing with a memory leak or runaway memory consumption by either a bug in one of our build dependencies or perhaps a bug caused by the confluence of our build dependencies.

Next step:
Go for a walk? 😬

@jgwhite
Copy link
Contributor

jgwhite commented Sep 10, 2021

Possible next step:
Take some heap snapshots. See if we can see where all the memory’s being used.

@jgwhite
Copy link
Contributor

jgwhite commented Sep 13, 2021

@gregone found a good guide to taking heap snapshots in Node:

ember-cli/ember-cli#9256 (comment)

@jgwhite
Copy link
Contributor

jgwhite commented Sep 13, 2021

@gregone
Copy link
Contributor

gregone commented Sep 13, 2021

Other things to try:

  • disable sourcemaps for test builds
  • ignore /lib files in babel

@gregone
Copy link
Contributor

gregone commented Sep 13, 2021

#2287 removes almost a minute from the build. We'd need to make sure it all still works for the prod build

@jgwhite
Copy link
Contributor

jgwhite commented Sep 13, 2021

@jgwhite
Copy link
Contributor

jgwhite commented Sep 13, 2021

Tried adding the script from above to a debug branch and got this (from current main):

image

So it’s using all the available real memory. My VmSize stats count virtual memory, so the numbers are higher.

@gregone
Copy link
Contributor

gregone commented Sep 14, 2021

Summary of what I found while experimenting with #2299 & #2297

  • we were maxing out the memory around 4 GB with a parallelization in 2 processes on a 4GB RAM docker container
  • upping the container size to 6GB also maxed out the container RAM
  • setting JOBS=1 brought the max memory usage down to ~= 5GB Bump the ember-build-tests resource class #2297 (comment) (parallelizing an ember build seems to only double memory usage, and has no impact on build time, even when running locally)

I updated #2297 to use JOBS=1 and use a medium+ docker container

@jgwhite
Copy link
Contributor

jgwhite commented Sep 27, 2021

#2355 also addresses this.

@krantzinator krantzinator removed the new label Jul 11, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
4 participants