aegir has a hard time letting go sometimes #212

Closed
victorb opened this issue Apr 6, 2018 · 10 comments
Labels
kind/bug A bug in existing code (including security flaws)

Comments

@victorb
Member

victorb commented Apr 6, 2018

I frequently see on CI (Jenkins specifically) that sometimes aegir gets stuck waiting for the tests to finish when they already finished. See this example build: https://ci.ipfs.team/blue/organizations/jenkins/Multiformats%2Fjs-multihash/detail/master/5/pipeline/16

The tests actually finished after ~5 seconds, but aegir doesn't finish until ~130 seconds later, making the CI builds a lot slower than they should be.

Probably a process doesn't exit cleanly and needs to be force-killed, but there is a timeout before that force-kill happens.

I have seen this on a couple of different repositories and also on macOS workers, but it seems to happen mostly on Windows.

@victorb
Member Author

victorb commented May 3, 2018

Seen this happen on Linux and Windows now as well.

@victorb changed the title from "aegir on windows has a hard time letting go" to "aegir has a hard time letting go sometimes" May 3, 2018
@travisperson
Member

Pretty sure I've noticed this a few times on Linux as well; I didn't pay too much attention to it as I thought it was something weird I was doing.

victorb pushed a commit that referenced this issue May 4, 2018
Karma has some issues letting go after the test run. Ref karma-runner/karma#1788 and ampproject/amphtml#14814

This fix basically forces aegir to close after the karma tests have been successfully run, so instead of a test-run taking 13 seconds for the tests to run then 30 seconds for karma to force-close, it finishes in 14 seconds. 

Solves #212
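
A minimal sketch of the force-exit idea described above, not the actual change made in aegir. `runAndExit` is a hypothetical helper name, and the `exit` parameter is injectable only so the sketch can be exercised without ending the process.

```javascript
'use strict'

// Hypothetical helper: run the tests, then exit explicitly instead of
// waiting for lingering handles (open sockets, timers) to be released.
// `runTests` must return a promise that resolves to a numeric exit code.
async function runAndExit (runTests, exit = (code) => process.exit(code)) {
  let code
  try {
    code = await runTests()
  } catch (err) {
    // A crashed test run is reported and mapped to a failing exit code.
    console.error(err)
    code = 1
  }
  exit(code)
  return code
}

// Assumed usage with Karma's callback-style Server API:
//   const { Server } = require('karma')
//   runAndExit(() => new Promise((resolve) => new Server(config, resolve).start()))
```

The trade-off is that an explicit exit hides whatever is keeping the process alive, so it treats the symptom rather than the cause.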
@daviddias daviddias added status/ready Ready to be worked kind/bug A bug in existing code (including security flaws) labels May 30, 2018
@mkg20001
Contributor

The reason for that seems to be that the tests do not properly clean up (for example, if the swarm is never stopped, it listens for connections forever).
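
To illustrate with a hedged sketch: this uses a plain Node.js `net` server rather than a libp2p swarm, and `startServer` and `stopServer` are hypothetical helper names. A listening server is a handle that keeps the event loop alive until it is closed, so a suite that starts one has to close it in its teardown.

```javascript
'use strict'
const net = require('net')

// Start a TCP server on an OS-assigned free port (port 0).
// While it is listening, its handle keeps the process running.
function startServer () {
  return new Promise((resolve, reject) => {
    const server = net.createServer()
    server.once('error', reject)
    server.listen(0, '127.0.0.1', () => resolve(server))
  })
}

// Close the server so that its handle no longer keeps the process alive.
function stopServer (server) {
  return new Promise((resolve, reject) => {
    server.close((err) => (err ? reject(err) : resolve()))
  })
}
```

In a mocha suite, `startServer` would typically go in a `before` hook and `stopServer` in the matching `after` hook.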

@achingbrain
Member

Perhaps we could insert mafintosh/why-is-node-running into the test runner somehow and get it to log some output if the process doesn't exit a few seconds after the tests finish?
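
One way this could look, as a rough sketch rather than a proposed patch: `reportIfStillRunning` is a hypothetical helper, and the reporter is passed in so the caller can supply the function exported by why-is-node-running (its README asks for it to be the first `require` so it can track handles).

```javascript
'use strict'

// Hypothetical helper: once the tests have finished, wait `delayMs` and,
// if the process is still alive, call `report` to log what keeps it open.
// The timer is unref'd so the watchdog itself never keeps the process running.
function reportIfStillRunning (report, delayMs = 5000) {
  const timer = setTimeout(report, delayMs)
  timer.unref()
  return timer
}

// Assumed usage with why-is-node-running:
//   const whyIsNodeRunning = require('why-is-node-running') // first require
//   ...run the tests, then:
//   reportIfStillRunning(() => whyIsNodeRunning())
```

If the process exits normally before the delay elapses, the unref'd timer never fires and nothing is logged.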

mkg20001 added a commit to mkg20001/aegir that referenced this issue Jul 4, 2018
@victorb
Member Author

victorb commented Jul 4, 2018

Thanks @achingbrain and @mkg20001 for the tips! There seem to be quite a lot of pending promises (tested with commit e11dbf4), but this will hopefully help to track it down.

@victorb
Member Author

victorb commented Jul 4, 2018

The issue seems to be somewhere in karma: downgrading karma to version 0.13.9 solves the problem. Unless there is a specific reason to use a later version, the simple solution would be to lock ourselves to 0.13.9.

@victorb
Member Author

victorb commented Sep 24, 2018

I've been using karma 0.13.9 for quite some time and have not found any issues with existing projects. Would there be any objection if I make this downgrade and publish a new version?

The reason for the downgrade is that tests that should take 3 seconds sometimes take 2 minutes on Windows.

@mkg20001
Contributor

@victorbjelkholm What about PR #248? Does that solve any issues?

@victorb
Member Author

victorb commented Sep 24, 2018

@mkg20001 no, it gives a bunch of output about what could be causing the hang at the end, but it doesn't solve the problem itself.

@victorb
Member Author

victorb commented Sep 24, 2018

Here is one example of the output we get from why-is-node-running when it hangs.

4 handle(s) keeping the process running

# SIGNALWRAP
C:\projects\aegir\node_modules\signal-exit\index.js:122     - process.on(sig, sigListeners[sig])
C:\projects\aegir\node_modules\signal-exit\index.js:120     - signals = signals.filter(function (sig) {
C:\projects\aegir\node_modules\signal-exit\index.js:35      - load()
C:\projects\aegir\node_modules\loud-rejection\index.js:19   - onExit(function () {
C:\projects\aegir\node_modules\loud-rejection\register.js:2 - require('./')();

# SIGNALWRAP
C:\projects\aegir\node_modules\signal-exit\index.js:122     - process.on(sig, sigListeners[sig])
C:\projects\aegir\node_modules\signal-exit\index.js:120     - signals = signals.filter(function (sig) {
C:\projects\aegir\node_modules\signal-exit\index.js:35      - load()
C:\projects\aegir\node_modules\loud-rejection\index.js:19   - onExit(function () {
C:\projects\aegir\node_modules\loud-rejection\register.js:2 - require('./')();

# SIGNALWRAP
C:\projects\aegir\node_modules\signal-exit\index.js:122     - process.on(sig, sigListeners[sig])
C:\projects\aegir\node_modules\signal-exit\index.js:120     - signals = signals.filter(function (sig) {
C:\projects\aegir\node_modules\signal-exit\index.js:35      - load()
C:\projects\aegir\node_modules\loud-rejection\index.js:19   - onExit(function () {
C:\projects\aegir\node_modules\loud-rejection\register.js:2 - require('./')();

# SIGNALWRAP
C:\projects\aegir\node_modules\signal-exit\index.js:122     - process.on(sig, sigListeners[sig])
C:\projects\aegir\node_modules\signal-exit\index.js:120     - signals = signals.filter(function (sig) {
C:\projects\aegir\node_modules\signal-exit\index.js:35      - load()
C:\projects\aegir\node_modules\loud-rejection\index.js:19   - onExit(function () {
C:\projects\aegir\node_modules\loud-rejection\register.js:2 - require('./')();

# DNSCHANNEL
C:\projects\aegir\node_modules\karma\lib\utils\net-utils.js:23          - .listen(port, listenAddress)
C:\projects\aegir\node_modules\bluebird\js\release\debuggability.js:313 - executor(resolve, reject);
C:\projects\aegir\node_modules\bluebird\js\release\promise.js:483       - var r = this._execute(executor, function(value) {
C:\projects\aegir\node_modules\bluebird\js\release\promise.js:79        - this._resolveFromExecutor(executor);
C:\projects\aegir\node_modules\karma\lib\utils\net-utils.js:8           - return new Promise((resolve, reject) => {

# HTTPPARSER
(unknown stack trace)

# HTTPPARSER
(unknown stack trace)

# HTTPPARSER
(unknown stack trace)

# HTTPPARSER
(unknown stack trace)

# HTTPPARSER
(unknown stack trace)

# ZLIB
C:\projects\aegir\node_modules\ws\lib\PerMessageDeflate.js:345 - this._inflate = zlib.createInflateRaw({ windowBits });
C:\projects\aegir\node_modules\ws\lib\PerMessageDeflate.js:304 - this._decompress(data, fin, (err, result) => {
C:\projects\aegir\node_modules\async-limiter\index.js:43       - job(this._done);
C:\projects\aegir\node_modules\async-limiter\index.js:25       - this._run();
C:\projects\aegir\node_modules\ws\lib\PerMessageDeflate.js:303 - zlibLimiter.push((done) => {
C:\projects\aegir\node_modules\ws\lib\Receiver.js:343          - perMessageDeflate.decompress(data, this._fin, (err, buf) => {
C:\projects\aegir\node_modules\ws\lib\Receiver.js:328          - this.decompress(data);
C:\projects\aegir\node_modules\ws\lib\Receiver.js:165          - this.getData();
C:\projects\aegir\node_modules\ws\lib\Receiver.js:139          - this.startLoop();
C:\projects\aegir\node_modules\ws\lib\WebSocket.js:138         - this._receiver.add(data);

# Timeout
C:\projects\aegir\node_modules\engine.io\lib\socket.js:139             - self.pingTimeoutTimer = setTimeout(function () {
C:\projects\aegir\node_modules\engine.io\lib\socket.js:95              - this.setPingTimeout();
C:\projects\aegir\node_modules\engine.io\lib\transport.js:105          - this.emit('packet', packet);
C:\projects\aegir\node_modules\engine.io\lib\transports\polling.js:208 - self.onPacket(packet);
C:\projects\aegir\node_modules\engine.io-parser\lib\index.js:313       - var more = callback(packet, i + n, l);
C:\projects\aegir\node_modules\engine.io\lib\transports\polling.js:211 - parser.decodePayload(data, callback);
C:\projects\aegir\node_modules\engine.io\lib\transports\polling.js:171 - self.onData(chunks);

# HTTPPARSER
(unknown stack trace)

@ghost ghost removed the status/ready Ready to be worked label Feb 14, 2019
6 participants