Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PM2_HOME lost in webhook #2756

Closed
KaiSchwarz-cnic opened this issue Mar 8, 2017 · 9 comments
Closed

PM2_HOME lost in webhook #2756

KaiSchwarz-cnic opened this issue Mar 8, 2017 · 9 comments

Comments

@KaiSchwarz-cnic
Copy link

KaiSchwarz-cnic commented Mar 8, 2017

Expected behaviour

PM2_HOME being as set as provided through pm2 startup -hp flag

Actual behaviour

PM2_HOME corresponds to /etc/.pm2 in child process (exec) and thus leading to EACCES error (which we can see the pm2 logs of our webhook app). If you need these logs too, let me know.

Steps to reproduce

So basically everything worked with older versions of pm2 (2.0.16) which we have installed on our old server infrastructure. Node.js v4.6.1 was installed there and also debian 8. That's at least what has changed, not sure if it plays a role in this issue.
PM2_HOME is getting considered correctly everythere, but no longer in our Webhook.
All our pm2 apps are getting started under user node/group node. home directory is then /home/node.
Our webhook reacts on gitlab notifications which we parse and in case it matches our needs, we update our app(s) and do a pm2 restart. Our webhook is realized as a separate PM2 app. Here the config.
On the old servers, we have not set PM2_HOME explicitely. But the above home directory is getting used and everything worked fine (including the webhook). On the new servers, PM2_HOME was also not set correctly, but as the webhook had problems, we also tried pm2 startup -hp accordingly with no success. So basically our question/issue plays around how to set PM2_HOME correctly and why it behaves different (is a bug of the old version as it worked or a bug of the new version as it is no longer working).
So where is that PM2_HOME getting lost exactly? As said our webhook app executes some deploy/build script if some criteria matched in the gitlab notification request. How does that look like/work? The webhook executes a shell command with optional options using require('child_process').exec. See below. Looks like PM2_HOME or even the home directory itself is no longer available in that child process.

Maybe it is already very help to describe a best practise on how to set PM2_HOME, so that we can go step by step through it to identify whats going wrong.

WEBHOOK pm2.json:

{
  "apps": [{
    "name": "WEBHOOK",
    "script": "app.js",
    "error_file": "/var/log/frontend/WEBHOOK/err.log",
    "out_file": "/var/log/frontend/WEBHOOK/out.log",
    "cwd": "/opt/fwebhook",
    "env": {
      "NODE_ENV": "production"
    },
    "merge_logs": true,
    "log_date_format": "YYYY-MM-DD HH:mm:ss Z",
    "autorestart": true,
    "restart_delay": "10000",
    "watch": false
  }]
}

deploy script example:

#!/bin/bash
git fetch origin --prune
git reset --hard origin/master
git clean -f -d
npm prune
npm install
pm2 restart pm2.json

child process exec command/options example:

  command: './deploy.sh master TCP_PRD',

  options: {
      cwd: '/opt/fnode',
      env: {
        'NODE_ENV': 'production'
      }
   }

If you need further inputs or in case something is unclear, feel free to ask. TIA!

Software versions used

OS         : debian 8
node.js    :  4.8.0
PM2        : 2.4.1

PM2 Log output

Use the command: tail --lines 50 ~/.pm2/pm2.log

2017-03-06 21:22:10: Starting execution sequence in -cluster mode- for app name:CP3PRD id:4
2017-03-06 21:22:10: pid=20114 msg=process killed
2017-03-06 21:22:10: Starting execution sequence in -cluster mode- for app name:CP3PRD id:5
2017-03-06 21:22:10: App name:CP3PRD id:4 online
2017-03-06 21:22:10: Stopping app:CP3PRD id:6
2017-03-06 21:22:10: App name:CP3PRD id:5 online
2017-03-06 21:22:10: App name:CP3PRD id:6 disconnected
2017-03-06 21:22:10: App [CP3PRD] with id [6] and pid [20163], exited with code [0] via signal [SIGINT]
2017-03-06 21:22:10: Stopping app:CP3PRD id:7
2017-03-06 21:22:10: App name:CP3PRD id:7 disconnected
2017-03-06 21:22:10: App [CP3PRD] with id [7] and pid [20169], exited with code [0] via signal [SIGINT]
2017-03-06 21:22:10: pid=20163 msg=process killed
2017-03-06 21:22:10: Starting execution sequence in -cluster mode- for app name:CP3PRD id:6
2017-03-06 21:22:10: pid=20169 msg=process killed
2017-03-06 21:22:10: Starting execution sequence in -cluster mode- for app name:CP3PRD id:7
2017-03-06 21:22:10: App name:CP3PRD id:6 online
2017-03-06 21:22:10: Stopping app:CP3PRD id:8
2017-03-06 21:22:10: App name:CP3PRD id:8 disconnected
2017-03-06 21:22:10: App [CP3PRD] with id [8] and pid [20213], exited with code [0] via signal [SIGINT]
2017-03-06 21:22:10: App name:CP3PRD id:7 online
2017-03-06 21:22:10: Stopping app:CP3PRD id:9
2017-03-06 21:22:11: App name:CP3PRD id:9 disconnected
2017-03-06 21:22:11: App [CP3PRD] with id [9] and pid [20223], exited with code [0] via signal [SIGINT]
2017-03-06 21:22:11: pid=20213 msg=process killed
2017-03-06 21:22:11: Starting execution sequence in -cluster mode- for app name:CP3PRD id:8
2017-03-06 21:22:11: App name:CP3PRD id:8 online
2017-03-06 21:22:11: pid=20223 msg=process killed
2017-03-06 21:22:11: Starting execution sequence in -cluster mode- for app name:CP3PRD id:9
2017-03-06 21:22:11: App name:CP3PRD id:9 online
2017-03-06 21:22:11: App [WEBHOOK] with id [10] and pid [4046], exited with code [0] via signal [SIGINT]
2017-03-06 21:22:21: Starting execution sequence in -fork mode- for app name:WEBHOOK id:10
2017-03-06 21:22:21: App name:WEBHOOK id:10 online
2017-03-07 13:39:42: App [WEBHOOK] with id [10] and pid [4707], exited with code [1] via signal [SIGINT]
2017-03-07 13:39:52: Starting execution sequence in -fork mode- for app name:WEBHOOK id:10
2017-03-07 13:39:52: App name:WEBHOOK id:10 online
2017-03-07 14:10:57: App [WEBHOOK] with id [10] and pid [26868], exited with code [1] via signal [SIGINT]
2017-03-07 14:11:07: Starting execution sequence in -fork mode- for app name:WEBHOOK id:10
2017-03-07 14:11:07: App name:WEBHOOK id:10 online
2017-03-07 17:14:17: App [WEBHOOK] with id [10] and pid [5733], exited with code [1] via signal [SIGINT]
2017-03-07 17:14:27: Starting execution sequence in -fork mode- for app name:WEBHOOK id:10
2017-03-07 17:14:27: App name:WEBHOOK id:10 online
2017-03-07 21:02:02: App [WEBHOOK] with id [10] and pid [5789], exited with code [1] via signal [SIGINT]
2017-03-07 21:02:12: Starting execution sequence in -fork mode- for app name:WEBHOOK id:10
2017-03-07 21:02:12: App name:WEBHOOK id:10 online
2017-03-07 21:13:21: App [WEBHOOK] with id [10] and pid [21778], exited with code [1] via signal [SIGINT]
2017-03-07 21:13:31: Starting execution sequence in -fork mode- for app name:WEBHOOK id:10
2017-03-07 21:13:31: App name:WEBHOOK id:10 online
2017-03-08 05:42:47: App [WEBHOOK] with id [10] and pid [25767], exited with code [1] via signal [SIGINT]
2017-03-08 05:42:57: Starting execution sequence in -fork mode- for app name:WEBHOOK id:10
2017-03-08 05:42:57: App name:WEBHOOK id:10 online

cf CONTRIBUTING.md for more informations

@vmarchaud
Copy link
Contributor

Could you check that you had only one daemon running ? ps aux | grep PM2

@KaiSchwarz-cnic
Copy link
Author

node      7977  0.0  0.9 951344 41348 ?        Ssl  Mar03  36:51 PM2 v2.4.1: God Daemon (/home/node/.pm2)               
root     13916  0.0  0.6 907844 28664 ?        Ssl  Mar08   0:00 PM2 v2.4.1: God Daemon (/root/.pm2)                    
root     15499  0.0  0.0  11128  1032 pts/2    S+   16:56   0:00 grep --color=auto PM2

That points to multiple daemons I think. How to clean that up if that's the case?

@vmarchaud
Copy link
Contributor

Just kill the daemon that doesn't old any process (should be the one under root)

@KaiSchwarz-cnic
Copy link
Author

we have killed that root daemon, but still it fails as described
node 7977 0.0 0.9 952368 41436 ? Ssl Mar03 36:55 PM2 v2.4.1: God Daemon (/home/node/.pm2)

@vmarchaud
Copy link
Contributor

Pretty sure the env variable didn't propagated to child process, could you try restarting them to pick-up the changes ?

@KaiSchwarz-cnic
Copy link
Author

KaiSchwarz-cnic commented Mar 9, 2017

So what I did right now:

  • pm2 kill
  • pm2 start
  • made a change on master branch which triggered the webhook url being called
  • webhook reacted on that machine, here the log of the App

`
node@fcp3-live1:/opt/fnode$ pm2 log WEBHOOK
[TAILING] Tailing last 15 lines for [WEBHOOK] process (change the value with --lines option)
/var/log/frontend/WEBHOOK/out.log last 15 lines:
0|WEBHOOK | Start PM2 at boot
0|WEBHOOK | $ pm2 startup
0|WEBHOOK |
0|WEBHOOK | Daemonize Application
0|WEBHOOK | $ pm2 start
0|WEBHOOK |
0|WEBHOOK | Monitoring/APM solution
0|WEBHOOK | https://app.keymetrics.io/
0|WEBHOOK |
0|WEBHOOK | -------------
0|WEBHOOK |
0|WEBHOOK |
0|WEBHOOK | 2017-03-09 17:31:56 +00:00: fwebhook: listening on port: 8765
0|WEBHOOK | 2017-03-09 17:32:52 +00:00: fwebhook: Using config CP3_PRD
0|WEBHOOK | 2017-03-09 17:32:52 +00:00: fwebhook: branches allowed: master; requested: staging

/var/log/frontend/WEBHOOK/err.log last 15 lines:
0|WEBHOOK | at Module.load (module.js:343:32)
0|WEBHOOK | fs.js:549
0|WEBHOOK | return binding.open(pathModule._makeLong(path), stringToFlags(flags), mode);
0|WEBHOOK | ^
0|WEBHOOK |
0|WEBHOOK | Error: ENOENT: no such file or directory, open '/etc/.pm2/pm2.log'
0|WEBHOOK | at Error (native)
0|WEBHOOK | at Object.fs.openSync (fs.js:549:18)
0|WEBHOOK | at Client.launchDaemon (/usr/lib/node_modules/pm2/lib/Client.js:208:14)
0|WEBHOOK | at /usr/lib/node_modules/pm2/lib/Client.js:101:10
0|WEBHOOK | at /usr/lib/node_modules/pm2/lib/Client.js:285:14
0|WEBHOOK | at nextTickCallbackWith0Args (node.js:489:9)
0|WEBHOOK | at process._tickCallback (node.js:418:13)
0|WEBHOOK |
0|WEBHOOK | 2017-03-09 17:32:52 +00:00: fwebhook: Wrong branch in request.
`

For my understanding this means already PM2_HOME not being correctly set prior to the child process exec call. So leaves the question to me how this can be lost after PM2 restarts the app. For sure this leads then also to not having that environment in child process exec at end.

@KaiSchwarz-cnic
Copy link
Author

KaiSchwarz-cnic commented Mar 14, 2017

@vmarchaud
After we dived deeply into this matter, looks like there had been a change in node.js core (lib/child_process.js) which broke our deployment process. Up to (including) version 4.6.1 they made an default fallback to process.env in child_process.exec if no options were given or no env was provided. This default fallback doesn't work any longer in versions >4.6.1. child_process.exec finally ends up in ChildProcess.spawn which is also documented to do that fallback. Not sure if the node.js api documentation is wrong there or if there's a bug. At least, I'll file a git issue on that after we're done with that issue on our side. I'll keep this thread updated for others if there are further informations worth to add.

@soyuka
Copy link
Collaborator

soyuka commented Mar 14, 2017

@Papakai can you link back the commit / issue? this might impact others! Thanks for the investigation!

@KaiSchwarz-cnic
Copy link
Author

KaiSchwarz-cnic commented Mar 15, 2017

nodejs/node#11861

There came different things together in this issue: we changed our deployment process in the way to provide the NODE_ENV environment some time ago and this dropped out other process.env vars. We haven't noticed about it as our webhooks stopped restarting themselves and kept using the working one in memory. It left us the impression that node.js changed its behavior, but I assume it did not.
So the above node.js ticket is more about improving there, if applicable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants