PM2_HOME lost in webhook #2756

KaiSchwarz-cnic · 2017-03-08T11:12:21Z

Expected behaviour

PM2_HOME being as set as provided through pm2 startup -hp flag

Actual behaviour

PM2_HOME corresponds to /etc/.pm2 in child process (exec) and thus leading to EACCES error (which we can see the pm2 logs of our webhook app). If you need these logs too, let me know.

Steps to reproduce

So basically everything worked with older versions of pm2 (2.0.16) which we have installed on our old server infrastructure. Node.js v4.6.1 was installed there and also debian 8. That's at least what has changed, not sure if it plays a role in this issue.
PM2_HOME is getting considered correctly everythere, but no longer in our Webhook.
All our pm2 apps are getting started under user node/group node. home directory is then /home/node.
Our webhook reacts on gitlab notifications which we parse and in case it matches our needs, we update our app(s) and do a pm2 restart. Our webhook is realized as a separate PM2 app. Here the config.
On the old servers, we have not set PM2_HOME explicitely. But the above home directory is getting used and everything worked fine (including the webhook). On the new servers, PM2_HOME was also not set correctly, but as the webhook had problems, we also tried pm2 startup -hp accordingly with no success. So basically our question/issue plays around how to set PM2_HOME correctly and why it behaves different (is a bug of the old version as it worked or a bug of the new version as it is no longer working).
So where is that PM2_HOME getting lost exactly? As said our webhook app executes some deploy/build script if some criteria matched in the gitlab notification request. How does that look like/work? The webhook executes a shell command with optional options using require('child_process').exec. See below. Looks like PM2_HOME or even the home directory itself is no longer available in that child process.

Maybe it is already very help to describe a best practise on how to set PM2_HOME, so that we can go step by step through it to identify whats going wrong.

WEBHOOK pm2.json:

{
  "apps": [{
    "name": "WEBHOOK",
    "script": "app.js",
    "error_file": "/var/log/frontend/WEBHOOK/err.log",
    "out_file": "/var/log/frontend/WEBHOOK/out.log",
    "cwd": "/opt/fwebhook",
    "env": {
      "NODE_ENV": "production"
    },
    "merge_logs": true,
    "log_date_format": "YYYY-MM-DD HH:mm:ss Z",
    "autorestart": true,
    "restart_delay": "10000",
    "watch": false
  }]
}

deploy script example:

#!/bin/bash
git fetch origin --prune
git reset --hard origin/master
git clean -f -d
npm prune
npm install
pm2 restart pm2.json

child process exec command/options example:

  command: './deploy.sh master TCP_PRD',

  options: {
      cwd: '/opt/fnode',
      env: {
        'NODE_ENV': 'production'
      }
   }

If you need further inputs or in case something is unclear, feel free to ask. TIA!

Software versions used

OS         : debian 8
node.js    :  4.8.0
PM2        : 2.4.1

PM2 Log output

Use the command: tail --lines 50 ~/.pm2/pm2.log

2017-03-06 21:22:10: Starting execution sequence in -cluster mode- for app name:CP3PRD id:4
2017-03-06 21:22:10: pid=20114 msg=process killed
2017-03-06 21:22:10: Starting execution sequence in -cluster mode- for app name:CP3PRD id:5
2017-03-06 21:22:10: App name:CP3PRD id:4 online
2017-03-06 21:22:10: Stopping app:CP3PRD id:6
2017-03-06 21:22:10: App name:CP3PRD id:5 online
2017-03-06 21:22:10: App name:CP3PRD id:6 disconnected
2017-03-06 21:22:10: App [CP3PRD] with id [6] and pid [20163], exited with code [0] via signal [SIGINT]
2017-03-06 21:22:10: Stopping app:CP3PRD id:7
2017-03-06 21:22:10: App name:CP3PRD id:7 disconnected
2017-03-06 21:22:10: App [CP3PRD] with id [7] and pid [20169], exited with code [0] via signal [SIGINT]
2017-03-06 21:22:10: pid=20163 msg=process killed
2017-03-06 21:22:10: Starting execution sequence in -cluster mode- for app name:CP3PRD id:6
2017-03-06 21:22:10: pid=20169 msg=process killed
2017-03-06 21:22:10: Starting execution sequence in -cluster mode- for app name:CP3PRD id:7
2017-03-06 21:22:10: App name:CP3PRD id:6 online
2017-03-06 21:22:10: Stopping app:CP3PRD id:8
2017-03-06 21:22:10: App name:CP3PRD id:8 disconnected
2017-03-06 21:22:10: App [CP3PRD] with id [8] and pid [20213], exited with code [0] via signal [SIGINT]
2017-03-06 21:22:10: App name:CP3PRD id:7 online
2017-03-06 21:22:10: Stopping app:CP3PRD id:9
2017-03-06 21:22:11: App name:CP3PRD id:9 disconnected
2017-03-06 21:22:11: App [CP3PRD] with id [9] and pid [20223], exited with code [0] via signal [SIGINT]
2017-03-06 21:22:11: pid=20213 msg=process killed
2017-03-06 21:22:11: Starting execution sequence in -cluster mode- for app name:CP3PRD id:8
2017-03-06 21:22:11: App name:CP3PRD id:8 online
2017-03-06 21:22:11: pid=20223 msg=process killed
2017-03-06 21:22:11: Starting execution sequence in -cluster mode- for app name:CP3PRD id:9
2017-03-06 21:22:11: App name:CP3PRD id:9 online
2017-03-06 21:22:11: App [WEBHOOK] with id [10] and pid [4046], exited with code [0] via signal [SIGINT]
2017-03-06 21:22:21: Starting execution sequence in -fork mode- for app name:WEBHOOK id:10
2017-03-06 21:22:21: App name:WEBHOOK id:10 online
2017-03-07 13:39:42: App [WEBHOOK] with id [10] and pid [4707], exited with code [1] via signal [SIGINT]
2017-03-07 13:39:52: Starting execution sequence in -fork mode- for app name:WEBHOOK id:10
2017-03-07 13:39:52: App name:WEBHOOK id:10 online
2017-03-07 14:10:57: App [WEBHOOK] with id [10] and pid [26868], exited with code [1] via signal [SIGINT]
2017-03-07 14:11:07: Starting execution sequence in -fork mode- for app name:WEBHOOK id:10
2017-03-07 14:11:07: App name:WEBHOOK id:10 online
2017-03-07 17:14:17: App [WEBHOOK] with id [10] and pid [5733], exited with code [1] via signal [SIGINT]
2017-03-07 17:14:27: Starting execution sequence in -fork mode- for app name:WEBHOOK id:10
2017-03-07 17:14:27: App name:WEBHOOK id:10 online
2017-03-07 21:02:02: App [WEBHOOK] with id [10] and pid [5789], exited with code [1] via signal [SIGINT]
2017-03-07 21:02:12: Starting execution sequence in -fork mode- for app name:WEBHOOK id:10
2017-03-07 21:02:12: App name:WEBHOOK id:10 online
2017-03-07 21:13:21: App [WEBHOOK] with id [10] and pid [21778], exited with code [1] via signal [SIGINT]
2017-03-07 21:13:31: Starting execution sequence in -fork mode- for app name:WEBHOOK id:10
2017-03-07 21:13:31: App name:WEBHOOK id:10 online
2017-03-08 05:42:47: App [WEBHOOK] with id [10] and pid [25767], exited with code [1] via signal [SIGINT]
2017-03-08 05:42:57: Starting execution sequence in -fork mode- for app name:WEBHOOK id:10
2017-03-08 05:42:57: App name:WEBHOOK id:10 online

cf CONTRIBUTING.md for more informations

The text was updated successfully, but these errors were encountered:

vmarchaud · 2017-03-09T16:12:16Z

Could you check that you had only one daemon running ? ps aux | grep PM2

KaiSchwarz-cnic · 2017-03-09T16:57:54Z

node      7977  0.0  0.9 951344 41348 ?        Ssl  Mar03  36:51 PM2 v2.4.1: God Daemon (/home/node/.pm2)               
root     13916  0.0  0.6 907844 28664 ?        Ssl  Mar08   0:00 PM2 v2.4.1: God Daemon (/root/.pm2)                    
root     15499  0.0  0.0  11128  1032 pts/2    S+   16:56   0:00 grep --color=auto PM2

That points to multiple daemons I think. How to clean that up if that's the case?

vmarchaud · 2017-03-09T17:14:53Z

Just kill the daemon that doesn't old any process (should be the one under root)

KaiSchwarz-cnic · 2017-03-09T17:15:01Z

we have killed that root daemon, but still it fails as described
node 7977 0.0 0.9 952368 41436 ? Ssl Mar03 36:55 PM2 v2.4.1: God Daemon (/home/node/.pm2)

vmarchaud · 2017-03-09T17:18:14Z

Pretty sure the env variable didn't propagated to child process, could you try restarting them to pick-up the changes ?

KaiSchwarz-cnic · 2017-03-09T17:43:37Z

So what I did right now:

pm2 kill
pm2 start
made a change on master branch which triggered the webhook url being called
webhook reacted on that machine, here the log of the App

`
node@fcp3-live1:/opt/fnode$ pm2 log WEBHOOK
[TAILING] Tailing last 15 lines for [WEBHOOK] process (change the value with --lines option)
/var/log/frontend/WEBHOOK/out.log last 15 lines:
0|WEBHOOK | Start PM2 at boot
0|WEBHOOK | $ pm2 startup
0|WEBHOOK |
0|WEBHOOK | Daemonize Application
0|WEBHOOK | $ pm2 start
0|WEBHOOK |
0|WEBHOOK | Monitoring/APM solution
0|WEBHOOK | https://app.keymetrics.io/
0|WEBHOOK |
0|WEBHOOK | -------------
0|WEBHOOK |
0|WEBHOOK |
0|WEBHOOK | 2017-03-09 17:31:56 +00:00: fwebhook: listening on port: 8765
0|WEBHOOK | 2017-03-09 17:32:52 +00:00: fwebhook: Using config CP3_PRD
0|WEBHOOK | 2017-03-09 17:32:52 +00:00: fwebhook: branches allowed: master; requested: staging

For my understanding this means already PM2_HOME not being correctly set prior to the child process exec call. So leaves the question to me how this can be lost after PM2 restarts the app. For sure this leads then also to not having that environment in child process exec at end.

KaiSchwarz-cnic · 2017-03-14T13:45:15Z

@vmarchaud
After we dived deeply into this matter, looks like there had been a change in node.js core (lib/child_process.js) which broke our deployment process. Up to (including) version 4.6.1 they made an default fallback to process.env in child_process.exec if no options were given or no env was provided. This default fallback doesn't work any longer in versions >4.6.1. child_process.exec finally ends up in ChildProcess.spawn which is also documented to do that fallback. Not sure if the node.js api documentation is wrong there or if there's a bug. At least, I'll file a git issue on that after we're done with that issue on our side. I'll keep this thread updated for others if there are further informations worth to add.

soyuka · 2017-03-14T14:05:34Z

@Papakai can you link back the commit / issue? this might impact others! Thanks for the investigation!

KaiSchwarz-cnic · 2017-03-15T10:18:58Z

nodejs/node#11861

There came different things together in this issue: we changed our deployment process in the way to provide the NODE_ENV environment some time ago and this dropped out other process.env vars. We haven't noticed about it as our webhooks stopped restarting themselves and kept using the working one in memory. It left us the impression that node.js changed its behavior, but I assume it did not.
So the above node.js ticket is more about improving there, if applicable.

vmarchaud added the S: Need More Data label Mar 9, 2017

vmarchaud closed this as completed Mar 9, 2017

KaiSchwarz-cnic mentioned this issue Nov 20, 2017

wrong data in pm2 jlist property versioning #3276

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PM2_HOME lost in webhook #2756

PM2_HOME lost in webhook #2756

KaiSchwarz-cnic commented Mar 8, 2017 •

edited

Loading

vmarchaud commented Mar 9, 2017

KaiSchwarz-cnic commented Mar 9, 2017

vmarchaud commented Mar 9, 2017

KaiSchwarz-cnic commented Mar 9, 2017

vmarchaud commented Mar 9, 2017

KaiSchwarz-cnic commented Mar 9, 2017 •

edited

Loading

KaiSchwarz-cnic commented Mar 14, 2017 •

edited

Loading

soyuka commented Mar 14, 2017

KaiSchwarz-cnic commented Mar 15, 2017 •

edited

Loading

PM2_HOME lost in webhook #2756

PM2_HOME lost in webhook #2756

Comments

KaiSchwarz-cnic commented Mar 8, 2017 • edited Loading

Expected behaviour

Actual behaviour

Steps to reproduce

Software versions used

PM2 Log output

vmarchaud commented Mar 9, 2017

KaiSchwarz-cnic commented Mar 9, 2017

vmarchaud commented Mar 9, 2017

KaiSchwarz-cnic commented Mar 9, 2017

vmarchaud commented Mar 9, 2017

KaiSchwarz-cnic commented Mar 9, 2017 • edited Loading

KaiSchwarz-cnic commented Mar 14, 2017 • edited Loading

soyuka commented Mar 14, 2017

KaiSchwarz-cnic commented Mar 15, 2017 • edited Loading

KaiSchwarz-cnic commented Mar 8, 2017 •

edited

Loading

KaiSchwarz-cnic commented Mar 9, 2017 •

edited

Loading

KaiSchwarz-cnic commented Mar 14, 2017 •

edited

Loading

KaiSchwarz-cnic commented Mar 15, 2017 •

edited

Loading