Web node kept getting recreated. #184

Closed
huiz-psma opened this issue Apr 5, 2022 · 2 comments

Comments

@huiz-psma

huiz-psma commented Apr 5, 2022

Here is the output from `bosh events`. Eventually, I increased the web instance size from t3.small to t3.large, and it seems to be working for now. Any idea what could be wrong? I am running Control Tower 0.17.30. Thank you in advance.

`94 Tue Apr 5 01:04:31 UTC 2022 hm create alert dfd191e7-917a-478a-8a6f-5dc3f9885a38 - concourse - message: 'concourse has instances with timed out agents. Alert @ 2022-04-05 01:04:31 UTC, severity 2: concourse has instances with timed out agents' -

93 Tue Apr 5 01:04:31 UTC 2022 hm create alert e8b84dd6-5eaa-4305-b1c2-ca7f6b658db1 - concourse - message: 'Scan unresponsive VMs. Alert @ 2022-04-05 01:04:31 UTC, severity 4: Notifying Director to scan instances: web/83a6f51b-0edc-4e56-9b8f-89110458dc87; deployment: ''concourse''; 1 of 2 agents are unhealthy (50.0%)' -

92 Tue Apr 5 01:04:31 UTC 2022 hm create alert d4a2a87c-dac2-496f-9586-5f522918a30f - concourse web/83a6f51b-0edc-4e56-9b8f-89110458dc87 message: 'f7aaacac-06cb-4f48-a018-f65d5e521ce6 has timed out. Alert @ 2022-04-05 01:04:31 UTC, severity 2: f7aaacac-06cb-4f48-a018-f65d5e521ce6 has timed out' -

91 Tue Apr 5 01:02:45 UTC 2022 hm create alert 1649120565.135951534@f7aaacac-06cb-4f48-a018-f65d5e521ce6 - concourse web/83a6f51b-0edc-4e56-9b8f-89110458dc87 message: 'influxdb () - Does not exist - restart. Alert @ 2022-04-05 01:02:45 UTC, severity 1: process is not running' -

90 Tue Apr 5 00:51:42 UTC 2022 hm create alert 1649119902.322526585@f7aaacac-06cb-4f48-a018-f65d5e521ce6 - concourse web/83a6f51b-0edc-4e56-9b8f-89110458dc87 message: 'bosh-dns () - Does not exist - restart. Alert @ 2022-04-05 00:51:42 UTC, severity 1: process is not running' -

89 Tue Apr 5 00:48:31 UTC 2022 hm create alert 1649119711.1897288383@f7aaacac-06cb-4f48-a018-f65d5e521ce6 - concourse web/83a6f51b-0edc-4e56-9b8f-89110458dc87 message: 'influxdb () - PPID changed - alert. Alert @ 2022-04-05 00:48:31 UTC, severity 4: process PPID changed from 21741 to 1' -

88 Tue Apr 5 00:46:46 UTC 2022 hm create alert 1649119606.208047161@f7aaacac-06cb-4f48-a018-f65d5e521ce6 - concourse web/83a6f51b-0edc-4e56-9b8f-89110458dc87 message: 'influxdb () - Does not exist - restart. Alert @ 2022-04-05 00:46:46 UTC, severity 1: process is not running' -

87 Tue Apr 5 00:34:29 UTC 2022 hm create alert 1649118869.814008110@f7aaacac-06cb-4f48-a018-f65d5e521ce6 - concourse web/83a6f51b-0edc-4e56-9b8f-89110458dc87 message: 'bosh-dns () - Does not exist - restart. Alert @ 2022-04-05 00:34:29 UTC, severity 1: process is not running' -

86 Tue Apr 5 00:29:09 UTC 2022 hm create alert 1649118549.513588094@f7aaacac-06cb-4f48-a018-f65d5e521ce6 - concourse web/83a6f51b-0edc-4e56-9b8f-89110458dc87 message: 'influxdb () - PPID changed - alert. Alert @ 2022-04-05 00:29:09 UTC, severity 4: process PPID changed from 21367 to 1' -

85 Tue Apr 5 00:27:04 UTC 2022 hm create alert 1649118424.1360800243@f7aaacac-06cb-4f48-a018-f65d5e521ce6 - concourse web/83a6f51b-0edc-4e56-9b8f-89110458dc87 message: 'influxdb () - Does not exist - restart. Alert @ 2022-04-05 00:27:04 UTC, severity 1: process is not running' -

84 Tue Apr 5 00:13:50 UTC 2022 hm create alert 1649117630.1864908099@f7aaacac-06cb-4f48-a018-f65d5e521ce6 - concourse web/83a6f51b-0edc-4e56-9b8f-89110458dc87 message: 'bosh-dns () - Does not exist - restart. Alert @ 2022-04-05 00:13:50 UTC, severity 1: process is not running' -

83 Fri Apr 1 04:33:06 UTC 2022 hm create alert 1648787586.404132025@f7aaacac-06cb-4f48-a018-f65d5e521ce6 - concourse web/83a6f51b-0edc-4e56-9b8f-89110458dc87 message: 'influxdb () - PPID changed - alert. Alert @ 2022-04-01 04:33:06 UTC, severity 4: process PPID changed from 21553 to 1' -

82 Fri Apr 1 04:32:02 UTC 2022 hm create alert 1648787522.169114782@f7aaacac-06cb-4f48-a018-f65d5e521ce6 - concourse web/83a6f51b-0edc-4e56-9b8f-89110458dc87 message: 'influxdb () - Does not exist - restart. Alert @ 2022-04-01 04:32:02 UTC, severity 1: process is not running' -

81 Fri Apr 1 04:13:52 UTC 2022 hm create alert 1648786432.419022263@f7aaacac-06cb-4f48-a018-f65d5e521ce6 - concourse web/83a6f51b-0edc-4e56-9b8f-89110458dc87 message: 'bosh-dns () - Does not exist - restart. Alert @ 2022-04-01 04:13:52 UTC, severity 1: process is not running' -`

@crsimmons
Contributor

This looks like a duplicate of #91. My best guess is that the combination of the web process and the metrics processes was too much for the t3.small.

The options are either to scale up (as you did) or, if you don't use the included Grafana dashboard and metrics, to disable them with the `--no-metrics` flag on version 0.18.2+.
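To check which processes on the web node are being restarted most often, the alert messages in the `bosh events` output can be tallied per process. The following is a minimal sketch, assuming the message format shown in the events above; the function name `tally_process_alerts` and the regular expression are illustrative and not part of BOSH or Control Tower.

```python
import re
from collections import Counter

# Matches process alerts of the form seen in the events above, e.g.
# "message: 'influxdb () - Does not exist - restart. Alert ..."
# Assumption: process names contain only word characters and hyphens.
ALERT_RE = re.compile(
    r"message: '(?P<process>[\w-]+) \(\) - (?P<event>[^.]+?)\. Alert"
)


def tally_process_alerts(events_text):
    """Count (process, event) pairs found in pasted `bosh events` output."""
    counts = Counter()
    for match in ALERT_RE.finditer(events_text):
        counts[(match.group("process"), match.group("event"))] += 1
    return counts


if __name__ == "__main__":
    # A few messages copied from the events above, shortened to the message column.
    sample = "\n".join([
        "message: 'influxdb () - Does not exist - restart. Alert @ 2022-04-05 01:02:45 UTC'",
        "message: 'bosh-dns () - Does not exist - restart. Alert @ 2022-04-05 00:51:42 UTC'",
        "message: 'influxdb () - PPID changed - alert. Alert @ 2022-04-05 00:48:31 UTC'",
        "message: 'influxdb () - Does not exist - restart. Alert @ 2022-04-05 00:46:46 UTC'",
    ])
    for (process, event), n in tally_process_alerts(sample).most_common():
        print(f"{process}: {event} x{n}")
```

Health-monitor alerts that do not name a process, such as the "timed out agents" messages, are skipped by this pattern, so the tally covers only per-process restarts and PPID changes.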

@huiz-psma
Author

Thanks. I will close this issue, as the scaled-up web node has been doing fine so far.
