self update job resets team permissions #91
Open · RichardBradley opened this issue Mar 31, 2021 · 15 comments
Labels: enhancement (New feature or request)

@RichardBradley (Contributor)

I have GitHub auth federation set up and a "main" team in Concourse that looks like this:

roles:
- name: owner
  local:
    users: ["admin"]
- name: pipeline-operator
  github:
    teams: ["myorg:myteam"]

Every time I run the "self update" job, it resets all the permissions, and I have to log in as the root admin user and re-apply my team's permissions from the YAML file.
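
(For reference, the manual re-apply step looks roughly like this; the fly target name, URL, and credentials are placeholders, and team.yml is the roles config above:)

$ fly -t ct login --team-name main --concourse-url https://concourse.example.com -u admin -p '<admin-password>'
$ fly -t ct set-team --team-name main --config team.yml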

I don't know if this is related to GitHub auth federation.

I feel like this is a bug and the self update should not change permissions.

@crsimmons (Contributor)

Yeah, we face a similar issue on our own Concourse, where we have GitHub auth on the main team. Main team auth is configured as part of the BOSH manifest when deploying Concourse, and we don't expose the flags through Control Tower. This means every deploy will apply the manifest and wipe custom main team auth. Auth on other teams shouldn't be impacted. Given the current implementation of Control Tower, this is expected behaviour.

I made concourse-mgmt in an attempt to create tooling for managing Concourse teams from Concourse. Our mitigation for this problem is to run a variation of that pipeline every 10 minutes that ensures team auth is set properly.
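
A rough sketch of the kind of commands such a job runs on its 10-minute timer (the actual pipeline is in the repo above; the Concourse URL and admin credentials here are placeholder variables, and team.yml is the desired roles config):

# fetch fly from the target Concourse so the CLI version always matches the server
$ curl -sSfL "$CONCOURSE_URL/api/v1/cli?arch=amd64&platform=linux" -o fly && chmod +x fly
$ ./fly -t ct login --concourse-url "$CONCOURSE_URL" --team-name main -u "$ADMIN_USERNAME" -p "$ADMIN_PASSWORD"
$ ./fly -t ct set-team --non-interactive --team-name main --config team.yml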

@RichardBradley (Contributor, Author)

We are now seeing this happen (i.e. all pipelines disappear but resetting the auth brings them back) much more frequently. This is happening at least once a week, even though our self update job has only been run twice in the past 3 months.

I don't really know how to diagnose or investigate this. Any suggestions would be gratefully received.

I haven't attempted to move to a non-"main" team, as I understand we will lose all our history. Perhaps it's worth taking that hit if the "main" team is not usable for a default install of control-tower.

@crsimmons (Contributor)

The Control Tower instance we use at EngineerBetter has all the pipelines in the main team with GitHub auth configured. I'm not aware of auth getting wiped outside of upgrades. We do run a pipeline that re-applies the team config every 10 minutes, though, so it might be hiding the issue.

In theory, if the GitHub auth config is getting stripped from the main team outside of Control Tower upgrades, then either BOSH is recreating the web instance (the main team as defined in the manifest only has basic auth) or it's a bug in Concourse. I guess you could check whether your web instances are getting restarted.
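
(With the BOSH director targeted, something along these lines would show process restarts and VM recreations; the deployment name concourse is an assumption here:)

$ bosh -d concourse instances --ps   # per-process state on each instance
$ bosh -d concourse vms --vitals     # VM uptime, load, and memory usage
$ bosh -d concourse events           # director events, including recreates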

@RichardBradley (Contributor, Author)

> We do run a pipeline that re-applies the team config every 10 minutes, though, so it might be hiding the issue.

We did that, but it has made things worse; the web machine is now being killed and restarted fairly frequently, making the web UI unusable.

We are attempting to investigate to see if we can figure out why. Any suggestions gratefully received!

@RichardBradley (Contributor, Author) commented Mar 16, 2022

Looks like OOM on the "web" machine is causing this restart loop.
We will try deploying a larger server with control-tower deploy --web-size medium.

Any idea why this might happen or how to stop it happening again? We're not intending to do anything unusual with control-tower and were hoping to not need to peek inside the black box.

2022-03-16T15:45:22.085786+00:00 8593cba8-5f6e-4d7e-95b1-012eee77b396 kernel: [ 1294.206443] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=runc-bpm-uaa.scope,mems_allowed=0,global_oom,task_memcg=/,task=influxd,pid=11559,uid=1000
2022-03-16T15:45:22.085787+00:00 8593cba8-5f6e-4d7e-95b1-012eee77b396 kernel: [ 1294.206465] Out of memory: Killed process 11559 (influxd) total-vm:4456212kB, anon-rss:1019316kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:7120kB oom_score_adj:0
2022-03-16T15:45:22.207469+00:00 8593cba8-5f6e-4d7e-95b1-012eee77b396 kernel: [ 1294.361999] oom_reaper: reaped process 11559 (influxd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

@crsimmons (Contributor)

We colocate InfluxDB and Grafana on the web VM for the out-of-the-box metrics. I guess it's possible that Concourse is producing a high volume of metrics, which is using up too much memory. I've also seen cases before where a frequent refresh rate on the Grafana dashboard slows down the web instance. I would expect that scaling up the web VM might resolve it.

@RichardBradley (Contributor, Author)

Thanks.

Increasing the instance size does seem to have helped so far. We'll keep an eye on it. I'll update here if we have anything further.

It's not ideal that the web machine enters a restart loop when under memory pressure. Ideally it would just run slower.

(Also, it's definitely not ideal that bosh auto restarting the web machine wipes the team permissions.)

(We don't use InfluxDB or Grafana. I think I asked on a different issue how to turn them off.)

@crsimmons (Contributor)

I added a flag last night that lets you opt out of deploying the colocated metrics stack. If you download the new release then you can deploy with --no-metrics to get rid of those extra processes.
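
(For example, assuming a deployment named ci in eu-west-1, and keeping the larger web VM from earlier:)

$ control-tower deploy --region eu-west-1 --web-size medium --no-metrics ci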

@beccar97
Thanks! Unfortunately, I tried using this release to deploy with the new flag and hit the following error:

Error getting CPI info:
  Executing external CPI command: '/home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/jobs/aws_cpi/bin/cpi':
    Running command: '/home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/jobs/aws_cpi/bin/cpi', stdout: '', stderr: 'bundler: failed to load command:/home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/bosh_aws_cpi/bin/aws_cpi (/home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/bosh_aws_cpi/bin/aws_cpi)
/home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/ruby-3.1.0-r0.81.0/lib/ruby/3.1.0/net/https.rb:23:in `require': cannot load such file -- openssl (LoadError)
Did you mean?  open3
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/ruby-3.1.0-r0.81.0/lib/ruby/3.1.0/net/https.rb:23:in `<top (required)>'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/bosh_aws_cpi/vendor/bundle/ruby/3.1.0/gems/aws-sdk-core-3.113.1/lib/seahorse/client/net_http/connection_pool.rb:5:in `require'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/bosh_aws_cpi/vendor/bundle/ruby/3.1.0/gems/aws-sdk-core-3.113.1/lib/seahorse/client/net_http/connection_pool.rb:5:in `<top (required)>'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/bosh_aws_cpi/vendor/bundle/ruby/3.1.0/gems/aws-sdk-core-3.113.1/lib/seahorse.rb:36:in `require_relative'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/bosh_aws_cpi/vendor/bundle/ruby/3.1.0/gems/aws-sdk-core-3.113.1/lib/seahorse.rb:36:in `<top (required)>'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/bosh_aws_cpi/vendor/bundle/ruby/3.1.0/gems/aws-sdk-core-3.113.1/lib/aws-sdk-core.rb:4:in `require'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/bosh_aws_cpi/vendor/bundle/ruby/3.1.0/gems/aws-sdk-core-3.113.1/lib/aws-sdk-core.rb:4:in `<top (required)>'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/bosh_aws_cpi/lib/cloud/aws.rb:5:in `require'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/bosh_aws_cpi/lib/cloud/aws.rb:5:in `<top (required)>'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/bosh_aws_cpi/bin/aws_cpi:7:in `require'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/bosh_aws_cpi/bin/aws_cpi:7:in `<top (required)>'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/ruby-3.1.0-r0.81.0/lib/ruby/site_ruby/3.1.0/bundler/cli/exec.rb:58:in `load'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/ruby-3.1.0-r0.81.0/lib/ruby/site_ruby/3.1.0/bundler/cli/exec.rb:58:in `kernel_load'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/ruby-3.1.0-r0.81.0/lib/ruby/site_ruby/3.1.0/bundler/cli/exec.rb:23:in `run'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/ruby-3.1.0-r0.81.0/lib/ruby/site_ruby/3.1.0/bundler/cli.rb:484:in `exec'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/ruby-3.1.0-r0.81.0/lib/ruby/site_ruby/3.1.0/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/ruby-3.1.0-r0.81.0/lib/ruby/site_ruby/3.1.0/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/ruby-3.1.0-r0.81.0/lib/ruby/site_ruby/3.1.0/bundler/vendor/thor/lib/thor.rb:392:in `dispatch'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/ruby-3.1.0-r0.81.0/lib/ruby/site_ruby/3.1.0/bundler/cli.rb:31:in `dispatch'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/ruby-3.1.0-r0.81.0/lib/ruby/site_ruby/3.1.0/bundler/vendor/thor/lib/thor/base.rb:485:in `start'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/ruby-3.1.0-r0.81.0/lib/ruby/site_ruby/3.1.0/bundler/cli.rb:25:in `start'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/ruby-3.1.0-r0.81.0/lib/ruby/gems/3.1.0/gems/bundler-2.3.5/exe/bundle:48:in `block in <top (required)>'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/ruby-3.1.0-r0.81.0/lib/ruby/site_ruby/3.1.0/bundler/friendly_errors.rb:103:in `with_friendly_errors'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/ruby-3.1.0-r0.81.0/lib/ruby/gems/3.1.0/gems/bundler-2.3.5/exe/bundle:36:in `<top (required)>'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/ruby-3.1.0-r0.81.0/bin/bundle:25:in `load'
        from /home/ssm-user/.bosh/installations/9db8d17f-127d-4d76-4d04-58408c85d780/packages/ruby-3.1.0-r0.81.0/bin/bundle:25:in `<main>'
':
      exit status 1

Exit code 

I found the same issue with the 0.18.0 and 0.18.1 releases, and had to go back to 0.17.30 to complete a successful deployment.

@crsimmons (Contributor)

Weird. That doesn't look like it should be related to anything in the new release(s). Sometimes the contents of the local ~/.bosh directory can get inexplicably broken. You could try deleting/renaming that directory and trying again.

Another possibility is that one of the BOSH prerequisites has gotten broken somehow on your machine.

@beccar97
Deleting that directory and updating all the prereqs fixed it, thanks! I noticed that although Grafana etc. are no longer running, which is great, there are still security group rules added in the -atc group for ports 3000, 8844 and 8443, all of which I believe are related to metrics. It would be nice if these rules weren't created when using the --no-metrics flag, since they are unneeded. Thanks for adding the flag; it's good to know we no longer have those processes using up space/memory unnecessarily :)

@crsimmons (Contributor)

I forgot about the firewall ports. I'll look into patching that out.

I'm glad you managed to get it deployed 😄. Why the ~/.bosh directory sometimes breaks is still a mystery to me even after all these years of working with BOSH...

@RichardBradley (Contributor, Author) commented Mar 23, 2022

> Increasing the instance size does seem to have helped so far. We'll keep an eye on it. I'll update here if we have anything further.

This does seem to have fixed things for us. Thanks for your help.

(The original issue at the top of this thread remains, AFAIK)

@crsimmons (Contributor)

I cut 0.18.2 over the weekend to remove the metrics ports from the firewall when metrics are disabled. FYI, ports 8844 and 8443 are CredHub and UAA respectively, so they are still required.

The original issue is more of a feature request to configure GitHub auth on the main team. I'll leave the issue open until that gets looked at.

crsimmons added the enhancement (New feature or request) label on Mar 28, 2022

@crsimmons (Contributor)

I just cut 0.19.0, which adds flags for configuring GitHub auth on the main team at deploy time. These settings should persist through web recreations.

A small note: the Concourse release options I chose to use only support setting the owner role on the main team. There is a more free-form option in the release where you can provide your own config, which would support configuring other roles, but I wasn't sure how to cleanly let users pass multiline strings to flags in Control Tower, so I left it out for now.
