This repository has been archived by the owner on Oct 22, 2021. It is now read-only.

feat: remove cc_deployment_updater.
The cc_deployment_updater job has an incorrect readiness probe;
when it is in HA, it can get jammed and never become ready (see below).
Rather than properly fixing it (by moving it into a separate instance
group and marking it as active/passive), just drop the job completely as
it does not appear to be useful at all.

Details on the readiness probe problems:

The probe was unaware that the job is supposed to be active/passive; it
just checked the log to confirm that the instance had logged "Sleeping"
recently (the job sleeps about once every five seconds).  This only
happened to work on passive nodes because the test incorrectly succeeded
when an instance had never been active (jq happily accepted empty input
and returned success).  This means that if a pod was ever active and then
turned passive again (e.g. because it was being restarted and another
instance became active), its log would contain a stale line that matched
the probe, and the probe would start failing.
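
The failure mode described above can be sketched in Python. This is only an
illustrative model of the removed shell pipeline (tac | grep | jq), not code
from the repository: the function names are hypothetical, the log line
format beyond the "timestamp" field is an assumption, and the empty-input
success is modelled as the commit message describes it.

```python
import json
import re
from datetime import datetime, timezone


def last_sleep_age(log_lines, now):
    """Seconds since the newest 'Sleeping' entry, or None if there is none."""
    for line in reversed(log_lines):           # newest first, like `tac`
        if "Sleeping" in line:                 # like `grep --max-count=1 Sleeping`
            stamp = json.loads(line)["timestamp"]
            # Drop fractional seconds, as the jq gsub() call did.
            stamp = re.sub(r"\.[0-9]+Z$", "Z", stamp)
            when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
            return (now - when.replace(tzinfo=timezone.utc)).total_seconds()
    return None


def probe_as_shipped(log_lines, now):
    """Model of the removed readiness probe, including its accidental behaviour."""
    age = last_sleep_age(log_lines, now)
    if age is None:
        # No matching line: jq saw empty input and reported success,
        # so an instance that never logged "Sleeping" looked ready.
        return True
    # At most two 5-second cycles since the last "Sleeping" entry.
    return age < 10
```

An instance with no "Sleeping" line passes, an active instance with a recent
line passes, and an instance that was once active and is now passive fails
forever because its newest matching line only gets older.
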
mook-as committed Dec 1, 2020
1 parent 920a8e7 commit dfcc97d
Showing 5 changed files with 4 additions and 58 deletions.
39 changes: 0 additions & 39 deletions chart/assets/operations/instance_groups/database.yaml
@@ -78,13 +78,6 @@
path: /instance_groups/name=scheduler/jobs/name=cloud_controller_clock/properties/ccdb/ca_cert?
value: *pxc-cluster-ca

- - type: replace
- path: /instance_groups/name=scheduler/jobs/name=cc_deployment_updater/properties/ccdb/address?
- value: *pxc-cluster-address
- - type: replace
- path: /instance_groups/name=scheduler/jobs/name=cc_deployment_updater/properties/ccdb/ca_cert?
- value: *pxc-cluster-ca
-
{{- if .Values.features.credhub.enabled }}
- type: replace
path: /instance_groups/name=credhub/jobs/name=credhub/properties/credhub/data_storage/host?
@@ -353,38 +346,6 @@
path: /instance_groups/name=scheduler/jobs/name=cloud_controller_clock/properties/ccdb/ssl_verify_hostname?
value: {{ .Values.features.external_database.require_ssl }}

- - type: replace
- path: /instance_groups/name=scheduler/jobs/name=cc_deployment_updater/properties/ccdb/db_scheme
- value: *external_cc_database_scheme
- - type: replace
- path: /instance_groups/name=scheduler/jobs/name=cc_deployment_updater/properties/ccdb/port
- value: *external_cc_database_port
- - type: replace
- path: /instance_groups/name=scheduler/jobs/name=cc_deployment_updater/properties/ccdb/databases/tag=cc/name
- value: *external_cc_database_name
- - type: replace
- path: /instance_groups/name=scheduler/jobs/name=cc_deployment_updater/properties/ccdb/address?
- value: *external_cc_database_address
- {{- if not .Values.features.external_database.seed }}
- - type: replace
- path: /instance_groups/name=scheduler/jobs/name=cc_deployment_updater/properties/ccdb/roles/name=cloud_controller/password
- value: *external_cc_database_password
- - type: replace
- path: /instance_groups/name=scheduler/jobs/name=cc_deployment_updater/properties/ccdb/roles/name=cloud_controller/name
- value: *external_cc_database_username
- {{- end }}{{/* not .Values.features.external_database.seed */}}
- {{- if .Values.features.external_database.ca_cert }}
- - type: replace
- path: /instance_groups/name=scheduler/jobs/name=cc_deployment_updater/properties/ccdb/ca_cert?
- value: {{- toYaml .Values.features.external_database.ca_cert | indent 2 }}
- {{- else }}
- - type: remove
- path: /instance_groups/name=scheduler/jobs/name=cc_deployment_updater/properties/ccdb/ca_cert?
- {{- end }}
- - type: replace
- path: /instance_groups/name=scheduler/jobs/name=cc_deployment_updater/properties/ccdb/ssl_verify_hostname?
- value: {{ .Values.features.external_database.require_ssl }}
-
- type: replace
path: /instance_groups/name=diego-api/jobs/name=bbs/properties/diego/bbs/sql/db_driver
value: {{ .Values.features.external_database.type | quote }}
20 changes: 3 additions & 17 deletions chart/assets/operations/instance_groups/scheduler.yaml
@@ -6,23 +6,6 @@
exec:
command: ["pgrep", "--full", "clock:start"]

- - type: replace
- path: /instance_groups/name=scheduler/jobs/name=cc_deployment_updater/properties/quarks?/run/healthcheck/cc_deployment_updater
- value:
- readiness:
- exec:
- command:
- # We should sleep about once every 5 seconds; check that the last entry was no more than 2 cycles ago
- - /bin/sh
- - -c
- - >
- tac /var/vcap/sys/log/cc_deployment_updater/cc_deployment_updater.log
- | grep --max-count=1 Sleeping
- | jq -e '.timestamp | gsub(".[0-9]+Z$"; "Z") | fromdate | now - . | . < 10'
- liveness:
- exec:
- command: ["pgrep", "--full", "deployment_updater:start"]
-
- type: replace
path: /instance_groups/name=scheduler/jobs/name=statsd_injector/properties/quarks?/run/healthcheck/statsd_injector/readiness/exec/command
value: ["/bin/sh", "-c", "ss -nlu src localhost:8125 | grep :8125"]
@@ -102,6 +85,9 @@
protocol: TCP
internal: 9000

+ - type: remove
+ path: /instance_groups/name=scheduler/jobs/name=cc_deployment_updater
+
{{- if not .Values.features.eirini.enabled }}

{{- range $bytes := .Files.Glob "assets/operations/pre_render_scripts/scheduler_*" }}
1 change: 1 addition & 0 deletions chart/config/jobs.yaml
@@ -139,6 +139,7 @@ jobs:
routing-api:
'$default': 'features.routing_api.enabled'
scheduler:
+ cc_deployment_updater: false
cfdot:
processes: []
ssh_proxy: '!features.eirini.enabled'
1 change: 0 additions & 1 deletion chart/config/resources.yaml
@@ -149,7 +149,6 @@ resources:
rotate: {memory: {limit: 512, request: 192}}
router: 200
scheduler:
- cc_deployment_updater: 320
cloud_controller_clock: 512
singleton-blobstore:
blobstore:
1 change: 0 additions & 1 deletion chart/templates/_capi.tpl
@@ -28,7 +28,6 @@
{{- /* The buildpacks properties are only defined for the ng/worker/clock jobs */}}
{{- if not (hasPrefix "buildpacks" $property) }}
- {{- $_ := set $ig "cc_deployment_updater" "scheduler" }}
{{- /* XXX cc_route_syncer is not in cf-deployment; see CF-K8s-Networking */}}
{{- /* $_ := set $ig "cc_route_syncer" "???" */}}
{{- end }}