Add time-limited retries to InstallPlan execution. #2090

Merged
4 changes: 3 additions & 1 deletion cmd/catalog/main.go
@@ -69,6 +69,8 @@ var (

profiling = flag.Bool(
"profiling", false, "serve profiling data (on port 8080)")

+ installPlanTimeout = flag.Duration("install-plan-retry-timeout", 1*time.Minute, "time since first attempt at which plan execution errors are considered fatal")
)

func init() {
@@ -173,7 +175,7 @@ func main() {
}

// Create a new instance of the operator.
- op, err := catalog.NewOperator(ctx, *kubeConfigPath, utilclock.RealClock{}, logger, *wakeupInterval, *configmapServerImage, *utilImage, *catalogNamespace, k8sscheme.Scheme)
+ op, err := catalog.NewOperator(ctx, *kubeConfigPath, utilclock.RealClock{}, logger, *wakeupInterval, *configmapServerImage, *utilImage, *catalogNamespace, k8sscheme.Scheme, *installPlanTimeout)
if err != nil {
log.Panicf("error configuring operator: %s", err.Error())
}
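
The new flag is an ordinary `flag.Duration`, so it accepts any string that `time.ParseDuration` understands (for example `90s` or `5m`). The sketch below shows that parsing in isolation. The helper `parseRetryTimeout` and the private `FlagSet` are assumptions made for illustration and are not part of this PR; only the flag name, default, and help text come from the diff above.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// parseRetryTimeout is a hypothetical helper (not from the PR). It registers
// the flag on a private FlagSet so the parsing can be exercised without
// touching the process-wide flags used by cmd/catalog/main.go.
func parseRetryTimeout(args []string) (time.Duration, error) {
	fs := flag.NewFlagSet("catalog", flag.ContinueOnError)
	timeout := fs.Duration("install-plan-retry-timeout", 1*time.Minute,
		"time since first attempt at which plan execution errors are considered fatal")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return *timeout, nil
}

func main() {
	d, err := parseRetryTimeout([]string{"--install-plan-retry-timeout=90s"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(d) // prints "1m30s"
}
```

With no arguments the helper returns the one-minute default, matching the default declared in the diff.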
18 changes: 18 additions & 0 deletions deploy/chart/crds/0000_50_olm_00-clusterserviceversions.crd.yaml
@@ -4527,6 +4527,19 @@ spec:
type: string
url:
type: string
+ relatedImages:
+ description: List any related images, or other container images that your Operator might require to perform their functions. This list should also include operand images as well. All image references should be specified by digest (SHA) and not by tag. This field is only used during catalog creation and plays no part in cluster runtime.
+ type: array
+ items:
+ type: object
+ required:
+ - image
+ - name
+ properties:
+ image:
+ type: string
+ name:
+ type: string
replaces:
description: The name of a CSV this one replaces. Should match the `metadata.Name` field of the old CSV.
type: string
@@ -4560,6 +4573,11 @@ spec:
type: object
additionalProperties:
type: string
+ skips:
+ description: The name(s) of one or more CSV(s) that should be skipped in the upgrade graph. Should match the `metadata.Name` field of the CSV that should be skipped. This field is only used during catalog creation and plays no part in cluster runtime.
+ type: array
+ items:
+ type: string
version:
description: OperatorVersion is a wrapper around semver.Version which supports correct marshaling to YAML and JSON.
type: string
7 changes: 7 additions & 0 deletions deploy/chart/crds/0000_50_olm_00-installplans.crd.yaml
@@ -206,6 +206,9 @@ spec:
type:
description: InstallPlanConditionType describes the state of an InstallPlan at a certain point as a whole.
type: string
+ message:
+ description: Message is a human-readable message containing detailed information that may be important to understanding why the plan has its current status.
+ type: string
phase:
description: InstallPlanPhase is the current status of a InstallPlan as a whole.
type: string
@@ -249,6 +252,10 @@ spec:
status:
description: StepStatus is the current status of a particular resource an in InstallPlan
type: string
+ startTime:
+ description: StartTime is the time when the controller began applying the resources listed in the plan to the cluster.
+ type: string
+ format: date-time
served: true
storage: true
subresources:
66 changes: 33 additions & 33 deletions deploy/ocp/values.yaml
@@ -18,17 +18,17 @@ olm:
kubernetes.io/os: linux
node-role.kubernetes.io/master: ""
tolerations:
- - key: node-role.kubernetes.io/master
- operator: Exists
- effect: NoSchedule
- - key: node.kubernetes.io/unreachable
- operator: Exists
- effect: NoExecute
- tolerationSeconds: 120
- - key: node.kubernetes.io/not-ready
- operator: Exists
- effect: NoExecute
- tolerationSeconds: 120
+ - key: node-role.kubernetes.io/master
+ operator: Exists
+ effect: NoSchedule
+ - key: node.kubernetes.io/unreachable
+ operator: Exists
+ effect: NoExecute
+ tolerationSeconds: 120
+ - key: node.kubernetes.io/not-ready
+ operator: Exists
+ effect: NoExecute
+ tolerationSeconds: 120
tlsCertPath: /var/run/secrets/serving-cert/tls.crt
tlsKeyPath: /var/run/secrets/serving-cert/tls.key
resources:
@@ -46,17 +46,17 @@ catalog:
kubernetes.io/os: linux
node-role.kubernetes.io/master: ""
tolerations:
- - key: node-role.kubernetes.io/master
- operator: Exists
- effect: NoSchedule
- - key: node.kubernetes.io/unreachable
- operator: Exists
- effect: NoExecute
- tolerationSeconds: 120
- - key: node.kubernetes.io/not-ready
- operator: Exists
- effect: NoExecute
- tolerationSeconds: 120
+ - key: node-role.kubernetes.io/master
+ operator: Exists
+ effect: NoSchedule
+ - key: node.kubernetes.io/unreachable
+ operator: Exists
+ effect: NoExecute
+ tolerationSeconds: 120
+ - key: node.kubernetes.io/not-ready
+ operator: Exists
+ effect: NoExecute
+ tolerationSeconds: 120
resources:
requests:
cpu: 10m
@@ -72,17 +72,17 @@ package:
kubernetes.io/os: linux
node-role.kubernetes.io/master: ""
tolerations:
- - key: node-role.kubernetes.io/master
- operator: Exists
- effect: NoSchedule
- - key: node.kubernetes.io/unreachable
- operator: Exists
- effect: NoExecute
- tolerationSeconds: 120
- - key: node.kubernetes.io/not-ready
- operator: Exists
- effect: NoExecute
- tolerationSeconds: 120
+ - key: node-role.kubernetes.io/master
+ operator: Exists
+ effect: NoSchedule
+ - key: node.kubernetes.io/unreachable
+ operator: Exists
+ effect: NoExecute
+ tolerationSeconds: 120
+ - key: node.kubernetes.io/not-ready
+ operator: Exists
+ effect: NoExecute
+ tolerationSeconds: 120
resources:
requests:
cpu: 10m