
Timeout while waiting for component deployment roll out #3256

Closed
prietyc123 opened this issue May 28, 2020 · 23 comments
Labels
area/devfile-spec Issues or PRs related to the Devfile specification and how odo handles and interprets it.
estimated-size/S (5-10) Rough sizing for Epics. Less than one sprint of work for one person.
flake Categorizes issue or PR as related to a flaky test.
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
priority/High Important issue; should be worked on before any other issues (except priority/Critical issue(s)).

Comments

@prietyc123
Contributor

/kind bug

What versions of software are you using?

Operating System:
All Supported

Output of odo version:
master

How did you run odo exactly?

Running test on OpenShift CI.

Actual behavior

Creating a new project: ccupvnlhzv
Running odo with args [odo project create ccupvnlhzv -w -v4]
[odo]  •  Waiting for project to come up  ...
[odo] I0527 11:06:20.874584   26071 occlient.go:542] Status of creation of project ccupvnlhzv is Active
[odo] I0527 11:06:20.874675   26071 occlient.go:547] Project ccupvnlhzv now exists
[odo] 
[...]
Running odo with args [odo preference set Experimental true]
[odo] Global preference was successfully updated
[odo] I0527 11:06:21.164827   26107 preference.go:188] The path for preference file is /tmp/023406818/config.yaml
[odo] I0527 11:06:21.165321   26107 odo.go:72] Could not get the latest release information in time. Never mind, exiting gracefully :)
Running odo with args [odo create nodejs --project ccupvnlhzv veqzkx]
[odo] I0527 11:06:21.363048   26123 preference.go:188] The path for preference file is /tmp/023406818/config.yaml
[odo] I0527 11:06:21.363239   26123 util.go:399] path ./.odo/config.yaml doesn't exist, skipping it
[odo] I0527 11:06:21.363248   26123 preference.go:188] The path for preference file is /tmp/023406818/config.yaml
[odo] I0527 11:06:21.368382   26123 preference.go:188] The path for preference file is /tmp/023406818/config.yaml
[odo] Experimental mode is enabled, use at your own risk
[odo] 
[odo] I0527 11:06:21.834436   26123 util.go:399] path devfile.yaml doesn't exist, skipping it
[odo] I0527 11:06:21.834485   26123 util.go:399] path devfile.yaml doesn't exist, skipping it
[odo] Validation
[odo]  •  Checking devfile compatibility  ...
[odo]  •  Creating a devfile component from registry: DefaultDevfileRegistry  ...
[odo] 
 ✓  Checking devfile compatibility [68177ns]
[odo] 
 ✓  Creating a devfile component from registry: DefaultDevfileRegistry [60910ns]
[odo] I0527 11:06:21.834600   26123 preference.go:188] The path for preference file is /tmp/023406818/config.yaml
[odo] I0527 11:06:21.834706   26123 util.go:399] path .odo/env/env.yaml doesn't exist, skipping it
[odo]  •  Validating devfile component  ...
[odo] I0527 11:06:21.834738   26123 preference.go:188] The path for preference file is /tmp/023406818/config.yaml
[odo] 
 ✓  Validating devfile component [121592ns]
[odo] I0527 11:06:21.834818   26123 preference.go:188] The path for preference file is /tmp/023406818/config.yaml
[odo] I0527 11:06:21.834878   26123 util.go:399] path devfile.yaml doesn't exist, skipping it
[odo] I0527 11:06:21.919039   26123 odo.go:72] Could not get the latest release information in time. Never mind, exiting gracefully :)
[odo] 
[odo] Please use `odo push` command to create the component with source deployed
Running odo with args [odo push --devfile devfile.yaml --project ccupvnlhzv]
[odo] I0527 11:06:22.096385   26179 preference.go:188] The path for preference file is /tmp/023406818/config.yaml
[odo] I0527 11:06:22.096682   26179 preference.go:188] The path for preference file is /tmp/023406818/config.yaml
[odo] I0527 11:06:22.409388   26179 preference.go:188] The path for preference file is /tmp/023406818/config.yaml
[odo] I0527 11:06:22.409521   26179 preference.go:188] The path for preference file is /tmp/023406818/config.yaml
[odo] I0527 11:06:22.409582   26179 preference.go:188] The path for preference file is /tmp/023406818/config.yaml
[odo] I0527 11:06:22.409676   26179 context.go:48] absolute devfile path: '/tmp/023406818/devfile.yaml'
[odo] I0527 11:06:22.410142   26179 content.go:32] converted devfile YAML to JSON
[odo] I0527 11:06:22.410388   26179 apiVersion.go:35] devfile apiVersion: '1.0.0'
[odo] I0527 11:06:22.410404   26179 context.go:64] devfile apiVersion '1.0.0' is supported in odo
[odo] I0527 11:06:22.413106   26179 schema.go:47] validated devfile schema
[odo] I0527 11:06:22.413657   26179 preference.go:188] The path for preference file is /tmp/023406818/config.yaml
[odo] I0527 11:06:22.413734   26179 preference.go:188] The path for preference file is /tmp/023406818/config.yaml
[odo] I0527 11:06:22.417954   26179 preference.go:188] The path for preference file is /tmp/023406818/config.yaml
[odo] I0527 11:06:22.418055   26179 preference.go:188] The path for preference file is /tmp/023406818/config.yaml
[odo] I0527 11:06:22.495315   26179 command.go:44] The command "devinit" was not found in the devfile
[odo] I0527 11:06:22.495346   26179 command.go:152] No init command was provided
[odo] I0527 11:06:22.495369   26179 command.go:53] Validating actions for command: devbuild 
[odo] I0527 11:06:22.495575   26179 utils.go:102] Found component "dockerimage" with alias "runtime"
[odo] I0527 11:06:22.495592   26179 command.go:60] Action 1 maps to component runtime
[odo] I0527 11:06:22.495606   26179 command.go:169] Build command: devbuild
[odo] I0527 11:06:22.495621   26179 command.go:53] Validating actions for command: devrun 
[odo] I0527 11:06:22.495628   26179 utils.go:102] Found component "dockerimage" with alias "runtime"
[odo] I0527 11:06:22.495636   26179 command.go:60] Action 1 maps to component runtime
[odo] I0527 11:06:22.495644   26179 command.go:176] Run command: devrun
[odo] I0527 11:06:22.495682   26179 utils.go:102] Found component "dockerimage" with alias "runtime"
[odo] I0527 11:06:22.495962   26179 command.go:53] Validating actions for command: devrun 
[odo] I0527 11:06:22.495975   26179 utils.go:102] Found component "dockerimage" with alias "runtime"
[odo] I0527 11:06:22.495984   26179 command.go:60] Action 1 maps to component runtime
[odo] I0527 11:06:22.496000   26179 utils.go:125] Updating container runtime entrypoint with supervisord
[odo] I0527 11:06:22.496007   26179 utils.go:131] Updating container runtime with supervisord volume mounts
[odo] I0527 11:06:22.496015   26179 utils.go:141] Updating container runtime env with run command
[odo] I0527 11:06:22.496030   26179 utils.go:150] Updating container runtime env with run command's workdir
[odo] I0527 11:06:22.496051   26179 utils.go:102] Found component "dockerimage" with alias "runtime"
[odo] I0527 11:06:22.496094   26179 adapter.go:224] Creating deployment veqzkx
[odo] I0527 11:06:22.496106   26179 adapter.go:225] The component name is veqzkx
[odo] 
[odo] Validation
[odo]  •  Validating the devfile  ...
[odo] 
 ✓  Validating the devfile [391506ns]
[odo] 
[odo] Creating Kubernetes resources for component veqzkx
[odo] I0527 11:06:22.655760   26179 adapter.go:269] Successfully created component veqzkx
[odo]  •  Waiting for component to start  ...
[odo] I0527 11:06:22.740780   26179 adapter.go:278] Successfully created Service for component veqzkx
[odo] I0527 11:06:22.740825   26179 deployments.go:47] Waiting for veqzkx deployment rollout
[odo] I0527 11:06:22.812496   26179 deployments.go:81] Waiting for deployment "veqzkx" rollout to finish: 0 of 1 updated replicas are available...
[odo] I0527 11:06:22.812528   26179 deployments.go:88] Waiting for deployment spec update to be observed...
[odo]  ✗  Failed to start component with name veqzkx.
[odo] Error: Failed to create the component: error while waiting for deployment rollout: timeout while waiting for veqzkx deployment roll out
[odo]  ✗  Waiting for component to start [5m]
Deleting project: ccupvnlhzv
Running odo with args [odo project delete ccupvnlhzv -f]
[odo] I0527 11:11:24.056570   27915 application.go:49] Unable to list Service Catalog instances: unable to list ServiceInstances: serviceinstances.servicecatalog.k8s.io is forbidden: User "developer" cannot list resource "serviceinstances" in API group "servicecatalog.k8s.io" in the namespace "ccupvnlhzv"
[odo]  ⚠  Warning! Projects are deleted from the cluster asynchronously. Odo does its best to delete the project. Due to multi-tenant clusters, the project may still exist on a different node.
[odo] I0527 11:11:24.144572   27915 odo.go:72] Could not get the latest release information in time. Never mind, exiting gracefully :)
[odo]  ✓  Deleted project : ccupvnlhzv
Setting current dir to: /go/src/github.com/openshift/odo/tests/integration/devfile
Deleting dir: /tmp/023406818
• Failure [305.619 seconds]
odo devfile delete command tests
/go/src/github.com/openshift/odo/tests/integration/devfile/cmd_devfile_delete_test.go:14
  when devfile delete command is executed with all flag
  /go/src/github.com/openshift/odo/tests/integration/devfile/cmd_devfile_delete_test.go:67
    should delete the component created from the devfile and also the env and odo folders and the odo-index-file.json file [It]
[... 14 lines skipped ...]
[Fail] odo devfile delete command tests when devfile delete command is executed with all flag [It] should delete the component created from the devfile and also the env and odo folders and the odo-index-file.json file 
/go/src/github.com/openshift/odo/tests/helper/helper_run.go:34

Expected behavior

Component should get deployed.

Any logs, error output, etc?

Log : https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/pr-logs/pull/openshift_odo/3041/pull-ci-openshift-odo-master-v4.3-integration-e2e/682#1:build-log.txt%3A484

@openshift-ci-robot openshift-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label May 28, 2020
@amitkrout amitkrout added the flake Categorizes issue or PR as related to a flaky test. label Jun 24, 2020
@prietyc123
Contributor Author

prietyc123 commented Jul 10, 2020

I can see the same failure while executing `make test-cmd-devfile-push` on the Windows PSI 3.11 cluster.

> make test-cmd-devfile-push

Running Suite: Devfile Suite
============================
Random Seed: 1594371595
Will run 1 of 95 specs

SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSCreated dir: C:\Users\Admin\AppData\Local\Temp\413904003
Running odo.exe with args [odo preference set Experimental true]
[odo] Global preference was successfully updated
Setting KUBECONFIG=C:\Users\Admin\AppData\Local\Temp\413904003\config
Creating a new project: qpejiarkkf
Running odo.exe with args [odo project create qpejiarkkf -w -v4]
[odo]  -  Waiting for project to come up  ...
[odo] I0710 09:00:18.844461    5920 occlient.go:542] Status of creation of project qpejiarkkf is Active
[odo] I0710 09:00:18.904421    5920 occlient.go:547] Project qpejiarkkf now exists
[odo] I0710 09:00:18.917481    5920 occlient.go:582] Status of creation of service account &ServiceAccount{ObjectMeta:{default  qpejiarkkf /api/v1/namespaces/qpejiarkkf/serviceaccounts/default fb93c32e-c28a-11ea-a594-fa163e9b6334 19425748 0 2020-07-10 08:54:37 +0000 GMT <nil> <nil> map[] map[] [] []  []},Secrets:[]ObjectReference{ObjectReference{Kind:,Namespace:,Name:default-token-w62jn,UID:,APIVersion:,ResourceVersion:,FieldPath:,},ObjectReference{Kind:,Namespace:,Name:default-dockercfg-rls82,UID:,APIVersion:,ResourceVersion:,FieldPath:,},},ImagePullSecrets:[]LocalObjectReference{LocalObjectReference{Name:default-dockercfg-rls82,},},AutomountServiceAccountToken:nil,} is ready
 V  Waiting for project to come up [809ms]
[odo]  V  Project 'qpejiarkkf' is ready for use
[odo]  V  New project created and now using project: qpejiarkkf
[odo] I0710 09:00:18.969447    5920 odo.go:72] Could not get the latest release information in time. Never mind, exiting gracefully :)
Current working dir: C:\Users\Admin\go\src\github.com\openshift\odo\tests\integration\devfile
Setting current dir to: C:\Users\Admin\AppData\Local\Temp\413904003
Running odo.exe with args [odo create java-springboot --project qpejiarkkf praybh]
[odo] Experimental mode is enabled, use at your own risk
[odo]
[odo] Validation
 V  Checking devfile existence [0ns]
 V  Checking devfile compatibility [2ms]
 V  Creating a devfile component from registry: DefaultDevfileRegistry [1ms]
 V  Validating devfile component [2ms]
[odo]
[odo] Please use `odo push` command to create the component with source deployed
Running odo.exe with args [odo push --namespace qpejiarkkf]
[odo]
[odo] Validation
 V  Validating the devfile [999500ns]
[odo]
[odo] Creating Kubernetes resources for component praybh
 X  Waiting for component to start [5m]
[odo]  X  Failed to start component with name praybh. Error: Failed to create the component: error while waiting for deployment rollout: timeout while waiting for praybh deployment roll out
Deleting project: qpejiarkkf
Running odo.exe with args [odo project delete qpejiarkkf -f]
[odo]  V  Deleted project : qpejiarkkf
[odo]  !  Warning! Projects are deleted from the cluster asynchronously. Odo does its best to delete the project. Due to multi-tenant clusters, the project 
may still exist on a different node.
Setting current dir to: C:\Users\Admin\go\src\github.com\openshift\odo\tests\integration\devfile
Deleting dir: C:\Users\Admin\AppData\Local\Temp\413904003

------------------------------
+ Failure [303.801 seconds]
odo devfile push command tests
C:/Users/Admin/go/src/github.com/openshift/odo/tests/integration/devfile/cmd_devfile_push_test.go:16
  Verify devfile push works
  C:/Users/Admin/go/src/github.com/openshift/odo/tests/integration/devfile/cmd_devfile_push_test.go:74
    should only execute devinit command once if component is already created in v1 devfiles [It]
    C:/Users/Admin/go/src/github.com/openshift/odo/tests/integration/devfile/cmd_devfile_push_test.go:285

    No future change is possible.  Bailing out early after 300.500s.
    Running odo.exe with args [odo push --namespace qpejiarkkf]
    Expected
        <int>: 1
    to match exit code:
        <int>: 0

    C:/Users/Admin/go/src/github.com/openshift/odo/tests/helper/helper_run.go:34
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS

Summarizing 1 Failure:

[Fail] odo devfile push command tests Verify devfile push works [It] should only execute devinit command once if component is already created in v1 devfiles
C:/Users/Admin/go/src/github.com/openshift/odo/tests/helper/helper_run.go:34

@prietyc123 prietyc123 added area/devfile-spec Issues or PRs related to the Devfile specification and how odo handles and interprets it. priority/Medium Nice to have issue. Getting it done before priority changes would be great. labels Jul 10, 2020
@kadel kadel removed the kind/bug Categorizes issue or PR as related to a bug. label Aug 26, 2020
@prietyc123
Contributor Author

Hitting this more frequently for the operator hub `odo push`. In https://gist.github.com/jgwest/5b28687514970ae6baa5942ee82d5755 I can see 28 occurrences of this issue, which looks like a critical flake to me. I tried reproducing it locally with a number of runs but failed; however, I have opened PR #3906 to get more detailed information on the deployment rollout.

From the logs it is clear that during the deployment the minimum number of replicas (1 in our case) is not being maintained, but why that happens needs more investigation. I am unsure about the cause, since there could be many possible reasons, which I am trying to narrow down through #3906.

@kadel @girishramnani Can you please share your thoughts as well?
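
For reference, a minimal sketch (not odo's actual implementation; the helper name and its wiring are hypothetical) of how the rollout state could be polled with client-go to dump the Deployment conditions when the minimum replica count is not reached. It assumes a configured clientset and client-go v0.18+ (where Get takes a context):

package helper

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// dumpRolloutStatus polls the Deployment until it is fully rolled out or the
// timeout expires, printing its status conditions on every iteration.
func dumpRolloutStatus(clientset kubernetes.Interface, namespace, name string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		dep, err := clientset.AppsV1().Deployments(namespace).Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		// Desired replica count (defaulted to 1 when unset, as in our devfile case).
		replicas := int32(1)
		if dep.Spec.Replicas != nil {
			replicas = *dep.Spec.Replicas
		}
		// Rolled out: the spec change was observed and all updated replicas are available.
		if dep.Status.ObservedGeneration >= dep.Generation &&
			dep.Status.UpdatedReplicas == replicas &&
			dep.Status.AvailableReplicas == replicas {
			return nil
		}
		// The Available/Progressing/ReplicaFailure conditions usually say why the
		// minimum replica count is not being maintained.
		for _, c := range dep.Status.Conditions {
			fmt.Printf("condition %s=%s reason=%q message=%q\n", c.Type, c.Status, c.Reason, c.Message)
		}
		time.Sleep(2 * time.Second)
	}
	return fmt.Errorf("timeout while waiting for %s deployment roll out", name)
}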

@prietyc123
Contributor Author

/priority high

@openshift-ci-robot openshift-ci-robot added the priority/High Important issue; should be worked on before any other issues (except priority/Critical issue(s)). label Sep 8, 2020
@prietyc123 prietyc123 removed the priority/Medium Nice to have issue. Getting it done before priority changes would be great. label Sep 8, 2020
@prietyc123
Contributor Author

Fixed via PR openshift/release#11695

/close

@prietyc123
Contributor Author

Initially we suspected the deployment rollout timeout was due to a cluster configuration with low resources, and surprisingly we didn't face this issue for a long time after the fix in openshift/release#11695.
Now we are suddenly facing the same issue again:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-odo-master-v4.4-integration-e2e-periodic/1316166853931831296#1:build-log.txt%3A1331 and I think this should be looked into from a different angle. I suspect there might be an issue with the service account, but I am not sure about it.

@sarveshtamba
Contributor

@prietyc123 Any updates on this issue? We are facing this issue on the Power platform too.

@prietyc123
Contributor Author

prietyc123 commented Feb 9, 2021

Constantly observing this on PSI fedora VM

I0209 06:58:30.626844  206136 events.go:52] Warning Event: Count: 5, Reason: Failed, Message: Error: ImagePullBackOff
[ssh:Fedora 32] [odo]  ✗  Waiting for component to start [5m] [WARNING x5: Failed]
[ssh:Fedora 32] [odo]  ✗  Failed to start component with name priylt. Error: Failed to create the component: error while waiting for deployment rollout: timeout while waiting for priylt deployment roll out\nFor more information to help determine the cause of the error, re-run with '-v'.
[ssh:Fedora 32] [odo] See below for a list of failed events that occured more than 5 times during deployment:
[ssh:Fedora 32] [odo] 
[ssh:Fedora 32] [odo]  NAME                                      COUNT  REASON  MESSAGE                 
[ssh:Fedora 32] [odo] 
[ssh:Fedora 32] [odo]  priylt-7995d85b64-9z982.166201e8f991f2d3  5      Failed  Error: ImagePullBackOff 

Detailed log https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_odo/4416/pull-ci-openshift-odo-master-v4.6-integration-e2e/1359004414869770240#1:build-log.txt%3A1531

@dharmit
Member

dharmit commented Mar 1, 2021

Constantly observing this on PSI fedora VM

I0209 06:58:30.626844  206136 events.go:52] Warning Event: Count: 5, Reason: Failed, Message: Error: ImagePullBackOff
[ssh:Fedora 32] [odo]  ✗  Waiting for component to start [5m] [WARNING x5: Failed]
[ssh:Fedora 32] [odo]  ✗  Failed to start component with name priylt. Error: Failed to create the component: error while waiting for deployment rollout: timeout while waiting for priylt deployment roll out\nFor more information to help determine the cause of the error, re-run with '-v'.
[ssh:Fedora 32] [odo] See below for a list of failed events that occured more than 5 times during deployment:
[ssh:Fedora 32] [odo] 
[ssh:Fedora 32] [odo]  NAME                                      COUNT  REASON  MESSAGE                 
[ssh:Fedora 32] [odo] 
[ssh:Fedora 32] [odo]  priylt-7995d85b64-9z982.166201e8f991f2d3  5      Failed  Error: ImagePullBackOff 

Detailed log https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_odo/4416/pull-ci-openshift-odo-master-v4.6-integration-e2e/1359004414869770240#1:build-log.txt%3A1531

#4454

@prietyc123
Contributor Author

Constant failure on PSI

[odo]  ✗  Failed to start component with name "bwwrxa". Error: Failed to create the component: error while waiting for deployment rollout: timeout while waiting for bwwrxa deployment roll out

[ssh:macOS 11.2] Running oc with args [oc describe pods -n cmd-devfile-push-test712aaq]

[ssh:macOS 11.2] [oc] Name:           bwwrxa-67b67c8449-n5292

[ssh:macOS 11.2] [oc] Namespace:      cmd-devfile-push-test712aaq
[...]
[ssh:macOS 11.2] [oc] Status:         Pending
[...]

[oc] Events:

[ssh:macOS 11.2] [oc]   Type     Reason              Age                   From                     Message

[ssh:macOS 11.2] [oc]   ----     ------              ----                  ----                     -------

[ssh:macOS 11.2] [oc]   Normal   Scheduled           4m53s                 default-scheduler        Successfully assigned cmd-devfile-push-test712aaq/bwwrxa-67b67c8449-n5292 to testocp47-b6pfr-worker-0-mjl2x

[ssh:macOS 11.2] [oc]   Warning  FailedAttachVolume  3m3s (x4 over 6m24s)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-e4925954-666f-4c06-88b6-2cd59fc6f201" : Volume "8a880223-6f68-436b-a6a9-c4772382dbfa" failed to be attached within the alloted time

[ssh:macOS 11.2] [oc]   Warning  FailedMount         37s (x2 over 2m51s)   kubelet                  Unable to attach or mount volumes: unmounted volumes=[m2-bwwrxa-ryoa-vol], unattached volumes=[odo-supervisord-shared-data default-token-tm479 odo-projects m2-bwwrxa-ryoa-vol]: timed out waiting for the condition

@prietyc123
Contributor Author

Debugging more via pr #4748

@dharmit dharmit added the estimated-size/S (5-10) Rough sizing for Epics. Less then one sprint of work for one person label May 26, 2021
@prietyc123
Contributor Author

Debugging more via pr #4748

The developer user doesn't have permission to get persistent volume details, so I am trying to debug it locally as the kubeadmin user with this patch:

func CommonAfterEach(commonVar CommonVar) bool {
	session := CmdRunner("oc", "describe", "pods", "-n", commonVar.Project)
	session.Wait()
	output := string(session.Out.Contents())
	if CurrentGinkgoTestDescription().Failed && strings.Contains(string(output), "AttachVolume.Attach failed for volume") {
		fmt.Println("volume error with project", commonVar.Project)
		return false
	}
	// delete the random project/namespace created in CommonBeforeEach
	commonVar.CliRunner.DeleteNamespaceProject(commonVar.Project)

	// restores the original kubeconfig and working directory
	Chdir(commonVar.OriginalWorkingDirectory)
	err := os.Setenv("KUBECONFIG", commonVar.OriginalKubeconfig)
	Expect(err).NotTo(HaveOccurred())

	// delete the temporary context directory
	DeleteDir(commonVar.Context)

	os.Unsetenv("GLOBALODOCONFIG")
	return true
}

@prietyc123
Contributor Author

I tried running the entire devfile suite twice locally and unfortunately it passed every time 🙁. I am going to try a few more times to hit this error; I hope it occurs at least once.

@prietyc123
Contributor Author

prietyc123 commented Jun 8, 2021

Running the same patch on Jenkins gives a deployment rollout failure with failed-to-attach-volume errors:

AttachVolume.Attach failed for volume "pvc-5b4dd312-fa4d-4ea8-b2ce-a0745e7e022a" : 
The service is currently unable to handle the request due to a temporary overloading or maintenance. This is a temporary condition. Try again later.
 
Warning  FailedAttachVolume      54m                attachdetach-controller                  AttachVolume.
Attach failed for volume "pvc-b4f7643a-9dd4-4e0b-9d24-5fc2911012e2" : failed to attach cdadb402-37c3-4f80-9a3f-4dc12ba86668 volume to 16f44471-7073-45f6-aa91-beb74fc899c7 compute: 
The service is currently unable to handle the request due to a temporary overloading or maintenance. This is a temporary condition. Try again later.
  
Warning  FailedAttachVolume      53m (x7 over 55m)  attachdetach-controller                  AttachVolume.Attach failed for volume "pvc-b4f7643a-9dd4-4e0b-9d24-5fc2911012e2" : 
The service is currently unable to handle the request due to a temporary overloading or maintenance. This is a temporary condition. Try again later.
AttachVolume.Attach succeeded for volume "pvc-5b4dd312-fa4d-4ea8-b2ce-a0745e7e022a"
  Normal   SuccessfulAttachVolume  50m                attachdetach-controller                  AttachVolume.Attach succeeded for volume "pvc-b4f7643a-9dd4-4e0b-9d24-5fc2911012e2"
  Warning  FailedMount             46m                kubelet, testocp47-b6pfr-worker-0-mjl2x  Unable to attach or mount volumes: unmounted volumes=[myvol-iipxmr-cttm-vol myvol2-iipxmr-jdks-vol], 
unattached volumes=[odo-projects myvol-iipxmr-cttm-vol myvol2-iipxmr-jdks-vol odo-supervisord-shared-data default-token-vv2f8]: 
timed out waiting for the condition
  Warning  FailedMount             44m                kubelet, testocp47-b6pfr-worker-0-mjl2x  
Unable to attach or mount volumes: unmounted volumes=[myvol-iipxmr-cttm-vol myvol2-iipxmr-jdks-vol], 
unattached volumes=[odo-supervisord-shared-data default-token-vv2f8 odo-projects myvol-iipxmr-cttm-vol myvol2-iipxmr-jdks-vol]: 
timed out waiting for the condition

I tried getting the PVC and PV. They seem to be in place as expected. So overall I feel it is a PSI network issue that causes these failures to pop up.

$ oc get pvc -n cmd-devfile-push-test560aje
NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
myvol-iipxmr-cttm    Bound    pvc-5b4dd312-fa4d-4ea8-b2ce-a0745e7e022a   3Gi        RWO            standard       45m
myvol2-iipxmr-jdks   Bound    pvc-b4f7643a-9dd4-4e0b-9d24-5fc2911012e2   1Gi        RWO            standard       45m

$ oc get pv -n cmd-devfile-push-test560aje
pvc-5b4dd312-fa4d-4ea8-b2ce-a0745e7e022a   3Gi        RWO            Delete           Bound    cmd-devfile-push-test560aje/myvol-iipxmr-cttm              standard                47m

pvc-b4f7643a-9dd4-4e0b-9d24-5fc2911012e2   1Gi        RWO            Delete           Bound    cmd-devfile-push-test560aje/myvol2-iipxmr-jdks             standard                47m
[...]

@prietyc123 prietyc123 removed their assignment Jul 7, 2021
@kadel
Member

kadel commented Jul 28, 2021

One of the most common reasons you get timeouts waiting for deployment rollout on PSI is projects not being deleted properly, which leaves a lot of dangling PVCs in those projects and causes OpenStack to reach its volume quota.
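
As an illustration only (not part of odo; the function name is made up for this sketch), here is a rough client-go helper that, given admin credentials, lists PersistentVolumes whose claiming namespace no longer exists, i.e. the dangling volumes that keep counting against the OpenStack volume quota:

package helper

import (
	"context"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// findDanglingPVs prints every PersistentVolume whose ClaimRef points to a
// namespace that has already been deleted.
func findDanglingPVs(clientset kubernetes.Interface) error {
	pvs, err := clientset.CoreV1().PersistentVolumes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		return err
	}
	for _, pv := range pvs.Items {
		if pv.Spec.ClaimRef == nil {
			continue
		}
		ns := pv.Spec.ClaimRef.Namespace
		if _, err := clientset.CoreV1().Namespaces().Get(context.TODO(), ns, metav1.GetOptions{}); apierrors.IsNotFound(err) {
			// The project is gone but the backing volume still exists and
			// still consumes OpenStack quota until it is reclaimed.
			fmt.Printf("dangling PV %s (was claimed by %s/%s, phase %s)\n",
				pv.Name, ns, pv.Spec.ClaimRef.Name, pv.Status.Phase)
		}
	}
	return nil
}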

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 26, 2021
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 25, 2021
@kadel
Member

kadel commented Mar 11, 2022

We haven't seen this problem since we moved away from PSI.

@kadel kadel closed this as completed Mar 11, 2022