component push throws error of waited 4m0s but couldn't find running pod matching selector #2877
Comments
A similar error with a slight twist: #2942 (comment)
@mik-dass This issue appears more frequently than it did before, when the test node count was 2. I am also assigning you to this issue along with me, as you have fixed a similar kind of issue before.
Let's reduce the number of nodes back to 2.
Not a bad idea, though I suspect the failure might be due to limited resources on Travis CI. Is there any way to increase the resources on Travis while running these tests?
Hitting this issue more frequently when running tests with
However, there are some other consequences of using an older version of minikube, like lagging behind on the latest feature implementations, and I suspect this could be one of the reasons. Right now we are running our tests on Travis CI with
@kadel @girishramnani WDYT?
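For anyone comparing setups locally, a minimal sketch of checking the minikube version and starting it with explicit resources; the flag names are standard minikube options, but the values here are illustrative assumptions, not the ones pinned on Travis:

```sh
# Check which minikube version is installed
minikube version

# Start minikube with explicit resources (memory in MB);
# 7168 MB / 2 CPUs are illustrative values, not the Travis defaults.
minikube start --memory=7168 --cpus=2
```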
I have raised a ticket asking for more resources on Travis CI. Let's see what they reply. I will update here once I get the reply.
Got a response from the Travis CI team: the provided default memory of 7.5 GB is enough for running 4 component push tests in parallel. So I think we should investigate the component push failure from the odo end. WDYT @mik-dass?
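For reference, a quick way to sanity-check the resources actually available on a Travis worker from within a job; these are standard Linux utilities, nothing Travis-specific:

```sh
free -h     # total/available memory (should report roughly 7.5 GB on the default infra)
nproc       # number of CPU cores
df -h /     # free disk space on the root filesystem
```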
But decreasing the test nodes to 2 has indeed reduced the frequency of this failure. Also, 7.5 GB may not be enough, as we are running a cluster in the background in most of our tests, which can be an expensive operation too. The pod initialization step can also consume a lot of the time. I would suggest increasing the push timeout value by
@mik-dass Maybe you are right; however, IIRC we had a similar issue even on a single test node. Anyway, we can try your suggestion to narrow down the reason for the failure.
@prietyc123 Can you please apply @mik-dass's suggestion in one of the PRs you mentioned in the comment #2877 (comment)? You just need to overwrite the
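If the suggestion refers to odo's push timeout preference, a sketch of overriding it might look like the following; this assumes the PushTimeout preference takes a value in seconds (the default of 240 s would match the 4m0s in the error):

```sh
# Double the push timeout from the assumed 240 s default to 8 minutes
odo preference set PushTimeout 480

# Verify the preference took effect
odo preference view
```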
Sure, I will definitely try it out and report the result.
@mik-dass I have set the push timeout to 8 min but am still getting the same failure.
More details: https://travis-ci.com/github/openshift/odo/jobs/325397186#L1861
There seems to be some issue with the network on Travis, and most probably the ImagePullBackOff happened because of that: https://travis-ci.com/github/openshift/odo/jobs/325397186#L1693 Also, TBH, I haven't seen this error on most PRs since we switched back to 2 nodes. Even on 4 nodes it happened in only 2-4 test scripts. But for your PR #2913 it's happening for all the test scripts. In fact, all the jobs on Travis which run on
Maybe there is some compatibility issue with xenial or some problem on Travis's side regarding xenial.
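As a side note, a hedged way to confirm that an ImagePullBackOff is network-related is to look at the pod's events; `<pod-name>` is a placeholder for the failing pod:

```sh
oc get pods                                           # look for pods stuck in ImagePullBackOff
oc describe pod <pod-name>                            # the Events section shows the image pull error
oc get events --sort-by=.metadata.creationTimestamp   # cluster-wide events in order
```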
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. /lifecycle rotten
/remove-lifecycle rotten
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /reopen. /close
@openshift-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/kind flake
What versions of software are you using?
Operating System: All Supported
Output of odo version: master
How did you run odo exactly?
odo push --context context
on OpenShift CI.
Actual behavior
Throwing an error: waited 4m0s but couldn't find running pod matching selector
Expected behavior
It should push the component successfully to the deployment.
Any logs, error output, etc?
For more details: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_odo/2875/pull-ci-openshift-odo-master-v4.1-integration-e2e-benchmark/1778#1:build-log.txt%3A710
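For anyone debugging this locally, a hedged way to watch for the pod odo is waiting on while the push runs; the `deploymentconfig=<component-name>` label selector is an assumption based on how DeploymentConfig-managed pods are commonly labeled, not taken from odo's source:

```sh
# Watch pods matching the (assumed) selector while `odo push` waits;
# if nothing ever appears here, odo's 4m0s wait will time out.
oc get pods -l deploymentconfig=<component-name> -w
```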