'Application not available' when starting a Che workspace #17542

Closed
sunix opened this issue Jul 31, 2020 · 28 comments
Labels
area/dashboard kind/bug Outline of a bug - must adhere to the bug report template. severity/P1 Has a major impact to usage or development of the system.

Comments

@sunix
Contributor

sunix commented Jul 31, 2020

Describe the bug

When starting a workspace, this happens randomly, about 3 times a day, on che.openshift.io.

(screenshot)

I guess there is a sync issue: che-theia may be available internally but not yet externally?

@sunix sunix added the kind/bug Outline of a bug - must adhere to the bug report template. label Jul 31, 2020
@che-bot che-bot added the status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. label Jul 31, 2020
@l0rd l0rd added severity/P1 Has a major impact to usage or development of the system. team/hosted-che and removed status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. labels Jul 31, 2020
@ibuziuk
Member

ibuziuk commented Aug 3, 2020

This is a known issue that is supposed to be addressed in the upstream #17002

@skabashnyuk
Contributor

I have some doubts about the link between #17002 and the current issue. Theia is a special server that is checked before the workspace is reported as Running.

@sunix
Contributor Author

sunix commented Aug 3, 2020

I am wondering if there is a general latency between the time the server is available internally (from within the same cluster) and the time it is available externally. I have also noticed this issue when opening an app preview in Che. It happens randomly but still quite frequently. Could we add a wait-for loop in the dashboard?

@ScrewTSW ScrewTSW changed the title Application not available when starting a Che workspace 'Application not available' when starting a Che workspace Aug 10, 2020
@sunix
Contributor Author

sunix commented Aug 28, 2020

Or is a readiness probe missing?

@ibuziuk ibuziuk added this to the 7.22 milestone Oct 22, 2020
@ibuziuk
Member

ibuziuk commented Oct 22, 2020

I have some doubts about #17002 and the current issue. Theia is a special server that is tested before reported that workspace is Running.

@skabashnyuk this might be related to route flakiness, in which case we might need to enable retries (e.g. mark Theia as ready only after 3 successful checks). As I recall, the retry functionality should already be in place and configurable.
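A minimal sketch of the "mark as ready only after N successful checks" idea, assuming each check yields an independent pass/fail result and that any failure resets the count (similar to how a Kubernetes readiness probe's `successThreshold` counts consecutive successes). The class and method names are hypothetical and are not Che's server-checker API.

```typescript
// Hypothetical helper, not part of the Che code base.
// A server counts as "ready" only after `successThreshold` consecutive
// successful checks; any failed check resets the streak to zero.
class ConsecutiveSuccessTracker {
  private readonly successThreshold: number;
  private streak = 0;

  constructor(successThreshold: number) {
    if (!Number.isInteger(successThreshold) || successThreshold < 1) {
      throw new RangeError("successThreshold must be a positive integer");
    }
    this.successThreshold = successThreshold;
  }

  /** Records one check result and returns whether the server now counts as ready. */
  record(checkSucceeded: boolean): boolean {
    this.streak = checkSucceeded ? this.streak + 1 : 0;
    return this.isReady();
  }

  isReady(): boolean {
    return this.streak >= this.successThreshold;
  }
}

// Usage: a flaky route that answers once, fails, then stabilises.
const tracker = new ConsecutiveSuccessTracker(3);
const results = [true, false, true, true, true];
const readiness = results.map((ok) => tracker.record(ok));
console.log(readiness); // [ false, false, false, false, true ]
```

With a threshold of 1, the very first lucky success (index 0) would already report the server as ready, which is the flaky-route scenario suggested above.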

@ibuziuk
Member

ibuziuk commented Nov 9, 2020

The issue should be fixed on the Hosted Che end by increasing the success threshold: redhat-developer/rh-che#2009

@ibuziuk ibuziuk closed this as completed Nov 9, 2020
@sunix
Contributor Author

sunix commented Nov 9, 2020

@ibuziuk would it make sense to make that value the default in Che?

@ibuziuk
Member

ibuziuk commented Nov 12, 2020

@sunix it depends on how easy it is to repro against upstream + some default infra

@ibuziuk
Member

ibuziuk commented Nov 12, 2020

TL;DR this issue is infra related and the optimal config may vary

@sunix
Contributor Author

sunix commented Nov 13, 2020

I frequently have that issue on OpenShift 4.4

@sunix
Contributor Author

sunix commented Nov 13, 2020

frequently => maybe 1 in 10 workspace starts, but it is obviously very random.
It's just that it doesn't look very robust when you live-demo Che and hit that issue.

@sunix
Contributor Author

sunix commented Nov 24, 2020

it just happened to me on CodeReady Workspaces
(screenshot)

@sunix sunix reopened this Nov 24, 2020
@ibuziuk
Member

ibuziuk commented Dec 3, 2020

yeah, so we can increase the success threshold upstream, but it would result in slower performance, so it is a tradeoff.
Also, note that there will be no such issue once single-host becomes the default route exposure strategy.
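To make the tradeoff concrete, assume the checker pings the server at a fixed interval $\Delta t$ and requires $N$ consecutive successful pings (both are assumptions for illustration; the actual interval is not given in this thread). If $t_1$ is the time of the first successful ping, the workspace cannot be reported as Running earlier than

$$
t_{\text{ready}} \;\ge\; t_1 + (N - 1)\,\Delta t ,
$$

with equality when every ping from $t_1$ onward succeeds. Raising the threshold from $N = 1$ to $N = 3$ therefore adds at least $2\,\Delta t$ to every workspace start, e.g. at least 2 s with a hypothetical 1 s interval.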

@ibuziuk
Member

ibuziuk commented Dec 3, 2020

This is a central outage issue and we cannot affect it. Closing, since no issue was reported in the periodic test summary.

@ibuziuk ibuziuk closed this as completed Dec 3, 2020
@sunix
Contributor Author

sunix commented Dec 3, 2020

I understand, but I don't think changing the threshold is the fix. The dashboard JavaScript client should check whether the route works and retry if it does not.
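A minimal sketch of such a client-side check, assuming the dashboard polls the IDE URL until it gets a few consecutive good answers before loading the IDE, and gives up after a timeout. `waitForUrl`, `UrlCheck` and `WaitOptions` are hypothetical names, not the Che dashboard's code. The actual check is left injectable on purpose: a cross-origin request from the browser to the IDE route may be restricted by CORS, so how to probe the route is itself an assumption.

```typescript
// Hypothetical dashboard-side helper, not the actual Che dashboard code.
// Polls `check(url)` until it succeeds `requiredSuccesses` times in a row,
// waiting `intervalMs` between attempts, and gives up after `timeoutMs`.

type UrlCheck = (url: string) => Promise<boolean>;

interface WaitOptions {
  requiredSuccesses: number; // consecutive good answers before trusting the route
  intervalMs: number;        // delay between attempts
  timeoutMs: number;         // overall budget before showing an error instead
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function waitForUrl(
  url: string,
  check: UrlCheck,
  { requiredSuccesses, intervalMs, timeoutMs }: WaitOptions,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  let streak = 0;
  while (Date.now() <= deadline) {
    let ok = false;
    try {
      ok = await check(url);
    } catch {
      ok = false; // network errors count as a failed attempt
    }
    streak = ok ? streak + 1 : 0;
    if (streak >= requiredSuccesses) return true;
    await sleep(intervalMs);
  }
  return false; // timed out: let the caller show an error instead of the IDE
}

// Usage with a fake check that fails twice (e.g. a router error page) and then recovers.
let calls = 0;
const flakyCheck: UrlCheck = async () => ++calls > 2;
waitForUrl("https://ide.example.com/", flakyCheck, {
  requiredSuccesses: 2,
  intervalMs: 10,
  timeoutMs: 1000,
}).then((ready) => console.log(ready ? "load IDE iframe" : "show error"));
```

Requiring more than one consecutive success reuses the threshold idea on the client side, and the timeout keeps the dashboard from polling forever when the route never comes up.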

@Katka92
Contributor

Katka92 commented Dec 15, 2020

The periodic tests caught that again today.
Job and screen

@sunix
Contributor Author

sunix commented Dec 15, 2020

@ibuziuk OK to reopen that one?

@ibuziuk
Member

ibuziuk commented Dec 21, 2020

@sunix the real solution is single-host; we just need to wait until it becomes the default option. You can reopen it, but I doubt anything beyond what has already been done will improve things.

@Katka92
Contributor

Katka92 commented Jan 18, 2021

This issue was caught again by our periodic tests. Job and screen

@sunix
Contributor Author

sunix commented Jan 18, 2021

Let's reopen it and close it when we have a proper fix (single host or anything else)

@sunix sunix reopened this Jan 18, 2021
@ibuziuk ibuziuk removed their assignment Feb 18, 2021
@ibuziuk ibuziuk removed this from the 7.22 milestone Feb 18, 2021
@Katka92
Contributor

Katka92 commented Mar 11, 2021

I've encountered that issue again when testing CRW.

@sleshchenko
Member

I faced a slightly different but similar issue: #19059
I assume the Dashboard itself could test the IDE URL before opening the iframe.

@ibuziuk
Member

ibuziuk commented Mar 11, 2021

@sleshchenko I put the UD label

@ibuziuk
Copy link
Member

ibuziuk commented Jul 16, 2021

I've encountered that issue again when testing CRW.

@Katka92 @sleshchenko folks, isn't this problem multi-host specific (should not be reproducible against single-host configuration)?

@Katka92
Contributor

Katka92 commented Jul 16, 2021

IDK, I'm not load testing with the single-host config. Maybe it would be worth adding that test case to the load tests, but with lower priority than multi-host. Right now I'm testing the multi-host config (~100 workspace startups so far) and I haven't seen it yet, but it was seen when testing CRW 2.9 (based on Che 7.30.x). I can try single-host if I have time and let you know.

@ibuziuk
Member

ibuziuk commented Jul 21, 2021

@skabashnyuk do you have any input on when we will have a single-host enabled by default?

@skabashnyuk
Contributor

@skabashnyuk do you have any input on when we will have a single-host enabled by default?

I don't know

@sleshchenko
Member

I think single-host is now the default suggested exposure strategy, so this should not happen, and from the dashboard's point of view the extra check would be overhead. Closing it.


7 participants