Workspace start in debug mode -> watch the logs #15983

sparkoo · 2020-02-11T06:16:10Z

Is your enhancement related to a problem? Please describe.

If a workspace fails to start, it's very hard for the user to get the logs from the workspace. We should improve this experience so that users have a better idea of what happened and can write us better bug reports that include the workspace logs.

Describe the solution you'd like

Introduce a workspace startup flag that enables debug mode. When it's enabled, watch all logs from the workspace pod during startup and send them to the dashboard's workspace startup screen, so the user can see what's happening and, in case of failure, copy the logs. Once the workspace successfully starts or fails to start, stop watching.

The observer here will be che-server.

Risks

To watch the logs of all of a workspace's containers, we need one connection per container. This is potentially a bottleneck. However, considering the short observation time, I don't think it will be much of an issue.
The fabric8 kubernetes client library we're using in che-server has only a synchronous API for watching logs, so we would need to run each log watch in a separate thread. Again, considering the short observation period, it should not be an issue.
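
For illustration, the thread-per-container approach described above might look roughly like the following sketch. It uses only JDK classes and is not the che-server implementation: `StartupLogWatcher` and its methods are hypothetical names, and each `InputStream` stands in for whatever blocking stream the log watch API hands back for one container.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

// Hypothetical helper: one blocking reader thread per container log stream.
public class StartupLogWatcher {
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final List<InputStream> streams = new CopyOnWriteArrayList<>();

    /** Starts a reader thread for every (container name -> log stream) entry. */
    public void watch(Map<String, InputStream> logsByContainer,
                      BiConsumer<String, String> sink) {
        logsByContainer.forEach((container, stream) -> {
            streams.add(stream);
            pool.execute(() -> readLines(container, stream, sink));
        });
    }

    private static void readLines(String container, InputStream stream,
                                  BiConsumer<String, String> sink) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            // readLine() blocks until a full line arrives or the stream ends.
            while ((line = reader.readLine()) != null) {
                sink.accept(container, line);
            }
        } catch (IOException e) {
            // Expected when stop() closes a stream that is still being read.
        }
    }

    /**
     * Call once the workspace has started or failed. Waits up to graceMillis
     * for readers to reach end of stream, then closes the streams to unblock
     * them. Returns true if all readers finished within the grace period.
     */
    public boolean stop(long graceMillis) {
        pool.shutdown();
        try {
            if (pool.awaitTermination(graceMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        for (InputStream stream : streams) {
            try {
                stream.close();
            } catch (IOException ignored) {
                // Best effort: we are shutting down anyway.
            }
        }
        pool.shutdownNow();
        return false;
    }
}
```

The number of threads equals the number of watched containers, and they live only until `stop` is called, which matches the "short observation period" argument. Whether closing the stream reliably unblocks a reader depends on the stream implementation, so that part is an assumption.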

TODO:

monitor the number of connections between che-server and workspaces

Describe alternatives you've considered

--

Additional context

epic - #15047

tolusha · 2020-02-11T08:34:02Z

fabric8 kubernetes client library we're using in che-server, has only synchronous api

@sparkoo
They have a dedicated asynchronous utility called Log (in case of JS)

sparkoo · 2020-02-11T09:04:58Z

@tolusha I'm afraid it's not implemented in the Java client :( This is the only API I've found: https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/LogWatch.java where InputStream getOutput(); is blocking.

tolusha · 2020-02-11T09:17:29Z

hmm.
it should work
https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-examples/src/main/java/io/fabric8/kubernetes/examples/PodLogExample.java

sparkoo · 2020-02-11T10:52:15Z

Yes, here all logs are written directly to System.out. However, we need to read them. So we could either get the InputStream from LogWatch, which is blocking, or pass our own OutputStream as in the example, but then we again need to read from it, which is blocking too.

RickJWagner · 2020-02-11T14:01:29Z

I think this is a great idea.
Let's please encourage use of "About to xyz" and "Finished doing xyz" log entries from the logged containers. (This will make this unified and visible log more valuable.)
Saying that a different way: Logs are much more effective if the lines they contain announce "Going to download image 123", "Finished downloading image 123", etc. This is because when something goes wrong you can tell what it was that we were trying to do. Very useful. (As opposed to only logging in exception blocks or something like that, which is sometimes not so effective.)
Thanks again for this move forward.

sparkoo · 2020-02-11T14:17:10Z

@RickJWagner as much as I agree with you, it is out of scope for this issue. This is about grabbing the logs from containers (logs that already exist) and making them visible to the user; right now they are very hard, if not impossible, to reach.

Also, we often don't have control over the logs, because we're just forwarding logs from kubernetes, theia, or other components.

tolusha · 2020-02-11T15:04:16Z

Logs are much more effective if the lines they contain announce "Going to download image 123", "Finished downloading image 123", etc.

@RickJWagner
This kind of information is available only as k8s events, for instance:

s          Normal   UPDATE                  ingress/devfile-registry                Ingress che/devfile-registry
0s          Normal   Pulled                  pod/devfile-registry-d9fd7f648-vh4hp    Successfully pulled image "quay.io/eclipse/che-devfile-registry:nightly"
0s          Normal   Created                 pod/devfile-registry-d9fd7f648-vh4hp    Created container che-devfile-registry
0s          Normal   Started                 pod/devfile-registry-d9fd7f648-vh4hp    Started container che-devfile-registry
0s          Normal   Pulled                  pod/plugin-registry-58587b799b-mkhwm    Successfully pulled image "quay.io/eclipse/che-plugin-registry:nightly"
0s          Normal   Created                 pod/plugin-registry-58587b799b-mkhwm    Created container che-plugin-registry
0s          Normal   Started                 pod/plugin-registry-58587b799b-mkhwm    Started container che-plugin-registry

RickJWagner · 2020-02-11T15:58:19Z

Hi @tolusha
Thank you for that note. I think the provided events help demonstrate that they are lacking, though.

For example, imagine if we did not see this full list of events. Let's say the last event we saw was the second line, 'Created container che-devfile-registry', how would we know what kind of problem has occurred? We would not know that we failed when we tried to start the container, because we did not signal our intention before we started the action. This group of events would not be helpful in that case.

Or say we fail at the next line. We are successful at 'Started container che-devfile-registry' and we see the event in our list, but we do not have any more events in our list. We would not know that we next went off to pull image "quay.io/eclipse/che-plugin-registry:nightly". But this would be different (and improved) if we first announced 'About to pull image "quay.io/eclipse/che-plugin-registry:nightly"'. Then if we failed in this action, we would at least know what we were trying to do when things went badly.

It is only semi-helpful to announce success when we arrive from some task. To help diagnose problems, it is much more helpful to first announce the intention, then announce completion when we are successful. (And if we are not, log the exception that prevented successful completion.)

I know we cannot get this from k8s events, they are not set up this way. We will need logs. But I am glad we are having the conversation, so people can think about it.

Thanks for considering.
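
As a rough illustration of the "announce the intention, then announce completion" pattern described in the comment above, a logging wrapper could look like the sketch below. `AnnouncedTask` and the message wording are hypothetical and are not taken from che-server or any existing component.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper: logs intent before a task and the outcome after it.
public class AnnouncedTask {
    public static <T> T run(String description, Supplier<T> task, Consumer<String> log) {
        // Announce the intention first, so a failure can be tied to this step.
        log.accept("About to " + description);
        try {
            T result = task.get();
            log.accept("Finished: " + description);
            return result;
        } catch (RuntimeException e) {
            // Log what prevented completion, then let the caller handle it.
            log.accept("Failed: " + description + " (" + e + ")");
            throw e;
        }
    }
}
```

With this shape, a log that ends at "About to pull image 123" with no matching "Finished" line already tells the reader which step was in progress when things went wrong.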

tsmaeder · 2020-02-12T08:57:14Z

Yes, here all logs are written directly to System.out. However, we need to read them. So we could either get the InputStream from LogWatch, which is blocking, or pass our own OutputStream as in the example, but then we again need to read from it, which is blocking too.

Why not provide your own implementation of OutputStream that writes to a buffer and pokes a single reader thread to process the output? As I understand it, the blocking nature is only of concern because you want to reduce the number of threads, right?
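
A minimal sketch of that idea, using only JDK classes: each container gets its own `OutputStream` that splits incoming bytes into lines and puts them on a shared queue, and one reader thread drains the queue. `LineQueueOutputStream`, `startReader`, and the `END` sentinel are hypothetical names. The sketch assumes an unbounded queue and does not show how the client library feeds bytes into the stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical sink: buffers bytes per container, emits whole lines to a shared queue.
public class LineQueueOutputStream extends OutputStream {
    /** Sentinel compared by identity; tells the reader thread to stop. */
    public static final String END = new String("<end>");

    private final String container;
    private final BlockingQueue<String> queue;
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    public LineQueueOutputStream(String container, BlockingQueue<String> queue) {
        this.container = container;
        this.queue = queue;
    }

    @Override
    public synchronized void write(int b) {
        if (b == '\n') {
            emitPendingLine();
        } else {
            pending.write(b);
        }
    }

    @Override
    public synchronized void write(byte[] bytes, int off, int len) {
        for (int i = off; i < off + len; i++) {
            write(bytes[i]);
        }
    }

    /** Flushes a trailing line that had no newline terminator. */
    @Override
    public synchronized void close() {
        if (pending.size() > 0) {
            emitPendingLine();
        }
    }

    private void emitPendingLine() {
        // add() never blocks; with an unbounded queue it also never fails.
        queue.add(container + ": " + new String(pending.toByteArray(), StandardCharsets.UTF_8));
        pending.reset();
    }

    /** Starts the single reader thread; the future completes once END is seen. */
    public static CompletableFuture<Void> startReader(BlockingQueue<String> queue,
                                                      Consumer<String> sink) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        Thread reader = new Thread(() -> {
            try {
                for (String line = queue.take(); line != END; line = queue.take()) {
                    sink.accept(line);
                }
                done.complete(null);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                done.completeExceptionally(e);
            } catch (RuntimeException e) {
                done.completeExceptionally(e);
            }
        }, "startup-log-reader");
        reader.setDaemon(true);
        reader.start();
        return done;
    }
}
```

Only the reader thread blocks (on `queue.take()`), regardless of how many containers write into the queue. Whether this actually reduces the total thread count depends on how the client library pumps data into the supplied stream, which this sketch does not address.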

sparkoo · 2020-02-27T14:17:28Z

fixed by #16126

sparkoo added the kind/enhancement, severity/P1, and area/che-server labels Feb 11, 2020

sparkoo changed the title ~~Watch all workspace startup logs and event~~ Watch all workspace startup logs and events Feb 11, 2020

sparkoo changed the title ~~Watch all workspace startup logs and events~~ Watch all workspace startup logs Feb 11, 2020

skabashnyuk mentioned this issue Feb 11, 2020

Workspace diagnosis capabilities in Eclipse Che #15047

Closed

12 tasks

This was referenced Feb 11, 2020

Write workspace's containers stdout/err log to file #15834

Closed

Create LogCollector component #15835

Closed

Archive logs on che-server #15836

Closed

User can access workspace logs from che-server #15837

Closed

sparkoo self-assigned this Feb 11, 2020

skabashnyuk modified the milestone: 7.9.0 Feb 11, 2020

sparkoo mentioned this issue Feb 13, 2020

Watch and provide startup logs #15988

Closed

5 tasks

sparkoo changed the title ~~Watch all workspace startup logs~~ Workspace start in debug mode -> watch the logs Feb 17, 2020

This was referenced Feb 17, 2020

[dashboard] - restart workspace in debug mode after startup failure #16050

Closed

Watch plugin broker logs in workspace startup debug mode #16061

Closed

[dashboard] - make it possible to download all workspace startup logs #16063

Closed

skabashnyuk mentioned this issue Feb 19, 2020

Platform-2020-03-10 (Sprint: 180) #16078

Closed

4 tasks

skabashnyuk added this to the 7.10.0 milestone Feb 19, 2020

sparkoo mentioned this issue Feb 24, 2020

Watch plugin broker logs #16117

Closed

sparkoo mentioned this issue Feb 25, 2020

Watch and provide startup logs (workspace + broker pods) #16126

Merged

sparkoo closed this as completed Feb 27, 2020

sparkoo mentioned this issue Feb 27, 2020

Watch mkdir pod logs in workspace startup debug mode #16161

Closed

akurinnoy mentioned this issue Mar 16, 2020

Handle large amount of debug log in workspace loader #16368

Closed
