Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workspace start in debug mode -> watch the logs #15983

Closed
sparkoo opened this issue Feb 11, 2020 · 10 comments
Closed

Workspace start in debug mode -> watch the logs #15983

sparkoo opened this issue Feb 11, 2020 · 10 comments
Assignees
Labels
area/che-server kind/enhancement A feature request - must adhere to the feature request template. severity/P1 Has a major impact to usage or development of the system.
Milestone

Comments

@sparkoo
Copy link
Member

sparkoo commented Feb 11, 2020

Is your enhancement related to a problem? Please describe.

In case of workspace startup failure, it's very hard for the user to get the logs from the workspace. We should improve this experience, so users have better idea what happened and can write us better bug reports with workspace logs.

Describe the solution you'd like

Introduce some workspace startup flag that will enable debug mode. In case it's enabled, watch all logs from the workspace pod during the startup. Send these logs to the dashboard workspace startup screen, so user will see what's happening and in case of failure can copy the logs. Once workspace successfully starts or fails to start, stop watching.

Observer here will be che-server.

Risks

  • To watch logs of all workspace's containers, we need one connection per container. This is potentially bottleneck. However, considering the short time of observation, I don't think it will be much an issue.
  • fabric8 kubernetes client library we're using in che-server, has only synchronous api to watch the logs, so we would need to do each log watch in separate thread. Again, considering short period of time of observation, it should not be an issue.

TODO:

  • monitor number of connections of che-server with workspaces

Describe alternatives you've considered

--

Additional context

epic - #15047

@sparkoo sparkoo added kind/enhancement A feature request - must adhere to the feature request template. severity/P1 Has a major impact to usage or development of the system. area/che-server labels Feb 11, 2020
@sparkoo sparkoo changed the title Watch all workspace startup logs and event Watch all workspace startup logs and events Feb 11, 2020
@sparkoo sparkoo changed the title Watch all workspace startup logs and events Watch all workspace startup logs Feb 11, 2020
@sparkoo sparkoo self-assigned this Feb 11, 2020
@tolusha
Copy link
Contributor

tolusha commented Feb 11, 2020

fabric8 kubernetes client library we're using in che-server, has only synchronous api

@sparkoo
They have a dedicated asynchronous utility called Log (in case of JS)

@sparkoo
Copy link
Member Author

sparkoo commented Feb 11, 2020

@tolusha I'm afraid it's not implemented in java client :( this is only API I've found https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/LogWatch.java where InputStream getOutput(); is blocking.

@skabashnyuk skabashnyuk modified the milestone: 7.9.0 Feb 11, 2020
@sparkoo
Copy link
Member Author

sparkoo commented Feb 11, 2020

yes, here all logs are written directly to System.out. However, we need to read from it. So we could either get InputStream from LogWatch and it's blocking. Or we could pass our OutputStream as in the example, but then again, we need to read from it, which is again blocking.

@RickJWagner
Copy link
Contributor

I think this is a great idea.
Let's please encourage use of "About to xyz" and "Finished doing xyz" log entries from the logged containers. (This will make this unified and visible log more valuable.)
Saying that a different way: Logs are much more effective if the lines they contain announce "Going to download image 123", "Finished downloading image 123", etc. This is because when something goes wrong you can tell what it was that we were trying to do. Very useful. (Opposed to only logging in exception blocks or something like that. Not so effective, sometimes.)
Thanks again for this move forward.

@sparkoo
Copy link
Member Author

sparkoo commented Feb 11, 2020

@RickJWagner as much as I agree with you, it is out of the scope of this issue. This is about grabbing the logs from containers (that already exists) and make them visible for the user, which is now very hard to impossible to reach.

Also, we often don't have a control over the logs, because we're just resending logs from kubernetes, theia, or other components.

@tolusha
Copy link
Contributor

tolusha commented Feb 11, 2020

: Logs are much more effective if the lines they contain announce "Going to download image 123", "Finished downloading image 123", etc.

@RickJWagner
These kind of information are available as k8s events only, for instance:

s          Normal   UPDATE                  ingress/devfile-registry                Ingress che/devfile-registry
0s          Normal   Pulled                  pod/devfile-registry-d9fd7f648-vh4hp    Successfully pulled image "quay.io/eclipse/che-devfile-registry:nightly"
0s          Normal   Created                 pod/devfile-registry-d9fd7f648-vh4hp    Created container che-devfile-registry
0s          Normal   Started                 pod/devfile-registry-d9fd7f648-vh4hp    Started container che-devfile-registry
0s          Normal   Pulled                  pod/plugin-registry-58587b799b-mkhwm    Successfully pulled image "quay.io/eclipse/che-plugin-registry:nightly"
0s          Normal   Created                 pod/plugin-registry-58587b799b-mkhwm    Created container che-plugin-registry
0s          Normal   Started                 pod/plugin-registry-58587b799b-mkhwm    Started container che-plugin-registry

@RickJWagner
Copy link
Contributor

Hi @tolusha
Thank you for that note. I think the provided events help demonstrate that they are lacking, though.

For example, imagine if we did not see this full list of events. Let's say the last event we saw was the second line, 'Created container che-devfile-registry', how would we know what kind of problem has occurred? We would not know that we failed when we tried to start the container, because we did not signal our intention before we started the action. This group of events would not be helpful in that case.

Or say we fail at the next line. We are successful at 'Started container che-devfile-registry' and we see the event in our list, but we did not have any more events in our list. We would not know that we next went off to pull image "quay.io/eclipse/che-plugin-registry:nightly". But this would be different (and improved) if we first announced "About to pull image "quay.io/eclipse/che-plugin-registry:nightly". Then if we failed in this action, we would at least know what we were trying to do when things went badly.

It is only semi-helpful to announce success when we arrive from some task. To help diagnose problems, it is much more helpful to first announce the intention, then announce completion when we are successful. (And if we are not, log the exception that prevented successful completion.)

I know we cannot get this from k8s events, they are not set up this way. We will need logs. But I am glad we are having the conversation, so people can think about it.

Thanks for considering.

@tsmaeder
Copy link
Contributor

yes, here all logs are written directly to System.out. However, we need to read from it. So we could either get InputStream from LogWatch and it's blocking. Or we could pass our OutputStream as in the example, but then again, we need to read from it, which is again blocking.

Why not provide your own implementation of OutputStream that writes to a buffer and pokes a single reader thread to process the output? As I understand it, the blocking nature is only of concern because you want to reduce the number threads, right?

@sparkoo
Copy link
Member Author

sparkoo commented Feb 27, 2020

fixed by #16126

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/che-server kind/enhancement A feature request - must adhere to the feature request template. severity/P1 Has a major impact to usage or development of the system.
Projects
None yet
Development

No branches or pull requests

5 participants