Improved access to Hook logs #86

Closed
jacobweinstock opened this issue Sep 21, 2021 · 6 comments · Fixed by #117
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/design Categorizes issue or PR as related to design. kind/feature Categorizes issue or PR as related to a new feature.

Comments

@jacobweinstock
Member

jacobweinstock commented Sep 21, 2021

As far as I can tell, Hook does not ship its logs off the machine in any way. This means that troubleshooting problems such as Hook failing to start or a workflow failing requires the user to access the console (there is no other way to access the machine that I'm aware of) and run docker commands. It would be nice to be able to optionally ship logs somewhere outside the machine, at least to start. OSIE did this via Syslog to Boots. I'm not proposing we do exactly this, per se, but it is an option.

(Something for the proposals repo) We probably need a more cohesive approach across the whole stack too.
`tink workflow events` and `tink workflow status` only tell us at a high level what is going on. Beyond that, with Hook today, the user has to access the console to debug issues. Exposing logs through a tink command, such as `tink logs` or similar, would be one way to go about it.

@jacobweinstock jacobweinstock added kind/design Categorizes issue or PR as related to design. kind/feature Categorizes issue or PR as related to a new feature. triage/discuss Indicates a PR or issue that requires discussion labels Sep 21, 2021
@jacobweinstock jacobweinstock changed the title from "Improved accessing to Hook logs" to "Improved access to Hook logs" Sep 24, 2021
@rgl

rgl commented Oct 7, 2021

FWIW, I'm playing with Loki and Grafana. This combo seems to have pretty nice potential, as it's somewhat pleasant to use.

But, I'm currently blocked by:

  1. "Allow out of order log submission" (grafana/loki#1544): until it is resolved, out-of-order log events cannot be aggregated. This is something that Loki will eventually support, so I really hope this problem goes away soon :-)
  2. LinuxKit DIND is somewhat hard to configure properly because of the docker (loki) plugin installation shenanigans. You can peek at the horror show at rgl@0d4e3bf, which I even had to comment out because it is somewhat broken. I will have to come up with another strategy, e.g. replace LinuxKit with Debian and use docker's built-in support for journal logging (and promtail to export the logs).

@rgl

rgl commented Oct 9, 2021

FWIW, I've now finished a PoC with Loki (which can be seen at https://github.com/rgl/rpi-tinkerbell-vagrant). It's not using LinuxKit Hook because that was too much trouble to configure. Instead I'm using Debian OSIE.

Here's a peek at Grafana Explore:

[image: screenshot of Grafana Explore]

Notice how easy it is to filter by a given worker_id :-)

Also notice it shows the logs for the install-os action, which could be easily filtered too.

@jacobweinstock
Member Author

Hey @thebsdbox, any thoughts here, by chance?

@thebsdbox
Contributor

Yeah, I've some work in progress, but I need to pick it back up.

@jimmyat

jimmyat commented Oct 28, 2021

Out of interest (and sorry to hijack this issue a little): is there currently a way to debug why a worker failed?

I've been trying to figure out how to diagnose a workflow failure but it doesn't look like there's a viable method to get access to the Docker-in-docker logs, which makes it impossible to see what went wrong.

@jacobweinstock
Member Author

Hey @jimmyat, no worries. You should be able to access the docker-in-docker container from the Hook console with `ctr -n services.linuxkit t exec -t --exec-id test docker sh`. That should drop you into a shell where you can interact with Docker as you might expect, so you can run things like `docker logs tink-worker`, etc.

Some more discussion on this can be found here.

@jacobweinstock jacobweinstock added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Nov 2, 2021
@tstromberg tstromberg removed the triage/discuss Indicates a PR or issue that requires discussion label Nov 16, 2021
@mergify mergify bot closed this as completed in #117 Apr 26, 2022
mergify bot added a commit that referenced this issue Apr 26, 2022
## Description


Currently, the only way to see container logs for workflows is to access the console of a machine. This makes troubleshooting difficult. This PR sends the tink-worker and workflow container logs back to a Syslog host.
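
As a rough illustration of the idea described above (this is not the code from this PR), the sketch below forwards container log lines to a remote Syslog host over UDP using Python's standard `logging.handlers.SysLogHandler`. The function names `make_syslog_logger` and `forward_lines`, the `tink-worker` tag, and the default port 514 are assumptions chosen for the example, not names taken from Hook or Tinkerbell.

```python
import logging
import logging.handlers


def make_syslog_logger(host, port=514, tag="tink-worker"):
    """Return a logger that sends each record to a remote syslog host over UDP.

    `host`, `port`, and `tag` are illustrative parameters, not Hook settings.
    """
    logger = logging.getLogger(f"hook.{tag}.{host}.{port}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # do not also print through the root logger
    if not logger.handlers:
        # SysLogHandler uses UDP by default when given a (host, port) tuple.
        handler = logging.handlers.SysLogHandler(address=(host, port))
        # `ident` is prepended to every message, acting as the syslog tag.
        handler.ident = f"{tag}: "
        logger.addHandler(handler)
    return logger


def forward_lines(lines, logger):
    """Forward an iterable of log lines, skipping blank ones.

    Returns the number of lines that were sent.
    """
    sent = 0
    for line in lines:
        line = line.rstrip("\n")
        if line:
            logger.info(line)
            sent += 1
    return sent
```

One possible use is to pipe container output into it, for example by calling `forward_lines(sys.stdin, make_syslog_logger("syslog.example.internal"))`, where the host name is a placeholder. UDP keeps the sender simple but gives no delivery guarantee, so a real deployment might prefer TCP or local buffering.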

## Why is this needed

Fixes: #81 
Fixes: #86 

## How Has This Been Tested?





## How are existing users impacted? What migration steps/scripts do we need?





## Checklist:

I have:

- [ ] updated the documentation and/or roadmap (if required)
- [ ] added unit or e2e tests
- [ ] provided instructions on how to upgrade