e2e-aws: stream bootkube and 'kubectl get all' to artifacts #2064
Conversation
Force-pushed from 99cc41d to 557bca9 (Compare)
```shell
# stream bootkube into artifact file
mkdir -p /tmp/artifacts/bootstrap
ssh -o "StrictHostKeyChecking=no" core@${ip} sudo journal -u bootkube 2>&1 > /tmp/artifacts/bootstrap/bootkube.log
```
This container is named `stream-kubectl-get-all`, but it is streaming `journal -u bootkube`? And it looks like your earlier container is streaming bootkube too?
Typo. I had planned two containers, but then noticed the amount of shared code and merged them. Will fix.
fixed
```shell
# wait for the bootstrap node to show up with an IP
ip=""
while [ -z "${ip}" ]; do
    ip=$(terraform state show -state=terraform.tfstate module.bootstrap.aws_instance.bootstrap | sed -n 's/^public_ip *= *//p')
```
@smarterclayton wanted this IP in the build logs (I think?), I'm just not clear if there's a way to get information there from a pod container. Any ideas?
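A side note on the polling loop quoted above: as written it has no timeout, so a bootstrap node that never gets a public IP would spin forever. A hedged sketch of the same pattern with a bounded retry count, using a hypothetical `lookup_ip` stand-in for the real `terraform state show ... | sed` pipeline (names and numbers are mine, not from the PR):

```shell
#!/usr/bin/env bash
# Sketch only: `lookup_ip` is a stand-in for the real terraform pipeline.
# In the real script it would return "" until the bootstrap node is up.
lookup_ip() { echo "203.0.113.10"; }

ip=""
for _ in $(seq 1 60); do          # bound the wait instead of looping forever
    ip=$(lookup_ip)
    [ -n "${ip}" ] && break
    sleep 5
done
echo "bootstrap ip: ${ip:-<none>}"
```

With a bound like this, a wedged bootstrap fails the step after a few minutes instead of hanging the whole job.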
```shell
    stream-kubectl-get-all ${ip} &
}

stream &
```
`stream` is backgrounding the long-running resource streamers internally, so we can probably drop the `&` here.
fixed
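To illustrate the earlier point about dropping the `&`: when a function backgrounds its long-running work itself, the caller gets control back immediately, so an outer `&` only adds a useless subshell. A minimal sketch (the `stream` stand-in here is mine, not the PR's actual function):

```shell
#!/usr/bin/env bash
# Sketch: a function that backgrounds its worker internally, as the PR's
# `stream` does with its journal streamers, returns to the caller at once.
log=$(mktemp)

stream() {
    { sleep 1; echo "worker done" >>"${log}"; } &   # backgrounded inside
}

stream                                # returns immediately; no trailing & needed
echo "stream returned" >>"${log}"    # runs well before the worker finishes
wait                                  # reap the backgrounded worker
order=$(tr '\n' ',' <"${log}")
echo "${order}"
rm -f "${log}"
```

The recorded order shows the caller continuing before the worker completes, which is exactly why `stream &` would be redundant.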
Force-pushed from 557bca9 to 2f2b226 (Compare)
```shell
function stream-bootkube () {
    ip=${1}
    while true; do
        ssh -o "StrictHostKeyChecking=no" core@${ip} sudo journal -u bootkube 2>&1 >> /tmp/artifacts/bootstrap/bootkube.log
```
nit: can we use the fully-qualified `bootkube.service`?

Also, this redirect isn't quite right, since you redirect stderr into stdout before adjusting stdout. For example:

```console
$ (echo hi >&2) 2>&1 >>/tmp/bootkube.log
hi
```

You should instead redirect stdout first, and then redirect stderr into stdout:

```console
$ (echo hi >&2) >>/tmp/bootkube.log 2>&1
$ cat /tmp/bootkube.log
hi
```
I moved the file redirection out of the loop. Makes it slightly easier to read.
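The redirect-order rule is easy to verify locally, independent of the PR's ssh pipeline. A small self-contained demo:

```shell
#!/usr/bin/env bash
# Demo of the redirect-order point above, using a throwaway local log file.
log=$(mktemp)

( echo "oops" >&2 ) 2>&1 >>"${log}"   # wrong: stderr was pointed at the *old*
                                      # stdout, so "oops" escapes the log
( echo "oops" >&2 ) >>"${log}" 2>&1   # right: stdout first, then stderr follows it

captured=$(grep -c . "${log}")        # only the second "oops" landed in the log
echo "lines captured: ${captured}"
rm -f "${log}"
```

Redirections are processed left to right, each duplicating the file descriptor's *current* target, which is why the order matters.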
```shell
    if [[ -f /tmp/shared/exit ]]; then
        exit 0
    fi
    sleep 60 & wait
```
I don't understand the backgrounded sleep here.

```console
$ (sleep 10 && echo 'done sleeping') & (wait && echo 'done waiting')
done waiting # this shows up very quickly
done sleeping # this is delayed by 10 seconds
```

Why have a non-blocking `sleep`? I'd expect `sleep` here with a trailing `kill`:

```shell
for i in $(seq 1 120); do
    if [[ -f /tmp/shared/exit ]]; then
        break
    fi
    sleep 60
done
kill-streams
```

where `kill-streams` had internal `wait` calls (possibly using explicit PIDs).
The loop with the sleep is copied from the teardown script below.
The `kill-streams` is not necessary due to the `TERM` trap.
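For context on the `TERM` trap: `sleep 60 & wait` is a common idiom precisely because a trap cannot run while the shell is blocked in a foreground external `sleep`, but a signal does interrupt the builtin `wait`, letting the trap fire at once. A hedged, self-contained sketch with shortened timings (not the PR's actual code):

```shell
#!/usr/bin/env bash
# Demonstrate that `sleep N & wait` lets a TERM trap fire promptly.
out=$(mktemp)

worker() {
    trap 'kill "${spid}" 2>/dev/null; echo "trap ran" >>"${out}"; exit 0' TERM
    sleep 60 >/dev/null & spid=$!
    wait                             # interruptible; a foreground sleep is not
    echo "never reached" >>"${out}"
}

worker &                             # run the worker as a background job
pid=$!
sleep 1                              # give it time to install the trap
kill -TERM "${pid}"
wait "${pid}"
result=$(cat "${out}")
echo "${result}"
rm -f "${out}"
```

The worker reacts to `TERM` within milliseconds instead of after up to 60 seconds, which is the behavior the script's trap relies on.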
Force-pushed from 2f2b226 to 99c4bec (Compare)
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: sttts. If they are not already assigned, you can assign the PR to them. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
@wking addressed your comments. Still wondering how to test the script; it's only blindly written right now. How do you usually do that?
There are some docs here. Personally I've been using the "hope it works and file patches if it turns out to be broken" approach unless I'm doing something big 😊
```shell
    while true; do
        ssh -o "StrictHostKeyChecking=no" core@${ip} sudo journalctl -u bootkube.service -f --no-tail 2>&1
        echo "=================== journalctl terminated ==================="
        date
```
Can we embed this in the termination marker? It feels like it might be associated with the next iteration's logs if it comes after a big banner. Something like:

```shell
echo "========== journalctl terminated $(date --iso=s --utc) =========="
```

would do it.
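For what it's worth, `--iso=s` is GNU date's abbreviated spelling of `--iso-8601=seconds`, so the suggestion is straightforward to adapt. A small sketch (assumes GNU coreutils `date`, which the CI image would have):

```shell
#!/usr/bin/env bash
# Build the termination marker with the timestamp inside the banner,
# so it can't be read as part of the next iteration's logs.
marker="========== journalctl terminated $(date --iso-8601=seconds --utc) =========="
echo "${marker}"
```

Keeping the timestamp inside the banner keys the `date` output to the iteration that just ended, rather than leaving a bare `date` line floating between log blocks.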
```shell
cp /etc/openshift-installer/ssh-privatekey .ssh/id_rsa
cp /etc/openshift-installer/ssh-publickey .ssh/id_rsa.pub

function stream-bootkube () {
```
This is super complicated. Why isn't the installer fetching the bootkube logs if anything fails?

This is way more complicated than I would like. If you can't debug a failure of bootstrap from the installer logs, I think we're doing something wrong. I don't really see the value in `watch -w` over possibly just dumping all the objects at the beginning of teardown.
openshift/installer#967 is work towards the installer being able to collect
For successful runs, the bootstrap node will be gone by the time the teardown CI container starts going through its log collection. But yeah, you could put something there to attempt to grab the bootstrap logs, and have it only succeed when bootstrapping hung.
Obsoleted by #2633?
@sttts: The following tests failed, say

Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@sttts: The following test failed, say

Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/close
@sttts: Closed this PR. In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This PR should make issues during bootstrapping much easier to understand.