Conversation
Can one of the admins verify this patch?
looks great overall. I think it needs a tiny bit of cleanup
```groovy
passwordVariable: 'LOG_ANALYZER_PASSWORD',
usernameVariable: 'LOG_ANALYZER_USER'
),
file(credentialsId: 'tectonic-license', variable: 'LICENSE_PATH'),
```
this is huge. without this, we may be directly impacting Terraform's behavior unintentionally.
@squat could you elaborate?
What I’m saying is that this is a great improvement!
tests/run.sh
Outdated
```bash
#set -eo pipefail
exec 2>&1

#LICENSE_PATH=""
```
can we eliminate these?
tests/run.sh
Outdated
```bash
echo -e "\\e[36m Starting build process...\\e[0m"
bazel build tarball tests/smoke
#docker run --rm -v $PWD:$PWD:Z -w $PWD quay.io/coreos/tectonic-builder:bazel-v0.3 bazel build tarball tests/smoke
```
same here or add a note about who it's for
tests/run.sh
Outdated
```bash
### HANDLE SSH KEY ###
echo -e "\\e[36m Uploading SSH key-pair to AWS...\\e[0m"
if [ ! -f "$HOME/.ssh/id_rsa.pub" ]; then
#cat /dev/zero | ssh-keygen -b 2048 -t rsa -f $HOME/.ssh/id_rsa -q -N ""
```
delete this line
tests/run.sh
Outdated
```bash
#shellcheck disable=SC2034
SSH=$(ssh-keygen -b 2048 -t rsa -f "${HOME}/.ssh/id_rsa" -N "" < /dev/zero)
fi
aws ec2 import-key-pair --key-name "jenkins-${CLUSTER_NAME}" --public-key-material "file://$HOME/.ssh/id_rsa.pub"
```
is there a way around generating and uploading a new key for every run?
This behaviour is backported from the old CI framework. Also, we are running it in a docker container, so the only alternative would be to embed the ssh key into the docker container.
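(For context, one way to avoid the re-upload would be to make the import conditional on the key pair not already existing in AWS; a minimal sketch, not what this PR does:)

```bash
KEY_NAME="jenkins-${CLUSTER_NAME}"

# describe-key-pairs exits non-zero when the key pair is unknown,
# so the import only runs the first time this key name is seen
if ! aws ec2 describe-key-pairs --key-names "${KEY_NAME}" >/dev/null 2>&1; then
    aws ec2 import-key-pair \
        --key-name "${KEY_NAME}" \
        --public-key-material "file://${HOME}/.ssh/id_rsa.pub"
fi
```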
tests/run.sh
Outdated
```bash
echo -e "\\e[36m Running smoke test...\\e[0m"
SMOKE_TEST_OUTPUT=$(./smoke 2>&1)
#echo -e "\\e[36m Smoke tests finished with status:\\e[0m ${SMOKE_TEST_OUTPUT}"
##tectonic destroy --dir=$CLUSTER_NAME
```
remove these lines as well
Where are we setting these env variables? https://github.com/coreos/tectonic-installer/blob/master/tests/smoke/cluster_test.go#L36

This would be a great simplification. My main concern is that we are not showing any logs.
tests/run.sh
Outdated
```bash
#set -eo pipefail
exec 2>&1

#LICENSE_PATH=""
```
Remove comment
tests/run.sh
Outdated
```bash
tectonic install --dir="${CLUSTER_NAME}"
echo -e "\\e[36m Running smoke test...\\e[0m"
SMOKE_TEST_OUTPUT=$(./smoke 2>&1)
#echo -e "\\e[36m Smoke tests finished with status:\\e[0m ${SMOKE_TEST_OUTPUT}"
```
remove comment
tests/run.sh
Outdated
```bash
### HANDLE SSH KEY ###
echo -e "\\e[36m Uploading SSH key-pair to AWS...\\e[0m"
if [ ! -f "$HOME/.ssh/id_rsa.pub" ]; then
#cat /dev/zero | ssh-keygen -b 2048 -t rsa -f $HOME/.ssh/id_rsa -q -N ""
```
remove comment
@enxebre what do you mean that we are not showing any logs? There are plenty of logs in the jenkins job console, since every bash execution is set with …
Smoke test variables are added to the …
Hey @paulfantom, I mean the smoke test output seems truncated. As a developer I need to be able to visualise the list of tests passing/failing.
new smoke test image version
Now everything is also logged to a file, and it is possible to execute the script from a local machine by running …

@enxebre good call with the smoke tests! This should be fixed now, and the smoke test output should be printed in two places: …
As for the two other issues you mentioned: they weren't implemented in the new rspec test framework and they weren't mentioned to me before, which is why they aren't implemented here. Both of them could improve developer experience, but I fear they might obscure how CI works, and I wanted to keep it as simple as possible. I think this way a later migration to Prow can potentially be easier.
```bash
SMOKE_TEST_OUTPUT=$(./smoke -test.v --cluster | tee >(cat - >&5))
```

This is making the smoke tests silently fail when something is broken. E.g. check the logs for the latest test run.
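One way to keep such failures visible, sketched under the assumption that the script controls its own shell options: with `pipefail` set, the exit status of the command substitution reflects `./smoke` rather than `tee`, and the script can propagate it.

```bash
#!/bin/bash
set -o pipefail  # the pipeline now fails when ./smoke fails, not only when tee fails

exec 5>&1
SMOKE_TEST_OUTPUT=$(./smoke -test.v --cluster | tee >(cat - >&5))
status=$?  # non-zero if ./smoke failed, thanks to pipefail
echo "smoke test exit status: ${status}"
exit "${status}"
```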
tests/run.sh
Outdated
```
@@ -0,0 +1,85 @@
#!/bin/bash
#shellcheck disable=SC2155
```
Just to clarify: are we actually shellchecking this, or did you do it manually?
I did it manually
We used to shell check as part of the basic tests. We should add this back in ASAP
Later, in another PR, I'll add shellcheck to travis jobs.
@squat it looks like shellcheck wasn't enabled in the CI pipeline for at least 4 months, or it wasn't checking every `*.sh` file. It is just my guess, based on the fact that shellcheck prints errors in the `buildvars.sh` file, which was last modified 4 months ago.
@enxebre actually the problem isn't in the bash script itself, and it can be tested locally with a simple script (a simplification of …):

```bash
#!/bin/bash
set -eo pipefail
exec &> >(tee -a "test.log")
function clean() {
  echo "cleanup"
}
trap clean EXIT
exec 5>&1
FF=$(curl http://asd | tee >(cat - >&5))
```

However, the problem was in the Jenkinsfile. We are wrapping execution of …

Seems like I removed too much from the Jenkinsfile when cleaning 😄
```bash
export SMOKE_NETWORKING="canal"
export SMOKE_NODE_COUNT="7" # Sum of all nodes (etcd + master + worker)
export SMOKE_MANIFEST_PATHS="$(pwd)/$CLUSTER_NAME/generated"
exec 5>&1
```
Why do we need to redirect?
This (together with `SMOKE_TEST_OUTPUT=$(./smoke -test.v --cluster | tee >(cat - >&5))`) allows writing the output of the smoke tests to the variable and simultaneously to the screen.

How it works: `tee` gets the output from the `smoke` command and writes it to stdout (which is captured into the variable); it also tries to write to a file, and in this case we use the process substitution `>(cat - >&5)` in place of a file. That writes a copy to file descriptor 5, but since fd 5 isn't printed on screen by default, we first need to tell the shell to redirect everything from fd 5 to stdout (`exec 5>&1`).
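A minimal self-contained demonstration of the pattern (the `printf` stands in for the real `smoke` command):

```bash
#!/bin/bash
# duplicate the current stdout as fd 5 so process substitutions can reach it
exec 5>&1

# tee writes one copy to stdout, which $(...) captures into the variable,
# and another copy to >(cat - >&5), which forwards it to fd 5 (the screen)
OUTPUT=$(printf 'line 1\nline 2\n' | tee >(cat - >&5))

echo "captured: ${OUTPUT}"
```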
tests/run.sh
Outdated
```bash
echo -e "\\e[34m So Long, and Thanks for All the Fish\\e[0m"
}
trap destroy EXIT

```
extra space here
Hey @paulfantom, dropped a few questions, otherwise lgtm. Also, the pipeline shows some step names that are not very meaningful, e.g. "Shell Script". Can we verify it is working as expected? It shows "curl: (22) The requested URL returned error: 422 Unprocessable Entity".
@enxebre this 422 error is from some script that was previously included in the pipeline. I don't really know what it does, but from the name it seems that it should update something in GitHub.
Hey @paulfantom, in a follow-up please let's make sure the script is still valid, or remove it otherwise.
37f623c (*: unify handling of ssh keys, 2018-08-14, openshift#127) replaced our old key-pair upload with the TF_VAR_tectonic_admin_ssh_key export, and updated the message from "Uploading SSH key-pair to AWS..." to our current "Generation SSH key-pair..." message. But while we used to *always* upload a key to AWS, we've only ever generated a new key if ~/.ssh/id_rsa.pub was missing. This commit moves the "Generating..." message into the if block to avoid freaking out callers who may think we're clobbering their SSH key ;).

While I'm in the area, I've also dropped the SSH variable and its associated SC2034 (unused variable) disable. The output of ssh-keygen isn't particularly interesting, so I've just set -q to quiet it instead. We'd had the old SSH and SC2034 disable since the script landed in a2405e4 (run smoke tests with bash script, 2018-06-18, coreos/tectonic-installer#3284).
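Read literally, that change leaves the key-handling block looking roughly like this (a sketch, not the verbatim patch):

```bash
if [ ! -f "${HOME}/.ssh/id_rsa.pub" ]; then
    echo -e "\\e[36m Generation SSH key-pair...\\e[0m"
    # -q quiets ssh-keygen, so the unused SSH variable and its
    # SC2034 shellcheck disable are no longer needed
    ssh-keygen -q -b 2048 -t rsa -f "${HOME}/.ssh/id_rsa" -N "" </dev/zero
fi
export TF_VAR_tectonic_admin_ssh_key="$(cat "${HOME}/.ssh/id_rsa.pub")"
```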
We've used the Python -> jq (-> jq) -> Python approach to editing the config file since this script landed in a2405e4 (run smoke tests with bash script, 2018-06-18, coreos/tectonic-installer#3284). But it's more efficient and almost equally compact to perform those edits directly in Python. This commit uses a here-document [1] to inject the Python script. [1]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_07_04
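A hedged sketch of the here-document approach (the file name and key below are hypothetical; the real edits differ):

```bash
python - "${CLUSTER_NAME}" <<'EOF'
import json
import sys

cluster_name = sys.argv[1]
path = 'config.json'  # hypothetical config file name

with open(path) as handle:
    config = json.load(handle)

# edits that previously required round-tripping through jq
config['tectonic_cluster_name'] = cluster_name  # hypothetical key

with open(path, 'w') as handle:
    json.dump(config, handle, indent=2)
EOF
```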
I'm not authorized to assume roles in the teamcoreservices account, and the assume-role call errors out with:

    An error occurred (AccessDenied) when calling the AssumeRole operation: Not authorized to perform sts:AssumeRole

That's fine though; I can still launch the cluster with my usual access. This commit makes the role assumption optional. I've made setting up the AWS_* access variables conditional on successful role assumption, because setting them based on an empty $RES wouldn't work ;). Using the && chain with a terminal || keeps the script from dying on this assume-role failure. From the 'set -e' docs [1]:

    The shell does not exit if the command that fails is ... part of any command executed in a && or || list except the command following the final && or ||...

I'm also only setting iamRoleName configs if assume-role succeeded. We've been setting iamRoleName since the script landed in a2405e4 (run smoke tests with bash script, 2018-06-18, coreos/tectonic-installer#3284) and possibly before that since 82daae1 (tests: add etcd role, 2018-03-14, coreos/tectonic-installer#3074). I don't see anything in those commits or PRs to motivate the iamRoleName entries, but I'd guess they, like the tf-tectonic-installer role, are specific to the Jenkins setup. I've tied them together with CONFIGURE_AWS_ROLES based on that similarity, although in theory you may be able to toggle the iamRoleName settings independently of assume-role success.

Even though the &&/|| chain sets CONFIGURE_AWS_ROLES=False when assume-role fails, I'm using ${CONFIGURE_AWS_ROLES:-False} in the Python script. That way, future versions of this script that support libvirt (or other backends) won't need to bother setting CONFIGURE_AWS_ROLES and will still get valid Python here. The :- syntax is specified in [2], and my expansion defaults to False if CONFIGURE_AWS_ROLES is unset or empty.

[1]: https://www.gnu.org/software/bash/manual/html_node/The-Set-Builtin.html
[2]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_02
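A minimal sketch of the &&/|| pattern described above (the account ID is a placeholder, and the exact variable plumbing is assumed):

```bash
CONFIGURE_AWS_ROLES=True
RES="$(aws sts assume-role \
    --role-arn "arn:aws:iam::123456789012:role/tf-tectonic-installer" \
    --role-session-name "jenkins-${CLUSTER_NAME}")" &&
AWS_ACCESS_KEY_ID="$(echo "${RES}" | jq -r .Credentials.AccessKeyId)" &&
AWS_SECRET_ACCESS_KEY="$(echo "${RES}" | jq -r .Credentials.SecretAccessKey)" &&
AWS_SESSION_TOKEN="$(echo "${RES}" | jq -r .Credentials.SessionToken)" &&
export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN ||
CONFIGURE_AWS_ROLES=False  # under set -e, this &&/|| list never kills the script
```

The Python side can then read `${CONFIGURE_AWS_ROLES:-False}`, which stays valid Python even when the variable is never set.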
The TF_VAR_* approach to SSH handling dates back to the initial script from a2405e4 (run smoke tests with bash script, 2018-06-18, coreos/tectonic-installer#3284). But since the current installer (installer/cmd/tectonic) and the next-gen installer (cmd/openshift-install) have established channels for passing in the SSH public key, there's no need to reach around and poke Terraform directly. I'm loading the file content from Python to avoid issues with escaping strings that are passed in via POSIX parameter expansion.
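The escaping point can be illustrated with a short sketch (only the read pattern is shown; the config write is elided):

```bash
python - "${HOME}/.ssh/id_rsa.pub" <<'EOF'
import sys

# reading the file from Python sidesteps the quoting problems that come
# with expanding $(cat ...) output directly into the Python source text
with open(sys.argv[1]) as handle:
    ssh_pub_key = handle.read().strip()

print(ssh_pub_key)  # the real change would store this in the installer config
EOF
```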
tests/run.sh: a file which assumes an AWS role, builds everything and deploys to AWS, runs the smoke test, and cleans up after itself. A bash script is used instead of the rspec framework to increase clarity and to reuse the bash commands described in the readme files.
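A hypothetical outline of that flow (the real tests/run.sh is more involved, and the cleanup details here are illustrative):

```bash
#!/bin/bash
set -eo pipefail

function destroy() {
    tectonic destroy --dir="${CLUSTER_NAME}"
    aws ec2 delete-key-pair --key-name "jenkins-${CLUSTER_NAME}"
}
trap destroy EXIT

bazel build tarball tests/smoke            # build the installer tarball and smoke binary
tectonic install --dir="${CLUSTER_NAME}"   # deploy the cluster to AWS
./smoke -test.v --cluster                  # run the smoke tests against it
```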