This repository has been archived by the owner on Feb 5, 2020. It is now read-only.

run smoke tests with bash script #3284

Merged
merged 8 commits into master from smoke_tests
Jun 18, 2018

Conversation

paulfantom
Contributor

@paulfantom paulfantom commented Jun 12, 2018

  • Shorter Jenkinsfile
  • add a tests/run.sh file which assumes the AWS role, builds everything, deploys to AWS, runs the smoke test, and cleans up after itself

A bash script is used instead of the rspec framework to increase clarity and to reuse the bash commands described in the readme files.
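
For orientation, here is a condensed sketch of the flow that tests/run.sh implements, reconstructed from the snippets quoted later in this review; the glue between those snippets is illustrative rather than the exact script:

#!/bin/bash
# Condensed sketch of tests/run.sh, pieced together from the review snippets.
set -eo pipefail

# Tear the cluster down no matter how the script exits.
destroy() {
    tectonic destroy --dir="${CLUSTER_NAME}"
    echo -e "\\e[34m So Long, and Thanks for All the Fish\\e[0m"
}
trap destroy EXIT

# Build the installer tarball and the smoke-test binary.
echo -e "\\e[36m Starting build process...\\e[0m"
bazel build tarball tests/smoke

# Generate an SSH key pair if absent, then upload it to AWS.
echo -e "\\e[36m Uploading SSH key-pair to AWS...\\e[0m"
if [ ! -f "$HOME/.ssh/id_rsa.pub" ]; then
    ssh-keygen -b 2048 -t rsa -f "${HOME}/.ssh/id_rsa" -N "" < /dev/zero
fi
aws ec2 import-key-pair --key-name "jenkins-${CLUSTER_NAME}" \
    --public-key-material "file://$HOME/.ssh/id_rsa.pub"

# Install the cluster, then run the smoke tests against it.
tectonic install --dir="${CLUSTER_NAME}"
echo -e "\\e[36m Running smoke test...\\e[0m"
./smoke -test.v --cluster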

@coreosbot

Can one of the admins verify this patch?

@paulfantom paulfantom changed the title from "[WIP] run smoke tests with bash script" to "run smoke tests with bash script" on Jun 13, 2018
Contributor

@squat squat left a comment


looks great overall. I think it needs a tiny bit of cleanup

passwordVariable: 'LOG_ANALYZER_PASSWORD',
usernameVariable: 'LOG_ANALYZER_USER'
),
file(credentialsId: 'tectonic-license', variable: 'LICENSE_PATH'),
Contributor


this is huge. without this, we may be directly impacting Terraform's behavior unintentionally.

Contributor Author


@squat could you elaborate?

Contributor


What I’m saying is that this is a great improvement!

tests/run.sh Outdated
#set -eo pipefail
exec 2>&1

#LICENSE_PATH=""
Contributor


can we eliminate these?

tests/run.sh Outdated

echo -e "\\e[36m Starting build process...\\e[0m"
bazel build tarball tests/smoke
#docker run --rm -v $PWD:$PWD:Z -w $PWD quay.io/coreos/tectonic-builder:bazel-v0.3 bazel build tarball tests/smoke
Contributor


same here, or add a note about who it's for

tests/run.sh Outdated
### HANDLE SSH KEY ###
echo -e "\\e[36m Uploading SSH key-pair to AWS...\\e[0m"
if [ ! -f "$HOME/.ssh/id_rsa.pub" ]; then
#cat /dev/zero | ssh-keygen -b 2048 -t rsa -f $HOME/.ssh/id_rsa -q -N ""
Contributor


delete this line

tests/run.sh Outdated
#shellcheck disable=SC2034
SSH=$(ssh-keygen -b 2048 -t rsa -f "${HOME}/.ssh/id_rsa" -N "" < /dev/zero)
fi
aws ec2 import-key-pair --key-name "jenkins-${CLUSTER_NAME}" --public-key-material "file://$HOME/.ssh/id_rsa.pub"
Contributor


is there a way around generating and uploading a new key for every run?

Contributor Author


This behaviour is backported from the old CI framework. Also, we are running this in a Docker container, so the only alternative would be to embed an SSH key into the Docker container.

tests/run.sh Outdated
echo -e "\\e[36m Running smoke test...\\e[0m"
SMOKE_TEST_OUTPUT=$(./smoke 2>&1)
#echo -e "\\e[36m Smoke tests finished with status:\\e[0m ${SMOKE_TEST_OUTPUT}"
##tectonic destroy --dir=$CLUSTER_NAME
Contributor


remove these lines as well

Member

@enxebre enxebre left a comment


Where are we setting these env variables? https://github.com/coreos/tectonic-installer/blob/master/tests/smoke/cluster_test.go#L36
This would be a great simplification. My main concern is that we are not showing any logs.

tests/run.sh Outdated
#set -eo pipefail
exec 2>&1

#LICENSE_PATH=""
Member


Remove comment

tests/run.sh Outdated
tectonic install --dir="${CLUSTER_NAME}"
echo -e "\\e[36m Running smoke test...\\e[0m"
SMOKE_TEST_OUTPUT=$(./smoke 2>&1)
#echo -e "\\e[36m Smoke tests finished with status:\\e[0m ${SMOKE_TEST_OUTPUT}"
Member


remove comment

tests/run.sh Outdated
### HANDLE SSH KEY ###
echo -e "\\e[36m Uploading SSH key-pair to AWS...\\e[0m"
if [ ! -f "$HOME/.ssh/id_rsa.pub" ]; then
#cat /dev/zero | ssh-keygen -b 2048 -t rsa -f $HOME/.ssh/id_rsa -q -N ""
Member


remove comment

@paulfantom
Contributor Author

@enxebre what do you mean that we are not showing any logs? There are plenty of logs in the Jenkins job console, since every bash execution is started with bash -xe.
As for the log-analyzer application, we don't need it anymore; together with the RA team we agreed that it can go away.

@paulfantom
Contributor Author

Smoke test variables are added to the tests/run.sh script

@enxebre
Member

enxebre commented Jun 13, 2018

Hey @paulfantom, I mean the smoke test output seems truncated. As a developer I need to be able to visualise the list of tests passing/failing.
Also, we currently dump a bunch of logs from inside the machines when something fails; while this might well be dispensable, we need to raise awareness.
Also, there's currently a retry policy to overcome transient networking issues.
Other than that, it's looking great to me.

new smoke test image version
@paulfantom
Contributor Author

paulfantom commented Jun 14, 2018

Now everything is also logged to a file, and it is possible to execute the script from a local machine by running ./tests/run.sh (it is important to run it from the top-level directory).

@enxebre good call with the smoke tests! This should be fixed now, and the smoke test output is printed in two places:

  • between tectonic install and tectonic destroy
  • at the end of the script, so we won't have to search for it.

As for the two other issues you mentioned, they weren't implemented in the new rspec test framework and weren't mentioned to me before, which is why they aren't implemented here. Both of them could improve the developer experience, but I fear they might obscure how the CI works, and I wanted to keep it as simple as possible. I think this way a later migration to Prow can be easier.

Member

@enxebre enxebre left a comment


SMOKE_TEST_OUTPUT=$(./smoke -test.v --cluster | tee >(cat - >&5))
This is making the smoke tests fail silently when something is broken, e.g. check the logs for the latest test run.

tests/run.sh Outdated
@@ -0,0 +1,85 @@
#!/bin/bash
#shellcheck disable=SC2155
Member


Just to clarify: are we actually shellchecking this, or did you do it manually?

Contributor Author


I did it manually

Contributor


We used to run shellcheck as part of the basic tests. We should add this back in ASAP.

Contributor Author


Later, in another PR, I'll add shellcheck to the Travis jobs.

Contributor Author

@paulfantom paulfantom Jun 17, 2018


@squat it looks like shellcheck wasn't enabled in the CI pipeline for at least 4 months, or it wasn't checking every *sh file. This is just my guess, based on the fact that shellcheck prints errors in the buildvars.sh file, which was last modified 4 months ago.

@paulfantom
Contributor Author

@enxebre actually the problem isn't in the bash script itself, and this can be tested locally with a simple script (a simplification of tests/run.sh, where the smoke tests are replaced by a failing curl execution):

#!/bin/bash
set -eo pipefail

# Mirror all output to a log file while still printing it.
exec &> >(tee -a "test.log")

# Cleanup runs on every exit, successful or not.
function clean() {
  echo "cleanup"
}
trap clean EXIT

# fd 5 duplicates stdout so the command substitution can tee to the screen.
exec 5>&1
FF=$(curl http://asd | tee >(cat - >&5))

However, the problem was in the Jenkinsfile. We wrap the execution of tests/run.sh in another bash shell, and this shell was previously started without the -e parameter, which must be enabled to propagate failures.

Seems like I removed too much from the Jenkinsfile when cleaning 😄
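
A minimal illustration of the difference, using false as a stand-in for a failing tests/run.sh (this is not the actual pipeline code):

bash -c 'false; echo "still running"'    # without -e: prints "still running" and exits 0
bash -ec 'false; echo "still running"'   # with -e: aborts at the failing command, exits 1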

export SMOKE_NETWORKING="canal"
export SMOKE_NODE_COUNT="7" # Sum of all nodes (etcd + master + worker)
export SMOKE_MANIFEST_PATHS="$(pwd)/$CLUSTER_NAME/generated"
exec 5>&1
Member


why do we need to redirect?

Contributor Author


This (with SMOKE_TEST_OUTPUT=$(./smoke -test.v --cluster | tee >(cat - >&5))) allows writing the output of the smoke tests to the variable and simultaneously to the screen.
How it works:
tee gets the output from the smoke command and writes it to stdout (this is what gets captured into the variable), and it also tries to write to a file. In this case we use the >(cat - >&5) process substitution in place of a file; it writes to file descriptor 5. Since fd 5 isn't connected to the screen by default, we first have to tell the shell to redirect everything from fd 5 to stdout (exec 5>&1).
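
A minimal, self-contained sketch of the pattern, with echo standing in for the smoke binary:

exec 5>&1                               # fd 5 now points at the original stdout
OUT=$(echo "hello" | tee >(cat - >&5))  # "hello" is shown live AND captured
echo "captured: ${OUT}"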

tests/run.sh Outdated
echo -e "\\e[34m So Long, and Thanks for All the Fish\\e[0m"
}
trap destroy EXIT

Member


extra space here

@enxebre
Member

enxebre commented Jun 18, 2018

hey @paulfantom dropped a few questions, otherwise lgtm. Also, the pipeline shows some step names that are not very meaningful, e.g. "Shell Script"; can we verify it is working as expected, since it shows "curl: (22) The requested URL returned error: 422 Unprocessable Entity", and update the name?
Also, let's squash the commits here before merging.

@paulfantom
Contributor Author

@enxebre this 422 error is from some script which was previously included in the pipeline. I don't really know what it does, but from the name it seems that it should update something on GitHub.
As for step names, there are frequent requests to allow some sort of aliasing, but it isn't supported yet (e.g. https://issues.jenkins-ci.org/browse/JENKINS-36933).

@enxebre
Member

enxebre commented Jun 18, 2018

Hey @paulfantom, in a follow-up please let's make sure the script is still valid, or remove it otherwise.
LGTM. @squat PTAL, thanks!

@paulfantom paulfantom merged commit a2405e4 into master Jun 18, 2018
@enxebre enxebre deleted the smoke_tests branch July 2, 2018 08:15
wking added a commit to wking/openshift-installer that referenced this pull request Aug 17, 2018
37f623c (*: unify handling of ssh keys, 2018-08-14, openshift#127) replaced
our old key-pair upload with the TF_VAR_tectonic_admin_ssh_key export,
and updated the message from "Uploading SSH key-pair to AWS..." to our
current "Generation SSH key-pair..." message.  But while we used to
*always* upload a key to AWS, we've only ever generated a new key if
~/.ssh/id_rsa.pub was missing.  This commit moves the
Generating... message into the if block to avoid freaking out callers
who may think we're clobbering their SSH key ;).

While I'm in the area, I've also dropped the SSH variable and its
associated SC2034 (unused variable) disable.  The output of ssh-keygen
isn't particularly interesting, so I've just set -q to quiet it
instead.  We'd had the old SSH and SC2034 disable since the script
landed in a2405e4 (run smoke tests with bash script, 2018-06-18,
coreos/tectonic-installer#3284).
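
A minimal sketch of the change this commit describes, based on the script lines quoted earlier in this review (the echoed message text is approximated):

if [ ! -f "$HOME/.ssh/id_rsa.pub" ]; then
    # The message now lives inside the if block: it is only printed when a
    # key is actually generated, and -q silences ssh-keygen's output.
    echo -e "\\e[36m Generating SSH key-pair...\\e[0m"
    ssh-keygen -q -b 2048 -t rsa -f "${HOME}/.ssh/id_rsa" -N "" < /dev/zero
fi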
wking added a commit to wking/openshift-installer that referenced this pull request Aug 23, 2018
We've used the Python -> jq (-> jq) -> Python approach to editing the
config file since this script landed in a2405e4 (run smoke tests with
bash script, 2018-06-18, coreos/tectonic-installer#3284).  But it's
more efficient and almost equally compact to perform those edits
directly in Python.

This commit uses a here-document [1] to inject the Python script.

[1]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_07_04
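
A minimal sketch of the here-document pattern the commit describes (the config path and the edit itself are illustrative, not the actual script):

python - "${CLUSTER_NAME}" <<'EOF'
import json
import sys

cluster_name = sys.argv[1]

# Edit the config directly in Python instead of piping through jq.
with open('config.json') as f:          # illustrative path
    config = json.load(f)
config['name'] = cluster_name           # illustrative edit
with open('config.json', 'w') as f:
    json.dump(config, f, indent=2)
EOF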
wking added a commit to wking/openshift-installer that referenced this pull request Aug 24, 2018
I'm not authorized to assume roles in the teamcoreservices account,
and the assume-role call errors out with:

  An error occurred (AccessDenied) when calling the AssumeRole
  operation: Not authorized to perform sts:AssumeRole

That's fine though; I can still launch the cluster with my usual
access.  This commit makes the role assumption optional.  I've made
setting up the AWS_* access variables conditional on successful role
assumption, because setting them based on an empty $RES wouldn't work
;).

Using the && chain with a terminal || keeps the script from dying on
this assume-role failure.  From the 'set -e' docs [1]:

  The shell does not exit if the command that fails is ... part of any
  command executed in a && or || list except the command following the
  final && or ||...

I'm also only setting iamRoleName configs if assume-role succeeded.
We've been setting iamRoleName since the script landed in a2405e4
(run smoke tests with bash script, 2018-06-18,
coreos/tectonic-installer#3284) and possibly before that since
82daae1 (tests: add etcd role, 2018-03-14,
coreos/tectonic-installer#3074).  I don't see anything in those
commits or PRs to motivate the iamRoleName entries, but I'd guess
they, like the tf-tectonic-installer role, are specific to the Jenkins
setup.  I've tied them together with CONFIGURE_AWS_ROLES based on that
similarity, although in theory you may be able to toggle the
iamRoleName settings independently of assume-role success.

Even though the &&/|| chain sets CONFIGURE_AWS_ROLES=False when
assume-role fails, I'm using ${CONFIGURE_AWS_ROLES:-False} in the
Python script.  That way, future versions of this script that support
libvirt (or other backends) won't need to bother setting
CONFIGURE_AWS_ROLES and will still get valid Python here.  The :-
syntax is specified in [2], and my expansion defaults to False if
CONFIGURE_AWS_ROLES is unset or empty.

[1]: https://www.gnu.org/software/bash/manual/html_node/The-Set-Builtin.html
[2]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_02
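
A minimal sketch of the &&/|| pattern this commit describes (the assume-role invocation is abbreviated, and ROLE_ARN is an assumed variable):

set -e
RES="$(aws sts assume-role --role-arn "${ROLE_ARN}" --role-session-name ci)" \
  && CONFIGURE_AWS_ROLES=True \
  || CONFIGURE_AWS_ROLES=False
# set -e does not abort here: the failing command is part of an && / || list.
echo "${CONFIGURE_AWS_ROLES:-False}"  # :- expands to False if unset or empty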
wking added a commit to wking/openshift-installer that referenced this pull request Sep 4, 2018
The TF_VAR_* approach to SSH handling dates back to the initial script
from a2405e4 (run smoke tests with bash script, 2018-06-18,
coreos/tectonic-installer#3284).  But since the current installer
(installer/cmd/tectonic) and the next-gen installer
(cmd/openshift-install) have established channels for passing in the
SSH public key, there's no need to reach around and poke Terraform
directly.

I'm loading the file content from Python to avoid issues with escaping
strings that are passed in via POSIX parameter expansion.
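
A minimal sketch of that loading pattern, with the public key read inside Python instead of being interpolated by the shell (the config handling is illustrative):

python - <<'EOF'
import os

# Read the key in Python: no shell-level quoting or escaping concerns.
with open(os.path.expanduser('~/.ssh/id_rsa.pub')) as f:
    ssh_key = f.read().strip()
# ssh_key can now be written into the installer config safely.
EOF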