This repository has been archived by the owner on Feb 5, 2020. It is now read-only.

run smoke tests with bash script #3284

Merged
merged 8 commits into master from smoke_tests
Jun 18, 2018

Conversation

paulfantom
Contributor

@paulfantom paulfantom commented Jun 12, 2018

  • Shorter Jenkinsfile
  • add a tests/run.sh file which assumes the AWS role, builds everything, deploys to AWS, runs the smoke test, and cleans up after itself

A bash script is used instead of the rspec framework to increase clarity and to reuse the bash commands described in the readme files.
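
For orientation, here is a condensed sketch of the flow that tests/run.sh implements, reconstructed from the snippets quoted later in this review; the glue between those snippets is illustrative rather than the exact script:

#!/bin/bash
# Condensed sketch of tests/run.sh, pieced together from the review snippets.
set -eo pipefail

# Tear the cluster down no matter how the script exits.
destroy() {
    tectonic destroy --dir="${CLUSTER_NAME}"
    echo -e "\\e[34m So Long, and Thanks for All the Fish\\e[0m"
}
trap destroy EXIT

# Build the installer tarball and the smoke-test binary.
echo -e "\\e[36m Starting build process...\\e[0m"
bazel build tarball tests/smoke

# Generate an SSH key pair if absent, then upload it to AWS.
echo -e "\\e[36m Uploading SSH key-pair to AWS...\\e[0m"
if [ ! -f "$HOME/.ssh/id_rsa.pub" ]; then
    ssh-keygen -b 2048 -t rsa -f "${HOME}/.ssh/id_rsa" -N "" < /dev/zero
fi
aws ec2 import-key-pair --key-name "jenkins-${CLUSTER_NAME}" \
    --public-key-material "file://$HOME/.ssh/id_rsa.pub"

# Install the cluster, then run the smoke tests against it.
tectonic install --dir="${CLUSTER_NAME}"
echo -e "\\e[36m Running smoke test...\\e[0m"
./smoke -test.v --cluster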

@coreosbot

Can one of the admins verify this patch?

@paulfantom paulfantom changed the title from "[WIP] run smoke tests with bash script" to "run smoke tests with bash script" on Jun 13, 2018
Contributor

@squat squat left a comment


looks great overall. I think it needs a tiny bit of cleanup

passwordVariable: 'LOG_ANALYZER_PASSWORD',
usernameVariable: 'LOG_ANALYZER_USER'
),
file(credentialsId: 'tectonic-license', variable: 'LICENSE_PATH'),
Contributor


this is huge. without this, we may be directly impacting Terraform's behavior unintentionally.

Contributor Author


@squat could you elaborate?

Contributor


What I’m saying is that this is a great improvement!

tests/run.sh Outdated
#set -eo pipefail
exec 2>&1

#LICENSE_PATH=""
Contributor


can we eliminate these?

tests/run.sh Outdated

echo -e "\\e[36m Starting build process...\\e[0m"
bazel build tarball tests/smoke
#docker run --rm -v $PWD:$PWD:Z -w $PWD quay.io/coreos/tectonic-builder:bazel-v0.3 bazel build tarball tests/smoke
Contributor


same here, or add a note about who it's for

tests/run.sh Outdated
### HANDLE SSH KEY ###
echo -e "\\e[36m Uploading SSH key-pair to AWS...\\e[0m"
if [ ! -f "$HOME/.ssh/id_rsa.pub" ]; then
#cat /dev/zero | ssh-keygen -b 2048 -t rsa -f $HOME/.ssh/id_rsa -q -N ""
Contributor


delete this line

tests/run.sh Outdated
#shellcheck disable=SC2034
SSH=$(ssh-keygen -b 2048 -t rsa -f "${HOME}/.ssh/id_rsa" -N "" < /dev/zero)
fi
aws ec2 import-key-pair --key-name "jenkins-${CLUSTER_NAME}" --public-key-material "file://$HOME/.ssh/id_rsa.pub"
Contributor


is there a way around generating and uploading a new key for every run?

Contributor Author


This behaviour is backported from the old CI framework. Also, we are running this in a Docker container, so the only alternative would be to embed an SSH key into the Docker container.

tests/run.sh Outdated
echo -e "\\e[36m Running smoke test...\\e[0m"
SMOKE_TEST_OUTPUT=$(./smoke 2>&1)
#echo -e "\\e[36m Smoke tests finished with status:\\e[0m ${SMOKE_TEST_OUTPUT}"
##tectonic destroy --dir=$CLUSTER_NAME
Contributor


remove these lines as well

Member

@enxebre enxebre left a comment


Where are we setting these env variables? https://github.com/coreos/tectonic-installer/blob/master/tests/smoke/cluster_test.go#L36
This would be a great simplification. My main concern is that we are not showing any logs.

tests/run.sh Outdated
#set -eo pipefail
exec 2>&1

#LICENSE_PATH=""
Member


Remove comment

tests/run.sh Outdated
tectonic install --dir="${CLUSTER_NAME}"
echo -e "\\e[36m Running smoke test...\\e[0m"
SMOKE_TEST_OUTPUT=$(./smoke 2>&1)
#echo -e "\\e[36m Smoke tests finished with status:\\e[0m ${SMOKE_TEST_OUTPUT}"
Member


remove comment

tests/run.sh Outdated
### HANDLE SSH KEY ###
echo -e "\\e[36m Uploading SSH key-pair to AWS...\\e[0m"
if [ ! -f "$HOME/.ssh/id_rsa.pub" ]; then
#cat /dev/zero | ssh-keygen -b 2048 -t rsa -f $HOME/.ssh/id_rsa -q -N ""
Member


remove comment

@paulfantom
Contributor Author

@enxebre what do you mean that we are not showing any logs? There are plenty of logs in the Jenkins job console, since every bash execution is started with bash -xe.
As for the log-analyzer application, we don't need it anymore; together with the RA team we agreed that it can go away.

@paulfantom
Contributor Author

Smoke test variables are added to the tests/run.sh script

@enxebre
Member

enxebre commented Jun 13, 2018

Hey @paulfantom, I mean the smoke test output seems truncated. As a developer I need to be able to visualise the list of tests passing/failing.
Also, we currently dump a bunch of logs from inside the machines when something fails; while this might well be dispensable, we need to raise awareness.
Also, there's currently a retry policy to overcome transient networking issues.
Other than that, it's looking great to me.

new smoke test image version
@paulfantom
Contributor Author

paulfantom commented Jun 14, 2018

Now everything is also logged to a file, and it is possible to execute the script from a local machine by running ./tests/run.sh (it is important to run it from the top-level directory).

@enxebre good call with the smoke tests! This should be fixed now, and the smoke test output is printed in two places:

  • between tectonic install and tectonic destroy
  • at the end of the script, so we won't have to search for it.

As for the two other issues you mentioned, they weren't implemented in the new rspec test framework and weren't mentioned to me before, which is why they aren't implemented here. Both of them could improve the developer experience, but I fear they might obscure how the CI works, and I wanted to keep it as simple as possible. I think this way a later migration to Prow can be easier.

Member

@enxebre enxebre left a comment


SMOKE_TEST_OUTPUT=$(./smoke -test.v --cluster | tee >(cat - >&5))
This is making the smoke tests fail silently when something is broken, e.g. check the logs for the latest test run.

tests/run.sh Outdated
@@ -0,0 +1,85 @@
#!/bin/bash
#shellcheck disable=SC2155
Member


Just to clarify: are we actually shellchecking this, or did you do it manually?

Contributor Author


I did it manually

Contributor


We used to run shellcheck as part of the basic tests. We should add this back in ASAP.

Contributor Author


Later, in another PR, I'll add shellcheck to the Travis jobs.

Contributor Author

@paulfantom paulfantom Jun 17, 2018


@squat it looks like shellcheck wasn't enabled in the CI pipeline for at least 4 months, or it wasn't checking every *sh file. This is just my guess, based on the fact that shellcheck prints errors in the buildvars.sh file, which was last modified 4 months ago.

@paulfantom
Contributor Author

@enxebre actually the problem isn't in the bash script itself, and this can be tested locally with a simple script (a simplification of tests/run.sh, where the smoke tests are replaced by a failing curl execution):

#!/bin/bash
set -eo pipefail

# Mirror all output to a log file while still printing it.
exec &> >(tee -a "test.log")

# Cleanup runs on every exit, successful or not.
function clean() {
  echo "cleanup"
}
trap clean EXIT

# fd 5 duplicates stdout so the command substitution can tee to the screen.
exec 5>&1
FF=$(curl http://asd | tee >(cat - >&5))

However, the problem was in the Jenkinsfile. We wrap the execution of tests/run.sh in another bash shell, and this shell was previously started without the -e parameter, which must be enabled to propagate failures.

Seems like I removed too much from the Jenkinsfile when cleaning 😄
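
A minimal illustration of the difference, using false as a stand-in for a failing tests/run.sh (this is not the actual pipeline code):

bash -c 'false; echo "still running"'    # without -e: prints "still running" and exits 0
bash -ec 'false; echo "still running"'   # with -e: aborts at the failing command, exits 1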

export SMOKE_NETWORKING="canal"
export SMOKE_NODE_COUNT="7" # Sum of all nodes (etcd + master + worker)
export SMOKE_MANIFEST_PATHS="$(pwd)/$CLUSTER_NAME/generated"
exec 5>&1
Member


why do we need to redirect?

Contributor Author


This (with SMOKE_TEST_OUTPUT=$(./smoke -test.v --cluster | tee >(cat - >&5))) allows writing the output of the smoke tests to the variable and simultaneously to the screen.
How it works:
tee gets the output from the smoke command and writes it to stdout (this is what gets captured into the variable), and it also tries to write to a file. In this case we use the >(cat - >&5) process substitution in place of a file; it writes to file descriptor 5. Since fd 5 isn't connected to the screen by default, we first have to tell the shell to redirect everything from fd 5 to stdout (exec 5>&1).
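
A minimal, self-contained sketch of the pattern, with echo standing in for the smoke binary:

exec 5>&1                               # fd 5 now points at the original stdout
OUT=$(echo "hello" | tee >(cat - >&5))  # "hello" is shown live AND captured
echo "captured: ${OUT}"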

tests/run.sh Outdated
echo -e "\\e[34m So Long, and Thanks for All the Fish\\e[0m"
}
trap destroy EXIT

Member


extra space here

@enxebre
Member

enxebre commented Jun 18, 2018

hey @paulfantom dropped a few questions, otherwise lgtm. Also, the pipeline shows some step names that are not very meaningful, e.g. "Shell Script"; can we verify it is working as expected, since it shows "curl: (22) The requested URL returned error: 422 Unprocessable Entity", and update the name?
Also, let's squash the commits here before merging.

@paulfantom
Contributor Author

@enxebre this 422 error is from some script which was previously included in the pipeline. I don't really know what it does, but from the name it seems that it should update something on GitHub.
As for step names, there are frequent requests to allow some sort of aliasing, but it isn't supported yet (e.g. https://issues.jenkins-ci.org/browse/JENKINS-36933).

@enxebre
Member

enxebre commented Jun 18, 2018

Hey @paulfantom, in a follow-up please let's make sure the script is still valid, or remove it otherwise.
LGTM. @squat PTAL, thanks!

@paulfantom paulfantom merged commit a2405e4 into master Jun 18, 2018
@enxebre enxebre deleted the smoke_tests branch July 2, 2018 08:15
wking added a commit to wking/openshift-installer that referenced this pull request Aug 17, 2018
37f623c (*: unify handling of ssh keys, 2018-08-14, openshift#127) replaced
our old key-pair upload with the TF_VAR_tectonic_admin_ssh_key export,
and updated the message from "Uploading SSH key-pair to AWS..." to our
current "Generation SSH key-pair..." message.  But while we used to
*always* upload a key to AWS, we've only ever generated a new key if
~/.ssh/id_rsa.pub was missing.  This commit moves the
Generating... message into the if block to avoid freaking out callers
who may think we're clobbering their SSH key ;).

While I'm in the area, I've also dropped the SSH variable and its
associated SC2034 (unused variable) disable.  The output of ssh-keygen
isn't particularly interesting, so I've just set -q to quiet it
instead.  We'd had the old SSH and SC2034 disable since the script
landed in a2405e4 (run smoke tests with bash script, 2018-06-18,
coreos/tectonic-installer#3284).
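
A minimal sketch of the change this commit describes, based on the script lines quoted earlier in this review (the echoed message text is approximated):

if [ ! -f "$HOME/.ssh/id_rsa.pub" ]; then
    # The message now lives inside the if block: it is only printed when a
    # key is actually generated, and -q silences ssh-keygen's output.
    echo -e "\\e[36m Generating SSH key-pair...\\e[0m"
    ssh-keygen -q -b 2048 -t rsa -f "${HOME}/.ssh/id_rsa" -N "" < /dev/zero
fi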
wking added a commit to wking/openshift-installer that referenced this pull request Aug 23, 2018
We've used the Python -> jq (-> jq) -> Python approach to editing the
config file since this script landed in a2405e4 (run smoke tests with
bash script, 2018-06-18, coreos/tectonic-installer#3284).  But it's
more efficient and almost equally compact to perform those edits
directly in Python.

This commit uses a here-document [1] to inject the Python script.

[1]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_07_04
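
A minimal sketch of the here-document pattern the commit describes (the config path and the edit itself are illustrative, not the actual script):

python - "${CLUSTER_NAME}" <<'EOF'
import json
import sys

cluster_name = sys.argv[1]

# Edit the config directly in Python instead of piping through jq.
with open('config.json') as f:          # illustrative path
    config = json.load(f)
config['name'] = cluster_name           # illustrative edit
with open('config.json', 'w') as f:
    json.dump(config, f, indent=2)
EOF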
wking added a commit to wking/openshift-installer that referenced this pull request Aug 24, 2018
I'm not authorized to assume roles in the teamcoreservices account,
and the assume-role call errors out with:

  An error occurred (AccessDenied) when calling the AssumeRole
  operation: Not authorized to perform sts:AssumeRole

That's fine though; I can still launch the cluster with my usual
access.  This commit makes the role assumption optional.  I've made
setting up the AWS_* access variables conditional on successful role
assumption, because setting them based on an empty $RES wouldn't work
;).

Using the && chain with a terminal || keeps the script from dying on
this assume-role failure.  From the 'set -e' docs [1]:

  The shell does not exit if the command that fails is ... part of any
  command executed in a && or || list except the command following the
  final && or ||...

I'm also only setting iamRoleName configs if assume-role succeeded.
We've been setting iamRoleName since the script landed in a2405e4
(run smoke tests with bash script, 2018-06-18,
coreos/tectonic-installer#3284) and possibly before that since
82daae1 (tests: add etcd role, 2018-03-14,
coreos/tectonic-installer#3074).  I don't see anything in those
commits or PRs to motivate the iamRoleName entries, but I'd guess
they, like the tf-tectonic-installer role, are specific to the Jenkins
setup.  I've tied them together with CONFIGURE_AWS_ROLES based on that
similarity, although in theory you may be able to toggle the
iamRoleName settings independently of assume-role success.

Even though the &&/|| chain sets CONFIGURE_AWS_ROLES=False when
assume-role fails, I'm using ${CONFIGURE_AWS_ROLES:-False} in the
Python script.  That way, future versions of this script that support
libvirt (or other backends) won't need to bother setting
CONFIGURE_AWS_ROLES and will still get valid Python here.  The :-
syntax is specified in [2], and my expansion defaults to False if
CONFIGURE_AWS_ROLES is unset or empty.

[1]: https://www.gnu.org/software/bash/manual/html_node/The-Set-Builtin.html
[2]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_02
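
A minimal sketch of the &&/|| pattern this commit describes (the assume-role invocation is abbreviated, and ROLE_ARN is an assumed variable):

set -e
RES="$(aws sts assume-role --role-arn "${ROLE_ARN}" --role-session-name ci)" \
  && CONFIGURE_AWS_ROLES=True \
  || CONFIGURE_AWS_ROLES=False
# set -e does not abort here: the failing command is part of an && / || list.
echo "${CONFIGURE_AWS_ROLES:-False}"  # :- expands to False if unset or empty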
wking added a commit to wking/openshift-installer that referenced this pull request Sep 4, 2018
The TF_VAR_* approach to SSH handling dates back to the initial script
from a2405e4 (run smoke tests with bash script, 2018-06-18,
coreos/tectonic-installer#3284).  But since the current installer
(installer/cmd/tectonic) and the next-gen installer
(cmd/openshift-install) have established channels for passing in the
SSH public key, there's no need to reach around and poke Terraform
directly.

I'm loading the file content from Python to avoid issues with escaping
strings that are passed in via POSIX parameter expansion.
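
A minimal sketch of that loading pattern, with the public key read inside Python instead of being interpolated by the shell (the config handling is illustrative):

python - <<'EOF'
import os

# Read the key in Python: no shell-level quoting or escaping concerns.
with open(os.path.expanduser('~/.ssh/id_rsa.pub')) as f:
    ssh_key = f.read().strip()
# ssh_key can now be written into the installer config safely.
EOF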