CI: Improve e2e tests reliability #343

Merged
merged 5 commits into from
Feb 6, 2024
Changes from 3 commits
2 changes: 1 addition & 1 deletion .github/workflows/ccruntime_e2e.yaml
@@ -62,7 +62,7 @@ jobs:
if [ $RUNNING_INSTANCE = "s390x" ]; then
args=""
fi
-./run-local.sh -r "${{ matrix.runtimeclass }}" "${args}"
+./run-local.sh -t -r "${{ matrix.runtimeclass }}" "${args}"
Member

Yeah, it makes sense to have the timeout in the script because we don't map these steps to GitHub job steps; otherwise the timeouts could be set in the GitHub workflow. This might change when we address #309.

env:
RUNNING_INSTANCE: ${{ matrix.instance }}

13 changes: 6 additions & 7 deletions tests/e2e/operator.sh
@@ -164,9 +164,9 @@ uninstall_ccruntime() {
popd >/dev/null

# Wait and ensure ccruntime pods are gone
-#
-local cmd="! sudo -E kubectl get pods -n $op_ns |"
-cmd+="grep -q -e cc-operator-daemon-install"
+# (ensure failing kubectl keeps iterating)
Member

Good catch!

+local cmd="_OUT=\$(sudo -E kubectl get pods -n '$op_ns')"
+cmd+=" && ! echo \$_OUT | grep -q -e cc-operator-daemon-install"
cmd+=" -e cc-operator-pre-install-daemon"
if ! wait_for_process 720 30 "$cmd"; then
echo "ERROR: there are ccruntime pods still running"
@@ -242,10 +242,9 @@ uninstall_operator() {
popd >/dev/null

# Wait and ensure the controller pod is gone
-#
-local pod="cc-operator-controller-manager"
-local cmd="! kubectl get pods -n $op_ns |"
-cmd+="grep -q $pod"
+# (ensure failing kubectl keeps iterating)
+local cmd="_OUT=\$(sudo -E kubectl get pods -n '$op_ns')"
+cmd+="&& ! echo \$_OUT | grep -q -e cc-operator-controller-manager"
if ! wait_for_process 180 30 "$cmd"; then
echo "ERROR: the controller manager is still running"
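
The change to both wait loops hinges on how a pipeline's exit status is computed. The following is a minimal sketch, not code from this PR: `failing_kubectl` is a hypothetical stand-in for a `kubectl get pods` call that fails (for example, because the API server is unreachable), and the `old_check`/`new_check` names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical stub: simulates kubectl failing without printing any pods.
failing_kubectl() { echo "connection refused" >&2; return 1; }

# Old form: without pipefail, the pipeline status is grep's status.
# grep finds nothing (status 1) and "!" inverts it to success, so a
# failing kubectl looks the same as "no matching pods left".
old_check() {
    ! failing_kubectl 2>/dev/null | grep -q -e cc-operator-daemon-install
}

# New form: the assignment carries kubectl's exit status, so "&&"
# short-circuits on failure and the check as a whole fails.
new_check() {
    local out
    out=$(failing_kubectl 2>/dev/null) &&
        ! echo "$out" | grep -q -e cc-operator-daemon-install
}

if old_check; then echo "old: pods reported gone"; else echo "old: keep waiting"; fi
if new_check; then echo "new: pods reported gone"; else echo "new: keep waiting"; fi
```

With the stub above, the old form reports the pods as gone while the new form keeps waiting, which appears to be the behavior the "(ensure failing kubectl keeps iterating)" comment refers to.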

3 changes: 3 additions & 0 deletions tests/e2e/operator_tests.bats
@@ -8,6 +8,9 @@
load "${BATS_TEST_DIRNAME}/lib.sh"
test_tag="[cc][operator]"

+# Set 10m timeout for each test
+export BATS_TEST_TIMEOUT=600

is_operator_installed() {
[ "$(kubectl get deployment -n "$ns" --no-headers 2>/dev/null | wc -l)" \
-gt 0 ]
28 changes: 20 additions & 8 deletions tests/e2e/run-local.sh
@@ -15,6 +15,7 @@ step_start_cluster=0
step_install_operator=0
runtimeclass=""
undo="false"
+timeout="false"

usage() {
cat <<-EOF
@@ -29,36 +30,47 @@ usage() {
the tests. Defaults to "kata-qemu".
-u: undo the installation and configuration before exiting. Useful for
baremetal machine were it needs to do clean up for the next tests.
+ -t: enable default timeout for each operation (useful for CI)
EOF
}

parse_args() {
-while getopts "hr:u" opt; do
+while getopts "hr:ut" opt; do
case $opt in
h) usage && exit 0;;
r) runtimeclass="$OPTARG";;
u) undo="true";;
+t) timeout="true";;
*) usage && exit 1;;
esac
done
}

+run() {
+duration=$1; shift
+if [ "$timeout" == "true" ]; then
+timeout $duration "$@"
Member

What if it prints a friendly message (e.g. "Run timed out after XX") when it times out, i.e. when $? -eq 124?

Contributor Author

That could open a can of worms: the script itself can return 124, so we'd have to add logic to measure the actual elapsed time (we could use $SECONDS, so it's not that extensive, but still) and then report "Run probably timed out after XXXs" only when the elapsed time matches the timeout. Do you want me to add it, or shall we rely on the log timestamps only?

Member

Hmm... we'd better leave it as is. If we start seeing too many timeouts and that proves confusing, we can change it then.

+else
+"$@"
+fi
+}
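
For illustration only, the friendly message discussed in the review thread on `run()` could look roughly like the sketch below. The reviewers agreed to leave `run()` as is, so this is not part of the PR. `run_with_hint` is a hypothetical name, and the sketch assumes the duration is passed as whole seconds (the PR passes values such as `10m`, which would need converting first).

```shell
#!/usr/bin/env bash
# Hypothetical variant of run(); assumes $1 is a duration in whole seconds.
run_with_hint() {
    local duration=$1; shift
    local start=$SECONDS rc=0
    timeout "$duration" "$@" || rc=$?
    # timeout(1) exits with 124 when it stops the command, but the command
    # itself may also exit with 124, so cross-check the elapsed time.
    if [ "$rc" -eq 124 ] && [ $((SECONDS - start)) -ge "$duration" ]; then
        echo "WARN: run probably timed out after ${duration}s" >&2
    fi
    return "$rc"
}
```

Under these assumptions, `run_with_hint 1 sleep 3` would print the warning and return 124, whereas `run_with_hint 5 sh -c 'exit 124'` would return 124 without a warning because the elapsed time is shorter than the limit. The message stays hedged ("probably") since a command that exits with 124 on its own right at the deadline cannot be told apart from a real timeout.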

undo_changes() {
pushd "$script_dir" >/dev/null
# Do not try to undo steps that did not execute.
if [ $step_install_operator -eq 1 ]; then
echo "INFO: Uninstall the operator"
-sudo -E PATH="$PATH" bash -c './operator.sh uninstall' || true
+run 10m sudo -E PATH="$PATH" bash -c './operator.sh uninstall' || true
fi

if [ $step_start_cluster -eq 1 ]; then
echo "INFO: Shutdown the cluster"
-sudo -E PATH="$PATH" bash -c './cluster/down.sh' || true
+run 5m sudo -E PATH="$PATH" bash -c './cluster/down.sh' || true
fi

if [ $step_bootstrap_env -eq 1 ]; then
echo "INFO: Undo the bootstrap"
-ansible-playbook -i localhost, -c local --tags undo ansible/main.yml || true
+run 5m ansible-playbook -i localhost, -c local --tags undo ansible/main.yml || true
fi
popd >/dev/null
}
@@ -87,19 +99,19 @@ main() {
pushd "$script_dir" >/dev/null
echo "INFO: Bootstrap the local machine"
step_bootstrap_env=1
-ansible-playbook -i localhost, -c local --tags untagged ansible/main.yml
+run 10m ansible-playbook -i localhost, -c local --tags untagged ansible/main.yml

echo "INFO: Bring up the test cluster"
step_start_cluster=1
-sudo -E PATH="$PATH" bash -c './cluster/up.sh'
+run 10m sudo -E PATH="$PATH" bash -c './cluster/up.sh'
export KUBECONFIG=/etc/kubernetes/admin.conf

echo "INFO: Build and install the operator"
step_install_operator=1
-sudo -E PATH="$PATH" bash -c './operator.sh'
+run 20m sudo -E PATH="$PATH" bash -c './operator.sh'

echo "INFO: Run tests"
-cmd="sudo -E PATH=\"$PATH\" bash -c "
+cmd="run 20m sudo -E PATH=\"$PATH\" bash -c "
if [ -z "$runtimeclass" ]; then
cmd+="'./tests_runner.sh'"
else