This repository has been archived by the owner on May 12, 2021. It is now read-only.

tests: parallel images build support #170

Merged: 2 commits, Oct 2, 2018

Conversation

Contributor

@marcov marcov commented Sep 17, 2018

Rework test_images.sh and Makefile to allow building images in parallel
for faster test execution.
Add some new targets to Makefile ({rootfs,image,initrd}-all,
list-distros).

Fixes: #168

Contributor

@jodh-intel jodh-intel left a comment

Nice! I'd like to give this a try later on. +1 for creating a tests/test_config.sh btw! ;)

/cc @jcvenegas.


local rootfs_size=$(du -sb "${rootfs}" | awk '{print $1}')
echo "will build with tgts: ${makeTargets[@]}"
Contributor

s/tgts/targets/

then
# Images need systemd
[ "$opt" = "init" ] && continue
sudo -E make -j ${makeTargets[@]} ${makeVars[@]}
Contributor

This will saturate the CI system nicely but could effectively DoS a dev system. It's extra logic, but could you add a check for the CI variable:

  • if it's set, use -j.
  • if it isn't set, use "-j $cpus" where $cpus is the value of $(nproc) - 1.

In other words, retain 1 core for the user to be able to use their system ;)
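
A minimal sketch of the check requested above, not the code that was eventually merged. The helper name `make_jobs_flag` and its optional CPU-count argument are hypothetical, and it assumes CI systems export a non-empty `CI` variable:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): print the parallelism flag for make.
# Assumption: CI systems export a non-empty CI variable.
make_jobs_flag() {
    local ncpus="${1:-$(nproc)}"   # optional override, mainly for testing

    if [ -n "${CI:-}" ]; then
        echo "-j"                  # CI: let make run unlimited jobs
        return 0
    fi

    local jobs=$(( ncpus - 1 ))    # dev box: keep one core free
    if [ "$jobs" -lt 1 ]; then
        jobs=1                     # single-core machine: still run one job
    fi
    echo "-j ${jobs}"
}

make_jobs_flag
```

The output is meant to be expanded unquoted on the make command line, so that `-j` and the job count become separate arguments.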

/cc @grahamwhaley.

Contributor

+1 - we use nproc in other places as well etc.

Contributor Author

Actually, make is run by two bash processes in the background (to build in parallel with and without AGENT_INIT=yes). So, to play it safe, should it be $(nproc) / 2 - 1?

 4220 pts/2    S+     0:00  \_ /bin/bash ./tests/test_images.sh
 4455 pts/2    S+     0:00      \_ /bin/bash ./tests/test_images.sh
 4460 pts/2    S+     0:00      |   \_ sudo -E make -j image-fedora image-centos image-ubuntu image-clearlinux image-eu
 4462 pts/2    S+     0:00      |       \_ make -j image-fedora image-centos image-ubuntu image-clearlinux image-eulero
 4510 pts/2    S+     0:00      |           \_ /bin/bash /home/marco/go/src/github.com/kata-containers/osbuilder/rootfs
 4656 pts/2    Sl+    0:00      |           |   \_ docker run --env https_proxy= --env http_proxy= --env AGENT_VERSION=
 4511 pts/2    S+     0:00      |           \_ /bin/bash /home/marco/go/src/github.com/kata-containers/osbuilder/rootfs
 4683 pts/2    Sl+    0:00      |           |   \_ docker run --env https_proxy= --env http_proxy= --env AGENT_VERSION=
 4515 pts/2    S+     0:00      |           \_ /bin/bash /home/marco/go/src/github.com/kata-containers/osbuilder/rootfs
 4634 pts/2    Sl+    0:00      |           |   \_ docker run --env https_proxy= --env http_proxy= --env AGENT_VERSION=
 4527 pts/2    S+     0:00      |           \_ /bin/bash /home/marco/go/src/github.com/kata-containers/osbuilder/rootfs
 4559 pts/2    S+     0:00      |           |   \_ /bin/bash /home/marco/go/src/github.com/kata-containers/osbuilder/ro
 4561 pts/2    S+     0:00      |           |       \_ curl -sL https://download.clearlinux.org/latest
 4529 pts/2    S+     0:00      |           \_ /bin/bash /home/marco/go/src/github.com/kata-containers/osbuilder/rootfs
 4671 pts/2    Sl+    0:00      |               \_ docker run --env https_proxy= --env http_proxy= --env AGENT_VERSION=
 4456 pts/2    S+     0:00      \_ /bin/bash ./tests/test_images.sh
 4459 pts/2    S+     0:00          \_ sudo -E make -j initrd-alpine USE_DOCKER=true ROOTFS_BUILD_DEST=/tmp/osbuilder-t
 4461 pts/2    S+     0:00              \_ make -j initrd-alpine USE_DOCKER=true ROOTFS_BUILD_DEST=/tmp/osbuilder-test.
 4509 pts/2    S+     0:00                  \_ /bin/bash /home/marco/go/src/github.com/kata-containers/osbuilder/rootfs
 4680 pts/2    Sl+    0:00                      \_ docker run --env https_proxy= --env http_proxy= --env AGENT_VERSION=
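
The process tree above shows two background children of the test script, each running its own make. A rough sketch of that pattern follows, assuming the script waits on both jobs and fails if either one fails. The function names are hypothetical stand-ins, not taken from test_images.sh:

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the two build configurations
# (systemd-based images vs. AGENT_INIT=yes builds).
build_with_systemd()    { true; }
build_with_agent_init() { true; }

run_parallel_builds() {
    local pids=() pid failed=0

    # Start both builds in the background and remember their PIDs.
    build_with_systemd &
    pids+=("$!")
    build_with_agent_init &
    pids+=("$!")

    # Wait for each job; record a failure if any job exits non-zero.
    for pid in "${pids[@]}"; do
        wait "$pid" || failed=1
    done
    return "$failed"
}

run_parallel_builds && echo "all background builds succeeded"
```

Because two make instances run at once, any per-make job limit is effectively doubled, which is the reason for the nproc / 2 suggestion.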

Contributor

Maybe this won't be so bad, as I suspect most of the test is going to be network-bound, but we just need to ensure we don't kill a developer's box by running this test ;)

Contributor

Good thought, @marcov. Let's try nproc/2. Although... I have no idea how many procs our cloud test instances generally have - I can't see that info in a console log either. @chavafg - any ideas on that?
We might want to do some experimentation and monitor top before we decide on a setting - if we are net bound like @jodh-intel says, then we can bump it up.

Contributor Author

We should also add the CI variable definition to .travis.yml to take full advantage of the parallel building.


-h : Show this help message
-a : agent version DEFAULT: ${AGENT_VERSION} ENV: AGENT_VERSION
-h : show this help message
-l : list the supported Linux distributions
Contributor

Could you put this change into a separate commit as it's not strictly related to running the tests in parallel.

makeVars=()
makeTargets=()
for t in $@; do
if [[ "$t" =~ .+= ]]; then
Contributor

Wow, deep magic ;)

Contributor Author

I'll reformat a bit to make it easier to read
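
For readers puzzled by the regex in the snippet under review: `.+=` matches any argument that has at least one character before an `=`, which separates make variable assignments from target names. A small sketch follows. The wrapper function `split_make_args` and the bodies of the `if` branches are assumptions, since the quoted diff only shows the start of the loop:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the loop quoted in the review.
# The branch bodies are assumed; the diff only shows the loop header.
split_make_args() {
    makeVars=()
    makeTargets=()
    local t
    for t in "$@"; do
        if [[ "$t" =~ .+= ]]; then
            makeVars+=("$t")      # NAME=value -> make variable
        else
            makeTargets+=("$t")   # anything else -> make target
        fi
    done
}

split_make_args image-fedora USE_DOCKER=true initrd-alpine
printf 'targets: %s\n' "${makeTargets[*]}"
printf 'vars: %s\n' "${makeVars[*]}"
```

This prints `targets: image-fedora initrd-alpine` followed by `vars: USE_DOCKER=true`.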

@marcov
Contributor Author

marcov commented Sep 17, 2018

I think the 2 build configurations in .travis.yml with and without AGENT_INIT can be dropped now, as those are already built in parallel in test_images.sh.

Do you agree?

PS: looking at the current code on master and at the Travis build logs, I am not sure this split is actually done.

@jodh-intel
Contributor

I think the 2 build configurations in .travis.yml with and without AGENT_INIT can be dropped now, as those are already built in parallel in test_images.sh.

Sounds good to me - is that ok with you @bergwolf?

@grahamwhaley
Contributor

/cc @chavafg for thoughts.

@marcov marcov force-pushed the parallel-build branch 2 times, most recently from beaa586 to 5b478ac Compare September 17, 2018 13:52
@marcov marcov force-pushed the parallel-build branch 5 times, most recently from d7231be to aaa8c55 Compare September 21, 2018 15:13
@marcov marcov changed the title from "WIP -- osbuilder: parallel images build support" to "tests: parallel images build support" Sep 21, 2018
@marcov
Contributor Author

marcov commented Sep 21, 2018

Hey, could any of the approvers kindly kick the CI ? 😛

@grahamwhaley
Contributor

/test
was that the phrase...

@chavafg
Contributor

chavafg commented Sep 21, 2018

/test

@marcov
Contributor Author

marcov commented Sep 21, 2018

I added set -u to test_images.sh and the failure on jenkins-ci-centos-7 is definitely a bug in my code ($CI should be ${CI:-}).

/tmp/jenkins/workspace/kata-containers-osbuilder-centos-7-4-PR/go/src/github.com/kata-containers/osbuilder/.ci/../tests/test_images.sh: line 294: CI: unbound variable
/tmp/jenkins/workspace/kata-containers-osbuilder-centos-7-4-PR/go/src/github.com/kata-containers/osbuilder/.ci/../tests/test_images.sh: line 294: CI: unbound variable
ERROR: Background build job failed
INFO: ERROR: test failed

However, I am kind of surprised that CI is not defined on jenkins-ci-centos-7!
@chavafg can you verify if it is set in Jenkins config?
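
A minimal illustration of the fix mentioned above: under `set -u`, expanding an unset variable aborts the script, while `${CI:-}` substitutes an empty string instead. The variable `ci_state` is only for illustration:

```shell
#!/usr/bin/env bash
set -u       # treat expansion of unset variables as an error
unset CI     # simulate a node where CI is not defined

# A plain "$CI" here would abort with "CI: unbound variable".
# "${CI:-}" expands to the empty string when CI is unset or empty.
if [ -n "${CI:-}" ]; then
    ci_state="set"
else
    ci_state="unset"
fi
echo "CI is ${ci_state}"
```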

@chavafg
Contributor

chavafg commented Sep 21, 2018

@marcov you were right, CI was not defined in the jobs. I already updated them.
Thanks.

@chavafg
Contributor

chavafg commented Sep 21, 2018

/test

@chavafg
Contributor

chavafg commented Sep 21, 2018

we are getting this error, when running the tests:

docker: Error response from daemon: OCI runtime create failed: Failed to check if grpc server is working: rpc error: code = DeadlineExceeded desc = context deadline exceeded: unknown.

which I have seen lately in some bug reports from users.

@chavafg
Contributor

chavafg commented Sep 21, 2018

and found

/tmp/jenkins/workspace/kata-containers-osbuilder-fedora-27-PR/go/src/github.com/kata-containers/osbuilder/.ci/../tests/test_images.sh: line 189: ID_LIKE: unbound variable

on the fedora job.


source /etc/os-release

if [[ "$ID_LIKE" =~ suse ]]; then
Contributor

As @chavafg noticed, you'll need to set this variable before sourcing /etc/os-release for those distros that don't provide the variable in that file:

ID_LIKE=""

source /etc/os-release
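
The suggestion above can be tried without touching the real /etc/os-release. The sketch below sources a temporary stand-in file that, like the Fedora case reported in this thread, defines no ID_LIKE. The file contents and the `family` variable are assumptions for illustration:

```shell
#!/usr/bin/env bash
set -u

# Hypothetical stand-in for /etc/os-release on a distro that omits ID_LIKE.
os_release=$(mktemp)
printf 'ID=fedora\nVERSION_ID=27\n' > "$os_release"

# Give ID_LIKE a default before sourcing, so the later test
# does not trip "unbound variable" under set -u.
ID_LIKE=""
source "$os_release"
rm -f "$os_release"

if [[ "$ID_LIKE" =~ suse ]]; then
    family="suse"
else
    family="other"
fi
echo "ID=${ID} family=${family}"
```

An alternative with the same effect is to leave the file untouched and test `"${ID_LIKE:-}"` instead.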

Contributor Author

@chavafg @jodh-intel thanks.

About the DeadlineExceeded: is this related: kata-containers/runtime#702?

Contributor

Hi @marcov - yep, I think so.

@jodh-intel
Contributor

Hi @marcov - I spot a problem! If you look at the 16.04 log for example, and search for "INFO: osbuilder metadata file for", you'll see that multiple YAML files have been concatenated into osbuilder.yaml.

@marcov
Contributor Author

marcov commented Sep 24, 2018

@jodh-intel: no, that's actually the output of multiple files, but the spacing makes it look like it's a single file :). I'll try to make it more readable.

touch /tmp/osbuilder-test.UsE7M2P/rootfs-osbuilder/.euleros_rootfs.done
INFO: osbuilder metadata file for fedora:

---
osbuilder:
  url: "https://github.com/kata-containers/osbuilder"
  version: "unknown"
rootfs-creation-time: "2018-09-21T16:30:20.420795130+0000Z"
...
  agent-is-init-daemon: "no"
INFO: osbuilder metadata file for centos:

---
osbuilder:
  url: "https://github.com/kata-containers/osbuilder"
  version: "unknown"

@marcov
Contributor Author

marcov commented Sep 24, 2018

/test

@jodh-intel
Contributor

Ah - thanks, that would make it clearer ;)

@marcov marcov force-pushed the parallel-build branch 4 times, most recently from dc7fd27 to a5f6b76 Compare September 25, 2018 18:18
@marcov
Contributor Author

marcov commented Sep 27, 2018

/test

@marcov
Contributor Author

marcov commented Sep 27, 2018

Let's wait for 1.3.0 with urandom fix to be available on OBS before retesting...

@jodh-intel
Contributor

@marcov - 1.3.0 is out now! ;)

@jodh-intel
Contributor

...errm, well yes it is, but not on OBS yet... :)

@marcov
Contributor Author

marcov commented Sep 28, 2018

@jodh-intel, actually the 1.3.0 is on OBS too.

The problem is that the installation instructions still point to a legacy repo path, e.g. for Ubuntu.

And kata-manager is using those instructions to install Kata on Jenkins nodes.

And maybe this is another thing to consider in kata-containers/documentation#234?

@jodh-intel
Contributor

@marcov - gah - I think I got browser-cached when checking OBS! :)

Yep - we need more input on that thread - thanks for yours btw!

@marcov
Contributor Author

marcov commented Sep 28, 2018

PS: looks like including a comment with a URL containing / t e s t has the side effect of triggering the CI 🤣

@jodh-intel
Contributor

@chavafg - ^ 😄

@grahamwhaley
Contributor

Heh - the CI trigger is done via a regexp. Maybe we need/are missing a ^ on the front of the pattern for such cases.

@marcov
Contributor Author

marcov commented Sep 28, 2018

And a $ too :)

@jodh-intel
Contributor

... If I had a dollar for every caret... 😄

@chavafg
Contributor

chavafg commented Sep 28, 2018

ouch. maybe something like .*^/test$.* to support multilines? checking...
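
To illustrate the anchoring being discussed, here is a small sketch using grep, which applies `^` and `$` per line. It is not the CI trigger's actual configuration, whose regex engine and multiline semantics may differ. The function name `is_test_trigger` and the example URL are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical matcher: succeeds only if some whole line of the
# comment text is exactly "/test".
is_test_trigger() {
    printf '%s\n' "$1" | grep -Eq '^/test$'
}

# A bare trigger line matches, even inside a multi-line comment.
is_test_trigger $'retrying after the fix\n/test' && echo "triggers CI"

# A URL that merely contains the trigger text does not match.
is_test_trigger "see https://example.com/test/foo" || echo "does not trigger CI"
```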

@marcov
Contributor Author

marcov commented Oct 1, 2018

/test

@marcov
Contributor Author

marcov commented Oct 2, 2018

All green with the urandom fix.
I am removing from .travis.yml the AGENT_INIT=yes / no combinations, since now the agent / systemd builds are run in parallel by the test script directly.

marcov added 2 commits October 2, 2018 12:58
Rework test_images.sh and Makefile to allow building artifacts in
parallel for faster tests execution.
Add new targets to Makefile ({rootfs,image,initrd}-<distro name>).

Fixes: kata-containers#168

Signed-off-by: Marco Vedovati <mvedovati@suse.com>

Remove the AGENT_INIT = yes / no combinations from .travis.yml,
as test_images.sh is now running both builds in parallel.

Signed-off-by: Marco Vedovati <mvedovati@suse.com>
@jodh-intel
Contributor

@marcov - nice work! :)

@marcov
Contributor Author

marcov commented Oct 2, 2018

/test

@grahamwhaley
Contributor

Possible network related problem on 16.04 (or, could be our OBS). Giving it a nudge.

0 upgraded, 8 newly installed, 0 to remove and 81 not upgraded.
Need to get 60.8 MB of archives.
After this operation, 675 MB of additional disk space will be used.
Get:1 http://download.opensuse.org/repositories/home:/katacontainers:/releases:/x86_64:/master/xUbuntu_16.04  kata-containers-image 1.3.0-9 [30.1 MB]
Get:2 http://download.opensuse.org/repositories/home:/katacontainers:/releases:/x86_64:/master/xUbuntu_16.04  kata-ksm-throttler 1.3.0.git+6e903fb-11 [2,611 kB]
Get:3 http://download.opensuse.org/repositories/home:/katacontainers:/releases:/x86_64:/master/xUbuntu_16.04  kata-linux-container 4.14.67.12-10 [10.1 MB]
Get:4 http://download.opensuse.org/repositories/home:/katacontainers:/releases:/x86_64:/master/xUbuntu_16.04  kata-proxy 1.3.0+git.6ddb006-9 [724 kB]
Get:5 http://download.opensuse.org/repositories/home:/katacontainers:/releases:/x86_64:/master/xUbuntu_16.04  kata-shim 1.3.0+git.5fbf1f0-8 [2,297 kB]
Err:5 http://download.opensuse.org/repositories/home:/katacontainers:/releases:/x86_64:/master/xUbuntu_16.04  kata-shim 1.3.0+git.5fbf1f0-8
  Hash Sum mismatch
Get:6 http://download.opensuse.org/repositories/home:/katacontainers:/releases:/x86_64:/master/xUbuntu_16.04  qemu-lite 2.11.0+git.f886228056-12 [5,549 kB]
Get:7 http://download.opensuse.org/repositories/home:/katacontainers:/releases:/x86_64:/master/xUbuntu_16.04  qemu-vanilla 2.11.2+git.0982a56a55-12 [5,522 kB]
Get:8 http://download.opensuse.org/repositories/home:/katacontainers:/releases:/x86_64:/master/xUbuntu_16.04  kata-runtime 1.3.0+git.a786643-14 [3,887 kB]
Fetched 58.5 MB in 40s (1,457 kB/s)
INFO: ERROR: test failed

INFO: images:

total 0
INFO: rootfs:

ls: cannot access '/tmp/osbuilder-test.UpjOGiZ/rootfs-osbuilder': No such file or directory
Build step 'Execute shell' marked build as failure

@marcov
Contributor Author

marcov commented Oct 2, 2018

one more ack to go 🙂

Contributor

@grahamwhaley grahamwhaley left a comment

CIs are happy.
I'm going to presume @jcvenegas has read over this and has no issues.
lgtm

@grahamwhaley grahamwhaley merged commit caf485d into kata-containers:master Oct 2, 2018
4 participants