This repository has been archived by the owner on May 12, 2021. It is now read-only.

Add support for suse rootfs #161

Merged
merged 1 commit into from
Oct 9, 2018

Conversation

marcov
Contributor

@marcov marcov commented Sep 6, 2018

Fixes: #33

@@ -0,0 +1,20 @@
#
# Copyright (c) 2018 Intel Corporation
Contributor

SuSE? 😄

@@ -0,0 +1,8 @@
# This is a configuration file add extra variables to
Contributor

Needs an SPDX license header.

rootfs-builder/suse/rootfs_lib.sh
rootfs-builder/suse/config.xml
<specification>openSUSE Leap Kata</specification>
</description>
<preferences>
<version>1.0.0</version>
Contributor

OOI, what is this version for?

Contributor Author

that's a way to track the version of the config file itself

<type image="vmx" filesystem="ext4" bootloader="grub2" />
</preferences>
<repository type="rpm-md" alias="Leap_15_0">
<source path="obs://openSUSE:Leap:15.0/standard"/>
Contributor

Ideally, the two versions in this line and the one above (15) would be a parameter (and the file could be config.xml.in or something) to make this easy to update.
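
The parameterization suggested here could be sketched as a small template-rendering step. This is only an illustration: the file name config.xml.in comes from the comment above, while the placeholders @OS_VERSION@ and @OS_VERSION_ALIAS@ and the function render_config are hypothetical names that do not appear in the PR.

```shell
# Sketch: render config.xml from a template that holds version placeholders.
# ASSUMPTION: @OS_VERSION@ and @OS_VERSION_ALIAS@ are made-up placeholder
# names; the PR itself does not define a template format.
render_config() {
    template="$1"
    version="$2"
    # The repository alias uses underscores instead of dots (15.0 -> 15_0).
    alias_version=$(printf '%s' "${version}" | tr '.' '_')
    sed -e "s/@OS_VERSION@/${version}/g" \
        -e "s/@OS_VERSION_ALIAS@/${alias_version}/g" \
        "${template}"
}
```

With a template in place, something like `render_config config.xml.in 15.0 > config.xml` would regenerate the file, so a version bump touches a single value.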

@@ -401,6 +400,12 @@ test_distro_alpine()
run_test "${name}" "" "alpine" "no" "init"
}

test_distro_ubuntu()
Contributor

test_distro_suse() maybe 😄

Contributor Author

.....right 😳

@jodh-intel
Contributor

Thanks @marcov - this is looking good!

@marcov marcov force-pushed the suse-rootfs branch 5 times, most recently from c3d5d93 to b17fd4a Compare September 7, 2018 14:00
@marcov marcov changed the title [WIP, do not merge] Add support for suse rootfs Add support for suse rootfs Sep 7, 2018
@marcov
Contributor Author

marcov commented Sep 7, 2018

@jodh-intel tests are timing out?

@chavafg
Contributor

chavafg commented Sep 7, 2018

It seems to be getting stuck on:

Retrieving repository 'openSUSE-Leap-15.0-Non-Oss' metadata

I triggered a rebuild of the centos job; maybe it was a transient error.

@chavafg
Contributor

chavafg commented Sep 7, 2018

still stuck

I can reproduce locally:

fuentess@fuentess-skull ~/go/src/github.com/kata-containers/tests/integration/docker (topic/travis-add-os) $ sudo docker run -ti opensuse/leap:latest sh
sh-4.4# zypper --non-interactive install --force-resolution curl git gcc make python3-kiwi tar
Retrieving repository 'openSUSE-Leap-15.0-Non-Oss' metadata ----------------------------------------------------------------------------------------------------------------------------------------------------[\]
Timeout exceeded when accessing 'http://download.opensuse.org/distribution/leap/15.0/repo/non-oss/repodata/repomd.xml'.
Retrying in 30 seconds...
Timeout exceeded when accessing 'http://download.opensuse.org/distribution/leap/15.0/repo/non-oss/repodata/repomd.xml'.
Retrying in 30 seconds...

@marcov
Contributor Author

marcov commented Sep 7, 2018

For me, generating the builder image works fine.

But I get some slowdowns when building a rootfs with USE_DOCKER=1. I will check why...

@marcov marcov force-pushed the suse-rootfs branch 2 times, most recently from 081228e to e2bfc7b Compare September 11, 2018 08:23
@jodh-intel
Contributor

Hi @marcov - any further thoughts on this? Could you try re-pushing, as all the CIs are showing "build triggered" only :(

@marcov
Contributor Author

marcov commented Sep 11, 2018

Hi @jodh-intel, the Travis CI jobs were partly failing because of the lack of ppc64le support.
As for the various timeouts on Jenkins, I am still trying to understand the cause. I tested locally on a few different machines and can see that, for distros using RPM, things can get quite slow when running RPM inside a container. It may be related to moby/moby/issues/23137.

@marcov
Contributor Author

marcov commented Sep 11, 2018

Some more insights about the failures:

  • Travis ppc64le targets are timing out, possibly because of a maximum build duration cap of 30 minutes. I am seeing logs abruptly truncated, and this is also happening in the debian rootfs PR, osbuilder: Add support for debian rootfs #166.
  • Jenkins jobs seem to get stuck when running docker build on the builder image. I can't reproduce this locally and I am still trying to understand why.

@marcov marcov changed the title Add support for suse rootfs WIP - Add support for suse rootfs Sep 11, 2018
@jodh-intel
Contributor

OOI, why do we need to pull in non-oss packages?!? Or maybe we don't in which case can we disable that repo?

@marcov
Contributor Author

marcov commented Sep 11, 2018

The non-oss repos are enabled by default in the opensuse docker image. Let's try to disable them, and also make sure the --non-interactive flag is used everywhere.

Also, I temporarily disabled testing for other distros to avoid wasting resources while debugging this.
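
Disabling the non-oss repository, as proposed in this comment, could look roughly like the sketch below. The alias repo-non-oss is an assumption (running zypper repos inside the image shows the real aliases), and the ZYPPER variable and disable_repo function are hypothetical helpers added here to allow a dry run.

```shell
# Sketch: disable a repository non-interactively before installing packages.
# ASSUMPTION: the alias passed in (for example "repo-non-oss") must match
# what `zypper repos` reports inside the image.
ZYPPER="${ZYPPER:-zypper}"   # hypothetical hook, can be overridden for a dry run

disable_repo() {
    repo_alias="$1"
    # --non-interactive keeps zypper from waiting on prompts in CI.
    "${ZYPPER}" --non-interactive modifyrepo --disable "${repo_alias}"
}
```

Setting ZYPPER=echo before calling disable_repo prints the command instead of running it, which is handy when checking what a Dockerfile step would do.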

@marcov marcov force-pushed the suse-rootfs branch 2 times, most recently from 86545a7 to 04fe4ba Compare September 11, 2018 15:52
@jodh-intel
Contributor

Still failing:

Retrieving repository 'openSUSE-Leap-15.0-Oss' metadata [.Retrieving: http://download.opensuse.org/distribution/leap/15.0/repo/oss/media.1/media [done]
Retrieving: http://download.opensuse.org/distribution/leap/15.0/repo/oss/repodata/repomd.xml.asc [done]
Retrieving: http://download.opensuse.org/distribution/leap/15.0/repo/oss/repodata/repomd.xml.key 
[...
.......Build timed out (after 50 minutes). Marking the build as aborted.
Build was aborted

This looks pretty strange: zypper apparently downloaded the tiny (<1k) repomd.xml.asc quickly, but then spent 50 minutes trying, and failing, to download repomd.xml.key (also <1k).

You could try adding in some debug code before the call to zypper - something like:

$ curl -OLv http://download.opensuse.org/distribution/leap/15.0/repo/oss/repodata/repomd.xml.asc
$ curl -OLv http://download.opensuse.org/distribution/leap/15.0/repo/oss/repodata/repomd.xml.key

@marcov
Contributor Author

marcov commented Sep 12, 2018

@jodh-intel Unfortunately there's no curl in the vanilla opensuse/leap image.

@jodh-intel
Contributor

@marcov - wget -S instead?

@marcov
Contributor Author

marcov commented Sep 12, 2018

trying with a static build of curl

@marcov
Contributor Author

marcov commented Sep 12, 2018

@jodh-intel Whoa 😮

> GET /distribution/leap/15.0/repo/oss/repodata/repomd.xml.key HTTP/1.1
> User-Agent: curl/7.30.0
> Host: download.opensuse.org
> Accept: */*
> 

  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
 ...
  0     0    0     0    0     0      0      0 --:--:--  0:03:14 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:03:15 --:--:--     0* Recv failure: Connection reset by peer

  0     0    0     0    0     0      0      0 --:--:--  0:03:15 --:--:--     0
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer

@jodh-intel
Contributor

Weird. The only difference between those files, judging by the headers, is:

  • repomd.xml.asc: Content-Type: text/plain
  • repomd.xml.key: Content-Type: text/xml

@jodh-intel
Contributor

... but http://download.opensuse.org/distribution/leap/15.0/repo/oss/repodata/repomd.xml.key is NOT xml...?

@marcov marcov force-pushed the suse-rootfs branch 2 times, most recently from 2beef6e to d77e6c1 Compare September 25, 2018 08:31
@jodh-intel
Contributor

Ping @marcov 😄

@marcov
Contributor Author

marcov commented Sep 28, 2018

@jodh-intel I'd love to have this merged. There are two issues:

  1. RPM package installation for SUSE is very slow when running inside docker. I still don't know why, and because of this the build jobs time out. This is system dependent, as I can reproduce it on my system but not, e.g., on a debian distro.
  2. Sometimes package downloads fail because of bad mirrors. I can't reproduce this locally.

For 1., having parallel build support #170 could be a temporary solution while we figure out what the problem is.
Once 1. is solved, we can check whether 2. is still failing.

@jodh-intel
Contributor

@marcov - ack and thanks for the summary.

@marcov
Contributor Author

marcov commented Oct 2, 2018

> @jodh-intel I'd love to have this merged. There are two issues:
>
>   1. RPM package installation for SUSE is very slow when running inside docker. I still don't know why, and because of this the build jobs time out. This is system dependent, as I can reproduce it on my system but not, e.g., on a debian distro.

Found the cause, and this is now fixed upstream :)
openSUSE/zypper/pull/209

@grahamwhaley
Contributor

Ouch - good fix @marcov

@marcov
Contributor Author

marcov commented Oct 2, 2018

Given that the zypper fix will take a while to land in the openSUSE Leap repositories, I added ulimit -n 1024 as a workaround, i.e. a soft limit of 1024 on the number of file descriptors that a process can open.
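
A minimal sketch of how such a limit can be scoped to a single command follows. The helper name with_fd_limit is hypothetical and not taken from the PR; the -S option is used so that only the soft limit changes, matching the description above.

```shell
# Sketch: run one command with a lowered soft limit on open file descriptors.
# The subshell keeps the caller's own limit unchanged.
# ASSUMPTION: with_fd_limit is an illustrative helper, not code from the PR.
with_fd_limit() {
    limit="$1"
    shift
    (
        # -S changes only the soft limit; -n selects open file descriptors.
        ulimit -S -n "${limit}" && "$@"
    )
}
```

For example, `with_fd_limit 1024 zypper --non-interactive install tar` would apply the limit only to that zypper invocation.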

@marcov marcov changed the title WIP - Add support for suse rootfs Add support for suse rootfs Oct 2, 2018
@marcov
Contributor Author

marcov commented Oct 2, 2018

Let's see if the bad mirror is gone

/test

@marcov
Contributor Author

marcov commented Oct 2, 2018

nope, it's still around 😢

@jodh-intel
Contributor

Is there any way to blacklist this bad mirror I wonder?

@marcov
Contributor Author

marcov commented Oct 3, 2018

@jodh-intel: There isn't.
I'll try to fetch the metadata using HTTPS instead and, if that fails, fall back to hard-coding a mirror domain.
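
The "HTTPS first, then a hard-coded mirror" idea can be sketched as a probe over candidate base URLs. Everything named here is an assumption for illustration: the function first_reachable_repo, the FETCH hook, and any mirror URL passed in are hypothetical, and the PR does not say how the fallback was actually implemented.

```shell
# Sketch: print the first base URL whose repodata/repomd.xml can be fetched.
# ASSUMPTION: FETCH is a hypothetical hook; by default it probes with curl
# and discards the downloaded body.
FETCH="${FETCH:-curl --fail --silent --max-time 30 --output /dev/null}"

first_reachable_repo() {
    for base in "$@"; do
        # ${FETCH} is left unquoted on purpose so its options split into words.
        if ${FETCH} "${base}/repodata/repomd.xml"; then
            printf '%s\n' "${base}"
            return 0
        fi
    done
    return 1
}
```

A caller would list the HTTPS form of the default URL first and a fixed mirror second, then hand the printed URL to zypper.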

@marcov
Contributor Author

marcov commented Oct 3, 2018

Trying with HTTPS

/test

Add support for building a rootfs image based on openSUSE Leap.

Fixes: kata-containers#33

Signed-off-by: Marco Vedovati <mvedovati@suse.com>
@marcov
Contributor Author

marcov commented Oct 4, 2018

/test

@marcov
Contributor Author

marcov commented Oct 8, 2018

Tests on euleros have been failing for some days (locally on my machine too).
Is there anything that can be done apart from waiting? @jcvenegas @jodh-intel

Error downloading packages:                                                                 
  pam-1.1.8-18.h4.x86_64: [Errno 256] No more mirrors to try.                             
  2:shadow-utils-4.1.5.1-21.h1.x86_64: [Errno 256] No more mirrors to try.                       
  1:dbus-libs-1.6.12-14.h12.x86_64: [Errno 256] No more mirrors to try.               
  popt-1.13-16.x86_64: [Errno 256] No more mirrors to try.                                      
  readline-6.2-9.x86_64: [Errno 256] No more mirrors to try.                                
  elfutils-libelf-0.163-3.h1.x86_64: [Errno 256] No more mirrors to try.                  
  info-5.1-4.x86_64: [Errno 256] No more mirrors to try.                           
  1:findutils-4.5.11-5.x86_64: [Errno 256] No more mirrors to try.            
  libselinux-2.2.2-6.x86_64: [Errno 256] No more mirrors to try. 

@marcov
Contributor Author

marcov commented Oct 8, 2018

Finally, all tests are green.
This just needs some more l g t m and it should be good to go :)

@jcvenegas
Member

@marcov I think that, given it is not stable whether it works or not, we have a few options:

  1. Have a separate job and make it optional rather than a blocker
  2. Keep it in the same job but don't fail, get the log of the failure, and automate the job to notify by a

@jcvenegas
Member

@kata-containers/builder please take a look. @kata-containers/documentation an ack is required here.

@jodh-intel
Contributor

jodh-intel commented Oct 9, 2018

lgtm

Approved with PullApprove

@jodh-intel jodh-intel merged commit 37d1824 into kata-containers:master Oct 9, 2018
@marcov
Contributor Author

marcov commented Oct 9, 2018

> @marcov I think that, given it is not stable whether it works or not, we have a few options:
>
>   1. Have a separate job and make it optional rather than a blocker
>   2. Keep it in the same job but don't fail, get the log of the failure, and automate the job to notify by a

@jcvenegas I did some investigation into the euleros errors I am getting locally, and the TL;DR is that the failures depend on the DNS server I use. The location may also have an influence (the repo is on the Akamai CDN).

About your proposal, I'd prefer option 2, with a list somewhere in the test config of distros that are allowed to fail. Failures can be captured in the log but will not affect the green/red CI result.
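
The "allowed to fail" list described here might be sketched as follows. The variable ALLOWED_TO_FAIL and the functions is_allowed_to_fail and run_distro_test are hypothetical names; the existing test script may organise its distro tests differently.

```shell
# Sketch: let failures of listed distros be logged without failing the run.
# ASSUMPTION: ALLOWED_TO_FAIL is a hypothetical space-separated config value.
ALLOWED_TO_FAIL="${ALLOWED_TO_FAIL:-euleros}"

is_allowed_to_fail() {
    # Pad with spaces so only whole distro names match.
    case " ${ALLOWED_TO_FAIL} " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: run_distro_test <distro> <command> [args...]
run_distro_test() {
    distro="$1"
    shift
    if "$@"; then
        return 0
    fi
    if is_allowed_to_fail "${distro}"; then
        # The failure is still visible in the log, but the result stays green.
        echo "WARNING: ${distro} test failed (allowed to fail)" >&2
        return 0
    fi
    echo "ERROR: ${distro} test failed" >&2
    return 1
}
```

Keeping the list in one variable means a flaky distro can be moved in or out of the blocking set without touching the test functions themselves.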
