This repository has been archived by the owner on May 12, 2021. It is now read-only.

Handle EulerOS separately due to timeout issues #182

Closed
jodh-intel opened this issue Oct 9, 2018 · 13 comments

@jodh-intel
Contributor

As @jcvenegas mentioned in #161 (comment), we should consider handling EulerOS separately, as it continues to be problematic.

We regularly see issues running the tests for EulerOS, generally related to server timeouts. Since these failures affect all osbuilder PRs, we should consider treating EulerOS differently: either don't fail the build when the EulerOS tests fail, or create a separate test job specifically for EulerOS that does not block new PRs.

We would hope such a solution is temporary, lasting only until more EulerOS mirrors appear or the main mirrors are upgraded.

@marcov
Contributor

marcov commented Oct 9, 2018

Thanks for opening an issue @jodh-intel. I already commented in #161, but I think it's worth copy-pasting that comment here.


@marcov I think that, given it's not predictable when it works and when it doesn't, we have a few options:

  1. Have a separate job and make it optional, so that it is not a blocker
  2. Keep it in the same job but don't fail; capture the failure log and automate the job to notify by a

@jcvenegas I did some investigation into the EulerOS errors I am getting locally, and the TL;DR is that failures depend on the DNS server I use. Location may also have an influence (the repo is on the Akamai CDN).

About your proposal, I'd prefer going for option 2, with a list somewhere in the test config of the distros that are allowed to fail. Failures would be captured in the log but would not affect the green/red CI result.
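A minimal sketch of option 2, assuming a bash test driver that runs one test command per distro. The ALLOWED_TO_FAIL variable and the helper functions are hypothetical illustrations, not osbuilder's actual code:

```shell
#!/usr/bin/env bash
# Sketch of "option 2": test every distro in the same job, but let a
# configurable list of distros fail without turning the CI result red.
# ALLOWED_TO_FAIL, is_allowed_to_fail and run_all are hypothetical names.

ALLOWED_TO_FAIL="euleros"

# Return 0 if the distro appears in the space-separated ALLOWED_TO_FAIL list.
is_allowed_to_fail() {
    local distro="$1" d
    for d in $ALLOWED_TO_FAIL; do
        [ "$d" = "$distro" ] && return 0
    done
    return 1
}

# Run "$test_cmd <distro>" for each distro. A failure is still reported
# (so it stays in the log), but only failures of distros outside
# ALLOWED_TO_FAIL make the function return non-zero.
run_all() {
    local test_cmd="$1"; shift
    local distro rc=0
    for distro in "$@"; do
        if "$test_cmd" "$distro"; then
            echo "PASS: $distro"
        elif is_allowed_to_fail "$distro"; then
            echo "IGNORED FAILURE: $distro (see log above)" >&2
        else
            echo "FAIL: $distro" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

The test command is passed in as a parameter, so the same loop works with whatever per-distro test script the CI job uses; the overall exit status is what CI would turn into green or red.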

@marcov
Contributor

marcov commented Oct 9, 2018

Some more info about the problem if you are really interested:

(From my location,) developer.huawei.com resolves to 2 different repo IPs depending on what DNS I am using.

  • The bad repo IP answers with HTTP 304 (Not Modified) when trying to pull some of the rpm packages, even when forcing no caching. This triggers the rpm error "Package does not match intended download" and the yum error "No more mirrors to try".
  • The good repo IP answers with HTTP 200 to the same HTTP GET

Bad repo:

$ nslookup developer.huawei.com 8.8.8.8
Server:         8.8.8.8
Address:        8.8.8.8#53
-- CUT --
Name:   e10173.a.akamaiedge.net
Address: 23.39.165.156

$ curl -sSLv -H 'Cache-Control: no-cache' -O http://developer.huawei.com/ict/site-euleros/euleros/repo/yum/2.2/os/x86_64/updates/dbus-libs-1.6.12-14.h12.x86_64.rpm
* About to connect() to developer.huawei.com port 80 (#0)
*   Trying 23.39.165.156...
* Connected to developer.huawei.com (23.39.165.156) port 80 (#0)
> GET /ict/site-euleros/euleros/repo/yum/2.2/os/x86_64/updates/dbus-libs-1.6.12-14.h12.x86_64.rpm HTTP/1.1
> User-Agent: curl/7.29.0
> Host: developer.huawei.com
> Accept: */*
> Cache-Control: no-cache
> 
< HTTP/1.1 304 Not Modified
< Content-Type: audio/x-pn-realaudio-plugin
< Last-Modified: Sat, 28 Jul 2018 08:48:27 GMT
< ETag: "1039cca7fae2f6aa474191f4a9d80c61:1532767707"
< Cache-Control: max-age=87018
< Expires: Wed, 10 Oct 2018 09:42:41 GMT
< Date: Tue, 09 Oct 2018 09:32:23 GMT
< Connection: keep-alive
< 
* Connection #0 to host developer.huawei.com left intact

Good repo:

$ nslookup developer.huawei.com 1.1.1.1
Server:         1.1.1.1
Address:        1.1.1.1#53
-- CUT --
Name:   e10173.a.akamaiedge.net
Address: 104.83.130.9


$ curl -sSLv -H 'Cache-Control: no-cache' -O http://developer.huawei.com/ict/site-euleros/euleros/repo/yum/2.2/os/x86_64/updates/dbus-libs-1.6.12-14.h12.x86_64.rpm
*   Trying 104.83.130.9...
* TCP_NODELAY set
* Connected to developer.huawei.com (104.83.130.9) port 80 (#0)
> GET /ict/site-euleros/euleros/repo/yum/2.2/os/x86_64/updates/dbus-libs-1.6.12-14.h12.x86_64.rpm HTTP/1.1
> Host: developer.huawei.com
> User-Agent: curl/7.61.1
> Accept: */*
> Cache-Control: no-cache
> 
< HTTP/1.1 200 OK
< Server: Apache
< ETag: "1039cca7fae2f6aa474191f4a9d80c61:1532767707"
< Last-Modified: Sat, 28 Jul 2018 08:48:27 GMT
< Accept-Ranges: bytes
< Content-Length: 156164
< Content-Type: audio/x-pn-realaudio-plugin
< Cache-Control: max-age=167733
< Expires: Thu, 11 Oct 2018 08:06:33 GMT
< Date: Tue, 09 Oct 2018 09:31:00 GMT
< Connection: keep-alive
< 
{ [1085 bytes data]
* Connection #0 to host developer.huawei.com left intact
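The comparison above can be repeated without changing the system resolver by pinning the hostname to each edge IP with curl's `--resolve` option. This is a sketch that assumes the two IPs observed above (they depend on location and time, so they may not reproduce elsewhere); the helper names are illustrative. Note that the request carries no `If-None-Match` or `If-Modified-Since` header, so a 304 is not a valid answer to it: only a 200 actually delivers the package.

```shell
#!/usr/bin/env bash
# Fetch the same URL while pinning developer.huawei.com to a given edge IP
# (curl --resolve) and report the HTTP status each IP returns.
# Helper names are illustrative; nothing here runs until check_ips is called.

URL="http://developer.huawei.com/ict/site-euleros/euleros/repo/yum/2.2/os/x86_64/updates/dbus-libs-1.6.12-14.h12.x86_64.rpm"

# The request is unconditional, so only 200 means the rpm was delivered;
# a 304 (or 000 for a connection failure) is treated as bad.
is_good_status() {
    [ "$1" = "200" ]
}

# Print the HTTP status code returned when the host resolves to IP $1.
status_for_ip() {
    local ip="$1"
    curl -s -o /dev/null -w '%{http_code}' \
        -H 'Cache-Control: no-cache' \
        --resolve "developer.huawei.com:80:${ip}" \
        "$URL"
}

# Print one "<ip>: <status> (good|bad)" line per IP argument.
check_ips() {
    local ip code
    for ip in "$@"; do
        code=$(status_for_ip "$ip")
        if is_good_status "$code"; then
            echo "$ip: $code (good)"
        else
            echo "$ip: $code (bad)"
        fi
    done
}
```

For example, `check_ips 23.39.165.156 104.83.130.9` prints one status line per IP; if an IP is unreachable, curl reports the status as `000`, which is classified as bad.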

@jodh-intel
Contributor Author

/cc @liangchenye, @WeiZhang555, @caoruidong, @clarecch, @jshachm.

@caoruidong
Member

cc @initlove

@liangchenye
Contributor

+1 for the second option.

I'm writing to my colleagues to see if we can get another, more robust mirror (mirrors.huaweicloud.com, for example).
@jodh-intel @marcov

@jodh-intel
Contributor Author

Thanks @liangchenye! 😄

@liangchenye
Contributor

@jodh-intel I got a very negative response from the mirror maintainer (mirrors.huaweicloud.com). They only have servers in China, so they cannot guarantee a more robust service.
I think we may have to make the EulerOS build a separate job.

@jodh-intel
Contributor Author

Hi @liangchenye - ok, thanks for the update. We'll create a separate CI job for testing EulerOS then.

That job can just call tests/test_images.sh euleros, but we'll need to change that script so that EulerOS is not tested by default I think.

@marcov
Contributor

marcov commented Oct 17, 2018

That job can just call tests/test_images.sh euleros, but we'll need to change that script so that EulerOS is not tested by default I think.

For this you just need to remove euleros from the test config :)
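A sketch of that split, assuming the bulk test run reads a skip list from test_config.sh while explicit distro arguments bypass it, so a dedicated non-blocking job can still run `tests/test_images.sh euleros`. SKIP_WHEN_TESTING_ALL and select_distros are hypothetical names, and the list of available distros is passed in as a parameter for illustration:

```shell
#!/usr/bin/env bash
# Distros skipped when testing "all"; they can still be tested by name.
# SKIP_WHEN_TESTING_ALL and select_distros are hypothetical names.
SKIP_WHEN_TESTING_ALL="euleros"

# Usage: select_distros "<available distros>" [distro ...]
# Prints the explicit distro arguments if any were given; otherwise prints
# every available distro except those in SKIP_WHEN_TESTING_ALL.
select_distros() {
    local available="$1"; shift
    if [ "$#" -gt 0 ]; then
        echo "$*"
        return
    fi
    local d s selected=""
    for d in $available; do
        for s in $SKIP_WHEN_TESTING_ALL; do
            # Skip to the next available distro if this one is excluded.
            [ "$d" = "$s" ] && continue 2
        done
        selected="${selected:+$selected }$d"
    done
    echo "$selected"
}
```

With this shape, the default (blocking) job never touches EulerOS, while the separate job names it explicitly and gets it tested.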

@jodh-intel
Contributor Author

Ah - good point!


I do wonder if we should move as much of the contents of tests/test_config.sh as possible into the rootfs-builder/${distro}/config.sh files, though. I think that could be handled by each distro setting variables like:

USES_SYSTEMD=[no|yes]
SUPPORTS_AGENT_AS_INIT=[yes|no]

# space-separated list of architectures the distro does *not* support
ARCH_EXCLUDE_LIST="ppc64le foo bar"

That way, each distro "advertises" what it supports, and tests/test_config.sh could contain a single variable listing the distros that should not be tested under Travis.
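A sketch of how a test script could consume those per-distro variables, assuming config.sh has already been loaded into the environment. The variable names come from the proposal above; the helper functions and the variant labels they print are hypothetical:

```shell
#!/usr/bin/env bash
# Consumes the proposed per-distro variables USES_SYSTEMD,
# SUPPORTS_AGENT_AS_INIT and ARCH_EXCLUDE_LIST.
# distro_supports_arch, variants_to_test and the printed variant labels
# are hypothetical illustrations.

# Return 0 unless $1 appears in the distro's ARCH_EXCLUDE_LIST.
distro_supports_arch() {
    local arch="$1" a
    for a in $ARCH_EXCLUDE_LIST; do
        [ "$a" = "$arch" ] && return 1
    done
    return 0
}

# Print the image variants worth testing for this distro on arch $1,
# or nothing if the distro excludes that architecture.
variants_to_test() {
    local arch="$1"
    distro_supports_arch "$arch" || return 0
    [ "$USES_SYSTEMD" = "yes" ] && echo "systemd-init"
    [ "$SUPPORTS_AGENT_AS_INIT" = "yes" ] && echo "agent-init"
    return 0
}
```

The test script then needs no per-distro special cases: each config.sh declares its capabilities and the same logic applies to all distros.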

@marcov
Contributor

marcov commented Oct 17, 2018

@jodh-intel: I like this. I'm wondering if there's a smart way to read the config from config.sh without using source.
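One possible way, sketched under the assumption that config.sh only contains simple `NAME=value` assignments: extract the value with sed instead of executing the file. get_config_var is a hypothetical helper; values that depend on shell expansion or are computed at runtime would still require `source`.

```shell
#!/usr/bin/env bash
# Read a variable from a distro config.sh without sourcing (executing) it.
# get_config_var is a hypothetical helper that only understands simple
# literal assignments such as NAME=value, NAME="value" or NAME='value'.

# Usage: get_config_var <file> <NAME>
get_config_var() {
    local file="$1" name="$2"
    # Keep the last line starting with "NAME=", drop the "NAME=" prefix,
    # then strip one pair of surrounding double or single quotes.
    sed -n "s/^${name}=//p" "$file" | tail -n 1 \
        | sed -e 's/^"\(.*\)"$/\1/' -e "s/^'\(.*\)'$/\1/"
}
```

Unlike `source`, this never runs code from the file, at the cost of ignoring anything that is not a plain top-level assignment (for example, `export NAME=...` lines or commented-out values).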

@marcov
Contributor

marcov commented Oct 18, 2018

@jodh-intel, I am assigning myself for the changes to osbuilder :)

@marcov marcov self-assigned this Oct 18, 2018
@jodh-intel
Contributor Author

Thanks @marcov ! :)

marcov added a commit to marcov/kata-osbuilder that referenced this issue Oct 18, 2018
Move the test configuration in the distro-specific config.sh
file, for better control of what to include/exclude from
testing based on the test environment.
test_config.sh is still used to exclude specific distros from
being tested, when running tests in bulk.

Fixes: kata-containers#182

Signed-off-by: Marco Vedovati <mvedovati@suse.com>
ygefen pushed a commit to ygefen/osbuilder that referenced this issue Oct 23, 2018

4 participants