Make networking/http test more resilient (bugfix) #1213

Merged
merged 5 commits into from
Apr 29, 2024

Conversation

@pieqq (Collaborator) commented Apr 29, 2024

Description

This job often fails, usually because of connectivity issues.

It is now replaced with a Python script that runs the wget command up to 5 times, adding backoff and jitter after each failed attempt so that the test server is not overwhelmed if many devices run this test at the same time.
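For readers who want a picture of the retry behaviour described above, here is a minimal sketch. It is not the script from this PR: the function name `retry_command` and its `delay_for`, `run` and `sleep` parameters are hypothetical, chosen so the loop can be exercised without network access. Only the `wget -SO /dev/null <url>` command line and the limit of 5 attempts come from the description and log.

```python
import subprocess
import time


def retry_command(cmd, max_attempts=5, delay_for=lambda attempt: 0.0,
                  run=subprocess.run, sleep=time.sleep):
    """Run cmd until it succeeds or max_attempts is reached.

    Returns the 1-based number of the attempt that succeeded.
    Re-raises the last CalledProcessError if every attempt fails.
    delay_for(attempt) gives the wait (in seconds) after a failed attempt;
    it is a hypothetical hook where backoff and jitter would plug in.
    """
    for attempt in range(1, max_attempts + 1):
        print("Trying {} (attempt {}/{})".format(cmd, attempt, max_attempts))
        try:
            # check=True makes a non-zero exit status raise CalledProcessError
            run(cmd, check=True)
            return attempt
        except subprocess.CalledProcessError as exc:
            print("Attempt {} failed: {}".format(attempt, exc))
            if attempt == max_attempts:
                raise
        delay = delay_for(attempt)
        print("Waiting for {:.2f} seconds before retrying...".format(delay))
        sleep(delay)


if __name__ == "__main__":
    # Same wget invocation as in the log; the URL is only an example.
    retry_command(["wget", "-SO", "/dev/null", "http://cdimage.ubuntu.com"])
```

Passing `run` and `sleep` as parameters is a design choice made for this sketch so that unit tests can substitute fakes; the real script may be structured differently.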

Resolved issues

Fixes https://warthogs.atlassian.net/browse/CHECKBOX-1419

Documentation

Tests

  • Unit tests
  • Ran using checkbox from main (in the test below, I run checkbox while disconnected from the Internet, and I connect to a WiFi after a few seconds):
$ TRANSFER_SERVER=cdimage.ubuntu.com checkbox-cli run com.canonical.certification::networking/http
(...)
===========================[ Running Selected Jobs ]============================
=========[ Running job 1 / 1. Estimated time left (at least): 0:00:00 ]=========
------------------------------[ networking/http ]-------------------------------
ID: com.canonical.certification::networking/http
Category: com.canonical.plainbox::networking
... 8< -------------------------------------------------------------------------
--2024-04-29 16:15:06--  http://cdimage.ubuntu.com/
Resolving cdimage.ubuntu.com (cdimage.ubuntu.com)... failed: Temporary failure in name resolution.
wget: unable to resolve host address ‘cdimage.ubuntu.com’
--2024-04-29 16:15:09--  http://cdimage.ubuntu.com/
Resolving cdimage.ubuntu.com (cdimage.ubuntu.com)... failed: Temporary failure in name resolution.
wget: unable to resolve host address ‘cdimage.ubuntu.com’
--2024-04-29 16:15:13--  http://cdimage.ubuntu.com/
Resolving cdimage.ubuntu.com (cdimage.ubuntu.com)... failed: Temporary failure in name resolution.
wget: unable to resolve host address ‘cdimage.ubuntu.com’
--2024-04-29 16:15:24--  http://cdimage.ubuntu.com/
Resolving cdimage.ubuntu.com (cdimage.ubuntu.com)... 2001:67c:1562::25, 2001:67c:1562::28, 2620:2d:4000:1::1a, ...
Connecting to cdimage.ubuntu.com (cdimage.ubuntu.com)|2001:67c:1562::25|:80... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Date: Mon, 29 Apr 2024 08:15:25 GMT
  Server: Apache/2.4.29 (Ubuntu)
  Vary: Accept-Encoding
  Content-Length: 1706
  Keep-Alive: timeout=2, max=10
  Connection: Keep-Alive
  Content-Type: text/html;charset=UTF-8
Length: 1706 (1.7K) [text/html]
Saving to: ‘/dev/null’

     0K .                                                     100% 90.2M=0s

2024-04-29 16:15:25 (90.2 MB/s) - ‘/dev/null’ saved [1706/1706]

Trying to connect to http://cdimage.ubuntu.com (attempt 1/5)
Attempt 1 failed: Command '['wget', '-SO', '/dev/null', 'http://cdimage.ubuntu.com']' returned non-zero exit status 4.

Waiting for 2.51 seconds before retrying...
Trying to connect to http://cdimage.ubuntu.com (attempt 2/5)
Attempt 2 failed: Command '['wget', '-SO', '/dev/null', 'http://cdimage.ubuntu.com']' returned non-zero exit status 4.

Waiting for 4.60 seconds before retrying...
Trying to connect to http://cdimage.ubuntu.com (attempt 3/5)
Attempt 3 failed: Command '['wget', '-SO', '/dev/null', 'http://cdimage.ubuntu.com']' returned non-zero exit status 4.

Waiting for 10.99 seconds before retrying...
Trying to connect to http://cdimage.ubuntu.com (attempt 4/5)
------------------------------------------------------------------------- >8 ---
Outcome: job passed

This job often fails, usually because of connectivity issues.

It is now replaced with a Python script that automatically runs the
`wget` command 3 times, increasing the timeout after each failure to
give more chances to connect successfully.

Fix CHECKBOX-1419

codecov bot commented Apr 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 43.36%. Comparing base (9318a5c) to head (d6e0572).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1213      +/-   ##
==========================================
+ Coverage   43.32%   43.36%   +0.03%     
==========================================
  Files         356      357       +1     
  Lines       38658    38684      +26     
  Branches     6559     6560       +1     
==========================================
+ Hits        16750    16776      +26     
  Misses      21245    21245              
  Partials      663      663              
Flag Coverage Δ
provider-base 16.72% <100.00%> (+0.14%) ⬆️

@pieqq pieqq marked this pull request as draft April 29, 2024 04:03
@pieqq pieqq marked this pull request as ready for review April 29, 2024 04:05
@pieqq pieqq marked this pull request as draft April 29, 2024 07:42
Following feedback from team, reworking the connection function to use a
random delay, using a backoff and a jitter. Each new attempt will wait
longer than the previous one (up to 60 seconds per attempt).

The jitter is here to prevent the test from choking the infrastructure
if many devices are trying to run this test at the same moment.
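
One way to compute such a delay is sketched below. This is an assumption-laden illustration, not the code merged here: `backoff_delay` and its `base`, `factor` and `rng` parameters are hypothetical, and only the 60-second cap and the "each attempt waits longer" property come from the commit message. The delay for attempt n is drawn uniformly from [base * factor^n, 2 * base * factor^n] and then capped, so consecutive ranges do not overlap and delays never decrease.

```python
import random


def backoff_delay(attempt, base=1.0, factor=2.0, max_delay=60.0, rng=random):
    """Return the wait in seconds after failed attempt number `attempt` (1-based).

    The exponential term grows with each attempt (backoff); the random term
    spreads devices apart in time (jitter); max_delay caps the result.
    All parameter names and defaults except the 60 s cap are assumptions.
    """
    backoff = base * factor ** attempt
    jitter = rng.uniform(0, backoff)
    return min(max_delay, backoff + jitter)


if __name__ == "__main__":
    for attempt in range(1, 7):
        print("attempt {}: wait {:.2f}s".format(attempt, backoff_delay(attempt)))
```

With these defaults the first three waits fall in 2 to 4, 4 to 8 and 8 to 16 seconds. Because every device draws its own jitter, devices that fail at the same instant are unlikely to retry at the same instant.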
@pieqq pieqq marked this pull request as ready for review April 29, 2024 08:13
@Hook25 (Collaborator) left a comment

nice, +1

@Hook25 Hook25 merged commit 070d2fc into main Apr 29, 2024
13 checks passed
@Hook25 Hook25 deleted the 1419-networking-http-test-resilience branch April 29, 2024 12:07