Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add timeout parameter to cli and driver #2586

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

avoidik
Copy link

@avoidik avoidik commented Jul 10, 2024

hello,

this is an attempt to make timeout settings consistent

(please consider this contribution under any appropriate permissive and open-source friendly license)

@avoidik
Copy link
Author

avoidik commented Jul 10, 2024

here is how to change timeout of the remote driver

$ docker buildx create \
  --name remote-builder \
  --node linux-amd64 \
  --platform linux/amd64 \
  --bootstrap \
  --use \
  --driver-opt timeout=10s \
  ssh://127.0.0.2
[+] Building 10.0s (1/1) FINISHED
 => ERROR [internal] waiting for connection                                                                                                                              10.0s
------
 > [internal] waiting for connection:
------
ERROR: context deadline exceeded

without the timeout property

docker buildx create \
  --name remote-builder \
  --node linux-amd64 \
  --platform linux/amd64 \
  --bootstrap \
  --use \
  ssh://127.0.0.2
[+] Building 120.0s (1/1) FINISHED
 => ERROR [internal] waiting for connection                                                                                                                             120.0s
------
 > [internal] waiting for connection:
------
ERROR: context deadline exceeded

in case of CLI commands

$ BUILDX_TIMEOUT=5s docker buildx ...

commands/root.go Outdated
timeoutDuration = defaultTimeoutCli
}
}
flags.DurationVar(&options.timeout, "timeout", timeoutDuration, "Override the default global timeout (20 seconds)")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"(20 seconds)" not needed as already shown in default (and could be different in reality)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, this could take a value in seconds (assuming we don't expect users to specify timeouts in nano-seconds etc). I know we do so in some other places, for example "stop-timeout" on docker run;

     --stop-timeout int                 Timeout (in seconds) to stop a container

Copy link
Author

@avoidik avoidik Sep 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this takes a duration at the moment, let's say we want 70 seconds timeout, then we should set it as follows:

--timeout 1m10s
# or
--timeout 70s
# or
export BUILDX_TIMEOUT='70s'

if not set, we fallback to defaultTimeoutCli = 20 * time.Second (20 seconds)

if we provide a timeout as an integer, for example:

--timeout 70

this throws an exception similar to this:

invalid argument "70" for "--timeout" flag: time: missing unit in duration "70"

is this okay? should we rely on timeout in seconds as an integer?

@@ -214,7 +214,7 @@ func (d *Driver) wait(ctx context.Context, l progress.SubLogger) error {
select {
case <-ctx.Done():
return ctx.Err()
case <-time.After(time.Duration(try*120) * time.Millisecond):
case <-time.After(d.Timeout / time.Microsecond):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time.Microsecond in here does not really make logical sense. Use 1000 or / time.Second * time.Millisecond.

I think this should probably also have some min/max values. Or maybe it doesn't make sense to make these dynamic at all. Some users might say don't fail and set timeout to very big number but that doesn't mean that they want retry mechanism to wait for a long time. Same for very small timeout.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's trace that, we assume that we start with try = 1 and have 15 attempts, the operation takes longer than expected

in the current implementation we have an incremental timeout in place, where with every new attempt we wait 120ms longer:

The calc #1 timeout is: 120ms
The calc #2 timeout is: 240ms
The calc #3 timeout is: 360ms
The calc #4 timeout is: 480ms
The calc #5 timeout is: 600ms
The calc #6 timeout is: 720ms
The calc #7 timeout is: 840ms
The calc #8 timeout is: 960ms
The calc #9 timeout is: 1.08s
The calc #10 timeout is: 1.2s
The calc #11 timeout is: 1.32s
The calc #12 timeout is: 1.44s
The calc #13 timeout is: 1.56s
The calc #14 timeout is: 1.68s
The calc #15 timeout is: 1.8s

This means we have a 14,4 seconds to complete an operation in this context.

In the logic I'd like to propose there's no incremental part, but it depends on the default timeout value, we assume that d.Timeout == (defaultDriverTimeout = 120 * time.Second), so that 15 * 120 = 1800:

The calc #1 timeout is: 120ms
The calc #2 timeout is: 120ms
The calc #3 timeout is: 120ms
The calc #4 timeout is: 120ms
The calc #5 timeout is: 120ms
The calc #6 timeout is: 120ms
The calc #7 timeout is: 120ms
The calc #8 timeout is: 120ms
The calc #9 timeout is: 120ms
The calc #10 timeout is: 120ms
The calc #11 timeout is: 120ms
The calc #12 timeout is: 120ms
The calc #13 timeout is: 120ms
The calc #14 timeout is: 120ms
The calc #15 timeout is: 120ms

This means we have a 1,8 second window to complete an operation in this context, with the default timeout value set. By changing the default timeout value we can control how long every interval is, and every interval is equal.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most probably it makes sense to control how many attempts we have rather than tailor the default timeout, or even leave it as is

Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^

@avoidik avoidik force-pushed the consistent-timeout-parameter branch 4 times, most recently from 966aaf3 to 7586654 Compare September 21, 2024 11:50
@@ -38,6 +39,8 @@ type Node struct {
Labels map[string]string
}

const defaultDriverTimeout = 120 * time.Second
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as such:

  • CLI by default has 20 seconds timeout, it can be changed by a user by providing CLI option or env. variable
  • buildx driver factory has 120 seconds timeout, it cannot be changed by a user, it's up to a buildx driver developer to define that value

Comment on lines +107 to +108
timeoutChan := time.After(d.Timeout)
ticker := time.NewTicker(d.Timeout / time.Microsecond)
Copy link
Author

@avoidik avoidik Sep 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to propose quite similar logic here. With the default timeout of 120s we say that every 120ms we poll for replicas. Current behavior is to fail after 120s, with fixed 100ms intervals.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's your opinion? Should the default timeout value be coupled to retry intervals?

@avoidik avoidik force-pushed the consistent-timeout-parameter branch from 71b2c8f to 1b49854 Compare October 3, 2024 10:06
Signed-off-by: avoidik <avoidik@gmail.com>
Signed-off-by: Viacheslav Vasilyev <avoidik@gmail.com>
Signed-off-by: Viacheslav Vasilyev <avoidik@gmail.com>
Signed-off-by: Viacheslav Vasilyev <avoidik@gmail.com>
Signed-off-by: Viacheslav Vasilyev <avoidik@gmail.com>
Signed-off-by: Viacheslav Vasilyev <avoidik@gmail.com>
Signed-off-by: Viacheslav Vasilyev <avoidik@gmail.com>
Signed-off-by: Viacheslav Vasilyev <avoidik@gmail.com>
@avoidik avoidik force-pushed the consistent-timeout-parameter branch from 5164820 to 752b04e Compare October 3, 2024 10:07
@avoidik
Copy link
Author

avoidik commented Oct 3, 2024

CLI timeout consistency info

Command Has timeout logic implemented Inherits global timeout
buildx bake No No
buildx build No No
buildx create Yes Yes
buildx dial-stdio No No
buildx du No No
buildx imagetools create No No
buildx imagetools inspect No No
buildx inspect Yes Yes
buildx ls Yes Yes
buildx prune No No
buildx rm Yes Yes
buildx stop No No
buildx install N/A N/A
buildx uninstall N/A N/A
buildx use N/A N/A
buildx version N/A N/A

Drivers timeout consistency info

Driver Default timeout Timeout can be changed using --driver-opt
docker 120s No
docker-container 120s No
kubernetes 120s Yes, timeout=60s
remote 120s Yes, timeout=60s

Copy link

@invisiblepancake invisiblepancake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants