Tasks that fail due to file not found error stay active and block transfers of the same file #1056

johnmbarrett · 2024-11-22T17:53:44Z

Hi,

Apologies if this has been reported before, but I couldn't find a similar issue in the issues list. The issue is that if you try to transfer a file that doesn't exist, that transfer task stays active, preventing you from transferring the file if it later comes into existence.

Steps to reproduce:

$ globus transfer --notify off <endpoint 1 path>/a_path_that_does_not_exist.txt <endpoint 2 path>/a_path_that_does_not_exist.txt
Message: The transfer has been accepted and a task has been created and queued for execution
Task ID: b48719a1-a8f7-11ef-a0c7-7754bd4249c4
$ globus task show b48719a1-a8f7-11ef-a0c7-7754bd4249c4
Label:                        None
Task ID:                      b48719a1-a8f7-11ef-a0c7-7754bd4249c4
Is Paused:                    False
Type:                         TRANSFER
Directories:                  0
Files:                        0
Status:                       ACTIVE
Request Time:                 2024-11-22T17:32:14+00:00
Faults:                       1
Total Subtasks:               1
Subtasks Succeeded:           0
Subtasks Pending:             1
Subtasks Retrying:            0
Subtasks Failed:              0
Subtasks Canceled:            0
Subtasks Expired:             0
Subtasks with Skipped Errors: 0
Deadline:                     2024-11-23T17:32:14+00:00
Details:                      FILE_NOT_FOUND
Source Endpoint:              [redacted]
Source Endpoint ID:           [redacted]
Destination Endpoint:         [redacted]
Destination Endpoint ID:      [redacted]
Bytes Transferred:            0
Bytes Per Second:             0
$ touch a_path_that_does_not_exist.txt
$ globus transfer --notify off <endpoint 1 path>/a_path_that_does_not_exist.txt <endpoint 2 path>/a_path_that_does_not_exist.txt

Expected behaviour:

The task is scheduled and completes successfully.

Actual behavior:

Globus CLI Error: A Transfer API Error Occurred.
HTTP status:      409
request_id:       qv43hhuRz
code:             Conflict
message:          A transfer with identical paths has not yet completed

Additional information:

This is unexpected and counterintuitive behaviour compared to every file system and file transfer protocol I have worked with. I could understand the transfer staying active if it were reattempting the transfer periodically or if it detected that the source path had become valid, but this does not appear to be the case.

I discovered this while running a long script with a globus transfer call at the end to transfer the results, but a failure in a subprocess caused the results file not to be created. So I fixed the issue with the script and reran it, only for the results to not get saved because of the error above.

The only workaround is to identify the failed tasks and cancel them, but this is problematic if the user has a lot of tasks queued, especially if only a subset of them fail. If for some reason this is the intended behaviour, it should be clearly documented and a command-line option provided to select the intuitive behaviour, i.e. that tasks that fail become inactive/cancelled/whatever so that they can be reattempted when the file exists.
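The cancel-the-stuck-task workaround described above could be scripted roughly as follows. This is a minimal sketch, not part of the Globus CLI: the helper names are hypothetical, and it assumes that `globus task list --filter-status ACTIVE --format json` prints a document with a `DATA` list whose entries carry `task_id`, `status` and `nice_status` fields (with `nice_status` assumed to be the value that `globus task show` prints as Details).

```python
import json
import subprocess


def stuck_task_ids(task_docs, detail="FILE_NOT_FOUND"):
    """Return IDs of ACTIVE tasks whose details field matches `detail`.

    Assumes each task document has "task_id", "status" and "nice_status"
    keys; "nice_status" is assumed to be the value shown as "Details".
    """
    return [
        doc["task_id"]
        for doc in task_docs
        if doc.get("status") == "ACTIVE" and doc.get("nice_status") == detail
    ]


def list_active_tasks(limit=100):
    """Fetch ACTIVE tasks through the CLI (hypothetical wrapper)."""
    result = subprocess.run(
        ["globus", "task", "list", "--filter-status", "ACTIVE",
         "--limit", str(limit), "--format", "json"],
        check=True, capture_output=True, text=True,
    )
    payload = json.loads(result.stdout)
    # Assumption: the JSON output wraps the task documents in a "DATA" list.
    return payload["DATA"] if isinstance(payload, dict) else payload


def cancel_stuck_tasks():
    """Cancel every ACTIVE task currently reporting FILE_NOT_FOUND."""
    for task_id in stuck_task_ids(list_active_tasks()):
        subprocess.run(["globus", "task", "cancel", task_id], check=True)
```

Calling `cancel_stuck_tasks()` before resubmitting would clear the blocking tasks, at the cost of also cancelling any multi-file task that is still making progress on its other files.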


aaschaer · 2024-11-22T18:23:30Z

Hi John,

Taking a look at the logs for your task b48719a1-a8f7-11ef-a0c7-7754bd4249c4, it appears that it did eventually find the file that was added and succeed. As you expect, the task stays alive, reattempting the transfer periodically, but the time between retries can be several minutes if errors persist.

If, in the future, you don't want tasks to keep retrying files that don't exist, the --skip-source-errors option will skip source files that hit file not found or permission denied errors.

Best,
Aaron

johnmbarrett · 2024-11-22T20:28:36Z

Thanks for the quick response. Those were steps to reproduce, not a log, so adding the file was deliberate.

Regarding retrying, I disagree that this is what one would expect. That's not usual behaviour for file transfer protocols and it does not appear to me to be documented anywhere in the globus transfer docs. Hence it's not immediately obvious to the user that this is what is happening.

Does the --skip-source-errors option cause the task to be immediately cancelled or otherwise made inactive if all source files do not exist? If so, it should be more clearly documented, because it's not obvious that is the case (I checked for command line options that would force the behavior that I want).

Since this is apparently intended behavior, the error message in the case of a second transfer with identical paths could be improved to clarify that the user needs to cancel the existing task, preferably giving the ID of the interfering task, if possible.

Lastly, I don't think that the behavior of the second transfer erroring is desirable, since as I say this is counterintuitive compared to standard file systems/transfer protocols (e.g. in Windows or Linux, if I try to copy a file that doesn't exist, create that file, then try to copy it again, it goes through just fine) and also causes scripts to fail in unexpected ways (i.e. if they rely on the globus transfer call not erroring). If a second transfer task is submitted with identical paths to one that's already in the system in the ACTIVE/FILE_NOT_FOUND state, the first task should be pre-empted by the second, instead of an error being thrown.
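For scripts that rely on the globus transfer call, one way to make this failure explicit is to recognise the duplicate-submission error in the CLI's error text. The sketch below parses the `key: value` lines shown under "Actual behavior" in the original report; the function names are hypothetical, and it assumes the error text keeps that layout.

```python
def parse_cli_error(text):
    """Parse the 'key: value' lines of a Globus CLI error into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_duplicate_transfer(text):
    """True if the error text is the 409 Conflict for identical paths."""
    fields = parse_cli_error(text)
    return fields.get("HTTP status") == "409" and fields.get("code") == "Conflict"
```

A wrapper script could capture the error output of the transfer command, call `is_duplicate_transfer` on it, and then decide whether to cancel the earlier task or simply let it keep retrying.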

aaschaer · 2024-11-22T23:37:32Z

> Regarding retrying, I disagree that this is what one would expect. That's not usual behaviour for file transfer protocols and it does not appear to me to be documented anywhere in the globus transfer docs. Hence it's not immediately obvious to the user that this is what is happening.

Looks like there is room for improvement on this topic in the docs for the underlying Transfer service as well; otherwise I would link you there. I'll see if we can get those improved and include a link in the CLI docs for globus transfer.

> Does the --skip-source-errors option cause the task to be immediately cancelled or otherwise made inactive if all source files do not exist?

Such a task will complete as a success and no longer be active.

As for the duplicate task error behavior itself, the error was added to avoid bad behavior when users set up recurring transfers that sometimes take longer to complete than the interval between submissions. Such users generally want to allow the first task to complete and then wait for the next interval to start another transfer, rather than have a new transfer override any previous transfers; otherwise some files might never complete in the worst case. Because of this, I don't think the Transfer service will be able to accommodate your desired change in behavior. I'll forward the feature request to add the task ID to the error message, as that does seem like it would be a useful addition.

johnmbarrett · 2024-11-23T23:13:08Z

Thanks for the clarifying comments and agreeing to documentation improvements.

Thinking about it, the --skip-source-errors option doesn't really give the behaviour I want either, since it would be confusing to look at the logs and see the task had succeeded but the file had not been transferred. Ideal behaviour for me would be for the task to fail on source errors and never be reattempted/immediately be cancelled, enabling future transfer tasks for the same paths to be scheduled successfully. I'd also like an equivalent fail on destination errors option.

Lastly, I agree that allowing transfer tasks to preempt existing tasks with the same source & destination paths would in general not be desirable; I meant specifically the case where the existing task is in a failure state. But it would be moot anyway if I had the fail on source/destination errors option.

aaschaer · 2024-11-25T16:50:43Z

> Thinking about it, the --skip-source-errors option doesn't really give the behaviour I want either, since it would be confusing to look at the logs and see the task had succeeded but the file had not been transferred.

For a potential workaround, you could look at the files_transferred field in the task (use globus task list -F json to see the full JSON document for each task), which would be 0 in these cases.
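The files_transferred check could be automated along these lines. This is a sketch under assumptions: the function names are hypothetical, the JSON printed by `globus task list -F json` is assumed to wrap the task documents in a `DATA` list, and each document is assumed to carry `task_id`, `status` and `files_transferred` fields.

```python
import json


def load_task_docs(json_text):
    """Extract the list of task documents from the CLI's JSON output."""
    payload = json.loads(json_text)
    # Assumption: tasks are wrapped in a "DATA" list; accept a bare list too.
    return payload["DATA"] if isinstance(payload, dict) else payload


def empty_successes(task_docs):
    """Return IDs of tasks that SUCCEEDED without transferring any file."""
    return [
        doc["task_id"]
        for doc in task_docs
        if doc.get("status") == "SUCCEEDED" and doc.get("files_transferred") == 0
    ]
```

Feeding the saved output of `globus task list -F json` through `load_task_docs` and `empty_successes` would flag tasks worth a second look. A zero count only says that no file was copied, so a task that legitimately had nothing to copy may be flagged as well.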

But it sounds like ultimately what you want here is more control over how the tasks handle errors. I believe the Transfer team originally discussed an interface for specifying which error codes are skipped/retried/fatal, but there were concerns about usability and corner cases, so we settled on the options that the CLI exposes as --skip-source-errors and --fail-on-quota-errors. I'll note this as a feature request for looking into something like that again though.

sirosen · 2025-01-31T22:38:41Z

I left this thread open for a while in case there was more discussion, but I think it's time to close it.

These are legitimate questions about the ideal Transfer interface -- the async nature of Transfer and its built-in retry behaviors are hard to trivially explain or expose, and we've seen other users who want customizations in the past. But ultimately it's not so much a globus-cli question as it is a Transfer service question; here in the CLI we're just exposing what the service provides.

Since Aaron already made a note of this for the Transfer service, I'm closing.

sirosen closed this as completed Jan 31, 2025
