Tasks that fail due to file not found error stay active and block transfers of the same file #1056

Open
johnmbarrett opened this issue Nov 22, 2024 · 5 comments

Comments

@johnmbarrett

Hi,

Apologies if this has been reported before, but I couldn't find a similar issue in the issues list. The issue is that if you try to transfer a file that doesn't exist, that transfer task stays active, preventing you from transferring the file if it later comes into existence.

Steps to reproduce:

$ globus transfer --notify off <endpoint 1 path>/a_path_that_does_not_exist.txt <endpoint 2 path>/a_path_that_does_not_exist.txt
Message: The transfer has been accepted and a task has been created and queued for execution
Task ID: b48719a1-a8f7-11ef-a0c7-7754bd4249c4
$ globus task show b48719a1-a8f7-11ef-a0c7-7754bd4249c4
Label:                        None
Task ID:                      b48719a1-a8f7-11ef-a0c7-7754bd4249c4
Is Paused:                    False
Type:                         TRANSFER
Directories:                  0
Files:                        0
Status:                       ACTIVE
Request Time:                 2024-11-22T17:32:14+00:00
Faults:                       1
Total Subtasks:               1
Subtasks Succeeded:           0
Subtasks Pending:             1
Subtasks Retrying:            0
Subtasks Failed:              0
Subtasks Canceled:            0
Subtasks Expired:             0
Subtasks with Skipped Errors: 0
Deadline:                     2024-11-23T17:32:14+00:00
Details:                      FILE_NOT_FOUND
Source Endpoint:              [redacted]
Source Endpoint ID:           [redacted]
Destination Endpoint:         [redacted]
Destination Endpoint ID:      [redacted]
Bytes Transferred:            0
Bytes Per Second:             0
$ touch a_path_that_does_not_exist.txt
$ globus transfer --notify off <endpoint 1 path>/a_path_that_does_not_exist.txt <endpoint 2 path>/a_path_that_does_not_exist.txt

Expected behaviour:

The task is scheduled and completes successfully.

Actual behavior:

Globus CLI Error: A Transfer API Error Occurred.
HTTP status:      409
request_id:       qv43hhuRz
code:             Conflict
message:          A transfer with identical paths has not yet completed

Additional information:

This is unexpected and counterintuitive behaviour compared to every file system and file transfer protocol I have worked with. I could understand the transfer staying active if it were reattempting the transfer periodically or detecting when the source path becomes valid, but this does not appear to be the case.

I discovered this while running a long script with a globus transfer call at the end to transfer the results, but a failure in a subprocess caused the results file not to be created. So I fixed the issue with the script and reran it, only for the results to not get saved because of the error above.

The only workaround is to identify the failed tasks and cancel them, but this is problematic if the user has a lot of tasks queued, especially if only a subset of them fail. If for some reason this is the intended behaviour, it should be clearly documented and a command-line option provided to select the intuitive behavior, i.e. that tasks that fail become inactive/cancelled/whatever so that they can be reattempted when the file exists.
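
The workaround above (find the stuck tasks, then cancel them) could be scripted roughly as follows. This is only a sketch under assumptions: it assumes the output of `globus task list -F json` wraps the tasks in a `DATA` list and that each task carries `task_id`, `status`, and `nice_status` fields, with `nice_status` holding the value shown as "Details" by `globus task show`. Check those names against real output before relying on it. The helper names are made up for illustration.

```python
import json


def stalled_task_ids(task_list_json, detail="FILE_NOT_FOUND"):
    """Return IDs of ACTIVE tasks whose detail code equals `detail`.

    `task_list_json` is the text printed by `globus task list -F json`.
    Assumption: tasks sit under a "DATA" key; a bare list is accepted too.
    """
    doc = json.loads(task_list_json)
    tasks = doc.get("DATA", []) if isinstance(doc, dict) else doc
    return [
        task["task_id"]
        for task in tasks
        # Assumed field names: "status" and "nice_status".
        if task.get("status") == "ACTIVE" and task.get("nice_status") == detail
    ]


def cancel_commands(task_ids):
    """Build one `globus task cancel` argv per task ID (nothing is run here)."""
    return [["globus", "task", "cancel", task_id] for task_id in task_ids]
```

Each returned argv can be passed to `subprocess.run(argv, check=True)` once the selection has been reviewed, which leaves the other queued tasks untouched.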

@aaschaer
Contributor

Hi John,

Taking a look at the logs for your task b48719a1-a8f7-11ef-a0c7-7754bd4249c4, it appears that it did eventually find the file that was added and succeed. As you expect, the task stays alive and reattempts the transfer periodically, but the time between retries can be several minutes if errors persist.

If in the future you don't want tasks to keep retrying files that don't exist, the --skip-source-errors option will skip source files that have file not found or permission denied errors.

Best,
Aaron

@johnmbarrett
Author

johnmbarrett commented Nov 22, 2024

Thanks for the quick response. Those were steps to reproduce, not a log, so adding the file was deliberate.

Regarding retrying, I disagree that this is what one would expect. That's not usual behaviour for file transfer protocols and it does not appear to me to be documented anywhere in the globus transfer docs. Hence it's not immediately obvious to the user that this is what is happening.

Does the --skip-source-errors option cause the task to be immediately cancelled or otherwise made inactive if all source files do not exist? If so, it should be more clearly documented, because it's not obvious that is the case (I checked for command line options that would force the behavior that I want).

Since this is apparently intended behavior, the error message in the case of a second transfer with identical paths could be improved to clarify that the user needs to cancel the existing task. Preferably giving the ID of the interfering task, if possible.

Lastly, I don't think that the behavior of the second transfer erroring is desirable, since as I say this is counterintuitive compared to standard file systems/transfer protocols (e.g. in Windows or Linux, if I try to copy a file that doesn't exist, create that file, then try to copy it again, it goes through just fine) and also causes scripts to fail in unexpected ways (i.e. if they rely on the globus transfer call not erroring). If a second transfer task is submitted with identical paths to one that's already in the system in the ACTIVE/FILE_NOT_FOUND state, the first task should be pre-empted by the second, instead of an error being thrown.
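
Until the behaviour changes, a script can at least recognise this specific failure instead of aborting blindly. The sketch below parses the "key: value" layout of the error text shown in the original report and checks for the 409 Conflict. It assumes the CLI keeps printing errors in that layout, and the function names are hypothetical.

```python
def parse_cli_error(text):
    """Split 'key: value' lines of a Globus CLI error message into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields


def is_duplicate_transfer(text):
    """True if the error text reports an HTTP 409 with code 'Conflict'."""
    fields = parse_cli_error(text)
    return fields.get("HTTP status") == "409" and fields.get("code") == "Conflict"
```

A wrapper script could call `is_duplicate_transfer` on the captured error output and then decide whether to cancel the older task and resubmit, or to wait for it to finish.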

@aaschaer
Contributor

> Regarding retrying, I disagree that this is what one would expect. That's not usual behaviour for file transfer protocols and it does not appear to me to be documented anywhere in the globus transfer docs. Hence it's not immediately obvious to the user that this is what is happening.

Looks like there is room for improvement on this topic in the docs for the underlying Transfer service as well; otherwise I would link you there. I'll see if we can get those improved and include a link in the CLI docs for globus transfer.

> Does the --skip-source-errors option cause the task to be immediately cancelled or otherwise made inactive if all source files do not exist?

Such a task will complete as a success and no longer be active.
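
For scripted use, a rough sketch of submitting with that option and keeping hold of the task ID might look like this. It only builds the command line and parses the "Task ID:" line in the format shown in the steps to reproduce; the helper names are illustrative, and the source and destination strings are whatever endpoint paths you already pass to `globus transfer`.

```python
import re

# Matches the "Task ID: <uuid>" line printed after a successful submission.
TASK_ID_RE = re.compile(r"^Task ID:\s*([0-9a-fA-F-]{36})\s*$", re.MULTILINE)


def transfer_argv(source, destination, skip_source_errors=True):
    """Build the argv for `globus transfer`, optionally skipping source errors."""
    argv = ["globus", "transfer", "--notify", "off"]
    if skip_source_errors:
        argv.append("--skip-source-errors")
    return argv + [source, destination]


def parse_task_id(output):
    """Return the task ID found in the CLI output, or None if there is none."""
    match = TASK_ID_RE.search(output)
    return match.group(1) if match else None
```

Recording the ID at submission time means a later run can inspect or cancel exactly that task rather than searching the whole task list.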

As for the duplicate task error behavior itself, the error was added to avoid bad behavior when users set up recurring transfers that sometimes take longer to complete than the interval between submissions. Such users generally want to allow the first task to complete and then wait for the next interval to start another transfer, rather than have a new transfer override any previous transfers; otherwise some files might never complete in the worst case. Because of this I don't think the Transfer service will be able to accommodate your desired change in behavior. I'll forward the feature request to add the task ID to the error message, as that does seem like it would be a useful addition.

@johnmbarrett
Author

Thanks for the clarifying comments and agreeing to documentation improvements.

Thinking about it, the --skip-source-errors option doesn't really give the behaviour I want either, since it would be confusing to look at the logs and see the task had succeeded but the file had not been transferred. Ideal behaviour for me would be for the task to fail on source errors and never be reattempted/immediately be cancelled, enabling future transfer tasks for the same paths to be scheduled successfully. I'd also like an equivalent fail on destination errors option.

Lastly, I agree that allowing transfer tasks to preempt existing tasks with the same source & destination paths would in general not be desirable; I meant specifically the case where the existing task is in a failure state. But it would be moot anyway if I had the fail on source/destination errors option.

@aaschaer
Contributor

> Thinking about it, the --skip-source-errors option doesn't really give the behaviour I want either, since it would be confusing to look at the logs and see the task had succeeded but the file had not been transferred.

For a potential workaround you could look at the files_transferred field in the task (use globus task list -F json to see the full JSON document for each task), which would be 0 in these cases.
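
As a rough illustration of that check, the sketch below scans the JSON text from `globus task list -F json` for tasks that succeeded without moving any files. The `files_transferred` field is the one named above; the `DATA` wrapper, the `task_id` and `status` field names, and the `SUCCEEDED` value are assumptions to verify against real output, and the function name is made up.

```python
import json


def empty_successes(task_list_json):
    """Return IDs of tasks that report success but transferred zero files."""
    doc = json.loads(task_list_json)
    # Assumption: tasks sit under a "DATA" key; a bare list is accepted too.
    tasks = doc.get("DATA", []) if isinstance(doc, dict) else doc
    return [
        task["task_id"]
        for task in tasks
        # A missing files_transferred field is not treated as zero.
        if task.get("status") == "SUCCEEDED" and task.get("files_transferred") == 0
    ]
```

Any ID this returns is a candidate for "succeeded, but the file was skipped", so a script can treat it as a failure for its own purposes.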

But it sounds like ultimately what you want here is more control over how the tasks handle errors. I believe the Transfer team originally discussed an interface for specifying which error codes are skipped/retried/fatal, but there were concerns about usability and corner cases, so we settled on the options that the CLI exposes as --skip-source-errors and --fail-on-quota-errors. I'll note this as a feature request for looking into something like that again though.
