
Errors from the providers aren't handled #1479


Closed
ghostdevv opened this issue Nov 17, 2024 · 1 comment

Comments

@ghostdevv
Contributor

The Problem

Providers mostly don't report errors they encounter back up to the core system. This means that if something fails, it can cause an indefinite "hang". In my case, the hang results in the trigger being marked as started when it hasn't actually started. In the docker provider, for example, errors are caught and simply ignored (example).
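
To illustrate what I mean, here's a rough sketch of the pattern (this is not the actual provider code, and `startContainer` is a made-up stand-in for whatever starts the run's container):

```ts
// Rough sketch of the problematic pattern, NOT the real docker provider code.
// `startContainer` stands in for whatever actually starts the run's container
// (e.g. a `docker run` call) and is assumed to throw when the image can't be pulled.
async function startContainer(image: string, runId: string): Promise<void> {
  throw new Error(`failed to pull image ${image} for run ${runId}`);
}

async function createTaskRun(image: string, runId: string) {
  try {
    await startContainer(image, runId);
  } catch (error) {
    // The failure stops here: nothing is reported back to core, so the run has
    // already been marked as started and just hangs from the platform's view.
    console.error("failed to start container", error);
  }
}

// The caller never finds out anything went wrong:
createTaskRun("registry.example.com/missing:latest", "run_123");
```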

I'm not familiar with the codebase, so please let me know if there are any mistakes. I spent some time tracing through the providers and found a few examples like the following, where a request to deploy the trigger is effectively fired off but never followed up on.

https://github.com/triggerdotdev/trigger.dev/blob/main/apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts#L544-L560

I'm guessing what needs to happen is that the provider needs some way to return an error code, which core can then "bubble up" by changing the deployment status. I didn't want to attempt any changes without creating an issue first, as I'm missing a lot of context. I'm also not sure how PRs such as #1470 interact with this issue, for example. A sketch of what I have in mind is below.
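
Very roughly, something like this (all of these names are hypothetical and not actual APIs from the codebase; it's only meant to illustrate the "bubble up" idea):

```ts
// Hypothetical types and helpers for illustration only; none of this is real
// trigger.dev API.
type ProviderDeployResult =
  | { ok: true }
  | { ok: false; errorCode: string; message: string };

interface Provider {
  deploy(deploymentId: string): Promise<ProviderDeployResult>;
}

declare function markDeploymentFailed(
  deploymentId: string,
  errorCode: string,
  message: string
): Promise<void>;

async function requestDeploy(provider: Provider, deploymentId: string) {
  const result = await provider.deploy(deploymentId);

  if (!result.ok) {
    // Instead of fire-and-forget, core reacts to the provider error and updates
    // the deployment status, so runs aren't left hanging as "started".
    await markDeploymentFailed(deploymentId, result.errorCode, result.message);
  }
}
```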

Example Reproduction

I originally reported this in #1476, but moved it here as I realised my issue was a symptom of the wider problem described above. I encountered this while setting up authentication for my self-hosted docker registry. Trigger would try to deploy a task, and the docker provider would fail to run it because it couldn't download the image. This would cause Trigger to hang indefinitely, as it was unaware that the docker provider had failed to run the task.

During my testing last night I added a scheduled task that runs every 20 minutes. I then forgot about it and was messing around with some other things in Trigger. After some sleep, I came back to it and noticed a long list of "running" scheduled tasks. Upon further investigation, it turned out that before going to sleep I had made an incomplete deployment, which resulted in a missing docker image in the registry. This led to the same situation: Trigger tries to deploy the image, fails to, but has no idea that it failed. Hence the long list of running tasks, the longest of which had been hanging for ~14 hours.

[Screenshot of the runs list, showing the hanging "running" scheduled tasks]

For reference, when the task is working correctly it takes ~2 seconds from start to finish.

[Screenshot of a run completing normally in ~2 seconds]

@nicktrn
Collaborator

nicktrn commented Nov 18, 2024

Thanks for digging into this and creating two very thorough issues!

The self-hosting story isn't great at the moment, and this is part of the problem. We're going to make some big changes in the next couple of months. One of those changes means providers will go away completely: there will only be a single runtime-agnostic image to run per deployment.

There's currently no easy way to fail a task from the provider, and any changes here would also touch parts that affect our cloud deployment.

For those reasons, I think it makes more sense not to touch this at all right now. It will be fixed by the new self-hosted setup.

However, what you could do in the meantime to prevent this and other issues is to set a max duration on your tasks. This would at least stop tasks from running forever and act as an "alert" to investigate any underlying causes, including provider errors. Not ideal, but I think it's the best we can do for now.
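
For example, a minimal sketch (assuming you're on a v3 SDK version that supports the `maxDuration` task option; the task id, payload, and duration here are just placeholders):

```ts
import { task } from "@trigger.dev/sdk/v3";

export const exampleTask = task({
  id: "example-task", // placeholder id
  // Upper bound in seconds for a single run; if the run exceeds it, the platform
  // fails the run instead of letting it hang indefinitely.
  maxDuration: 60,
  run: async (payload: { message: string }) => {
    // Normally finishes in ~2 seconds; maxDuration only acts as a safety net.
    return { echoed: payload.message };
  },
});
```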

@nicktrn nicktrn closed this as not planned on Nov 18, 2024