bug: when the image fails to be pulled from the registry, the deployment hangs #1476

Closed
ghostdevv opened this issue Nov 17, 2024 · 4 comments

Comments

@ghostdevv
Contributor

Provide environment information

n/a

Describe the bug

I run a self-hosted registry used exclusively for Trigger.dev. While debugging the setup, I noticed that when the image pull failed, the task just kept going, seemingly unaware of the failure. I can't recall whether this also happened when the image was missing, but it definitely happened when authentication failed.

Reproduction repo

n/a

To reproduce

Self-host a registry and use it with Trigger.dev. I'd start by taking the registry offline; if that error does appear in the task logs, then try adding auth to the registry but not to the docker-provider container. Self-hosting the registry was fairly simple with Docker:

  trigger-registry:
    container_name: trigger-registry
    image: registry:2
    restart: always
    volumes:
      - ./rdata:/var/lib/registry
      - ./auth:/auth

See the guide on built-in auth if you need it.
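To tell the two failure modes apart while reproducing (registry offline versus auth rejected), it can help to probe the registry's base endpoint first. The sketch below is only an illustration: `probeRegistry`, `classifyStatus`, and the `FetchLike` type are hypothetical names, not part of Trigger.dev. It relies on the Docker Registry HTTP API V2 convention that `GET /v2/` answers 200 when the request is accepted and 401 when authentication is required.

```typescript
// Hypothetical helper, not part of Trigger.dev: classifies a probe of the
// registry's base endpoint (GET /v2/) so the two failure modes stay distinct.
type RegistryState = "ok" | "auth-required" | "unreachable" | "unexpected";

// Minimal fetch-like signature so the sketch has no runtime dependencies.
type FetchLike = (url: string) => Promise<{ status: number }>;

function classifyStatus(status: number): RegistryState {
  if (status === 200) return "ok"; // registry reachable, request accepted
  if (status === 401) return "auth-required"; // registry reachable, credentials missing or rejected
  return "unexpected";
}

async function probeRegistry(baseUrl: string, fetchFn: FetchLike): Promise<RegistryState> {
  try {
    // Strip trailing slashes so the probe URL is always <base>/v2/
    const res = await fetchFn(`${baseUrl.replace(/\/+$/, "")}/v2/`);
    return classifyStatus(res.status);
  } catch {
    // Connection refused, DNS failure, timeout, and similar network errors
    return "unreachable";
  }
}
```

In a runtime with a global `fetch`, something like `probeRegistry("http://localhost:5000", fetch)` would run the probe; the host and port here are assumptions, since the compose snippet above does not publish any ports.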

Additional information

No response

@ghostdevv
Contributor Author

ghostdevv commented Nov 17, 2024

I set up a scheduled task yesterday as a demo which just does one fetch request. The task takes about 2s from start to finish (including pulling the image). The docker registry went offline and this is the result of that 😆

image

All of these runs are still queued too, the oldest of which has been running for 14 hours.

image

@ghostdevv
Contributor Author

ghostdevv commented Nov 17, 2024

It looks like this happens because these errors aren't propagated up from the docker provider; they're simply logged and then ignored.

https://github.com/triggerdotdev/trigger.dev/blob/main/apps/docker-provider/src/index.ts#L134-L148
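The linked lines aren't reproduced in this thread, so the following is only a sketch of the general pattern rather than the docker-provider's actual code. Every name in it (`pullImage`, `startRun`, `CommandRunner`, the result types) is hypothetical. The idea is to return the pull outcome to the caller so a failed pull can fail the run, instead of logging the error and carrying on.

```typescript
// Hypothetical types and names; this is not the docker-provider's actual code.
type CommandResult = { exitCode: number; stderr: string };

// Injected runner so the sketch can be exercised without Docker installed.
type CommandRunner = (command: string, args: string[]) => CommandResult;

type PullResult =
  | { status: "pulled" }
  | { status: "failed"; reason: string };

type RunOutcome =
  | { state: "started" }
  | { state: "failed"; reason: string };

// Runs `docker pull <image>` through the injected runner and reports the outcome
// to the caller instead of only logging it.
function pullImage(image: string, run: CommandRunner): PullResult {
  const result = run("docker", ["pull", image]);
  if (result.exitCode !== 0) {
    const reason = result.stderr.trim() || `docker pull exited with code ${result.exitCode}`;
    return { status: "failed", reason };
  }
  return { status: "pulled" };
}

// The caller decides what a failed pull means: here the run is marked failed
// with the pull error attached, rather than being left queued.
function startRun(image: string, run: CommandRunner): RunOutcome {
  const pulled = pullImage(image, run);
  if (pulled.status === "failed") {
    return { state: "failed", reason: pulled.reason };
  }
  return { state: "started" };
}
```

Injecting the runner is a design choice for the sketch only; in a Node process it could be backed by `child_process.spawnSync`, mapping its `status` and `stderr` fields onto `CommandResult`.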

@ghostdevv
Contributor Author

There only seems to be one example of a provider shell handler providing a more detailed error: https://github.com/triggerdotdev/trigger.dev/blob/main/packages/core/src/v3/apps/provider.ts#L235-L241

I think fixing this will require work from the core team, so I'll open a better issue for it.

@ghostdevv
Contributor Author

Moved to #1479

ghostdevv closed this as not planned on Nov 17, 2024