🐛 BUG: Worker <-> Worker request over custom_domain returns instant 522 timeout response #787

Closed
KimlikDAO-bot opened this issue Feb 1, 2023 · 21 comments
Labels: bug (Something isn't working)

@KimlikDAO-bot

KimlikDAO-bot commented Feb 1, 2023

Which Cloudflare product(s) does this pertain to?

Workers/Other

What version of Wrangler are you using?

2.9.0

What operating system are you using?

Mac

Describe the Bug

Same-zone Worker-to-Worker requests made through Custom Domains return an immediate 522 (timeout) HTTP response.

Custom Domains were introduced (partly?) to make same-zone Worker-to-Worker requests possible: https://blog.cloudflare.com/custom-domains-for-workers/

However, at least on some PoPs, the request immediately returns a 522 response. To reproduce, create the following Workers:

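// worker1.js: forwards every request to worker2 through worker2's Custom Domain
export default {
  fetch(req) {
    // Same-zone subrequest to the Custom Domain; this is the call that comes back as a 522
    return fetch("https://worker2.example.com");
  }
};

// worker2.js: returns a small static HTML page
export default {
  fetch(req) {
    return new Response("<html>Hello from worker2</html>", {
      headers: { "content-type": "text/html" }
    });
  }
};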

Deploy them (wrangler 2.9.0) with the following configs:

# worker1.toml
name = "example-worker1"
main = "worker1.js"
compatibility_date = "2023-01-31"

[route]
pattern = "worker1.example.com"
custom_domain = true

# worker2.toml
name = "example-worker2"
main = "worker2.js"
compatibility_date = "2023-01-31"

[route]
pattern = "worker2.example.com"
custom_domain = true

Replace example.com with a domain the Cloudflare account controls, then run:

wrangler publish -c worker2.toml
wrangler publish -c worker1.toml
# wait
curl https://worker2.example.com    # Fine!
curl https://worker1.example.com -v # Status 522

A deployed example:
Account: 8f0c2f2271ff857947d9a5b2c38595a0
Zone: 97fd67b98d0cc2080e9d13be10b3bca0

@KimlikDAO-bot KimlikDAO-bot added the bug Something isn't working label Feb 1, 2023
@tanushree-sharma

Hey! It doesn't look like you're using service bindings to communicate between the two Workers. A fetch request between two Workers on the same zone is expected to fail without service bindings. Service binding documentation: https://developers.cloudflare.com/workers/platform/bindings/about-service-bindings/
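For illustration, a minimal sketch of worker1.js rewritten to reach worker2 through a service binding rather than a public fetch(). It assumes a binding named WORKER2 (a hypothetical name) declared in worker1.toml with something like `services = [{ binding = "WORKER2", service = "example-worker2" }]`. The handler is written as a named object so it can be exercised outside the Workers runtime with a stub env; in a deployed module Worker it would be the default export.

```javascript
// Sketch: worker1 forwarding to worker2 via a service binding.
// Assumption: worker1.toml declares a binding named WORKER2 (hypothetical name)
// that points at the "example-worker2" service.
const worker1 = {
  async fetch(request, env) {
    // A service binding exposes a fetch() method; the request is handed to
    // worker2 directly instead of being routed through worker2's Custom Domain.
    return env.WORKER2.fetch(request);
  },
};

// In a deployed module Worker this object would be the default export:
// export default worker1;
```

Because the incoming request is passed through unchanged, worker2 sees the same method, path, and headers that worker1 received.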

@KimlikDAO-bot
Author

KimlikDAO-bot commented Feb 13, 2023

I don't think this is accurate. Same zone worker to worker communication should work either through service bindings or through custom domain triggers. See the blog post linked above: https://blog.cloudflare.com/custom-domains-for-workers/

I am aware that same-zone Worker-to-Worker communication is not possible when a route trigger is used.

From https://developers.cloudflare.com/workers/platform/triggers/custom-domains/:

Another benefit of integration with Cloudflare DNS is that you can use your Custom Domains like you would any external dependency. Your Workers can fetch() Custom Domains and invoke their associated Worker, even if the Worker is on the same Cloudflare zone. The newly invoked Worker is treated like a new top-level request and will execute in a separate thread.

Either the code or the docs should be fixed; as they stand, they are incompatible.

@altryne

altryne commented Feb 18, 2023

Any update on this?
This is really breaking the way we deploy our site and API.
We can't use service bindings, and the 522 errors on our staging environment are blocking us from going to production.

We can't fall back on a workers.dev subdomain either, and we have configured both Workers with Custom Domains as per the documentation!

@tanushree-sharma

Hmm, you're right. We'll look into this and get back to you.

I'm curious what's preventing you from using service bindings?

@KimlikDAO-bot
Author

Service bindings are cost-effective since all downstream calls happen in the same thread; however, they don't allow parallelism.

@KimlikDAO-bot
Author

Also, we want to use "smart placements" when it's ready, and I doubt service bindings can support that, since some people may already be relying on the single-thread guarantee.

@lrapoport-cf
Contributor

@KimlikDAO-bot as an update, we originally thought this might be an internal issue but have found that the problem doesn't surface when everything is done through the dashboard, so this does appear to be a bug in Wrangler. We're continuing to investigate 👍

@penalosa
Collaborator

We've looked into this a bit more, and it looks like it's an internal bug. We're working on finding the root cause, but in the meantime, a workaround is to set your compatibility date to before 2022-04-05.

@KianNH
Contributor

KianNH commented Apr 14, 2023

A more ergonomic solution might be just to specifically remove the minimal_subrequests flag so you don't lose the other changes since 2022-04-05

# https://github.com/cloudflare/workerd/issues/787
compatibility_flags = ["no_minimal_subrequests"]
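To check whether the flag changes anything on a given deployment, one option is a hypothetical diagnostic variant of worker1.js that reports worker2's status as JSON instead of passing the 522 error page through. The upstream URL is the placeholder hostname from the original report; the fetch function is injectable only so the sketch can be tested outside Workers, and it defaults to the global fetch. The elapsed time makes it easy to see whether the 522 really comes back instantly, as reported.

```javascript
// Hypothetical diagnostic variant of worker1.js: call worker2's Custom Domain
// and report what came back, so a 522 is easy to spot in curl output.
const UPSTREAM = "https://worker2.example.com"; // placeholder: use your own domain

async function probeUpstream(fetchImpl = fetch) {
  const started = Date.now();
  const res = await fetchImpl(UPSTREAM);
  return {
    upstream: UPSTREAM,
    status: res.status,
    // 522 is Cloudflare's "connection timed out" status
    timedOut: res.status === 522,
    // An instant 522 shows up here as a very small number
    elapsedMs: Date.now() - started,
  };
}

const diagnosticWorker = {
  async fetch(_request) {
    const report = await probeUpstream();
    return new Response(JSON.stringify(report, null, 2), {
      headers: { "content-type": "application/json" },
    });
  },
};

// In a deployed module Worker: export default diagnosticWorker;
```

Deploying this once with and once without `no_minimal_subrequests` and comparing the two reports is a quick way to see whether the flag helps on a particular zone.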

@jgontrum

I think a related bug affects requests from a Worker to an API on the same domain.

My scenario was:
A Worker with a custom domain on api.myapp.com makes a request to my database running on my own server at db.myapp.com. This resulted in essentially empty requests in my server logs, which threw off the reverse proxy running there.

Adding the compatibility_flags setting did not change anything; the only solution for now is to use the .workers.dev subdomain instead of a custom domain.

@abiodunakande

Hey, any update on this? I deployed my Workers via Terraform and am also getting a 522 when calling from a.domain.com -> b.domain.com.

@penalosa
Collaborator

No update right now, unfortunately—we're tracking this internally though, and will update here when there's a resolution. For now, @KianNH's workaround should be a reasonable stopgap:

# https://github.com/cloudflare/workerd/issues/787
compatibility_flags = ["no_minimal_subrequests"]

@penalosa
Collaborator

Moving to the workerd repo since this is a runtime issue, not a Wrangler one.

@penalosa penalosa transferred this issue from cloudflare/workers-sdk Jun 19, 2023
@gillbates

gillbates commented Jun 20, 2023

Got the same issue now.
We have Worker A with custom domain a.xxx.com and Worker B with custom domain b.xxx.com.
It always returns status 522 when I try to fetch('https://b.xxx.com/1.jpg') in Worker A. Any ideas why?
My Wrangler version: wrangler 3.1.0
@penalosa

@kentonv
Member

kentonv commented Jun 21, 2023

@penalosa this is actually not a runtime issue either, it is a Cloudflare stack issue outside of the workers runtime, so routing it to workerd unfortunately doesn't send it to the right people. Perhaps we need to create a new github project for these kinds of issues and make sure the right people are watching it.

@rawkode

rawkode commented Jul 13, 2023

Hey! It doesn't look like you're using service bindings to communicate between the two Workers. A fetch request between two Workers on the same zone is expected to fail without service bindings. Service binding documentation: https://developers.cloudflare.com/workers/platform/bindings/about-service-bindings/

Just to follow up on this, I can't use service bindings because I need the proxy fetch (to call itself) to go through the Cloudflare request flow to convert a clientID and clientSecret to a JWT - which doesn't happen when using service bindings.

@jaswrks

jaswrks commented Mar 2, 2024

The last update was on Jul 13th. Where is the proper place to track progress on this important issue?

Cross-referencing community thread:
https://community.cloudflare.com/t/522-when-worker-proxies-to-another-worker/569561

@penalosa
Collaborator

This should be fixed now—I can no longer reproduce the original issue. @KimlikDAO-bot could you confirm you're no longer seeing this behaviour?

@shayypy

shayypy commented Jul 28, 2024

This should be fixed now—I can no longer reproduce the original issue.

This is still happening for me. If it matters, the request targets the Worker that is making it (the Worker requests itself), in order to read a set of static markdown files in the public directory. Is this also a bug, or do I just need to move the files so that they can be imported instead?
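One way to avoid the self-request entirely is to bundle the markdown files into the Worker and look them up by path. A minimal sketch, assuming the files are imported as text modules (for example via a Wrangler rule along the lines of `rules = [{ type = "Text", globs = ["**/*.md"], fallthrough = true }]`); here the imported strings are stood in by a hypothetical `DOCS` object so the example runs on its own, and the file names are placeholders.

```javascript
// Sketch: serve bundled markdown by path instead of fetch()ing the Worker's own URL.
// Assumption: in a real Worker these strings would come from text-module imports, e.g.
//   import intro from "./public/intro.md";
// DOCS and the file names below are hypothetical placeholders.
const DOCS = {
  "/intro.md": "# Intro\n\nHello!\n",
  "/faq.md": "# FAQ\n\nNothing yet.\n",
};

const docsWorker = {
  async fetch(request) {
    const { pathname } = new URL(request.url);
    const body = DOCS[pathname];
    if (body === undefined) {
      return new Response("Not found", { status: 404 });
    }
    // No subrequest is made: the content is already part of the bundle.
    return new Response(body, {
      headers: { "content-type": "text/markdown; charset=utf-8" },
    });
  },
};

// In a deployed module Worker: export default docsWorker;
```

The trade-off is that changing a markdown file requires redeploying the Worker, since the files are baked into the bundle rather than read at request time.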

@10j0

10j0 commented Oct 5, 2024

I still have this problem: my Worker is proxied to a subdomain of my site, and the Worker itself works, but the subdomain does not; it returns error 522. I tried setting the compatibility date and flag, to no avail. Any updates on this?

@10j0

10j0 commented Oct 5, 2024


Resolved: https://developers.cloudflare.com/workers/configuration/routing/routes/

Projects
Status: Done

15 participants