
Consistent high latency on PROD with Vercel-proxied spx-backend #804

Open
aofei opened this issue Aug 26, 2024 · 4 comments
aofei commented Aug 26, 2024

[Screenshot 2024-08-26 at 11:22:33] [Screenshot 2024-08-26 at 11:22:41]
@aofei aofei self-assigned this Aug 26, 2024
aofei added a commit to aofei/.goplus.builder that referenced this issue Aug 28, 2024
This replaces Vercel Edge Middleware with the Vercel Build Output
API[^1] to reimplement the API proxy, cutting costs (no more
invocation-based charges), reducing latency (no middle layer or cold
starts), and simplifying the setup, while retaining dynamic
configuration via the `VERCEL_PROXIED_API_BASE_URL` environment
variable.

Updates goplus#804

[^1]: https://vercel.com/docs/build-output-api/v3

Signed-off-by: Aofei Sheng <aofei@aofeisheng.com>
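For reference, an external rewrite under the Build Output API v3 is declared via `routes` in `.vercel/output/config.json`. A minimal sketch of what the commit above sets up (the exact file is generated by the build, so it may differ; the destination here assumes `VERCEL_PROXIED_API_BASE_URL` resolves to the backend host):

```json
{
  "version": 3,
  "routes": [
    { "src": "/api/(.*)", "dest": "https://builder-api.goplus.org/$1" }
  ]
}
```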
aofei commented Aug 28, 2024

The output of traceroute 16.163.99.55 (from QVM dal-vm01 to Vercel hkg1):

traceroute to 16.163.99.55 (16.163.99.55), 30 hops max, 60 byte packets
 1  10.66.130.25 (10.66.130.25)  0.200 ms  0.137 ms  0.215 ms
 2  [REDACTED] ([REDACTED])  12.330 ms  12.395 ms  12.295 ms
 3  10.66.20.1 (10.66.20.1)  15.640 ms  15.766 ms  15.796 ms
 4  172.16.200.5 (172.16.200.5)  0.392 ms 172.16.200.1 (172.16.200.1)  0.574 ms 172.16.200.5 (172.16.200.5)  0.364 ms
 5  * * *
 6  * * *
 7  * 38.83.110.137 (38.83.110.137)  2.118 ms *
 8  * * 4.36.173.37 (4.36.173.37)  2.233 ms
 9  4.69.209.110 (4.69.209.110)  1.944 ms be2595.ccr31.dfw01.atlas.cogentco.com (154.54.93.221)  2.134 ms ae1.3515.edge1.Dallas2.net.lumen.tech (4.69.209.114)  2.115 ms
10  * Tata-level3-Dallas2.Level3.net (4.68.74.42)  2.320 ms be3821.ccr21.elp02.atlas.cogentco.com (154.54.165.26)  13.238 ms
11  * * if-ae-43-2.tcore2.dt8-dallas.as6453.net (66.110.57.23)  138.610 ms
12  be2932.ccr42.lax01.atlas.cogentco.com (154.54.45.162)  34.066 ms be2931.ccr41.lax01.atlas.cogentco.com (154.54.44.86)  33.474 ms be2932.ccr42.lax01.atlas.cogentco.com (154.54.45.162)  34.161 ms
13  be3360.ccr41.lax04.atlas.cogentco.com (154.54.25.150)  34.045 ms *  34.360 ms
14  be2894.ccr72.tyo01.atlas.cogentco.com (154.54.1.22)  135.802 ms  139.303 ms *
15  154.18.29.202 (154.18.29.202)  135.731 ms  131.790 ms *
16  150.222.90.107 (150.222.90.107)  134.095 ms 150.222.90.109 (150.222.90.109)  140.729 ms *
17  * * *
18  54.239.52.97 (54.239.52.97)  139.424 ms 54.239.52.105 (54.239.52.105)  135.930 ms 54.239.52.107 (54.239.52.107)  137.152 ms
19  * 52.95.30.26 (52.95.30.26)  136.074 ms 52.95.30.16 (52.95.30.16)  135.505 ms
20  * * *
21  52.93.35.62 (52.93.35.62)  180.091 ms 52.93.157.153 (52.93.157.153)  180.897 ms *
22  52.93.157.133 (52.93.157.133)  185.932 ms 52.93.157.56 (52.93.157.56)  179.004 ms 52.93.157.140 (52.93.157.140)  181.591 ms
23  54.240.241.183 (54.240.241.183)  191.584 ms 52.93.157.112 (52.93.157.112)  188.168 ms 52.93.157.160 (52.93.157.160)  179.612 ms
24  52.93.157.92 (52.93.157.92)  178.889 ms 52.93.157.22 (52.93.157.22)  185.641 ms 52.93.157.92 (52.93.157.92)  178.832 ms
25  * * 52.93.156.25 (52.93.156.25)  186.178 ms
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *

Via Vercel-proxied API:

[Screenshot 2024-08-28 at 19:34:16]

Direct API access:

[Screenshot 2024-08-28 at 19:34:20]

aofei added a commit that referenced this issue Aug 29, 2024
aofei commented Sep 4, 2024

I conducted a test by sending 50 requests via the Vercel-proxied API from the browser console. Based on the Nginx access logs on the spx-backend server:

  1. The varying remote_addr values suggest that each request may be handled by a different node within the Vercel Edge Network.
  2. The connection_requests=1 log entries suggest that even when the same node handles multiple requests, Vercel does not reuse the previous TCP connection.
  3. The varying ssl_session_id values indicate that Vercel does not reuse SSL sessions either.

In summary, Vercel Rewrites does not appear to use any kind of "session persistence" at all: each request requires a full TCP connection establishment and TLS handshake, even for rapid, consecutive requests from the same client. This has a significant negative impact on spx-gui's performance. We should therefore consider disabling the Vercel reverse proxy for the API and switching back to direct API access until Vercel provides a reasonable solution.
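For context, the Nginx variables referenced above can be captured with a log_format along these lines (a hypothetical sketch; the actual format configured on the spx-backend server may differ):

```nginx
log_format conn_debug '$remote_addr conn=$connection '
                      'connection_requests=$connection_requests '
                      'ssl_session_id=$ssl_session_id '
                      'ssl_session_reused=$ssl_session_reused';
access_log /var/log/nginx/access.log conn_debug;
```

With this in place, connection_requests greater than 1 indicates TCP keep-alive reuse, and ssl_session_reused=r indicates a resumed TLS session.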

[Screenshot 2024-09-04 at 14:05:57]
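The browser-console test described above can be sketched as follows (a sketch only: the endpoint path and the exact request pattern used in the original test are assumptions):

```javascript
// Send n sequential requests and record per-request wall-clock latency.
// Intended to be pasted into the browser console on the frontend origin.
async function measureLatency(url, n = 50) {
  const timings = [];
  for (let i = 0; i < n; i++) {
    const start = performance.now();
    await fetch(url, { cache: 'no-store' }); // bypass the HTTP cache
    timings.push(performance.now() - start);
  }
  timings.sort((a, b) => a - b);
  return {
    min: timings[0],
    median: timings[Math.floor(n / 2)],
    max: timings[n - 1],
  };
}

// Example (hypothetical endpoint path):
// measureLatency('/api/assets').then(console.log);
```

If connections and TLS sessions were being reused, latencies after the first request should drop noticeably; in the Vercel-proxied case they stayed uniformly high.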

aofei commented Sep 4, 2024

And as a comparison, I conducted a similar test with the API proxied by Cloudflare Workers:

export default {
  async fetch(request) {
    // Rewrite the incoming URL to point at the backend host, then forward
    // the original request (method, headers, body) unchanged.
    const url = new URL(request.url);
    url.hostname = 'builder-api.goplus.org';
    return fetch(new Request(url, request));
  },
};

The result was as expected, with improved performance:

  • HTTP/2:

    [Screenshot 2024-09-04 at 14:09:46]
  • HTTP/1.1:

    [Screenshot 2024-09-04 at 14:34:30]

nighca commented Sep 25, 2024

Complete response from Vercel Support

nighca September 4, 2024 at 2:39 PM

We configured rewrites in our project to proxy our APIs, but this is causing high latency. We dug in and found that connection reuse is missing when proxying. Please see details in #804 (comment)

nighca September 4, 2024 at 2:51 PM

And some background info:

  • We configure rewrites based on the Vercel Build Output API. We use vite-plugin-vercel to help us generate the output; the relevant configuration is:

    rewrites: [
      {
        source: '/api/(.*)',
        destination: (env.VERCEL_PROXIED_API_BASE_URL as string) + '/$1'
      }
    ],

  • The rewrite proxies requests like https://builder.goplus.org/api/xxx to https://builder-api.goplus.org/xxx
  • We tested and found that the latency appears to be caused by establishing a new connection for every request, which connection reuse would have avoided.
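The mapping described in the bullets above amounts to the following (a standalone illustration; `rewritePath` is a name introduced here, not part of the project):

```javascript
// Illustration of the rewrite mapping: /api/<path> on the frontend host
// maps to /<path> on the backend host, preserving any query string.
function rewritePath(input) {
  const url = new URL(input);
  const match = url.pathname.match(/^\/api\/(.*)$/);
  if (!match) return input; // not an API request; left untouched
  return `https://builder-api.goplus.org/${match[1]}${url.search}`;
}

console.log(rewritePath('https://builder.goplus.org/api/xxx'));
// → https://builder-api.goplus.org/xxx
```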

Vercel Support September 5, 2024 at 8:29 PM

Hi Hanxing,

Thanks for getting in touch with Vercel Support! I’m more than happy to look into this with you.

We appreciate you sharing context and relevant GitHub links to the issue and code. I'll check with our engineering team to understand if there are improvements that could be made and update you once I hear back. We appreciate your patience in the meantime.

Kind Regards,

Jennifer Tran
▲ Sr. Customer Success Engineer at Vercel

nighca September 13, 2024 at 10:25 AM

Hi, is there any update about this?

Vercel Support September 14, 2024 at 4:20 AM

Hi Hanxing,

Thank you for your patience! We did hear back from our engineering team.

There is connection pooling, but it is at worker level. Each node has N workers, so it is possible to hit the same node but still not reuse the connection. This is not something that can be adjusted for a given project as it is due to the way that we proxy external rewrites and improving it will require significant changes to our infrastructure.

I hope this information helps, please don't hesitate to reach out with additional questions and we would be happy to help!

Cheers,

Zach
Senior Customer Success Engineer ▲ Vercel


nighca September 14, 2024 at 3:59 PM

Thank you for your reply. However, I must let you know that in our tests, "connection pooling" almost never took effect. By default, this leads to performance issues for projects that use external rewrites, which I believe is not uncommon. If I'm not mistaken, many similar projects may already be impacted.

Vercel Support September 24, 2024 at 12:41 PM

Hello Hanxing,

Thank you for your update here. After further investigation, our engineering team has determined there is no change planned that can be implemented at this time to alter the current behavior for external rewrites. I realize this is not the update you were hoping to receive and for that, I do apologize.

Should you have any further questions, please do not hesitate to reach out.

Kind regards,

Sen
▲ Senior Customer Success Engineer at Vercel
