
Consistent high latency on PROD with Vercel-proxied spx-backend #804

Open
aofei opened this issue Aug 26, 2024 · 4 comments
aofei commented Aug 26, 2024

[Screenshot 2024-08-26 at 11:22:33] [Screenshot 2024-08-26 at 11:22:41]
@aofei aofei self-assigned this Aug 26, 2024
aofei added a commit to aofei/.goplus.builder that referenced this issue Aug 28, 2024
This replaces Vercel Edge Middleware with the Vercel Build Output
API[^1] to reimplement the API proxy, cutting costs (no more
invocation-based charges), reducing latency (no middle layer or cold
starts), and simplifying the setup, while retaining dynamic
configuration via the `VERCEL_PROXIED_API_BASE_URL` environment
variable.

Updates goplus#804

[^1]: https://vercel.com/docs/build-output-api/v3

Signed-off-by: Aofei Sheng <aofei@aofeisheng.com>
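For reference, an external rewrite under the Build Output API v3 is declared via `routes` in `.vercel/output/config.json`. A minimal sketch of what the commit above sets up (the exact file is generated by the build, so it may differ; the destination here assumes `VERCEL_PROXIED_API_BASE_URL` resolves to the backend host):

```json
{
  "version": 3,
  "routes": [
    { "src": "/api/(.*)", "dest": "https://builder-api.goplus.org/$1" }
  ]
}
```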
aofei commented Aug 28, 2024

The output of traceroute 16.163.99.55 (from QVM dal-vm01 to Vercel hkg1):

traceroute to 16.163.99.55 (16.163.99.55), 30 hops max, 60 byte packets
 1  10.66.130.25 (10.66.130.25)  0.200 ms  0.137 ms  0.215 ms
 2  [REDACTED] ([REDACTED])  12.330 ms  12.395 ms  12.295 ms
 3  10.66.20.1 (10.66.20.1)  15.640 ms  15.766 ms  15.796 ms
 4  172.16.200.5 (172.16.200.5)  0.392 ms 172.16.200.1 (172.16.200.1)  0.574 ms 172.16.200.5 (172.16.200.5)  0.364 ms
 5  * * *
 6  * * *
 7  * 38.83.110.137 (38.83.110.137)  2.118 ms *
 8  * * 4.36.173.37 (4.36.173.37)  2.233 ms
 9  4.69.209.110 (4.69.209.110)  1.944 ms be2595.ccr31.dfw01.atlas.cogentco.com (154.54.93.221)  2.134 ms ae1.3515.edge1.Dallas2.net.lumen.tech (4.69.209.114)  2.115 ms
10  * Tata-level3-Dallas2.Level3.net (4.68.74.42)  2.320 ms be3821.ccr21.elp02.atlas.cogentco.com (154.54.165.26)  13.238 ms
11  * * if-ae-43-2.tcore2.dt8-dallas.as6453.net (66.110.57.23)  138.610 ms
12  be2932.ccr42.lax01.atlas.cogentco.com (154.54.45.162)  34.066 ms be2931.ccr41.lax01.atlas.cogentco.com (154.54.44.86)  33.474 ms be2932.ccr42.lax01.atlas.cogentco.com (154.54.45.162)  34.161 ms
13  be3360.ccr41.lax04.atlas.cogentco.com (154.54.25.150)  34.045 ms *  34.360 ms
14  be2894.ccr72.tyo01.atlas.cogentco.com (154.54.1.22)  135.802 ms  139.303 ms *
15  154.18.29.202 (154.18.29.202)  135.731 ms  131.790 ms *
16  150.222.90.107 (150.222.90.107)  134.095 ms 150.222.90.109 (150.222.90.109)  140.729 ms *
17  * * *
18  54.239.52.97 (54.239.52.97)  139.424 ms 54.239.52.105 (54.239.52.105)  135.930 ms 54.239.52.107 (54.239.52.107)  137.152 ms
19  * 52.95.30.26 (52.95.30.26)  136.074 ms 52.95.30.16 (52.95.30.16)  135.505 ms
20  * * *
21  52.93.35.62 (52.93.35.62)  180.091 ms 52.93.157.153 (52.93.157.153)  180.897 ms *
22  52.93.157.133 (52.93.157.133)  185.932 ms 52.93.157.56 (52.93.157.56)  179.004 ms 52.93.157.140 (52.93.157.140)  181.591 ms
23  54.240.241.183 (54.240.241.183)  191.584 ms 52.93.157.112 (52.93.157.112)  188.168 ms 52.93.157.160 (52.93.157.160)  179.612 ms
24  52.93.157.92 (52.93.157.92)  178.889 ms 52.93.157.22 (52.93.157.22)  185.641 ms 52.93.157.92 (52.93.157.92)  178.832 ms
25  * * 52.93.156.25 (52.93.156.25)  186.178 ms
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *

Via Vercel-proxied API:

[Screenshot 2024-08-28 at 19:34:16]

Direct API access:

[Screenshot 2024-08-28 at 19:34:20]

aofei added a commit that referenced this issue Aug 29, 2024
aofei commented Sep 4, 2024

I conducted a test by sending 50 requests via the Vercel-proxied API from the browser console. Based on the Nginx access logs on the spx-backend server:

  1. The varying remote_addr values suggest that each request may be handled by a different node within the Vercel Edge Network.
  2. The connection_requests=1 log entries suggest that even when the same node handles multiple requests, Vercel does not reuse the previous TCP connection.
  3. The varying ssl_session_id values indicate that Vercel does not reuse SSL sessions either.

In summary, Vercel Rewrites does not appear to use any kind of "session persistence" at all: each request requires a full TCP connection establishment and TLS handshake, even for rapid, consecutive requests from the same client. This has a significant negative impact on spx-gui's performance. We should therefore consider disabling the Vercel reverse proxy for the API and switching back to direct API access until Vercel provides a reasonable solution.
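For context, the Nginx variables referenced above can be captured with a log_format along these lines (a hypothetical sketch; the actual format configured on the spx-backend server may differ):

```nginx
log_format conn_debug '$remote_addr conn=$connection '
                      'connection_requests=$connection_requests '
                      'ssl_session_id=$ssl_session_id '
                      'ssl_session_reused=$ssl_session_reused';
access_log /var/log/nginx/access.log conn_debug;
```

With this in place, connection_requests greater than 1 indicates TCP keep-alive reuse, and ssl_session_reused=r indicates a resumed TLS session.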

[Screenshot 2024-09-04 at 14:05:57]
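The browser-console test described above can be sketched as follows (a sketch only: the endpoint path and the exact request pattern used in the original test are assumptions):

```javascript
// Send n sequential requests and record per-request wall-clock latency.
// Intended to be pasted into the browser console on the frontend origin.
async function measureLatency(url, n = 50) {
  const timings = [];
  for (let i = 0; i < n; i++) {
    const start = performance.now();
    await fetch(url, { cache: 'no-store' }); // bypass the HTTP cache
    timings.push(performance.now() - start);
  }
  timings.sort((a, b) => a - b);
  return {
    min: timings[0],
    median: timings[Math.floor(n / 2)],
    max: timings[n - 1],
  };
}

// Example (hypothetical endpoint path):
// measureLatency('/api/assets').then(console.log);
```

If connections and TLS sessions were being reused, latencies after the first request should drop noticeably; in the Vercel-proxied case they stayed uniformly high.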

aofei commented Sep 4, 2024

And as a comparison, I conducted a similar test with the API proxied by Cloudflare Workers:

export default {
  async fetch(request) {
    // Rewrite the incoming URL to point at the backend host, then forward
    // the original request (method, headers, body) unchanged.
    const url = new URL(request.url);
    url.hostname = 'builder-api.goplus.org';
    return fetch(new Request(url, request));
  },
};

The result was as expected, with improved performance:

  • HTTP/2:

    [Screenshot 2024-09-04 at 14:09:46]
  • HTTP/1.1:

    [Screenshot 2024-09-04 at 14:34:30]

nighca commented Sep 25, 2024

Complete response from Vercel Support

nighca September 4, 2024 at 2:39 PM

We configured rewrites in our project to proxy our APIs, but this is causing high latency. We dug in and found that connection reuse is missing when proxying. Please see details in #804 (comment)

nighca September 4, 2024 at 2:51 PM

And some background info:

  • We configure rewrites based on the Vercel Build Output API. We use vite-plugin-vercel to help us generate the output; the relevant configuration is:

    rewrites: [
      {
        source: '/api/(.*)',
        destination: (env.VERCEL_PROXIED_API_BASE_URL as string) + '/$1'
      }
    ],

  • The rewrite proxies requests like https://builder.goplus.org/api/xxx to https://builder-api.goplus.org/xxx
  • We tested and found that the latency appears to be caused by establishing a new connection for every request, which connection reuse would have avoided.
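The mapping described in the bullets above amounts to the following (a standalone illustration; `rewritePath` is a name introduced here, not part of the project):

```javascript
// Illustration of the rewrite mapping: /api/<path> on the frontend host
// maps to /<path> on the backend host, preserving any query string.
function rewritePath(input) {
  const url = new URL(input);
  const match = url.pathname.match(/^\/api\/(.*)$/);
  if (!match) return input; // not an API request; left untouched
  return `https://builder-api.goplus.org/${match[1]}${url.search}`;
}

console.log(rewritePath('https://builder.goplus.org/api/xxx'));
// → https://builder-api.goplus.org/xxx
```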

Vercel Support September 5, 2024 at 8:29 PM

Hi Hanxing,

Thanks for getting in touch with Vercel Support! I’m more than happy to look into this with you.

We appreciate you sharing context and relevant GitHub links to the issue and code. I'll check with our engineering team to understand if there are improvements that could be made and update you once I hear back. We appreciate your patience in the meantime.

Kind Regards,

Jennifer Tran
▲ Sr. Customer Success Engineer at Vercel

nighca September 13, 2024 at 10:25 AM

Hi, is there any update about this?

Vercel Support September 14, 2024 at 4:20 AM

Hi Hanxing,

Thank you for your patience! We did hear back from our engineering team.

There is connection pooling, but it is at worker level. Each node has N workers, so it is possible to hit the same node but still not reuse the connection. This is not something that can be adjusted for a given project as it is due to the way that we proxy external rewrites and improving it will require significant changes to our infrastructure.

I hope this information helps, please don't hesitate to reach out with additional questions and we would be happy to help!

Cheers,

Zach
Senior Customer Success Engineer ▲ Vercel


nighca September 14, 2024 at 3:59 PM

Thank you for your reply. However, I must let you know that in our tests, "connection pooling" almost never took effect. By default, this leads to performance issues for projects that use external rewrites, which I believe is not uncommon. If I'm not mistaken, many similar projects may already be impacted.

Vercel Support September 24, 2024 at 12:41 PM

Hello Hanxing,

Thank you for your update here. After further investigation, our engineering team has determined there is no change planned that can be implemented at this time to alter the current behavior for external rewrites. I realize this is not the update you were hoping to receive and for that, I do apologize.

Should you have any further questions, please do not hesitate to reach out.

Kind regards,

Sen
▲ Senior Customer Success Engineer at Vercel
