
Automatically choose workspace-cluster based on lowest latency. #5596

Open
meysholdt opened this issue Sep 8, 2021 · 12 comments
Labels: aspect: performance · meta: never-stale · team: webapp · type: feature request

Comments

@meysholdt
Member

Context: #5534 (comment)

Problem Statement

We currently have workspace clusters in one region in the EU and one region in the US. To offer service at a good latency (e.g. < 100 ms), we will need more clusters, maybe as many as one or two per continent. See https://gcping.com/ for your personal latency to every Google Cloud region. See the GCP network map for available regions and connections between them.

Prior Art

Proposed Solution

The user's web browser should measure the latency to every available workspace cluster and send the measurements to the gitpod-server, so that the server can make an informed decision about which workspace-cluster is best for the user.

Considerations

  • latency measurement should not slow down workspace startup time
  • the decision which workspace-cluster to choose should remain with the gitpod-server, because in the future other factors besides latency may influence the decision, for example cluster health.

Proposed Design Choices:

  • to keep workspace startup fast, the latency measurement should be cached, for example in a cookie in the web browser.
  • to keep workspace startup fast, the latency measurement should preferably not happen when a workspace starts, but when the user visits any Gitpod website.
  • every workspace cluster should have a public endpoint that can be "pinged" from the web browser for latency measurement.
  • the server should make a cache-key and the ws-cluster endpoints available to the user. The cache-key should encode the user's public IP address, so that the latency is measured again if the user changes their network.

Example Flow 1:

  1. the user visits gitpod.io/workspaces.
  2. the user's browser receives {"cache-key": "FJJDSKD", "clusters": {"us07": "https://us07.gitpod.io/ping", "sing01": "https://sing01.gitpod.io/ping"}}
  3. the user's browser measures the latency to all clusters in the background and stores the result in a cookie: {"us07": 230, "sing01": 60}
  4. when the user opens a workspace, the cookie is sent to the gitpod-server, and the server uses the latency measurements to choose the best workspace cluster.
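A minimal browser-side sketch of steps 2–4 in TypeScript; the /ping endpoints, the cookie name, and the response shape are assumptions taken from this proposal, not an existing API:

```ts
// Hypothetical response shape from the gitpod-server (step 2 above).
interface ClusterPingInfo {
  "cache-key": string;
  clusters: Record<string, string>; // cluster name -> ping URL
}

// Measure a single round trip to a cluster's ping endpoint.
async function measureRtt(pingUrl: string): Promise<number> {
  const start = performance.now();
  await fetch(pingUrl, { mode: "no-cors", cache: "no-store" });
  return Math.round(performance.now() - start);
}

// Ping all clusters in parallel and cache the results in a cookie,
// together with the server-provided cache-key (step 3 above).
async function measureAndCacheLatencies(info: ClusterPingInfo): Promise<void> {
  const entries = await Promise.all(
    Object.entries(info.clusters).map(
      async ([name, url]) => [name, await measureRtt(url)] as const,
    ),
  );
  const payload = { key: info["cache-key"], latencies: Object.fromEntries(entries) };
  document.cookie =
    "gitpod-cluster-latency=" +
    encodeURIComponent(JSON.stringify(payload)) +
    "; path=/; max-age=86400";
}
```

A single-shot fetch measures TCP/TLS setup plus one round trip rather than a pure ping, but as a relative ranking between clusters that is usually good enough.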

Example Flow 2:

  1. the user opens a workspace. The cookie is already there. No delay during workspace-start.

Example Flow 3:

  1. the user opens a workspace. The cookie is not yet there. This is the case we want to avoid, but I don't think it can be avoided all the time.
  2. measure the latency. Maybe the measurement can be aborted when the first workspace-cluster responds, because the first to respond will also be the one with the lowest latency (duh!). While there is the risk that the measurement is slightly inaccurate and repeated measurements would be needed for more accurate results, it seems like a good compromise to preserve fast workspace startup time. This way, if no cookie is present, 15 to ~200 ms will be added to the workspace startup time.
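The abort-on-first-response idea in step 2 maps directly onto Promise.any; a sketch, assuming the same hypothetical ping endpoints as above:

```ts
// Resolve as soon as the first cluster answers. In a single-shot measurement,
// the first responder is by definition the one with the lowest observed RTT.
// Promise.any also ignores clusters whose ping fails entirely.
async function pickFirstResponder(clusters: Record<string, string>): Promise<string> {
  return Promise.any(
    Object.entries(clusters).map(async ([name, url]) => {
      await fetch(url, { mode: "no-cors", cache: "no-store" });
      return name;
    }),
  );
}
```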
@csweichel
Contributor

Excellent idea - but we really don't have time for this right now.
We'll want to revisit workspace cluster selection once we make a decision on multi-meta.

@jankeromnes
Contributor

jankeromnes commented Sep 16, 2021

Prior Art

FYI, that proposal is to temporarily gather ping times to all possible GCP regions, in order to decide "where should we create a brand new cluster next?" (and then stop collecting ping times, make a decision, and create the cluster)

The proposal was not to collect ping times in order to decide "which workspace cluster should be used right now?" -- doesn't GCP's load balancer already do that automatically? How does the US vs EU selection work right now? (I assume it's not some custom code we wrote, but GCP selecting a reasonable cluster automatically -- I would hope this would also work with 3 or more clusters without requiring us to write custom code for this)

@bigint

bigint commented Sep 16, 2021

I think the selection algorithm is broken. I'm from India, where the nearby location is the EU, but whenever I fire up a new workspace it gets created in the US region.

Also, when I tried with a VPN from Vienna, the workspace was created in the EU region.

🤔

@jankeromnes
Contributor

jankeromnes commented Oct 11, 2021

⚠️ Just to reiterate: This issue suspiciously sounds like we want to re-implement something as standard as a load balancer.

I don't think we want to implement and maintain custom code that measures latency, caches it, and acts upon this data.

If possible, it would be much preferable to let Google Cloud pick the best workspace cluster automatically(!)


Inspiration: Best practices for Compute Engine regions selection > Use Cloud Load Balancing and Cloud CDN:

Cloud Load Balancing, such as HTTP(S) load balancing, TCP, and SSL proxy load balancing, let you automatically redirect users to the closest region where there are backends with available capacity.

@csweichel
Contributor

I don't think we want to implement and maintain custom code that measures latency, caches it, and acts upon this data.

If possible, it would be much preferable to let Google Cloud pick the best workspace cluster automatically(!)

Cloud Load Balancing, such as HTTP(S) load balancing, TCP, and SSL proxy load balancing, let you automatically redirect users to the closest region where there are backends with available capacity.

The reason we need to build/maintain something ourselves is that the StartWorkspace request, which would need to be regional, does not go through a regional load balancer: it's issued from server to ws-manager, not from the (regional) user's browser.

@csweichel
Contributor

csweichel commented Nov 15, 2021

The minimal steps to make automatic cluster choices would be:

  1. add a kind of "ping" endpoint to ws-proxy, so that e.g. ws-eu18.gitpod.io does not answer with 404
  2. add a getAllRegions function to WorkspaceManagerClientProvider which returns a list of ping URLs and names.
  3. make the dashboard execute the RTT pings as outlined above.
  4. extend the createWorkspace and startWorkspace calls on server so that they take a cluster preference, which would then be passed in via the ExtendedUser and become an admission preference. Note that this way the cluster preference plays nicely with the cluster score and status.
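A rough TypeScript sketch of how steps 3 and 4 could fit together on the server side; the type names and report shape are assumptions for illustration, not the actual server API:

```ts
// Hypothetical: RTT measurements collected by the dashboard (step 3).
interface ClusterLatencyReport {
  cacheKey: string;
  latencies: Record<string, number>; // cluster name -> RTT in ms
}

// Derive the cluster preference passed into createWorkspace/startWorkspace
// (step 4). With no measurement, return undefined so the existing
// score/status-based selection applies unchanged.
function chooseClusterPreference(report?: ClusterLatencyReport): string | undefined {
  if (!report || Object.keys(report.latencies).length === 0) {
    return undefined;
  }
  return Object.entries(report.latencies)
    .sort(([, rttA], [, rttB]) => rttA - rttB)[0][0];
}
```

Treating the preference as one admission input rather than a hard override is what keeps it compatible with the cluster score and status mentioned in step 4.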

Offline we discussed the option of making the workspace cluster (or region) choice explicit on the dashboard. By default we'd select the cluster with the lowest RTT (as outlined above).

However, focusing on the individual cluster instead of a region has several drawbacks:

  • it's noisy on the dashboard because clusters change very often (with every new workspace deployment)
  • we need to measure often because of the many cluster changes

Instead, we could introduce regions for clusters. We'd introduce a new region field as an admission constraint and on the ws-manager-bridge API. New cluster registrations could provide the region when they're registered. We'd assume that, from a latency perspective, all clusters within a region are equivalent, i.e. a measurement against one cluster carries over to every other cluster in the same region.
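A sketch of what that region field could look like on the registration side; the field and type names are assumptions about the ws-manager-bridge API, not its actual schema:

```ts
// Hypothetical registration payload carrying the proposed region field.
interface WorkspaceClusterRegistration {
  name: string;   // e.g. "eu18" — changes with every workspace deployment
  url: string;
  region: string; // e.g. "europe" — stable across deployments
}

// With region-level measurements, one RTT sample covers every cluster in
// that region, so frequent cluster redeployments don't invalidate it.
function rttForCluster(
  cluster: WorkspaceClusterRegistration,
  regionRtts: Record<string, number>, // region -> measured RTT in ms
): number | undefined {
  return regionRtts[cluster.region];
}
```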

@jankeromnes jankeromnes added aspect: performance anything related to performance type: feature request New feature or request labels Dec 8, 2021
@meysholdt
Member Author

Not sure why this got labeled "platform". The enhancements would mostly need to happen in components owned by the meta team.

@meysholdt meysholdt added team: webapp Issue belongs to the WebApp team and removed team: devx labels Dec 22, 2021
@csweichel csweichel moved this to In Progress in 🌌 Workspace Team Jan 6, 2022
@atduarte atduarte changed the title Automatically chose workspace-cluster based on lowest latency. Automatically choose workspace-cluster based on lowest latency. Jan 29, 2022
@kylos101 kylos101 moved this from In Progress to Scheduled in 🌌 Workspace Team Jan 31, 2022
@kylos101 kylos101 removed the status in 🌌 Workspace Team Feb 24, 2022
@stale

stale bot commented Apr 30, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the meta: stale This issue/PR is stale and will be closed soon label Apr 30, 2022
@bigint

bigint commented Apr 30, 2022

This is still not yet fixed 🤔

From India it always chooses US clusters instead of EU.

@stale stale bot removed the meta: stale This issue/PR is stale and will be closed soon label Apr 30, 2022
@stale

stale bot commented Jul 31, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the meta: stale This issue/PR is stale and will be closed soon label Jul 31, 2022
@stale stale bot closed this as completed Aug 13, 2022
@stale stale bot moved this to Done in 🍎 WebApp Team Aug 13, 2022
@chientrm

Nah. Just get the user's coordinates via their IP address and pick the nearest server. Every server should be located in a city. AFAIK Gitpod's running on GCP.
Moreover, many cloud providers like Cloudflare Pages/Workers already append the IP and lat/long to HTTP request headers 🤭.
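For illustration, Cloudflare Workers do expose the visitor's approximate location on request.cf (latitude/longitude, as strings). A nearest-server sketch under the assumption that each cluster has known coordinates; the cluster names and coordinates here are made up:

```ts
// Hypothetical cluster coordinates (assumed GCP region locations).
const CLUSTERS: Record<string, { lat: number; lon: number }> = {
  us07: { lat: 41.26, lon: -95.86 }, // assumed: us-central1 (Council Bluffs)
  eu45: { lat: 50.45, lon: 3.82 },   // assumed: europe-west1 (St. Ghislain)
};

// Great-circle distance (haversine), in kilometers.
function distanceKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const rad = (d: number) => (d * Math.PI) / 180;
  const dLat = rad(lat2 - lat1);
  const dLon = rad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(rad(lat1)) * Math.cos(rad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 6371 * 2 * Math.asin(Math.sqrt(a));
}

function nearestCluster(lat: number, lon: number): string {
  return Object.entries(CLUSTERS)
    .map(([name, c]) => [name, distanceKm(lat, lon, c.lat, c.lon)] as const)
    .sort(([, a], [, b]) => a - b)[0][0];
}
```

The caveat, and why the latency-measurement proposal above is arguably more robust: geographic distance is only a proxy for network latency, and IP geolocation can be wrong, e.g. behind a VPN, as in the report earlier in this thread.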

@meysholdt meysholdt reopened this Sep 21, 2022
Repository owner moved this from Done to In Progress in 🍎 WebApp Team Sep 21, 2022
@axonasif axonasif removed the status in 🍎 WebApp Team Sep 21, 2022
@stale stale bot closed this as completed Oct 19, 2022
@stale stale bot moved this to In Validation in 🍎 WebApp Team Oct 19, 2022
@kylos101 kylos101 reopened this Dec 8, 2022
Repository owner moved this from In Validation to Scheduled in 🍎 WebApp Team Dec 8, 2022
@kylos101
Contributor

kylos101 commented Dec 8, 2022

👋 @geropl reopening, perhaps something we can discuss to see if it can be included in an iteration early next year?
