Support HTTP long polling as an alternative to WebSockets #859

jonathanperret · 2024-02-19T15:32:36Z

Motivation

This PR is a proposed fix for #840 .

As mentioned in that issue, the motivation for supporting an alternative to WebSockets is that while all browsers supported by Grist offer native WebSocket support, some networking environments do not allow WebSocket traffic.

Implementation

Engine.IO was selected as the underlying implementation of HTTP long polling, after careful consideration as described here.

While Engine.IO offers both a WebSocket-based transport and HTTP long polling, and the ability to transparently upgrade from long polling to WebSockets, I chose to only use Engine.IO for its polling implementation, to avoid imposing the overhead of the Engine.IO protocol (and server-side memory usage) on the vast majority of clients, which can use plain WebSockets.

The Grist client will first attempt a regular WebSocket connection, using the same protocol and endpoints as before, but fall back to long polling using Engine.IO if the WebSocket connection fails.

To get all of the test cases to pass (particularly the Comm tests) I had to add some form of emulation of the behavior of the ws library to the Engine.IO-based implementation. In particular, while the Engine.IO socket's send method accepts a callback, it is only called upon success and not if the transfer fails. Because a failed transfer immediately causes the socket to be closed, I keep track of messages that have not had their callback called yet and invoke these upon close if any are still pending.

paulfitz · 2024-02-20T21:57:00Z

Should the raw WebSocket backend remain available? In theory, Engine.IO will upgrade to the more efficient transport (i.e. WebSockets over long polling) and therefore it would seem unnecessary to maintain a separate implementation. However, this represents a significant change in a critical component of Grist and a conservative approach might be called for. Perhaps a typical sequence of making the new backend available, then making it the default, and finally deprecating the legacy backend would make sense?

There's a case for doing that, since CI style testing doesn't really capture the diversity of browsers and company firewalls and network latency out there. But I'd guess this would involve significant more work? An alternative would be, once we're happy with the PR, to tag it for release to a fly.io preview system we have before it lands. Then we could have volunteers from all over the world (recruited from our discord) hammer on it a bit. That could still miss a problem that only shows up after e.g. multiple days work with intermittent laptop suspends or whatever, but it would increase confidence that whatever problem remains it'll be something we can handle in follow-up.

jonathanperret · 2024-02-22T19:07:34Z

There's a case for doing that, since CI style testing doesn't really capture the diversity of browsers and company firewalls and network latency out there. But I'd guess this would involve significant more work? An alternative would be, once we're happy with the PR, to tag it for release to a fly.io preview system we have before it lands. Then we could have volunteers from all over the world (recruited from our discord) hammer on it a bit. That could still miss a problem that only shows up after e.g. multiple days work with intermittent laptop suspends or whatever, but it would increase confidence that whatever problem remains it'll be something we can handle in follow-up.

To make sure we're on the same page: if when saying "this would involve significant more work" you're referring to the complexity of having both implementations coexist in the code, this is something I have already handled. I briefly tried a wholesale substitution at the start but realized I was not going to be able to ensure the absence of regressions if I was not able to keep running the previous (raw WebSockets) implementation side-by-side.

Basically, with the code in this PR, setting (at run time — it has no effect on the build) GRIST_USE_ENGINE_IO to a falsy value (or not setting it at all) causes the server and client to use the legacy WebSocket-based implementation — albeit with a slightly more convoluted code path due to the GristSocketServer/GristClientSocket abstractions I introduced. I am highly confident that this mode of execution provides 100% compatibility with the existing code.

However you may have been referring to extra work on the release management and support side of things — in which case I agree that there would be an added burden in supporting these two configurations.

In any case, having this new implementation put through its paces by volunteers sound like a great idea. I have just managed to get all the tests to pass, so I'll mark this PR as ready for review and hopefully it can be deployed on your testing environment.

jonathanperret · 2024-02-22T19:34:35Z

I have updated the PR description to reflect the current implementation status.

I have also pushed a commit that adds GRIST_USE_ENGINE_IO=1 to the environment in fly-template.toml, hopefully this is how an environment variable is added to a deployment?

jonathanperret · 2024-02-22T19:37:09Z

The Fly deployment failure seems caused by the fact that PRs based on branches from forks do not have access to repository secrets, for obvious security reasons. You may have to clone this branch to one local to this repository?

paulfitz · 2024-02-22T20:05:58Z

The Fly deployment failure seems caused by the fact that PRs based on branches from forks do not have access to repository secrets, for obvious security reasons. You may have to clone this branch to one local to this repository?

Yes, you're right, done. And thanks for adding the environment variable, I was just kicking the preview machinery to make sure it still works. For anyone following along, a preview is over in https://grist-http-long-polling.fly.dev/

For the work: thanks for reminding me you had this feature under a flag already. If code review and preview all seem fine I think I'll be voting for making the new behavior the default, though it will be good that any installation that sees trouble has an easy way back to the old behavior.

Will get someone on reviewing this as soon as possible. Thanks a lot for your work! Will be exciting to have this extra level of robustness.

paulfitz · 2024-02-22T20:14:39Z

(just did a quick test in firefox with network.websocket.max-connections set to 0 and working perfectly so far with no websockets, very nice)

jonathanperret · 2024-02-23T18:01:34Z

Testing with a more complex deployment including multiple doc workers and a load balancer revealed that the addition of the /engine.io/ prefix to socket requests meant that some load balancer configurations may need to be updated.

I'll be trying to get rid of that prefix so that Grist admins don't need to rework their setup.

jonathanperret · 2024-02-29T17:51:24Z

Hi! I added a few commits that :

remove the /engine.io/ prefix previously added to WebSocket and polling request paths; this should make the new protocol compatible with practically any existing deployment as the URLs are identical. This fixed the deployment on our local staging instances which use path-based routing;
fix reconnection behavior when GRIST_USE_ENGINE_IO=1 (the client wouldn't re-attempt a connection after an initial failure);
implement a proper terminate() method for Engine.IO-based sockets, for quicker connection interruption when necessary;
add the necessary CORS headers for polling to work even in a deployment where workers are on a different domain from the home server.
@paulfitz if you update the local branch we can see if this still works on the Fly deployment.
@dsagal I'm looking forward to your feedback when you find the time to look at this.

dsagal

This is great work, thank you @jonathanperret!

app/client/components/GristClientSocket.ts

app/common/gristUrls.ts

app/server/lib/GristServerSocket.ts

dsagal · 2024-02-29T13:35:48Z

app/server/lib/GristServerSocket.ts

+      log.debug("got socket close reason=%s description=%s messages=%s",
+        reason, description?.message ?? description, [...this._messageCallbacks.keys()]);
+
+      const err = description ?? new Error(reason);


Is description an Error instance, when present?

As far as I can see when it is constructed and forwarded, the description is indeed an Error instance. It is not documented as such however, unfortunately.

I'm not sure what to do here to make this cleaner. Maybe just ignore the description?

That's fine, maybe just add a comment that in practice description is an instance of Error, or just change its type to Error if Typescript doesn't mind.

I went for a comment and hopefully robust way of always ending up with an Error object. I also changed GristServerSocket.onerror to take an (err: Error) => void callback.

Suggested change

const err = description ?? new Error(reason);

// In practice, when available, description has more error details,

// possibly in the form of an Error object.

const maybeErr = description ?? reason;

const err = maybeErr instanceof Error ? maybeErr : new Error(maybeErr);

app/server/lib/GristServerSocket.ts

dsagal · 2024-02-29T14:00:26Z

app/server/lib/GristSocketServer.ts

+  constructor(server: http.Server) {
+    super();
+    this._server = new EIO.Server({
+      allowUpgrades: false,


Is this false because we try websocket before long-polling, and this option is for upgrading from long-polling to websockets?

Yes, exactly.

I'd just suggest adding a comment for anyone else wondering the same question.

app/server/lib/GristSocketServer.ts

dsagal · 2024-02-29T19:31:54Z

app/server/lib/GristSocketServer.ts

+        // this CORS behavior will only apply to polling requests, which end up
+        // subjected downstream to the same Origin checks as those necessarily
+        // applied to WebSocket requests, it is safe to let any client attempt
+        // a connection here.


Eek, it looks like your security assumption isn't actually the reality today. It doesn't look like we are checking the Origin header for websocket requests. I had to test whether there is a security vulnerability because of that. It looks like there is not, currently, at least on modern browsers, because we set SameSite=Lax attribute on session cookies, so they are not actually sent with CORS requests. With websockets, I don't think there is a way to include them. So a CORS ws connection may be made, and used to access public documents, but it would not include credentials to access private ones.

Your assumption (that an Origin check would necessarily get applied to all WebSocket requests) makes sense. I think it's important even if SameSite=Lax provides protection, if only for consistency with other request handling where we have decided it's important (and also because my quick check does not rule out all possible shenanigans). Incidentally, I found this commit as our latest change to origin checking.

Is that something you would consider adding as part of this diff?

Ooh, good thing you looked into this then. Indeed, after a bit of testing on my side I can confirm it looks like SameSite is what is really keeping cross-origin WebSockets from being a vulnerability (WebSocket requests do include what cookies they can by default, without having to add e.g. credentials: "include" as is required with fetch).

It looks like the underlying assumption that "Opening up CORS for regular requests that are being handled by the same code that treats WebSocket requests (however insecurely) does not make things worse" still holds though.

But I agree that checking Origin would make for a more secure implementation all around. I would not mind looking into including the extra check, however I would require a few pointers to find the proper methods to call for verifying the origin.

The commit you referenced modifies the FlexServer.trustOriginHandler method but I cannot call this when validating a WebSocket request, since it is not in the (req, res, next) context of an Express middleware. The underlying requestUtils.trustOrigin is still bound to a Request/Response interface and can add response headers which again do not make sense in a WebSocket response (by the way, the fact that on this line req.hostname is checked rather than origin is puzzling).

It looks like the core of what trustOrigin does is in this line:

if (!allowHost(req, new URL(origin)) && !isEnvironmentAllowedHost(new URL(origin)))

I could add something like this early in Comm._onWebSocketConnection (where I found out WebSocket requests whose Host header does not match an allowed value are rejected, as a result of calling hosts.addOrgInfo).

But looking at allowHost, I see a cast of req to RequestWithOrg (when the raw WebSocket request clearly has no organization info, at least when it first arrives) and this gets into a part of the code I have no familiarity with yet. I wouldn't want to break a legitimate scenario, so I'm afraid some hand-holding will be necessary here.

If you can show me how and where you would add this Origin check to WebSocket requests in the current codebase, I think I would easily port this to my rearranged version of the code in this PR. Come to think of it, maybe it would be more prudent to add this check in a separate (prerequisite to this one) PR?

...require a few pointers to find the proper methods to call for verifying the origin.

What you suggested is exactly what I had in mind. addOrgInfo is what turns a req into RequestWithOrg, so if you add the allowHost check afterwards, it should almost make sense. "Almost" because you would need to adjust some methods to accept either express.Request or the underlying http.IncomingMessage. I think that's always possible by getting headers using req.headers[lowercase_name].

If allowHost returns false, we could then either fail when credentials are present (since we don't expect them to be), like trustOriginHandler does, or perhaps just strip the credentials and keep going.

the fact that on this line req.hostname is checked rather than origin is puzzling

Doh, it's puzzling to me too. It looks like a mistake. It seems it was added for tests, and in practice normally has no effect (because GRIST_HOST is either unset or set to something like 0.0.0.0), but if it did have an effect, it also looks like the wrong one to me. I'd try removing it and seeing if any tests fail. If you can try that, that would be great; if not, let me know, I'll make a note so it doesn't fall through the cracks. Thank you for noticing!

the fact that on this line req.hostname is checked rather than origin is puzzling

Doh, it's puzzling to me too. It looks like a mistake. It seems it was added for tests, and in practice normally has no effect (because GRIST_HOST is either unset or set to something like 0.0.0.0), but if it did have an effect, it also looks like the wrong one to me. I'd try removing it and seeing if any tests fail. If you can try that, that would be great; if not, let me know, I'll make a note so it doesn't fall through the cracks. Thank you for noticing!

I can confirm that removing this line has no effect on tests. I ran the whole test suite on GitHub Actions, even though it seems only the DocApi test suite would catch a regression in this code based on the fact that it seems to be the only suite where the Origin header is varied.

So I went ahead and removed that line. Now looking into getting websocket request origins checked as you described. This will involve adding some Comm tests I believe.

dsagal · 2024-03-01T18:19:39Z

app/server/lib/GristServerSocket.ts

+      log.debug("got socket close reason=%s description=%s messages=%s",
+        reason, description?.message ?? description, [...this._messageCallbacks.keys()]);
+
+      const err = description ?? new Error(reason);


That's fine, maybe just add a comment that in practice description is an instance of Error, or just change its type to Error if Typescript doesn't mind.

dsagal · 2024-03-04T17:30:05Z

app/server/lib/GristSocketServer.ts

+  constructor(server: http.Server) {
+    super();
+    this._server = new EIO.Server({
+      allowUpgrades: false,


I'd just suggest adding a comment for anyone else wondering the same question.

dsagal · 2024-03-04T17:43:22Z

app/client/components/GristClientSocket.ts

+  public set onmessage(cb: null | ((data: string) => void)) {
+    this._ws.onmessage = cb ?
+      (event: WS.MessageEvent | MessageEvent<any>) => {
+        cb(event.data instanceof String ? event.data : event.data.toString());


Would this work on the client-side? It looks like it might string in practice, but if it's not, then according to documentation, it would be ArrayBuffer or Blob, and neither would do anything useful with toString(), I don't think.

On the client side, I think it is supposed to always be a string since we are only sending text frames, see https://developer.mozilla.org/en-US/docs/Web/API/WebSocket/message_event#event_properties

If the message type is "text", then this field is a string.

On the server side (using ws), the call to .toString() here is meant to only make explicit what JSON.parse in GristWSConnection was doing implicitly.

But in fact, looking at what ws does to emulate the HTML5 WebSocket API, it seems the toString() is not even necessary since it will have been node by ws to convert its NodeJS Buffer object (which has a useful toString, as contrasted from ArrayBuffer or Blob) to a string to match standard WebSocket behavior.

So I've gone ahead and removed the toString() and added an explanatory comment.

Suggested change

cb(event.data instanceof String ? event.data : event.data.toString());

// event.data is guaranteed to be a string here because we only send text frames.

// https://developer.mozilla.org/en-US/docs/Web/API/WebSocket/message_event#event_properties

cb(event.data);

dsagal · 2024-03-04T19:16:58Z

app/server/lib/GristServerSocket.ts

-          cb(err);
-        }
+      for (const cb of this._messageCallbacks.values()) {
+        cb?.(err);


?. part shouldn't be needed here.

You're right, I removed it.

dsagal · 2024-03-04T19:19:20Z

app/server/lib/GristServerSocket.ts

-      log.debug("calling cb for msg %d", msgNum);
-      cb();
+      if (this._messageCallbacks.delete(msgNum)) {
+        cb?.();


Can cb be undefined? That's fine, but it would be good if its type reflected that.

Good point. I changed send so that the callback is optional.

public send(data: string, cb?: (err?: Error) => void) { const msgNum = this._messageCounter++; if (cb) { this._messageCallbacks.set(msgNum, cb); } this._socket.send(data, {}, () => { if (cb && this._messageCallbacks.delete(msgNum)) { cb(); } }); }

app/server/lib/GristSocketServer.ts

dsagal · 2024-03-04T20:09:59Z

app/server/lib/GristSocketServer.ts

+        // this CORS behavior will only apply to polling requests, which end up
+        // subjected downstream to the same Origin checks as those necessarily
+        // applied to WebSocket requests, it is safe to let any client attempt
+        // a connection here.


...require a few pointers to find the proper methods to call for verifying the origin.

What you suggested is exactly what I had in mind. addOrgInfo is what turns a req into RequestWithOrg, so if you add the allowHost check afterwards, it should almost make sense. "Almost" because you would need to adjust some methods to accept either express.Request or the underlying http.IncomingMessage. I think that's always possible by getting headers using req.headers[lowercase_name].

If allowHost returns false, we could then either fail when credentials are present (since we don't expect them to be), like trustOriginHandler does, or perhaps just strip the credentials and keep going.

the fact that on this line req.hostname is checked rather than origin is puzzling

Doh, it's puzzling to me too. It looks like a mistake. It seems it was added for tests, and in practice normally has no effect (because GRIST_HOST is either unset or set to something like 0.0.0.0), but if it did have an effect, it also looks like the wrong one to me. I'd try removing it and seeing if any tests fail. If you can try that, that would be great; if not, let me know, I'll make a note so it doesn't fall through the cracks. Thank you for noticing!

dsagal

Thank you! All the changes look good so far, including the removal of the GRIST_HOST check (which I am not seeing in the PR yet but I saw your note that you are removing it).

Thank you also for working on the origin checks!

The reference to "window" would unexpectedly break Node-based tests where a client stays connected long enough to send a heartbeat. GristWSSettings.getPageUrl seems explicitly made for this.

This could avoid false results from the tests.

…ustOrigin

It seems this was added for test purposes but the current tests all pass without this, and it looks a bit safer to remove it.

These classes, used as an alternative to native WebSockets, provide an automatic fallback to HTTP long polling (implemented using Engine.IO) when a WebSocket connection fails.

jonathanperret · 2024-03-21T19:09:34Z

Hi @dsagal , I finally found the time to work on this again, putting in what will hopefully be the last big change.

For origin verification, I added a verifyClient callback to GristSocketServer which lets the caller filter both WebSocket and polling connections. When instantiating GristSocketServer, Comm passes a verifyClient callback that calls trustOrigin — which I modified as discussed to use IncomingMessage instead of Request. Hopefully this is not too far from what you had in mind.

More significantly, I have changed my approach to integrating Engine.IO. It is no longer introduced as a unifying abstraction over WebSockets and long polling, as I initially hoped to do. Instead, the existing WebSocket-based protocol remains the default communication method, with an automatic fallback to the long polling implementation provided by Engine.IO. Among other things, this solves a problem revealed by the ManyFetches test suite, which is that the memory usage of the server-side Engine.IO sockets was much higher than the basic ws sockets — I could not determine exactly why but this did not inspire much confidence.

The most commonly taken path will therefore remain the native WebSocket connection without the overhead of the Engine.IO protocol, which makes regressions much less likely.

This in turn led me to remove the GRIST_USE_ENGINE_IO environment variable, which had been an annoyance for which I couldn't formulate a clear exit plan. There is now only one implementation of GristSocketServer and GristClientSocket, which includes the aforementioned fallback behavior.

I have squashed most of the commits into one that introduces the GristSocketServer and GristClientSocket, keeping apart the handful of commits that were not directly related to the main work.

Apologies for the review churn, and as always I'm looking forward to your feedback.

dsagal · 2024-03-25T01:35:06Z

app/client/components/GristClientSocket.ts

+}
+
+export class GristClientSocket {
+  private _wsSocket: WS.WebSocket | WebSocket | undefined;


It would be helpful to have a comment that exactly one of _wsSocket, _eioSocket is set.

dsagal · 2024-03-25T02:23:46Z

app/client/components/GristClientSocket.ts

+  }
+
+  private _onEIOOpen() {
+    this._openHandler?.();


Doesn't this need to set _openDone? And doesn't _onEIOMessage need to check that? (If not, a comment to explain would be good, since that's a sign of a difference with WS implementation)

As mentioned below, I reworked the logic and there's only one state check left, in _onWSError since that's the only point where we need to decide whether to emit an error or downgrade the socket.
_onWSMessage doesn't need to check the state (now called _wsConnected) because it can only occur after a successful call to _onWSOpen. None of the _onEIO handlers need to check the state either, since the very fact they are called implies the socket was downgraded to Engine.IO.

dsagal · 2024-03-25T02:27:49Z

app/client/components/GristClientSocket.ts

+    this._openHandler?.();
+  }
+
+  private _onWSError(ev: Event) {


Are you sure it's impossible to receive the error event for a socket that's already open? If possible, should we forward it to _errorHandler rather than reconnect using polling at this point?

Good catch. I ended up simplifying the logic, replacing _openDone with a _wsConnected bool that only tells us whether the next WebSocket error event means we should retry with Engine.IO or emit the error.

dsagal · 2024-03-25T02:41:12Z

app/server/lib/requestUtils.ts

  return true;
 }

 // Returns whether req satisfies the given allowedHost. Unless req is to a custom domain, it is
 // enough if only the base domains match. Differing ports are allowed, which helps in dev/testing.
-export function allowHost(req: Request, allowedHost: string|URL) {
+export function allowHost(req: IncomingMessage, allowedHost: string|URL) {
  const mreq = req as RequestWithOrg;


I suggest avoiding creating mreq variable, since it's a risky type conversion (e.g. mreq.get() is presumably not always available but would be considered correct type), and instead doing this cast inline in the one place where it's used, i.e. (req as RequestWithOrg).isCustomHost.

Good idea about scoping the cast more tightly.
Actually, looking into it I realized the check as I had written it in Comm would not work for custom hosts, because allowHost was called before the request object had been enriched with organization info. I reworked the check to have the call to addOrgInfo done before trustOrigin.

dsagal · 2024-03-25T02:57:20Z

app/server/lib/requestUtils.ts

  if (process.env.APP_HOME_URL) {
    return new URL(process.env.APP_HOME_URL).protocol.replace(':', '');
  }
-  return req.get("X-Forwarded-Proto") || req.protocol;
+  return req.headers["x-forwarded-proto"] || ((req.socket as TLSSocket).encrypted ? 'https' : 'http');


I took a look at the implementation of protocol for express.Request. There is a difference around trust-proxy settings which protocol checks. But it's moot because we seem to have been blindly trusting x-forwarded-proto header anyway. That doesn't seem quite right (maybe it's safe in this case, but it's certanly not obvious to me). It's fine not to address here, but if you don't mind, I'd appreciate a TODO to look into when x-forwarded-proto should be trusted.

TODO Comment added.

dsagal · 2024-03-25T03:37:45Z

app/server/lib/GristServerSocket.ts

+    if (cb) {
+      this._messageCallbacks.set(msgNum, cb);
+    }
+    this._socket.send(data, {}, () => {


Can the callback ever have an argument here (e.g. if err can be received, should it be passed on)?

Engine.IO only calls the send() callback on success, without an argument. So if we get here the err callback argument is correctly undefined. I'm adding a comment to clarify.

dsagal · 2024-03-25T03:39:04Z

app/server/lib/GristServerSocket.ts

+
+  public set onmessage(handler: (data: string) => void) {
+    this._ws.on('message', (msg: Buffer) => handler(msg.toString()));
+    this._eventHandlers.push({ event: 'message', handler });


This needs to remember the wrapper function to match on removal (handler is not exactly what got added with _ws.on('message'))

Oops, sorry about missing that again, fixed.

dsagal · 2024-03-25T03:42:48Z

app/server/lib/Client.ts

@@ -189,7 +184,7 @@ export class Client {

  public interruptConnection() {
    if (this._websocket) {
-      this._removeWebsocketListeners();
+      this._websocket?.removeAllListeners();


Question mark unneeded here (given the conditional)

Right, removed.

paulfitz · 2024-03-28T14:20:12Z

I just updated the preview at https://grist-http-long-polling.fly.dev/ and edited a doc using firefox with network.websocket.max-connections=0 on about:config. Worked fine! Checked fly.io backend logs and confirmed that I see transport=polling in logs. When I enable websockets again and reload, I don't see transport=polling.

dsagal

Looks good. Thank you @jonathanperret !!

paulfitz · 2024-03-28T17:49:49Z

Thanks @jonathanperret and @dsagal ! This is really great to have.

One thing that came to my mind recently as possible follow-up. @jonathanperret is it possible to determine whether long-polling fallback is happening consistently, and log a warning if so? Not setting up websocket forwarding correctly is a fairly common configuration problem (one example of many: #261). Until now people have been forced to notice and fix this problem since they can't use documents at all until they get it right. But now, they might never even notice. Which isn't terrible, but seems a bit sub-optimal?

jonathanperret · 2024-03-28T20:05:43Z

Thank you guys for helping bring this to fruition. Can't wait to deploy it and unlock the power of Grist for our restricted-network users.

@paulfitz I see what you mean. I'm not optimistic however that a Grist administrator would notice such a warning, particularly since it wouldn't occur right on server startup.

Maybe calling out WebSockets as worthy of attention in the self-managing docs would help more? It appears there are zero hits for "WebSocket" currently on https://support.getgrist.com.

paulfitz · 2024-03-28T20:29:40Z

@paulfitz I see what you mean. I'm not optimistic however that a Grist administrator would notice such a warning, particularly since it wouldn't occur right on server startup.

I agree, that is true. There is an admin panel coming soon where installation problems can be highlighted. If the warning existed, then the developer working on that panel would just need to find it in the code and hook up a way to report it better.

Maybe calling out WebSockets as worthy of attention in the self-managing docs would help more? It appears there are zero hits for "WebSocket" currently on https://support.getgrist.com.

Good point! We have a systems engineer joining the team next week whose mandate will include smoothing out the self-hosting experience. I've added that to the list of easy things to start with :-)

jordigh · 2024-04-05T19:09:30Z

app/server/lib/requestUtils.ts

  if (!origin) { return true; } // Not a CORS request.
-  if (process.env.GRIST_HOST && req.hostname === process.env.GRIST_HOST) { return true; }


I was looking for an explanation for why this line was removed, as I believe it's responsible for certain requests failing under Docker and possibly under other circumstances.

From earlier in the review:

Jonathan:

the fact that on this line req.hostname is checked rather than origin is puzzling

Me:

Doh, it's puzzling to me too. It looks like a mistake. It seems it was added for tests, and in practice normally has no effect (because GRIST_HOST is either unset or set to something like 0.0.0.0), but if it did have an effect, it also looks like the wrong one to me.

When GRIST_HOST is set to 0.0.0.0, which seems to be the case when running all of Grist within the same Docker instance from the gristlabs/grist image, certain internal requests from the doc workers to the home servers seems to fail. This seems to be the underlying cause for why certain operations like duplicating documents fail when running the tests under Docker.

But surely running the Docker tests isn't the only reason to have a setup in which we might need such internal requests?

There's a somewhat related discussion in #915

@jordigh would you have a repro scenario for requests failing since this change, under Docker or otherwise?

It won't fail anymore since 400937f, but if you want to see the failure, try running MOCHA_WEBDRIVER_HEADLESS=1 yarn run test:docker -g DuplicateDocument at its parent, (17ea97d).

jonathanperret mentioned this pull request Feb 19, 2024

Non-WebSocket transport alternative? #840

Closed

jonathanperret force-pushed the http-long-polling branch 2 times, most recently from afb841e to d38b44b Compare February 19, 2024 17:52

jonathanperret force-pushed the http-long-polling branch from 1081d2f to 5990029 Compare February 22, 2024 18:50

jonathanperret marked this pull request as ready for review February 22, 2024 19:07

paulfitz added the preview Launch preview deployment of this PR label Feb 22, 2024

paulfitz mentioned this pull request Feb 22, 2024

http long polling branch for preview #865

Closed

paulfitz removed the preview Launch preview deployment of this PR label Feb 22, 2024

jonathanperret force-pushed the http-long-polling branch from 3bc132e to 3b82ad6 Compare February 23, 2024 14:23

paulfitz requested a review from dsagal February 28, 2024 13:50

dsagal reviewed Feb 29, 2024

View reviewed changes

dsagal reviewed Mar 4, 2024

View reviewed changes

dsagal reviewed Mar 7, 2024

View reviewed changes

jonathanperret force-pushed the http-long-polling branch from 3b2a88c to 0139004 Compare March 11, 2024 16:23

jonathanperret mentioned this pull request Mar 14, 2024

Remove the GRIST_ALLOWED_HOSTS environment variable #899

Merged

jonathanperret added 6 commits March 21, 2024 15:43

GristWSConnection: use settings instead of "window" object

72c619f

The reference to "window" would unexpectedly break Node-based tests where a client stays connected long enough to send a heartbeat. GristWSSettings.getPageUrl seems explicitly made for this.

Upgrade @types/ws to be in line with ws

ceb5c78

server/Comm test: make waitForCondition throw if failed

0e5ea20

This could avoid false results from the tests.

requestUtils: work with IncomingMessage, make Response optional in tr…

e7e060c

…ustOrigin

Remove hostname check in trustOrigin

774b9a4

It seems this was added for test purposes but the current tests all pass without this, and it looks a bit safer to remove it.

Minor improvements to DocApi tests

a354b9b

jonathanperret force-pushed the http-long-polling branch from 4c47e47 to 63de9af Compare March 21, 2024 17:51

Introduce GristSocketServer, GristServerSocket and GristClientSocket

ea08b49

These classes, used as an alternative to native WebSockets, provide an automatic fallback to HTTP long polling (implemented using Engine.IO) when a WebSocket connection fails.

jonathanperret force-pushed the http-long-polling branch from 63de9af to ea08b49 Compare March 21, 2024 18:30

dsagal reviewed Mar 25, 2024

View reviewed changes

jonathanperret added 5 commits March 26, 2024 10:51

GristServerSocket: add comment for success callback case

1961b2c

Expand GristSockets tests, simplify GristClientSocket switching

adcdd5f

Comm: fix customHost origin handling

80a1b3a

GristServerSocket: fix message handler unregistration

07818ff

Client: remove extra ?

8af0eaf

dsagal approved these changes Mar 28, 2024

View reviewed changes

dsagal merged commit 96b652f into gristlabs:main Mar 28, 2024
25 of 26 checks passed

jonathanperret deleted the http-long-polling branch March 28, 2024 17:31

jonathanperret mentioned this pull request Mar 29, 2024

Introduce APP_HOME_INTERNAL_URL and fix duplicate docs #915

Merged

4 tasks

jordigh reviewed Apr 5, 2024

View reviewed changes

-      const err = description ?? new Error(reason);
+      // In practice, when available, description has more error details,
+      // possibly in the form of an Error object.
+      const maybeErr = description ?? reason;
+      const err = maybeErr instanceof Error ? maybeErr : new Error(maybeErr);

-        cb(event.data instanceof String ? event.data : event.data.toString());
+        // event.data is guaranteed to be a string here because we only send text frames.
+        // https://developer.mozilla.org/en-US/docs/Web/API/WebSocket/message_event#event_properties
+        cb(event.data);

		if (!origin) { return true; } // Not a CORS request.
		if (process.env.GRIST_HOST && req.hostname === process.env.GRIST_HOST) { return true; }

Support HTTP long polling as an alternative to WebSockets #859

Support HTTP long polling as an alternative to WebSockets #859

Conversation

jonathanperret commented Feb 19, 2024 • edited Loading

Motivation

Implementation

paulfitz commented Feb 20, 2024

jonathanperret commented Feb 22, 2024

jonathanperret commented Feb 22, 2024

jonathanperret commented Feb 22, 2024

paulfitz commented Feb 22, 2024

paulfitz commented Feb 22, 2024

jonathanperret commented Feb 23, 2024

jonathanperret commented Feb 29, 2024

dsagal left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dsagal left a comment

Choose a reason for hiding this comment

jonathanperret commented Mar 21, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

paulfitz commented Mar 28, 2024

dsagal left a comment

Choose a reason for hiding this comment

paulfitz commented Mar 28, 2024

jonathanperret commented Mar 28, 2024

paulfitz commented Mar 28, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jonathanperret commented Feb 19, 2024 •

edited

Loading