Race between untrusted operations and events websocket listener #12994

masnax · 2024-02-29T05:25:55Z

An issue causing intermittent test failures in the clustering image refresh tests was fixed by e7b3c92

The problem was a race where an operation would complete after setting up the events websocket listener, but before the listener is added to the event handler, and so we would block while waiting for the operation complete event. The fix works by reducing the time between the creation of the listener and adding it to the handler.

While this works, I think it is a bit obscure and overly couples the client with the order of operations on the server. It made more sense to me to just ensure we set up the client and all its listeners before we make any request to create an operation, which those listeners are meant to track. While testing this over here: #12885 I found a separate race that is connected to the one solved above, resulting in this error: https://github.com/canonical/lxd/actions/runs/7920353180/job/21623203927?pr=12885#step:13:15409

This happens if we run lxc image copy localhost:${fingerprint} lxd2: --mode=pushto copy an image across 2 remotes.

Internally, we make the following POST /1.0/images/{fingerprint}/export to localhost here. From this endpoint we make a CreateImage request to lxd2 without a client keypair. The request is untrusted since it's run from the server, but when we add lxd2 as a remote, it only trusts the client, not the server.

The race comes in here when CreateImage performs the request which spawns an operation on lxd2. Around this time, it also sets up an event listener which requires trust. If the operation completes before we set up the listener then everything works fine, but if the listener manages to start up first, then the request will fail with an unauthorized error.

I think a simple solution in this case would be to just ensure that we don't spawn an events listener if the client hasn't been assigned a keypair. We can infer from this that the request we are trying to make is going to be untrusted, so we shouldn't try to separately connect to a trusted endpoint.

The text was updated successfully, but these errors were encountered:

tomponline · 2024-02-29T16:18:38Z

I think a simple solution in this case would be to just ensure that we don't spawn an events listener if the client hasn't been assigned a keypair. We can infer from this that the request we are trying to make is going to be untrusted, so we shouldn't try to separately connect to a trusted endpoint.

This makes sense. Although I am wondering why the image copy succeeds if the two servers dont trust each other and there is no temporary key?

masnax · 2024-02-29T16:34:42Z

I think a simple solution in this case would be to just ensure that we don't spawn an events listener if the client hasn't been assigned a keypair. We can infer from this that the request we are trying to make is going to be untrusted, so we shouldn't try to separately connect to a trusted endpoint.

This makes sense. Although I am wondering why the image copy succeeds if the two servers dont trust each other and there is no temporary key?

There is a secret set in the request header or operation metadata which is manually verified in the corresponding images and operations endpoints, if that's what you mean by using a temporary key.

I looked into handling the secret in the events endpoints as well, but I couldn't figure out an elegant solution for verifying it.
Best I thought of is to intercept any events going through the websocket and only propagate the events which contain an operation that has the secret in its metadata. That seems overly complex to me.

tomponline · 2024-02-29T16:44:54Z

@masnax ok in that case im not following what you mean by:

I think a simple solution in this case would be to just ensure that we don't spawn an events listener if the client hasn't been assigned a keypair

masnax · 2024-02-29T17:20:40Z

@masnax ok in that case im not following what you mean by:

I think a simple solution in this case would be to just ensure that we don't spawn an events listener if the client hasn't been assigned a keypair

The client function which spawns the racy event listener is CreateImage. It sets up an http.Request with the secret in its header. Performing the request spawns an operation.

The request that spawns the operation bypasses the the normal auth process of comparing the request peer certs to the locally trusted certificates, and instead verifies the secret, which was sent to that server beforehand.

CreateImage returns an operation struct. When we call operation.Wait, we create an events websocket listener which will listen for any type of event (logging, operations, etc). This listener hits the /1.0/events endpoint which only supports the traditional auth method.

masnax · 2024-02-29T17:27:00Z

So what I'm suggesting is to not leave spawning the listener until after the operation is created and we call operation.Wait, but instead create it before we make the request that spawns the operation. We do this under the constraint that the underlying ProtocolLXD actually has a client keypair assigned to it, and otherwise set skipListener to true.

tomponline · 2024-03-01T08:21:18Z

@masnax ok lets have a look at a PR that does this when you get a chance. Thanks

tomponline · 2024-03-28T11:05:18Z

Filtering events by what the user can see, ala lxc/incus#292 would be helpful here.

@markylaing says

Ultimately, filtering events by what the user is able to view would solve the PR. Perhaps we can add an operation_secret query parameter and allow untrusted requests on the events endpoint.

tomponline added the Bug Confirmed to be a bug label Feb 29, 2024

tomponline added this to the lxd-6.1 milestone Feb 29, 2024

tomponline assigned masnax Mar 1, 2024

masnax mentioned this issue Mar 18, 2024

Avoid race between operation and events listener #12885

Closed

tomponline modified the milestones: lxd-6.1, lxd-6.2 May 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Race between untrusted operations and events websocket listener #12994

Race between untrusted operations and events websocket listener #12994

masnax commented Feb 29, 2024

tomponline commented Feb 29, 2024

masnax commented Feb 29, 2024

tomponline commented Feb 29, 2024

masnax commented Feb 29, 2024

masnax commented Feb 29, 2024

tomponline commented Mar 1, 2024

tomponline commented Mar 28, 2024

Race between untrusted operations and events websocket listener #12994

Race between untrusted operations and events websocket listener #12994

Comments

masnax commented Feb 29, 2024

tomponline commented Feb 29, 2024

masnax commented Feb 29, 2024

tomponline commented Feb 29, 2024

masnax commented Feb 29, 2024

masnax commented Feb 29, 2024

tomponline commented Mar 1, 2024

tomponline commented Mar 28, 2024