-
Notifications
You must be signed in to change notification settings - Fork 925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid race between operation and events listener #12885
Conversation
Hm, too optimistic I guess. Drafting for now. |
af07b25
to
34ca0ba
Compare
Okay, so the test failures were due to remotes that were unauthorized to interact with the events API. Previously they all set up their events listeners after the operation events had all been emitted already, so there was no error, but now that the events listener is set up beforehand in every case, they're hitting an In particular the I can make all of the client functions skip the event listener and that should solve the problem, but to be honest I'm not too familiar with the events API so I don't exactly know why it's used with these operations in the first place. If there's a good reason to keep them around, then we can try to do something fancy and detect in the client if the action involves a remote, and if so check if that remote is also trusted by us, and only then include the events listener. |
The Tom and I came up with the idea of adding websocket support to the operation wait endpoint to counteract this (https://warthogs.atlassian.net/browse/LXD-726?atlOrigin=eyJpIjoiMTM1ZjY4NTgyY2MxNDAzOGIwMjk2MTQzODYzZGZkODgiLCJwIjoiaiJ9). It will need more discussion before we implement anything though. |
It sounds to me like these connections should actually be trusted by the remote then. Maybe we need an access handler on |
So untrusted operation requests instead require a secret passed as a query parameter here: Lines 907 to 924 in 0487e51
Making So this would mean that the events API would only serve exactly one operation to an untrusted remote. |
34ca0ba
to
4afce0a
Compare
@masnax are you going to land the cherry-pick from Incus separately? |
@masnax shall we close this for now? |
12da673
to
7f17365
Compare
This field can be used to determine whether the client is capable of making potentially trusted connections to its target server. This will be useful when determining whether the client should automatically spawn an events websocket listener for operations and other events which require authentication with the `/1.0/events` endpoint. Signed-off-by: Max Asnaashari <max.asnaashari@canonical.com>
The /1.0/events API only allows authenticated connections. When making a request that spawns an operation, we try to automatically connect to the events API, but the client may not necessarily support authentication if it is using a temporary key to establish trust with a particular endpoint. This leads to an obscure Forbidden error even though the actual request we made was trusted. Instead, this ensures we don't try to incorrectly connect to the events API if the underlying ProtocolLXD does not support authentication. Signed-off-by: Max Asnaashari <max.asnaashari@canonical.com>
To ensure we have begun listening for events before the operation has completed, we should explicitly create the events listener for the operation before actually initiating the request that creates and starts the operation. Signed-off-by: Max Asnaashari <max.asnaashari@canonical.com>
…ration If we implicitly create the events listener after the operation has fired, then there's a chance the operation will have completed and emitted the corresponding event before we even begin listening. Waiting for an operation can thus potentially block indefinitely, if we first encounter the operation in a running state, but then it completes before we set up the listener. Instead, we shouldn't try to create an events listener automatically when setting up the wait process as we have no way of determining the current state of an operation yet. Signed-off-by: Max Asnaashari <max.asnaashari@canonical.com>
As the `setupListener` method will no longer be responsible for spawning a listener, we can simply check if the operation has a non-nil events listener to determine whether we should use /1.0/operations/wait or /1.0/events to track the operation. Signed-off-by: Max Asnaashari <max.asnaashari@canonical.com>
@tomponline As we had discussed a few weeks ago, I've also tested to make sure this doesn't interfere with the VM agent. It seems to work fine for launching / execing into VMs. |
Signed-off-by: Max Asnaashari <max.asnaashari@canonical.com>
In many cases, we call `AddHandler` directly after creating an operation, expecting it to support connection to an events websocket. This won't be the case if the client is unable to authenticate with that endpoint. Signed-off-by: Max Asnaashari <max.asnaashari@canonical.com>
ctxConnectedCancel: ctxConnectedCancel, | ||
eventConns: make(map[string]*websocket.Conn), | ||
eventListeners: make(map[string][]*EventListener), | ||
supportsAuthentication: t.TLSClientConfig != nil && len(t.TLSClientConfig.Certificates) > 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@markylaing @masnax will this work with OIDC authenticated clients?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment, I'm guessing no since they won't have TLS certs. @markylaing is there something we can check to see if the client intends to use OIDC auth?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two indicators from ConnectionArgs
: AuthType
and OIDCTokens
. I don't think this will work reliably though, AuthType
may or may not be set (we have a default preference for OIDC over TLS), and OIDCTokens
will not be set if the client has not already authenticated.
ProtocolLXD
has an oidcClient
field that will be non-nil if the client has an access token for that remote, so you could potentially use that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests are passing suggesting that the tests arent covering this scenario with the mini-oidc service, can they be updated to check for the regression now and then we can see the fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know much about the mini-oidc tests, but the related change from this PR just determines whether we use long polling or a websocket to track an operation. So we would need a test that explicitly tries to connect to the events websocket to track an operation, and fail if we hit the /1.0/operations/wait
API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@markylaing Is there a case where the client should be able to create and then track long-lived operations without having can_view_events
& can_view_projects
? Would it be valid and expected to work if a user tries to create/edit/manage instances/images/storage without also having can_view_events
and can_view_projects
?
My expectation was no, that you would need a set of entitlements inclusive of those two for those types of actions, and in that case checking for an intent to authenticate should be enough, we don't necessarily need to know if the connection will be authorized for /1.0/events
. We already don't for TLS auth, since we just check if certs exist, not if they're actually going to be accepted by the server. So we could let the endpoint just handle the error and correctly return 403 if a client is authorized to create or manage images/instances/storage, but not authorized to use the events API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, ideally we would like to decouple this and have the entitlements work in isolation. For example, a client with can_exec
on an instance shouldn't be required to have can_view_events
for the whole project in order to perform the exec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't been able to dive into decoupling these yet but my thinking is that adding websocket functionality to /1.0/operations/{uuid}/wait
OR adding a /1.0/operations/{uuid}/events
would be the way to go. Both can be untrusted and require the websocket secret to function. We would then stop using events in the client except for when specifically using lxc monitor
for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep in mind the client and lxc command still need to be able to work with older servers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, ideally we would like to decouple this and have the entitlements work in isolation. For example, a client with
can_exec
on an instance shouldn't be required to havecan_view_events
for the whole project in order to perform the exec.
I'm not sure I follow, that would still behave the same if we infer authentication:
- The client intends to use OIDC auth, but does not have
can_exec
, so the request fails with 403 anyway, so it wouldn't matter that we set up a bad websocket connection. - The client does not intend to use OIDC auth, but tries to exec anyway, so the request fails as it's untrusted. In this case we would have skipped the websocket too, but it's ineffectual.
- The client intends to use OIDC auth, so we set up the websocket listener. The client has all the necessary entitlements so the request succeeds.
Seems to me that's true especially if we want to have entitlements work in isolation. That would mean that whenever a client is authorized for a particular action that can involve a websocket, it must necessarily be authorized to start a websocket on that action by the fact that its authorized for the action itself. As you said, we don't need can_view_events
to view events in that case.
Can you give an example of a situation where we might incorrectly infer that we should start a websocket for a particular action, that action is in fact authorized, but the websocket connection is not?
@@ -207,8 +207,10 @@ func (r *ProtocolLXD) tryCreateContainer(req api.ContainersPost, urls []string) | |||
|
|||
rop.targetOp = op | |||
|
|||
for _, handler := range rop.handlers { | |||
_, _ = rop.targetOp.AddHandler(handler) | |||
if r.supportsAuthentication { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use SupportsAuthentication
everywhere in client
so that when we search for usage of this function we get a clear picture of the usage.
_, _ = rop.targetOp.AddHandler(handler) | ||
if r.supportsAuthentication { | ||
for _, handler := range rop.handlers { | ||
_, _ = rop.targetOp.AddHandler(handler) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be ignoring the error returned from AddHandler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen this sort of thing all over the place wherever we try to set up an events listener with getEvents
or AddHandler
and I didn't understand the reason.
@masnax should this be moved back into draft status? |
Moved back to draft status as not active. |
@masnax shall we close this until you are able to work on it again? To get the open PR list down. |
Closes #12994
If we implicitly create the events listener after the operation has fired, then there's a chance the operation will have completed and emitted the corresponding event before we even begin listening. In many cases, the opposite happens too where the operation finishes so quickly after we start creating the events listener that we receive it on the first HTTP request and then never actually use the events listener.
This leads to an obscure race condition where a client which is authenticated with a temporary key tries to open an events listener, which only supports TLS authentication. Most of the time, the operation returns promptly enough that we never actually end up connecting to the events API which would return an
unauthorized
error if we hit it.To avoid having to keep track of all of these pathways and avoid running into obscure races, we shouldn't try to create an events listener automatically when setting up the wait process as we have no way of determining the current state of an operation at this point.
If we want an events listener with our operation, we should make sure to set it up before making the operation request. To do this, we need to make sure the client can actually talk to the events API. We can determine this by checking whether or not the client has any TLS certificates. If it does, then we may be able to authenticate with the
/1.0/events
API, but if not then we definitely can't.Adds a
supportsAuthentication
field toprotocolLXD
which determines whether the client should also try to target the/1.0/events
API automatically to set up an event websocket, in the special cases (for image creation, for example) where we re-implement the logic fromqueryOperation
.Ensures we always try to create the
/1.0/events
listener before sending the request for the operation to be started.Removes the now redundant
skipListener
field, as the operation wait process will not spawn its own listener if the one from the client isnil
.Ensures all calls to
AddHandler
are on an operation that was created with a client that has TLS authentication support.