-
Notifications
You must be signed in to change notification settings - Fork 537
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Host pluggable token provider for r11s driver. #3954
Conversation
* @param tenantId - ID of tenant | ||
* @param documentId - ID of document | ||
*/ | ||
constructor( | ||
public readonly resolvedUrl: api.IResolvedUrl, | ||
private readonly localDeltaConnectionServer: ILocalDeltaConnectionServer, | ||
private readonly tokenProvider: socketStorage.TokenProvider, | ||
private readonly tokenProvider: socketStorage.ITokenProvider, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note - in case of SPO, there are two providers - one for storage, one for web-socket.
I do not know why it's the case though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, you are putting responsibility to fetch both tokens from same provider
from?: number, | ||
to?: number): Promise<api.ISequencedDocumentMessage[]> { | ||
const query = querystring.stringify({ from, to }); | ||
|
||
let headers: { Authorization: string } | null = null; | ||
|
||
const token = (tokenProvider as TokenProvider).token; | ||
const storageToken = await this.tokenProvider.fetchStorageToken(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to start wrapping these with telemetry (at some level) - it's the first question that people ask when looking at perf - what is time break down between token fetch and actual call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I have seen that in ODSP driver. Opened an issue to track that (#3959)
@@ -46,14 +46,15 @@ export class DocumentService implements api.IDocumentService { | |||
return new NullBlobStorageService(); | |||
} | |||
|
|||
const storageToken = await this.tokenProvider.fetchStorageToken(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the expectations here around failures / offline?
I.e. do we expect token provider to fail right away and rely on delta manager to do retries, or does token provider loops until it can get token? I do not remember now why, but in case of SPO/Bohemia, the choice was for token provider to hang in offline until it gets token.
Note that currently runtime only retries after container loaded. If any failure happens on snapshot loading, we immediately fail container load. This is mostly to allow host to have full control, as if we hang (while waiting for client to become online), then host has no object to check status - container was not yet returned to host in any form, so there is no way to cancel / stop any processes until it loads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My current thinking is, socket token failure should be handled by connection and disconnection flow. For storage calls, we should keep retrying. Opened an issue to track that (#3960)
Note that currently runtime only retries after container loaded. If any failure happens on snapshot loading, we immediately fail container load. This is mostly to allow host to have full control, as if we hang (while waiting for client to become online), then host has no object to check status - container was not yet returned to host in any form, so there is no way to cancel / stop any processes until it loads
Do you mean that the driver does not retry "getLatest" call failure and bubble up the error immediately to the host?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, driver always does one retry for 401/403 with new token (refresh = true), and otherwise has no other retry logic. Runtime will not retry getLatest (that is getSnapshotTree call in Container) - it will immediately fail.
I believe it would be great in the future (for this and other reasons) to return container object to caller right away, that way we can change these policies (if we want to) and client has something to close / kill or wait / report progress.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, same for storage calls - runtime handles retries for ops retrieval (see DeltaManager.getDeltas).
Currently we have no code to retry other operations (reading blobs & trees) - this has not been needed for SPO route as everything comes in one go (getLatest), but it's something we need now both for r11s and SPO routes.
Opened #3963 to track that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct for getLatest. I do not think retry logic should be in the driver - it's much better to have it in loader. Container can be closed at any moment and it would be great to stop all network activity. On top of that all the policies around how often to retry are right now in delta manager. I've opened an issue for that
let headers: { Authorization: string } | null = null; | ||
headers = { | ||
Authorization: `Basic ${fromUtf8ToBase64(`${this.tenantId}:${this.tokenProvider.token}`)}`, | ||
Authorization: `Basic ${fromUtf8ToBase64(`${this.tenantId}:${ordererToken.jwt}`)}`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consistency missing: in some other place you made it conditional to ordererToken !== undefined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I think that was for back-compat. We should remove the undefined check and enforce token for all client calls.
@@ -103,7 +105,8 @@ export class RouterliciousDocumentServiceFactory implements IDocumentServiceFact | |||
throw new Error(`Token was not provided.`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code above no longer looks right - I'd remove requirement to pass token and always rely (and require) on token provider. Otherwise you are asking for both things - token & token provider. That's too much, but also not clear what token is it - storage or socket? That also results in telemetry (that you will need to add around all token fetches) to be spread out instead of being in one place, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I am providing either options for now but forgot to remove the error throwing part.
Currently, Either you provide a token as part of resolved url or you provide a token provider. For the former, I am creating a default provider based on your provided token.
So I added two different token types in case the host wants that. But currently for r11s/FRS, there is just one token. The right answer will become clear as we move forward. But wanted to keep the interface generic for now.
* Fetches the storage token from host | ||
* @param refresh - Optional flag indicating whether token fetch must bypass local cache | ||
* @returns TokenResponse object representing token value along with flag indicating | ||
* whether token came from cache. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we (in principle) support non-authenticated users (maybe not in FRS, but in general)?
You have some code that handles false returns from here, this is not in signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be handled through tenantManager. tinylicious already does that today. So for something like tinylicious, we can return an empty jwt and service will still work.
|
||
if (token) { | ||
if (storageToken) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You say in another comment "We should remove the undefined check and enforce token for all client calls."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct! I was mostly incorporating changes in this PR. Will do the strict token check in a separate PR.
The PR enables r11s host to inject a token provider to the driver. The driver provides an ITokenProvider interface. Hosts are responsible for implementing the interface. In case the host is unable to provide one, driver will fall back to a default implementation that uses the initially signed token.
From a service standpoint, r11s service can now expire tokens based on "exp" property of jwt. Clients will go through a regular disconnect -> connect flow to fetch a new token provided by the host.