
Host pluggable token provider for r11s driver. #3954

Merged: 3 commits into microsoft:main on Oct 16, 2020

Conversation

tanviraumi
Contributor

This PR enables the r11s host to inject a token provider into the driver. The driver defines an ITokenProvider interface, and hosts are responsible for implementing it. If the host does not provide an implementation, the driver falls back to a default one that uses the initially signed token.

From a service standpoint, the r11s service can now expire tokens based on the "exp" claim of the JWT. Clients then go through the regular disconnect -> connect flow to fetch a new token provided by the host.
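For orientation, here is a minimal sketch of what a host-supplied provider could look like, assuming the interface exposes fetchStorageToken/fetchOrdererToken methods that resolve to an object with a jwt (and optional fromCache) field, as the diffs and doc comments below suggest. The class name and the /api/fluid-token endpoint are hypothetical:

```typescript
// Hypothetical host-side provider: fetches fresh tokens from the host's own
// auth endpoint and caches them until a refresh is requested.
class HostTokenProvider /* implements socketStorage.ITokenProvider */ {
    private cachedJwt: string | undefined;

    constructor(
        private readonly tenantId: string,
        private readonly documentId: string,
    ) {}

    public async fetchStorageToken(refresh?: boolean): Promise<{ jwt: string; fromCache?: boolean }> {
        return this.fetchToken(refresh);
    }

    public async fetchOrdererToken(refresh?: boolean): Promise<{ jwt: string; fromCache?: boolean }> {
        return this.fetchToken(refresh);
    }

    private async fetchToken(refresh?: boolean): Promise<{ jwt: string; fromCache?: boolean }> {
        if (!refresh && this.cachedJwt !== undefined) {
            return { jwt: this.cachedJwt, fromCache: true };
        }
        // Hypothetical host auth endpoint that signs a new r11s JWT.
        const response = await fetch(
            `/api/fluid-token?tenantId=${this.tenantId}&documentId=${this.documentId}`);
        const body = await response.json() as { token: string };
        this.cachedJwt = body.token;
        return { jwt: this.cachedJwt, fromCache: false };
    }
}
```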

msfluid-bot (Collaborator) commented Oct 15, 2020

@fluidframework/base-host: No change
| Metric Name | Baseline Size | Compare Size | Size Diff |
| --- | --- | --- | --- |
| main.js | 174.43 KB | 174.43 KB | No change |
| Total Size | 174.43 KB | 174.43 KB | No change |

Baseline commit: 9c03f4b

Generated by 🚫 dangerJS against 2384b75

* @param tenantId - ID of tenant
* @param documentId - ID of document
*/
constructor(
public readonly resolvedUrl: api.IResolvedUrl,
private readonly localDeltaConnectionServer: ILocalDeltaConnectionServer,
-    private readonly tokenProvider: socketStorage.TokenProvider,
+    private readonly tokenProvider: socketStorage.ITokenProvider,
Contributor

Note: in the case of SPO there are two providers, one for storage and one for the web socket. I do not know why that is the case, though.

Contributor

I see, you are putting the responsibility for fetching both tokens on the same provider.

from?: number,
to?: number): Promise<api.ISequencedDocumentMessage[]> {
const query = querystring.stringify({ from, to });

let headers: { Authorization: string } | null = null;

-    const token = (tokenProvider as TokenProvider).token;
+    const storageToken = await this.tokenProvider.fetchStorageToken();
Contributor

I think you need to start wrapping these with telemetry (at some level). It's the first question people ask when looking at perf: what is the time breakdown between the token fetch and the actual call?
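For illustration, the kind of breakdown being asked for could be captured with plain timers, along the lines of the sketch below (fetchOpsWithTiming and its parameters are made up for this example, not the driver's or Fluid's telemetry API):

```typescript
// Illustrative only: time the token fetch and the subsequent REST call
// separately so both durations can be reported to telemetry.
async function fetchOpsWithTiming(
    tokenProvider: { fetchStorageToken(): Promise<{ jwt: string }> },
    tenantId: string,
    url: string,
): Promise<unknown> {
    const tokenStart = performance.now();
    const storageToken = await tokenProvider.fetchStorageToken();
    const tokenMs = performance.now() - tokenStart;

    const callStart = performance.now();
    const response = await fetch(url, {
        headers: { Authorization: `Basic ${btoa(`${tenantId}:${storageToken.jwt}`)}` },
    });
    const result = await response.json();
    const callMs = performance.now() - callStart;

    // A real implementation would emit a telemetry event rather than log.
    console.log(`token fetch: ${tokenMs.toFixed(1)} ms, ops call: ${callMs.toFixed(1)} ms`);
    return result;
}
```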

Contributor Author

Yes, I have seen that in the ODSP driver. Opened an issue to track it (#3959).

@@ -46,14 +46,15 @@ export class DocumentService implements api.IDocumentService {
return new NullBlobStorageService();
}

+    const storageToken = await this.tokenProvider.fetchStorageToken();
Contributor

What are the expectations here around failures / offline?
I.e. do we expect the token provider to fail right away and rely on the delta manager to do retries, or does the token provider loop until it can get a token? I do not remember why, but in the case of SPO/Bohemia the choice was for the token provider to hang while offline until it gets a token.
Note that currently the runtime only retries after the container has loaded. If any failure happens during snapshot loading, we immediately fail the container load. This is mostly to give the host full control: if we hang (while waiting for the client to come back online), the host has no object on which to check status, since the container has not yet been returned to the host in any form, so there is no way to cancel or stop anything until it loads.

Contributor Author

My current thinking is that socket token failures should be handled by the connect/disconnect flow. For storage calls, we should keep retrying. Opened an issue to track that (#3960).

> Note that currently the runtime only retries after the container has loaded. If any failure happens during snapshot loading, we immediately fail the container load. This is mostly to give the host full control: if we hang (while waiting for the client to come back online), the host has no object on which to check status, since the container has not yet been returned to the host in any form, so there is no way to cancel or stop anything until it loads.

Do you mean that the driver does not retry a "getLatest" call failure and bubbles the error up to the host immediately?
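As a rough illustration of the "keep retrying storage calls" idea above (the actual policy is still open and tracked in #3960; the function and its parameters are hypothetical):

```typescript
// Hypothetical retry loop: keep asking the host for a storage token with
// exponential backoff instead of failing the load on the first error.
async function fetchStorageTokenWithRetry(
    tokenProvider: { fetchStorageToken(refresh?: boolean): Promise<{ jwt: string }> },
    maxDelayMs = 30_000,
): Promise<{ jwt: string }> {
    let delayMs = 500;
    for (;;) {
        try {
            return await tokenProvider.fetchStorageToken(/* refresh */ true);
        } catch {
            // A real implementation would also honor cancellation (e.g. container
            // close) and distinguish fatal errors from transient/offline ones.
            await new Promise((resolve) => setTimeout(resolve, delayMs));
            delayMs = Math.min(delayMs * 2, maxDelayMs);
        }
    }
}
```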

Contributor

Correct. The driver always does one retry for 401/403 with a new token (refresh = true), and otherwise has no other retry logic. The runtime will not retry getLatest (that is the getSnapshotTree call in Container); it will fail immediately.
I believe it would be great in the future (for this and other reasons) to return the container object to the caller right away. That way we can change these policies (if we want to), and the client has something to close/kill or to wait on and report progress.
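In code, the driver policy described above amounts to roughly the following (a sketch, not the actual driver implementation; doRequest stands in for whatever REST call is being made):

```typescript
// Sketch of the single-retry policy: on 401/403, fetch a fresh token with
// refresh = true and retry the request exactly once.
async function fetchWithTokenRetry(
    tokenProvider: { fetchStorageToken(refresh?: boolean): Promise<{ jwt: string }> },
    doRequest: (jwt: string) => Promise<Response>,
): Promise<Response> {
    let token = await tokenProvider.fetchStorageToken();
    let response = await doRequest(token.jwt);
    if (response.status === 401 || response.status === 403) {
        token = await tokenProvider.fetchStorageToken(/* refresh */ true);
        response = await doRequest(token.jwt);
    }
    return response;
}
```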

Contributor

BTW, same for storage calls: the runtime handles retries for op retrieval (see DeltaManager.getDeltas).
Currently we have no code to retry other operations (reading blobs and trees). This has not been needed for the SPO route because everything comes in one go (getLatest), but it's something we need now for both the r11s and SPO routes.

Opened #3963 to track that.

Contributor

Correct for getLatest. I do not think retry logic should be in the driver; it's much better to have it in the loader. The container can be closed at any moment, and it would be great to stop all network activity then. On top of that, all the policies around how often to retry currently live in the delta manager. I've opened an issue for that.

let headers: { Authorization: string } | null = null;
headers = {
-    Authorization: `Basic ${fromUtf8ToBase64(`${this.tenantId}:${this.tokenProvider.token}`)}`,
+    Authorization: `Basic ${fromUtf8ToBase64(`${this.tenantId}:${ordererToken.jwt}`)}`,
Contributor

Inconsistency: in some other places you made this conditional on ordererToken !== undefined.

Contributor Author

Good point. I think that was for back-compat. We should remove the undefined check and enforce token for all client calls.

@@ -103,7 +105,8 @@ export class RouterliciousDocumentServiceFactory implements IDocumentServiceFact
throw new Error(`Token was not provided.`);
Contributor

This code above no longer looks right. I'd remove the requirement to pass a token and always rely on (and require) the token provider. Otherwise you are asking for both things, a token and a token provider. That's too much, and it's also unclear which token it is: storage or socket? It also means the telemetry (which you will need to add around all token fetches) ends up spread out instead of being in one place, etc.

Contributor Author

Good catch! I am allowing either option for now but forgot to remove the error-throwing part.

Currently, you either provide a token as part of the resolved URL or you provide a token provider. For the former, I create a default provider based on the provided token.

I added two different token types in case the host wants that, but currently for r11s/FRS there is just one token. The right answer will become clear as we move forward; I wanted to keep the interface generic for now.
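For reference, the fallback path described here (wrapping the initially signed token from the resolved URL in a provider) could look roughly like the sketch below; the class name is illustrative and the return shape is assumed from the diff and doc comments:

```typescript
// Illustrative default provider: both storage and orderer calls reuse the
// initially signed token, so it cannot be refreshed once it expires.
class StaticTokenProvider /* implements socketStorage.ITokenProvider */ {
    constructor(private readonly jwt: string) {}

    public async fetchStorageToken(): Promise<{ jwt: string; fromCache: boolean }> {
        return { jwt: this.jwt, fromCache: true };
    }

    public async fetchOrdererToken(): Promise<{ jwt: string; fromCache: boolean }> {
        return { jwt: this.jwt, fromCache: true };
    }
}
```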

* Fetches the storage token from host
* @param refresh - Optional flag indicating whether token fetch must bypass local cache
* @returns TokenResponse object representing token value along with flag indicating
* whether token came from cache.
Contributor

Should we (in principle) support non-authenticated users (maybe not in FRS, but in general)?
You have some code that handles false returns from here, but that is not reflected in the signature.

Contributor Author

This can be handled through the tenantManager; tinylicious already does that today. So for something like tinylicious, we can return an empty jwt and the service will still work.


-    if (token) {
+    if (storageToken) {
Contributor

You say in another comment "We should remove the undefined check and enforce token for all client calls."

Contributor Author

Correct! I was mostly incorporating changes in this PR. Will do the strict token check in a separate PR.

tanviraumi merged commit b8c8c4b into microsoft:main on Oct 16, 2020
tanviraumi deleted the token_provider branch on October 16, 2020 18:44