Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FSN-13] feat(nf-tower): Retrieve and set Fusion license tokens in a process environment #5614

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

alberto-miranda
Copy link
Contributor

@alberto-miranda alberto-miranda commented Dec 16, 2024

Description

This PR adds a new TowerFusionEnv class to nf-tower that enables setting Platform-specific environment variables to Nextflow processes running with Fusion enabled. The class implements the FusionEnv interface and, as such, provides a getEnvironment() method that returns the relevant key-value pairs.

Currently, the provider is only being used to set the FUSION_LICENSE_TOKEN environment variable, the value of which is retrieved by sending a specific HTTP request to Platform in order to validate the run. Once the run is validated, Platform will reply with a corresponding token which will be injected into the process environment.

Note

Since fusion licenses are still not mandatory, if the license cannot be validated (or there's a configuration error affecting this validation process) a warning is emitted:

user@host$ nextflow run hello

N E X T F L O W   ~  version 24.11.0-edge

Launching `https://github.com/nextflow-io/hello` [mighty_liskov] DSL2 - revision: afff16a9b4 [master]

executor >  local (fusion enabled) (4)
[ec/6b82ba] process > sayHello (4) [100%] 4 of 4 ✔
Hello world!

Hello world!

Ciao world!

Ciao world!

Bonjour world!

Bonjour world!

Hola world!

Hola world!

WARN: Error retrieving Fusion license information: Missing personal access token -- Make sure there's a variable >TOWER_ACCESS_TOKEN in your environment
WARN: Error retrieving Fusion license information: Missing personal access token -- Make sure there's a variable >TOWER_ACCESS_TOKEN in your environment
WARN: Error retrieving Fusion license information: Missing personal access token -- Make sure there's a variable >TOWER_ACCESS_TOKEN in your environment
WARN: Error retrieving Fusion license information: Missing personal access token -- Make sure there's a variable >TOWER_ACCESS_TOKEN in your environment

Testing

This PR can be tested by following this guide.

Copy link

netlify bot commented Dec 16, 2024

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit 43f8bba
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/67659be67516e400080a0dea

@jordeu jordeu self-requested a review December 16, 2024 13:33
@alberto-miranda alberto-miranda force-pushed the fsn-13/fetch-fusion-license-token-from-platform-and-pass-it-to-fusion-using-envvar branch from bee1e99 to 6fc8fbe Compare December 17, 2024 10:33
@alberto-miranda
Copy link
Contributor Author

alberto-miranda commented Dec 17, 2024

@pditommaso I have refactored the implementation into TowerClient as you suggested. A couple of comments:

  1. I'm not very fond of making TowerClient implement FusionEnv: I'd rather have a dedicated TowerFusionEnv class with the getEnvironment() method and helpers, but then I cannot reuse the client itself or the sendHttpMessage() and co. functions. Any advice here?
  2. It doesn't seem to be picking up TowerClient as an extension despite having defined it in plugins/nf-tower/src/resources/META-INF/extensions.idx. As such, it's not using it (e.g. this should fail and it's not failing). Is there anything else I need to do to register the extension?

@pditommaso
Copy link
Member

pditommaso commented Dec 17, 2024

I'm not very fond of making TowerClient implement FusionEnv: I'd rather have a dedicated TowerFusionEnv class

I agree, sorry if I may have suggested in that direction. My point was to move this logic in the tower client module

then I cannot reuse the client itself or the sendHttpMessage()

don't worry about that, use the java provider client for example

this.httpClient = HttpClient.newBuilder()
.version(HttpClient.Version.HTTP_1_1)
.followRedirects(HttpClient.Redirect.NORMAL)
.cookieHandler(cookieManager)
.connectTimeout(Duration.ofSeconds(10))
.build()
}

Signed-off-by: Alberto Miranda <alberto.miranda@seqera.io>
@alberto-miranda alberto-miranda force-pushed the fsn-13/fetch-fusion-license-token-from-platform-and-pass-it-to-fusion-using-envvar branch from 6fc8fbe to e7b1082 Compare December 18, 2024 12:53
@alberto-miranda alberto-miranda changed the title [FSN-13] feat: Add FusionClient to send fusion-related HTTP requests to Platform [FSN-13] feat: Add Fusion environment provider to nf-tower plugin Dec 18, 2024
@alberto-miranda alberto-miranda changed the title [FSN-13] feat: Add Fusion environment provider to nf-tower plugin [FSN-13] feat(nf-tower): Retrieve and set Fusion license tokens in a process environment Dec 18, 2024
@alberto-miranda
Copy link
Contributor Author

@pditommaso I refactored the implementation based on our discussions. The PR is ready to review.

Comment on lines 166 to 196
/**
* Get the configured Platform API endpoint: if the endpoint is not provided in the configuration, we fallback to the
* environment variable `TOWER_API_ENDPOINT`. If neither is provided, we fallback to the default endpoint.
*
* @param opts the configuration options for Platform
* @param env the applicable environment variables
* @return the Platform API endpoint
*/
protected static String endpoint0(Map opts, Map<String, String> env) {
def result = opts.endpoint as String
if (!result || result == '-') {
result = env.get('TOWER_API_ENDPOINT') ?: TowerClient.DEF_ENDPOINT_URL
}
return result.stripEnd('/')
}

/**
* Get the configured Platform access token: if `TOWER_WORKFLOW_ID` is provided in the environment, we are running
* in a Platform-made run and we should ONLY retrieve the token from the environment. Otherwise, check
* the configuration file or fallback to the environment. If no token is found, returns null.
*
* @param opts the configuration options for Platform
* @param env the applicable environment variables
* @return the Platform access token
*/
protected static String accessToken0(Map opts, Map<String, String> env) {
def token = env.get('TOWER_WORKFLOW_ID')
? env.get('TOWER_ACCESS_TOKEN')
: opts.containsKey('accessToken') ? opts.accessToken as String : env.get('TOWER_ACCESS_TOKEN')
return token
}
Copy link
Contributor Author

@alberto-miranda alberto-miranda Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This basically replicates what TowerConfig does in nf-wave, but I didn't want to add a explicit dependency on that. If adding this dependency is not a problem, it would be better to reuse the code there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could make sense to turn those into PlatformHelper class shared across modules & plugins

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Do you want it done in the context of this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it could be better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored with c1147cd. I created the PlatformHelper class under modules/nextflow/src/main/groovy/nextflow/platform since it seemed to fit with the existing conventions for common code. Please double check.

Copy link
Member

@pditommaso pditommaso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, left some comments. Question, what happens when using Fusion and Tower plugin is not enabled?

try {
final token = getLicenseToken(product, version)
return [
'FUSION_LICENSE_TOKEN': token,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'FUSION_LICENSE_TOKEN': token,
FUSION_LICENSE_TOKEN: token,

Groovy magic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed with f32f0ca

Comment on lines 133 to 139
new Gson().toJson(
new LicenseTokenRequest(
product: product,
version: version
),
LicenseTokenRequest.class
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Built-in JsonOutout.toJson(Map) can be easier

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it this way for symmetry with the deserialization but I don't have a preference for one or the other.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine keeping gson, then it would be better to isolate into its own method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with 2f11171

Comment on lines 166 to 196
/**
* Get the configured Platform API endpoint: if the endpoint is not provided in the configuration, we fallback to the
* environment variable `TOWER_API_ENDPOINT`. If neither is provided, we fallback to the default endpoint.
*
* @param opts the configuration options for Platform
* @param env the applicable environment variables
* @return the Platform API endpoint
*/
protected static String endpoint0(Map opts, Map<String, String> env) {
def result = opts.endpoint as String
if (!result || result == '-') {
result = env.get('TOWER_API_ENDPOINT') ?: TowerClient.DEF_ENDPOINT_URL
}
return result.stripEnd('/')
}

/**
* Get the configured Platform access token: if `TOWER_WORKFLOW_ID` is provided in the environment, we are running
* in a Platform-made run and we should ONLY retrieve the token from the environment. Otherwise, check
* the configuration file or fallback to the environment. If no token is found, returns null.
*
* @param opts the configuration options for Platform
* @param env the applicable environment variables
* @return the Platform access token
*/
protected static String accessToken0(Map opts, Map<String, String> env) {
def token = env.get('TOWER_WORKFLOW_ID')
? env.get('TOWER_ACCESS_TOKEN')
: opts.containsKey('accessToken') ? opts.accessToken as String : env.get('TOWER_ACCESS_TOKEN')
return token
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could make sense to turn those into PlatformHelper class shared across modules & plugins

@alberto-miranda
Copy link
Contributor Author

Looks good, left some comments. Question, what happens when using Fusion and Tower plugin is not enabled?

Nothing special, if nf-tower is enabled but Fusion is not, the TowerFusionEnv itself is never instantiated, and there are no calls made to retrieve a token. The same happens if both are disabled, but in this case nf-tower is also never loaded.

@pditommaso
Copy link
Member

Will Fusion be able to run?

@alberto-miranda
Copy link
Contributor Author

alberto-miranda commented Dec 18, 2024

Will Fusion be able to run?

The license is not yet enforced in the Fusion side, so yes.

A side effect we didn't consider is that if fusion is run without nf-tower the license is not enforced currently (because TowerFusionEnv is not loaded in this case). Is this a use case that should concern us?

EDIT: Even if not enforced by nextflow, fusion will still fail once licenses become mandatory, but the error received by the user will probably be uglier.

…per`

Signed-off-by: Alberto Miranda <alberto.miranda@seqera.io>
Signed-off-by: Alberto Miranda <alberto.miranda@seqera.io>
Signed-off-by: Alberto Miranda <alberto.miranda@seqera.io>
Signed-off-by: Alberto Miranda <alberto.miranda@seqera.io>
Signed-off-by: Alberto Miranda <alberto.miranda@seqera.io>
@pditommaso
Copy link
Member

Even if not enforced by nextflow, fusion will still fail once licenses become mandatory, but the error received by the user will probably be uglier

and slower, ie. after the job started. we should consider adding a "default" TowerFusionEnv implementation of this execution reporting a warning message.

The "default" TowerFusionEnv is "overridden" by the one provided by tower plugin when it's enabled

See here for example

@Priority(-10) // <-- lower is higher, this is needed to override default provider behavior

@alberto-miranda
Copy link
Contributor Author

Even if not enforced by nextflow, fusion will still fail once licenses become mandatory, but the error received by the user will probably be uglier

and slower, ie. after the job started. we should consider adding a "default" TowerFusionEnv implementation of this execution reporting a warning message.

The "default" TowerFusionEnv is "overridden" by the one provided by tower plugin when it's enabled

See here for example

@Priority(-10) // <-- lower is higher, this is needed to override default provider behavior

It's a great idea, but if I understand correctly the mechanism you propose, it prioritizes between implementations of a specific interface doesn't it? For our case, we need to override one implementer of FusionEnv (i.e. TowerFusionEnv) but maintain any other implementers such as AwsFusionEnv. Can we do that automatically?

@pditommaso
Copy link
Member

You are right, it may need some customisation in the plugins handling. Ignore it for now.

Copy link

@stefanoboriero stefanoboriero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, left a few comments but nothing blocking

() -> {
log.debug "Request: method:=${req.method()}; uri:=${req.uri()}; request:=${req}"
final resp = httpClient.send(req, HttpResponse.BodyHandlers.ofString())
log.debug "Response: statusCode:=${resp.statusCode()}; body:=${resp.body()}"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This response is going to contain an authentication token right? Do we want to log it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I'm not sure about what the preferred policy is in Nextflow logs (cc @pditommaso)

private static final String LICENSE_TOKEN_PATH = 'license/token/'

// Server errors that should trigger a retry
private static final List<Integer> SERVER_ERRORS = [429, 500, 502, 503, 504]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how I feel about retrying 503 and 504 errors, if Seqera Platform is under heavy load wouldn't this make it worse?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took this from Wave's client to follow the existing approach. I don't have a specific preference 🤷‍♂️ (cc @pditommaso)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why it's used an exponential backoff. Actually think it should be added 408 (connection timeout) to that list

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, consider this resolved 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error 408 added with 6ce5be3

* Send a request to Platform to obtain a license-scoped JWT for Fusion. The request is authenticated using the
* Platform access token provided in the configuration of the current session.
*
* @throws AbortOperationException if a Platform access token cannot be found

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also throws other exceptions if the token is not valid or malformed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed with 1fed859 and refactored into a9849f5

Signed-off-by: Alberto Miranda <alberto.miranda@seqera.io>
Signed-off-by: Alberto Miranda <alberto.miranda@seqera.io>
Signed-off-by: Alberto Miranda <alberto.miranda@seqera.io>
Signed-off-by: Alberto Miranda <alberto.miranda@seqera.io>
Signed-off-by: Alberto Miranda <alberto.miranda@seqera.io>
@alberto-miranda
Copy link
Contributor Author

alberto-miranda commented Dec 20, 2024

As discussed this morning, I have added caching of license token responses with a9849f5 and re-scoped the license-related error messages as debug instead of warning with eaa985e.

Having said that, I'm not at all convinced that the caching is working as it should, so I would appreciate more experienced eyes looking into it.

@alberto-miranda
Copy link
Contributor Author

alberto-miranda commented Dec 20, 2024

It looks like I broke some tests in the process. Fixing. Tests should be fixed! 🚀

Signed-off-by: Alberto Miranda <alberto.miranda@seqera.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants