-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remote S3 configs #116
Remote S3 configs #116
Conversation
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
trino-aws-proxy/src/main/java/io/trino/aws/proxy/server/remote/RemoteS3Config.java
Outdated
Show resolved
Hide resolved
{ | ||
this(VIRTUAL_HOST_BUILDER, true, Optional.empty()); | ||
this((bucket, region) -> | ||
(bucket.isEmpty() ? "" : "%s.".formatted(bucket)) + "s3.%s.%s".formatted(region, remoteS3Config.getDomain()), remoteS3Config.getHttps(), remoteS3Config.getPort()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that could be interesting is to template the domain part of the config, so that it is not tied to AWS. The current implementation in this PR allows you to change the domain, but still assumes other systems will use the same <bucket>.s3.<region>.<domain>
format which is not necessarily true.
Could we instead template this with 2 named arguments:
%(region)
%(bucket)
And have 2 configurations:
rawDomainTemplate
(for cases without a bucket) - the AWS S3 one would bes3.%(region).amazonaws.com
bucketDomainTemplate
(for cases with a bucket) - the AWS S3 one would be%(bucket).s3.%(region).amazonaws.com
We could then infer whether to load the PathStyle implementation or the VirtualHostStyle one based on whether we have both configs or only rawDomainTemplate
NB: I'm using %(whatever)
encoding for placeholders, that's just the first thing that came to mind - whatever library we want to use for this may use a different format
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added the option of configuring the template option. But, I've removed the case where the bucket is empty. Would there be a scenario where the bucket is empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I missed this comment. Yes, there would be - for instance, listing buckets https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListBuckets.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added logic for empty buckets
trino-aws-proxy/src/main/java/io/trino/aws/proxy/server/remote/RemoteS3Config.java
Outdated
Show resolved
Hide resolved
trino-aws-proxy/src/main/java/io/trino/aws/proxy/server/remote/RemoteS3Config.java
Outdated
Show resolved
Hide resolved
trino-aws-proxy/src/main/java/io/trino/aws/proxy/server/remote/RemoteS3Config.java
Show resolved
Hide resolved
trino-aws-proxy/src/main/java/io/trino/aws/proxy/server/remote/RemoteS3HostBuilder.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
@Config("remoteS3.pathstyle.template") | ||
public RemoteS3Config setPathStyleTemplate(String hostnameTemplate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering whether we need 2 settings for this. In its present form, we would need to check to make sure that these 2 are mutually exclusive - a user should not configure both of these.
What about using a single template? That would suffice for both, with a flag to control whether we want virtual host style addressing or path style addressing.
So instead of remoteS3.virtualhoststyle.template
and remoteS3.pathstyle.template
we could have:
remoteS3.domain-template
remoteS3.use-virtual-host-addressing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note my original comment yesterday relating to 2 templates was not suggesting we have a template for virtual host addressing and one for path style addressing, but rather that we can have a template for "bucketed" operations and one for "unbucketed" operations.
Currently, this implementation I believe would not work with "unbucketed" operations if using virtual host addressing.
An operation like listing buckets (which is not a "bucketed" operation) would, with the default template, resolve the hostname to .s3.us-east-1.amazonaws.com
(us-east-1 here just being some sample region) with a leading dot, which is not valid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, in the present form, PathStyleRemoteS3Facade uses remoteS3.pathstyle.template
and VirtualHostStyleRemoteS3Facade uses remoteS3.virtualhoststyle.template
. When both properties are configured, the binding in ServerModule determines the property that will be used. If the properties are not configured, the default values are used. Though a user should configure just one of these properties, there wouldn't be a problem if both are configured.
Currently, the RemoteS3Facade that is binded is controlled here - https://github.com/trinodb/aws-proxy/blob/main/trino-aws-proxy/src/main/java/io/trino/aws/proxy/server/TrinoAwsProxyServerModule.java#L119. We could try to leverage remoteS3.use-virtual-host-addressing
property to bind the type of RemoteS3Facade. WDYT ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a RemoteS3Module to conditionally bind VirtualHostStyle or PathStyle.
...aws-proxy/src/main/java/io/trino/aws/proxy/server/remote/VirtualHostStyleRemoteS3Facade.java
Outdated
Show resolved
Hide resolved
...proxy/src/test/java/io/trino/aws/proxy/server/remote/TestVirtualHostStyleRemoteS3Facade.java
Outdated
Show resolved
Hide resolved
7e079b0
to
1866965
Compare
trino-aws-proxy/src/main/java/io/trino/aws/proxy/server/remote/RemoteS3Config.java
Outdated
Show resolved
Hide resolved
a2f80dc
to
febc22c
Compare
Test failures are unrelated, see #115 |
The remote s3 parameters can now be made configurable. Two types of styles can be used for referencing the s3 - PathStyle and VirtualHostStyle.
The default hostname template for pathStyle is: "s3.${region}.${domain}"
The default hostname template for virtualHostStyle is: "${bucket}.s3.${region}.${domain}"
The motivation behind this configuration: It may not always be the case where the region is always specified in the S3 URI (https://s3.testDomain.com) or the "s3" constant always remains "s3". The configuration can handle custom URIs such as:
https://s3a.testDomain.com/
https://s3a.testDomain.us-east.com/
https://s3a.us-east.testDomain.com/