- 
                Notifications
    You must be signed in to change notification settings 
- Fork 9.1k
HADOOP-19282. S3A: STSClientFactory: do not use URIBuilder #7966
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-19282. S3A: STSClientFactory: do not use URIBuilder #7966
Conversation
URIBuilder was used from the AWS SDK for Java v2, to be precise from the shaded Apache HTTP Client. It is a problem if a user would like not to use the AWS SDK bundle, since more or less only 3 modules are needed (s3, s3-transfer & sts), but that may cause problems on unshaded dependency versions. Since a URI constructor can achieve the same here I switched it as a preferred option.
| 💔 -1 overall 
 
 This message was automatically generated. | 
| Oh hey, way better CI luck this time around :) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
| @LDVSOFT thanks for this -we never want to make those shaded libraries mandatory. If you can sort out the classpath of a leaner deployment -well done! Those test failures showed you were running them...the prefetch one fails regularly for me too. The PR Is merged to trunk. If you do a cherrypick PR and retest against branch-3.4 I will merge there too | 
| also, have you an asf JIRA ID to assign the fix to you there? If not: request one and tell me what it is. | 
| @steveloughran You're welcome! About leaner classpath: there are pros and cons. S3A, as far as I see, really only needs AWS SDK modules for s3, s3-manager, sts and whatever those would pull (core, sync/async clients, etc), however since dependencies of those aren't shaded, like in bundle, it may cause other problems I can't recommend to anyone outside as of now (as of now I don't trust Maven to handle this complexity 🤫). Also there is some noise as it seems implementation would prefer a particular good async http client, but don't quote me on that… Will cherry-pick for 3.4. I'm not familiar with Jira to understand what ID you mean here, but my username is LDVSoft. | 
URIBuilder was used from the AWS SDK for Java v2, from the shaded Apache HTTP Client. It is a problem if a user would like not to use the AWS SDK bundle, since more or less only 3 modules are needed (s3, s3-transfer & sts), but that may cause problems on unshaded dependency versions. Since a URI constructor can achieve the same here I switched it as a preferred option. Contributed by Lapshin Dmitry
| @LDVSOFT the reason we use the big jar is just that historically it's been so brittle to things like the version of jackson, httpclient and more -so using the unshaded artifacts would put the SDK in charge of defining the versions of those artifacts for everything. regarding JIRA ID, your username is what I meant -you are now listed as the author of the patch there. | 
URIBuilder was used from the AWS SDK for Java v2, from the shaded Apache HTTP Client. It is a problem if a user would like not to use the AWS SDK bundle, since more or less only 3 modules are needed (s3, s3-transfer & sts), but that may cause problems on unshaded dependency versions. Since a URI constructor can achieve the same here I switched it as a preferred option. Contributed by Lapshin Dmitry
Description of PR
URIBuilderwas used from the AWS SDK for Java v2, to be precise from the shaded Apache HTTP Client. It is a problem if a user would like not to use the AWS SDK bundle, since more or less only 3 modules are needed (s3, s3-transfer & sts), but that may cause problems on unshaded dependency versions. Since a URI constructor can achieve the same here I switched it as a preferred option.How was this patch tested?
I've run the test suite against a eu-west-1 bucket, without scaling/load since the change shouldn't affect that. To be exact, with something like this:
auth-keys.xmlAlmost all test pass:
ITestDelegatedMRJobwork. They probably clean out environment somewhere and my environment-provided AWS credentials didn't work. Also it looks parametrized, and I can't tell from the Surefire/Failsafe reports which causes a problem.ITestRoleDelegationInFilesystem/ITestSessionDelegationInFilesystemfail a bit inmissmatch2, but I'm really unfamiliar with credentials delegation. Probably lost environment variables on it's way?ITestS3APrefetchingInputStreamfails with 0 size.Given that other tests pass and the scope of the change I think it's fine, and the problem is my test setup misconfiguration. If you know how to fix the setup — I can rerun with some other options.
Also, I've found this bug while repackaging Spark for a local K8S deployment, and with this fix STS configuration options work even if I do replace AWS SDK bundle with only required SDK modules.
For code changes:
LICENSE,LICENSE-binary,NOTICE-binaryfiles?Sign-off
I give a license to the Apache Software Foundation to use this code, as required under §5 of the Apache License.
P.S.
Re-opened from #7483 where there was a review by @steveloughran.