Hadoop FS API usage #44
Merged
todo: s3a future support?

Migration notes:
- instead of `myS3String.toS3Location(region1)` use `myS3String.toS3LocationOrFail.withRegion(region1)` or `myS3String.toS3Location.map(_.withRegion(region1)).get` (because `toS3Location` on its own now returns an `Option`)
- an implicit `FileSystem` is usually needed to signify the difference between HDFS and S3 over the Hadoop FS API
…usage. // todo fs.asOutputFs | fs.asInputFs
…s do not need it.
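The migration note about `toS3Location` returning an `Option` can be illustrated with a minimal sketch. It is not the library's actual code: `SimpleS3Location`, `S3Syntax`, the regular expression, and the use of a plain `String` for the region are all hypothetical simplifications chosen so that the example is self-contained.

```scala
// Hypothetical, simplified stand-in for the library's S3 location type.
// The region is a plain String here (an assumption made for this sketch).
final case class SimpleS3Location(
    protocol: String,
    bucketName: String,
    path: String,
    region: Option[String] = None
) {
  def withRegion(r: String): SimpleS3Location = copy(region = Some(r))
}

object S3Syntax {
  // Accepts the s3, s3n and s3a protocols; the pattern is an assumption of this sketch.
  private val S3Pattern = "^(s3[an]?)://([^/]+)/(.*)$".r

  implicit class S3StringOps(val s: String) extends AnyVal {
    // Returns None when the string is not an S3 location.
    def toS3Location: Option[SimpleS3Location] = s match {
      case S3Pattern(protocol, bucket, path) => Some(SimpleS3Location(protocol, bucket, path))
      case _                                 => None
    }

    // Throws instead of returning None, for callers that require a valid S3 location.
    def toS3LocationOrFail: SimpleS3Location =
      toS3Location.getOrElse(
        throw new IllegalArgumentException(s"Not a valid S3 location: $s")
      )
  }
}

object MigrationDemo {
  import S3Syntax._

  def main(args: Array[String]): Unit = {
    val strict  = "s3a://bucketName/path/on/s3".toS3LocationOrFail.withRegion("eu-west-1")
    val lenient = "hdfs://namenode/path".toS3Location.map(_.withRegion("eu-west-1"))
    println(strict)
    println(lenient)
  }
}
```

With the `Option`-returning variant, a non-S3 path (such as an HDFS one) is an ordinary value to branch on rather than an exception, which is what lets one code path serve both HDFS and S3.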
dk1844 requested review from yruslan, benedeki, Zejnilovic and AdrianOlosutean on November 2, 2020 14:49
Resolved review thread (outdated) on atum/src/main/scala/za/co/absa/atum/persistence/s3/S3Location.scala
Zejnilovic approved these changes on Nov 4, 2020
Not sure my approval should count, but I checked the code at least.
dk1844 force-pushed the feature/43-emrfs-fs-api branch from 374fa40 to 536ca22 on November 5, 2020 09:16
AdrianOlosutean approved these changes on Nov 5, 2020
Code reviewed. LGTM
Released as 3.1.0.
This feature introduces S3 file access by means of the Hadoop FS API.

- `org.apache.hadoop.fs.FileSystem` is created based on the string path of the file (the S3 implementation for paths such as `s3://bucketName/path/on/s3`, HDFS otherwise), and then we work with the FS as we would "normally".
- […] `inputFs` and `outputFs` where applicable.
- […] `SdkS3`, not just `S3`.
- […] (`s3`, `s3n`, and `s3a`), and such protocol is respected when the Hadoop FS is being created for S3 locations.
- `sparkSession.enableControlMeasuresTracking()` does not need an implicit `fs: FileSystem` (anymore); the fs is now part of the storer and is reused for the Spark listener if needed.

Expected release version 3.1.0

Closes #43
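The idea of deriving the file system from the path string can be sketched with the standard Hadoop call `FileSystem.get(URI, Configuration)`. This is only an illustration under stated assumptions, not the code from this PR: `FsFromPath` and `fileSystemFor` are hypothetical names, the Hadoop client libraries are assumed to be on the classpath, and resolving `s3`/`s3n`/`s3a` paths additionally depends on the S3 connector and credentials available in the environment.

```scala
import java.net.URI

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper, not part of the library.
object FsFromPath {

  // Hadoop selects the FileSystem implementation from the URI scheme;
  // a path without a scheme resolves to the configured default file system.
  def fileSystemFor(path: String, conf: Configuration): FileSystem =
    FileSystem.get(new URI(path), conf)

  def main(args: Array[String]): Unit = {
    val conf = new Configuration()

    // A local path is used so that the sketch runs without S3 or HDFS access.
    implicit val fs: FileSystem = fileSystemFor("file:///tmp", conf)

    println(fs.getUri)
    println(fs.exists(new Path("/tmp")))
  }
}
```

Declaring the resolved file system as an implicit value mirrors the migration note that an implicit `FileSystem` is usually what distinguishes HDFS from S3 for code working over the Hadoop FS API.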