[Storage System] Support for IBM Cloud Object Storage #302
The build fails because the license comment in the new class doesn't contain the copyright line.
src/main/scala/org/apache/spark/sql/delta/storage/COSLogStore.scala
@guykhazma Thanks a lot for your contribution. As we don't have an environment to test it, could you clarify what tests you have done in a real environment?
@zsxwing thanks for reviewing.
Hi @zsxwing, can you please review the PR?
@zsxwing gentle ping. @tdas @mukulmurthy I'd appreciate your review as well.
Hi @zsxwing, @tdas, @rahulsmahadev,
@zsxwing, @tdas, @rahulsmahadev can you please review the PR?
Sorry for the delay. We are working on adding a
@tdas I have made the changes but haven't changed the documentation yet.
This reverts commit b91f8a7.
Hi @tdas, any updates? Thanks!
My apologies for the delay. We have been trying to line up a few more features, like a new, more stable LogStore API that LogStore builders like you folks can use. However, there are a few more pieces that we still need to get in for the public API to be easily usable, and I don't want you folks to block on those. So we can continue working with the existing LogStore API, which we will continue supporting for at least the 1.x releases. Later we can rewrite the IBMLogStore to use the new API; we are not going to worry about it for the 1.0 release we are targeting in a couple of weeks. Let me leave some quick comments.
@tdas thanks for the update and review. I made the needed changes. Also, let me know when you want the change to use the new API and I'll open a PR for that. Thanks
delta-contribs/src/main/scala/org/apache/spark/sql/delta/storage/COSLogStore.scala
@guykhazma thank you for the speedy response. Regarding the new LogStore APIs, I think we first have to convert the HadoopLogStore base class to use the new APIs; only then can the subclasses extending HadoopLogStore use the new APIs easily. It can be done after this PR. If you are interested, you can take a crack at it!
"fs.fake.impl" -> classOf[FakeFileSystem].getName,
"fs.fake.impl.disable.cache" -> "true")

protected def shouldUseRenameToWriteCheckpoint: Boolean = false
this is not really testing on actual IBM cloud ... do you folks have full integration testing on your side to really test the atomic guarantees etc.?
right, I used these tests since the same are used in the core library for the other log stores.
As for internal tests, we simulated concurrent writes by multiple log stores and verified it worked fine.
I am not sure though how we can have such tests here that check against IBM COS, do you have any suggestions?
Yeah, I don't have a good way. Even our existing AzureLogStore and S3SingleDriverLogStore do not have cloud-specific unit tests, for the same reason: the complexity of maintaining them in a public repo. As long as you folks have integration tests, that's good enough for now.
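For readers following the thread, here is a rough sketch of what a "concurrent writers" check can look like. It does not talk to IBM COS and is not the internal test mentioned above: `FakeAtomicStore` and `ConcurrentCommitCheck` are hypothetical names, and the in-memory map only stands in for a store that offers atomic put-if-absent writes.

```scala
import java.util.concurrent.{ConcurrentHashMap, CountDownLatch, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical in-memory stand-in for an object store with atomic put-if-absent.
final class FakeAtomicStore {
  private val objects = new ConcurrentHashMap[String, String]()

  // Returns true only for the first writer of a given path.
  def writeIfAbsent(path: String, content: String): Boolean =
    objects.putIfAbsent(path, content) == null

  def read(path: String): Option[String] = Option(objects.get(path))
}

object ConcurrentCommitCheck {
  // Starts `writers` threads that all try to create the same log entry,
  // then reports how many succeeded and what ended up stored.
  def simulate(writers: Int): (Int, Option[String]) = {
    val store = new FakeAtomicStore
    val path  = "_delta_log/00000000000000000001.json"
    val start = new CountDownLatch(1)
    val wins  = new AtomicInteger(0)
    val pool  = Executors.newFixedThreadPool(writers)

    (0 until writers).foreach { id =>
      pool.execute(new Runnable {
        def run(): Unit = {
          start.await() // release all writers at roughly the same time
          if (store.writeIfAbsent(path, s"writer-$id")) {
            wins.incrementAndGet()
            ()
          }
        }
      })
    }

    start.countDown()
    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.SECONDS)
    (wins.get, store.read(path))
  }
}
```

With an atomic store, `ConcurrentCommitCheck.simulate(8)` should report exactly one winning writer; a real integration test would replace `FakeAtomicStore` with writes against the actual object store.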
delta-contribs/src/main/scala/org/apache/spark/sql/delta/storage/COSLogStore.scala
import org.apache.spark.sql.delta.storage._

class IBMCOSLogStoreSuite extends LogStoreSuiteBase {
Please make the file name match the class name. There may be other LogStores and LogStore suites in the contrib module.
@tdas thanks for the review!
Awesome, this looks good now. I will start the process of merging this! Thank you for the super quick response!
@guykhazma question for you! What is the git email address that you want to use for the commits? The Linux Foundation guidelines require an email address tied to each contribution. Your GitHub profile does not have an email on it, and the email on the commits is "33684427+guykhazma@users.noreply.github.com", which is not a public email for contacting.
This PR adds support for IBM Cloud Object Storage (IBM COS) by creating COSLogStore, which extends the HadoopFileSystemLogStore and relies on IBM COS's ability to handle atomic writes using ETags.
The support for IBM COS relies on the following properties: fs.cos.atomic.write set to true.
In addition, I propose the following documentation to be added to the Storage Configuration page.
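As a side note for readers, and not the documentation text proposed in the PR, the settings mentioned above might be collected as in the sketch below. `CosLogStoreConf` is a hypothetical helper; the LogStore class name is inferred from the file path in this PR and may differ in the merged version, and the `spark.hadoop.` prefix is assumed as the usual way to forward a Hadoop property through Spark.

```scala
// Hypothetical helper: gathers the Spark settings implied by the PR description.
object CosLogStoreConf {
  // Assumption: class name taken from the file path in this PR; the final name may differ.
  val logStoreClass = "org.apache.spark.sql.delta.storage.COSLogStore"

  val settings: Map[String, String] = Map(
    // Tells Delta which LogStore implementation to use.
    "spark.delta.logStore.class" -> logStoreClass,
    // The spark.hadoop. prefix passes the key on to the Hadoop configuration.
    "spark.hadoop.fs.cos.atomic.write" -> "true"
  )

  // Renders the settings as spark-submit style flags, sorted by key.
  def asSubmitFlags: Seq[String] =
    settings.toSeq.sortBy(_._1).map { case (k, v) => s"--conf $k=$v" }
}
```

The same map could be applied to a `SparkSession` builder with repeated `.config(key, value)` calls; that part is omitted so the sketch stays free of Spark dependencies.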