Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(services/s3): Add object versioning support #3943

Closed
wants to merge 2 commits into from

Conversation

suyanhanx
Copy link
Member

@suyanhanx suyanhanx commented Jan 8, 2024

Part of #2611

@@ -137,3 +137,37 @@ jobs:
OPENDAL_S3_BUCKET: opendal-testing
OPENDAL_S3_ROLE_ARN: arn:aws:iam::952853449216:role/opendal-testing
OPENDAL_S3_REGION: ap-northeast-1

test_s3_on_versionning_enable_bucket:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be part of behavior test instead of edge test. We can add a new s3 setup with version enabled.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an extra burden for other unrelated tests, and I think it's unnecessary. 🤔
So I put this in the edge tests.
(Of course, a new s3 setup is necessary, wherever it is put.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an extra burden for other unrelated tests, and I think it's unnecessary. 🤔

It's our public API that many services could implement. Adding edge tests for them one by one adds more burden.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for us to set up some additional composite actions for those cases where we need to add special settings to the service itself?
Agree that setting up one by one is cumbersome.

But for now, we could set it first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for us to set up some additional composite actions for those cases where we need to add special settings to the service itself?

It's an existing nature that some s3 bucket doesn't enable object versioning. So the steps we need to take is:

  • Add read_with_version, stat_with_version capabilities.
  • Add enable_version for s3 service. We will enable corresponding capabilities if this flag has been enabled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mainly considering the cleanup issue. With storage services that have version control, cleanup can be more complicated.

Storage services have version support will also have lifecycle. This should not be an issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd have to remove every version of each file, so I hope to minimize the scope of the impact as much as possible.

Please don't overestimate the impact of test artifacts. In the worst cases, we have some files left in a bucket which is fine.

Since version control should not affect normal user usage

version is part of our public API. Please don't think of it's not affect normal user usage. It should be treated the same as range, buffer, concurrent. I see no reason to test version separately.

there should be no need to test the functionality of version control unless it is enabled.

This is why we have capability.


  1. Add version-related test cases to the existing behavior test. In this way, we can use environment variables to skip cases that do not involve version control, based on whether version control is enabled.

We don't require additional environment variables. Instead, we need a fresh setup for services. Take S3, for instance; we require a newly configured bucket with the feature enabled.

Additionally, we must introduce a new flag for the S3 service, which should be integrated into our API rather than configured in CI.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that's a great news.
We could keep them for just 1 day.
Could the test bucket we're currently using be configured the same way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could keep them for just 1 day.
Could the test bucket we're currently using be configured the same way?

Yes, already have.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mainly considering the cleanup issue. With storage services that have version control, cleanup can be more complicated. We'd have to remove every version of each file, so I hope to minimize the scope of the impact as much as possible. Since version control should not affect normal user usage, there should be no need to test the functionality of version control unless it is enabled.

So, can we:

  1. Add version-related test cases to the existing behavior test. In this way, we can use environment variables to skip cases that do not involve version control, based on whether version control is enabled. But it also needs to add some extra cleanup steps.
  2. Create a separate behavior test suite(maybe like this PR, but not in edge tests) that does not affect existing tests.

The test infra would not be a problem we just need to configure the lifecycle like the following below

image

@@ -664,7 +682,7 @@ impl S3Core {

pub async fn s3_delete_objects(
&self,
paths: Vec<String>,
paths: Vec<(String, Option<String>)>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to support delete multiple versions for now. This should be considered in deleter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleter hasn't been implemented yet, so I changed the batch.
It's ok to keep it as is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ok to keep it as is.

Let's keep batch api as is.

@Zheaoli
Copy link
Member

Zheaoli commented Jan 10, 2024

The code is OK.

But as what @Xuanwo said, we don't need to support multiple versions for now.

If we want to support multiple version for S3 service, I think we need support list/(get the object by version) at the same time. Otherwise the API would be not symmetry if we just add it for delete api

@Xuanwo
Copy link
Member

Xuanwo commented Jan 10, 2024

But as what @Xuanwo said, we don't need to support multiple versions for now.

This is incorrect.

I'm not saying we don't need to support multiple versions for now. What I meant is that we should consider version as part of our public API and test it within the same behavioral test suites.

If we want to support multiple version for S3 service, I think we need support list/(get the object by version) at the same time. Otherwise the API would be not symmetry if we just add it for delete api

Thay are already added in our Object Version RFC.

@Xuanwo Xuanwo changed the title feat(services/s3): object versioning feat(services/s3): Add object versioning support Jan 10, 2024
@drernie
Copy link

drernie commented Mar 14, 2024

Is this PR still under development? We are very interested in using OpenDAL for reading and writing versioned files on S3 buckets, and it looks like we can't do that until this PR is merged.

We would be delighted to help you with testing!

@suyanhanx
Copy link
Member Author

Is this PR still under development? We are very interested in using OpenDAL for reading and writing versioned files on S3 buckets, and it looks like we can't do that until this PR is merged.

We would be delighted to help you with testing!

Thank you for your attention! I'm still working on the development. I will finish it as soon as possible.

@drernie
Copy link

drernie commented May 28, 2024 via email

@Xuanwo
Copy link
Member

Xuanwo commented Jun 3, 2024

Thanks. Any update?

Thanks for the inquery. I will working on this after 0.47 released (about this week).

@drernie
Copy link

drernie commented Jun 24, 2024

No pressure, but very curious if there is any progress...

@Xuanwo
Copy link
Member

Xuanwo commented Jun 25, 2024

No pressure, but very curious if there is any progress...

Hello, apologies for not keeping you updated promptly. I'm currently working on designing the object version as well as addressing other feature requests such as #4321. I plan to initiate a PR before next week once the design is finalized.

@Xuanwo
Copy link
Member

Xuanwo commented Jul 10, 2024

Has been implemented by #4873. Thanks @suyanhanx's effort.

@Xuanwo Xuanwo closed this Jul 10, 2024
@Xuanwo Xuanwo deleted the object-versioning branch July 10, 2024 08:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants