Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Introduce Google Cloud Storage Implementation #4344

Merged
merged 18 commits into from
Jan 10, 2024

Conversation

Xuanwo
Copy link
Contributor

@Xuanwo Xuanwo commented Jan 3, 2024

Description

Part of #4236

This PR implements:

  • a basic opendal storage implementation (not exposed)
  • Google Cloud Storage Implementation (exposed via config)

There are many TODOs to finish, but I feel like it's better to make it working first.

How was this PR tested?

Added google_cloud_storage_test_suite

Signed-off-by: Xuanwo <github@xuanwo.io>
@CLAassistant
Copy link

CLAassistant commented Jan 3, 2024

CLA assistant check
All committers have signed the CLA.

quickwit/Cargo.toml Outdated Show resolved Hide resolved
Xuanwo added 2 commits January 4, 2024 08:20
Signed-off-by: Xuanwo <github@xuanwo.io>
@Xuanwo
Copy link
Contributor Author

Xuanwo commented Jan 4, 2024

Hi @etolbakov, what are your thoughts on the current plan? Should we first merge the OpenDAL storage and then add GCS storage? For GCS, integration with configurations and settings is necessary. Once GCS is operational, we can incorporate metrics and other utilities similar to those provided by S3 or FS.

@etolbakov
Copy link
Collaborator

etolbakov commented Jan 4, 2024

Hey @Xuanwo
Actually I was doing some work on this ticket in the background, to my shame I didn't assign/mention that anyhow.
My implementation strategy was the opposite: using opendal exactly for gcs integration, make sure it fits nicely in the project, then make it generic and potentially migrate other storage layers.
But I totally see where you are coming from. So we may join our efforts.

but I feel like it's better to make it working first.

Could you please elaborate a bit about what do you mean by "working first". Is it a simple integration test that we can draft?

Feel free to reach out in the discord if that's fine with you

@Xuanwo
Copy link
Contributor Author

Xuanwo commented Jan 5, 2024

Feel free to reach out in the discord if that's fine with you

Great, moved to https://discord.com/channels/908281611840282624/908281611840282627/1192630218482012180

Signed-off-by: Xuanwo <github@xuanwo.io>
@Xuanwo Xuanwo changed the title feat: Introduce OpenDAL Storage Implementation feat: Introduce Google Cloud Storage Implementation Jan 8, 2024
@Xuanwo
Copy link
Contributor Author

Xuanwo commented Jan 8, 2024

Hi @etolbakov, could you share your Discord ID? I'd like to invite you to our discussion thread: https://discord.com/channels/908281611840282624/1192630218482012180

@fulmicoton fulmicoton self-requested a review January 8, 2024 09:57
@fulmicoton
Copy link
Contributor

@Xuanwo I like the approach: using a generic opendal storage, and adding a factory for googlefs.
I should have time tomorrow to properly review.

Signed-off-by: Xuanwo <github@xuanwo.io>
@@ -118,6 +118,7 @@ num_cpus = "1"
numfmt = "1.1.1"
once_cell = "1"
oneshot = "0.1.5"
opendal = { version ="0.44", default-features = false }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we feature gate this to make the development a tiny bit lighter?
(You can mimick what is done for azure.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a new feature called google. Is that what you were looking for?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Signed-off-by: Xuanwo <github@xuanwo.io>
Copy link
Contributor

@fulmicoton fulmicoton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know a way to mock gfs for our CI? Also have you seen the storage testsuite?

Xuanwo added 3 commits January 9, 2024 10:33
Signed-off-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Xuanwo <github@xuanwo.io>
@Xuanwo
Copy link
Contributor Author

Xuanwo commented Jan 9, 2024

Do you know a way to mock gfs for our CI? Also have you seen the storage testsuite?

I have added fake-gcs-server in CI and docker compose, but I don't know how get it started.

@Xuanwo
Copy link
Contributor Author

Xuanwo commented Jan 9, 2024

I have added fake-gcs-server in CI and docker compose, but I don't know how get it started.

Ok, I got the right place. Let me add them.

Signed-off-by: Xuanwo <github@xuanwo.io>
@Xuanwo
Copy link
Contributor Author

Xuanwo commented Jan 9, 2024

I have added integration tests for gcs:

test google_cloud_storage_test_suite ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.19s

The mock gcs server doesn't properly support bulk deletion, so I've disabled that test. However, I'm confident it functions correctly on the actual gcs.

quickwit/Cargo.toml Outdated Show resolved Hide resolved
.context("write_and_bulk_delete")?;
// Fake GCS Server doesn't support bulk delete correctly.
// ref: <https://github.com/fsouza/fake-gcs-server/issues/1443>
#[cfg(not(feature = "google"))]
Copy link
Contributor

@fulmicoton fulmicoton Jan 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not do the check using the compiler flag, but instead dynamically inspect the storage.
As is, your code is removing this test for azure too in our CI.

(these test are running in the confusingly named "coverage CI" that is triggered after each merge to main. That build takes a bit more time than the regular PR CI and includes azure etc.)

You can have the Storage trait implement Any.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addresed, please review again

Xuanwo and others added 4 commits January 9, 2024 15:41
Co-authored-by: Paul Masurel <paul@quickwit.io>
Co-authored-by: Paul Masurel <paul@quickwit.io>
…rage.rs

Co-authored-by: Paul Masurel <paul@quickwit.io>
Signed-off-by: Xuanwo <github@xuanwo.io>
Copy link
Collaborator

@etolbakov etolbakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Xuanwo looks wonderful! I don't have anything to suggest!

@@ -70,11 +74,19 @@ azure = [
"azure_storage/enable_reqwest_rustls",
"azure_storage_blobs/enable_reqwest_rustls",
]
google = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we call the feature gfs instead of google?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will update this place tomorrow. 😋

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i ll do it actually... i spotted a bunch of other problems.

@fulmicoton fulmicoton merged commit d47b6bd into quickwit-oss:main Jan 10, 2024
7 checks passed
@Xuanwo
Copy link
Contributor Author

Xuanwo commented Jan 10, 2024

Thanks a lot!

@Xuanwo Xuanwo deleted the add-gcs-support branch January 10, 2024 03:04
@fulmicoton
Copy link
Contributor

Thank you @Xuanwo that's awesome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants