Research OpenDAL for possible usages #15715

Open
spencergilbert opened this issue Dec 23, 2022 · 13 comments
Labels
domain: sinks Anything related to the Vector's sinks domain: sources Anything related to the Vector's sources needs: investigation Needs investigation to determine effort. needs: rfc Needs an RFC before work can begin. sink: aws_s3 Anything `aws_s3` sink related sink: azure_blob Anything `azure_blob` sink related sink: gcp_cloud_storage Anything `gcp_cloud_storage` sink related sink: redis Anything `redis` sink related source: aws_s3 Anything `aws_s3` source related source: redis Anything `redis` source related

Comments

@spencergilbert
Contributor

spencergilbert commented Dec 23, 2022

https://github.com/datafuselabs/opendal

Likely requires an RFC prior to implementation, but a spike could be interesting.

OpenDAL currently appears to support the following services that Vector already has components for:

  • AWS S3
  • Azure Storage Blob
  • FS(?)
  • Google Cloud Storage
  • HDFS
  • Redis

It also supports HDFS (which has been requested in the past), among a number of other services. We could possibly replace the disparate dependencies of these components and use OpenDAL as a shared interface.

sccache recently started using OpenDAL for its remote storage needs.
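
The shared-interface idea above can be sketched roughly as follows. This is a minimal illustration only: `BlobStore`, `MemoryStore`, and `BlobSink` are hypothetical names invented for this sketch and are not OpenDAL's or Vector's actual API. The point is that a sink written once against a single storage trait could target any backend that implements it.

```rust
use std::collections::HashMap;
use std::io;

/// Hypothetical shared storage interface (NOT OpenDAL's actual API).
pub trait BlobStore {
    fn write(&mut self, path: &str, data: &[u8]) -> io::Result<()>;
    fn read(&self, path: &str) -> io::Result<Vec<u8>>;
}

/// In-memory stand-in for a real backend such as S3, GCS, or HDFS.
#[derive(Default)]
pub struct MemoryStore {
    objects: HashMap<String, Vec<u8>>,
}

impl BlobStore for MemoryStore {
    fn write(&mut self, path: &str, data: &[u8]) -> io::Result<()> {
        self.objects.insert(path.to_string(), data.to_vec());
        Ok(())
    }

    fn read(&self, path: &str) -> io::Result<Vec<u8>> {
        self.objects
            .get(path)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, path.to_string()))
    }
}

/// A sink written once against the trait; the backend is a type parameter.
pub struct BlobSink<S: BlobStore> {
    store: S,
    prefix: String,
    next_batch: u64,
}

impl<S: BlobStore> BlobSink<S> {
    pub fn new(store: S, prefix: &str) -> Self {
        Self {
            store,
            prefix: prefix.to_string(),
            next_batch: 0,
        }
    }

    /// Writes one batch of newline-delimited events and returns the object key.
    pub fn send_batch(&mut self, events: &[&str]) -> io::Result<String> {
        let key = format!("{}/batch-{}.log", self.prefix, self.next_batch);
        let mut body = events.join("\n");
        body.push('\n');
        self.store.write(&key, body.as_bytes())?;
        self.next_batch += 1;
        Ok(key)
    }

    /// Read access to the underlying store, e.g. for verification.
    pub fn store(&self) -> &S {
        &self.store
    }
}
```

Under these assumptions, supporting a new service would mean adding one more `BlobStore` implementation while the sink logic stays unchanged.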

@spencergilbert spencergilbert added sink: aws_s3 Anything `aws_s3` sink related domain: sources Anything related to the Vector's sources domain: sinks Anything related to the Vector's sinks sink: gcp_cloud_storage Anything `gcp_cloud_storage` sink related needs: rfc Needs an RFC before work can begin. source: aws_s3 Anything `aws_s3` source related sink: azure_blob Anything `azure_blob` sink related sink: redis Anything `redis` sink related source: redis Anything `redis` source related needs: investigation Needs investigation to determine effort. labels Dec 23, 2022
@Xuanwo
Contributor

Xuanwo commented Dec 26, 2022

Hi, OpenDAL's maintainer is here! ❤️

Thanks for researching this. I'm happy to answer any questions that come up.

@Xuanwo
Contributor

Xuanwo commented Jan 27, 2023

Based on my experience with sccache's OpenDAL migration, we can evaluate OpenDAL by adding new service support. For example, we can use OpenDAL to implement hdfs/oss/... support (I'm interested in helping with this!).

This way:

  • We don't break existing features
  • OpenDAL can implement features that are needed by Vector.
  • We can release new features!

After the feature has been successfully released, we can try to migrate existing services one by one. Since the API provided by OpenDAL is elegant (yes, I'm proud of that!), the migration should be as easy as removing existing code.

@spencergilbert
Contributor Author

@Xuanwo it's more obvious to me how OpenDAL could be integrated on the sink side. Do you have any initial thoughts on how a source could leverage it? Today we're relying on notifications from SQS to know when to read from S3.

At a glance nothing like that's included in OpenDAL - so we'd still need to catch those and then use OpenDAL to pull the related blobs?

@Xuanwo
Contributor

Xuanwo commented Jan 31, 2023

> At a glance nothing like that's included in OpenDAL - so we'd still need to catch those and then use OpenDAL to pull the related blobs?

I'm afraid so. OpenDAL can't handle notification services like SQS.
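
The split described here, where notifications are handled separately and the storage layer is used only to fetch the referenced blobs, might look roughly like the sketch below. All names (`ObjectCreated`, `BlobReader`, `MemoryBucket`, `poll_once`) are hypothetical and chosen for illustration. A standard-library channel stands in for SQS and an in-memory map stands in for the bucket; none of this is OpenDAL's or Vector's actual API.

```rust
use std::collections::HashMap;
use std::sync::mpsc::Receiver;

/// Hypothetical "object created" notification, standing in for an SQS message.
pub struct ObjectCreated {
    pub key: String,
}

/// Hypothetical read-side storage interface (NOT OpenDAL's actual API).
pub trait BlobReader {
    fn read(&self, key: &str) -> Option<Vec<u8>>;
}

/// In-memory stand-in for a bucket.
pub struct MemoryBucket {
    pub objects: HashMap<String, Vec<u8>>,
}

impl BlobReader for MemoryBucket {
    fn read(&self, key: &str) -> Option<Vec<u8>> {
        self.objects.get(key).cloned()
    }
}

/// Drains the notifications that are currently pending and fetches each
/// referenced object through the storage interface. Notifications whose
/// object cannot be found are skipped in this sketch.
pub fn poll_once<R: BlobReader>(
    notifications: &Receiver<ObjectCreated>,
    reader: &R,
) -> Vec<(String, Vec<u8>)> {
    let mut fetched = Vec::new();
    while let Ok(note) = notifications.try_recv() {
        if let Some(body) = reader.read(&note.key) {
            fetched.push((note.key, body));
        }
    }
    fetched
}
```

In this arrangement the notification consumer stays service-specific, and only the `read` call would be delegated to a shared storage layer.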

@spencergilbert
Contributor Author

I think a good target for a spike would be an HDFS sink. Is that something you're interested in trying and contributing, @Xuanwo, or should I add this to our backlog to test when we get the time?

@Xuanwo
Contributor

Xuanwo commented Feb 3, 2023

> I think a good target for a spike would be an HDFS sink. Is that something you're interested in trying and contributing, @Xuanwo, or should I add this to our backlog to test when we get the time?

Cool, let's do this!

@spencergilbert
Contributor Author

We're trying to wrap up some guidance for contributing new sinks that should hopefully be useful to reference - #16070

@Xuanwo
Contributor

Xuanwo commented Feb 7, 2023

> We're trying to wrap up some guidance for contributing new sinks that should hopefully be useful to reference - #16070

A quick progress report: I'm working on this now, and I expect to submit a draft for review later this week!

@Xuanwo
Contributor

Xuanwo commented Feb 12, 2023

PR landed in #16399. Maybe we can add HDFS first and then add more features?

@Xuanwo
Contributor

Xuanwo commented Apr 2, 2023

The pull request #16557 has been merged for some time now. I am considering writing documentation or a post on how to implement a sink using OpenDAL, in order to encourage others to participate in the development process. Which approach do you prefer?

@spencergilbert
Contributor Author

Hey @Xuanwo, I'll discuss with @jszwedko this week. I think we'll want to check that we've got feature parity between OpenDAL-based sinks and our existing sinks, but I'll let you know ASAP.

Congrats on the Apache incubation!

@gaby

gaby commented May 19, 2023

Related to #3382
