Destination S3: add delta lake/delta table support #16322

Open
mustafa-rmd opened this issue Sep 4, 2022 · 9 comments
Labels
area/connectors (Connector related issues), community, connectors/destination/s3, frozen (Not being actively worked on), lang/java, team/destinations (Destinations team's backlog), type/enhancement (New feature or request)

Comments

@mustafa-rmd

My current requirement is the following data pipeline:
PostgreSQL (Source)
Airbyte
MinIO - S3 storage (Destination)
Apache Spark, configured with MinIO and the Delta Lake format, since Spark on its own doesn’t support ACID transactions.

The goal is to have Airbyte move data from PostgreSQL (Source) to MinIO storage (Destination) and save it in Delta format. Spark will then read the data from S3, expecting it to be in Delta format.

My main issue is with the output format of the Airbyte S3 connector. Currently it only supports 3 output formats: CSV, Avro and JSON Lines (JSONL).

What is the recommended way to solve this problem? I think many companies are trying to build this kind of data pipeline.
Is there a plan to release this feature in an upcoming release?
Should we implement this feature ourselves? If so, is there good documentation on how to get started?
Or is there another way of going about it?
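
For readers weighing these options, it may help to see what "saved in Delta format" means on object storage. A Delta table is a directory of Parquet data files plus a `_delta_log` folder of JSON commit files, so a destination has to write a transaction log as well as the data. The sketch below only builds the text of a first commit, using action names from the Delta transaction log protocol. The function names, the sample schema, the file name `part-00000.snappy.parquet`, and the size and timestamp are assumptions made for illustration. It does not write Parquet data and is not how Airbyte or Delta Lake implement this.

```python
import json
import uuid


def commit_file_name(version: int) -> str:
    """Commit files are named by table version, zero-padded to 20 digits."""
    return f"_delta_log/{version:020d}.json"


def first_commit(schema_fields, data_files, created_ms):
    """Build the newline-delimited JSON actions for version 0 of a table.

    schema_fields: list of Spark-style field dicts (illustrative input).
    data_files: list of (relative_path, size_in_bytes) tuples.
    created_ms: timestamp in epoch milliseconds.
    """
    schema = {"type": "struct", "fields": schema_fields}
    actions = [
        # Minimum protocol versions a reader/writer must support.
        {"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}},
        # Table metadata; the schema is stored as a JSON string.
        {"metaData": {
            "id": str(uuid.uuid4()),
            "format": {"provider": "parquet", "options": {}},
            "schemaString": json.dumps(schema),
            "partitionColumns": [],
            "configuration": {},
            "createdTime": created_ms,
        }},
    ]
    # One "add" action per Parquet file that belongs to this version.
    for path, size in data_files:
        actions.append({"add": {
            "path": path,
            "partitionValues": {},
            "size": size,
            "modificationTime": created_ms,
            "dataChange": True,
        }})
    return "\n".join(json.dumps(action) for action in actions)


# Hypothetical two-column table and a single data file.
fields = [
    {"name": "id", "type": "long", "nullable": False, "metadata": {}},
    {"name": "name", "type": "string", "nullable": True, "metadata": {}},
]
body = first_commit(
    fields,
    [("part-00000.snappy.parquet", 1024)],
    created_ms=1662249600000,  # example timestamp in epoch milliseconds
)
print(commit_file_name(0))
print(body)
```

A real writer would also have to produce the Parquet files themselves and make each commit file appear atomically, which is the harder part on S3-compatible storage.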

Thanks,

@natalyjazzviolin
Contributor

natalyjazzviolin commented Sep 6, 2022

Hi @mustafa-rmd , could you please edit your request to follow our feature request template? This will ensure all details are understood clearly. I've copied it below. Thank you!

Tell us about the problem you're trying to solve

What are you trying to do, and why is it hard? A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you’d like

A clear and concise description of what you want to see happen, or the change you would like to see

Describe the alternative you’ve considered or used

A clear and concise description of any alternative solutions or features you've considered or are using today.

Additional context

Add any other context or screenshots about the feature request here.

Are you willing to submit a PR?

Remove this with your answer :-)

@natalyjazzviolin changed the title from "S3 destination saved format as delta lake (delta tables)" to "S3 destination: add delta lake/delta table support" on Sep 6, 2022
@natalyjazzviolin changed the title from "S3 destination: add delta lake/delta table support" to "Destination S3: add delta lake/delta table support" on Sep 6, 2022
@natalyjazzviolin removed the team/tse (Technical Support Engineers) label on Sep 6, 2022
@grishick added the team/destinations (Destinations team's backlog) label on Sep 27, 2022
@mustafa-rmd
Author

mustafa-rmd commented Sep 27, 2022

Problem

Delta Lake (Delta table) is an essential format for many pipeline architectures, especially ones that use Apache Spark.

Solution

I would like the Delta format to be added alongside Apache Avro, JSON, etc.

Describe the alternative you’ve considered or used

No alternatives

Additional context

When choosing a destination format I would like to see Delta format as one of the options
image

Are you willing to submit a PR?

Yes

@misteryeo
Contributor

@dennyglee Noted in your discussion that you're adding this to your roadmap. Just wanted to confirm that you're planning to contribute here?

@dennyglee

@misteryeo Yes, we are planning to contribute here - it may or may not be me personally, but feel free to ping me on this until we figure this out :)

@wkargul

wkargul commented Jul 3, 2023

Hey @dennyglee is there any update on that?

@seunggs

seunggs commented Jul 30, 2023

@dennyglee @mustafa-rmd Any updates on this by any chance?

@herry13

herry13 commented Sep 23, 2023

Hi @dennyglee @mustafa-rmd, any updates on this feature request? I am using Airbyte and Delta Lake in production, so I would love to see this destination connector become available as soon as possible. I'm willing to lend a hand if needed.

@NatElkins

NatElkins commented Sep 26, 2023

Just want to chime in that I'm also interested in this!

Edited to add that I'm interested in writing a delta table to S3. I'm not sure I'll end up making a PR for this, but for anyone else who wants the same thing it looks like a PR would have to be made here: https://github.com/airbytehq/airbyte/tree/0e9fdba1181b2d302b81a057f6fa16a198925eaa/airbyte-integrations/bases/base-java-s3/src/main/java/io/airbyte/integrations/destination/s3

You'd also have to make a PR here: https://github.com/airbytehq/airbyte/blob/0e9fdba1181b2d302b81a057f6fa16a198925eaa/airbyte-integrations/connectors/destination-s3/src/main/resources/spec.json
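
As a rough illustration of the `spec.json` side of such a PR, here is a sketch that assumes the spec declares output formats as `oneOf` entries under `connectionSpecification.properties.format`, each identified by a `format_type` enum. Check that against the linked file before relying on it. The `Delta` option, its title, the helper `add_format_option`, and the trimmed-down stand-in spec are all hypothetical. No such option exists in the connector.

```python
import json

# Hypothetical spec entry: "Delta" is NOT an existing format option.
DELTA_FORMAT_OPTION = {
    "title": "Delta: Delta Lake table",
    "required": ["format_type"],
    "properties": {
        "format_type": {
            "title": "Format Type",
            "type": "string",
            "enum": ["Delta"],
            "default": "Delta",
        },
    },
}


def add_format_option(spec: dict, option: dict) -> dict:
    """Return a copy of spec with option appended to the format oneOf list."""
    updated = json.loads(json.dumps(spec))  # deep copy via JSON round trip
    formats = updated["connectionSpecification"]["properties"]["format"]["oneOf"]
    existing = {f["properties"]["format_type"]["enum"][0] for f in formats}
    new_type = option["properties"]["format_type"]["enum"][0]
    if new_type in existing:
        raise ValueError(f"format_type {new_type!r} is already declared")
    formats.append(option)
    return updated


# Trimmed-down stand-in for the real spec.json (assumed shape, one format only).
stand_in_spec = {
    "connectionSpecification": {
        "properties": {
            "format": {
                "title": "Output Format",
                "type": "object",
                "oneOf": [
                    {
                        "title": "JSON Lines: Newline-delimited JSON",
                        "required": ["format_type"],
                        "properties": {
                            "format_type": {
                                "type": "string",
                                "enum": ["JSONL"],
                                "default": "JSONL",
                            },
                        },
                    },
                ],
            },
        },
    },
}

new_spec = add_format_option(stand_in_spec, DELTA_FORMAT_OPTION)
declared = [
    f["properties"]["format_type"]["enum"][0]
    for f in new_spec["connectionSpecification"]["properties"]["format"]["oneOf"]
]
print(declared)
```

The spec change only exposes the option in the UI. The writer under the `base-java-s3` path linked above would still need the actual Delta implementation.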

@arorapankaj

Do we have any update on this feature request?

@bleonard added the frozen (Not being actively worked on) label on Mar 22, 2024