Python write_deltalake: when will this be production ready? #715

Closed
kk921dbg opened this issue Jul 27, 2022 · 6 comments · Fixed by #1155
Labels
enhancement New feature or request

Comments

@kk921dbg

Description

Just wondering when Python write_deltalake will be ready for production use, as it says it's still experimental.

Use Case
We need this for use from an Azure App Service, as it will save us from having to deploy multiple resources.

Related Issue(s)

@kk921dbg kk921dbg added the enhancement New feature or request label Jul 27, 2022
@wjones127
Collaborator

Our plan is to have it production ready for read and append workloads by end of this year.

Two big things need to happen (both are tracked in other issues):

  1. Support writer protocol v2
  2. Support object stores such as S3 and Azure blob store (it only works on local file systems right now)

@vmesel

vmesel commented Sep 23, 2022

Hey @wjones127, how are you? Any updates on the S3 writing functionality? Should I use this repo as a production tool, or should I leave it and roll back to Databricks?

Right now I'm trying a pure Airflow Silver layer approach, but I need to be able to save the Delta table to S3. I already tried using s3fs and StringIO, but I get the following error:

     90 else:
     91     table = table_or_uri
---> 92     table_uri = table_uri = table._table.table_uri()
     94 # TODO: Pass through filesystem once it is complete
     95 # if filesystem is None:
     96 #    filesystem = pa_fs.PyFileSystem(DeltaStorageHandler(table_uri))
     98 if table:  # already exists

AttributeError: '_io.TextIOWrapper' object has no attribute '_table'
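
For readers hitting the same traceback: the frame shown suggests that anything other than a URI string is treated as a table object, so a file-like object fails on the `_table` lookup. Below is a minimal sketch of that dispatch, assuming the `if` branch not visible in the traceback is a string check. `resolve_table_uri`, `try_resolve`, and `FakeDeltaTable` are hypothetical names used only for illustration; they are not deltalake APIs.

```python
import io


class FakeDeltaTable:
    """Hypothetical stand-in for a table object; not the deltalake class."""

    class _Raw:
        def table_uri(self):
            return "file:///tmp/demo-table"

    def __init__(self):
        self._table = self._Raw()


def resolve_table_uri(table_or_uri):
    """Mimic the dispatch visible in the traceback (assumed and simplified)."""
    if isinstance(table_or_uri, str):
        # Assumption: a plain string is used directly as the table URI.
        return table_or_uri
    # Anything else is treated as a table object, as on line 92 of the traceback.
    return table_or_uri._table.table_uri()


def try_resolve(value):
    """Return the resolved URI, or the error text if resolution fails."""
    try:
        return resolve_table_uri(value)
    except AttributeError as exc:
        return f"AttributeError: {exc}"


print(try_resolve("s3://my-bucket/my-table"))  # a URI string passes through
print(try_resolve(FakeDeltaTable()))           # a table object is unwrapped
print(try_resolve(io.StringIO()))              # a file-like object fails
```

Under that assumption, passing the table location as a string (or an existing table object) avoids the error; a file handle such as `StringIO` or an open s3fs file does not fit either branch.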

@wjones127
Collaborator

> Should I use this repo as a production tool, or should I leave it and roll back to Databricks?

We've implemented support for S3 and other object stores. I'm still working on full support for writer protocol v2: #834

Once that's complete we'll need some real world user testing from folks like you before I'd call it fully production ready :)

We don't support the s3fs package in the current release, but will in our next release. You won't be able to pass a StringIO object though; instead, you will wrap the s3fs filesystem you configure in a PyArrow filesystem with FSSpecHandler.

Have you tried configuring S3 through the settings shown here (without s3fs)? That should work in the current release.
https://delta-io.github.io/delta-rs/python/usage.html#loading-a-delta-table
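
As a rough illustration of configuring S3 without s3fs, the sketch below gathers credentials from environment variables into a plain dictionary and reports any that are missing. The variable names are the standard AWS ones and are an assumption here; the linked usage page is the authority on which settings a given release reads. `collect_s3_settings` is a hypothetical helper, and the commented-out `storage_options` call is likewise an assumption about how the settings would be passed.

```python
import os

# Assumption: standard AWS variable names; confirm against the linked usage page.
S3_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION")


def collect_s3_settings(env=None):
    """Return (settings, missing): the S3 settings found and the names not set."""
    env = os.environ if env is None else env
    settings = {key: env[key] for key in S3_KEYS if env.get(key)}
    missing = [key for key in S3_KEYS if key not in settings]
    return settings, missing


settings, missing = collect_s3_settings()
if missing:
    print("Not set:", ", ".join(missing))

# Hypothetical usage, assuming deltalake is installed and the table exists:
# from deltalake import DeltaTable
# dt = DeltaTable("s3://my-bucket/my-table", storage_options=settings)
```

Reporting missing names instead of raising keeps the helper usable when credentials come from another source, such as an instance role.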

@tgilon

tgilon commented Jan 12, 2023

Hi @wjones127, thanks for the amazing job! Any news? I'm also considering using this repo to save production data in a Delta table on S3.

@wjones127
Collaborator

> Any news? I'm also considering using this repo to save production data in a Delta table on S3.

Right now I'd call it beta-supported. We've implemented writer protocol v2, and our support for authentication is pretty well tested now. What we need most now is users trying it out and reporting any issues in their systems. We have plenty of tests here, but the real world is always more complex :) I will drop the experimental label in the next release or perhaps the one after.

@madsenwattiq

That’s great. When will we see the next Python release to PyPI? That release should, given recent merges, incorporate a key bug fix (#685) for anybody doing time series data. I’m going to start beta testing immediately, as soon as I can build containers from a PyPI package. Thank you!

wjones127 added a commit that referenced this issue Mar 3, 2023
# Description
The description of the main changes of your pull request

# Related Issue(s)

- closes #715
- closes #373


# Documentation

<!---
Share links to useful documentation
--->

---------

Co-authored-by: Robert Pack <42610831+roeap@users.noreply.github.com>
chitralverma pushed a commit to chitralverma/delta-rs that referenced this issue Mar 17, 2023
5 participants