CONVERT TO DELTA as a pure delta-rs API #1682
Comments
Here are the PySpark APIs for that functionality:
Perhaps we could expose something similar in delta-rs?
Both Spark SQL and PySpark already offer a similar statement or API, but it makes sense to have the same API in delta-rs so that "convert to delta" can be invoked directly and cheaply in a non-Spark context (a microservice, a Lambda function, a Python script) without a running SQL Warehouse or a PySpark process. This is especially useful when services generate large numbers of Parquet files rather than CSV/JSON files: we could generate the Delta, Iceberg, and Hudi manifest metadata directly and quickly before converting and compacting those Parquet directories into a Delta table. I envision that this "convert to delta" function could also take an opt-in option to generate Iceberg and Hudi metadata. That would align well with the Uniform 3.0 standard.
Description
An API in delta-rs equivalent to
CONVERT TO DELTA parquet.`s3://my-bucket/parquet-data`;
and callable directly from Python. This is the counterpart of https://docs.delta.io/latest/api/python/index.html#delta.tables.DeltaTable.convertToDelta, but for delta-rs, without Spark.
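To make the request concrete, here is a rough, standard-library-only sketch of what a Spark-free conversion boils down to: writing a version-0 commit file under `_delta_log` that registers the Parquet files already in the directory. The function names `build_first_commit` and `convert_in_place` are hypothetical and are not delta-rs APIs. The sketch assumes a local directory, Hive-style partition folders, and a schema string supplied by the caller. A real implementation would read the schema from the Parquet footers, collect file statistics, and URL-encode paths.

```python
import json
import time
import uuid
from pathlib import Path


def build_first_commit(table_dir, schema_string, partition_columns=()):
    """Return the JSON lines of a version-0 Delta commit for existing Parquet files.

    schema_string is the table schema in Delta's JSON schema format. It is
    passed in by the caller here (an assumption of this sketch).
    """
    root = Path(table_dir)
    now_ms = int(time.time() * 1000)
    actions = [
        {"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}},
        {"metaData": {
            "id": str(uuid.uuid4()),
            "format": {"provider": "parquet", "options": {}},
            "schemaString": schema_string,
            "partitionColumns": list(partition_columns),
            "configuration": {},
            "createdTime": now_ms,
        }},
    ]
    for f in sorted(root.rglob("*.parquet")):
        rel = f.relative_to(root)
        # Hive-style directory names (col=value) become partitionValues.
        partition_values = dict(
            part.split("=", 1) for part in rel.parts[:-1] if "=" in part
        )
        stat = f.stat()
        actions.append({"add": {
            "path": rel.as_posix(),
            "partitionValues": partition_values,
            "size": stat.st_size,
            "modificationTime": int(stat.st_mtime * 1000),
            "dataChange": True,
        }})
    return [json.dumps(a) for a in actions]


def convert_in_place(table_dir, schema_string, partition_columns=()):
    """Write _delta_log/00000000000000000000.json next to the Parquet files."""
    lines = build_first_commit(table_dir, schema_string, partition_columns)
    log_dir = Path(table_dir) / "_delta_log"
    # Raises FileExistsError if the directory already has a Delta log.
    log_dir.mkdir(exist_ok=False)
    commit = log_dir / f"{0:020d}.json"
    commit.write_text("\n".join(lines) + "\n")
    return commit
```

No data files are rewritten: the conversion only adds metadata, which is why it is attractive for a Lambda function or a small Python script.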
Use Case
Suppose we have many Parquet directories that should be converted to Delta quickly and efficiently.
Second request: generating the Uniform 3.0 manifest via delta-rs would be great as well.
Related Issue(s)
#1041