Add support to write deephaven tables to iceberg #6125

Open
malhotrashivam opened this issue Sep 25, 2024 · 1 comment · May be fixed by #5989

Comments

@malhotrashivam
Contributor

The spec is being developed alongside the implementation, but at a high level, the agreed-upon APIs look like:

  1. void append(Tables...) (or appendDataFiles/appendTables): writes tables to data files 1:1, then commits a transaction that adds the new data files
  2. void overwrite(Tables...): writes tables to data files 1:1, then commits a transaction that removes all existing data files and adds the new ones
  3. List<URI> write(Tables...): writes tables to data files 1:1 without committing anything in a transaction
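To make the difference between the three operations concrete, here is a minimal in-memory sketch. It is not the Deephaven or Iceberg API: `TableStub`, `IcebergWriterSketch`, `InMemoryIcebergWriter`, `committedFiles`, and the file paths are all hypothetical names invented for illustration, and it assumes that `write` returns the URIs of the data files it produced.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Deephaven table; only a name is needed here.
final class TableStub {
    final String name;

    TableStub(String name) {
        this.name = name;
    }
}

// Hypothetical interface mirroring the three operations from the list above.
interface IcebergWriterSketch {
    void append(TableStub... tables);

    void overwrite(TableStub... tables);

    List<URI> write(TableStub... tables);
}

// In-memory model: a "data file" is just a URI, and `committed` stands in
// for the set of data files referenced by the Iceberg table.
final class InMemoryIcebergWriter implements IcebergWriterSketch {
    private final List<URI> committed = new ArrayList<>();
    private int counter = 0;

    // Writes one data file per table and commits nothing.
    @Override
    public List<URI> write(TableStub... tables) {
        List<URI> files = new ArrayList<>();
        for (TableStub table : tables) {
            // Made-up path layout, for illustration only.
            files.add(URI.create("file:///warehouse/data/" + table.name + "-" + counter++ + ".parquet"));
        }
        return files;
    }

    // Writes the data files, then adds them to the committed set.
    @Override
    public void append(TableStub... tables) {
        committed.addAll(write(tables));
    }

    // Writes the data files, then replaces the committed set with them.
    @Override
    public void overwrite(TableStub... tables) {
        List<URI> fresh = write(tables);
        committed.clear();
        committed.addAll(fresh);
    }

    List<URI> committedFiles() {
        return List.copyOf(committed);
    }
}
```

In this model, `append` and `overwrite` are both built on `write` and differ only in what the final commit does to the existing set of data files.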

An important requirement is to persist each Iceberg schema field's field-id into the field_id of the corresponding Parquet schema Type, so that Iceberg columns can be mapped to Parquet columns.
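The sketch below illustrates why the field-id matters: if the id is stored with each Parquet column, a reader can resolve Iceberg columns by id even when names differ (for example, after a rename). The classes `IcebergFieldSketch`, `ParquetColumnSketch`, and `FieldIdResolver` are hypothetical simplifications, not the Iceberg or Parquet library types.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of an Iceberg schema field.
final class IcebergFieldSketch {
    final int fieldId;
    final String name;

    IcebergFieldSketch(int fieldId, String name) {
        this.fieldId = fieldId;
        this.name = name;
    }
}

// Hypothetical, simplified model of a Parquet column; fieldId is null when
// the writer did not record an id.
final class ParquetColumnSketch {
    final String name;
    final Integer fieldId;

    ParquetColumnSketch(String name, Integer fieldId) {
        this.name = name;
        this.fieldId = fieldId;
    }
}

final class FieldIdResolver {
    // Maps Iceberg column name -> Parquet column name by matching field ids.
    // Iceberg fields with no matching Parquet column are left out.
    static Map<String, String> resolve(List<IcebergFieldSketch> icebergFields,
                                       List<ParquetColumnSketch> parquetColumns) {
        Map<Integer, String> parquetNameById = new HashMap<>();
        for (ParquetColumnSketch column : parquetColumns) {
            if (column.fieldId != null) {
                parquetNameById.put(column.fieldId, column.name);
            }
        }
        Map<String, String> resolved = new LinkedHashMap<>();
        for (IcebergFieldSketch field : icebergFields) {
            String parquetName = parquetNameById.get(field.fieldId);
            if (parquetName != null) {
                resolved.put(field.name, parquetName);
            }
        }
        return resolved;
    }
}
```

Without a stored id, the only option left would be matching by name, which stops working as soon as a column is renamed on the Iceberg side.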

@malhotrashivam added the feature request, parquet, and iceberg labels Sep 25, 2024
@malhotrashivam added this to the 0.37.0 milestone Sep 25, 2024
@malhotrashivam self-assigned this Sep 25, 2024
@devinrsmith
Member

We should also check whether there is any specific guidance on the metadata we should be writing. When writing a pyarrow table using pyiceberg, we noticed that the metadata key ARROW:schema contains the Arrow schema; pyspark, by contrast, wrote a metadata key iceberg.schema that contains the Iceberg schema.
