Document parquet ArrowWriter type limitations #5875

Merged: 2 commits merged on Jun 15, 2024

@alamb (Contributor) commented Jun 12, 2024

Which issue does this PR close?

Closes #5849
Closes #5868

Rationale for this change

It was not clear that the parquet writer cannot write IntervalMonthDayNano data because Parquet has no way to represent it, rather than because support has simply not been implemented yet.

What changes are included in this PR?

Add documentation to ArrowWriter explaining which Arrow types it supports.

Are there any user-facing changes?

Docs only
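To make the limitation concrete: the Parquet INTERVAL logical type is a 12-byte FIXED_LEN_BYTE_ARRAY holding three little-endian unsigned 32-bit integers (months, days, milliseconds), while Arrow's IntervalMonthDayNano stores signed months and days (i32) plus nanoseconds (i64). The following is a minimal, std-only sketch, not the parquet crate's code: `MonthDayNano` and `to_parquet_interval` are hypothetical names, and the helper only checks whether a value could be packed into the Parquet layout without losing information.

```rust
/// Hypothetical mirror of Arrow's month-day-nano interval components
/// (Arrow stores months: i32, days: i32, nanoseconds: i64).
#[derive(Debug, Clone, Copy)]
struct MonthDayNano {
    months: i32,
    days: i32,
    nanos: i64,
}

/// Hypothetical helper: encode into the 12-byte Parquet INTERVAL layout
/// (three little-endian u32 values: months, days, milliseconds).
/// Returns None when the value cannot be represented exactly.
fn to_parquet_interval(v: MonthDayNano) -> Option<[u8; 12]> {
    // Parquet uses unsigned integers, so negative components are rejected.
    let months = u32::try_from(v.months).ok()?;
    let days = u32::try_from(v.days).ok()?;
    // Parquet only has millisecond precision: reject sub-millisecond parts.
    if v.nanos % 1_000_000 != 0 {
        return None;
    }
    // Negative or too-large millisecond counts do not fit in a u32.
    let millis = u32::try_from(v.nanos / 1_000_000).ok()?;

    let mut out = [0u8; 12];
    out[0..4].copy_from_slice(&months.to_le_bytes());
    out[4..8].copy_from_slice(&days.to_le_bytes());
    out[8..12].copy_from_slice(&millis.to_le_bytes());
    Some(out)
}

fn main() {
    let exact = MonthDayNano { months: 1, days: 2, nanos: 3_000_000 };
    let sub_milli = MonthDayNano { months: 1, days: 2, nanos: 1 };
    let negative = MonthDayNano { months: -1, days: 0, nanos: 0 };

    // prints Some([1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0])
    println!("{:?}", to_parquet_interval(exact));
    // prints None (1 ns is below millisecond precision)
    println!("{:?}", to_parquet_interval(sub_milli));
    // prints None (Parquet interval fields are unsigned)
    println!("{:?}", to_parquet_interval(negative));
}
```

Values with sub-millisecond nanoseconds, negative components, or more than `u32::MAX` milliseconds have no exact Parquet encoding, which is consistent with documenting IntervalMonthDayNano as unsupported rather than not yet implemented.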

github-actions bot added the parquet label (Changes to the parquet crate) Jun 12, 2024
@alamb (Contributor, Author) commented Jun 12, 2024

cc @xinlifoobar

/// The following are not supported:
///
/// * [`IntervalMonthDayNanoArray`]: Parquet does not [support nanosecond intervals].
///
Contributor

Date64 is also not supported IIRC

Contributor Author

I double-checked, and it seems Date64 can be written:

andrewlamb@Andrews-MacBook-Pro-2:~/Software/arrow-rs$ datafusion-cli -c 'select c1, arrow_typeof(c1) from "/tmp/test.parquet"'
DataFusion CLI v39.0.0
+---------------------+------------------------------------+
| c1                  | arrow_typeof(/tmp/test.parquet.c1) |
+---------------------+------------------------------------+
| 1970-01-01T00:00:00 | Date64                             |
| 1970-01-01T00:00:00 | Date64                             |
+---------------------+------------------------------------+
2 row(s) fetched.
Elapsed 0.005 seconds.
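
use std::fs::File;
use std::sync::Arc;

use arrow::array::Date64Array;
use arrow::record_batch::RecordBatch;
use arrow::util::pretty::pretty_format_batches;
use parquet::arrow::ArrowWriter;

fn main() {
    // Two Date64 values: milliseconds since the Unix epoch
    let input_array = Date64Array::from(vec![123, 456]);

    let batch = RecordBatch::try_from_iter(vec![("c1", Arc::new(input_array) as _)]).unwrap();

    println!("Input array: {}", pretty_format_batches(&[batch.clone()]).unwrap());

    // File::create_new fails if the file already exists, so remove any previous output
    if let Err(e) = std::fs::remove_file("/tmp/test.parquet") {
        println!("can't remove file: {e:?}");
    }

    // Write the batch to parquet with default writer properties
    // (the file is then read back with datafusion-cli, as shown above)
    let f = File::create_new("/tmp/test.parquet").unwrap();
    let schema = batch.schema();
    let props = None;
    let mut writer = ArrowWriter::try_new(f, schema, props).unwrap();

    writer.write(&batch).unwrap();
    // close() flushes buffered rows and writes the footer; its Result must be checked
    writer.close().unwrap();
}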
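For context on the output above: Arrow's Date64 counts milliseconds since the Unix epoch, whereas the Parquet DATE logical type is an INT32 count of days since the epoch. The sketch below uses a hypothetical helper, `date64_to_parquet_date` (std only), to show the millisecond-to-day arithmetic. Whether ArrowWriter applies exactly this conversion when writing Date64 may depend on the parquet crate version and writer options, so treat it as an illustration rather than the crate's implementation.

```rust
/// Milliseconds in one day.
const MILLIS_PER_DAY: i64 = 86_400_000;

/// Hypothetical helper: convert an Arrow Date64 value (milliseconds since
/// the Unix epoch) to a Parquet DATE value (i32 days since the Unix epoch).
/// Floor division (div_euclid) puts pre-epoch instants on the earlier day.
/// Returns None if the day count does not fit in an i32.
fn date64_to_parquet_date(millis: i64) -> Option<i32> {
    i32::try_from(millis.div_euclid(MILLIS_PER_DAY)).ok()
}

fn main() {
    // The two values written by the example program above.
    for millis in [123_i64, 456] {
        // prints "123 ms -> day Some(0)" and "456 ms -> day Some(0)"
        println!("{millis} ms -> day {:?}", date64_to_parquet_date(millis));
    }
    // -1 ms is the last millisecond of 1969-12-31, i.e. day -1.
    println!("{:?}", date64_to_parquet_date(-1)); // prints Some(-1)
}
```

Both 123 ms and 456 ms fall on day 0 (1970-01-01), the date shown for both rows in the DataFusion output; any sub-day component is not representable in a Parquet DATE column.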

alamb merged commit d89b3b9 into apache:master Jun 15, 2024
16 checks passed
alamb deleted the issue-5846 branch June 15, 2024 17:22

Successfully merging this pull request may close these issues.

Support writing IntervalMonthDayNanoArray to parquet via Arrow Writer
2 participants