[Java] Unexpected RecordBatch length when saving empty table to file with compression #194

Open
DrChainsaw opened this issue May 17, 2023 · 0 comments
Labels
Type: bug Something isn't working

Describe the bug, including details regarding any error messages, version, and platform.

This might be more of a usage question, since I couldn't find anything in the format docs on how the length field should be set when compression is used.

The issue is that if I try to read an empty table with the Julia extension, it just hangs. The reason seems to be that the Julia reader only checks the length field in the RecordBatch message when deciding whether to attempt to decode a buffer, not the uncompressed length read from the first 8 bytes of the buffer data.
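To make the distinction concrete, here is a minimal sketch of the per-buffer check I mean, based on my reading of the IPC format: each compressed buffer body is (as I understand it) prefixed with an int64 little-endian uncompressed length, with -1 signalling that the buffer was left uncompressed. The class and method names are mine, not from any implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BufferLengthPrefix {
    // Read the int64 little-endian uncompressed-length prefix from the first
    // 8 bytes of a compressed buffer body (assumption: this is the prefix the
    // IPC format mandates when body compression is enabled; -1 means the
    // buffer is stored uncompressed).
    static long uncompressedLength(byte[] bufferBody) {
        return ByteBuffer.wrap(bufferBody, 0, 8)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getLong();
    }

    public static void main(String[] args) {
        // A body that is only the 8-byte prefix with value 0: nothing
        // follows the prefix, so there is nothing to decompress.
        byte[] emptyBody = ByteBuffer.allocate(8)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(0L)
                .array();
        System.out.println(uncompressedLength(emptyBody)); // prints 0
    }
}
```

A reader that only consults the RecordBatch length field and skips this prefix check would, as far as I can tell, try to decompress a zero-length payload.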

The file created by the code below is readable by both pyarrow and the Java implementation, so chances are that the Julia implementation is doing it wrong (I will open an issue there as well). Is there some reference for how the length field in a RecordBatch should be interpreted when compression is used?

Code to create an empty table, in case I'm doing something wrong:
import static java.util.Arrays.asList;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.arrow.compression.CommonsCompressionFactory;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.compression.CompressionUtil;
import org.apache.arrow.vector.ipc.ArrowFileWriter;
import org.apache.arrow.vector.ipc.message.IpcOption;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
import org.apache.arrow.vector.types.pojo.Schema;

public class WriteEmptyCompressedTable {
    public static void main(String[] args) {
        try (BufferAllocator allocator = new RootAllocator()) {
            Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
            Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
            Schema schemaPerson = new Schema(asList(name, age));
            try (
                    VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, allocator)
            ) {
                vectorSchemaRoot.allocateNew(); // Needed?
                vectorSchemaRoot.setRowCount(0); // Needed?
                File file = new File("randon_access_to_file.arrow");
                try (
                        FileOutputStream fileOutputStream = new FileOutputStream(file);
                        ArrowFileWriter writer = new ArrowFileWriter(vectorSchemaRoot, null, fileOutputStream.getChannel(),
                                null, IpcOption.DEFAULT,
                                CommonsCompressionFactory.INSTANCE, CompressionUtil.CodecType.ZSTD)
                ) {
                    writer.start();
                    writer.writeBatch();
                    writer.end();
                    System.out.println("Record batches written: " + writer.getRecordBlocks().size()
                            + ". Number of rows written: " + vectorSchemaRoot.getRowCount());
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}

When I tried saving a compressed empty table using pyarrow I got 0 as the length field and the Julia implementation could read the table without hanging.

Disclaimer: I don't have a working Python installation, so I did this through PythonCall. Hopefully I managed to remove all the Julia-isms so that it runs in Python:

import pyarrow as pa

schema = pa.schema([pa.field('nums', pa.int32())])

with pa.OSFile('bigfile.arrow', 'wb') as sink:
    with pa.ipc.new_file(sink, schema, options=pa.ipc.IpcWriteOptions(compression='zstd')) as writer:
        batch = pa.record_batch([pa.array([], type=pa.int32())], schema)
        writer.write(batch)

Component(s)

Java

@DrChainsaw DrChainsaw added the Type: bug Something isn't working label May 17, 2023
@kou kou changed the title Unexpected RecordBatch length when saving empty table to file with compression [Java] Unexpected RecordBatch length when saving empty table to file with compression May 17, 2023
@assignUser assignUser transferred this issue from apache/arrow Nov 26, 2024