Describe the bug
(from the mailing list)
Apparently, you can build a program that appears to write a Parquet file in parallel, but it currently produces corrupt Parquet data.
To Reproduce
Description in the email says:
I was attempting to build a single Parquet file from the batches in what I thought was a parallel manner using the ArrowWriter. I tried to "parallelise" the following serial code:
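```rust
use parquet::arrow::ArrowWriter;
use parquet::file::writer::InMemoryWriteableCursor;

// `schema` is the Arrow schema (a `SchemaRef`); `batches` yields `&RecordBatch` values.
let cursor = InMemoryWriteableCursor::default();
let mut writer = ArrowWriter::try_new(cursor.clone(), schema, None)?;
for batch in batches {
    writer.write(batch)?;
}
writer.close()?;
```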
I realised that although the compiler accepted my incorrect parallel version of this code, it was in fact not sound, which caused the corruption.
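The report does not include the parallel version, so what follows is only a hypothetical, std-only illustration of how code that shares one buffer between writers can type-check and still produce inconsistent output. The serial code passes `cursor.clone()` to the writer and keeps `cursor`, which suggests that clones share one buffer. `SharedCursor` and `OffsetTrackingWriter` are invented for this sketch. They mimic that behaviour and are not the actual internals of `InMemoryWriteableCursor` or `ArrowWriter`.

```rust
use std::io::{self, Write};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a cloneable in-memory cursor:
/// every clone appends to the SAME shared buffer.
#[derive(Clone, Default)]
struct SharedCursor {
    buffer: Arc<Mutex<Vec<u8>>>,
}

impl Write for SharedCursor {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.buffer.lock().unwrap().extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Hypothetical writer that records where it believes each chunk starts,
/// counting only the bytes it wrote itself (file-format writers typically
/// keep offsets like these for their metadata).
struct OffsetTrackingWriter<W: Write> {
    sink: W,
    bytes_written: u64,
    chunk_offsets: Vec<u64>,
}

impl<W: Write> OffsetTrackingWriter<W> {
    fn new(sink: W) -> Self {
        Self { sink, bytes_written: 0, chunk_offsets: Vec::new() }
    }

    fn write_chunk(&mut self, data: &[u8]) -> io::Result<()> {
        self.chunk_offsets.push(self.bytes_written);
        self.sink.write_all(data)?;
        self.bytes_written += data.len() as u64;
        Ok(())
    }
}

/// Two writers over clones of one cursor. Returns the shared bytes and
/// the offsets recorded by the first writer.
fn interleaved_demo() -> io::Result<(Vec<u8>, Vec<u64>)> {
    let cursor = SharedCursor::default();
    let mut a = OffsetTrackingWriter::new(cursor.clone());
    let mut b = OffsetTrackingWriter::new(cursor.clone());

    // Interleaved calls, as two threads sharing the cursor might produce.
    a.write_chunk(b"AAAA")?;
    b.write_chunk(b"BBBB")?;
    a.write_chunk(b"aaaa")?;

    let bytes = cursor.buffer.lock().unwrap().clone();
    Ok((bytes, a.chunk_offsets))
}

fn main() -> io::Result<()> {
    let (bytes, offsets) = interleaved_demo()?;
    // `a` believes its second chunk starts at offsets[1] == 4,
    // but bytes 4..8 actually hold `b`'s data.
    println!("buffer = {:?}", String::from_utf8_lossy(&bytes));
    println!("a's recorded offsets = {:?}", offsets);
    Ok(())
}
```

Every type here is `Send` and every call is memory-safe, so the compiler has nothing to reject. The corruption is a logical inconsistency between each writer's view of the output and the bytes actually in the buffer.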
Expected behavior
The API should not allow corrupted data to be produced; misuse like this should result in a compiler error instead.
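One general way an API can turn this kind of misuse into a compiler error is to have the writer take its sink by value and require `&mut self` for writes, so that no second handle to the same buffer can exist. The following is a minimal std-only sketch of that idea. `OwningWriter` is hypothetical and is not the parquet crate's API.

```rust
use std::io::{self, Write};

/// Hypothetical writer that takes ownership of its sink, so no other
/// writer can hold a handle to the same buffer.
struct OwningWriter<W: Write> {
    sink: W,
}

impl<W: Write> OwningWriter<W> {
    fn new(sink: W) -> Self {
        Self { sink }
    }

    /// `&mut self` means the borrow checker rejects unsynchronised
    /// concurrent calls on the same writer.
    fn write_batch(&mut self, data: &[u8]) -> io::Result<()> {
        self.sink.write_all(data)
    }

    /// Finish writing and hand the sink (e.g. the finished bytes) back.
    fn into_inner(mut self) -> io::Result<W> {
        self.sink.flush()?;
        Ok(self.sink)
    }
}

fn main() -> io::Result<()> {
    let buffer: Vec<u8> = Vec::new();
    let mut writer = OwningWriter::new(buffer);
    // let second = OwningWriter::new(buffer); // error[E0382]: use of moved value: `buffer`
    writer.write_batch(b"row group 1;")?;
    writer.write_batch(b"row group 2;")?;
    let bytes = writer.into_inner()?;
    println!("{}", String::from_utf8_lossy(&bytes));
    Ok(())
}
```

The trade-off is that callers can only get the output back through `into_inner`, after writing has finished. That ordering constraint is exactly what rules out a second, concurrent writer.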
Support for actually writing a Parquet file in parallel is tracked in #1718.
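Until then, one pattern that stays sound with a single-writer API is to parallelise only the work that produces each batch and let one writer perform every write, in order. The following is a std-only sketch of that split. `encode_batch` is a hypothetical placeholder for per-batch work (for example, computing the `RecordBatch`es), and `Vec<u8>` stands in for the output sink. With the parquet crate, the serial part would presumably be the `writer.write(batch)?` loop from the report.

```rust
use std::io::{self, Write};
use std::thread;

/// Hypothetical CPU-heavy step that turns one input into an encoded batch.
fn encode_batch(id: u8) -> Vec<u8> {
    vec![b'A' + id; 4]
}

/// Encode batches on scoped threads, then write them through ONE sink in order.
fn encode_in_parallel_write_serially<W: Write>(mut sink: W, ids: &[u8]) -> io::Result<W> {
    let encoded: Vec<Vec<u8>> = thread::scope(|s| {
        let handles: Vec<_> = ids
            .iter()
            .map(|&id| s.spawn(move || encode_batch(id)))
            .collect();
        // Joining in spawn order keeps the output order deterministic.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // Only this thread touches the sink, so writes can never interleave.
    for batch in &encoded {
        sink.write_all(batch)?;
    }
    sink.flush()?;
    Ok(sink)
}

fn main() -> io::Result<()> {
    let out = encode_in_parallel_write_serially(Vec::new(), &[0, 1, 2])?;
    println!("{}", String::from_utf8_lossy(&out));
    Ok(())
}
```

Because the sink is owned by one thread for the whole write phase, the compiler enforces the single-writer invariant without any locking.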
Additional context
Mailing list https://lists.apache.org/thread/rbhfwcpd6qfk52rtzm2t6mo3fhvdpc91