ARROW-11279: [Rust][Parquet] ArrowWriter Definition Levels Memory Usage
Writes each column's leaves immediately after calculating its array levels, so only one column's levels are held in memory at a time instead of the levels for every column in the row group.

Closes #9222 from TurnOfACard/parquet-memory

Authored-by: Ryan Jennings <ryan@ryanj.net>
Signed-off-by: Neville Dipale <nevilledips@gmail.com>
Ryan Jennings authored and nevi-me committed Jan 20, 2021
1 parent 555643a commit e7c69e6
Showing 1 changed file with 3 additions and 17 deletions.
20 changes: 3 additions & 17 deletions rust/parquet/src/arrow/arrow_writer.rs
@@ -86,25 +86,11 @@ impl<W: 'static + ParquetWriter> ArrowWriter<W> {
             ));
         }
         // compute the definition and repetition levels of the batch
-        let mut levels = vec![];
         let batch_level = LevelInfo::new_from_batch(batch);
-        batch
-            .columns()
-            .iter()
-            .zip(batch.schema().fields())
-            .for_each(|(array, field)| {
-                let mut array_levels =
-                    batch_level.calculate_array_levels(array, field, 1);
-                levels.append(&mut array_levels);
-            });
-        // reverse levels so we can use Vec::pop(&mut self)
-        levels.reverse();
-
         let mut row_group_writer = self.writer.next_row_group()?;
-
-        // write leaves
-        for column in batch.columns() {
-            write_leaves(&mut row_group_writer, column, &mut levels)?;
+        for (array, field) in batch.columns().iter().zip(batch.schema().fields()) {
+            let mut levels = batch_level.calculate_array_levels(array, field, 1);
+            write_leaves(&mut row_group_writer, array, &mut levels)?;
         }

         self.writer.close_row_group(row_group_writer)
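
The effect of the change can be illustrated with a small standalone sketch. It does not use the parquet crate: `compute_levels` and `write_column` are hypothetical stand-ins for `calculate_array_levels` and `write_leaves`, each column is assumed to have a single leaf, and "peak" simply counts how many level values are alive at once. Under those assumptions, both strategies write the same data, but the buffered one holds levels for every column while the streaming one holds levels for one column at a time.

```rust
/// Hypothetical stand-in for level calculation: one definition level per row
/// (1 = value present, 0 = null). Not the parquet crate's implementation.
fn compute_levels(column: &[Option<i32>]) -> Vec<i16> {
    column.iter().map(|v| if v.is_some() { 1 } else { 0 }).collect()
}

/// Hypothetical stand-in for writing a column: records (row count, non-null count).
fn write_column(sink: &mut Vec<(usize, usize)>, column: &[Option<i32>], levels: &[i16]) {
    let non_null = levels.iter().filter(|&&level| level == 1).count();
    sink.push((column.len(), non_null));
}

/// Old shape: compute levels for every column first, then write.
/// Returns what was written and the peak number of buffered level values.
fn write_buffered(columns: &[Vec<Option<i32>>]) -> (Vec<(usize, usize)>, usize) {
    let all_levels: Vec<Vec<i16>> = columns.iter().map(|c| compute_levels(c)).collect();
    // Every column's levels are alive at this point.
    let peak: usize = all_levels.iter().map(Vec::len).sum();
    let mut sink = Vec::new();
    for (column, levels) in columns.iter().zip(&all_levels) {
        write_column(&mut sink, column, levels);
    }
    (sink, peak)
}

/// New shape: compute one column's levels, write it, then drop the levels.
fn write_streaming(columns: &[Vec<Option<i32>>]) -> (Vec<(usize, usize)>, usize) {
    let mut sink = Vec::new();
    let mut peak = 0;
    for column in columns {
        let levels = compute_levels(column);
        peak = peak.max(levels.len());
        write_column(&mut sink, column, &levels);
        // `levels` goes out of scope here, before the next column is processed.
    }
    (sink, peak)
}

fn main() {
    let columns = vec![
        vec![Some(1), None, Some(3), Some(4)],
        vec![None, None, Some(7), Some(8)],
        vec![Some(9), Some(10), Some(11), None],
    ];
    let (buffered, buffered_peak) = write_buffered(&columns);
    let (streaming, streaming_peak) = write_streaming(&columns);
    assert_eq!(buffered, streaming);
    // prints "buffered peak = 12, streaming peak = 4"
    println!("buffered peak = {}, streaming peak = {}", buffered_peak, streaming_peak);
}
```

In this toy model the buffered peak grows with the number of columns times the number of rows, while the streaming peak grows only with the number of rows. The streaming loop also no longer needs the reverse-then-pop bookkeeping that the removed code used to hand levels to the writer in order.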