Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Debug impls for ArrowWriter and SerializedFileWriter #4278

Merged
merged 2 commits into from
May 25, 2023

Conversation

alamb
Copy link
Contributor

@alamb alamb commented May 24, 2023

This is an alternate, more targeted version of #4276

Which issue does this PR close?

N/A

Rationale for this change

This caught me when working on https://github.com/influxdata/influxdb_iox/pull/7855

I had to manually derive a Debug impl for a structure because the ArrowWriter didn't

What changes are included in this PR?

  1. Add Debug impls for ArrowWriter and SerializedFileWriter

Are there any user-facing changes?

More Debug!

@github-actions github-actions bot added the parquet Changes to the parquet crate label May 24, 2023
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, left two minor comments

@@ -92,6 +93,31 @@ pub struct ArrowWriter<W: Write> {
max_row_group_size: usize,
}

impl<W: Write> Debug for ArrowWriter<W> {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
let mut buffered_batches = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think buffered_batches is just self.buffer.len()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -147,6 +148,18 @@ pub struct SerializedFileWriter<W: Write> {
kv_metadatas: Vec<KeyValue>,
}

impl<W: Write> Debug for SerializedFileWriter<W> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW #[derive(Debug)] on TrackedWrite would allow similar here, without needing a manual impl. SerializedFileWriter doesn't contain any non-public types on it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to do this with

diff --git a/parquet/src/file/writer.rs b/parquet/src/file/writer.rs
index 93e8319b0..3ab50fb4b 100644
--- a/parquet/src/file/writer.rs
+++ b/parquet/src/file/writer.rs
@@ -48,6 +48,7 @@ use crate::schema::types::{
 /// A wrapper around a [`Write`] that keeps track of the number
 /// of bytes that have been written. The given [`Write`] is wrapped
 /// with a [`BufWriter`] to optimize writing performance.
+#[derive(Debug)]
 pub struct TrackedWrite<W: Write> {
     inner: BufWriter<W>,
     bytes_written: usize,
@@ -134,6 +135,7 @@ pub type OnCloseRowGroup<'a> = Box<
 /// - Once finished writing row group, close row group writer by calling `close`
 /// - Write subsequent row groups, if necessary.
 /// - After all row groups have been written, close the file writer using `close` method.
+#[derive(Debug)]
 pub struct SerializedFileWriter<W: Write> {
     buf: TrackedWrite<W>,
     schema: TypePtr,
@@ -148,17 +150,6 @@ pub struct SerializedFileWriter<W: Write> {
     kv_metadatas: Vec<KeyValue>,
 }
 
-impl<W: Write> Debug for SerializedFileWriter<W> {
-    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
-        // implement Debug so this can be used with #[derive(Debug)]
-        // in client code rather than actually listing all the fields
-        f.debug_struct("SerializedFileWriter")
-            .field("descr", &self.descr)
-            .field("row_group_index", &self.row_group_index)
-            .field("kv_metadatas", &self.kv_metadatas)
-            .finish_non_exhaustive()
-    }
-}
 
 impl<W: Write> SerializedFileWriter<W> {
     /// Creates new file writer.

However, the compiler is complains like this ( I think it is saying I need to restrict the bound on SerializedFileWriter which I would prefer not to do as it would be backwards incompatible)

    Checking parquet v40.0.0 (/Users/alamb/Software/arrow-rs/parquet)
error[E0277]: `W` doesn't implement `std::fmt::Debug`
   --> parquet/src/arrow/arrow_writer/mod.rs:108:30
    |
108 |             .field("writer", &self.writer)
    |                              ^^^^^^^^^^^^ `W` cannot be formatted using `{:?}` because it doesn't implement `std::fmt::Debug`
    |
note: required for `file::writer::SerializedFileWriter<W>` to implement `std::fmt::Debug`
   --> parquet/src/file/writer.rs:138:10
    |
138 | #[derive(Debug)]
    |          ^^^^^ unsatisfied trait bound introduced in this `derive` macro
    = note: required for the cast from `file::writer::SerializedFileWriter<W>` to the object type `dyn std::fmt::Debug`
    = note: this error originates in the derive macro `Debug` (in Nightly builds, run with -Z macro-backtrace for more info)
help: consider further restricting this bound
    |
96  | impl<W: Write + std::fmt::Debug> Debug for ArrowWriter<W> {
    |               +++++++++++++++++

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aah, yeah makes sense

fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
// implement Debug so this can be used with #[derive(Debug)]
// in client code rather than actually listing all the fields
f.debug_struct("SerializedFileWriter<W>")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it meaningful to have W in debug? No much info from it. Looks ArrowWriter doesn't have it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@tustvold tustvold merged commit 17ca4d5 into apache:master May 25, 2023
@tustvold
Copy link
Contributor

Thank you for this

@alamb alamb deleted the alamb/debug2 branch May 25, 2023 12:00
alamb added a commit to alamb/arrow-rs that referenced this pull request May 30, 2023
…e#4278)

* Add `Debug` impls for writers

* Improve display
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants