Add example of converting RecordBatches to JSON objects #5364
@@ -74,7 +74,35 @@
//! [`LineDelimitedWriter`] and [`ArrayWriter`] will omit writing keys with null values.
//! In order to explicitly write null values for keys, configure a custom [`Writer`] by
//! using a [`WriterBuilder`] to construct a [`Writer`].
//!
//! ## Writing to [serde_json] JSON Objects
//!
//! To serialize [`RecordBatch`]es into an array of
//! [JSON](https://docs.serde.rs/serde_json/) objects, use the [RawValue] API.
//!
//! [RawValue]: https://docs.rs/serde_json/latest/serde_json/value/struct.RawValue.html
//!
//! ```
//! # use std::sync::Arc;
//! # use arrow_array::{Int32Array, RecordBatch};
//! # use arrow_schema::{DataType, Field, Schema};
//! # use serde_json::{Map, Value};
//!
//! let schema = Schema::new(vec![Field::new("a", DataType::Int32, false)]);
//! let a = Int32Array::from(vec![1, 2, 3]);
//! let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(a)]).unwrap();
//!
//! let json_rows: Vec<Map<String, Value>> = todo!("How do we do this?");
@tustvold can you help / point me at code that does what you are thinking of so I can update the example? I couldn't immediately see how to apply the suggestion you are making.

You can "parse" a serialized JSON string into a RawValue; this allows embedding it into existing serde flows without paying additional decoding overheads. There is no way to obtain a

Got it -- I will try and update the example to show reparsing the string to a JSON value with a note about performance.

Given I am still very confused about how the RawValue API fits in here (perhaps because, as you hint, there is no clear use case), I am going to remove mention from the docs to avoid confusion. I wonder if people potentially were using the
Maybe we can point them to the https://crates.io/crates/serde_arrow crate for that use case 🤔

Say you have a larger JSON document you want to embed the arrow data into, you could parse into RawValue in order to embed it. That's the major use case I can think of.
I guess we shall find out 😅

I made a PR to serde_arrow with an example of how to use that crate to make arrow arrays out of Rust structs: chmp/serde_arrow#131. So now I feel quite good about directing people there ❤️
//! // let json_rows = arrow_json::writer::record_batches_to_json_rows(&[&batch]).unwrap();
//! assert_eq!(
//!     serde_json::Value::Object(json_rows[1].clone()),
//!     serde_json::json!({"a": 2}),
//! );
//! ```
//!
//!
//!
//!
mod encoder;
use std::iter;
I am not sure how much value this example has, to be honest, other than to demonstrate feature parity with previous releases
I similarly am not immensely convinced of its utility
FWIW I did put this example at the end of the docs, so hopefully it is minimally confusing