While writing tests (in both IOx and DataFusion) where I need a single `RecordBatch`, I often find myself doing something like this:
```rust
use std::sync::Arc;

use arrow::array::{ArrayRef, Float64Array, Int64Array};
use arrow::datatypes::{DataType as ArrowDataType, Field as ArrowField, Schema};
use arrow::record_batch::RecordBatch;

let schema = Arc::new(Schema::new(vec![
    ArrowField::new("float_field", ArrowDataType::Float64, true),
    ArrowField::new("time", ArrowDataType::Int64, true),
]));
let float_array: ArrayRef = Arc::new(Float64Array::from(vec![10.1, 20.1, 30.1, 40.1]));
let timestamp_array: ArrayRef = Arc::new(Int64Array::from(vec![1000, 2000, 3000, 4000]));
let batch = RecordBatch::try_new(schema, vec![float_array, timestamp_array])
    .expect("created new record batch");
```
This is annoying because the fact that `float_field` is a float is encoded twice: once in the `Schema` and again in the `Float64Array`.

I would much rather be able to construct `RecordBatch`es in a builder style, to avoid the redundancy and reduce the amount of typing:
```rust
// `RecordBatch::empty()` and `append` are the methods proposed below.
let float_array: ArrayRef = Arc::new(Float64Array::from(vec![10.1, 20.1, 30.1, 40.1]));
let timestamp_array: ArrayRef = Arc::new(Int64Array::from(vec![1000, 2000, 3000, 4000]));
let batch = RecordBatch::empty()
    .append("float_field", float_array).unwrap()
    .append("time", timestamp_array).unwrap();
```
The proposal is to add an `append` method to `RecordBatch` that takes a `field_name` and `field_values`. It would append a field named `field_name` to the current schema, returning an error if `field_name` is already present. The nullability of the new field would be set based on the actual null count of `field_values`.

Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-12411
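To make the proposed `append` semantics concrete, here is a small, self-contained sketch. It deliberately does not use the arrow crate: `Column`, `Field`, `Batch` and `AppendError` are hypothetical stand-ins for Arrow arrays, fields and `RecordBatch`. The data type is taken from the column itself, a duplicate `field_name` is rejected, and a field is marked nullable only if its values contain a null. The length check is an extra assumption (a `RecordBatch` requires all columns to have the same number of rows), not part of the proposal above.

```rust
/// Hypothetical stand-in for an Arrow array: a typed column of nullable values.
#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    Float64(Vec<Option<f64>>),
    Int64(Vec<Option<i64>>),
}

impl Column {
    fn len(&self) -> usize {
        match self {
            Column::Float64(v) => v.len(),
            Column::Int64(v) => v.len(),
        }
    }

    /// Number of `None` entries; decides the nullability of the appended field.
    fn null_count(&self) -> usize {
        match self {
            Column::Float64(v) => v.iter().filter(|x| x.is_none()).count(),
            Column::Int64(v) => v.iter().filter(|x| x.is_none()).count(),
        }
    }

    /// The type comes from the column, so it is never spelled out twice.
    fn type_name(&self) -> &'static str {
        match self {
            Column::Float64(_) => "Float64",
            Column::Int64(_) => "Int64",
        }
    }
}

/// Hypothetical schema entry, analogous to an Arrow `Field`.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: &'static str,
    pub nullable: bool,
}

#[derive(Debug, PartialEq)]
pub enum AppendError {
    /// `field_name` is already present in the schema.
    DuplicateField(String),
    /// Assumption: every column must have the same number of rows.
    LengthMismatch { expected: usize, actual: usize },
}

/// Hypothetical minimal batch: a schema plus one column per field.
#[derive(Debug, Default)]
pub struct Batch {
    fields: Vec<Field>,
    columns: Vec<Column>,
}

impl Batch {
    pub fn empty() -> Self {
        Self::default()
    }

    /// Builder-style append: consumes `self` and returns it, so calls chain.
    pub fn append(mut self, field_name: &str, field_values: Column) -> Result<Self, AppendError> {
        if self.fields.iter().any(|f| f.name == field_name) {
            return Err(AppendError::DuplicateField(field_name.to_string()));
        }
        if let Some(first) = self.columns.first() {
            if first.len() != field_values.len() {
                return Err(AppendError::LengthMismatch {
                    expected: first.len(),
                    actual: field_values.len(),
                });
            }
        }
        self.fields.push(Field {
            name: field_name.to_string(),
            data_type: field_values.type_name(),
            // Nullable only if the values actually contain a null.
            nullable: field_values.null_count() > 0,
        });
        self.columns.push(field_values);
        Ok(self)
    }

    pub fn fields(&self) -> &[Field] {
        &self.fields
    }

    pub fn num_rows(&self) -> usize {
        self.columns.first().map_or(0, Column::len)
    }
}
```

Because `append` takes the batch by value and returns it, calls chain exactly like the builder example in the issue, e.g. `Batch::empty().append("float_field", floats)?.append("time", times)?`.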