Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to generate a fixed-size-list with nullable fields and specified field name for inner types with FixedSizeListBuilder #1353

Closed
TimDiekmann opened this issue Feb 22, 2022 · 10 comments · Fixed by #5541
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog good first issue Good for newcomers

Comments

@TimDiekmann
Copy link

Given the same example as in #1352, it's not possible to specify the Field of the nested type.

I expect to have a Field definition like

let inner_field = Field::new("part", DataType::UInt8, false);
let outer_field = Field::new(
    "ip-address",
    DataType::FixedSizeList(Box::new(inner_field), 4),
    true,
);

Notable things:

  • inner_field is not nullable
  • inner_field is called "part"

When using the FixedSizeListBuilder, the finish() method will generate the Field hardcoded:

Field::new("item", self.datatype, true);

Proposed solution

Add a way to specify the inner field

@TimDiekmann TimDiekmann added the enhancement Any new improvement worthy of a entry in the changelog label Feb 22, 2022
@alamb alamb changed the title Unable to generate a fixed-size-list with nullable fields and specified field name for inner types with builders Unable to generate a fixed-size-list with nullable fields and specified field name for inner types with FixedSizeListBuilder Feb 28, 2022
@alamb
Copy link
Contributor

alamb commented Feb 28, 2022

Thanks for the report @TimDiekmann ! Feel free to submit a PR if you want, and otherwise I think this would be a good first project for someone who wanted to get introduced to the codebase

@alamb alamb added arrow Changes to the arrow crate good first issue Good for newcomers labels Feb 28, 2022
@jaredzhou
Copy link

i think add a with_field new method could simpley solve the problem. but it seems like nullable property never checked in arrow-rs, recordbatch is ok with nullable data to not-nullable schema. could i just add the with_field new method for now
@alamb

@alamb
Copy link
Contributor

alamb commented Apr 12, 2022

@jaredzhou I am not quite sure what you are proposing with with_field -- I agree the code in arrow-rs does not validate the nullness specified in the schema against the data (by design) which can lead to bugs

@jaredzhou
Copy link

sorry for not being clear,what i mean is add a create method and pass the Field in FixedSizeListBuilder struct

pub fn with_field(values_builder: T, length: i32, field: Field) -> Self

then we can set DataType of FixedSizeList in finish() method.

@alamb
Copy link
Contributor

alamb commented Apr 13, 2022

@jaredzhou seems like a reasonable suggestion to me, though I am not most knoweldgeable person in this area of the code -- feel free to put up a draft PR and we can get some additional eyes on it.

Thank you for your contribution!

@jaredzhou
Copy link

jaredzhou commented Apr 13, 2022

thank u @alamb
after some code reading, i find that field name is meanless for element in a list, even when we convert arrow array to json. since nullable is not strictly checked, i will keep the code just there. and i would love to find some other issue to work with.

@JohannesFranzMaier
Copy link

I think in the meantime there was a change that it is actually checked. For example, I want to construct this schema:

let flow_field = Field::new("flow", DataType::Float32, false);
let field_value_list = Field::new("Value", DataType::FixedSizeList(Arc::from(flow_field), 3), false);
let schema = Schema::new(vec![field_value_list]);

So far, no problem.
But afterwards I use this builder to build the FixedSizeList.

let mut flow_data = FixedSizeListBuilder::new(Float32Builder::new(), 3);

The problem is now, when I want to construct a RecordBatch from my schema and the data output:

let flow_column = Arc::new(flow_data.finish()) as Arc<dyn Array>;
RecordBatch::try_new(Arc::from(schema), vec![flow_column])?

I get the following error:

Invalid argument error: column types must match schema types, expected FixedSizeList(Field { name: "flow", data_type: Float32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, 3) but found FixedSizeList(Field { name: "item", data_type: Float32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 3) at column index 0

Am I mistaken here or shouldn't that work?

@JohannesFranzMaier
Copy link

I could "fix" th issue by using:

let flow_field = Field::new("item", DataType::Float32, true);

But that does not seem to be a very good fix though, because my Fields should be not nullable...

@istvan-fodor
Copy link
Contributor

@alamb I created draft PR for this, can you check if it covers this problem?

@alamb
Copy link
Contributor

alamb commented Mar 22, 2024

Thanks @istvan-fodor -- I will try and review it over the next few days

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants