-
Notifications
You must be signed in to change notification settings - Fork 817
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP - Fix bug JSON input barfs on {"emptylist":[]} #1063
Conversation
@@ -1546,6 +1546,38 @@ mod tests { | |||
one_column_roundtrip("list_single_column", values, true, Some(SMALL_SIZE / 2)); | |||
} | |||
|
|||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 for test driven development 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks :)
Also pls #1036 (comment)
let is_nullable = match self.level_type { | ||
LevelType::Primitive(is_nullable) => is_nullable, | ||
LevelType::List(is_nullable) => is_nullable, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only change to the code path -- it addresses the error message we were seeing where the List
leveltype was being picked up by the catchall (thus the panic).
@@ -1574,6 +1570,9 @@ mod tests { | |||
.unwrap(); | |||
|
|||
let a = ListArray::from(a_list_data); | |||
// let values = Arc::new(a); | |||
// one_column_roundtrip("null_list_single_column", values, true, Some(SMALL_SIZE / 2)); | |||
|
|||
let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(a)]).unwrap(); | |||
roundtrip("test_null_list_single_column.parquet", batch, None); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error I am now seeing comes from the reader being unable to read the parquet that is written out.
thread 'arrow::arrow_writer::tests::null_list_single_column' panicked at 'called `Result::unwrap()` on an `Err` value: ArrowError("Unable to convert parquet INT32 logical type Some(UNKNOWN(NullType)) or converted type NONE")', parquet/src/arrow/array_reader.rs:1502:14
I have peppered the code with dbg!
to see if I can make sense of what's going on but so far all I have is a hunch that we've also got a potential bug in the reader. Detailed output from this test appears below:
---- arrow::arrow_writer::tests::null_list_single_column stdout ----
[parquet/src/arrow/levels.rs:754] &self = LevelInfo {
definition: [
1,
],
repetition: Some(
[
0,
],
),
array_offsets: [
0,
0,
],
array_mask: [
true,
],
max_definition: 2,
level_type: List(
true,
),
offset: 0,
length: 0,
}
[parquet/src/arrow/levels.rs:788] &filtered = []
[parquet/src/arrow/array_reader.rs:1844] &context = ArrayReaderBuilderContext {
def_level: 0,
rep_level: 0,
path: ColumnPath {
parts: [
"arrow_schema",
],
},
}
[parquet/src/arrow/array_reader.rs:1845] &child = GroupType {
basic_info: BasicTypeInfo {
name: "emptylist",
repetition: Some(
OPTIONAL,
),
converted_type: LIST,
logical_type: Some(
LIST(
ListType,
),
),
id: None,
},
fields: [
GroupType {
basic_info: BasicTypeInfo {
name: "list",
repetition: Some(
REPEATED,
),
converted_type: NONE,
logical_type: None,
id: None,
},
fields: [
PrimitiveType {
basic_info: BasicTypeInfo {
name: "item",
repetition: Some(
OPTIONAL,
),
converted_type: NONE,
logical_type: Some(
UNKNOWN(
NullType,
),
),
id: None,
},
physical_type: INT32,
type_length: -1,
scale: -1,
precision: -1,
},
],
},
],
}
[parquet/src/arrow/array_reader.rs:1495] &list_type = GroupType {
basic_info: BasicTypeInfo {
name: "emptylist",
repetition: Some(
OPTIONAL,
),
converted_type: LIST,
logical_type: Some(
LIST(
ListType,
),
),
id: None,
},
fields: [
GroupType {
basic_info: BasicTypeInfo {
name: "list",
repetition: Some(
REPEATED,
),
converted_type: NONE,
logical_type: None,
id: None,
},
fields: [
PrimitiveType {
basic_info: BasicTypeInfo {
name: "item",
repetition: Some(
OPTIONAL,
),
converted_type: NONE,
logical_type: Some(
UNKNOWN(
NullType,
),
),
id: None,
},
physical_type: INT32,
type_length: -1,
scale: -1,
precision: -1,
},
],
},
],
}
[parquet/src/arrow/array_reader.rs:1496] &context = ArrayReaderBuilderContext {
def_level: 0,
rep_level: 0,
path: ColumnPath {
parts: [
"arrow_schema",
],
},
}
[parquet/src/arrow/array_reader.rs:1497] &item_type = PrimitiveType {
basic_info: BasicTypeInfo {
name: "item",
repetition: Some(
OPTIONAL,
),
converted_type: NONE,
logical_type: Some(
UNKNOWN(
NullType,
),
),
id: None,
},
physical_type: INT32,
type_length: -1,
scale: -1,
precision: -1,
}
[parquet/src/arrow/array_reader.rs:1498] &new_context = ArrayReaderBuilderContext {
def_level: 2,
rep_level: 1,
path: ColumnPath {
parts: [
"arrow_schema",
"emptylist",
],
},
}
thread 'arrow::arrow_writer::tests::null_list_single_column' panicked at 'called `Result::unwrap()` on an `Err` value: ArrowError("Unable to convert parquet INT32 logical type Some(UNKNOWN(NullType)) or converted type NONE")', parquet/src/arrow/array_reader.rs:1502:14
@alamb FYI I am afk until early Jan now so won't be making any progress on this until then. |
Still here and interested. Sorry for the hiatus - will try and spin back up shortly. Confirming that this is still relevant pls @alamb |
👋 @novemberkilo #1036 is still open and I assume it is still relevant, but we will have to confirm with @chadbrewbaker I think |
@alamb I have an update.
I am not sure whether I have uncovered a bug with I am going to close this PR for now because it is stale and a bit noisy. I will recreate a clean PR when I am clearer about the outcome of the code that appears here. Copying @tustvold because I wonder if this is now close to the work you've done recently in #1246 |
Thanks @novemberkilo |
Which issue does this PR close?
Closes #1036
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?