Indexed field access for List #1006
I had to patch sqlparser to make this work, so that it supports numbers as field selectors. |
@Igosuki could you send your sqlparser-rs patch upstream? We can help review it and release a new version to crates.io. |
I will try to get the sqlparser change merged and a new release created sometime this weekend |
This PR (that we never completed) from @jorgecarleitao has related work I think -- namely access to struct fields: #628 |
I had initially used it, but it only allowed access to fields at a depth of
one, and simply patched the compound field access.
I could look into allowing `table.col[i].field[j]`, for instance, in addition
to `table.col[i]['field'][j]`
|
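To make the two spellings discussed above concrete, here is a minimal sketch of how a mixed path such as `col[0]['field'][1]` could be resolved one step at a time. `Value`, `Key`, and `get_path` are hypothetical names for illustration only; they are not DataFusion or sqlparser types, and the real implementation works on Arrow arrays rather than on a tree of values.

```rust
use std::collections::BTreeMap;

/// Toy nested value; a stand-in for Arrow list/struct data (assumption, not a DataFusion type).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    List(Vec<Value>),
    Struct(BTreeMap<String, Value>),
}

/// One step of an access path: `[i]`, or `['field']` (which `.field` would map to as well).
#[derive(Debug, Clone, PartialEq)]
enum Key {
    Index(usize),
    Name(String),
}

/// Walks `path` from `root`, returning `None` as soon as a step does not apply
/// (index out of range, unknown field, or the wrong kind of key for the value).
fn get_path<'a>(root: &'a Value, path: &[Key]) -> Option<&'a Value> {
    path.iter().try_fold(root, |current, key| match (current, key) {
        (Value::List(items), Key::Index(i)) => items.get(*i),
        (Value::Struct(fields), Key::Name(name)) => fields.get(name),
        _ => None,
    })
}
```

Representing each step as a separate key is what allows arbitrary depth: the parser only has to produce a list of keys, and evaluation is a fold over that list.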
Will get back to this soon once I sort out the related issue here apache/datafusion-sqlparser-rs#356 |
ae11339
to
0e0ede2
Compare
I updated sqlparser for arbitrary nested access. But I'm having trouble understanding how select on dict should behave for instance with this e2c00f1#diff-6a157f1b320a619e7a3896027ce7d47a8e3ffeeae8ac3d21fe9157c92ebc2a6bL4955 |
e2c00f1
to
ec777c0
Compare
I am preparing a sqlparser release FWIW: apache/datafusion-sqlparser-rs#360 |
ec777c0
to
2f414fb
Compare
@Igosuki do you need any help finishing up this PR? |
@houqp Indeed, I mean I could just remove the dict lookup and do it in another PR so we could just get array access here. |
I think there might be some confusion here. In my mind, BTW, I am perfectly fine with us splitting the PR into two and handling the dictionary access as a follow-up if you prefer to do so 👍 |
@houqp Ok, it's what I had in mind as well. |
4b7eabc
to
48cad8f
Compare
Removed dictionary lookup so it should be gtg now |
Needs a rerun on sha 48cad8f |
datafusion/src/logical_plan/expr.rs
Outdated
@@ -245,6 +246,13 @@ pub enum Expr {
    IsNull(Box<Expr>),
    /// arithmetic negation of an expression, the operand must be of a signed numeric data type
    Negative(Box<Expr>),
    /// Returns the field of a [`ListArray`] or ['DictionaryArray'] by name
by string name or integer indices?
@Igosuki I think made a good point here, would be better to mention both index and name. On top of this, I think `DictionaryArray` should be removed here?
- /// Returns the field of a [`ListArray`] or ['DictionaryArray'] by name
+ /// Returns the field of a [`ListArray`] by name
        let iter = concat(vec.as_slice()).unwrap();
        Ok(ColumnarValue::Array(iter))
    }
    _ => Err(DataFusionError::NotImplemented(
Using `other` and providing the type name would be more helpful
@Igosuki what do you think about the comment here?
Oh guess I overlooked this, will fix
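The suggestion in this thread (bind the fallthrough case to a name and report what was actually received) can be sketched as follows. `ToyType`, `ToyKey`, and `check_indexable` are hypothetical stand-ins for Arrow's `DataType`, DataFusion's `ScalarValue`, and the match in the PR; the message text follows the wording proposed elsewhere in this review.

```rust
/// Toy stand-in for Arrow's `DataType` (assumption for illustration).
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int64,
    Utf8,
    List(Box<ToyType>),
}

/// Toy stand-in for DataFusion's `ScalarValue` used as the index key.
#[derive(Debug, Clone, PartialEq)]
enum ToyKey {
    Int64(Option<i64>),
    Utf8(Option<String>),
}

/// Returns the element type when `data_type[key]` is supported, and otherwise
/// an error message that names the offending type and key.
fn check_indexable(data_type: &ToyType, key: &ToyKey) -> Result<ToyType, String> {
    match (data_type, key) {
        (ToyType::List(inner), ToyKey::Int64(Some(_))) => Ok((**inner).clone()),
        // Binding the fallthrough case (instead of `_`) lets the message say what was tried.
        (other_type, other_key) => Err(format!(
            "get indexed field is only possible on lists with int64 indexes. Tried {:?} with {:?} index",
            other_type, other_key
        )),
    }
}
```

Compared with a bare `_ =>` arm, the named bindings cost nothing at runtime on the success path and make the error actionable for the user.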
232c974
to
bfb82de
Compare
I'll get to it as soon as this is merged |
I addressed the concerns in the comments, things should be gtg |
FYI I have this PR on my list to review, I just haven't had a chance to do so yet -- will try and get it done tomorrow |
No worries, I've been slow as well and obviously there's a lot going on in the project. Once it gets validated I'll do the follow-up for struct array access. |
I think this is looking good personally -- thank you @Igosuki !
Let us know if you would like some help finishing up the comments
datafusion/src/lib.rs
Outdated
@@ -231,6 +231,7 @@ pub mod variable;
pub use arrow;
pub use parquet;

pub mod field_util;
- pub mod field_util;
+ pub(crate) mod field_util;
Unless there is some reason to expose this module, I think it might be nice to leave it internal so that we can move it around as needed
// specific language governing permissions and limitations
// under the License.

//! get field of a struct array
- //! get field of a struct array
+ //! get field of a `ListArray`
datafusion/src/field_util.rs
Outdated
        ))
    }
    _ => Err(DataFusionError::Plan(
        "The expression to get an indexed field is only valid for `List` or 'Dictionary'"
- "The expression to get an indexed field is only valid for `List` or 'Dictionary'"
+ "The expression to get an indexed field is only valid for `List` types"
        Ok(ColumnarValue::Array(iter))
    }
    _ => Err(DataFusionError::NotImplemented(
        "get indexed field is only possible on lists".to_string(),
Untested:
- "get indexed field is only possible on lists".to_string(),
+ format!("get indexed field is only possible on lists with int64 indexes. Tried {} with {} index", array.data_type(), self.key())
        )),
    },
    ColumnarValue::Scalar(_) => Err(DataFusionError::NotImplemented(
        "field is not yet implemented for scalar values".to_string(),
- "field is not yet implemented for scalar values".to_string(),
+ "field access is not yet implemented for scalar values".to_string(),
let arg = self.arg.evaluate(batch)?;
match arg {
    ColumnarValue::Array(array) => match (array.data_type(), &self.key) {
        (DataType::List(_), ScalarValue::Int64(Some(i))) => {
If `self.key` is null, I think we could also return null.
So perhaps we could add a new case like this (untested):
(DataType::List(_), _) if self.key.is_null() => {
    let scalar_null: ScalarValue = array.data_type().try_into()?;
    Ok(ColumnarValue::Scalar(scalar_null))
}
There's also the issue of an empty list which doesn't get through concat
Sounds like a good opportunity for follow up work / PRs
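As a rough illustration of the semantics discussed in this thread (null key yields null, and an empty input must not fail), here is a sketch on plain Rust collections. `get_indexed` is a hypothetical helper, not the function in `get_indexed_field.rs`; a column of lists is modelled as a slice of optional vectors rather than an Arrow `ListArray`, and treating an out-of-range index as null is an assumption of this sketch.

```rust
/// Picks element `key` from every row of a toy list column.
///
/// - A `None` key yields a null for every row.
/// - A null row, or an index past the end of a row, yields a null for that row.
/// - An empty column yields an empty result instead of an error.
fn get_indexed(column: &[Option<Vec<i64>>], key: Option<usize>) -> Vec<Option<i64>> {
    column
        .iter()
        .map(|row| match (row, key) {
            (Some(items), Some(i)) => items.get(i).copied(),
            _ => None,
        })
        .collect()
}
```

Building the output row by row sidesteps the empty-input problem mentioned above, because no concatenation of per-row arrays is needed; whether that is the right trade-off for the Arrow-based implementation is a question for the follow-up work.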
@@ -141,6 +143,10 @@ fn create_physical_name(e: &Expr, is_first_expr: bool) -> Result<String> {
    let expr = create_physical_name(expr, false)?;
    Ok(format!("{} IS NOT NULL", expr))
}
Expr::GetIndexedField { expr, key } => {
👍
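The hunk above does not show the body of the new arm. One plausible shape, assuming the physical name simply mirrors the SQL syntax `expr[key]`, is sketched below on a toy expression type. `ToyExpr` and `physical_name` are hypothetical; the real `Expr` has many more variants and `create_physical_name` returns a `Result`.

```rust
/// Toy expression tree with just enough variants to show the naming rule.
enum ToyExpr {
    Column(String),
    IsNotNull(Box<ToyExpr>),
    GetIndexedField { expr: Box<ToyExpr>, key: i64 },
}

/// Builds a display name recursively, so nested accesses compose naturally.
fn physical_name(e: &ToyExpr) -> String {
    match e {
        ToyExpr::Column(name) => name.clone(),
        ToyExpr::IsNotNull(expr) => format!("{} IS NOT NULL", physical_name(expr)),
        // Assumed format: the inner expression followed by the key in brackets.
        ToyExpr::GetIndexedField { expr, key } => format!("{}[{}]", physical_name(expr), key),
    }
}
```

Because the name is built recursively, an access nested two levels deep on a column called `list` is rendered in the same form a user would type it.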
@alamb Added some test coverage in get_indexed_field.rs; let me know if you agree with the changes |
Thanks for sticking with it @Igosuki ! |
Which issue does this PR close?
Closes #1005
Rationale for this change
Supporting access to nested values is a big improvement in flexibility: input data no longer needs to be flat, so it does not have to be transformed by ETL before being ingested by DataFusion.
What changes are included in this PR?
Nested value access for Lists and Dictionary
Are there any user-facing changes?
Users will now be able to write:
list[0][0]
and dict['foo']['bar']
No breaking changes.
I still need to test the dictionary access; I've only used the list one.