Rename Expr::display_name to Expr::schema_name, make UNNEST naming conform to convention #11797
display_name
with schema_name
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Thanks @jayzhan211 -- this looks like an improvement to me 🙏
I think making a deprecated function would be really nice to help people upgrade
I also think if we are going to be messing around with the API we should consider updating the API to avoid allocations when possible
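The deprecation idea above can be sketched with Rust's built-in `#[deprecated]` attribute. This is only an illustration on a toy `Expr` struct (a stand-in, not DataFusion's real type), and the note text is an assumption; the point is that the old method keeps working while the compiler steers callers toward the new one.

```rust
// Toy stand-in for an expression type; NOT DataFusion's `Expr`.
struct Expr {
    name: String,
}

impl Expr {
    /// New API: the name used for the schema field.
    fn schema_name(&self) -> String {
        self.name.clone()
    }

    /// Old API kept as a thin shim so downstream code still compiles,
    /// but emits a deprecation warning at every call site.
    #[deprecated(note = "use `schema_name` instead")]
    fn display_name(&self) -> String {
        self.schema_name()
    }
}

fn main() {
    let expr = Expr { name: "foo".to_string() };

    // Silence the warning here only because this demo calls the shim on purpose.
    #[allow(deprecated)]
    let old = expr.display_name();

    assert_eq!(old, expr.schema_name());
    println!("old and new names agree: {old}");
}
```

Downstream users then see a warning pointing at `schema_name` instead of a hard compile error on upgrade.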
@@ -1137,7 +1137,7 @@ from arrays_values_without_nulls;
## array_element (aliases: array_extract, list_extract, list_element)

# Testing with empty arguments should result in an error
-query error DataFusion error: Error during planning: Error during planning: array_element does not support zero arguments.
+query error DataFusion error: Execution error: expect 2 args, got 0
the old error message was nicer -- maybe we can update the code that generates this message to be nicer too 🤔
datafusion/expr/src/expr.rs
Outdated
/// Those are the main difference
/// 1. Alias, where excludes expression
/// 2. Cast / TryCast, where takes expression only
pub fn schema_name(&self) -> Result<String> {
If we are going to introduce a new API anyway, what do you think about making it infallible and not requiring a new string allocation? Something like:
-pub fn schema_name(&self) -> Result<String> {
+pub fn schema_name<'a>(&'a self) -> impl Display + 'a {
We could probably then implement a new type thing and mostly use the same code:
// new type wrapper
struct SchemaDisplay<'a>(&'a Expr);

impl<'a> Display for SchemaDisplay<'a> {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        // match on the wrapped `Expr`, not on the wrapper itself
        match self.0 {
            // The same as Display
            Expr::Column(_)
            | Expr::Literal(_)
            | Expr::ScalarVariable(..)
            | Expr::Sort(_)
            | Expr::OuterReferenceColumn(..)
            | Expr::Placeholder(_)
            // delegate to `Expr`'s Display; `{self}` would recurse into this impl
            | Expr::Wildcard { .. } => write!(f, "{}", self.0),
            ...
        }
    }
}

// and in the body of `Expr::schema_name`:
return SchemaDisplay(self)
The benefit of doing this is that anywhere the name only needs to be formatted, no copy is required:
println!("This would not allocate a String: {}", expr.schema_name());
// get schema name as a string:
let schema_name = expr.schema_name().to_string();
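A self-contained sketch of the newtype idea, for readers who want to try it outside DataFusion. The `Expr` enum here is a deliberately tiny, hypothetical stand-in (five variants, simplified payloads), so the variant shapes and output strings are assumptions rather than the real API; only the pattern of returning `impl Display` from a borrowed wrapper is the point.

```rust
use std::fmt::{self, Display, Formatter};

// Toy stand-in for DataFusion's `Expr`; variants and payloads are simplified assumptions.
enum Expr {
    Column(String),
    Literal(i32),
    Alias(Box<Expr>, String),
    Cast(Box<Expr>, String),
    Eq(Box<Expr>, Box<Expr>),
}

// Full, human-readable representation.
impl Display for Expr {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Column(name) => write!(f, "{name}"),
            Expr::Literal(v) => write!(f, "Int32({v})"),
            Expr::Alias(expr, name) => write!(f, "{expr} AS {name}"),
            Expr::Cast(expr, ty) => write!(f, "CAST({expr} AS {ty})"),
            Expr::Eq(l, r) => write!(f, "{l} = {r}"),
        }
    }
}

// Newtype wrapper: borrows the expression and formats its schema name lazily.
struct SchemaDisplay<'a>(&'a Expr);

impl Display for SchemaDisplay<'_> {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        match self.0 {
            // Alias: show only the alias itself.
            Expr::Alias(_, name) => write!(f, "{name}"),
            // Cast: show only the inner expression.
            Expr::Cast(expr, _) => write!(f, "{}", SchemaDisplay(&**expr)),
            // Recurse so nested aliases and casts are handled too.
            Expr::Eq(l, r) => write!(f, "{} = {}", SchemaDisplay(&**l), SchemaDisplay(&**r)),
            // Everything else: same as Display.
            other => write!(f, "{other}"),
        }
    }
}

impl Expr {
    // Infallible, and allocates nothing until the caller asks for a String.
    fn schema_name(&self) -> impl Display + '_ {
        SchemaDisplay(self)
    }
}

fn main() {
    let aliased = Expr::Alias(Box::new(Expr::Column("foo".into())), "bar".into());
    let expr = Expr::Eq(Box::new(aliased), Box::new(Expr::Literal(11)));
    let cast = Expr::Cast(Box::new(Expr::Column("a".into())), "Utf8".into());

    // Formats straight into stdout's buffer; no intermediate String.
    println!("display: {expr}");
    println!("schema:  {}", expr.schema_name());

    // Callers that need an owned name opt in explicitly.
    assert_eq!(expr.schema_name().to_string(), "bar = Int32(11)");
    assert_eq!(cast.schema_name().to_string(), "a");
}
```

One trade-off of this shape is that every match arm has to write through the `Formatter`, so any existing helper that returns `String` needs a `Display` counterpart before it can be reused.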
This looks cool!
datafusion/expr/src/expr.rs
Outdated
/// Returns the name for schema / field that is different from Display
/// Most of the expressions are the same as Display
/// Those are the main difference
/// 1. Alias, where excludes expression
/// 2. Cast / TryCast, where takes expression only
Here is a suggestion for how to improve this documentation:
-/// Returns the name for schema / field that is different from Display
-/// Most of the expressions are the same as Display
-/// Those are the main difference
-/// 1. Alias, where excludes expression
-/// 2. Cast / TryCast, where takes expression only
+/// The name of the column (field) that this `Expr` will produce.
+///
+/// For example, for a projection (e.g. `SELECT <expr>`) the resulting arrow
+/// [`Schema`] will have a field with this name.
+///
+/// Note that the resulting string is subtly different from the `Display`
+/// representation for certain `Expr`. Some differences:
+///
+/// 1. [`Expr::Alias`], which shows only the alias itself
+/// 2. [`Expr::Cast`] / [`Expr::TryCast`], which only displays the expression
+///
+/// # Example
+/// ```
+/// # use datafusion_expr::{col, lit};
+/// let expr = col("foo").eq(lit(42));
+/// assert_eq!("foo = Int32(42)", expr.schema_name().unwrap());
+///
+/// let expr = col("foo").alias("bar").eq(lit(11));
+/// assert_eq!("bar = Int32(11)", expr.schema_name().unwrap());
+/// ```
+///
+/// [`Schema`]: arrow::datatypes::Schema
@@ -358,8 +358,8 @@ mod tests {

let plan = from_substrait_plan(&ctx, &proto).await?;
let plan_str = format!("{:?}", plan);
-assert_eq!(plan_str, "Projection: Decimal128(Some(10000),5,2) * sum(CASE WHEN FILENAME_PLACEHOLDER_1.p_type LIKE CAST(Utf8(\"PROMO%\") AS Utf8) THEN FILENAME_PLACEHOLDER_0.l_extendedprice * Int32(1) - FILENAME_PLACEHOLDER_0.l_discount ELSE Decimal128(Some(0),19,0) END) / sum(FILENAME_PLACEHOLDER_0.l_extendedprice * Int32(1) - FILENAME_PLACEHOLDER_0.l_discount) AS PROMO_REVENUE\
-\n Aggregate: groupBy=[[]], aggr=[[sum(CASE WHEN FILENAME_PLACEHOLDER_1.p_type LIKE CAST(Utf8(\"PROMO%\") AS Utf8) THEN FILENAME_PLACEHOLDER_0.l_extendedprice * Int32(1) - FILENAME_PLACEHOLDER_0.l_discount ELSE Decimal128(Some(0),19,0) END), sum(FILENAME_PLACEHOLDER_0.l_extendedprice * Int32(1) - FILENAME_PLACEHOLDER_0.l_discount)]]\
+assert_eq!(plan_str, "Projection: Decimal128(Some(10000),5,2) * sum(CASE WHEN FILENAME_PLACEHOLDER_1.p_type LIKE Utf8(\"PROMO%\") THEN FILENAME_PLACEHOLDER_0.l_extendedprice * Int32(1) - FILENAME_PLACEHOLDER_0.l_discount ELSE Decimal128(Some(0),19,0) END) / sum(FILENAME_PLACEHOLDER_0.l_extendedprice * Int32(1) - FILENAME_PLACEHOLDER_0.l_discount) AS PROMO_REVENUE\
I think it makes sense as long as the cast is still in plans (which it is)
datafusion/expr/src/udf.rs
Outdated
Ok(format!("{}({})", self.name(), names.join(",")))
}

/// Returns the user-defined schema name of the UDF given the arguments
-/// Returns the user-defined schema name of the UDF given the arguments
+/// Returns the name of the column this expression would create
+///
+/// See [`Expr::schema_name`] for details
Maybe it is also worth doing the impl Display thing here too, which would allow avoiding these string allocations
It seems I need another SchemaFunctionDisplay for replacing String with impl Display in UDFImpl. I would like to hold off on that change for functions.
makes sense -- I don't think there is any reason that this impl Display thing has to be done -- I was thinking it might be worth exploring if we were already changing the code anyways
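For context, a rough sketch of what a function-level wrapper could look like if the idea were picked up later. `FunctionSchemaName` is a hypothetical name and type, not something in DataFusion; it produces the same `name(arg1,arg2)` shape as the `format!("{}({})", self.name(), names.join(","))` line under review, but writes each argument straight into the formatter instead of first collecting a `Vec<String>` and joining it.

```rust
use std::fmt::{self, Display, Formatter};

/// Hypothetical wrapper (not a DataFusion type): renders `name(arg1,arg2,...)`.
/// `T` is anything that already knows how to display its own schema name.
struct FunctionSchemaName<'a, T: Display> {
    name: &'a str,
    args: &'a [T],
}

impl<T: Display> Display for FunctionSchemaName<'_, T> {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        write!(f, "{}(", self.name)?;
        for (i, arg) in self.args.iter().enumerate() {
            // Comma-separate without building an intermediate joined string.
            if i > 0 {
                f.write_str(",")?;
            }
            write!(f, "{arg}")?;
        }
        f.write_str(")")
    }
}

fn main() {
    let args = ["a", "b"];
    let name = FunctionSchemaName {
        name: "array_element",
        args: &args[..],
    };

    println!("{name}");
    assert_eq!(name.to_string(), "array_element(a,b)");
}
```

As the thread notes, none of this has to happen in the same PR; the `String`-returning version stays correct, just with extra allocations.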
LGTM -- thanks @jayzhan211
@@ -1961,6 +2270,7 @@ impl fmt::Display for Expr {
},
Expr::Placeholder(Placeholder { id, .. }) => write!(f, "{id}"),
Expr::Unnest(Unnest { expr }) => {
+// TODO: use Display instead of Debug, there is non-unique expression name in projection issue.
shall we file a ticket to track this (maybe others would be interested in helping)
Sure
I notice that #11198 is filed already.
Thanks @alamb
* update dependencies
* update get_logical_plan signature
* remove row_number() function
  row_number was converted to a UDF in datafusion v42 apache/datafusion#12030
  This specific functionality needs to be added back in.
* remove unneeded dependency
* fix pyo3 warnings
  Implicit defaults for trailing optional arguments have been deprecated in pyo3 v0.22.0 PyO3/pyo3#4078
* update object_store dependency
* change PyExpr -> PySortExpr
* comment out key.extract::<&PyTuple>() condition statement
* change more instances of PyExpr > PySortExpr
* update function signatures to use _bound versions
* remove clone
* Working through some of the sort requirement changes
* remove unused import
* expr.display_name is deprecated, used format!() + schema_name() instead
* expr.canonical_name() is deprecated, use format!() expr instead
* remove comment
* fix tuple extraction in dataframe.__getitem__()
* remove unneeded import
* Add docstring comments to SortExpr python class
* change extract() to downcast()
  Co-authored-by: Michael J Ward <Michael-J-Ward@users.noreply.github.com>
* deprecate Expr::display_name
  Ref: apache/datafusion#11797
* fix lint errors
* update datafusion commit hash
* fix type in cargo file for arrow features
* upgrade to datafusion 42
* cleanup
---------
Co-authored-by: Tim Saucer <timsaucer@gmail.com>
Co-authored-by: Michael J Ward <Michael-J-Ward@users.noreply.github.com>
Co-authored-by: Michael-J-Ward <ward.michael.j@gmail.com>
Which issue does this PR close?
Part of #11782
Closes #.
Rationale for this change
display_name is used widely, but it is actually a different thing from the Display trait. After taking a look, I think schema_name is a less confusing term, since we usually build the name with display_name for schema / field.
What changes are included in this PR?
schema_name is introduced, and we should continue to use schema_name for schema / field later on. display_name is removed, and we should use the Display trait instead.
Are these changes tested?
Are there any user-facing changes?