Enhance the formatting for Column #11724

Closed
wants to merge 6 commits into from
49 changes: 48 additions & 1 deletion datafusion/common/src/column.rs
@@ -18,6 +18,7 @@
//! Column

use arrow_schema::{Field, FieldRef};
use std::borrow::Cow;

use crate::error::_schema_err;
use crate::utils::{parse_identifiers_normalized, quote_identifier};
@@ -156,6 +157,17 @@ impl Column {
}
}

    fn quoted_flat_name_if_contain_dot(&self) -> String {
        match &self.relation {
            Some(r) => format!(
                "{}.{}",
                table_reference_to_quoted_string(r),
                quoted_if_contain_dot(&self.name)
            ),
            None => quoted_if_contain_dot(&self.name).to_string(),
        }
    }
Comment on lines +160 to +169
Contributor Author

@jayzhan211 Before I fix the other tests, I want to check whether this behavior makes sense (it affects a lot of tests 😢).
With this change, we only quote an identifier if it contains a dot. However, a name like sum(t1.c1) will also be quoted, even though it is a function call. I don't think it's worth adding more checks to exclude this kind of case. What do you think?
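
To make the concern concrete, here is a minimal standalone sketch of the quoting rule. `quoted_if_contain_dot` follows the helper in this diff; `render_all` is a hypothetical helper added only for this illustration, and nothing here depends on DataFusion itself.

```rust
use std::borrow::Cow;

/// Same rule as the helper in this diff: quote only when the name contains
/// a dot, doubling any embedded double quotes.
fn quoted_if_contain_dot(s: &str) -> Cow<'_, str> {
    if s.contains('.') {
        Cow::Owned(format!("\"{}\"", s.replace('"', "\"\"")))
    } else {
        Cow::Borrowed(s)
    }
}

/// Hypothetical helper for this example only: renders a list of names.
fn render_all(names: &[&str]) -> Vec<String> {
    names
        .iter()
        .map(|n| quoted_if_contain_dot(n).into_owned())
        .collect()
}
```

Calling `render_all(&["c1", "t1.c1", "sum(t1.c1)"])` leaves `c1` untouched but wraps both `t1.c1` and `sum(t1.c1)` in double quotes, because the rule only looks for a dot and cannot tell an identifier from a function call.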

Contributor

@jayzhan211 (Aug 3, 2024)

I think it is not ideal if sum(t1.c1) is quoted 🤔. I hope the change can be as small as possible, so I would prefer that functions and other Exprs stay the same and only identifiers containing a dot are quoted.

We could also hold off and wait for more input from others, given that this change is not trivial.

Contributor

Maybe instead of modifying Column, we should modify display_name for Expr, so that if we find a column inside a ScalarFunction, we can skip the double quotes (with something like a boolean flag?).
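
A rough sketch of what such a flag could look like, under heavy simplification. The `Expr` enum and `display_name` function below are hypothetical stand-ins written for this illustration; they are not DataFusion's actual `Expr` type or API.

```rust
/// Hypothetical, heavily simplified stand-in for an expression tree.
enum Expr {
    Column(String),
    ScalarFunction { name: String, args: Vec<Expr> },
}

/// Render a display name. `quote` is the hypothetical boolean flag:
/// it is switched off when descending into function arguments.
fn display_name(expr: &Expr, quote: bool) -> String {
    match expr {
        // Top-level column whose name contains a dot: quote it.
        Expr::Column(name) if quote && name.contains('.') => {
            format!("\"{}\"", name.replace('"', "\"\""))
        }
        Expr::Column(name) => name.clone(),
        Expr::ScalarFunction { name, args } => {
            // Arguments are rendered with quoting disabled.
            let args: Vec<String> =
                args.iter().map(|a| display_name(a, false)).collect();
            format!("{}({})", name, args.join(", "))
        }
    }
}
```

With this shape, a bare column named `t1.c1` is quoted at the top level, while the same column inside `sum(...)` is rendered as `sum(t1.c1)`. This only works when the function is still present as a tree node rather than already flattened into a column name.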

Contributor Author

> Maybe instead of modifying Column, we should modify the display_name for Expr, so if we found column inside ScalarFunction, we could skip the double quote anyway. (by something like boolean flag?)

I'm not sure if it's that simple 🤔. In my experience, the column might look like this:

Column { qualifier: None, name: "sum(t1.c1)" }

I think it's hard to find a consistent pattern for it because we use many Column::from_name calls to create projections. For example, in

.map(|x| Column::from_name(self.ident_normalizer.normalize(x)))

the column name could be complex and unruly.


/// Qualify column if not done yet.
///
/// If this column already has a [relation](Self::relation), it will be returned as is and the given parameters are
@@ -328,6 +340,37 @@ impl Column {
}
}

fn quoted_if_contain_dot(s: &str) -> Cow<str> {
    if s.contains(".") {
        Cow::Owned(format!("\"{}\"", s.replace('"', "\"\"")))
    } else {
        Cow::Borrowed(s)
    }
}

fn table_reference_to_quoted_string(table_ref: &TableReference) -> String {
    match table_ref {
        TableReference::Bare { table } => quoted_if_contain_dot(table).to_string(),
        TableReference::Partial { schema, table } => {
            format!(
                "{}.{}",
                quoted_if_contain_dot(schema),
                quoted_if_contain_dot(table)
            )
        }
        TableReference::Full {
            catalog,
            schema,
            table,
        } => format!(
            "{}.{}.{}",
            quoted_if_contain_dot(catalog),
            quoted_if_contain_dot(schema),
            quoted_if_contain_dot(table)
        ),
    }
}

impl From<&str> for Column {
    fn from(c: &str) -> Self {
        Self::from_qualified_name(c)
@@ -372,7 +415,7 @@ impl FromStr for Column {

 impl fmt::Display for Column {
     fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
-        write!(f, "{}", self.quoted_flat_name())
+        write!(f, "{}", self.quoted_flat_name_if_contain_dot())
     }
 }

@@ -462,5 +505,9 @@ mod tests {
        assert_eq!(col.to_string(), "t1.a");
        let col = Column::new(TableReference::none(), "t1.a");
        assert_eq!(col.to_string(), r#""t1.a""#);
        let col = Column::new(Some(TableReference::full("a.b", "c.d", "e.f")), "g.h");
        assert_eq!(col.to_string(), r#""a.b"."c.d"."e.f"."g.h""#);
        let col = Column::new(TableReference::none(), "max(a)");
        assert_eq!(col.to_string(), "max(a)")
    }
}
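
The expectations in the tests above can be reproduced without DataFusion by a small standalone sketch. The `Column` struct here is a hypothetical stand-in whose `relation` holds the already-split parts of a table reference (catalog, schema, table) rather than a real `TableReference`, and `quote_if_dotted` is a simplified copy of the quoting rule from this diff.

```rust
use std::fmt;

/// Hypothetical stand-in for `Column`; `relation` stores the parts of a
/// table reference in order (catalog, schema, table), if any.
struct Column {
    relation: Option<Vec<String>>,
    name: String,
}

/// Quote a single identifier part only if it contains a dot.
fn quote_if_dotted(s: &str) -> String {
    if s.contains('.') {
        format!("\"{}\"", s.replace('"', "\"\""))
    } else {
        s.to_string()
    }
}

impl fmt::Display for Column {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Quote each relation part and the name independently,
        // then join everything with dots.
        let mut parts: Vec<String> = self
            .relation
            .iter()
            .flatten()
            .map(|p| quote_if_dotted(p))
            .collect();
        parts.push(quote_if_dotted(&self.name));
        write!(f, "{}", parts.join("."))
    }
}
```

Because every part is quoted on its own, a dot inside a quoted part stays distinguishable from the separator dots, while names without a dot, such as `max(a)`, are printed unchanged.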
3 changes: 2 additions & 1 deletion datafusion/expr/src/logical_plan/plan.rs
@@ -3528,7 +3528,8 @@ digraph {
             .build()
             .unwrap();

-        let expected = "Projection: t.\"max(id)\"\n Filter: t.state = Utf8(\"CO\")\n TableScan: t";
+        let expected =
+            "Projection: t.max(id)\n Filter: t.state = Utf8(\"CO\")\n TableScan: t";
         let actual = format!("{}", plan.display_indent());
         assert_eq!(expected.to_string(), actual);