Remove Wildcard from Expr #7765

@waynexia


Is your feature request related to a problem or challenge?

Expr::Wildcard and Expr::QualifiedWildcard are expressions that reference all columns. They seem redundant, though: we don't need a special expr type to do that. This issue proposes removing these two expr kinds.

Describe the solution you'd like

Wildcard (*) can be expanded to a concrete column list wherever it appears. This approach seems viable in the three most common use cases I can come up with:

  • (1) Generating a plan from the SQL AST. Some code paths already behave this way. E.g., select * from table generates a projection with all fields rather than a wildcard expr.
  • (2) From LogicalPlanBuilder and (3) from DataFrame. In some respects these two entry points are the same. Both are strongly typed (or strongly schema-ed), which allows us to expand the wildcard immediately using the schema of the current stage.
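
To make the eager expansion described above concrete, here is a minimal sketch on a toy model. The `Expr`, `Field`, `column`, and `expand_wildcards` names are hypothetical stand-ins written for this illustration; they are not DataFusion's actual types or functions. The sketch assumes the schema of the current stage is available as a flat list of fields.

```rust
// Toy model for illustration only; these are NOT DataFusion's real types.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column { relation: Option<String>, name: String },
    Wildcard,
    QualifiedWildcard { qualifier: String },
}

/// A schema field: the relation it belongs to and its column name (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    relation: String,
    name: String,
}

/// Build a concrete column expression from a schema field.
fn column(f: &Field) -> Expr {
    Expr::Column {
        relation: Some(f.relation.clone()),
        name: f.name.clone(),
    }
}

/// Replace every wildcard in a projection list with concrete columns
/// taken from `schema`. Non-wildcard expressions pass through unchanged.
fn expand_wildcards(exprs: &[Expr], schema: &[Field]) -> Result<Vec<Expr>, String> {
    let mut out = Vec::new();
    for e in exprs {
        match e {
            // `*` expands to every field of the input schema.
            Expr::Wildcard => out.extend(schema.iter().map(column)),
            // `t.*` expands only to the fields of relation `t`.
            Expr::QualifiedWildcard { qualifier } => {
                let before = out.len();
                out.extend(
                    schema
                        .iter()
                        .filter(|f| &f.relation == qualifier)
                        .map(column),
                );
                if out.len() == before {
                    return Err(format!("no columns found for qualifier '{}'", qualifier));
                }
            }
            other => out.push(other.clone()),
        }
    }
    Ok(out)
}
```

With a schema holding `t1.a`, `t1.b`, and `t2.c`, calling `expand_wildcards` on a list containing only `Expr::Wildcard` yields three column expressions, and a qualified wildcard on `t2` yields just `t2.c`. After this step no wildcard variant remains in the projection, which is the property the proposal relies on.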

Besides this, Expr::Wildcard is not properly handled in the codebase, because it's not a "first class" expr. Take some functions as examples:

  • expr_to_columns: for correctness, it should also count the columns referenced by Expr::Wildcard and Expr::QualifiedWildcard.
  • create_physical_name: this function requires all the wildcards to be expanded before calling it.
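
The column-collection problem can be illustrated with a toy example. The sketch below is not the real `expr_to_columns`; the `Expr` enum and both collector functions are hypothetical and only show why a collector that skips wildcards under-reports the referenced columns, and why a correct one needs the schema.

```rust
use std::collections::BTreeSet;

// Toy expression type for illustration; NOT DataFusion's `Expr`.
#[derive(Debug, Clone)]
enum Expr {
    Column(String),
    Wildcard,
}

/// Naive collector: records explicit columns and silently skips wildcards.
fn columns_ignoring_wildcard(exprs: &[Expr]) -> BTreeSet<String> {
    let mut acc = BTreeSet::new();
    for e in exprs {
        match e {
            Expr::Column(name) => {
                acc.insert(name.clone());
            }
            // The wildcard references columns too, but they are lost here.
            Expr::Wildcard => {}
        }
    }
    acc
}

/// Schema-aware collector: a wildcard counts as a reference to every column.
fn columns_with_wildcard(exprs: &[Expr], schema: &[&str]) -> BTreeSet<String> {
    let mut acc = BTreeSet::new();
    for e in exprs {
        match e {
            Expr::Column(name) => {
                acc.insert(name.clone());
            }
            Expr::Wildcard => acc.extend(schema.iter().map(|c| c.to_string())),
        }
    }
    acc
}
```

For the projection list `[a, *]` over a schema with columns `a`, `b`, `c`, the naive collector reports only `a`, while the schema-aware one reports all three. If wildcards are expanded before any such function runs, the naive version becomes correct without needing schema access.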

I also found another issue related to Wildcard: #5473. The solution there is to add an optimizer rule that expands all wildcards. This proposal does something similar, but more eagerly.

Describe alternatives you've considered

No response

Additional context

Since Expr is widely used, we may need several versions to deprecate these variants (depending on the compatibility rule). We may ship this change step by step before fully removing them:

  • change all internal usages of wildcards (most of them are in aggregate exprs, I suppose)
  • change the public interfaces that may generate a wildcard, as in "Cannot Count(Expr:Wildcard) with DataFrame API" (#5473)
  • mark Expr::Wildcard and Expr::QualifiedWildcard as #[deprecated]
  • fully remove them
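
As a rough sketch of the deprecation step in the list above, Rust's built-in `#[deprecated]` attribute can be placed on individual enum variants. The enum below is a hypothetical stand-in, not the real `Expr` definition, and the note text is only an example.

```rust
// Hypothetical stand-in enum; only the attribute usage is the point here.
#[derive(Debug)]
enum Expr {
    Column(String),
    #[deprecated(note = "wildcards are expanded eagerly; use concrete columns instead")]
    Wildcard,
}

/// Internal code that must still recognize the old variant during the
/// deprecation window can opt out of the lint locally.
#[allow(deprecated)]
fn is_wildcard(expr: &Expr) -> bool {
    matches!(expr, Expr::Wildcard)
}
```

Downstream code that constructs or matches on `Expr::Wildcard` would then get a compiler warning carrying the note, while still compiling until the variant is removed in a later release.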

Labels: enhancement
