Improve substrait NameTracker so it doesn't require uuids

The following PR adds UUIDs to certain Substrait identifiers to disambiguate them, but this may make the plans non-reproducible. @Blizzara has some ideas on how we can avoid the UUIDs:
- https://github.com/apache/datafusion/pull/17299


FWIW, I looked a bit at what it'd take to fix the tracker. I think the core of the issue is that DF checks name ambiguity in two ways: there's the AmbiguousColumn error you're running into, and there's the `validate_unique_names()` function, which gets called when the Projection is created. The former requires unique non-qualified names, while the latter requires unique schema names (which _can_ be qualified).
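To make the difference between the two checks concrete, here is a simplified, self-contained Rust model. It is not DataFusion code: each output column is reduced to the `(qualifier, name)` pair that `Expr::qualified_name()` returns plus its schema name, and both checks are approximated as plain uniqueness tests. `OutputColumn`, `unqualified_names_unique`, and `schema_names_unique` are hypothetical names. For `SELECT data.a, CAST(data.a as string) FROM data`, the unqualified names differ ("a" vs. "data.a") while the schema names collide.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified model of a projected output column.
/// `qualifier` and `name` mimic the pair returned by `Expr::qualified_name()`;
/// `schema_name` mimics the expression's schema name.
#[allow(dead_code)]
struct OutputColumn {
    qualifier: Option<String>,
    name: String,
    schema_name: String,
}

/// Rough stand-in for the ambiguity check: unqualified names must be unique.
fn unqualified_names_unique(cols: &[OutputColumn]) -> bool {
    let mut seen = HashSet::new();
    cols.iter().all(|c| seen.insert(c.name.as_str()))
}

/// Rough stand-in for `validate_unique_names()`: schema names must be unique.
fn schema_names_unique(cols: &[OutputColumn]) -> bool {
    let mut seen = HashSet::new();
    cols.iter().all(|c| seen.insert(c.schema_name.as_str()))
}

fn main() {
    // SELECT data.a, CAST(data.a AS string) FROM data
    let cols = vec![
        // Column `data.a`: qualified name ("data", "a"), schema name "data.a"
        OutputColumn {
            qualifier: Some("data".to_string()),
            name: "a".to_string(),
            schema_name: "data.a".to_string(),
        },
        // CAST(data.a AS string): qualified name (None, "data.a"), schema name "data.a"
        OutputColumn {
            qualifier: None,
            name: "data.a".to_string(),
            schema_name: "data.a".to_string(),
        },
    ];
    println!("unqualified names unique: {}", unqualified_names_unique(&cols)); // true
    println!("schema names unique: {}", schema_names_unique(&cols)); // false
}
```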

An easy fix for the former would be to change `name_for_alias()` to `qualified_name().1` here: https://github.com/apache/datafusion/blob/1d9e13845021c2e82a012c2e83938c2a7661f295/datafusion/substrait/src/logical_plan/consumer/utils.rs#L398. However, that regresses the latter check (including in the test case for this PR). The project node would then contain an expr `CAST(B.C as Utf8)` with qualified name ([no qualifier], "B.C") and schema name "B.C", as well as a reference to the original column `B.C` with qualified name ("B", "C") and schema name "B.C". Because the name parts of the two qualified names differ ("B.C" vs. "C"), the tracker (after the change I propose) would rename neither expression, and the project would then fail the `validate_unique_names()` check. So a proper fix may require NameTracker to track **both the schema name and the name-part of the qualified name**, and to rename until both are unique.
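A minimal sketch of that idea, assuming a simplified tracker that receives each expression's name-part and schema name as plain strings. `DualNameTracker`, `get_unique_name`, and the `__temp__<n>` suffix are illustrative assumptions, not DataFusion's actual `NameTracker` API. It relies on the fact that an aliased expression uses the alias as both its name-part and its schema name, so one candidate must be free in both sets. Because the suffix is a deterministic counter rather than a UUID, the same input plan always yields the same aliases.

```rust
use std::collections::HashSet;

/// Hypothetical tracker that records both the name-part of the qualified name
/// and the schema name, renaming until *both* are unused.
#[derive(Default)]
struct DualNameTracker {
    seen_names: HashSet<String>,
    seen_schema_names: HashSet<String>,
}

impl DualNameTracker {
    /// Returns `None` if the expression can keep its name, or `Some(alias)`
    /// with a deterministic alias that is unique under both checks.
    fn get_unique_name(&mut self, name: &str, schema_name: &str) -> Option<String> {
        if !self.seen_names.contains(name) && !self.seen_schema_names.contains(schema_name) {
            self.seen_names.insert(name.to_string());
            self.seen_schema_names.insert(schema_name.to_string());
            return None;
        }
        // The alias becomes both the name-part and the schema name,
        // so the candidate must be absent from both sets.
        let mut counter = 0u32;
        loop {
            let candidate = format!("{name}__temp__{counter}");
            if !self.seen_names.contains(&candidate)
                && !self.seen_schema_names.contains(&candidate)
            {
                self.seen_names.insert(candidate.clone());
                self.seen_schema_names.insert(candidate.clone());
                return Some(candidate);
            }
            counter += 1;
        }
    }
}

fn main() {
    let mut tracker = DualNameTracker::default();
    // Column `B.C`: qualified name ("B", "C"), schema name "B.C" -> kept as-is
    println!("{:?}", tracker.get_unique_name("C", "B.C")); // None
    // `CAST(B.C AS Utf8)`: qualified name (None, "B.C"), schema name "B.C" -> renamed
    println!("{:?}", tracker.get_unique_name("B.C", "B.C")); // Some("B.C__temp__0")
}
```

A tracker that only compared name-parts would let the cast through here ("B.C" vs. "C"); checking the schema name as well is what triggers the rename.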

(A simple example of this `CAST` behavior and `validate_unique_names()` is that `SELECT data.a, CAST(data.a as string) from data;` also fails in datafusion-cli.)

_Originally posted by @Blizzara in https://github.com/apache/datafusion/pull/17299#discussion_r2330926518_
            

Improve substrait NameTracker so it doesn't require uuids #17508

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Improve substrait NameTracker so it doesn't require uuids #17508

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions