Description
The following PR adds UUIDs to certain Substrait identifiers to disambiguate them, but this may make the plans non-reproducible. @Blizzara has some ideas about how we can avoid the UUIDs:
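A minimal sketch of one way to disambiguate names without giving up reproducibility, assuming a counter suffix is an acceptable substitute for a UUID. This is not the approach from the PR or from @Blizzara; the `DeterministicNameTracker` type, the `disambiguate` helper, and the `__N` suffix format are all hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical tracker that disambiguates duplicate names deterministically,
/// by appending a counter suffix instead of a random UUID.
#[derive(Default)]
struct DeterministicNameTracker {
    seen: HashSet<String>,
}

impl DeterministicNameTracker {
    /// Returns `name` unchanged the first time it is seen; afterwards returns
    /// the first unused `name__1`, `name__2`, ... (suffix format is an assumption).
    fn unique_name(&mut self, name: &str) -> String {
        if self.seen.insert(name.to_string()) {
            return name.to_string();
        }
        let mut counter = 1;
        loop {
            let candidate = format!("{}__{}", name, counter);
            if self.seen.insert(candidate.clone()) {
                return candidate;
            }
            counter += 1;
        }
    }
}

/// Renames a list of output names using a fresh tracker.
fn disambiguate(names: &[&str]) -> Vec<String> {
    let mut tracker = DeterministicNameTracker::default();
    names.iter().map(|n| tracker.unique_name(n)).collect()
}

fn main() {
    let names = ["a", "a", "b", "a"];
    let first = disambiguate(&names);
    let second = disambiguate(&names);
    // The same input always yields the same output, so repeated conversions match.
    assert_eq!(first, second);
    println!("{:?}", first); // ["a", "a__1", "b", "a__2"]
}
```

Unlike a UUID, the generated name depends only on the order in which names are encountered, so converting the same plan twice produces identical identifiers.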
FWIW, I looked a bit at what it would take to fix the tracker. I think the core of the issue is that DataFusion checks name ambiguity in two ways: there is the AmbiguousColumn error you're running into, and there is the validate_unique_names() function, which is called when the Project is created. The former requires unique unqualified names, while the latter requires unique schema names (which can be qualified).
An easy fix for the former would be to change name_for_alias() to qualified_name().1 (the name part of the returned qualifier/name tuple) here:
```rust
match self.get_unique_name(expr.name_for_alias()?) {
```
However, that change alone would not be enough when a Project contains both CAST(B.C AS Utf8), with the qualified name (no qualifier, "B.C") and the schema name "B.C", and a reference to the original column B.C, with the qualified name ("B", "C") and also the schema name "B.C". Because the name parts of the two qualified names differ, neither expression would be renamed (after the change I propose), and the Project would then fail the validate_unique_names() check. So a proper fix may need NameTracker to track both the schema name and the name part of the qualified name, and to keep renaming until both are unique.
(A simple example of this CAST and validate_unique_names() interaction: SELECT data.a, CAST(data.a AS string) FROM data; also fails in datafusion-cli.)
Originally posted by @Blizzara in #17299 (comment)
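The two-name approach described in the comment can be sketched as a toy model that tracks only names, not real expressions. Everything here is hypothetical: `ExprNames`, `DualNameTracker`, the `__N` alias suffix, and the two check functions, which are simplified stand-ins for the AmbiguousColumn check and validate_unique_names(), not DataFusion's code.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified view of one projected expression's names.
#[derive(Debug, Clone)]
struct ExprNames {
    /// Name part of the qualified name: "C" for column B.C,
    /// "B.C" for CAST(B.C AS Utf8), which has no qualifier.
    name: String,
    /// Schema name, which may be qualified: "B.C" for both expressions above.
    schema_name: String,
}

impl ExprNames {
    fn new(name: &str, schema_name: &str) -> Self {
        Self { name: name.to_string(), schema_name: schema_name.to_string() }
    }
}

/// Toy stand-in for the AmbiguousColumn check: name parts must be unique.
fn name_parts_unique(exprs: &[ExprNames]) -> bool {
    let mut seen = HashSet::new();
    exprs.iter().all(|e| seen.insert(e.name.as_str()))
}

/// Toy stand-in for validate_unique_names(): schema names must be unique.
fn schema_names_unique(exprs: &[ExprNames]) -> bool {
    let mut seen = HashSet::new();
    exprs.iter().all(|e| seen.insert(e.schema_name.as_str()))
}

/// Hypothetical tracker that records both names and aliases an expression
/// until its name part and its schema name are both unused.
#[derive(Default)]
struct DualNameTracker {
    seen_names: HashSet<String>,
    seen_schema_names: HashSet<String>,
}

impl DualNameTracker {
    fn is_free(&self, name: &str, schema_name: &str) -> bool {
        !self.seen_names.contains(name) && !self.seen_schema_names.contains(schema_name)
    }

    fn record(&mut self, name: &str, schema_name: &str) {
        self.seen_names.insert(name.to_string());
        self.seen_schema_names.insert(schema_name.to_string());
    }

    /// Returns the expression's names, aliased if either name is already taken.
    fn track(&mut self, e: &ExprNames) -> ExprNames {
        if self.is_free(&e.name, &e.schema_name) {
            self.record(&e.name, &e.schema_name);
            return e.clone();
        }
        // An unqualified alias uses the same string as name part and schema name.
        let mut i = 1;
        loop {
            let alias = format!("{}__{}", e.name, i);
            if self.is_free(&alias, &alias) {
                self.record(&alias, &alias);
                return ExprNames::new(&alias, &alias);
            }
            i += 1;
        }
    }
}

/// Runs every expression of a projection through a fresh tracker.
fn disambiguate(exprs: &[ExprNames]) -> Vec<ExprNames> {
    let mut tracker = DualNameTracker::default();
    exprs.iter().map(|e| tracker.track(e)).collect()
}

fn main() {
    // CAST(B.C AS Utf8) followed by the original column B.C.
    let exprs = [ExprNames::new("B.C", "B.C"), ExprNames::new("C", "B.C")];

    // The name parts already differ, so a tracker keyed only on them renames nothing,
    // while the schema names collide.
    assert!(name_parts_unique(&exprs));
    assert!(!schema_names_unique(&exprs));

    // Tracking both names aliases the column, and both toy checks then pass.
    let fixed = disambiguate(&exprs);
    assert!(name_parts_unique(&fixed) && schema_names_unique(&fixed));
    println!("{:?}", fixed.iter().map(|e| e.schema_name.as_str()).collect::<Vec<_>>()); // ["B.C", "C__1"]
}
```

Aliasing the later expression rather than the earlier one keeps the first occurrence's names stable, and because the suffix depends only on input order, the result stays reproducible.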