Fix duplicate column name bug #4353

Jakdaw · 2019-11-13T17:53:56Z

Bug Fix

Description

The existing code to ensure that we don't have duplicate column names has a bug if it picks a new column name that's the same as an existing column. This can be achieved with:

select 1 a, 2 a1, 3 a;

Changing if to while fixes this.

…a, 2 a1, 3 a;"

kravets-levko · 2019-11-13T18:01:25Z

redash/query_runner/__init__.py

@@ -105,7 +105,7 @@ def fetch_columns(self, columns):

        for col in columns:
            column_name = col[0]
-            if column_name in column_names:
+            while column_name in column_names:
                column_name = "{}{}".format(column_name, duplicates_counter)


Technically this code will work. But it may generate strange column names.

Assume we have query result with columns a a1 a12 a - last column needs unique name. This code will generate a123 (a (used) -> adds '1' -> a1 (used) -> adds '2' -> a12 (used) -> adds '3' -> a123, while just a2 would be totally fine.

What if on each iteration you will use original columns name + counter instead of concatenating counter to result from previous iteration?

To be honest I thought the situations where this might happen were pretty contrived so exactly what's chosen not too important; but wanted to fix it because the client code really relies upon the names being unique, and it's behaviour when they're not is just plain wrong!

Yes what you're suggesting sounds like an improvement, although you'd still have weirdness, like:

(a, a, a1) would map to (a, a1, a11)

... so we could perhaps alternatively, just drop the numbers altogether and say adding "_" characters until it's unique.

(a, a, a_) would then map to (a, a_, a__)

If I was going to be really pedantic I might want to limit column renames to columns that had a duplicate name before any of this logic was applied. So, for the previous example, have:

(a, a, a_) map to (a, a__, a_)

... then again I'm not sure whether any of these options are really much better; AND they might well mean that existing queries start to have different column names, and given all the Visualisation settings are based upon column names we're going to break people's Vis settings for existing queries by changing this behaviour. Which brings me back to where I started.... ;-)

I don't have a strong view - does anyone else?

There is no perfect solution I guess. The improvement I suggested in previous comment will generate shorter names for non-unique columns (e.g. in my example, second a column will get name a2 instead of a123). As about your case - it also could be solved: algorithm should just sort column names (descending) and then enumerate them: so (a, a, a1) would be checked in order (a1, a, a) and mapped to (a1, a2, a). WDYT?

+1 to @kravets-levko suggestion of sorting + using a single counter instead of appending (the end result will be less confusing to the user).

guidopetri · 2023-07-21T19:40:22Z

@Jakdaw , thanks for the PR! We've updated a lot of things now that we're Community-driven so - if you're still interested in getting this merged - would you mind rebasing off master to re-run the CI, as well as updating merge conflicts?

We're trying to clean up our PR todo list, so if you're not interested, that's fine - we'll close the PR in about a week if we don't hear back. If you're interested in reopening the PR afterwards, we would also very much welcome that.

guidopetri · 2023-07-22T13:08:02Z

Closing in favor of #4600

Fix duplicate column name bug that could be triggered with "select 1 …

203f786

…a, 2 a1, 3 a;"

kravets-levko reviewed Nov 13, 2019

View reviewed changes

kravets-levko added the Backend label Nov 13, 2019

weekly-digest bot mentioned this pull request Nov 18, 2019

Weekly Digest (11 November, 2019 - 18 November, 2019) #4368

Closed

rauchy mentioned this pull request Jan 30, 2020

Base duplicate index on column names #4600

Merged

1 task

guidopetri added the 1-week-review label Jul 21, 2023

guidopetri closed this Jul 22, 2023

justinclift removed the 1-week-review label Jul 22, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix duplicate column name bug #4353

Fix duplicate column name bug #4353

Jakdaw commented Nov 13, 2019

kravets-levko Nov 13, 2019

Jakdaw Nov 13, 2019

kravets-levko Nov 13, 2019

arikfr Nov 17, 2019

guidopetri commented Jul 21, 2023

guidopetri commented Jul 22, 2023

Fix duplicate column name bug #4353

Fix duplicate column name bug #4353

Conversation

Jakdaw commented Nov 13, 2019

Description

kravets-levko Nov 13, 2019

Choose a reason for hiding this comment

Jakdaw Nov 13, 2019

Choose a reason for hiding this comment

kravets-levko Nov 13, 2019

Choose a reason for hiding this comment

arikfr Nov 17, 2019

Choose a reason for hiding this comment

guidopetri commented Jul 21, 2023

guidopetri commented Jul 22, 2023