[SPARK-28897][SQL] 'coalesce' error when executing dataframe.na.fill #27392

PavithraRamachandran · 2020-01-30T06:00:41Z

What changes were proposed in this pull request?

Root Cause:
When a DataFrame is created from a select statement with spark.sql.parser.quotedRegexColumnNames=true and DataFrame fill is then called, fillCol in DataFrameNaFunctions explicitly wraps each column name in backticks (`). The quoted column name is then misinterpreted as a regex and set as an UnresolvedRegex, which causes the coalesce expression to fail to resolve.

Observation
When the DataFrame is created from a select statement that uses a regex, valid column names are returned after the regex filter is applied, so adding backticks to the column name in this flow is not needed. To check the impact, select statements with a regex were run without the backticks, and no impact was observed.

After Fix
When passing the column name to the DataFrame column method, backticks (`) are no longer added, because the value received is a valid column name, not a regular expression.
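To make the mechanism concrete, here is a toy model in plain Python. It is not Spark's code: `resolve_col`, `fill_col_before`, `fill_col_after`, and the `QUOTED` pattern are hypothetical names that only mimic the behaviour described above, on the assumption that a backtick-wrapped name is read as a regex whenever the quoted-regex flag is on.

```python
import re

# Hypothetical stand-in for a backtick-quoted identifier pattern.
# This is an assumption for illustration, not Spark's parser.
QUOTED = re.compile(r"^`(.+)`$")


def resolve_col(name, quoted_regex_enabled):
    """Classify a column reference as a regex or a plain column (toy model)."""
    m = QUOTED.match(name)
    if quoted_regex_enabled and m:
        # With the flag on, a backticked name is read as a regex pattern.
        return ("unresolved_regex", m.group(1))
    # Otherwise it is a plain column reference, with any backticks stripped.
    return ("column", m.group(1) if m else name)


def fill_col_before(name, flag):
    # Behaviour before the fix: the name is always wrapped in backticks.
    return resolve_col("`" + name + "`", flag)


def fill_col_after(name, flag):
    # Behaviour after the fix: the name is passed through unchanged.
    return resolve_col(name, flag)


print(fill_col_before("age", True))   # classified as a regex
print(fill_col_after("age", True))    # classified as a plain column
```

In this model the wrapped name only becomes a problem when the flag is enabled, which matches the report that the failure appears with spark.sql.parser.quotedRegexColumnNames=true.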

Why are the changes needed?

With this change, the column name is no longer treated as a regex, so it resolves to a proper Column and the expression no longer fails to resolve.

Does this PR introduce any user-facing change?

NA

How was this patch tested?

unit test

dongjoon-hyun · 2020-01-30T06:46:03Z

ok to test

dongjoon-hyun · 2020-01-30T06:47:39Z

Hi, @PavithraRamachandran . Thank you for making a PR. Could you open this to master first? To prevent any regression at 3.0.0, we always start to merge at master first. I'll close this.

PavithraRamachandran · 2020-01-30T08:19:41Z

@dongjoon-hyun this issue is not present in master. It got fixed due to some implementation changes done for https://issues.apache.org/jira/browse/SPARK-29890

dongjoon-hyun · 2020-01-30T08:25:43Z

Then, can we backport that? We want to minimize divergent implementations between branches.

dongjoon-hyun · 2020-01-30T08:28:55Z

Did you ping on the JIRA or that PR? You should do that first.

PavithraRamachandran · 2020-01-30T08:34:16Z

I pinged on the above JIRA and was working on it. The implementation change made to resolve SPARK-29890 fixed this JIRA issue in master too. I was not sure whether all the changes made for SPARK-29890 are needed in Spark 2.4, so I raised my own fix. If we can backport SPARK-29890, then we can close this PR and also close the above JIRA through the backport.

dongjoon-hyun · 2020-01-30T10:45:49Z

Sorry, but are you sure? I cannot find your comment on https://issues.apache.org/jira/browse/SPARK-29890 .

> i pinged on the above jira and was working it.

dongjoon-hyun · 2020-01-30T10:47:28Z

First of all, we need to close SPARK-28897 as a duplicate of SPARK-29890. Then, we need to ask for a backport. That's the way.

dongjoon-hyun · 2020-01-30T10:51:08Z

I understand how you feel, but we prefer to have a consistent JIRA and patch for the same issue across different branches. BTW, we don't backport everything. Since you asked, I'm pinging on the SPARK-28897 PR. Let's see.

DataFrame fill throwing exception correction

211a691

dongjoon-hyun changed the title ~~[SPARK-28897][Core]'coalesce' error when executing dataframe.na.fill~~ [SPARK-28897][SQL] 'coalesce' error when executing dataframe.na.fill Jan 30, 2020

dongjoon-hyun closed this Jan 30, 2020

dongjoon-hyun added the SQL label Jan 30, 2020

dongjoon-hyun mentioned this pull request Jan 30, 2020

[SPARK-29890][SQL] DataFrameNaFunctions.fill should handle duplicate columns #26593

Closed
