@PavithraRamachandran
Contributor

What changes were proposed in this pull request?

Root Cause:
When a DataFrame is created from a select statement with spark.sql.parser.quotedRegexColumnNames=true and DataFrame fill is then called, fillCol in DataFrameNaFunctions explicitly adds backticks (`) to the column names. The quoted column name is then mistaken for a regex and turned into an UnresolvedRegex, which causes resolution of the coalesce expression to fail.

Observation
When we create the DataFrame from a select statement that uses a regex, valid column names are returned after the regex filter is applied, so adding backticks to the column name in this flow is not needed. To check the impact, select statements with regexes were tested; there was no impact when executing without the backticks.

After Fix
When the column name is passed to the DataFrame column method, backticks (`) are no longer added, because the value received is not a regular expression but a valid column name.
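
To illustrate the root cause and the fix, here is a minimal sketch in plain Python. It is a toy model, not Spark code: `ESCAPED_IDENTIFIER`, `resolve_column`, and `fill_col` are hypothetical names made up for this example, and the pattern is only an assumed stand-in for how a backtick-quoted identifier is recognized when quoted regex column names are enabled.

```python
import re

# Hypothetical stand-in for the pattern that recognizes a backtick-quoted identifier.
ESCAPED_IDENTIFIER = re.compile(r"`(.+)`")


def resolve_column(name, columns, quoted_regex_enabled):
    """Toy model of column lookup; returns a (kind, payload) tuple."""
    match = ESCAPED_IDENTIFIER.fullmatch(name)
    if quoted_regex_enabled and match:
        # A backtick-quoted name is treated as a regex pattern and left unresolved.
        return ("unresolved_regex", match.group(1))
    plain = match.group(1) if match else name
    if plain not in columns:
        raise KeyError(plain)
    return ("resolved", plain)


def fill_col(name, columns, quoted_regex_enabled, add_backticks):
    """Mimics the column lookup that precedes coalesce(column, replacement)."""
    lookup = f"`{name}`" if add_backticks else name
    return resolve_column(lookup, columns, quoted_regex_enabled)


columns = ["id", "name"]
# Before the fix: backticks are added, so the name is taken for a regex.
before_fix = fill_col("name", columns, quoted_regex_enabled=True, add_backticks=True)
# After the fix: the plain, already valid column name resolves directly.
after_fix = fill_col("name", columns, quoted_regex_enabled=True, add_backticks=False)
print(before_fix)
print(after_fix)
```

In this model the lookup before the fix yields an unresolved regex, which is the kind of expression that coalesce cannot resolve, while the lookup after the fix yields a resolved column.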

Why are the changes needed?

With this change, the column name is no longer treated as a regex, so the proper Column function is used and the expression no longer fails to resolve.

Does this PR introduce any user-facing change?

NA

How was this patch tested?

unit test

@dongjoon-hyun
Member

ok to test

@dongjoon-hyun changed the title from "[SPARK-28897][Core]'coalesce' error when executing dataframe.na.fill" to "[SPARK-28897][SQL] 'coalesce' error when executing dataframe.na.fill" on Jan 30, 2020
@dongjoon-hyun
Member

Hi, @PavithraRamachandran. Thank you for making a PR. Could you open this against master first? To prevent any regression in 3.0.0, we always merge to master first. I'll close this.

@PavithraRamachandran
Contributor Author

@dongjoon-hyun This issue is not present in master. It was fixed by implementation changes made for https://issues.apache.org/jira/browse/SPARK-29890

@dongjoon-hyun
Member

Then, can we backport that? We want to minimize differences in implementation between branches.

@dongjoon-hyun
Member

dongjoon-hyun commented Jan 30, 2020

Did you ping on the JIRA or that PR? You should do that first.

@PavithraRamachandran
Contributor Author

PavithraRamachandran commented Jan 30, 2020

I pinged on the above JIRA and was working on it. The implementation change made to resolve SPARK-29890 fixed this JIRA issue in master too. I was not sure whether all of the changes made for SPARK-29890 are needed in Spark 2.4, so I raised my own fix. If we can backport SPARK-29890, then we can close this PR and also close the above JIRA through the backport.

@dongjoon-hyun
Member

Sorry, but are you sure? I cannot find your comment on https://issues.apache.org/jira/browse/SPARK-29890 .

i pinged on the above jira and was working it.

@dongjoon-hyun
Member

First of all, we need to close SPARK-28897 as a duplicate of SPARK-29890. Then, we need to ask for a backport. That's the way.

@dongjoon-hyun
Member

I understand your feeling, but we prefer to have a consistent JIRA and patch for the same issue across the different branches. BTW, we don't backport everything. Since you asked, I'm pinging on the SPARK-28897 PR. Let's see.
