[SPARK-19452][SparkR] Fix bug in the name assignment method #16794

actuaryzhang · 2017-02-03T20:27:10Z

What changes were proposed in this pull request?

The names method fails to check for validity of the assignment values. This can be fixed by calling colnames within names.

How was this patch tested?

new tests.

actuaryzhang · 2017-02-03T20:27:50Z

An example illustrating the issue:

df <- suppressWarnings(createDataFrame(iris))
# this is error
colnames(df) <- NULL
# this should report error
names(df) <- NULL

@felixcheung @wangmiao1981

SparkQA · 2017-02-03T21:13:32Z

Test build #72321 has finished for PR 16794 at commit fce4b6f.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

actuaryzhang · 2017-02-03T22:14:13Z

Not sure why the unit test on kmeans summary failed since nothing was changed there. Also, all unit tests passed on my computer.

felixcheung · 2017-02-03T22:20:30Z

good catch, seems reasonable (although might be a small breaking behavior change)

Error in tests is with doc generation:

Error: processing vignette 'sparkr-vignettes.Rmd' failed with diagnostics:
error in evaluating the argument 'object' in selecting a method for function 'summary': Error: object 'kmeansModel' not found

wangmiao1981 · 2017-02-03T22:23:19Z

Warning in FUN(X[[1L]], ...) :
Use Sepal_Length instead of Sepal.Length as column name
Warning in FUN(X[[2L]], ...) :
Use Sepal_Width instead of Sepal.Width as column name
Warning in FUN(X[[3L]], ...) :
Use Petal_Length instead of Petal.Length as column name
Warning in FUN(X[[4L]], ...) :
Use Petal_Width instead of Petal.Width as column name
Warning in FUN(X[[1L]], ...) :
Use Sepal_Length instead of Sepal.Length as column name
Warning in FUN(X[[2L]], ...) :
Use Sepal_Width instead of Sepal.Width as column name
Warning in FUN(X[[3L]], ...) :
Use Petal_Length instead of Petal.Length as column name
Warning in FUN(X[[4L]], ...) :
Use Petal_Width instead of Petal.Width as column name
Warning in FUN(X[[1L]], ...) :
Use Sepal_Length instead of Sepal.Length as column name
Warning in FUN(X[[2L]], ...) :
Use Sepal_Width instead of Sepal.Width as column name
Warning in FUN(X[[3L]], ...) :
Use Petal_Length instead of Petal.Length as column name
Warning in FUN(X[[4L]], ...) :
Use Petal_Width instead of Petal.Width as column name
Warning in FUN(X[[4L]], ...) :
Use resid_ds instead of resid.ds as column name
Warning in FUN(X[[6L]], ...) :
Use ecog_ps instead of ecog.ps as column name
Warning in FUN(X[[1L]], ...) :
Use GNP_deflator instead of GNP.deflator as column name
Warning in FUN(X[[4L]], ...) :
Use Armed_Forces instead of Armed.Forces as column name
Warning in FUN(X[[1L]], ...) :
Use GNP_deflator instead of GNP.deflator as column name
Warning in FUN(X[[4L]], ...) :
Use Armed_Forces instead of Armed.Forces as column name
Quitting from lines 748-753 (sparkr-vignettes.Rmd)
Error: processing vignette 'sparkr-vignettes.Rmd' failed with diagnostics:
error in evaluating the argument 'object' in selecting a method for function 'summary': Error: object 'kmeansModel' not found

What are these warnings for? Seems relevant.

wangmiao1981 · 2017-02-03T22:24:17Z

try {r, warning=FALSE} for the warning cases?

SparkQA · 2017-02-03T23:01:59Z

Test build #72327 has finished for PR 16794 at commit 723b0a4.

This patch fails MiMa tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-02-04T00:00:28Z

Test build #72330 has finished for PR 16794 at commit f7dc3c6.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

wangmiao1981 · 2017-02-04T01:07:11Z

still document failures.

wangmiao1981 · 2017-02-04T01:07:19Z

retest this please.

actuaryzhang · 2017-02-04T01:24:29Z

@felixcheung @wangmiao1981
I spent quite some time on this b/c I could not replicate the results and all tests on my end worked.
Then I updated my local spark with a pull request and found that all these error messages are introduced in #16767. I created another PR to fix this #16799 since this is a separate issue. Please review and merge #16799 first.

SparkQA · 2017-02-04T01:47:50Z

Test build #72335 has finished for PR 16794 at commit f7dc3c6.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-02-04T02:05:31Z

retest this please

SparkQA · 2017-02-04T02:51:01Z

Test build #72345 has finished for PR 16794 at commit f7dc3c6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung · 2017-02-04T08:18:46Z

I'd prefer not to {r, warning=FALSE} on the whole vignettes block - it could mask unexpected warnings. But good point on it being very noisy - I was concerned to have suppressWarnings(createDataFrame(iris)) because it might seems confusing to user reading the example not understanding why we would have warning there.

To address all these noise, I'd propose either

we really should fix the root problem with . in column names SPARK-12191, SPARK-11976
or we should really move examples and documentation away from dataset that has . in column names

felixcheung · 2017-02-04T08:20:59Z

in any case, this PR LGTM. I'll target master for this fix.

felixcheung · 2017-02-05T00:14:57Z

opened SPARK-19460.

actuaryzhang · 2017-02-05T01:57:54Z

@felixcheung I am in favor of the first proposal to support dot in names. I'm curious why this was not supported yet since we can create DataFrame in Spark with dot in names?

val df = Seq((1.0, 2.0), (2.0, 3.0)).toDF("a.b", "b.c")

felixcheung · 2017-02-05T02:18:05Z

that's awesome - it's likely a recent change we have missed. If you dig through the history of various related JIRAs you should be able to pinpoint a couple of places (at least 2 or 3, on top of my head) that we are replacing . with _ on the R side. If it works as-is by removing those replacement R code then we are good. But, I wouldn't be surprised if we are going to run into some R bugs - R really like its . and it carries special meaning.

I'd really appreciate it if someone could look into this - the . -> _ has been problematic for a long time.

actuaryzhang · 2017-02-05T02:24:27Z

@felixcheung I'll be happy to look into this and fix it. Do you want me to open a new JIRA or just create a PR against SPARK-19460?

felixcheung · 2017-02-05T05:53:00Z

that would be great @actuaryzhang - I think SPARK-11976 would be the place to start.

felixcheung · 2017-02-05T19:38:19Z

merged to master.

## What changes were proposed in this pull request? The names method fails to check for validity of the assignment values. This can be fixed by calling colnames within names. ## How was this patch tested? new tests. Author: actuaryzhang <actuaryzhang10@gmail.com> Closes apache#16794 from actuaryzhang/sparkRNames.

fix bug in names and add test

fce4b6f

trigger test

723b0a4

trigger test

f7dc3c6

actuaryzhang mentioned this pull request Feb 4, 2017

[SPARK-19386][SparkR][Followup] fix error in vignettes #16799

Closed

actuaryzhang closed this Feb 5, 2017

actuaryzhang reopened this Feb 5, 2017

asfgit closed this in b94f4b6 Feb 5, 2017

actuaryzhang deleted the sparkRNames branch February 6, 2017 22:49

[SPARK-19452][SparkR] Fix bug in the name assignment method #16794

[SPARK-19452][SparkR] Fix bug in the name assignment method #16794

Uh oh!

Conversation

actuaryzhang commented Feb 3, 2017

What changes were proposed in this pull request?

How was this patch tested?

Uh oh!

actuaryzhang commented Feb 3, 2017

Uh oh!

SparkQA commented Feb 3, 2017

Uh oh!

actuaryzhang commented Feb 3, 2017

Uh oh!

felixcheung commented Feb 3, 2017

Uh oh!

wangmiao1981 commented Feb 3, 2017

Uh oh!

wangmiao1981 commented Feb 3, 2017

Uh oh!

SparkQA commented Feb 3, 2017

Uh oh!

SparkQA commented Feb 4, 2017

Uh oh!

wangmiao1981 commented Feb 4, 2017

Uh oh!

wangmiao1981 commented Feb 4, 2017

Uh oh!

actuaryzhang commented Feb 4, 2017

Uh oh!

SparkQA commented Feb 4, 2017

Uh oh!

gatorsmile commented Feb 4, 2017

Uh oh!

SparkQA commented Feb 4, 2017

Uh oh!

felixcheung commented Feb 4, 2017

Uh oh!

felixcheung commented Feb 4, 2017

Uh oh!

felixcheung commented Feb 5, 2017

Uh oh!

actuaryzhang commented Feb 5, 2017

Uh oh!

felixcheung commented Feb 5, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

actuaryzhang commented Feb 5, 2017

Uh oh!

felixcheung commented Feb 5, 2017

Uh oh!

felixcheung commented Feb 5, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

felixcheung commented Feb 5, 2017 •

edited

Loading