[SPARK-11884] Drop multiple columns in the DataFrame API #9862
Conversation
This would need a varargs annotation and I don't think that we should duplicate the column resolution logic. Otherwise it might fall out of sync.
|
I am open to rewriting the column resolution logic in the new method, but I may need some pointers since I am not familiar with this area of the codebase. |
|
Why not just have the single-column version delegate to this one instead of copying the code? |
|
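Putting the two suggestions together, the shape being proposed looks roughly like the sketch below (simplified, not the verbatim merged code): the varargs overload owns the resolution logic and carries the annotation that Java callers need, while the existing single-name drop just forwards to it.

// Sketch of the proposed method shape inside DataFrame (simplified).
// @varargs makes the compiler also emit a Java-friendly array-taking signature.
@scala.annotation.varargs
def drop(colNames: String*): DataFrame = {
  // the single copy of the column-resolution logic lives here
  // (see the resolver-based sketch later in the thread)
  ???
}

// The existing single-column variant simply delegates, so the two overloads
// cannot fall out of sync.
def drop(colName: String): DataFrame = drop(Seq(colName): _*)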
Test build #46426 has finished for PR 9862 at commit
|
|
Yeah, I had this idea in mind. |
|
Test build #46439 has finished for PR 9862 at commit
|
|
Jenkins, test this please |
|
Test build #46434 has finished for PR 9862 at commit
|
|
Maybe define a unit test, just in case? |
|
I looked at sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala but testData has only one column |
|
I suggest having a look at SQLTestData. |
|
We are trying to delete that class. Just define a DataFrame in the test: val df = Seq((1, 2, 3)).toDF("a", "b", "c") |
|
Thanks for the prompt hint, Michael. |
|
With that I got a failure. Some hint? |
|
The answer is in the test output: checkAnswer(df, src.collect().map(x => Row(x.getInt(2))).toSeq) |
|
Thanks, Ben |
|
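For reference, a minimal sketch of the kind of test being discussed, assuming it sits in a QueryTest-based suite such as DataFrameSuite, with sqlContext.implicits._ and org.apache.spark.sql.Row in scope (the test name is illustrative):

test("drop two columns by name") {
  // build a small three-column DataFrame inline, as suggested above
  val src = Seq((1, 2, 3)).toDF("a", "b", "c")
  // the new multi-column overload under review
  val df = src.drop("a", "b")
  assert(df.columns.toSeq === Seq("c"))
  // compare against the remaining column of the original data, as hinted above
  checkAnswer(df, src.collect().map(x => Row(x.getInt(2))).toSeq)
}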
Test build #46442 has finished for PR 9862 at commit
|
|
Test build #46443 has finished for PR 9862 at commit
|
|
Jenkins, retest this please |
|
Jenkins, test this please |
|
Test build #46459 has finished for PR 9862 at commit
|
|
Jenkins, test this please. |
|
Test build #47171 has finished for PR 9862 at commit
|
|
Jenkins, test this please |
|
Test build #47172 has finished for PR 9862 at commit
|
|
Jenkins, test this please |
|
Test build #47173 has finished for PR 9862 at commit
|
|
Jenkins, test this please |
|
Test build #47182 has finished for PR 9862 at commit
|
Why use contains instead of sqlContext.analyzer.resolver?
how about:
val resolver = sqlContext.analyzer.resolver
val remainingCols = schema.filter(f => colNames.forall(n => !resolver(f.name, n))).map(f => Column(f.name))
|
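A sketch of the varargs drop body with that suggestion applied, as it would sit inside DataFrame.scala (sqlContext.analyzer is only visible inside the sql package); not necessarily the verbatim merged code:

@scala.annotation.varargs
def drop(colNames: String*): DataFrame = {
  // The analyzer's resolver honours the session's case-sensitivity setting,
  // unlike a plain contains check on the raw strings.
  val resolver = sqlContext.analyzer.resolver
  val remainingCols =
    schema.filter(f => colNames.forall(n => !resolver(f.name, n))).map(f => Column(f.name))
  if (remainingCols.size == schema.size) {
    // no column matched any of the given names: return this DataFrame unchanged
    this
  } else {
    this.select(remainingCols: _*)
  }
}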
Test build #47186 has finished for PR 9862 at commit
|
|
Test build #47188 has finished for PR 9862 at commit
|
|
@cloud-fan @marmbrus |
How about just checkAnswer(df, Row(3))?
|
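With the one-row input from the earlier test sketch, dropping "a" and "b" leaves only the value 3, so the expected answer can be written directly:

// equivalent, simpler assertion for the one-row test data above
checkAnswer(src.drop("a", "b"), Row(3))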
Test build #47214 has finished for PR 9862 at commit
|
|
LGTM |
|
@marmbrus |
|
Thanks, merging to master. |
|
Thanks for the reviews, Michael and Wenchen |
|
There are two drop variants for a single column, one taking a column name and one taking a Column. But there is only one drop accepting multiple column names; why is there no version accepting multiple Columns? |
|
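For illustration only, a hypothetical helper showing how a Column-accepting multi-drop could be built on top of the existing single-Column drop; this overload is not part of this PR and the helper name is made up:

import org.apache.spark.sql.{Column, DataFrame}

// Hypothetical sketch, not Spark's API: fold the existing drop(col: Column)
// over the given columns, dropping them one at a time.
def dropColumns(df: DataFrame, cols: Column*): DataFrame =
  cols.foldLeft(df)((d, c) => d.drop(c))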
This PR has already been closed; I can send out another PR if other people think that variant is needed. |
See the thread Ben started:
http://search-hadoop.com/m/q3RTtveEuhjsr7g/
This PR adds a drop() method to DataFrame that accepts multiple column names.