[SPARK-5704] [SQL] [PySpark] createDataFrame from RDD with columns #4498

davies · 2015-02-10T07:48:26Z

Deprecate inferSchema() and applySchema() in favor of createDataFrame(), which takes an optional schema to create a DataFrame from an RDD. The schema can be a StructType or a list of column names.
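
To make the two accepted forms of the schema argument concrete, here is a minimal pure-Python sketch. It is not the PySpark implementation: `resolve_schema` is a hypothetical helper, a list of `(name, type_name)` pairs stands in for a StructType, and the positional names `_1`, `_2`, ... used when no schema is given are an assumption of this sketch. When only column names are supplied, types are inferred from the first row.

```python
def resolve_schema(rows, schema=None):
    """Return a list of (column_name, type_name) pairs describing `rows`.

    Hypothetical helper for illustration only; a list of pairs stands in
    for a real StructType.
    """
    if not rows:
        raise ValueError("cannot infer a schema from an empty dataset")
    first = rows[0]

    if schema is None:
        # No schema given: fall back to positional names (assumption).
        names = ["_%d" % (i + 1) for i in range(len(first))]
    elif all(isinstance(s, str) for s in schema):
        # A list of column names: names are given, types are inferred.
        names = list(schema)
    else:
        # A full schema (stand-in for StructType): use it as provided.
        return [(name, type_name) for name, type_name in schema]

    if len(names) != len(first):
        raise ValueError("number of column names does not match row width")

    # Infer each column's type from the first row only.
    return [(name, type(value).__name__) for name, value in zip(names, first)]


rows = [("Alice", 1), ("Bob", 2)]
print(resolve_schema(rows, ["name", "age"]))
print(resolve_schema(rows, [("name", "string"), ("age", "long")]))
```

The point of the sketch is the dispatch on the schema argument: a list of names leaves type inference to the data, while a full schema skips inference entirely.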

SparkQA · 2015-02-10T07:52:29Z

Test build #27188 has started for PR 4498 at commit 9526e97.

This patch merges cleanly.

davies · 2015-02-10T07:53:11Z

@marmbrus @yhuai Does this work for you? cc @rxin

rxin · 2015-02-10T08:40:29Z

python/pyspark/sql/context.py

Isn't this change wrong? Shouldn't sampling be off if ratio > 0.99?

Oh, it seems the code in master is wrong.
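
The condition under discussion can be sketched in plain Python as follows. This is an illustration of the intended behavior, not the code in `context.py`: `rows_for_inference` and its `seed` parameter are hypothetical, and using only the first row when no ratio is given is an assumption of this sketch. The point is that sampling is applied only when the ratio is below 0.99; at or above that threshold all rows are used.

```python
import random


def rows_for_inference(rows, sampling_ratio=None, seed=0):
    """Pick the rows that schema inference would look at (hypothetical helper)."""
    if sampling_ratio is None:
        # Assumption: with no ratio, only the first row is inspected.
        return list(rows[:1])
    if sampling_ratio < 0.99:
        # Sample without replacement; each row is kept independently.
        rng = random.Random(seed)
        return [row for row in rows if rng.random() < sampling_ratio]
    # Ratio is (close to) 1.0: sampling is off, use every row.
    return list(rows)


rows = list(range(100))
print(len(rows_for_inference(rows)))        # first row only
print(len(rows_for_inference(rows, 1.0)))   # all rows, no sampling
```

Writing the comparison the other way round (`> 0.99`) would sample only when nearly all rows were requested, which is the inversion the comment points out.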

SparkQA · 2015-02-10T09:24:36Z

Test build #27188 has finished for PR 4498 at commit 9526e97.

This patch fails MiMa tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-02-10T09:24:39Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27188/

davies · 2015-02-10T18:20:42Z

python/pyspark/sql/context.py

@rxin Is there a better name for this? createDataFrame is still too long (longer than 'applySchema')

Talked offline, we did figure out a better name than it.

SparkQA · 2015-02-10T20:27:54Z

Test build #27229 has started for PR 4498 at commit d1bd8f2.

This patch merges cleanly.

SparkQA · 2015-02-10T21:23:32Z

Test build #27229 has finished for PR 4498 at commit d1bd8f2.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-02-10T21:23:36Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27229/

SparkQA · 2015-02-10T22:59:45Z

Test build #596 has started for PR 4498 at commit d1bd8f2.

This patch merges cleanly.

davies · 2015-02-10T23:00:11Z

@rxin This should be ready; please take another look.

SparkQA · 2015-02-10T23:02:59Z

Test build #27239 has started for PR 4498 at commit c80a7a9.

This patch merges cleanly.

rxin · 2015-02-10T23:58:01Z

LGTM. @yhuai please take a look at the type inference stuff.

SparkQA · 2015-02-11T00:40:53Z

Test build #596 has finished for PR 4498 at commit d1bd8f2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yhuai · 2015-02-11T00:42:36Z

sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala

Is the Map here a scala.collection.Map or Predef.Map (scala.collection.immutable.Map)? (We should use scala.collection.Map here.)

SparkQA · 2015-02-11T00:48:25Z

Test build #27239 has finished for PR 4498 at commit c80a7a9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-02-11T00:48:29Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27239/

davies · 2015-02-11T01:47:10Z

After talking to @yhuai offline, he suggested that we hold off on the Scala API for createDataFrame(rdd, columns), since it's not so useful right now. We can revisit it later.

SparkQA · 2015-02-11T01:47:25Z

Test build #27255 has started for PR 4498 at commit 08469c1.

This patch merges cleanly.

davies · 2015-02-11T01:49:06Z

@pwendell I think this PR is ready to go, with or without waiting for Jenkins. (The last commit just removes an API and the test for it.) Sorry for the delay.

SparkQA · 2015-02-11T03:13:40Z

Test build #27255 has finished for PR 4498 at commit 08469c1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-02-11T03:13:44Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27255/

Deprecate inferSchema() and applySchema(), use createDataFrame() instead, which could take an optional `schema` to create a DataFrame from an RDD. The `schema` could be StructType or list of names of columns.

Author: Davies Liu <davies@databricks.com>

Closes #4498 from davies/create and squashes the following commits:

08469c1 [Davies Liu] remove Scala/Java API for now
c80a7a9 [Davies Liu] fix hive test
d1bd8f2 [Davies Liu] cleanup applySchema
9526e97 [Davies Liu] createDataFrame from RDD with columns

(cherry picked from commit ea60284)
Signed-off-by: Michael Armbrust <michael@databricks.com>

createDataFrame from RDD with columns

9526e97

davies changed the title ~~[SPARK-5704] [SQL] [PySpark] createDataFrame from RDD with columns~~ [WIP] [SPARK-5704] [SQL] [PySpark] createDataFrame from RDD with columns Feb 10, 2015

rxin reviewed Feb 10, 2015

davies reviewed Feb 10, 2015

davies mentioned this pull request Feb 10, 2015

Add toDataFrame to PySpark SQL #4421

Closed

cleanup applySchema

d1bd8f2

davies changed the title ~~[WIP] [SPARK-5704] [SQL] [PySpark] createDataFrame from RDD with columns~~ [SPARK-5704] [SQL] [PySpark] createDataFrame from RDD with columns Feb 10, 2015

fix hive test

c80a7a9

yhuai reviewed Feb 11, 2015

remove Scala/Java API for now

08469c1

asfgit closed this in ea60284 Feb 11, 2015
