@davies
Contributor

@davies davies commented Feb 10, 2015

Deprecate inferSchema() and applySchema(); use createDataFrame() instead, which takes an optional schema to create a DataFrame from an RDD. The schema can be a StructType or a list of column names.
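
For illustration, here is a minimal pure-Python sketch of the three ways the optional schema can be given: omitted, a list of column names, or a full schema. It does not use Spark. The function name `resolve_schema`, the `(name, type)` pairs standing in for a StructType, the `_1`, `_2` default names, and inferring types from the first row only are all assumptions made for this example, not the code in this patch.

```python
from typing import Any, List, Optional, Sequence, Tuple, Union

# Hypothetical stand-in for one StructType field: (column name, Python type).
Field = Tuple[str, type]


def resolve_schema(
    rows: Sequence[Sequence[Any]],
    schema: Optional[Sequence[Union[str, Field]]] = None,
) -> List[Field]:
    """Return (name, type) pairs describing the columns of `rows`."""
    if not rows:
        raise ValueError("cannot infer a schema from an empty dataset")
    first = rows[0]

    if schema is None:
        # No schema: make up positional names (assumed convention).
        names = ["_%d" % (i + 1) for i in range(len(first))]
    elif all(isinstance(item, str) for item in schema):
        # List of column names: types are still inferred from the data.
        names = list(schema)
    else:
        # Full schema: names and types are both given, nothing is inferred.
        fields = [(name, typ) for name, typ in schema]
        if len(fields) != len(first):
            raise ValueError("schema has %d fields, rows have %d values"
                             % (len(fields), len(first)))
        return fields

    if len(names) != len(first):
        raise ValueError("expected %d column names, got %d"
                         % (len(first), len(names)))
    # Simplification: infer each type from the first row only.
    return [(name, type(value)) for name, value in zip(names, first)]


rows = [("Alice", 1), ("Bob", 2)]
named = resolve_schema(rows, ["name", "age"])
unnamed = resolve_schema(rows)
explicit = resolve_schema(rows, [("name", str), ("age", int)])
```

The point of the sketch is that a list of names only labels the columns while types are still inferred, whereas a full schema skips inference entirely.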

@davies davies changed the title from "[SPARK-5704] [SQL] [PySpark] createDataFrame from RDD with columns" to "[WIP] [SPARK-5704] [SQL] [PySpark] createDataFrame from RDD with columns" Feb 10, 2015
@SparkQA

SparkQA commented Feb 10, 2015

Test build #27188 has started for PR 4498 at commit 9526e97.

  • This patch merges cleanly.

@davies
Contributor Author

davies commented Feb 10, 2015

@marmbrus @yhuai Does this work for you? cc @rxin

Contributor

Isn't this change wrong? Sampling should be off if ratio > 0.99?

Contributor

Oh, it seems the code in master is wrong.
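
The rule under discussion (skip sampling when the ratio is above 0.99, otherwise sample a fraction of the rows before inferring types) can be sketched in plain Python as below. This is only an illustration of the intended condition: `maybe_sample`, its `seed` parameter, and the Bernoulli sampling are assumptions for the example, not the code in this patch or in master.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def maybe_sample(rows: Iterable[T], ratio: float, seed: int = 1) -> List[T]:
    """Return every row when ratio > 0.99, else a random subset."""
    if ratio > 0.99:
        # Sampling is off: a ratio this close to 1 would keep almost
        # everything anyway, so just use the full dataset.
        return list(rows)
    rng = random.Random(seed)
    # Keep each row independently with probability `ratio`.
    return [row for row in rows if rng.random() < ratio]


everything = maybe_sample(range(10), 1.0)
subset = maybe_sample(range(1000), 0.5)
```

Getting the comparison backwards would sample exactly when it should not, which is the kind of mistake the comment above is pointing at.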

@SparkQA

SparkQA commented Feb 10, 2015

Test build #27188 has finished for PR 4498 at commit 9526e97.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27188/
Test FAILed.

Contributor Author

@rxin Is there a better name for this? createDataFrame is still too long (longer than 'applySchema')

Contributor Author

Talked offline, we did figure out a better name than it.

@davies davies changed the title from "[WIP] [SPARK-5704] [SQL] [PySpark] createDataFrame from RDD with columns" to "[SPARK-5704] [SQL] [PySpark] createDataFrame from RDD with columns" Feb 10, 2015
@SparkQA

SparkQA commented Feb 10, 2015

Test build #27229 has started for PR 4498 at commit d1bd8f2.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Feb 10, 2015

Test build #27229 has finished for PR 4498 at commit d1bd8f2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27229/
Test FAILed.

@SparkQA

SparkQA commented Feb 10, 2015

Test build #596 has started for PR 4498 at commit d1bd8f2.

  • This patch merges cleanly.

@davies
Contributor Author

davies commented Feb 10, 2015

@rxin This should be ready; please take another look.

@SparkQA

SparkQA commented Feb 10, 2015

Test build #27239 has started for PR 4498 at commit c80a7a9.

  • This patch merges cleanly.

@rxin
Contributor

rxin commented Feb 10, 2015

LGTM. @yhuai please take a look at the type inference stuff.

@SparkQA

SparkQA commented Feb 11, 2015

Test build #596 has finished for PR 4498 at commit d1bd8f2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

Is the Map here a scala.collection.Map or Predef.Map (scala.collection.immutable.Map)? (We should use scala.collection.Map here.)

@SparkQA

SparkQA commented Feb 11, 2015

Test build #27239 has finished for PR 4498 at commit c80a7a9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27239/
Test PASSed.

@davies
Contributor Author

davies commented Feb 11, 2015

After talking to @yhuai offline, he suggested that we hold off on the Scala API for createDataFrame(rdd, columns), since it's not so useful right now. We can revisit it later.

@SparkQA

SparkQA commented Feb 11, 2015

Test build #27255 has started for PR 4498 at commit 08469c1.

  • This patch merges cleanly.

@davies
Contributor Author

davies commented Feb 11, 2015

@pwendell I think this PR is ready to go, whether or not we wait for Jenkins. (The last commit just removes an API and the test for it.) Sorry for the delay.

@SparkQA

SparkQA commented Feb 11, 2015

Test build #27255 has finished for PR 4498 at commit 08469c1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27255/
Test PASSed.

asfgit pushed a commit that referenced this pull request Feb 11, 2015
Deprecate inferSchema() and applySchema(); use createDataFrame() instead, which takes an optional `schema` to create a DataFrame from an RDD. The `schema` can be a StructType or a list of column names.

Author: Davies Liu <davies@databricks.com>

Closes #4498 from davies/create and squashes the following commits:

08469c1 [Davies Liu] remove Scala/Java API for now
c80a7a9 [Davies Liu] fix hive test
d1bd8f2 [Davies Liu] cleanup applySchema
9526e97 [Davies Liu] createDataFrame from RDD with columns

(cherry picked from commit ea60284)
Signed-off-by: Michael Armbrust <michael@databricks.com>
@asfgit asfgit closed this in ea60284 Feb 11, 2015