Skip to content

Conversation

@gatorsmile
Copy link
Member

@gatorsmile gatorsmile commented Jul 29, 2016

What changes were proposed in this pull request?

In Spark 2.0, SaveAsTable does not work when source DataFrame is built on a Hive Table, but Spark 1.6 works.

Spark 1.6

scala> sql("create table sample.sample stored as SEQUENCEFILE as select 1 as key, 'abc' as value")
res2: org.apache.spark.sql.DataFrame = []

scala> val df = sql("select key, value as value from sample.sample")
df: org.apache.spark.sql.DataFrame = [key: int, value: string]

scala> df.write.mode("append").saveAsTable("sample.sample")

scala> sql("select * from sample.sample").show()
+---+-----+
|key|value|
+---+-----+
|  1|  abc|
|  1|  abc|
+---+-----+

Spark 2.0

scala> df.write.mode("append").saveAsTable("sample.sample")
org.apache.spark.sql.AnalysisException: Saving data in MetastoreRelation sample, sample
 is not supported.;

So far, we do not plan to support it in Spark 2.0. Spark 1.6 works because it internally uses insertInto. But, if we change it back it will break the semantic of saveAsTable (this method uses by-name resolution instead of using by-position resolution used by insertInto).

Instead, users should use insertInto API. This PR corrects the error messages. Users can understand how to bypass it before we support it in a separate PR.

How was this patch tested?

Test cases are added

@gatorsmile
Copy link
Member Author

cc @yhuai @rxin @liancheng @cloud-fan

@SparkQA
Copy link

SparkQA commented Jul 29, 2016

Test build #63019 has finished for PR 14410 at commit 724fcf8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 30, 2016

Test build #63025 has finished for PR 14410 at commit f4f383c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

existingSchema = Some(DDLUtils.getSchemaFromTableProperties(s.metadata))
// When the source table is a Hive table, the `saveAsTable` API of DataFrameWriter
// does not work. Instead, use the `insertInto` API.
case o if o.getClass.getName == "org.apache.spark.sql.hive.MetastoreRelation" =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's really hacky to compare with the class name string... What's the error message before?

Copy link
Member Author

@gatorsmile gatorsmile Jul 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree! Do you have any idea how to improve it?

Below is the current error message:

org.apache.spark.sql.AnalysisException: Saving data in MetastoreRelation sample, sample
 is not supported.;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

case o: MetastoreRelation doesn't work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MetastoreRelation is in another package hive

@gatorsmile
Copy link
Member Author

@cloud-fan @yhuai https://github.com/apache/spark/compare/master...gatorsmile:saveAsTableFix?expand=1

This is the prototype for supporting this missing function. We can fix it with minor changes. To avoid too many calls on lookupRelation, we might need to move the plan conversion to a separate batch.

Let me know if we should do it in 2.0.1 or just issue an exception in 2.0.1.

Thanks!

@cloud-fan
Copy link
Contributor

Hi @gatorsmile can you send out a PR first? We definitely need it for 2.1, and we can decide if we should backport it to 2.0 later.

@gatorsmile
Copy link
Member Author

@cloud-fan Sure, will do it soon. Thanks!

@gatorsmile
Copy link
Member Author

We can support it in another PR: #14612. This can be closed

@gatorsmile gatorsmile closed this Oct 3, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants