Skip to content

Conversation

@dongjoon-hyun
Copy link
Member

What changes were proposed in this pull request?

This PR aims to support comparators, e.g. '<', '<=', '>', '>=', again in Apache Spark 2.0 for backward compatibility.

Spark 1.6.2

scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
res0: org.apache.spark.sql.DataFrame = [result: string]

scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
res1: org.apache.spark.sql.DataFrame = [result: string]

Spark 2.0

scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
res0: org.apache.spark.sql.DataFrame = []

scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input '<' expecting {')', ','}(line 1, pos 42)

After this PR, it's supported.

How was this patch tested?

Pass the Jenkins test with a newly added testcase.

@SparkQA
Copy link

SparkQA commented Sep 29, 2016

Test build #66123 has finished for PR 15302 at commit c6c52fe.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 30, 2016

Test build #66130 has finished for PR 15302 at commit b59d622.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

Rebased to the master.

@SparkQA
Copy link

SparkQA commented Oct 1, 2016

Test build #66219 has finished for PR 15302 at commit eca9c86.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

The only failure is the following in ColumnTypeSuite. It's irrelevant and the testsuite passed locally.

[info] - MAP append/extract *** FAILED *** (2 milliseconds)
[info]   java.lang.IllegalArgumentException:
[info]   at java.nio.Buffer.position(Buffer.java:244)
[info]   at org.apache.spark.sql.catalyst.expressions.UnsafeRow.writeFieldTo(UnsafeRow.java:650)

@dongjoon-hyun
Copy link
Member Author

Retest this please.

@SparkQA
Copy link

SparkQA commented Oct 1, 2016

Test build #66225 has finished for PR 15302 at commit eca9c86.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

Hi, @hvanhovell .
Could you review this PR about 'ALTER TABLE DROP PARTITION'?

@dongjoon-hyun
Copy link
Member Author

Hi, @hvanhovell .
Could you give some opinion about this PR when you have some time?

@hvanhovell
Copy link
Contributor

@dongjoon-hyun I have taken a quick look. Shouldn't we just use Expressions for filtering partitions?

@dongjoon-hyun
Copy link
Member Author

Thank you for review, @hvanhovell ! Do you mean SQL grammar or listPartition?

@dongjoon-hyun
Copy link
Member Author

Recently, I've watched you improved those related function greatly.

@hvanhovell
Copy link
Contributor

@dongjoon-hyun I think that AlterTableDropPartitionCommand just should take a set of catalyst Expressions instead of a PartitionRangeSpec.

I have added a few things the Catalog but I think that we shouldn't use those here. The `SessionCatalog is much more suited here.

@dongjoon-hyun
Copy link
Member Author

I see. Then, how can evaluate the generic expression? Is it okay to use 'eval(null)'?

@dongjoon-hyun
Copy link
Member Author

Thank you for the direction. I'll proceed to improve in that way.

@SparkQA
Copy link

SparkQA commented Oct 3, 2016

Test build #66276 has finished for PR 15302 at commit 3c0585b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

The only failure looks irrelevant. Anyway, I'm revising the PR.

[info] *** 1 SUITE ABORTED ***
[error] Error: Total 2604, Failed 0, Errors 1, Passed 2603, Ignored 48
[error] Error during tests:
[error]     org.apache.spark.sql.jdbc.JDBCWriteSuite
[error] (sql/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 718 s, completed Oct 3, 2016 2:23:41 PM

@dongjoon-hyun
Copy link
Member Author

Hi, @hvanhovell .

When using Expression, I faced two situations.

  • checkAnalysis raises exceptions because the column is unresolved, e.g., country is unresolved.
  • As a workaround, I tried to use string literal 'country', but then optimizer ConstantFolding replaces that as false because 'country' < 'KR' is false.
ALTER TABLE sales DROP PARTITION (country < 'KR')

To avoid this situations, I can add some rule to checkAnalysis. But, it seems not a good idea. Could you give some advice for this?

@dongjoon-hyun
Copy link
Member Author

With today's master, it's like the following. Should we use expression in AlterTableDropPartitionCommand?

org.apache.spark.sql.AnalysisException: cannot resolve '`country`' given input columns: []; line 1 pos 23;
'AlterTableDropPartitionCommand `sales`, [('country < KR)], false, false

@dongjoon-hyun
Copy link
Member Author

Hi, @hvanhovell .
In DDL, do we have an example to use Expression like this?

@dongjoon-hyun
Copy link
Member Author

Hi, @hvanhovell .
I made another attempt #15704 by using 'Expression' as you commented.

@hvanhovell
Copy link
Contributor

@dongjoon-hyun I'll take a look tomorrow.

@dongjoon-hyun
Copy link
Member Author

Thank you, @hvanhovell !

@dongjoon-hyun
Copy link
Member Author

I'm closing this PR in favor of #15704 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants