Skip to content

Conversation

@dbtsai
Copy link
Member

@dbtsai dbtsai commented May 4, 2018

What changes were proposed in this pull request?

The exception message should clearly distinguish sorting and bucketing in save and jdbc write.

When a user tries to write a sorted data using save or insertInto, it will throw an exception with message that s"'$operation' does not support bucketing right now"".

We should throw s"'$operation' does not support sortBy right now"" instead.

How was this patch tested?

More tests in DataFrameReaderWriterSuite.scala

@dbtsai
Copy link
Member Author

dbtsai commented May 4, 2018

cc @cloud-fan @viirya

@SparkQA
Copy link

SparkQA commented May 4, 2018

Test build #90206 has finished for PR 21235 at commit 72efec1.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

case (true, false) =>
throw new AnalysisException(s"'$operation' does not support bucketing right now")
case (false, true) =>
throw new AnalysisException(s"'$operation' does not support sorting right now")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sorting is only used to sort data in each bucket. This is different from the general sorting

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is the sorting in each bucket.

If a user just calls writer.sortBy without calling bucketBy, the user will get s"'$operation' does not support bucketing right now" which is hard to understand what's going on.

For the case of sortBy is enabled, and bucketBy is disabled, how about I change the error message to sortBy must be used together with bucketBy, and '$operation' does not support bucketBy right now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just '$operation' does not support sortBy right now?

@SparkQA
Copy link

SparkQA commented May 4, 2018

Test build #90208 has finished for PR 21235 at commit 1a57a84.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

(numBuckets.isDefined, sortColumnNames.isDefined) match {
case (true, true) =>
throw new AnalysisException(
s"'$operation' does not support bucketing and sorting right now")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to clearly state it, how about '$operation' does not support bucketBy and sortBy right now? So to avoid confusing with general sorting.


private def assertNotBucketed(operation: String): Unit = {
if (numBuckets.isDefined || sortColumnNames.isDefined) {
throw new AnalysisException(s"'$operation' does not support bucketing right now")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about keeping the function name unchanged and just changing this message and list the sort columns if having. Something like:

'$operation' does not support bucketing. Number of buckets: ...; sortBy: ...;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you.

I also found in getBucketSpec, when numBuckets.isEmpty && sortColumnNames.isDefined, it will throw IllegalArgumentException.

How about alternatively, we throw AnalysisException for all the cases for consistency?

  private def getBucketSpec: Option[BucketSpec] = {
    assertNotSortByOrBucketedBy()

    numBuckets.map { n =>
      BucketSpec(n, bucketColumnNames.get, sortColumnNames.getOrElse(Nil))
    }
  }

  private def assertNotSortByOrBucketedBy(): Unit = {
    if (sortColumnNames.isDefined && numBuckets.isEmpty) {
      throw new AnalysisException("sortBy must be used together with bucketBy")
    }
  }

  private def assertNotBucketedAndNotSorted(operation: String): Unit = {
    assertNotSortByOrBucketedBy()

    if (numBuckets.isDefined) {
      if (sortColumnNames.isDefined) {
        throw new AnalysisException(
          s"'$operation' does not support bucketBy and sortBy within a bucket right now")
      } else {
        throw new AnalysisException(s"'$operation' does not support bucketBy right now")
      }
    }
  }

Copy link
Contributor

@cloud-fan cloud-fan May 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about minimizing the code changes

private def assertNotBucketed(operation: String): Unit = {
  if (getBucketSpec.isDefined) {
    throw new AnalysisException(s"'$operation' does not support bucketing right now, bucketBy and sortBy cannot be called.")
  }
}

@cloud-fan
Copy link
Contributor

LGTM

@dbtsai
Copy link
Member Author

dbtsai commented May 9, 2018

I'll merge into master once the test passes. Thanks.

@viirya
Copy link
Member

viirya commented May 9, 2018

LGTM

@SparkQA
Copy link

SparkQA commented May 9, 2018

Test build #90395 has finished for PR 21235 at commit 7b64a97.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member

viirya commented May 9, 2018

retest this please.

@SparkQA
Copy link

SparkQA commented May 9, 2018

Test build #90402 has finished for PR 21235 at commit 7b64a97.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dbtsai
Copy link
Member Author

dbtsai commented May 9, 2018

Merged into master.

@asfgit asfgit closed this in 6ea582e May 9, 2018
@dbtsai dbtsai deleted the fixException branch May 9, 2018 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants