[SPARK-24181][SQL] Better error message for writing sorted data #21235
…g in save() and jdbc write
Test build #90206 has finished for PR 21235 at commit
    case (true, false) =>
      throw new AnalysisException(s"'$operation' does not support bucketing right now")
    case (false, true) =>
      throw new AnalysisException(s"'$operation' does not support sorting right now")
The sorting here is only used to sort data within each bucket. This is different from general sorting.
I know this is the sorting within each bucket.
If a user calls writer.sortBy without calling bucketBy, the user gets s"'$operation' does not support bucketing right now", which makes it hard to understand what is going on.
For the case where sortBy is set and bucketBy is not, how about I change the error message to "sortBy must be used together with bucketBy, and '$operation' does not support bucketBy right now"?
Just '$operation' does not support sortBy right now?
Test build #90208 has finished for PR 21235 at commit
    (numBuckets.isDefined, sortColumnNames.isDefined) match {
      case (true, true) =>
        throw new AnalysisException(
          s"'$operation' does not support bucketing and sorting right now")
If we want to state it clearly, how about "'$operation' does not support bucketBy and sortBy right now"? That avoids confusion with general sorting.
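The three cases discussed in this thread can be sketched as a pure function. This is only an illustration, not the code in the PR: `WriteSupportMessages` and `unsupportedMessage` are hypothetical names, and the message wording follows the reviewers' suggestions (bucketBy/sortBy rather than bucketing/sorting).

```scala
// Sketch only: a pure stand-in for the writer's private check.
// The object and method names are hypothetical, not part of Spark.
object WriteSupportMessages {

  /** Returns the error message for an unsupported combination,
    * or None when neither bucketBy nor sortBy was requested. */
  def unsupportedMessage(
      operation: String,
      bucketed: Boolean,
      sorted: Boolean): Option[String] =
    (bucketed, sorted) match {
      case (true, true) =>
        Some(s"'$operation' does not support bucketBy and sortBy right now")
      case (true, false) =>
        Some(s"'$operation' does not support bucketBy right now")
      case (false, true) =>
        Some(s"'$operation' does not support sortBy right now")
      case (false, false) =>
        None
    }
}
```

Returning an Option instead of throwing keeps the sketch easy to check in isolation; the real writer would throw an AnalysisException with the chosen message.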
    private def assertNotBucketed(operation: String): Unit = {
      if (numBuckets.isDefined || sortColumnNames.isDefined) {
        throw new AnalysisException(s"'$operation' does not support bucketing right now")
How about keeping the function name unchanged and changing only this message, listing the sort columns if there are any? Something like:
'$operation' does not support bucketing. Number of buckets: ...; sortBy: ...;
I agree with you.
I also found that getBucketSpec throws IllegalArgumentException when numBuckets.isEmpty && sortColumnNames.isDefined.
Alternatively, how about throwing AnalysisException in all of these cases for consistency?
    private def getBucketSpec: Option[BucketSpec] = {
      assertNotSortByOrBucketedBy()
      numBuckets.map { n =>
        BucketSpec(n, bucketColumnNames.get, sortColumnNames.getOrElse(Nil))
      }
    }

    private def assertNotSortByOrBucketedBy(): Unit = {
      if (sortColumnNames.isDefined && numBuckets.isEmpty) {
        throw new AnalysisException("sortBy must be used together with bucketBy")
      }
    }

    private def assertNotBucketedAndNotSorted(operation: String): Unit = {
      assertNotSortByOrBucketedBy()
      if (numBuckets.isDefined) {
        if (sortColumnNames.isDefined) {
          throw new AnalysisException(
            s"'$operation' does not support bucketBy and sortBy within a bucket right now")
        } else {
          throw new AnalysisException(s"'$operation' does not support bucketBy right now")
        }
      }
    }
How about minimizing the code changes?
    private def assertNotBucketed(operation: String): Unit = {
      if (getBucketSpec.isDefined) {
        throw new AnalysisException(
          s"'$operation' does not support bucketing right now, bucketBy and sortBy cannot be called.")
      }
    }
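To see how this smaller change would behave, here is a self-contained sketch that runs without Spark. It rests on several assumptions: `BucketSpec` and `AnalysisException` are local stand-ins for the Spark classes, `WriterState` and `BucketingDemo` are hypothetical names, the column names are made up, and `getBucketSpec` is assumed to raise the "sortBy must be used together with bucketBy" error as an AnalysisException, as proposed earlier in this thread.

```scala
import scala.util.Try

// Local stand-ins so the sketch compiles without Spark on the classpath.
final case class BucketSpec(
    numBuckets: Int,
    bucketColumnNames: Seq[String],
    sortColumnNames: Seq[String])

final class AnalysisException(message: String) extends Exception(message)

// Hypothetical, simplified model of the writer's bucketing state.
final case class WriterState(
    numBuckets: Option[Int] = None,
    bucketColumnNames: Option[Seq[String]] = None,
    sortColumnNames: Option[Seq[String]] = None) {

  // Assumed behaviour: sortBy without bucketBy is rejected here.
  def getBucketSpec: Option[BucketSpec] = {
    if (sortColumnNames.isDefined && numBuckets.isEmpty) {
      throw new AnalysisException("sortBy must be used together with bucketBy")
    }
    numBuckets.map { n =>
      BucketSpec(n, bucketColumnNames.getOrElse(Nil), sortColumnNames.getOrElse(Nil))
    }
  }

  // The check as suggested in the comment above.
  def assertNotBucketed(operation: String): Unit = {
    if (getBucketSpec.isDefined) {
      throw new AnalysisException(
        s"'$operation' does not support bucketing right now, " +
          "bucketBy and sortBy cannot be called.")
    }
  }
}

object BucketingDemo {
  // Runs the check and returns the failure message, if any.
  def failureMessage(state: WriterState, operation: String): Option[String] =
    Try(state.assertNotBucketed(operation)).failed.toOption.map(_.getMessage)
}
```

Under these assumptions, a writer with only sortBy set fails inside getBucketSpec with the "must be used together" message, while a bucketed writer fails with the "does not support bucketing" message, so a single check covers both cases.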
LGTM
I'll merge into master once the test passes. Thanks.
LGTM
Test build #90395 has finished for PR 21235 at commit
retest this please.
Test build #90402 has finished for PR 21235 at commit
Merged into master.
What changes were proposed in this pull request?
The exception message should clearly distinguish sorting from bucketing in save and jdbc write.
When a user tries to write sorted data using save or insertInto, it throws an exception with the message s"'$operation' does not support bucketing right now". We should throw s"'$operation' does not support sortBy right now" instead.
How was this patch tested?
More tests in DataFrameReaderWriterSuite.scala