[SPARK-6040][SQL] Fix the percent bug in tablesample #4789
Can one of the admins verify this patch?
test this please
Test build #28014 has started for PR 4789 at commit
Test build #28014 has finished for PR 4789 at commit
Test FAILed.
test this please
@yhuai Can you trigger the test for me?
retest this please
ok to test
Test build #28153 has started for PR 4789 at commit
Test build #28153 has finished for PR 4789 at commit
Test PASSed.
How about "The range of fraction accepted by Sample is [0, 1]. Because Hive's block sampling function takes X PERCENT as the input and the range of X is [0, 100], we need to adjust the fraction."?
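The adjustment described in that comment can be sketched as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: the object name `TableSamplePercent` and the helper `percentToFraction` are hypothetical. The helper checks that Hive's `X PERCENT` argument lies in [0, 100] and divides it by 100 to get the [0, 1] fraction that a sampling operator expects.

```scala
// Hypothetical sketch (not Spark's actual parser code): turn Hive's
// "X PERCENT" argument into a sampling fraction on [0, 1].
object TableSamplePercent {

  // Hypothetical helper: reject percentages outside [0, 100], then scale to [0, 1].
  def percentToFraction(percent: Double): Double = {
    require(
      percent >= 0.0 && percent <= 100.0,
      s"Sampling percentage ($percent) must be on interval [0, 100]")
    percent / 100.0
  }

  def main(args: Array[String]): Unit = {
    // TABLESAMPLE(1 PERCENT) should become a fraction of 0.01, not 1.0 (100%).
    println(percentToFraction(1.0))
    println(percentToFraction(100.0))
  }
}
```

Because `require` throws an `IllegalArgumentException`, an out-of-range value such as `tablesample(150 percent)` fails fast instead of silently producing an invalid fraction.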
@yhuai Thanks for your help; I have made the change.
Test build #28160 has started for PR 4789 at commit
Test build #28160 has finished for PR 4789 at commit
Test PASSed.
I'm going to go ahead and merge this since it changes semantics and we are close to the release where we remove the alpha tag, but it would be great if you could add a test that actually checks to make sure sampling is happening and we are getting something close to the expected number of results.
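The kind of check requested here, that sampling actually happens and returns roughly the expected number of rows, can be sketched without a Spark cluster. This is a hedged illustration, not a test from the Spark code base. It assumes that sampling without replacement behaves like independent per-row (Bernoulli) selection with probability `fraction`. The names `SampleCountCheck`, `sampledCount`, and `closeToExpected` are hypothetical, and the 20% tolerance is an arbitrary choice.

```scala
import scala.util.Random

// Hypothetical sketch: simulate Bernoulli sampling (each row kept independently
// with probability `fraction`) and check the kept count is near n * fraction.
object SampleCountCheck {

  // Hypothetical helper: number of rows out of `n` kept by a seeded Bernoulli sampler.
  def sampledCount(n: Int, fraction: Double, seed: Long): Int = {
    val rng = new Random(seed)
    // nextDouble() is in [0.0, 1.0), so fraction 0.0 keeps nothing and 1.0 keeps everything.
    (0 until n).count(_ => rng.nextDouble() < fraction)
  }

  // True if `observed` is within a relative `tolerance` of the expected count n * fraction.
  def closeToExpected(observed: Int, n: Int, fraction: Double, tolerance: Double): Boolean = {
    val expected = n * fraction
    math.abs(observed - expected) <= tolerance * expected
  }

  def main(args: Array[String]): Unit = {
    val n = 100000
    val fraction = 0.01 // what TABLESAMPLE(1 PERCENT) should mean
    val count = sampledCount(n, fraction, seed = 42L)
    println(s"kept $count of $n rows; close to expected: ${closeToExpected(count, n, fraction, 0.2)}")
  }
}
```

With 100,000 rows and a 1% fraction, the expected count is 1,000 with a standard deviation of about 31. A 20% band is therefore very unlikely to fail by chance, yet it still catches the original bug, which returned every row.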
A HiveQL expression such as `select count(1) from src tablesample(1 percent);` means that a 1% sample of the table should be selected. In the current version of Spark, however, it is treated as a 100% sample.

Author: q00251598 <qiyadong@huawei.com>

Closes #4789 from watermen/SPARK-6040 and squashes the following commits:

2453ebe [q00251598] check and adjust the fraction.

(cherry picked from commit 582e5a2)
Signed-off-by: Michael Armbrust <michael@databricks.com>