-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-16931][PYTHON] PySpark APIS for bucketBy and sortBy #14517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
python/pyspark/sql/readwriter.py
Outdated
| self._jwrite = self._jwrite.partitionBy(_to_seq(self._spark._sc, cols)) | ||
| return self | ||
|
|
||
| @since(2.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since should be 2.1 since I don't think this will go into branch-2.0
|
ok to test |
|
Test build #63364 has finished for PR 14517 at commit
|
47d9ef7 to
0bc078a
Compare
|
Amended commit with style changes from MLNick. Can someone call the OK to test please |
0bc078a to
96df186
Compare
|
Test build #63411 has finished for PR 14517 at commit
|
|
Test build #63414 has finished for PR 14517 at commit
|
96df186 to
31c43e6
Compare
|
Test build #63422 has finished for PR 14517 at commit
|
31c43e6 to
ce9f9c0
Compare
|
Test build #63738 has finished for PR 14517 at commit
|
|
Test build #63739 has finished for PR 14517 at commit
|
python/pyspark/sql/readwriter.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pep8 is saying this line is too long (over 100 chars) so its failing the style tests. You can run the style tests locally with ./dev/lint-python as well for a faster turn around than Jenkins :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the note, I was getting annoyed at not knowing where to find the tools for such things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing - I'm always happy to help people get up to speed with contributing to PySpark so feel free to reach out to me if you get stuck with something similar.
|
Test build #64097 has finished for PR 14517 at commit
|
|
Test build #64106 has finished for PR 14517 at commit
|
|
Test build #64113 has finished for PR 14517 at commit
|
|
Test build #64227 has finished for PR 14517 at commit
|
|
Test build #64229 has finished for PR 14517 at commit
|
8eb8e71 to
dbb50ad
Compare
|
Test build #64240 has finished for PR 14517 at commit
|
|
What thoughts do people have about merging in? |
| except py4j.protocol.Py4JError: | ||
| spark = SparkSession(sc) | ||
|
|
||
| seed = int(time() * 1000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to have determistic test, testing with parquet should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GregBowyer ping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been really busy with work of late, but I will try to sort this out today
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GregBowyer ping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GregBowyer Any progress on this? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GregBowyer ping. Let me propose to close this after a week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @zero323, would you maybe be interested in taking over this? I was thinking of taking over this if no one goes for it assuming it looks quite close to be merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HyukjinKwon By all means. I prepared a bunch of tests (7d911c647f21ada7fb429fd7c1c5f15934ff8847) and extended a bit code provided by @GregBowyer (72c04a3f196da5223ebb44725aa88cffa81036e4). I think we can skip low level tests (direct access to the files) which are already present in Scala test base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zero323, Good to know. Then, please go ahead if you are ready :).
## What changes were proposed in this pull request? This PR proposes to close stale PRs. What I mean by "stale" here includes that there are some review comments by reviewers but the author looks inactive without any answer to them more than a month. I left some comments roughly a week ago to ping and the author looks still inactive in these PR below These below includes some PR suggested to be closed and a PR against another branch which seems obviously inappropriate. Given the comments in the last three PRs below, they are probably worth being taken over by anyone who is interested in it. Closes apache#7963 Closes apache#8374 Closes apache#11192 Closes apache#11374 Closes apache#11692 Closes apache#12243 Closes apache#12583 Closes apache#12620 Closes apache#12675 Closes apache#12697 Closes apache#12800 Closes apache#13715 Closes apache#14266 Closes apache#15053 Closes apache#15159 Closes apache#15209 Closes apache#15264 Closes apache#15267 Closes apache#15871 Closes apache#15861 Closes apache#16319 Closes apache#16324 Closes apache#16890 Closes apache#12398 Closes apache#12933 Closes apache#14517 ## How was this patch tested? N/A Author: hyukjinkwon <gurwls223@gmail.com> Closes apache#16937 from HyukjinKwon/stale-prs-close.
What changes were proposed in this pull request?
API access to allow pyspark to use bucketBy and sortBy in datraframes.