-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-17480][SQL][FOLLOWUP] Fix more instances which calls List.length/size which is O(n) #15093
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@srowen Could you please review this? |
| gamma: Double): CoordinateMatrix = { | ||
| require(gamma > 1.0, s"Oversampling should be greater than 1: $gamma") | ||
| require(colMags.size == this.numCols(), "Number of magnitudes didn't match column dimension") | ||
| require(colMags.length == this.numCols(), "Number of magnitudes didn't match column dimension") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is different, yeah. You're just avoiding the implicit conversion. While that's sort of good I think it's separate and maybe something to change or not change everywhere at once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! I will revert this change.
|
@srowen Please let me check this out again tomorrow. |
|
Test build #65366 has finished for PR 15093 at commit
|
|
Test build #65369 has finished for PR 15093 at commit
|
| var i = 0 | ||
| while (i < groupedWindowExpressions.size) { | ||
| val ((partitionSpec, orderSpec), windowExpressions) = groupedWindowExpressions(i) | ||
| groupedWindowExpressions.foreach { case ((partitionSpec, orderSpec), windowExpressions) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @srowen , It seems groupedWindowExpressions is Seq[((Seq[Expression], Seq[SortOrder]), ArrayBuffer[NamedExpression])] but the desired output is LogicalPlan. So, it seems I can't directly use fold or reduce simply. So, could this just use foreach rather than others? Otherwise, please teach me for a better expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well currentChild is a Window, and that's the only thing whose computation is changed here. How about:
val currentChild =
groupedWindowExpressions.foldLeft(child) {
case (last, ((partitionSpec, orderSpec), windowExpressions)) =>
Window(windowExpressions, partitionSpec, orderSpec, last)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that's clear and better. I will try. Thanks!
|
Test build #65440 has finished for PR 15093 at commit
|
|
OK, looks good. Some of these are .size -> .length changes, but I suppose those are related and positive. It avoids an extra method invocation, which might matter in a very tight loop. |
| val currentChild = | ||
| groupedWindowExpressions.foldLeft(child) { | ||
| case (last, ((partitionSpec, orderSpec), windowExpressions)) => | ||
| // Set currentChild to the newly created Window operator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, one last thing, can you remove this comment? and update the comment above and below this expression? they're no longer applicable. Actually currentChild isn't a great name now anyway. eh, windowOps or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, I should've checked this closely. I will update soon.
|
Test build #65535 has finished for PR 15093 at commit
|
|
retest this please |
|
Test build #65534 has finished for PR 15093 at commit
|
|
Merging to master, and to 2.0 to match the parent JIRA, even though this is probably less important, yet still a clean small win |
…th/size which is O(n) This PR fixes all the instances which was fixed in the previous PR. To make sure, I manually debugged and also checked the Scala source. `length` in [LinearSeqOptimized.scala#L49-L57](https://github.com/scala/scala/blob/2.11.x/src/library/scala/collection/LinearSeqOptimized.scala#L49-L57) is O(n). Also, `size` calls `length` via [SeqLike.scala#L106](https://github.com/scala/scala/blob/2.11.x/src/library/scala/collection/SeqLike.scala#L106). For debugging, I have created these as below: ```scala ArrayBuffer(1, 2, 3) Array(1, 2, 3) List(1, 2, 3) Seq(1, 2, 3) ``` and then called `size` and `length` for each to debug. I ran the bash as below on Mac ```bash find . -name *.scala -type f -exec grep -il "while (.*\\.length)" {} \; | grep "src/main" find . -name *.scala -type f -exec grep -il "while (.*\\.size)" {} \; | grep "src/main" ``` and then checked each. Author: hyukjinkwon <gurwls223@gmail.com> Closes #15093 from HyukjinKwon/SPARK-17480-followup. (cherry picked from commit 86c2d39) Signed-off-by: Sean Owen <sowen@cloudera.com>
|
Test build #65536 has finished for PR 15093 at commit
|
|
@HyukjinKwon @srowen I am not sure how exactly backporting a patch led to an empty file, but this does not seem right. I am reverting this commit in branch 2.0. Please make a new PR to fix this in branch 2.0 correctly. |
|
Oh weird! no idea why that happened. Yeah I'll take care of it from here. |
|
I reverted already! |
…th/size which is O(n) This PR fixes all the instances which was fixed in the previous PR. To make sure, I manually debugged and also checked the Scala source. `length` in [LinearSeqOptimized.scala#L49-L57](https://github.com/scala/scala/blob/2.11.x/src/library/scala/collection/LinearSeqOptimized.scala#L49-L57) is O(n). Also, `size` calls `length` via [SeqLike.scala#L106](https://github.com/scala/scala/blob/2.11.x/src/library/scala/collection/SeqLike.scala#L106). For debugging, I have created these as below: ```scala ArrayBuffer(1, 2, 3) Array(1, 2, 3) List(1, 2, 3) Seq(1, 2, 3) ``` and then called `size` and `length` for each to debug. I ran the bash as below on Mac ```bash find . -name *.scala -type f -exec grep -il "while (.*\\.length)" {} \; | grep "src/main" find . -name *.scala -type f -exec grep -il "while (.*\\.size)" {} \; | grep "src/main" ``` and then checked each. Author: hyukjinkwon <gurwls223@gmail.com> Closes #15093 from HyukjinKwon/SPARK-17480-followup. (cherry picked from commit 86c2d39) Signed-off-by: Sean Owen <sowen@cloudera.com>
|
Yep, saw that. I re-merged this, and yes during conflict resolution QuantileSummaries.scala comes up as a file added only in the master branch, but when I choose to not take the change in the IDE, I see it actually resulted in adding an empty file. I made sure that was not part of the commit and pushed again. Looks as intended now: 5fd354b |
…th/size which is O(n) ## What changes were proposed in this pull request? This PR fixes all the instances which was fixed in the previous PR. To make sure, I manually debugged and also checked the Scala source. `length` in [LinearSeqOptimized.scala#L49-L57](https://github.com/scala/scala/blob/2.11.x/src/library/scala/collection/LinearSeqOptimized.scala#L49-L57) is O(n). Also, `size` calls `length` via [SeqLike.scala#L106](https://github.com/scala/scala/blob/2.11.x/src/library/scala/collection/SeqLike.scala#L106). For debugging, I have created these as below: ```scala ArrayBuffer(1, 2, 3) Array(1, 2, 3) List(1, 2, 3) Seq(1, 2, 3) ``` and then called `size` and `length` for each to debug. ## How was this patch tested? I ran the bash as below on Mac ```bash find . -name *.scala -type f -exec grep -il "while (.*\\.length)" {} \; | grep "src/main" find . -name *.scala -type f -exec grep -il "while (.*\\.size)" {} \; | grep "src/main" ``` and then checked each. Author: hyukjinkwon <gurwls223@gmail.com> Closes apache#15093 from HyukjinKwon/SPARK-17480-followup.
What changes were proposed in this pull request?
This PR fixes all the instances which was fixed in the previous PR.
To make sure, I manually debugged and also checked the Scala source.
lengthin LinearSeqOptimized.scala#L49-L57 is O(n). Also,sizecallslengthvia SeqLike.scala#L106.For debugging, I have created these as below:
and then called
sizeandlengthfor each to debug.How was this patch tested?
I ran the bash as below on Mac
and then checked each.