Spark 3.4.0 and DBR 12.2 support #699
@chris-twiner yes, please! Could you also adjust the build slightly so that it is green? Sorry for the delays in reviews, I'll help you get it merged once I have some time!
@chris-twiner could you rebase on top of the main branch? (I can do it too, but it may lead to issues on your machine)
Fixed builds here 🤦 #709
The builds weren't broken per se; the generated yml was. Per 186fa2d it was due to Windows. Now it shows an actual test failure in KMeansTest - will start looking at that.
re delays - np
the KMeans issue is probably due to https://issues.apache.org/jira/browse/SPARK-30661 as that's the code that is throwing. There are two issues, one is sometimes:
the other is due to centerSquaredNorms having a size of 1 while k is 2. I tried setting k to 1 and it states that's invalid for this input vector. I can attempt to identify it further, but this is a bit out of my wheelhouse. Given that I find nothing else on it in the JIRAs, I can only assume that either it's really a new 3.4 bug or the test inputs were broken and 3.4 doesn't tolerate them.
It's a "bug" in the tests: pre-3.4 was tolerant of the Arbitrary data, which is not the case in 3.4. I've managed to prove that the extra code I've added to the test works in IntelliJ for 3.4 and, when manually swapping out versions in the build, for 3.3.2, 3.2.3 and 3.1.3. However, running directly from the build does not work, and neither does it on CI. The tests fail at:
which is likely due to the mapGroups workaround. <-- It was not, and more likely the shutdown of the job
Codecov Report
@@            Coverage Diff             @@
##           master     #699      +/-   ##
==========================================
+ Coverage   95.01%   95.52%   +0.50%
==========================================
  Files          65       67       +2
  Lines        1184     1184
  Branches       28       37       +9
==========================================
+ Hits         1125     1131       +6
+ Misses         59       53       -6
…arrays are derived in TypedEncoder so no need for that either
Based on the coverage results, the scalareflection stuff in FramelessInternals is trimmed down to a bare minimum. I have also removed the column (_.expr) function as it's not used anywhere. I've also fixed the test failure here for rlike, so that it checks for a valid generated regex.
wrt the KMeansTest failure, the euclideanUpdateInPlace ArrayIndexOutOfBoundsException now seems to occur in 45 out of 1000 runs (i.e. somewhere between ~0.45%, counting the maximum of 10,000 actual checks, and 4.5% of the time). Running the average tests similarly, the failure didn't occur in 500(0) runs (so probably far less than 0.01% occurrence). The exception described the actual input as a List of strings rather than a List of Longs, which is odd; I cannot reproduce it. I don't know that it makes sense to try to resolve these tests further, given the low occurrence rate and that the "issues" are not in frameless code. As such, I've added a wrapper that retries on that specific exception for kmeans and avg. Alternatively, the Spark test approach could be used instead of scalacheck for kmeans, just using constant vectors.
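For readers unfamiliar with the retry idea, here is a minimal sketch of what such a wrapper could look like. Only the call shape `tolerantRun(predicate) { body }` is taken from the diff in this PR; the helper `anyCauseMatches`, the `maxRuns` parameter and its default of 20 are assumptions made for illustration, and the actual implementation in the PR may well differ.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object TolerantRunSketch {

  // Hypothetical helper: walks the cause chain until the predicate
  // matches or there are no more causes.
  @tailrec
  def anyCauseMatches(t: Throwable, p: Throwable => Boolean): Boolean =
    if (t == null) false
    else if (p(t)) true
    else anyCauseMatches(t.getCause, p)

  // Runs `thunk` up to `maxRuns` times (assumed default: 20), re-running it
  // only when the thrown exception, or one of its causes, is tolerated.
  // Any other failure, or a tolerated one on the last run, is rethrown.
  @tailrec
  def tolerantRun[T](tolerantOf: Throwable => Boolean, maxRuns: Int = 20)(thunk: => T): T =
    Try(thunk) match {
      case Success(value) => value
      case Failure(t) if maxRuns > 1 && anyCauseMatches(t, tolerantOf) =>
        tolerantRun(tolerantOf, maxRuns - 1)(thunk)
      case Failure(t) => throw t
    }
}
```

A call such as `tolerantRun(_.isInstanceOf[ArrayIndexOutOfBoundsException]) { check(prop) }` would then re-run the property only for that one exception type, so any other failure still stops the build. Checking the cause chain is a design choice in this sketch, on the assumption that Spark may wrap executor-side errors before they reach the test.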
…han tests, and attempt to remove that from coverage
As review hasn't started yet, I hope this approach is ok - via imports only; there was too much code change for something that should be transparent. Note that the two lines of code shown as untested in previous patches (code gen for DisambiguateLeft/Right) were never tested; I've forced the issue.
@pomadchin - sorry, I just realised you may not have been alerted that it was green - hopefully this does alert you. Since there have been other releases in between, the MiMa stuff may need version bumps though.
@chris-twiner yes!! Thanks for pointing it out; I’ll hopefully take a closer look by this Friday, or at the beginning of next week.
It's been a holiday weekend! Will return to this one this week, sorry for the delay!
np, fyi - 3.5.0 doesn't seem to have any issues with this snapshot.
Force-pushed from 71180e3 to 9228809
Force-pushed from 9228809 to b9be1b5
}

check(prop)
check(prop3[Double])
tolerantRun( _.isInstanceOf[ArrayIndexOutOfBoundsException] ) {
Do we still need it?
without the tolerantRun it fails often enough on 3.4 to stop the build
LGTM! A nice approach. I cleaned up the build file a bit, but other than that the change looks good. Thanks for the incredible work!
Created a follow-up issue #717
Force-pushed from 52851e3 to 6d1c69c
Force-pushed from 6d1c69c to c831b70
Thanks for a great contribution one more time! 🔥
As with the original 3.x you are mtw!
#698 - until 3.4.0 is released it won't compile
Closes #698
Closes #506