[SPARK-15740] [MLLIB] Word2VecSuite "big model load / save" caused OOM in maven jenkins builds #13509
…igger partitioning
|
(Fix the title please) https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark |
|
I noticed a Scala style error; please wait for the new commit before triggering a Jenkins build. |
|
Can anyone verify this? |
|
I triggered multiple test runs. |
|
Test build #3112 has finished for PR 13509 at commit
|
|
Test build #3113 has finished for PR 13509 at commit
|
|
Test build #3111 has finished for PR 13509 at commit
|
|
The only thing I don't like is that "64m" is hard-coded, but I couldn't find where the default Spark confs are stored! |
// est. size of this model, given the formula:
// (floatSize * vectorSize + 15) * numWords
// (4 * 10 + 15) * 10 = 550
// therefore it should generate 12 partitions
"12 partitions" --> "multiple partitions" (The exact number isn't important.)
|
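For readers following the size estimate in the review snippet above, here is a minimal sketch of how the partition count could follow from that formula. It is not the code from this PR: the object name, the helper names, and the "integer division plus one" rule are assumptions made for illustration, and the 50-byte buffer is only one value that happens to be consistent with the "12 partitions" figure in the comment.

```scala
// Hypothetical helper, not taken from the PR: names and the rounding rule are assumptions.
object PartitionEstimate {
  // Size in bytes of one Float component of a word vector.
  val FloatSize: Long = 4L
  // Per-word overhead in bytes, the "+ 15" term from the formula in the test comment.
  val PerWordOverhead: Long = 15L

  // Estimated model size: (floatSize * vectorSize + 15) * numWords
  def approxModelSize(vectorSize: Int, numWords: Int): Long =
    (FloatSize * vectorSize + PerWordOverhead) * numWords

  // Assumed rule: one partition per full buffer, plus one for the remainder.
  def numPartitions(vectorSize: Int, numWords: Int, bufferSizeBytes: Long): Int = {
    require(bufferSizeBytes > 0, "buffer size must be positive")
    (approxModelSize(vectorSize, numWords) / bufferSizeBytes + 1).toInt
  }
}
```

Under these assumptions, a model with 10 words of dimension 10 is estimated at 550 bytes, which a 50-byte buffer splits into 12 partitions, while a 64 MB buffer keeps it in a single partition. This is why a small buffer lets the test exercise partitioning without building a large model.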
I don't think you can access the default confs in this case. The class KryoSerializer seems to store those privately. |
|
I corrected the style errors you pointed out. If you say I cannot retrieve default values, I will leave the 64m hard coded that way. |
|
I verified locally that the test creates a model file with multiple partitions, so LGTM. I'll merge once tests run again. Thanks! |
|
Test build #3164 has finished for PR 13509 at commit
|
|
Test build #3166 has finished for PR 13509 at commit
|
|
Merging with master and branch-2.0 |
… in maven jenkins builds
## What changes were proposed in this pull request?
"test big model load / save" in Word2VecSuite, lately resulted into OOM. Therefore we decided to make the partitioning adaptive (not based on spark default "spark.kryoserializer.buffer.max" conf) and then testing it using a small buffer size in order to trigger partitioning without allocating too much memory for the test.
## How was this patch tested?
It was tested running the following unit test: org.apache.spark.mllib.feature.Word2VecSuite
Author: tmnd1991 <antonio.murgia2@studio.unibo.it>
Closes #13509 from tmnd1991/SPARK-15740.
(cherry picked from commit 040f6f9)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
What changes were proposed in this pull request?
The "big model load / save" test in Word2VecSuite has recently been causing OOM errors.
We therefore decided to make the partitioning adaptive (no longer tied to the Spark default for the "spark.kryoserializer.buffer.max" conf) and to test it with a small buffer size, which triggers partitioning without allocating too much memory for the test.
How was this patch tested?
It was tested by running the following unit test:
org.apache.spark.mllib.feature.Word2VecSuite