[SPARK-24058][ML][PySpark] Default Params in ML should be saved separately: Python API #21153
Conversation
Test build #89840 has finished for PR 21153 at commit
Test build #89841 has finished for PR 21153 at commit
    jsonParams = paramMap
else:
    for p in params:
        jsonParams[p.name] = params[p]
I think using `_paramMap.copy()` would be simpler.
`_paramMap`'s keys are `Param` objects, not strings.
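The distinction the reply points at can be shown in miniature: `json.dumps` requires string keys, so a plain `copy()` of a `Param`-keyed map is not serializable. A minimal sketch (the tiny `Param` class here is an illustrative stand-in for `pyspark.ml.param.Param`, not the real class):

```python
import json

class Param:
    """Illustrative stand-in for pyspark.ml.param.Param."""
    def __init__(self, name):
        self.name = name

# Keys are Param objects, so paramMap.copy() would not be
# JSON-serializable; re-keying by p.name is.
paramMap = {Param("maxIter"): 10, Param("regParam"): 0.1}

jsonParams = {p.name: paramMap[p] for p in paramMap}
print(json.dumps(jsonParams, sort_keys=True))
```

Trying `json.dumps(paramMap)` directly would raise a `TypeError` because the `Param` keys are not strings.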
# Default param values
jsonDefaultParams = {}
for p in instance._defaultParamMap:
    jsonDefaultParams[p.name] = instance._defaultParamMap[p]
Similarly, use `_defaultParamMap.copy()`.
ditto.
python/pyspark/ml/util.py
Outdated
for paramName in metadata['defaultParamMap']:
    param = instance.getParam(paramName)
    paramValue = metadata['defaultParamMap'][paramName]
    instance._setDefault(**{param.name: paramValue})
Remove the line `param = instance.getParam(paramName)` and change this line to
`instance._setDefault(**{paramName: paramValue})`.
Oh, right. My bad.
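The suggested simplification works because the metadata keys are already plain strings, so no `Param` lookup is needed before calling `_setDefault`. A minimal sketch of the load path under that assumption (the `Instance` class and the metadata values are illustrative stand-ins, not real PySpark types):

```python
class Instance:
    """Illustrative stand-in for a PySpark ML Params object."""
    def __init__(self):
        self._defaultParamMap = {}

    def _setDefault(self, **kwargs):
        # Real PySpark validates the names; here we just record them.
        self._defaultParamMap.update(kwargs)

# defaultParamMap in the saved metadata maps param *names* (strings)
# to values, so each entry can be passed straight to _setDefault.
metadata = {'defaultParamMap': {'maxIter': 100, 'tol': 1e-6}}

instance = Instance()
for paramName in metadata['defaultParamMap']:
    paramValue = metadata['defaultParamMap'][paramName]
    instance._setDefault(**{paramName: paramValue})
```

The `**{paramName: paramValue}` spread turns a runtime string into a keyword argument, which is why the intermediate `getParam` call was redundant.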
Test build #89924 has finished for PR 21153 at commit
jkbradley
left a comment
I added a quick comment about moving version utils to a separate PR. I'll try to review the other parts soon. Thanks!
python/pyspark/util.py
Outdated
    return argspec


def majorMinorVersion(version):
Since this affects Spark core, it might be nice to put this in a separate PR. Also, shall we make this match the Scala API?
Also, shall we make this match the Scala API?
Do you mean to throw an exception when unable to parse the Spark version?
Yes please, a few things:
- throw an exception when unable to parse
- make it a static method of a class `pyspark.util.VersionUtils`
- add a unit test
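The three suggestions above can be sketched together. A hedged sketch of what a `pyspark.util.VersionUtils` class could look like; the regex and error message here are illustrative, not necessarily what was merged:

```python
import re

class VersionUtils:
    """Sketch of a version-parsing utility per the review suggestions."""

    @staticmethod
    def majorMinorVersion(version):
        """Return (major, minor) parsed from a Spark version string,
        raising ValueError when the string cannot be parsed."""
        m = re.search(r'^(\d+)\.(\d+)(\..*)?$', version)
        if m is None:
            raise ValueError(
                "Unable to parse '%s' as a Spark version string: "
                "could not find the major and minor version numbers." % version)
        return (int(m.group(1)), int(m.group(2)))

print(VersionUtils.majorMinorVersion("2.4.0"))  # (2, 4)
```

A unit test would then cover both the happy path (`"2.4.0"` → `(2, 4)`) and the failure path (a malformed string raising `ValueError`).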
Test build #90393 has finished for PR 21153 at commit
Test build #90394 has finished for PR 21153 at commit
Would you mind rebasing this off of the upstream master branch? I'm having trouble running the tests for this PR locally.
Done. Can you try again?
Test build #90486 has finished for PR 21153 at commit
Not sure why the AppVeyor build failed. How do I re-trigger the AppVeyor build?
Test build #4179 has finished for PR 21153 at commit
jkbradley
left a comment
Sorry for the delay. This LGTM except for the small typo. (Hopefully fixing it will re-run the AppVeyor test.)
python/pyspark/ml/util.py
Outdated
- sparkVersion
- uid
- paramMap
- defalutParamMap (since 2.4.0)
typo: default
Thanks.
Test build #90612 has finished for PR 21153 at commit
OK. Looks like the AppVeyor build is OK now.
OK, thanks @viirya!
Thanks @jkbradley @WeichenXu123
What changes were proposed in this pull request?
See SPARK-23455 for reference. Default params in ML are now saved separately in the metadata file on the Scala side. We must make the same change for Python in Spark 2.4.0 to keep the two implementations in sync.
How was this patch tested?
Added test.