-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate skops serialization format #174
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some notes @adrinjalali @BenjaminBossan
|
||
self.model_file = ( | ||
config.get("sklearn", {}).get("model", {}).get("file", DEFAULT_FILENAME) | ||
) | ||
self.model_format = ( | ||
config.get("sklearn", {}).get("model_format", "pickle") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI this assumes file is pickle because of how there's no skops models without config (the ones without config are to maintain vanilla sklearn models that weren't uploaded with skops)
"skops-tests/iris-sklearn-latest-logistic_regression-without-config", | ||
"skops-tests/iris-sklearn-latest-hist_gradient_boosting-with-config", | ||
"skops-tests/iris-sklearn-latest-hist_gradient_boosting-without-config", | ||
"skops-tests/iris-sklearn-1.0-logistic_regression-with-config-skops", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a skops test case for every pickle test case with config, do you think I should've reduced it? The GHActions never works on these tests anyway, so there's not much limitation except for local dev. @BenjaminBossan @adrinjalali
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if we could eventually get the GH to work, in which case it might matter. Otherwise, I don't have any opinion on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks very good, thanks! I have some minor comments only.
self.model = joblib.load( | ||
open(Path(cached_folder) / self.model_file, "rb") | ||
) | ||
if len(record) > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this whole check not be outside of the if self.model_format == "pickle":
condition? Otherwise, the warning is only recorded when the format is pickle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I just addressed it.
@@ -34,7 +34,7 @@ | |||
from sklearn.model_selection import train_test_split | |||
from sklearn.pipeline import make_pipeline | |||
from sklearn.preprocessing import FunctionTransformer, StandardScaler | |||
from skops import hub_utils | |||
from skops import hub_utils, io |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We wanted to use import skops.io as sio
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed this
repo_name = REPO_NAMES[task_name].format( | ||
version=version, est_name=est_name, w_or_wo="without" | ||
) | ||
if serialization_format == "pickle": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain why this is only done for "pickle"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't make any sense to push a file with skops format without config, based on assumption where if someone pushes a skops model, config should be there. this is only done to maintain backwards compatibility for vanilla sklearn models that is serialized using pickle and pushed without skops. this way, test cases are less too.
"skops-tests/iris-sklearn-latest-logistic_regression-without-config", | ||
"skops-tests/iris-sklearn-latest-hist_gradient_boosting-with-config", | ||
"skops-tests/iris-sklearn-latest-hist_gradient_boosting-without-config", | ||
"skops-tests/iris-sklearn-1.0-logistic_regression-with-config-skops", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if we could eventually get the GH to work, in which case it might matter. Otherwise, I don't have any opinion on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BenjaminBossan I addressed all your comments
repo_name = REPO_NAMES[task_name].format( | ||
version=version, est_name=est_name, w_or_wo="without" | ||
) | ||
if serialization_format == "pickle": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't make any sense to push a file with skops format without config, based on assumption where if someone pushes a skops model, config should be there. this is only done to maintain backwards compatibility for vanilla sklearn models that is serialized using pickle and pushed without skops. this way, test cases are less too.
@@ -34,7 +34,7 @@ | |||
from sklearn.model_selection import train_test_split | |||
from sklearn.pipeline import make_pipeline | |||
from sklearn.preprocessing import FunctionTransformer, StandardScaler | |||
from skops import hub_utils | |||
from skops import hub_utils, io |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed this
self.model = joblib.load( | ||
open(Path(cached_folder) / self.model_file, "rb") | ||
) | ||
if len(record) > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I just addressed it.
@adrinjalali @BenjaminBossan all of the tests pass except for the single one with methodcaller and skops. if you think this is going to be fixed on skops side I don't think we should swap it. |
Addressed comments and applied style changes. |
However, even if it gets merged, it might take a while before we have our next skops release, since we just had one very recently. |
@BenjaminBossan should I merge? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
It's not deployed yet. We're migrating some stuff to support the private hub. Should be deployed tomorrow. |
No description provided.