Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate skops serialization format #174

Merged
merged 6 commits into from
Jan 30, 2023
Merged

Integrate skops serialization format #174

merged 6 commits into from
Jan 30, 2023

Conversation

merveenoyan
Copy link
Contributor

No description provided.

Copy link
Contributor Author

@merveenoyan merveenoyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some notes @adrinjalali @BenjaminBossan


self.model_file = (
config.get("sklearn", {}).get("model", {}).get("file", DEFAULT_FILENAME)
)
self.model_format = (
config.get("sklearn", {}).get("model_format", "pickle")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI this assumes file is pickle because of how there's no skops models without config (the ones without config are to maintain vanilla sklearn models that weren't uploaded with skops)

"skops-tests/iris-sklearn-latest-logistic_regression-without-config",
"skops-tests/iris-sklearn-latest-hist_gradient_boosting-with-config",
"skops-tests/iris-sklearn-latest-hist_gradient_boosting-without-config",
"skops-tests/iris-sklearn-1.0-logistic_regression-with-config-skops",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a skops test case for every pickle test case with config, do you think I should've reduced it? The GHActions never works on these tests anyway, so there's not much limitation except for local dev. @BenjaminBossan @adrinjalali

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we could eventually get the GH to work, in which case it might matter. Otherwise, I don't have any opinion on that.

Copy link
Member

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks very good, thanks! I have some minor comments only.

self.model = joblib.load(
open(Path(cached_folder) / self.model_file, "rb")
)
if len(record) > 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this whole check not be outside of the if self.model_format == "pickle": condition? Otherwise, the warning is only recorded when the format is pickle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I just addressed it.

@@ -34,7 +34,7 @@
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from skops import hub_utils
from skops import hub_utils, io
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We wanted to use import skops.io as sio.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed this

repo_name = REPO_NAMES[task_name].format(
version=version, est_name=est_name, w_or_wo="without"
)
if serialization_format == "pickle":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain why this is only done for "pickle"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't make any sense to push a file with skops format without config, based on assumption where if someone pushes a skops model, config should be there. this is only done to maintain backwards compatibility for vanilla sklearn models that is serialized using pickle and pushed without skops. this way, test cases are less too.

"skops-tests/iris-sklearn-latest-logistic_regression-without-config",
"skops-tests/iris-sklearn-latest-hist_gradient_boosting-with-config",
"skops-tests/iris-sklearn-latest-hist_gradient_boosting-without-config",
"skops-tests/iris-sklearn-1.0-logistic_regression-with-config-skops",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we could eventually get the GH to work, in which case it might matter. Otherwise, I don't have any opinion on that.

Copy link
Contributor Author

@merveenoyan merveenoyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BenjaminBossan I addressed all your comments

repo_name = REPO_NAMES[task_name].format(
version=version, est_name=est_name, w_or_wo="without"
)
if serialization_format == "pickle":
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't make any sense to push a file with skops format without config, based on assumption where if someone pushes a skops model, config should be there. this is only done to maintain backwards compatibility for vanilla sklearn models that is serialized using pickle and pushed without skops. this way, test cases are less too.

@@ -34,7 +34,7 @@
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from skops import hub_utils
from skops import hub_utils, io
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed this

self.model = joblib.load(
open(Path(cached_folder) / self.model_file, "rb")
)
if len(record) > 0:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I just addressed it.

@merveenoyan
Copy link
Contributor Author

@adrinjalali @BenjaminBossan all of the tests pass except for the single one with methodcaller and skops. if you think this is going to be fixed on skops side I don't think we should swap it.

@merveenoyan
Copy link
Contributor Author

Addressed comments and applied style changes.

@BenjaminBossan
Copy link
Member

if you think this is going to be fixed on skops side I don't think we should swap it.

skops-dev/skops#287

However, even if it gets merged, it might take a while before we have our next skops release, since we just had one very recently.

@merveenoyan
Copy link
Contributor Author

@BenjaminBossan should I merge?

Copy link
Member

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@merveenoyan merveenoyan merged commit 024546a into huggingface:main Jan 30, 2023
@Narsil
Copy link
Contributor

Narsil commented Jan 30, 2023

It's not deployed yet. We're migrating some stuff to support the private hub. Should be deployed tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants