-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TypeError
when loading model using methodcaller
#283
Comments
@E-Aho @adrinjalali This is a veritable bug, here is a minimal reproduction: from sklearn.base import BaseEstimator
from operator import methodcaller
class MyEstimator(BaseEstimator):
def __init__(self):
pass
def fit(self, X, y):
self.meth_ = methodcaller("foo")
return self
est = MyEstimator().fit(None, None)
sio.loads(sio.dumps(est), trusted=True) Resulting in:
The offending line: Line 397 in 9bf557e
This could be an interesting one to resolve. Presumably, @merveenoyan The script could be rewritten to not use |
TypeError
when loading modelTypeError
when loading model using methodcaller
I was confused for a while, since class methodcaller:
"""
Return a callable object that calls the given method on its operand.
After f = methodcaller('name'), the call f(r) returns r.name().
After g = methodcaller('name', 'date', foo=1), the call g(r) returns
r.name('date', foo=1).
"""
__slots__ = ('_name', '_args', '_kwargs')
def __init__(self, name, /, *args, **kwargs):
self._name = name
if not isinstance(self._name, str):
raise TypeError('method name must be a string')
self._args = args
self._kwargs = kwargs
def __call__(self, obj):
return getattr(obj, self._name)(*self._args, **self._kwargs)
def __repr__(self):
args = [repr(self._name)]
args.extend(map(repr, self._args))
args.extend('%s=%r' % (k, v) for k, v in self._kwargs.items())
return '%s.%s(%s)' % (self.__class__.__module__,
self.__class__.__name__,
', '.join(args))
def __reduce__(self):
if not self._kwargs:
return self.__class__, (self._name,) + self._args
else:
from functools import partial
return partial(self.__class__, self._name, **self._kwargs), self._args which doesn't implement static PyObject *
methodcaller_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
methodcallerobject *mc;
PyObject *name;
if (PyTuple_GET_SIZE(args) < 1) {
PyErr_SetString(PyExc_TypeError, "methodcaller needs at least "
"one argument, the method name");
return NULL;
} Anyhow, now we know it's happening because As for the issue on the hub, @merveenoyan you can change the code to avoid using |
Fixes skops-dev#283 Description Some of the operator functions can be quite useful sklearn users, especially in conjunction with FunctionTransformer. This is because it often allows to avoid writing small functions or using lambda, which can't be persisted. However, since some of these functions are actually classes defined in CPython that need special treatment, our code didn't work with them. This fix consists of two parts. On the one hand, functions like methodcaller now have a special dispatch type that correctly knows how to initialize them. On the other hand, we had a problem with some funcs like operator.add. Since they take 2 arguments, it would be useful for them to be passed without initialization in conjunction with partial. However, their type is builtin_function_or_method. By default, they would be handled by object_get_state, but that doesn't work correctly with uninitialized objects. This has now been fixed. Open questions To get the type of operator.add et al, we need builtin_function_or_method. Right now, I just do builtin_function_or_method = type(len). Is there a better way? Regarding operators like attrgetter and methodcaller, could there be a security implication by allowing them? Sure, users can decide not to trust that type, but whether it is trustworthy or not depends on its arguments. Right now, we just tell them that they need to trust _operator.attrgetter, which could seem harmless.
Fixes #283 ## Description Some of the operator functions can be quite useful sklearn users, especially in conjunction with `FunctionTransformer`. This is because it often allows to avoid writing small functions or using `lambda`, which can't be persisted. However, since some of these functions are actually classes defined in CPython that need special treatment, our code didn't work with them. This fix consists of two parts. On the one hand, functions like `methodcaller` now have a special dispatch type that knows how to correctly initialize them. On the other hand, we had a problem with some types like `operator.add`. Since they take 2 arguments, it would be useful for them to be passed without initialization in conjunction with `partial`. However, their type is `builtin_function_or_method`. By default, they would be handled by `object_get_state`, but that doesn't work correctly with uninitialized objects. This has now been fixed by dispatching to `type_get_state`. ## Open questions To get the type of `operator.add` et al., we need `builtin_function_or_method`. Right now, I just do `builtin_function_or_method = type(len)`. Is there a better way? Regarding operators like `attrgetter` and `methodcaller`, could there be a security implication by allowing them? Sure, users can decide not to trust that type, but whether it is trustworthy or not depends on its arguments. Right now, we just tell them that they need to trust `_operator.attrgetter`, which could seem harmless. The same argument can of course be made for `eval` & co, but there, users will be much more likely to recognize their danger.
Came across this error for only this model generated using the test case generation script in
api-inference-community
:*** TypeError: methodcaller needs at least one argument, the method name
To reproduce:
At first I thought something might've been wrong with how I serialize it but I think it's weird how every test passes and this model fails for inference.
The text was updated successfully, but these errors were encountered: