`TypeError` when loading model using methodcaller #283

merveenoyan · 2023-01-24T17:55:21Z

Came across this error for only this model generated using the test case generation script in api-inference-community:
*** TypeError: methodcaller needs at least one argument, the method name

To reproduce:

from skops import hub_utils
hub_utils.download(repo_id = "skops-tests/textclassification-sklearn-latest-hist_gradient_boosting-with-config-skops", dst="model_dir")

import skops.io as sio
sio.load("/content/model_dir/skops-v588xnq6.skops", trusted=True)

At first I thought something might've been wrong with how I serialize it but I think it's weird how every test passes and this model fails for inference.

The text was updated successfully, but these errors were encountered:

BenjaminBossan · 2023-01-25T10:16:33Z

@E-Aho @adrinjalali This is a veritable bug, here is a minimal reproduction:

from sklearn.base import BaseEstimator
from operator import methodcaller

class MyEstimator(BaseEstimator):
   def __init__(self):
       pass
   def fit(self, X, y):
       self.meth_ = methodcaller("foo")
       return self

est = MyEstimator().fit(None, None)
sio.loads(sio.dumps(est), trusted=True)

Resulting in:

--> 397 instance = cls.__new__(cls)  # type: ignore
    399 if not self.children["attrs"]:
    400     # nothing more to do
    401     return instance

TypeError: methodcaller needs at least one argument, the method name

The offending line:

skops/skops/io/_general.py

Line 397 in 9bf557e

instance = cls.__new__(cls) # type: ignore

This could be an interesting one to resolve. Presumably, methodcaller is not the only type that fails like this, we should ensure that other similar types work too.

@merveenoyan methodcaller is used here:

https://github.com/huggingface/api-inference-community/blob/763efb856835098e7c9b91219b241ea7e524e695/docker_images/sklearn/tests/generators/generate.py#L96

The script could be rewritten to not use methodcaller and should then pass.

adrinjalali · 2023-01-25T11:37:55Z

I was confused for a while, since methodcaller is defined here (https://github.com/python/cpython/blob/498598e8c2d64232d26c075de87c513415176bbf/Lib/operator.py#L302) as:

class methodcaller:
    """
    Return a callable object that calls the given method on its operand.
    After f = methodcaller('name'), the call f(r) returns r.name().
    After g = methodcaller('name', 'date', foo=1), the call g(r) returns
    r.name('date', foo=1).
    """
    __slots__ = ('_name', '_args', '_kwargs')

    def __init__(self, name, /, *args, **kwargs):
        self._name = name
        if not isinstance(self._name, str):
            raise TypeError('method name must be a string')
        self._args = args
        self._kwargs = kwargs

    def __call__(self, obj):
        return getattr(obj, self._name)(*self._args, **self._kwargs)

    def __repr__(self):
        args = [repr(self._name)]
        args.extend(map(repr, self._args))
        args.extend('%s=%r' % (k, v) for k, v in self._kwargs.items())
        return '%s.%s(%s)' % (self.__class__.__module__,
                              self.__class__.__name__,
                              ', '.join(args))

    def __reduce__(self):
        if not self._kwargs:
            return self.__class__, (self._name,) + self._args
        else:
            from functools import partial
            return partial(self.__class__, self._name, **self._kwargs), self._args

which doesn't implement __new__. So it was puzzling why the error. Then I realized, like many other things in python, there's also a C implementation of it, and it implements __new__ here (https://github.com/python/cpython/blob/498598e8c2d64232d26c075de87c513415176bbf/Modules/_operator.c#L1563), which starts as:

static PyObject *
methodcaller_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    methodcallerobject *mc;
    PyObject *name;

    if (PyTuple_GET_SIZE(args) < 1) {
        PyErr_SetString(PyExc_TypeError, "methodcaller needs at least "
                        "one argument, the method name");
        return NULL;
    }

Anyhow, now we know it's happening because methodcaller has a custom __new__ method which requires an extra arg. Luckily not many objects implement __new__, and I always thought for these objects we'd have a dedicated loader, like we do for set, dict etc. Cause I really don't want to rely on their __reduce__ lol

As for the issue on the hub, @merveenoyan you can change the code to avoid using methodcaller, while we fix it here for this object.

Fixes skops-dev#283 Description Some of the operator functions can be quite useful sklearn users, especially in conjunction with FunctionTransformer. This is because it often allows to avoid writing small functions or using lambda, which can't be persisted. However, since some of these functions are actually classes defined in CPython that need special treatment, our code didn't work with them. This fix consists of two parts. On the one hand, functions like methodcaller now have a special dispatch type that correctly knows how to initialize them. On the other hand, we had a problem with some funcs like operator.add. Since they take 2 arguments, it would be useful for them to be passed without initialization in conjunction with partial. However, their type is builtin_function_or_method. By default, they would be handled by object_get_state, but that doesn't work correctly with uninitialized objects. This has now been fixed. Open questions To get the type of operator.add et al, we need builtin_function_or_method. Right now, I just do builtin_function_or_method = type(len). Is there a better way? Regarding operators like attrgetter and methodcaller, could there be a security implication by allowing them? Sure, users can decide not to trust that type, but whether it is trustworthy or not depends on its arguments. Right now, we just tell them that they need to trust _operator.attrgetter, which could seem harmless.

Fixes #283 ## Description Some of the operator functions can be quite useful sklearn users, especially in conjunction with `FunctionTransformer`. This is because it often allows to avoid writing small functions or using `lambda`, which can't be persisted. However, since some of these functions are actually classes defined in CPython that need special treatment, our code didn't work with them. This fix consists of two parts. On the one hand, functions like `methodcaller` now have a special dispatch type that knows how to correctly initialize them. On the other hand, we had a problem with some types like `operator.add`. Since they take 2 arguments, it would be useful for them to be passed without initialization in conjunction with `partial`. However, their type is `builtin_function_or_method`. By default, they would be handled by `object_get_state`, but that doesn't work correctly with uninitialized objects. This has now been fixed by dispatching to `type_get_state`. ## Open questions To get the type of `operator.add` et al., we need `builtin_function_or_method`. Right now, I just do `builtin_function_or_method = type(len)`. Is there a better way? Regarding operators like `attrgetter` and `methodcaller`, could there be a security implication by allowing them? Sure, users can decide not to trust that type, but whether it is trustworthy or not depends on its arguments. Right now, we just tell them that they need to trust `_operator.attrgetter`, which could seem harmless. The same argument can of course be made for `eval` & co, but there, users will be much more likely to recognize their danger.

BenjaminBossan added bug Something isn't working persistence Secure persistence feature labels Jan 25, 2023

BenjaminBossan changed the title ~~TypeError when loading model~~ TypeError when loading model using methodcaller Jan 25, 2023

BenjaminBossan mentioned this issue Jan 30, 2023

FIX: Support operators persistence #287

Merged

adrinjalali closed this as completed in #287 Mar 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`TypeError` when loading model using methodcaller #283

`TypeError` when loading model using methodcaller #283

merveenoyan commented Jan 24, 2023

BenjaminBossan commented Jan 25, 2023 •

edited

Loading

adrinjalali commented Jan 25, 2023

TypeError when loading model using methodcaller #283

TypeError when loading model using methodcaller #283

Comments

merveenoyan commented Jan 24, 2023

BenjaminBossan commented Jan 25, 2023 • edited Loading

adrinjalali commented Jan 25, 2023

`TypeError` when loading model using methodcaller #283

`TypeError` when loading model using methodcaller #283

BenjaminBossan commented Jan 25, 2023 •

edited

Loading