-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Method type in dispatch calls #195
Add Method type in dispatch calls #195
Conversation
Well done figuring out where the error occurs. Before we continue, I really want to understand what is going on here and if the suggested solution is the best one. So first, it's true that we don't have any special case for objects whose attributes are methods bound to another object. E.g. this fails for now: class Foo(BaseEstimator):
def fit(self, X, y=None, **fit_params):
self.bound_method_ = FunctionTransformer().transform
return self
def transform(self, X):
return self.bound_method_(X)
def test_bound_method(tmp_path):
estimator = Foo().fit(None)
f_name = tmp_path / "file.skops"
loaded = save_load_round(estimator, f_name) gives:
I think we should add a test along these lines to our test suite. How does this relate to the original issue? From what I can tell, the problem stems from these lines: The What is strange to me is that the same issue does not apply to Another thing that surprised me is the following: I removed the whole I dug into this and I think it's because If that's true, we should be able to drop those methods entirely from the state and let scipy restore them during Regardless of that, as mentioned at the top, we should have a way to save and load methods bound to other instances. I also wondered whether we can add a special case for the |
I fully agree, I ran into this as well while testing, but with "dummy functions" that returned nothing, a few other tests that checked on the dispatch functions started failing too. I think in this case, it could definitely be solved by dropping these methods and adding a special condition to check for this during loading. However, in a general case, I think there still needs to be a solution for bound methods more generally, which I think would be along the lines of this PR. E.g:
I feel like something along these lines isn't unlikely to exist somewhere in SKLearn/SciPy (given we've already found one that happens to get bounded during init), and it would probably be a good idea to provide support for serializing it correctly, which shouldn't be too hard to do. |
👍 This is exactly what I meant when I wrote
We have to be precise with the wording here, we already persist objects with bound methods without issue, the problem are attributes that are methods bound to another object than the one we're persisting. Once we have this general issue solved, if it doesn't solve the scipy issue, we can think about a more specific solution there. |
Ah sorry, you're entirely correct. I neglected it in my comment, but I was meaning to show that if you used that e.g if we had
And tried to serialise I think one other place this might occur for users is when trying to serialize something that can take a function as an input (e.g DistanceMetrics with a user defined distance). If that function is a method bound to a different object, it wouldn't work right now AFAIK |
Yes, I think it would fail. Is there a qualitative difference to the other example I posted:
|
No, both would error, I was trying to think through a case where you would need to support bound methods that get manipulated during class creation/use and couldn't be static (e.g depends on the "variant" param in my example) |
Ah okay, got it, I thought maybe I was missing something. |
CI runner is failing with a 404 error on uploading to CodeCov 🤔 |
Most of the time, it's just some flaky behavior that can be fixed by rerunning the CI. |
Should be ready for review when you have a moment @BenjaminBossan :) let me know if you have any thoughts! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks good, thanks a lot for investigating and proposing a solution. I made a few comments, please have a look.
And just to be clear, the initial issue reported in #184 is not solved yet, right? When I test it, I get an error (infinite recursion). Since we changed the skope (haha) of this PR, that's fine, I just want to make sure that this is expected.
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Thanks for the comments! And it was a fun problem to investigate :)
😨 I definitely changed my focus from the original issue to a more general case, but I'm surprised it hasn't addressed the original issue, given it was working with just empty function calls for I'll look into this and see if there's a way to address the original bug here too. I don't love the idea of merging in a bug fix if it introduces a new bug, especially one like an infinite recursion 😨 |
So after some digging, I think the problem with the original issue being an infinite loop is as follows:
This results in the loop. I don't think this is a common pattern, so could add a special case for any |
Great that you figured out the cause. I think for this PR, it's not necessary to solve the whole circular reference problem, that might be best left for another day or else the PR will become too big. As for potential solutions, we might be able to use the memoization mechanism that we already added here. There is an example of its usage for numpy arrays (which I now notice could be optimized...). |
Sounds good, I can raise an issue once this gets merged in to track the issue that's occuring with |
…ps into FIX-bound-method-serialization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this looks quite good to me, thanks for your work. I still have a few minor comments, please take a look.
I can raise an issue once this gets merged in to track the issue that's occuring with zipf right now
Or we can just leave the original issue open.
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Huh, looks like given the changes in #200, neither FunctionType nor MethodType ever get returned by |
Ah yes, this is not because |
@BenjaminBossan if you have a minute could you rerun the CI? |
Sure, but feel free to ignore those "Error: Codecov: Failed to properly upload" messages. For some reason, those errors occur more frequently since the past week or so, but they're unrelated to anything actually being wrong. |
Oh for sure, I just wanted to check Codecov was actually happy now, the MethodType return not getting tested was making it upset |
For sure, but you can take any of the passing CI runs to check that, no need to wait for each one to confirm. |
@BenjaminBossan, let me know if you have a moment to check the changes since your last review :) |
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Ok, I've reverted this back to prior to the LoadState work. I'll open an issue once this merges in and try to keep working on a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, it's really appreciated. I think now the PR is very clean and readable. Thanks a lot.
I only have a minor comment about one of the xfail
ing tests, which is not really a blocker, since I'm sure we can remove it soon :)
@adrinjalali please also take one more look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than @BenjaminBossan 's comment, LGTM.
Awesome, thanks for all the support guys! I've reworded the |
@E-Aho Excellent, thanks, really great job. Will you be working on the deduplication issue? |
🤗 Thanks for all the support, it really was a huge help. And yes, I'll keep working on the dupe issue, I'll raise an issue later today once I've got time, and start working on implementing a proper |
Fixes #184
PR which adds MethodType and related functions to the skops.io dispatch functions.
I can expand on the test if more cases/checks are needed, but for now this shows the functionality works as expected.