-
Notifications
You must be signed in to change notification settings - Fork 415
transform() is not threadsafe #183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I guess this can be quite easily resolved by changing sklearn-pandas/sklearn_pandas/dataframe_mapper.py Lines 310 to 311 in 0024cf7
to something like self.transformed_names_.extend(
self.get_names(columns, transformers, Xt, alias)
) Or am I mistaken that |
I'm sorry, I think I misunderstood the issue initially. I've prepared a PR that should resolve the actual issue. |
Hi @FlorisHoogenboom |
In general it is very hard to test these kind of concurency safety models. I think the MR proposed at least fixxes some very obvious problems by making those operations more atomic. I wouldn't know any way to validate (except by deducing it from the operations performed) programmatically that this method is indeed thread safe now. |
okie. I will review the MR and merge it. Thanks for your submission. |
sklearn-pandas/sklearn_pandas/dataframe_mapper.py
Line 291 in 0024cf7
The property
DataFrameMapper.transformed_names_
is reassigned and modified during_transform()
. That makestransform()
not thread safe and a Pipeline using a DataFrameMapper cannot be safely used in multiple threads.The text was updated successfully, but these errors were encountered: