transform() is not threadsafe #183

Closed
tatome opened this issue Nov 21, 2018 · 5 comments · Fixed by #194

Comments

@tatome

tatome commented Nov 21, 2018

self.transformed_names_ = []

The attribute DataFrameMapper.transformed_names_ is reassigned and then modified during _transform(). That makes transform() not thread-safe, so a Pipeline that contains a DataFrameMapper cannot safely be used from multiple threads.
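The race can be reproduced deterministically with a toy class. This is a hypothetical sketch: ToyMapper is not part of sklearn-pandas and only mimics the reset-then-extend pattern described above. A threading.Barrier (a test hook added for illustration) forces both threads to finish the reset before either one appends, a schedule that real threads can also hit by chance.

```python
import threading


class ToyMapper:
    """Hypothetical stand-in for DataFrameMapper: it only mimics the pattern
    of resetting and then extending a shared attribute inside transform()."""

    def __init__(self, columns, barrier=None):
        self.columns = columns
        self._barrier = barrier  # test hook: forces a specific interleaving
        self.transformed_names_ = []

    def transform(self):
        self.transformed_names_ = []  # shared state is reset on every call
        if self._barrier is not None:
            self._barrier.wait()  # wait until every thread has done its reset
        for col in self.columns:
            self.transformed_names_ += [col]  # mutates the shared list
        return list(self.transformed_names_)


def run_concurrently(mapper, n_threads):
    threads = [threading.Thread(target=mapper.transform) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mapper.transformed_names_


# Single-threaded use behaves as expected.
single = ToyMapper(["a", "b"]).transform()

# Two threads forced through the bad schedule.
mapper = ToyMapper(["a", "b"], barrier=threading.Barrier(2))
names = run_concurrently(mapper, 2)
print(sorted(names))  # ['a', 'a', 'b', 'b']
```

With two columns the shared list ends up with four entries: after both resets, both calls append into whichever list the last reset created.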

@FlorisHoogenboom
Contributor

I guess this can be quite easily resolved by changing

self.transformed_names_ += self.get_names(
    columns, transformers, Xt, alias)

to something like

self.transformed_names_.extend(
    self.get_names(columns, transformers, Xt, alias) 
)

Or am I mistaken in assuming that extend is thread-safe?
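For a list attribute, += and extend are closer than they look: both mutate the existing list in place. The sketch below checks this with a hypothetical Holder class. Whether a single extend call is atomic is a CPython implementation detail tied to the GIL rather than a language guarantee, and either way it would not address the reset to [] at the start of each call, which is what lets concurrent calls interfere.

```python
class Holder:
    """Hypothetical minimal object with a list attribute."""

    def __init__(self):
        self.names = []


h = Holder()
original = h.names

# For lists, += calls list.__iadd__, which extends the list in place and
# then rebinds the attribute to the same list object.
h.names += ["x"]
assert h.names is original

# extend() mutates the same object as well, without any rebinding.
h.names.extend(["y"])
assert h.names is original
print(h.names)  # ['x', 'y']
```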

@FlorisHoogenboom
Contributor

I'm sorry, I think I misunderstood the issue initially. I've prepared a PR that should resolve the actual issue.

@ragrawal
Collaborator

Hi @FlorisHoogenboom
Thanks for raising this issue. I have taken over the maintenance of sklearn-pandas and am going through all the old issues. I think this is an important issue and should be fixed. I see your PR and am happy to merge it. I'm wondering whether there is any way we can test this.

@FlorisHoogenboom
Contributor

In general, it is very hard to test this kind of concurrency safety. I think the proposed MR at least fixes some very obvious problems by making those operations more atomic. Other than by reasoning about the operations it performs, I don't know of any way to validate programmatically that this method is now thread-safe.
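No test can prove thread safety in general, but a specific bad schedule can be forced and checked. The sketch below is hypothetical: ToyMapperFixed is not the code from the PR, it only illustrates one common pattern (collect the names in a local list, then publish them with a single assignment), and the barrier argument is a test hook added for illustration. It replays the interleaving in which both threads are inside transform() at the same time.

```python
import threading


class ToyMapperFixed:
    """Hypothetical stand-in for DataFrameMapper, sketching one possible fix:
    build the names in a local list and publish them with one assignment."""

    def __init__(self, columns, barrier=None):
        self.columns = columns
        self._barrier = barrier  # test hook only: forces threads to interleave
        self.transformed_names_ = []

    def transform(self):
        names = []  # local to this call, so other threads cannot touch it
        if self._barrier is not None:
            self._barrier.wait()  # both threads are now inside transform()
        for col in self.columns:
            names.append(col)
        self.transformed_names_ = names  # single rebinding; last writer wins
        return names


def check_forced_interleaving(n_threads=2):
    mapper = ToyMapperFixed(["a", "b"], barrier=threading.Barrier(n_threads))
    results = [None] * n_threads

    def worker(i):
        results[i] = mapper.transform()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, mapper.transformed_names_


results, final = check_forced_interleaving()
print(results, final)  # [['a', 'b'], ['a', 'b']] ['a', 'b']
```

This exercises only one interleaving, so it is a regression check rather than a proof. Because the last writer wins on transformed_names_, the pattern is harmless only when concurrent calls compute the same names.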

@ragrawal
Collaborator

Okay. I will review the MR and merge it. Thanks for your submission.
