
Call __init__() from request thread pool #1146


Merged

deliahu merged 3 commits into master on Jun 24, 2020

Conversation

@deliahu (Member) commented Jun 20, 2020

@RobertLucian @vishalbollu what do you think about this? Are there any issues you can think of with this approach?

@vishalbollu (Contributor) commented Jun 20, 2020

This looks like an interesting idea. I have a few points:

If each thread runs its own predictor constructor, does this mean that each thread loads a separate copy of the model into memory? At that point, it may be safer to use processes instead of threads. If so, we could also only run the per-thread initialization when the threadpool is of size 1 (to avoid OOM). At least that way, running TensorFlow sessions should work by default when threads_per_worker is set to 1.

Another consideration, which isn't mutually exclusive with the above, is to import the threadpool from serve.py in the predictor implementation and run the initializations in the predictor's constructor. We could document this pattern in a guide if it works (and I'm not sure that it does).
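For context, the idea being discussed can be sketched roughly as follows. This is a hypothetical illustration, not Cortex's actual internals: `PythonPredictor`, `init_predictor`, and `handle_request` are stand-in names, and the key mechanism is `ThreadPoolExecutor`'s `initializer` hook, which runs once in each worker thread before it serves requests, so each thread constructs (and pays the memory cost of) its own predictor instance.

```python
# Hypothetical sketch: run the predictor's __init__() once per worker
# thread via ThreadPoolExecutor's `initializer` hook. PythonPredictor
# and the helper names below are illustrative stand-ins, not Cortex code.
import threading
from concurrent.futures import ThreadPoolExecutor

local_cache = threading.local()  # per-thread storage for the predictor

class PythonPredictor:
    def __init__(self, config):
        # in a real predictor this is where a model would be loaded,
        # so each thread holds its own copy (hence the OOM concern)
        self.config = config
        self.owner_thread = threading.get_ident()

    def predict(self, payload):
        return {"thread": self.owner_thread, "payload": payload}

def init_predictor(config):
    # runs exactly once in each thread of the pool, before any requests
    local_cache.predictor = PythonPredictor(config)

def handle_request(payload):
    # each request is served by whichever thread picked it up,
    # using that thread's own predictor instance
    return local_cache.predictor.predict(payload)

threads_per_worker = 2
executor = ThreadPoolExecutor(
    max_workers=threads_per_worker,
    initializer=init_predictor,
    initargs=({"example": True},),
)
results = list(executor.map(handle_request, ["a", "b", "c", "d"]))
executor.shutdown()
```

Because thread-bound state (such as a TensorFlow session) is created by the same thread that later uses it, this sidesteps the cross-thread reinitialization problem, at the cost of duplicating whatever the constructor loads.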

@RobertLucian (Member) left a comment

Cool that it works! I think this is a good idea and solves the immediate issue that TensorFlow presents (namely, having to reinitialize things when moving from thread to thread).

The downside is that a user who isn't aware of this can still hit that TF bug/limitation when threads_per_worker > 1, which can lead to a confusing experience with Cortex; it could take some time before they ask us for support. It can make Cortex's functionality look fragmented, as opposed to it simply not working regardless of how many threads there are per worker, even though this is a TF limitation.

I'm thinking we could mitigate this confusion by adding a prominent warning in the docs about the behavioral difference when threads_per_worker > 1.

Also, I'm wondering how much production deployments would benefit from this, since those will probably set threads_per_worker higher than 1 and thus require a different implementation of the predict method. Simple deployments with threads_per_worker set to 1 and workers_per_replica >= 1 will definitely benefit.

I don't see a way of initializing the constructor in each thread that avoids data duplication, so this may be the closest we can get.
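The "different implementation of the predict method" mentioned above might look something like the following hedged sketch: instead of relying on the framework to call `__init__()` per thread, the user's predictor lazily creates its thread-bound resource inside `predict()` using `threading.local`. All names here (`Predictor`, `make_session`) are hypothetical.

```python
# Hypothetical sketch of user-side lazy initialization for
# threads_per_worker > 1: the thread-bound resource (e.g. a TF session)
# is created on first use in each thread, inside predict() itself.
import threading

class Predictor:
    def __init__(self, config):
        self.config = config
        self._local = threading.local()  # holds per-thread state

    def make_session(self):
        # stand-in for real thread-bound setup (e.g. creating a TF session)
        return {"owner_thread": threading.get_ident()}

    def _session(self):
        # lazily create the resource once per thread
        if not hasattr(self._local, "session"):
            self._local.session = self.make_session()
        return self._local.session

    def predict(self, payload):
        session = self._session()
        return {"thread": session["owner_thread"], "payload": payload}

predictor = Predictor({"example": True})
result = predictor.predict("x")
```

The duplication trade-off is the same as with per-thread `__init__()`: each thread still ends up with its own copy of whatever the session holds, but the setup cost is deferred to first use rather than paid upfront in every thread.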

@RobertLucian (Member) left a comment

LGTM!

@deliahu deliahu merged commit 7a4ac70 into master Jun 24, 2020
@deliahu deliahu deleted the thread-pool branch June 24, 2020 03:52
RobertLucian added a commit that referenced this pull request Jun 24, 2020