Latency high after loading a new model. #385

Closed
xiaop1987 opened this issue Mar 30, 2017 · 8 comments
Labels
type:performance Performance Issue

Comments

@xiaop1987

I'm using TensorFlow Serving to serve a wide-and-deep model as an online prediction service, and the model is updated every 10 minutes. We found that the latency of the first few requests is high right after a new model is loaded. Is this a known issue, or is there any suggestion for how to track down this problem?

@chrisolston
Contributor

Some TensorFlow graphs perform lazy initialization, which makes the first request (or first few requests) to a newly-loaded model slow. The best way to handle that is to add initialization or dummy "warm-up requests" to the init op, which tf-serving calls while loading the model.
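
Here is a minimal sketch of that approach using the TF 1.x SavedModelBuilder API; the toy one-layer model, tensor names, feature size, and export path below are placeholders, not taken from this thread. Grouping a dummy forward pass into the main_op makes tf-serving run that work once at load time, before real traffic arrives.

```python
import tensorflow as tf

export_dir = "/tmp/warmup_model/1"   # placeholder; must not already exist
num_features = 10                    # placeholder feature size

with tf.Graph().as_default() as graph, tf.Session(graph=graph) as sess:
    # Toy serving graph: one dense layer over a float placeholder.
    x = tf.placeholder(tf.float32, [None, num_features], name="x")
    w = tf.get_variable("w", [num_features, 1])
    y = tf.matmul(x, w, name="y")

    # Warm-up path: push a constant dummy input through the same weights, and
    # group it into the main op so the work happens when the model is loaded.
    dummy = tf.zeros([1, num_features])
    warmup = tf.matmul(dummy, w)
    main_op = tf.group(tf.tables_initializer(), warmup)

    sess.run(tf.global_variables_initializer())

    signature = tf.saved_model.signature_def_utils.predict_signature_def(
        inputs={"x": x}, outputs={"y": y})
    builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
    builder.add_meta_graph_and_variables(
        sess, [tf.saved_model.tag_constants.SERVING],
        signature_def_map={"serving_default": signature},
        main_op=main_op)
    builder.save()
```

On older TF releases the same hook is exposed as legacy_init_op; either way the cost of lazy initialization is paid during load rather than on the first live request.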

@xiaop1987
Author

xiaop1987 commented Apr 3, 2017

@chrisolston Thanks very much for your explanation and suggestion; the problem is clear to me now.
Here are some suggestions for tf-serving model loading.
a) tf-serving could add a warm-up option:
1. Store a request for each model when the first request arrives.
2. When a new version of the model is loaded, do not mark it ready until it has been warmed up with the stored request.

b) Add staggered (lazy) model loading:
1. We may start hundreds of tf-serving processes, and they all begin loading and updating the new model version at almost the same time. This can make the cluster's network and disks quite busy (the model is stored on HDFS) and make the cluster unstable.
2. Instead, each process could load/update the model at a random time within a specified period to smooth out the network and disk load.

@chrisolston
Contributor

For (a), the recommended approach is to do it within the tf graph, triggered from tf-serving calling the init op during load.

For (b), interesting idea. I would expect various I/O queues to smooth it out anyway, but maybe you are hitting timeouts? You could write a custom SourceAdapter that acts as the identity function but adds a random delay -- that would do the trick. Feel free to contribute the SourceAdapter via a PR.

@eldonaldo

Hi, I have the exact same problem. However, I do not understand how one can add initialization or dummy "warm-up requests" to the init op (I used Keras for training and the SavedModelBuilder for exporting the model). Can you please explain it in more detail, e.g. with a code example?

Thanks!

@eldonaldo

Ping @chrisolston

@ydp
Contributor

ydp commented May 28, 2018

same problem

@weberxie

Hi @chrisolston, I have the same problem. Can you provide an example of how to call the init op during load?

@tianyapiaozi
Contributor

Hi @chrisolston, the current version of TF Serving tries to load warm-up requests from the tf_serving_warmup_requests file. Does TensorFlow provide a common API for exporting requests to that location, or should we write the requests there manually?
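
For anyone looking for the concrete shape of that file, here is a minimal sketch of writing it manually; the model name "widendeep", signature name, input key "x", feature size, and paths are placeholders, not from this thread. The records are PredictionLog protos written as a TFRecord file named tf_serving_warmup_requests under assets.extra/ inside the SavedModel version directory, and tf-serving replays them after loading the model, before marking the new version ready.

```python
import os

import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_log_pb2

export_dir = "/tmp/warmup_model/1"                       # SavedModel version directory
warmup_dir = os.path.join(export_dir, "assets.extra")
os.makedirs(warmup_dir, exist_ok=True)

# Build one representative request; in practice, record a real production
# request for the model being exported.
request = predict_pb2.PredictRequest()
request.model_spec.name = "widendeep"
request.model_spec.signature_name = "serving_default"
request.inputs["x"].CopyFrom(
    tf.make_tensor_proto([[0.0] * 10], dtype=tf.float32))

# Each record in the file is a PredictionLog wrapping one warm-up request.
warmup_path = os.path.join(warmup_dir, "tf_serving_warmup_requests")
with tf.io.TFRecordWriter(warmup_path) as writer:
    log = prediction_log_pb2.PredictionLog(
        predict_log=prediction_log_pb2.PredictLog(request=request))
    writer.write(log.SerializeToString())
```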
