Benchmarking Prediction Speed #126
Do you have a dataset in mind for the benchmark?
Yes, that would be perfect! Ideally, it would exclude loading and setting up the model (something that the tf implementation literally does not allow for :P)
Hi Jade, I did some benchmarking on a V100 GPU. You can check the script I used on the
I will take a look at an older K80 (without fp16 support) when I have time.
This is fantastic! Thank you so so so so much! If you get a chance to do the K80, that would be brilliant. I'll try to run it when I get time. Currently doing a cost versus speed comparison just to get a feel.
You can run it like this for FP32 prediction:

python run_squad.py \
--bert_model bert-base-uncased \
--do_predict \
--do_lower_case \
--train_file $SQUAD_DIR/train-v1.1.json \
--predict_file $SQUAD_DIR/dev-v1.1.json \
--predict_batch_size 128 \
--learning_rate 3e-5 \
--num_train_epochs 2.0 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir /tmp/debug_squad/

And like this for FP16 prediction:

python run_squad.py \
--bert_model bert-base-uncased \
--do_predict \
--predict_fp16 \
--do_lower_case \
--train_file $SQUAD_DIR/train-v1.1.json \
--predict_file $SQUAD_DIR/dev-v1.1.json \
--predict_batch_size 128 \
--learning_rate 3e-5 \
--num_train_epochs 2.0 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir /tmp/debug_squad/

Adjust the arguments to match your setup.
Fantastic. Tomorrow I'm going to run it for some smaller max sequence lengths (useful for my use case) and on some other GPUs: the Tesla M60 and then the K80.
Managed to replicate your results on the V100. :) Also, I've done the experiments below for sequences of length 64 on different GPUs. Will do the other sequence lengths when I get a chance.
(results table for sequence length 64 across the different GPUs)
@thomwolf @jaderabbit Thank you for the experiments. I think these results deserve more visibility, maybe a dedicated markdown page or a section in the
You are right, Gregory.
I'm more or less new to sphinx, but I would be happy to work on it with you.
Sure, if you want to help, that could definitely speed up the process. The first step would be to create a new branch to work on. Good introductions to Sphinx and Read the Docs are here: http://www.ericholscher.com/blog/2016/jul/1/sphinx-and-rtd-for-writers/ We will need to add some dependencies for the documentation build, but we should strive to keep it as light as possible.
Hi @thomwolf, my "test.json" has one context and one question on it. Please help me with this. I switched to using the PyTorch implementation hoping that saving a model and making predictions with the saved model would be easier in PyTorch.
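For anyone with the same question, here is a minimal sketch (in Python, with made-up placeholder values rather than the poster's actual data) of a single-context, single-question file in the SQuAD v1.1 layout, which is the nesting that run_squad.py reads from --predict_file:

import json

# Minimal SQuAD v1.1-style prediction file: data -> paragraphs -> qas.
# All values below are illustrative placeholders.
example = {
    "version": "1.1",
    "data": [{
        "title": "example",
        "paragraphs": [{
            "context": "BERT was released by Google AI Language in 2018.",
            "qas": [{
                "id": "q1",
                "question": "Who released BERT?",
                "answers": []  # not used when running prediction only
            }]
        }]
    }]
}

with open("test.json", "w") as f:
    json.dump(example, f, indent=2)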
@apurvaasf Might be worth opening another ticket since that's slightly different to this. It shouldn't be too hard to write your own code for deployment. The trick is to make sure it does all the loading once, and just calls predict each time you need a prediction.
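As a rough illustration of that load-once, predict-many pattern, here is a minimal sketch against the pytorch_pretrained_bert API; the model name is a placeholder (in practice you would point from_pretrained at the fine-tuned SQuAD checkpoint directory saved by run_squad.py), and the greedy argmax span selection is a simplification of the script's real n-best decoding:

import torch
from pytorch_pretrained_bert import BertTokenizer, BertForQuestionAnswering

# Do all the expensive work once, at startup.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", do_lower_case=True)
model = BertForQuestionAnswering.from_pretrained("bert-base-uncased")  # placeholder: use your fine-tuned checkpoint dir
model.to(device)
model.eval()

def predict(question, context, max_seq_length=384):
    # Build the [CLS] question [SEP] context [SEP] input that BERT expects.
    q_tokens = tokenizer.tokenize(question)
    c_tokens = tokenizer.tokenize(context)
    tokens = ["[CLS]"] + q_tokens + ["[SEP]"] + c_tokens + ["[SEP]"]
    segment_ids = [0] * (len(q_tokens) + 2) + [1] * (len(c_tokens) + 1)
    tokens, segment_ids = tokens[:max_seq_length], segment_ids[:max_seq_length]

    input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)], device=device)
    segment_ids = torch.tensor([segment_ids], device=device)

    # Only the forward pass runs per request; no model loading here.
    with torch.no_grad():
        start_logits, end_logits = model(input_ids, segment_ids)
    start, end = int(start_logits.argmax()), int(end_logits.argmax())
    return " ".join(tokens[start:end + 1])

print(predict("Who released BERT?", "BERT was released by Google AI Language in 2018."))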
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hi @thomwolf and thanks for the amazing implementation. I wonder what the inference speed is with a batch size of 512. It seems to take a lot of time to move each batch to the GPU (about 1000 ms for a batch size of 32) and I wonder if there is any quick speedup/fix. I am concerned with latency rather than throughput.
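Not a fix specific to this repo, but a sketch of two general PyTorch techniques that usually help with that copy latency: pinned (page-locked) host memory with non-blocking transfers, and timing GPU work with CUDA events plus an explicit synchronize so the measurement reflects execution rather than queueing:

import torch

device = torch.device("cuda")

# Fake batch of token ids, shaped like a batch-32, seq-len-384 input.
batch = torch.randint(0, 30000, (32, 384), dtype=torch.long)

# Pinned host memory lets the host-to-device copy run asynchronously.
batch = batch.pin_memory()
batch_gpu = batch.to(device, non_blocking=True)

# Time GPU work with CUDA events rather than time.time(), and synchronize
# before reading the result so queued-but-unfinished kernels are counted.
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
# ... the model forward pass on batch_gpu would go here ...
end.record()
torch.cuda.synchronize()
print("elapsed ms:", start.elapsed_time(end))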
Have you found any solutions? I've run into the same problem.
See pytorch/pytorch#35292.
In reference to the following tweet:
Would it be possible to do a benchmark on the speed of prediction? I was working with the TensorFlow version of BERT, but it uses the new Estimators and I'm struggling to find a straightforward way to benchmark it, since everything gets hidden in layers of computation graph. I'd imagine PyTorch is more forgiving in this regard.
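For reference, a minimal sketch of the kind of measurement being asked for: load the model once outside the timed region, run a few warm-up passes, then average the forward-pass time over many runs (the model name, batch size, and sequence length below are placeholders):

import time
import torch
from pytorch_pretrained_bert import BertModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Setup is excluded from the timing: load weights once and move them to the device.
model = BertModel.from_pretrained("bert-base-uncased").to(device).eval()
batch = torch.randint(0, 30000, (32, 384), dtype=torch.long, device=device)

n_runs = 50
with torch.no_grad():
    # Warm-up passes so one-time CUDA initialisation does not skew the numbers.
    for _ in range(5):
        model(batch)
    if device.type == "cuda":
        torch.cuda.synchronize()

    start = time.time()
    for _ in range(n_runs):
        model(batch)
    if device.type == "cuda":
        torch.cuda.synchronize()
    elapsed = time.time() - start

print("average forward pass: %.1f ms" % (elapsed * 1000 / n_runs))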