Question about Latency Limit #19

Closed
zhipengChen opened this issue Jul 24, 2019 · 22 comments

Comments

@zhipengChen

zhipengChen commented Jul 24, 2019

Dear MRQA group,

We tested our model (a single model) on the out-of-domain dataset (the official data on Codalab) using the official predict_server.py on Codalab with one GPU (Tesla K80), and got the results we expected. However, the run took 3 hours (about 1.12 s per question). I'd like to confirm that our model meets your latency limit.

Best,
Zhipeng
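
As a quick sanity check on the figures above, per-question latency is total wall-clock time divided by the number of questions. The sketch below is illustrative only: `per_question_latency` is a hypothetical helper, and the question count is inferred from the two reported numbers (3 hours, about 1.12 s per question) rather than taken from the dataset.

```python
def per_question_latency(total_seconds: float, num_questions: int) -> float:
    """Average wall-clock seconds spent per question."""
    if num_questions <= 0:
        raise ValueError("num_questions must be positive")
    return total_seconds / num_questions


total_seconds = 3 * 60 * 60   # 3 hours, as reported in the post
reported_latency = 1.12       # seconds per question, as reported

# Question count implied by the two reported figures. This is an estimate,
# not an official dataset size.
implied_questions = round(total_seconds / reported_latency)

print(implied_questions)
print(round(per_question_latency(total_seconds, implied_questions), 2))
```

Measuring the same ratio on the evaluation hardware would give the number that actually matters for the latency limit, since the K80 timing is only an upper bound if the test machines are faster.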

@robinjia
Member

Hi Zhipeng,

That should be fine, because we will test on better hardware. If you want to double check, you can submit this model now and I can test the latency on my end. This doesn't have to be your final submission, but you should follow the submission instructions so that it is easy for me to run it.

@zhipengChen
Author

OK, thank you so much. Our bundle ID is 0xbbe65de9855b4a058e1d333f28c46dad (for now it can only be read by the MRQA group).

@zhipengChen
Author

The bundle for run-predictions/predictions.json is predictions-LatencyTest (bundle ID 0xefa76771a566486096af8f39c241793e).

@zhipengChen
Author

Hello Robin,
Have you tested my submission's latency on your end?

@robinjia
Member

Hi Zhipeng,

I encountered the following error when running your code:

  Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fecafcecda0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /simple/werkzeug/
  Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fecafcece48>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /simple/werkzeug/
  Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fecafcecc88>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /simple/werkzeug/
  Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fecafcecf98>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /simple/werkzeug/
  Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fecafcec7b8>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /simple/werkzeug/
  Could not find a version that satisfies the requirement Werkzeug>=0.7 (from Flask==0.12.1) (from versions: )
No matching distribution found for Werkzeug>=0.7 (from Flask==0.12.1)
WARNING: Logging before flag parsing goes to stderr.
W0726 19:23:41.477045 140036318504768 deprecation_wrapper.py:119] From /0x05a84515b6bf45f088b35976d5fdc2a0_dependencies/src-sub_v6/model_utils.py:295: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

Traceback (most recent call last):
  File "src-sub_v6/run_mrqa_sub1.py", line 1338, in <module>
    import flask
ModuleNotFoundError: No module named 'flask'

It seems that you are trying to install flask but it fails. Note that when we run the submissions, we run them in a container that does not have network access. I recommend that you have everything installed on the docker image you are using, so that you don't have to install as part of your code.
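
One way to catch this kind of failure before the model starts is to check, at the top of the entry script, that every required module is already importable in the image. This is a minimal sketch, not part of the official MRQA tooling: `missing_modules` and the `REQUIRED` list are hypothetical names, and the list should be replaced with whatever the submission actually imports.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list of modules the submission needs; adjust to the real ones.
REQUIRED = ["flask", "werkzeug"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing from the image:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Running a check like this inside the same Docker image, without network access, shows whether any install step is still needed at run time.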

@zhipengChen
Author

zhipengChen commented Jul 26, 2019

Hello Robin,
Thank you for testing our model.
We didn't use the network when we installed Flask. The command we used to run our model is below.
cl run :src-sub_v6 :tools :predict_server.py data_dir:mrqa-dev-data allennlp:src-sub_v6/allennlp mrqa_model:saved_model1/1563934954 :run_mrqa.sh 'sh run_mrqa.sh & pip3 install tools/overrides-1.9.tar.gz; python3 predict_server.py <(cat data_dir/*.jsonl) predictions.json 8888' --request-docker-image kevin898y/tensorflow_py36 --request-memory 12g --request-gpus 1

All the packages needed to install Flask are already downloaded in the Docker image kevin898y/tensorflow_py36.
[image: runtime environment on Codalab]
This is our runtime environment on Codalab. Did you use the Docker image kevin898y/tensorflow_py36?

@robinjia
Member

I see. Yes, I'm using the same Docker image. This is very strange; let me try to get some help from the Codalab people to understand what happened.

Comparing your bundle with my run, the difference is that for yours:

Requirement already satisfied: Werkzeug>=0.7 in /usr/local/lib/python3.6/dist-packages (from Flask==0.12.1)

Whereas in mine it says it's not satisfied, tries to download it, and fails.
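
The difference between the two runs can be spotted mechanically by scanning the captured pip output for the connection errors shown in the log above. The following is a rough sketch under that assumption: `pip_tried_network` is a hypothetical helper, and the marker strings are copied from the error log in this thread, so other pip versions may word them differently.

```python
# Substrings taken from the pip/urllib3 error lines quoted in this thread.
NETWORK_MARKERS = (
    "Failed to establish a new connection",
    "Temporary failure in name resolution",
)


def pip_tried_network(log_text):
    """Return True if the log contains signs that pip attempted a network fetch."""
    return any(
        marker in line
        for line in log_text.splitlines()
        for marker in NETWORK_MARKERS
    )


ok_log = "Requirement already satisfied: Werkzeug>=0.7 in /usr/local/lib/python3.6/dist-packages (from Flask==0.12.1)"
bad_log = "Failed to establish a new connection: [Errno -3] Temporary failure in name resolution"

print(pip_tried_network(ok_log))
print(pip_tried_network(bad_log))
```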

@zhipengChen
Author

Thanks.
If you still can't run it, I can download Werkzeug into my directory and install it.

@zhipengChen
Author

Hello Robin,
Can you run our model on your end now?

@robinjia
Member

I am still debugging this. Just to help me out: is the docker image kevin898y/tensorflow_py36 something you created? Have you changed the docker image recently?

@robinjia
Member

Hi Zhipeng,

Could you try submitting another version that uses the --no-deps flag everywhere you pip install things (especially inside run_mrqa.sh)? The required dependencies already seem to be in the Docker image, but pip tries to reinstall them anyway. That makes it try to access the network, which causes the bundle to fail.
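
For anyone driving the install from Python rather than from the shell script, the suggestion above can be sketched as follows. This is an illustration, not the submission's actual code: `pip_install_cmd` and `install_local` are hypothetical helpers, and the archive path is the one from the run command quoted earlier in the thread.

```python
import subprocess
import sys


def pip_install_cmd(package_path, no_deps=True):
    """Build a pip command that installs a local archive, optionally skipping dependencies."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if no_deps:
        # --no-deps tells pip not to install the package's dependencies.
        cmd.append("--no-deps")
    cmd.append(package_path)
    return cmd


def install_local(package_path):
    """Run the install; raises CalledProcessError if pip exits with a non-zero status."""
    subprocess.run(pip_install_cmd(package_path), check=True)


# Archive path taken from the run command quoted earlier in the thread.
print(" ".join(pip_install_cmd("tools/overrides-1.9.tar.gz")))
```

The shell equivalent is simply `pip3 install --no-deps tools/overrides-1.9.tar.gz`. With this flag pip installs only the named archive, so it relies on the dependencies already being present in the Docker image.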

@robinjia
Member

Actually, we think we have found the issue inside codalab that was causing the problem. Once it is fixed I will try again and hopefully it will work. So you do not need to take further action at this time.

@zhipengChen
Author

Ok, thank you.

@zhipengChen
Author

Hello Robin,
If you still can't run our model on your end, you can run our final submission on the official Codalab. It should work there.

@robinjia
Member

Hi Zhipeng,

Yes, in the worst case we can just run on the public Codalab instances. If it's not too much trouble, could you try submitting another bundle with the --no-deps flag everywhere, as I suggested? I want to see whether this workaround succeeds, in case the Codalab issue does not get fixed in time. (The issue is that the running job does not have root on the non-public Codalab workers, which usually doesn't matter but predictably causes some installation-related things to behave differently.)

@zhipengChen
Author

Ok. I will submit another bundle right now.

@zhipengChen
Author

Hello Robin,
I have uploaded a new script, and it is running now.
The bundle id is 0x9650972932c742b19f3be50de3de54f0 (run-predictions).

@robinjia
Member

I am trying it now, and so far it looks like it works! I will keep you updated. Please still finish the normal submission procedure.

@zhipengChen
Author

OK, thank you for the reminder. Our final submission is already prepared, but I didn't add --no-deps to the script. Should I upload a new one that uses --no-deps for every pip install?

@robinjia
Member

Yes, that would be great if you can have your final submission use --no-deps.

@zhipengChen
Author

OK. I'm uploading a new one now.

@zhipengChen
Author

Hello Robin,
We have submitted our final system with '--no-deps' added to every 'pip install' (the bundle ID is 0x75a635dc423f495f8016f1eb5b02bbb2; we also sent it to your group by email). If any problem occurs when you run it on the test set, you can send me a message here or by email. Thanks.
