Error in preprocessing #15
Comments
Thanks for noticing this and sorry for the late response. It seems that there is a bug in the new version of the dataset: Let's wait until they fix it. In the meantime, you can download and use the preprocessed TF Records from here: |
Thanks for sharing the TFRecord files. I've successfully run the model in your shared Colab. However, training frequently stops with the following error log:
INFO:tensorflow:An error was raised. This may be due to a preemption in a connected worker or parameter server. The current session will be closed and a new session will be created. This error may also occur due to a gRPC failure caused by high memory or network bandwidth usage in the parameter servers. If this error occurs repeatedly, try increasing the number of parameter servers assigned to the job. Error: Socket closed
So I have to reload the checkpoint file and restart training. Do you have any ideas? |
That happens to me as well, but it is not frequent, approximately once every 50 hours of training. To fix it, I click "Reset all runtimes" and then "Run all". Training automatically reloads the last checkpoint, so it doesn't have to start from scratch every time the error occurs. |
Aha, it happens much more frequently for me, approximately every 2 hours. I close the Colab page once training starts. Do you keep the page open? |
I keep the Colab page open all the time. |
@rodrigonogueira4 could you upload the MS MARCO dataset that was used to generate the provided TFRecords (or at least the top1000.eval.tsv file)? |
Thanks for your patience. I'm working on that: spacemanidol/MSMARCO#31 (comment) |
Hi, I've downloaded the dataset and tried to run the "convert_msmarco_to_tfrecord.py" script. The following error occurred when some lines were read:
Traceback (most recent call last):
  File "convert_msmarco_to_tfrecord.py", line 217, in <module>
    main()
  File "convert_msmarco_to_tfrecord.py", line 211, in main
    convert_train_dataset(tokenizer=tokenizer)
  File "convert_msmarco_to_tfrecord.py", line 191, in convert_train_dataset
    query, positive_doc, negative_doc = line.rstrip().split('\t')
ValueError: need more than 1 value to unpack
It seems that some lines have fewer than 3 fields after being split by "\t". There are 2579 such lines. Although I can skip those lines, it would be good to confirm with others that this problem actually occurs.
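For anyone who wants to skip the malformed lines as described, here is a minimal sketch of one way to do it. It is not the code from convert_msmarco_to_tfrecord.py: the helper names `parse_triple` and `read_triples` and the sample data are hypothetical, and the only assumption about the file is that each valid line holds exactly 3 tab-separated fields (query, positive passage, negative passage).

```python
def parse_triple(line):
    """Return (query, positive_doc, negative_doc), or None if the line is malformed.

    Hypothetical helper, not part of the original script.
    """
    # Strip only line endings so that tabs inside the line are left untouched.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 3:
        return None
    return tuple(fields)


def read_triples(lines):
    """Collect valid triples and the 1-based numbers of the lines that were skipped."""
    triples, skipped = [], []
    for line_number, line in enumerate(lines, start=1):
        triple = parse_triple(line)
        if triple is None:
            skipped.append(line_number)
        else:
            triples.append(triple)
    return triples, skipped


# Made-up sample data: the second line has no tab separators.
sample = ["q1\tpos1\tneg1\n", "broken line\n", "q2\tpos2\tneg2\n"]
triples, skipped = read_triples(sample)
print(len(triples), "valid triples; skipped line numbers:", skipped)
```

Recording the skipped line numbers instead of silently dropping the lines makes it easier to compare counts with other people who hit the same error.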