Error in preprocessing #15

canjiali · 2019-05-25T11:54:12Z

Hi, I've downloaded the dataset and tried to run the "convert_msmarco_to_tfrecord.py" script. The following error occurred when some lines were read:

Traceback (most recent call last):
  File "convert_msmarco_to_tfrecord.py", line 217, in <module>
    main()
  File "convert_msmarco_to_tfrecord.py", line 211, in main
    convert_train_dataset(tokenizer=tokenizer)
  File "convert_msmarco_to_tfrecord.py", line 191, in convert_train_dataset
    query, positive_doc, negative_doc = line.rstrip().split('\t')
ValueError: need more than 1 value to unpack

It seems that some lines yield fewer than 3 fields after being split on "\t". There are 2579 such lines. Although I can skip them, it would be better to confirm with others that this problem actually occurs.
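Skipping those lines can be done by checking the field count before unpacking. The snippet below is a minimal sketch, not the repository's code: `read_triples` is a hypothetical helper that mirrors the `line.rstrip().split('\t')` call from the traceback but counts and skips malformed lines instead of raising `ValueError`.

```python
def read_triples(lines):
    """Parse tab-separated (query, positive_doc, negative_doc) lines.

    Hypothetical helper: lines that do not split into exactly three
    fields are skipped and counted rather than raising ValueError.
    """
    triples = []
    skipped = 0
    for line in lines:
        # Same splitting as the original script: strip trailing whitespace, split on tabs.
        fields = line.rstrip().split('\t')
        if len(fields) != 3:
            skipped += 1
            continue
        query, positive_doc, negative_doc = fields
        triples.append((query, positive_doc, negative_doc))
    return triples, skipped


sample = ["q1\tpos\tneg\n", "broken line without tabs\n", "q2\tpos2\tneg2\n"]
triples, skipped = read_triples(sample)
print(len(triples), skipped)  # 2 1
```

Reporting the skipped count (2579 in the case above) makes it easy to confirm whether a fixed version of the dataset still contains malformed lines.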

rodrigonogueira4 · 2019-06-07T01:41:02Z

Thanks for noticing this and sorry for the late response.

It seems that there is a bug in the new version of the dataset:
spacemanidol/MSMARCO#31

Let's wait until they fix it. In the meantime, you can download and use the preprocessed TF Records from here:
https://drive.google.com/open?id=1IHFMLOMf2WqeQ0TuZx_j3_sf1Z0fc2-6

canjiali · 2019-06-08T15:41:29Z

Thanks for sharing the TFRecord files. I've successfully run the model in your shared Colab. However, a problem occurs frequently: the notebook stops training with the following error log:

INFO:tensorflow:An error was raised. This may be due to a preemption in a connected worker or parameter server. The current session will be closed and a new session will be created. This error may also occur due to a gRPC failure caused by high memory or network bandwidth usage in the parameter servers. If this error occurs repeatedly, try increasing the number of parameter servers assigned to the job. Error: Socket closed
INFO:tensorflow:Error recorded from infeed: Socket closed

So I have to reload the checkpoint and restart training. Do you have any ideas?

rodrigonogueira4 · 2019-06-08T16:17:01Z

That happens to me as well, but it is not frequent, approximately once every 50 hours of training.

To fix it, I click on "Reset all runtimes" and then "Run all". Training automatically reloads the last checkpoint, so it doesn't have to start from scratch every time the error occurs.
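Resuming works because the latest checkpoint in the model directory is restored when training starts again. In TensorFlow this is tracked by a `checkpoint` state file, and `tf.train.latest_checkpoint(model_dir)` returns it. The sketch below only illustrates the idea, assuming TensorFlow's default `model.ckpt-<global_step>.index` file naming; `find_latest_checkpoint` is a hypothetical helper, not part of the repository or of TensorFlow.

```python
import os
import re
import tempfile

# Assumed naming: "<prefix>-<global_step>.index", e.g. "model.ckpt-1000.index".
_CKPT_INDEX = re.compile(r"^(?P<prefix>.+)-(?P<step>\d+)\.index$")


def find_latest_checkpoint(model_dir):
    """Return the checkpoint prefix with the highest global step, or None."""
    best_step, best_prefix = -1, None
    for name in os.listdir(model_dir):
        match = _CKPT_INDEX.match(name)
        if match and int(match.group("step")) > best_step:
            best_step = int(match.group("step"))
            # Drop the ".index" suffix to get the checkpoint prefix.
            best_prefix = os.path.join(model_dir, name[: -len(".index")])
    return best_prefix


# Usage: create empty placeholder checkpoint files and pick the newest one.
with tempfile.TemporaryDirectory() as d:
    for step in (100, 2000, 300):
        open(os.path.join(d, "model.ckpt-%d.index" % step), "w").close()
    latest = find_latest_checkpoint(d)
    print(os.path.basename(latest))  # model.ckpt-2000
```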

canjiali · 2019-06-09T08:48:13Z

Aha, for me it happens much more frequently, approximately every 2 hours. I close the Colab page once training starts. Do you keep the page open?

rodrigonogueira4 · 2019-06-09T13:23:53Z

I keep the Colab page open all the time.

MateRyze · 2019-07-13T15:09:31Z

@rodrigonogueira4 could you upload the MS MARCO dataset that was used to generate the provided TFRecords (or at least the top1000.eval.tsv file)?
http://www.msmarco.org only provides the newest version of the dataset (v2.1).
Thank you in advance!

rodrigonogueira4 · 2019-07-18T15:25:23Z

Thanks for your patience. I'm working on that: spacemanidol/MSMARCO#31 (comment)
