[rag] missing a working End-to-end evaluation example #8284

stas00 · 2020-11-04T06:15:03Z

I'm going to try to write tests for examples/rag (#7715), but first I'm trying to figure out how it works.

Would it be possible to add a full End-to-end evaluation invocation example in https://github.com/huggingface/transformers/blob/master/examples/rag/README.md#end-to-end-evaluation? i.e. with the correct data.

I tested https://github.com/huggingface/transformers/blob/master/examples/rag/README.md#retrieval-evaluation and it worked, but if I try to adapt the same params for e2e it crashes with:

$ python eval_rag.py --model_name_or_path facebook/rag-sequence-nq --model_type rag_sequence \
--evaluation_set output/biencoder-nq-dev.questions --gold_data_path output/biencoder-nq-dev.pages \
--predictions_path output/retrieval_preds.tsv --eval_mode e2e --gold_data_mode qa --n_docs 5 \
--print_predictions
2020-11-03 22:07:33.124277: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
INFO:__main__:Evaluate the following checkpoints: ['facebook/rag-sequence-nq']
INFO:__main__:Calculating metrics based on an existing predictions file: output/retrieval_preds.tsv
Traceback (most recent call last):
  File "eval_rag.py", line 314, in <module>
    main(args)
  File "eval_rag.py", line 280, in main
    score_fn(args, args.predictions_path, args.gold_data_path)
  File "eval_rag.py", line 46, in get_scores
    data = pd.read_csv(gold_data_path, sep="\t", header=None)
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pandas/io/parsers.py", line 686, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pandas/io/parsers.py", line 458, in _read
    data = parser.read(nrows)
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pandas/io/parsers.py", line 1196, in read
    ret = self._engine.read(nrows)
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/pandas/io/parsers.py", line 2155, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 862, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 918, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 905, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2042, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 5 fields in line 2, saw 6

I think it needs different input data.

We also need two functional examples, one each for the qa and ans modes.
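For context on the error itself: the ParserError says the parser settled on 5 tab-separated fields from the first line and then met 6 on line 2, so the file passed as gold data is not a rectangular table. Below is a minimal sketch for locating the first such line before handing a file to `pd.read_csv(..., sep="\t", header=None)`. The helper name `first_mismatch` and the sample rows are hypothetical and not part of eval_rag.py, and the check assumes plain tab-separated text with no quoting.

```python
import io


def first_mismatch(lines, sep="\t"):
    """Return (line_number, expected_fields, seen_fields) for the first line
    whose field count differs from the first non-blank line, or None if all
    non-blank lines have the same number of fields.

    Hypothetical helper for illustration; it only approximates the parser's
    behavior (blank lines are skipped, quoting is ignored).
    """
    expected = None
    for lineno, line in enumerate(lines, start=1):
        row = line.rstrip("\r\n")
        if row == "":
            continue  # skip blank lines
        seen = len(row.split(sep))
        if expected is None:
            expected = seen  # the first non-blank line sets the width
        elif seen != expected:
            return lineno, expected, seen
    return None


# Made-up sample: 5 fields on line 1, 6 fields on line 2.
ragged = io.StringIO("t1\tt2\tt3\tt4\tt5\n" "t1\tt2\tt3\tt4\tt5\tt6\n")
result = first_mismatch(ragged)
print(result)
```

To check a real file, pass an open file handle instead of the `io.StringIO` sample, for example `first_mismatch(open(path, encoding="utf-8"))`.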

I can handle adding this to the doc if you tell me what to add.

Thanks.

@patrickvonplaten, @lhoestq

shamanez · 2020-11-11T22:59:12Z

@stas00

Can you please write test code for finetune.sh?

stas00 · 2020-11-11T23:09:15Z

As you can see, I'm waiting for this ticket to be addressed before I can write the tests.

Perhaps you can address that, and then I will have all the info needed to write the tests.

stas00 · 2020-11-11T23:10:13Z

Until then, please file a normal issue about it. I haven't done any RAG work yet, which is why I'm asking for support.

patrickvonplaten · 2020-11-13T13:50:23Z

@lhoestq is working on this at the moment :-)

lhoestq · 2020-11-13T14:48:44Z

Actually I'm working on the finetuning script example, not eval ;)
But maybe this can help with adding a test for the eval script example.

stale · 2021-01-16T12:02:48Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stas00 · 2021-03-18T01:40:49Z

stale

patrickvonplaten assigned lhoestq and patrickvonplaten Nov 5, 2020

stas00 mentioned this issue Jan 11, 2021: examples/rag: test coverage, tiny model #7715 (Open)

stale bot added the wontfix label Jan 16, 2021

stas00 added the Feature request label Jan 16, 2021

stale bot removed the wontfix label Jan 16, 2021

stas00 closed this as completed Mar 18, 2021
