This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Fix valid data file location issue with Natural Questions dataset #3438

Merged 3 commits on Feb 5, 2021

moyapchen
Happened to run

```
parlai display_data -t natural_questions -dt valid:stream
```

and hit an error because it couldn't find the data files. Digging into the build script and the output directory at `~/ParlAI/data/NaturalQuestions/valid`, I noticed the files were being stored as `nq-dev-00.jsonl` rather than `nq-valid-00.jsonl`, the name the agent expects. This is a quick fix that moves the files over.
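
For readers who want to picture the move, here is a minimal sketch of that kind of rename. It is not the code from this PR's build script: the function name `rename_dev_to_valid` is hypothetical, and the `nq-dev-*.jsonl` glob assumes every shard follows the naming of the one file mentioned above.

```python
import shutil
from pathlib import Path


def rename_dev_to_valid(valid_dir):
    """Rename nq-dev-*.jsonl shards in valid_dir to nq-valid-*.jsonl.

    Hypothetical helper for illustration only; the shard naming pattern
    is an assumption based on the single filename quoted in the PR.
    """
    moved = []
    for src in sorted(Path(valid_dir).glob("nq-dev-*.jsonl")):
        # Swap only the leading "nq-dev-" prefix, keeping the shard suffix.
        dst = src.with_name(src.name.replace("nq-dev-", "nq-valid-", 1))
        shutil.move(str(src), str(dst))
        moved.append(dst.name)
    return moved
```

Calling it on the expanded `~/ParlAI/data/NaturalQuestions/valid` path would return the list of new file names; files that already carry the `nq-valid-` prefix are left untouched.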

Test Plan:
Ran the build script with the steps that redownload and untar all the data commented out. Verified that

```
parlai display_data -t natural_questions -dt valid:stream
```

was indeed successful.
@mojtaba-komeili mojtaba-komeili left a comment

I remember there were issues with that naming before, but I missed this one.
Thanks for finding and fixing it 🆒

@moyapchen moyapchen merged commit 7f03a61 into master Feb 5, 2021
@moyapchen moyapchen deleted the natural_questions_valid_fix branch February 5, 2021 02:01
stephenroller pushed a commit that referenced this pull request Feb 11, 2021
* Fix valid set issue with Natural Questions dataset

* remove print, lol

* lint