This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Fix valid data file location issue with Natural Questions dataset (#3438)

* Fix valid set issue with Natural Questions dataset

I happened to run

```
parlai display_data -t natural_questions -dt valid:stream
```

and ran into an error saying it couldn't find the files. Digging into the build script and the output directory at `~/ParlAI/data/NaturalQuestions/valid`, I noticed the files were being stored as `nq-dev-00.jsonl` rather than `nq-valid-00.jsonl`, which is what the agent expects. This adds a quick fix that renames the files accordingly.

Test Plan:
Ran the build script with the steps that re-download and untar all the data commented out, and verified that

```
parlai display_data -t natural_questions -dt valid:stream
```

was indeed successful.
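The rename step can also be checked offline, without downloading the dataset. The sketch below is a hypothetical standalone check, not part of the commit: it copies the logic of `_move_valid_files_from_dev_to_valid` from the diff, creates empty placeholder files with Natural Questions-style names (the shard names are assumptions) in a temporary `valid/` directory, and returns the names left after renaming.

```python
import os
import tempfile


def _move_valid_files_from_dev_to_valid(dpath):
    # Same logic as the function added in this commit.
    valid_path = os.path.join(dpath, 'valid')
    for f in os.listdir(valid_path):
        if "dev" in f:
            new = f.replace('dev', 'valid')
            os.rename(os.path.join(valid_path, f), os.path.join(valid_path, new))


def check_rename():
    """Hypothetical offline check; returns the sorted file names after renaming."""
    with tempfile.TemporaryDirectory() as dpath:
        valid_path = os.path.join(dpath, 'valid')
        os.makedirs(valid_path)
        # Empty placeholders named like the untar'd shards (assumed names).
        for i in range(2):
            open(os.path.join(valid_path, f'nq-dev-{i:02d}.jsonl'), 'w').close()
        _move_valid_files_from_dev_to_valid(dpath)
        # A second call is a no-op: no remaining name contains "dev".
        _move_valid_files_from_dev_to_valid(dpath)
        return sorted(os.listdir(valid_path))


if __name__ == '__main__':
    print(check_rename())  # prints ['nq-valid-00.jsonl', 'nq-valid-01.jsonl']
```

Because renamed files no longer contain `dev`, rerunning the build step on an already-fixed directory leaves it unchanged.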

* remove print, lol

* lint
moyapchen authored Feb 5, 2021
1 parent bf91a22 commit 7f03a61
Showing 1 changed file with 16 additions and 0 deletions.
parlai/tasks/natural_questions/build.py (16 additions, 0 deletions)
@@ -79,6 +79,21 @@ def _untar_dataset_files(dpath):

```python
        _untar_dir_files(unzip_files_path)


def _move_valid_files_from_dev_to_valid(dpath):
    """
    Files from Google are stored at `nq-dev-##.jsonl.gz` and get untar'd to
    `nq-dev-##.jsonl`.
    The agent expects them to be stored at `nq-valid-00.jsonl`. This moves them over if
    need be.
    """
    valid_path = os.path.join(dpath, 'valid')
    for f in os.listdir(valid_path):
        if "dev" in f:
            new = f.replace('dev', 'valid')
            os.rename(os.path.join(valid_path, f), os.path.join(valid_path, new))


def build(opt):
    dpath = os.path.join(opt['datapath'], DATASET_NAME_LOCAL)
    version = 'v1.0'
```

@@ -92,4 +107,5 @@ def build(opt):

```python
        build_data.make_dir(dpath)
        _download_with_cloud_storage_client(dpath)
        _untar_dataset_files(dpath)
        _move_valid_files_from_dev_to_valid(dpath)
        build_data.mark_done(dpath, version_string=version)
```
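A note on the matching in `_move_valid_files_from_dev_to_valid`: `"dev" in f` tests for the substring anywhere in the file name, and `str.replace` rewrites every occurrence. That is sufficient for Google's `nq-dev-##.jsonl` shards. If a stricter rule were ever wanted, one option is to rename only files that start with the `nq-dev-` prefix, as in the hypothetical sketch below; the names `dev_to_valid_name` and `move_valid_files_by_prefix` are illustrative and not part of ParlAI.

```python
import os


def dev_to_valid_name(fname, old_prefix='nq-dev-', new_prefix='nq-valid-'):
    """Return the renamed file name, or None if `fname` lacks the dev prefix."""
    if not fname.startswith(old_prefix):
        return None
    # Only the leading prefix changes; the rest of the name is kept verbatim.
    return new_prefix + fname[len(old_prefix):]


def move_valid_files_by_prefix(dpath):
    """Hypothetical stricter variant of _move_valid_files_from_dev_to_valid."""
    valid_path = os.path.join(dpath, 'valid')
    for fname in os.listdir(valid_path):
        new = dev_to_valid_name(fname)
        if new is not None:
            os.rename(os.path.join(valid_path, fname), os.path.join(valid_path, new))
```

For example, `dev_to_valid_name('nq-dev-03.jsonl')` returns `'nq-valid-03.jsonl'`, while `dev_to_valid_name('devices.txt')` returns `None`; the substring check in the commit would instead rename that file to `validices.txt`.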
