
ERROR: setUpClass (maluuba.newsqa.tests.test_newsqa.TestNewsQa) #40

Open
oqusous opened this issue Aug 11, 2021 · 2 comments

Comments

@oqusous

oqusous commented Aug 11, 2021

After running

docker run --rm -it -v ${PWD}:/usr/src/newsqa --name newsqa maluuba/newsqa

I get:

======================================================================
ERROR: setUpClass (maluuba.newsqa.tests.test_newsqa.TestNewsQa)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/src/newsqa/maluuba/newsqa/tests/test_newsqa.py", line 36, in setUpClass
    cls.newsqa_dataset = NewsQaDataset()
  File "/usr/src/newsqa/maluuba/newsqa/data_processing.py", line 80, in __init__
    "\n See the README in the root of this repo for more details." % dataset_path)
Exception: `/usr/src/newsqa/maluuba/newsqa/newsqa-data-v1.csv` was not found.
For legal reasons, you must first accept the terms and download the dataset from https://msropendata.com/datasets/939b1042-6402-4697-9c15-7a28de7e1321
 See the README in the root of this repo for more details.

======================================================================
ERROR: setUpClass (maluuba.newsqa.tests.test_tokenize.TestNewsQaTokenize)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/src/newsqa/maluuba/newsqa/tests/test_tokenize.py", line 32, in setUpClass
    NewsQaDataset().dump(path=combined_data_path)
  File "/usr/src/newsqa/maluuba/newsqa/data_processing.py", line 80, in __init__
    "\n See the README in the root of this repo for more details." % dataset_path)
Exception: `/usr/src/newsqa/maluuba/newsqa/newsqa-data-v1.csv` was not found.
For legal reasons, you must first accept the terms and download the dataset from https://msropendata.com/datasets/939b1042-6402-4697-9c15-7a28de7e1321
 See the README in the root of this repo for more details.

In addition, newsqa-data-v1.csv gets deleted. I made sure that cnn.tgz, cnn_stories.tgz, and the CSV file were in place.
Please help.

@juharris
Member

Thanks for trying newsqa!

Starting the Docker container shouldn't delete any files. Are you sure that newsqa-data-v1.csv is in the right place (newsqa/maluuba/newsqa/)?
If it is, then maybe the -v mount parameter isn't working correctly. What terminal are you using to run the docker run command? I suggest using Git Bash.
The discussion in #39 might also help you.
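To answer the "is it in the right place?" question before starting the container, a small host-side check can help. This is a minimal sketch, assuming you run it from the repository root, i.e. the same directory that `${PWD}` expands to in the docker run command. The expected location comes from the traceback: `/usr/src/newsqa` is the mount point, so the file must sit at `maluuba/newsqa/newsqa-data-v1.csv` relative to that directory. The `check_dataset` helper name is hypothetical, not part of the newsqa code.

```python
from pathlib import Path

# Path the loader expects inside the container is
# /usr/src/newsqa/maluuba/newsqa/newsqa-data-v1.csv (from the traceback).
# With `-v ${PWD}:/usr/src/newsqa`, that maps to this path under ${PWD}.
EXPECTED_RELATIVE = Path("maluuba") / "newsqa" / "newsqa-data-v1.csv"


def check_dataset(repo_root="."):
    """Hypothetical helper: return the expected CSV path and whether it exists."""
    path = Path(repo_root) / EXPECTED_RELATIVE
    return path, path.is_file()


path, ok = check_dataset()
print(f"{path}: {'found' if ok else 'MISSING'}")
```

If this reports the file as found on the host but the container still cannot see it, the problem is more likely the volume mount (for example, how the terminal expands `${PWD}`) than the file's location.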

@Permafacture

The Docker CMD line definitely runs a command that deletes the CSV file. I doubt that's necessary.

I think the newsqa-data-v1 compressed file format has changed. First, the download is now a zip file instead of a tar.gz. Second, it extracts to a subdirectory, while the code expects the CSV to be at the same level as the compressed file.

The easiest way to get this working is to extract the CSV file yourself and remove the Docker line that deletes it on run. A longer-term solution is to fix the code to handle the new format.
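The manual extraction step can be scripted. This is a minimal Python sketch using the standard-library `zipfile` module, assuming the download is a zip that contains `newsqa-data-v1.csv` somewhere inside a subfolder. Since the subfolder's name isn't given here, the sketch searches the archive by file name rather than hard-coding a path. The destination `maluuba/newsqa` comes from the traceback; the `extract_csv_flat` name is hypothetical. It does not touch the Dockerfile, so the CMD line that deletes the CSV still has to be removed separately.

```python
import shutil
import zipfile
from pathlib import Path

CSV_NAME = "newsqa-data-v1.csv"  # file name the loader expects (from the traceback)


def extract_csv_flat(zip_path, dest_dir="maluuba/newsqa"):
    """Hypothetical helper: copy CSV_NAME out of zip_path into dest_dir,
    ignoring whatever subdirectory it sits in inside the archive."""
    dest = Path(dest_dir) / CSV_NAME
    dest.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        # Match on the base name so any internal folder layout is accepted.
        matches = [n for n in zf.namelist() if Path(n).name == CSV_NAME]
        if not matches:
            raise FileNotFoundError(f"{CSV_NAME} not found inside {zip_path}")
        # Stream the member straight to dest, so the CSV lands flat in dest_dir.
        with zf.open(matches[0]) as src, open(dest, "wb") as out:
            shutil.copyfileobj(src, out)
    return dest


# Example (run from the repo root; the zip path is a placeholder):
# extract_csv_flat("path/to/downloaded.zip", "maluuba/newsqa")
```

Streaming the single member with `shutil.copyfileobj` avoids `ZipFile.extract`, which would recreate the archive's subdirectory, the exact layout the loader does not expect.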

Also, to get this to run I had to remove the apt-get installation of Java from the Dockerfile. Removing it makes the tokenizing step fail, but tokenizing isn't needed to produce the combined files.
