Fix dataset.save_to_disk(output_file) does not accept pathlib.Path object #240

@thaiminhpv (Contributor) commented Oct 25, 2024

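When I followed this to run make_datasets.create_text_dataset and write the output to disk, I got the error below: dataset.save_to_disk(output_file) is called with a pathlib.Path, which fsspec cannot handle.
This PR fixes that.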

Commands

 python -m swebench.inference.make_datasets.create_text_dataset \
    --dataset_name_or_path ./custom_hugging_face_dataset \
    --output_dir ./base_datasets --prompt_style style-3 \
    --file_source oracle \
    --splits test

Error Logs

2024-10-25 16:37:32,184 - datasets - INFO - PyTorch version 2.4.1+cpu available.
2024-10-25 16:37:33,775 - swebench.inference.make_datasets.tokenize_dataset - WARNING - Disabling caching
2024-10-25 16:37:34,401 - swebench.inference.make_datasets.create_text_dataset - INFO - Found {'test'} splits
Adding text inputs: 100%|██████████| 21/21 [00:44<00:00,  2.11s/it]
Processing test instances: 100%|██████████| 21/21 [00:00<00:00, 7204.35it/s]
2024-10-25 16:38:18,901 - swebench.inference.make_datasets.create_text_dataset - INFO - Found 21 test ids
2024-10-25 16:38:18,958 - swebench.inference.make_datasets.create_text_dataset - INFO - Found 21 test instances
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/run/lib/python3.12/site-packages/swebench/inference/make_datasets/create_text_dataset.py", line 173, in main
    dataset.save_to_disk(output_file)
  File "/home/run/lib/python3.12/site-packages/datasets/dataset_dict.py", line 1250, in save_to_disk
    fs, _ = url_to_fs(dataset_dict_path, **(storage_options or {}))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/run/lib/python3.12/site-packages/fsspec/core.py", line 383, in url_to_fs
    chain = _un_chain(url, kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/run/lib/python3.12/site-packages/fsspec/core.py", line 323, in _un_chain
    if "::" in path
       ^^^^^^^^^^^^
TypeError: argument of type 'PosixPath' is not iterable

Requirements

> python --version
Python 3.12.7
> pip list | grep
datasets                  3.0.1
swebench                  2.1.0
fsspec                    2024.2.0

@thaiminhpv
Contributor Author

I changed the call to pass str(...) to save_to_disk instead of the Path object.
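
A minimal sketch of why the str(...) conversion helps, assuming the failing step is the `"::" in path` membership test shown in the traceback. The helper name contains_chain_separator and the path base_datasets/example_dataset are hypothetical and used only for illustration; this is not the code from the merged commit.

```python
from pathlib import Path


def contains_chain_separator(path):
    # Same membership test that appears in the traceback: `"::" in path`.
    # It works on str but not on Path, which supports neither
    # containment checks nor iteration.
    return "::" in path


# Hypothetical output location, built as a Path.
output_file = Path("base_datasets") / "example_dataset"

# Passing the Path directly fails with TypeError.
try:
    contains_chain_separator(output_file)
    raised = False
except TypeError:
    raised = True

# Converting to str first makes the same check succeed.
as_text = str(output_file)
has_separator = contains_chain_separator(as_text)

print("Path raised TypeError:", raised)
print("str path contains '::':", has_separator)
```

Converting at the call site keeps Path handling in the rest of the script unchanged and only hands a plain string to the library call.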

@john-b-yang
Collaborator

Thanks for the fix!

@john-b-yang john-b-yang merged commit dc4c087 into princeton-nlp:main Oct 25, 2024
2 checks passed