datasets.builder.InvalidConfigName: Bad characters from black list '<>:/\|?*' found in 'data/belle_data.json'. They could create issues when creating a directory for this config on Windows filesystem. #23

Open
deepeye opened this issue Apr 19, 2023 · 1 comment

deepeye commented Apr 19, 2023

python cover_belle2jsonl.py \
    --data_path data/Belle_open_source_1M.json \
    --save_path data/belle_data.jsonl

Running the command above produces the following error:


Resolving data files: 100%|██████████| 20/20 [00:00<00:00, 87018.76it/s]
Resolving data files: 100%|██████████| 2657/2657 [00:00<00:00, 11775.86it/s]
Resolving data files: 100%|██████████| 34/34 [00:00<00:00, 36222.08it/s]
Traceback (most recent call last):
  File "/data/chat/InstructGLM/cover_belle2jsonl.py", line 42, in <module>
    main()
  File "/data/chat/InstructGLM/cover_belle2jsonl.py", line 25, in main
    dataset = load_dataset("json", "data/belle_data.json")
  File "/data/chat/InstructGLM/venv/lib/python3.10/site-packages/datasets/load.py", line 1759, in load_dataset
    builder_instance = load_dataset_builder(
  File "/data/chat/InstructGLM/venv/lib/python3.10/site-packages/datasets/load.py", line 1522, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
  File "/data/chat/InstructGLM/venv/lib/python3.10/site-packages/datasets/builder.py", line 319, in __init__
    self.config, self.config_id = self._create_builder_config(
  File "/data/chat/InstructGLM/venv/lib/python3.10/site-packages/datasets/builder.py", line 472, in _create_builder_config
    builder_config = self.BUILDER_CONFIG_CLASS(**config_kwargs)
  File "<string>", line 14, in __init__
  File "/data/chat/InstructGLM/venv/lib/python3.10/site-packages/datasets/builder.py", line 125, in __post_init__
    raise InvalidConfigName(
datasets.builder.InvalidConfigName: Bad characters from black list '<>:/\|?*' found in 'data/belle_data.json'. They could create issues when creating a directory for this config on Windows filesystem.

deepeye commented Apr 19, 2023

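Resolved. The fix is to pass the file path through the data_files keyword instead of as the second positional argument: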
dataset = load_dataset("json", data_files=args.data_path)
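For anyone hitting the same error, here is a minimal sketch of why the original call fails. The second positional parameter of `load_dataset` is `name` (the config name), so `"data/belle_data.json"` was treated as a config name and rejected for containing `/`. The function `load_dataset_stub` below is a hypothetical stand-in written for illustration, not the library's code. It copies only the first four parameter names of `load_dataset` and approximates the check using the character list quoted in the error message.

```python
# Characters quoted in the InvalidConfigName message above.
BAD_CHARS = r"<>:/\|?*"


def load_dataset_stub(path, name=None, data_dir=None, data_files=None):
    """Hypothetical stand-in for datasets.load_dataset (illustration only).

    It mirrors the order of the first four parameters and rejects a config
    name that contains any of the characters listed in the error message.
    """
    if name is not None:
        found = [c for c in BAD_CHARS if c in name]
        if found:
            raise ValueError(f"bad characters {found} in config name {name!r}")
    return {"path": path, "name": name, "data_files": data_files}


# The second positional argument binds to `name`, so the file path is
# treated as a config name and the check fails on the "/" character.
try:
    load_dataset_stub("json", "data/belle_data.json")
except ValueError as err:
    error_message = str(err)
    print(error_message)

# Passing the path by keyword leaves `name` unset, so the check is skipped.
ok = load_dataset_stub("json", data_files="data/belle_data.json")
print(ok)
```

With the real library, the equivalent call is the one in the fix above: the builder name `"json"` stays positional and the file location goes in `data_files`.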
