FolderBase Dataset automatically resolves under current directory when data_dir is not specified #6152
Comments
Makes sense, I guess this can be fixed in the load_dataset_builder method. |
I think the behavior is related to these lines, which short circuited the error handling. Lines 946 to 952 in 664a1cb
So should data_dir be checked here, or should that still be delegated to the actual |
This location in PackagedDatasetModuleFactory.get_module seems to be the right place to check that at least one of data_dir or data_files is passed |
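A minimal sketch of the kind of check suggested above, assuming it runs before any data files are resolved. The helper name `check_data_source` and its signature are hypothetical and are not part of the `datasets` API; `"imagefolder"` is used only as an example builder name.

```python
from typing import Mapping, Optional, Sequence, Union

# Hypothetical type alias for illustration; not taken from the datasets library.
DataFiles = Union[str, Sequence[str], Mapping[str, Union[str, Sequence[str]]]]


def check_data_source(
    builder_name: str,
    data_dir: Optional[str] = None,
    data_files: Optional[DataFiles] = None,
) -> None:
    """Fail fast when neither data_dir nor data_files is given (hypothetical helper)."""
    if data_dir is None and not data_files:
        # Without this guard, resolution could fall back to the current directory.
        raise ValueError(
            f"'{builder_name}' requires at least one of data_dir or data_files, "
            f"but got data_dir={data_dir!r} and data_files={data_files!r}."
        )


if __name__ == "__main__":
    try:
        check_data_source("imagefolder")
    except ValueError as err:
        print(f"Rejected early: {err}")
```

Raising at this point would avoid the slow scan of the working directory, since no file patterns are resolved before the error is reported.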
@mariosasko can you please assign this issue to me? I want to work on this. |
#self-assign |
@mariosasko is this issue still open? i would love to kickstart my journey to open source with this issue! |
@zutarich It is unless @debrupf2946 is working on it. |
#self-assign |
I am working and will open a pull request soon @Etelis |
@mariosasko can i take this up? |
#self-assign |
Yes, feel free to work on this :) |
#self-assign |
Describe the bug
A folder-based dataset automatically resolves data files under the current directory when data_dir is not specified.
For example:
takes a long time to resolve and collect data_files from the current directory. I think it should instead reach this line for error handling:
datasets/src/datasets/packaged_modules/folder_based_builder/folder_based_builder.py
Lines 58 to 59 in cb8c5de
Steps to reproduce the bug
Expected behavior
Error report
Environment info
datasets
version: 2.14.4