
How to train it with custom dataset? (webface42m) (folders as classes?) FileNotFoundError: [Errno 2] No such file or directory: \faces_emore\agedb_30\meta\sizes #109

Open
martinenkoEduard opened this issue Jul 27, 2023 · 15 comments


@martinenkoEduard

How do I train it with a custom dataset (folders as classes)? I get: FileNotFoundError: [Errno 2] No such file or directory: \faces_emore\agedb_30\meta\sizes

@martinenkoEduard changed the title to add "(webface42m)" Jul 27, 2023
@afm215

afm215 commented Aug 10, 2023

The error is probably coming from the val_dataset function (and will very likely also be raised by the test_dataset function). You will probably have to edit them in the data.py file.
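The meta/sizes path in the traceback matches the on-disk layout of a bcolz carray, which suggests the validation sets are stored in that format. As a rough illustration only (a sketch of mine, not the repo's actual data.py; the function name and set list are hypothetical), the workaround is to skip validation sets whose folders are missing:

```python
import os

import bcolz
import numpy as np

# Hypothetical sketch: load one validation set stored as a bcolz carray plus an
# issame label file, and skip sets whose folders are missing instead of crashing.
def load_val_set(data_root, name):
    rootdir = os.path.join(data_root, name)  # e.g. faces_emore/agedb_30
    if not os.path.isdir(os.path.join(rootdir, "meta")):
        print(f"skipping {name}: no bcolz carray found at {rootdir}")
        return None
    images = bcolz.carray(rootdir=rootdir, mode="r")               # aligned face crops
    issame = np.load(os.path.join(data_root, f"{name}_list.npy"))  # pair labels
    return images, issame

for name in ("agedb_30", "lfw", "cfp_fp"):
    load_val_set("faces_emore", name)
```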

@ayush0x00

@afm215 is there any documentation about using AdaFace with a custom dataset? I feel like the code is too heavily hardcoded with specific values and dataset names.

@afm215

afm215 commented Sep 20, 2023

Just created a PR: #125. With it, you should now be able to add .bin files to your validation folder and regenerate your memfile. Be aware that, in the current state, the code will raise an assertion error if any of the original validation sets are missing from the validation folder; you can simply comment out line 50 in five_validation_dataset.py.
If there are any bugs in the code, feel free to tell me!

@ayush0x00

Thanks for creating the PR. One question: how do we create the .bin files for the dataset? Does AdaFace require a particular folder format?
I have a dataset folder (just like the one we get after extracting train.rec, as mentioned in the README). How do I use it for training with AdaFace?

@afm215

afm215 commented Sep 20, 2023

The .bin files are used for validation, and the .rec is used for training. You should thus structure your data this way:
data_root
|-- validation
|   |-- *.bin files
|-- train
|   |-- train.rec
|   |-- train.lst
|   |-- train.idx

You can then use the configurations in the scripts folder as a model.
To create new .bin files, I have written some utilities in this repo: https://github.com/afm215/python-utils. In that package, check the function convert_test_set in Images.Faces.validation_set. I hope this will be useful.
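As a quick sanity check before launching training, a small script of mine (nothing from the repo; the folder and file names just follow the tree above) can confirm that data_root matches that layout:

```python
import glob
import os

# Sketch: verify the data_root layout described above before launching training.
def check_layout(data_root):
    bin_files = glob.glob(os.path.join(data_root, "validation", "*.bin"))
    print(f"validation: found {len(bin_files)} .bin file(s)")
    for name in ("train.rec", "train.lst", "train.idx"):
        path = os.path.join(data_root, "train", name)
        print(f"train/{name}: {'ok' if os.path.isfile(path) else 'MISSING'}")

check_layout("data_root")
```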

@ayush0x00

ayush0x00 commented Sep 20, 2023

Is it not possible to train AdaFace directly on an ImageFolder-like structure, without involving the .bin and .rec files?
The folder structure is like:
data_root
|-- 0001
|   |-- 01.jpg
|   |-- 02.jpg
|-- 0002
|   |-- 01.jpg
|   |-- 02.jpg
where 0001 and 0002 represent separate identity classes. I am still not able to figure out a way to create the validation dataset 🥲

@afm215

afm215 commented Sep 21, 2023

If you want to get rid of the .bin files, I think you'll have to modify the code itself. In the current state, however, you don't have to use .rec files for training: if no .rec file is detected within <data_root>/<train_path>, the script will create an ImageFolder dataset from the path <data_root>/<train_path>/imgs.
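For illustration, the fallback described above can be pictured like this (a sketch of mine, not the repo's actual data.py; the transform values are assumptions, apart from the 112x112 input size AdaFace models use):

```python
import os

from torchvision import datasets, transforms

# Sketch of the fallback: if no .rec file exists under <data_root>/<train_path>,
# build an ImageFolder from <data_root>/<train_path>/imgs, where each subfolder
# of imgs/ is one identity class.
def build_train_dataset(data_root, train_path="train"):
    train_dir = os.path.join(data_root, train_path)
    if os.path.isfile(os.path.join(train_dir, "train.rec")):
        raise NotImplementedError("use the repo's .rec loading path instead")
    transform = transforms.Compose([
        transforms.Resize((112, 112)),                 # AdaFace expects 112x112 crops
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),
    ])
    return datasets.ImageFolder(os.path.join(train_dir, "imgs"), transform)
```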

@afm215

afm215 commented Sep 21, 2023

Also, what do you mean by not being able to generate the validation files? Is there a way I can help you with that?

@ayush0x00

ayush0x00 commented Sep 21, 2023

So I have managed to adapt the AdaFace code for a general dataset. The only remaining issue for me is how the validation data is created and used. For example, consider the file lfw_list.npy, which contains True/False entries indicating whether a pair of images belongs to the same identity (there are 6000 such entries). The data file corresponding to lfw_list.npy has 12000 images. I cannot figure out which pair of images in the validation data is referred to by each entry of the lfw_list.npy file. Am I missing something? Or do the first two images in the validation set correspond to the first entry of the .npy file?

@ayush0x00

> If you want to get rid of the .bin files, I think you'll have to modify the code itself. In the current state, however, you don't have to use .rec files for training: if no .rec file is detected within <data_root>/<train_path>, the script will create an ImageFolder dataset from the path <data_root>/<train_path>/imgs.

Yeah, that's okay. But to run the validation code on my dataset, I do need the .bin files, right?

@afm215

afm215 commented Sep 21, 2023

> So I have managed to adapt the AdaFace code for a general dataset. The only remaining issue for me is how the validation data is created and used. For example, consider the file lfw_list.npy, which contains True/False entries indicating whether a pair of images belongs to the same identity (there are 6000 such entries). The data file corresponding to lfw_list.npy has 12000 images. I cannot figure out which pair of images in the validation data is referred to by each entry of the lfw_list.npy file. Am I missing something? Or do the first two images in the validation set correspond to the first entry of the .npy file?

This is what I think: the first two images probably correspond to the first entry of the npy file.
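That is also how the standard LFW-style protocol is usually laid out: entry i of the label array refers to images 2*i and 2*i + 1. A minimal sketch of mine (the random embeddings and threshold are placeholders, not the repo's evaluation code):

```python
import numpy as np

# issame[i] says whether images 2*i and 2*i + 1 form a genuine (same-identity) pair.
issame = np.load("lfw_list.npy")                     # shape (6000,), bool
embeddings = np.random.randn(2 * len(issame), 512)   # stand-in for real model outputs

# Pair consecutive embeddings: (0, 1), (2, 3), (4, 5), ...
emb_a, emb_b = embeddings[0::2], embeddings[1::2]

# Cosine similarity per pair, then a simple threshold for verification accuracy.
sims = np.sum(emb_a * emb_b, axis=1) / (
    np.linalg.norm(emb_a, axis=1) * np.linalg.norm(emb_b, axis=1)
)
threshold = 0.3                                      # illustrative value only
accuracy = np.mean((sims > threshold) == issame)
print(f"verification accuracy: {accuracy:.4f}")
```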

@afm215

afm215 commented Sep 21, 2023

> > If you want to get rid of the .bin files, I think you'll have to modify the code itself. In the current state, however, you don't have to use .rec files for training: if no .rec file is detected within <data_root>/<train_path>, the script will create an ImageFolder dataset from the path <data_root>/<train_path>/imgs.
>
> Yeah, that's okay. But to run the validation code on my dataset, I do need the .bin files, right?

Yep. You can try the function convert_test_set(test_dir: str, pair_file: str) from my python-utils package.
test_dir is the path to your validation images. Let's suppose test_dir looks like this:
test_dir
|-- 0001
|   |-- 01.jpg
|   |-- 02.jpg
|-- 0002
|   |-- 01.jpg
|   |-- 02.jpg

Then pair_file should be the path to a file that contains pairs like this:
0001_0.alias 0001_1.alias
0002_0.alias 0001_1.alias
etc.
i.e. each line should be <folder_id_1>_<img_idx>.alias <folder_id_2>_<img_idx>.alias
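For what it's worth, here is a sketch of mine that writes a pair file in that format from a folder-per-identity test_dir (write_pair_file is hypothetical, and the import path for convert_test_set is just the module path mentioned earlier in this thread):

```python
import os

# Hypothetical helper: build a pair file in the format described above, with one
# genuine pair per identity and one impostor pair per pair of neighbouring identities.
def write_pair_file(test_dir, pair_file):
    ids = sorted(d for d in os.listdir(test_dir)
                 if os.path.isdir(os.path.join(test_dir, d)))
    lines = []
    for identity in ids:
        # Genuine pair: first two images of the same identity folder.
        lines.append(f"{identity}_0.alias {identity}_1.alias")
    for id_a, id_b in zip(ids, ids[1:] + ids[:1]):
        if id_a != id_b:
            # Impostor pair: first image of two different identities.
            lines.append(f"{id_a}_0.alias {id_b}_0.alias")
    with open(pair_file, "w") as f:
        f.write("\n".join(lines) + "\n")

write_pair_file("test_dir", "pairs.txt")
# Then, assuming the module path mentioned earlier in this thread:
# from Images.Faces.validation_set import convert_test_set
# convert_test_set("test_dir", "pairs.txt")
```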

@ayush0x00

> > So I have managed to adapt the AdaFace code for a general dataset. The only remaining issue for me is how the validation data is created and used. For example, consider the file lfw_list.npy, which contains True/False entries indicating whether a pair of images belongs to the same identity (there are 6000 such entries). The data file corresponding to lfw_list.npy has 12000 images. I cannot figure out which pair of images in the validation data is referred to by each entry of the lfw_list.npy file. Am I missing something? Or do the first two images in the validation set correspond to the first entry of the .npy file?
>
> This is what I think: the first two images probably correspond to the first entry of the npy file.

If that's the case, it should solve the issue. Thanks for the help. I will send a PR automating the whole process, along with some UI updates.

@ayush0x00

@afm215 can you please look into issue #127?

@giangnv125

Hello @afm215, @ayush0x00,
I want to use a validation dir that looks like this:
val_dir
|-- 0001
|   |-- 01.jpg
|   |-- 02.jpg
|-- 0002
|   |-- 01.jpg
|   |-- 02.jpg
Did you solve this? Can you share it?
