Problems with running self-constructed datasets #646
mssssss123
announced in Announcements
Replies: 2 comments
-
Hi @mssssss123. You can modify the webdataset pipeline to do that. In particular, see line 391 of open_clip/src/training/data.py (at commit f692ec9).
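To make the suggestion above more concrete, here is a minimal sketch of a per-sample function that picks the caption for each of the three cases in the question. It is not the open_clip implementation. The function name `select_caption`, the `mode` values, the `json_key` default of `"caption"`, and the `mix_prob` parameter are all hypothetical, and "mixture" is assumed here to mean randomly choosing one of the two texts per sample.

```python
import json
import random


def select_caption(sample, mode="txt", json_key="caption", mix_prob=0.5, rng=random):
    """Return a copy of `sample` whose "txt" field holds the chosen caption.

    mode (hypothetical values):
      "txt"  -> keep the caption from the .txt file
      "json" -> use the text stored under `json_key` in the .json file
      "mix"  -> use the JSON text with probability `mix_prob`, else the txt caption
    """
    if mode not in ("txt", "json", "mix"):
        raise ValueError(f"unknown mode: {mode!r}")

    # Samples may arrive as raw bytes or already decoded, so handle both.
    meta = sample.get("json", {})
    if isinstance(meta, (bytes, str)):
        meta = json.loads(meta)
    txt = sample.get("txt", "")
    if isinstance(txt, bytes):
        txt = txt.decode("utf-8")

    alt = meta.get(json_key)  # assumed attribute name; adjust to your data

    if mode == "txt" or alt is None:
        caption = txt  # also the fallback when the JSON attribute is missing
    elif mode == "json":
        caption = alt
    else:  # mode == "mix"
        caption = alt if rng.random() < mix_prob else txt

    out = dict(sample)  # shallow copy so the input sample is not mutated
    out["txt"] = caption
    return out
```

A function like this could be applied with webdataset's `map` stage (for example `wds.map(functools.partial(select_caption, mode="mix"))`), placed before the stage that consumes the `txt` field. Exactly where it goes depends on the pipeline in your checkout of data.py.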
-
@mssssss123 You may want to take a look at EIFY@66e4603, which enables open_clip to use Redcaps JSON caption files.
-
Hi, thank you for such a great job!
I used the img2dataset library to convert my image-text dataset into the webdataset format, so each sample has jpg, txt, and json files. The json file stores an additional piece of text alongside the caption in the txt file. When training with open_clip, I would like to support three cases: (1) use the text from the txt file, (2) use the text from an attribute in the json file, and (3) use a mixture of the two.
Can you give me some suggestions on how to use it or modify the code?