Using local dataset but blocked at load_dataset #1703
Comments
For large-scale datasets, it is better to follow open_clip.
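For context, open_clip streams training data from WebDataset-style tar shards instead of building an Arrow dataset up front. A minimal sketch of packing image-text pairs into such shards, using only the standard library (the paths, shard size, and file-naming convention here are illustrative assumptions, not from this thread):

```python
import io
import tarfile
from pathlib import Path

def write_shards(pairs, out_dir, shard_size=10_000):
    """Pack (image_path, caption) pairs into WebDataset-style tar shards.

    Each sample becomes two tar members sharing a key, <key>.jpg and
    <key>.txt, which is the layout WebDataset-based loaders expect.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    shard, tar = 0, None
    for i, (img_path, caption) in enumerate(pairs):
        if i % shard_size == 0:  # start a new shard every shard_size samples
            if tar:
                tar.close()
            tar = tarfile.open(out / f"shard-{shard:06d}.tar", "w")
            shard += 1
        key = f"{i:09d}"
        data = Path(img_path).read_bytes()
        info = tarfile.TarInfo(f"{key}.jpg")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
        txt = caption.encode("utf-8")
        info = tarfile.TarInfo(f"{key}.txt")
        info.size = len(txt)
        tar.addfile(info, io.BytesIO(txt))
    if tar:
        tar.close()
```

Sharding keeps each tar at a bounded size so shards can be shuffled, sharded across workers, and streamed sequentially, which is the main reason this layout scales better than random-access loading of millions of small files.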
Hi @haofanwang, thanks for your reply. Does the Hugging Face datasets library also suffer from slow loading with officially supported datasets like LAION? I don't think my slow processing is expected.
Hi @BrandonHanx! At the moment you can do something like this:
import glob
from datasets import Dataset, Image, load_from_disk, load_dataset
# recursive=True is required for the "**" pattern to match nested directories
ds = Dataset.from_dict({"image": list(glob.glob("path/to/dir/**/*.jpg", recursive=True))})
def add_metadata(example):
...
ds = ds.map(add_metadata, num_proc=16) # num_proc for multiprocessing
ds = ds.cast_column("image", Image())
# save as Arrow locally
ds.save_to_disk("output_dir")
reloaded = load_from_disk("output_dir")
# OR save as Parquet on the HF Hub
ds.push_to_hub("username/dataset_name")
reloaded = load_dataset("username/dataset_name")
# reloaded = load_dataset("username/dataset_name", num_proc=16) # to use multiprocessing
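The body of add_metadata is elided in the snippet above. A hypothetical version, assuming each image has a same-named sidecar .txt caption file next to it (a made-up convention for illustration, not from the thread), could look like:

```python
import os

def add_metadata(example):
    # At this point in the pipeline, before cast_column("image", Image()),
    # example["image"] is still a plain path string.
    # Assumed layout: path/to/img.jpg has its caption in path/to/img.txt.
    txt_path = os.path.splitext(example["image"])[0] + ".txt"
    with open(txt_path, encoding="utf-8") as f:
        example["text"] = f.read().strip()
    return example
```

With num_proc=16 in ds.map, datasets splits the table into 16 shards and runs this function in parallel processes, which is what makes the metadata pass tractable at the 1M-pair scale discussed here.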
Hi @anton-l, I'm closing this issue now.
Hi, @pcuenca @patil-suraj @anton-l
I'm trying to fully fine-tune the SD U-Net on my own dataset of about 1M image-text pairs.
I'm following the script in examples/text_to_image. However, it is blocked at load_dataset with very slow processing: it has been 12 hours and the preparation of these 1M pairs is still not finished.
diffusers/examples/text_to_image/train_text_to_image.py
Lines 424 to 431 in ef3fcbb
I tried another smaller dataset with only thousands of image-text pairs and everything worked fine.
So I was just wondering if this slow-loading process is expected for large datasets.
OS: Ubuntu-20.04, GPU: 32GB V100 x 8, dependencies: according to the current installation instructions
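One mitigation discussed in this thread is parallelizing dataset preparation. A sketch of loading a local image folder with load_dataset, passing num_proc to spread preparation across workers (the loader name and path are placeholders for a local 1M-image layout; running this needs a real directory, so it is shown as a sketch only):

```python
from datasets import load_dataset

# num_proc parallelizes the download/prepare phase across processes,
# which is where a 1M-image folder otherwise crawls single-threaded.
dataset = load_dataset(
    "imagefolder",
    data_dir="path/to/train_dir",
    split="train",
    num_proc=16,
)
```

Note that num_proc in load_dataset (supported since datasets 2.7) only speeds up preparation; per-example transforms still need their own num_proc in .map.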