
Using local dataset but blocked at load_dataset #1703

Closed
BrandonHanx opened this issue Dec 14, 2022 · 4 comments
BrandonHanx commented Dec 14, 2022

Hi, @pcuenca @patil-suraj @anton-l
I'm trying to fully fine-tune the SD U-Net on my own dataset, which contains about 1M image-text pairs.
I'm following the script in examples/text_to_image.

However, it gets stuck at load_dataset with very slow processing: after 12 hours, the preparation of these 1M pairs is still not finished.

data_files = {}
if args.train_data_dir is not None:
    data_files["train"] = os.path.join(args.train_data_dir, "**")
dataset = load_dataset(
    "imagefolder",
    data_files=data_files,
    cache_dir=args.cache_dir,
)

I tried a smaller dataset with only a few thousand image-text pairs and everything worked fine.
So I was just wondering if this slow-loading process is expected for large datasets.

OS: Ubuntu 20.04, GPU: 8 x 32GB V100, dependencies: installed according to the current installation instructions

@haofanwang
Contributor

For large-scale datasets, it is better to follow open_clip.
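For reference, open_clip's data pipeline is built on webdataset, which streams image-text pairs from pre-packed TAR shards instead of scanning millions of individual files. A minimal sketch of that idea, assuming the data has already been packed into shards containing {key}.jpg / {key}.txt files (the shard names and keys below are hypothetical, and this is not the exact open_clip pipeline):

# Hypothetical sketch of webdataset-style loading, assuming shards named
# pairs-000000.tar ... pairs-000099.tar with {key}.jpg / {key}.txt entries.
import webdataset as wds

urls = "pairs-{000000..000099}.tar"

dataset = (
    wds.WebDataset(urls)
    .decode("pil")            # decode .jpg entries into PIL images
    .to_tuple("jpg", "txt")   # yield (image, caption) tuples
)

# Quick sanity check: iterate a few samples
for i, (image, caption) in enumerate(dataset):
    print(image.size, caption[:50])
    if i == 3:
        break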

@BrandonHanx
Author

For large-scale datasets, it is better to follow open_clip.

Hi @haofanwang, thanks for your reply. Does huggingface datasets also suffer from this when loading officially supported datasets, like LAION? I suspect the slow processing I'm seeing is not expected.

@anton-l
Member

anton-l commented Dec 14, 2022

Hi @BrandonHanx! At the moment imagefolder becomes suboptimal when used with more than a couple of thousand images. Quoting @lhoestq's reply from huggingface/datasets#5317 here:

For large-scale image datasets you are better off grouping your images in TAR archives or Arrow/Parquet files. This is true not just for ImageFolder loading performance, but also because having millions of files is not ideal for your filesystem or when moving the data around.

Option 1. use TAR archives

I'd suggest you take a look at how we load ImageNet, for example. The dataset is sharded into multiple TAR archives and there is a script that iterates over the archives to load the images.
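As a rough illustration of that pattern (not the actual ImageNet loading script), assuming shards like shard-00000.tar that contain {key}.jpg / {key}.txt pairs, a generator over the archives could look like this:

# Hypothetical sketch of iterating TAR shards; not the actual ImageNet script.
import io
import tarfile
from pathlib import Path

from PIL import Image


def iterate_shards(shard_dir):
    """Yield (PIL image, caption) pairs from every *.tar shard in shard_dir."""
    for shard in sorted(Path(shard_dir).glob("*.tar")):
        with tarfile.open(shard) as tar:
            members = {m.name: m for m in tar.getmembers() if m.isfile()}
            for name, member in members.items():
                if not name.endswith(".jpg"):
                    continue
                caption_name = name[: -len(".jpg")] + ".txt"
                if caption_name not in members:
                    continue
                image = Image.open(io.BytesIO(tar.extractfile(member).read()))
                caption = tar.extractfile(members[caption_name]).read().decode("utf-8")
                yield image, caption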

Option 2. use Arrow/Parquet

You can load your images as an Arrow Dataset with

import glob

from datasets import Dataset, Image, load_from_disk, load_dataset

ds = Dataset.from_dict({"image": glob.glob("path/to/dir/**/*.jpg", recursive=True)})

def add_metadata(example):
    ...

ds = ds.map(add_metadata, num_proc=16)  # num_proc for multiprocessing
ds = ds.cast_column("image", Image())

# save as Arrow locally
ds.save_to_disk("output_dir")
reloaded = load_from_disk("output_dir")

# OR save as Parquet on the HF Hub
ds.push_to_hub("username/dataset_name")
reloaded = load_dataset("username/dataset_name")
# reloaded = load_dataset("username/dataset_name", num_proc=16)  # to use multiprocessing
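The add_metadata function above is deliberately left open. One possible (purely hypothetical) version, assuming each image has its caption stored next to it in a same-named .txt file, could be:

# Hypothetical add_metadata: read the caption from a sidecar .txt file
# (my_image.jpg -> my_image.txt). At this point example["image"] is still a
# file path, since cast_column(Image()) runs after map(). Adjust to however
# your captions are actually stored.
import os

def add_metadata(example):
    caption_path = os.path.splitext(example["image"])[0] + ".txt"
    with open(caption_path, encoding="utf-8") as f:
        example["text"] = f.read().strip()
    return example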

@BrandonHanx
Author

Hi @anton-l
Thank you very much for your reply.
Following the suggestion above, I saved my dataset as Arrow and managed to run the training.

Now I'm closing this issue.
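For anyone landing here later: a minimal sketch of the change this implies in the examples/text_to_image script, assuming the Arrow dataset was saved with save_to_disk as above (the exact script you use may differ):

# Hypothetical sketch: instead of load_dataset("imagefolder", ...), point the
# training script at the Arrow dataset saved with save_to_disk above.
from datasets import load_from_disk

dataset = {"train": load_from_disk("output_dir")}  # same "output_dir" as above

# The rest of the script can keep using dataset["train"] as before,
# e.g. dataset["train"].column_names to find the image/caption columns.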
