
Using local dataset but blocked at load_dataset #1703

Closed
@BrandonHanx

Description


Hi @pcuenca @patil-suraj @anton-l,
I'm trying to fully fine-tune the SD U-Net on my own dataset of about 1M image-text pairs, following the script in examples/text_to_image.

However, the run is stuck at load_dataset, which processes the data very slowly: after 12 hours the preparation of the 1M pairs is still not finished.

import os
from datasets import load_dataset

# Glob everything under the training directory and load it with the
# generic "imagefolder" builder.
data_files = {}
if args.train_data_dir is not None:
    data_files["train"] = os.path.join(args.train_data_dir, "**")
dataset = load_dataset(
    "imagefolder",
    data_files=data_files,
    cache_dir=args.cache_dir,
)

I tried a smaller dataset of only a few thousand image-text pairs and everything worked fine.
So I was wondering whether such slow loading is expected for large datasets.
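
For reference, here is a rough sketch of the workaround I'm considering: prepare the dataset once, save the Arrow files to disk, and reload them on later runs instead of re-scanning 1M files every time. This assumes datasets >= 2.5 (which exposes a num_proc argument on load_dataset); the paths below are placeholders.

import os
from datasets import load_dataset, load_from_disk

# Placeholder paths -- adjust to the actual setup.
train_data_dir = "/path/to/train_data"
prepared_dir = "/path/to/prepared_dataset"

if os.path.isdir(prepared_dir):
    # Reuse the already-prepared Arrow dataset instead of re-scanning 1M files.
    dataset = load_from_disk(prepared_dir)
else:
    dataset = load_dataset(
        "imagefolder",
        data_files={"train": os.path.join(train_data_dir, "**")},
        num_proc=8,  # parallelize preparation; available in datasets >= 2.5
    )
    dataset.save_to_disk(prepared_dir)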

OS: Ubuntu 20.04; GPU: 8× 32GB V100; dependencies installed per the current installation instructions.
