
How to inference? #165

Open
monk-after-90s opened this issue Sep 11, 2024 · 1 comment

Comments

@monk-after-90s

This is for bugs only

Did you already ask in the discord?

Yes

You verified that this is a bug and not a feature request or question by asking in the discord?

Yes

Describe the bug

As a newcomer, I have trained a LoRA weight through the Gradio UI. Intuitively, I tried to run inference from the UI as well, but there is no way to do inference there.

Then, I followed the README in the directory of the LoRA weights step by step:

from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained('black-forest-labs/FLUX.1-dev', torch_dtype=torch.bfloat16).to('cuda')

Then I got:

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 90.00 MiB. GPU 1 has a total capacity of 23.55 GiB of which 51.69 MiB is free. Including non-PyTorch memory, this process has 23.49 GiB memory in use. Of the allocated memory 23.11 GiB is allocated by PyTorch, and 12.93 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

My GPU is an RTX 4090 with 24 GB of VRAM.

Now I have lost my direction. I would appreciate help from anyone.
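For context on why this OOMs: the full FLUX.1-dev pipeline in plain bf16 simply does not fit in 24 GiB. A back-of-envelope estimate (the parameter counts below are approximate public figures, not measured values):

```python
# Rough estimate of why FLUX.1-dev in plain bf16 overflows a 24 GiB card.
# All parameter counts are approximate, hedged figures.
BYTES_PER_PARAM_BF16 = 2

params = {
    "flux_transformer": 12.0e9,  # FLUX.1-dev DiT, ~12B params (approx.)
    "t5_xxl_encoder": 4.7e9,     # T5-XXL text encoder, ~4.7B params (approx.)
    "clip_encoder": 0.1e9,       # CLIP text encoder (approx.)
    "vae": 0.1e9,                # autoencoder (approx.)
}

total_gib = sum(params.values()) * BYTES_PER_PARAM_BF16 / 2**30
print(f"weights alone: ~{total_gib:.0f} GiB in bf16")
```

Even before activations or CUDA overhead, the weights alone land above 30 GiB, so `.to('cuda')` on a 24 GiB card cannot succeed without offloading or quantization.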

@egeres

egeres commented Sep 16, 2024

There's less information about FLUX inference on consumer hardware than I expected. I had some luck running it using optimum-quanto. Could you try `pip install optimum-quanto` and then the following code?

import torch
from diffusers import FluxPipeline
from optimum.quanto import freeze, qfloat8, quantize


pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # Offload idle submodules to CPU to reduce peak VRAM

quantize(pipe.transformer, weights=qfloat8)
freeze(pipe.transformer)

image = pipe(  # type: ignore
    "A happy cat",
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("cat.png")
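Since the original question was about running the trained LoRA, one more hedged sketch: diffusers pipelines expose `load_lora_weights`, which should let you apply your adapter on top of the base model. The checkpoint path below is a placeholder; point it at your training output directory (this is untested on my side, so treat it as a sketch):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)
# Placeholder path -- replace with the directory containing your trained LoRA.
pipe.load_lora_weights("path/to/your/lora_output_dir")
pipe.enable_model_cpu_offload()  # keep VRAM use down on a 24 GB card

image = pipe(
    "A happy cat",
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("cat_lora.png")
```

You can combine this with the quantization above (quantize and freeze the transformer after loading the LoRA) if it still doesn't fit.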
