
How to inference? #165

Open
monk-after-90s opened this issue Sep 11, 2024 · 1 comment

Comments

@monk-after-90s

This is for bugs only

Did you already ask in the discord?

Yes

You verified that this is a bug and not a feature request or question by asking in the discord?

Yes

Describe the bug

As a newcomer, I have trained a LoRA weight through the Gradio UI. Intuitively, I tried to run inference from the UI as well, but there is no way to do inference there.

Then, I followed the README in the directory of the LoRA weights step by step:

from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained('black-forest-labs/FLUX.1-dev', torch_dtype=torch.bfloat16).to('cuda')

Then I got:

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 90.00 MiB. GPU 1 has a total capacity of 23.55 GiB of which 51.69 MiB is free. Including non-PyTorch memory, this process has 23.49 GiB memory in use. Of the allocated memory 23.11 GiB is allocated by PyTorch, and 12.93 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

My GPU is an RTX 4090 with 24 GB of VRAM.

Now I have lost my direction. I would appreciate help from anyone.
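For context on why this OOMs: the full FLUX.1-dev pipeline in plain bf16 simply does not fit in 24 GiB. A back-of-envelope estimate (the parameter counts below are approximate public figures, not measured values):

```python
# Rough estimate of why FLUX.1-dev in plain bf16 overflows a 24 GiB card.
# All parameter counts are approximate, hedged figures.
BYTES_PER_PARAM_BF16 = 2

params = {
    "flux_transformer": 12.0e9,  # FLUX.1-dev DiT, ~12B params (approx.)
    "t5_xxl_encoder": 4.7e9,     # T5-XXL text encoder, ~4.7B params (approx.)
    "clip_encoder": 0.1e9,       # CLIP text encoder (approx.)
    "vae": 0.1e9,                # autoencoder (approx.)
}

total_gib = sum(params.values()) * BYTES_PER_PARAM_BF16 / 2**30
print(f"weights alone: ~{total_gib:.0f} GiB in bf16")
```

Even before activations or CUDA overhead, the weights alone land above 30 GiB, so `.to('cuda')` on a 24 GiB card cannot succeed without offloading or quantization.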

@egeres

egeres commented Sep 16, 2024

There's less information about FLUX inference on consumer hardware than I expected. I had some luck running it using optimum-quanto. Could you try `pip install optimum-quanto` and then the following code?

import torch
from diffusers import FluxPipeline
from optimum.quanto import freeze, qfloat8, quantize


pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # Offload idle submodules to CPU to reduce peak VRAM

quantize(pipe.transformer, weights=qfloat8)
freeze(pipe.transformer)

image = pipe(  # type: ignore
    "A happy cat",
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("cat.png")
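Since the original question was about running the trained LoRA, one more hedged sketch: diffusers pipelines expose `load_lora_weights`, which should let you apply your adapter on top of the base model. The checkpoint path below is a placeholder; point it at your training output directory (this is untested on my side, so treat it as a sketch):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)
# Placeholder path -- replace with the directory containing your trained LoRA.
pipe.load_lora_weights("path/to/your/lora_output_dir")
pipe.enable_model_cpu_offload()  # keep VRAM use down on a 24 GB card

image = pipe(
    "A happy cat",
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("cat_lora.png")
```

You can combine this with the quantization above (quantize and freeze the transformer after loading the LoRA) if it still doesn't fit.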
