
Saving and Loading Refiner pipeline causes load error #708

Closed
SuperSecureHuman opened this issue Mar 6, 2024 · 12 comments
Labels
Request-bug Something isn't working Response-need_hours This issue need some hours to be solved sig-compiler

Comments

@SuperSecureHuman

SuperSecureHuman commented Mar 6, 2024

Describe the bug

Able to save Refiner graph, unable to load it


While trying to load the refiner compiled graph via onediffx, it errors out.

[screenshot: error message]

Your environment

OS

Ubuntu, Python 3.11

Run python -m oneflow --doctor and paste it here.

path: ['/home/aistudent/miniforge3/lib/python3.11/site-packages/oneflow']
version: 0.9.1.dev20240220+cu118
git_commit: 6621521
cmake_build_type: Release
rdma: True
mlir: True
enterprise: False

How To Reproduce

Steps to reproduce the behavior(code or script):

Save refiner graph

from time import perf_counter
import torch
from diffusers import (
    AutoPipelineForText2Image,
    AutoPipelineForImage2Image,
    AutoencoderTiny,
)

import oneflow as flow
#from onediff.infer_compiler import oneflow_compile
from onediff.schedulers import EulerDiscreteScheduler
#

from onediffx import compile_pipe, save_pipe, load_pipe


scheduler_refiner = EulerDiscreteScheduler.from_pretrained("stabilityai/stable-diffusion-xl-refiner-1.0", subfolder="scheduler")
refiner = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    use_safetensors=True,
    torch_dtype=torch.float16,
    scheduler=scheduler_refiner,
    variant="fp16",
    vae=vae,
).to("cuda")

base = compile_pipe(base)
refiner = compile_pipe(refiner)


with flow.autocast("cuda"):
    for generation in queue:
        image = base(generation["prompt"], output_type="latent").images
        refiner(generation["prompt"], image=image)
        
        
save_pipe(refiner, "refiner_graph")

load_pipe(refiner, "refiner_graph")

The complete error message

Loading pipeline components...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00,  5.93it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00,  6.78it/s]
Loading pipes
Loaded base pipe
Traceback (most recent call last):
  File "/home/aistudent/sdxl/graph_loader.py", line 90, in <module>
    load_pipe(refiner, "refiner_graph")
  File "/tmp/onediff/onediff_diffusers_extensions/onediffx/compilers/diffusion_pipeline_compiler.py", line 100, in load_pipe
    obj.load_graph(os.path.join(dir, part))
  File "/home/aistudent/miniforge3/lib/python3.11/site-packages/onediff/infer_compiler/with_oneflow_compile.py", line 322, in load_graph
    self.get_graph().load_graph(file_path, device, run_warmup)
  File "/home/aistudent/miniforge3/lib/python3.11/site-packages/onediff/infer_compiler/utils/cost_util.py", line 48, in clocked
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/aistudent/miniforge3/lib/python3.11/site-packages/onediff/infer_compiler/with_oneflow_compile.py", line 348, in load_graph
    self.load_runtime_state_dict(state_dict, warmup_with_run=run_warmup)
  File "/home/aistudent/miniforge3/lib/python3.11/site-packages/oneflow/nn/graph/graph.py", line 1188, in load_runtime_state_dict
    return self._dynamic_input_graph_cache.load_runtime_state_dict(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aistudent/miniforge3/lib/python3.11/site-packages/oneflow/nn/graph/cache.py", line 239, in load_runtime_state_dict
    assert graph is None
AssertionError
@SuperSecureHuman SuperSecureHuman added the Request-bug Something isn't working label Mar 6, 2024
@strint strint added this to the v0.13.0 milestone Mar 6, 2024
@strint strint added Response-need_hours This issue need some hours to be solved and removed Response-triaged labels Mar 9, 2024
@doombeaker
Contributor

doombeaker commented Mar 11, 2024

Hello, in your code there are three important steps:

  1. Compiling the module
  2. Saving the graph
  3. Loading the graph

OneDiff currently doesn't support doing step 3 after step 1, because once the graph has been compiled there is no need to load another one.

If you want to test the load_pipe function, you can change your code to something like this:

from time import perf_counter
import torch
from diffusers import (
    AutoPipelineForText2Image,
    AutoPipelineForImage2Image,
    AutoencoderTiny,
)

import oneflow as flow
#from onediff.infer_compiler import oneflow_compile
from onediff.schedulers import EulerDiscreteScheduler
#

from onediffx import compile_pipe, save_pipe, load_pipe


scheduler_refiner = EulerDiscreteScheduler.from_pretrained("stabilityai/stable-diffusion-xl-refiner-1.0", subfolder="scheduler")
refiner = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    use_safetensors=True,
    torch_dtype=torch.float16,
    scheduler=scheduler_refiner,
    variant="fp16",
    vae=vae,
).to("cuda")

base = compile_pipe(base)
refiner = compile_pipe(refiner)

load_pipe(refiner, "refiner_graph")  # make sure the `refiner_graph` file exist

with flow.autocast("cuda"):
    for generation in queue:
        image = base(generation["prompt"], output_type="latent").images
        refiner(generation["prompt"], image=image)

@SuperSecureHuman
Author

The above code I shared was for testing only. It also errors out.

In my actual code base, saving the refiner is a separate process, and loading the saved graph is a separate function.

@doombeaker
Contributor

The above code I shared was for testing only. It also errors out.

In my actual code base, saving the refiner is a separate process, and loading the saved graph is a separate function.

I tried to run the code you gave above, but it failed because names like vae, base, and generation are not defined.

The error message you show above is indeed an error caused by trying to load a graph into an already compiled object. Can you re-confirm it?

Or can you extract a POC script that I can run directly? Then I can show you how to fix it.

@SuperSecureHuman
Author

I'll update in a few hours, when I get back to my machine.

@SuperSecureHuman
Author

SuperSecureHuman commented Mar 11, 2024

Here is a reproducible example (tested just now).

file - graph_exporter.py

from time import perf_counter
import torch
from diffusers import (
    AutoPipelineForText2Image,
    AutoPipelineForImage2Image,
    AutoencoderTiny,
)

import oneflow as flow
from onediff.schedulers import EulerDiscreteScheduler
from onediffx import compile_pipe, save_pipe
from onediffx.lora import load_and_fuse_lora, unfuse_lora


queue = []

queue.extend(
    [
        {
            "prompt": "3/4 shot, candid photograph of a beautiful 30 year old redhead woman with messy dark hair, peacefully sleeping in her bed, night, dark, light from window, dark shadows, masterpiece, uhd, moody",
            "seed": 877866765,
        }
    ]
)


vae = AutoencoderTiny.from_pretrained(
    "madebyollin/taesdxl",
    use_safetensors=True,
    torch_dtype=torch.float16,
).to("cuda")

scheduler_base = EulerDiscreteScheduler.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler")
base = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    use_safetensors=True,
    scheduler=scheduler_base,
    torch_dtype=torch.float16,
    variant="fp16",
    vae=vae,
).to("cuda")

scheduler_refiner = EulerDiscreteScheduler.from_pretrained("stabilityai/stable-diffusion-xl-refiner-1.0", subfolder="scheduler")
refiner = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    use_safetensors=True,
    torch_dtype=torch.float16,
    scheduler=scheduler_refiner,
    variant="fp16",
    vae=vae,
).to("cuda")

base = compile_pipe(base)
refiner = compile_pipe(refiner)


with flow.autocast("cuda"):
    for generation in queue:
        image = base(generation["prompt"], output_type="latent").images
        refiner(generation["prompt"], image=image)
        
        
save_pipe(base, "base_graph")
save_pipe(refiner, "refiner_graph")

file - graph_loader.py

from time import perf_counter
import torch
from diffusers import (
    AutoPipelineForText2Image,
    AutoPipelineForImage2Image,
    AutoencoderTiny,
)

import oneflow as flow

from onediff.schedulers import EulerDiscreteScheduler
from onediffx import compile_pipe, load_pipe



queue = []

queue.extend(
    [
        {
            "prompt": "3/4 shot, candid photograph of a beautiful 30 year old redhead woman with messy dark hair, peacefully sleeping in her bed, night, dark, light from window, dark shadows, masterpiece, uhd, moody",
            "seed": 877866765,
        }
    ]
)


vae = AutoencoderTiny.from_pretrained(
    "madebyollin/taesdxl",
    use_safetensors=True,
    torch_dtype=torch.float16,
).to("cuda")

scheduler_base = EulerDiscreteScheduler.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler"
)
base = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    use_safetensors=True,
    scheduler=scheduler_base,
    torch_dtype=torch.float16,
    variant="fp16",
    vae=vae,
).to("cuda")

scheduler_refiner = EulerDiscreteScheduler.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", subfolder="scheduler"
)
refiner = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    use_safetensors=True,
    torch_dtype=torch.float16,
    scheduler=scheduler_refiner,
    variant="fp16",
    vae=vae,
).to("cuda")

base = compile_pipe(base)
refiner = compile_pipe(refiner)

print("Loading pipes")
load_pipe(base, "base_graph")
print("Loaded base pipe")
load_pipe(refiner, "refiner_graph")


with flow.autocast("cuda"):
    for generation in queue:
        image = base(generation["prompt"], output_type="latent").images
        refiner(generation["prompt"], image=image)

File system - the compiled graphs

[screenshot: compiled graph files on disk]

Error when trying to load refiner graph

[screenshot: error when loading refiner graph]

EDIT: Added image of filesystem

@doombeaker
Contributor

doombeaker commented Mar 11, 2024

Thanks, I have run your scripts separately, but didn't reproduce the error.

[screenshot: successful run]

The OneFlow version I use is:

(base) yaochi@oneflow-28:~/onediff/onediff_diffusers_extensions/examples/refiner_graph$ python -m oneflow --doctor
path: ['/data/home/yaochi/miniconda3/lib/python3.10/site-packages/oneflow']
version: 0.9.1.dev20240304+cu121

The OneDiff version I use is:

Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import onediff
>>> onediff.__version__
'0.13.0.dev1'

Could you check the versions you use first? If the error remains, please update me.

@SuperSecureHuman
Author

Here is my env currently.

[screenshot: environment details]

BTW, both the refiner and the base work as expected when compiled; only saving and loading has this issue.

Here are versions of relevant libraries.

torch 2.1.2
diffusers 0.26.2
onediff 0.13.0.dev202403100126
onediffx 0.13.0.dev0
transformers 4.38.2

@doombeaker
Contributor

Here is my env currently.

[screenshot: environment details]

BTW, the refiner and base, both work as expected when compiled. Just saving and loading has this issue.

Here are versions of relevant libraries.

torch 2.1.2 diffusers 0.26.2 onediff 0.13.0.dev202403100126 onediffx 0.13.0.dev0 transformers 4.38.2

Does the issue remain in your current environment?

BTW, the refiner and base, both work as expected when compiled. Just saving and loading has this issue.

I tried both of the scripts (save/load) you provided; both work well.

@SuperSecureHuman
Author

Yeah, the issue persists.

@doombeaker
Contributor

Yeah, the issue persists.

I have reproduced the issue. I will try to fix it.

@doombeaker
Contributor

I have identified the problem, but I have not yet figured out how to fix this bug in OneDiff.
However, the direct cause is that the base and refiner models used the same vae object, and this vae conflicted when loading the graphs.
So you can use two different vae objects to bypass the problem:

from time import perf_counter
import torch
from diffusers import (
    AutoPipelineForText2Image,
    AutoPipelineForImage2Image,
    AutoencoderTiny,
)

import oneflow as flow

from onediff.schedulers import EulerDiscreteScheduler
from onediffx import compile_pipe, load_pipe



queue = []

queue.extend(
    [
        {
            "prompt": "3/4 shot, candid photograph of a beautiful 30 year old redhead woman with messy dark hair, peacefully sleeping in her bed, night, dark, light from window, dark shadows, masterpiece, uhd, moody",
            "seed": 877866765,
        }
    ]
)


vae = AutoencoderTiny.from_pretrained(
    "madebyollin/taesdxl",
    use_safetensors=True,
    torch_dtype=torch.float16,
).to("cuda")

vae2 = AutoencoderTiny.from_pretrained(
    "madebyollin/taesdxl",
    use_safetensors=True,
    torch_dtype=torch.float16,
).to("cuda")

scheduler_base = EulerDiscreteScheduler.from_pretrained(
    "/share_nfs/hf_models/stable-diffusion-xl-base-1.0", subfolder="scheduler"
)
base = AutoPipelineForText2Image.from_pretrained(
    "/share_nfs/hf_models/stable-diffusion-xl-base-1.0",
    use_safetensors=True,
    scheduler=scheduler_base,
    torch_dtype=torch.float16,
    variant="fp16",
    vae=vae,
).to("cuda")

scheduler_refiner = EulerDiscreteScheduler.from_pretrained(
    "/share_nfs/hf_models/stable-diffusion-xl-refiner-1.0", subfolder="scheduler"
)
refiner = AutoPipelineForImage2Image.from_pretrained(
    "/share_nfs/hf_models/stable-diffusion-xl-refiner-1.0",
    use_safetensors=True,
    torch_dtype=torch.float16,
    scheduler=scheduler_refiner,
    variant="fp16",
    vae=vae2,
).to("cuda")

base = compile_pipe(base)
refiner = compile_pipe(refiner)

print("Loading pipes")
load_pipe(base, "base_graph")
print("Loaded base pipe")
load_pipe(refiner, "refiner_graph")


with flow.autocast("cuda"):
    for generation in queue:
        image = base(generation["prompt"], output_type="latent").images
        refiner(generation["prompt"], image=image)

A more general solution will be added to OneDiff and will be transparent to users.
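The shared-object failure mode can be sketched in plain Python. This is an illustrative analogy only, not onediff or oneflow code: `GraphCache` and its names are hypothetical, and its `assert` merely mimics the `assert graph is None` check visible in the traceback.

```python
# Illustrative sketch of the shared-submodule conflict (hypothetical names,
# not onediff/oneflow internals).

class GraphCache:
    """Mimics a per-module graph cache that refuses to be loaded twice."""
    def __init__(self):
        self.graph = None

    def load(self, state):
        # Analogous to the `assert graph is None` seen in the traceback:
        # loading into a cache that already holds a graph fails.
        assert self.graph is None
        self.graph = state

shared_vae = GraphCache()                      # one vae shared by both pipes
base_parts = {"unet": GraphCache(), "vae": shared_vae}
refiner_parts = {"unet": GraphCache(), "vae": shared_vae}

for part in base_parts.values():
    part.load("base_state")                    # ok; shared_vae now holds a graph

conflict = False
try:
    for part in refiner_parts.values():
        part.load("refiner_state")             # shared_vae raises AssertionError
except AssertionError:
    conflict = True

print("conflict on shared vae:", conflict)
```

Giving each pipeline its own vae, as the workaround above does, means no cache is asked to load twice, so both load_pipe calls succeed.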

@SuperSecureHuman
Author

Understood.

This works for me now.

You may close the issue now if you like :)
