
MPS support for doggettx-optimizations #431

Closed · Any-Winter-4079 opened this issue Sep 7, 2022 · 342 comments

@Any-Winter-4079
Contributor

Any-Winter-4079 commented Sep 7, 2022

Okay, so I've seen @lstein has added
x = x.contiguous() if x.device.type == 'mps' else x
to ldm/modules/attention.py in the doggettx-optimizations branch,
but there's another error happening now:
KeyError: 'active_bytes.all.current'
and it has to do with this function in attention.py:

def forward(self, x, context=None, mask=None):
        h = self.heads

        q_in = self.to_q(x)
        context = default(context, x)
        k_in = self.to_k(context)
        v_in = self.to_v(context)
        del context, x

        q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> (b h) n d', h=h), (q_in, k_in, v_in))
        del q_in, k_in, v_in

        r1 = torch.zeros(q.shape[0], q.shape[1], v.shape[2], device=q.device)

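        # Note: this is the block that breaks on MPS. Without a CUDA device,
        # torch.cuda.memory_stats() has no 'active_bytes.all.current' entry,
        # hence the KeyError above.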
        stats = torch.cuda.memory_stats(q.device)
        mem_active = stats['active_bytes.all.current']
        mem_reserved = stats['reserved_bytes.all.current']
        mem_free_cuda, _ = torch.cuda.mem_get_info(torch.cuda.current_device())
        mem_free_torch = mem_reserved - mem_active
        mem_free_total = mem_free_cuda + mem_free_torch

        gb = 1024 ** 3
        tensor_size = q.shape[0] * q.shape[1] * k.shape[1] * 4
        mem_required = tensor_size * 2.5
        steps = 1

        if mem_required > mem_free_total:
            steps = 2**(math.ceil(math.log(mem_required / mem_free_total, 2)))
            # print(f"Expected tensor size:{tensor_size/gb:0.1f}GB, cuda free:{mem_free_cuda/gb:0.1f}GB "
            #       f"torch free:{mem_free_torch/gb:0.1f} total:{mem_free_total/gb:0.1f} steps:{steps}")

        if steps > 64:
            max_res = math.floor(math.sqrt(math.sqrt(mem_free_total / 2.5)) / 8) * 64
            raise RuntimeError(f'Not enough memory, use lower resolution (max approx. {max_res}x{max_res}). '
                               f'Need: {mem_required/64/gb:0.1f}GB free, Have:{mem_free_total/gb:0.1f}GB free')

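        # Slice the softmax along the token dimension so that only
        # slice_size rows of the attention matrix are materialized at once.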
        slice_size = q.shape[1] // steps if (q.shape[1] % steps) == 0 else q.shape[1]
        for i in range(0, q.shape[1], slice_size):
            end = i + slice_size
            s1 = einsum('b i d, b j d -> b i j', q[:, i:end], k) * self.scale

            s2 = s1.softmax(dim=-1)
            del s1

            r1[:, i:end] = einsum('b i j, b j d -> b i d', s2, v)
            del s2

        del q, k, v

        r2 = rearrange(r1, '(b h) n d -> b n (h d)', h=h)
        del r1

        return self.to_out(r2)

This is essentially the code that detects your free memory and then splits the softmax operation into steps, to allow generating larger images.

Now, because we are on Mac, I'm not sure @lstein can help us much (unless he has one around), but I'm opening this issue for anyone who wants to collaborate on porting this functionality to M1.

@Any-Winter-4079 Any-Winter-4079 changed the title MPS support for doggettx-optimizations branch https://github.com/lstein/stable-diffusion/issues/364 MPS support for doggettx-optimizations Sep 7, 2022
@lstein
Collaborator

lstein commented Sep 8, 2022

If someone knows how to get the free VRAM on MPS devices, we just need to replace the torch.cuda calls.

@lstein
Collaborator

lstein commented Sep 8, 2022

I Googled around, and there doesn't seem to be an equivalent set of memory-interrogation calls for CPU.

I'm not sure how the M1 works, but if it shares main memory (i.e. RAM), you might be able to get the needed metrics using psutil.

@Vargol
Contributor

Vargol commented Sep 8, 2022

I just hacked it all out in my fork and set slice_size to 1 :-) That gets me doing 1024x1024 (very slowly) on an 8GB M1 mini.
It'd be interesting to see the results of that on a larger GPU.

@Any-Winter-4079
Contributor Author

I just hacked it all out in my fork and set slice_size to 1 :-) That gets me doing 1024x1024 (very slowly) on an 8GB M1 mini.
It'd be interesting to see the results of that on a larger GPU.

It definitely works. I'll add the results below.
In this v1,
I've changed ldm/modules/diffusionmodules/model.py and ldm/modules/attention.py from the doggettx-optimizations branch.
attention.py.zip
model.py.zip

In model.py I've commented out

# stats = torch.cuda.memory_stats(q.device)
# mem_active = stats['active_bytes.all.current']
# mem_reserved = stats['reserved_bytes.all.current']
# mem_free_cuda, _ = torch.cuda.mem_get_info(torch.cuda.current_device())
# mem_free_torch = mem_reserved - mem_active
# mem_free_total = mem_free_cuda + mem_free_torch

and left steps at 1

tensor_size = q.shape[0] * q.shape[1] * k.shape[2] * 4
mem_required = tensor_size * 2.5
steps = 1

And commented out

# if mem_required > mem_free_total:
#     steps = 2**(math.ceil(math.log(mem_required / mem_free_total, 2)))

so there's probably an improvement to be made using psutil here.

Where I did use psutil is in attention.py.
Again, commenting this out:

# stats = torch.cuda.memory_stats(q.device)
# mem_active = stats['active_bytes.all.current']
# mem_reserved = stats['reserved_bytes.all.current']
# mem_free_cuda, _ = torch.cuda.mem_get_info(torch.cuda.current_device())
# mem_free_torch = mem_reserved - mem_active
# mem_free_total = mem_free_cuda + mem_free_torch

but importing psutil
import psutil
and using
mem_free_total = psutil.virtual_memory().available
so we can use it to calculate steps (the same way they do), instead of leaving it at steps = 1:

gb = 1024 ** 3
tensor_size = q.shape[0] * q.shape[1] * k.shape[1] * 4
mem_required = tensor_size * 2.5
steps = 1

if mem_required > mem_free_total:
        steps = 2**(math.ceil(math.log(mem_required / mem_free_total, 2)))
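Putting the pieces together, a device-conditional version of the memory check might look like the sketch below (my own summary of the approach in this thread; available_memory is an illustrative helper name, not code from the branch):

import psutil
import torch

def available_memory(device: torch.device) -> int:
    # Bytes of memory we can plausibly use for the attention buffers.
    if device.type == 'cuda':
        # Original Doggettx logic: free driver memory plus memory that
        # torch has reserved but is not actively using.
        stats = torch.cuda.memory_stats(device)
        mem_active = stats['active_bytes.all.current']
        mem_reserved = stats['reserved_bytes.all.current']
        mem_free_cuda, _ = torch.cuda.mem_get_info(torch.cuda.current_device())
        return mem_free_cuda + (mem_reserved - mem_active)
    # MPS shares unified memory with the CPU, so available system RAM
    # is the closest proxy we have.
    return psutil.virtual_memory().available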

@Any-Winter-4079
Contributor Author

Definitely a step in the right direction
[screenshot of results]

@lstein
Collaborator

lstein commented Sep 8, 2022

This is looking pretty encouraging. When you are satisfied with the performance on MPS, could you make your changes conditional on the device type so that CUDA systems will work as well? Then make a PR against the doggettx-optimizations branch.

Think this might be done by tonight? I'm planning a development freeze, some testing, and then pulling into main over the weekend.

@Any-Winter-4079
Contributor Author

Any-Winter-4079 commented Sep 8, 2022

That's the plan, yes: make the changes conditional based on device type.
I'm not sure about the PR tonight (b/c I've never done a pull request and it's already 00:00h here, so I might be a bit tired to look into how to make one tonight: fork, pull...), but I'll leave my code here in #431 in 1-1.5h max.

@Any-Winter-4079
Contributor Author

Okay, changes are done. I'm doing the testing.

@Any-Winter-4079
Contributor Author

attention 3.py.zip
model 2.py.zip

@Any-Winter-4079
Contributor Author

Any-Winter-4079 commented Sep 8, 2022

@lstein Above are the files.
ldm/modules/diffusionmodules/model.py and ldm/modules/attention.py

Performance seems comparable to CompVis vanilla with some M1 workarounds, dbc8fc79008795875eb22ebf0c57927061af86bc (lstein fork), which is the best performance I've seen on M1.

Regarding memory, I have to do more digging, because while this afternoon I could generate 896x896 and 1024x768 (results I couldn't get before), tonight I'm back to memory errors.

In any case, this change should benefit CUDA users while allowing MPS devices to (apparently/presumably/hopefully) perform at least as well as we currently do on the development branch.

@Any-Winter-4079
Contributor Author

Any-Winter-4079 commented Sep 9, 2022

In the end, still awake :)
And I've found something very interesting for version 2 of these changes (post merge into development)
https://pullanswer.com/questions/mps-mpsndarray-error-product-of-dimension-sizes-2-31

The common error I think all M1 users get, Error: product of dimension sizes > 2**31, is referenced here:
[screenshot]
which made me think about tinkering with values, and setting steps = 64 (the max value), it generates a 1024x1024 on my M1!
However, if we compute it the current way in the doggettx-optimizations branch, steps = 2**(math.ceil(math.log(mem_required / mem_free_total, 2))), it fails.

It takes a long time with steps = 64, but testing around, it also works with steps = 32, and even steps = 4 (taking much less time).

Pretty nice, and it calls for some testing tomorrow.

PS: I'd just merge the 2 files above and leave this "finding" for a future PR.

@heurihermilab

I can confirm that it's working and an improvement. Speedup was 2x over the plain development branch, and now I'm testing larger image sizes...

Environment: Development branch with the two files above swapped in. Machine: MBP 14", M1 Pro, 16GB, latest OS, running miniforge with a base of Python 3.10.6. Browser: Firefox 104.0.2.

@Vargol
Contributor

Vargol commented Sep 9, 2022

which made me think about tinkering with values, and setting steps = 64 (the max value), it generates a 1024x1024 on my M1! However, if we compute it the current way in the doggettx-optimizations branch, steps = 2**(math.ceil(math.log(mem_required / mem_free_total, 2))), it fails.

That's odd, I've managed a very slow 1024x1024 from doggettx's optimizations on my 8GB M1.
Here's the code I'm using; as I said before, it's a cut of lstein's fork but with the Doggettx
code added and hard-coded to 1 step.

https://github.com/Vargol/stable-diffusion_m1_8gb

Have you got a lot of other stuff running at the same time eating up memory?

@Any-Winter-4079
Contributor Author

This is my memory usage right after booting the computer, with only the model loaded + 1 VS Code tab with the code.
[screenshot of Activity Monitor]
So, I introduced 2 prints:

print('mem_required', mem_required / 10**9)
print('mem_free_total', mem_free_total / 10**9)

and I get

mem_required 25.17630976
mem_free_total 51.890225152

The mem_free_total is calculated with mem_free_total = psutil.virtual_memory().available, and it makes sense with the Activity Monitor picture above: (64 - 13) GB = 51 GB.
However, it crashes with Error: product of dimension sizes > 2**31 even though, in theory, it only needs 25GB. Which doesn't make sense, does it?

In this discussion https://pullanswer.com/questions/mps-mpsndarray-error-product-of-dimension-sizes-2-31 they were saying that the problem was with Metal, and that depending on the size/number of dimensions of the operation (e.g. einsum), a different algorithm might get selected.

So maybe you give it a smaller array and it fails, but feed it a bigger array and it chooses a different algorithm that doesn't have the Error: product of dimension sizes > 2**31 bug, and it works. That's my understanding.

@Any-Winter-4079
Contributor Author

Any-Winter-4079 commented Sep 9, 2022

For example, setting the steps as you do (with a fixed value instead of calculating it), and running "banana sushi" -s50 -C7.5 -n3 -W896 -H896, here are my results:

With steps = 1

mem_required 25.17630976
mem_free_total 53.156200448

Result: Error: product of dimension sizes > 2**31

With steps = 2

mem_required 25.17630976
mem_free_total 52.520927232

It works.

Why? It's really not apparent to me.


The problem with hard-setting the steps, though, is that as the code progresses, mem_free_total shrinks (note: leak or expected behavior?).

mem_required 25.17630976
mem_free_total 32.847593472

So there could come a point where it fails mid-execution, because we hard-set steps = 2 and that's no longer enough.

The solution I'm thinking of is a mix of both techniques: set the steps dynamically (so it doesn't run out of memory), but also force steps = max(2, steps) for images larger than 512x512 or something like that, never letting it drop to steps = 1, where it throws the error.
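As a sketch, that mix might look like this (the 512x512 cutoff here is a placeholder, not a tested value):

import math

def compute_steps(mem_required, mem_free_total, is_mps, width, height):
    steps = 1
    if mem_required > mem_free_total:
        steps = 2 ** math.ceil(math.log2(mem_required / mem_free_total))
    # On MPS, never run large images in a single step: steps = 1 picks a
    # slice big enough to trigger 'product of dimension sizes > 2**31'.
    if is_mps and width * height > 512 * 512:  # placeholder threshold
        steps = max(2, steps)
    return steps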

@Vargol
Contributor

Vargol commented Sep 9, 2022

The 2**31 seems to be einsum trying to use a tensor with more than 2,147,483,648 values as part of its calculation;
it's not a memory thing, just a bug or limitation in the einsum implementation somewhere.

I remember having a similar issue when I simply set steps to 1 but allowed the slice_size calculation to go ahead

slice_size = q.shape[1] // steps if (q.shape[1] % steps) == 0 else q.shape[1]

which wasn't in an older cut of the code.

My code does this; is that what you tried?

        steps=1

        for i in range(0, q.shape[0], steps):
            end = i + steps 


And yes, I appreciate that if people try even bigger images they may run out of memory, but for me more steps just means slower renders, and 1024x1024 is already 50 s/it; with n_sample=50, n_iter=1 it takes 40-odd minutes to generate an image.

@lstein
Collaborator

lstein commented Sep 9, 2022

attention 3.py.zip model 2.py.zip

Looks like I was monitoring the wrong thread! I'll fold in these changes this morning and freeze development for testing. Thanks so much for this.

@Any-Winter-4079
Contributor Author

Any-Winter-4079 commented Sep 9, 2022

@Vargol I can take slice_size up to ~10k. More than that, it seems to fail.
[screenshot]

Have you tried a bigger slice_size? You are basically using 1 as the slice_size, which seems very low (even for 8GB).

I tried something very similar to your code, simply with a larger slice_size, in my case using the formula they have, slice_size = q.shape[1] // steps if (q.shape[1] % steps) == 0 else q.shape[1], which depends on steps.

For example, for "banana sushi" -s5 -C7.5 -n3 -W896 -H896, q.shape = torch.Size([16, 12544, 40]), so q.shape[1] is 12544, and given that my mem_free_total >= mem_required, it keeps steps = 1, so the formula above keeps slice_size = 12544 and fails.

We should be able to find a sweet spot, shouldn't we?

@Any-Winter-4079
Contributor Author

Any-Winter-4079 commented Sep 9, 2022

For example,
slice_size = 8192 (or below) allows me to run 896x896
slice_size = 6000 (or below) allows me to run 1024x1024

Hopefully there's some formula we can come up with for all M1 machines (8GB to 128GB)


Update:
So for 1024x1024, the doggettx-optimizations branch suggests I use slice_size = 8192 (which fails),
but manually hard-setting slice_size = 8185 works.
So their calculation is not far off, but not completely precise for M1.

@Vargol
Contributor

Vargol commented Sep 9, 2022

So with my fixed value steps = slice_size, running
dream> "banana sushi" -s1 -C7.5 -n1 -W832 -H768

1-6 steps work; 6 steps are over 5x slower than 1 step.

step = slice_size = 1
 1/1 [00:20<00:00, 20.45s/it

step = slice_size = 6
1/1 [01:50<00:00, 110.50s/it]

7 - 10 steps blow memory while sampling,

step = slice_size = 7 - 10
	The Metal Performance Shaders operations encoded on it may not have completed.
	Error: 
	(null)
	Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
	<AGXG13GFamilyCommandBuffer: 0x16d1049d0>

steps >= 11 fail with an (oversized?) buffer error before sampling, which shows up in dream.py as:

step = slice_size = 11
RuntimeError: Invalid buffer size: 5.94 GB

@Any-Winter-4079
Contributor Author

@Vargol Hmm, I'll study your case too.
The weirdest thing happened to me: the Doggettx branch suggested a slice size of 8192. Guess what? It failed. But 8191 works for 1024x1024.

@Any-Winter-4079
Contributor Author

Any-Winter-4079 commented Sep 9, 2022

Oh, this is interesting. So my computer can take slice_size = 8191 for 1024x1024. Found by trial and error.

Okay, so what slice_size could it take for 896x896? Well, I did ((1024x1024) / (896x896)) * 8191 = 10698.4489796, which rounded up is 10699. I tried that value and... it works!

But, I tried 10700 (one more) and it fails!

I'm sure there is a formula to be found (including RAM), but at least we seem to be able to hack the max slice_size for our own devices, which is awesome!


Update: So, I picked a random size. I wanted a 3200x1600 image. I used the formula and got slice_size = 1677.5168. This time I could not round up, but rounding down to 1677, it works again! (I only ran 1 step, but hey, it completed successfully.)
And 1678 fails.
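For what it's worth, one formula consistent with all three of these limits (8191 at 1024x1024, 10699 at 896x896, 1677 at 3200x1600) is a hard cap of 2**31 elements on the sliced attention matrix, whose shape here is (batch*heads, slice_size, (W/8)*(H/8)) with batch*heads = 16. A back-of-the-envelope check, not something confirmed in the thread:

def max_slice_size(width, height, batch_heads=16):
    # One latent token per 8x8 pixel block.
    tokens = (width // 8) * (height // 8)
    # Largest slice keeping batch_heads * slice * tokens below 2**31.
    return (2**31 - 1) // (batch_heads * tokens)

print(max_slice_size(1024, 1024))  # 8191
print(max_slice_size(896, 896))    # 10699
print(max_slice_size(3200, 1600))  # 1677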

@Any-Winter-4079
Contributor Author

Any-Winter-4079 commented Sep 9, 2022

I'll study it a bit more, but the problem with Doggettx (besides 8192 vs 8191) is that sometimes it suggests an even larger slice_size (I guess when it computes steps = 1 instead of steps = 2 based on memory), and then it breaks. If it weren't for that, maybe we could have used Doggettx's slice_size - 1.

@Any-Winter-4079
Contributor Author

@i3oc9i can you try your max slice_size for, say, 1024x1024 on your Mac with 128GB? We might be able to work out a formula including the RAM. Or someone else with a Mac other than 64GB (which is what I have).

@i3oc9i

i3oc9i commented Sep 9, 2022

@Any-Winter-4079
sorry, these last two days I was busy at work. I just checked out the development branch 75f633c

and it fails. Maybe I'm missing something? Where is the new code to test?

dream> "a don on the moon" -s50 -W1024 -H1024 -C7.5 -Ak_lms
>> This input is larger than your defaults. If you run out of memory, please use a smaller image.
/Users/ivano/Code/Ai/dream-dev/ldm/modules/embedding_manager.py:152: UserWarning: The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at  /Users/runner/work/_temp/anaconda/conda-bld/pytorch_1659484612588/work/aten/src/ATen/mps/MPSFallback.mm:11.)
  placeholder_idx = torch.where(
Generating:   0%|                                                                                                                                                                                                                                                                                                                          | 0/1 [00:00<?, ?it/s/AppleInternal/Library/BuildRoots/20d6c351-ee94-11ec-bcaf-7247572f23b4/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:705: failed assertion `[MPSNDArray initWithDevice:descriptor:] Error: product of dimension sizes > 2**31'                                                                  | 0/50 [00:00<?, ?it/s]
zsh: abort      python scripts/dream.py --full_precision --outdir ../@Stuffs/images/samples
/Users/ivano/.miniconda/envs/dream-dev/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

dream> "test" -s50 -W832 -H832 -C7.5 -Ak_lms -S12345678 (run but I get noise)

@ryudrigo

ryudrigo commented Sep 9, 2022

Could someone with a Mac please run these lines?

import torch
print (torch.cuda.get_device_name(0))

That's the best way I know to detect if it's a Mac GPU, but I couldn't find what to check it against. Thanks!

@Any-Winter-4079
Contributor Author

Any-Winter-4079 commented Sep 9, 2022

@ryudrigo

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/eduardoarinopelegrin/opt/anaconda3/envs/do_not_touch-osx-arm64-stable-diffusion/lib/python3.9/site-packages/torch/cuda/__init__.py", line 329, in get_device_name
    return get_device_properties(device).name
  File "/Users/eduardoarinopelegrin/opt/anaconda3/envs/do_not_touch-osx-arm64-stable-diffusion/lib/python3.9/site-packages/torch/cuda/__init__.py", line 359, in get_device_properties
    _lazy_init()  # will define _get_device_properties
  File "/Users/eduardoarinopelegrin/opt/anaconda3/envs/do_not_touch-osx-arm64-stable-diffusion/lib/python3.9/site-packages/torch/cuda/__init__.py", line 211, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

Basically, any torch.cuda call is going to fail because we don't have CUDA.
You can detect it like this: device_type = 'mps' if x.device.type == 'mps' else 'cuda'. Here you have 2 files that check for Mac GPU: #431 (comment)
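For reference, recent PyTorch builds (1.12+) also expose a direct capability check, so you don't have to poke torch.cuda at all; a minimal sketch:

import torch

# Prefer MPS when available, then CUDA, then fall back to CPU.
if torch.backends.mps.is_available():
    device = torch.device('mps')
elif torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')
print(device)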

@Doggettx

Doggettx commented Sep 9, 2022

I'll study it a bit more, but the problem with Doggettx (besides 8192 vs 8191) is that sometimes it suggests an even larger slice_size (I guess when it computes steps = 1 instead of steps = 2 based on memory), and then it breaks. If it weren't for that, maybe we could have used Doggettx's slice_size - 1

I wouldn't adjust the slice_size, because then it starts running incomplete parts of the whole array. It's best to increase the multiplier, which is probably too low then. So this part:

mem_required = tensor_size * 2.5

probably needs more than .5 extra; you could try 2.6, or if you want to be safe just put it at 3. It'll just scale up the steps a bit earlier than needed, which scales down the slice_size.

On a side note, it doesn't really have to step up in powers of 2, I just found that that was faster on average. You could change this part:

    slice_size = q.shape[1] // steps if (q.shape[1] % steps) == 0 else q.shape[1]
    for i in range(0, q.shape[1], slice_size):
        end = i + slice_size

To something like

    slice_size = q.shape[1] // steps
    for i in range(0, q.shape[1], slice_size):
        end = min(q.shape[1], i + slice_size)

then it can run at any step or slice_size (even higher than 64 steps, but then you'll crash later anyhow due to other parts running out of memory)

@netsvetaev
Contributor

@netsvetaev do you have other stuff running?

I would have expected a chunk more available, 8-9GB, based on what my 8GB has available at that point.

Sorry, I had a Safari window open. Now 6.99GB and 1.30s/it. But 768 is very slow/doesn't work.

@Any-Winter-4079
Contributor Author

Any-Winter-4079 commented Sep 13, 2022

As a new threshold, I propose maybe 3GB?
Vargol had a peak of 2.08GB
The other option is to use psutil.virtual_memory().total instead of psutil.virtual_memory().available

@Any-Winter-4079
Contributor Author

Any-Winter-4079 commented Sep 13, 2022

@netsvetaev can you run 1024x1024? Does it take 10 minutes? 10 minutes is what the other person with a 16GB Mac reported.
Also, this should produce the same results, but I have hopefully fixed the 512x512 issue using mem_total, so you get around 1.3s/it.
attention 12.py.zip

@netsvetaev
Contributor

netsvetaev commented Sep 13, 2022

@netsvetaev can you run 1024x1024? Does it take 10 minutes?

It takes 30 minutes at 35-37s/it. Hm.
I noticed that after I close the terminal and open it again, it shows more RAM available, +200-500MB.

768 gives an error, then goes at 150s/it.

Error: command buffer exited with error status. | 0/50 [00:00<?, ?it/s]
The Metal Performance Shaders operations encoded on it may not have completed.
Error: (null)
Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
<AGXG13XFamilyCommandBuffer: 0x288c234a0> label = <none> device = <AGXG13XDevice: 0x13f646400> name = Apple M1 Pro
commandQueue = <AGXG13XFamilyCommandQueue: 0x12a34fe00> label = <none> device = <AGXG13XDevice: 0x13f646400> name = Apple M1 Pro
retainedReferences = 1

@Any-Winter-4079
Contributor Author

Some advice: every time you run a dream command and it completes, exit with q, because at least a few days ago there were some memory leaks. Then you can re-enter and have a bit more RAM.

@Any-Winter-4079
Contributor Author

In any case, have you been able to run 1024x1024 with better results with some other code?

@netsvetaev
Contributor

netsvetaev commented Sep 13, 2022

In any case, have you been able to run 1024x1024 with better results with some other code?

No, my best was around 35-40, no difference with your code. Maybe 1-2s.

@Any-Winter-4079
Contributor Author

You know, it might be because you have a bit less RAM available than the other person with 16GB. I can't find any other reason. Architecture? Pytorch version?

@Any-Winter-4079
Contributor Author

Any-Winter-4079 commented Sep 13, 2022

As a final test, do you get better results for large images with slice_size = 2 instead of slice_size = 1? It might break, though.
Here:
[screenshot]

@netsvetaev
Contributor

You know, it might be because you have a bit less RAM available than the other person with 16GB. I can't find any other reason. Architecture? Pytorch version?

MacOS 13 beta, I think.

@Any-Winter-4079
Contributor Author

Any-Winter-4079 commented Sep 13, 2022

You know, it might be because you have a bit less RAM available than the other person with 16GB. I can't find any other reason. Architecture? Pytorch version?

MacOS 13 beta, I think.

I'm running 12.5.1. No idea if there is any performance improvement/loss with the beta. I'd assume these results are more RAM-dependent than OS-version-dependent, but who knows :)

@Any-Winter-4079
Contributor Author

Any-Winter-4079 commented Sep 13, 2022

This is the version I'm planning on doing a PR with: https://github.com/lstein/stable-diffusion/discussions/457#discussioncomment-3635644. If someone experiences a downgrade in performance vs. before, let me know.

@netsvetaev
Contributor

attention 12.py.zip

Getting better:
512 — 1.6 (from 1.8 to 1.2)
768 — 9.8, was 13
1024 — 35.

@Any-Winter-4079
Contributor Author

Oh, so happy to read that!

@i3oc9i

i3oc9i commented Sep 13, 2022

attention 12.py.zip

Sorry, but I don't see a speed difference with this.
I get the same execution time for 512x512 and 896x576.

@Any-Winter-4079
Contributor Author

Any-Winter-4079 commented Sep 13, 2022

Not for 64-128GB. The update is to give more speed on 16-32GB Macs.
For us it should be the same speed.
I've done the PR: #540

@netsvetaev
Contributor

netsvetaev commented Sep 13, 2022

Oh, so happy to read that!

Is it ok that I have better results on the main branch? It seems less RAM-hungry and also faster: 512 at 1.48 (yours is 1.35), 1024 at 29.7 with up to 4.2GB of swap.

Birch-san added a commit to Birch-san/stable-diffusion that referenced this issue Sep 14, 2022
…aboration incorporating a lot of people's contributions -- including for example @Doggettx and the original code from @neonsecret on which the Doggetx optimizations were based (see invoke-ai/InvokeAI#431, https://github.com/sd-webui/stable-diffusion-webui/pull/771#issuecomment-1239716055). Takes exactly the same amount of time to run 8 steps as original CompVis code does (10.4 secs, ~1.25s/it).
@netsvetaev
Contributor

netsvetaev commented Sep 14, 2022

I'm happy to add that on the latest macOS 13 beta, 1.14 main got faster: 1.15s/it on 512px (58s total, was always 1:15-1:25), 8.4s/it on 768 (7:18, was 10-12 mins), and still 35s/it on 1024.

UPD. After a fresh install I've got 1.38s/it at 512, 6.20s/it at 768 and 20.5s/it at 1024. So it was my problem.

codedealer pushed a commit to codedealer/stable-diffusion-webui that referenced this issue Sep 16, 2022
hlky pushed a commit to Sygil-Dev/sygil-webui that referenced this issue Sep 16, 2022
hlky added a commit to Sygil-Dev/sygil-webui that referenced this issue Sep 18, 2022
@lstein
Collaborator

lstein commented Oct 11, 2022 via email

@Any-Winter-4079
Contributor Author

@lstein There was a later update to attention.py in #582.
These are the two major PRs, I believe.

austinbrown34 pushed a commit to cognidesign/InvokeAI that referenced this issue Dec 30, 2022
Performance improvements to generate larger images in M1 invoke-ai#431

Update attention.py

Added dtype=r1.dtype to softmax
austinbrown34 pushed a commit to cognidesign/InvokeAI that referenced this issue Dec 30, 2022

commit 16f6a67
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 12:47:26 2022 -0400

    install GFPGAN inside SD repository in order to fix 'dark cast' issue invoke-ai#169

commit 0881d42
Author: blessedcoolant <54517381+blessedcoolant@users.noreply.github.com>
Date:   Mon Sep 12 03:52:43 2022 +1200

    Docs Update (invoke-ai#466)

    Authored-by: @blessedcoolant
    Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>

commit 9a29d44
Author: Gérald LONLAS <gerald@lonlas.com>
Date:   Sun Sep 11 23:23:18 2022 +0800

    Revert "Add 3x Upscale option on the Web UI (invoke-ai#442)" (invoke-ai#488)

    This reverts commit f8a5408.

commit d301836
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 10:52:19 2022 -0400

    can select prior output for init_img using -1, -2, etc

commit 70aa674
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 10:34:06 2022 -0400

    merge PR invoke-ai#495 - keep using float16 in ldm.modules.attention

commit 8748370
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 10:22:32 2022 -0400

    negative -S indexing recovers correct previous seed; closes issue invoke-ai#476

commit 839e30e
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 11 10:02:44 2022 -0400

    improve CUDA VRAM monitoring

    extra check that device==cuda before getting VRAM stats
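
A sketch of what such a guard could look like; the non-CUDA fallback value below is an invented placeholder, not the repository's actual logic:

import torch

def free_memory_estimate(device: torch.device) -> int:
    # Only CUDA exposes these allocator statistics, so query them
    # exclusively when the device really is CUDA.
    if device.type == 'cuda':
        stats = torch.cuda.memory_stats(device)
        mem_active = stats['active_bytes.all.current']
        mem_reserved = stats['reserved_bytes.all.current']
        mem_free_cuda, _ = torch.cuda.mem_get_info(torch.cuda.current_device())
        return mem_free_cuda + (mem_reserved - mem_active)
    return 4 * 1024 ** 3  # assumed fixed budget for MPS/CPU devices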

commit bfb2781
Author: tildebyte <337875+tildebyte@users.noreply.github.com>
Date:   Sat Sep 10 10:15:56 2022 -0400

    fix(readme): add note about updating env via conda (invoke-ai#475)

commit 5c43988
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sat Sep 10 10:02:43 2022 -0400

    reduce VRAM memory usage by half during model loading

    * This moves the call to half() before model.to(device) to avoid GPU
    copy of full model. Improves speed and reduces memory usage dramatically

    * This fix contributed by @mh-dm (Mihai)
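
A minimal sketch of the reordering, where nn.Linear stands in for the full Stable Diffusion model:

import torch
from torch import nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = nn.Linear(4, 4)  # stand-in for the full model

# model = model.to(device).half()  # before: fp32 weights cross to the GPU, then get halved
model = model.half().to(device)    # after: only the fp16 copy crosses over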

commit 9912270
Merge: 817c4a2 ecc6b75
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sat Sep 10 09:54:34 2022 -0400

    Merge branch 'development' of github.com:lstein/stable-diffusion into development

commit 817c4a2
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sat Sep 10 09:53:27 2022 -0400

    remove -F option from normalized prompt; closes invoke-ai#483

commit ecc6b75
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sat Sep 10 09:53:27 2022 -0400

    remove -F option from normalized prompt

commit 723d074
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Fri Sep 9 18:49:51 2022 -0400

    Allow ctrl c when using --from_file (invoke-ai#472)

    * added ansi escapes to highlight key parts of CLI session

    * adjust exception handling so that ^C will abort when reading prompts from a file

commit 75f633c
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Fri Sep 9 12:03:45 2022 -0400

    re-add new logo

commit 10db192
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Fri Sep 9 09:26:10 2022 -0400

    changes to dogettx optimizations to run on m1
    * Author @Any-Winter-4079
    * Author @dogettx
    Thanks to many individuals who contributed time and hardware to
    benchmarking and debugging these changes.

commit c85ae00
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Thu Sep 8 23:57:45 2022 -0400

    fix bug which caused seed to get "stuck" on previous image even when UI specified -1

commit 1b5aae3
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Thu Sep 8 22:36:47 2022 -0400

    add icon to dream web server

commit 6abf739
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Thu Sep 8 22:25:09 2022 -0400

    add favicon to web server

commit db825b8
Merge: 33874ba afee7f9
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Thu Sep 8 22:17:37 2022 -0400

    Merge branch 'deNULL-development' into development

commit 33874ba
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Thu Sep 8 22:16:29 2022 -0400

    Squashed commit of the following:

    commit afee7f9
    Merge: 6531446 171f8db
    Author: Lincoln Stein <lincoln.stein@gmail.com>
    Date:   Thu Sep 8 22:14:32 2022 -0400

        Merge branch 'development' of github.com:deNULL/stable-diffusion into deNULL-development

    commit 171f8db
    Author: Denis Olshin <me@denull.ru>
    Date:   Thu Sep 8 03:15:20 2022 +0300

        saving full prompt to metadata when using web ui

    commit d7e67b6
    Author: Denis Olshin <me@denull.ru>
    Date:   Thu Sep 8 01:51:47 2022 +0300

        better logic for clicking to make variations

commit afee7f9
Merge: 6531446 171f8db
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Thu Sep 8 22:14:32 2022 -0400

    Merge branch 'development' of github.com:deNULL/stable-diffusion into deNULL-development

commit 6531446
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Thu Sep 8 20:41:37 2022 -0400

    work around unexplained crash when timesteps=1000 (invoke-ai#440)

    * work around unexplained crash when timesteps=1000

    * this fix seems to work

commit c33a84c
Author: blessedcoolant <54517381+blessedcoolant@users.noreply.github.com>
Date:   Fri Sep 9 12:39:51 2022 +1200

    Add New Logo (invoke-ai#454)

    * Add instructions on how to install alongside pyenv (invoke-ai#393)

    Like probably many others, I have a lot of different virtualenvs, one for each project. Most of them are handled by `pyenv`.
    After installing according to these instructions I had issues with `pyenv` and `miniconda` fighting over the $PATH of my system.
    But then I stumbled upon this nice solution on SO: https://stackoverflow.com/a/73139031, upon which I have based my suggested changes.

    It runs perfectly on my M1 setup, with the anaconda setup as a virtual environment handled by pyenv.

    Feel free to incorporate these instructions as you see fit.

    Thanks a million for all your hard work.

    * Disabled debug output (invoke-ai#436)

    Co-authored-by: Henry van Megen <hvanmegen@gmail.com>

    * Add New Logo

    Co-authored-by: Håvard Gulldahl <havard@lurtgjort.no>
    Co-authored-by: Henry van Megen <h.vanmegen@gmail.com>
    Co-authored-by: Henry van Megen <hvanmegen@gmail.com>
    Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>

commit f8a5408
Author: Gérald LONLAS <gerald@lonlas.com>
Date:   Fri Sep 9 01:45:54 2022 +0800

    Add 3x Upscale option on the Web UI (invoke-ai#442)

commit 244239e
Author: James Reynolds <magnusviri@users.noreply.github.com>
Date:   Thu Sep 8 05:36:33 2022 -0600

    macOS CI workflow, dream.py exits with an error, but the workflow completes (invoke-ai#396)

    * macOS CI workflow, dream.py exits with an error, but the workflow completes.

    * Files for testing

    Co-authored-by: James Reynolds <magnsuviri@me.com>
    Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>

commit 711d49e
Author: James Reynolds <magnusviri@users.noreply.github.com>
Date:   Thu Sep 8 05:35:08 2022 -0600

    Cache model workflow (invoke-ai#394)

    * Add workflow that caches the model, step 1 for CI

    * Change name of workflow job

    Co-authored-by: James Reynolds <magnsuviri@me.com>
    Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>

commit 7996a30
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Thu Sep 8 07:34:03 2022 -0400

    add auto-creation of mask for inpainting (invoke-ai#438)

    * now use a single init image for both image and mask

    * turn on debugging for now to write out mask and image

    * add back -M option as a fallback

commit a69ca31
Author: elliotsayes <elliotsayes@gmail.com>
Date:   Thu Sep 8 15:30:06 2022 +1200

    .gitignore WebUI temp files (invoke-ai#430)

    * Add instructions on how to install alongside pyenv (invoke-ai#393)

    Like probably many others, I have a lot of different virtualenvs, one for each project. Most of them are handled by `pyenv`.
    After installing according to these instructions I had issues with `pyenv` and `miniconda` fighting over the $PATH of my system.
    But then I stumbled upon this nice solution on SO: https://stackoverflow.com/a/73139031, upon which I have based my suggested changes.

    It runs perfectly on my M1 setup, with the anaconda setup as a virtual environment handled by pyenv.

    Feel free to incorporate these instructions as you see fit.

    Thanks a million for all your hard work.

    * .gitignore WebUI temp files

    Co-authored-by: Håvard Gulldahl <havard@lurtgjort.no>

commit 5c6b612
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Wed Sep 7 22:50:55 2022 -0400

    fix bug that caused same seed to be redisplayed repeatedly

commit 56f155c
Author: Johan Roxendal <johan@roxendal.com>
Date:   Thu Sep 8 04:50:06 2022 +0200

    added support for parsing run log and displaying images in the frontend init state (invoke-ai#410)

    Co-authored-by: Johan Roxendal <johan.roxendal@litteraturbanken.se>
    Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>

commit 4168774
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Wed Sep 7 20:24:35 2022 -0400

    added missing initialization of latent_noise to None

commit 171f8db
Author: Denis Olshin <me@denull.ru>
Date:   Thu Sep 8 03:15:20 2022 +0300

    saving full prompt to metadata when using web ui

commit d7e67b6
Author: Denis Olshin <me@denull.ru>
Date:   Thu Sep 8 01:51:47 2022 +0300

    better logic for clicking to make variations

commit d1d044a
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Wed Sep 7 17:56:59 2022 -0400

    actual image seed now written into web log rather than -1 (invoke-ai#428)

commit edada04
Author: Arturo Mendivil <60411196+artmen1516@users.noreply.github.com>
Date:   Wed Sep 7 10:42:26 2022 -0700

    Improve notebook and add requirements file (invoke-ai#422)

commit 29ab3c2
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Wed Sep 7 13:28:11 2022 -0400

    disable neonpixel optimizations on M1 hardware (invoke-ai#414)

    * disable neonpixel optimizations on M1 hardware

    * fix typo that was causing random noise images on m1

commit 7670ecc
Author: cody <cnmizell@gmail.com>
Date:   Wed Sep 7 12:24:41 2022 -0500

    add more keyboard support on the web server (invoke-ai#391)

    add ability to submit prompts with the "enter" key
    add ability to cancel generations with the "escape" key

commit dd2aeda
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Wed Sep 7 13:23:53 2022 -0400

    report VRAM usage stats during initial model loading (invoke-ai#419)

commit f628477
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Tue Sep 6 17:12:39 2022 -0400

    Squashed commit of the following:

    commit 7d1344282d942a33dcecda4d5144fc154ec82915
    Merge: caf4ea3 ebeb556
    Author: Lincoln Stein <lincoln.stein@gmail.com>
    Date:   Mon Sep 5 10:07:27 2022 -0400

        Merge branch 'development' of github.com:WebDev9000/stable-diffusion into WebDev9000-development

    commit ebeb556
    Author: Web Dev 9000 <rirath@gmail.com>
    Date:   Sun Sep 4 18:05:15 2022 -0700

        Fixed unintentionally removed lines

    commit ff2c4b9
    Author: Web Dev 9000 <rirath@gmail.com>
    Date:   Sun Sep 4 17:50:13 2022 -0700

        Add ability to recreate variations via image click

    commit c012929
    Author: Web Dev 9000 <rirath@gmail.com>
    Date:   Sun Sep 4 14:35:33 2022 -0700

        Add files via upload

    commit 02a6018
    Author: Web Dev 9000 <rirath@gmail.com>
    Date:   Sun Sep 4 14:35:07 2022 -0700

        Add files via upload

commit eef7889
Author: Olivier Louvignes <olivier@mg-crea.com>
Date:   Tue Sep 6 12:41:08 2022 +0200

    feat(txt2img): allow from_file to work with len(lines) < batch_size (invoke-ai#349)

commit 720e5cd
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Mon Sep 5 20:40:10 2022 -0400

    Refactoring simplet2i (invoke-ai#387)

    * start refactoring -not yet functional

    * first phase of refactor done - not sure weighted prompts working

    * Second phase of refactoring. Everything mostly working.
    * The refactoring has moved all the hard-core inference work into
    ldm.dream.generator.*, where there are submodules for txt2img and
    img2img. inpaint will go in there as well.
    * Some additional refactoring will be done soon, but relatively
    minor work.

    * fix -save_orig flag to actually work

    * add @neonsecret attention.py memory optimization

    * remove unneeded imports

    * move token logging into conditioning.py

    * add placeholder version of inpaint; porting in progress

    * fix crash in img2img

    * inpainting working; not tested on variations

    * fix crashes in img2img

    * ported attention.py memory optimization invoke-ai#117 from basujindal branch

    * added @torch_no_grad() decorators to img2img, txt2img, inpaint closures

    * Final commit prior to PR against development
    * fixup crash when generating intermediate images in web UI
    * rename ldm.simplet2i to ldm.generate
    * add backward-compatibility simplet2i shell with deprecation warning

    * add back in mps exception, addresses @Vargol comment in #354

    * replaced Conditioning class with exported functions

    * fix wrong type of with_variations attribute during initialization

    * changed "image_iterator()" to "get_make_image()"

    * raise NotImplementedError for calling get_make_image() in parent class

    * Update ldm/generate.py

    better error message

    Co-authored-by: Kevin Gibbons <bakkot@gmail.com>

    * minor stylistic fixes and assertion checks from code review

    * moved get_noise() method into img2img class

    * break get_noise() into two methods, one for txt2img and the other for img2img

    * inpainting works on non-square images now

    * make get_noise() an abstract method in base class

    * much improved inpainting

    Co-authored-by: Kevin Gibbons <bakkot@gmail.com>

commit 1ad2a8e
Author: thealanle <35761977+thealanle@users.noreply.github.com>
Date:   Mon Sep 5 17:35:04 2022 -0700

    Fix --outdir function for web (invoke-ai#373)

    * Fix --outdir function for web

    * Removed unnecessary hardcoded path

commit 52d8bb2
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Mon Sep 5 10:31:59 2022 -0400

    Squashed commit of the following:

    commit 0cd48e932f1326e000c46f4140f98697eb9bdc79
    Author: Lincoln Stein <lincoln.stein@gmail.com>
    Date:   Mon Sep 5 10:27:43 2022 -0400

        resolve conflicts with development

    commit d7bc8c1
    Author: Scott McMillin <scott@scottmcmillin.com>
    Date:   Sun Sep 4 18:52:09 2022 -0500

        Add title attribute back to img tag

    commit 5397c89
    Author: Scott McMillin <scott@scottmcmillin.com>
    Date:   Sun Sep 4 13:49:46 2022 -0500

        Remove temp code

    commit 1da080b
    Author: Scott McMillin <scott@scottmcmillin.com>
    Date:   Sun Sep 4 13:33:56 2022 -0500

        Cleaned up HTML; small style changes; image click opens image; add seed to figcaption beneath image

commit caf4ea3
Author: Adam Rice <adam@askadam.io>
Date:   Mon Sep 5 10:05:39 2022 -0400

    Add a 'Remove Image' button to clear the file upload field (invoke-ai#382)

    * added "remove image" button

    * styled a new "remove image" button

    * Update index.js

commit 95c088b
Author: Kevin Gibbons <bakkot@gmail.com>
Date:   Sun Sep 4 19:04:14 2022 -0700

    Revert "Add CORS headers to dream server to ease integration with third-party web interfaces" (invoke-ai#371)

    This reverts commit 91e826e.

commit a20113d
Author: Kevin Gibbons <bakkot@gmail.com>
Date:   Sun Sep 4 18:59:12 2022 -0700

    put no_grad decorator on make_image closures (invoke-ai#375)
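
For illustration, a tiny sketch of the decorator pattern this commit names; make_image here is a hypothetical stand-in for the real closures:

import torch

@torch.no_grad()  # no autograd graph is recorded inside the function
def make_image(x: torch.Tensor) -> torch.Tensor:
    return x * 2  # stand-in for the actual sampling work

out = make_image(torch.ones(3, requires_grad=True))
assert not out.requires_grad  # inference result carries no gradient history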

commit 0f93dad
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 4 21:39:15 2022 -0400

    fix several dangling references to --gfpgan option, which no longer exists

commit f4004f6
Author: tildebyte <337875+tildebyte@users.noreply.github.com>
Date:   Sun Sep 4 19:43:04 2022 -0400

    TOIL(requirements): Split requirements to per-platform (invoke-ai#355)

    * toil(reqs): split requirements to per-platform

    Signed-off-by: Ben Alkov <ben.alkov@gmail.com>

    * toil(reqs): fix for Win and Lin...

    ...allow pip to resolve latest torch, numpy

    Signed-off-by: Ben Alkov <ben.alkov@gmail.com>

    * toil(install): update reqs in Win install notebook

    Signed-off-by: Ben Alkov <ben.alkov@gmail.com>

    Signed-off-by: Ben Alkov <ben.alkov@gmail.com>

commit 4406fd1
Merge: 5116c81 fd7a72e
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 4 08:23:53 2022 -0400

    Merge branch 'SebastianAigner-main' into development
    Add support for full CORS headers for dream server.

commit fd7a72e
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 4 08:23:11 2022 -0400

    remove debugging message

commit 3a2be62
Merge: 91e826e 5116c81
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sun Sep 4 08:15:51 2022 -0400

    Merge branch 'development' into main

commit 5116c81
Author: Justin Wong <1584142+wongjustin99@users.noreply.github.com>
Date:   Sun Sep 4 07:17:58 2022 -0400

    fix save_original flag saving to the same filename (invoke-ai#360)

    * Update README.md with new Anaconda install steps (invoke-ai#347)

    pip3 version did not work for me and this is the recommended way to install Anaconda now it seems

    * fix save_original flag saving to the same filename

    Before this, the `--save_orig` flag was not working. The upscaled/GFPGAN would overwrite the original output image.

    Co-authored-by: greentext2 <112735219+greentext2@users.noreply.github.com>

commit 91e826e
Author: Sebastian Aigner <SebastianAigner@users.noreply.github.com>
Date:   Sun Sep 4 10:22:54 2022 +0200

    Add CORS headers to dream server to ease integration with third-party web interfaces

commit 6266d9e
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sat Sep 3 15:45:20 2022 -0400

    remove stray debugging message

commit 138956e
Author: greentext2 <112735219+greentext2@users.noreply.github.com>
Date:   Sat Sep 3 13:38:57 2022 -0500

    Update README.md with new Anaconda install steps (invoke-ai#347)

    pip3 version did not work for me and this is the recommended way to install Anaconda now it seems

commit 60be735
Author: Cora Johnson-Roberson <cora.johnson.roberson@gmail.com>
Date:   Sat Sep 3 14:28:34 2022 -0400

    Switch to regular pytorch channel and restore Python 3.10 for Macs. (invoke-ai#301)

    * Switch to regular pytorch channel and restore Python 3.10 for Macs.

    Although pytorch-nightly should in theory be faster, it is currently
    causing increased memory usage and slower iterations:

    invoke-ai#283 (comment)

    This changes the environment-mac.yaml file back to the regular pytorch
    channel and moves the `transformers` dep into pip for now (since it
    cannot be satisfied until tokenizers>=0.11 is built for Python 3.10).

    * Specify versions for Pip packages as well.

commit d0d95d3
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sat Sep 3 14:10:31 2022 -0400

    make initimg appear in web log

commit b90a215
Merge: 1eee811 6270e31
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sat Sep 3 13:47:15 2022 -0400

    Merge branch 'prixt-seamless' into development

commit 6270e31
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sat Sep 3 13:46:29 2022 -0400

    add credit to prixt for seamless circular tiling

commit a01b7bd
Merge: 1eee811 9d88abe
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sat Sep 3 13:43:04 2022 -0400

    add web interface for seamless option

commit 1eee811
Merge: 64eca42 fb857f0
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sat Sep 3 12:33:39 2022 -0400

    Merge branch 'development' of github.com:lstein/stable-diffusion into development

commit 64eca42
Merge: 9130ad7 21a1f68
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sat Sep 3 12:33:05 2022 -0400

    Merge branch 'main' into development
    * brings in small documentation fixes that were
    added directly to main during release tweaking.

commit fb857f0
Author: Lincoln Stein <lincoln.stein@gmail.com>
Date:   Sat Sep 3 12:07:07 2022 -0400

    fix typo in docs

commit 9d88abe
Author: prixt <paraxite@naver.com>
Date:   Sat Sep 3 22:42:16 2022 +0900

    fixed typo

commit a61e49b
Author: prixt <paraxite@naver.com>
Date:   Sat Sep 3 22:39:35 2022 +0900

    * Removed unnecessary code
    * Added description about --seamless

commit 02bee4f
Author: prixt <paraxite@naver.com>
Date:   Sat Sep 3 16:08:03 2022 +0900

    added --seamless tag logging to normalize_prompt

commit d922b53
Author: prixt <paraxite@naver.com>
Date:   Sat Sep 3 15:13:31 2022 +0900

    added seamless tiling mode and commands