Better SDXL support? Individual control over two CLIPs #17

aleksusklim · 2024-04-11T17:31:19Z

How could the merge expression syntax be enhanced to allow independent manipulation of the L (CLIP as in SD1) and G (OpenCLIP) clips of SDXL?

Currently <'cat'*2+'1girl'> will:

Multiply L and G of "cat" by 2.0, independently.
Pad the shortest operand ("cat", 1 token) with zero vectors to the max length (that of "1girl", which is 2 tokens)
Sum L of padded "cat" with L of "1girl" and put to L; accordingly, do the same with G.
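
The three steps above can be sketched in plain Python. This is only an illustration, not the extension's code: the helper names (scale, pad, add) are hypothetical, an embedding is modeled as a list of per-token vectors, and toy widths of 3 (L) and 4 (G) stand in for the real 768 and 1280.

```python
from typing import Dict, List

Part = List[List[float]]     # one vector per token
Emb = Dict[str, Part]        # {"L": ..., "G": ...}

DIMS = {"L": 3, "G": 4}      # toy widths (assumption); real SDXL uses 768 and 1280


def scale(part: Part, k: float) -> Part:
    """Multiply every component of every token vector by k."""
    return [[x * k for x in vec] for vec in part]


def pad(part: Part, length: int, dim: int) -> Part:
    """Append zero vectors until the part has `length` tokens."""
    return part + [[0.0] * dim for _ in range(length - len(part))]


def add(a: Part, b: Part, dim: int) -> Part:
    """Pad the shorter part with zeros, then sum token by token."""
    n = max(len(a), len(b))
    a, b = pad(a, n, dim), pad(b, n, dim)
    return [[x + y for x, y in zip(va, vb)] for va, vb in zip(a, b)]


def cat_times_2_plus_girl(cat: Emb, girl: Emb) -> Emb:
    """Mimic <'cat'*2+'1girl'>: L and G are processed independently."""
    return {k: add(scale(cat[k], 2.0), girl[k], DIMS[k]) for k in DIMS}


cat = {"L": [[1.0] * 3], "G": [[1.0] * 4]}              # 1 token
girl = {"L": [[0.5] * 3] * 2, "G": [[0.25] * 4] * 2}    # 2 tokens
merged = cat_times_2_plus_girl(cat, girl)
```

The second token of the result is just the second token of "1girl", because the padded "cat" contributes zeros there.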

What do we want:

Multiply L and G independently of each other (e.g. L*2 but G*1; or L*0.3 and G*0.7)
Combine L from one string with G of another string

What we cannot have:

Different lengths of L and G in one and the same embedding
Swapping places of L and G vectors (they have different depth dimension)
Load SD1 or SD2 embeddings to use their L or G, because WebUI does not list them in SDXL mode whatsoever.
Parentheses or grouping: the math parser is rather simple, so it can only postpone + and -, or apply *, / and : right away, operating on just two internal variables (a "left" operand and a "right" operand: * does right=right*this, and + does left=left+right; right=this;)
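
To illustrate why grouping is out of reach for such a parser, here is a minimal sketch of the two-variable scheme described in the last item, working on plain numbers instead of embeddings. The function name evaluate and the (operator, operand) input format are hypothetical; the handling of - (negate the operand, then treat it like +) is an assumption, and the : operator is left out because its semantics are not described here.

```python
from typing import List, Tuple


def evaluate(terms: List[Tuple[str, float]]) -> float:
    """Evaluate a flat expression with only two accumulators.

    `terms` is a list of (operator, operand) pairs; the first pair
    should use '+'. Additions are postponed, while multiplications and
    divisions are applied immediately to the most recent term.
    """
    left = 0.0    # sum of all finished terms
    right = 0.0   # the term currently being built
    for op, this in terms:
        if op == "+":
            left = left + right   # commit the previous term
            right = this
        elif op == "-":
            left = left + right   # assumption: same as '+', negated
            right = -this
        elif op == "*":
            right = right * this  # applied right away
        elif op == "/":
            right = right / this  # applied right away
        else:
            raise ValueError(f"unsupported operator: {op!r}")
    return left + right           # commit the last pending term


# 2*3 + 4: the product is finished before the sum is committed
value = evaluate([("+", 2.0), ("*", 3.0), ("+", 4.0)])
```

Because only one pending term exists at any time, there is nowhere to store an intermediate result for a parenthesized sub-expression.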

A few ideas:

Two different merge expressions, controlling L (first part) and G (second part) separately:

<'use clip'*1.4 | 'this is OpenCLIP'*0.5>

What if lengths are different? Throw an error or pad silently?

Zero-fill L/G operator:

<'this is OpenCLIP'*0.5:G + 'use clip'*1.4:L>
('X':L will zero-fill G-part of 'X'; read as "use L" )

Also see a89dde6#commitcomment-140709559


aleksusklim · 2024-04-11T17:45:30Z

Also, how about a checkbox in EM tab named Save converted SD1/SD2/SDXL versions in separate files, that will:

In SD1 mode create /embeddings/embedding_merge/SDXL/%name%.safetensors with zero G but 768 of L
In SD2 mode create /embeddings/embedding_merge/SDXL/%name%.safetensors with zero L but 1280 of G
In SDXL mode create /embeddings/embedding_merge/SD1/%name%.safetensors with 768 of L and /embeddings/embedding_merge/SD2/%name%.safetensors with 1280 of G
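
A minimal sketch of the SD1 and SDXL directions of this idea, assuming an embedding is a list of per-token vectors and an SDXL embedding is a dict with hypothetical keys "L" and "G". The function names are illustrative, the SD2 direction is left out, and actual file saving is not shown; only the target path layout from the list above is reproduced.

```python
from typing import Dict, List

Part = List[List[float]]

L_DIM = 768    # width of L vectors
G_DIM = 1280   # width of SDXL G vectors


def sd1_to_sdxl(l_part: Part) -> Dict[str, Part]:
    """Keep the L vectors and add an all-zero G part with the same token count."""
    return {
        "L": [list(vec) for vec in l_part],
        "G": [[0.0] * G_DIM for _ in l_part],
    }


def sdxl_to_sd1(emb: Dict[str, Part]) -> Part:
    """Drop G and keep only the L vectors."""
    return [list(vec) for vec in emb["L"]]


def target_path(root: str, mode: str, name: str) -> str:
    """Build <root>/embedding_merge/<mode>/<name>.safetensors."""
    return f"{root}/embedding_merge/{mode}/{name}.safetensors"


sd1 = [[0.1] * L_DIM, [0.2] * L_DIM]   # a 2-token SD1 embedding
xl = sd1_to_sdxl(sd1)
back = sdxl_to_sd1(xl)
```

The round trip SD1 to SDXL and back is lossless, while SDXL to SD1 discards whatever was stored in G.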

This way you would be able to convert between different embeddings by checking this flag and loading a correct base model to switch mode!

aleksusklim · 2024-04-16T19:05:37Z

Can anybody explain to me why I get an embedding hidden size of 1024 for SD2 but 1280 for the G part of SDXL?
For SD1 and SDXL part L it is =768 as expected.

silveroxides · 2024-05-21T10:10:08Z

EDIT: realized that | also could be used instead of + so instead of merging two sentences targeting both clip it merges two sentences that target one clip each. That way it will not interfere with the single-quoted part like the example below might do.

What if adding | inside the single quotes made anything before it target CLIP L and anything after it target CLIP G, and also moved the math operations to the left side of the single quotes for CLIP L and to the right side for CLIP G?
It would probably be good to add a character to the syntax indicating that any character after it that is followed by a numerical value is a math operation, while ' > and + without a following numerical value keep their default functions. In the following example, # is the indicator that there are math operations.
The sentence a blue dog acts only on CLIP L and is multiplied by 1.25, while a red dog acts only on CLIP G and is divided by 0.8;
then they are merged with a green dog, since the + after the last math operation is not followed by a number; that sentence targets both CLIPs and is multiplied by 0.5. Does this seem reasonable, @aleksusklim?

<#*1.25'a blue dog|a red dog'#/0.8+'a green dog'#*0.5>

Separated prompts for two different text encoders seem unnecessary. Separated prompts for the base model and refiner may work, but the effects are random, and we refrain from implementing this.

Also this statement about separately prompting clip that fooocus maintainer wrote can be dismissed.
I have proof that under the right circumstances, separately prompting the clip models can provide significant improvement.
I have done extensive experiments on this.

aleksusklim · 2024-05-21T10:40:24Z

<#*1.25'a blue dog|a red dog'#/0.8+'a green dog'#*0.5>

I don't understand this. Firstly, any runtime merge expression ought to start with a single quote, otherwise it won't get parsed (and would mess with other extensions if I tried to interpret it), so the only valid start is <' or <''.

Secondly, you seem to include a control character | inside single quotes. This is wrong, because currently there are no prohibited symbols inside quotes (actually, even the single quote itself can be freely used: to do this you'll have to double it, for example cat's should be <'cat''s'>; I don't see this documented anywhere in the docs, but it was possible from the very beginning!)
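
The quote-doubling rule can be shown with a small reader for one quoted literal. This is a sketch of the rule as described above, not the extension's parser; the function name read_quoted and its return shape are hypothetical.

```python
from typing import Tuple


def read_quoted(text: str, start: int = 0) -> Tuple[str, int]:
    """Read a single-quoted literal beginning at text[start].

    A doubled quote ('') inside the literal stands for one quote
    character; every other symbol, including |, is taken literally.
    Returns (decoded string, index just past the closing quote).
    """
    if start >= len(text) or text[start] != "'":
        raise ValueError("literal must start with a single quote")
    out = []
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "'":
            if i + 1 < len(text) and text[i + 1] == "'":
                out.append("'")           # doubled quote: keep one
                i += 2
                continue
            return "".join(out), i + 1    # closing quote
        out.append(ch)
        i += 1
    raise ValueError("unterminated literal")


decoded, end = read_quoted("'cat''s'")
```

Since nothing inside the quotes is special except the quote itself, giving | a meaning inside a literal would change how existing prompts are read.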

| also could be used instead of + so instead of merging two sentences targeting both clip it merges two sentences that target one clip each

Show some examples, and note that I cannot delay multiplication for anything but the directly preceding term, so we cannot have "multiplication from left" like X*'S', but only 'S'*X

I have done extensive experiments on this.

Where, with what software? (Comfy, Diffusers?)

silveroxides · 2024-05-21T20:08:32Z

I realized the | issue inside single quotes, hence the edit. That is why in the edit I hinted at another method.

<'a blue dog'#*1.25|'a red dog'#/0.8+'a green dog'*0.5>

'a blue dog'#*1.25 would represent the CLIP L part
| would indicate that the single quoted to the left is L and to the right is G
'a red dog'#/0.8 would represent the CLIP G part
+ would function as normal (in this case the L and G parts to left that are merged with different tokens but at the same location in prompt will merge with the one on right that have same tokens on both clip)
'a green dog'*0.5 is created with both CLIP.
The # would indicate a single-CLIP operation; unless there is a preceding |, that CLIP is L. If there is a | preceding the prompt, then it is CLIP G and will be merged as such.
If you want only one CLIP, with the other padded with zeros, you would just leave that single-quoted string empty, followed by only a #, then followed by | if it is CLIP L, or by one of the following if it is CLIP G: +' if more merges are being done, or > if nothing else follows. Note that padding should be done to the same token count as the one that is not padded.

<''#|'a red dog'#/0.8>
<'a blue dog'#*1.25|''#>

aleksusklim · 2024-05-21T20:54:13Z

Confusing.

Couldn't you just |'string to indicate it as L and #'string to indicate it as G, at that rate?

silveroxides · 2024-05-24T04:55:18Z

Confusing.

Couldn't you just |'string to indicate it as L and #'string to indicate it as G, at that rate?

You are right. I do tend to overcomplicate some things.
As long as |'string, if used alone, also does torch.zeros on G, and #'string, if used alone, also does torch.zeros on L, it should be fine, I suppose.

aleksusklim · 2024-05-24T06:10:19Z

Could you give several examples of how you would use this, especially since you said that you already have experience with two separate prompts?

silveroxides · 2024-05-28T03:11:03Z

Well, the influence over the image is not equal between the two CLIP models, but this can be overcome by multiplying the magnitude of the embedding for CLIP L only. Since CLIP L is the same as the SD 1.5 CLIP, it still has all the OpenAI training.
I have already used this, but as a workaround: creating embeddings with an SD 1.5 model and then converting them to work with SDXL by zero-padding G.
If you check the Abs parameter when parsing, you can see that the G value is consistently higher than L. Even these out and prompt coherence goes up as well.
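
One way to read "even these out" is to compute a magnitude for each part and derive a factor for L. The sketch below assumes that Abs means the mean absolute value over all components, which is an assumption and not something confirmed in this thread; the names mean_abs and balance_factor are hypothetical, and toy vectors replace real embeddings.

```python
from typing import Dict, List

Part = List[List[float]]


def mean_abs(part: Part) -> float:
    """Mean absolute value over every component of every token vector."""
    values = [abs(x) for vec in part for x in vec]
    return sum(values) / len(values)


def balance_factor(emb: Dict[str, Part]) -> float:
    """Factor by which L must be multiplied to match the magnitude of G."""
    return mean_abs(emb["G"]) / mean_abs(emb["L"])


# Toy embedding where G is four times "louder" than L
emb = {"L": [[0.5, -0.5]], "G": [[2.0, -2.0, 2.0]]}
factor = balance_factor(emb)
balanced_l = [[x * factor for x in vec] for vec in emb["L"]]
```

Whether equal magnitudes actually improve prompt coherence is the commenter's observation; the sketch only shows how such a factor could be derived.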

aleksusklim · 2024-05-28T16:19:03Z

So you actually need a separate multiplication?
Like *L1.7 and *G0.8 instead of just *1.7 and *0.8 ?

This way, to get pure L you will just 'string'*G0
Would that be enough?

silveroxides · 2024-05-28T16:46:57Z

Yes that sounds great. It makes sense too since if you are going to target only one clip you would want to use multiplication in order to compensate a bit. At least from my own experience.

Also here are three embeddings that were converted from SD 1.5 to SDXL with the padding technique if you want to check them out for effectiveness, parameters and such:
xlconverted.zip

aleksusklim · 2024-05-28T19:04:42Z

By chance, do you know why the G part is not compatible with SD2?
I thought there is OpenCLIP in both SD2 and SDXL.

silveroxides · 2024-05-29T08:00:32Z

Because the OpenCLIP model used by SD 2.0-2.1 is not G. I believe it is H; the hidden dim size of G is 1280 while H is 1024. Below is a screenshot of each text encoder's configuration file.
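
The sizes mentioned in this thread can be summarized in a small lookup. The table and the helper shared_parts are illustrative only; the part labels "L", "H" and "G" are just names for the text encoders discussed here.

```python
from typing import Dict, List

# Hidden sizes per text encoder, as reported in this thread
TEXT_ENCODER_DIMS: Dict[str, Dict[str, int]] = {
    "SD1": {"L": 768},
    "SD2": {"H": 1024},
    "SDXL": {"L": 768, "G": 1280},
}


def shared_parts(model_a: str, model_b: str) -> List[str]:
    """Encoder parts that exist in both models with the same width."""
    a, b = TEXT_ENCODER_DIMS[model_a], TEXT_ENCODER_DIMS[model_b]
    return [name for name, dim in a.items() if b.get(name) == dim]


sd1_vs_xl = shared_parts("SD1", "SDXL")
sd2_vs_xl = shared_parts("SD2", "SDXL")
```

Under these sizes, SD1 and SDXL share the L part, while SD2 and SDXL share nothing, which is why an SD2 embedding cannot serve as the G part.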

aleksusklim · 2024-05-29T17:46:10Z

I've pushed two changes:

Multiplication and division now support an L or G suffix: 'a cat'/2G, 'test'*1.5L. Only literal uppercase "L" and "G" are allowed, directly after the number. To keep only the L vectors you should do *0G
There is now the checkbox I described earlier in this issue (Better SDXL support? Individual control over two CLIPs #17 (comment)), but without the SD2 part. So, by default each saved embedding is automatically converted to SD1 or SDXL if possible, and saved under the same name to a subfolder as safetensors.
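
The suffix rule from the first item can be sketched as follows. This is not the code from the commit: the regular expression, the function name apply_ops and the dict layout with keys "L" and "G" are assumptions made for illustration.

```python
import re
from typing import Dict, List

Part = List[List[float]]

# operator, number, then an optional literal uppercase L or G
OP_RE = re.compile(r"([*/])(\d+(?:\.\d+)?)([LG]?)")


def apply_ops(emb: Dict[str, Part], ops: str) -> Dict[str, Part]:
    """Apply a chain like '*1.5L/2G' to a copy of an {"L", "G"} embedding.

    Without a suffix the operation affects both parts.
    """
    result = {key: [list(vec) for vec in part] for key, part in emb.items()}
    pos = 0
    while pos < len(ops):
        match = OP_RE.match(ops, pos)
        if match is None:
            raise ValueError(f"cannot parse {ops[pos:]!r}")
        op, number, suffix = match.groups()
        value = float(number)
        for key in (suffix or "LG"):
            if op == "*":
                result[key] = [[x * value for x in vec] for vec in result[key]]
            else:
                result[key] = [[x / value for x in vec] for vec in result[key]]
        pos = match.end()
    return result


emb = {"L": [[2.0, 4.0]], "G": [[2.0, 4.0, 6.0]]}
only_l = apply_ops(emb, "*0G")   # keep L, zero out G
```

A lowercase suffix such as *2l is rejected in this sketch, matching the statement that only literal uppercase letters are allowed.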

The documentation is not updated yet. Can you test everything and make sure it is working as you might expect, and that nothing got broken?

silveroxides · 2024-05-31T11:24:25Z

Everything seemed to be working well but at one point, whatever I put in negative prompt became positive instead for some reason. Gonna investigate it some more. Been doing all kinds of crazy stuff though so it does seem to be working overall

silveroxides · 2024-06-03T00:58:03Z

So yeah, things are working as they should. One suggestion though: in addition to placing the converted safetensors embedding in a subfolder when saving, add a suffix to its name. Without that, an SDXL embedding sharing the same name as an SD 1.5 embedding will not show up in extensions such as tag autocomplete, which instead shows just the SD 1.5 version. I have gotten used to naming mine with suffixes '_xl' and '15', but something like 'vXL' and 'v1' would be more clear.

aleksusklim · 2024-06-03T07:28:56Z

Why use a suffix if you naturally cannot have both the SD1 and SDXL versions loaded in WebUI at the same time?

silveroxides · 2024-06-09T01:10:35Z

Because when an SDXL model is loaded, the extension a1111-sd-webui-tagcomplete is unable to differentiate between the two, since it is only used for aliasing and quick access to embeddings, LoRAs and such through the prompt. So if two embeddings have the same name, it displays the entry as an SD 1.5 embedding. In the image I have an SDXL model loaded, and I am using the extension in the prompt while displaying the actual available SDXL embedding; you can see that the one with the exact same name is displayed as a v1 Embedding even though there obviously is an XL one available. That is because that extension is not meant to do checks for the loaded model or anything like that. It is just performing aliasing and prompt shortcuts for embeddings and extra networks. You will have to excuse the name, but it is the only one left that I had not suffixed. Hope this explains it. Otherwise I suggest you check out the extension I mentioned so you get first-hand experience. The extension

aleksusklim · 2024-06-09T12:42:10Z

the extension a1111-sd-webui-tagcomplete is unable to differentiate between the two

And so what? The embedding is there, and it will be used in generation.

It is just performing aliasing and prompt shortcuts for embeddings and extra networks.

That extension should not list embeddings that are not compatible with the current model, because implying that they are usable is a lie: WebUI would not throw any errors but would instead take the name literally as text, without substitution.

Showing the wrong type of the embedding because of duplicated name is not a bigger lie!

I have gotten used to naming mine with suffixes '_xl' and '15', but something like 'vXL' and 'v1' would be more clear.

Why rename them, when it would be convenient for prompts to keep general embedding names that let you swap models without changing the prompt?

For example, if your SDXL embedding of a furry dog boy is catgirl1 and you have its L part stored as catgirl1 too, then your prompt would work regardless of what the current model is, SDXL or SD1.

silveroxides · 2024-06-10T19:09:57Z

Yeah I will just head over to that extensions repo and ask them to change their entire way of fetching embedding/extra networks names.

I have 2600 embeddings. If I had the same name on both the xl and v1 variants, currently it would just display as v1 in that menu, and I would have no way to know whether it is one that exists for each architecture or one that I have yet to convert. So no, there is no convenience in having them named the same in that context. I would however understand the convenience for casual users who do not use EM for constructing highly complex embeddings through multiple intermediary steps like I do.

aleksusklim · 2024-06-10T19:22:07Z

Yeah I will just head over to that extensions repo and ask them to change their entire way of fetching embedding/extra networks names.

You may backlink here when you do; meanwhile I will be updating my docs for the new syntax…

silveroxides · 2024-06-11T15:58:36Z

The tagcomplete issue has been resolved.

By the way, my PR has been merged to webui dev branch.
It is now possible to unlock clip skip option for clip L when using SDXL which can bring some benefits, especially if combined with prompt editing timelines and this extension.
Link to the pull request if you want to take a look.

aleksusklim referenced this issue Apr 11, 2024

Fallback to .safetensors on saving errors, see #14

a89dde6

klimaleksus pinned this issue Apr 11, 2024

klimaleksus added a commit that referenced this issue May 29, 2024

Support for separated multiplication for SDXL, see #17

4512e7a

silveroxides mentioned this issue Jun 10, 2024

Same name v1 and vXL embedding only show as v1 DominikDoom/a1111-sd-webui-tagcomplete#290

Closed
