auto-MBW-rt extension by 6DammK9 (1.10.1 Tested)

Only tested in Winodws a.k.a my machine. I'm not gradio / webUI expert therefore do not expect any auto / e2e solutions. Also I do not gruntee to have any decent test coverage.
Currently SD1, SD2 and SDXL are tested. You cannot merge different UNET architecture. Check out model versions first.
No, it have no chance to support SD3 / AuraFlow / Flux. This will be V3 and requires revamp the nasty block prefixes here, and the "UI" that I never use it. I'd do it if someone are willing to manage 50+ layers
You will see loads of Missing key(s) in state_dict: when the settings in A1111 is not correcly loaded. Keep switching UI's selected model to non SDXL models, and try again. If you see a *.yaml is loaded, it is usually success. Sadly it is done in A1111 instead of extensions.
See the base merger for more details.
NO SUPPORT FOR aki / "秋葉" build.
Check out sd-webui-bayesian-merger (along with merger base code "meh") which is doing what I'm aiming for, but it doesn't include ImageReward. ljleb's fork is currently in active developement, and has a nice guide. Nah the original 'sdweb-auto-MBW' is not brute force search, but a strange 'binary search', which is not effective or efficient.
See my own findings in AutoMBW. Now published to CivitAI also. Contents will not overlap.
Feel free to discuss in this original thread, or catch me in Discord / Telegram.

Observed hardware requirements

Around 33.4GB of system RAM. Counted casually (loads of applications opened), will drop to 26.4GB after first iterlation. It is considered that model must be created in system RAM first, then move to GPU's VRAM.
conda (miniconda) for dependency, even A1111 has its own venv.

Install prerequisites

Install these extensions via "Extensions" > "Install from URL":

~~sd-webui-runtime-block-merge~~, my fork instead
sd-webui-lora-block-weight not tested

Install dynamicprompts via wheels from pypi:

Download the *.whl file (dynamicprompts-0.29.0-py2.py3-none-any.whl)
Run in cmd: "FULL_PATH_OF_YOUR_A1111_WEBUI\venv\Scripts\python.exe" -m pip install path_of_the_whl_file.whl --prefer-binary

You may face "Premission denied" while moving extension from tmp to extensions:

Either cd extensions and then git clone https_github_com_this_repo and then restart WebUI
Or make a directory auto-MBW-rt directly in tmp then rerun the installation.

From AutoMBW V1, make sure your WebUI instance has API enabled as --api in COMMANDLINE_ARGS.

REM IF you have multiple Pythons installed, set this.
set PYTHON=C:\Users\User\AppData\Local\Programs\Python\Python310\python.exe
REM 2nd SD (7861) for 2nd GPU (1)
set COMMANDLINE_ARGS=--medvram --disable-safe-unpickle --deepdanbooru --xformers --no-half-vae --api --port=7861 --device-id=1

Install these extensions via "Extensions" > "Install from URL":

Obviously this branch.

Special notes on fresh installing Python / A1111

Make sure you are using Python 3.10.
pip install -r requirements.txt explicitly in this directory. I've seen dynamicprompts is failed to install along with A1111. Afterthat pip will throw some error but A1111 will start eventually.

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gradio 3.41.2 requires huggingface-hub>=0.14.0, but you have huggingface-hub 0.13.4 which is incompatible.

(240730) If you experience ImportError: cannot import name 'xxx', checkout this sourcecode again and explictly pip install xxx. I'm not sure how long it stays compatable with A1111. Freeze A1111 or just seek other mergers.

Basic procedure

"Make payload". Treat it like "trigger words", or anything you like, or testing dataset in AI/ML.

A minimal payload (e.g. single 512x512 image) is suggested if you are using it for the first time, to make sure the code works. ~~programmer's life~~
Payloads are stored in payloads/*.json.

"Set classifier". I like BayesianOptimizer with ImageReward.

I will set my recommended values as default.

"Search". For RTX 3090, it requires around 60x time for each payload. If the payload takes around 15 seconds to complete, it takes around 15 minutes. It applies for a batch using 4 minutes (4.5 hours).

Optimization part (on test score) takes only a few seconds to compelete. 26 parameters is easy, comparing to 860M for SD.
You may see "Warning: training sequential model failed. Performing random iteration instead." It means that the optimizer has nothing to initialize but pure random. Ignore this if you're going to start from random weights.

See csv/history/*/*.csv for results. Also see models/Stable-diffusion/*.test-23110502.recipe.txt for a formatted recipe.

If you encounter errors

Trust me. Always reboot webUI first. Head straight into merging without any operation. State control in WebUI (even Python) is awful.

Encountered errors that I cannot solve (limitation)

Currently I am experiencing Error when updating the "UNET Visualizer" and "Gallery". It is deep into Gradio's Queue and I am unable to fix it. However before it throws error, I can see live update. Since it is not fatal crash, I'll leave it open and ignore this issue. I have found that every=0.5 or 10 or 30 will throw this error, but None will not, however no preview will be shown. Currently I choose every=None, maybe I will make it configurable and let user guess it (tied with image generation time?)

ERROR:    Exception in ASGI application
Traceback (most recent call last):
...
h11._util.LocalProtocolError: Can't send data when our state is ERROR

If the worst case happens a.k.a. program crash while merging after optimization, you will need to merge manually with the recipe (27 numbers, indexed from 0 to 26). Since there is bug in sd-webui-runtime-block-merge, please refer the image below. PoC script. tldr: IN00-IN11, M00, TIME_EMBED, OUT00-OUT11, OUT. Fixed in my fork. Swap TIME_EMBED and OUT if using my fork.

If you want to continue the training as warm start, make sure grid = vertices = random = 0. Then input the "Warm Up Parameters" in sequence as IN00-IN11, M00, TIME_EMBED, OUT00-OUT11, OUT. Same as above. Swap TIME_EMBED and OUT if using my fork. See the recipe with console output, the generated csv and compare with the FULL screenshot (TIME_EMBED and OUT swapped). The console log should align to the CSV, instead of the content in recipe.

testweights: 0.4,0.9,0.5,0.6,0.5,0.0,0.9,0.9,1.0,0.4,0.3,0.8,0.3,1.0,0.9,0.6,0.8,0.9,0.7,0.6,1.0,0.9,0.6,0.7,0.3,0.6,0.0
...
0.4,0.9,0.5,0.6,0.5,0.0,0.9,0.9,1.0,0.4,0.3,0.8,0.3,1.0,0.9,0.6,0.8,0.9,0.7,0.6,1.0,0.9,0.6,0.7,0.3,0.6,0.0,0.6132534303822829,174590.83615255356,174624.89337921143

(Related to the previous error), if you see the hyper_score is reporting the same score with wrong iterlation count (e.g. always 0.529 with iter 1, 2, 4, 8 etc.), the merge already failed, and you should restart the WebUI and close the webpage completely. I have found that it is usually caused by Model A / B are same as the WebUI's selected model. I have added checking about this issue.

Observations and explanations of parameters

For "Search Type A" and "Search Type B", they are related "Opt (A to B)" for switching streadgy in runtime. By default it is solely using Type A.
For "P1 / P2 / P3", they also switch streadgy in runtime, in simple iteration in sequence. By default only P1 is enabled. ~~Some ML algorithms requires consistency, I'll add reference if I really find the reasoning on this feature.~~
"Force CPU" is forced on. I see RuntimeError: expected device cuda:0 but got device cpu if it is off ~~and it is a headache to trace and move all tensors.~~
Both upper limit of "Sampling steps" and "hires Sampling steps" are raised to 2048. SD's Traning step is 1000 and you can further extrapolate to infinity. Now I use 256/64 frequently. Hence the extended range.
"Test Intervals" upper range is raised to 10000. Using 20+ for BayseianOptimizer will raise ValueError: broadcast dimensions too large. already (np.meshgrid). I was considering 10000 i.e. 4 DP. Unless you are doing exhausive Grid search, any search in relative scale desires for a fine space. Merge ratio is also in relative scale a.k.a fraction, which you don't need 1 DP if you are not required to remember the numbers (opposite of human search in MBW):

all_pos_comb = np.array(np.meshgrid(*pos_space)).T.reshape(-1, n_dim)

    if args[params["chk_enable_clamping"]]:
        search_space.update({str(idx): [*np.round(np.linspace(args[clamp_lower[idx]], args[clamp_upper[idx]], num=args[pass_params["sl_test_interval"]]+1), 8)]})
    else:
        search_space.update({str(idx): [*np.round(np.linspace(lower, upper, num=args[pass_params["sl_test_interval"]]+1), 8)]})

After serval actual runs, unfourtunately 20 intervals still occasionally throws the same error while performing meshgrid, meanwhile it takes 2-3 time longer to complete an iterlation, and it is also 2-3 times harder to converge. "Test Intervals" default will be stayed at 10. These are the optimizers using meshgrid:

LipschitzOptimizer
BayesianOptimizer
ForestOptimizer

Keep "Test Grouping" as 1. I don't know why we need to repeat the parameters. Is it related to supersampling?

    grouping = localargs.pass_through["grouping"]
    tunables = localargs.pass_through["tunables"]
    testweights = localargs.pass_through["weights"].copy()
    for key in tunables:
        for interval in range(grouping):
            testweights[int(key)*grouping+interval] = localargs[key]

Initialize Grid / Vertex / Random should be ignored. It is only useful if you are dedicated to search from the extreme ratios first (pure A by experience). Also the search parameters are way too much (24 + 2 in total). It will waste so much time.
"Warm Start" will be disabled. It will use hyperactive's API for initialization, and then "Read the parameters from the input of the 26 slidebars in page bottom", if grid = vertices = random = 0. Disable for random initialization (common for DNN training).
Clamping / LoRA is untouched. I only moved the UI components to reduce some area.
"Early Stop" is enabled with parameters is slighty raise to 27, which is parameter counts. It is a common setting for Early stopping. The iterlation count is also raised to 270 (expect 10 intervals).
Search Time is greatly increased to 10000 minutes (around 7 days). It was 2880 minutes (2 days). I have found that my prefered payloads (12 payloads x 1 image) takes longer then 2 days for worst case (expected 12 hours). It is comparable to common SD / LoRA finetuning, but computational power is still minimum (only t2i).

Bonus: Efficiency

The efficiency is considered fast.. torch.lerp greatly reduces overhead.

Bonus: Visualizing the RL effect

See this notebook I've made. Transfer the files from csv/history/[long_folder_name]/[long_file_name].csv to anywhere you want, and then change the path in the notebook (csv_files = glob.glob("[your_folder]/*.csv")), and also rename the csv as id-[long_file_name].csv and execute the notebook. It is similar to the legit training process while finetuning, but the y-axis is inverted, because loss function is usually opposite to reward function. See this article for comparasion.

Bonus: Mimic "Add diff"

Referring this artistic session.
autombw once more with the "average weight" of your merge model.

Change Log

Logger is added. Inspired from sd-webui-animatediff and sd-webui-controlnet .
Fix for multiple SD instandces. It reads --port instead of hardcoded http://127.0.0.1:7860.
Rearrange the UI components. It is so raw and confusing.

This is part of my research.

Just a hobby. If you are feared by tuning for numbers, try "averaging" by simply 0.5, 0.33, 0.25... for 20 models. It works..

auto-MBW-rt | a.k.a V2-BETA

NOTE: THIS IS IN BETA. NEWER COMMITS MAY BREAK OLDER ONES. FUNCTIONALITY NOT GUARANTEED.

An automated (yes, that's right, AUTOMATIC) MBW extension for AUTO1111.

Rewritten from scratch (not a deviation) UI and code.

Old (V1) example models here: https://huggingface.co/Xynon/SD-Silicon

Old (V1) article here: https://medium.com/@media_97267/the-automated-stable-diffusion-checkpoint-merger-autombw-44f8dfd38871

Made by both Xynon#7407 and Xerxemi#6423.

Big thanks to bbc-mc for the original codebase and the start of this merge paradigm.

You can find it here: https://github.com/bbc-mc/sdweb-merge-block-weighted-gui

MERGING BACKEND: Huge thanks to ashen

https://github.com/ashen-sensored/sd-webui-runtime-block-merge

LORA BACKEND: Huge thanks to hako-mikan

https://github.com/hako-mikan/sd-webui-lora-block-weight

LORA BACKEND (SOLID): Huge thanks to hako-mikan

https://github.com/hako-mikan/sd-webui-supermerger

OPTIMIZER LIB: Massive thanks to SimonBlanke

https://github.com/SimonBlanke/Hyperactive

Wiki/Documentation

coming soon^TM

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
csv		csv
docs		docs
payloads		payloads
scripts		scripts
settings		settings
wildcards		wildcards
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
install.py		install.py
metadata.ini		metadata.ini
requirements.txt		requirements.txt
style.css		style.css

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

auto-MBW-rt extension by 6DammK9 (1.10.1 Tested)

Observed hardware requirements

Install prerequisites

Special notes on fresh installing Python / A1111

Basic procedure

If you encounter errors

Encountered errors that I cannot solve (limitation)

Observations and explanations of parameters

Bonus: Efficiency

Bonus: Visualizing the RL effect

Bonus: Mimic "Add diff"

Change Log

This is part of my research.

auto-MBW-rt | a.k.a V2-BETA

About

Releases

Packages

Languages

License

6DammK9/auto-MBW-rt

Folders and files

Latest commit

History

Repository files navigation

auto-MBW-rt extension by 6DammK9 (1.10.1 Tested)

Observed hardware requirements

Install prerequisites

Special notes on fresh installing Python / A1111

Basic procedure

If you encounter errors

Encountered errors that I cannot solve (limitation)

Observations and explanations of parameters

Bonus: Efficiency

Bonus: Visualizing the RL effect

Bonus: Mimic "Add diff"

Change Log

This is part of my research.

auto-MBW-rt | a.k.a V2-BETA

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages