v3.10.1 release (Generic NCNN Upscaler)

@Teriks Teriks released this 25 Jul 07:40
· 339 commits to master since this release

see here for latest release

v3.10.1 release with Windows installer.

Due to the size of the packaged python environment, the installer is within a multi-part zip file.

The multipart zip can be extracted using 7-Zip: https://www.7-zip.org/

Download both dgenerate_installer.zip.001 and dgenerate_installer.zip.002 to a folder.

Unzip dgenerate_installer.zip.001 to a directory (Right click, 7-Zip -> Extract to "dgenerate_installer") and then run dgenerate_installer\dgenerate.msi to install.

dgenerate will be installed under C:\Program Files\dgenerate by default with an isolated python environment provided.

The install directory will be added to PATH, and dgenerate will be available from the command line.

Portable Install

A portable install is provided via dgenerate_portable.zip.001 and dgenerate_portable.zip.002. These contain
nothing but the dgenerate executable and a frozen Python environment, which can be placed anywhere.

v3.10.1 Features & Fixes

1.) Generic NCNN upscaler

ncnn has been added as a package extra. When ncnn is installed, the new image processor upscaler-ncnn is available for generic upscaling using NCNN, and should work with models converted from ONNX format. This is included by default in the Windows installer / portable install environment that is attached to each release.

This upscaler supports tiling just as the normal upscaler image processor does, and offers essentially the same tiling options with slightly different defaults.

It does not use the device argument, but instead a combination of use-gpu=True and gpu-index=N for enabling Vulkan accelerated GPU use on a specific GPU.

By default this processor runs on the CPU.

This is because the Vulkan allocator conflicts heavily with the torch CUDA allocator used for diffusion and other image processors when they are placed on the same GPU, and having both allocators on the same GPU can cause hard system lockups.

You can safely use this upscaler at the same time as torch based models by running it on another GPU that torch is not going to be using.
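
As a rough illustration of the use-gpu and gpu-index arguments described above, the Python sketch below assembles a processor URI string. The helper name `processor_uri` is hypothetical, and the semicolon-separated `name;key=value` layout is an assumption made for this example; `dgenerate --image-processor-help upscaler-ncnn` remains the authoritative reference for the real argument list.

```python
def processor_uri(name, **arguments):
    """Join a processor name and its arguments into one URI-style string.

    Assumption: arguments follow the name as ';key=value' pairs, and
    underscores in Python keyword names stand in for dashes.
    """
    parts = [name]
    for key, value in arguments.items():
        parts.append(f"{key.replace('_', '-')}={value}")
    return ";".join(parts)


# The processor runs on the CPU by default, so GPU use is requested explicitly.
# gpu_index=1 targets a second GPU, away from the one torch is using.
uri = processor_uri("upscaler-ncnn", use_gpu=True, gpu_index=1)
print(uri)  # upscaler-ncnn;use-gpu=True;gpu-index=1
```

Choosing a gpu-index that torch is not using keeps the Vulkan and CUDA allocators on separate devices.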

Once you have used this processor, be aware that the process will always exit with a non-zero return code. This is due to being unable to clean up the GPU context and certain ncnn objects properly through the ncnn Python bindings before the process shuts down. It technically causes an access violation / segfault inside ncnn. I am not sure what bad behaviors this will cause on Linux, but on Windows the process exits with no side effects or hang-ups other than a non-zero return code.

See: dgenerate --image-processor-help upscaler-ncnn

And also: Upscaling With NCNN Upscaler Models in the readme.

2.) Memory Management

Image processors now have size estimates which are used as a heuristic for clearing out CPU side memory belonging to the diffusion model cache before the processor is loaded into memory. This should help prevent avoidable out of memory conditions caused by an image processor model loading while the diffusion model cache is using most of the system's memory.

This size estimate is also used as a heuristic for freeing up VRAM, particularly the last called diffusion pipeline if it currently is still in VRAM.

If an image processor still runs out of memory because its actual execution allocates large amounts of VRAM, it will attempt to free memory and then try again. If an OOM occurs on the second try, the OOM is raised.

Diffusion invocations will now attempt to clear memory and try again in the same fashion for CUDA out of memory errors, but not for CPU side out of memory errors, which are more easily prevented by the heuristics already in place.
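
The free-then-retry-once behavior described above can be sketched generically in Python. This is a minimal illustration, not dgenerate's actual code: the function name `run_with_one_retry` and its parameters are hypothetical, and the built-in `MemoryError` plus `gc.collect` stand in for a CUDA out of memory error and a VRAM cleanup routine.

```python
import gc


def run_with_one_retry(operation, free_memory, oom_error=MemoryError):
    """Call operation(); on an out-of-memory error, free memory and retry once.

    A failure on the second attempt is not caught, so it reaches the caller.
    """
    try:
        return operation()
    except oom_error:
        free_memory()
    # Second and final attempt, made outside the except block.
    return operation()


# Stand-in workload that fails on its first call only.
attempts = []


def flaky_operation():
    attempts.append(1)
    if len(attempts) == 1:
        raise MemoryError("simulated OOM")
    return "done"


result = run_with_one_retry(flaky_operation, gc.collect)
print(result, len(attempts))  # done 2
```

With torch, `oom_error` could be `torch.cuda.OutOfMemoryError` and `free_memory` could call `torch.cuda.empty_cache()`, though whether dgenerate uses exactly these is not stated here.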

The main current enemy of this application running for long periods of time is VRAM fragmentation, which is not avoidable with the default CUDA allocator.

The example runner script in the examples folder has been rewritten to isolate each top level folder in the examples directory to a subprocess when not running with the --subprocess-only flag.

The only way to clear out the memory fragmentation after running so many models of different sizes is to end the process. Each directory is therefore isolated to a subprocess, which takes advantage of dgenerate's caching behaviors within the directory while avoiding excessive memory fragmentation by confining a medium sized chunk of examples to one process.
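
The idea of one subprocess per top level folder can be sketched as follows. This is not the actual run.py; `run_isolated`, `demo_command`, and the folder names are hypothetical, and the command run in each subprocess is a trivial stand-in for "run every example in this folder".

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_isolated(root, make_command):
    """Run one subprocess per top-level folder under root.

    make_command(folder) must return an argv list. Returns a dict mapping
    each folder name to the return code of its subprocess.
    """
    results = {}
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # Each folder gets a fresh process, so memory fragmentation
        # accumulated by one folder ends when that process exits.
        completed = subprocess.run(make_command(folder))
        results[folder.name] = completed.returncode
    return results


def demo_command(folder):
    # Hypothetical stand-in for running the examples inside the folder.
    return [sys.executable, "-c", "print('ran examples')"]


with tempfile.TemporaryDirectory() as root:
    for name in ("animations", "upscaling"):
        (Path(root) / name).mkdir()
    codes = run_isolated(root, demo_command)

print(codes)
```

Everything inside one folder still shares a process, so caches built up by earlier examples in that folder remain available to later ones.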

There is also now an option --torch-debug in the run.py script which if enabled will try to dump information about objects stuck in VRAM after an OOM condition, and generate a Graphviz graph of possible reference cycles. Currently I cannot find any evidence of anything sticking around after dgenerate tries to clean up VRAM.

dgenerate now sets a PYTORCH_CUDA_ALLOC_CONF value max_split_size_mb of 512 before importing torch.

It also sets PYTORCH_CUDA_LAUNCH_BLOCKING to 0 by default.

These can be overridden in your environment.
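
One way to set defaults like these while still letting the environment override them is `os.environ.setdefault`. The sketch below is an assumption about the general approach rather than dgenerate's actual startup code; the variable names are taken from the notes above, and `max_split_size_mb:512` is the `key:value` form that PYTORCH_CUDA_ALLOC_CONF expects.

```python
import os

# setdefault leaves any value already present in the environment untouched,
# so settings exported by the user take precedence over these defaults.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:512")
os.environ.setdefault("PYTORCH_CUDA_LAUNCH_BLOCKING", "0")

# The variables are put in place before torch is imported, as described above.
# import torch

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```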

3.) Fetch CivitAI model links with --sub-command civitai-links

CivitAI has made a change to their website UI (or had some sort of outage) which makes right-click copying of direct API links to models no longer possible.

I have written a dgenerate sub-command that can fetch API hard links to CivitAI models on a model page and display them to you next to their model titles.

The links that this command generates can be given directly to dgenerate, or used with the \download directive in order to download the model from CivitAI.

You can use dgenerate --sub-command civitai-links https://civitai.com/models/4384/dreamshaper for example to list all available model links for that model using the CivitAI API.

You can use the --token argument of the sub-command to append an API token to the generated links, which is sometimes needed for downloading specific models.

You can also use this sub-command as the directive \civitai_links in a config / shell mode or the Console UI.

See: dgenerate --sub-command civitai-links --help, or \civitai_links --help from a config / shell mode or the Console UI.
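
For scripting, the command line shown above can be assembled from Python. This is a small sketch: the helper name `civitai_links_command` is hypothetical, and placing --token after the URL is an assumption about argument order. Only the flags named in these notes are used.

```python
import shlex


def civitai_links_command(model_page_url, token=None):
    """Build the argv list for the civitai-links sub-command."""
    argv = ["dgenerate", "--sub-command", "civitai-links", model_page_url]
    if token is not None:
        # Optional API token, appended to the generated links by the sub-command.
        argv += ["--token", token]
    return argv


command = civitai_links_command("https://civitai.com/models/4384/dreamshaper")
print(shlex.join(command))
```

Passing the list to `subprocess.run(command)` would execute it, assuming dgenerate is installed and on PATH.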

4.) Config / Shell - Environmental Variable Manipulation

You can now use the directives \env and \unset_env to manipulate environmental variables.


# using with no args prints the entire environment

\env

# you can set multiple environmental variables at once

\env MY_ENV_VAR=1 MY_ENV_VAR2=2


# undefine them in the same manner

\unset_env MY_ENV_VAR MY_ENV_VAR2

See: dgenerate --directives-help env unset_env

5.) Config / Shell - Indirect Assignment

The config / shell language that is built into dgenerate now supports indirect assignment.

You can use a basic template expansion or environmental variable expansion to select the name of a template variable.

This now works for \set, \sete, \setp, and \env.

It also works for \unset and \unset_env.

All other directives which accepted a variable name already supported this.


\set var_name foo

\set {{ var_name }} bar

# prints bar

\print {{ foo }}


\env VAR_NAME=BAZ

\env $VAR_NAME=qux

# prints qux

\print $BAZ
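
The semantics of the first example can be modeled in a few lines of Python. This is only a toy model of indirect assignment, not the config language's real parser: `set_variable` is a hypothetical helper, and it handles just the simple `{{ name }}` form.

```python
variables = {}


def set_variable(name_expression, value):
    """Assign value to a name, expanding a '{{ name }}' expression first."""
    text = name_expression.strip()
    if text.startswith("{{") and text.endswith("}}"):
        # The expression refers to another variable whose value is the
        # name of the variable that should actually be assigned.
        text = variables[text[2:-2].strip()]
    variables[text] = value


set_variable("var_name", "foo")        # \set var_name foo
set_variable("{{ var_name }}", "bar")  # \set {{ var_name }} bar
print(variables["foo"])  # bar
```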

6.) Config / Shell - Feature Flags and Platform Detection

The config template functions have_feature(feature_name) and platform() have been added.


# have_feature returns bool

# Do we have Flax/Jax?

\print {{ have_feature('flax') }}

# Do we have NCNN?

\print {{ have_feature('ncnn') }} 


# platform() returns the platform.system() string from Python's platform module

# prints: Windows, Linux, Darwin, etc.

\print {{ platform() }}
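
For comparison, the plain Python equivalents look roughly like this. `platform.system()` is the call named above; `have_module` is a hypothetical stand-in for a feature check, and how have_feature is actually implemented inside dgenerate is not stated here.

```python
import importlib.util
import platform


def have_module(module_name):
    """Return True if the named top-level module can be located."""
    return importlib.util.find_spec(module_name) is not None


# The same string the platform() template function is described as returning.
print(platform.system())

# True only when the ncnn package is installed in this environment.
print(have_module("ncnn"))
```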

7.) Exception handling fixes in dgenerate.invoker

The methods in this library module were only capable of throwing dgenerate.DgenerateUsageError when they should have been throwing more fine-grained error types when requested to do so with throw=True.

8.) Config / Shell - Parsing fixes

Streaming heredoc templates discarded newlines from the end of the jinja stream chunks, resulting in hard to notice issues with jinja control structures used as top level templates, mostly when the result of the heredoc template was being interpreted by the shell.

9.) Image processor library API improvements

Image processors will now throw when you pass a PIL image that possesses a mode value that the processor cannot understand.

Currently, all image processors only understand RGB images.
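
A mode check of this kind might look like the sketch below. It is an illustration only: `check_image_mode` is a hypothetical helper, `ValueError` is an assumed exception type, and a `SimpleNamespace` stands in for a PIL image so the example runs without Pillow installed.

```python
from types import SimpleNamespace

SUPPORTED_MODES = {"RGB"}


def check_image_mode(image):
    """Raise ValueError if the image mode is not one the processor handles."""
    if image.mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported image mode: {image.mode!r}")
    return image


# Only the .mode attribute is inspected, so a simple stand-in object works.
rgba_like = SimpleNamespace(mode="RGBA")
try:
    check_image_mode(rgba_like)
except ValueError as error:
    print(error)  # unsupported image mode: 'RGBA'
```

With a real PIL image, calling `image.convert("RGB")` before handing it to a processor avoids the error.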

10.) Console UI updates

Removed antiquated recipes related to image upscaling in favor of Generic Image Process and Generic Image Process (to directory).

From the generic image process recipes you can just select the upscaler or upscaler-ncnn processor from a drop-down and fill out its parameters to perform upscaling.


All image processors now expose parameters provided by their base class in the UI, such as device, output-file, output-overwrite, and model-offload.

This makes it possible to select a debug image output location with a file select dialog. This is useful if you are trying to use an image processor as a pre-processor for diffusion and need to see the image that is being passed to diffusion for debugging purposes.

The device argument is hidden in the UI where not applicable, such as the Generic Image Process recipes where the UI selects the device for the whole command instead of via an image processor URI argument.

The device URI argument for image processors is available when selecting pre / post processors for AI image generation from the UI as well as when using the Insert Image Processor URI edit feature.


You can now specify the frame-start and frame-end URI arguments for frame slicing when using the Image Seed URI builder UI.


Fixed minor syntax highlighting bugs related to indirect variable assignments.