stable-diffusion : ggml-alloc integration and gpu acceleration #75
Conversation
Nothing comes directly to mind. Need to have a more detailed look at the code when I get some time.
Great job! I'll find some time to review and merge it.
@FSSRepo thanks.
Otherwise I get:
I will try to reproduce this, because it's strange behavior. Can you provide more information: operating system, hardware details?
Of course: Windows 11 / VS / CUDA / RTX 3090 / i7-12700k / 32GB RAM / model: v2-1_512-ema-pruned-ggml-model-f16.bin. With the main branch I get this error:
245 to 257 is 12 lines; or do you mean 245 to 247?
Yes, my mistake (and I still get the error with around 20GB left on my system drive)...
With
The backend used for compute is fixed to CPU: `backend = ggml_backend_cpu_init();`
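(As an aside, a minimal sketch of how backend selection could later become conditional; `GGML_USE_CUBLAS` and the exact `ggml_backend_cuda_init` signature depend on the ggml revision, so this is an assumption, not this PR's code:)

```c
#include "ggml-backend.h"   // ggml_backend_t, ggml_backend_cpu_init
#ifdef GGML_USE_CUBLAS
#include "ggml-cuda.h"      // ggml_backend_cuda_init
#endif

// Hypothetical selection: prefer CUDA when ggml was built with cuBLAS,
// otherwise fall back to the CPU backend that is currently hard-coded.
ggml_backend_t backend = NULL;
#ifdef GGML_USE_CUBLAS
backend = ggml_backend_cuda_init(); // some revisions take a device index
#endif
if (backend == NULL) {
    backend = ggml_backend_cpu_init();
}
```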
I see. Interestingly, it is already a little faster (~8 vs ~10 seconds/step).
Possibly matrix multiplications are being delegated to the GPU. The ultimate goal of this PR is to perform 100% of the computation on the GPU and see what performance I get on my 4GB RTX 3050 laptop. Tomorrow I will finish the kernels that are missing and enable the CUDA backend.
Were you using Stable Diffusion 2 when this error happened?
Why does that matter in this context?
@FSSRepo yes, it was with Stable Diffusion 2. I had that error
Instead of calling `ggml_allocr_alloc` for each tensor individually, you can also use a similar loop:

```c
for (struct ggml_tensor * t = ggml_get_first_tensor(ctx); t != NULL; t = ggml_get_next_tensor(ctx, t)) {
    ggml_allocr_alloc(alloc, t);
}
```

Caveat: the functions to enumerate the tensors in a context are currently only in the llama.cpp repository; they still need to be synced to the ggml repo. Related: ggerganov/ggml#578
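(For reference: in this loop, `ctx` is the ggml context that owns the tensors and `alloc` is the `ggml_allocr` instance; the loop visits every tensor in the context and reserves space for it in the allocator's buffer.)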
Why did you close this? |
To resolve merge conflicts, I rebased the master branch of FSSRepo/stable-diffusion.cpp onto the latest master branch of leejet/stable-diffusion.cpp. I also updated the URL for GGML to https://github.com/FSSRepo/ggml/tree/master to make it easier to review the changes and facilitate future merges.
It seems like I mistakenly closed this pull request. |
I will consider whether it's a good idea to do this in the future. It would be best for me to create a specific branch for stable diffusion, where I will make the necessary modifications during the progress of this pull request, since I am using the master branch of FSSRepo/ggml for another pull request in the original ggml repository.
I am unable to push my changes to https://github.com/FSSRepo/stable-diffusion.cpp/tree/master at the moment, which prevents me from reopening this pull request. Can you please pull https://github.com/leejet/stable-diffusion.cpp/tree/pr75 and push the corresponding changes to https://github.com/FSSRepo/stable-diffusion.cpp/tree/master? This way, I can reopen this pull request.
Try now. I created a new branch.
Sorry for the inconvenience |
Don't worry. Anyway, I will need your help to understand how some aspects of this stable diffusion implementation work.
I still can't reopen this PR. It says 'there are no new commits.' It might require you to push the new changes to master.
I will try it |
I created the branch sd-cpp-fixes, where I will be making the changes during the development of this PR. Can you change this submodule? @leejet This will need to be done in your main repository, and then the changes applied to my fork. Afterward, I will add the changes to the
I've updated the submodule 'ggml' in this branch https://github.com/leejet/stable-diffusion.cpp/tree/pr75 to the latest commit of FSSRepo/sd-cpp-fixes, and this branch also includes the changes you made to stable-diffusion.cpp earlier.

You can pull this branch to your local machine, rebase your local stable-diffusion.cpp master branch onto it, and then push the changes to https://github.com/FSSRepo/stable-diffusion.cpp/tree/master.
@FSSRepo the timer for sampling is not reset between images, so the reported sampling time grows longer and longer. Not sure how to properly display this.
One batch generation works well for me, but if I start a second batch (also with size 1) the app crashes and I get this error message:

Anyway, it only happens with my openFrameworks implementation (on both Linux and Windows)...
@Jonathhhan I don't notice any change when adding this part; it continues to generate in the same directory:

```cmake
set_target_properties(${SD_LIB} PROPERTIES
    BUILD_WITH_INSTALL_RPATH FALSE
    LINK_FLAGS "-Wl,-rpath,$ORIGIN/")
```
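(For context: `$ORIGIN` in an rpath is expanded by the dynamic loader at run time to the directory containing the binary, so this flag makes the executable search for shared libraries next to itself.)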
@FSSRepo yes, it is still generated in the same directory. The difference is that, if I run the program with the change
Does this fix your problem?

```
GGML_ASSERT: /mnt/c/Users/Jonat/Desktop/cp/stable-diffusion.cpp/ggml/src/ggml.c:4065: ggml_can_mul_mat(a, b)
CUDA error 4 at /mnt/c/Users/Jonat/Desktop/cp/stable-diffusion.cpp/ggml/src/ggml-cuda.cu:8550: driver shutting down
current device: 0
Segmentation fault
make: *** [../../../libs/openFrameworksCompiled/project/makefileCommon/compile.project.mk:191: RunRelease] Error 139
```
@FSSRepo no, it's a different one. Maybe that is (really no idea) because the buffers/contexts are freed after each batch, and they should only be freed when the program is closed, or recreated when a new model is loaded? Edit: it works if I do not free the context at the end of
With
@Jonathhhan check now.

@FSSRepo it works. Thanks.
Hi, one question: does it work on iOS?
@paulocoutinhox Yes, it can be done, but it requires creating the missing kernels for the Metal backend based on the ones I have created for cuBLAS. I don't have a Mac or iPhone, and I don't know much about Metal programming, so I am the least qualified to contribute anything.
Nice. But if I run it today on an iPhone, what happens? Will it run on the CPU?
Short options should be only a single character.

Yes.

Employ PascalCase for class/struct/enum names and underscore_case for variables/functions/methods.
I've tested it on Windows/Linux/macOS, and now it's working perfectly. The PR is ready for merging. Thank you for your excellent work!
Context
I had been waiting for a response in #72 from @leejet regarding updating the GGML version so I could integrate `ggml-alloc`, the new memory management approach for GGML (also necessary for integrating the CUDA backend). He synchronized it with the latest changes in @ggerganov's GGML repository, but it seems that in the process the GGML library stopped working directly (at least on my computer it causes issues when computing), because it still has the dynamic mode integrated. I'm using this ggml version in this PR.
I have removed the ggml submodule and directly implemented one that I already have, in which I refactored the Conv1D and Conv2D operations to use `ggml_im2col` and `ggml_mul_mat` (a sketch of that decomposition follows). This will make it easier for me to create the different CUDA kernels.
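For illustration, here is a minimal sketch of that conv2d = im2col + matmul decomposition; shapes follow ggml's `ne[0]`-fastest layout, and the exact `ggml_im2col` signature (e.g. a destination-type argument) varies across ggml revisions, so treat this as an assumption rather than the exact code in this PR:

```c
#include "ggml.h"

// Sketch: 2D convolution as im2col followed by one matrix multiplication.
// a = kernel [KW, KH, IC, OC], b = input [W, H, IC, N].
static struct ggml_tensor * conv2d_via_im2col(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,   // kernel
        struct ggml_tensor  * b,   // input
        int s0, int s1,            // stride
        int p0, int p1,            // padding
        int d0, int d1) {          // dilation
    // Unroll every receptive field into a row: [IC*KH*KW, OW, OH, N].
    struct ggml_tensor * im2col = ggml_im2col(ctx, a, b, s0, s1, p0, p1, d0, d1, /*is_2D=*/true);

    // One big matmul replaces the convolution; in ggml's ne ordering the
    // result is [N*OH*OW, OC].
    struct ggml_tensor * result = ggml_mul_mat(ctx,
        ggml_reshape_2d(ctx, im2col, im2col->ne[0], im2col->ne[3] * im2col->ne[2] * im2col->ne[1]),
        ggml_reshape_2d(ctx, a, a->ne[0] * a->ne[1] * a->ne[2], a->ne[3]));

    // Fold the flat result back into an image shape: [OW, OH, N, OC]
    // (a final permute can reorder N and OC if needed).
    return ggml_reshape_4d(ctx, result, im2col->ne[1], im2col->ne[2], im2col->ne[3], a->ne[3]);
}
```

This is why each backend only needs an `im2col` kernel plus the existing matmul, instead of a dedicated convolution kernel.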
I've noticed that `ggml-alloc` consumes much more memory than leejet's current `dynamic mode`. I'm likely overlooking some configuration to prevent extensive compute graphs from consuming too much memory with `ggml-alloc`.

Here are the comparative details and results of the integration with `ggml-alloc`.

Note: afec505 is the stable-diffusion.cpp commit that worked for me.
Memory usage:
Timings:
Result image (the same in both cases):
Over the next few days, I will be making some improvements, investigating and gathering feedback on `ggml-alloc` to avoid the current memory usage; once I have fixed that, I can start working on implementing the CUDA kernels.

Progress overview
Here's a summary of what I will be carrying out in the coming days to complete this PR:
ggml operations that stable diffusion uses, and their availability across backends:

[table of ggml operations, e.g. `ggml_repeat`, with per-backend availability]
Tasks to complete this PR:

- Use the `ggml-alloc` API correctly; I might be missing something that could help me reduce unnecessary memory usage during compute. Can you help me? @slaren @ggerganov (see the sketch after this list)
- Create the missing kernels: `ggml_concat`, `ggml_group_norm`, `ggml_upscale`, `ggml_gelu_quick`.
- Use the `.gguf` format to simplify the loading of tensors.
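As referenced in the first task, here is a minimal sketch of the measure-then-allocate flow of the `ggml-alloc` API; `build_graph` is a hypothetical stand-in for the model's graph builder, and exact signatures vary across ggml revisions:

```c
#include <stdlib.h>
#include "ggml.h"
#include "ggml-alloc.h"

// Hypothetical: builds the compute graph for one evaluation.
struct ggml_cgraph * build_graph(struct ggml_context * ctx);

void eval_with_ggml_alloc(struct ggml_context * ctx) {
    const size_t align = 32; // tensor alignment

    // Measure pass: a measure allocator walks the graph and records the
    // worst-case buffer size without allocating real memory.
    struct ggml_allocr * alloc = ggml_allocr_new_measure(align);
    size_t mem_size = ggml_allocr_alloc_graph(alloc, build_graph(ctx));
    ggml_allocr_free(alloc);

    // Real pass: one buffer of the measured size is reused for every
    // evaluation instead of growing dynamically.
    void * buf = malloc(mem_size);
    alloc = ggml_allocr_new(buf, mem_size, align);
    ggml_allocr_alloc_graph(alloc, build_graph(ctx));
    // ... ggml_graph_compute(...) on the allocated graph ...

    ggml_allocr_free(alloc);
    free(buf);
}
```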