
is not an issue #98

Open
azrahello opened this issue Nov 17, 2024 · 5 comments

@azrahello

azrahello commented Nov 17, 2024

Sometimes I need to change the sigmas to get more detail in the generation, and there are nodes in ComfyUI, as well as techniques like noise injection, that allow this. It’s similar to sd-webui-detail-daemon. Would it be possible to add a CLI argument to adjust the sigmas?
P.S. Could you also add a way to save latents and to pass latents back into the sampler, or something similar, so we can do a second pass, an upscale, or a refinement? I think these functions would help make the nodes more standard and consistent with other ComfyUI nodes.
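
A rough sketch of the kind of sigma adjustment detail-daemon performs; the function and parameter names here are illustrative, not an existing mflux CLI option:

```python
import numpy as np

def adjust_sigmas(sigmas, detail_amount=0.1, start=0.2, end=0.8):
    """Scale down the middle of the sigma schedule (detail-daemon style).

    `sigmas` is the sampler's noise schedule from highest to lowest; shrinking
    the middle values makes the sampler remove a little less noise than the
    schedule expects, which tends to leave extra high-frequency detail.
    """
    sigmas = np.asarray(sigmas, dtype=np.float32).copy()
    n = len(sigmas)
    lo, hi = int(n * start), int(n * end)
    sigmas[lo:hi] *= (1.0 - detail_amount)
    return sigmas
```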

@raysers

raysers commented Nov 18, 2024

Hi @azrahello, my ComfyUI plugin is still being updated, but the recent updates have only been minor improvements to small features, so I haven’t posted update information in #56.

Even so, implementing these small features has caused me quite a bit of trouble. For example, when trying to enable ComfyUI to correctly interrupt the image generation process using the interrupt button, I found that the previous calling method couldn’t achieve this. Since it relies on calling a wrapped generator, ComfyUI cannot externally control the mflux generation or implement arbitrary interruptions (pressing the interrupt button doesn’t stop the tqdm progress bar, which keeps running). As a last resort, I had to deconstruct the existing mflux generation files and add parts controlled by ComfyUI.
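
A minimal sketch of that kind of per-step interrupt hook, assuming ComfyUI's comfy.model_management helpers; the mflux loop names below are hypothetical, not the plugin's actual code:

```python
import comfy.model_management

def check_interrupt(step, total_steps):
    # Raises InterruptProcessingException if the user pressed ComfyUI's
    # interrupt button, which unwinds out of the sampling loop.
    comfy.model_management.throw_exception_if_processing_interrupted()

# Inside an unrolled mflux-style denoising loop (hypothetical names):
# for t in range(num_inference_steps):
#     latents = run_transformer_step(latents, t)
#     check_interrupt(t, num_inference_steps)
```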

I’m sharing this in the hope that it might provide you with some help. Of course, the best solution for the functionality you want would be for the official team to implement it. However, if you urgently need to figure it out yourself, you might consider checking out the core.py file in my plugin. It’s currently just a very basic deconstruction, but it might offer you some inspiration. I can’t guarantee it, but I’m offering it as a reference. If it does help, I’d be really happy, my friend.

@azrahello
Author

azrahello commented Nov 19, 2024

Yes, I know you’re doing an excellent job. I was asking about these implementations from an “MFlux Pro and Ultra” perspective. With a latent image, or a similar object, would it be possible to perform a sort of “second pass” and/or noise injection, along with an upscaler and a refiner, all without having to decode the image out of latent space and back?

By the way, I haven’t seen the latest changes to your node, but I’ll check them out and update it. PS: I discovered a LoRA that doesn’t work if I use the model without quantization, but it works when I quantize it.

@raysers

raysers commented Nov 19, 2024

> Yes, I know you’re doing an excellent job. I was asking about these implementations from an “MFlux Pro and Ultra” perspective. With a latent image, or a similar object, would it be possible to perform a sort of “second pass” and/or noise injection, along with an upscaler and a refiner, all without having to decode the image out of latent space and back?
>
> By the way, I haven’t seen the latest changes to your node, but I’ll check them out and update it. PS: I discovered a LoRA that doesn’t work if I use the model without quantization, but it works when I quantize it.

I understand the functionality you're looking for: passing latent data in and out, similar to the secondary sampling and high-resolution upscaling in the dual-model refinement approach of SDXL in the past. This is an excellent feature. To achieve it, we would need to split the code and extract the logic related to latent processing. Currently, the mflux plugin can only integrate prompt-based plugins and post-processing plugins for images. With every new interface added, the number of compatible plugins will significantly increase (depending on how many plugins operate on that interface).

Ideally, the best scenario would be to design a series of standard interfaces compatible with Comfy, so it could seamlessly integrate into the entire ecosystem.

However, achieving this still requires continuous learning. If we ever manage to implement these designs, that would mark the moment when mflux truly reaches its full potential—the true "Mflux Ultra." For now, though, these are just aspirations and ideas for a better future.
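
As a sketch of the latent in/out interface discussed above: ComfyUI passes latents between nodes as a {"samples": tensor} dict, so a bridge from mflux's mlx-side latents might look roughly like this (the packing and reshaping details of mflux's internal latents are glossed over, and the function names are hypothetical):

```python
import numpy as np
import torch

def mlx_latent_to_comfy(latent_mx):
    """Hypothetical bridge: wrap an mflux (mlx) latent as a ComfyUI LATENT dict.

    ComfyUI nodes exchange latents as {"samples": torch.Tensor[B, C, H//8, W//8]};
    Flux-family models use 16 latent channels. mflux may keep its latents in a
    packed layout, so a real bridge would also need to unpack/reshape them.
    """
    arr = np.array(latent_mx)                  # mlx array -> numpy
    return {"samples": torch.from_numpy(arr)}  # numpy -> torch, wrapped for Comfy

def comfy_latent_to_mlx(latent):
    """Reverse direction, e.g. to feed a second pass back into mflux."""
    import mlx.core as mx
    return mx.array(latent["samples"].cpu().numpy())
```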

P.S. The discovery you mentioned about LoRA is quite intriguing. If this is a bug, I believe the official team should take note of it.

P.S.2. I’m currently researching how to integrate the step-by-step images from mflux 0.4.X into Comfy’s real-time preview. There’s no information online about real-time previews for custom samplers (some integrated nodes are based on K-samplers), so this requires independent research, and progress has been very slow.
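
For what it's worth, core ComfyUI samplers push previews through comfy.utils.ProgressBar, whose update_absolute() accepts an optional ("JPEG", PIL image, max size) tuple. A hedged sketch of wiring mflux's stepwise images into that, assuming a small PIL image is available at each step, might look like:

```python
import comfy.utils

class StepPreview:
    """Hypothetical hook: forward mflux's per-step decoded images to ComfyUI's
    live preview, the way latent_preview does for the built-in samplers."""

    def __init__(self, total_steps):
        self.pbar = comfy.utils.ProgressBar(total_steps)

    def on_step(self, step, pil_image):
        # Sends both the progress fraction and a preview frame to the UI.
        self.pbar.update_absolute(step + 1, self.pbar.total, ("JPEG", pil_image, 512))
```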

@azrahello
Author

Exactly! You perfectly understood my intention! :D Regarding the preview of the latent image in ComfyUI, I was thinking that the stepwise output in MFLUX and the TAESD preview in ComfyUI might be somewhat similar “objects” that could potentially be ported from one to the other.

Additionally, I was considering using “mlx_lm,” “mlx_vlm,” “mlx-phi,” or “mlx-llava” (e.g., Phi-3-Vision-MLX, mlx-vlm, or mlx-llava) for generating descriptions based on images.

I’ve realized that when using Ollama, the RAM usage during and after execution is not exactly “lightweight.” Yesterday, I started looking into some code to see if this could be feasible.

PS: I’m taking Python lessons to try to help you with development work.

@raysers
Copy link

raysers commented Nov 20, 2024

@azrahello That's great! I'm happy to hear that you're learning. Learning brings happiness, and I've always viewed working on this plugin as a learning process, and you'll find that progress comes naturally.

And I saw “LLaVA”. If you’re referring to an image captioning (image-to-prompt) plugin, I’ve always used Florence-2. However, I haven’t used it in a while, and there’s a newer version now. According to reviews, it only requires 1-2 GB of VRAM, and its captioning results can reportedly rival JoyCaption, the current leader in the field, especially when the aspect ratio of the reference image matches, in which case the output is even better.
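
For reference, a minimal Florence-2 captioning sketch via Hugging Face transformers (not MLX); the model id and task prompt follow the model card, but check it for current usage:

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-base"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg")          # any local test image
task = "<MORE_DETAILED_CAPTION>"           # Florence-2 task prompt for long captions
inputs = processor(text=task, images=image, return_tensors="pt")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=256,
)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(caption)
```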

A few months ago, I also tried porting MLX's LLAMA 3.1 to COMFYUI, but the results were disappointing and even made me feel that it was somewhat unintelligent. I was using the 4-bit version. Maybe the issue was with me—I made the porting process too simple. The only reference I had was a small snippet of code on the HUGGINGFACE model page:

```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Meta-Llama-3-8B-Instruct-4bit")
response = generate(model, tokenizer, prompt="hello", verbose=True)
```

I just improvised with this simple code, and as you can imagine, the result was a complete mess—an LLM that gave irrelevant answers. And to this day, I still haven't started learning MLX...
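
One likely reason for the irrelevant answers: instruct-tuned Llama models expect the chat template, not a bare string. A small sketch of the fix, following mlx_lm's usual pattern (worth double-checking against the current mlx_lm README):

```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Meta-Llama-3-8B-Instruct-4bit")

# Wrap the user message in the model's chat template instead of passing "hello" raw.
messages = [{"role": "user", "content": "hello"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
response = generate(model, tokenizer, prompt=prompt, verbose=True)
```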
