-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add image tiling nodes to enable Tiled Upscaling #5142
Conversation
I've looked over the code for this PR and I have a few thoughts. There are a few things that are very useful in the tiled upscaling process we've been using in Discord (via @skunkworxdark's nodes) that I have found to be pretty essential for getting the best results. As I've been testing and using those nodes extensively over the past few days, I'd like to share some comments on what they offer as it relates to what's included and discussed in this PR. Most prominently is the inclusion of @JPPhoto's smart seam implementation, which locates an optimal seam between two adjacent tiles - this seam (plus a specified margin) is used for the overlap blending, which greatly helps to obscure the location of seams. This leads into the discussion about whether a VAE decode can simply be done at the end, rather than per-tile. The smart seam algorithm operates in image space, so it outright requires decoding the tiles individually. Beyond that, another important factor in why I think individual tile decodes are essential is that VAE decode can be costly (unless itself tiled), and if a user is already generating tiles of a size they customarily generate successfully, decoding the tiles individually means users can be assured the decode will be successful as well. Regarding tile size, it's been my experience that large tiles are no particular problem, or at least, are a known problem for anyone who does non-tiling high res optimization node workflows already. Giving the user the power to specify the size of tiles allows users to make tiles as large as they want up to whatever will fit into their system's VRAM. Conversely, providing a mechanism to automatically calculate tile size based on a given number of divisions is very convenient when VRAM is not particularly limited. Both of the aforementioned tile size calculation approaches are covered in @skunkworxdark's nodes, one with his own implementation of a tiler and the other with an implementation by @JPPhoto. In the latter, the user specifies a minimum overlap in pixels, whereas the former just takes an overlap percentage between images, which is assured by the algorithm. I think the different interfaces provided by those two tiling nodes are both useful tools and elegantly implemented. Finally, the ability to sample a single noise tensor, generated at the full upscaled size [and cropped for each tile], is something we've been doing which has proven helpful in a couple of ways. Unfortunately this requires each tile to align to pixel multiples of 8, which further complicates things as these tiles may not all be exactly the same size. Thus, in order to crop segments of the large noise tensor for each individual tile generation, the coordinates of that tile must be provided along with the image for denoising. The purpose of the large noise field is twofold: one, the overlaps between upscaled tiles will both have the same underlying noise, which causes them to blend a bit more smoothly together. Two, if a user is not using pure white noise [a subject for another discussion], then discontinuities could arise in the noise structures between two adjacent tiles. |
@dwringer, thanks for the review! I'll do my best to respond to all of the points you raised. Smart Seam
Smart seam sounds great. I think it makes more sense to add in a future PR though. It should be straightforward to add as a minor version bump to the Merge in image vs. latent space
The workflow shared in this PR does what you are describing (decodes tiles, then merges in image space) - so, I think we are on the same page here. I haven't experimented with merging in latent space. If we did want to support this in the future, I can think of a few possible paths. The simplest would be to just do both tiling and merging in latent space - the current nodes should work as-is for this. Preferred Tiling Interface
It sounds like we have found that different schemes for specifying the tiling are convenient under different circumstances:
To support different tiling schemes, we'll want to have a different tiling node for each scheme (since they have different input APIs). We can add as many tiling schemes as we need in future PRs, but I think the key right now is to agree on a tiling output representation that all of these schemes can share (to ensure that they can all share the same tile merging nodes). @dwringer What do you think of the current Global noise
The workflow attached to this PR generates a single global noise tensor, as you've suggested. With regards to multiple-of-8 errors, the current implementation takes the following approach:
@dwringer, does this handling make sense to you? Or did you have something else in mind? |
@RyanJDick, My first impressions of using the workflow you provided. It seems that you have gathered most of the essence of what has already been done in my nodes. I have only had a brief look at the code and hopefully, I will get a chance to have a deeper dive into that tomorrow. I am not 100% sure why so much effort has gone into rewriting what already exists but maybe when I look at the code in depth I will see.
|
Thanks for the detailed review, @skunkworxdark. I totally used I'll address that and get to the rest of your comments, but probably not until Monday. |
Here's the tiled upscaling workflow, fixed to I'm not sure where to get missing crop node, so I haven't actually tested running it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't reviewed yet, but I noticed that this PR somehow does not need a review to merge. This makes me uncomfortable, so I've requested changes to hopefully prevent this accidentally merging.
fyi the crop node is from my nodes and can be found in this branch. https://github.com/skunkworxdark/XYGrid_nodes/tree/Skunk-Exact-division-Tiling |
Just had a go at rebuilding the workflow after @psychedelicious changes as for some reason I couldn't load his fixed workflow. I was just hit with loads of errors. While rebuilding the workflow I found a couple of silly things.
|
@skunkworxdark to use the fixed workflow I uploaded you'd need to rebase this PRs branch on my branch from #5157. Not expected to work on this PR directly |
@psychedelicious that exactly what I thought I did. Do I need to do a yarn build aswell? |
Ah yeah, would need to rebuild the UI (or run it in dev mode with |
Oh yeah I always run it with yarn dev. So not sure why I had issues. |
I identified and fixed some issues in #5157. But workflow loading itself works fine for me, after rebasing #5142 on #5157.
The tiled upscaling works great. Here's my test image: (I swapped out the scale image node for ESRGAN, used the RealESRGANx4 model and then dropped denoise_start down a bit) |
Nice test image. Looks like I have some work to do on my nodes. Any idea what release this might appear in? |
Release in 3.5 I believe, but I added back compatibility logic so it shouldn't break any existing nodes. Just print a warning. I'll test some other community nodes to double check I haven't broken anything though. Workflows made for 3.4 and before will be automatically migrated to the version. The workflow in this PR that had to be fixed needed fixing bc it was based on changes on the first custom types pr (which I will close now) |
I definitely don't want to duplicate the effort that you're already putting in. The spirit of this PR is to take the core ideas that you have developed and make them generally available to all Invoke users. Ideally, the version of this that gets merged is something that you would feel excited to contribute to as oppposed to feeling like a diverging implementation of your nodes.
Yep, was totally using it without even realizing it was a custom node. I copied it into this PR now. Hope that's ok!
The main reason for doing it separately is just for better modularity:
Unfortunately, there is a cost to doing it this way: more unnecessary PNG encode/decode roundtrips. But, this should be fixed globally rather than allowing it to influence node API decisions.
Do you think it's important to support all of these options in core? Is this something that you'd be interested in contributing? I'm thinking that it would probably make sense to add the alternative tiling schemes in follow-up PRs. But, we should make sure that the Tile representation chosen in this PR enables that (I think it does).
No strong use case in mind, so removed for now.
My thinking was that having extra overlap context could be helpful during the image-to-image operation, but you may want a smaller blend amount to avoid 'shadowing' artifacts. In practice, with global noise generation, 'shadowing' doesn't seem to be much of a problem, so the benefit is marginal.
I moved things around a little bit in the blending code to tweak the corner blending behavior. I'm not sure that I fully follow the issue that you're describing here. If you still think there's a problem can you add a comment directly to the relevant code?
It's a bit of a trade-off. On one hand, it would be nice to exit as soon as an error can be detected. On the other hand, doing the multiple-of-8 check in the initial tiling node means that we are restricting it unnecessarily. This could be inconvenient for tiling workflows that do not have this constraint (e.g. if they operate entirely in image space). As a rule of thumb, I think it is best for nodes to only assert input requirements that are strictly required for them to run.
I created a workflow that roughly matches your configs (tile controlnet, no IP-Adapter, etc.). Visually, the result is looking nearly identical to me now. |
I have implemented both of these suggestions, thanks 🙂 |
Just re-based now that #5175 was merged, and changed the base branch to main. (Apologies to everyone who has already pulled, but this seemed like the best option given how much changed in #5175.) Here is an updated workflow for after the rebase: |
Addressed in bb87c98 |
Ok, to summarize where we are at, I think the main open questions that remain are around the plan for 1) optimized tiling schemes (even-split, minimum-overlap), and 2) smart seams. For both features, I see the following options:
If there are no strong objections, I'd like to eliminate Now or LaterFrom the little bit of experimentation that I've done, both of these features seem to add complexity for marginal benefit. This is why I've been hesitant to jump on them right away. I'm definitely willing to change my mind though. @skunkworxdark Do you have test images you can share that showcase the benefits of either optimized tiling schemes or smart seam? Maybe the benefits just haven't been very pronounced on my choice of test images. |
I'm fine with eliminating 1, but for the sake of ensuring that this feature gets released "Ready to go", I'd encourage us to aim for 2 as the best course of action. I think both of the concerns raised (tiling schemes, and smart seam) are seemingly low-risk optimizations, that help intelligently handle edge-cases in the upscaling workflow. @RyanJDick - would you prefer that the other PRs be created branched off of this PR by @skunkworxdark & others in order to evaluate the full set of changes in those PRs? |
Sorry if this is duplicated but I accidentally managed to delete my last comment. You have done a good job with this so far and the workflow is looking much neater with the last few changes. I will try and find time tomorrow to look into what is involved in adding the extra tiling methods and the smart seam logic and report back. Can suggest 2 more minor aesthetic changes.
|
Addressed the latest UI requests. in 07e7c9e. Updated workflow: |
…se nodes is for use in tiled upscaling workflows.
…conflict with custom node. And other minor tidying.
…void having to calculate them with math nodes.
…eTilesToImageInvocation.
…ontally first and then vertically. This change achieves slightly more natural blending on the corners where 4 tiles overlap.
…geCropInvocation.
07e7c9e
to
972b854
Compare
For the sake of integration testing for those on main, and ensuring that we don't run into major conflicts, I'm going to approve this and merge. It sounds like we're comfortable with adding in other optimizations on top of these nodes as follow-on PRs that we can push to have in before a next release, barring any major hiccups that are run into in the PR cycle. |
Review hold no longer needed since upstream is now main.
Contributors
Big thanks to @JPPhoto, @dwringer, @skunkworxdark and @dunkeroni for their contributions in this discord discussion: https://discord.com/channels/1020123559063990373/1161727357309165608.
The implementation in this PR is based closely on their work.
Check out that discussion for a more-advanced, continually-evolving implementation of tiled upscaling.
What type of PR is this? (check all applicable)
Have you discussed this change with the InvokeAI team?
Have you updated all relevant documentation?
Description
This PR adds several new invocations to enable image tiling workflows:
CalculateImageTilesInvocation
TileToPropertiesInvocation
PairTileImageInvocation
MergeTilesToImageInvocation
Here is a sample Tiled Upscaling workflow to illustrate how these tiling invocations are intended to be used: tiled_upscaling.json
Known Limitations / Weirdness
During development of this feature there was lots of deliberation around how best to implement it within the 'nodes' ecosystem (a single tiled upscaling node vs. multiple nodes to enable tile splitting/merging). Below is a list of the known shortcomings of the selected implementation. Many of these point to improvements that could be made to node workflows in general.
Related Tickets & Documents
I previously posted this RFC with an alternative single-node implementation: #5109
After exploring it further, I decided to proceed with the approach in this PR. The biggest challenge with the single-node approach is that it is cumbersome to call into all of the necessary logic that is spread across the Stable Diffusion nodes (e.g. to support ControlNet, IP-Adapter, etc.).
QA Instructions, Screenshots, Recordings
Here is a sample Tiled Upscaling workflow: tiled_upscaling.json. Try it out and let me know what you think.
Please reflect on whether this is the right API for tiling. Now is the easiest time to change it.
Added/updated tests?
The core tiling logic has strong unit test coverage. Invocations are not currently covered.
TODO