I’m curious about the design choice of using Transformer2DModel as the parent class for Mamba2DModel.
Are there specific benefits to this approach that I might not be aware of? I find it a bit confusing since the model is called Mamba but heavily relies on Transformer components.
Looking forward to your clarification.
Thanks!
This is because Transformer2DModel in diffusers is the class that implements diffusion transformer (DiT) models.
When I developed DiM, I wanted to reuse or ablate the DiT modules conveniently (for example, the various normalization types), so I chose Transformer2DModel in diffusers as the parent class for Mamba2DModel.
In the final version of DiM, many of those modules were modified, but the parent class was kept.
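To make the pattern concrete, here is a minimal sketch of what "keep the parent, swap the modules" can look like. It uses toy stand-in classes in plain Python, not the real diffusers or DiM code: `ToyTransformer2DModel`, `ToyMamba2DModel`, `build_block`, and the listed norm names are all hypothetical and only illustrate the idea.

```python
# Toy stand-ins only: these are NOT the diffusers or DiM classes.

class ToyTransformer2DModel:
    """Hypothetical parent: owns the norm selection and the block stack."""

    # Illustrative option names, assumed for this sketch.
    NORM_TYPES = ("layer_norm", "ada_norm", "ada_norm_zero")

    def __init__(self, num_layers=2, norm_type="layer_norm"):
        if norm_type not in self.NORM_TYPES:
            raise ValueError(f"unknown norm_type: {norm_type}")
        self.norm_type = norm_type
        # The parent decides how many blocks exist; subclasses decide what they are.
        self.blocks = [self.build_block(i) for i in range(num_layers)]

    def build_block(self, index):
        # Parent default: an attention-style block.
        return f"attention_block_{index}[{self.norm_type}]"

    def describe(self):
        return list(self.blocks)


class ToyMamba2DModel(ToyTransformer2DModel):
    """Hypothetical child: inherits the norm handling, replaces the blocks."""

    def build_block(self, index):
        # Only the sequence-mixing block changes; norm_type comes from the parent.
        return f"mamba_block_{index}[{self.norm_type}]"


if __name__ == "__main__":
    model = ToyMamba2DModel(num_layers=2, norm_type="ada_norm")
    print(model.describe())
```

The child gets the parent's configuration options without reimplementing them, which makes ablations cheap. The trade-off is the one raised in the question: the class hierarchy says "Transformer" even when most blocks no longer are.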