What is the ultimate goal of flipping? #2

Open
dsdanielpark opened this issue Apr 8, 2024 · 1 comment

Comments

@dsdanielpark

Congratulations on the cool and quick work and outcomes.

Is the flip intended to capture more features? Is it from an augmentation perspective?

I'm curious if other attempts were made. Can you share rough experiments or ideation details that didn't make it into the official paper?

The way Mamba Mixer applies the Mamba architecture and overcomes Mamba's drawbacks so swiftly and intuitively was impressive. I think it will be a very valuable starting point. I'll wait for the official repo to be updated.

Thanks in advance.

@ABehrouz

ABehrouz commented May 6, 2024

Hello,

Thank you for your kind words, and I am very sorry for the delay in my response.

The main goal of using flip is to make the model non-causal. Without flipping, each token only has access to the information of the tokens before it. Channels, for example, are not causal, so this bi-directionality can help improve performance.
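
As a rough illustration of the point above, here is a minimal sketch in plain Python. It uses a toy linear recurrence as a stand-in for a causal scan and is not the MambaMixer implementation. The names `causal_scan` and `bidirectional_scan`, the `decay` parameter, and the choice to sum the two directions are all assumptions made for this example.

```python
def causal_scan(x, decay=0.5):
    """Toy causal recurrence (assumed stand-in for a scan): h[t] = decay * h[t-1] + x[t].

    Position t only sees positions 0..t.
    """
    h = 0.0
    out = []
    for v in x:
        h = decay * h + v
        out.append(h)
    return out


def bidirectional_scan(x, decay=0.5):
    """Run the causal scan forward and on the flipped input, then combine.

    Summing the two passes is an assumption for illustration only.
    """
    forward = causal_scan(x, decay)
    # Flip the input, scan, then flip the result back into the original order.
    backward = causal_scan(x[::-1], decay)[::-1]
    return [f + b for f, b in zip(forward, backward)]


impulse = [0, 0, 1, 0, 0]
causal_out = causal_scan(impulse)        # positions before the impulse stay at 0.0
bidir_out = bidirectional_scan(impulse)  # positions before the impulse are now non-zero
```

With an impulse in the middle of the sequence, the causal pass leaves the earlier positions at zero, while adding the flipped pass lets those positions see the impulse as well.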

Honestly, the current version is at a very preliminary stage, and we didn't perform extensive experiments on different architecture designs. We started our experiments by using MLP and GLU (similar to HGRN) as channel mixing methods and then tried Mamba. In the next version, we will present a relaxed version of channel mixing, which helps reduce the number of parameters without a drop in performance.

Thank you very much.
