
Does anyone know if this work is implemented on llama? #18

Open
wyxscir opened this issue Jan 25, 2024 · 5 comments

Comments

@wyxscir commented Jan 25, 2024

Does anyone know if this work has been implemented on Llama? Or is there any similar dynamic pruning work for Llama?

@XieWeikai commented Feb 19, 2024

This method doesn't work very well on Llama, because Llama uses the SiLU activation function, whose inherent sparsity is not very high. One paper mentions that it is possible to replace SiLU with ReLU and retrain Llama to improve sparsity.
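For anyone who wants a quick feel for what this means, here is a minimal sketch (not from the paper) that compares how often the gate activations of a Llama-style gated MLP land near zero under SiLU vs. ReLU. It assumes PyTorch; the layer sizes, random weights, and the 1e-3 near-zero threshold are illustrative placeholders, not values from any trained checkpoint.

```python
import torch
import torch.nn as nn

class GatedMLP(nn.Module):
    """Llama-style gated MLP: down_proj(act(gate_proj(x)) * up_proj(x))."""
    def __init__(self, hidden_size, intermediate_size, act):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
        self.act = act

    def forward(self, x):
        gate = self.act(self.gate_proj(x))  # the activations whose sparsity is being discussed
        return self.down_proj(gate * self.up_proj(x)), gate

def near_zero_fraction(t, eps=1e-3):
    """Fraction of activations with magnitude below eps (a rough sparsity proxy)."""
    return (t.abs() < eps).float().mean().item()

torch.manual_seed(0)
x = torch.randn(4, 64, 512)  # (batch, seq_len, hidden_size) -- toy sizes
for name, act in [("SiLU", nn.SiLU()), ("ReLU", nn.ReLU())]:
    mlp = GatedMLP(hidden_size=512, intermediate_size=1376, act=act)
    with torch.no_grad():
        _, gate = mlp(x)
    print(f"{name}: {near_zero_fraction(gate):.1%} of gate activations are near zero")
```

With random weights this only illustrates the activation functions themselves (ReLU zeros out roughly half the pre-activations by construction, while SiLU rarely produces exact zeros), not the behavior of a trained Llama. If you want to try the swap on a real checkpoint, Hugging Face's LlamaConfig exposes a hidden_act field (default "silu"); setting it to "relu" changes the MLP activation, but, as noted above, the model then needs to be retrained or fine-tuned to recover quality.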

@wyxscir (Author) commented Feb 20, 2024

> This method doesn't work very well on Llama, because Llama uses the SiLU activation function, whose inherent sparsity is not very high. One paper mentions that it is possible to replace SiLU with ReLU and retrain Llama to improve sparsity.

Thank you.

@wyxscir (Author) commented Feb 20, 2024

> This method doesn't work very well on Llama, because Llama uses the SiLU activation function, whose inherent sparsity is not very high. One paper mentions that it is possible to replace SiLU with ReLU and retrain Llama to improve sparsity.

The work you mentioned may be "ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models".

@quaternior commented
> This method doesn't work very well on Llama, because Llama uses the SiLU activation function, whose inherent sparsity is not very high. One paper mentions that it is possible to replace SiLU with ReLU and retrain Llama to improve sparsity.

Hello, where can I find the work you mentioned?

@XieWeikai commented
> This method doesn't work very well on Llama, because Llama uses the SiLU activation function, whose inherent sparsity is not very high. One paper mentions that it is possible to replace SiLU with ReLU and retrain Llama to improve sparsity.

> The work you mentioned may be "ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models".
