SW in LLaMA2-7B #2

kiucho · 2024-11-19T12:43:51Z

Thank you for sharing your excellent research.

I tried to reproduce the identification of super weights (SW) in LLaMA2-7B and have a question. When I plotted the input and output values, I got the following graph, which suggests an SW in layer 1: the input value was approximately 1,400, but the output exceeded 2,000.
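A minimal sketch of the per-layer check described above. It assumes the plotted values are the largest |activation| entering and leaving `mlp.down_proj` in each layer, collected beforehand (e.g. with PyTorch forward hooks, not shown). `find_spike_layers` and its `ratio`/`floor` thresholds are hypothetical choices for illustration, not values from the paper.

```python
from typing import List


def find_spike_layers(
    max_in: List[float],
    max_out: List[float],
    ratio: float = 1.2,   # hypothetical: output must exceed input by 20%
    floor: float = 100.0,  # hypothetical: ignore small activations entirely
) -> List[int]:
    """Return indices of layers whose output max |activation| jumps above the input.

    max_in[i] / max_out[i]: largest activation entering / leaving mlp.down_proj
    in layer i. Signs are ignored, so a drop to a large negative value counts.
    """
    spikes = []
    for i, (a_in, a_out) in enumerate(zip(max_in, max_out)):
        # Require an absolute floor so tiny activations do not trigger on noise.
        if abs(a_out) >= floor and abs(a_out) >= ratio * abs(a_in):
            spikes.append(i)
    return spikes
```

With toy numbers loosely modeled on the values quoted in this thread, `find_spike_layers([5.0, 1400.0, 30.0, 10000.0], [6.0, 2100.0, 35.0, -17500.0])` returns `[1, 3]`: both the layer-1 spike and the large negative late-layer value are flagged.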

Upon investigation, I identified the indices of the SW as [2533, 7890], which aligns with the findings reported in your paper.
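For reference, a PyTorch `nn.Linear` stores `weight` with shape (out_features, in_features). For LLaMA2-7B's `mlp.down_proj` (hidden size 4096, intermediate size 11008), the first index of [2533, 7890] is therefore an output channel and the second an input channel. The sketch below assumes the SW sits where the output channel with the largest |activation| meets the input channel with the largest |activation| for one token. `locate_super_weight` is a hypothetical helper, not the paper's code.

```python
from typing import List, Tuple


def locate_super_weight(x: List[float], y: List[float]) -> Tuple[int, int]:
    """Guess the (row, col) of a super weight in down_proj, where y = W @ x.

    Assumption: row = output channel with the largest |y|,
                 col = input channel with the largest |x|.
    """
    col = max(range(len(x)), key=lambda j: abs(x[j]))
    row = max(range(len(y)), key=lambda i: abs(y[i]))
    return row, col
```

For example, `locate_super_weight([0.1, -0.2, 50.0, 0.3], [0.5, -300.0, 1.0])` returns `(1, 2)`.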

Next, I removed the SW at [2533, 7890] from mlp.down_proj in layer 1 and plotted the values again, which produced the following graph:
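A toy sketch of the removal step, assuming "removing" means setting that single weight to zero and recomputing `y = W @ x`. The matrix, input, and the names `matvec`/`ablate_weight` are hypothetical, chosen so that one weight alone produces the spike.

```python
from typing import List

Matrix = List[List[float]]


def matvec(W: Matrix, x: List[float]) -> List[float]:
    """Plain matrix-vector product y = W @ x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def ablate_weight(W: Matrix, row: int, col: int) -> Matrix:
    """Return a copy of W with W[row][col] set to 0 (the original is untouched)."""
    W2 = [list(r) for r in W]
    W2[row][col] = 0.0
    return W2


W = [[1.0, 0.0, 0.5],
     [0.2, 0.1, 40.0]]   # W[1][2] plays the role of the outlier weight
x = [1.0, 1.0, 50.0]     # x[2] is a large input activation

before = matvec(W, x)                    # row 1 is about 2000.3
after = matvec(ablate_weight(W, 1, 2), x)  # row 1 drops to about 0.3
```

Here the spike in output row 1 disappears after zeroing a single weight while row 0 is unchanged, which is the behavior seen for the layer-1 SW.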

This made me wonder whether there are additional SW. For example, in layer 30 the input value was about 10,000, while the output dropped to about -17,500. However, your paper identifies only one SW, at [2533, 7890] in mlp.down_proj of layer 1.
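One way to probe for additional SW in a later layer is a greedy search: repeatedly zero the single weight that contributes most to the largest remaining output, and stop once no output reaches a threshold. This is a hypothetical procedure (including the name `greedy_outlier_search`, the threshold, and the step budget), not the method from the paper. If it succeeds in one or two steps, the spike behaves like a single SW; if it exhausts its budget, no small set of weights explains the spike.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(W: Matrix, x: List[float]) -> List[float]:
    """Plain matrix-vector product y = W @ x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def greedy_outlier_search(
    W: Matrix, x: List[float], threshold: float, max_steps: int = 10
) -> Tuple[List[Tuple[int, int]], bool]:
    """Zero single weights one at a time until every |y_i| is below `threshold`.

    Each step picks the output row with the largest |y| and, within that row,
    the weight with the largest contribution |W[row][j] * x[j]|.
    Returns the zeroed coordinates and whether the spike was eliminated.
    """
    W = [list(r) for r in W]  # work on a copy
    removed: List[Tuple[int, int]] = []
    for _ in range(max_steps):
        y = matvec(W, x)
        row = max(range(len(y)), key=lambda i: abs(y[i]))
        if abs(y[row]) < threshold:
            return removed, True
        col = max(range(len(x)), key=lambda j: abs(W[row][j] * x[j]))
        W[row][col] = 0.0
        removed.append((row, col))
    # Budget exhausted: report whether the last zeroing happened to suffice.
    y = matvec(W, x)
    return removed, max(abs(v) for v in y) < threshold
```

On a matrix with one outlier, e.g. `greedy_outlier_search([[1.0, 0.0, 0.5], [0.2, 0.1, 40.0]], [1.0, 1.0, 50.0], 100.0)`, it returns `([(1, 2)], True)`. When the spike is spread evenly over 20 weights (`[[10.0] * 20]` with `x = [10.0] * 20`), five steps are not enough and it returns five coordinates with `False`.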

Could you please provide the exact algorithm for identifying SW?

mengxiayu · 2024-12-03T17:21:45Z

We had the same suspicion and tried to identify SW in late layers, testing two hypotheses: (1) there are late-layer SW; (2) early-layer SW and late-layer SW appear as a pair (i.e., removing both of them doesn't hurt the model quality).

For Llama-7B, we could not find a single weight, or a small set of weights, that eliminates the spikes you showed; our results suggest the cause may be a whole column of weights. For OLMo-7B, we did identify a few outlier weights that eliminate the spikes, but they did not affect model quality as significantly as the early-layer SW. Therefore, we decided not to include them as super weights.
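To make the "whole column" distinction concrete, the toy sketch below compares zeroing one weight with zeroing an entire column of a `down_proj`-like matrix (a column corresponds to one input channel). It assumes a case where a large input activation feeds several outputs through moderately large weights in the same column; the matrix, values, and helper names are hypothetical and only illustrate why single-weight removal can fail while column removal succeeds.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(W: Matrix, x: List[float]) -> List[float]:
    """Plain matrix-vector product y = W @ x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def max_abs_after(W: Matrix, x: List[float], cells: List[Tuple[int, int]]) -> float:
    """Largest |y_i| after zeroing the given (row, col) cells of a copy of W."""
    W2 = [list(r) for r in W]
    for i, j in cells:
        W2[i][j] = 0.0
    return max(abs(v) for v in matvec(W2, x))


def single_vs_column(W: Matrix, x: List[float], row: int, col: int) -> Tuple[float, float]:
    """Compare zeroing only W[row][col] against zeroing every weight in column `col`."""
    single = max_abs_after(W, x, [(row, col)])
    column = max_abs_after(W, x, [(i, col) for i in range(len(W))])
    return single, column


W = [[20.0, 0.1],
     [20.0, 0.1],
     [20.0, 0.1]]   # column 0 carries the large input to every output
x = [50.0, 1.0]

single, column = single_vs_column(W, x, row=0, col=0)
```

Here zeroing `W[0][0]` leaves the other outputs at about 1000.1, while zeroing column 0 brings every output down to about 0.1.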
