I attempted to implement the identification of super weights (SW) in LLaMA2-7B and have a question. When analyzing the inputs and outputs, I observed the following graph, which suggests the presence of an SW in layer 1. The input value was approximately 1400, but the output exceeded 2000.
Upon investigation, I identified the indices of the SW as [2533, 7890], which aligns with the findings reported in your paper.
Next, I generated another graph after removing the SW at [2533, 7890] from the mlp.down_proj of layer 1. This resulted in the following graph:
This led me to wonder if there might be additional SW. For example, in layer 30, the initial value was 10,000, which dropped to -17,500. However, according to your paper, there is only one identified SW at [2533, 7890] in the mlp.down_proj of layer 1.
Could you please provide the exact algorithm for identifying SW?
We had the same suspicion and tried to identify SW in late layers. We considered two hypotheses: (1) there are late-layer SW; (2) early-layer SW and late-layer SW appear as a pair (i.e., removing both of them doesn't hurt the model quality).
For Llama-7B, we were unable to identify a single weight, or a small set of weights, whose removal eliminates the spikes you showed. Our results suggest it might be a whole column of weights. For OLMo-7B, we were able to identify a few outlier weights whose removal eliminates the spikes. However, they did not have as significant an impact on model quality as the early-layer SW. Therefore, we decided not to count them as super weights.
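For anyone looking for a starting point, here is a minimal sketch of the spike-based search discussed in this thread. It is not the exact algorithm from the paper. It assumes you have already recorded, for each layer, the per-channel values entering and leaving `mlp.down_proj` (for example with forward hooks), and that the weight is laid out as `[out_features, in_features]`, so the output spike channel gives the row and the input spike channel gives the column. The names `abs_argmax` and `find_candidates`, the `threshold` parameter, and the toy numbers are all hypothetical.

```python
def abs_argmax(values):
    """Return (index, value) of the entry with the largest magnitude."""
    idx = max(range(len(values)), key=lambda i: abs(values[i]))
    return idx, values[idx]


def find_candidates(layer_inputs, layer_outputs, threshold):
    """Flag layers where both down_proj input and output contain a spike.

    layer_inputs[l]  : per-channel values entering down_proj at layer l
    layer_outputs[l] : per-channel values leaving down_proj at layer l
    threshold        : assumed magnitude cutoff; tune it per model

    Returns a list of (layer, row, col, input_value, output_value), where
    row is the output spike channel and col is the input spike channel.
    """
    candidates = []
    for layer, (x, y) in enumerate(zip(layer_inputs, layer_outputs)):
        col, in_val = abs_argmax(x)   # column index = input channel
        row, out_val = abs_argmax(y)  # row index = output channel
        if abs(in_val) > threshold and abs(out_val) > threshold:
            candidates.append((layer, row, col, in_val, out_val))
    return candidates


# Toy data (made up): only layer 1 has a spike on both sides of down_proj.
toy_inputs = [
    [0.5, -1.0, 0.2, 0.8],
    [0.3, 0.1, 1400.0, -0.7],
]
toy_outputs = [
    [0.1, 0.3, -0.4],
    [0.2, 2000.0, -0.5],
]

for layer, row, col, in_val, out_val in find_candidates(
    toy_inputs, toy_outputs, threshold=100.0
):
    print(f"layer {layer}: candidate weight [{row}, {col}] "
          f"(input {in_val}, output {out_val})")
```

A candidate flagged this way is only a candidate. As noted above, it still has to be removed and the model quality measured before it can be counted as a super weight.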
Thank you for sharing your excellent research.