How do you think about the interpretability research? #231
Thanks for sharing, this looks like an interesting article. I remember seeing it shared somewhere a few weeks ago, but I haven't had time to fully read it yet. They say "We mostly treat AI models as a black box" ... well, I'd say that's partly because they (the proprietary LLM providers like Anthropic) also only present their LLMs to customers as a black box. I can't speak for everyone, but I know many researchers who do like to look at and analyze attention maps, myself included.

I'll have to read it in more detail, but based on skimming it here, I don't think they are doing anything new or unusual by looking "under the hood". Even the first language model attention papers did something similar to this. E.g., the 2014 Neural Machine Translation by Jointly Learning to Align and Translate paper (https://arxiv.org/abs/1409.0473) already visualized the attention/alignment weights between source and target tokens.
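For anyone curious what "looking at attention maps" can mean in practice, here is a minimal sketch using Hugging Face Transformers. The choice of `gpt2` and of the layer/head indices is arbitrary and purely illustrative, not tied to the article being discussed:

```python
# Minimal sketch: inspect attention maps of a small pretrained model.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_attentions=True)

inputs = tokenizer("The cat sat on the mat", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
attn = outputs.attentions[0][0, 0]  # layer 0, head 0
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# For each token, print the token it attends to most strongly.
for i, tok in enumerate(tokens):
    j = attn[i].argmax().item()
    print(f"{tok:>10} -> {tokens[j]}")
```

Plotting `attn` as a heatmap (e.g., with `matplotlib.pyplot.imshow`) gives the kind of alignment visualization shown in the 2014 paper linked above.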
Last month, Anthropic published their research about the interpretability of Claude 3. Do you have any comments about it?
I'm quite interested in how LLMs work internally, but it seems not many people are working on that. I hope that by investigating LLMs, people can find some clues about the basics of intelligence.
I'm a beginner in this area. Your book and blogs really helped me a lot. I'd like to share them with my Chinese friends. Thank you!