[Question] Possible to retrieve layer-wise activations? #166
Comments
There is an `output_hidden_states` option in the model config:

```julia
model_name = "mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis"
cfg = HuggingFace.HGFConfig(load_config(model_name); output_hidden_states = true)
mod = load_model(model_name, "ForSequenceClassification"; config = cfg)
```

then you can access all layer outputs through the `outputs` field of the model output. BTW, if you don't need the sequence classification head, you can simply use `load_model(model_name; config = cfg)`.
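A fuller sketch of that recipe, with the tokenizer and an example input filled in; the example sentence is made up, and the `outputs`/`hidden_state` field names are assumptions inferred from the REPL output further down this thread:

```julia
using Transformers, Transformers.TextEncoders, Transformers.HuggingFace

model_name = "mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis"
cfg = HuggingFace.HGFConfig(load_config(model_name); output_hidden_states = true)
mod = load_model(model_name, "ForSequenceClassification"; config = cfg)
tkr = load_tokenizer(model_name)

# Encode a single-sentence batch and run the model
sample = encode(tkr, ["Shares of the company rose after strong quarterly results."])
out = mod(sample)

# With output_hidden_states = true, each element of `outputs` should carry the
# corresponding layer's output (field names assumed, not confirmed here)
layer_states = [o.hidden_state for o in out.outputs]
```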
Amazing, thanks very much for the quick response 👍🏽 (I won't close this since you added the tag for documentation)
Small follow-up question: is it also somehow possible to collect outputs for each layer of the classifier head?

Edit: I realize I can just break down the forward pass into layer-by-layer calls as below, but perhaps there's a more streamlined way to do this?

```julia
b = clf.layer.layers[1](b).hidden_state |>
    x -> clf.layer.layers[2](x)
```
You can try extracting the actual layers in the classifier head and constructing a `Flux.Chain` from them.
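A minimal sketch of that suggestion, assuming `clf` and `b` are the classifier head and its input from the snippet above, and that the first head layer returns a named tuple with a `hidden_state` field as shown there:

```julia
using Flux

# Wrap each head layer so it maps a plain array to a plain array,
# mirroring the layer-by-layer calls in the previous comment.
head = Chain(
    x -> clf.layer.layers[1](x).hidden_state,
    x -> clf.layer.layers[2](x),
)

# Flux.activations returns the intermediate output of every layer in a Chain,
# so this collects the per-layer outputs of the classifier head.
acts = Flux.activations(head, b)
```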
@chengchingwen I was trying out the following code:

```julia
using Transformers, Transformers.TextEncoders, Transformers.HuggingFace

bert_config = HuggingFace.HGFConfig(load_config("bert-base-uncased"); output_attentions = true)

# Load BERT model and tokenizer
bert_model = load_model("bert-base-uncased"; config=bert_config)
bert_tokenizer = load_tokenizer("bert-base-uncased")

text = [["The cat sat on the mat", "The cat lay on the rug"]]
sample = encode(bert_tokenizer, text)

bert_model(sample).attention_score
```

This returns an array, but is that correct? I'm not sure what the 2 means here; I was expecting a 12. For example, the equivalent Python code:

```python
from transformers import BertTokenizer, BertModel

model_version = 'bert-base-uncased'
model = BertModel.from_pretrained(model_version, output_attentions=True)
tokenizer = BertTokenizer.from_pretrained(model_version)

sentence_a = "The cat sat on the mat"
sentence_b = "The cat lay on the rug"
inputs = tokenizer.encode_plus(sentence_a, sentence_b, return_tensors='pt')
input_ids = inputs['input_ids']
token_type_ids = inputs['token_type_ids']

attention = model(input_ids, token_type_ids=token_type_ids)[-1]
print(len(attention), attention[0].shape)
```

which returns:

Would you know the issue here?
@VarLad The output structure is slightly different. The per-layer attention scores are stored under the `outputs` field. OTOH, I'm not sure why you get a 2; on my side both of these give the same size:

```julia
julia> size(bert_model(sample).outputs[1].attention_score)
(15, 15, 12, 1)

julia> size(bert_model(sample).attention_score)
(15, 15, 12, 1)
```
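For completeness, a small sketch of gathering the attention scores of every layer, assuming each element of `outputs` carries an `attention_score` field as in the REPL output above:

```julia
# One array per transformer layer; each has size (15, 15, 12, 1) in the example
# above, i.e. sequence × sequence × heads × batch.
attn_per_layer = [o.attention_score for o in bert_model(sample).outputs]
length(attn_per_layer)  # expected to equal the number of layers (12 for bert-base-uncased)
```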
@chengchingwen Apologies for the confusion, the 2 seemed to come from a typo on my side: `text = ["The cat sat on the mat", "The cat lay on the rug"]`. And thanks a lot, that was the correct solution :)
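For anyone following along, the difference appears to be batching versus sentence pairing; a sketch assuming the `encode` conventions used in the comments above:

```julia
# One sample consisting of a sentence pair (batch size 1), as in the Julia example above
pair_sample = encode(bert_tokenizer, [["The cat sat on the mat", "The cat lay on the rug"]])

# A batch of two independent single sentences (batch size 2), which is where the 2 came from
batch_sample = encode(bert_tokenizer, ["The cat sat on the mat", "The cat lay on the rug"])
```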
Thanks for the great package @chengchingwen 🙏🏽
I have a somewhat naive question that you might be able to help me with. For a project I'm currently working on, I am trying to run linear probes on layer activations. In particular, I'm trying to reproduce the following exercise from this paper:
I've naively tried to simply apply the `Flux.activations()` function with no luck. Here's an example:

Any advice would be much appreciated!
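As general background for the probing setup described above, a hypothetical sketch of a linear probe on one layer's hidden states; the layer index, pooling choice, and number of probe classes are all assumptions, `mod` and `sample` stand for a model loaded with `output_hidden_states = true` and an encoded input as in the comments above, and the `outputs`/`hidden_state` field names follow those comments:

```julia
using Flux

out = mod(sample)
layer_idx = 6                            # hypothetical: probe an intermediate layer
h = out.outputs[layer_idx].hidden_state  # assumed shape: (hidden_size, seq_len, batch)
pooled = h[:, 1, :]                      # pool by taking the first token (one common choice)

# A linear probe is just a single dense layer trained on frozen activations
probe = Dense(size(pooled, 1) => 2)      # two probe classes, purely illustrative
ŷ = softmax(probe(pooled))
```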