Producing heatmap visualizations with SubwordAggregator and SequenceAttributionAggregator #246

Closed
nfelnlp opened this issue Jan 11, 2024 · 1 comment · Fixed by #247
Labels: bug (Something isn't working)


nfelnlp commented Jan 11, 2024

🐛 Bug Report

Hi @gsarti,

I've discovered an interesting behavior of the aggregators. Thanks for the help on that so far via the private chat! 😄
Since this is a bit trickier than anticipated, posting this here makes sense.

Specifically, I'm working with a quantized Llama-2-7b (GPTQ) and a relatively long prompt for the ECQA task. I've simply copied the input_text of one instance and the model's prediction into the example code below.

The resulting attribution (out) will have a 4D target_attributions tensor of shape (102, 1, 32, 32).
Producing a single vector of attribution scores (102, 1) works well with the standard SequenceAttributionAggregator.
out_viz produces the following:
[Screenshot from 2024-01-11 12-27-06: start of the attribution visualization]
[ ... ]
[Screenshot from 2024-01-11 12-27-15: end of the attribution visualization]
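For reference, the dimensionality reduction that the standard aggregation performs here can be sketched in plain NumPy (shapes taken from the report above; this is a conceptual sketch, not the inseq implementation):

```python
import numpy as np

# (source tokens, generated tokens, layers, heads), as in the (102, 1, 32, 32) case
attn = np.random.rand(102, 1, 32, 32)

# Averaging over the layers and heads dimensions yields one score per
# (source token, generated token) pair.
agg = attn.mean(axis=(2, 3))
print(agg.shape)  # (102, 1)
```
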

However, when I apply the SubwordAggregator, I first get a 3D matrix of shape (66, 32, 32).
Following your suggestion of adding the SequenceAttributionAggregator to the pipeline, I now get a 2D attribution matrix (see screenshot). It's not clear to me yet how the columns are supposed to be read, since the first column appears to be an artifact from before the subwords were aggregated.
Is the shape of target_attributions here (66 x 32) correct according to your interpretation?

[Screenshot from 2024-01-11 11-39-53: 2D attribution matrix after subword aggregation]

How would I now determine the aggregated importance score of each input token? I can't apply another .aggregate(), right?

Also, while I'm at it: I forgot how to remove the BOS token ("<s>") from the resulting matrix. I only found skip_special_tokens and clean_tokens on HuggingfaceModel, but I don't think those can be applied here.
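In case it helps frame the question: what I'd like is effectively something like the following sketch, where the BOS row is dropped from the token list and score matrix after aggregation (names and values here are illustrative, not inseq API):

```python
import numpy as np

tokens = ["<s>", "\u2581Each", "\u25813", "\u2581items"]  # hypothetical token list
scores = np.random.rand(len(tokens), 1)                   # one aggregated score per token

# Drop every row whose token is the BOS symbol
keep = [i for i, t in enumerate(tokens) if t != "<s>"]
tokens = [tokens[i] for i in keep]
scores = scores[keep]
print(len(tokens), scores.shape)  # 3 (3, 1)
```
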

Thanks so much for your help! 🙌

Code sample

import inseq
from inseq.data.aggregator import SequenceAttributionAggregator, SubwordAggregator
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig


model_name = "TheBloke/Llama-2-7b-Chat-GPTQ"
gpt_tokenizer = AutoTokenizer.from_pretrained(model_name)
quantization_config = GPTQConfig(bits=4, tokenizer=gpt_tokenizer)
gpt_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto"
)
inseq_model = inseq.load_model(
    model=gpt_model,
    attribution_method="attention",
    device="cuda"
)

input_text = ("Each 3 items in the following list contains the question, choice and prediction. Your task is to choose "
              "one of the choices as the answer for the question.\n"
              "Question: 'Sam has been kissing girls at school.  Unfortunately, he has to stay home for a week. Why "
              "might he need to stay home?'\n"
              "Choice: '(1) disease (2) arousal (3) cooties (4) sweet (5) punishment '\n"
              "Prediction: ")
prediction = "punishment"

# Attribute text
out = inseq_model.attribute(
    input_texts=input_text,
    generated_texts=f"{input_text}{prediction}",
    n_steps=1,
    attribute_target=False,
    step_scores=["probability"],
    show_progress=True,
    generation_args={}
)

# Standard aggregation
out_agg = out.aggregate()
# Get HTML visualization
out_viz = out_agg.show(return_html=True, do_aggregation=False)
# This works perfectly fine and puts out a 1D vector of attribution scores for the entire prompt up until the final generated token

# TODO: Perform subword aggregation
subw_sqa_agg = out.aggregate([SubwordAggregator, SequenceAttributionAggregator])
subw_viz = subw_sqa_agg.show(return_html=True, do_aggregation=False)
# This produces the heatmap as shown in the final screenshot

Environment

  • OS: Ubuntu
  • Python version: 3.10
  • Inseq version: 0.5.0 (inseq @ git+https://github.com/inseq-team/inseq.git@dfea66fc02b65ef336cdf63826ddb1b439a90786)
nfelnlp added the bug label on Jan 11, 2024

gsarti commented Jan 12, 2024

Hi @nfelnlp,

Thanks again for the detailed report! There was indeed a problem with the aggregate_contiguous function in the edge case where the attribution tensor had a single dimension, which was getting squeezed out, causing a shape mismatch in the shape compatibility check of the Aggregator. It will be fixed in #247.
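To illustrate the edge case (a conceptual sketch, not the actual aggregate_contiguous code): when the trailing dimension of the score tensor is 1, a naive contiguous-span mean squeezes that dimension away, and the result no longer matches the expected shape.

```python
import numpy as np

scores = np.random.rand(5, 1)   # (tokens, 1): trailing singleton dimension
spans = [(0, 2), (2, 5)]        # contiguous token spans to merge

# Naive aggregation: .mean() over the whole span returns a scalar,
# so the singleton dimension is lost and a later shape check fails.
naive = np.array([scores[s:e].mean() for s, e in spans])
print(naive.shape)  # (2,)

# Aggregating only along the token axis preserves the singleton dimension.
fixed = np.stack([scores[s:e].mean(axis=0) for s, e in spans])
print(fixed.shape)  # (2, 1)
```
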

Side note: in the example above, the trailing whitespace at the end of input_text is the reason ▁pun is not correctly merged into the generated output, since the whitespace character used for aggregation is included as the final character of input_text. This in turn causes the next token to be tokenized oddly (e.g. pun ishment instead of ▁punishment). I suggest removing the space there and adding it to the template instead, as generated_texts=f"{input_text} {prediction}" in the attribute call.
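A toy check of that fix (illustrative strings, no real tokenizer involved): the final text is identical either way, but only the second form keeps the leading space attached to the prediction, so a SentencePiece-style tokenizer sees " punishment" as one unit.

```python
input_with_space = "Prediction: "   # trailing space inside the attributed input
input_clean = "Prediction:"         # space moved into the template instead
prediction = "punishment"

built_a = f"{input_with_space}{prediction}"
built_b = f"{input_clean} {prediction}"
print(built_a == built_b)  # True: identical text, different token boundaries
```
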

This is the code I used to reproduce the issue, which runs correctly in the PR branch above:

import inseq
from inseq.data.aggregator import SubwordAggregator

inseq_model = inseq.load_model(
    model="gpt2",
    attribution_method="attention",
    device="cuda"
)

input_text = ("Each 3 items in the following list contains the question, choice and prediction. Your task is to choose "
              "one of the choices as the answer for the question.\n"
              "Question: 'Sam has been kissing girls at school.  Unfortunately, he has to stay home for a week. Why "
              "might he need to stay home?'\n"
              "Choice: '(1) disease (2) arousal (3) cooties (4) sweet (5) punishment '\n"
              "Prediction:")
prediction = "punishment"

# Attribute text
out = inseq_model.attribute(
    input_texts=input_text,
    generated_texts=f"{input_text} {prediction}",
    attribute_target=False,
    step_scores=["probability"],
    show_progress=True,
    generation_args={}
)

# Standard aggregation
out_agg = out.aggregate()
# Get HTML visualization
out_viz = out_agg.show(return_html=True, do_aggregation=False)
# This works perfectly fine and puts out a 1D vector of attribution scores for the entire prompt up until the final generated token

# First we aggregate the subword tokens. Special symbol default is ▁ (SentencePiece), we use Ġ here (GPT-2)
# The second aggregate call is exactly like the one above: for attention, [mean, mean] (mean across the layers and heads dimensions)
subw_sqa_agg = out.aggregate(SubwordAggregator, special_chars=("Ġ", "Ċ")).aggregate()
subw_viz = subw_sqa_agg.show(return_html=True, do_aggregation=False)
[image: aggregated attribution heatmap after subword aggregation]
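Conceptually, the subword step in the call above does something like the following sketch: tokens that do not start with the word-boundary marker are glued onto the preceding token, and their scores are combined (mean here; tokens and scores are hypothetical, and this is not the inseq implementation):

```python
import numpy as np

tokens = ["\u0120pun", "ishment", "\u0120is", "\u0120a", "\u0120word"]  # hypothetical GPT-2 tokens
scores = np.array([0.4, 0.2, 0.1, 0.1, 0.2])

merged_tokens, groups = [], []
for tok, sc in zip(tokens, scores):
    if merged_tokens and not tok.startswith(("\u0120", "\u010a")):
        merged_tokens[-1] += tok   # continuation subword: glue onto previous token
        groups[-1].append(sc)
    else:
        merged_tokens.append(tok)  # word-initial token: start a new group
        groups.append([sc])
merged_scores = [float(np.mean(g)) for g in groups]
print(merged_tokens)  # ['Ġpunishment', 'Ġis', 'Ġa', 'Ġword']
```
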

gsarti linked a pull request on Jan 12, 2024 that will close this issue