v0.6.0: Context Attribution CLI, New Attribution Methods, Performance Improvements and more
🔙 Context Attribution CLI (#237)
The `inseq attribute-context` CLI command was added to support the PECoRe framework for analyzing context usage in generative language models. The command is highly customizable, allowing users to pick custom contrastive step functions to detect context sensitivity during generation (CTI step) and any attribution method to attribute context reliance (CCI step).
A demo using the Inseq API is available on Hugging Face Spaces. The demo supports flexible parametrization, and the equivalent Python/Bash code can be generated by clicking on the `Show code` button.
Example
The following example uses a GPT-2 model to generate a continuation of `input_current_text`, and uses the additional context provided by `input_context_text` to estimate its influence on the generation. In this case, the output "to the hospital. He said he was fine" is produced, and the generation of the token `hospital` is found to be dependent on the context token `sick` according to the `contrast_prob_diff` step function.
```bash
inseq attribute-context \
  --model_name_or_path gpt2 \
  --input_context_text "George was sick yesterday." \
  --input_current_text "His colleagues asked him to come" \
  --attributed_fn "contrast_prob_diff"
```
Result:
```
Context with [contextual cues] (std λ=1.00) followed by output sentence
with {context-sensitive target spans} (std λ=1.00)
(CTI = "kl_divergence", CCI = "saliency" w/ "contrast_prob_diff" target)

Input context:  George was sick yesterday.
Input current:  His colleagues asked him to come
Output current: to the hospital. He said he was fine

#1.
Generated output (CTI > 0.428): to the {hospital}(0.548). He said he was fine
Input context (CCI > 0.460): George was [sick](0.516) yesterday.
```
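The contrastive attributed function used above is also available through the regular Python API. The following is a minimal sketch for illustration, assuming the `contrast_targets` argument of `model.attribute` from recent Inseq releases; the exact Python equivalent of the CLI call can be generated via the demo's `Show code` button.

```python
import inseq

# Minimal sketch (assumed API): contrastive attribution with the Python API.
# The `contrast_targets` argument name is assumed from recent Inseq releases.
model = inseq.load_model("gpt2", "saliency")
out = model.attribute(
    # Input with the context prepended to the current text
    "George was sick yesterday. His colleagues asked him to come",
    # Forced target: the continuation generated with the context available
    "George was sick yesterday. His colleagues asked him to come to the hospital.",
    attributed_fn="contrast_prob_diff",
    # Contrastive target: an alternative continuation to compare against
    contrast_targets="George was sick yesterday. His colleagues asked him to come to the office.",
)
out.show()
```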
🔍 New Attribution Methods: Value Zeroing and ReAGent (#173, #250)
The following two perturbation-based attribution methods were added:

- `value_zeroing`: Quantifying Context Mixing in Transformers (Mohebbi et al., 2023)
- `reagent`: ReAGent: A Model-agnostic Feature Attribution Method for Generative Language Models (Zhao et al., 2024)
Value zeroing is a Transformers-specific method that quantifies the layer-by-layer mixing of contextual information across token representations. It works by zeroing the value vector associated with a specific input embedding (effectively preventing information mixing for that token position) and measuring the dissimilarity of the resulting representations with respect to the original model output. The Inseq implementation is highly flexible, supporting the zeroing of specific attention heads in specific layers and thus allowing fine-grained control of the zeroing process. Its effect is equivalent to the Attention Knockout method proposed in Geva et al. (2023), zeroing the value vector instead of its associated attention weight.
The following example performs value zeroing on the cross-attention operation of an encoder-decoder translation model, keeping the value vectors of the self-attention operations in the encoder and decoder modules unaltered. Only the output of the fourth layer is shown.
```python
import inseq

model = inseq.load_model("Helsinki-NLP/opus-mt-en-fr", "value_zeroing")
out = model.attribute(
    "A generative language models interpretability tool.",
    # Empty mappings leave encoder and decoder self-attention values
    # unaltered, so only cross-attention values are zeroed
    encoder_zeroed_units_indices={},
    decoder_zeroed_units_indices={},
)
# Show only the output of the fourth layer
out.show(select_idx=4)
```
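The same parameters can also restrict zeroing to individual heads. The following is a minimal sketch for illustration, assuming a `{layer_index: head_indices}` mapping is accepted by the `*_zeroed_units_indices` arguments (the exact format is documented in the Inseq API reference):

```python
import inseq

model = inseq.load_model("Helsinki-NLP/opus-mt-en-fr", "value_zeroing")
out = model.attribute(
    "A generative language models interpretability tool.",
    # Assumption: a {layer: [head indices]} mapping restricts zeroing to
    # heads 0 and 2 of the first encoder self-attention layer
    encoder_zeroed_units_indices={0: [0, 2]},
)
# Show the output of the first layer, where zeroing was applied
out.show(select_idx=0)
```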
ReAGent is a model-agnostic method that quantifies the importance of input features by measuring the change in model output in a recursive process that replaces salient input tokens with plausible alternatives produced by a language model. The method is particularly useful for avoiding the out-of-distribution issues of regular occlusion approaches that use 0-valued vectors as replacements.
The following example uses the ReAGent method to attribute the generation of a GPT-2 decoder-only LM.
```python
import inseq

# Load GPT-2 with ReAGent and its method-specific hyperparameters
model = inseq.load_model(
    "gpt2-medium",
    "reagent",
    keep_top_n=5,
    stopping_condition_top_k=3,
    replacing_ratio=0.3,
    max_probe_steps=3000,
    num_probes=8,
)
out = model.attribute("Super Mario Land is a game that developed by")
out.show()
```
🚀 Improved Performance for Single-step Attribution Methods and Multi-GPU Support (#173, #238)
- The `value_zeroing` and `attention` methods now use scores from the last generation step to produce outputs more efficiently (`is_final_step_method = True`). This change allows the methods to avoid iterating over the full sequence, making attribution with these single-step methods considerably more efficient.
- Inseq now supports multi-GPU attribution for all models and methods, allowing users to distribute the attribution process across multiple GPUs. The feature is particularly useful for large models and long sequences, where the attribution process can be computationally expensive (see the sketch after this list).
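For instance, multi-GPU attribution can be combined with Hugging Face Accelerate weight sharding. The following is a minimal sketch, assuming that `model_kwargs` passed to `inseq.load_model` are forwarded to the underlying `from_pretrained` call (the `device_map="auto"` option requires the `accelerate` package):

```python
import inseq

# Sketch: shard a larger model across the available GPUs.
# Assumption: `model_kwargs` is forwarded to the Hugging Face
# `from_pretrained` call when loading the model.
model = inseq.load_model(
    "gpt2-xl",
    "saliency",
    model_kwargs={"device_map": "auto"},
)
out = model.attribute("His colleagues asked him to come")
out.show()
```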
💥 Breaking Changes
- If `attention` is used as the attribution method in `model.attribute`, `step_scores` cannot be extracted at the same time, since the method no longer iterates over the full sequence. (#173) As an alternative, step scores can be extracted separately using the `dummy` attribution method (i.e. no attribution), as sketched after this list.
- BOS is always included in target-side attribution and generated sequences if present. (#173)
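As an example of the step-score workaround above, the `dummy` method can be paired with any built-in step function; a minimal sketch using the built-in `probability` step score:

```python
import inseq

# Sketch: extract step scores without computing attributions by using
# the `dummy` attribution method with a built-in step function.
model = inseq.load_model("gpt2", "dummy")
out = model.attribute(
    "His colleagues asked him to come",
    step_scores=["probability"],
)
out.show()
```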
All Merged PRs
🚀 Features
- Support for multi-GPU attribution (#238) @gsarti
- Added `inseq attribute-context` CLI command to support the PECoRe framework for detecting and attributing context reliance in generative LMs (#237) @gsarti
- Added `value_zeroing` (`inseq.attr.feat.perturbation_attribution.ValueZeroingAttribution`) attribution method and `is_final_step_method = True` support (#173) @gsarti
- Added `reagent` (`inseq.attr.feat.perturbation_attribution.ReAgentAttribution`) attribution method (#250) @casszhao @xuan25 @gsarti
🔧 Fixes & Refactoring
- Fix URL to arXiv (#259) @bbjoverbeek
- Fix `ContiguousSpanAggregator` and `SubwordAggregator` edge case of single-step generation (#247) @gsarti
- Move tensors to CPU right away in the forward pass to avoid OOM when cloning (#245) @gsarti
- Fix `remap_from_filtered` behavior on `sequence_scores` tensors (#245) @gsarti
- Use torch-native padding when converting lists of `FeatureAttributionStepOutput` to `FeatureAttributionSequenceOutput` in `get_sequences_from_batched_steps` (#245) @gsarti
- Bump `ruff` version (#245) @gsarti
- Drop `poetry` in favor of `uv` to accelerate package installation and simplify config in `pyproject.toml` (#249) @gsarti
- Drop `darglint` in favor of `pydoclint` (#249) @gsarti
- Replace arXiv with ACL Anthology badge in `README` (#249) @gsarti
- Add first version of `CHANGELOG.md` (#249) @gsarti
- Added multithread support for running tests using `pytest-xdist` @gsarti
📝 Documentation and Tutorials
- No changes