Edge Probing

This is the main page for the following papers:

  • What do you learn from context? Probing for sentence structure in contextualized word representations (Tenney et al., ICLR 2019), the "edge probing paper": [paper] [poster]
  • BERT Rediscovers the Classical NLP Pipeline (Tenney et al., ACL 2019), the "BERT layer paper": [paper] [poster]

Most of the code for these is integrated into jiant proper, but this directory contains data preparation and analysis code specific to the edge probing experiments. Additionally, the runner scripts live in jiant/scripts/edgeprobing.

Getting Started

First, follow the set-up instructions for jiant: Getting Started. Be sure to set all the required environment variables and to download the git submodules.

If you want to run GloVe or CoVe experiments, also be sure to set WORD_EMBS_FILE to point to a copy of glove.840B.300d.txt.

Next, download and process the edge probing data. You'll need access to the underlying corpora, in particular OntoNotes 5.0 and a processed (JSON) copy of the SPR1 dataset. Edit the paths in get_and_process_all_data.sh to point to these resources, then run:

mkdir -p $JIANT_DATA_DIR
./get_and_process_all_data.sh $JIANT_DATA_DIR

This should populate $JIANT_DATA_DIR/edges with directories for each task, each containing a number of .json files as well as labels.txt. For more details on the data format, see below.

The main entry point for edge probing is jiant/main.py. The main arguments are a config file and any parameter overrides. The jiant/jiant/config/edgeprobe/ folder contains HOCON files as a starting point for all the edge probing experiments.

For a quick test run, use a small dataset like spr2 and a small encoder like CoVe:

cd ${PWD%/jiant*}/jiant
python main.py --config_file jiant/config/edgeprobe/edgeprobe_cove.conf \
  -o "target_tasks=edges-spr2,exp_name=ep_cove_demo"

This will keep the encoder fixed and train an edge probing classifier on the SPR2 dataset. It should run in about 4 minutes on a K80 GPU. It'll produce an output directory in $JIANT_PROJECT_PREFIX/ep_cove_demo. There's a lot of stuff in here, but the files of interest are:

vocab/
  tokens.txt             # token vocab used by the encoder
  edges-spr2_labels.txt  # label vocab used by the probing classifier
run/
  tensorboard/              # tensorboard logdir
  edges-spr2_val.json       # dev set predictions, in edge probing JSON format
  edges-spr2_test.json      # test set predictions, in edge probing JSON format
  log.log                   # training and eval log file (human-readable text)
  params.conf               # serialized parameter list
  edges-spr2/model_state_eval_*.best.th  # PyTorch saved checkpoint

jiant uses tensorboardX to record loss curves and a few other metrics during training. You can view these with:

tensorboard --logdir $JIANT_PROJECT_PREFIX/ep_cove_demo/run/tensorboard

You can use the run/*_val.json and run/*_test.json files to run scoring and analysis. There are some helper utilities which allow you to load and aggregate predictions across multiple runs. In particular:

  • analysis.py contains utilities to load predictions into a set of DataFrames, as well as to pretty-print edge probing examples.
  • edgeprobe_preds_sandbox.ipynb walks through some of the features in analysis.py.
  • analyze_runs.py is a helper script to process a set of predictions into a condensed .tsv format. It computes confusion matrices for each label and along various stratifiers (like span distance), so you can easily and quickly perform further aggregation and compute metrics like accuracy, precision, recall, and F1. In particular, the run, task, label, stratifier (optional), and stratum_key (optional) columns serve as identifiers, and the confusion matrix is stored in four columns: tp_count, fp_count, tn_count, and fn_count. If you want to aggregate over a group of labels (like SRL core roles), just sum the *_count columns for that group before computing metrics, as in the sketch after this list.
  • get_scalar_mix.py is a helper script to extract scalar mixing weights and export to .tsv.
  • analysis_edgeprobe_standard.ipynb shows some example analysis on the output of analyze_runs.py and get_scalar_mix.py. This mostly does shallow processing over the output, but the main idiosyncrasies to know are: for coref-ontonotes, use the 1 label instead of _micro_avg_, and for srl-ontonotes we report a _clean_micro_ metric which aggregates all the labels that don't start with R- or C-.
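
As a rough illustration of this kind of aggregation, here is a minimal pandas sketch. It assumes the condensed .tsv from analyze_runs.py can be loaded directly with pandas and that the columns are named exactly as described above; the file name, task name, and label prefixes used here are placeholders/assumptions, not part of the script's documented interface.

import pandas as pd

# Placeholder filename; analyze_runs.py writes a condensed .tsv with one row
# per run/task/label (and optional stratifier/stratum_key) and the confusion
# matrix stored in the tp_count / fp_count / tn_count / fn_count columns.
df = pd.read_csv("ep_cove_demo_scores.tsv", sep="\t")

# Keep only the unstratified rows, if a stratifier column is present.
if "stratifier" in df.columns:
    df = df[df["stratifier"].isna()]

# Example: an SRL-style "clean" aggregate over labels that don't start with
# R- or C- (task and label naming here are assumptions). Sum the counts for
# the whole group first, then compute metrics from the summed counts.
mask = (
    (df["task"] == "edges-srl-ontonotes")
    & ~df["label"].str.startswith("R-")
    & ~df["label"].str.startswith("C-")
    & ~df["label"].str.startswith("_")  # skip pre-aggregated rows like _micro_avg_
)
counts = df.loc[mask, ["tp_count", "fp_count", "fn_count"]].sum()

precision = counts["tp_count"] / (counts["tp_count"] + counts["fp_count"])
recall = counts["tp_count"] / (counts["tp_count"] + counts["fn_count"])
f1 = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.3f}  recall={recall:.3f}  f1={f1:.3f}")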

Running the experiments from the paper

We provide a frozen branch, ep_frozen_20190723, which should reproduce the experiments from both papers above.

Additionally, there's an older branch, edgeprobe_frozen_feb2019, which is a snapshot of jiant as of the final version of the ICLR paper. However, this branch is much messier than the one above.

The configs in jiant/jiant/config/edgeprobe/edgeprobe_*.conf are the starting point for the experiments in the paper, but are supplemented by a number of parameter overrides (the -o flag to main.py). We use a set of bash functions to keep track of these, which are maintained in jiant/scripts/edges/exp_fns.sh.

To run a standard experiment, you can do something like:

pushd ${PWD%/jiant*}/jiant
source scripts/edges/exp_fns.sh
bert_mix_exp edges-srl-ontonotes bert-base-uncased

The paper (Table 2 in particular) represents the output of a large number of experiments. Some of these are quite fast (lexical baselines and CoVe), and some are quite slow (GPT model, syntax tasks with lots of targets). We use a Kubernetes cluster running on Google Cloud Platform (GCP) to manage all of these. For more on Kubernetes, see jiant/gcp/kubernetes.

The master script for the experiments is jiant/scripts/edgeprobing/kubernetes_run_all.sh. Mostly, all this does is set up some paths and submit pods to run on the cluster. If you want to run the same set of experiments in a different environment, you can copy that script and modify the kuberun() function to submit a job or to simply run locally.

There's also an analysis helper script, jiant/scripts/edgeprobing/analyze_project.sh, which runs analyze_runs.py and get_scalar_mix.py on the output of a set of Kubernetes runs. Note that scoring runs is CPU-intensive and might take a while for larger experiments.

There are two analysis notebooks which produce the main tables and figures for each paper. These are frozen as-is for reference, but probably won't be runnable directly, as they reference a number of specific data paths.

Note on coreference metrics: the default model actually trains on two mutually-exclusive targets with labels "0" and "1". In the papers we ignore the "0" class and report F1 scores from treating the positive ("1") class as a binary target. See this issue for more detail, or Ctrl+F for is_coref_task in the above notebooks for the relevant code.
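
As a hedged sketch of that convention, using the same condensed analyze_runs.py .tsv format described earlier: take the confusion counts for the "1" label only and compute binary precision/recall/F1 on the positive class. The file name and task name below are placeholders.

import pandas as pd

# Placeholder filename; same condensed analyze_runs.py format as above.
df = pd.read_csv("ep_coref_scores.tsv", sep="\t")

# Ignore the "0" class: select the row for the positive "1" label and report
# binary metrics on it (the task name here is an assumption).
row = df[(df["task"] == "edges-coref-ontonotes")
         & (df["label"].astype(str) == "1")].iloc[0]

precision = row["tp_count"] / (row["tp_count"] + row["fp_count"])
recall = row["tp_count"] / (row["tp_count"] + row["fn_count"])
f1 = 2 * precision * recall / (precision + recall)
print(f"coref positive-class F1: {f1:.3f}")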

If you hit any snags (Editor's note: it's research code, you probably will), contact Ian (email address in the paper) for help.

Edge Probing Utilities

This directory contains a number of utilities for the edge probing project.

In particular:

The data/ subdirectory contains scripts to download each probing dataset and convert it to the edge probing JSON format, described below.

If you just want to get all the data, see get_and_process_all_data.sh; this is a convenience wrapper over the instructions in data/README.md.

Data Format

The edge probing data is stored and manipulated as JSON (or the equivalent Python dict), which encodes a single text field and a number of targets, each consisting of span1, (optionally) span2, and a list of labels. The info field can be used for additional metadata. See the examples below:

SRL example

// span1 is predicate, span2 is argument
{
  "text": "Ian ate strawberry ice cream",
  "targets": [
    { "span1": [1,2], "span2": [0,1], "label": "A0" },
    { "span1": [1,2], "span2": [2,5], "label": "A1" }
  ],
  "info": { "source": "PropBank", ... }
}

Constituents example

// span2 is unused
{
  "text": "Ian ate strawberry ice cream",
  "targets": [
    { "span1": [0,1], "label": "NNP" },
    { "span1": [1,2], "label": "VBD" },
    ...
    { "span1": [2,5], "label": "NP" },
    { "span1": [1,5], "label": "VP" },
    { "span1": [0,5], "label": "S" }
  ],
  "info": { "source": "PTB", ... }
}

Semantic Proto-roles (SPR) example

// span1 is predicate, span2 is argument
// label is a list of attributes (multilabel)
{
  'text': "The main reason is Google is more accessible to the global community and you can rest assured that it 's not going to go away ."
  'targets': [
    {
      'span1': [3, 4], 'span2': [0, 3],
      'label': ['existed_after', 'existed_before', 'existed_during',
                'instigation', 'was_used'],
      'info': { ... }
    },
    ...
  ],
  'info': {'source': 'SPR2', ... },
}
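
If you want to inspect these files directly, here's a minimal loading sketch. It assumes each .json file contains one record per line (JSON Lines); the file path is a placeholder, and spans are taken to be [start, end) indices into the whitespace-tokenized text, as in the examples above.

import json

# Placeholder path; each line is assumed to hold one JSON record like the
# examples above.
path = "edges/spr2/dev.json"

with open(path) as f:
    for line in f:
        record = json.loads(line)
        tokens = record["text"].split()  # spans index into whitespace tokens
        for target in record["targets"]:
            start, end = target["span1"]
            span1_text = " ".join(tokens[start:end])
            span2 = target.get("span2")
            span2_text = " ".join(tokens[span2[0]:span2[1]]) if span2 else None
            # "label" may be a single string or a list of attributes (SPR).
            print(target["label"], "|", span1_text, "|", span2_text)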

Labels and Retokenization

For each task, we need to perform two additional preprocessing steps before training using main.py.

First, extract the set of available labels:

export TASK_DIR="$JIANT_DATA_DIR/edges/<task>"
python jiant/probing/get_edge_data_labels.py -o $TASK_DIR/labels.txt \
    -i $TASK_DIR/*.json -s

Second, make retokenized versions for any tokenizers you need. For example:

# for CoVe and GPT, respectively
python jiant/probing/retokenize_edge_data.py -t "MosesTokenizer" $TASK_DIR/*.json
python jiant/probing/retokenize_edge_data.py -t "OpenAI.BPE"     $TASK_DIR/*.json
# for BERT
python jiant/probing/retokenize_edge_data.py -t "bert-base-uncased"  $TASK_DIR/*.json
python jiant/probing/retokenize_edge_data.py -t "bert-large-uncased" $TASK_DIR/*.json

This will save retokenized versions alongside the original files.

Data Statistics

Several entries in Appendix B of the edge probing paper are incorrect. The table below gives the exact counts for the number of examples, tokens, and targets (train/dev/test) for each task.

Task               Labels   Examples (train/dev/test)   Tokens (train/dev/test)    Total Targets (train/dev/test)
Part-of-Speech     48       115812/15680/12217          2200865/304701/230118      2070382/290013/212121
Constituents       30       115812/15680/12217          2200865/304701/230118      1851590/255133/190535
Dependencies       49       12522/2000/2075             203919/25110/25049         203919/25110/25049
Entities           18       115812/15680/12217          2200865/304701/230118      128738/20354/12586
SRL (all)          66       253070/35297/26715          6619740/934744/711746      598983/83362/61716
Core roles         6        253070/35297/26715          6619740/934744/711746      411469/57237/41735
Non-core roles     21       253070/35297/26715          6619740/934744/711746      170220/23754/18290
OntoNotes coref.   2        115812/15680/12217          2200865/304701/230118      207830/26333/27800
SPR1               18       3843/518/551                81255/10692/11955          7611/1071/1055
SPR2               20       2226/291/276                46969/5592/4929            4925/630/582
Winograd coref.    2        958/223/518                 14384/4331/7952            1787/379/949
Rel. (SemEval)     19       6851/1149/2717              117232/20361/46873         6851/1149/2717