This is the main page for the following papers:
- What do you learn from context? Probing for sentence structure in contextualized word representations (Tenney et al., ICLR 2019), the "edge probing paper": [paper] [poster]
- BERT Rediscovers the Classical NLP Pipeline (Tenney et al., ACL 2019), the "BERT layer paper": [paper] [poster]
Most of the code for these is integrated into jiant proper, but this directory contains data preparation and analysis code specific to the edge probing experiments. Additionally, the runner scripts live in jiant/scripts/edgeprobing.
First, follow the set-up instructions for jiant: Getting Started. Be sure you set all the required environment variables and that you download the git submodules.
If you want to run GloVe or CoVe experiments, also be sure to set WORD_EMBS_FILE to point to a copy of glove.840B.300d.txt.
Next, download and process the edge probing data. You'll need access to the underlying corpora, in particular OntoNotes 5.0 and a processed (JSON) copy of the SPR1 dataset. Edit the paths in get_and_process_all_data.sh to point to these resources, then run:
mkdir -p $JIANT_DATA_DIR
./get_and_process_all_data.sh $JIANT_DATA_DIR
This should populate $JIANT_DATA_DIR/edges with directories for each task, each containing a number of .json files as well as labels.txt. For more details on the data format, see below.
The main entry point for edge probing is jiant/main.py. The main arguments are a config file and any parameter overrides. The jiant/jiant/config/edgeprobe/ folder contains HOCON files as a starting point for all the edge probing experiments.
For a quick test run, use a small dataset like spr2 and a small encoder like CoVe:
cd ${PWD%/jiant*}/jiant
python main.py --config_file jiant/config/edgeprobe/edgeprobe_cove.conf \
-o "target_tasks=edges-spr2,exp_name=ep_cove_demo"
This will keep the encoder fixed and train an edge probing classifier on the SPR2 dataset. It should run in about 4 minutes on a K80 GPU. It'll produce an output directory in $JIANT_PROJECT_PREFIX/ep_cove_demo. There's a lot of output in this directory, but the files of interest are:
vocab/
  tokens.txt                 # token vocab used by the encoder
  edges-spr2_labels.txt      # label vocab used by the probing classifier
run/
  tensorboard/               # tensorboard logdir
  edges-spr2_val.json        # dev set predictions, in edge probing JSON format
  edges-spr2_test.json       # test set predictions, in edge probing JSON format
  log.log                    # training and eval log file (human-readable text)
  params.conf                # serialized parameter list
  edges-spr2/model_state_eval_*.best.th  # PyTorch saved checkpoint
jiant uses tensorboardX to record loss curves and a few other metrics during training. You can view them with:
tensorboard --logdir $JIANT_PROJECT_PREFIX/ep_cove_demo/run/tensorboard
You can use the run/*_val.json and run/*_test.json files to run scoring and analysis. There are some helper utilities which allow you to load and aggregate predictions across multiple runs. In particular:
- analysis.py contains utilities to load predictions into a set of DataFrames, as well as to pretty-print edge probing examples.
- edgeprobe_preds_sandbox.ipynb walks through some of the features in analysis.py.
- analyze_runs.py is a helper script to process a set of predictions into a condensed .tsv format. It computes confusion matrices for each label and along various stratifiers (like span distance), so you can quickly perform further aggregation and compute metrics like accuracy, precision, recall, and F1. The run, task, label, stratifier (optional), and stratum_key (optional) columns serve as identifiers, and the confusion matrix is stored in four columns: tp_count, fp_count, tn_count, and fn_count. If you want to aggregate over a group of labels (like SRL core roles), just sum the *_count columns for that group before computing metrics; see the sketch after this list.
- get_scalar_mix.py is a helper script to extract scalar mixing weights and export them to .tsv.
- analysis_edgeprobe_standard.ipynb shows some example analysis on the output of analyze_runs.py and get_scalar_mix.py. This mostly does shallow processing over the output, but the main idiosyncrasies to know are: for coref-ontonotes, use the 1 label instead of _micro_avg_, and for srl-ontonotes we report a _clean_micro_ metric which aggregates all the labels that don't start with R- or C-.
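For example, here is a minimal sketch of aggregating the analyze_runs.py output with pandas. The column names come from the description above; the file path and the core-role label set are illustrative assumptions, not part of the script's output contract.

```python
import pandas as pd

# Load the condensed confusion-matrix table written by analyze_runs.py
# (path is hypothetical; substitute your own output file).
df = pd.read_csv("scores.tsv", sep="\t")

# Aggregate over a group of labels (e.g. SRL core roles) by summing the
# per-label confusion counts *before* computing metrics.
core_roles = ["A0", "A1", "A2", "A3", "A4", "A5"]  # illustrative; use the labels from your labels.txt
group = df[(df["task"] == "srl-ontonotes") & (df["label"].isin(core_roles))]
counts = group[["tp_count", "fp_count", "fn_count", "tn_count"]].sum()

precision = counts["tp_count"] / (counts["tp_count"] + counts["fp_count"])
recall = counts["tp_count"] / (counts["tp_count"] + counts["fn_count"])
f1 = 2 * precision * recall / (precision + recall)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```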
We provide a frozen branch, ep_frozen_20190723, which should reproduce the experiments from both papers above.
Additionally, there's an older branch, edgeprobe_frozen_feb2019, which is a snapshot of jiant as of the final version of the ICLR paper. However, this is much messier than above.
The configs in jiant/jiant/config/edgeprobe/edgeprobe_*.conf are the starting point for the experiments in the paper, but are supplemented by a number of parameter overrides (the -o flag to main.py). We use a set of bash functions to keep track of these, which are maintained in jiant/scripts/edges/exp_fns.sh.
To run a standard experiment, you can do something like:
pushd ${PWD%/jiant*}/jiant
source scripts/edges/exp_fns.sh
bert_mix_exp edges-srl-ontonotes bert-base-uncased
The paper (Table 2 in particular) represents the output of a large number of experiments. Some of these are quite fast (lexical baselines and CoVe), and some are quite slow (GPT model, syntax tasks with lots of targets). We use a Kubernetes cluster running on Google Cloud Platform (GCP) to manage all of these. For more on Kubernetes, see jiant/gcp/kubernetes.
The master script for the experiments is jiant/scripts/edgeprobing/kubernetes_run_all.sh. Mostly, all this does is set up some paths and submit pods to run on the cluster. If you want to run the same set of experiments in a different environment, you can copy that script and modify the kuberun() function to submit a job or simply run locally.
There's also an analysis helper script, jiant/scripts/edgeprobing/analyze_project.sh, which runs analyze_runs.py and get_scalar_mix.py on the output of a set of Kubernetes runs. Note that scoring runs is CPU-intensive and might take a while for larger experiments.
There are two analysis notebooks which produce the main tables and figures for each paper. These are frozen as-is for a reference, but probably won't be runnable directly as they reference a number of specific data paths:
Note on coreference metrics: the default model actually trains on two mutually-exclusive targets with labels "0" and "1". In the papers we ignore the "0" class and report F1 scores from treating the positive ("1") class as a binary target. See this issue for more detail, or Ctrl+F for is_coref_task in the above notebooks for the relevant code.
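As a rough illustration of that convention, and again assuming the analyze_runs.py column names described above (the file path is hypothetical), binary coreference F1 can be computed from the rows for the "1" label alone:

```python
import pandas as pd

df = pd.read_csv("scores.tsv", sep="\t")  # hypothetical path to analyze_runs.py output

# Keep only the positive-class rows for the coreference task; the "0" label is ignored.
# Depending on how the file was written, the label column may hold strings or ints.
pos = df[(df["task"] == "coref-ontonotes") & (df["label"].astype(str) == "1")]
tp, fp, fn = pos["tp_count"].sum(), pos["fp_count"].sum(), pos["fn_count"].sum()

precision = tp / (tp + fp)
recall = tp / (tp + fn)
print("binary F1:", 2 * precision * recall / (precision + recall))
```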
If you hit any snags (Editor's note: it's research code, you probably will), contact Ian (email address in the paper) for help.
This directory contains a number of utilities for the edge probing project.
In particular:
- edge_data_stats.py prints stats, like the number of tokens, number of spans, and number of labels.
- get_edge_data_labels.py compiles a list of all the unique labels found in a dataset.
- retokenize_edge_data.py applies tokenizers (MosesTokenizer, OpenAI.BPE, or a BERT wordpiece model) and re-maps spans to the new tokenization; see the sketch after this list for the re-mapping idea.
- convert_edge_data_to_tfrecord.py converts edge probing JSON data to TensorFlow examples.
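The span re-mapping that retokenize_edge_data.py performs is conceptually simple. Below is a minimal, self-contained sketch of the idea, assuming each original token maps to a contiguous run of subword pieces; the helper and toy tokenizer are illustrative, not the script's actual API.

```python
from typing import Callable, List, Tuple

def retokenize_span(tokens: List[str],
                    span: Tuple[int, int],
                    subtokenize: Callable[[str], List[str]]) -> Tuple[int, int]:
    """Project an end-exclusive [start, end) span on the original tokens
    onto the tokenization produced by `subtokenize`."""
    # offsets[i] = index of the first subword piece of original token i
    offsets, total = [], 0
    for tok in tokens:
        offsets.append(total)
        total += len(subtokenize(tok))
    offsets.append(total)  # sentinel: one past the last piece
    start, end = span
    return offsets[start], offsets[end]

def toy_subtokenize(tok: str) -> List[str]:
    # Toy wordpiece-style tokenizer: splits "strawberry" into two pieces.
    return ["straw", "##berry"] if tok == "strawberry" else [tok]

tokens = "Ian ate strawberry ice cream".split()
print(retokenize_span(tokens, (2, 5), toy_subtokenize))  # -> (2, 6)
```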
The data/ subdirectory contains scripts to download each probing dataset and convert it to the edge probing JSON format, described below.
If you just want to get all the data, see get_and_process_all_data.sh; this is a convenience wrapper over the instructions in data/README.md.
The edge probing data is stored and manipulated as JSON (or the equivalent Python dict). Each record encodes a single text field and a list of targets, each consisting of span1, (optionally) span2, and one or more labels; spans are [start, end) token offsets (end-exclusive) into the tokenized text. The info field can be used for additional metadata. See the examples below:
SRL example
// span1 is predicate, span2 is argument
{
  "text": "Ian ate strawberry ice cream",
  "targets": [
    { "span1": [1,2], "span2": [0,1], "label": "A0" },
    { "span1": [1,2], "span2": [2,5], "label": "A1" }
  ],
  "info": { "source": "PropBank", ... }
}
Constituents example
// span2 is unused
{
  "text": "Ian ate strawberry ice cream",
  "targets": [
    { "span1": [0,1], "label": "NNP" },
    { "span1": [1,2], "label": "VBD" },
    ...
    { "span1": [2,5], "label": "NP" },
    { "span1": [1,5], "label": "VP" },
    { "span1": [0,5], "label": "S" }
  ],
  "info": { "source": "PTB", ... }
}
Semantic Proto-roles (SPR) example
// span1 is predicate, span2 is argument
// label is a list of attributes (multilabel)
{
'text': "The main reason is Google is more accessible to the global community and you can rest assured that it 's not going to go away ."
'targets': [
{
'span1': [3, 4], 'span2': [0, 3],
'label': ['existed_after', 'existed_before', 'existed_during',
'instigation', 'was_used'],
'info': { ... }
},
...
],
'info': {'source': 'SPR2', ... },
}
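As a rough sketch of how you might read these files in Python (this assumes one JSON record per line and the field names shown above; the helper and path are illustrative, not part of the jiant API):

```python
import json

def load_edge_probing_records(path):
    """Yield one record (a dict with 'text', 'targets', and 'info') per line."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Hypothetical path; substitute one of the files under $JIANT_DATA_DIR/edges/.
for record in load_edge_probing_records("edges/spr2/edges.dev.json"):
    tokens = record["text"].split()
    for target in record["targets"]:
        start, end = target["span1"]
        print(" ".join(tokens[start:end]), "->", target["label"])
```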
For each task, we need to perform two additional preprocessing steps before training with main.py.
First, extract the set of available labels:
export TASK_DIR="$JIANT_DATA_DIR/edges/<task>"
python jiant/probing/get_edge_data_labels.py -o $TASK_DIR/labels.txt \
-i $TASK_DIR/*.json -s
Second, make retokenized versions for any tokenizers you need. For example:
# for CoVe and GPT, respectively
python jiant/probing/retokenize_edge_data.py -t "MosesTokenizer" $TASK_DIR/*.json
python jiant/probing/retokenize_edge_data.py -t "OpenAI.BPE" $TASK_DIR/*.json
# for BERT
python jiant/probing/retokenize_edge_data.py -t "bert-base-uncased" $TASK_DIR/*.json
python jiant/probing/retokenize_edge_data.py -t "bert-large-uncased" $TASK_DIR/*.json
This will save retokenized versions alongside the original files.
Appendix B of the edge probing paper is incorrect in several entries. The table below gives the exact counts of examples, tokens, and targets (train/dev/test) for each task.
Task | Labels | Examples | Tokens | Total Targets |
---|---|---|---|---|
Part-of-Speech | 48 | 115812/15680/12217 | 2200865/304701/230118 | 2070382/290013/212121 |
Constituents | 30 | 115812/15680/12217 | 2200865/304701/230118 | 1851590/255133/190535 |
Dependencies | 49 | 12522/2000/2075 | 203919/25110/25049 | 203919/25110/25049 |
Entities | 18 | 115812/15680/12217 | 2200865/304701/230118 | 128738/20354/12586 |
SRL (all) | 66 | 253070/35297/26715 | 6619740/934744/711746 | 598983/83362/61716 |
Core roles | 6 | 253070/35297/26715 | 6619740/934744/711746 | 411469/57237/41735 |
Non-core roles | 21 | 253070/35297/26715 | 6619740/934744/711746 | 170220/23754/18290 |
OntoNotes coref. | 2 | 115812/15680/12217 | 2200865/304701/230118 | 207830/26333/27800 |
SPR1 | 18 | 3843/518/551 | 81255/10692/11955 | 7611/1071/1055 |
SPR2 | 20 | 2226/291/276 | 46969/5592/4929 | 4925/630/582 |
Winograd coref. | 2 | 958/223/518 | 14384/4331/7952 | 1787/379/949 |
Rel. (SemEval) | 19 | 6851/1149/2717 | 117232/20361/46873 | 6851/1149/2717 |