Update to transformers 2.3.0 & Add ALBERT (#990)
* fix roberta tokenization error

* update transformers

* update alignment func

* trim input_module

* update lm head

* update albert special tokens

* input_module_to_pretokenized -> transformer_input_module_to_tokenizer_id

* update ccg alignment

* fix wic retokenize

* update wic docstring, remove unnecessary condition

* refactor record task to avoid tokenization problem

Co-authored-by: Sam Bowman <bowman@nyu.edu>
2 people authored and pyeres committed Jan 28, 2020
1 parent 900e9e8 commit 4a9b058
Showing 27 changed files with 395 additions and 379 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -10,7 +10,7 @@ A few things you might want to know about `jiant`:
- `jiant` is configuration-driven. You can run an enormous variety of experiments by simply writing configuration files. Of course, if you need to add any major new features, you can also easily edit or extend the code.
- `jiant` contains implementations of strong baselines for the [GLUE](https://gluebenchmark.com) and [SuperGLUE](https://super.gluebenchmark.com/) benchmarks, and it's the recommended starting point for work on these benchmarks.
- `jiant` was developed at [the 2018 JSALT Workshop](https://www.clsp.jhu.edu/workshops/18-workshop/) by [the General-Purpose Sentence Representation Learning](https://jsalt18-sentence-repl.github.io/) team and is maintained by [the NYU Machine Learning for Language Lab](https://wp.nyu.edu/ml2/people/), with help from [many outside collaborators](https://github.com/nyu-mll/jiant/graphs/contributors) (especially Google AI Language's [Ian Tenney](https://ai.google/research/people/IanTenney)).
- `jiant` is built on [PyTorch](https://pytorch.org). It also uses many components from [AllenNLP](https://github.com/allenai/allennlp) and the HuggingFace PyTorch [implementations](https://github.com/huggingface/pytorch-transformers) of GPT, BERT, and XLNet.
- `jiant` is built on [PyTorch](https://pytorch.org). It also uses many components from [AllenNLP](https://github.com/allenai/allennlp) and the HuggingFace Transformers [implementations](https://github.com/huggingface/transformers) for GPT, BERT, and other transformer models.
- The name `jiant` doesn't mean much. The 'j' stands for JSALT. That's all the acronym we have.

## Getting Started
7 changes: 3 additions & 4 deletions environment.yml
@@ -30,7 +30,7 @@ dependencies:
# for --remote_log functionality
- google-cloud-logging==1.11.0

# for some tokenizers in pytorch-transformers
# for some tokenizers in huggingface transformers
- spacy==2.1
- ftfy

@@ -39,9 +39,8 @@ dependencies:
- sacremoses

# Warning: jiant currently depends on *both* pytorch_pretrained_bert > 0.6 _and_
# pytorch_transformers > 1.0. These are the same package, though the name changed between
# transformers > 2.3.0. These are the same package, though the name changed between
# these two versions. AllenNLP requires 0.6 to support the BertAdam optimizer, and jiant
# directly requires 1.0 to support XLNet and WWM-BERT.
# This AllenNLP issue is relevant: https://github.com/allenai/allennlp/issues/3067
- sacremoses
- pytorch-transformers==1.2.0
- transformers==2.3.0
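The warning above describes two closely related packages living side by side (pytorch_pretrained_bert pulled in via AllenNLP, transformers pinned directly). As a quick sanity check after rebuilding the conda environment — a sketch for local use, not part of this commit — the installed versions can be listed by their PyPI distribution names:

```python
# Sanity-check sketch (not part of the commit): confirm that both packages
# described in the warning above resolve in the rebuilt environment.
import pkg_resources

for dist in ("pytorch-pretrained-bert", "transformers"):
    print(dist, pkg_resources.get_distribution(dist).version)
# Expected: pytorch-pretrained-bert 0.6.x (via AllenNLP) and transformers 2.3.0.
```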
2 changes: 1 addition & 1 deletion gcp/config/jiant_paths.sh
@@ -13,7 +13,7 @@ export JIANT_PROJECT_PREFIX="$HOME/exp"
# pre-downloaded ELMo models
export ELMO_SRC_DIR="/nfs/jiant/share/elmo"
# cache for BERT etc. models
export PYTORCH_PRETRAINED_BERT_CACHE="/nfs/jiant/share/pytorch_transformers_cache"
export HUGGINGFACE_TRANSFORMERS_CACHE="/nfs/jiant/share/transformers_cache"
# word embeddings
export WORD_EMBS_FILE="/nfs/jiant/share/wiki-news-300d-1M.vec"
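The renamed cache variable above is a jiant-level setting; transformers keeps its own default cache unless a directory is passed explicitly. The sketch below shows one way the exported variable could be forwarded via the `cache_dir` argument — an illustration assuming transformers 2.3.0, not a claim about how jiant wires this internally:

```python
# Sketch (not part of the commit): forward the jiant-level cache variable to
# transformers by passing it as cache_dir when loading a model.
import os

from transformers import AutoModel, AutoTokenizer

cache_dir = os.environ.get("HUGGINGFACE_TRANSFORMERS_CACHE")  # e.g. /nfs/jiant/share/transformers_cache
tokenizer = AutoTokenizer.from_pretrained("albert-base-v2", cache_dir=cache_dir)
model = AutoModel.from_pretrained("albert-base-v2", cache_dir=cache_dir)
```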

2 changes: 1 addition & 1 deletion gcp/kubernetes/templates/jiant_env.libsonnet
@@ -21,7 +21,7 @@
# Path to ELMO cache.
elmo_src_dir: "/nfs/jiant/share/elmo",
# Path to BERT etc. model cache; should be writable by Kubernetes workers.
pytorch_transformers_cache_path: "/nfs/jiant/share/pytorch_transformers_cache",
transformers_cache_path: "/nfs/jiant/share/transformers_cache",
# Path to default word embeddings file
word_embs_file: "/nfs/jiant/share/wiki-news-300d-1M.vec",
}
4 changes: 2 additions & 2 deletions gcp/kubernetes/templates/run_batch.jsonnet
@@ -35,8 +35,8 @@ function(job_name, command, project_dir, uid, fsgroup,
value: jiant_env.jiant_data_dir,
},
{
name: "PYTORCH_PRETRAINED_BERT_CACHE",
value: jiant_env.pytorch_transformers_cache_path
name: "HUGGINGFACE_TRANSFORMERS_CACHE",
value: jiant_env.transformers_cache_path
},
{
name: "ELMO_SRC_DIR",
4 changes: 2 additions & 2 deletions gcp/set_up_workstation.sh
@@ -26,8 +26,8 @@ source /etc/profile.d/jiant_paths.sh
if [ ! -d "${JIANT_PROJECT_PREFIX}" ]; then
mkdir "${JIANT_PROJECT_PREFIX}"
fi
if [ ! -d "${PYTORCH_PRETRAINED_BERT_CACHE}" ]; then
sudo mkdir -m 0777 "${PYTORCH_PRETRAINED_BERT_CACHE}"
if [ ! -d "${HUGGINGFACE_TRANSFORMERS_CACHE}" ]; then
sudo mkdir -m 0777 "${HUGGINGFACE_TRANSFORMERS_CACHE}"
fi

# Build the conda environment, and activate
55 changes: 29 additions & 26 deletions jiant/config/defaults.conf
@@ -244,20 +244,23 @@ input_module = "" // The word embedding or contextual word representation layer
// - elmo-chars-only: The dynamic CNN-based word embedding layer of AllenNLP's
// ELMo, but not ELMo's LSTM layer hidden states. Use with
// tokenizer = MosesTokenizer.
// - bert-base-uncased, etc.: Any BERT model from pytorch_transformers.
// - bert-base-uncased, etc.: Any BERT model from transformers.
// - roberta-base / roberta-large / roberta-large-mnli: RoBERTa model from
// pytorch_transformers.
// transformers.
// - albert-base-v1 / albert-large-v1 / albert-xlarge-v1 / albert-xxlarge-v1
// - albert-base-v2 / albert-large-v2 / albert-xlarge-v2 / albert-xxlarge-v2:
// ALBERT model from transformers.
// - xlnet-base-cased / xlnet-large-cased: XLNet Model from
// pytorch_transformers.
// transformers.
// - openai-gpt: The OpenAI GPT language model encoder from
// pytorch_transformers.
// - gpt2 / gpt2-medium / gpt2-large: The OpenAI GPT-2 language model encoder from
// pytorch_transformers.
// transformers.
// - gpt2 / gpt2-medium / gpt2-large / gpt2-xl: The OpenAI GPT-2 language model
// encoder from transformers.
// - transfo-xl-wt103: The Transformer-XL language model encoder from
// pytorch_transformers.
// transformers.
// - xlm-mlm-en-2048: XLM english language model encoder from
// pytorch_transformers.
// Note: Any input_module from pytorch_transformers requires
// transformers.
// Note: Any input_module from transformers requires
// tokenizer = ${input_module} or auto.

tokenizer = auto // The name of the tokenizer, passed to the Task constructor for
@@ -269,7 +272,7 @@ tokenizer = auto // The name of the tokenizer, passed to the Task constructor for
// - MosesTokenizer: Our standard word tokenizer. (Support for
// other NLTK tokenizers is pending.)
// - bert-base-uncased, etc.: Use the tokenizer supplied with
// pytorch_transformers that corresponds to the input_module.
// transformers that corresponds to the input_module.
// - SplitChars: Splits the input into individual characters.

word_embs_file = ${WORD_EMBS_FILE} // Path to embeddings file, used with glove and fastText.
@@ -284,21 +287,21 @@ d_char = 100 // Dimension of trained char embeddings.
n_char_filters = 100 // Number of filters in trained char CNN.
char_filter_sizes = "2,3,4,5" // Size of char CNN filters.

pytorch_transformers_output_mode = "none" // How to handle the embedding layer of the
// BERT/XLNet model:
// "none" or "top" returns only top-layer activation,
// "cat" returns top-layer concatenated with
// lexical layer,
// "only" returns only lexical layer,
// "mix" uses ELMo-style scalar mixing (with learned
// weights) across all layers.
pytorch_transformers_max_layer = -1 // Maximum layer to return from BERT etc. encoder. Layer 0 is
// wordpiece embeddings. pytorch_transformers_embeddings_mode
// will behave as if the is truncated at this layer, so 'top'
// will return this layer, and 'mix' will return a mix of all
// layers up to and including this layer.
// Set to -1 to use all layers.
// Used for probing experiments.
transformers_output_mode = "none" // How to handle the embedding layer of the
// BERT/XLNet model:
// "none" or "top" returns only top-layer activation,
// "cat" returns top-layer concatenated with
// lexical layer,
// "only" returns only lexical layer,
// "mix" uses ELMo-style scalar mixing (with learned
// weights) across all layers.
transformers_max_layer = -1 // Maximum layer to return from BERT etc. encoder. Layer 0 is
// wordpiece embeddings. transformers_output_mode
// will behave as if the encoder is truncated at this layer, so 'top'
// will return this layer, and 'mix' will return a mix of all
// layers up to and including this layer.
// Set to -1 to use all layers.
// Used for probing experiments.

force_include_wsj_vocabulary = 0 // Set if using PTB parsing (grammar induction) task. Makes sure
// to include WSJ vocabulary.
@@ -365,7 +368,7 @@ pair_attn = 1 // If true, use attn in sentence-pair classification/regression tasks.
d_hid_attn = 512 // Post-attention LSTM state size.
shared_pair_attn = 0 // If true, share pair_attn parameters across all tasks that use it.
d_proj = 512 // Size of task-specific linear projection applied before pooling.
// Disabled when fine-tuning pytorch_transformers models.
// Disabled when fine-tuning transformers models.
pool_type = "auto" // Type of pooling to reduce sequences of vectors into a single vector.
// Options: "auto", "max", "mean", "first", "final"
// "auto" uses "first" for plain BERT (with no sent_enc), "final" for plain
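For reference, the transformers_output_mode = "mix" option documented above is ELMo-style scalar mixing over layer activations. The sketch below illustrates the idea only; it is not necessarily jiant's internal module:

```python
# Illustration of ELMo-style scalar mixing ("mix" mode): a softmax-normalized
# learned weight per layer plus a global scale. Not jiant's actual module.
import torch
import torch.nn as nn


class ScalarMix(nn.Module):
    def __init__(self, num_layers: int):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_layers))  # one weight per layer
        self.gamma = nn.Parameter(torch.ones(1))  # global scale

    def forward(self, layer_states):
        # layer_states: list of [batch, seq_len, d_model] tensors, one per encoder layer.
        # With transformers_max_layer = k, only layers 0..k would be included here.
        norm_weights = torch.softmax(self.weights, dim=0)
        return self.gamma * sum(w * h for w, h in zip(norm_weights, layer_states))
```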
2 changes: 1 addition & 1 deletion jiant/config/examples/stilts_example.conf
@@ -18,7 +18,7 @@ batch_size = 24
write_preds = "val,test"

//BERT-specific parameters
pytorch_transformers_output_mode = "top"
transformers_output_mode = "top"
sep_embs_for_skip = 1
sent_enc = "none"
classifier = log_reg // following BERT paper
2 changes: 1 addition & 1 deletion jiant/config/superglue_bert.conf
@@ -10,7 +10,7 @@ max_seq_len = 256 // Mainly needed for MultiRC, to avoid over-truncating

// Model settings
input_module = "bert-large-cased"
pytorch_transformers_output_mode = "top"
transformers_output_mode = "top"
pair_attn = 0 // shouldn't be needed but JIC
s2s = {
attention = none
56 changes: 56 additions & 0 deletions jiant/huggingface_transformers_interface/__init__.py
@@ -0,0 +1,56 @@
"""
Warning: jiant currently depends on *both* pytorch_pretrained_bert > 0.6 _and_
transformers > 2.3.
These are the same package, though the name changed between these two versions. AllenNLP requires
0.6 to support the BertAdam optimizer, and jiant directly requires 2.3.
This AllenNLP issue is relevant: https://github.com/allenai/allennlp/issues/3067
TODO: We do not support non-English versions of XLM. If you need them, add some code in XLMEmbedderModule
to prepare the langs input to transformers.XLMModel.
"""

# All the supported input_module from huggingface transformers
# input_modules mapped to the same string share vocabulary
transformer_input_module_to_tokenizer_name = {
"bert-base-uncased": "bert_uncased",
"bert-large-uncased": "bert_uncased",
"bert-large-uncased-whole-word-masking": "bert_uncased",
"bert-large-uncased-whole-word-masking-finetuned-squad": "bert_uncased",
"bert-base-cased": "bert_cased",
"bert-large-cased": "bert_cased",
"bert-large-cased-whole-word-masking": "bert_cased",
"bert-large-cased-whole-word-masking-finetuned-squad": "bert_cased",
"bert-base-cased-finetuned-mrpc": "bert_cased",
"bert-base-multilingual-uncased": "bert_multilingual_uncased",
"bert-base-multilingual-cased": "bert_multilingual_cased",
"roberta-base": "roberta",
"roberta-large": "roberta",
"roberta-large-mnli": "roberta",
"xlnet-base-cased": "xlnet_cased",
"xlnet-large-cased": "xlnet_cased",
"openai-gpt": "openai_gpt",
"gpt2": "gpt2",
"gpt2-medium": "gpt2",
"gpt2-large": "gpt2",
"gpt2-xl": "gpt2",
"transfo-xl-wt103": "transfo_xl",
"xlm-mlm-en-2048": "xlm_en",
"albert-base-v1": "albert",
"albert-large-v1": "albert",
"albert-xlarge-v1": "albert",
"albert-xxlarge-v1": "albert",
"albert-base-v2": "albert",
"albert-large-v2": "albert",
"albert-xlarge-v2": "albert",
"albert-xxlarge-v2": "albert",
}


def input_module_uses_transformers(input_module):
    """Return True if input_module is a supported huggingface transformers model name."""
    return input_module in transformer_input_module_to_tokenizer_name


def input_module_tokenizer_name(input_module):
    """Return the shared tokenizer/vocabulary name for a supported input_module."""
    return transformer_input_module_to_tokenizer_name[input_module]
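A brief usage sketch of the new module (not part of the committed file): checkpoints that share a vocabulary map to the same tokenizer name, and any supported input_module can be paired with its pretrained tokenizer; AutoTokenizer in transformers 2.3.0 is assumed to cover the listed models.

```python
# Usage sketch (not part of the committed file).
from transformers import AutoTokenizer

from jiant.huggingface_transformers_interface import (
    input_module_tokenizer_name,
    input_module_uses_transformers,
)

assert input_module_uses_transformers("albert-base-v2")
# All ALBERT checkpoints share one vocabulary, so they map to the same name:
assert input_module_tokenizer_name("albert-base-v2") == "albert"
assert input_module_tokenizer_name("albert-xxlarge-v1") == "albert"

tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
print(tokenizer.tokenize("jiant now supports ALBERT"))
```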
