
Unable to run by Elmo Embedding #4

Open · YuehWu1994 opened this issue Dec 2, 2018 · 2 comments

@YuehWu1994
Hello, I am unable to use the ELMo implementation even though I follow the arguments provided in the README. I am using Python 3.6.2 (Anaconda) with AllenNLP installed in a virtual environment.

Here are the relevant arguments I use:

GPUID=-1
SEED=19

SHOULD_TRAIN=1
WORD_EMBS_FILE="../glove/glove.6B/glove.6B.50d.txt"

d_word=50
d_hid=512
glove=0
ELMO=1
deep_elmo=0
elmo_no_glove=1
COVE=0

PAIR_ENC="simple"

Here is my error log:

(allennlp) ➜   bash run_stuff.sh
12/01 04:00:19 PM: Namespace(batch_size=64, bpp_base=10, bpp_method='percent_tr', classifier='mlp', classifier_dropout=0.0, classifier_hid_dim=512, cove=0, cuda=-1, d_hid=512, d_word=50, deep_elmo=0, dropout=0.2, dropout_embs=0.2, elmo=1, elmo_no_glove=1, eval_tasks='none', exp_dir='EXP_DIR', glove=0, load_epoch=-1, load_model=0, load_preproc=1, load_tasks=1, log_file='log.log', lr=0.1, lr_decay_factor=0.5, max_grad_norm=5.0, max_seq_len=40, max_vals=100, max_word_v_size=30000, min_lr=1e-05, n_epochs=10, n_layers_enc=1, n_layers_highway=0, no_tqdm=0, optimizer='sgd', pair_enc='simple', patience=5, preproc_file='preproc.pkl', random_seed=19, run_dir='RUN_DIR', scaling_method='none', scheduler_threshold=0.0, shared_optimizer=1, should_train=1, task_ordering='random', task_patience=0, train_tasks='cola', train_words=0, trainer_type='sampling', val_interval=10, weight_decay=0.0, weighting_method='uniform', word_embs_file='../glove/glove.6B/glove.6B.50d.txt')
12/01 04:00:19 PM: Using random seed 19
12/01 04:00:19 PM: Loading tasks...
12/01 04:00:19 PM: 	Loaded existing task cola
12/01 04:00:19 PM: 	Loaded existing task sst
12/01 04:00:19 PM: 	Loaded existing task mrpc
12/01 04:00:19 PM: 	Finished loading tasks: cola sst mrpc.
12/01 04:00:22 PM: Loading token dictionary from EXP_DIR/vocab.
12/01 04:00:22 PM: 	Finished building vocab. Using 30002 words
12/01 04:00:22 PM: 	Loaded data from EXP_DIR/preproc.pkl
12/01 04:00:22 PM: 	  Training on cola, sst, mrpc
12/01 04:00:22 PM: 	  Evaluating on 
12/01 04:00:22 PM: 	Finished loading tasks in 3.215s
12/01 04:00:22 PM: Building model...
12/01 04:00:22 PM: 	Learning embeddings from scratch!
12/01 04:00:22 PM: 	Using ELMo embeddings!
12/01 04:00:22 PM: 	NOT using GLoVe embeddings!
12/01 04:00:22 PM: Initializing ELMo
12/01 04:00:43 PM: instantiating registered subclass lstm of <class 'allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder'>
12/01 04:00:43 PM: batch_first = True
12/01 04:00:43 PM: stateful = False
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: input_size = 1024
12/01 04:00:43 PM: hidden_size = 512
12/01 04:00:43 PM: num_layers = 1
12/01 04:00:43 PM: bidirectional = True
12/01 04:00:43 PM: batch_first = True
12/01 04:00:43 PM: Initializing parameters
12/01 04:00:43 PM: Done initializing parameters; the following parameters are using their default initialization from their code
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_0.input_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_0.state_linearity.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_0.state_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_0.state_projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_1.input_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_1.state_linearity.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_1.state_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_1.state_projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_0.input_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_0.state_linearity.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_0.state_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_0.state_projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_1.input_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_1.state_linearity.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_1.state_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_1.state_projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._char_embedding_weights
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._highways._layers.0.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._highways._layers.0.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._highways._layers.1.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._highways._layers.1.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._projection.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_0.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_0.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_1.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_1.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_2.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_2.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_3.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_3.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_4.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_4.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_5.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_5.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_6.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_6.weight
12/01 04:00:43 PM:    _elmo.scalar_mix_0.gamma
12/01 04:00:43 PM:    _elmo.scalar_mix_0.scalar_parameters.0
12/01 04:00:43 PM:    _elmo.scalar_mix_0.scalar_parameters.1
12/01 04:00:43 PM:    _elmo.scalar_mix_0.scalar_parameters.2
12/01 04:00:43 PM:    _phrase_layer._module.bias_hh_l0
12/01 04:00:43 PM:    _phrase_layer._module.bias_hh_l0_reverse
12/01 04:00:43 PM:    _phrase_layer._module.bias_ih_l0
12/01 04:00:43 PM:    _phrase_layer._module.bias_ih_l0_reverse
12/01 04:00:43 PM:    _phrase_layer._module.weight_hh_l0
12/01 04:00:43 PM:    _phrase_layer._module.weight_hh_l0_reverse
12/01 04:00:43 PM:    _phrase_layer._module.weight_ih_l0
12/01 04:00:43 PM:    _phrase_layer._module.weight_ih_l0_reverse
12/01 04:00:43 PM: Initializing parameters
12/01 04:00:43 PM: Done initializing parameters; the following parameters are using their default initialization from their code
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_0.input_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_0.state_linearity.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_0.state_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_0.state_projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_1.input_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_1.state_linearity.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_1.state_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_1.state_projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_0.input_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_0.state_linearity.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_0.state_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_0.state_projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_1.input_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_1.state_linearity.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_1.state_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_1.state_projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._char_embedding_weights
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._highways._layers.0.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._highways._layers.0.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._highways._layers.1.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._highways._layers.1.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._projection.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_0.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_0.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_1.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_1.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_2.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_2.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_3.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_3.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_4.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_4.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_5.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_5.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_6.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_6.weight
12/01 04:00:43 PM:    _elmo.scalar_mix_0.gamma
12/01 04:00:43 PM:    _elmo.scalar_mix_0.scalar_parameters.0
12/01 04:00:43 PM:    _elmo.scalar_mix_0.scalar_parameters.1
12/01 04:00:43 PM:    _elmo.scalar_mix_0.scalar_parameters.2
12/01 04:00:43 PM:    _phrase_layer._module.bias_hh_l0
12/01 04:00:43 PM:    _phrase_layer._module.bias_hh_l0_reverse
12/01 04:00:43 PM:    _phrase_layer._module.bias_ih_l0
12/01 04:00:43 PM:    _phrase_layer._module.bias_ih_l0_reverse
12/01 04:00:43 PM:    _phrase_layer._module.weight_hh_l0
12/01 04:00:43 PM:    _phrase_layer._module.weight_hh_l0_reverse
12/01 04:00:43 PM:    _phrase_layer._module.weight_ih_l0
12/01 04:00:43 PM:    _phrase_layer._module.weight_ih_l0_reverse
12/01 04:00:43 PM: 	Finished building model in 20.876s
12/01 04:00:43 PM: patience = 5
12/01 04:00:43 PM: num_epochs = 10
12/01 04:00:43 PM: max_vals = 50
12/01 04:00:43 PM: cuda_device = -1
12/01 04:00:43 PM: grad_norm = 5.0
12/01 04:00:43 PM: grad_clipping = None
12/01 04:00:43 PM: lr_decay = 0.99
12/01 04:00:43 PM: min_lr = 1e-05
12/01 04:00:43 PM: no_tqdm = 0
12/01 04:00:43 PM: Sampling tasks uniformly
12/01 04:00:43 PM: type = sgd
12/01 04:00:43 PM: parameter_groups = None
12/01 04:00:43 PM: Number of trainable parameters: 9449994
12/01 04:00:43 PM: instantiating registered subclass sgd of <class 'allennlp.training.optimizers.Optimizer'>
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: lr = 0.1
12/01 04:00:43 PM: weight_decay = 1e-05
12/01 04:00:43 PM: type = reduce_on_plateau
12/01 04:00:43 PM: instantiating registered subclass reduce_on_plateau of <class 'allennlp.training.learning_rate_schedulers.LearningRateScheduler'>
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: mode = max
12/01 04:00:43 PM: factor = 0.5
12/01 04:00:43 PM: patience = 0
12/01 04:00:43 PM: threshold = 0.0
12/01 04:00:43 PM: threshold_mode = abs
12/01 04:00:43 PM: verbose = True
12/01 04:00:43 PM: type = sgd
12/01 04:00:43 PM: parameter_groups = None
12/01 04:00:43 PM: Number of trainable parameters: 9449994
12/01 04:00:43 PM: instantiating registered subclass sgd of <class 'allennlp.training.optimizers.Optimizer'>
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: lr = 0.1
12/01 04:00:43 PM: weight_decay = 1e-05
12/01 04:00:43 PM: type = reduce_on_plateau
12/01 04:00:43 PM: instantiating registered subclass reduce_on_plateau of <class 'allennlp.training.learning_rate_schedulers.LearningRateScheduler'>
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: mode = max
12/01 04:00:43 PM: factor = 0.5
12/01 04:00:43 PM: patience = 0
12/01 04:00:43 PM: threshold = 0.0
12/01 04:00:43 PM: threshold_mode = abs
12/01 04:00:43 PM: verbose = True
12/01 04:00:43 PM: type = sgd
12/01 04:00:43 PM: parameter_groups = None
12/01 04:00:43 PM: Number of trainable parameters: 9449994
12/01 04:00:43 PM: instantiating registered subclass sgd of <class 'allennlp.training.optimizers.Optimizer'>
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: lr = 0.1
12/01 04:00:43 PM: weight_decay = 1e-05
12/01 04:00:43 PM: type = reduce_on_plateau
12/01 04:00:43 PM: instantiating registered subclass reduce_on_plateau of <class 'allennlp.training.learning_rate_schedulers.LearningRateScheduler'>
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: mode = max
12/01 04:00:43 PM: factor = 0.5
12/01 04:00:43 PM: patience = 0
12/01 04:00:43 PM: threshold = 0.0
12/01 04:00:43 PM: threshold_mode = abs
12/01 04:00:43 PM: verbose = True
12/01 04:00:43 PM: type = sgd
12/01 04:00:43 PM: parameter_groups = None
12/01 04:00:43 PM: Number of trainable parameters: 9449994
12/01 04:00:43 PM: instantiating registered subclass sgd of <class 'allennlp.training.optimizers.Optimizer'>
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: lr = 0.1
12/01 04:00:43 PM: weight_decay = 1e-05
12/01 04:00:43 PM: type = reduce_on_plateau
12/01 04:00:43 PM: instantiating registered subclass reduce_on_plateau of <class 'allennlp.training.learning_rate_schedulers.LearningRateScheduler'>
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: mode = max
12/01 04:00:43 PM: factor = 0.5
12/01 04:00:43 PM: patience = 0
12/01 04:00:43 PM: threshold = 0.0
12/01 04:00:43 PM: threshold_mode = abs
12/01 04:00:43 PM: verbose = True
12/01 04:00:43 PM: Beginning training.
Traceback (most recent call last):
  File "main.py", line 280, in <module>
    sys.exit(main(sys.argv[1:]))
  File "main.py", line 177, in main
    args.load_model)
  File "/Users/apple/Desktop/q1_course/CS273/ml_final/CS273A/src/trainer.py", line 776, in train
    output_dict = self._forward(batch, task=task, for_training=True)
  File "/Users/apple/Desktop/q1_course/CS273/ml_final/CS273A/src/trainer.py", line 1003, in _forward
    return self._model.forward(task, **tensor_batch)
  File "/Users/apple/Desktop/q1_course/CS273/ml_final/CS273A/src/models.py", line 216, in forward
    pair_emb = self.pair_encoder(input1, input2)
  File "/Users/apple/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/apple/Desktop/q1_course/CS273/ml_final/CS273A/src/models.py", line 289, in forward
    s1_elmo_embs = self._elmo(s1['elmo'])
KeyError: 'elmo'

If you suspect this is an IPython bug, please report it at:
    https://github.com/ipython/ipython/issues
or send an email to the mailing list at ipython-dev@python.org

You can print a more detailed traceback right now with "%tb", or use "%debug"
to interactively debug it.

Extra-detailed tracebacks for bug-reporting purposes can be enabled via:
    %config Application.verbose_crash=True
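
For context, the failing line in models.py looks each sentence batch up by an 'elmo' key. A minimal sketch of the failure mode as I understand it (hypothetical tensor shapes, not the repo's actual data structures):

import torch

# Each sentence is passed around as a dict of index tensors keyed by
# token-indexer name. My cached preprocessing apparently produced only the
# word-id entry (batch size and max_seq_len taken from my args above):
s1 = {"words": torch.zeros(64, 40, dtype=torch.long)}

# models.py then looks up the ELMo character ids under 'elmo':
print("elmo" in s1)  # False, so s1['elmo'] raises the KeyError shown above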
@sleepinyourhat (Contributor)

I don't immediately see what's wrong—@W4ngatang would know better. That said, what are you trying to do?

If you don't need to match the exact setup of the GLUE paper down to the last hyperparameter, you'll have a much easier time reproducing our experiments with the newer jiant toolkit, which has more people and more documentation: https://github.com/jsalt18-sentence-repl/jiant

@W4ngatang (Collaborator)

My bet is that you previously ran without ELMo, and the script cached the preprocessed data without ELMo indexing. Try deleting those files and rerunning.
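
Concretely, something like this should work (the paths come from your log above, exp_dir='EXP_DIR' and preproc_file='preproc.pkl'; adjust if yours differ):

import os

# Remove the cached preprocessed data so the next run rebuilds it with
# ELMo indexing enabled; equivalent to `rm EXP_DIR/preproc.pkl` from the shell.
preproc = os.path.join("EXP_DIR", "preproc.pkl")
if os.path.exists(preproc):
    os.remove(preproc)

# then rerun: bash run_stuff.sh

If that alone doesn't help, the cached task files in EXP_DIR may need the same treatment.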
