## Checklist

- [x] I have checked the CHANGELOG and the commit log to find out if the bug was already fixed in the main branch.
- [x] I have included in the "Description" section below a traceback from any exceptions related to this bug.
- [x] I have included in the "Related issues or possible duplicates" section below all related issues and possible duplicate issues (if there are none, check this box anyway).
- [x] I have included in the "Environment" section below the name of the operating system and Python version that I was using when I discovered this bug.
- [x] I have included in the "Environment" section below the output of `pip freeze`.
- [x] I have included in the "Steps to reproduce" section below a minimally reproducible example.

## Description

When trying to use `ScaledDotProductAttention` with the `AutoRegressiveSeqDecoder` from `allennlp-models`, a matmul error is raised stating that the dimensions do not align.

Python traceback:
Traceback (most recent call last):
File "C:\ProgramData\Miniconda3\envs\nn-semparse\lib\site-packages\click\core.py", line 782, in main
rv = self.invoke(ctx)
File "C:\ProgramData\Miniconda3\envs\nn-semparse\lib\site-packages\click\core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\ProgramData\Miniconda3\envs\nn-semparse\lib\site-packages\click\core.py", line 610, in invoke
return callback(*args, **kwargs)
File "C:/Coding/merlin_labs/nn-semparse/debug_scripts/debug_train.py", line 51, in debug_train
disable_tracking=disable_tracking
File "C:\Coding\merlin_labs\nn-semparse\src\commands\train_extended.py", line 359, in train_model
file_friendly_logging=file_friendly_logging,
File "C:\Coding\merlin_labs\nn-semparse\src\commands\train_extended.py", line 586, in _train_worker
metrics = train_loop.run()
File "C:\Coding\merlin_labs\nn-semparse\src\commands\train_extended.py", line 658, in run
return self.trainer.train()
File "C:\ProgramData\Miniconda3\envs\nn-semparse\lib\site-packages\allennlp\training\gradient_descent_trainer.py", line 706, in train
metrics, epoch = self._try_train()
File "C:\ProgramData\Miniconda3\envs\nn-semparse\lib\site-packages\allennlp\training\gradient_descent_trainer.py", line 727, in _try_train
train_metrics = self._train_epoch(epoch)
File "C:\ProgramData\Miniconda3\envs\nn-semparse\lib\site-packages\allennlp\training\gradient_descent_trainer.py", line 458, in _train_epoch
batch_outputs = self.batch_outputs(batch, for_training=True)
File "C:\ProgramData\Miniconda3\envs\nn-semparse\lib\site-packages\allennlp\training\gradient_descent_trainer.py", line 351, in batch_outputs
output_dict = self._pytorch_model(**batch)
File "C:\ProgramData\Miniconda3\envs\nn-semparse\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Coding\merlin_labs\nn-semparse\src\models\simple_seq2seq.py", line 126, in forward
outputs = self._decoder(state, target_tokens)
File "C:\ProgramData\Miniconda3\envs\nn-semparse\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Coding\merlin_labs\nn-semparse\src\models\modules\extendable_auto_regressive.py", line 451, in forward
output_dict = self._forward_loss(state_forward_loss, target_tokens)
File "C:\Coding\merlin_labs\nn-semparse\src\models\modules\extendable_auto_regressive.py", line 245, in _forward_loss
effective_last_prediction, state)
File "C:\Coding\merlin_labs\nn-semparse\src\models\modules\extendable_auto_regressive.py", line 321, in _prepare_output_projections
previous_steps_predictions=previous_steps_predictions,
File "C:\ProgramData\Miniconda3\envs\nn-semparse\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\ProgramData\Miniconda3\envs\nn-semparse\lib\site-packages\allennlp_models\generation\modules\decoder_nets\lstm_cell.py", line 118, in forward
decoder_hidden, encoder_outputs, source_mask
File "C:\ProgramData\Miniconda3\envs\nn-semparse\lib\site-packages\allennlp_models\generation\modules\decoder_nets\lstm_cell.py", line 71, in _prepare_attended_input
input_weights = self._attention(decoder_hidden_state, encoder_outputs, encoder_outputs_mask)
File "C:\ProgramData\Miniconda3\envs\nn-semparse\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\ProgramData\Miniconda3\envs\nn-semparse\lib\site-packages\allennlp\modules\attention\attention.py", line 45, in forward
similarities = self._forward_internal(vector, matrix)
File "C:\ProgramData\Miniconda3\envs\nn-semparse\lib\site-packages\allennlp\modules\attention\scaled_dot_product_attention.py", line 31, in _forward_internal
scores = torch.matmul(vector, matrix)
RuntimeError: mat1 dim 1 must match mat2 dim 0
Looking into the code for `ScaledDotProductAttention`, I noticed that it is missing the transpose from Equation 1 of "Attention Is All You Need". This appears to be fixed in this PR, but it was closed without merging; that PR addressed the issue by computing `matrix.bmm(vector.unsqueeze(-1)).squeeze(-1)` instead of the `torch.matmul(vector, matrix)` that is currently present. It is worth noting that the regular `DotProductAttention` already uses `matrix.bmm(vector.unsqueeze(-1)).squeeze(-1)` rather than `torch.matmul(vector, matrix)`.

I can open a PR with the fix from the original pull request, but I am not sure that would be easier than just reopening the original PR and merging it.
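To make the shape mismatch concrete, here is a minimal sketch with made-up shapes, showing why the current `torch.matmul(vector, matrix)` call fails and what the `bmm`-based computation used by `DotProductAttention` produces (the √d scaling at the end is my own illustration of Equation 1, not code copied from the library):

```python
import math
import torch

batch_size, num_rows, embedding_dim = 2, 7, 5
vector = torch.randn(batch_size, embedding_dim)            # decoder hidden state
matrix = torch.randn(batch_size, num_rows, embedding_dim)  # encoder outputs

# Current ScaledDotProductAttention: torch.matmul(vector, matrix) multiplies the
# (batch_size, embedding_dim) matrix against each (num_rows, embedding_dim) slice,
# so it fails whenever num_rows != embedding_dim with:
#   RuntimeError: mat1 dim 1 must match mat2 dim 0
# scores = torch.matmul(vector, matrix)

# DotProductAttention-style computation: one score per encoder position.
scores = matrix.bmm(vector.unsqueeze(-1)).squeeze(-1)  # shape: (batch_size, num_rows)
scaled = scores / math.sqrt(embedding_dim)             # the "scaled" part of Equation 1
print(scaled.shape)  # torch.Size([2, 7])
```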
## Related issues or possible duplicates

## Environment

OS: Windows

Python version: 3.7.10

Output of `pip freeze`:

## Steps to reproduce

To reproduce, use the same setup as the semantic parsing section in Part 3 of the guide, but make a small modification to the training config so that it uses the scaled dot product attention.

Example source:
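Alternatively, the same RuntimeError can be triggered directly, without the guide's training setup. A minimal sketch, with arbitrary shapes, importing the class from the module path shown in the traceback above (the `scaling_factor` argument is passed explicitly here in case the constructor requires it):

```python
import torch
from allennlp.modules.attention.scaled_dot_product_attention import ScaledDotProductAttention

batch_size, num_rows, embedding_dim = 2, 7, 5
attention = ScaledDotProductAttention(scaling_factor=embedding_dim)

decoder_hidden = torch.randn(batch_size, embedding_dim)             # "vector"
encoder_outputs = torch.randn(batch_size, num_rows, embedding_dim)  # "matrix"

# Fails inside _forward_internal with:
#   RuntimeError: mat1 dim 1 must match mat2 dim 0
scores = attention(decoder_hidden, encoder_outputs)
```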
@JohnGiorgi's PR is different in some other aspects as well, so we can't just merge it. But he got the dimensions right, so I'll make a PR that takes that part of the code.
There was a long chain of dependencies hanging off this issue, since this class was used in the transformer toolkit. The transformer toolkit should have been using `MatrixAttention`, so I implemented a scaled dot product `MatrixAttention`, converted the toolkit to use it, and then fixed this implementation to match the other `Attention` classes.
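For context, a rough sketch of the two shape contracts involved, using the existing `DotProductAttention` and `DotProductMatrixAttention` classes: `Attention` scores one query vector against the rows of a matrix, while `MatrixAttention` scores every row of one matrix against every row of another, which is the shape the transformer toolkit needs:

```python
import torch
from allennlp.modules.attention import DotProductAttention
from allennlp.modules.matrix_attention import DotProductMatrixAttention

batch_size, queries, keys, embedding_dim = 2, 7, 9, 5

# Attention: (batch, dim) x (batch, keys, dim) -> (batch, keys)
vector = torch.randn(batch_size, embedding_dim)
matrix = torch.randn(batch_size, keys, embedding_dim)
print(DotProductAttention()(vector, matrix).shape)            # torch.Size([2, 9])

# MatrixAttention: (batch, queries, dim) x (batch, keys, dim) -> (batch, queries, keys)
matrix_1 = torch.randn(batch_size, queries, embedding_dim)
matrix_2 = torch.randn(batch_size, keys, embedding_dim)
print(DotProductMatrixAttention()(matrix_1, matrix_2).shape)  # torch.Size([2, 7, 9])
```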