RuntimeError: shape '[5290000, 1]' is invalid for input of size 4600 #86
Also hitting this. It's related to the magic around how `WeightDrop` re-registers the wrapped module's weights. I have not yet had time to look into this in detail, but I will probably try to dig deeper.
[EDIT - corrected non-train-time behavior (oops!)] Here's my (very) hacky solution, which I think is working OK (and should work with both PyTorch 1.0 and earlier versions). It does a little more tensor copying, but in practice the tensors tend not to be huge and it's a once-per-minibatch thing, so there's not much overall impact (at least in my usage).
@sdraper-CS I am very curious as to what this magic actually does and why it is needed. Could you elaborate on that?
@sdraper-CS Thanks for your solution, BTW, it seems to work (I am still running it, waiting for the results to see if they match the paper). However, I got an error on one of the lines because pickle didn't like the lambda; changing that line to avoid the lambda solved the error.
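For context, lambdas can't be pickled (which matters when the model gets saved via pickle/torch.save), so the usual fix is to swap the anonymous function for something picklable, such as a module-level function or a `functools.partial`. The exact line wasn't preserved in this thread, so the snippet below is only a generic illustration of that kind of change:

```python
import functools
import pickle


def scale(factor, grad):
    """Named, module-level function: picklable, unlike an equivalent lambda."""
    return grad * factor


class Wrapper:
    def __init__(self):
        # This would break pickling:
        #   self.hook = lambda grad: grad * 0.5
        # A partial over a named function pickles fine:
        self.hook = functools.partial(scale, 0.5)


pickle.dumps(Wrapper())  # works; the lambda version raises a PicklingError
```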
I encountered an issue with the solution from #86 (comment).
However, the PR pytorch/pytorch#15766 seems to be working perfectly. I haven't tested it completely though.
The issue is that the changes in PyTorch 1.0 make it difficult to emplace a new tensor for the weights on each batch, so instead the idea is to mask the elements of the existing weights tensor in situ. However, this means that the gradients also need to be masked on the back-pass (because we didn't actually forward through a dropout operation that autograd could see).
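The patched weight_drop.py itself was not preserved in this thread, but a minimal, hedged sketch of the approach described above might look like the following (class, method, and attribute names here are made up for illustration): the existing weight tensor is masked in place on the forward pass, and a gradient hook masks the incoming gradient and routes it to a separate `*_raw` parameter on the backward pass.

```python
import functools

import torch
import torch.nn as nn
from torch.nn import Parameter


class InPlaceWeightDrop(nn.Module):
    """Sketch only: mask the wrapped module's weight tensors in place and
    forward the masked gradients to separate *_raw parameters."""

    def __init__(self, module, weights, dropout=0.0):
        super().__init__()
        self.module = module
        self.weights = weights
        self.dropout = dropout
        self._masks = {}
        for name_w in weights:
            w = getattr(self.module, name_w)
            # The *_raw parameter is the one the optimizer should update.
            self.register_parameter(name_w + '_raw', Parameter(w.data.clone()))
            # Persistent hook per weight; a partial over a bound method keeps
            # the module picklable (see the lambda/pickle issue above).
            w.register_hook(functools.partial(self._mask_grad, name_w))

    def _mask_grad(self, name_w, grad):
        # Mask the gradient with the same mask used on the forward pass and
        # accumulate it onto the matching *_raw parameter.
        mask = self._masks.get(name_w)
        masked = grad if mask is None else grad * mask
        raw_w = getattr(self, name_w + '_raw')
        if raw_w.grad is None:
            raw_w.grad = masked.detach().clone()
        else:
            raw_w.grad = raw_w.grad + masked.detach()
        return masked

    def _set_weights(self):
        for name_w in self.weights:
            raw_w = getattr(self, name_w + '_raw')
            w = getattr(self.module, name_w)
            if self.training and self.dropout > 0:
                keep = 1.0 - self.dropout
                mask = raw_w.new_empty(raw_w.shape).bernoulli_(keep).div_(keep)
            else:
                mask = None
            self._masks[name_w] = mask
            with torch.no_grad():
                # Copy the masked raw weights into the existing tensor in place
                # instead of replacing the Parameter (which 1.0 objects to).
                w.copy_(raw_w * mask if mask is not None else raw_w)

    def forward(self, *args, **kwargs):
        self._set_weights()
        return self.module(*args, **kwargs)
```

Usage would be along the lines of wrapping the LSTM and passing `wrapped.parameters()` to the optimizer, so that the `*_raw` parameters (the weights that actually get updated) are part of the optimized set, as discussed further down.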
I have run the word-PTB LSTM model and reached 74.54 PPL at the point where the code switches the optimizer to ASGD (and then it broke with an error). BTW, QRNN stops at around 770 PPL, so that also needs to be properly updated to 1.0... I guess I'll just go back to 0.4 for now to be on the safe side.
@DavidNemeskey I am now pretty confident that the approach is working correctly. I have retrained an NER model based on the Lample paper from 2017 with my modified version of this class, and am able to recover the same model performance as before.
@sdraper-CS I ran both the original and your code under PyTorch 0.4 and compared the results. So I guess it works; it's just that the hyperparameters might need recalibration.
@DavidNemeskey That's odd. I'm not sure why the results would differ.
@sdraper-CS I did another experiment and replaced the line that registers the `_raw` variants as Parameters with a plain attribute assignment, i.e. the `_raw` weights are no longer registered as Parameters, and I saw no difference in the results.
@DavidNemeskey That really doesn't make sense to me! Stepping through in the debugger I AM getting the _raw variants in the optimizer params (for both SGD and Adam), and it SHOULD be necessary to register the raw variants as Parameters (so I cannot explain your observations).

Setup: the original LSTM weights are re-registered as _raw Parameters, and those are what the optimizer sees.

Forward pass: the masked raw weights are copied into the LSTM's own weight tensors before the LSTM runs.

Backward pass: the gradients are masked with the same dropout mask and end up updating the _raw parameters.

It is thus critical that the raw parameters are part of the optimized set. If they were not, the expected behavior would be that we never learn anything, since the raw weights would not be updated and we'd continue to copy whatever value they were initialized with into the underlying LSTM on each forward pass.

The above analysis does highlight one subtle point, which is that any weight initialization you intend to apply to the LSTM needs to be applied BEFORE the LSTM is wrapped inside a WeightDrop.

Sorry I cannot explain your exact findings, but hopefully the above explanation will help your analysis of what is happening in your case.
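To make the initialization-ordering point concrete, here is a small sketch (assuming this repo's `WeightDrop` wrapper from weight_drop.py; the layer sizes and learning rate are just placeholders):

```python
import torch
from weight_drop import WeightDrop  # wrapper from this repository

lstm = torch.nn.LSTM(400, 1150)

# Apply custom initialization BEFORE wrapping: WeightDrop takes the current
# values as the *_raw parameters, which are what actually get trained.
torch.nn.init.orthogonal_(lstm.weight_hh_l0)

wrapped = WeightDrop(lstm, ['weight_hh_l0'], dropout=0.5)

# The *_raw variant should show up here and therefore reach the optimizer.
print([name for name, _ in wrapped.named_parameters()])
optimizer = torch.optim.SGD(wrapped.parameters(), lr=30)
```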
Has anyone checked the fastai implementation for PyTorch 1.0?
@NProkoptsev You probably already know this by now, but just for everyone else who sees this: the fastai implementation works for PyTorch 1.0. |
@daemon You are right, it works, but it cannot reproduce the numbers in the paper either. I think that boat has sailed with PyTorch 0.4, at least until someone does a full hyperparameter search for 1.0.
Hi,
When running
python main.py --batch_size 20 --data data/penn --dropouti 0.4 --dropouth 0.25 --seed 141 --epoch 500 --save PTB.pt
I get the following error: RuntimeError: shape '[5290000, 1]' is invalid for input of size 4600. I'm using PyTorch 1.0. Any idea why this is happening?
Thanks!