Trained model gives constant value prediction outputs #16
Comments
@NickGeo1 Almost the same problem here. Have you figured out the reason yet? I tried to make the transformer learn two very simple time series. The data I used for the encoder and decoder:
enc_seq = torch.tensor([[1., 2., 3.], [55., 56., 57.]])
dec_seq = torch.tensor([[3.], [57.]])
goal = torch.tensor([[4.], [58.]])
The complete code is here: https://gist.github.com/qdwang/b6037c9117195cc07c4582fdd6d126a8
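For readers who do not want to open the gist, a minimal self-contained sketch of this kind of toy experiment is below. Only the three tensors come from the comment above; the model width, layer counts, learning rate, and number of steps are placeholder assumptions, not the gist's actual values.

```python
import torch
import torch.nn as nn

# Toy data quoted in the comment above: two short series, predict the next value.
enc_seq = torch.tensor([[1., 2., 3.], [55., 56., 57.]])  # (batch, enc_len)
dec_seq = torch.tensor([[3.], [57.]])                    # (batch, dec_len)
goal    = torch.tensor([[4.], [58.]])                    # (batch, dec_len)

d_model = 32  # assumed model width, not taken from the gist

# Minimal seq-to-seq setup: scalar-to-d_model projections around nn.Transformer.
enc_in = nn.Linear(1, d_model)
dec_in = nn.Linear(1, d_model)
head   = nn.Linear(d_model, 1)
core   = nn.Transformer(d_model=d_model, nhead=4,
                        num_encoder_layers=2, num_decoder_layers=2,
                        dim_feedforward=64, batch_first=True)

params = (list(enc_in.parameters()) + list(dec_in.parameters())
          + list(head.parameters()) + list(core.parameters()))
opt = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(500):
    opt.zero_grad()
    src = enc_in(enc_seq.unsqueeze(-1))      # (batch, enc_len, d_model)
    tgt = dec_in(dec_seq.unsqueeze(-1))      # (batch, dec_len, d_model)
    pred = head(core(src, tgt)).squeeze(-1)  # (batch, dec_len)
    loss = loss_fn(pred, goal)
    loss.backward()
    opt.step()

# Note: with unnormalized targets like 4 and 58, MSE training can settle near
# the mean of the targets, which by itself can look like "constant predictions".
```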
It seems there is a bug in the transformer encoder layer (source: https://discuss.pytorch.org/t/model-eval-predicts-all-the-same-or-nearly-same-outputs/155025/11). Try using a lower version of PyTorch (preferably 1.11).
Hi, unfortunately I'm facing the same issue. I tried PyTorch 1.11 (and PyTorch 2.0 as well), but that didn't help. I played around with the code a bit and wasn't able to fix it either. Did anyone find a solution to this problem?
Hi, did anyone solve this issue? Facing the same problem of predictions remaining constant.
@Priyanga-Kasthurirajan, the problem is with the transformer encoder layer in PyTorch; there are some issues when you run the model in eval() mode. You can run the model in train() mode instead, but you won't get the same result every time you infer on the same input data, since the dropout layers are not disabled in train() mode (dropout randomly zeroes activations).
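A sketch of the workaround described above, with the dropout caveat handled: keep the model in train() so the eval()-time encoder behaviour discussed in this thread is avoided, but put only the Dropout modules in eval() so repeated inference on the same input is deterministic. This is a hedged suggestion based on the comment above, not a confirmed fix.

```python
import torch
import torch.nn as nn

def deterministic_train_mode(model: nn.Module) -> nn.Module:
    """Keep the model in train() (so the eval()-only encoder fast path
    discussed above is not taken), but disable every Dropout module so
    the same input always produces the same output."""
    model.train()
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.eval()
    return model

# Usage sketch (model, src, tgt are whatever you already have):
# model = deterministic_train_mode(model)
# with torch.no_grad():
#     prediction = model(src, tgt)
```

LayerNorm behaves the same in train() and eval(), so this trick is harmless for standard Transformer blocks; if the model contained BatchNorm, leaving it in train() would keep updating its running statistics.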
@Athuliva yes, you said it right: every time I infer on the same input data, I also get different results.
@Aimen1996 @Priyanga-Kasthurirajan @MarkiemarkF I implemented transformers in TensorFlow, referring to this blog and TensorFlow's official blog on implementing a transformer language model, but my results are not better than an LSTM network. Would you like to work together? athul.suresh@iva.org.in
@Athuliva I am working with TSF, basically comparing LSTM, transformer, and non-stationary transformer models.
@Aimen1996 I am also working with TSF, but the results from transformers still need to improve. I need to do a comparison study and find the right model for the task.
How can I add you, so we can talk more about it?
@Aimen1996 Can you drop an email at athulksuresh21@gmail.com or athul.suresh@iva.org.in? This is the git repo: https://github.com/athulvingt/transformers
Did anyone find the bug in this code? I am also getting a straight line with this implementation. I would appreciate some assistance here.
@toibanoor, I guess you have to override the encoder method.
@Athuliva @Priyanga-Kasthurirajan Do you have any idea how I would go about overriding the encoder method? I've been stuck on this for a long time.
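One hedged way to do what @Athuliva suggests without writing a custom encoder class: construct the TransformerEncoder yourself and pass enable_nested_tensor=False (available in recent PyTorch releases), so the encoder never converts padded batches to nested tensors, one of the fast-path behaviours people in this thread suspect. Whether this resolves the constant-output behaviour is an assumption; the sizes below are placeholders.

```python
import torch.nn as nn

d_model, nhead, num_layers = 512, 8, 4  # placeholder sizes

encoder_layer = nn.TransformerEncoderLayer(
    d_model=d_model,
    nhead=nhead,
    batch_first=True,
)
# enable_nested_tensor=False keeps the encoder on the ordinary code path
# instead of the nested-tensor conversion used for padded batches.
encoder = nn.TransformerEncoder(
    encoder_layer,
    num_layers=num_layers,
    enable_nested_tensor=False,
)
```

If that does not change anything, the dropout-disabling workaround sketched earlier in the thread (staying in train() mode with Dropout modules switched off) is another option.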
Hello there,
First of all, I want to say this implementation looks really interesting. I am currently working on time series forecasting with a Transformer model for my thesis. I studied the following posts:
https://towardsdatascience.com/how-to-make-a-pytorch-transformer-for-time-series-forecasting-69e073d4061e
https://towardsdatascience.com/how-to-run-inference-with-a-pytorch-time-series-transformer-394fd6cbe16c
as well as your repository and code.
It was a huge help for me in implementing the model with PyTorch, so I want to thank you for sharing your implementation.
I implemented the TimeseriesTransformer class in my project, along with a few functions of my own for training, validation, batchifying the data, etc.
I modified the positional encoder part slightly so it is compatible with a single-sequence input. I also fixed the incompatibility issue with batch_first = True, though I saw that someone else had already fixed that before me.
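A hedged sketch of what such a batch-first-compatible positional encoder can look like: the standard sinusoidal encoding adapted so the input shape is (batch, seq_len, d_model). The actual modification in the thesis code may differ.

```python
import math
import torch
import torch.nn as nn

class PositionalEncoder(nn.Module):
    """Sinusoidal positional encoding for batch-first inputs."""

    def __init__(self, d_model: int, dropout: float = 0.1, max_len: int = 5000):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)
        position = torch.arange(max_len).unsqueeze(1)                     # (max_len, 1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(1, max_len, d_model)                             # (1, max_len, d_model)
        pe[0, :, 0::2] = torch.sin(position * div_term)
        pe[0, :, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); a single sequence can be passed as (1, seq_len, d_model).
        x = x + self.pe[:, : x.size(1)]
        return self.dropout(x)
```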
Now let me introduce my issue. I tried to train the model on some time series data. I used the MSE loss function and the Adam optimizer, as you described in the inference post. I trained for 53 epochs of 21 batches with 16 sequences each, since the loss seemed to have converged by that point. The dataset covers 357 time steps, with encoder_in_len = 10 and decoder_in_len = 1. Each sequence is shifted left by one step relative to the previous sequence in the batch.
For example, the first 3 sequences of the first batch:
seq1: 343, 0, 128, 62, 0, 2.001, 1.010, 0, 572, 134 (T1->T10)
seq2: 0, 128, 62, 0, 2.001, 1.010, 0, 572, 134, 237 (T2->T11)
seq3: 128, 62, 0, 2.001, 1.010, 0, 572, 134, 237, 698 (T3->T12)
...
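A small sketch of the windowing described above (encoder length 10, decoder length 1, each window shifted by one step). The series values are the ones quoted in this issue, including the T13 = 0 mentioned further down; the helper name is hypothetical.

```python
import torch

def make_windows(series: torch.Tensor, enc_len: int = 10, dec_len: int = 1):
    """Build (encoder_input, decoder_input, target) triples by sliding a
    window one step at a time over a 1-D series."""
    enc, dec, tgt = [], [], []
    for i in range(len(series) - enc_len - dec_len + 1):
        enc.append(series[i : i + enc_len])                            # T_i .. T_{i+9}
        dec.append(series[i + enc_len - dec_len : i + enc_len])        # last known value
        tgt.append(series[i + enc_len : i + enc_len + dec_len])        # value to predict
    return torch.stack(enc), torch.stack(dec), torch.stack(tgt)

# The T1..T13 values quoted in this issue:
series = torch.tensor([343., 0., 128., 62., 0., 2.001, 1.010, 0.,
                       572., 134., 237., 698., 0.])
enc, dec, tgt = make_windows(series)
# enc[0] is T1..T10, dec[0] == [134.], tgt[0] == [237.]
# enc[1] is T2..T11, dec[1] == [237.], tgt[1] == [698.]
# enc[2] is T3..T12, dec[2] == [698.], tgt[2] == [0.]
```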
The thing is, both for these data and for the dummy data I used earlier, the model seems to make wrong predictions.
Any time I ran inference on new or already-seen data, the prediction was a constant value for all sequences in the batch.
For example, I got the following inference output for the above sequences:
427.4353, 427.4353, 427.4353 (decoder inputs 134, 237, 698 respectively; the outputs should be 237, 698, and 0, where T13 = 0)
What do you think is going on here? Is it the loss function, the optimizer, the model, or something else?
I would really appreciate your answer!