This project is my second try at replicating the results of the DeepVO paper. The first attempt was a dead end, and I had to abandon weeks of coding and analysis altogether.
CURRENT UPDATE - Randomly reducing the number of parameters within the model causes blunders. The lowest MSE loss I was able to get (training on my CPU) was around 0.10, which is far too high for this task. I might have to try a few more things to build a lighter version, maybe a different normalization scheme. Not sure yet.
The original model architecture mentioned in the paper
I have limited computing resources, so I drastically reduced the parameter count by omitting several layers from the model, just to be able to train it at all. Based on some research, a model needs to stay under roughly 1 million parameters to be trainable on Google Colab's free tier.
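To sanity-check whether a slimmed-down CNN + LSTM fits under that budget, the parameter count can be estimated with simple arithmetic before building anything. The layer sizes below are hypothetical examples I chose for illustration, not the paper's architecture or my actual model, and the LSTM formula assumes one bias vector per gate (PyTorch uses two, adding a small constant):

```python
def conv_params(in_ch, out_ch, k):
    # Each output channel has an (in_ch * k * k) kernel plus one bias term.
    return out_ch * (in_ch * k * k + 1)

def lstm_params(input_size, hidden_size):
    # 4 gates, each with weights over [input, hidden] plus a bias vector.
    return 4 * ((input_size + hidden_size) * hidden_size + hidden_size)

def linear_params(in_features, out_features):
    # Weight matrix plus one bias per output unit.
    return out_features * (in_features + 1)

# Hypothetical slimmed stack: 3 conv layers, global pooling to 64 features,
# one LSTM layer, then a 6-DoF pose regression head.
total = (
    conv_params(3, 16, 7)       # 2,368
    + conv_params(16, 32, 5)    # 12,832
    + conv_params(32, 64, 3)    # 18,496
    + lstm_params(64, 128)      # 98,816
    + linear_params(128, 6)     # 774
)
print(total)  # 133,286 -- comfortably under the ~1M budget
```

This kind of back-of-the-envelope check makes it easy to see which layer dominates (here the LSTM) before deciding what to cut next.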