[WIP v1 - deprecated] Add learning of target features #1710
Conversation
Great! Do I read correctly that you implemented the target shifting as in OpenNMT-lua? I no longer think this was a good idea and I don't think it was proven to be any better than just keeping the features aligned. What do you think?
True.

This implementation seemed OK and fit for the task, at least at first glance. I'm only beginning to run some 'realistic' tests, so I don't have enough feedback for now. Do you have specific results in mind? I agree that for greedy search the shift should not be necessary. But what would you propose for the beam search case, where the feature would highly depend on the predicted token?
The concern is that, yes, target features do depend on the predicted token, but in many cases they also depend on the attended source context. For the latter, it seems better to keep the features aligned so that attention values are valid for the predictions of both the word and its features. How important the context is versus the predicted token may depend on the type of features. I don't have specific results in mind; this is just something that came to mind recently and I'm raising it here. Of course, if you have the time and resources, it could be interesting to compare shift vs. no shift.
That's a valid concern, though it's not an easy one to wrap one's head around. The best would indeed be to experiment. I also agree it might depend on the type of features, so I'll add a flag to make the shift optional and facilitate such tests.
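To make the two options being debated concrete, here is a minimal sketch of what "shift" versus "no shift" means for a target feature sequence. The helper name `shift_features` and the `<blank>` padding token are hypothetical illustrations, not OpenNMT-py API:

```python
# Hypothetical sketch of target feature shifting (as in OpenNMT-lua):
# with the shift, the feature at step t is emitted one step after its
# word, so the model has already predicted the word when it predicts
# the feature. Without the shift, word and feature stay aligned, so
# the same attention context serves both predictions.

PAD = "<blank>"  # assumed padding symbol for the first shifted position

def shift_features(tokens, features, shift=True):
    """Return (tokens, features), optionally delaying features one step."""
    if not shift:
        return tokens, features  # aligned: features[t] describes tokens[t]
    # shifted: features[t] describes tokens[t-1]; pad the first slot
    return tokens, [PAD] + features[:-1]

tokens = ["the", "Cat", "sat"]
feats = ["low", "cap", "low"]  # e.g. a capitalization feature

print(shift_features(tokens, feats, shift=True))
print(shift_features(tokens, feats, shift=False))
```

With `shift=True` the case feature for "Cat" is emitted at the step after "Cat" is predicted, which helps beam search condition features on the chosen word, at the cost of misaligning the features with the attention context.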
Could someone with write access check the solution?

Hey @eduamf, thanks for your interest in this topic.
This feature has been asked about quite a lot, and might be useful in several tasks, so here it is!

Done:
- `Generator` class to handle multiple generators
- `OnmtBatch` class inheriting from `torchtext.data.Batch` to allow feature shifting

TODO
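For readers unfamiliar with the multi-generator idea: predicting target features alongside words means the decoder's hidden state must be projected into several distributions, one per output stream. The sketch below illustrates that pattern; the class name `MultiGenerator` and its signature are assumptions for illustration, not the code from this PR:

```python
import torch
import torch.nn as nn

class MultiGenerator(nn.Module):
    """Hypothetical sketch: one linear projection + log-softmax per
    output stream (the word vocabulary plus one vocabulary per
    target feature)."""

    def __init__(self, hidden_size, vocab_sizes):
        super().__init__()
        self.projections = nn.ModuleList(
            nn.Linear(hidden_size, v) for v in vocab_sizes
        )

    def forward(self, hidden):
        # Returns a list of log-probability distributions, one per output.
        return [torch.log_softmax(p(hidden), dim=-1) for p in self.projections]

# Word vocabulary of 100 plus a 4-way case feature, batch of 2.
gen = MultiGenerator(16, [100, 4])
outs = gen(torch.randn(2, 16))
print([tuple(o.shape) for o in outs])  # [(2, 100), (2, 4)]
```

Each distribution then gets its own loss term during training, and decoding picks a word and its features jointly at every step.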